The messy middle between exploration and deployment.
Data science workflow:
1. Explore in a notebook
2. Find something that works
3. Turn it into production code
4. Ship it

Step 3 is where projects die. Claude helps.
> This notebook works but it's a mess. Help me refactor.
Let me analyze the notebook structure...
I see:
- 47 cells
- 12 are markdown (good)
- 8 are exploratory dead ends (can remove)
- 3 have hardcoded paths
- 1 function is duplicated with slight variations
Want me to:
1. Extract the core logic into a module?
2. Clean up the notebook to call that module?
3. Add tests for the extracted functions?
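To make one finding concrete, here is what fixing a hardcoded path typically looks like. The path and the config key are hypothetical:

import pandas as pd

# Before: a path that only exists on one laptop
df = pd.read_csv("/Users/alex/projects/sales_final_v3.csv")

# After: the location lives in config, next to the other settings
config = {"data_path": "data/sales.csv"}
df = pd.read_csv(config["data_path"])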
# Notebook has this scattered across 15 cells
def load_data(): ...
def clean_data(): ...
def transform(): ...
def analyze(): ...
def visualize(): ...
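The duplicated function usually collapses the same way: the slight variation becomes a parameter. A sketch, with hypothetical names:

# Before: two near-copies living in different cells
def summarize(df):
    return df.groupby("region")["sales"].mean()

def summarize_by_month(df):
    return df.groupby("month")["sales"].mean()

# After: one function; the variation is an argument
def summarize(df, by="region"):
    return df.groupby(by)["sales"].mean()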
# src/pipeline.py
class DataPipeline:
    def __init__(self, config):
        self.config = config

    def run(self):
        data = self.load()
        data = self.clean(data)
        data = self.transform(data)
        return self.analyze(data)

    # Stage bodies hold the logic extracted from the notebook
    def load(self): ...
    def clean(self, data): ...
    def transform(self, data): ...
    def analyze(self, data): ...
# Now the notebook is 5 cells
from src.pipeline import DataPipeline

config = {"allow_missing": False}  # illustrative; real keys come from the project
pipeline = DataPipeline(config)
results = pipeline.run()
visualize(results)  # plotting stays in the notebook, where humans look at it
def test_pipeline_handles_missing_data():
    config = {"allow_missing": True}
    pipeline = DataPipeline(config)
    # ...
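A fuller version of that test might look like the sketch below. It assumes the stage methods are implemented, and uses pytest's monkeypatch to stub load() so nothing touches disk:

import pandas as pd
from src.pipeline import DataPipeline

def test_pipeline_handles_missing_data(monkeypatch):
    config = {"allow_missing": True}
    pipeline = DataPipeline(config)

    # Stub load() with an in-memory frame that has a gap in it
    df = pd.DataFrame({"x": [1.0, None, 3.0]})
    monkeypatch.setattr(pipeline, "load", lambda: df)

    # With allow_missing=True the run should complete rather than raise
    results = pipeline.run()
    assert results is not None

Because each stage is its own method, clean() and transform() can get the same treatment in isolation.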
The human decides what matters. Claude does the tedious extraction.
Notebooks explore. Modules endure.