Data Science Workflows
Build reproducible end-to-end analyses as graphs: ingest, transform, analyze, visualize. Share with stakeholders in View Mode so they see the result and the path that produced it.
The pipeline shape
Most data-science pipelines have the same four phases. Connectify models each as a subgraph so the canvas stays readable:
- Ingest — pull raw data from sources.
- Transform — clean, join, aggregate.
- Analyze — stats, ML, hypothesis testing.
- Visualize — produce charts and summaries.
Step-by-step: a quarterly revenue analysis
- Ingest
  Add two Dataset nodes: orders.csv and customers.csv. Connectify infers schemas automatically and displays column types in the Inspector.
- Join
  Add a Logic → join node. Wire both Dataset outputs in. Set on = customer_id and how = inner. The output is a single table containing orders enriched with customer attributes.
- Filter and aggregate
  Chain a filter node (order_date >= '2025-01-01') and a group_by node (by=['quarter', 'segment'], aggregating revenue with sum).
- Wrap the prep into a subgraph
  Marquee-select the three transform nodes. Right-click → Group into subgraph. Name it "Prep". The subgraph collapses to a single tile labeled "Prep" with one input and one output, keeping the canvas readable.
- Analyze
  Add a Logic → stats/yoy_growth node. Its output is a table with each quarter's year-over-year (YoY) percentage change per segment.
- Visualize
  Add a Logic → viz/bar_chart node. Set x = quarter, y = yoy_pct, and color = segment. The chart renders in the Run Data panel after the next run.
- Run end-to-end
  Click Run. Edges animate as each stage completes. The chart appears in the viz node's Run Data tab.
- Share with stakeholders
  Click Share → Viewer link. Stakeholders open the link in View Mode, where they can see the chart, click any node to inspect its data, and follow the path back to the source files.
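To make the data semantics of the prep steps concrete, here is a minimal sketch in plain Python of what the join, filter, and group_by nodes compute. It is not Connectify's implementation or API: the sample rows, the helper names (inner_join, quarter_of, prep), and the "YYYY-Qn" quarter label are assumptions made for illustration, and the join assumes customer_id is unique in the customers table.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows standing in for orders.csv and customers.csv.
orders = [
    {"order_id": 1, "customer_id": "c1", "order_date": date(2024, 12, 30), "revenue": 100.0},
    {"order_id": 2, "customer_id": "c1", "order_date": date(2025, 1, 15), "revenue": 250.0},
    {"order_id": 3, "customer_id": "c2", "order_date": date(2025, 2, 1), "revenue": 400.0},
    {"order_id": 4, "customer_id": "c2", "order_date": date(2025, 4, 10), "revenue": 150.0},
    {"order_id": 5, "customer_id": "c9", "order_date": date(2025, 4, 11), "revenue": 999.0},
]
customers = [
    {"customer_id": "c1", "segment": "enterprise"},
    {"customer_id": "c2", "segment": "smb"},
]


def inner_join(left, right, on):
    """Inner join: keep only left rows whose key also exists on the right.

    Assumes the key is unique in `right` (one row per customer).
    """
    index = {row[on]: row for row in right}
    return [{**row, **index[row[on]]} for row in left if row[on] in index]


def quarter_of(d):
    """Label a date with its calendar quarter, e.g. 2025-Q1 (assumed format)."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def prep(orders, customers, start):
    """Join, filter by start date, then sum revenue per (quarter, segment)."""
    joined = inner_join(orders, customers, on="customer_id")
    recent = [row for row in joined if row["order_date"] >= start]
    totals = defaultdict(float)
    for row in recent:
        totals[(quarter_of(row["order_date"]), row["segment"])] += row["revenue"]
    return dict(totals)


result = prep(orders, customers, start=date(2025, 1, 1))
for (quarter, segment), revenue in sorted(result.items()):
    print(quarter, segment, revenue)
```

In this sample, order 1 falls before the cutoff and is filtered out, and order 5 has no matching customer, so the inner join drops it. Checking row counts in the Inspector after each node is a quick way to catch the same effects on real data.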
Reproducibility for free
Because every step lives on the graph and runs deterministically, anyone can refresh the analysis by clicking Run again, without the cell-execution-order pitfalls of notebooks.
Patterns
Parameterizing the time window
Add a Logic → param/date_range node at the top of the graph and wire its output to your filter node. Now changing the analysis window is one config edit, not a hunt across the graph.
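The idea behind this pattern is that the window is defined once and passed to whatever consumes it. A rough Python analogue follows; the DateRange class, the half-open [start, end) convention, and filter_by_window are assumptions for illustration, not the actual interface of the param/date_range node.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """A half-open window [start, end). The convention is an assumption."""
    start: date
    end: date

    def contains(self, d):
        return self.start <= d < self.end


def filter_by_window(rows, window, column="order_date"):
    """Keep rows whose date column falls inside the window."""
    return [row for row in rows if window.contains(row[column])]


# The single place where the analysis window is defined.
WINDOW = DateRange(start=date(2025, 1, 1), end=date(2026, 1, 1))

rows = [
    {"order_date": date(2024, 12, 31), "revenue": 10.0},
    {"order_date": date(2025, 6, 1), "revenue": 20.0},
]
kept = filter_by_window(rows, WINDOW)
print(kept)
```

Any additional filter that needs the same window would take WINDOW as input rather than repeating the literal dates.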
Versioned datasets
Dataset nodes can pin to a specific snapshot of the underlying source (Inspector → Config → version). Pin datasets in production analyses so that a refresh does not silently pick up changed source data.
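How Connectify identifies a snapshot internally is not described here. As a general illustration of why pinning matters, the sketch below uses a content hash, one common way to detect that source data has changed since it was pinned. The function names and the SHA-256 choice are assumptions, not Connectify behavior.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint of a dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def load_pinned(data: bytes, pinned_digest: str) -> bytes:
    """Return the data only if it still matches the pinned fingerprint."""
    actual = digest(data)
    if actual != pinned_digest:
        raise ValueError(
            f"source changed since pin: expected {pinned_digest[:12]}, got {actual[:12]}"
        )
    return data


# Hypothetical snapshot contents; in practice these would be the file's bytes.
snapshot = b"order_id,revenue\n1,100\n"
PINNED = digest(snapshot)

# Unchanged data loads normally.
load_pinned(snapshot, PINNED)

# Data that gained a row since the pin is rejected instead of silently used.
try:
    load_pinned(snapshot + b"2,50\n", PINNED)
except ValueError as err:
    print("refused:", err)
```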
Custom data sources
Wrap an internal API or warehouse query in a Custom node. Once wrapped, it composes with everything else on the canvas.
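The Custom node interface is not specified in this guide, so the following is only a sketch of the kind of logic such a wrapper typically contains: call the source, validate the response, and return uniform rows. The rows_from_api function, its fetch parameter, and the required-column check are hypothetical; the fetcher is injected so the example runs without a network.

```python
import json


def rows_from_api(fetch, required=("customer_id",)):
    """Turn a JSON array of objects into a list of row dicts.

    `fetch` is any zero-argument callable returning a JSON string, for
    example a thin wrapper around an internal HTTP client (hypothetical).
    """
    records = json.loads(fetch())
    rows = []
    for position, record in enumerate(records):
        missing = [key for key in required if key not in record]
        if missing:
            raise KeyError(f"record {position} is missing columns: {missing}")
        rows.append(dict(record))
    return rows


def fake_fetch():
    """Stand-in for a real API call, returning a canned response."""
    return json.dumps([
        {"customer_id": "c1", "segment": "enterprise"},
        {"customer_id": "c2", "segment": "smb"},
    ])


rows = rows_from_api(fake_fetch)
print(rows)
```

Failing fast on a missing column keeps a malformed response from surfacing later as a confusing join or aggregation error.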