Workflow tutorial

Data Science Workflows

Build reproducible end-to-end analyses as graphs: ingest, transform, analyze, visualize. Share with stakeholders in View Mode so they see the result and the path that produced it.

The pipeline shape

Most data-science pipelines have the same four phases. Connectify models each as a subgraph so the canvas stays readable:

Ingest — pull raw data from sources.
Transform — clean, join, aggregate.
Analyze — stats, ML, hypothesis testing.
Visualize — produce charts and summaries.
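
Conceptually, the four phases are stages chained output-to-input, which is what wiring one subgraph into the next expresses. A minimal Python sketch of that shape follows. The function names, sample rows, and text "chart" are hypothetical stand-ins, not Connectify APIs.

```python
# Hypothetical stand-ins for the four phases; in Connectify each phase is a subgraph.
def ingest() -> list[dict]:
    # Pull raw rows from a source (hard-coded sample data here).
    return [
        {"segment": "SMB", "revenue": 120.0},
        {"segment": "Enterprise", "revenue": 300.0},
        {"segment": "SMB", "revenue": 80.0},
    ]


def transform(rows: list[dict]) -> list[dict]:
    # Aggregate revenue per segment.
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["segment"]] = totals.get(row["segment"], 0.0) + row["revenue"]
    return [{"segment": s, "revenue": v} for s, v in sorted(totals.items())]


def analyze(rows: list[dict]) -> list[dict]:
    # Add each segment's share of total revenue.
    grand_total = sum(row["revenue"] for row in rows)
    return [{**row, "share": row["revenue"] / grand_total} for row in rows]


def visualize(rows: list[dict]) -> str:
    # Crude text bar chart: one '#' per 5% of total revenue.
    return "\n".join(f"{row['segment']:<10} {'#' * round(row['share'] * 20)}" for row in rows)


def run_pipeline() -> str:
    # Each phase consumes the previous phase's output, like wired subgraphs.
    return visualize(analyze(transform(ingest())))


print(run_pipeline())
```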

Step-by-step: a quarterly revenue analysis

Ingest

Add two Dataset nodes: orders.csv and customers.csv. Connectify infers schemas automatically and displays column types in the Inspector.
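
Connectify performs this inference for you. The sketch below only illustrates one common approach: for each column, try progressively looser types and keep the first one that parses every non-empty value. The type names and the int, float, date, str order are assumptions, not Connectify's actual rules.

```python
import csv
import io
from datetime import date


def infer_type(values: list[str]) -> str:
    """Return the narrowest type name that parses every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"
    # Try progressively looser types; the first one that parses everything wins.
    for name, parse in (("int", int), ("float", float), ("date", date.fromisoformat)):
        try:
            for value in non_empty:
                parse(value)
        except ValueError:
            continue
        return name
    return "str"


def infer_schema(csv_text: str) -> dict[str, str]:
    """Map each CSV column to an inferred type name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {col: infer_type([row[col] for row in rows]) for col in reader.fieldnames}


# Hypothetical sample of orders.csv.
orders_csv = (
    "order_id,customer_id,order_date,revenue\n"
    "1,C1,2025-01-03,120.50\n"
    "2,C2,2025-02-10,80\n"
)
print(infer_schema(orders_csv))
```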

Join

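Add a Logic → join node and wire both Dataset outputs into it. Set the on parameter to customer_id and how to inner. The output is a single table of orders enriched with customer attributes. Because the join is inner, orders whose customer_id has no match in customers.csv are dropped.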
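
To show what an inner join on customer_id does to the rows, here is a minimal hash-join sketch in plain Python. It is an illustration of the technique, not Connectify's join implementation; the sample rows are hypothetical, and the choice that left-side values win on column-name collisions is an assumption.

```python
def inner_join(left: list[dict], right: list[dict], on: str) -> list[dict]:
    """Hash join: emit one merged row per (left, right) pair sharing the `on` key."""
    # Index the right table by join key (a key may map to several rows).
    index: dict = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    joined = []
    for left_row in left:
        # Left rows with no match produce nothing: that is what makes it "inner".
        for right_row in index.get(left_row[on], []):
            # On column-name collisions, the left (orders) value wins.
            joined.append({**right_row, **left_row})
    return joined


orders = [
    {"order_id": 1, "customer_id": "C1", "revenue": 120.5},
    {"order_id": 2, "customer_id": "C9", "revenue": 80.0},  # no matching customer
]
customers = [{"customer_id": "C1", "segment": "SMB"}]
print(inner_join(orders, customers, on="customer_id"))
```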

Filter and aggregate

Chain a filter node (order_date >= '2025-01-01') and a group_by node (by=['quarter', 'segment'], aggregating revenue with sum).
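
The filter-then-group_by chain is equivalent to the loop below: drop rows before the start date, then sum revenue per (quarter, segment) key. This is a sketch under one assumption the text leaves open: the quarter column is derived from order_date by a hypothetical quarter_of helper that produces labels like "2025-Q1".

```python
from datetime import date


def quarter_of(d: date) -> str:
    # Hypothetical helper: label a date with its calendar quarter, e.g. "2025-Q1".
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def filter_and_sum(rows: list[dict], start: date) -> list[dict]:
    """Keep rows with order_date >= start, then sum revenue by (quarter, segment)."""
    totals: dict[tuple[str, str], float] = {}
    for row in rows:
        if row["order_date"] < start:
            continue  # the filter step
        key = (quarter_of(row["order_date"]), row["segment"])
        totals[key] = totals.get(key, 0.0) + row["revenue"]  # the group_by + sum step
    return [
        {"quarter": q, "segment": s, "revenue": total}
        for (q, s), total in sorted(totals.items())
    ]


rows = [
    {"order_date": date(2024, 12, 30), "segment": "SMB", "revenue": 50.0},  # filtered out
    {"order_date": date(2025, 1, 3), "segment": "SMB", "revenue": 120.5},
    {"order_date": date(2025, 2, 10), "segment": "SMB", "revenue": 80.0},
    {"order_date": date(2025, 4, 2), "segment": "Enterprise", "revenue": 300.0},
]
print(filter_and_sum(rows, start=date(2025, 1, 1)))
```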

Wrap the prep into a subgraph

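Marquee-select the three transform nodes (join, filter, and group_by), right-click → Group into subgraph, and name it "Prep". The subgraph collapses to a single tile labeled "Prep": the edges from the two Dataset nodes feed into it, and a single output carries the aggregated table, keeping the canvas readable.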
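
Grouping behaves like function composition: the subgraph accepts the external inputs of its first step and exposes only the final step's output. A minimal sketch of that idea follows. The group_into_subgraph helper and the toy lambdas standing in for join, filter, and group_by are hypothetical, not part of Connectify.

```python
from typing import Any, Callable


def group_into_subgraph(
    name: str, first: Callable[..., Any], *rest: Callable[[Any], Any]
) -> Callable[..., Any]:
    """Collapse a chain of steps into one callable: external inputs go to `first`,
    each later step consumes the previous output, and only the final result is exposed."""

    def subgraph(*inputs: Any) -> Any:
        result = first(*inputs)
        for step in rest:
            result = step(result)
        return result

    subgraph.__name__ = name
    return subgraph


# Toy steps standing in for join, filter, and group_by.
prep = group_into_subgraph(
    "Prep",
    # "join": enrich each order with its customer's segment (inner semantics).
    lambda orders, segments: [
        {**o, "segment": segments[o["customer_id"]]}
        for o in orders
        if o["customer_id"] in segments
    ],
    # "filter": keep larger orders only.
    lambda rows: [r for r in rows if r["revenue"] > 50],
    # "aggregate": total revenue.
    lambda rows: sum(r["revenue"] for r in rows),
)

orders = [
    {"customer_id": "C1", "revenue": 120.0},
    {"customer_id": "C1", "revenue": 30.0},
    {"customer_id": "C9", "revenue": 99.0},
]
print(prep(orders, {"C1": "SMB"}))
```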

Analyze

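Wire the Prep output into a Logic → stats/yoy_growth node. Its output is a table with each quarter's year-over-year (YoY) percentage change per segment. YoY growth compares each quarter with the same quarter one year earlier, so the filter window must include the prior year: with order_date >= '2025-01-01', no 2025 quarter has a baseline. Start the window a year earlier (for example, order_date >= '2024-01-01') to report growth for 2025.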
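
The calculation itself is (current − prior) / prior × 100, where prior is the same quarter and segment one year earlier. The sketch below shows that under stated assumptions: quarter labels look like "2025-Q1", the output column is named yoy_pct, and quarters without a usable baseline are omitted. How the real yoy_growth node handles missing baselines is not specified here.

```python
def yoy_growth(rows: list[dict]) -> list[dict]:
    """Percent change vs. the same quarter one year earlier, per segment.

    Assumes quarter labels like "2025-Q1"; quarters without a prior-year
    baseline (or with zero prior revenue) are omitted.
    """
    revenue = {(r["quarter"], r["segment"]): r["revenue"] for r in rows}
    result = []
    for (quarter, segment), current in sorted(revenue.items()):
        year, q = quarter.split("-")
        prior = revenue.get((f"{int(year) - 1}-{q}", segment))
        if prior:  # skips missing (None) and zero baselines
            result.append({
                "quarter": quarter,
                "segment": segment,
                "yoy_pct": (current - prior) / prior * 100,
            })
    return result


rows = [
    {"quarter": "2024-Q1", "segment": "SMB", "revenue": 100.0},
    {"quarter": "2025-Q1", "segment": "SMB", "revenue": 125.0},
    {"quarter": "2025-Q1", "segment": "Enterprise", "revenue": 300.0},  # no 2024 baseline
]
print(yoy_growth(rows))
```

Only SMB gets a 2025-Q1 row here, which is exactly the missing-baseline effect of a window that starts in 2025.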

Visualize

Wire the yoy_growth output into a Logic → viz/bar_chart node. Set x to quarter, y to yoy_pct, and color to segment. The chart renders in the node's Run Data tab after the next run.
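
To make the x, y, and color mapping concrete, the sketch below groups rows into one bar series per color value, ordered by x. Grouped bar charts typically use this kind of encoding. It draws nothing and is not Connectify's renderer; the function name is hypothetical.

```python
def bar_chart_series(rows: list[dict], x: str, y: str, color: str) -> dict[str, list[tuple]]:
    """Group rows into one series per `color` value; each series is (x, y) pairs sorted by x."""
    series: dict[str, list[tuple]] = {}
    for row in sorted(rows, key=lambda r: r[x]):
        series.setdefault(row[color], []).append((row[x], row[y]))
    return series


rows = [
    {"quarter": "2025-Q2", "segment": "SMB", "yoy_pct": 10.0},
    {"quarter": "2025-Q1", "segment": "SMB", "yoy_pct": 25.0},
    {"quarter": "2025-Q1", "segment": "Enterprise", "yoy_pct": -5.0},
]
print(bar_chart_series(rows, x="quarter", y="yoy_pct", color="segment"))
```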

Run end-to-end

Click Run. Edges animate as each stage completes. The chart appears in the viz node's Run Data tab.
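
Stages complete in sequence because a graph run executes nodes in dependency order: a node starts only after every node wired into it has produced output. The sketch below uses Python's standard graphlib.TopologicalSorter. The node names and toy callables are hypothetical, and Connectify's scheduler may differ (for example, by running independent branches in parallel).

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_graph(
    nodes: dict[str, Callable[..., Any]], inputs: dict[str, list[str]]
) -> tuple[dict[str, Any], list[str]]:
    """Run every node after its upstream nodes.

    `inputs` maps a node to the upstream nodes wired into it, in argument order.
    Returns each node's output plus the order in which nodes ran.
    """
    graph = {name: inputs.get(name, []) for name in nodes}
    outputs: dict[str, Any] = {}
    order: list[str] = []
    for name in TopologicalSorter(graph).static_order():
        # All upstream outputs already exist here, thanks to the topological order.
        outputs[name] = nodes[name](*(outputs[dep] for dep in graph[name]))
        order.append(name)
    return outputs, order


nodes = {
    "orders": lambda: [100.0, 250.0],
    "fx": lambda: 2.0,
    "convert": lambda amounts, rate: [a * rate for a in amounts],
    "total": lambda amounts: sum(amounts),
}
inputs = {"convert": ["orders", "fx"], "total": ["convert"]}
outputs, order = run_graph(nodes, inputs)
print(order, outputs["total"])
```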

Share with stakeholders

Click Share → Viewer link. Stakeholders open the link in View Mode — they see the chart, can click any node to inspect its data, and can follow the path back to the source files.

Reproducibility for free

Because every step lives on the graph and runs deterministically, anyone can refresh the analysis by clicking Run again. The edges fix the execution order, so there are no notebook-style surprises from cells run out of order.
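
One way to confirm that a refresh reproduced an earlier result is to fingerprint the output table and compare digests. The sketch below hashes a canonical JSON encoding with SHA-256. This check is our own choice of technique, not a Connectify feature. Note that key order within a row is normalized, but row order is not.

```python
import hashlib
import json


def fingerprint(table: list[dict]) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, fixed separators)."""
    payload = json.dumps(table, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first_run = [{"quarter": "2025-Q1", "segment": "SMB", "yoy_pct": 25.0}]
refresh = [{"segment": "SMB", "yoy_pct": 25.0, "quarter": "2025-Q1"}]  # same data, different key order
print(fingerprint(first_run) == fingerprint(refresh))
```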

Patterns

Parameterizing the time window

Add a Logic → param/date_range node at the top of the graph and wire its output to your filter node. Now changing the analysis window is one config edit, not a hunt across the graph.
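
The benefit comes from the filter reading its bounds from a single parameter instead of a literal typed into the node. A minimal sketch follows. The DateRange type is hypothetical, and treating end as exclusive is an assumption rather than documented param/date_range behavior.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Hypothetical date_range parameter; `end` is treated as exclusive."""
    start: date
    end: date

    def contains(self, d: date) -> bool:
        return self.start <= d < self.end


def filter_orders(rows: list[dict], window: DateRange) -> list[dict]:
    # The filter reads its bounds from the parameter instead of a hard-coded literal.
    return [row for row in rows if window.contains(row["order_date"])]


WINDOW = DateRange(start=date(2025, 1, 1), end=date(2025, 4, 1))  # the one place to edit

rows = [{"order_date": d} for d in (date(2024, 12, 31), date(2025, 1, 1), date(2025, 3, 31), date(2025, 4, 1))]
print(filter_orders(rows, WINDOW))
```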

Versioned datasets

Dataset nodes can pin to a specific snapshot of the underlying source (Inspector → Config → version). Pin production analyses to a version so that a refresh reruns on the same data instead of on whatever the source currently holds.
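
The difference between pinned and unpinned reads is illustrated below with a hypothetical VersionedSource class. It is not the Dataset node's API. The assumption that an unpinned read follows the most recently published snapshot is ours.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedSource:
    """Hypothetical source that keeps every published snapshot by version label."""
    snapshots: dict[str, list[dict]] = field(default_factory=dict)

    def publish(self, version: str, rows: list[dict]) -> None:
        self.snapshots[version] = rows

    def read(self, pinned: Optional[str] = None) -> list[dict]:
        if pinned is None:
            # Unpinned: follow the most recently published snapshot.
            return self.snapshots[list(self.snapshots)[-1]]
        # Pinned: always the same snapshot, no matter what was published since.
        return self.snapshots[pinned]


src = VersionedSource()
src.publish("v1", [{"revenue": 100.0}])
src.publish("v2", [{"revenue": 100.0}, {"revenue": 40.0}])  # data moved after v1
print(len(src.read()), len(src.read(pinned="v1")))
```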

Custom data sources

Wrap an internal API or warehouse query in a Custom node. Once wrapped, it composes with everything else on the canvas.
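
A wrapped source composes because it exposes the same interface as any other source. The sketch below illustrates this with a structural Protocol and an adapter that turns any zero-argument fetch function (an API call or a warehouse query) into a node. SourceNode, CustomNode, and fake_warehouse_query are hypothetical names, not Connectify's Custom node API.

```python
from typing import Callable, Protocol


class SourceNode(Protocol):
    """Anything with a run() that returns rows can feed downstream nodes."""

    def run(self) -> list[dict]: ...


class CustomNode:
    """Adapts a zero-argument fetch function into a source node."""

    def __init__(self, fetch: Callable[[], list[dict]]):
        self._fetch = fetch

    def run(self) -> list[dict]:
        rows = self._fetch()
        # Validate the shape so downstream nodes can rely on a list of dict rows.
        if not all(isinstance(r, dict) for r in rows):
            raise TypeError("custom source must return a list of dict rows")
        return rows


def row_count(node: SourceNode) -> int:
    # A downstream step that works with any source node, wrapped or built in.
    return len(node.run())


def fake_warehouse_query() -> list[dict]:
    # Stand-in for an internal API call or SQL query.
    return [{"customer_id": "C1", "segment": "SMB"}, {"customer_id": "C2", "segment": "Enterprise"}]


print(row_count(CustomNode(fake_warehouse_query)))
```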
