Unlocking Agentic Data Science: A Step-by-Step Guide to marimo Pair Programming

Published 2026-05-04 07:49:08 · Education & Careers

Overview

Modern data science workflows often involve repetitive tasks like data cleaning, exploration, and debugging. What if you could pair with an intelligent coding agent that understands your notebook and helps you iterate faster? That's exactly what marimo pair offers: an agentic layer within the marimo reactive notebook environment that assists with data wrangling, research, and code generation. This guide will walk you through adding agent skills to your data science pipeline using marimo pair. You'll learn how to set it up, start a pair session, and collaborate with the agent on real-world tasks such as wrangling messy datasets and performing exploratory analysis. By the end, you'll be able to accelerate your data science projects while maintaining full control over your code.

Source: realpython.com

Prerequisites

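  • A currently supported Python 3 release – marimo runs on standard Python environments, but Python 3.8 has reached end of life, so check marimo's installation docs for the minimum version.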
  • pip – package installer for Python.
  • Basic Python knowledge – familiarity with variables, functions, and common data science libraries (pandas, numpy).
  • An API key for a large language model – marimo pair uses LLMs (e.g., OpenAI, Anthropic) to power the agent. Have your key ready.
  • A spreadsheet or CSV file – for practicing data wrangling (optional but recommended).

Step-by-Step Instructions

1. Installing marimo and Enabling marimo Pair

First, install marimo from PyPI. Open a terminal and run:

pip install marimo

After installation, start a new notebook:

marimo edit my_notebook.py

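This launches the marimo editor in your default browser. To activate the pair agent, the notebook process needs your LLM API key as an environment variable. A process inherits its environment variables at launch, so set the key in the same shell before running marimo edit (if the editor is already running, stop it and start it again afterward). For example, if using OpenAI: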

export OPENAI_API_KEY='your-api-key-here'

Then, inside the notebook, click on the Pair icon in the toolbar (or use the keyboard shortcut Cmd+Shift+P). This opens the pair panel where you can start a conversation with the agent.
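Before opening the pair panel, it can help to confirm that the key actually reached the notebook process. The following is a minimal sketch using only the standard library; the helper name api_key_status is hypothetical, and OPENAI_API_KEY is simply the variable used in the example above.

```python
import os


def api_key_status(var_name: str = "OPENAI_API_KEY") -> str:
    """Report whether a variable is visible to this process, without revealing it."""
    value = os.environ.get(var_name, "")
    if not value:
        return f"{var_name} is not set"
    # Show only the length so the secret never lands in notebook output.
    return f"{var_name} is set ({len(value)} characters)"


print(api_key_status())
```

If the cell reports that the variable is not set, the editor was most likely started from a shell that did not have the key exported.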

2. Initializing Your Data Science Session

With the pair panel ready, load your dataset. Create a new cell and write:

import pandas as pd
df = pd.read_csv('sales_data.csv')
df.head()

Run the cell and marimo displays the result. Because marimo is reactive, any cell that depends on df re-runs automatically whenever this cell changes. Now you can ask the agent for help. In the pair chat, type: "Is there any missing data in this dataframe?" The agent will inspect the notebook's state and respond with a code snippet you can insert.
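As a rough idea of what an answer to the missing-data question could look like, here is a sketch that counts missing values per column. The small DataFrame is a hypothetical stand-in for sales_data.csv, and its column names are assumptions rather than the real file's schema.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sales_data.csv; the column names are assumptions.
df = pd.DataFrame(
    {
        "product_id": [101, 102, 103, 104],
        "revenue": [250.0, np.nan, 90.5, np.nan],
        "date": ["2024-01-05", "2024-01-06", None, "2024-01-08"],
    }
)

# Count missing values per column, then express them as a share of all rows.
missing_counts = df.isna().sum()
missing_share = df.isna().mean().round(2)

summary = pd.DataFrame({"missing": missing_counts, "share": missing_share})
print(summary)
```

Reporting the share alongside the raw count makes it easier to judge whether a column is worth imputing or should be dropped.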

3. Invoking the marimo Pair Agent

The agent understands the current notebook context. For instance, you can ask: "Show me all rows where revenue is null". The agent will generate code like:

df[df['revenue'].isnull()]

Click Insert Code in the chat bubble to place it into a new cell. You can also request explanations: "Why are there 120 missing values in the date column?" The agent might suggest checking the data source or imputing with a forward fill.
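The forward-fill suggestion mentioned above can be sketched as follows. The data is hypothetical, and the extra date_was_missing flag is an assumption added here so that imputed rows stay distinguishable; it is not something the agent is known to produce.

```python
import pandas as pd

# Hypothetical records with gaps in the date column.
df = pd.DataFrame(
    {
        "date": ["2024-01-05", None, None, "2024-01-08"],
        "revenue": [250.0, 120.0, 90.5, 310.0],
    }
)
df["date"] = pd.to_datetime(df["date"])

# Record which rows were missing before filling them.
df["date_was_missing"] = df["date"].isna()
# Forward fill copies the last known value down into each gap.
df["date_filled"] = df["date"].ffill()
print(df)
```

Forward fill only makes sense when rows are ordered so that the previous value is a plausible substitute, so it is worth checking the sort order first.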

4. Guided Data Wrangling with the Agent

Let's walk through a common wrangling task: cleaning a column and merging two datasets. Start by asking: "Help me clean the 'price' column – it has dollar signs and commas." The agent may propose:

df['price'] = df['price'].replace(r'[$,]', '', regex=True).astype(float)

Insert it and run. Next, load a second dataset and ask: "Merge this inventory data with the main sales table on product_id." The agent will generate the merge code. You can then request: "Create a summary table of total sales per region." The agent produces the appropriate groupby and aggregation.
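The merge and per-region summary described above might look roughly like this. Both tables are hypothetical, and every column name other than product_id is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical tables; column names other than product_id are assumptions.
sales = pd.DataFrame(
    {
        "product_id": [1, 2, 2, 3],
        "region": ["North", "North", "South", "South"],
        "price": [10.0, 20.0, 20.0, 5.0],
        "quantity": [3, 1, 2, 4],
    }
)
inventory = pd.DataFrame({"product_id": [1, 2, 3], "stock": [50, 0, 12]})

# A left merge keeps every sales row; validate raises an error if
# product_id is duplicated in the inventory table.
merged = sales.merge(inventory, on="product_id", how="left", validate="many_to_one")

# Total sales per region.
merged["sales"] = merged["price"] * merged["quantity"]
summary = merged.groupby("region", as_index=False)["sales"].sum()
print(summary)
```

Passing validate is a cheap guard against the silent row duplication that a many-to-many merge would cause.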

5. Collaborative Research and Analysis

Beyond wrangling, marimo pair helps with exploratory research. For example, ask: "What are the top 3 factors affecting sales?" The agent cannot run statistical models on its own, but it can suggest a correlation analysis or a simple linear regression: it writes the code, and you run it. The agent also helps interpret results, for example when you ask: "Explain this confusion matrix." This back‑and‑forth turns your notebook into a collaborative whiteboard.
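A correlation analysis of the kind mentioned above could be sketched like this. The features and values are made up for illustration, and ranking by absolute Pearson correlation is one simple choice among several; it shows association with sales, not causation.

```python
import pandas as pd

# Hypothetical numeric features; a real dataset would come from the notebook.
df = pd.DataFrame(
    {
        "sales": [10, 20, 30, 40, 50],
        "ad_spend": [1, 2, 3, 4, 5],
        "discount": [5, 4, 3, 2, 1],
        "store_age": [3, 1, 4, 1, 5],
    }
)

# Pearson correlation of every feature with sales.
corr = df.corr(numeric_only=True)["sales"].drop("sales")
# Rank by absolute strength so strong negative relationships are not buried.
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranked.head(3))
```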

Common Mistakes

  • Not restarting marimo after setting the API key – A running marimo process does not see environment variables (like API keys) that are exported afterward. Set the key first, then restart marimo from that same shell.
  • Over‑relying on the agent – The agent is a helper, not a replacement for understanding your data. Always double‑check generated code for logical errors.
  • Ignoring agent privacy – The agent sends notebook code to the LLM provider. Avoid using sensitive data in shared environments without anonymization.
  • Not using version control – Agent‑generated code can introduce bugs. Use Git or marimo's built‑in checkpointing to track changes.
  • Assuming the agent knows external context – The agent only sees the notebook. Provide clear prompts with necessary details.
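One lightweight way to double-check generated code, as the list above recommends, is to wrap it in a function and run a few assertions before using the result. This is a sketch under stated assumptions: clean_price is a hypothetical wrapper around the price-cleaning snippet from step 4, and the sample values are invented.

```python
import pandas as pd


def clean_price(series: pd.Series) -> pd.Series:
    """Strip dollar signs and commas, then convert the values to float."""
    return series.replace(r"[$,]", "", regex=True).astype(float)


# Hypothetical raw values of the kind described in step 4.
raw = pd.Series(["$1,200.50", "$15", "$0.99"])
cleaned = clean_price(raw)

# Sanity checks to run before trusting the result downstream.
assert len(cleaned) == len(raw), "row count changed"
assert cleaned.notna().all(), "cleaning introduced missing values"
assert (cleaned >= 0).all(), "negative prices found"
print(cleaned.tolist())
```

Checks like these are quick to write and fail loudly if a later prompt changes the cleaning logic in an unexpected way.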

Summary

marimo pair transforms your data science notebook into an interactive partner for wrangling, analysis, and research. By following the steps above – installing marimo, activating pair with an LLM, and collaborating on tasks – you can accelerate your workflow while maintaining code quality. With agentic pair programming, you gain a second pair of eyes that never sleeps.