Product Requirements Document: Jupyter Calliope Data Agent
Calliope Integration: This component is integrated into the Calliope AI platform. Some features and configurations may differ from the upstream project.
Overview
The Jupyter Calliope Data Agent is an intelligent assistant integrated into JupyterLab, designed to simplify data analysis tasks through natural language interaction. Inspired by Vanna AI and CodeAct, it leverages Retrieval-Augmented Generation (RAG) and executable Python code actions to provide powerful, intuitive data analysis capabilities directly within JupyterLab.
Target Users
- Data Analysts & Scientists: Users who frequently interact with data, databases, and visualization tasks.
- Business Analysts: Users who need quick insights from data without deep technical expertise.
- Developers: Users who want to automate repetitive data tasks or prototype data-driven workflows quickly.
Key Use Cases
1. Natural Language Data Queries
- Users ask natural language questions about their data.
- Agent generates SQL queries using RAG-based context retrieval.
- Results are displayed directly in Jupyter notebooks as Pandas DataFrames or visualizations.
Example Notebook Interaction:
%%agent
What are the top 10 products by sales in Q1 2024?
The agent generates and executes SQL, returning results as a DataFrame and visualization.
User Experience:
- User types a natural language query in a notebook cell.
- Agent retrieves relevant context, generates SQL, executes it, and displays results inline.
- Optionally, the agent suggests follow-up questions or visualizations.
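The execute-and-display step above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the agent's actual implementation: the `run_generated_sql` helper, the `sales` table, and the naive SELECT-only check are all hypothetical, and SQLite stands in for whatever database the agent connects to.

```python
import sqlite3


def run_generated_sql(conn: sqlite3.Connection, sql: str):
    """Run agent-generated SQL and return (column_names, rows).

    Hypothetical helper: a naive guard that accepts only a single
    statement starting with SELECT. It is not a full SQL validator.
    """
    statement = sql.strip().rstrip(";")
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("only single SELECT statements are allowed")
    cursor = conn.execute(statement)
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


# Illustrative data; the table and its columns are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("laptop", 1200.0), ("phone", 800.0), ("laptop", 300.0)],
)

columns, rows = run_generated_sql(
    conn,
    "SELECT product, SUM(amount) AS total FROM sales "
    "GROUP BY product ORDER BY total DESC LIMIT 10",
)
```

In a notebook, `columns` and `rows` could then be wrapped with `pandas.DataFrame(rows, columns=columns)` for inline display.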
Core Use Cases
1. Natural Language to SQL Queries (RAG-based)
- Users ask questions in plain English.
- Agent retrieves relevant context (schema, documentation, examples) via RAG.
- Generates accurate SQL queries, executes them, and returns results.
2. Data Visualization & Exploration
- Automatically generates visualizations (Plotly, Matplotlib) based on query results.
- Users can refine visualizations through natural language instructions.
3. Multi-Step Data Tasks via Executable Python Code (CodeAct)
- Users describe complex multi-step tasks.
- Agent generates Python code to execute tasks (data loading, transformation, analysis).
- Results and intermediate steps are displayed clearly in notebook cells.
4. Interactive Chat Interface
- Similar to GitHub Copilot Chat, users interact with the agent via a dedicated chat panel in JupyterLab.
- Supports multi-turn conversations, context retention, and follow-up questions.
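The retrieval step of the Natural Language to SQL use case can be illustrated with a toy example. The sketch below is an assumption-heavy stand-in: it ranks context snippets by simple word overlap instead of the embedding-based search a real RAG system would likely use, and the `KNOWLEDGE_BASE` entries, `retrieve`, and `build_prompt` are hypothetical names invented for illustration.

```python
import re

# Hypothetical knowledge base: schema snippets, docs, and example queries.
KNOWLEDGE_BASE = [
    "TABLE sales(product_id, sale_date, amount): one row per completed sale",
    "TABLE products(product_id, name, category): product catalog",
    "TABLE employees(employee_id, name, hire_date): staff records",
    "EXAMPLE: total sales per product -> "
    "SELECT product_id, SUM(amount) FROM sales GROUP BY product_id",
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list, k: int = 2) -> list:
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list, k: int = 2) -> str:
    """Assemble the retrieved context and the question into an LLM prompt."""
    context = "\n".join(retrieve(question, documents, k))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Write one SQL query that answers the question."
    )


prompt = build_prompt("What are the top 10 products by sales?", KNOWLEDGE_BASE, k=3)
```

The resulting `prompt` would be sent to a language model; only the snippets that share vocabulary with the question are included, which keeps unrelated schema (here, the `employees` table) out of the context.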
User Experience (UX) Design
Notebook Integration
- Magic Commands: Users invoke the agent directly within notebook cells using intuitive magic commands (%%agent).
- Inline Results: SQL queries, DataFrames, and visualizations appear inline within notebook cells.
- Interactive Widgets: Provide interactive widgets for refining queries, adjusting parameters, and exploring data visually.
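One way the `%%agent` magic could be wired up is through IPython's extension hook. This is a sketch under assumptions rather than the extension's actual code: `parse_agent_cell`, `agent_magic`, and the payload shape are hypothetical, and the backend call is replaced by a placeholder that returns the payload.

```python
def parse_agent_cell(line: str, cell: str) -> dict:
    """Turn the contents of a %%agent cell into a request payload.

    The payload shape is an assumption made for this sketch.
    """
    question = cell.strip()
    if not question:
        raise ValueError("%%agent cell is empty; type a question below the magic")
    return {"question": question, "options": line.split()}


def agent_magic(line: str, cell: str) -> dict:
    """Body of the cell magic.

    Placeholder: returns the payload instead of sending it to the backend.
    """
    return parse_agent_cell(line, cell)


def load_ipython_extension(ipython):
    """Hook IPython calls on `%load_ext`; registers agent_magic as %%agent."""
    ipython.register_magic_function(
        agent_magic, magic_kind="cell", magic_name="agent"
    )
```

With this module importable, running `%load_ext` on it in a notebook would make `%%agent` available. Note that a cell magic must be the first line of its cell.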
Chat Interface Integration
- Dedicated Sidebar Panel: Persistent chat interface accessible from any notebook.
- Conversational Flow: Users can refine queries, request visualizations, or ask follow-up questions seamlessly.
- Contextual Awareness: Maintains conversation context across multiple interactions.
Example Chat Interaction:
User: Show me sales trends for electronics products in 2024.
Agent: [Generates SQL, executes query, returns visualization]
User: Can you break this down by month?
Agent: [Generates and displays updated visualization]
Key Use Cases
1. Ad-Hoc Data Queries
- Quickly answer ad-hoc questions without writing SQL manually.
- Example: “List customers who purchased more than $1000 last month.”
Notebook Example:
%%agent
List customers who purchased more than $1000 last month.
Chat Example:
User: List customers who purchased more than $1000 last month.
Agent: Here are the customers who purchased more than $1000 last month:
[Displays DataFrame]
User: Visualize this data as a bar chart.
2. Exploratory Data Analysis (EDA)
- Users explore datasets interactively, generating visualizations and summaries.
- Example: “Show a scatter plot of petal length vs petal width for the Iris dataset.”
Notebook Example:
%%agent
Show a scatter plot of petal length vs petal width for the Iris dataset.
Chat Example:
User: Show a scatter plot of petal length vs petal width for the Iris dataset.
Agent: [Displays Plotly scatter plot inline]
3. Multi-Step Data Tasks
- Perform complex, multi-step data operations in a single interaction.
- Example: “Load the Iris dataset, save it as CSV, reload it, and calculate average petal length per species.”
Notebook Example:
%%agent
Load the Iris dataset using seaborn, save it as iris.csv, reload it, and calculate average petal length per species.
Chat Example:
User: Load the Iris dataset using seaborn, save it as iris.csv, reload it, and calculate average petal length per species.
Technical Requirements
Backend (Server Extension)
- Python-based Jupyter server extension.
- Handles REST API requests from the frontend and proxies them to the Dockerized agent.
- Manages session state and context.
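The session-state and proxying responsibilities above could look roughly like the following. This is a hedged sketch, not the extension's design: `SessionStore`, `build_agent_request`, the `AGENT_URL` address, and the JSON payload shape are all assumptions, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed address of the Dockerized agent; the real endpoint is not specified.
AGENT_URL = "http://localhost:8000/ask"


class SessionStore:
    """Keeps recent conversation turns per session in memory (hypothetical)."""

    def __init__(self, max_turns: int = 20):
        self.max_turns = max_turns
        self._turns = {}

    def add(self, session_id: str, role: str, content: str) -> None:
        turns = self._turns.setdefault(session_id, [])
        turns.append({"role": role, "content": content})
        if len(turns) > self.max_turns:
            # Drop the oldest turns so the context stays bounded.
            del turns[: len(turns) - self.max_turns]

    def history(self, session_id: str) -> list:
        return list(self._turns.get(session_id, []))


def build_agent_request(store: SessionStore, session_id: str, question: str):
    """Record the question and build the POST request to proxy to the agent."""
    store.add(session_id, "user", question)
    body = json.dumps(
        {"session_id": session_id, "messages": store.history(session_id)}
    )
    return urllib.request.Request(
        AGENT_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Bounding the history with `max_turns` is one simple way to retain context for follow-up questions without letting the proxied payload grow indefinitely.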
Frontend Extension (TypeScript/React)
- Provides UI components for notebook integration and chat interface.
- Communicates with backend via REST API.
External Dockerized Agent
- REST API service inspired by Vanna AI and CodeAct.
- Handles RAG-based SQL generation, code execution, and data querying.
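The code execution responsibility can be illustrated with a small CodeAct-style step: run a generated Python snippet, capture what it prints, and hand that back as the observation for the next step. The `run_code_action` helper and its return shape are hypothetical. Note that `exec` provides no isolation on its own; this sketch assumes the surrounding container is what confines generated code.

```python
import contextlib
import io
import traceback


def run_code_action(code: str, namespace: dict) -> dict:
    """Execute one generated code action; return its output or the error.

    The shared `namespace` lets later actions reuse variables defined by
    earlier ones, which is what makes multi-step tasks possible.
    """
    stdout = io.StringIO()
    try:
        with contextlib.redirect_stdout(stdout):
            exec(code, namespace)
        return {"ok": True, "output": stdout.getvalue()}
    except Exception:
        # Return the traceback so the agent can attempt a corrected action.
        return {"ok": False, "output": stdout.getvalue() + traceback.format_exc()}


namespace = {}
first = run_code_action("values = [2, 4, 6]\nprint(len(values))", namespace)
second = run_code_action("print(sum(values) / len(values))", namespace)
failed = run_code_action("print(undefined_name)", namespace)
```

Here `second` succeeds because `values` persists in the shared namespace, while `failed` carries a `NameError` traceback that an agent loop could feed back to the model.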
Success Criteria
- Users can intuitively interact with data using natural language.
- High accuracy in SQL generation and visualization tasks.
- Seamless integration within JupyterLab notebooks and chat interface.
- Robust error handling, security, and performance.