CodeAct Agent - Project Documentation

Calliope Integration: This component is integrated into the Calliope AI platform. Some features and configurations may differ from the upstream project.

1. Project Goal

The primary goal of this project is to create a versatile and intelligent agent service designed for data analysis tasks, primarily intended for integration with environments like Jupyter notebooks. The service aims to understand natural language requests, interact with data sources (filesystems, databases), perform computations and analysis using code execution, and present results back to the user, potentially including visualizations and summaries.

2. Core Concepts

The system is built around a composable architecture featuring specialized agents:

2.1. CodeAct Agent

  • Purpose: A general-purpose agent focused on executing Python code to fulfill user requests. It acts as the primary interface for tasks not directly related to querying pre-configured databases.
  • Capabilities:
    • Code Execution: Executes Python code snippets in a sandboxed environment.
    • Tool Usage: Can leverage predefined tools for tasks like web searches (search), file system operations (read_file, write_file, list_directory), downloading datasets (download_dataset), and saving visualizations (save_visualization).
    • State Management: Manages state between execution steps primarily by saving and loading complex objects (such as Pandas DataFrames) to/from the /data directory. Simple variables may persist between steps, but the agent relies on file I/O for robustness.
    • Planning & Reflection: Uses a LangGraph workflow involving planning (breaking down tasks), code generation, execution, and reflection (error checking and correction).
    • Database Interaction (Dynamic): Can create, connect to, and interact with databases (especially SQLite) dynamically within its execution environment using libraries like SQLAlchemy and langchain_community.utilities.sql_database.SQLDatabase.

2.2. Calliope Data Agent (Vanna AI Inspired)

  • Purpose: A specialized agent designed to answer natural language questions by interacting with pre-configured SQL databases. Its design is inspired by the principles of the Vanna AI project.
  • Capabilities:
    • RAG for SQL Generation: Utilizes a Retrieval-Augmented Generation (RAG) approach. It retrieves relevant context (DDL statements, documentation, and successful question-SQL examples) from a vector store (ChromaDB) to help the LLM generate accurate SQL queries for the target database dialect.
    • Multi-Datasource Awareness: Can be configured with multiple distinct SQL datasources and attempts to select the most appropriate one based on the question and retrieved context.
    • SQL Generation & Execution: Generates SQL queries based on the user’s question and RAG context, then executes them against the chosen database.
    • Result Processing:
      • Summarization: Can generate natural language summaries of query results.
      • Visualization: Can generate Plotly chart code based on the results.
      • Follow-up Questions: Can suggest relevant follow-up questions for further exploration.
    • Training: Supports “training” by adding DDL, documentation, and question-SQL pairs to the RAG vector store via API endpoints.
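The retrieval step of this RAG workflow can be illustrated with a dependency-free stand-in for ChromaDB's similarity search: stored snippets (DDL, documentation, question-SQL examples) are ranked by crude token overlap with the question instead of embedding distance. The function names and the store layout are assumptions for illustration only.

```python
def _overlap(question, snippet):
    # Crude lexical proxy for embedding similarity (illustration only).
    q = set(question.lower().split())
    s = set(snippet.lower().split())
    return len(q & s)

def retrieve_context(question, store, k=2):
    """Return the k most relevant (kind, text) snippets for the question.

    `store` is a list of (kind, text) pairs where kind is "ddl",
    "documentation", or "sql_example" -- mirroring the three kinds of
    training data the RAG store keeps.
    """
    ranked = sorted(store, key=lambda item: _overlap(question, item[1]),
                    reverse=True)
    return ranked[:k]
```

The retrieved snippets are then injected into the SQL-generation prompt, which is what lets the LLM produce queries in the right dialect against the right schema.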

2.3. Composable Agent (Router)

  • Purpose: Acts as the main entry point and orchestrator. It analyzes the user’s request and routes it to the most appropriate specialized agent (CodeAct or SQL).
  • Mechanism: Uses an LLM to classify the user’s intent based on the current request and the conversation history.
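A minimal rendering of that routing decision: the classification itself is delegated to an LLM, so the sketch below shows only the shape of a routing prompt and a defensive parse of the model's one-word answer. The prompt wording and `parse_route` are illustrative assumptions, not the project's actual prompt (see prompts.py for the real templates).

```python
ROUTER_PROMPT = """You are a router for a data-analysis assistant.
Given the conversation so far and the latest user request, answer with
exactly one word:
  SQL     - the request is a question about a pre-configured database
  CODEACT - the request needs code execution, files, or web search

Conversation:
{history}

Request: {question}
Answer:"""

def parse_route(llm_output, default="codeact"):
    """Map the model's free-form answer onto one of the two agents."""
    text = llm_output.strip().lower()
    if text.startswith("sql"):
        return "sql"
    if text.startswith("codeact"):
        return "codeact"
    return default  # fall back when the model does not comply
```

Defaulting to the CodeAct agent is a reasonable fallback because it is the general-purpose path, whereas the SQL path assumes a matching configured datasource.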

3. Use Cases

The agent system is designed to support a variety of data analysis workflows:

3.1. CodeAct Agent Use Cases

  • Loading data from files (CSV, Excel, JSON) into Pandas DataFrames.
  • Cleaning, transforming, and manipulating data using Pandas.
  • Performing statistical analysis using libraries like SciPy or Statsmodels.
  • Generating data visualizations (e.g., using Matplotlib, Seaborn, Plotly) and saving them to files.
  • Interacting with the local filesystem within the /data directory (reading, writing, listing).
  • Downloading datasets or files from URLs.
  • Performing web searches to gather context or information.
  • Creating and interacting with temporary or dynamically generated databases (e.g., SQLite).
  • Executing general Python code for calculations or logic.
  • Handling multi-step tasks requiring intermediate results to be saved and loaded.
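A typical CodeAct step combining several of the items above (load a CSV, clean it, aggregate with Pandas) might look like the sketch below. The function name and column names are invented for the example; in the agent, the CSV would come from the /data directory rather than an in-memory string.

```python
import io

import pandas as pd

def summarize_sales(csv_text):
    """Load, clean, and aggregate a small sales dataset (a typical CodeAct step)."""
    df = pd.read_csv(io.StringIO(csv_text))
    df = df.dropna(subset=["amount"])          # cleaning: drop rows missing the amount
    df["amount"] = df["amount"].astype(float)  # normalise the dtype after cleaning
    return df.groupby("product")["amount"].sum().to_dict()
```

The result of a step like this would then be saved to /data so a later step (e.g. visualization) can pick it up.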

3.2. Calliope Data Agent Use Cases

  • Answering natural language questions about data stored in pre-configured SQL databases (e.g., “What were the total sales last month?”, “Show me the top 5 products by quantity sold.”).
  • Generating complex SQL queries involving joins, aggregations, and filtering based on natural language.
  • Executing provided SQL queries against a specific configured datasource.
  • Providing natural language summaries of query results.
  • Generating Plotly charts to visualize query results.
  • Suggesting relevant follow-up questions based on the initial query and results.
  • Training the RAG system by adding database schema (DDL), documentation, or example question-SQL pairs.
  • Handling follow-up questions that refine or build upon previous database queries within a session.

3.3. Composable Agent Use Cases

  • Receiving a user request and determining whether it’s best handled by querying a known database (Calliope Data Agent) or requires code execution/file operations/web search (CodeAct Agent).
  • Seamlessly switching between agents based on the conversation flow.

4. Architecture

The system employs a modular architecture built primarily using Langchain and LangGraph.

  • Web Server (Flask): Provides the external API interface.
    • /api/chat: Endpoint for the Composable Agent (main entry point).
    • /api/sql/*: Endpoints for direct interaction with the Calliope Data Agent (mimicking the Vanna AI API: ask, generate_sql, run_sql, generate_summary, etc.).
    • /api/rag/*: Endpoints for managing the RAG store (train, clear).
    • /api/sessions/*: Endpoints for managing conversation sessions.
    • /api/health: Health check endpoint.
  • Agent Core (LangGraph):
    • Composable Agent: A LangGraph graph with a router node that directs flow to sub-graphs.
    • CodeAct Agent: A LangGraph graph implementing a plan-execute-reflect loop. Nodes include planner, call_model (for code generation/correction), execute_code, and reflector.
    • Calliope Data Agent: A LangGraph graph implementing the Vanna-style workflow. Nodes include rewrite_question, retrieve_context (RAG), generate_sql, execute_sql, and optional nodes for summarize, generate_chart, generate_followups.
  • LLM Providers: A pluggable system (llm_providers.py) to integrate different LLMs (currently Ollama and Gemini are supported) for different components (routing, code generation, SQL generation).
  • RAG Store (rag_store.py):
    • Uses ChromaDB as the vector store.
    • Stores embeddings of DDL, documentation, and SQL examples.
    • Provides functions to add training data and retrieve relevant context based on question similarity.
    • Uses embedding models compatible with the chosen LLM provider or a default fallback.
  • Tools (tools.py): Defines functions callable by the CodeAct agent (e.g., search, download_dataset, filesystem tools). SQL database interaction is handled directly via the SQLDatabase utility within the CodeAct agent’s execution environment or by the Calliope Data Agent itself.
  • Session Management (session_manager.py):
    • Manages conversation history per session ID.
    • Supports different backends (in-memory, PostgreSQL).
    • Uses LangGraph’s checkpointer interface (MemorySaver or potentially others).
  • Configuration (config.py):
    • Loads settings from environment variables with sensible defaults.
    • Configures LLM providers, models, database connections, RAG settings, server port, etc.
  • Prompts (prompts.py): Contains all prompt templates used by the agents for planning, routing, code generation, SQL generation, summarization, etc.
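The CodeAct graph's plan-execute-reflect loop can be reduced to a dependency-free sketch. The real implementation is a LangGraph graph (planner, call_model, execute_code, reflector nodes) with checkpointing; the function signature, callback names, and the `max_steps` cap below are illustrative assumptions.

```python
def plan_execute_reflect(task, planner, coder, executor, reflector, max_steps=5):
    """Simplified, synchronous rendering of the CodeAct agent loop.

    planner(task)                        -> plan (e.g. a list of steps)
    coder(task, plan, feedback=None)     -> code string (regenerated on failure)
    executor(code)                       -> execution result or error output
    reflector(task, code, result)        -> {"ok": bool, "feedback": str}
    """
    plan = planner(task)
    code = coder(task, plan)
    result = None
    for _ in range(max_steps):
        result = executor(code)
        verdict = reflector(task, code, result)
        if verdict["ok"]:
            return result
        # Reflection failed: regenerate code with the error feedback.
        code = coder(task, plan, feedback=verdict["feedback"])
    return result  # give up after max_steps attempts
```

In the actual graph, each of these callbacks is a node, the loop is expressed as conditional edges, and state between iterations is carried by the LangGraph checkpointer rather than local variables.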

5. Libraries & Technologies

  • Web Framework: Flask
  • Core Orchestration: Langchain, LangGraph
  • LLM Integration:
    • langchain-ollama
    • langchain-google-genai
  • Vector Store (RAG): ChromaDB (langchain-chroma, chromadb)
  • Embeddings:
    • langchain-ollama
    • langchain-google-genai
    • langchain-community (for fallback HuggingFace embeddings)
  • Data Handling: Pandas
  • SQL Interaction: SQLAlchemy, langchain-community (SQLDatabase utility), Psycopg2 (for PostgreSQL), PyMySQL (for MySQL)
  • Visualization: Plotly
  • Configuration: Python os module (for environment variables)
  • Standard Libraries: json, logging, re, io, contextlib, abc, etc.

6. Setup & Running

  • Configuration: Primarily through environment variables (see config.py and README.md). Key variables include LLM_PROVIDER, model names (e.g., OLLAMA_CODEACT_MODEL, GEMINI_SQL_MODEL), OLLAMA_BASE_URL, GOOGLE_API_KEY, SQL_AGENT_ENABLED, database connection details (e.g., SQL_DATASOURCES_JSON, POSTGRES_HOST), SESSION_STORE, etc.
  • Dependencies: Install via pip install -r requirements.txt (assuming a requirements file exists).
  • Running: Start the server using python main.py.
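A minimal local configuration for an Ollama-backed setup might look like the fragment below. The variable names come from this document; the values, the model name, and the shape of the SQL_DATASOURCES_JSON payload are illustrative assumptions -- consult config.py and README.md for the authoritative list and defaults.

```shell
# Illustrative local setup (values are examples, not defaults)
export LLM_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_CODEACT_MODEL=qwen2.5-coder        # example model name
export SQL_AGENT_ENABLED=true
# Datasource payload shape is an assumption; see config.py for the real schema
export SQL_DATASOURCES_JSON='[{"name": "sales", "url": "sqlite:////data/sales.db"}]'
export SESSION_STORE=memory

pip install -r requirements.txt
python main.py
```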

7. Future Considerations / Potential Improvements

  • Support for more LLM providers.
  • More sophisticated state management for the CodeAct agent beyond file I/O.
  • Enhanced error handling and recovery.
  • More robust RAG training and management interface.
  • Support for asynchronous operations.
  • Integration with more data sources and tools.
  • Improved security for code execution and database access.
  • More advanced planning capabilities.
  • UI for interacting with the agent and managing RAG data.