LangGraph CodeAct: Building Agents with Executable Code

Calliope Integration: This component is integrated into the Calliope AI platform. Some features and configurations may differ from the upstream project.

The Idea Behind CodeAct

CodeAct is an innovative approach for building LLM agents where the agent’s actions are represented as executable Python code. Instead of generating JSON or text in a predefined format for tool calls, CodeAct agents write and execute Python code snippets to interact with their environment and utilize tools. This method offers increased flexibility, allowing for complex operations, potential self-correction through code execution feedback, and compatibility with a wider range of language models.
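
To make the contrast concrete, here is a rough, hypothetical illustration (not taken from the library) of the same request expressed as a conventional structured tool call versus a CodeAct-style code action; the tool names mirror those defined in the examples further below.

# Conventional tool calling: the model emits structured arguments for one tool at a time.
json_style_action = {"tool": "calculate", "arguments": {"expression": "25 * 4"}}

# CodeAct: the model emits Python that can compose several tools, variables, and logic
# in a single action (shown here as the string the agent would hand to the executor).
codeact_style_action = """
weather = search_internet("current weather in London")
area = calculate("25 * 4")
print(weather, area)
"""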

How LangGraph CodeAct Works

LangGraph CodeAct is a Python library that implements the CodeAct architecture within the LangGraph framework. It enables the creation of agents that can:

  • Generate Code: The LLM is prompted to write Python code to accomplish tasks.
  • Execute Code: A code sandbox (or a safe eval function for demonstration) executes the generated code.
  • Interact with Tools: Tools are provided as Python functions that the generated code can call.
  • Maintain State: LangGraph manages the conversation history and variables set during code execution across multiple turns (a minimal sketch of this cycle follows the list).
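
A minimal, library-independent sketch of that generate / execute / observe cycle, with a stub in place of the real model call (this is an illustration under simplifying assumptions, not langgraph-codeact's internal implementation):

# Sketch of the CodeAct cycle. `fake_model` stands in for prompting the LLM to
# write a code action; a real agent would call the chat model here.
def fake_model(task: str) -> str:
    return "result = 25 * 4\nprint(result)"

persistent_vars: dict = {}                     # variables persist across turns (state)

def run_turn(task: str) -> str:
    code = fake_model(task)                    # 1. generate a code action
    try:
        exec(code, {}, persistent_vars)        # 2. execute it against shared state
        observation = "Executed successfully."
    except Exception as exc:                   # 3. errors become the next observation
        observation = f"Error: {exc}"
    return observation

print(run_turn("calculate 25 * 4"))            # -> Executed successfully.
print(persistent_vars["result"])               # -> 100, still available in later turns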

Key Advantages

  • Enhanced Flexibility: Leverage the full power of Python and existing libraries.
  • Complex Action Composition: Perform multiple operations within a single code action.
  • Self-Debugging Potential: Agents can potentially identify and fix errors based on execution feedback.
  • Broader LLM Compatibility: Works with models that may not support traditional function calling.

Relevant Code Examples (Conceptual)

1. Defining Tools:

from langchain_core.tools import tool

@tool
def search_internet(query: str) -> str:
    """Searches the internet for the given query."""
    # Implementation using a search library
    return "Search results for: " + query

@tool
def calculate(expression: str) -> float:
    """Evaluates a mathematical expression."""
    return eval(expression) # Use with caution in production

2. Creating the CodeAct Agent:

from langchain_anthropic import ChatAnthropic
from langgraph_codeact import create_codeact
from langgraph.checkpoint.memory import MemorySaver

model = ChatAnthropic(model_name="claude-3-7-sonnet-latest")
tools = [search_internet, calculate]

def safe_eval(code: str, _locals: dict):
    # Simplified (and unsafe for production) eval: it runs arbitrary code in the
    # current process. A real deployment should use a proper sandbox and also
    # capture stdout so the agent can observe printed results.
    try:
        exec(code, globals(), _locals)
        return "Code executed successfully", _locals
    except Exception as e:
        return f"Error: {e}", _locals

code_act = create_codeact(model, tools, safe_eval)
agent = code_act.compile(checkpointer=MemorySaver())

3. Interacting with the Agent:

config = {"configurable": {"thread_id": "1"}}  # required because a checkpointer is attached
result = agent.invoke(
    {"messages": [("user", "Search for the current weather in London and then calculate 25 * 4.")]},
    config=config,
)
print(result)

The agent would generate Python code to first call the search_internet tool and then the calculate tool to answer the user’s request.
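
For illustration only, the code action the model writes for this request might look roughly like the following (the exact code depends on the model); it calls the two tools defined earlier as plain Python functions inside the sandbox.

# Hypothetical code action emitted by the agent for the request above.
weather = search_internet("current weather in London")
product = calculate("25 * 4")
print(weather)
print(product)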

Important Details

The CodeAct research paper introduces a novel framework where LLM agents perform actions by generating and executing Python code. This approach contrasts with traditional methods that rely on generating JSON or text in predefined formats.

How CodeAct Works:

  • Unified Action Space: CodeAct consolidates all the agent’s actions into executable Python code.
  • Python Interpreter Integration: The generated code is executed using a Python interpreter.
  • Dynamic Action Revision: Based on the observations from the environment (e.g., code execution results, errors), the agent can dynamically revise previous actions or generate new ones in subsequent turns. This multi-turn interaction capability is a key aspect of CodeAct.
  • Leveraging Existing Software: CodeAct allows LLMs to utilize readily available Python packages, expanding the action space beyond hand-crafted tools.
  • Self-Debugging: The agent can leverage automated feedback, such as error messages from the Python interpreter, to identify and rectify errors in its generated code (a minimal sketch of this feedback loop follows the list).
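
A minimal sketch of that self-debugging feedback loop (an illustration under simplifying assumptions, not the paper's or the library's implementation; `llm_write_code` is a stub standing in for the real model call):

# The interpreter's error message is fed back so the "model" can emit a
# corrected code action on the next attempt.
def llm_write_code(task: str, feedback: str | None) -> str:
    if feedback is None:
        return "total = sum(prices)"                  # first attempt: `prices` is undefined
    return "prices = [3, 4]\ntotal = sum(prices)"     # revised attempt after seeing the error

def solve(task: str, max_attempts: int = 3) -> dict:
    feedback = None
    for _ in range(max_attempts):
        env: dict = {}
        try:
            exec(llm_write_code(task, feedback), {}, env)
            return env                                # success: return the resulting variables
        except Exception as exc:
            feedback = f"{type(exc).__name__}: {exc}" # error text becomes the next attempt's feedback
    raise RuntimeError("could not self-correct within the attempt budget")

print(solve("add the prices"))                        # -> {'prices': [3, 4], 'total': 7}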

Key Aspects and Findings from the Paper:

  • Performance Improvement: The paper presents an extensive analysis across 17 LLMs, demonstrating that CodeAct achieves comparable or better performance than using text or JSON for actions, with up to a 20% higher success rate on a new benchmark called M³ToolEval. This benchmark consists of complex tasks requiring multiple tool calls.
  • Fewer Interactions: CodeAct often requires fewer interaction turns to solve complex problems because control and data flow (e.g., loops and conditional statements) can be expressed within a single generated code action; see the sketch after this list.
  • CodeActInstruct Dataset: To further enhance the capabilities of LLMs in using CodeAct, the researchers created a new instruction-tuning dataset called CodeActInstruct, comprising 7k multi-turn interactions using CodeAct. This dataset focuses on agent-environment interactions across various domains like information seeking, software package usage, external memory access, and robot planning.
  • CodeActAgent: By fine-tuning Llama2 and Mistral models on the CodeActInstruct dataset, the researchers developed CodeActAgent. This agent demonstrates improved performance on agent tasks while maintaining general capabilities. It can perform sophisticated tasks by leveraging existing Python packages and can autonomously self-debug using error messages.
  • Suitability for Open-Source LLMs: The paper suggests that optimizing for CodeAct is a promising route for improving the tool-use capabilities of open-source LLMs, as they already possess a good initial understanding of code due to extensive pre-training on code data.
  • Comparison to Other Approaches: The paper distinguishes CodeAct from other methods that use code generation for problem-solving, highlighting CodeAct’s multi-turn interaction capability and its ability to dynamically adjust actions based on environmental observations.
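
To illustrate the "fewer interactions" point: where a JSON-based agent would need one round trip per tool call, a single CodeAct action can fold several calls into ordinary Python control flow. The snippet below is illustrative only and reuses the search_internet tool defined earlier.

# One code action covering what would otherwise be several separate tool-call turns.
cities = ["London", "Paris", "Tokyo"]
reports = {}
for city in cities:
    result = search_internet(f"current weather in {city}")
    reports[city] = result if result else "no data"
print(reports)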

In essence, the CodeAct paper proposes and validates a novel approach for building more capable and flexible LLM agents by leveraging the power and expressiveness of executable Python code as the action space. The framework’s ability to handle complex operations, self-debug, and interact dynamically with the environment positions it as a significant advancement in the field of autonomous agents.

Further Exploration

For a complete implementation and more details, refer to the official repository on GitHub: langchain-ai/langgraph-codeact.