Agent Types

Agentic Browser uses four specialized agents that work together to complete web automation tasks.

Orchestrator

The Orchestrator is the “brain” that coordinates all activity.

Responsibilities

  • Task Planning: Breaks your request into actionable steps
  • Agent Assignment: Decides which agent handles each step
  • Progress Tracking: Monitors completion and handles errors
  • Context Management: Maintains state across the task

How It Works

When you submit a task:

  1. The Orchestrator analyzes your request
  2. It creates a plan with ordered steps
  3. It assigns each step to the appropriate agent
  4. It monitors execution and adjusts the plan as needed
  5. It compiles the results and responds

Example

Your request:

Find the contact email for Acme Corp and save it to a file

Orchestrator’s plan:

  1. Web Surfer → Navigate to acmecorp.com
  2. Web Surfer → Find contact page
  3. Web Surfer → Extract email address
  4. File Surfer → Save to contacts.txt
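
One way to picture the plan above is as an ordered list of steps, each tagged with the agent that runs it. The sketch below is illustrative only: the `Step` class, the `run_plan` function, and the stub handlers are hypothetical names, not part of Agentic Browser's actual interface.

```python
from dataclasses import dataclass


@dataclass
class Step:
    agent: str          # which agent runs the step
    instruction: str    # what the agent is asked to do
    done: bool = False  # set once the step has completed


# Hypothetical plan mirroring the example above.
plan = [
    Step("Web Surfer", "Navigate to acmecorp.com"),
    Step("Web Surfer", "Find contact page"),
    Step("Web Surfer", "Extract email address"),
    Step("File Surfer", "Save to contacts.txt"),
]


def run_plan(plan, handlers):
    """Run steps in order, dispatching each one to its agent's handler."""
    results = []
    for step in plan:
        results.append(handlers[step.agent](step.instruction))
        step.done = True
    return results


# Stub handlers stand in for the real agents (assumption for illustration).
handlers = {
    "Web Surfer": lambda text: f"web: {text}",
    "File Surfer": lambda text: f"file: {text}",
}
results = run_plan(plan, handlers)
```

Keeping the agent name on each step makes the assignment explicit, so progress tracking reduces to checking which steps are marked done.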

Web Surfer

The Web Surfer handles all browser interactions.

Capabilities

  • Navigation: Go to URLs, click links, use browser controls
  • Interaction: Click buttons, fill forms, scroll, select dropdowns
  • Reading: Extract text, find elements, capture page state
  • Waiting: Handle page loads, AJAX requests, animations

Interaction Types

Action    Description
goto      Navigate to a URL
click     Click an element
type      Enter text in a field
scroll    Scroll the page
select    Choose from dropdown
hover     Mouse over element
wait      Wait for element/condition

Element Finding

Web Surfer locates elements using:

  • Text content: “Click the ‘Submit’ button”
  • Position: “Click the first link in the navigation”
  • Attributes: “Fill the email input field”
  • Context: “Click ‘Add to Cart’ next to Product A”

Example Actions

Navigate to the login page
→ Web Surfer: goto("https://example.com/login")

Enter username
→ Web Surfer: type("input[name='username']", "john@example.com")

Click submit
→ Web Surfer: click("button[type='submit']")
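
The actions above can be thought of as small records: an action name plus a target and an optional value. The following is a minimal sketch under that assumption. The `Action` class and its `render` method are hypothetical; only the action names come from the Interaction Types table.

```python
from dataclasses import dataclass

# Action names taken from the Interaction Types table.
ACTIONS = {"goto", "click", "type", "scroll", "select", "hover", "wait"}


@dataclass(frozen=True)
class Action:
    name: str
    target: str = ""  # URL or element selector
    value: str = ""   # text to enter, if any

    def __post_init__(self):
        # Reject anything that is not one of the documented action names.
        if self.name not in ACTIONS:
            raise ValueError(f"unknown action: {self.name}")

    def render(self) -> str:
        """Format the action as a call-like string for logging."""
        args = [arg for arg in (self.target, self.value) if arg]
        return f"{self.name}({', '.join(repr(arg) for arg in args)})"


# The login sequence from the example actions above.
login = [
    Action("goto", "https://example.com/login"),
    Action("type", "input[name='username']", "john@example.com"),
    Action("click", "button[type='submit']"),
]
rendered = [action.render() for action in login]
```

Validating the name at construction time means a typo such as `Action("clik", ...)` fails immediately rather than partway through a task.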

Coder

The Coder writes and executes code for complex operations.

When Coder Is Used

  • Extracting structured data from complex pages
  • Transforming or processing collected data
  • Handling pagination or infinite scroll
  • Custom logic that’s hard to express as browser actions

Capabilities

  • JavaScript: Execute in browser context
  • Python: Run scripts for data processing
  • Data Parsing: JSON, HTML, CSV handling
  • API Calls: Programmatic web requests

Example Use Cases

Complex Extraction:

The page has a dynamic table. Coder writes JavaScript to:
1. Find all table rows
2. Extract specific columns
3. Return structured data
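
The source describes this extraction as JavaScript running in the browser. To keep the examples in one language, the sketch below shows the same three steps in Python on an already-captured HTML string, using only the standard library. The `TableRows` class, the `extract_columns` function, and the sample table are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def extract_columns(html, columns):
    """Return one dict per data row, keeping only the requested column indexes."""
    parser = TableRows()
    parser.feed(html)
    return [
        {name: row[index].strip() for name, index in columns.items()}
        for row in parser.rows
    ]


sample = """
<table>
  <tr><th>Name</th><th>Price</th><th>Stock</th></tr>
  <tr><td>Widget</td><td>$4.50</td><td>12</td></tr>
  <tr><td>Gadget</td><td>$9.00</td><td>3</td></tr>
</table>
"""
records = extract_columns(sample, {"name": 0, "price": 1})
```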

Pagination Handling:

Results span 10 pages. Coder creates a loop to:
1. Extract current page data
2. Click "Next"
3. Repeat until no more pages
4. Combine all results
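
The loop described above might look roughly like the following. This is a sketch, not the code Coder generates: `fetch_page` is a hypothetical callable that returns a page's items together with a pointer to the next page, and the `max_pages` cap is an added safety assumption so a misbehaving site cannot loop forever.

```python
def collect_all_pages(fetch_page, max_pages=100):
    """Follow 'next' pointers until none remain, combining every page's items."""
    items = []
    page = 1
    for _ in range(max_pages):
        data = fetch_page(page)
        items.extend(data["items"])
        if data["next"] is None:  # no "Next" link: this was the last page
            break
        page = data["next"]
    return items


# Fake three-page site used in place of a real browser session.
PAGES = {
    1: {"items": ["a", "b"], "next": 2},
    2: {"items": ["c"], "next": 3},
    3: {"items": ["d", "e"], "next": None},
}
all_items = collect_all_pages(PAGES.__getitem__)
```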

Data Transformation:

Raw data needs processing. Coder:
1. Parses the extracted HTML
2. Cleans and normalizes values
3. Converts to requested format (CSV, JSON)
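
As an illustration of the cleaning and conversion steps, the sketch below normalizes a few already-extracted records and writes them out as CSV and JSON with the standard library. The field names, the sample values, and the `normalize` and `to_csv` helpers are assumptions made for the example.

```python
import csv
import io
import json


def normalize(record):
    """Trim whitespace and turn a price string such as ' $1,204.00 ' into a float."""
    price = record["price"].strip().lstrip("$").replace(",", "")
    return {"name": record["name"].strip(), "price": float(price)}


def to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical raw values as they might come off a page.
raw = [
    {"name": "  Widget ", "price": "$4.50"},
    {"name": "Gadget", "price": " $1,204.00 "},
]
clean = [normalize(record) for record in raw]
as_csv = to_csv(clean)
as_json = json.dumps(clean)
```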

File Surfer

The File Surfer manages files and documents.

Capabilities

  • Download Management: Handle file downloads
  • File Reading: Read downloaded content
  • File Writing: Save extracted data
  • Organization: Manage workspace files

File Operations

Operation   Description
download    Save file from URL
read        Read file contents
write       Create or update file
list        Show workspace files
move        Rename or relocate

Example Actions

Save the PDF
→ File Surfer: download("report.pdf")

Extract text from document
→ File Surfer: read("report.pdf") → extract_text()

Save results
→ File Surfer: write("results.json", data)
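
To make the write, read, list, and move operations concrete, here is a minimal local sketch built on `pathlib`. The `Workspace` class is a hypothetical stand-in, not the File Surfer's real interface. It stores data as JSON in a temporary directory and leaves out `download`, which would need network access.

```python
import json
import tempfile
from pathlib import Path


class Workspace:
    """Hypothetical stand-in for the File Surfer's workspace operations."""

    def __init__(self, root):
        self.root = Path(root)

    def write(self, name, data):
        """Create or overwrite a file with JSON-encoded data."""
        path = self.root / name
        path.write_text(json.dumps(data))
        return path

    def read(self, name):
        """Read a file back and decode its JSON contents."""
        return json.loads((self.root / name).read_text())

    def list(self):
        """Return the workspace file names in sorted order."""
        return sorted(path.name for path in self.root.iterdir())

    def move(self, old, new):
        """Rename or relocate a file inside the workspace."""
        (self.root / old).rename(self.root / new)


with tempfile.TemporaryDirectory() as tmp:
    workspace = Workspace(tmp)
    workspace.write("results.json", {"email": "contact@example.com"})
    workspace.move("results.json", "contacts.json")
    listing = workspace.list()
    loaded = workspace.read("contacts.json")
```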

Workspace Integration

Files are saved to your Calliope workspace:

  • Accessible from other tools (Lab, IDE)
  • Persistent across sessions
  • Downloadable to your local machine

Agent Collaboration

Sequential Tasks

User: "Download the Q4 report from investor.example.com"

1. Orchestrator → Creates plan
2. Web Surfer → Navigates to site
3. Web Surfer → Finds report link
4. Web Surfer → Clicks download
5. File Surfer → Saves to workspace
6. Orchestrator → Reports completion

Parallel Tasks

For efficiency, agents can work simultaneously:

User: "Get contact info from these 5 company websites"

Orchestrator assigns:
- Web Surfer handles navigation and extraction
- Coder processes and structures data
- File Surfer saves results as they complete
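
The source does not say how simultaneous work is scheduled. As one possible illustration, the sketch below fans five sites out across a thread pool. The `get_contact_info` function is a stub that stands in for the real extract-and-structure pipeline, and the site names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def get_contact_info(site):
    """Stub for the per-site pipeline: extract, then structure the result."""
    raw = f"contact@{site}"              # stand-in for Web Surfer extraction
    return {"site": site, "email": raw}  # stand-in for Coder structuring


sites = ["a.example", "b.example", "c.example", "d.example", "e.example"]

with ThreadPoolExecutor(max_workers=5) as pool:
    # map() runs the calls concurrently but yields results in input order.
    contacts = list(pool.map(get_contact_info, sites))
```

Because `map` preserves input order, the results line up with the original list of sites even when individual sites finish at different times.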

Error Recovery

When something goes wrong:

  1. The agent reports the issue
  2. The Orchestrator evaluates alternatives
  3. A different approach is attempted
  4. The user is notified if intervention is needed

Web Surfer: "Button not found"
Orchestrator: "Try scrolling down first"
Web Surfer: scroll(down) → click(button)
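
The recovery exchange above can be sketched as a retry loop: try the action, and on failure apply each alternative in turn before retrying. Everything here is hypothetical, including the `ElementNotFound` exception, the `click_with_recovery` function, and the simulated page. It shows the pattern, not the product's actual logic.

```python
class ElementNotFound(Exception):
    """Raised when the simulated page cannot find the requested element."""


def click_with_recovery(click, recoveries):
    """Try the click; after each failure, apply the next recovery and retry."""
    log = []
    try:
        return click(), log
    except ElementNotFound as error:
        log.append(f"failed: {error}")
    for name, recover in recoveries:
        recover()
        log.append(f"tried: {name}")
        try:
            return click(), log
        except ElementNotFound as error:
            log.append(f"failed: {error}")
    # Every alternative is exhausted, so escalate to the user.
    raise ElementNotFound("all alternatives failed; user intervention needed")


# Simulated page: the button only becomes reachable after scrolling.
page = {"scrolled": False}


def click_button():
    if not page["scrolled"]:
        raise ElementNotFound("Button not found")
    return "clicked"


def scroll_down():
    page["scrolled"] = True


result, log = click_with_recovery(click_button, [("scroll down", scroll_down)])
```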

Viewing Agent Activity

Progress Indicators

The interface shows:

  • Which agent is currently active
  • What action is being performed
  • Progress through the plan
  • Any issues encountered

Detailed Logs

Click “Show Details” to see:

  • Full agent communication
  • Exact actions taken
  • Element selectors used
  • Timing information