Agent Types
Agentic Browser uses four specialized agents that work together to complete web automation tasks.
Orchestrator
The Orchestrator is the “brain” that coordinates all activity.
Responsibilities
- Task Planning: Breaks your request into actionable steps
- Agent Assignment: Decides which agent handles each step
- Progress Tracking: Monitors completion and handles errors
- Context Management: Maintains state across the task
How It Works
When you submit a task:
1. Orchestrator analyzes your request
2. Creates a plan with ordered steps
3. Assigns each step to the appropriate agent
4. Monitors execution and adjusts as needed
5. Compiles results and responds
Example
Your request:
Find the contact email for Acme Corp and save it to a file
Orchestrator’s plan:
- Web Surfer → Navigate to acmecorp.com
- Web Surfer → Find contact page
- Web Surfer → Extract email address
- File Surfer → Save to contacts.txt
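The plan above can be pictured as an ordered list of steps, each naming an agent and an instruction. The sketch below is illustrative only: `Step`, `run_plan`, and the stand-in handlers are hypothetical names chosen for this example and are not Agentic Browser's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    agent: str        # which agent handles the step, e.g. "Web Surfer"
    instruction: str  # what that agent is asked to do


# Hypothetical plan mirroring the Acme Corp example above.
plan: List[Step] = [
    Step("Web Surfer", "Navigate to acmecorp.com"),
    Step("Web Surfer", "Find contact page"),
    Step("Web Surfer", "Extract email address"),
    Step("File Surfer", "Save to contacts.txt"),
]


def run_plan(steps: List[Step], handlers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Dispatch each step, in order, to the handler registered for its agent."""
    results = []
    for step in steps:
        handler = handlers[step.agent]  # raises KeyError for an unknown agent
        results.append(handler(step.instruction))
    return results


# Stand-in handlers that only record what they were asked to do.
handlers = {
    "Web Surfer": lambda text: f"[web] {text}",
    "File Surfer": lambda text: f"[file] {text}",
}

for line in run_plan(plan, handlers):
    print(line)
```

Keeping the plan as plain data makes the order of steps and the agent assignment easy to inspect before anything runs.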
Web Surfer
The Web Surfer handles all browser interactions.
Capabilities
- Navigation: Go to URLs, click links, use browser controls
- Interaction: Click buttons, fill forms, scroll, select dropdowns
- Reading: Extract text, find elements, capture page state
- Waiting: Handle page loads, AJAX requests, animations
Interaction Types
| Action | Description |
|---|---|
| goto | Navigate to a URL |
| click | Click an element |
| type | Enter text in a field |
| scroll | Scroll the page |
| select | Choose from dropdown |
| hover | Mouse over element |
| wait | Wait for element/condition |
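One way to picture these actions is as small records holding an action name and its arguments. The sketch below is an assumption about shape only: `Action`, `ALLOWED_ACTIONS`, and `validate` are hypothetical and do not describe how Agentic Browser represents actions internally.

```python
from dataclasses import dataclass
from typing import Tuple

# Action names taken from the table above.
ALLOWED_ACTIONS = {"goto", "click", "type", "scroll", "select", "hover", "wait"}


@dataclass(frozen=True)
class Action:
    name: str
    args: Tuple[str, ...] = ()


def validate(action: Action) -> Action:
    """Reject action names that are not listed in the table."""
    if action.name not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action.name}")
    return action


# A short, made-up sequence: open a page, type into a field, click a button.
login_flow = [
    validate(Action("goto", ("https://example.com/login",))),
    validate(Action("type", ("input[name='username']", "john@example.com"))),
    validate(Action("click", ("button[type='submit']",))),
]
print([a.name for a in login_flow])
```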
Element Finding
Web Surfer locates elements using:
- Text content: “Click the ‘Submit’ button”
- Position: “Click the first link in the navigation”
- Attributes: “Fill the email input field”
- Context: “Click ‘Add to Cart’ next to Product A”
Example Actions
Navigate to the login page
→ Web Surfer: goto("https://example.com/login")
Enter username
→ Web Surfer: type("input[name='username']", "john@example.com")
Click submit
→ Web Surfer: click("button[type='submit']")
Coder
The Coder writes and executes code for complex operations.
When Coder Is Used
- Extracting structured data from complex pages
- Transforming or processing collected data
- Handling pagination or infinite scroll
- Custom logic that’s hard to express as browser actions
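Pagination is a good illustration of logic that is easier to write as a loop than as a series of individual browser actions. The sketch below is hypothetical: `fetch_page` stands in for whatever extracts one page of results and finds the next page, and the in-memory `FAKE_SITE` replaces a real website.

```python
from typing import Dict, List, Optional, Tuple

# Fake three-page result set: page number -> (rows, next page or None).
FAKE_SITE: Dict[int, Tuple[List[str], Optional[int]]] = {
    1: (["row 1", "row 2"], 2),
    2: (["row 3", "row 4"], 3),
    3: (["row 5"], None),
}


def fetch_page(page: int) -> Tuple[List[str], Optional[int]]:
    """Hypothetical stand-in for 'extract the current page, then find Next'."""
    return FAKE_SITE[page]


def collect_all(start: int = 1, max_pages: int = 100) -> List[str]:
    """Follow 'next' links until none remain, combining rows from every page."""
    rows: List[str] = []
    page: Optional[int] = start
    visited = 0
    while page is not None and visited < max_pages:  # cap guards against endless loops
        page_rows, page = fetch_page(page)
        rows.extend(page_rows)
        visited += 1
    return rows


print(collect_all())
```

The `max_pages` cap is a design choice for this sketch: it keeps a site whose "Next" link never disappears from trapping the loop.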
Capabilities
- JavaScript: Execute in browser context
- Python: Run scripts for data processing
- Data Parsing: JSON, HTML, CSV handling
- API Calls: Programmatic web requests
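As a rough illustration of the data parsing capability, the sketch below turns an HTML table into CSV using only the Python standard library. It is not the code Coder actually generates; `TableParser`, `table_to_csv`, and the sample markup are made up for this example.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_to_csv(html: str) -> str:
    """Parse the table, trim whitespace in each cell, and emit CSV text."""
    parser = TableParser()
    parser.feed(html)
    cleaned = [[cell.strip() for cell in row] for row in parser.rows]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(cleaned)
    return out.getvalue()


# Made-up sample markup for illustration.
SAMPLE = """
<table>
  <tr><th>Company</th><th>Email</th></tr>
  <tr><td> Acme Corp </td><td>info@example.com</td></tr>
</table>
"""
print(table_to_csv(SAMPLE))
```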
Example Use Cases
Complex Extraction:
The page has a dynamic table. Coder writes JavaScript to:
1. Find all table rows
2. Extract specific columns
3. Return structured data
Pagination Handling:
Results span 10 pages. Coder creates a loop to:
1. Extract current page data
2. Click "Next"
3. Repeat until no more pages
4. Combine all results
Data Transformation:
Raw data needs processing. Coder:
1. Parses the extracted HTML
2. Cleans and normalizes values
3. Converts to requested format (CSV, JSON)
File Surfer
The File Surfer manages files and documents.
Capabilities
- Download Management: Handle file downloads
- File Reading: Read downloaded content
- File Writing: Save extracted data
- Organization: Manage workspace files
File Operations
| Operation | Description |
|---|---|
| download | Save file from URL |
| read | Read file contents |
| write | Create or update file |
| list | Show workspace files |
| move | Rename or relocate |
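To make the local operations concrete, here is a minimal sketch of `write`, `read`, `list`, and `move` against a temporary directory. It assumes nothing about File Surfer's real implementation: the function names simply echo the table, the temporary directory stands in for a workspace, and `download` is left out because it needs network access.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def write(workspace: Path, name: str, text: str) -> Path:
    """Create or overwrite a file in the workspace."""
    path = workspace / name
    path.write_text(text, encoding="utf-8")
    return path


def read(workspace: Path, name: str) -> str:
    """Return the contents of a workspace file."""
    return (workspace / name).read_text(encoding="utf-8")


def list_files(workspace: Path) -> List[str]:
    """Return the names of the files in the workspace, sorted."""
    return sorted(p.name for p in workspace.iterdir() if p.is_file())


def move(workspace: Path, src: str, dst: str) -> Path:
    """Rename or relocate a file within the workspace."""
    target = workspace / dst
    shutil.move(str(workspace / src), str(target))
    return target


workspace = Path(tempfile.mkdtemp())  # throwaway directory standing in for a workspace
write(workspace, "contacts.txt", "info@example.com\n")
move(workspace, "contacts.txt", "acme-contacts.txt")
print(list_files(workspace))
print(read(workspace, "acme-contacts.txt"))
```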
Example Actions
Save the PDF
→ File Surfer: download("report.pdf")
Extract text from document
→ File Surfer: read("report.pdf") → extract_text()
Save results
→ File Surfer: write("results.json", data)
Workspace Integration
Files are saved to your Calliope workspace:
- Accessible from other tools (Lab, IDE)
- Persistent across sessions
- Downloadable to your local machine
Agent Collaboration
Sequential Tasks
User: "Download the Q4 report from investor.example.com"
1. Orchestrator → Creates plan
2. Web Surfer → Navigates to site
3. Web Surfer → Finds report link
4. Web Surfer → Clicks download
5. File Surfer → Saves to workspace
6. Orchestrator → Reports completion
Parallel Tasks
For efficiency, agents can work simultaneously:
User: "Get contact info from these 5 company websites"
Orchestrator assigns:
- Web Surfer handles navigation and extraction
- Coder processes and structures data
- File Surfer saves results as they complete
Error Recovery
When something goes wrong:
- The agent reports the issue
- The Orchestrator evaluates alternatives
- A different approach is attempted
- The user is notified if intervention is needed
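The recovery sequence above amounts to trying alternative approaches in order until one succeeds. The sketch below simulates that idea with a fake page state. `ActionError`, `run_with_fallbacks`, and the two attempt functions are hypothetical and are not drawn from Agentic Browser itself.

```python
from typing import Callable, List, Optional


class ActionError(Exception):
    """Raised by an agent when a step cannot be completed."""


def run_with_fallbacks(attempts: List[Callable[[], str]]) -> Optional[str]:
    """Try each approach in order; return the first success, or None if all fail."""
    for attempt in attempts:
        try:
            return attempt()
        except ActionError as err:
            print(f"agent reported: {err}")  # an orchestrator would evaluate this
    return None  # nothing worked, so the user would need to step in


# Simulated page: the button only becomes reachable after scrolling.
page = {"scrolled": False}


def click_button() -> str:
    if not page["scrolled"]:
        raise ActionError("Button not found")
    return "clicked"


def scroll_then_click() -> str:
    page["scrolled"] = True
    return click_button()


result = run_with_fallbacks([click_button, scroll_then_click])
print(result)
```

Returning `None` when every attempt fails gives the caller a clear signal that the user needs to be notified.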
Web Surfer: "Button not found"
Orchestrator: "Try scrolling down first"
Web Surfer: scroll(down) → click(button)
Viewing Agent Activity
Progress Indicators
The interface shows:
- Which agent is currently active
- What action is being performed
- Progress through the plan
- Any issues encountered
Detailed Logs
Click “Show Details” to see:
- Full agent communication
- Exact actions taken
- Element selectors used
- Timing information