RAG Features

Deep Data Agent uses Retrieval-Augmented Generation (RAG) to ground its answers in your documents and code.

How RAG Works

The Process

You ask a question: “What’s our refund policy?”
Semantic search: Deep Data Agent searches your indexed documents for content related in meaning to the question
Context retrieval: Relevant chunks are pulled from the index
Augmented response: The AI answers using both its knowledge and your documents
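
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not Deep Data Agent's implementation: word-overlap cosine similarity stands in for real semantic (embedding-based) search, and the retrieve and build_prompt functions, the chunk dictionaries, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[dict], top_k: int = 2) -> list[dict]:
    """Rank chunks by similarity to the question and keep the best top_k."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c["text"]))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, retrieved: list[dict]) -> str:
    """Put the retrieved chunks (with their sources) in front of the question."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return (
        "Answer using the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up index contents, for illustration only.
chunks = [
    {"source": "policies/refunds.md",
     "text": "Our refund policy allows refunds within 30 days of purchase."},
    {"source": "docs/api-guide.md",
     "text": "The API rate limit applies per API key."},
    {"source": "handbook/onboarding.md",
     "text": "New employees receive a laptop on day one."},
]

question = "What's our refund policy?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Keeping the source path next to each chunk is what makes citations possible later: the model sees where every piece of context came from.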

Benefits

Accurate answers based on your actual documentation
Source citations so you can verify information
Up-to-date with your latest content
Domain-specific understanding of your terminology

Setting Up RAG

Indexing Documents

Open Settings → RAG
Click Add Documents
Upload files or point to a directory
Wait for indexing to complete

Supported Content

Type	Formats	Best For
Documents	PDF, DOCX, TXT	Policies, reports, manuals
Markdown	MD	Technical docs, READMEs
Code	PY, JS, TS, SQL	Codebase understanding
Notebooks	IPYNB	Data analysis history
Data	CSV, JSON	Reference data
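
Before pointing Deep Data Agent at a directory, it can help to check which files fall under the formats in the table above. The sketch below is a standalone helper, not part of the product; the classify function and the SUPPORTED mapping are hypothetical names, and only the extensions themselves come from the table.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the table above, grouped by its "Type" column.
SUPPORTED = {
    "Documents": {".pdf", ".docx", ".txt"},
    "Markdown": {".md"},
    "Code": {".py", ".js", ".ts", ".sql"},
    "Notebooks": {".ipynb"},
    "Data": {".csv", ".json"},
}


def classify(path: str) -> Optional[str]:
    """Return the content type for a file name, or None if its extension is not listed."""
    suffix = Path(path).suffix.lower()
    for content_type, extensions in SUPPORTED.items():
        if suffix in extensions:
            return content_type
    return None


# Example file names, for illustration only.
candidates = ["docs/api-guide.md", "Report.PDF", "analysis/churn.ipynb", "logo.png"]
unsupported = [p for p in candidates if classify(p) is None]
print(unsupported)
```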

Indexing Options

Chunk Size: controls how documents are split.

Smaller chunks (256-512 tokens): More precise retrieval
Larger chunks (1024-2048 tokens): More context per match

Chunk Overlap: how much text each chunk shares with its neighbors.

Higher overlap: Better continuity
Lower overlap: Less redundancy

Default settings work well for most cases.
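
To make chunk size and overlap concrete, here is a minimal sliding-window splitter. It is a sketch under simple assumptions, not the splitter Deep Data Agent uses: the chunk_tokens function is hypothetical, and the tiny sizes are chosen only so the result is easy to read (real settings are in the hundreds or thousands of tokens).

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Placeholder tokens t0..t9 stand in for a tokenized document.
tokens = [f"t{i}" for i in range(10)]

small = chunk_tokens(tokens, chunk_size=4, overlap=1)  # more, narrower chunks
large = chunk_tokens(tokens, chunk_size=8, overlap=2)  # fewer, wider chunks

for chunk in small:
    print(chunk)
```

With a chunk size of 4 and an overlap of 1, each chunk starts with the last token of the previous one, which is the continuity that higher overlap buys at the cost of storing some text twice.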

Using RAG

Asking Questions

Ask your question as usual; retrieval happens automatically:

What does our documentation say about API rate limits?

How do I configure the authentication module?

Find the function that handles payment processing

Source Citations

Responses include source references:

Based on docs/api-guide.md, the rate limit is 100 requests per minute…

Click citations to see the source content.

Focusing Searches

Narrow your search to specific content:

In the Python codebase, find the database connection logic

According to the Q4 report, what were our top products?

Search only the API documentation for webhook examples
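
One way to picture a focused search is as a filter on each chunk's source path that runs before ranking. The sketch below assumes that model; how Deep Data Agent actually narrows a search is not documented here, and the focus function, the chunk dictionaries, and the paths are hypothetical.

```python
from fnmatch import fnmatchcase


def focus(chunks: list[dict], pattern: str) -> list[dict]:
    """Keep only chunks whose source path matches a shell-style pattern."""
    return [c for c in chunks if fnmatchcase(c["source"], pattern)]


# Made-up index entries, for illustration only.
chunks = [
    {"source": "src/db/connection.py", "text": "database connection helper"},
    {"source": "docs/api/webhooks.md", "text": "webhook examples"},
    {"source": "reports/q4-report.pdf", "text": "top products by revenue"},
]

python_only = focus(chunks, "*.py")          # "In the Python codebase, ..."
api_docs_only = focus(chunks, "docs/api/*")  # "Search only the API documentation ..."
print([c["source"] for c in python_only])
```

Descriptive file names and a tidy directory layout make this kind of narrowing more reliable, because the path is often the only metadata a chunk carries.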

Combining with Analysis

RAG works alongside other features:

Load sales.csv and compare the results to what's documented in the data dictionary

Query the database for customer counts, then check whether they match our documentation

RAG Best Practices

Document Organization

DO:

Keep documents well-structured with clear headings
Use descriptive file names
Maintain a single source of truth
Update documents when information changes

DON’T:

Index duplicate content
Include outdated documentation
Mix unrelated content in single files
Index auto-generated or minified files
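
Two of the items above, duplicate content and minified files, are easy to screen for before indexing. The following is a rough pre-check under stated assumptions: it only catches exact duplicates (after trimming whitespace), it treats any name containing ".min." as minified, and select_for_indexing is a hypothetical helper rather than a product feature.

```python
import hashlib


def select_for_indexing(files: dict[str, str]) -> list[str]:
    """Return file names worth indexing: no minified files, no exact duplicate content.

    `files` maps a file name to its text. When two files have identical content,
    the one that sorts first by name is kept.
    """
    seen = set()
    keep = []
    for name in sorted(files):
        if ".min." in name:
            continue  # minified bundles add noise, not knowledge
        digest = hashlib.sha256(files[name].strip().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content already selected under another name
        seen.add(digest)
        keep.append(name)
    return keep


# Made-up files, for illustration only.
files = {
    "docs/setup.md": "Install the agent.",
    "wiki/setup.md": "Install the agent.\n",
    "docs/usage.md": "Ask a question.",
    "static/app.min.js": "var a=1;",
}
print(select_for_indexing(files))
```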

Effective Queries

Specific questions work best:

What are the required parameters for the /users endpoint?

Better than:

Tell me about users

Reference document types:

According to the style guide, how should error messages be formatted?

Better than:

How do I format errors?

Content Quality

The quality of RAG answers depends on your documents:

Well-written docs → Clear, accurate answers
Vague docs → Vague answers
Outdated docs → Outdated answers
Missing docs → “I don’t have information about that”

Managing RAG

Viewing Indexed Content

See what’s in your RAG index:

Open Settings → RAG
Browse indexed documents
View chunk counts and status

Updating Content

When documents change:

Open Settings → RAG
Click Re-index for each updated document
Alternatively, remove the document and add it again

New files in indexed directories are picked up automatically.
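
If you are unsure which documents have changed since they were indexed, comparing content fingerprints is a simple way to find out. This sketch runs outside Deep Data Agent and assumes you recorded a hash for each file at indexing time; fingerprint and plan_reindex are hypothetical helpers, and the file names are made up.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable hash of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(indexed: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare fingerprints recorded at index time with the current file contents.

    indexed: file name -> fingerprint recorded when the file was indexed
    current: file name -> current text of the file
    """
    return {
        "reindex": sorted(
            name for name in current
            if name in indexed and fingerprint(current[name]) != indexed[name]
        ),
        "new": sorted(name for name in current if name not in indexed),
        "remove": sorted(name for name in indexed if name not in current),
    }


# Made-up state, for illustration only.
indexed = {
    "docs/api-guide.md": fingerprint("Version 1 of the API guide."),
    "docs/faq.md": fingerprint("Frequently asked questions."),
    "docs/legacy.md": fingerprint("Old content."),
}
current = {
    "docs/api-guide.md": "Version 2 of the API guide.",
    "docs/faq.md": "Frequently asked questions.",
    "docs/webhooks.md": "Webhook setup.",
}
plan = plan_reindex(indexed, current)
print(plan)
```

Files listed under "reindex" need the Re-index action, and files under "remove" are candidates for removal from the index.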

Removing Content

To remove documents from the index:

Open Settings → RAG
Select documents to remove
Click Remove from index

Clearing the Index

To start fresh:

Open Settings → RAG
Click Clear All
Confirm the action

Advanced Usage

Code Understanding

Index your codebase for development help:

Find where the User class is defined

What functions call the payment API?

Show me examples of error handling in the codebase

Cross-Referencing

Combine RAG with database queries:

Our docs say we have 5 product categories. Verify this against the database

Check if the column names in our schema match the data dictionary
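
The schema check in the last prompt boils down to a set comparison between documented and actual column names. The sketch below shows that comparison against an in-memory SQLite database; the customers table, its columns, and the documented set are invented for illustration, and your database and data dictionary will differ.

```python
import sqlite3


def schema_columns(conn: sqlite3.Connection, table: str) -> set[str]:
    """Return the column names of a table (the table name must come from a trusted source)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1] for row in rows}  # index 1 of each row is the column name


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, signup_date TEXT)"
)

# Column names as a data dictionary might list them (also hypothetical).
documented = {"id", "email", "created_at"}

actual = schema_columns(conn, "customers")
missing_from_db = sorted(documented - actual)  # documented but not in the schema
undocumented = sorted(actual - documented)     # in the schema but not documented
print(missing_from_db, undocumented)
```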

Analysis History

Index past notebooks and reports:

How did we calculate churn rate in the Q3 analysis?

What visualization approach did we use for the executive dashboard?

Troubleshooting

“I don’t have information about that”

Check if the relevant document is indexed
Verify the document contains the information
Try rephrasing your question
Use more specific search terms

Wrong or Outdated Answers

Re-index the updated documents
Remove old versions from the index
Check for duplicate or conflicting documents

Slow Responses

Large indexes take longer to search
Complex queries need more processing
Consider indexing only essential documents

Missing Context

If answers lack detail:

Increase chunk size for more context
Ensure related content is indexed
Ask follow-up questions for more depth
