RAG Features

Deep Data Agent uses Retrieval-Augmented Generation (RAG) to answer questions using your documents and code.

How RAG Works

The Process

  1. You ask a question: “What’s our refund policy?”
  2. Semantic search: Deep Data Agent searches your indexed documents for passages related in meaning to the question
  3. Context retrieval: Relevant chunks are pulled from the index
  4. Augmented response: The AI answers using both its knowledge and your documents
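
The four steps above can be sketched in a few lines of Python. This is only an illustration of the general retrieve-then-augment pattern, not Deep Data Agent's actual implementation: it stands in word-count cosine similarity for real semantic search, and the function names, the sample chunks, and the prompt wording are all assumptions made up for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, top_k=2):
    """Steps 2 and 3: rank chunks against the question and keep the best matches."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(c["text"]))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]

def build_prompt(question, retrieved):
    """Step 4: place the retrieved chunks, with their sources, in front of the question."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# Invented sample index; a real index holds chunks of your own documents.
chunks = [
    {"source": "policies/refunds.txt", "text": "Our refund policy allows returns within 30 days of purchase."},
    {"source": "docs/api-guide.md", "text": "The rate limit is 100 requests per minute."},
    {"source": "handbook/travel.txt", "text": "Travel expenses require manager approval."},
]

top = retrieve("What's our refund policy?", chunks)
print(build_prompt("What's our refund policy?", top))
```

Keeping the source path next to each chunk is what makes citations possible later: the model sees where every piece of context came from.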

Benefits

  • Accurate answers based on your actual documentation
  • Source citations so you can verify information
  • Up-to-date with your latest content
  • Domain-specific understanding of your terminology

Setting Up RAG

Indexing Documents

  1. Open Settings > RAG
  2. Click Add Documents
  3. Upload files or point to a directory
  4. Wait for indexing to complete

Supported Content

Type        Formats            Best For
Documents   PDF, DOCX, TXT     Policies, reports, manuals
Markdown    MD                 Technical docs, READMEs
Code        PY, JS, TS, SQL    Codebase understanding
Notebooks   IPYNB              Data analysis history
Data        CSV, JSON          Reference data

Indexing Options

Chunk Size controls how documents are split:

  • Smaller chunks (256-512 tokens): More precise retrieval
  • Larger chunks (1024-2048 tokens): More context per match

Chunk Overlap controls how much text each chunk shares with its neighbors:

  • Higher overlap: Better continuity
  • Lower overlap: Less redundancy

Default settings work well for most cases.
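
To make the two options concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes a document has already been turned into a list of tokens; the function name and the default values (512 and 64) are illustrative assumptions, not Deep Data Agent's documented defaults.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

# Ten stand-in tokens, chunk size 4, overlap 1:
# each chunk repeats the last token of the chunk before it.
for chunk in chunk_tokens(list(range(10)), chunk_size=4, overlap=1):
    print(chunk)
```

A larger overlap lowers the chance that a sentence is cut off at a chunk boundary, at the cost of storing and searching more repeated text.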

Using RAG

Asking Questions

Just ask your question; RAG runs automatically:

What does our documentation say about API rate limits?
How do I configure the authentication module?
Find the function that handles payment processing

Source Citations

Responses include source references:

Based on docs/api-guide.md, the rate limit is 100 requests per minute…

Click citations to see the source content.

Focusing Searches

Narrow your search to specific content:

In the Python codebase, find the database connection logic
According to the Q4 report, what were our top products?
Search only the API documentation for webhook examples
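
How Deep Data Agent scopes a search internally is not described here. Conceptually, a focused query narrows the set of candidate chunks before ranking, which the sketch below imitates with a path pattern. The `filter_by_source` helper and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

def filter_by_source(chunks, pattern):
    """Keep only chunks whose source path matches a shell-style pattern."""
    return [c for c in chunks if fnmatchcase(c["source"], pattern)]

# Invented sample index entries.
chunks = [
    {"source": "src/db/connection.py", "text": "def connect(): ..."},
    {"source": "docs/api-guide.md", "text": "Webhook examples ..."},
    {"source": "reports/q4.pdf", "text": "Top products ..."},
]

python_only = filter_by_source(chunks, "*.py")    # "In the Python codebase, ..."
api_docs = filter_by_source(chunks, "docs/*")     # "Search only the API documentation ..."
print([c["source"] for c in python_only])
print([c["source"] for c in api_docs])
```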

Combining with Analysis

RAG works alongside other features:

Load sales.csv and compare the results to what's documented in the data dictionary
Query the database for customer counts, then check if it matches our documentation

RAG Best Practices

Document Organization

DO:

  • Keep documents well-structured with clear headings
  • Use descriptive file names
  • Maintain a single source of truth
  • Update documents when information changes

DON’T:

  • Index duplicate content
  • Include outdated documentation
  • Mix unrelated content in single files
  • Index auto-generated or minified files
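
Before pointing Deep Data Agent at a directory, a short script can flag files the list above says to leave out. The sketch below is a standalone helper, not part of the product: it treats byte-identical files as duplicates and assumes minified files follow the common `.min.js` / `.min.css` naming convention.

```python
import hashlib
from pathlib import Path

SKIP_SUFFIXES = (".min.js", ".min.css")  # assumed naming convention for minified files

def plan_index(root):
    """Return (to_index, skipped): relative paths to index, and (path, reason) pairs to skip."""
    seen = {}  # content digest -> first path seen with that content
    to_index, skipped = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if path.name.endswith(SKIP_SUFFIXES):
            skipped.append((rel, "minified"))
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            skipped.append((rel, f"duplicate of {seen[digest]}"))
            continue
        seen[digest] = rel
        to_index.append(rel)
    return to_index, skipped
```

Calling `plan_index("docs")` on a hypothetical `docs` directory returns the files worth uploading and the reason each other file was skipped. Near-duplicates (the same text with small edits) are not caught by an exact hash and still need a manual check.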

Effective Queries

Specific questions work best:

What are the required parameters for the /users endpoint?

Better than:

Tell me about users

Naming the document type also helps:

According to the style guide, how should error messages be formatted?

Better than:

How do I format errors?

Content Quality

The quality of RAG answers depends on your documents:

  • Well-written docs → Clear, accurate answers
  • Vague docs → Vague answers
  • Outdated docs → Outdated answers
  • Missing docs → “I don’t have information about that”

Managing RAG

Viewing Indexed Content

See what’s in your RAG index:

  1. Open Settings > RAG
  2. Browse indexed documents
  3. View chunk counts and status

Updating Content

When documents change:

  1. Open Settings > RAG
  2. Click Re-index for the updated documents, or delete them and add them again

New files in indexed directories are picked up automatically.
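
If you are unsure which documents changed since the last indexing run, one way to find out is to compare content hashes. The sketch below is an external helper under that assumption; the manifest format and function names are made up for illustration and say nothing about how Deep Data Agent tracks files internally.

```python
import hashlib
from pathlib import Path

def build_manifest(root):
    """Map each file under root (relative POSIX path) to the SHA-256 digest of its content."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }

def diff_manifest(old, new):
    """Compare two manifests and list added, changed, and removed paths."""
    return {
        "added": sorted(p for p in new if p not in old),
        "changed": sorted(p for p in new if p in old and new[p] != old[p]),
        "removed": sorted(p for p in old if p not in new),
    }
```

Save the manifest (for example as JSON) after each indexing run and diff it against a fresh one later: files under "changed" are the ones to re-index, and files under "removed" are candidates for removal from the index.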

Removing Content

To remove documents from the index:

  1. Open Settings > RAG
  2. Select documents to remove
  3. Click Remove from index

Clearing the Index

To start fresh:

  1. Open Settings > RAG
  2. Click Clear All
  3. Confirm the action

Advanced Usage

Code Understanding

Index your codebase for development help:

Find where the User class is defined
What functions call the payment API?
Show me examples of error handling in the codebase

Cross-Referencing

Combine RAG with database queries:

Our docs say we have 5 product categories—verify this against the database
Check if the column names in our schema match the data dictionary
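
The second prompt boils down to a set comparison between the columns a table actually has and the columns the documentation lists. To double-check such an answer by hand, a sketch like the following works against a SQLite database; the table, the column names, and the `compare_columns` helper are invented for the example, and other databases expose column metadata differently.

```python
import sqlite3

def compare_columns(conn, table, documented):
    """Report columns that are documented but absent, and present but undocumented."""
    # PRAGMA statements cannot take bound parameters, so validate the name first.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    actual = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    documented = set(documented)
    return {
        "missing_from_db": sorted(documented - actual),
        "undocumented": sorted(actual - documented),
    }

# Invented schema and data dictionary for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, signup_date TEXT)")
result = compare_columns(conn, "customers", ["id", "email", "region"])
print(result)
```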

Analysis History

Index past notebooks and reports:

How did we calculate churn rate in the Q3 analysis?
What visualization approach did we use for the executive dashboard?

Troubleshooting

“I don’t have information about that”

  • Check if the relevant document is indexed
  • Verify the document contains the information
  • Try rephrasing your question
  • Use more specific search terms

Wrong or Outdated Answers

  • Re-index the updated documents
  • Remove old versions from the index
  • Check for duplicate or conflicting documents

Slow Responses

  • Large indexes take longer to search
  • Complex queries need more processing
  • Consider indexing only essential documents

Missing Context

If answers lack detail:

  • Increase chunk size for more context
  • Ensure related content is indexed
  • Ask follow-up questions for more depth