RAG Features
Deep Data Agent uses Retrieval-Augmented Generation (RAG) to answer questions using your documents and code.
How RAG Works
The Process
- You ask a question: “What’s our refund policy?”
- Semantic search: Deep Data Agent searches your indexed documents
- Context retrieval: Relevant chunks are pulled from the index
- Augmented response: The AI answers using both its knowledge and your documents
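The four steps above can be sketched in a few lines. This is a minimal illustration, not Deep Data Agent's actual implementation: it uses plain term-overlap cosine similarity as a stand-in for semantic (embedding-based) search, and the function names (`retrieve`, `build_prompt`), the `policies/refunds.md` path, and the sample chunk text are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[dict], top_k: int = 2) -> list[dict]:
    """Steps 2-3: score every indexed chunk and keep the best matches."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(c["text"]))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]


def build_prompt(question: str, retrieved: list[dict]) -> str:
    """Step 4: put the retrieved chunks, tagged by source, ahead of the question."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Illustrative index contents (made up for this sketch).
chunks = [
    {"source": "policies/refunds.md", "text": "Refund policy: purchases can be returned within 30 days."},
    {"source": "docs/api-guide.md", "text": "The API rate limit is 100 requests per minute."},
]

hits = retrieve("What's our refund policy?", chunks)
print(build_prompt("What's our refund policy?", hits))
```

Keeping the source path attached to each chunk is what makes source citations possible later: the model sees where each piece of context came from.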
Benefits
- Accurate answers based on your actual documentation
- Source citations so you can verify information
- Up-to-date with your latest content
- Domain-specific understanding of your terminology
Setting Up RAG
Indexing Documents
- Open Settings → RAG
- Click Add Documents
- Upload files or point to a directory
- Wait for indexing to complete
Supported Content
| Type | Formats | Best For |
|---|---|---|
| Documents | PDF, DOCX, TXT | Policies, reports, manuals |
| Markdown | MD | Technical docs, READMEs |
| Code | PY, JS, TS, SQL | Codebase understanding |
| Notebooks | IPYNB | Data analysis history |
| Data | CSV, JSON | Reference data |
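If you want to check a directory against this table before uploading, the mapping can be expressed as a small lookup. This is a sketch only: the `SUPPORTED` dictionary and `content_type` helper are hypothetical names, and the extensions are simply the lowercase forms of the formats listed above.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the Supported Content table.
SUPPORTED = {
    "Documents": {".pdf", ".docx", ".txt"},
    "Markdown": {".md"},
    "Code": {".py", ".js", ".ts", ".sql"},
    "Notebooks": {".ipynb"},
    "Data": {".csv", ".json"},
}


def content_type(path: str) -> Optional[str]:
    """Return the table's content type for a file, or None if unsupported."""
    suffix = Path(path).suffix.lower()
    for kind, extensions in SUPPORTED.items():
        if suffix in extensions:
            return kind
    return None


for name in ["docs/api-guide.md", "sales.csv", "logo.png"]:
    print(name, "->", content_type(name))
```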
Indexing Options
Chunk Size controls how documents are split:
- Smaller chunks (256-512 tokens): More precise retrieval
- Larger chunks (1024-2048 tokens): More context per match
Chunk Overlap sets how much text each chunk shares with its neighbors:
- Higher overlap: Better continuity
- Lower overlap: Less redundancy
Default settings work well for most cases.
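To make the two options concrete, here is a minimal sketch of sliding-window chunking. It assumes tokens are already available as a list (real tokenizers differ), and the `chunk_tokens` name and the default values of 512 and 64 are illustrative assumptions, not Deep Data Agent's documented defaults.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


tokens = [str(i) for i in range(10)]
for chunk in chunk_tokens(tokens, chunk_size=4, overlap=1):
    print(chunk)
```

With a chunk size of 4 and an overlap of 1, each window repeats the last token of the previous one, which is the "continuity" that higher overlap buys at the cost of storing some text twice.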
Using RAG
Asking Questions
Simply ask questions—RAG is automatic:
- What does our documentation say about API rate limits?
- How do I configure the authentication module?
- Find the function that handles payment processing
Source Citations
Responses include source references:
Based on docs/api-guide.md, the rate limit is 100 requests per minute…
Click citations to see the source content.
Focusing Searches
Narrow your search to specific content:
- In the Python codebase, find the database connection logic
- According to the Q4 report, what were our top products?
- Search only the API documentation for webhook examples
Combining with Analysis
RAG works alongside other features:
- Load sales.csv and compare the results to what's documented in the data dictionary
- Query the database for customer counts, then check if it matches our documentation
RAG Best Practices
Document Organization
DO:
- Keep documents well-structured with clear headings
- Use descriptive file names
- Maintain a single source of truth
- Update documents when information changes
DON’T:
- Index duplicate content
- Include outdated documentation
- Mix unrelated content in single files
- Index auto-generated or minified files
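Two of the DON'T items can be screened for before you add a directory. The sketch below is one possible pre-check, not a built-in feature: it treats files with identical content as duplicates (via a SHA-256 hash) and flags files whose average line length is very long as likely minified. The function names and the 200-character threshold are assumptions chosen for illustration.

```python
import hashlib


def looks_minified(text: str, max_avg_line_len: int = 200) -> bool:
    """Heuristic: minified files pack everything into a few very long lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    return sum(len(line) for line in lines) / len(lines) > max_avg_line_len


def select_for_indexing(files: dict[str, str]) -> list[str]:
    """Given path -> content, return paths worth indexing, in input order."""
    seen_digests = set()
    keep = []
    for path, text in files.items():
        digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if digest in seen_digests or looks_minified(text):
            continue  # exact duplicate of an earlier file, or likely minified
        seen_digests.add(digest)
        keep.append(path)
    return keep


files = {
    "docs/guide.md": "# Guide\nStep one.\n",
    "docs/guide-copy.md": "# Guide\nStep one.\n",
    "static/app.min.js": "x" * 5000,
}
print(select_for_indexing(files))
```

Exact-hash matching only catches byte-identical copies; near-duplicates and outdated versions still need a manual review.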
Effective Queries
Specific questions work best:
- What are the required parameters for the /users endpoint?
Better than:
- Tell me about users
Reference document types:
- According to the style guide, how should error messages be formatted?
Better than:
- How do I format errors?
Content Quality
The quality of RAG answers depends on your documents:
- Well-written docs → Clear, accurate answers
- Vague docs → Vague answers
- Outdated docs → Outdated answers
- Missing docs → “I don’t have information about that”
Managing RAG
Viewing Indexed Content
See what’s in your RAG index:
- Open Settings → RAG
- Browse indexed documents
- View chunk counts and status
Updating Content
When documents change:
- Open Settings → RAG
- Click Re-index for updated documents
- Or delete and re-add
New files in indexed directories are picked up automatically.
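If you track documents outside the app, a content fingerprint is a simple way to decide which ones need the Re-index button. The sketch below is an assumption about how such a check could work, not a description of how Deep Data Agent detects changes; `fingerprint` and `plan_reindex` are hypothetical helpers.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable hash of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(indexed: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare fingerprints stored at last indexing with the current file contents.

    indexed: path -> fingerprint recorded when the file was last indexed
    current: path -> file content as it is now
    """
    plan = {"add": [], "reindex": [], "remove": []}
    for path, text in current.items():
        if path not in indexed:
            plan["add"].append(path)        # new file
        elif indexed[path] != fingerprint(text):
            plan["reindex"].append(path)    # content changed since last index
    plan["remove"] = [path for path in indexed if path not in current]
    return plan


indexed = {
    "docs/api-guide.md": fingerprint("Rate limit: 100 requests per minute."),
    "docs/old-faq.md": fingerprint("Old answers."),
}
current = {
    "docs/api-guide.md": "Rate limit: 200 requests per minute.",
    "docs/style-guide.md": "Use sentence case.",
}
print(plan_reindex(indexed, current))
```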
Removing Content
To remove documents from the index:
- Open Settings → RAG
- Select documents to remove
- Click Remove from index
Clearing the Index
To start fresh:
- Open Settings → RAG
- Click Clear All
- Confirm the action
Advanced Usage
Code Understanding
Index your codebase for development help:
- Find where the User class is defined
- What functions call the payment API?
- Show me examples of error handling in the codebase
Cross-Referencing
Combine RAG with database queries:
- Our docs say we have 5 product categories; verify this against the database
- Check if the column names in our schema match the data dictionary
Analysis History
Index past notebooks and reports:
- How did we calculate churn rate in the Q3 analysis?
- What visualization approach did we use for the executive dashboard?
Troubleshooting
“I don’t have information about that”
- Check if the relevant document is indexed
- Verify the document contains the information
- Try rephrasing your question
- Use more specific search terms
Wrong or Outdated Answers
- Re-index the updated documents
- Remove old versions from the index
- Check for duplicate/conflicting documents
Slow Responses
- Large indexes take longer to search
- Complex queries need more processing
- Consider indexing only essential documents
Missing Context
If answers lack detail:
- Increase chunk size for more context
- Ensure related content is indexed
- Ask follow-up questions for more depth