RAG Optimization Techniques for Production AI
How a data analytics dashboard evolved into a natural language query system for complex datasets.
When I joined as an intern last summer, the brief was straightforward: build a data analytics dashboard for Pre-Employment Transition Services. What started as a visualization tool turned into something more interesting: a system that combines traditional data processing with retrieval-augmented generation to make complex datasets actually queryable in natural language.
The Problem Space
State administrators needed to evaluate provider performance across thousands of participants, multiple service types, and various time periods. The existing workflow involved manual CSV analysis, spreadsheet pivots, and hours of report generation.
The real challenge wasn't just visualizing the data. It was making it interpretable and actionable without requiring a data science background.
Architecture Decisions
Client-Side Processing
Vanilla JavaScript, no framework bloat. Uploaded CSVs never leave the user's browser, ensuring data privacy.
Server-Side Intelligence
Gemini API for natural language interpretation and synthesis. The LLM handles analysis, not data retrieval.
Lightweight Query Engine
Custom retrieval layer that parses natural language into precise data queries, sitting between user questions and raw data.
The Retrieval Component
Rather than fine-tuning a model or building a complex vector database, I built a lightweight query engine that sits between user questions and the raw data. The key insight: most analytics questions follow predictable patterns.
The engine extracts temporal references, entity mentions (agencies, services), and performance indicators from questions, then builds precise queries against the loaded dataset. This gives the LLM actual data to work with rather than trying to hallucinate answers about CSVs it's never seen.
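To make the pattern-matching idea concrete, here is a minimal sketch of such a query engine. The function names, regexes, and column names (`followUpComplete`, `postScore`) are illustrative assumptions, not the actual implementation:

```javascript
// Extract a time window, entity mentions, and a metric from a
// natural-language question, then filter the in-memory CSV rows.
const MONTHS_PATTERN = /last\s+(\d+)\s+months?/i;

function parseQuestion(question, knownAgencies) {
  const q = question.toLowerCase();
  const filters = { agencies: [] };

  // Temporal reference: "last 3 months" -> cutoff date
  const m = q.match(MONTHS_PATTERN);
  if (m) {
    const cutoff = new Date();
    cutoff.setMonth(cutoff.getMonth() - Number(m[1]));
    filters.after = cutoff;
  }

  // Entity mentions: match against agency names loaded from the CSV
  filters.agencies = knownAgencies.filter((a) => q.includes(a.toLowerCase()));

  // Performance indicators mapped to dataset columns
  if (q.includes('follow-up')) filters.metric = 'followUpComplete';
  else if (q.includes('score')) filters.metric = 'postScore';

  return filters;
}

function runQuery(rows, filters) {
  return rows.filter((r) =>
    (!filters.after || new Date(r.date) >= filters.after) &&
    (filters.agencies.length === 0 || filters.agencies.includes(r.agency)) &&
    // "missing follow-ups" keeps only rows without a completed follow-up
    (filters.metric !== 'followUpComplete' || !r.followUpComplete)
  );
}
```

A question like "which agencies had missing follow-ups in the last 3 months?" resolves to a date cutoff plus a `followUpComplete` filter, so the LLM only ever sees rows that actually match.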
Augmented Generation
Once the retrieval engine pulls relevant data, it gets packaged into a structured prompt for the Gemini API. The prompt includes:
Filtered Records
Participant IDs, agencies, scores, and timestamps matching the query
Aggregate Metrics
Calculated means, totals, and distributions for the filtered set
Statistical Context
Standard deviations, peer comparisons, and trend data
Pre-computed SWOT
Strengths, weaknesses, opportunities, and threats already analyzed
Key Insight: The LLM's job is interpretation and synthesis, not data retrieval. Ask "which agencies had missing follow-ups in the last 3 months?" and you get exact participant IDs from the query engine + contextual analysis from the LLM.
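The packaging step above can be sketched as a plain string-building function. The prompt wording and field names (`preScore`, `postScore`, `followUpComplete`) are assumptions for illustration; only the structure mirrors the description:

```javascript
// Serialize retrieved records and pre-computed statistics into the
// prompt so the model interprets real numbers instead of inventing them.
function mean(values) {
  return values.reduce((s, v) => s + v, 0) / values.length;
}

function buildPrompt(question, records) {
  const deltas = records.map((r) => r.postScore - r.preScore);
  const aggregates = {
    count: records.length,
    meanDelta: mean(deltas),
    followUpRate:
      records.filter((r) => r.followUpComplete).length / records.length,
  };

  return [
    'You are analyzing Pre-Employment Transition Services data.',
    `Question: ${question}`,
    'Filtered records (JSON):',
    JSON.stringify(records, null, 2),
    'Aggregate metrics:',
    JSON.stringify(aggregates, null, 2),
    'Answer using only the data above; cite participant IDs where relevant.',
  ].join('\n');
}
```

Because the aggregates are computed in code before the call, the model is never asked to do arithmetic it might get wrong.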
Statistical Rigor Without Overengineering
The performance scoring system needed to be defensible, not vibes-based. Each agency gets a composite score:
Improvement Metrics
Average pre-post score delta across all participants
Follow-up Completion
Percentage of participants with both assessments
Service Variety
Breadth of services offered by the agency
Recency Weighting
Recent activity matters more than historical data
Confidence Adjustment
Agencies with n < 25 participants get penalized
Why confidence adjustment matters: It prevents agencies that served 5 participants (all successful) from outranking agencies with 500 participants and a 90% success rate. Basic statistical power considerations.
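A minimal sketch of that composite score, assuming each component has already been normalized to 0..1; the weights and the linear shrinkage toward n = 25 are illustrative, not the production values:

```javascript
// Weighted composite with a small-sample confidence penalty.
function compositeScore(agency) {
  const raw =
    0.4 * agency.meanDelta +      // improvement: avg pre/post delta
    0.3 * agency.followUpRate +   // follow-up completion rate
    0.2 * agency.serviceVariety + // breadth of services offered
    0.1 * agency.recencyWeight;   // recent activity weighted higher

  // Confidence adjustment: shrink scores for agencies with n < 25
  const confidence = Math.min(agency.n / 25, 1);
  return raw * confidence;
}
```

With this shrinkage, a 5-participant agency with a perfect raw score still ranks below a 500-participant agency at 90%, which is exactly the ordering the adjustment exists to enforce.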
For outlier detection: flag anything beyond 1σ from the mean within peer groups. Not sophisticated, but administrators aren't looking for p-values. They need "this agency is underperforming, investigate why."
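The 1σ flag is a few lines; this sketch (with a hypothetical `{ name, score }` shape) flags agencies more than one standard deviation below their peer-group mean:

```javascript
// Population mean/SD over a peer group; deliberately simple,
// no formal hypothesis test.
function flagUnderperformers(agencies) {
  const mean = agencies.reduce((s, a) => s + a.score, 0) / agencies.length;
  const variance =
    agencies.reduce((s, a) => s + (a.score - mean) ** 2, 0) / agencies.length;
  const sd = Math.sqrt(variance);
  return agencies.filter((a) => a.score < mean - sd).map((a) => a.name);
}
```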
What I'd Do Differently
API Key Management
The prototype hardcodes the Gemini API key in JavaScript (demo only). Production needs server-side proxying or, at minimum, environment variables and request signing.
Caching Layer
Every "Generate Analysis" click hits the Gemini API. For production, cache analyses by state+period hash, invalidate on data upload. Could cut API costs by 80%.
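A sketch of that cache (the key scheme and helper names are assumptions; djb2 stands in for whatever hash you'd actually use, and a real version would await the Gemini call):

```javascript
// In-memory cache keyed by hash(state + period); cleared on data upload.
const analysisCache = new Map();

function cacheKey(state, period) {
  const s = `${state}|${period}`;
  let h = 5381; // djb2-style string hash
  for (let i = 0; i < s.length; i++) h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  return h.toString(16);
}

function getAnalysis(state, period, generate) {
  const key = cacheKey(state, period);
  if (!analysisCache.has(key)) {
    analysisCache.set(key, generate(state, period)); // one API call per key
  }
  return analysisCache.get(key);
}

function invalidateOnUpload() {
  analysisCache.clear();
}
```

Repeated "Generate Analysis" clicks for the same state and period then cost one API call instead of one per click, which is where the projected cost reduction comes from.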
Chunking Strategies
Currently sending entire filtered datasets in prompts. For states with thousands of participants, this hits token limits. Should implement hierarchical summarization: aggregate at agency level for the prompt, keep raw data for follow-up queries.
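The agency-level rollup might look like this sketch (field names are the same hypothetical ones used above): collapse participant rows into one summary object per agency for the prompt, keeping the raw rows around for drill-down follow-ups.

```javascript
// Hierarchical summarization: one compact summary object per agency.
function summarizeByAgency(rows) {
  const groups = new Map();
  for (const r of rows) {
    if (!groups.has(r.agency)) groups.set(r.agency, []);
    groups.get(r.agency).push(r);
  }
  return [...groups.entries()].map(([agency, g]) => ({
    agency,
    participants: g.length,
    meanDelta:
      g.reduce((s, r) => s + (r.postScore - r.preScore), 0) / g.length,
    followUpRate: g.filter((r) => r.followUpComplete).length / g.length,
  }));
}
```

Sending a few dozen summary objects instead of thousands of raw rows keeps the prompt well under token limits, and the raw rows are only pulled in when a follow-up question narrows to a specific agency.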
Drupal Integration
Migration path to Drupal for proper user management and data persistence. The current decoupled frontend architecture makes this relatively clean, but the statistical analysis functions need to move server-side.
Measuring Success
The dashboard went live for pilot testing in three states. Early feedback showed administrators were actually using the follow-up question feature, asking things like "show me participants who started strong but dropped off" or "compare urban vs rural provider performance."
More importantly, the system surfaced actionable insights that weren't obvious from raw CSVs: correlation between service variety and participant improvement, or specific agencies with concerning follow-up gaps.
Technical Takeaways
"The LLM is the least interesting part."
The difference between a hallucinating chatbot and a reliable analytics tool is how you structure the retrieval layer.
Built During Internship
Summer 2025 internship project. Currently being integrated into a production Drupal environment.