RAG Optimization Techniques for Production AI
How a data analytics dashboard evolved into a natural language query system for complex datasets.
When I joined as an intern last summer, the brief was straightforward: build a data analytics dashboard for Pre-Employment Transition Services. What started as a visualization tool turned into something more interesting: a system that combines traditional data processing with retrieval-augmented generation to make complex datasets actually queryable in natural language.
The Problem Space
State administrators needed to evaluate provider performance across thousands of participants, multiple service types, and various time periods. The existing workflow involved manual CSV analysis, spreadsheet pivots, and hours of report generation.
The real challenge wasn't just visualizing the data. It was making it interpretable and actionable without requiring a data science background.
Architecture Decisions
Client-Side Processing
Vanilla JavaScript, no framework bloat. Uploaded CSVs never leave the user's browser, ensuring data privacy.
Server-Side Intelligence
Gemini API for natural language interpretation and synthesis. The LLM handles analysis, not data retrieval.
Lightweight Query Engine
Custom retrieval layer that parses natural language into precise data queries, sitting between user questions and raw data.
The Retrieval Component
Rather than fine-tuning a model or building a complex vector database, I built a lightweight query engine that sits between user questions and the raw data. The key insight: most analytics questions follow predictable patterns.
The engine extracts temporal references, entity mentions (agencies, services), and performance indicators from questions, then builds precise queries against the loaded dataset. This gives the LLM actual data to work with rather than trying to hallucinate answers about CSVs it's never seen.
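To make the pattern-matching idea concrete, here is a minimal sketch of such a query engine. The function names, regexes, and column names (`followUpComplete`, `postScore`) are illustrative assumptions, not the actual implementation:

```javascript
// Extract a time window, entity mentions, and a metric from a
// natural-language question, then filter the in-memory CSV rows.
const MONTHS_PATTERN = /last\s+(\d+)\s+months?/i;

function parseQuestion(question, knownAgencies) {
  const q = question.toLowerCase();
  const filters = { agencies: [] };

  // Temporal reference: "last 3 months" -> cutoff date
  const m = q.match(MONTHS_PATTERN);
  if (m) {
    const cutoff = new Date();
    cutoff.setMonth(cutoff.getMonth() - Number(m[1]));
    filters.after = cutoff;
  }

  // Entity mentions: match against agency names loaded from the CSV
  filters.agencies = knownAgencies.filter((a) => q.includes(a.toLowerCase()));

  // Performance indicators mapped to dataset columns
  if (q.includes('follow-up')) filters.metric = 'followUpComplete';
  else if (q.includes('score')) filters.metric = 'postScore';

  return filters;
}

function runQuery(rows, filters) {
  return rows.filter((r) =>
    (!filters.after || new Date(r.date) >= filters.after) &&
    (filters.agencies.length === 0 || filters.agencies.includes(r.agency)) &&
    // "missing follow-ups" keeps only rows without a completed follow-up
    (filters.metric !== 'followUpComplete' || !r.followUpComplete)
  );
}
```

A question like "which agencies had missing follow-ups in the last 3 months?" resolves to a date cutoff plus a `followUpComplete` filter, so the LLM only ever sees rows that actually match.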
Augmented Generation
Once the retrieval engine pulls relevant data, it gets packaged into a structured prompt for the Gemini API. The prompt includes:
Filtered Records
Participant IDs, agencies, scores, and timestamps matching the query
Aggregate Metrics
Calculated means, totals, and distributions for the filtered set
Statistical Context
Standard deviations, peer comparisons, and trend data
Pre-computed SWOT
Strengths, weaknesses, opportunities, and threats already analyzed
Key Insight: The LLM's job is interpretation and synthesis, not data retrieval. Ask "which agencies had missing follow-ups in the last 3 months?" and you get exact participant IDs from the query engine + contextual analysis from the LLM.
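The packaging step above can be sketched as a plain string-building function. The prompt wording and field names (`preScore`, `postScore`, `followUpComplete`) are assumptions for illustration; only the structure mirrors the description:

```javascript
// Serialize retrieved records and pre-computed statistics into the
// prompt so the model interprets real numbers instead of inventing them.
function mean(values) {
  return values.reduce((s, v) => s + v, 0) / values.length;
}

function buildPrompt(question, records) {
  const deltas = records.map((r) => r.postScore - r.preScore);
  const aggregates = {
    count: records.length,
    meanDelta: mean(deltas),
    followUpRate:
      records.filter((r) => r.followUpComplete).length / records.length,
  };

  return [
    'You are analyzing Pre-Employment Transition Services data.',
    `Question: ${question}`,
    'Filtered records (JSON):',
    JSON.stringify(records, null, 2),
    'Aggregate metrics:',
    JSON.stringify(aggregates, null, 2),
    'Answer using only the data above; cite participant IDs where relevant.',
  ].join('\n');
}
```

Because the aggregates are computed in code before the call, the model is never asked to do arithmetic it might get wrong.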
Statistical Rigor Without Overengineering
The performance scoring system needed to be defensible, not vibes-based. Each agency gets a composite score:
Improvement Metrics
Average pre-post score delta across all participants
Follow-up Completion
Percentage of participants with both assessments
Service Variety
Breadth of services offered by the agency
Recency Weighting
Recent activity matters more than historical data
Confidence Adjustment
Agencies with n < 25 participants get penalized
Why confidence adjustment matters: It prevents agencies that served 5 participants (all successful) from outranking agencies with 500 participants and a 90% success rate. Basic statistical power considerations.
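A minimal sketch of that composite score, assuming each component has already been normalized to 0..1; the weights and the linear shrinkage toward n = 25 are illustrative, not the production values:

```javascript
// Weighted composite with a small-sample confidence penalty.
function compositeScore(agency) {
  const raw =
    0.4 * agency.meanDelta +      // improvement: avg pre/post delta
    0.3 * agency.followUpRate +   // follow-up completion rate
    0.2 * agency.serviceVariety + // breadth of services offered
    0.1 * agency.recencyWeight;   // recent activity weighted higher

  // Confidence adjustment: shrink scores for agencies with n < 25
  const confidence = Math.min(agency.n / 25, 1);
  return raw * confidence;
}
```

With this shrinkage, a 5-participant agency with a perfect raw score still ranks below a 500-participant agency at 90%, which is exactly the ordering the adjustment exists to enforce.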
For outlier detection: flag anything beyond 1σ from the mean within peer groups. Not sophisticated, but administrators aren't looking for p-values. They need "this agency is underperforming, investigate why."
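The 1σ flag is a few lines; this sketch (with a hypothetical `{ name, score }` shape) flags agencies more than one standard deviation below their peer-group mean:

```javascript
// Population mean/SD over a peer group; deliberately simple,
// no formal hypothesis test.
function flagUnderperformers(agencies) {
  const mean = agencies.reduce((s, a) => s + a.score, 0) / agencies.length;
  const variance =
    agencies.reduce((s, a) => s + (a.score - mean) ** 2, 0) / agencies.length;
  const sd = Math.sqrt(variance);
  return agencies.filter((a) => a.score < mean - sd).map((a) => a.name);
}
```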
What I'd Do Differently
API Key Management
The prototype hardcodes the Gemini API key in JavaScript (demo only). Production needs server-side proxying or, at minimum, environment variables and request signing.
Caching Layer
Every "Generate Analysis" click hits the Gemini API. For production, cache analyses by state+period hash, invalidate on data upload. Could cut API costs by 80%.
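A sketch of that cache (the key scheme and helper names are assumptions; djb2 stands in for whatever hash you'd actually use, and a real version would await the Gemini call):

```javascript
// In-memory cache keyed by hash(state + period); cleared on data upload.
const analysisCache = new Map();

function cacheKey(state, period) {
  const s = `${state}|${period}`;
  let h = 5381; // djb2-style string hash
  for (let i = 0; i < s.length; i++) h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  return h.toString(16);
}

function getAnalysis(state, period, generate) {
  const key = cacheKey(state, period);
  if (!analysisCache.has(key)) {
    analysisCache.set(key, generate(state, period)); // one API call per key
  }
  return analysisCache.get(key);
}

function invalidateOnUpload() {
  analysisCache.clear();
}
```

Repeated "Generate Analysis" clicks for the same state and period then cost one API call instead of one per click, which is where the projected cost reduction comes from.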
Chunking Strategies
Currently sending entire filtered datasets in prompts. For states with thousands of participants, this hits token limits. Should implement hierarchical summarization: aggregate at agency level for the prompt, keep raw data for follow-up queries.
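The agency-level rollup might look like this sketch (field names are the same hypothetical ones used above): collapse participant rows into one summary object per agency for the prompt, keeping the raw rows around for drill-down follow-ups.

```javascript
// Hierarchical summarization: one compact summary object per agency.
function summarizeByAgency(rows) {
  const groups = new Map();
  for (const r of rows) {
    if (!groups.has(r.agency)) groups.set(r.agency, []);
    groups.get(r.agency).push(r);
  }
  return [...groups.entries()].map(([agency, g]) => ({
    agency,
    participants: g.length,
    meanDelta:
      g.reduce((s, r) => s + (r.postScore - r.preScore), 0) / g.length,
    followUpRate: g.filter((r) => r.followUpComplete).length / g.length,
  }));
}
```

Sending a few dozen summary objects instead of thousands of raw rows keeps the prompt well under token limits, and the raw rows are only pulled in when a follow-up question narrows to a specific agency.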
Drupal Integration
Migration path to Drupal for proper user management and data persistence. The current decoupled frontend architecture makes this relatively clean, but the statistical analysis functions need to move server-side.
Measuring Success
The dashboard went live for pilot testing in three states. Early feedback showed administrators were actually using the follow-up question feature, asking things like "show me participants who started strong but dropped off" or "compare urban vs rural provider performance."
More importantly, the system surfaced actionable insights that weren't obvious from raw CSVs: correlation between service variety and participant improvement, or specific agencies with concerning follow-up gaps.
Technical Takeaways
"The LLM is the least interesting part."
The difference between a hallucinating chatbot and a reliable analytics tool is how you structure the retrieval layer.
Built During Internship
Summer 2025 internship project. Currently being integrated into a production Drupal environment.