# RAG Evaluation Framework

Build a comprehensive RAG evaluation framework measuring retrieval, generation, and end-to-end quality with automated and human evaluation protocols.

You are a Lead AI Engineer. Build a comprehensive RAG evaluation framework.

## System Under Test
{{system_description}}

## Evaluation Dataset
{{eval_dataset_description}}

## Quality Dimensions
{{quality_dimensions}}

Create an evaluation framework measuring:

1. **Retrieval Quality** (see the metrics sketch after this list)
   - Precision@K, Recall@K, NDCG
   - Context relevance scoring

2. **Generation Quality**
   - Faithfulness to context
   - Answer completeness
   - Hallucination detection

3. **End-to-End Quality**
   - Correctness (vs ground truth)
   - User satisfaction proxies

4. **Operational Metrics**
   - Latency distributions
   - Token efficiency
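
For illustration, here is one way the retrieval metrics named above could look in plain Python. The function names, string document IDs, and binary relevance judgments are assumptions for this sketch; graded relevance would need per-document gains in the DCG sum.

```python
import math
from collections.abc import Sequence


def precision_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """NDCG@k with binary relevance: DCG of the actual ranking divided by
    the DCG of an ideal ranking that puts every relevant document first."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so log2(rank + 2)
        for rank, doc_id in enumerate(retrieved[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Example: 'd1' and 'd4' are the gold passages for this query.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d4"}
print(precision_at_k(retrieved, relevant, k=3))  # 0.333...
print(recall_at_k(retrieved, relevant, k=3))     # 0.5
print(round(ndcg_at_k(retrieved, relevant, k=4), 3))
```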

Provide a Python implementation with:
- Automated evaluation pipelines
- Human evaluation protocols
- Regression detection (see the sketch below)
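
As a sketch of what the regression-detection piece might look like: aggregate per-example scores, persist a baseline, and flag any metric that falls past a tolerance. The file layout, metric names, `Regression` dataclass, and the 0.02 default tolerance are all assumptions for illustration, not part of the original prompt.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from statistics import mean


@dataclass
class Regression:
    metric: str
    baseline: float
    current: float
    drop: float


def aggregate(per_example_scores: list[dict[str, float]]) -> dict[str, float]:
    """Mean of each metric across the eval set (one dict per example)."""
    if not per_example_scores:
        return {}
    keys = per_example_scores[0].keys()
    return {k: mean(ex[k] for ex in per_example_scores) for k in keys}


def detect_regressions(
    current: dict[str, float],
    baseline_path: Path,
    tolerance: float = 0.02,  # assumed default; tune per metric in practice
) -> list[Regression]:
    """Flag any metric that fell more than `tolerance` below the stored
    baseline. Assumes higher-is-better metrics; invert latency-style
    metrics before passing them in."""
    baseline = json.loads(baseline_path.read_text())
    regressions = []
    for metric, base_value in baseline.items():
        cur_value = current.get(metric)
        if cur_value is not None and base_value - cur_value > tolerance:
            regressions.append(
                Regression(metric, base_value, cur_value, base_value - cur_value)
            )
    return regressions
```

Run in CI, a nonempty return value would fail the build; the baseline file is then refreshed deliberately whenever an accepted improvement becomes the new floor.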

## Details

**Category:** Coding

**Use Cases:** RAG evaluation, Quality measurement, Regression testing

**Works Best With:** claude-sonnet-4-20250514, gpt-4o