# RAG Evaluation Framework

Build a comprehensive RAG evaluation framework measuring retrieval, generation, and end-to-end quality with automated and human evaluation protocols.

You are a Lead AI Engineer. Build a comprehensive RAG evaluation framework.

## System Under Test

{{system_description}}

## Evaluation Dataset

{{eval_dataset_description}}

## Quality Dimensions

{{quality_dimensions}}

Create an evaluation framework measuring:

1. **Retrieval Quality**
   - Precision@K, Recall@K, NDCG
   - Context relevance scoring
2. **Generation Quality**
   - Faithfulness to context
   - Answer completeness
   - Hallucination detection
3. **End-to-End Quality**
   - Correctness (vs. ground truth)
   - User satisfaction proxies
4. **Operational Metrics**
   - Latency distributions
   - Token efficiency

Provide a Python implementation with:

- Automated evaluation pipelines
- Human evaluation protocols
- Regression detection
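The prompt leaves the metric definitions to the model. For concreteness, here is a minimal sketch of the ranked-retrieval metrics it names (Precision@K, Recall@K, NDCG), assuming each query yields a ranked list of retrieved document IDs plus a gold set of relevant IDs; the function names and data shapes are illustrative assumptions, not a fixed API.

```python
# Minimal sketch of the retrieval metrics named in the prompt, using binary
# relevance. Data shapes are assumptions: a ranked list of retrieved IDs and
# a set of relevant IDs per query.
import math


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    # Dividing by len(top_k) rather than k keeps short result lists sane.
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: each hit is discounted by log2 of its rank."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so the top hit scores 1/log2(2) = 1
        for rank, doc in enumerate(retrieved[:k])
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    retrieved = ["d3", "d1", "d7", "d2", "d9"]
    relevant = {"d1", "d2", "d4"}
    print(precision_at_k(retrieved, relevant, k=5))  # 0.4
    print(recall_at_k(retrieved, relevant, k=5))     # 0.666...
    print(ndcg_at_k(retrieved, relevant, k=5))       # ~0.50
```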
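For the faithfulness and hallucination-detection dimension, one crude, self-contained proxy is lexical grounding: score each answer sentence by the fraction of its content words that appear in the retrieved context, and flag low-overlap sentences as possibly unsupported. A real framework would use an NLI model or an LLM judge here; the stopword list and threshold below are arbitrary assumptions.

```python
# Crude lexical-grounding proxy for faithfulness: sentences whose content
# words barely overlap the retrieved context are flagged as suspect.
# The threshold and stopword list are placeholder assumptions.
import re

STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "of", "to",
             "in", "and", "or", "that", "this", "it"}


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def faithfulness_report(answer: str, context: str, threshold: float = 0.5):
    """Return (sentence, overlap score, suspect?) for each answer sentence."""
    context_vocab = content_words(context)
    report = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        score = len(words & context_vocab) / len(words) if words else 1.0
        report.append((sentence, score, score < threshold))
    return report


if __name__ == "__main__":
    ctx = "The Eiffel Tower is 330 metres tall and was completed in 1889."
    ans = "The Eiffel Tower was completed in 1889. It weighs 90,000 tonnes."
    for sentence, score, suspect in faithfulness_report(ans, ctx):
        print(f"{score:.2f} {'SUSPECT' if suspect else 'ok':<7} {sentence}")
    # The second sentence scores 0.00 and is flagged: nothing in the
    # context supports the weight claim.
```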
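Finally, a sketch of the regression-detection requirement: compare a fresh evaluation run against a stored baseline and fail when any metric drops by more than a tolerance. The metric names, baseline scores, and tolerance are placeholders.

```python
# Regression gate sketch: flag any metric that fell more than `tolerance`
# below its stored baseline. The baseline values here are made up.
BASELINE = {"precision@5": 0.62, "recall@5": 0.55, "faithfulness": 0.91}


def detect_regressions(current: dict[str, float],
                       baseline: dict[str, float],
                       tolerance: float = 0.02) -> list[str]:
    """Names of metrics whose score dropped more than `tolerance` below baseline."""
    return [name for name, base in baseline.items()
            if current.get(name, 0.0) < base - tolerance]


if __name__ == "__main__":
    new_run = {"precision@5": 0.58, "recall@5": 0.56, "faithfulness": 0.90}
    failures = detect_regressions(new_run, BASELINE)
    if failures:
        raise SystemExit(f"Regression detected in: {', '.join(failures)}")
```

Wired into CI, a gate like this turns the evaluation suite into an automatic check on prompt, retriever, or model changes.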
38 copies · 0 forks

## Details

**Category:** Coding

**Use Cases:** RAG evaluation, Quality measurement, Regression testing

**Works Best With:** claude-sonnet-4-20250514, gpt-4o