Design a comprehensive benchmark for evaluating embedding models for our RAG system.

## Use Case

{{use_case_description}}

## Dataset Characteristics

{{dataset_description}}

## Models to Evaluate

{{model_list}}

Create a benchmark that measures:

1. Retrieval quality (MRR, Recall@K, NDCG)
2. Inference latency (P50, P95, P99)
3. Throughput (embeddings/second)
4. Memory footprint
5. Cost per 1M tokens

Provide:

- Python code structure for the benchmark
- Recommended test dataset size
- Statistical significance testing approach
- Visualization recommendations
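As a starting point for the retrieval-quality metrics the prompt asks for, here is a minimal sketch of two of them, MRR and Recall@K, computed from a ranked result list. The function names and the toy query data are illustrative, not part of any particular benchmark harness:

```python
from typing import Sequence, Set

def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Reciprocal rank of the first relevant document (0.0 if none retrieved)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)

# Toy example: one query whose relevant docs are {"a", "c"}
ranked = ["b", "a", "d", "c"]
relevant = {"a", "c"}
print(reciprocal_rank(ranked, relevant))   # 0.5 (first hit at rank 2)
print(recall_at_k(ranked, relevant, k=3))  # 0.5 (only "a" is in the top 3)
```

In a full benchmark these per-query scores would be averaged over the whole test set (MRR is the mean reciprocal rank across queries), and NDCG would be added alongside them for graded relevance judgments.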
# Embedding Model Benchmark Template
Create a rigorous embedding-model evaluation framework that measures retrieval quality, performance, and cost for production RAG systems.
96 copies · 0 forks
## Details

**Category:** Coding
**Use Cases:** Model evaluation, Benchmark design, Performance testing
## Works Best With

claude-sonnet-4-20250514, gpt-4o