Benchmark Creation Pipeline

Priya Ramanathan

@priya-ramanathan

December 31, 2025

Decompose benchmark creation into sequential steps.



Decompose creating {{benchmark_name}} for {{capability}} into steps.

Phase 1 - Design:
- Define test categories
- Specify difficulty levels
- Create rubric drafts

Phase 2 - Development:
- Generate test cases
- Validate ground truth
- Calibrate scoring

Phase 3 - Validation:
- Pilot with {{pilot_models}}
- Refine based on results
- Document administration

Output detailed task list with effort estimates and {{resource_requirements}}.
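The `{{...}}` markers in the prompt above are placeholders that must be filled before the prompt is sent to a model. Below is a minimal Python sketch of that step. It assumes the double braces are plain text substitution; the page does not say how the prompt vault itself renders them. The names `render`, `PLACEHOLDER`, and the example values are hypothetical. The sketch raises an error when a value is missing rather than sending a half-filled prompt.

```python
import re

# The prompt text as published; {{name}} marks a placeholder to fill in.
TEMPLATE = """Decompose creating {{benchmark_name}} for {{capability}} into steps.

Phase 1 - Design:
- Define test categories
- Specify difficulty levels
- Create rubric drafts

Phase 2 - Development:
- Generate test cases
- Validate ground truth
- Calibrate scoring

Phase 3 - Validation:
- Pilot with {{pilot_models}}
- Refine based on results
- Document administration

Output detailed task list with effort estimates and {{resource_requirements}}."""

# Hypothetical convention: a placeholder is {{ followed by a word, then }}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def placeholders(template: str) -> list[str]:
    """Return placeholder names in order of appearance."""
    return PLACEHOLDER.findall(template)


def render(template: str, values: dict[str, str]) -> str:
    """Substitute every {{name}}; raise KeyError if any value is missing."""
    missing = set(placeholders(template)) - set(values)
    if missing:
        raise KeyError(f"missing values for: {sorted(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    # Illustrative values only; substitute your own benchmark details.
    prompt = render(TEMPLATE, {
        "benchmark_name": "CodeReasonBench",
        "capability": "multi-step code reasoning",
        "pilot_models": "two open-weight models",
        "resource_requirements": "annotator hours per phase",
    })
    print(prompt)
```

Checking for missing values first keeps a typo in a variable name from silently leaving a literal `{{capability}}` in the text the model receives.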

Details

Category

Analysis

Use Cases

Benchmark planning, Pipeline design, Project management

Works Best With

claude-opus-4.5, gpt-5.2, gemini-2.0-flash

Created: December 31, 2025
Updated: January 2, 2026
Shared: December 31, 2025

Related Prompts

Meta-Prompt Benchmark Suite Creator

by @ethan-park

Creates comprehensive benchmark suites for evaluating prompts across capability areas and difficulty levels.

Reflection Comparative Benchmark Creator

by @ethan-park

Creates benchmarks to compare different reflection methods across standardized evaluation tasks.

Meta-Prompt: Benchmark Suite Generator

by @samira-el-masri

Generates prompts for creating RAG system benchmark suites.

Customer Benchmark Report

by @aisha-bello

Creates benchmark reports comparing customer performance to industry peers.

Task Decomposition: RAG Evaluation Framework

by @samira-el-masri

Breaks down a RAG evaluation framework project into phased implementation tasks.

Embedding Model Benchmark Template

by @samira-el-masri

Create a rigorous embedding model evaluation framework measuring retrieval quality, performance, and cost metrics for production RAG systems.

More from @priya-ramanathan

Mitigation Strategy Branching

Instruction Complexity Scoring

Deployment Scenario Analysis

Capability Probe Designer
