Benchmark Creation Pipeline

Decompose benchmark creation into sequential steps.

Decompose creating {{benchmark_name}} for {{capability}} into steps.

Phase 1 - Design:
- Define test categories
- Specify difficulty levels
- Create rubric drafts

Phase 2 - Development:
- Generate test cases
- Validate ground truth
- Calibrate scoring

Phase 3 - Validation:
- Pilot with {{pilot_models}}
- Refine based on results
- Document administration procedures

Output detailed task list with effort estimates and {{resource_requirements}}.
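
The prompt above relies on five `{{...}}` placeholders. A minimal sketch of filling them before sending the prompt to a model might look like the following. It assumes a plain double-brace substitution scheme; the `render` helper, the abbreviated `TEMPLATE`, and every example value are hypothetical illustrations and not part of the original prompt or of any particular tool.

```python
import re

# Abbreviated copy of the prompt: only the lines that carry placeholders.
TEMPLATE = """Decompose creating {{benchmark_name}} for {{capability}} into steps.
- Pilot with {{pilot_models}}
Output detailed task list with effort estimates and {{resource_requirements}}."""

# Matches {{name}} where name is made of letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Substitute every {{name}} in the template; fail loudly if a value is missing."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"missing placeholder values: {missing}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


# Hypothetical values, chosen only to show the substitution.
example_values = {
    "benchmark_name": "ExampleBench",
    "capability": "multi-step arithmetic reasoning",
    "pilot_models": "model-a, model-b",
    "resource_requirements": "annotator hours and compute budget",
}

prompt = render(TEMPLATE, example_values)
print(prompt)
```

Raising on a missing value, rather than leaving `{{name}}` in place, keeps an unfilled placeholder from reaching the model unnoticed.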

Details

Category

Analysis

Use Cases

Benchmark planning, Pipeline design, Project management

Works Best With

claude-opus-4.5, gpt-5.2, gemini-2.0-flash