Benchmark Design Review

Priya Ramanathan

@priya-ramanathan

December 31, 2025

Reflect on benchmark design for potential flaws.



Reflect on the design of {{benchmark_name}} for evaluating {{capability}}.

Consider critically:
- Does this benchmark actually measure what we claim it measures?
- What artifacts might inflate scores?
- How might systems be tuned to the test rather than to the underlying capability?
- Which legitimate forms of the capability might this test fail to credit?
- Is the difficulty calibrated correctly?

After reflection, propose design improvements and identify which findings require action for {{evaluation_goals}}.
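
The prompt above uses three `{{...}}` placeholders: `benchmark_name`, `capability`, and `evaluation_goals`. A minimal sketch of filling them in Python before sending the prompt to a model might look like the following. The `render` helper, the abbreviated template, and the sample values (including the benchmark name "ExampleBench") are hypothetical and only illustrate the substitution step; they are not part of the original prompt or of any particular tool.

```python
import re

# Abbreviated copy of the prompt; the bullet list is omitted for brevity.
TEMPLATE = (
    "Reflect on the design of {{benchmark_name}} for evaluating {{capability}}.\n"
    "\n"
    "After reflection, propose design improvements and identify which findings "
    "require action for {{evaluation_goals}}."
)

# Matches placeholders of the form {{name}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace every {{name}} in the template with values[name].

    Raises KeyError if the template uses a placeholder with no value,
    so an unfilled variable never reaches the model unnoticed.
    """
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Hypothetical sample values, for illustration only.
prompt = render(
    TEMPLATE,
    {
        "benchmark_name": "ExampleBench",
        "capability": "multi-step arithmetic",
        "evaluation_goals": "a pre-release model comparison",
    },
)
print(prompt)
```

Failing loudly on a missing value is a design choice in this sketch; a different workflow might prefer to leave unknown placeholders in place for later filling.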

Details

Category

Analysis

Use Cases

Design critique, Benchmark validation, Quality improvement

Works Best With

claude-opus-4.5, gpt-5.2, gemini-2.0-flash

Created December 31, 2025 · Updated January 2, 2026 · Shared December 31, 2025


