Review evaluation results for {{model}} on {{benchmark}} critically. Reflect on:

- What assumptions did the evaluation make?
- What scenarios were not tested?
- Could the metrics be gamed?
- Are there confounding variables?
- What would invalidate these results?

After reflection, identify the top 3 evaluation blind spots and recommend additional tests to address them for {{deployment_context}}.
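As a minimal sketch of how the template might be used, the Python below fills the {{placeholder}} slots before the prompt is sent to a model. The fill_prompt helper, the MMLU benchmark value, and the deployment-context string are illustrative assumptions, not part of this entry; a real pipeline might use Jinja2 or another templating engine instead of plain string replacement.

```python
# Sketch: substitute the template's {{key}} placeholders with concrete values.
# str.replace keeps the example dependency-free.

PROMPT_TEMPLATE = """\
Review evaluation results for {{model}} on {{benchmark}} critically. Reflect on:

- What assumptions did the evaluation make?
- What scenarios were not tested?
- Could the metrics be gamed?
- Are there confounding variables?
- What would invalidate these results?

After reflection, identify the top 3 evaluation blind spots and recommend
additional tests to address them for {{deployment_context}}.
"""


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each {{key}} placeholder with its corresponding value."""
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template


# Example values are hypothetical, chosen only to illustrate the call.
prompt = fill_prompt(PROMPT_TEMPLATE, {
    "model": "claude-opus-4.5",
    "benchmark": "MMLU",
    "deployment_context": "a customer-facing support assistant",
})
print(prompt)
```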
Details

Category: Analysis
Use Cases: Result validation, Blind spot detection, Evaluation improvement
Works Best With: claude-opus-4.5, gpt-5.2, gemini-2.0-flash