Evaluation Result Self-Critique

Critically review evaluation results for blind spots.

Review evaluation results for {{model}} on {{benchmark}} critically.

Reflect on:
- What assumptions did the evaluation make?
- What scenarios were not tested?
- Could the metrics be gamed?
- Are there confounding variables?
- What would invalidate these results?

After reflection, identify the top 3 evaluation blind spots and recommend additional tests to address them for {{deployment_context}}.
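
The double-brace placeholders above ({{model}}, {{benchmark}}, {{deployment_context}}) have to be filled in before the prompt is sent. The following is a minimal sketch of one way to do that in Python, assuming plain string substitution is enough. The names `PROMPT_TEMPLATE`, `PLACEHOLDER`, and `fill_template` are hypothetical helpers introduced for illustration and are not part of any Promptsy API. The example values are made up.

```python
import re
from typing import Mapping

# The prompt text from this page, copied verbatim with its placeholders.
PROMPT_TEMPLATE = """Review evaluation results for {{model}} on {{benchmark}} critically.

Reflect on:
- What assumptions did the evaluation make?
- What scenarios were not tested?
- Could the metrics be gamed?
- Are there confounding variables?
- What would invalidate these results?

After reflection, identify the top 3 evaluation blind spots and recommend additional tests to address them for {{deployment_context}}."""

# Matches {{name}}, tolerating spaces inside the braces (an assumption).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template: str, values: Mapping[str, str]) -> str:
    """Replace every {{name}} placeholder with values[name].

    Raises KeyError if the template names a placeholder that has no value,
    so an unfilled prompt is never sent by accident.
    """
    missing = {name for name in PLACEHOLDER.findall(template) if name not in values}
    if missing:
        raise KeyError(f"missing values for: {sorted(missing)}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


if __name__ == "__main__":
    # Illustrative values only; substitute your own model, benchmark, and context.
    prompt = fill_template(
        PROMPT_TEMPLATE,
        {
            "model": "example-model",
            "benchmark": "example-benchmark",
            "deployment_context": "a customer-support chatbot",
        },
    )
    print(prompt)
```

Failing loudly on a missing value is a deliberate choice in this sketch: a prompt that still contains a literal {{deployment_context}} tends to produce generic recommendations rather than an error.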

Details

Category

Analysis

Use Cases

Result validation, Blind spot detection, Evaluation improvement

Works Best With

claude-opus-4.5, gpt-5.2, gemini-2.0-flash
