Reflect on metrics chosen to evaluate {{model}} on {{task_type}}.

Critique metric selection:
- Do these metrics capture what matters?
- What important aspects are not measured?
- Could optimizing these metrics cause harm?
- Are metrics aligned with user value?
- What gaming opportunities exist?

After reflection, recommend metric additions or replacements with justification for {{business_goals}}.
Details
Category: Analysis
Use Cases: Metric review, Selection critique, Measurement alignment
Works Best With: claude-opus-4.5, gpt-5.2, gemini-2.0-flash