Design evaluation metrics for {{model}} on {{unique_task}}.

Step 1: Define what success means for this task
Step 2: Identify measurable proxies for success
Step 3: Design metric formulas with clear semantics
Step 4: Validate metrics correlate with human judgment
Step 5: Establish baseline scores and thresholds
Step 6: Document metric interpretation guidelines

Justify each design choice with examples.
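
Steps 4 and 5 can be illustrated with a minimal sketch, assuming each evaluated output has one automatic metric score and one human rating. The function names (`rank`, `spearman`, `passes_threshold`) and the `min_gain` parameter are hypothetical, chosen for illustration only. Spearman rank correlation is one common choice for checking agreement with human judgment; it is not the only valid one.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_ratings):
    """Step 4: rank correlation between metric scores and human ratings."""
    if len(metric_scores) != len(human_ratings):
        raise ValueError("inputs must have the same length")
    return pearson(rank(metric_scores), rank(human_ratings))


def passes_threshold(score, baseline, min_gain=0.0):
    """Step 5: accept a score only if it beats the baseline by min_gain (assumed rule)."""
    return score >= baseline + min_gain
```

A correlation near 1 suggests the metric orders outputs the way human raters do; a value near 0 suggests the proxy chosen in Step 2 should be revisited before any thresholds are set.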
Details
Category: Analysis
Use Cases: Metric design, Evaluation creation, Quality measurement
Works Best With: claude-opus-4.5, gpt-5.2, gemini-2.0-flash