Define evaluation metrics for {{model}} on {{novel_capability}}.

Generate metric definitions:
1. Primary success metric with formula
2. Supporting quality metrics (3-5)
3. Guardrail metrics to avoid gaming
4. Composite scoring approach

For each metric, provide calculation method, interpretation guide, and baseline expectations for {{performance_tier}}.
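To illustrate the kind of composite scoring approach the prompt asks for, here is a minimal Python sketch. It is not part of the prompt and is not a prescribed method. It assumes every metric is already normalized to [0, 1] with higher being better, combines the primary and supporting metrics as a weighted mean, and treats guardrails as a hard gate. The names `MetricResult` and `composite_score`, the weights, and the gating rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """One metric's outcome (hypothetical structure, for illustration only)."""
    name: str
    value: float   # assumed normalized to [0, 1], higher is better
    weight: float  # relative importance in the composite


def composite_score(primary, supporting, guardrails_passed):
    """Weighted mean of primary and supporting metrics.

    Assumption: a failed guardrail zeroes the score, so a model cannot
    raise its composite by gaming the primary metric.
    """
    if not all(guardrails_passed.values()):
        return 0.0
    metrics = [primary, *supporting]
    total_weight = sum(m.weight for m in metrics)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(m.value * m.weight for m in metrics) / total_weight


# Example values are made up, not measured results.
primary = MetricResult("task_success_rate", 0.8, 0.5)
supporting = [
    MetricResult("output_consistency", 0.6, 0.25),
    MetricResult("format_adherence", 1.0, 0.25),
]

score_ok = composite_score(primary, supporting, {"no_refusal_spike": True})
score_gated = composite_score(primary, supporting, {"no_refusal_spike": False})
```

A hard gate is only one option. A softer design might subtract a penalty per failed guardrail, which suits cases where guardrail measurements are noisy.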
Details
Category: Analysis
Use Cases: Metric creation, Definition design, Measurement planning
Works Best With: claude-opus-4.5, gpt-5.2, gemini-2.0-flash