Create a benchmark suite to evaluate {{model}} on {{capability}}. Design the benchmark by:
1. Defining test categories covering {{evaluation_dimensions}}
2. Generating 10 representative test cases per category
3. Creating scoring rubrics for each category
4. Specifying baseline scores for comparison
Output a complete benchmark specification with administration instructions and {{interpretation_guide}}.
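The four design steps in the prompt map naturally onto a small data structure. The following is a minimal sketch of one way to hold the resulting specification, not a format the prompt itself prescribes: the names `BenchmarkCase`, `Category`, `validate`, and `compare_to_baseline` are hypothetical, and the rubric is assumed to be a mapping from a numeric score to a description of what earns it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The prompt asks for 10 representative test cases per category.
CASES_PER_CATEGORY = 10


@dataclass
class BenchmarkCase:
    """One test case: the input given to the model and the expected answer."""
    prompt: str
    expected: str


@dataclass
class Category:
    """One test category with its rubric, baseline, and test cases (assumed layout)."""
    name: str
    rubric: dict[int, str]   # score -> description of what earns that score
    baseline: float          # reference score used for comparison
    cases: list[BenchmarkCase] = field(default_factory=list)


def validate(categories: list[Category]) -> list[str]:
    """Return a list of problems; an empty list means the spec looks complete."""
    problems = []
    for cat in categories:
        if len(cat.cases) != CASES_PER_CATEGORY:
            problems.append(
                f"{cat.name}: expected {CASES_PER_CATEGORY} cases, got {len(cat.cases)}"
            )
        if not cat.rubric:
            problems.append(f"{cat.name}: missing scoring rubric")
    return problems


def compare_to_baseline(cat: Category, scores: list[float]) -> float:
    """Mean score for the category minus its baseline (positive means above baseline)."""
    return sum(scores) / len(scores) - cat.baseline
```

Running `validate` on the model's output before administering the benchmark is a cheap way to catch categories that came back with too few cases or no rubric.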
Details
Category: Analysis
Use Cases: Benchmark creation, Suite design, Evaluation scaffolding
Works Best With: claude-opus-4.5, gpt-5.2, gemini-2.0-flash