Grade {{model}} responses to {{test_queries}} using this rubric:

Grade A example: {{grade_a_response}} - Comprehensive, accurate, well-structured
Grade C example: {{grade_c_response}} - Partially correct, missing details
Grade F example: {{grade_f_response}} - Incorrect or unhelpful

Grade test responses and report distribution.
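The template above can be filled in and its results tallied programmatically. The following is a minimal sketch, assuming the double-brace placeholders are resolved by plain string substitution and that the grading step yields one letter (A, C, or F) per response. The names `fill_template` and `grade_distribution` and all sample values are hypothetical and are not part of the template itself.

```python
import re
from collections import Counter

# The prompt template, with its five {{placeholder}} slots.
TEMPLATE = (
    "Grade {{model}} responses to {{test_queries}} using this rubric:\n"
    "Grade A example: {{grade_a_response}} - Comprehensive, accurate, well-structured\n"
    "Grade C example: {{grade_c_response}} - Partially correct, missing details\n"
    "Grade F example: {{grade_f_response}} - Incorrect or unhelpful\n"
    "Grade test responses and report distribution."
)


def fill_template(template, values):
    """Replace each {{name}} placeholder; raise KeyError if a value is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing value for placeholder: {name}")
        return values[name]

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


def grade_distribution(grades):
    """Return {grade: (count, fraction)} for the rubric's three grades."""
    counts = Counter(grades)
    total = len(grades)
    return {
        grade: (counts[grade], counts[grade] / total if total else 0.0)
        for grade in ("A", "C", "F")
    }


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    prompt = fill_template(TEMPLATE, {
        "model": "example-model",
        "test_queries": "three sample questions",
        "grade_a_response": "A complete, correct answer.",
        "grade_c_response": "A partly correct answer.",
        "grade_f_response": "An off-topic answer.",
    })
    print(prompt)
    # Grades as they might come back from the grading step (assumed format).
    print(grade_distribution(["A", "A", "C", "F"]))
```

Raising on a missing placeholder is a deliberate choice here: a rubric sent with an unfilled example slot would silently skew the grades.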
Details
Category: Analysis
Use Cases: Response grading, Quality assessment, Rubric evaluation
Works Best With: claude-opus-4.5, gpt-5.2, gemini-2.0-flash