Quality Score Aggregation

Aggregate quality scores across multiple evaluators.

Score {{model}} output quality on {{evaluation_criteria}} using multiple evaluators.

Run {{evaluator_count}} independent quality assessments:
- Each evaluator scores 1-10 per criterion
- Record all scores
- Calculate inter-evaluator agreement
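
The prompt does not say which agreement statistic to use. As a minimal sketch, assuming scores are collected as a nested mapping of evaluator to criterion to score, the mean absolute pairwise difference per criterion gives a simple disagreement measure (0 means every evaluator gave the same score). The `scores` data and the `agreement_by_criterion` helper are hypothetical, chosen for illustration only.

```python
from itertools import combinations
from statistics import mean

# Hypothetical example data: scores[evaluator][criterion] on the 1-10 scale.
scores = {
    "evaluator_1": {"accuracy": 8, "clarity": 7},
    "evaluator_2": {"accuracy": 9, "clarity": 4},
    "evaluator_3": {"accuracy": 8, "clarity": 9},
}

def agreement_by_criterion(scores):
    """Return the mean absolute pairwise score difference for each criterion.

    Lower values mean closer agreement; 0 means all evaluators agree exactly.
    This is one assumed choice of statistic, not the only reasonable one.
    """
    criteria = next(iter(scores.values())).keys()
    result = {}
    for criterion in criteria:
        values = [per_evaluator[criterion] for per_evaluator in scores.values()]
        pairs = list(combinations(values, 2))
        result[criterion] = mean(abs(a - b) for a, b in pairs)
    return result

disagreement = agreement_by_criterion(scores)
```

With this data, `clarity` shows a much larger spread than `accuracy`, which is the kind of criterion the prompt asks to flag.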

Weight final scores by each evaluator's reliability on {{calibration_set}}. Report the aggregated scores with confidence intervals, and flag criteria with high disagreement.
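
One way the reporting step could be sketched, under stated assumptions: reliability is taken to be inversely related to each evaluator's mean absolute error on the calibration set, the confidence interval is a rough normal approximation around the unweighted mean, and disagreement is flagged when the sample standard deviation exceeds a threshold. None of these choices come from the prompt itself; `reliability_weights`, `aggregate`, and the default threshold of 2.0 are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def reliability_weights(calibration_errors):
    """Turn mean absolute calibration errors into weights that sum to 1.

    Assumption: a lower error on the calibration set means higher reliability.
    """
    raw = {name: 1.0 / (1.0 + err) for name, err in calibration_errors.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}

def aggregate(values_by_evaluator, weights, disagreement_threshold=2.0):
    """Aggregate one criterion's scores across evaluators.

    Needs at least two evaluators, since stdev is undefined for one value.
    """
    values = list(values_by_evaluator.values())
    weighted_score = sum(weights[name] * score
                         for name, score in values_by_evaluator.items())
    spread = stdev(values)  # sample standard deviation across evaluators
    centre = mean(values)
    # Rough 95% interval (normal approximation); wide and unreliable for few evaluators.
    half_width = 1.96 * spread / sqrt(len(values))
    return {
        "weighted_score": weighted_score,
        "unweighted_ci_95": (centre - half_width, centre + half_width),
        "high_disagreement": spread > disagreement_threshold,
    }

# Hypothetical usage: three evaluators, one criterion.
weights = reliability_weights({"a": 0.0, "b": 1.0, "c": 3.0})
report = aggregate({"a": 7, "b": 4, "c": 9}, weights)
```

With only a handful of evaluators, a t-distribution or bootstrap interval would be a more defensible choice than the fixed 1.96 multiplier used here.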

Details

Category

Analysis

Use Cases

Quality aggregation, Multi-evaluator scoring, Agreement analysis

Works Best With

claude-opus-4.5, gpt-5.2, gemini-2.0-flash