LLM Response Quality Scorer

Build an automated, multi-dimensional quality scorer for LLM responses, using an LLM-as-judge approach calibrated against human labels.

Build an automated quality scorer for LLM responses.

## Quality Dimensions
{{quality_dimensions}}

## Reference Data
{{reference_data}}

## Scoring Requirements
{{scoring_requirements}}

Implement a multi-dimensional scorer:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QualityScore:
    """Per-dimension scores, each in [0, 1]."""
    relevance: float
    completeness: float
    accuracy: float
    coherence: float
    conciseness: float


class ResponseQualityScorer:
    def score(self, query: str, response: str, context: Optional[str] = None) -> QualityScore:
        """
        Dimensions to score:
        - Relevance (0-1)
        - Completeness (0-1)
        - Accuracy (0-1)
        - Coherence (0-1)
        - Conciseness (0-1)
        """
        ...

    def explain_scores(self, scores: QualityScore) -> str:
        """Generate a human-readable explanation of the scores."""
        ...
```
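
For illustration, a caller might exercise the interface like this once `score` is implemented (the example inputs here are hypothetical):

```python
scorer = ResponseQualityScorer()
result = scorer.score(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(scorer.explain_scores(result))
```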

Include:
- LLM-as-judge implementation (a minimal sketch follows this list)
- Calibration with human labels (see the isotonic-regression sketch below)
- Batch scoring optimization (see the asyncio sketch below)
- Score aggregation strategies (see the aggregation sketch below)
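
A minimal sketch of the LLM-as-judge piece, assuming the `QualityScore` dataclass from the interface above and a `judge_fn` callable that stands in for whatever chat-completion client you use; the prompt wording and JSON schema are illustrative, not prescribed by this template:

```python
import json
from typing import Callable

JUDGE_PROMPT = """You are grading an assistant's response to a user query.

Query: {query}
Response: {response}

Rate each dimension from 0.0 to 1.0 and reply with JSON only:
{{"relevance": 0.0, "completeness": 0.0, "accuracy": 0.0, "coherence": 0.0, "conciseness": 0.0}}"""


def judge_response(query: str, response: str,
                   judge_fn: Callable[[str], str]) -> QualityScore:
    """Prompt a judge model and parse its JSON verdict into a QualityScore."""
    raw = judge_fn(JUDGE_PROMPT.format(query=query, response=response))
    data = json.loads(raw)
    # Clamp each value to [0, 1] in case the judge drifts outside the range.
    clamped = {k: min(max(float(data[k]), 0.0), 1.0) for k in
               ("relevance", "completeness", "accuracy", "coherence", "conciseness")}
    return QualityScore(**clamped)
```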
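
For calibration against human labels, one common approach (an assumption here, not something the prompt prescribes) is to fit a monotone mapping per dimension, for example isotonic regression from scikit-learn:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

FIELDS = ("relevance", "completeness", "accuracy", "coherence", "conciseness")


def fit_calibrators(judge_scores: np.ndarray,
                    human_labels: np.ndarray) -> list[IsotonicRegression]:
    """Fit one monotone calibrator per dimension.

    Both arrays have shape (n_samples, 5), columns ordered as FIELDS.
    """
    calibrators = []
    for dim in range(judge_scores.shape[1]):
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(judge_scores[:, dim], human_labels[:, dim])
        calibrators.append(iso)
    return calibrators


def calibrate(raw: QualityScore,
              calibrators: list[IsotonicRegression]) -> QualityScore:
    """Map raw judge scores through the fitted calibrators."""
    values = [float(c.predict([getattr(raw, f)])[0])
              for f, c in zip(FIELDS, calibrators)]
    return QualityScore(**dict(zip(FIELDS, values)))
```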
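
Batch scoring optimization is mostly about bounding concurrency around the judge calls. A sketch with asyncio, where `score_one` is an assumed async wrapper around a single judge call:

```python
import asyncio
from typing import Awaitable, Callable


async def score_batch(pairs: list[tuple[str, str]],
                      score_one: Callable[[str, str], Awaitable[QualityScore]],
                      max_concurrency: int = 8) -> list[QualityScore]:
    """Score many (query, response) pairs concurrently, results in input order."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(query: str, response: str) -> QualityScore:
        # The semaphore caps in-flight judge calls to respect rate limits.
        async with sem:
            return await score_one(query, response)

    return await asyncio.gather(*(bounded(q, r) for q, r in pairs))
```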
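
Two simple aggregation strategies, shown for illustration: a weighted mean for an overall grade, and a conservative minimum for pass/fail gating. The function and its parameters are hypothetical names:

```python
from typing import Optional

FIELDS = ("relevance", "completeness", "accuracy", "coherence", "conciseness")


def aggregate(score: QualityScore,
              weights: Optional[dict[str, float]] = None,
              strategy: str = "weighted_mean") -> float:
    """Collapse the five dimension scores into a single scalar."""
    values = [getattr(score, f) for f in FIELDS]
    if strategy == "min":
        # "Weakest link" gate: one bad dimension sinks the whole score.
        return min(values)
    weights = weights or {f: 1.0 for f in FIELDS}
    total = sum(weights[f] for f in FIELDS)
    return sum(weights[f] * getattr(score, f) for f in FIELDS) / total
```

The weighted mean suits dashboards and trend tracking; the minimum suits automated review gates where a single failing dimension should block a response.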

Details

Category: Coding
Use Cases: Quality scoring, Response evaluation, Automated review
Works Best With: claude-sonnet-4-20250514, gpt-4o