Multi-Run Accuracy Validation

Validate accuracy through multiple independent evaluation runs.

Evaluate {{model}} accuracy on {{dataset}} using self-consistency.

Run the evaluation 5 independent times, each with a different random seed:
- Sample different test subsets
- Vary evaluation order
- Record accuracy for each run
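
The run procedure above can be sketched in Python as follows. This is a minimal illustration, not part of the prompt itself: `predict`, `run_once`, `multi_run`, and `subset_size` are hypothetical names, and examples are assumed to be `(input, expected_label)` pairs scored by exact match.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[object, object]  # (input, expected_label); an assumed data shape


def run_once(predict: Callable[[object], object],
             examples: Sequence[Example],
             seed: int,
             subset_size: int) -> float:
    """One evaluation run: seeded subset, shuffled order, exact-match accuracy."""
    rng = random.Random(seed)                         # independent RNG per run
    subset = rng.sample(list(examples), subset_size)  # sample a test subset
    rng.shuffle(subset)                               # vary evaluation order
    correct = sum(1 for x, y in subset if predict(x) == y)
    return correct / len(subset)


def multi_run(predict: Callable[[object], object],
              examples: Sequence[Example],
              seeds: Sequence[int] = (0, 1, 2, 3, 4),
              subset_size: int = 50) -> List[float]:
    """Record one accuracy value per seed (5 seeds by default)."""
    return [run_once(predict, examples, s, subset_size) for s in seeds]
```

Seeding a separate `random.Random` per run keeps each run reproducible without touching global random state.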

Calculate the mean accuracy across runs with a confidence interval. Flag the result if the variance across runs exceeds {{variance_threshold}}. Report the most reliable estimate with uncertainty bounds.
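
As a rough sketch of the summary step, the snippet below computes the mean, the sample variance, and a confidence interval from five per-run accuracies. The prompt does not specify a confidence level or interval method, so the 95% Student's t interval used here is an assumption, as are the names `summarize` and `T_CRIT_95_DF4`.

```python
import math
import statistics
from typing import Dict, Sequence

# Two-sided 95% critical value of Student's t with 4 degrees of freedom
# (n = 5 runs). The 95% level is an assumption, not stated in the prompt.
T_CRIT_95_DF4 = 2.776


def summarize(accuracies: Sequence[float], variance_threshold: float) -> Dict[str, object]:
    """Mean accuracy, t-based 95% interval, and a high-variance flag for 5 runs."""
    n = len(accuracies)
    if n != 5:
        raise ValueError("this sketch hard-codes the t value for exactly 5 runs")
    mean = statistics.mean(accuracies)
    variance = statistics.variance(accuracies)       # sample variance (n - 1)
    half_width = T_CRIT_95_DF4 * math.sqrt(variance / n)
    return {
        "mean": mean,
        "variance": variance,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
        "flagged": variance > variance_threshold,    # exceeds {{variance_threshold}}
    }
```

With only five runs, a t-based interval is wider than a normal-approximation one, which is the more cautious choice for reporting uncertainty bounds.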

Details

Category

Analysis

Use Cases

Accuracy validation, Consistency checking, Reliability testing

Works Best With

claude-opus-4.5, gpt-5.2, gemini-2.0-flash