Test {{model}} consistency by running {{test_prompts}} multiple times with {{temperature_settings}}. Measure output variance, semantic similarity, and factual consistency. Flag high-variance responses requiring investigation.
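The variance measurement this prompt asks for could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `generate` is a hypothetical callable standing in for whatever client calls {{model}}, `runs` and `threshold` are illustrative defaults, and the standard library's character-level `difflib` ratio is used as a crude stand-in for semantic similarity (an embedding-based metric would be closer to what the prompt describes). Factual consistency is not covered here.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, List


def pairwise_similarity(outputs: List[str]) -> float:
    """Mean pairwise similarity ratio in [0, 1]; 1.0 means all outputs are identical."""
    if len(outputs) < 2:
        return 1.0
    return mean(
        SequenceMatcher(None, a, b).ratio() for a, b in combinations(outputs, 2)
    )


def consistency_report(
    generate: Callable[[str, float], str],  # hypothetical: (prompt, temperature) -> text
    prompts: List[str],
    temperatures: List[float],
    runs: int = 5,           # assumed number of repetitions per setting
    threshold: float = 0.8,  # assumed cutoff below which a setting is flagged
) -> List[Dict]:
    """Run every prompt `runs` times at each temperature and flag low-similarity settings."""
    report = []
    for prompt in prompts:
        for temp in temperatures:
            outputs = [generate(prompt, temp) for _ in range(runs)]
            score = pairwise_similarity(outputs)
            report.append(
                {
                    "prompt": prompt,
                    "temperature": temp,
                    "similarity": score,
                    "flagged": score < threshold,  # high variance: needs investigation
                }
            )
    return report
```

Each entry in the returned list covers one prompt and temperature pair, so flagged entries point directly at the settings whose outputs diverged most across runs.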
Details
Category: Analysis
Use Cases: Reliability testing, Consistency validation, Variance analysis
Works Best With: claude-opus-4.5, gpt-5.2, gemini-2.0-flash