You are a Lead Prompt Engineer creating reflection benchmarks.

Reflection methods: {{methods}}
Evaluation tasks: {{tasks}}
Metrics: {{metrics}}

Create benchmark:
1. Design test scenarios
2. Define ground truth
3. Set scoring rubrics
4. Configure baselines
5. Plan statistical analysis

Output benchmark specification with evaluation protocol.
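The template has three `{{...}}` placeholders that must be filled before the prompt is sent to a model. Below is a minimal Python sketch of that substitution step. The `render` helper and the example values are hypothetical and not part of the listing, and the regex assumes placeholder names contain only word characters.

```python
import re

# Copy of the template text, one instruction per line.
TEMPLATE = """You are a Lead Prompt Engineer creating reflection benchmarks.

Reflection methods: {{methods}}
Evaluation tasks: {{tasks}}
Metrics: {{metrics}}

Create benchmark:
1. Design test scenarios
2. Define ground truth
3. Set scoring rubrics
4. Configure baselines
5. Plan statistical analysis

Output benchmark specification with evaluation protocol."""

# Matches {{name}} and captures "name" (assumed to be word characters only).
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Hypothetical helper: substitute every {{name}}, failing loudly if one is missing."""
    missing = set(PLACEHOLDER.findall(template)) - set(values)
    if missing:
        raise KeyError(f"missing values for: {sorted(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    # Illustrative values only; substitute your own methods, tasks, and metrics.
    print(render(TEMPLATE, {
        "methods": "self-critique, retry-with-feedback",
        "tasks": "math word problems, code repair",
        "metrics": "accuracy, tokens per solved item",
    }))
```

Raising on a missing value, rather than leaving a literal `{{tasks}}` in the prompt, keeps an unfilled template from silently reaching the model.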
Reflection Comparative Benchmark Creator
Creates benchmarks to compare different reflection methods across standardized evaluation tasks.
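Steps 4 and 5 of the template (configure baselines, plan statistical analysis) usually reduce to comparing per-item scores of a reflection method against a baseline on the same task instances. The sketch below assumes a paired percentile bootstrap. That is one common choice, not something the template prescribes. The function name, the defaults (10,000 resamples, 95% interval), and the score lists are all illustrative.

```python
import random


def paired_bootstrap(method: list[float], baseline: list[float],
                     n_resamples: int = 10_000, seed: int = 0,
                     alpha: float = 0.05) -> tuple[float, float, float]:
    """Hypothetical helper: mean per-item difference (method - baseline)
    plus a percentile bootstrap confidence interval.

    Items are resampled as pairs, so both systems are always compared
    on the same task instances.
    """
    if len(method) != len(baseline) or not method:
        raise ValueError("need equal-length, non-empty per-item score lists")
    diffs = [m - b for m, b in zip(method, baseline)]
    n = len(diffs)
    rng = random.Random(seed)  # seeded so the benchmark report is reproducible
    boot = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lo = boot[int((alpha / 2) * n_resamples)]
    hi = boot[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lo, hi


if __name__ == "__main__":
    # Illustrative per-item correctness (1 = solved) on the same 8 tasks.
    reflect = [1, 0, 1, 1, 0, 1, 1, 1]
    no_reflect = [0, 0, 1, 0, 0, 1, 0, 1]
    print(paired_bootstrap(reflect, no_reflect))
```

Resampling pairs rather than each list independently keeps the comparison on identical items. When item difficulty varies widely, this typically gives a tighter interval than an unpaired comparison.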
Details
Category: Analysis
Use Cases: benchmark creation, method comparison, evaluation design
Works Best With: claude-3-opus, gpt-4