Design an evaluation for {{model}} on {{novel_task}}.

Framework A: Automated metrics
- Select metrics from {{metric_library}}
- Estimate coverage and blind spots
- Calculate evaluation cost

Framework B: Human evaluation
- Design an annotation protocol
- Estimate reliability and cost
- Calculate the evaluation timeline

Framework C: Model-based evaluation
- Use {{judge_model}} as the evaluator
- Estimate correlation with human judgments
- Calculate cost and speed

Compare the frameworks and recommend a hybrid approach.
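The comparison step the prompt asks for (Framework C's correlation with humans, and cost across all three frameworks) can be sketched in code. This is a minimal illustration, not part of the prompt: the scores and per-example costs are made-up placeholder numbers, and the tie-free Spearman implementation is a simplification.

```python
# Hypothetical sketch of comparing evaluation frameworks.
# All scores and costs below are illustrative placeholders.

def spearman(xs, ys):
    """Spearman rank correlation (no tie handling; illustration only)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Placeholder scores for the same 5 items, human vs. judge model.
human_scores = [4, 2, 5, 3, 1]
judge_scores = [3.8, 2.1, 4.9, 3.2, 1.5]
judge_human_corr = spearman(human_scores, judge_scores)

# Placeholder cost per 1,000 evaluated examples (USD).
cost_per_1k = {
    "automated_metrics": 0.50,
    "human_annotation": 450.00,
    "model_judge": 12.00,
}
cheapest = min(cost_per_1k, key=cost_per_1k.get)
```

A hybrid recommendation typically falls out of exactly these two numbers: use the cheap automated metrics broadly, the judge model where its human correlation is acceptable, and reserve human annotation for calibration and disagreement cases.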
Details

Category: Analysis
Use cases: Framework selection, Evaluation design, Method comparison
Works best with: claude-opus-4.5, gpt-5.2, gemini-2.0-flash