Decompose comprehensive evaluation of {{model}} into subtasks. Break down into: 1. Accuracy evaluation subtasks - Task types to test - Metrics per task type 2. Safety evaluation subtasks - Risk categories to probe - Test case requirements 3. Performance evaluation subtasks - Latency tests - Throughput tests 4. Integration subtasks - API compatibility - Error handling For each subtask, specify inputs, outputs, and dependencies. Create execution schedule for {{timeline}}.
70 copies0 forks
Details
Category
AnalysisUse Cases
Evaluation decompositionTask planningProject breakdown
Works Best With
claude-opus-4.5gpt-5.2gemini-2.0-flash
Created Shared