Model Evaluation Breakdown

Break down a comprehensive model evaluation into subtasks.

70 copies, 0 forks
Decompose a comprehensive evaluation of {{model}} into subtasks.

Break down into:
1. Accuracy evaluation subtasks
   - Task types to test
   - Metrics per task type
2. Safety evaluation subtasks
   - Risk categories to probe
   - Test case requirements
3. Performance evaluation subtasks
   - Latency tests
   - Throughput tests
4. Integration subtasks
   - API compatibility
   - Error handling

For each subtask, specify its inputs, outputs, and dependencies. Create an execution schedule for {{timeline}}.
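
As an illustration of the kind of result this prompt asks for, here is a minimal Python sketch that records each subtask with its inputs, outputs, and dependencies, then derives an execution order from them. The `Subtask` class, the `execution_order` helper, and every subtask name are hypothetical, chosen only for this example; the prompt itself does not prescribe any data format. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Subtask:
    """One evaluation subtask (hypothetical structure for illustration)."""
    name: str
    inputs: list[str]
    outputs: list[str]
    depends_on: list[str] = field(default_factory=list)


def execution_order(subtasks: list[Subtask]) -> list[str]:
    """Return subtask names in an order that respects their dependencies."""
    graph = {task.name: set(task.depends_on) for task in subtasks}
    # static_order() raises graphlib.CycleError if dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


# Made-up subtasks, one per category named in the prompt, plus a final report.
subtasks = [
    Subtask("api_compatibility", ["api spec"], ["compatibility report"]),
    Subtask("accuracy_metrics", ["task suite"], ["metric scores"],
            depends_on=["api_compatibility"]),
    Subtask("safety_probes", ["risk categories"], ["safety findings"],
            depends_on=["api_compatibility"]),
    Subtask("latency_tests", ["load profile"], ["latency percentiles"],
            depends_on=["api_compatibility"]),
    Subtask("final_report",
            ["metric scores", "safety findings", "latency percentiles"],
            ["evaluation report"],
            depends_on=["accuracy_metrics", "safety_probes", "latency_tests"]),
]

order = execution_order(subtasks)
print(order)
```

In this sketch the three middle subtasks depend only on the integration check, so they could run in parallel; mapping the resulting order onto calendar dates for {{timeline}} is left out here.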

Details

Category

Analysis

Use Cases

Evaluation decomposition, Task planning, Project breakdown

Works Best With

claude-opus-4.5, gpt-5.2, gemini-2.0-flash
