Analyze this production outage step by step.

Incident timeline: {{timeline}}
Affected services: {{services}}
Error logs: {{error_logs}}
Recent changes: {{recent_deployments}}

Think through this systematically:
1. TIMELINE ANALYSIS: What was the first anomaly? When did symptoms start?
2. CHANGE CORRELATION: What deployments or changes happened within 24 hours?
3. DEPENDENCY MAPPING: Which upstream/downstream services were affected?
4. ERROR PATTERN: What do the error logs reveal about the failure mode?
5. HYPOTHESIS FORMATION: What are the top 3 possible root causes?
6. EVIDENCE EVALUATION: Which hypothesis has the strongest supporting evidence?
7. ROOT CAUSE: What is the most likely root cause?
8. CONTRIBUTING FACTORS: What made the system vulnerable?
9. PREVENTION: How can we prevent recurrence?

Production Outage Root Cause Analysis
Systematically analyze a production outage to find the root cause.
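
The prompt uses four `{{name}}` placeholders: `timeline`, `services`, `error_logs`, and `recent_deployments`. One way to fill them before sending the prompt to a model is sketched below. This is a minimal illustration, not part of the original prompt: the `render_prompt` helper, the shortened template string, and every sample value are hypothetical.

```python
import re
from typing import Mapping

# Matches placeholders of the form {{name}}, as used in the prompt template.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template: str, values: Mapping[str, str]) -> str:
    """Replace each {{name}} placeholder with its value.

    Raises KeyError if the template names a placeholder that has no value,
    so an incomplete incident report is caught before the prompt is sent.
    """
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"missing values for placeholders: {missing}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


# Shortened stand-in for the input section of the prompt (hypothetical excerpt).
template = (
    "Incident timeline: {{timeline}}\n"
    "Affected services: {{services}}\n"
    "Error logs: {{error_logs}}\n"
    "Recent changes: {{recent_deployments}}\n"
)

# Hypothetical incident data, for illustration only.
incident = {
    "timeline": "14:02 first alert; 14:10 error rate rising; 14:45 rollback",
    "services": "checkout-api, payments-worker",
    "error_logs": "connection pool exhausted (sample line)",
    "recent_deployments": "checkout-api release deployed at 13:55",
}

print(render_prompt(template, incident))
```

Failing on a missing value, rather than leaving `{{error_logs}}` in the text, keeps the model from reasoning over a literal placeholder.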
Details
Category
Coding
Use Cases
Post-incident analysis workflow
Blameless postmortem facilitation
Incident learning extraction
Works Best With
claude-opus-4.5
gpt-5.2
gemini-2.0-flash