Production Outage Root Cause Analysis


Daniel Okoye (@daniel-okoye)

Systematically analyze a production outage to find the root cause.

Analyze this production outage step by step.

Incident timeline:
{{timeline}}

Affected services: {{services}}
Error logs: {{error_logs}}
Recent changes: {{recent_deployments}}

Think through this systematically:

1. TIMELINE ANALYSIS: What was the first anomaly? When did symptoms start?

2. CHANGE CORRELATION: What deployments or configuration changes happened in the 24 hours before the first anomaly?

3. DEPENDENCY MAPPING: Which upstream/downstream services were affected?

4. ERROR PATTERN: What do the error logs reveal about the failure mode?

5. HYPOTHESIS FORMATION: What are the top 3 possible root causes?

6. EVIDENCE EVALUATION: Which hypothesis has the strongest supporting evidence?

7. ROOT CAUSE: What is the most likely root cause?

8. CONTRIBUTING FACTORS: What made the system vulnerable?

9. PREVENTION: How can we prevent recurrence?
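The prompt expects its four placeholders to be filled in before it is sent to a model. The Python sketch below shows one way to do that. It also pre-filters deployments to the 24-hour window that step 2 asks about, which it assumes is the 24 hours leading up to the first anomaly. The sketch rests on several assumptions. The `render_prompt` and `deployments_within` helpers are hypothetical. So are the `(timestamp, description)` deployment format and all incident data. The template is abbreviated to its header.

```python
import re
from datetime import datetime, timedelta

# Abbreviated copy of the prompt header; the numbered steps would follow.
TEMPLATE = """Analyze this production outage step by step.

Incident timeline:
{{timeline}}

Affected services: {{services}}
Error logs: {{error_logs}}
Recent changes: {{recent_deployments}}
"""

# Matches {{name}} and captures "name".
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template: str, values: dict) -> str:
    """Replace each {{name}} with values[name]; raise if any value is missing."""
    missing = set(PLACEHOLDER.findall(template)) - set(values)
    if missing:
        raise KeyError(f"missing values for: {sorted(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def deployments_within(deploys, first_anomaly: datetime, window=timedelta(hours=24)):
    """Keep (timestamp, description) pairs that fall in [first_anomaly - window, first_anomaly]."""
    return [d for d in deploys if first_anomaly - window <= d[0] <= first_anomaly]


# Hypothetical incident data, for illustration only.
anomaly = datetime(2024, 5, 1, 14, 5)
deploys = [
    (datetime(2024, 5, 1, 13, 50), "checkout-api v2.3.1"),
    (datetime(2024, 4, 29, 9, 0), "auth config change"),  # outside the window
]
recent = deployments_within(deploys, anomaly)

prompt = render_prompt(TEMPLATE, {
    "timeline": "14:05 p99 latency spike on checkout-api",
    "services": "checkout-api, payments",
    "error_logs": "upstream timeout after 30s",
    "recent_deployments": "; ".join(f"{t:%Y-%m-%d %H:%M} {desc}" for t, desc in recent),
})
print(prompt)
```

The helper raises on a missing key instead of leaving `{{...}}` in place. This stops a half-filled template from reaching the model, where a literal placeholder could be misread as incident data.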

Details

Category

Coding

Use Cases

Post-incident analysis workflow

Blameless postmortem facilitation

Incident learning extraction

Works Best With

claude-opus-4.5

gpt-5.2

gemini-2.0-flash