Production Outage Root Cause Analysis

Daniel Okoye

@daniel-okoye

December 31, 2025

Systematically analyze a production outage to find the root cause.


Analyze this production outage step by step.

Incident timeline:
{{timeline}}

Affected services: {{services}}
Error logs: {{error_logs}}
Recent changes: {{recent_deployments}}

Think through this systematically:

1. TIMELINE ANALYSIS: What was the first anomaly? When did symptoms start?

2. CHANGE CORRELATION: What deployments or changes happened in the 24 hours before the first anomaly?

3. DEPENDENCY MAPPING: Which upstream/downstream services were affected?

4. ERROR PATTERN: What do the error logs reveal about the failure mode?

5. HYPOTHESIS FORMATION: What are the top 3 possible root causes?

6. EVIDENCE EVALUATION: Which hypothesis has the strongest supporting evidence?

7. ROOT CAUSE: What is the most likely root cause?

8. CONTRIBUTING FACTORS: What made the system vulnerable?

9. PREVENTION: How can we prevent recurrence?
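
A minimal sketch, in Python, of one way to fill in the {{...}} placeholders above before sending the prompt to a model. It is an illustration, not part of the original prompt: the function names render_prompt and changes_in_window, the tuple layout for deployments, and the sample service and deployment names are all hypothetical. The 24-hour default mirrors the window named in step 2.

```python
import re
from datetime import datetime, timedelta

# Matches placeholders of the form {{name}} as used in the prompt template.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template: str, values: dict) -> str:
    """Replace every {{name}} placeholder; fail loudly if a value is missing."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"missing placeholder values: {missing}")
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


def changes_in_window(deployments, first_anomaly, hours=24):
    """Keep (timestamp, description) pairs from the `hours` before first_anomaly."""
    start = first_anomaly - timedelta(hours=hours)
    return [d for d in deployments if start <= d[0] <= first_anomaly]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    first_anomaly = datetime(2025, 12, 31, 10, 0)
    deployments = [
        (datetime(2025, 12, 30, 9, 0), "billing-worker config change"),
        (datetime(2025, 12, 30, 22, 0), "checkout-api release"),
    ]
    recent = changes_in_window(deployments, first_anomaly)
    template = "Affected services: {{services}}\nRecent changes: {{recent_deployments}}"
    print(render_prompt(template, {
        "services": "checkout-api",
        "recent_deployments": "; ".join(f"{t.isoformat()} {desc}" for t, desc in recent),
    }))
```

Raising on a missing value, rather than leaving a literal {{name}} in the text, keeps an incomplete incident report from reaching the model unnoticed.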

Details

Category

Coding

Use Cases

Post-incident analysis workflow, Blameless postmortem facilitation, Incident learning extraction

Works Best With

claude-opus-4.5, gpt-5.2, gemini-2.0-flash

Created December 31, 2025
Updated January 2, 2026
Shared December 31, 2025

