Analyze this end-to-end latency breakdown and identify optimization opportunities.

## Request Trace

{{trace_data}}

## Latency Breakdown

- Embedding generation: {{embed_latency}}ms
- Vector search: {{search_latency}}ms
- Context assembly: {{context_latency}}ms
- LLM inference: {{llm_latency}}ms
- Post-processing: {{post_latency}}ms

## Target Latency

{{target_latency}}ms

For each component:

1. **Current vs. Optimal**: What is the theoretical minimum latency, and how far is the current value from it?
2. **Bottleneck Analysis**: What causes the current latency?
3. **Quick Wins**: Optimizations achievable in under 1 week
4. **Medium-term**: Optimizations requiring 2-4 weeks
5. **Strategic**: Architectural changes for long-term gains

Prioritize recommendations by impact/effort ratio.
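Before answering the per-component questions, it can help to compute each stage's share of the total and the gap to the target. The following is a minimal Python sketch, assuming the stages run strictly in sequence (so end-to-end latency is their sum) and all values are in milliseconds. `latency_report` and its output keys are hypothetical names, and the numbers in the `__main__` block are invented for illustration.

```python
from __future__ import annotations


def latency_report(stages: dict[str, float], target_ms: float) -> dict:
    """Summarize a sequential latency breakdown against a target.

    Assumes stages run one after another, so end-to-end latency is their sum.
    Overlapping or streamed stages would need a critical-path analysis instead.
    """
    total_ms = sum(stages.values())
    # How far over target the request is (0 if already within budget).
    overrun_ms = max(0.0, total_ms - target_ms)
    rows = []
    # Largest stages first: they bound how much any single optimization can save.
    for name, ms in sorted(stages.items(), key=lambda kv: kv[1], reverse=True):
        share = ms / total_ms if total_ms > 0 else 0.0
        # Even cutting this stage to 0 ms only closes the gap if it is >= the overrun.
        can_close_gap_alone = ms >= overrun_ms
        rows.append((name, ms, share, can_close_gap_alone))
    return {"total_ms": total_ms, "overrun_ms": overrun_ms, "rows": rows}


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    report = latency_report(
        {
            "Embedding generation": 40.0,
            "Vector search": 25.0,
            "Context assembly": 15.0,
            "LLM inference": 800.0,
            "Post-processing": 20.0,
        },
        target_ms=600.0,
    )
    print(f"total={report['total_ms']:.0f}ms overrun={report['overrun_ms']:.0f}ms")
    for name, ms, share, alone in report["rows"]:
        print(f"{name:<22} {ms:>6.0f}ms {share:>6.1%} can close gap alone: {alone}")
```

With these hypothetical inputs the total is 900 ms against a 600 ms target, and only LLM inference is large enough to close the 300 ms gap by itself; a stage smaller than the overrun cannot reach the target alone no matter how far it is optimized.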
Latency Breakdown Analyzer
Systematically analyze end-to-end latency in ML pipelines to identify bottlenecks and prioritize optimization efforts by impact and implementation effort.
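Prioritizing by impact and effort can be sketched in the same spirit. This is a hedged illustration, not part of the prompt: the `Optimization` fields, the savings and effort estimates, and the greedy selection are all assumptions. Savings are treated as independent and additive, which rarely holds exactly when two changes touch the same stage. The prompt leaves 1 to 2 weeks of effort unassigned, so this sketch files it under medium-term, and it uses effort above 4 weeks as a rough proxy for "strategic" architectural work.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Optimization:
    """A candidate optimization; all fields are hypothetical estimates."""
    name: str
    stage: str
    saving_ms: float     # expected reduction in end-to-end latency
    effort_weeks: float  # expected engineering effort

    def __post_init__(self) -> None:
        if self.effort_weeks <= 0:
            raise ValueError("effort_weeks must be positive")

    @property
    def impact_per_effort(self) -> float:
        return self.saving_ms / self.effort_weeks


def horizon(effort_weeks: float) -> str:
    """Map effort to the prompt's horizons (1-2 weeks filed as medium-term by assumption)."""
    if effort_weeks < 1:
        return "quick win"
    if effort_weeks <= 4:
        return "medium-term"
    return "strategic"


def prioritize(candidates: list[Optimization]) -> list[Optimization]:
    """Sort by impact/effort ratio, highest first."""
    return sorted(candidates, key=lambda c: c.impact_per_effort, reverse=True)


def plan_to_target(candidates: list[Optimization], overrun_ms: float) -> tuple[list[Optimization], float]:
    """Greedily take the best-ratio items until the overrun is covered.

    A heuristic, not an optimal (knapsack) solution; assumes savings add up.
    """
    chosen: list[Optimization] = []
    saved_ms = 0.0
    for c in prioritize(candidates):
        if saved_ms >= overrun_ms:
            break
        chosen.append(c)
        saved_ms += c.saving_ms
    return chosen, saved_ms


if __name__ == "__main__":
    # Hypothetical candidates for illustration only.
    candidates = [
        Optimization("Cache query embeddings", "Embedding generation", 30.0, 0.5),
        Optimization("Trim retrieved context", "Context assembly", 10.0, 0.5),
        Optimization("Smaller LLM for simple queries", "LLM inference", 250.0, 3.0),
        Optimization("Re-architect retrieval tier", "Vector search", 20.0, 6.0),
    ]
    chosen, saved = plan_to_target(candidates, overrun_ms=270.0)
    for c in chosen:
        print(f"{c.name}: {c.impact_per_effort:.1f} ms/week ({horizon(c.effort_weeks)})")
    print(f"estimated saving: {saved:.0f} ms")
```

With these invented estimates, the greedy pass picks the smaller-LLM change and the embedding cache (an estimated 280 ms against a 270 ms overrun). When effort budgets are tight, a knapsack or integer-programming formulation is a more rigorous alternative to the greedy ratio ordering.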
Details
Category: Analysis
Use Cases: Latency analysis, Performance optimization, Bottleneck detection
Works Best With: claude-sonnet-4-20250514, gpt-4o