Analyze this end-to-end latency breakdown and identify optimization opportunities.

## Request Trace
{{trace_data}}

## Latency Breakdown
- Embedding generation: {{embed_latency}}ms
- Vector search: {{search_latency}}ms
- Context assembly: {{context_latency}}ms
- LLM inference: {{llm_latency}}ms
- Post-processing: {{post_latency}}ms

## Target Latency
{{target_latency}}ms

For each component:
1. **Current vs Optimal**: What is the theoretical minimum?
2. **Bottleneck Analysis**: What causes the current latency?
3. **Quick Wins**: Optimizations achievable in <1 week
4. **Medium-term**: Optimizations requiring 2-4 weeks
5. **Strategic**: Architectural changes for long-term gains

Prioritize by impact/effort ratio.

Latency Breakdown Analyzer
Systematically analyze end-to-end latency in ML pipelines to identify bottlenecks and prioritize optimization efforts by impact and implementation effort.
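As a rough illustration of what "prioritize by impact and implementation effort" can mean in practice, the sketch below computes each component's share of total latency, the gap to the target, and a ranking of candidate optimizations by estimated milliseconds saved per day of effort. All numbers, the `Optimization` class, and the candidate names are hypothetical and chosen only for the example; they are not measurements or part of the prompt itself.

```python
from dataclasses import dataclass


@dataclass
class Optimization:
    """A hypothetical optimization candidate (illustrative only)."""
    name: str
    saved_ms: float     # estimated latency reduction, i.e. the impact
    effort_days: float  # estimated implementation effort


def latency_shares(breakdown):
    """Return each component's fraction of the total latency."""
    total = sum(breakdown.values())
    return {name: ms / total for name, ms in breakdown.items()}


def rank_by_impact_effort(candidates):
    """Sort candidates by saved milliseconds per day of effort, best first."""
    return sorted(
        candidates,
        key=lambda c: c.saved_ms / c.effort_days,
        reverse=True,
    )


# Made-up values standing in for the template variables.
breakdown = {
    "embedding_generation": 40.0,
    "vector_search": 60.0,
    "context_assembly": 20.0,
    "llm_inference": 800.0,
    "post_processing": 80.0,
}
target_ms = 500.0

total_ms = sum(breakdown.values())
gap_ms = total_ms - target_ms  # how much latency must be removed
shares = latency_shares(breakdown)

# Assumed candidates with guessed impact and effort.
candidates = [
    Optimization("cache embeddings", saved_ms=30.0, effort_days=2.0),
    Optimization("stream LLM output", saved_ms=300.0, effort_days=5.0),
    Optimization("switch to a smaller model", saved_ms=400.0, effort_days=20.0),
]
ranked = rank_by_impact_effort(candidates)

if __name__ == "__main__":
    print(f"total={total_ms}ms target={target_ms}ms gap={gap_ms}ms")
    for name, share in shares.items():
        print(f"{name}: {share:.0%}")
    for c in ranked:
        print(f"{c.name}: {c.saved_ms / c.effort_days:.1f} ms/day")
```

A ratio like this is only a first-pass ordering. The prompt's quick-win, medium-term, and strategic tiers still matter, because a high-ratio change that takes months may rank below a smaller fix that ships this week.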
40 copies, 0 forks
Details
Category: Analysis
Use Cases: Latency analysis, Performance optimization, Bottleneck detection
Works Best With: claude-sonnet-4-20250514, gpt-4o