# Rate Limiter for LLM APIs

Build a production-grade distributed rate limiter for LLM APIs with token buckets, priority queuing, and burst handling.

## Prompt

Implement a production-grade rate limiter for LLM API calls.

## Rate Limits

{{rate_limits}}

## Traffic Pattern

{{traffic_pattern}}

## Priority Tiers

{{priority_tiers}}

```python
class LLMRateLimiter:
    """
    Implement:
    - Token bucket algorithm
    - Per-model rate limits
    - Priority-based queuing
    - Burst handling
    - Distributed coordination
    """

    async def acquire(self, model: str, tokens: int, priority: int) -> bool:
        pass

    async def wait_for_capacity(self, model: str, tokens: int) -> float:
        """Returns estimated wait time"""
        pass
```

Include:

- Redis-based distributed implementation
- Graceful degradation under pressure
- Metrics and alerting
- Client-side retry guidance
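For orientation, here is a minimal single-process sketch of the token-bucket core that the skeleton above asks for. The `MODEL_LIMITS` table, its model names, and its numbers are illustrative placeholders rather than values from the prompt, and this version deliberately leaves out the distributed coordination the prompt requires:

```python
import asyncio
import time

# Illustrative per-model limits: tokens refilled per second and bucket
# capacity. These names and numbers are placeholders, not prompt values.
MODEL_LIMITS = {
    "claude-sonnet-4-20250514": {"rate": 1000.0, "capacity": 4000.0},
    "gpt-4o": {"rate": 500.0, "capacity": 2000.0},
}


class TokenBucket:
    """Single-process token bucket: refill continuously, spend on acquire."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # burst ceiling
        self.tokens = capacity    # start full so initial bursts succeed
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    def _refill(self) -> None:
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    async def acquire(self, tokens: float) -> bool:
        """Spend `tokens` if available; never blocks."""
        async with self._lock:
            self._refill()
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False

    async def wait_time(self, tokens: float) -> float:
        """Estimated seconds until `tokens` will be available."""
        async with self._lock:
            self._refill()
            deficit = tokens - self.tokens
            return max(0.0, deficit / self.rate)


# Example wiring (illustrative): one bucket per model.
buckets = {m: TokenBucket(**cfg) for m, cfg in MODEL_LIMITS.items()}
```

`acquire` is non-blocking by design: a caller that receives `False` can sleep for `wait_time` and retry instead of spinning.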
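For the prompt's "Redis-based distributed implementation" bullet, one plausible shape is a token bucket whose refill-and-spend step runs inside a Lua script, so concurrent workers on different hosts cannot interleave reads and writes of bucket state. This sketch assumes redis-py's asyncio client; the key scheme, the one-hour TTL, and the use of caller clocks (which can skew across hosts) are sketch-level choices, not requirements from the prompt:

```python
import time

import redis.asyncio as redis

# Atomic check-and-spend. Running the refill math server-side keeps the
# read-modify-write sequence atomic across many client processes.
TOKEN_BUCKET_LUA = """
local data = redis.call('HMGET', KEYS[1], 'tokens', 'updated')
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])
local tokens = tonumber(data[1]) or capacity
local updated = tonumber(data[2]) or now
tokens = math.min(capacity, tokens + (now - updated) * rate)
local allowed = 0
if tokens >= requested then
    tokens = tokens - requested
    allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'updated', now)
redis.call('EXPIRE', KEYS[1], 3600)
return allowed
"""


class RedisTokenBucket:
    """Distributed token bucket; one Redis hash per (model) bucket."""

    def __init__(self, client: redis.Redis, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self._script = client.register_script(TOKEN_BUCKET_LUA)

    async def acquire(self, model: str, tokens: int) -> bool:
        # time.time() uses the caller's clock; a hardened version might
        # read the clock from Redis itself to avoid cross-host skew.
        allowed = await self._script(
            keys=[f"ratelimit:{model}"],
            args=[self.rate, self.capacity, time.time(), tokens],
        )
        return allowed == 1
```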
## Details

- Category: Coding
- Use Cases: Rate limiting, API management, Traffic control
- Works Best With: claude-sonnet-4-20250514, gpt-4o
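Finally, a sketch of what the prompt's "client-side retry guidance" bullet might look like in practice: exponential backoff with full jitter, seeded by the limiter's own wait estimate. `call_with_backoff`, its defaults, and the `limiter` parameter are hypothetical illustrations built on the skeleton's interface:

```python
import asyncio
import random

async def call_with_backoff(limiter, model: str, tokens: int,
                            priority: int, max_attempts: int = 5) -> bool:
    """Retry acquire() with full-jitter exponential backoff."""
    for attempt in range(max_attempts):
        if await limiter.acquire(model, tokens, priority):
            return True  # capacity granted; caller may issue the LLM request
        # Prefer the limiter's own estimate; fall back to exponential growth.
        hint = await limiter.wait_for_capacity(model, tokens)
        base = max(hint, 0.1 * (2 ** attempt))
        await asyncio.sleep(random.uniform(0.0, base))  # full jitter
    return False  # give up; surface a 429-style error to the caller
```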