Rate Limiting & Cost

Rate limiting and cost management for LLM APIs: token budgets, per-user quotas, and cost optimization strategies.

Last updated 2026-06-12

SD-18

Requests, tokens, and dollars are three separate currencies — any one of them can veto a call. Four algorithms, a four-layer enforcement stack, and the five levers that shrink what you need to limit.

A classic API gateway counts requests per second and calls it done. An LLM gateway cannot: one request can carry 50 tokens or 500,000, cost $0.0003 or $3, and the provider meters you on all three axes at once, per model. So rate limiting for LLM systems is really budget enforcement in three currencies, with a fallback plan for every “no.” This section builds the real thing: the four classic algorithms with running code, a four-layer stack that enforces per-tenant quotas, and the cost math that decides how much limiting you even need.
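To make the "three currencies" idea concrete, here is a minimal sketch of a budget check in which requests, tokens, or dollars can each veto a call. It is an illustration under stated assumptions, not one of the section's four algorithms: the `Budget` class and its method names are hypothetical, and the flat price of $6 per million tokens is assumed only because it reproduces the figures above (50 tokens for $0.0003, 500,000 tokens for $3).

```python
from dataclasses import dataclass
from typing import Optional

# Assumed flat price, chosen only because it reproduces the figures in the text.
# Real providers price input and output tokens separately and per model.
PRICE_PER_MILLION_TOKENS_USD = 6.0


def estimate_cost_usd(tokens: int) -> float:
    """Dollar cost of a call under the assumed flat per-token price."""
    return tokens * PRICE_PER_MILLION_TOKENS_USD / 1_000_000


@dataclass
class Budget:
    """Remaining allowance in each currency for one window (hypothetical)."""
    requests: int
    tokens: int
    dollars: float

    def check(self, tokens: int) -> Optional[str]:
        """Return the name of the currency that vetoes the call, or None."""
        if self.requests < 1:
            return "requests"
        if tokens > self.tokens:
            return "tokens"
        if estimate_cost_usd(tokens) > self.dollars:
            return "dollars"
        return None

    def spend(self, tokens: int) -> bool:
        """Deduct from all three currencies only if none of them vetoes."""
        if self.check(tokens) is not None:
            return False
        self.requests -= 1
        self.tokens -= tokens
        self.dollars -= estimate_cost_usd(tokens)
        return True


if __name__ == "__main__":
    budget = Budget(requests=10, tokens=1_000_000, dollars=1.0)
    print(budget.check(50))       # small call: no currency objects
    print(budget.check(500_000))  # fits the token budget, but the dollar budget vetoes
```

The large call in the usage lines fits comfortably inside the request and token allowances yet is still refused, because its estimated cost exceeds the remaining dollars. A request counter alone would have let it through.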
