Streaming with Claude
Streaming delivers tokens as they are generated, which matters most in user-facing applications. Without streaming, users stare at a blank screen until the whole response has been generated, typically 3–15 seconds. With streaming, the first token usually arrives within 100–300 ms.
Why Streaming Matters
| Metric | No Streaming | With Streaming |
|---|---|---|
| Time to first byte (TTFB) | 3-15s (full generation) | 100-300ms |
| Perceived latency | Same as actual latency | Near-instant |
| User abandonment | 40%+ after 5s | <5% |
| Memory on client | Buffer entire response | Process token by token |
How SSE (Server-Sent Events) Works
# SSE protocol: one-way server → client stream over HTTP
# 1. Client opens a persistent HTTP connection
# 2. Server pushes events as they happen
# 3. Client processes each event as it arrives
#
# The Claude API sends these SSE event types:
# message_start       → message metadata (id, model, role) with empty content
# content_block_start → a new content block begins (text or tool_use)
# content_block_delta → incremental update to the current block (e.g. a text_delta)
# content_block_stop  → the current block is complete
# message_delta       → top-level changes: stop_reason and cumulative usage
# message_stop        → the message is complete
# ping                → keep-alive; may appear anywhere in the stream
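To make the event sequence above concrete, here is a minimal sketch of parsing raw SSE lines and assembling the text from `content_block_delta` events. The helper names `iter_sse_events` and `collect_text` are hypothetical, and the `SAMPLE` payloads are hand-written and trimmed for illustration rather than captured from a real response. In practice the official SDK does this parsing for you.

```python
import json
from typing import Dict, Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, Dict]]:
    """Yield (event_name, payload) for each event in a sequence of SSE lines."""
    event_name = "message"  # SSE default when no "event:" field is present
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # SSE comment line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


def collect_text(events: Iterable[Tuple[str, Dict]]) -> Tuple[str, Optional[str]]:
    """Join text deltas and capture stop_reason; other events are ignored."""
    parts = []
    stop_reason = None
    for name, payload in events:
        if name == "content_block_delta":
            delta = payload["delta"]
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
        elif name == "message_delta":
            stop_reason = payload["delta"].get("stop_reason")
        elif name == "message_stop":
            break
    return "".join(parts), stop_reason


# Hand-written sample stream (illustrative, trimmed payloads).
SAMPLE = [
    'event: message_start',
    'data: {"type": "message_start", "message": {"role": "assistant", "content": []}}',
    '',
    'event: content_block_start',
    'data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}',
    '',
    'event: ping',
    'data: {"type": "ping"}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}',
    '',
    'event: content_block_stop',
    'data: {"type": "content_block_stop", "index": 0}',
    '',
    'event: message_delta',
    'data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 3}}',
    '',
    'event: message_stop',
    'data: {"type": "message_stop"}',
    '',
]

text, stop_reason = collect_text(iter_sse_events(SAMPLE))
print(text, stop_reason)
```

Unknown event names fall through untouched, so a client written this way keeps working if the stream carries event types it does not handle.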
Basic Text Streaming
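import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain agents"}],
) as stream:
    # text_stream yields only the text deltas, in order.
    for text in stream.text_stream:
        print(text, end="", flush=True)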
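One way to check the latency figures quoted earlier against your own workload is to time the stream while consuming it. The sketch below is an assumption-laden helper, not part of the SDK: `consume_stream` is a hypothetical name, and it accepts any iterable of text chunks, so it can be exercised with a fake stream before pointing it at `stream.text_stream`.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def consume_stream(
    chunks: Iterable[str],
    on_chunk: Optional[Callable[[str], None]] = None,
    clock: Callable[[], float] = time.perf_counter,
    started_at: Optional[float] = None,
) -> Dict[str, object]:
    """Consume text chunks, recording time to first chunk and total time.

    Pass started_at (a clock() reading taken before the request was sent)
    to include connection setup in the measurement; otherwise timing
    starts when this function is called.
    """
    start = clock() if started_at is None else started_at
    first_chunk_at = None
    parts = []
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = clock()
        if on_chunk is not None:
            on_chunk(chunk)  # e.g. render the chunk to the user right away
        parts.append(chunk)
    end = clock()
    return {
        "text": "".join(parts),
        "time_to_first_chunk": (
            None if first_chunk_at is None else first_chunk_at - start
        ),
        "total_time": end - start,
    }


# Offline demonstration with a fake stream of chunks.
result = consume_stream(
    iter(["Agents ", "are ", "loops."]),
    on_chunk=lambda c: print(c, end="", flush=True),
)
print()
print(result["text"])
```

With the SDK, the same helper could be called as `consume_stream(stream.text_stream, on_chunk=...)` inside the `with` block. Take a `time.perf_counter()` reading before opening the stream and pass it as `started_at` if the measurement should cover the request itself.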
Streaming with Tool Use (Agent Loop)