Streaming with Claude

Stream Claude API responses for real-time UX: server-sent events, token-by-token rendering, and production streaming patterns.

Last updated 2026-06-11

Stream tokens as they arrive. This is critical for user-facing applications: without streaming, users stare at a blank screen for 3–15 seconds while the full response is generated. With streaming, they see the first token in under 200 ms.

Figure: Streaming vs. blocking. Time-to-first-token is the UX metric that matters.
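To make token-by-token rendering over server-sent events concrete, here is a minimal sketch in Python. It assumes the stream arrives as SSE lines whose `data:` payloads are JSON objects shaped like the `content_block_delta` / `text_delta` and `message_stop` events of Anthropic's Messages streaming format. The helper names (`iter_text_deltas`, `render`) and the sample payload are hypothetical, made up for illustration and not captured from a real response. In a real application the lines would come from an HTTP response body or an SDK helper rather than a hard-coded list.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from a stream of SSE lines as they arrive.

    Assumption: each `data:` line holds one JSON event, and text arrives in
    events of type `content_block_delta` carrying a `text_delta`.
    """
    for raw in lines:
        line = raw.strip()
        # Only `data:` lines carry payloads; skip `event:` lines and the
        # blank lines that separate events.
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        event_type = payload.get("type")
        if event_type == "content_block_delta":
            delta = payload.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta["text"]
        elif event_type == "message_stop":
            # End of the message: stop consuming the stream.
            return


def render(lines: Iterable[str]) -> str:
    """Print each fragment immediately and return the full text."""
    parts = []
    for fragment in iter_text_deltas(lines):
        # flush=True pushes each fragment to the terminal right away,
        # which is what makes the first token visible early.
        print(fragment, end="", flush=True)
        parts.append(fragment)
    print()
    return "".join(parts)


# Hypothetical sample stream, for illustration only.
SAMPLE_STREAM = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "Hello"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": ", world"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
    "",
]

if __name__ == "__main__":
    render(SAMPLE_STREAM)
```

The generator yields each fragment as soon as its line is parsed, so the caller can render it before the rest of the response exists. A blocking client would instead wait for the final event and display everything at once.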
