Explore AI Engineering Topics

The complete lifecycle of large language models, from pre-training through fine-tuning, RLHF, and deployment, with architecture diagrams and production considerations.

Fine-Tuning Framework

When and how to fine-tune LLMs: a decision framework for LoRA, QLoRA, and full fine-tuning, with cost analysis and real-world examples.

How LLMs Call Tools

How LLMs perform function calling (tool use): the mechanics behind tool-calling agents, from prompt engineering to structured output.

Anatomy of a Tool Call

Step-by-step breakdown of an LLM tool call: request, schema validation, execution, and result handling with code examples.

Stateless vs Stateful

Stateless vs stateful LLM architectures: trade-offs for agent design, conversation management, and production deployment.

The Agent Loop

Build a complete tool-calling AI agent in 15 lines of Python and learn the core agent loop pattern that underlies most LLM agents.

Agentic Spectrum

The spectrum of AI agent architectures from simple prompt-response to fully autonomous multi-agent systems, with trade-offs at each level.

ReAct Pattern

ReAct (Reasoning + Acting) pattern for AI agents: how to interleave chain-of-thought reasoning with tool calls so each action is informed by the reasoning before it.

Self-Reflection

Reflexion pattern for self-improving AI agents: how agents evaluate their own outputs and iteratively refine responses.

Hierarchical Delegation

Hierarchical delegation pattern for multi-agent systems: orchestrating specialized agents with a coordinator for complex tasks.

Planner-Executor

Planner-Executor agent pattern: separating planning and execution phases for more reliable and debuggable AI agent workflows.

Memory & State

Memory and state management for AI agents: short-term, long-term, and episodic memory patterns with vector store integration.

Messages API

Claude Messages API deep dive: request/response format, system prompts, multi-turn conversations, and best practices.

Tool Use with Claude

Implement tool use with Claude API: define tools, handle tool calls, and build reliable function-calling agents.

Streaming with Claude

Stream Claude API responses for real-time UX: server-sent events, token-by-token rendering, and production streaming patterns.

Structured Output

Get structured JSON output from Claude: constrained generation, schema validation, and reliable data extraction patterns.

Agent SDK Patterns

Production patterns for building AI agents with the Claude Agent SDK: guardrails, handoffs, tool orchestration, and error handling.

Metrics Framework

Observability metrics for LLM applications: latency, token usage, cost tracking, and quality scoring dashboards.

Eval & Observability

Complete guide to LLM evaluation and observability: automated evals, human feedback loops, A/B testing, and monitoring.

Agentic Evaluation

Evaluate AI agents in production: task completion metrics, trajectory analysis, and automated agent quality benchmarks.

RAG & MCP

20 topics

RAG Architecture

Retrieval-Augmented Generation (RAG) architecture explained: ingestion pipeline, vector search, prompt augmentation, and production patterns.

Document Processing

Document processing for RAG pipelines: PDF parsing, OCR, table extraction, and multi-modal document understanding.

Chunking Strategies

Text chunking strategies for RAG: fixed-size, semantic, recursive, and document-aware chunking with performance comparisons.

Embedding & Indexing

Embedding models and vector indexing for RAG: choosing embeddings, HNSW vs IVF, dimensionality, and index optimization.

Embeddings

Understanding embeddings for AI applications: text, image, and multi-modal embeddings with similarity search and clustering.

Metadata Strategies

Metadata strategies for RAG: filtering, hybrid search, metadata extraction, and structured metadata for improved retrieval.

Retrieval & Reranking

Advanced retrieval and reranking for RAG: BM25, dense retrieval, cross-encoder reranking, and hybrid search strategies.

RAG Evaluation

Evaluate RAG system quality: retrieval precision/recall, answer faithfulness, and end-to-end pipeline benchmarking.

RAGAS Framework

RAGAS evaluation framework for RAG: faithfulness, answer relevancy, context precision, and automated quality scoring.

RAG Monitoring

Production monitoring for RAG systems: retrieval quality dashboards, drift detection, and automated alerting.

Advanced RAG Patterns

Advanced RAG techniques: query decomposition, self-RAG, corrective RAG, adaptive retrieval, and multi-hop reasoning.

RAG Best Practices

Production RAG best practices: pipeline optimization, failure handling, testing strategies, and common pitfalls to avoid.

MCP Overview

Model Context Protocol (MCP) explained: the open standard for connecting AI models to tools, data sources, and external systems.

MCP Architecture

MCP architecture deep dive: client-server model, protocol layers, message types, and connection lifecycle.

Building MCP Servers

Build MCP servers step-by-step: Python and TypeScript implementations with tools, resources, and prompts.

MCP Transport

MCP transports: stdio, SSE, and Streamable HTTP, with implementation details and trade-offs.

MCP Discovery

MCP tool discovery and capability negotiation: how clients discover server capabilities and tools dynamically.

MCP Security

Security considerations for MCP: authentication, authorization, input validation, and sandboxing strategies.

MCP in Production

Deploy MCP servers in production: scaling, monitoring, error handling, and reliability patterns.

MCP on GCP

Run MCP on Google Cloud Platform: Cloud Run deployment, IAM integration, and GCP-native tool implementations.

System Design

21 topics

GCP Reference Architecture

GCP reference architecture for AI applications: Vertex AI, Cloud Run, Pub/Sub, and BigQuery integration patterns.

5-Phase Framework

Five-phase system design framework for AI interviews: requirements, architecture, data flow, scaling, and production readiness.

10-Layer Architecture

Staff-level 10-layer architecture for AI-native systems: from infrastructure to user experience, with production examples.

Scaling 10K to 1M

Scale AI systems from 10K to 1M users: caching, sharding, async processing, and infrastructure evolution strategies.

Reliability & Scale

Reliability and production patterns for AI systems: circuit breakers, graceful degradation, and SRE practices.

Security Overview

Security and privacy for AI applications: threat models, data protection, compliance frameworks, and defense-in-depth.

Guardrails & Safety

AI guardrails and safety: content filtering, output validation, safety classifiers, and responsible AI deployment.

PII Detection

PII detection and redaction in LLM applications: entity recognition, masking strategies, and compliance automation.

Prompt Injection Defense

Defend against prompt injection attacks: detection techniques, input sanitization, and multi-layer defense strategies.

Multi-Tenant Isolation

Multi-tenant isolation for AI platforms: data separation, model isolation, rate limiting, and tenant-aware architectures.

Audit & Compliance

Audit logging and compliance for AI systems: SOC 2, HIPAA, and GDPR requirements, plus automated compliance monitoring.

Semantic Caching

Semantic caching for LLM applications: reduce costs and latency by caching semantically similar queries with vector similarity.

Model Routing

Tiered model routing: send each query to the right model (e.g., GPT-4, Claude Sonnet, or Claude Haiku) based on its complexity, cost, and latency requirements.

Rate Limiting & Cost

Rate limiting and cost management for LLM APIs: token budgets, per-user quotas, and cost optimization strategies.

Inference Optimization

LLM inference optimization: batching, quantization, KV-cache, speculative decoding, and hardware selection.

Hallucination Detection

Detect and reduce LLM hallucinations: factuality checking, grounding verification, and off-brand content filtering.

Agent Failure Modes

Common AI agent failure modes: infinite loops, tool misuse, context window overflow, and recovery strategies.

Deployment & Rollout

Deploy and roll out AI systems: canary releases, feature flags, A/B testing, and safe rollback strategies.

Event-Driven Async

Event-driven async architectures for AI: message queues, webhook patterns, and asynchronous agent orchestration.

Data Flywheel

Build a data flywheel for AI products: feedback loops, continuous learning, and data-driven model improvement cycles.

Claude Agent SDK

Claude Agent SDK patterns: building production multi-agent systems with guardrails, handoffs, and tool orchestration.

Interview Prep

7 topics

Cost vs Latency

LLM inference cost vs latency trade-offs: optimization strategies for production AI systems with budget constraints.

Code Leakage Prevention

Prevent code and data leakage in LLM applications: sandboxing, output filtering, and secure coding practices.

Format & Rubric

AI engineering interview format and evaluation rubrics: what interviewers look for and how to structure your responses.

6-Step Process

Six-step process for acing AI engineering interviews: from clarification to trade-off analysis with real examples.

Discovery Questions

Discovery questions framework for AI interviews: how to ask the right clarifying questions before designing a system.

Communication

Communication structure for technical interviews: how to articulate your thinking clearly and concisely under pressure.

Pitfalls & Recovery

Common interview pitfalls and recovery strategies: how to handle mistakes, blanking, and difficult follow-up questions.

Python

5 topics

Python Idioms

Pythonic idioms and patterns every AI engineer should know: comprehensions, generators, context managers, and clean code.

Data Structures

Python data structures for coding interviews: lists, dicts, sets, heaps, deques, and their time complexity trade-offs.

Async Python (Quick)

Quick guide to async Python: asyncio basics, await patterns, and concurrent task execution for AI applications.

Async Python (Complete)

Complete async Python guide: event loops, coroutines, task groups, semaphores, and production async patterns.

Python Collections

Python collections module deep dive: Counter, defaultdict, OrderedDict, namedtuple, and deque with interview examples.