Infra · AI System Design · Staff

Synthetic Data Generation & Curation

Design a pipeline that generates synthetic training and eval data at scale. The output must be diverse, high-quality, and uncontaminated.

Key Requirements

  • Diversity engineering: taxonomy coverage plus persona and parameter variation
  • Quality filtering: verifiers where outputs are checkable, a calibrated judge elsewhere
  • Contamination checks against eval/benchmark sets (target overlap ≈ 0)
  • Deduplication and model-collapse awareness (mix in real data)
  • Provenance and versioning, with downstream improvement as the true metric
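
As one way to make the contamination and dedup requirements concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference design: the function names (`normalize`, `is_contaminated`, `dedup_exact`, `filter_samples`) are hypothetical, the 8-token window is an arbitrary choice, and exact n-gram overlap stands in for whatever contamination check a real pipeline would use. It drops any generated sample that shares a word n-gram with an eval set, then removes exact duplicates after normalization.

```python
from __future__ import annotations

import hashlib
import re
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase and keep only alphanumeric tokens, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams of the normalized text."""
    tokens = normalize(text).split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_eval_index(eval_texts: Iterable[str], n: int) -> set[tuple[str, ...]]:
    """Collect every n-gram that appears in any eval/benchmark item."""
    index: set[tuple[str, ...]] = set()
    for text in eval_texts:
        index |= ngrams(text, n)
    return index


def is_contaminated(sample: str, eval_index: set[tuple[str, ...]], n: int) -> bool:
    """Flag a sample that shares at least one n-gram with the eval index.

    Assumption: a single shared n-gram is enough to reject. Samples shorter
    than n tokens yield no n-grams, so this check can never flag them.
    """
    return not ngrams(sample, n).isdisjoint(eval_index)


def dedup_exact(samples: Iterable[str]) -> list[str]:
    """Keep the first occurrence of each sample, comparing normalized text."""
    seen: set[str] = set()
    kept: list[str] = []
    for sample in samples:
        key = hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(sample)
    return kept


def filter_samples(samples: Iterable[str], eval_texts: Iterable[str], n: int = 8) -> list[str]:
    """Drop contaminated samples first, then exact duplicates."""
    index = build_eval_index(eval_texts, n)
    clean = [s for s in samples if not is_contaminated(s, index, n)]
    return dedup_exact(clean)
```

This sketch catches only verbatim or lightly reformatted overlap. Paraphrased benchmark items and near-duplicates would need fuzzier matching (for example MinHash or embedding similarity), and short samples need a separate check because they produce no n-grams at this window size.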
