April 2026 Special Course - Optional Lecture 1: Datasets, Captions, and Data Filtering

AI Course

Current Price: $3.00

Digital products are non-refundable after purchase.

Why it matters:
Model quality depends on more than architecture: data quality, coverage, and filtering often matter just as much.

Topics:

  • Where image-text and video-text data come from

  • Caption generation

  • Deduplication

  • NSFW and copyright filtering

  • Why video data is harder than image data

  • Why hands, text rendering, and rare actions often fail
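As a rough illustration of the deduplication and caption-filtering topics above, here is a minimal sketch in Python. It is not the pipeline used by any of the datasets listed in this lecture. The function names, the `min_words` threshold, and the example URLs are all hypothetical, and real pipelines usually deduplicate on image or video content (for example perceptual hashes or embeddings) rather than on URL and caption text alone.

```python
import hashlib
import re


def normalize_caption(caption: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", caption.lower())
    return " ".join(text.split())


def dedup_and_filter(pairs, min_words=3):
    """Keep the first (url, caption) pair per key and drop very short captions.

    Assumption for this sketch: two pairs are duplicates when they share
    the same URL and the same normalized caption.
    """
    seen = set()
    kept = []
    for url, caption in pairs:
        norm = normalize_caption(caption)
        # Captions such as bare filenames carry little training signal.
        if len(norm.split()) < min_words:
            continue
        key = hashlib.sha1(f"{url}\n{norm}".encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append((url, caption))
    return kept


# Hypothetical example data.
pairs = [
    ("http://example.com/a.jpg", "A dog running on the beach."),
    ("http://example.com/a.jpg", "a dog running on the beach"),
    ("http://example.com/b.jpg", "IMG_0042"),
    ("http://example.com/c.jpg", "Two people shaking hands in an office"),
]
kept = dedup_and_filter(pairs)
for url, caption in kept:
    print(url, "|", caption)
```

In this example the second pair is dropped as a duplicate of the first, and the filename-style caption is dropped for being too short. Exact-match keys like this miss near-duplicates (resized or re-encoded copies), which is one reason content-based methods are typically preferred at scale.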

Suggested materials:

  • LAION

  • DataComp

  • WebVid-10M

  • HD-VILA

  • Data sections of the Stable Video Diffusion (SVD), Imagen Video, and HunyuanVideo reports