CHAPTER 16 · LLM Training Data Pipeline

This chapter contains the following subtopics:

01 · Data Format Standards
02 · JSONL Generation
03 · Quality Checks
04 · Full Dataset Assembly
05 · Contamination and Dedup Audits
06 · Documentation and Governance
07 · Data Mixture Optimization