Training Data

Also known as: Training Dataset, Training Corpus, Training Set

The curated dataset used to train a machine learning model. Its quality, diversity, size, and representativeness largely determine the model's capabilities and limitations.
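As a rough illustration of how size and representativeness might be checked in practice, the sketch below summarizes a small labeled dataset. The `(text, label)` pair format, the function name `summarize_training_data`, and the sample rows are all assumptions made for this example; they are not drawn from any particular library or from the sources listed on this page.

```python
from collections import Counter


def summarize_training_data(examples):
    """Return simple size, duplication, and label-balance statistics.

    `examples` is a list of (text, label) pairs (an assumed format).
    """
    total = len(examples)
    # Exact-match duplicates only; near-duplicates are not detected here.
    unique_texts = {text for text, _ in examples}
    label_counts = Counter(label for _, label in examples)
    return {
        "size": total,
        "duplicate_fraction": 1 - len(unique_texts) / total if total else 0.0,
        "label_share": {
            label: count / total for label, count in label_counts.items()
        },
    }


# Hypothetical sample data, for illustration only.
examples = [
    ("great product", "positive"),
    ("terrible service", "negative"),
    ("great product", "positive"),
    ("works fine", "positive"),
]

summary = summarize_training_data(examples)
print(summary)
```

A skewed `label_share` or a high `duplicate_fraction` is one possible early signal that a dataset under-represents some cases, though real curation typically involves far more than these counts.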

Related terms: Dataset, training set, labeled data, data collection, data preprocessing, data augmentation, data quality, data curation, synthetic data, benchmark dataset, data bias, data annotation, data pipeline, corpus, data governance
