
Machine Learning Cross-Validation


A typical machine learning use case moves through several pipeline stages:

  • Data collection,
  • Feature engineering,
  • Feature selection,
  • Model creation, and
  • Model deployment.

Before model creation, the first step is usually a train-test split:

  • Suppose the dataset has 1000 records.
  • We split it so that, for example,
    • the training set gets 70% of the data and
    • the test set gets 30%,
  • or 80% train and 20% test,
  • depending on the size of the dataset.

The model is trained only on the 70% portion, and the remaining 30% is used to check its accuracy.

When we do a train-test split, both the 70% training portion and the 30% test portion are selected at random.
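
As a minimal sketch of this step, here is a 70/30 split using scikit-learn's `train_test_split`. The feature matrix `X` and label vector `y` below are hypothetical placeholders standing in for the 1000-record dataset mentioned above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)          # 1000 records, 5 hypothetical features
y = np.random.randint(0, 2, 1000)    # binary labels, for illustration only

# test_size=0.3 gives the 70% / 30% split; rows are chosen at random.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)   # (700, 5) (300, 5)
```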

Because of this random selection, the kinds of records that end up in the test set may not be represented in the training set.

As a result, the model's measured accuracy can drop.

To address this problem, we use a technique called cross-validation.
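
Here is a minimal sketch of K-Fold cross-validation with scikit-learn, again using a hypothetical dataset and a logistic regression model chosen purely for illustration. With `cv=5`, every record is used for both training and evaluation, so the score no longer depends on a single random split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.random.rand(1000, 5)          # same hypothetical 1000-record dataset
y = np.random.randint(0, 2, 1000)

model = LogisticRegression(max_iter=1000)

# cv=5 splits the data into 5 folds; each fold acts as the test set once,
# so every record is used for both training and evaluation.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores)          # per-fold accuracy
print(scores.mean())   # average accuracy across the folds
```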

To learn more about cross-validation, see: https://dslearningthoughts.blogspot.com/2021/10/machine-learning-cross-validation.html