
Bhusan Chettri describes overfitting & underfitting, the two prime factors that need attention during machine learning

Last updated Monday, October 17, 2022 19:21 ET, Source: Bhusan Chettri

Bhusan Chettri, a PhD graduate in Machine Learning for Voice Technology from QMUL, UK, describes overfitting & underfitting, two prime factors that need attention during machine learning.

Temi, India, 10/17/2022 / SubmitMyPR /

“Machine learning (ML) is an art of teaching computers to solve problems by guiding them to map a given input into an appropriate output by exploiting patterns and relationships within training data,” says Bhusan Chettri, a PhD graduate in Machine Learning for Voice Technology from QMUL, UK. In simple words, a mapping function is learned that maps an input to an appropriate output, guided by a loss function (often called the objective function) that measures how well the algorithm is doing its intended job. In this article, Bhusan Chettri summarises the two pitfalls one may fall into during machine learning: overfitting and underfitting. These are the two key issues that any ML practitioner (or beginner) must account for to ensure that a trained ML model behaves as expected when deployed in real-world applications. Taking deep neural networks as an example, the article briefly summarises the two concepts and outlines key steps to overcome them.
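
To make the idea concrete, here is a minimal sketch in PyTorch. The article itself contains no code, so the tiny network, the random toy data and names such as TinyNet are illustrative assumptions: a small parametric mapping from input to output, scored by a cross-entropy loss playing the role of the objective function.

```python
# Minimal sketch: a learnable mapping f(x) -> y scored by a loss function.
# The network, data and names here are illustrative, not from the article.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """A small mapping function with learnable parameters."""
    def __init__(self, n_inputs=10, n_hidden=16, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

model = TinyNet()
loss_fn = nn.CrossEntropyLoss()      # the objective function

x = torch.randn(8, 10)               # a batch of 8 toy inputs
y = torch.randint(0, 2, (8,))        # their (random) target labels
logits = model(x)                    # the learned mapping in action
loss = loss_fn(logits, y)            # how well the mapping is doing its job
print(f"loss on this batch: {loss.item():.3f}")
```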

ML models are data-driven, which means they are data hungry. A massive amount of data is therefore often required to train and test them before deploying them in a production pipeline. ML engineers and researchers often believe that throwing as much data as possible into an ML pipeline yields better results. While this is true to some degree, data quality is also key in machine learning, to ensure that the trained model is neither biased nor exploiting irrelevant cues in the data (cues that may have crept in accidentally, for example during the data collection process). Usually, having identified an ML problem (for any business or research purpose), the next crucial step is acquiring data. One may purchase data, download it from the web if freely available, or set up a data collection pipeline if suitable data is unavailable for purchase or download. Having collected the dataset, one important task the ML practitioner needs to do is partition the data into three disjoint subsets: training, development and test sets. The training set is used to learn the model parameters – in simple terms, the mapping function that maps an input to an appropriate output is learned from the training data. The development set is used for model selection. In other words, during training, performance is often measured on both the training and development sets, and based on how the model performs on the development set, ML practitioners judge when to stop training and which model setting to choose for final use. This training (a.k.a. learning) process is iterative: due to computational constraints, the training samples are shown to the learning algorithm in steps.
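
The three-way split can be illustrated with a short sketch. The synthetic tensors and the 80/10/10 split ratio below are assumptions chosen for illustration; only the idea of three disjoint subsets comes from the article.

```python
# Sketch of partitioning a dataset into disjoint train / development / test
# subsets. Data and split ratios are illustrative assumptions.
import torch
from torch.utils.data import TensorDataset, random_split

features = torch.randn(1000, 10)           # 1000 toy examples, 10 features each
labels = torch.randint(0, 2, (1000,))      # toy binary labels
dataset = TensorDataset(features, labels)

n_total = len(dataset)
n_train = int(0.8 * n_total)
n_dev = int(0.1 * n_total)
n_test = n_total - n_train - n_dev

# random_split guarantees the three subsets are disjoint
train_set, dev_set, test_set = random_split(
    dataset, [n_train, n_dev, n_test],
    generator=torch.Generator().manual_seed(42),  # reproducible split
)
print(len(train_set), len(dev_set), len(test_set))  # 800 100 100
```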

Bhusan Chettri says, “Expecting a model to learn relevant patterns and solve a problem in a single step (by showing it the whole training data in one go) is not right, and is not the best practice to follow.” Usually, deep neural networks, a kind of machine learning model, are trained iteratively by showing them small groups of data samples called mini-batches. This iterative learning process is often referred to as stochastic gradient descent. Here, the term stochastic implies that each group of data samples is picked at random, and gradient descent simply means a method that learns a suitable mapping function by optimising the objective function. As the algorithm runs over many iterations in search of an optimal setting, performance on the seen training examples improves with each new iteration. Initially, performance on both the training and development data is poor. After a certain number of iterations, performance on both sets starts to improve. Eventually, after say a few hundred steps/iterations, development set performance stalls while training set performance might reach 100% accuracy. This means the model has learned to do a good job on the training data – the data it has seen during training. However, it performs poorly on the development set – a small, disjoint fragment of data. The trained model is finally evaluated on a test set, a larger disjoint dataset kept aside to measure model performance.
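
The training procedure described above can be sketched roughly as follows. The model architecture, the synthetic data and all hyper-parameters (batch size, learning rate, number of epochs) are illustrative assumptions; the point is the mini-batch loop and the side-by-side monitoring of training and development accuracy used to decide when to stop and which model to keep.

```python
# Sketch of mini-batch stochastic gradient descent with train/dev monitoring.
# Model, data and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader, random_split

torch.manual_seed(0)
data = TensorDataset(torch.randn(600, 10), torch.randint(0, 2, (600,)))
train_set, dev_set = random_split(data, [500, 100])

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def accuracy(split):
    """Fraction of correct predictions on a dataset split."""
    loader = DataLoader(split, batch_size=64)
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

# Mini-batches are drawn in random order: the "stochastic" part of SGD.
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
for epoch in range(20):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    # Watching the gap between these two numbers is how one judges
    # when to stop training and which model setting to keep.
    print(f"epoch {epoch:2d}  train acc {accuracy(train_set):.2f}  "
          f"dev acc {accuracy(dev_set):.2f}")
```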

Usually, the test set is designed to reflect the real-world deployment scenario, and its distribution can therefore differ substantially from what was seen in the training and development sets. So what would one expect here? Does the model perform well on the unseen test set? Would it show performance similar to what it achieved on the training set? The answer is no. This model would perform poorly on the test set, because it has been overtrained and has learned noise in the training data. In simple words, the model has memorised almost everything, including irrelevant patterns in the training data, and hence it already failed to perform well on the development data; it is therefore no surprise that it also performs poorly on the test set. The performance gap between the training and development sets is what ML practitioners are usually concerned about. This phenomenon is called overfitting – a model showing good performance on seen data but failing to perform well on unseen data. Such a model has poor generalisation capability. The fundamental goal, therefore, is to train ML models that generalise well: that is, to minimise the gap between performance on the training and test datasets. Likewise, it often happens that a model fails to perform well even on the training dataset after many training iterations. This suggests that the model is unable to learn the underlying patterns from the training data – it is not powerful or flexible enough to discover them. This phenomenon is called underfitting, which also needs to be taken into account while building ML models. A good ML model is therefore one that neither overfits nor underfits the training data.
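
The train/development gap described here can be turned into a rough rule-of-thumb diagnostic. The thresholds in the sketch below (a 0.10 accuracy gap and a 0.70 training-accuracy floor) are arbitrary illustrative values, not figures from the article.

```python
# Rough diagnostic: compare training and development accuracy to flag
# likely underfitting or overfitting. Thresholds are illustrative only.
def diagnose(train_acc: float, dev_acc: float,
             gap_threshold: float = 0.10, low_threshold: float = 0.70) -> str:
    if train_acc < low_threshold:
        # Poor even on seen data: the model is not flexible enough
        # to capture the underlying patterns.
        return "underfitting"
    if train_acc - dev_acc > gap_threshold:
        # Strong on seen data, weak on unseen data: poor generalisation.
        return "overfitting"
    return "reasonable fit"

print(diagnose(train_acc=0.99, dev_acc=0.78))  # -> overfitting
print(diagnose(train_acc=0.55, dev_acc=0.53))  # -> underfitting
print(diagnose(train_acc=0.90, dev_acc=0.87))  # -> reasonable fit
```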

A simple and often effective way to deal with underfitting is to increase the model complexity. As an example Bhusan gives, increasing the number of hidden layers or adding more units to a deep neural network often solves the problem. Dealing with overfitting, on the contrary, is not as straightforward. Different approaches to combat overfitting include: (1) increasing the amount of training data; (2) reducing the model complexity, for example (as Bhusan notes) reducing the size of the neural network; (3) adding regularisation during model training, for example L1 and L2 weight decay (reducing the magnitude of the model's weight parameters); and (4) adding dropout: randomly removing some of the units in a neural network during training, which discourages the model from memorising patterns in the training data and encourages it to learn generalisable patterns instead.
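
Two of the listed remedies, L2 weight decay and dropout, map directly onto standard PyTorch building blocks, as the short sketch below shows. The layer sizes, dropout rate and weight-decay value are illustrative assumptions.

```python
# Sketch of two overfitting remedies in PyTorch: dropout inside the network
# and L2 weight decay on the optimiser. All values are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly drops units during training
    nn.Linear(32, 2),
)

# weight_decay adds an L2 penalty that shrinks the weight magnitudes
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

model.train()  # dropout is active while learning
model.eval()   # and switched off at evaluation/test time
```

Switching between train() and eval() modes matters here: dropout is only meant to disturb the network while it learns, not when it is being evaluated on the development or test sets.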

Original source of the story >> Bhusan Chettri describes overfitting & underfitting, the two prime factors that need attention during machine learning