Bias vs Variance

2022, June, 01

Explainer, Data Science

Bias and variance are two measures of how well a machine learning model is performing.

A model is trained on a dataset of inputs (x_train) and outputs (y_train). This training data is "seen" by the model. After training, the model can make predictions (y_prediction).
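To make the train-then-predict flow concrete, here is a minimal sketch using only the Python standard library. The toy data values and the helper names fit_line and predict are assumptions made for illustration; a straight-line model stands in for whatever model is actually being trained.

```python
# Toy training data: y is roughly 2*x + 1 (values invented for illustration).
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [3.1, 4.9, 7.2, 9.0, 10.8]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, xs):
    """Apply the fitted line to a list of inputs (hypothetical helper)."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# "Training": the model sees x_train and y_train.
model = fit_line(x_train, y_train)

# "Prediction": the trained model produces y_prediction for given inputs.
y_prediction = predict(model, x_train)
```

With a library such as scikit-learn the same two steps are usually written as a fit call followed by a predict call; the hand-written version above only keeps the example dependency-free.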

Bias

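If the model is given x_train, the same inputs it saw during training, it generates predictions. We take the average of the absolute differences between the real values (y_train) and the predictions. This error between real and predicted values on the seen (training) data is what we call bias here. It is a practical proxy: formally, bias is the gap between a model's average prediction and the true value.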
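The calculation described above is the mean absolute error on the training data. A minimal sketch, assuming made-up values for y_train and y_prediction and a hypothetical helper named mean_absolute_error:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between real and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values: real training outputs and the model's predictions
# for the same training inputs (x_train).
y_train = [3.0, 5.0, 7.0, 9.0]
y_prediction = [2.5, 5.5, 7.0, 10.0]

# Absolute differences are 0.5, 0.5, 0.0 and 1.0, so the average is 0.5.
training_error = mean_absolute_error(y_train, y_prediction)
```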

When the bias is high, the model has not learned the patterns in the training data properly and is likely to make poor predictions. The model is underfitting the training data.

But if the bias is very low, the model may have memorized the training data and may fail to make good predictions for unseen inputs. In that case the model is overfitting the training data.
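The two extremes can be illustrated with two deliberately crude models. This is a sketch under stated assumptions: the data is invented, constant_model always predicts the mean of y_train (too simple, so it underfits), and memorizing_model returns the stored output of the closest training input (it reproduces the training data exactly). Both names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between real and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented training data: roughly y = 2*x with some noise.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.0, 4.5, 5.5, 8.5, 9.5]

mean_y = sum(y_train) / len(y_train)


def constant_model(x):
    """Ignores the input and always predicts the training mean (underfits)."""
    return mean_y


def memorizing_model(x):
    """Returns the stored output of the nearest training input (memorizes)."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    return y_train[nearest]


# High training error: the model has not captured the trend in the data.
underfit_error = mean_absolute_error(
    y_train, [constant_model(x) for x in x_train]
)

# Zero training error: every training point is reproduced exactly.
overfit_error = mean_absolute_error(
    y_train, [memorizing_model(x) for x in x_train]
)
```

A training error of zero looks ideal, but on its own it says nothing about how the model behaves on inputs it has not seen.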

Variance

Once the model is trained, we can test it on an unseen dataset (x_test, y_test).

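When the model is given x_test, it generates predictions. We take the average of the absolute differences between the known test results (y_test) and the predictions. This error between known and predicted values on the unseen (test) data is called variance. A test error much higher than the training error indicates high variance: the model has overfit and does not generalize to unseen data.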
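The same error calculation can be run on the test data and compared with the training error. The sketch below assumes invented data and a hypothetical memorizing_model that returns the stored output of the nearest training input, chosen because it makes the gap between seen and unseen data easy to observe.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between real and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented training data: roughly y = 2*x with some noise.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.0, 4.5, 5.5, 8.5, 9.5]

# Invented unseen test data following y = 2*x.
x_test = [1.4, 2.6, 3.4, 4.6]
y_test = [2.8, 5.2, 6.8, 9.2]


def memorizing_model(x):
    """Returns the stored output of the nearest training input."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    return y_train[nearest]


# Error on seen data (what this article calls bias).
train_error = mean_absolute_error(
    y_train, [memorizing_model(x) for x in x_train]
)

# Error on unseen data (what this article calls variance).
test_error = mean_absolute_error(
    y_test, [memorizing_model(x) for x in x_test]
)

# The difference between the two shows how poorly the model generalizes.
gap = test_error - train_error
```

Here the training error is zero while the test error is not, which is the pattern of an overfit model.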
