Batch Gradient Descent vs Stochastic Gradient Descent (SGD) vs Mini-Batch Gradient Descent
In this article, we will explore the basic differences between Batch Gradient Descent, Stochastic Gradient Descent (SGD), and Mini-Batch Gradient Descent. But first, let us understand what an epoch means.
Meaning of the word "epoch"
In Deep Learning, an epoch is one complete pass over the entire training set. In other words, every time the training process has gone over each sample of the training set once, that counts as one epoch.
Basic Deep Learning training process
- Initialize the model parameters (weights and biases) with some random values.
- Go over the objects of the training set.
- Adjust the model parameters (weights and biases) with respect to some cost/loss function.
- Repeat the previous two steps until the desired threshold is reached.
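The four steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the one-feature linear model, the mean squared error loss, the toy data, and the names `train`, `lr`, `loss_threshold`, and `max_epochs` are all assumptions made for the example.

```python
import random

def train(xs, ys, lr=0.05, loss_threshold=1e-4, max_epochs=10_000, seed=0):
    """Fit y ~ w*x + b. All names and defaults here are illustrative."""
    rng = random.Random(seed)
    # Step 1: initialize the weight and bias with random values.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    n = len(xs)
    epochs = 0
    while epochs < max_epochs:
        # Step 2: go over the objects of the training set.
        loss = grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            loss += err * err / n       # mean squared error
            grad_w += 2 * err * x / n   # d(loss)/dw
            grad_b += 2 * err / n       # d(loss)/db
        # Step 4: stop once the desired threshold is reached.
        if loss < loss_threshold:
            break
        # Step 3: adjust the parameters against the loss gradient.
        w -= lr * grad_w
        b -= lr * grad_b
        epochs += 1
    return w, b, epochs

# Toy data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b, epochs = train(xs, ys)
print(f"w={w:.2f}, b={b:.2f} after {epochs} epochs")
```

The `max_epochs` cap is a safety net added for the sketch, so the loop ends even if the threshold is never reached.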
Batch Gradient Descent, or simply Gradient Descent
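In Batch Gradient Descent, we go through the entire training set in each epoch and only then adjust the parameters. The parameters are therefore adjusted just once per epoch, and every single adjustment requires processing every object in the training set, which makes the method unsuitable for large datasets.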
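To make the cost of this scheme concrete, here is a small sketch that counts both the parameter updates and the per-object gradient computations. The linear model, the squared error loss, the toy data, and the function name `batch_gradient_descent` are assumptions chosen for illustration.

```python
def batch_gradient_descent(xs, ys, epochs=500, lr=0.05):
    """Full-batch updates for y ~ w*x + b (illustrative sketch)."""
    w, b = 0.0, 0.0          # fixed start for reproducibility; random values also work
    n = len(xs)
    updates = 0              # how many times the parameters were adjusted
    sample_gradients = 0     # how many per-object gradients were computed
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        # Accumulate the gradient over the ENTIRE training set first.
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
            sample_gradients += 1
        # Only then adjust the parameters: exactly one update per epoch.
        w -= lr * grad_w
        b -= lr * grad_b
        updates += 1
    return w, b, updates, sample_gradients

# Toy data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b, updates, sample_gradients = batch_gradient_descent(xs, ys)
print(updates, sample_gradients)
```

Here `updates` equals the number of epochs, while `sample_gradients` is larger by a factor equal to the training set size. That factor is the price paid for each single update.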
Stochastic Gradient Descent (SGD)
In Stochastic Gradient Descent (SGD), within an epoch we go through a single object of the training set and adjust the parameters, then go through the next object and adjust the parameters again, and so on until every object has been visited. In every epoch, the parameters are therefore adjusted n times for a training set of n objects.
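A minimal sketch of this per-object update pattern is shown below. The linear model, the squared error loss, the toy data, and the name `sgd` are assumptions for the example. Shuffling the visiting order at the start of each epoch is a common practice that the description above does not mandate.

```python
import random

def sgd(xs, ys, epochs=500, lr=0.01, seed=0):
    """One parameter update per training object (illustrative sketch)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    updates = 0
    for _ in range(epochs):
        # Assumption: visit the objects in a fresh random order each epoch.
        rng.shuffle(order)
        for i in order:
            # The gradient is computed from a single object...
            err = (w * xs[i] + b) - ys[i]
            # ...and the parameters are adjusted immediately.
            w -= lr * 2 * err * xs[i]
            b -= lr * 2 * err
            updates += 1
    return w, b, updates

# Toy data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b, updates = sgd(xs, ys)
print(f"w={w:.2f}, b={b:.2f}, updates={updates}")
```

With n = 5 objects and 500 epochs, the counter ends at 5 × 500 = 2500 updates, matching the n updates per epoch described above.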
Mini-Batch Gradient Descent
In Mini-Batch Gradient Descent, we randomly divide the training set (of size n) into batches of size k.
In an epoch, we go through one batch and adjust the parameters, then go through another batch and adjust the parameters again, until every batch has been gone over once. So, in every epoch, we adjust the parameters n/k times (rounded up when k does not divide n evenly, since the last batch is then smaller).
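The batching scheme can be sketched as follows. As before, the linear model, the squared error loss, the toy data, and the name `minibatch_gd` are assumptions made for illustration. Reshuffling before every epoch is one common way to do the random split, not the only one.

```python
import random

def minibatch_gd(xs, ys, k=2, epochs=500, lr=0.02, seed=0):
    """One parameter update per batch of (up to) k objects (illustrative sketch)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    updates = 0
    for _ in range(epochs):
        # Randomly divide the training set into batches of size k.
        rng.shuffle(indices)
        for start in range(0, len(indices), k):
            # The last batch is smaller when k does not divide n.
            batch = indices[start:start + k]
            grad_w = grad_b = 0.0
            # Average the gradient over the objects in this batch only.
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i] / len(batch)
                grad_b += 2 * err / len(batch)
            # Adjust the parameters once per batch.
            w -= lr * grad_w
            b -= lr * grad_b
            updates += 1
    return w, b, updates

# Toy data generated from y = 2x + 1 (assumed for illustration): n = 6, k = 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]
w, b, updates = minibatch_gd(xs, ys)
print(f"w={w:.2f}, b={b:.2f}, updates={updates}")
```

With n = 6 and k = 2 there are 3 updates per epoch, so 500 epochs give 1500 updates. Setting k = 1 reproduces the SGD update pattern, and setting k = n reproduces Batch Gradient Descent.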