I was studying epochs, but the relationship between gradient descent and epochs is not very clear to me. Here is my understanding of the two:
In gradient descent, the weights are updated step by step until they reach an optimal point. This process involves one epoch, where one epoch means that the entire training dataset is passed once through the gradient descent algorithm.
Now, when the number of epochs is increased, does the amount of gradient descent computation increase as well? Gradient descent is itself an iterative process, so if the training data is passed through more times, the algorithm also has to update the weights repeatedly.
Could anyone explain the process behind gradient descent and epochs, and how exactly the algorithm works here?
Thanks in advance.
The batch size is a hyperparameter that defines the number of samples to work through before updating the internal model parameters.
Think of a batch as a for-loop iterating over one or more samples and making predictions. At the end of the batch, the predictions are compared to the expected output variables and an error is calculated. From this error, the update algorithm is used to improve the model, e.g. move down along the error gradient.
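To make the for-loop picture concrete, here is a minimal sketch of a single batch. It assumes a toy one-weight model y_hat = w * x with squared error, and the names batch_update and learning_rate are made up for illustration; they do not come from any particular library.

```python
def batch_update(w, batch, learning_rate=0.01):
    """Process one batch and return the updated weight.

    Assumption: toy model y_hat = w * x with squared error (y_hat - y)^2.
    """
    grad = 0.0
    for x, y in batch:                 # the for-loop over the samples in the batch
        y_hat = w * x                  # make a prediction
        grad += 2 * (y_hat - y) * x    # gradient of the squared error w.r.t. w
    grad /= len(batch)                 # average the gradient over the batch
    # Only now, at the end of the batch, is the weight updated (once).
    return w - learning_rate * grad


# Three samples drawn from y = 2x; the starting weight 0.0 is arbitrary.
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_new = batch_update(0.0, batch)
print(w_new)
```

Note that the weight changes only once per call, however many samples the batch holds, and that it moves from 0.0 towards the true value 2.0.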
A training dataset can be divided into one or more batches.
When all training samples are used to create one batch, the learning algorithm is called batch gradient descent. When the batch is the size of one sample, the learning algorithm is called stochastic gradient descent. When the batch size is more than one sample and less than the size of the training dataset, the learning algorithm is called mini-batch gradient descent.
Batch Gradient Descent: Batch Size = Size of Training Set
Stochastic Gradient Descent: Batch Size = 1
Mini-Batch Gradient Descent: 1 < Batch Size < Size of Training Set
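Coming back to the question about epochs: one way to see how the three variants relate to epochs is to count weight updates. The sketch below is only an illustration under assumptions: it uses a toy model y_hat = w * x, a hypothetical train function, and no shuffling (in practice the data is usually shuffled every epoch). The outer loop is the epochs, the inner loop is the batches, and each batch produces exactly one update.

```python
def train(data, batch_size, epochs, learning_rate=0.01):
    """Return (final weight, number of weight updates performed).

    Assumption: toy model y_hat = w * x with squared error.
    """
    w = 0.0
    n_updates = 0
    for _ in range(epochs):                          # one epoch = one full pass over the data
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]   # next batch of samples
            # Average gradient of the squared error over this batch.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad                # one update per batch
            n_updates += 1
    return w, n_updates


# Eight samples drawn from y = 2x.
data = [(float(x), 2.0 * x) for x in range(1, 9)]

w_batch, updates_batch = train(data, batch_size=len(data), epochs=5)  # batch GD
w_sgd, updates_sgd = train(data, batch_size=1, epochs=5)              # stochastic GD
w_mini, updates_mini = train(data, batch_size=4, epochs=5)            # mini-batch GD

print(updates_batch, updates_sgd, updates_mini)
```

With 8 samples and 5 epochs, batch gradient descent makes 1 update per epoch, stochastic gradient descent makes 8, and a batch size of 4 makes 2. So yes, more epochs means proportionally more gradient computations and more updates: the total is the number of epochs times the number of batches per epoch.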
In the case of mini-batch gradient descent, popular batch sizes include 32, 64, and 128 samples. You may see these values used in models in the literature and in tutorials.