George Mihaila
1 min read · Nov 16, 2018


I think all your articles are great, and I hope this encourages you to keep writing more awesome material :)

So, I did a little more research on SGD. Your response is great, and it’s the answer I was looking for. A few more comments for other readers:

  • when people mention SGD when training neural networks, it means the data gets shuffled at each epoch.
  • batch size in a neural network is where we specify how many examples to use at once to compute the gradients.
  • in classical SGD we would pick one example at random from our data and compute the gradients only on that single example (that is why it is faster). In normal GD we use the whole dataset (as one batch) to compute the gradients (that is why it is slower). A quick sketch of the difference is below.
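
To make the difference concrete, here is a minimal NumPy sketch (my own toy example, not from the article) of full-batch GD versus classical single-example SGD with per-epoch shuffling:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                        # 100 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

def gradient(X_batch, y_batch, w):
    # Gradient of the mean squared error with respect to w.
    error = X_batch @ w - y_batch
    return 2 * X_batch.T @ error / len(y_batch)

lr = 0.1

# Full-batch gradient descent: every update uses all 100 examples.
w_gd = np.zeros(3)
for _ in range(100):
    w_gd -= lr * gradient(X, y, w_gd)

# Classical SGD: reshuffle each epoch, then update on one example at a time.
w_sgd = np.zeros(3)
for epoch in range(10):
    order = rng.permutation(len(y))                  # shuffle the data each epoch
    for i in order:
        w_sgd -= lr * gradient(X[i:i+1], y[i:i+1], w_sgd)
```

Each SGD step is much cheaper (one example instead of the whole dataset), which is the speed difference mentioned above; the price is noisier updates.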

If I’m missing something or misinterpreting anything, please let me know.

Thank you!
