Thank you for such a great tutorial Piotr!
So if the batch size is 1 and we use every example, why is it stochastic? Do we randomly shuffle the data after each epoch to make it stochastic? Normally, SGD should be faster than regular GD.
Thank you!
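A minimal sketch of the point in question (all data and names here are hypothetical): with batch size 1, the "stochastic" part comes from visiting the examples in a freshly shuffled random order every epoch, so each run follows a different sequence of single-example gradient steps.

```python
import numpy as np

# Toy linear-regression data (hypothetical, just for illustration).
rng = np.random.default_rng(42)
X = rng.normal(size=(8, 2))
y = X @ np.array([2.0, -1.0])  # true weights are [2, -1]

w = np.zeros(2)
lr = 0.1
for epoch in range(100):
    order = rng.permutation(len(X))  # reshuffle each epoch -> stochasticity
    for i in order:
        # Gradient of the single-example loss 0.5 * (x_i . w - y_i)^2,
        # i.e. one update per example (batch size 1).
        grad = (X[i] @ w - y[i]) * X[i]
        w -= lr * grad

print(w)  # close to the true weights [2, -1]
```

Even though every example is used once per epoch, the shuffled visiting order makes the path through parameter space random, which is what "stochastic" refers to; it is also why SGD updates are much cheaper per step than full-batch gradient descent.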