Thank you for such a great tutorial Piotr!
So if the batch size is 1 and we use every example, why is it stochastic? Do we randomly shuffle the data after each epoch to make it stochastic? Normally, SGD should be faster than regular GD.
Thank you!
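A minimal sketch of the point in question (all data and names here are hypothetical): with batch size 1, the "stochastic" part comes from visiting the examples in a freshly shuffled random order every epoch, so each run follows a different sequence of single-example gradient steps.

```python
import numpy as np

# Toy linear-regression data (hypothetical, just for illustration).
rng = np.random.default_rng(42)
X = rng.normal(size=(8, 2))
y = X @ np.array([2.0, -1.0])  # true weights are [2, -1]

w = np.zeros(2)
lr = 0.1
for epoch in range(100):
    order = rng.permutation(len(X))  # reshuffle each epoch -> stochasticity
    for i in order:
        # Gradient of the single-example loss 0.5 * (x_i . w - y_i)^2,
        # i.e. one update per example (batch size 1).
        grad = (X[i] @ w - y[i]) * X[i]
        w -= lr * grad

print(w)  # close to the true weights [2, -1]
```

Even though every example is used once per epoch, the shuffled visiting order makes the path through parameter space random, which is what "stochastic" refers to; it is also why SGD updates are much cheaper per step than full-batch gradient descent.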