Dear Yaguang,

Thank you for your interest in my article! You brought up a very interesting point, one that I had trouble with myself.

You can check this PR comment that explains the necessity of left padding for GPT2:

This shows that there is no conflict with my tutorial.

Also, the padding is applied on the left of the text sequence, and the position embedding ignores the padding at the beginning so that the model behaves normally. Generative models use the embedding of the last token in the input sequence to generate the next token, so left padding keeps a real token in that final position and preserves more information from the sequence. If you pad on the right, you pass a padding embedding to predict the next token, which does not make sense.
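
To make the difference concrete, here is a minimal sketch in plain Python. It is not the actual GPT2 or transformers code: the pad id 0, the token ids, and the helper names `pad_batch` and `position_ids_from_mask` are all made up for illustration. It only shows which token ends up in the last position under each padding side, and one possible way to count positions over real tokens only.

```python
PAD = 0  # hypothetical pad token id, chosen only for this illustration


def pad_batch(sequences, side="left", pad_id=PAD):
    """Pad token-id lists to equal length; return (input_ids, attention_mask)."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        pads, ones = [pad_id] * n_pad, [1] * len(seq)
        if side == "left":
            input_ids.append(pads + seq)
            attention_mask.append([0] * n_pad + ones)
        else:
            input_ids.append(seq + pads)
            attention_mask.append(ones + [0] * n_pad)
    return input_ids, attention_mask


def position_ids_from_mask(mask):
    """Count positions over real tokens only, so padding does not shift them."""
    positions, seen = [], 0
    for m in mask:
        positions.append(seen if m else 0)  # padded slots get a dummy position
        seen += m
    return positions


batch = [[11, 12, 13, 14], [21, 22]]  # made-up token ids

left_ids, left_mask = pad_batch(batch, side="left")
right_ids, right_mask = pad_batch(batch, side="right")

# The last column is what a generative model reads to predict the next token.
last_left = [row[-1] for row in left_ids]    # real tokens in every row
last_right = [row[-1] for row in right_ids]  # the short row ends in PAD
left_positions = [position_ids_from_mask(m) for m in left_mask]

print(left_ids, last_left)
print(right_ids, last_right)
print(left_positions)
```

With left padding the short sequence still ends in its own last token (22), while with right padding it ends in the pad id. The position helper starts counting at the first real token, which is the sense in which the padding at the beginning is ignored.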

The paragraph you mention is correct, but it is written from a scientific standpoint and not from an implementation standpoint.



George Mihaila

PhD Computer Science 👨‍💻 | Working 🏋️ with love ❤️ on Deep Learning 🤖 & Natural Language Processing 🗣️.