πŸ‡ Better Batches with PyTorchText BucketIterator

How to use PyTorchText BucketIterator to sort text data for better batching.

What should I know for this notebook?

How to use this notebook?

Dataset

Coding

Downloads

Installs

Imports

Using PyTorch Dataset

Dataset Class
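The dataset class itself is not reproduced in this copy of the notebook. Below is a minimal sketch of a map-style dataset. It assumes each example is a (text, label) pair already loaded into memory, and the class name `MovieReviewsDataset` is hypothetical. It implements the `__len__`/`__getitem__` protocol that `torch.utils.data.Dataset` expects, written as a plain Python class so it runs without PyTorch installed.

```python
from typing import Dict, List, Tuple


class MovieReviewsDataset:
    """Hypothetical map-style dataset over in-memory (text, label) pairs.

    With PyTorch installed you would subclass torch.utils.data.Dataset;
    these two methods are all a DataLoader needs from a map-style dataset.
    """

    def __init__(self, examples: List[Tuple[str, str]]):
        self.examples = examples

    def __len__(self) -> int:
        # Number of reviews available to the loader.
        return len(self.examples)

    def __getitem__(self, index: int) -> Dict[str, str]:
        # Return one review as a dict so a collate function can pick fields by name.
        text, label = self.examples[index]
        return {"text": text, "label": label}


reviews = MovieReviewsDataset([
    ("There's never a dull moment in this movie.", "pos"),
    ("A toy negative review.", "neg"),  # toy example, not from IMDB
])
print(len(reviews))         # 2
print(reviews[0]["label"])  # pos
```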

Train / Validation Datasets
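The notebook's exact split ratio is not shown here. The following is a minimal sketch of a reproducible train/validation split, assuming a 20% validation slice and a fixed seed (both hypothetical choices). The 25,000 examples match the 12,500 `pos` plus 12,500 `neg` training files listed later in this notebook.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_valid_split(examples: Sequence[T], valid_ratio: float = 0.2,
                      seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the examples with a fixed seed and cut off a validation slice."""
    if not 0.0 < valid_ratio < 1.0:
        raise ValueError("valid_ratio must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # local RNG: global random state is untouched
    n_valid = int(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]


train, valid = train_valid_split(range(25000), valid_ratio=0.2)
print(len(train), len(valid))  # 20000 5000
```

Using a local `random.Random(seed)` keeps the split identical across runs without reseeding the global generator that other code may rely on.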

PyTorch DataLoader
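A PyTorch `DataLoader` with `shuffle=True` serves examples in random order, so every batch mixes short and long reviews and has to be padded to its longest member. Here is a sketch of such a padding collate function in plain Python. It assumes each example is already a list of token ids, and both the name `pad_collate` and the padding id `0` are hypothetical. In a real pipeline you would pass a function like this as `collate_fn=` to `torch.utils.data.DataLoader` and return tensors instead of lists.

```python
from typing import List, Tuple


def pad_collate(batch: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Pad every sequence in the batch to the length of the longest one.

    Returns the padded batch and the original lengths (useful for masking).
    """
    lengths = [len(seq) for seq in batch]
    max_len = max(lengths)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    return padded, lengths


padded, lengths = pad_collate([[5, 8, 2], [7], [3, 3, 3, 3, 9]])
print(lengths)    # [3, 1, 5]
print(padded[1])  # [7, 0, 0, 0, 0]
```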

PyTorchText Bucket Iterator Dataloader
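The idea behind BucketIterator is to put examples of similar length into the same batch, so little padding is needed, while still shuffling the order in which batches are served. The sketch below shows that idea in pure Python. It is loosely modeled on how the legacy torchtext source describes its pooling: sort within chunks of about 100 batches, then shuffle the batches. It is not torchtext's code. The helper name `bucket_batches`, the `pool_batches=100` default, and the seeding are assumptions.

```python
import random
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def bucket_batches(examples: Sequence[T], batch_size: int,
                   sort_key: Callable[[T], int], pool_batches: int = 100,
                   seed: int = 0) -> Iterator[List[T]]:
    """Yield batches of similar-length examples in shuffled batch order.

    1. Shuffle the data and split it into pools of pool_batches * batch_size examples.
    2. Sort each pool by sort_key so neighbours have similar lengths.
    3. Cut each sorted pool into batches and shuffle the batch order.
    """
    rng = random.Random(seed)
    data = list(examples)
    rng.shuffle(data)
    pool_size = pool_batches * batch_size
    for start in range(0, len(data), pool_size):
        pool = sorted(data[start:start + pool_size], key=sort_key)
        batches = [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]
        rng.shuffle(batches)
        yield from batches


# Toy texts of varying length; print the lengths inside the first three batches.
texts = ["a " * n for n in random.Random(1).sample(range(1, 500), 200)]
for batch in list(bucket_batches(texts, batch_size=10, sort_key=len))[:3]:
    print([len(t) for t in batch])
```

Sorting only within a pool, rather than over the whole dataset, keeps some randomness in which examples end up together, while lengths inside each batch stay close.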

Compare DataLoaders

PyTorch DataLoader
Batch size: 10
LABEL LENGTH TEXT
pos 1037 Fascinating movie, based on a true story, about an...
neg 1406 Or maybe that's what it feels like. Anyway, "The B...
pos 679 Far by my most second favourite cartoon Spielberg ...
neg 922 This movie reminds me of "Irréversible (2002)", an...
pos 214 There's never a dull moment in this movie. Wonderf...
neg 1288 I don't think any player in Hollywood history last...
pos 605 The thing I remember most about this film is that ...
pos 1411 Fabulous, fantastic, probably Disney's best musica...
neg 604 Just another film that exploits gratuitous frontal...
pos 368 What can i say about the first film ever?<br /><br...

PyTorchText BucketIterator
Batch size: 10
LABEL LENGTH TEXT
pos 609 That's My Bush is a live action project made by So...
neg 610 Terminus Paradis was exceptional, but "Niki ardele...
neg 612 Awesomely improbable and foolish potboiler that at...
pos 613 The events of September 11 2001 do not need extra ...
pos 613 Okay, first of all I got this movie as a Christmas...
neg 617 I have been known to fall asleep during films, but...
pos 625 Fragglerock is excellent in the way that Schindler...
neg 625 Sure I've seen bad movies in my life, but this one...
neg 626 Even 20+ years later, Ninja Mission stands out as ...
pos 626 This film is excellently paced, you never have to ...
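The two tables above make the difference visible. Counting padding turns it into a number: assume every review is padded to the longest review in its batch, measured in whatever unit the LENGTH column uses. The lengths below are copied from the two tables, and `padding_needed` is a hypothetical helper.

```python
from typing import List


def padding_needed(lengths: List[int]) -> int:
    """Positions filled with padding when every sequence is padded to the batch maximum."""
    return len(lengths) * max(lengths) - sum(lengths)


# LENGTH column of the PyTorch DataLoader batch shown above.
dataloader_batch = [1037, 1406, 679, 922, 214, 1288, 605, 1411, 604, 368]
# LENGTH column of the BucketIterator batch shown above.
bucket_batch = [609, 610, 612, 613, 613, 617, 625, 625, 626, 626]

print(padding_needed(dataloader_batch))  # 5576
print(padding_needed(bucket_batch))      # 84
```

Roughly 40% of the padded DataLoader batch would be padding (5576 of 14110 positions), compared with about 1.3% for the bucketed batch (84 of 6260).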

Train Loop Examples

Batch examples lengths: [848, 848, 849, 849, 850, 852, 853, 854, 856, 857] 
Batch examples lengths: [779, 780, 780, 781, 781, 782, 782, 782, 783, 784]
Batch examples lengths: [2100, 2103, 2104, 2109, 2114, 2135, 2147, 2151, 2158, 2164]
Batch examples lengths: [903, 905, 910, 910, 910, 910, 914, 915, 916, 919]
Batch examples lengths: [968, 968, 970, 970, 971, 972, 973, 975, 981, 982]
Batch examples lengths: [806, 806, 807, 807, 808, 809, 810, 810, 811, 811]
Batch examples lengths: [731, 733, 734, 735, 736, 736, 737, 737, 738, 739]
Batch examples lengths: [357, 357, 358, 361, 362, 362, 362, 364, 366, 371]
Batch examples lengths: [2330, 2335, 2337, 2350, 2351, 2353, 2367, 2374, 2376, 2383]
Batch examples lengths: [1916, 1920, 1921, 1936, 1951, 1953, 1967, 1970, 1981, 1985]
Batch examples lengths: [1395, 1398, 1399, 1402, 1403, 1412, 1412, 1413, 1414, 1414]
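Lines like the ones above come from iterating over bucketed batches inside a training loop. Inside each batch the lengths are sorted and close together, and the batch order is shuffled. That is consistent with consecutive lines jumping between short and long reviews. Below is a sketch of such a loop on toy lengths. It assumes a hypothetical `train_step` that stands in for the forward/backward pass and only reports the padded width. It also assumes a new seed per epoch so the batch order changes between epochs.

```python
import random
from typing import List


def sorted_batches(lengths: List[int], batch_size: int, seed: int) -> List[List[int]]:
    """Sort example lengths, chunk them into batches, then shuffle the batch order."""
    ordered = sorted(lengths)
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    random.Random(seed).shuffle(batches)
    return batches


def train_step(batch_lengths: List[int]) -> int:
    """Hypothetical stand-in for forward/backward: returns the padded width of the batch."""
    return max(batch_lengths)


rng = random.Random(7)
review_lengths = [rng.randint(100, 2500) for _ in range(50)]  # toy data, not IMDB

for epoch in range(2):
    # A different seed per epoch reshuffles the batch order; batch contents stay sorted.
    for batch in sorted_batches(review_lengths, batch_size=10, seed=epoch):
        width = train_step(batch)
        print("Batch examples lengths:", batch, "padded to", width)
```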

Using PyTorchText TabularDataset

Data to Files

/content/aclImdb/train
pos Files: 100% |████████████████████████████████| 12500/12500 [00:34<00:00, 367.26it/s]
neg Files: 100% |████████████████████████████████| 12500/12500 [00:21<00:00, 573.00it/s]
/content/aclImdb/test
pos Files: 100% |████████████████████████████████| 12500/12500 [00:11<00:00, 1075.80it/s]
neg Files: 100% |████████████████████████████████| 12500/12500 [00:12<00:00, 1037.94it/s]
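TabularDataset reads examples from a single CSV, TSV or JSON file. That means the aclImdb folders, which hold one `.txt` file per review under `pos/` and `neg/`, first have to be flattened into one file per split. The following standard-library sketch does this. It assumes a tab-separated output with a `text` and a `label` column. The function name `folder_to_tsv`, the column layout and the output path are hypothetical, not the notebook's exact format.

```python
import csv
import tempfile
from pathlib import Path


def folder_to_tsv(split_dir: str, out_path: str) -> int:
    """Write every review under split_dir/{pos,neg}/*.txt to one TSV file.

    Returns the number of rows written (excluding the header).
    """
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["text", "label"])
        for label in ("pos", "neg"):
            for txt_file in sorted(Path(split_dir, label).glob("*.txt")):
                text = txt_file.read_text(encoding="utf-8")
                # Collapse runs of whitespace (including tabs and newlines) to single spaces.
                writer.writerow([" ".join(text.split()), label])
                rows_written += 1
    return rows_written


# Real usage (output path is hypothetical):
# folder_to_tsv("/content/aclImdb/train", "/content/train.tsv")

# Tiny demo on a fake aclImdb-style folder.
with tempfile.TemporaryDirectory() as tmp:
    for label, text in (("pos", "Great\tfilm.\n"), ("neg", "Bad film.")):
        Path(tmp, "train", label).mkdir(parents=True)
        Path(tmp, "train", label, "0.txt").write_text(text, encoding="utf-8")
    out_file = Path(tmp, "train.tsv")
    n_rows = folder_to_tsv(str(Path(tmp, "train")), str(out_file))
    with open(out_file, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter="\t"))

print(n_rows)   # 2
print(rows[1])  # ['Great film.', 'pos']
```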

TabularDataset
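In legacy torchtext, `TabularDataset` reads such a file and turns each row into an `Example`, processing each column with the `Field` assigned to it. The sketch below illustrates that row-to-example mapping without any dependencies. It is not torchtext's implementation. It assumes the two-column TSV layout (`text`, `label`) with a header row and uses whitespace splitting in place of a real tokenizer. The names `read_tabular` and `ReviewExample` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReviewExample:
    text: List[str]  # tokenized review
    label: str


def read_tabular(lines: Iterable[str], skip_header: bool = True) -> List[ReviewExample]:
    """Parse tab-separated (text, label) rows into tokenized examples."""
    reader = csv.reader(lines, delimiter="\t")
    if skip_header:
        next(reader, None)  # drop the "text<TAB>label" header row
    return [ReviewExample(text=text.split(), label=label) for text, label in reader]


tsv = io.StringIO("text\tlabel\nGreat film.\tpos\nBad film.\tneg\n")
examples = read_tabular(tsv)
print(examples[0])  # ReviewExample(text=['Great', 'film.'], label='pos')
```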

PyTorchText Bucket Iterator Dataloader

Compare DataLoaders

PyTorch DataLoader
Batch size: 10
LABEL LENGTH TEXT
neg 1205 This movie is bad news and I'm really surprised at...
pos 762 Micro-phonies is a classic Stooge short. The guys ...
pos 782 After becoming completely addicted to Six Feet Und...
neg 1708 You do realize that you've been watching the EXACT...
neg 2341 Okay, as a long time Disney fan, I really -hate- d...
neg 705 This movie is simply not worth the time or money s...
neg 1370 I'm sorry to say that there isn't really any way, ...
pos 681 Something about "Paulie" touched my heart as few m...
neg 1401 It really is that bad of a movie. My buddy rented ...
neg 656 my friend bought the movie for 5€ (its is not even...

PyTorchText BucketIterator
Batch size: 10
LABEL LENGTH TEXT
pos 609 That's My Bush is a live action project made by So...
neg 610 Terminus Paradis was exceptional, but "Niki ardele...
neg 612 Awesomely improbable and foolish potboiler that at...
pos 613 Okay, first of all I got this movie as a Christmas...
neg 617 I have been known to fall asleep during films, but...
pos 625 Fragglerock is excellent in the way that Schindler...
neg 625 Sure I've seen bad movies in my life, but this one...
pos 626 This film is excellently paced, you never have to ...

Train Loop Examples

Batch examples lengths: [848, 848, 849, 849, 850, 852, 853, 854, 856, 857]
Batch examples lengths: [779, 780, 780, 781, 781, 782, 782, 782, 783, 784]
Batch examples lengths: [2100, 2103, 2104, 2109, 2114, 2135, 2147, 2151, 2158, 2164]
Batch examples lengths: [903, 905, 910, 910, 910, 910, 914, 915, 916, 919]
Batch examples lengths: [968, 968, 970, 970, 971, 972, 973, 975, 981, 981]
Batch examples lengths: [806, 806, 807, 807, 808, 809, 810, 810, 811, 811]
Batch examples lengths: [731, 733, 734, 735, 736, 736, 737, 737, 738, 739]
Batch examples lengths: [357, 357, 358, 361, 362, 362, 362, 364, 366, 371]
Batch examples lengths: [2330, 2335, 2337, 2350, 2351, 2353, 2367, 2374, 2376, 2381]
Batch examples lengths: [1916, 1920, 1921, 1936, 1951, 1953, 1967, 1970, 1981, 1985]
Batch examples lengths: [1395, 1398, 1399, 1402, 1403, 1412, 1412, 1413, 1414, 1414]

Final Note

PhD Computer Science 👨‍💻 | Working 🏋️ with love ❤️ on Deep Learning 🤖 & Natural Language Processing 🗣️.