A plug-in ImageNet DataLoader for PyTorch. It uses Tensorpack DataFlow's sequential loading to load data quickly, even from an HDD.

Install

Requirements:

If you use pip's editable install, you can tune the DataLoader's speed for your system by editing this package's code.

git clone https://github.com/BayesWatch/sequential-imagenet-dataloader.git
cd sequential-imagenet-dataloader
pip install -e .

Or install directly:

pip install git+https://github.com/BayesWatch/sequential-imagenet-dataloader.git
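When tuning the loader through an editable install, it helps to measure how fast batches actually arrive before and after each change. The helper below is a minimal sketch and is not part of this package: measure_throughput is a hypothetical name, and it assumes each batch is an (images, labels) pair where len(images) is the number of images, as with PyTorch tensors.

```python
import time
from typing import Any, Iterable, Sized, Tuple


def measure_throughput(loader: Iterable[Tuple[Sized, Any]], max_batches: int = 100) -> float:
    """Return images per second over at most `max_batches` batches of `loader`.

    Assumes each batch is an (images, labels) pair and that len(images) is the
    batch size (true for a PyTorch tensor, whose len() is its first dimension).
    """
    n_images = 0
    n_batches = 0
    start = time.perf_counter()
    for images, _labels in loader:
        n_images += len(images)
        n_batches += 1
        if n_batches >= max_batches:
            break  # stop early so a quick benchmark doesn't run a full epoch
    elapsed = time.perf_counter() - start
    return n_images / elapsed if elapsed > 0 else float("inf")
```

For example, measure_throughput(train_loader, max_batches=200) would report images per second over the first 200 batches. The first few batches include worker start-up, so a longer run gives a steadier number.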

Preprocessing

Before you can train anything, you have to run the preprocessing script preprocess_sequential.py to create the LMDB binary files. They are written to the directory you specify and take up about 140 GB for the training set, plus more for the validation set. Run the script with these arguments:

python preprocess_sequential.py <imagenet directory> <directory to save lmdb files>
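The purpose of this step is to turn ImageNet's roughly 1.28 million small JPEG files into large files that can be read front to back, which is what keeps an HDD fast. preprocess_sequential.py does this with LMDB; the sketch below only illustrates the same idea with a plain length-prefixed file from the standard library. The record layout and the names write_records and read_records are hypothetical and are not the format the script produces.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

# Hypothetical record layout: 4-byte signed label, 4-byte unsigned payload
# length (both little-endian), followed by the raw payload bytes.
_HEADER = struct.Struct("<iI")


def write_records(f: BinaryIO, records: Iterable[Tuple[int, bytes]]) -> int:
    """Append (label, jpeg_bytes) records back to back; return how many were written."""
    count = 0
    for label, payload in records:
        f.write(_HEADER.pack(label, len(payload)))
        f.write(payload)
        count += 1
    return count


def read_records(f: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield records in write order with a single front-to-back pass over the file."""
    while True:
        header = f.read(_HEADER.size)
        if len(header) < _HEADER.size:  # end of file
            return
        label, length = _HEADER.unpack(header)
        yield label, f.read(length)


if __name__ == "__main__":
    buf = io.BytesIO()  # stands in for one large file on disk
    write_records(buf, [(0, b"jpeg-0"), (1, b"jpeg-1")])
    buf.seek(0)
    print(list(read_records(buf)))  # [(0, b'jpeg-0'), (1, b'jpeg-1')]
```

Because every record sits directly after the previous one, reading the whole dataset becomes one long sequential scan instead of millions of seeks.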

Usage

Wherever the DataLoader is defined in your PyTorch code, replace it with imagenet_seq.data.Loader, although it does not take exactly the same arguments. For example, this is the substitution in the PyTorch ImageNet example:

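import torchvision.transforms as transforms
from imagenet_seq.data import Loader as ImagenetLoader  # this package's DataLoader replacement

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
imagenet_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

# Original code in the PyTorch ImageNet example, now replaced:
# train_dataset = datasets.ImageFolder(traindir, imagenet_transforms)
# train_loader = torch.utils.data.DataLoader(
#     train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
#     num_workers=args.workers, pin_memory=True, sampler=train_sampler)

# Replacement: `args` comes from the example script's argument parser, and
# args.data is assumed to be the directory holding the preprocessed LMDB files.
train_loader = ImagenetLoader(args.data, 'train', imagenet_transforms,
                              batch_size=args.batch_size, num_workers=args.workers,
                              shuffle=True)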

For a complete example, see the ImageNet training script included in this repository.

Experiments

The example ImageNet script was run on a workstation with a single Titan Xp GPU, using 4 workers:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:17:00.0 Off |                  N/A |
| 52%   80C    P2    85W / 250W |   8837MiB / 12196MiB |    100%      Default |
|                               |                      |                  N/A |

Estimated hours to completion compared across loaders, each with 4 threads dedicated to the workers loading the dataset:

Estimated times ignore the validation set. The actual experiment, using LMDB as the DataFlow with the code in this repository, completed in 75 hours.
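As a rough guide to how such an estimate can be made (an assumption, not necessarily how the estimates here were produced): multiply the measured seconds per batch by the batches per epoch and the number of epochs. The function name is hypothetical. The inputs below use ImageNet's 1,281,167 training images and the PyTorch ImageNet example's default batch size of 256 and 90 epochs; the 0.5 s per batch is made up for illustration.

```python
import math


def estimated_hours(sec_per_batch: float, n_images: int, batch_size: int, epochs: int) -> float:
    """Rough wall-clock estimate: seconds per batch x batches per epoch x epochs.

    Like the estimates described above, this ignores validation passes.
    """
    batches_per_epoch = math.ceil(n_images / batch_size)  # a final partial batch still costs a step
    return sec_per_batch * batches_per_epoch * epochs / 3600.0


if __name__ == "__main__":
    # 5005 batches per epoch * 90 epochs * 0.5 s = 225225 s
    print(estimated_hours(0.5, 1281167, 256, 90))  # 62.5625
```

Since the estimate scales linearly with seconds per batch, any loader speed-up translates directly into fewer hours as long as the GPU is not the bottleneck.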

To check that training with this DataLoader still converges to the benchmark accuracy (the shuffling is only local, so results may not match exactly), I ran the experiment to completion. The final top-1 validation accuracy was 69.76%, and the charts detailing this experiment can be found here.
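Because the data is read in on-disk order, shuffling can only happen within a limited window rather than across the whole dataset. Tensorpack offers LocallyShuffleData for this kind of buffered shuffling; the sketch below is a plain-Python illustration of the idea, not Tensorpack's implementation or this package's exact settings. local_shuffle and buffer_size are hypothetical names.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def local_shuffle(stream: Iterable[T], buffer_size: int, seed: Optional[int] = None) -> Iterator[T]:
    """Shuffle a sequential stream using a fixed-size buffer.

    An item can be emitted at most buffer_size - 1 positions earlier than it
    was read, so the output stays close to on-disk order: a local shuffle,
    not a uniform random permutation.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in stream:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Move a randomly chosen buffered item to the end and emit it.
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    rng.shuffle(buffer)  # drain whatever is left once the stream ends
    yield from buffer
```

A larger buffer_size approximates a full shuffle more closely, at the cost of memory, since every buffered sample is held in RAM.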