Colab-traiNNer

This repo uses a lot of code from victorca25/traiNNer, but the code is fundamentally different and is built with PyTorch Lightning. There are many differences in architectures, optimizers, loss functions, augmentations, etc., so it is worth looking into both. My main goal in this repo is to add as many features as possible. Be aware that the code is experimental.

Simply download the .ipynb and open it inside your Google Drive or click here and copy the file with "File > Save a copy to Drive..." into your Google Drive.

You can also use the code locally. Install commands for local usage:

pip install lion-pytorch pytorch-lightning==2.0.4 \
    git+https://github.com/vballoli/nfnets-pytorch \
    git+https://github.com/styler00dollar/BasicSR albumentations \
    IPython scipy pandas opencv-python pillow wget \
    tfrecord x-transformers adamp efficientnet_pytorch \
    tensorboardX vit-pytorch swin-transformer-pytorch madgrad \
    git+https://github.com/huggingface/pytorch-image-models pillow-avif-plugin \
    kornia omegaconf git+https://github.com/styler00dollar/pytorch_optimizer \
    git+https://github.com/huggingface/transformers gdown PyTurboJPEG wavemix pyiqa

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121 --force-reinstall

(You need this specific piq version, or it won't work properly.)

Many extras (mmcv, ninja, correlation-package, cupy, Adam8Bit) are optional, and the requirements depend on what you train and how. Look into the Colab file for more details. For basic usage, the above commands should be sufficient.
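Before training locally, it can help to confirm that the core packages are importable. The following is a minimal sketch, not part of the repo: the helper name missing_modules is made up for this example, and the import names (for example cv2 for opencv-python) are assumed from the pip packages listed above.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names are assumed from the pip packages in the install command.
CORE_MODULES = [
    "torch",
    "pytorch_lightning",
    "cv2",
    "albumentations",
    "kornia",
    "omegaconf",
]

if __name__ == "__main__":
    missing = missing_modules(CORE_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

Only top-level module names are checked, so this says nothing about versions or about optional packages such as mmcv or cupy.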

Brief guide:

Configure paths in config.yaml. /content/drive/MyDrive/ is the path to your personal Google Drive folder. Be aware that all files inside Colab are deleted once the Colab session closes, so back up everything to your Google Drive and do not store data in Colab if you want to keep it. Also be aware that indentation (the number of leading spaces) is very important in the config file. Don't change it, or you will get errors. Example config:

path:
    pretrain_model_G: "/content/drive/MyDrive/model.pth"
    pretrain_model_D: 
    checkpoint_path:
    checkpoint_save_path: '/content/drive/MyDrive/my_model/'
    validation_output_path: '/content/drive/MyDrive/my_model/val'
    log_path: '/content/drive/MyDrive/my_model/'
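Since a wrong indent breaks the config, a quick check before training can save a failed run. This is a rough sketch under assumptions: the function name find_indent_problems is hypothetical, and the step of 4 spaces is taken from the example config above rather than from any rule in the repo. The tab check is safe in general, because YAML does not allow tabs for indentation.

```python
def find_indent_problems(text, step=4):
    """Return (line_number, reason) pairs for suspicious indentation."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation meaning
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % step != 0:
            # Assumption: the config is indented in multiples of `step` spaces.
            problems.append(
                (number, f"indent of {len(indent)} is not a multiple of {step}")
            )
    return problems


if __name__ == "__main__":
    sample = (
        "path:\n"
        "    log_path: '/content/drive/MyDrive/my_model/'\n"
        "   checkpoint_path:\n"  # three spaces instead of four
    )
    for number, reason in find_indent_problems(sample):
        print(f"line {number}: {reason}")
```

To check a real file, pass the file contents, for example find_indent_problems(open("config.yaml").read()).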

Choose the correct dataloader. They all have some specific purpose.

# training super resolution
mode: DS_lrhr
# training a video model
mode: DS_video
# there are some more, look into config.yaml

Uncomment the generator you want to use. Comments start with #. Don't forget to comment out the other generators. The first line of each generator block is usually just a descriptive comment, so leave it commented. Only one generator should be uncommented when you start training. If you want to use ESRGAN, you would have this in your config file:

    # ESRGAN:
    netG: RRDB_net
    norm_type: null
    mode: CNA
    nf: 64
    nb: 23
    in_nc: 3 # of input image channels: 3 for RGB and 1 for grayscale
    out_nc: 3 # of output image channels: 3 for RGB and 1 for grayscale
    gc: 32
    group: 1
    convtype: Conv2D # Conv2D | PartialConv2D
    net_act: leakyrelu # swish | leakyrelu
    gaussian: false # true | false
    plus: false # true | false
    finalact: None #tanh # Test
    upsample_mode: 'upconv'
    nr: 3

The same applies to the discriminator, but the discriminator is optional. Uncomment one by removing #. Example:

    # resnet
    netD: resnet
    resnet_arch: resnet50 # resnet50, resnet101, resnet152
    num_classes: 1
    pretrain: True
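The rule above (exactly one active generator, at most one active discriminator) can be checked by scanning the config text for uncommented netG and netD lines. A minimal sketch, assuming each network block is selected by a single netG: or netD: line as in the examples; the function name active_values and the commented-out other_net entry are made up for illustration.

```python
import re


def active_values(config_text, key):
    """Return the values of all uncommented `key:` lines in the config text."""
    # A commented line has '#' before the key, so it does not match.
    pattern = re.compile(
        rf"^[ \t]*{re.escape(key)}:[ \t]*([^#\s]+)", re.MULTILINE
    )
    return pattern.findall(config_text)


if __name__ == "__main__":
    sample = """
    # ESRGAN:
    netG: RRDB_net
    # netG: other_net
    # resnet
    netD: resnet
    """
    generators = active_values(sample, "netG")
    discriminators = active_values(sample, "netD")
    if len(generators) != 1:
        print("expected exactly one active generator, found:", generators)
    if len(discriminators) > 1:
        print("expected at most one active discriminator, found:", discriminators)
    print("generator:", generators, "discriminator:", discriminators)
```

This works on the raw text on purpose: a YAML parser would silently keep only the last of two duplicate keys, which hides the mistake.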

Loss options have the suffix _weight. A loss is active when its weight is set to a value greater than zero. Example:

    L1Loss_weight: 1

If you do not want to use a specific loss, leave its weight at 0. Finally, adjust lr, batch_size, HR_size, n_workers, gpus, scale and use_amp to the desired values.
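The _weight convention can be illustrated with a few lines of Python. This is a sketch of the rule as described, not the repo's actual loading code; the function name active_losses and the OtherLoss_weight key are hypothetical, and the config is represented as a plain dict.

```python
def active_losses(config):
    """Return {loss_name: weight} for every `*_weight` entry greater than zero."""
    suffix = "_weight"
    return {
        key[: -len(suffix)]: value
        for key, value in config.items()
        if key.endswith(suffix)
        and isinstance(value, (int, float))
        and value > 0
    }


if __name__ == "__main__":
    # Hypothetical excerpt of a loaded config; only L1Loss_weight is from the text.
    sample = {"L1Loss_weight": 1, "OtherLoss_weight": 0, "batch_size": 8}
    print(active_losses(sample))
```

Printing this list before a long run is a cheap way to confirm that only the intended losses are switched on.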

If you want to start training locally, run python train.py. If you want to visualize the training loss with graphs, go into the log folder and run tensorboard --logdir . there. It will print a URL that you can open in your browser. (This only works locally; if you use Colab, download the logs first.)

Training only saves .ckpt files. If you want to extract the generator and discriminator as .pth files, you need to use this script.
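For orientation, the idea behind such an extraction can be sketched as follows. This is not the script linked above. It relies on two things: a PyTorch Lightning checkpoint stores the model weights under the "state_dict" key, and the generator keys are assumed here to start with "netG.", which is a guess based on the config names and should be verified by printing the keys of your own checkpoint. The function names are hypothetical.

```python
def split_state_dict(state_dict, prefix):
    """Return the entries whose key starts with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


def export_generator(ckpt_path, out_path, prefix="netG."):
    """Save the generator weights from a Lightning .ckpt as a plain .pth file."""
    import torch  # imported here so split_state_dict works without PyTorch

    # weights_only=False unpickles arbitrary objects: only load files you trust.
    checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=False)
    weights = split_state_dict(checkpoint["state_dict"], prefix)
    if not weights:
        # Assumption failed: the prefix does not match this checkpoint.
        raise KeyError(f"no keys start with {prefix!r}; inspect checkpoint['state_dict']")
    torch.save(weights, out_path)
```

Calling export_generator with prefix="netD." would extract the discriminator in the same way, under the same assumption about key names.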

I also created a Dockerfile that can be used for training. Sadly, it is very hard to make a perfect Dockerfile, so there currently isn't one, but it should cover nearly all dependencies.

# install docker, command for arch
yay -S docker nvidia-docker nvidia-container-toolkit
# Download from dockerhub
docker pull styler00dollar/trainner:latest
# Or build it yourself, put the dockerfile in a directory and run that inside that directory
docker build -t trainner:latest .
# run with a mounted folder, inside that folder, the folder Colab-traiNNer should be
# inside /workspace/tensorrt you can access all files then
docker run --privileged --gpus all -it --rm -v /path_to_own_folder/:/workspace/tensorrt trainner:latest
# docker may or may not need an extra parameter for shared memory
docker run --privileged --gpus all -it --rm -v /path_to_own_folder/:/workspace/tensorrt --shm-size 8G trainner:latest

If you have problems getting Docker to start, try these commands and then run the docker command again:

# fixing docker errors
systemctl start docker
sudo chmod 666 /var/run/docker.sock