augmeNNt

This repository is intended primarily as a faster drop-in replacement for the default augmentations in the "transforms" package of PyTorch's Torchvision, based on NumPy and OpenCV (PIL-free) for computer vision pipelines. It also provides many useful functions and augmentations for image-to-image translation, super-resolution and restoration (deblurring, denoising, etc.).

Supported Augmentations

Most functions from the original Torchvision transforms are reimplemented, with some considerations:

  1. ToPILImage is neither implemented nor needed, since OpenCV is used instead (ToCVImage). However, the original ToPILImage in ~transforms can still be used to save a tensor as a PIL image if required. Once converted to tensor format, images have RGB channel order in both cases.
  2. OpenCV images are NumPy arrays. OpenCV supports the uint8, int8, uint16, int16, int32, float32 and float64 data types. Certain operations (like cv.CvtColor()) require converting the arrays to an OpenCV type first (with cv.fromarray()).
  3. The affine transform in the original Torchvision only has 5 degrees of freedom. YU-Zhiyang implemented an affine transform with 6 degrees of freedom called RandomAffine6 (found in transforms.py). The original RandomAffine method is also available, reimplemented with OpenCV.
  4. The rotate function rotates clockwise, whereas the original one rotates anticlockwise.
  5. Some new augmentations have been added compared to Torchvision's; refer to the lists below.
  6. The outputs of the OpenCV versions are almost the same as the original ones' (this can be checked by running test.py directly with test images).
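
The data type restriction in item 2 is easy to trip over, because NumPy's default integer type is usually not in the supported list. The following is a minimal sketch of a guard for it. `ensure_cv_dtype` is a hypothetical helper written for this illustration (it is not part of augmeNNt), and the clipping to [0, 255] assumes 8-bit pixel values.

```python
import numpy as np

# Data types listed above as supported by OpenCV.
CV_SUPPORTED_DTYPES = (
    np.uint8, np.int8, np.uint16, np.int16, np.int32, np.float32, np.float64,
)


def ensure_cv_dtype(image: np.ndarray) -> np.ndarray:
    """Return `image` unchanged if its dtype is in the supported list,
    otherwise cast it (hypothetical helper, not part of augmeNNt)."""
    if any(image.dtype == np.dtype(t) for t in CV_SUPPORTED_DTYPES):
        return image
    if np.issubdtype(image.dtype, np.floating):
        # e.g. float16 is widened to float32
        return image.astype(np.float32)
    if np.issubdtype(image.dtype, np.integer):
        # Assumption: 8-bit pixel values, so clip to [0, 255] before narrowing.
        return np.clip(image, 0, 255).astype(np.uint8)
    raise TypeError(f"unsupported image dtype: {image.dtype}")


# np.random.randint without an explicit dtype uses NumPy's default integer
# type, which is int64 on most platforms and therefore not in the list.
image = np.random.randint(0, 256, size=(4, 4, 3))
converted = ensure_cv_dtype(image)
print(image.dtype, "->", converted.dtype)
```

Creating arrays with an explicit `dtype=np.uint8` in the first place avoids the conversion entirely.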

These are the basic transforms, equivalent to torchvision's:

The additional transforms can be used to train models such as Noise2Noise, BSRGAN, Real-ESRGAN, White-box Cartoonization and EdgeConnect, among others. There are some general augmentations:

Noise augmentations, with options for artificial noises and realistic noise generation:
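
As an illustration of the simplest artificial noise, the sketch below adds zero-mean Gaussian noise to an 8-bit image using NumPy only. The function name `add_gaussian_noise` and its parameters are assumptions made for this example; the transforms in this repository may use different names and options.

```python
import numpy as np


def add_gaussian_noise(image, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma`
    (in pixel units) to a uint8 image. Illustrative sketch only."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=sigma, size=image.shape)
    # Work in floating point so the sum cannot wrap around, then clip
    # back to the valid 8-bit range.
    noisy = image.astype(np.float32) + noise
    return np.clip(noisy, 0, 255).round().astype(np.uint8)


clean = np.full((64, 64, 3), 128, dtype=np.uint8)
noisy = add_gaussian_noise(clean, sigma=15.0, rng=np.random.default_rng(0))
print(noisy.shape, noisy.dtype)
```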

Blurs and the generation and use of different kinds of kernels, including standard blurs, isotropic and anisotropic Gaussian filters, and simple and complex motion blur kernels:
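
To make the difference between isotropic and anisotropic Gaussian kernels concrete, here is a minimal NumPy sketch that builds a rotated anisotropic kernel. The function name and its parameters (`size`, `sigma_x`, `sigma_y`, `theta`) are assumptions for this illustration, not the repository's API. Setting `sigma_x == sigma_y` yields the isotropic case.

```python
import numpy as np


def anisotropic_gaussian_kernel(size=15, sigma_x=3.0, sigma_y=1.0, theta=0.0):
    """Build a normalized 2-D Gaussian kernel with separate standard
    deviations along its two principal axes, rotated by `theta` radians.
    Illustrative sketch only."""
    # Pixel coordinates centered on the middle of the kernel.
    coords = np.arange(size) - (size - 1) / 2.0
    xs, ys = np.meshgrid(coords, coords)
    # Rotate the coordinate frame so the principal axes follow `theta`.
    c, s = np.cos(theta), np.sin(theta)
    xr = c * xs + s * ys
    yr = -s * xs + c * ys
    kernel = np.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
    # Normalize so that filtering preserves overall brightness.
    return kernel / kernel.sum()


kernel = anisotropic_gaussian_kernel(size=15, sigma_x=3.0, sigma_y=1.0, theta=np.pi / 4)
print(kernel.shape, kernel.sum())
```

A kernel like this can then be applied with any 2-D convolution routine.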

Filters to modify the images, including color quantization, superpixel segmentation and CLAHE:
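
As a rough illustration of what color quantization does, the sketch below reduces each channel of an 8-bit image to a fixed number of evenly spaced levels. This is plain uniform quantization, chosen here for brevity; it is an assumption for the example and not necessarily the method this repository implements. `quantize_uniform` is a hypothetical name.

```python
import numpy as np


def quantize_uniform(image, levels=4):
    """Map every channel value of a uint8 image to the center of one of
    `levels` equally wide bins. Illustrative sketch only."""
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    step = 256 / levels
    # Index of the bin each value falls into: 0 .. levels - 1.
    bins = np.floor(image.astype(np.float32) / step)
    # Replace every value with the center of its bin.
    centers = (bins + 0.5) * step
    return centers.astype(np.uint8)


gradient = np.arange(256, dtype=np.uint8).reshape(16, 16)
quantized = quantize_uniform(gradient, levels=4)
print(np.unique(quantized))
```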

Edge filters:

Requirements

Optional requirements

In order to use the additional Superpixel options (skimage SLIC and Felzenszwalb algorithms), segments reduction algorithms (selective search and RAG merging), the Menon demosaicing algorithm and the sinc filter, there are additional requirements:

Usage

  1. git clone https://github.com/victorca25/augmennt.git .
  2. Add augmennt to your Python path.
  3. Add from augmennt import augmennt as transforms to your Python file.
  4. From here, almost everything should work exactly like the original transforms.
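
If you prefer not to change environment variables, step 2 can also be done from within a script. The sketch below assumes that the directory to add is the one that contains the augmennt package folder; the path `/path/to/clone` is a placeholder and `add_to_python_path` is a hypothetical helper.

```python
import sys
from pathlib import Path

# Placeholder: the directory that contains the `augmennt` package folder.
AUGMENNT_PARENT = Path("/path/to/clone")


def add_to_python_path(directory):
    """Prepend `directory` to sys.path once and return the resolved path."""
    resolved = str(Path(directory).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


add_to_python_path(AUGMENNT_PARENT)
# Once the path above points at a real clone, the import from step 3 works:
# from augmennt import augmennt as transforms
```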

Example: Image resizing

import numpy as np
from augmennt import augmennt as transforms

# Random 8-bit RGB test image (height 1024, width 2048); `high` is exclusive.
image = np.random.randint(low=0, high=256, size=(1024, 2048, 3), dtype=np.uint8)
resize = transforms.Resize(size=(256, 256))
image = resize(image)

This should be 1.5 to 10 times faster than PIL. See the benchmarks in the Performance section.

Example: Composing transformations

transform = transforms.Compose([
   transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 0)),
   transforms.Resize(size=(350, 350), interpolation="BILINEAR"),
   transforms.ToTensor(),
   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
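
Conceptually, composing transformations just means calling each one on the output of the previous one. The sketch below shows that idea with NumPy only. `SimpleCompose`, `horizontal_flip`, `to_float` and `ChannelNormalize` are stand-ins written for this illustration, not the library's classes, and the normalization here works on height x width x channel arrays rather than on tensors.

```python
import numpy as np


class SimpleCompose:
    """Apply a list of callables in order (illustrative stand-in)."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, image):
        for transform in self.transforms:
            image = transform(image)
        return image


def horizontal_flip(image):
    # Reverse the width axis of an H x W x C array.
    return image[:, ::-1]


def to_float(image):
    # Scale 8-bit values to the [0, 1] range.
    return image.astype(np.float32) / 255.0


class ChannelNormalize:
    """Subtract a per-channel mean and divide by a per-channel std."""

    def __init__(self, mean, std):
        self.mean = np.asarray(mean, dtype=np.float32)
        self.std = np.asarray(std, dtype=np.float32)

    def __call__(self, image):
        # Broadcasting applies mean/std along the last (channel) axis.
        return (image - self.mean) / self.std


pipeline = SimpleCompose([
    horizontal_flip,
    to_float,
    ChannelNormalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
white = np.full((2, 2, 3), 255, dtype=np.uint8)
result = pipeline(white)
print(result.shape, result.dtype)
```

Because every step is an ordinary callable, named functions and small classes can be mixed freely in the list.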

More examples can be found in the official PyTorch tutorials.

Attention

The multiprocessing used in PyTorch's DataLoader may have issues with lambda functions (using Lambda in transforms.py) on Windows, as lambda functions can't be pickled (https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled). This issue also happens with Torchvision's Lambda function.

These issues happen when using num_workers > 0 in a PyTorch DataLoader when the transformations are initialized in the class init. They can be prevented either by using proper named functions (not lambdas) when composing the transformations, or by initializing the transformations in the DataLoader call instead.
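
The underlying limitation can be reproduced with the standard library alone. The sketch below checks whether an object survives `pickle.dumps`; `is_picklable` and `halve` are names made up for this illustration. When run as a script, the named module-level function is pickled by reference and succeeds, while the equivalent lambda fails.

```python
import pickle


def is_picklable(obj):
    """Return True if `obj` can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def halve(x):
    """Named, module-level function: pickle stores it by its name."""
    return x / 2


# Same behavior, but a lambda has no importable name to refer to.
halve_lambda = lambda x: x / 2

print("named function picklable:", is_picklable(halve))
print("lambda picklable:", is_picklable(halve_lambda))
```

Replacing each lambda in a transform pipeline with a function like `halve` is therefore usually enough to make the pipeline usable with worker processes.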

Performance

The following are the performance tests as executed by jbohnslav.

[Benchmark plots: resize, random crop, change brightness, change brightness and contrast, change contrast only, random horizontal flips]

The differences start to add up when you compose multiple transformations together.

[Benchmark plot: composed transformations]

cv2 is around three times faster than regular Pillow (PIL), as shown in this article.

Additionally, the Albumentations project, mostly based on NumPy and OpenCV, has also shown better performance than other options, including torchvision with a fast Pillow-SIMD backend.

However, Pillow-SIMD can be faster in some cases, as tested in this article.
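
Since results vary between backends and machines, it can be worth measuring on your own hardware. The following is a minimal timing sketch using the standard `timeit` module. `time_transform` is a hypothetical helper, and the two NumPy flips are placeholders for whichever transforms you want to compare (for example a PIL-based one and an OpenCV-based one).

```python
import timeit

import numpy as np


def time_transform(fn, image, repeats=5, number=20):
    """Return the best average time in seconds for one call of `fn(image)`."""
    timings = timeit.repeat(lambda: fn(image), repeat=repeats, number=number)
    # The minimum over repeats is the measurement least disturbed by
    # other activity on the machine.
    return min(timings) / number


image = np.random.randint(0, 256, size=(512, 512, 3), dtype=np.uint8)

# Placeholder candidates: swap in the transforms you actually want to compare.
candidates = {
    "flip via slicing": lambda img: np.ascontiguousarray(img[:, ::-1]),
    "flip via np.flip": lambda img: np.ascontiguousarray(np.flip(img, axis=1)),
}

for name, fn in candidates.items():
    print(f"{name}: {time_transform(fn, image) * 1e3:.3f} ms per call")
```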

Alternatives

There are multiple image augmentation and manipulation frameworks available, each with its own strengths and limitations. Some of these alternatives are:

Postscript