
SYNLA+ Dataset

Successor to the original Synthetic Line Art (SYNLA) Dataset.

Improvements:

Large dataset (~10 GB): 65,536 high-quality 256x256 color images
Color gradients for both lines and backgrounds
Improved data augmentation
Color blending in linear light (gamma-correct)
Better resampling with fewer artifacts
Real images used as backgrounds (DIV2K + random anime images)
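The generator code is not public, so the following is only a sketch of what linear (gamma-correct) color blending usually means. It decodes 8-bit sRGB values to linear light with the standard sRGB transfer functions (IEC 61966-2-1), blends there, and re-encodes the result. It assumes sRGB-encoded inputs and simple alpha compositing; the function names are hypothetical and are not taken from the SYNLA+ generator.

```python
def srgb_to_linear(c: float) -> float:
    """Decode one sRGB channel value in [0, 1] to linear light (IEC 61966-2-1)."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Encode one linear-light channel value in [0, 1] back to sRGB."""
    if c <= 0.0031308:
        return c * 12.92
    return 1.055 * c ** (1 / 2.4) - 0.055


def blend_linear(fg, bg, alpha: float):
    """Alpha-blend two 8-bit sRGB colors in linear light.

    fg, bg: (r, g, b) tuples of ints in 0..255.
    alpha: opacity of the foreground, in [0, 1].
    Returns an (r, g, b) tuple of ints in 0..255.
    """
    out = []
    for f, b in zip(fg, bg):
        # Mix the decoded (linear) values, then re-encode to sRGB.
        lin = alpha * srgb_to_linear(f / 255) + (1 - alpha) * srgb_to_linear(b / 255)
        out.append(round(linear_to_srgb(lin) * 255))
    return tuple(out)


def blend_naive(fg, bg, alpha: float):
    """Blend directly on gamma-encoded values (the approach linear blending replaces)."""
    return tuple(round(alpha * f + (1 - alpha) * b) for f, b in zip(fg, bg))
```

For example, a black line drawn at 50% opacity over a white background gives `(128, 128, 128)` with `blend_naive` but `(188, 188, 188)` with `blend_linear`. The naive result is darker, which is the kind of darkening around anti-aliased or semi-transparent strokes that blending in linear light is meant to avoid.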


Fair Use

<ins>The background source images used to generate this dataset may or may not be copyrighted</ins>; however, their use is justified by:
Canadian Copyright Act (R.S.C., 1985, c. C-42), section 29: Fair dealing for the purpose of research, private study, education, parody or satire does not infringe copyright. (Country of issue)

U.S. Code, Title 17 - Copyrights (17 U.S.C. § 107): [...] the fair use of a copyrighted work, [...] for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. (Country of provider)

The use is for nonprofit educational and research purposes.
The original images cannot be recovered from this dataset without significant effort and redrawing, so the dataset is not representative of the original works.
Care was taken to use as little of each copyrighted work as possible. The goal is high variance in image content, so only a very small portion of each of many individual works was used.
There are no public alternatives for high-quality line art datasets.
Distributing the original works as intact images is easier than distributing them through this dataset. The dataset's distribution has minimal impact on the original works, and the dataset does not facilitate or promote unauthorized distribution of the originals.
Because the dataset obscures large portions of the original images, any negative financial or market impact on the artists/creators is minimal.

Description

This dataset is designed to simulate complex line art. It is useful for training machine learning models that perform any of the following tasks:

Super-Resolution/Deblurring
Denoising
Artifact removal (de-ringing, non-Gaussian degradation, etc.)
Inpainting
User-Guided Colorization
Style Transfer
And more...
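Most of these tasks are trained on (degraded input, clean target) pairs derived from the clean images. The sketch below shows one way such pairs could be produced for super-resolution and denoising. It is not part of the dataset or its generator: the single-channel list-of-rows image representation, the box-filter downsampling, the additive Gaussian noise, and all function names are illustrative assumptions (a real pipeline would more likely use NumPy or Pillow and apply the degradation per color channel).

```python
import random

# Hypothetical representation: one color channel as rows of floats in [0, 1].
Image = list[list[float]]


def box_downsample(img: Image, factor: int) -> Image:
    """Downscale by averaging non-overlapping factor x factor blocks."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w // factor)
        ]
        for y in range(h // factor)
    ]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def make_pair(clean: Image, task: str, rng: random.Random,
              factor: int = 2, sigma: float = 0.05):
    """Return a (degraded_input, clean_target) pair for the given task."""
    if task == "super_resolution":
        return box_downsample(clean, factor), clean
    if task == "denoising":
        return add_gaussian_noise(clean, sigma, rng), clean
    raise ValueError(f"unsupported task: {task}")
```

Applied to a 256x256 image with `factor=2`, `make_pair(img, "super_resolution", random.Random(0))` yields a 128x128 input and the original 256x256 image as target. Keeping degradations as separate functions makes it easy to chain them (for example, downsample and then add noise) for blind restoration tasks.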

Most line art is copyrighted and licensed, and relying on private datasets hinders reproducibility of results. This dataset offers an open alternative and is released under the MIT license.

Three color datasets are available. The full dataset contains 65,536 (2^16) images of size 256x256. All images were generated from the images in the /Generator_Images folder, which is also public, allowing custom generation. Smaller preview datasets (1,024 and 4,096 images) are also available. They are mutually exclusive with the full dataset and can be used as validation/test datasets.
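Before using a preview set for validation, it can be worth confirming locally that it shares no files with the training set. The sketch below compares SHA-256 digests of the files in two folders using only the standard library. The folder names and the `*.png` file pattern are hypothetical assumptions about how the downloaded datasets are laid out; byte-level hashing also only detects exact duplicates, not re-encoded copies of the same image.

```python
import hashlib
from pathlib import Path


def content_hashes(folder: Path, pattern: str = "*.png") -> set[str]:
    """SHA-256 digests of every file matching pattern directly inside folder."""
    return {hashlib.sha256(p.read_bytes()).hexdigest() for p in folder.glob(pattern)}


def shared_files(full_dir: str, preview_dir: str, pattern: str = "*.png") -> set[str]:
    """Digests present in both folders; an empty set means no exact duplicates."""
    return content_hashes(Path(full_dir), pattern) & content_hashes(Path(preview_dir), pattern)
```

For example, `shared_files("SYNLA_Plus_65536", "SYNLA_Plus_4096")` (hypothetical folder names) should return an empty set if the two sets are disjoint at the file level.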

The code used to generate the images is not yet public.