Waste datasets review

A list of datasets with any kind of litter, garbage, waste and trash. Created during the detectwaste.ml project.

Today, more than 300 million tons of plastic are produced annually. Plastic is everywhere and we constantly use it in our daily life.

The idea of the Detect Waste project is to use artificial intelligence to detect plastic waste in the environment. Our solution will be applicable to video and photography. Our goal is to use AI for Good.

Visit majsylw/litter-detection-review to see a broader review of papers, projects and other resources concerning the problem of litter in the environment.

Contributing

Feel free to open an issue with a short description of a new dataset, or create a pull request that adds the new dataset to the table or fills in a missing description.

Summary

| Name | No. categories | No. subcategories | No. images | Annotation | Comment | Website | License | Description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TrashCan 1.0 | 3 | 34 | 7 212 | Instance-Segmentation | Underwater images | website | Free for academic teaching/research use, must obtain JAMSTEC permission for commercial use. | :heavy_check_mark: |
| Trash-ICRA19 | 3 | 34 | 5 700 | Detection | Underwater images | website | Free for academic teaching/research use, must obtain JAMSTEC permission for commercial use. | :heavy_check_mark: |
| TACO | 28 | 60 | 1 500 | Segmentation | Waste in the wild | website | MIT license | :heavy_check_mark: |
| TACO bboxes | 7 | 60 | WIP | Detection | Waste in the wild | WIP | ? | :heavy_check_mark: |
| UAVVaste | 1 | - | 772 | Segmentation | Drone dataset | github | Apache license | :heavy_check_mark: |
| Trashnet | 6 | - | 2 527 | Classification | Clear background | github | MIT license | :heavy_check_mark: |
| WaDaBa | 8 | color, size, shape, or material | 4 000 | Classification | Plastic dataset, clear background | website | ? | :heavy_check_mark: |
| GLASSENSE-VISION | 7 | 136 | 2 000 | Classification | Home-supplies, clear background | website | ? | :heavy_check_mark: |
| Waste Classification data | 2 | - | ~25 000 | Classification | Scraped from google search | kaggle | CC BY-SA 4.0 | :heavy_check_mark: |
| Waste Classification Data v2 | 3 | - | ~27 500 | Classification | Scraped from google search | kaggle | CC BY-SA 4.0 | :heavy_check_mark: |
| Waste Images from Sushi Restaurant | 16 | - | 500 | Classification | Clear background | kaggle | Database: Open Database, Contents: © Original Authors | :heavy_check_mark: |
| Open litter map | 11 | 187 | > 100k | Multilabel classification | Waste in the wild | website | ? | :heavy_check_mark: |
| Litter | 24 | size, shape, or material | ~14 000 | Detection | Waste in the wild, paid license | website | ? | :heavy_check_mark: |
| Drinking Waste Classification | 4 | - | 9640 | Detection | Clear background, (cans and bottles) | kaggle | CC0: Public Domain | :heavy_check_mark: |
| waste_pictures | 34 | - | ~24 000 | Classification | Scraped from google search | kaggle | Unknown | :heavy_check_mark: |
| spotgarbage | 3 | - | ~2 400 | Classification | Scraped from Bing search | kaggle<br> github | CC0: Public Domain | :heavy_check_mark: |
| DeepSeaWaste | 5 | - | 3 055 | Classification | Underwater images | kaggle | Unknown | :heavy_check_mark: |
| MJU-Waste v1.0 | 1 | - | 2475 | Segmentation | Plain background, indoor RGBD images | github | MIT license | :heavy_check_mark: |
| Domestic Trash Dataset | 10 | - | > 9000 | Classification/Detection | Waste in the wild, paid license, 250 images for free | github | ? | :heavy_check_mark: |
| Cigarette butt dataset | 1 | - | 2200 | Detection | Waste in the wild, synthetic images | website | Non-Commercial, Educational License Agreement | :heavy_check_mark: |
| TrashBox | 7 | 25 | 17785 | Classification/Detection | Scraped from web | github | ? | :heavy_check_mark: |
| PortlandStateSingh | 5 | - | 11500 | Classification/Detection | Original photos | website | ? | |
| TIDY | 9 | - | 304 | Classification | Original photos | github | MIT license | |

Description

TrashCan 1.0

An Instance-Segmentation Labeled Dataset of Trash Observations

7212 images under 3 main categories: bio, trash, unknown. Categories:

Trash-ICRA19:

A Bounding Box Labeled Dataset of Underwater Trash

5,700 underwater images extracted from video. https://jungseokhong.github.io/

Download: Directly from website https://conservancy.umn.edu/handle/11299/214366

TACO

Open dataset with 1500 images from 28 categories and 60 detailed sub-categories of waste in the wild. Annotations available in COCO-json.

Download: Directly from website http://tacodataset.org/
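Once the annotations are downloaded, a quick way to inspect a COCO-style JSON file is to count annotations per category. The snippet below is a minimal sketch that relies only on the standard COCO keys (`images`, `categories`, `annotations`); the sample values and the `annotations.json` path are assumptions made for illustration, not the actual TACO content.

```python
import json
from collections import Counter


def count_annotations_per_category(coco: dict) -> Counter:
    """Count annotations per category name in a COCO-style dictionary."""
    # Map category id -> human-readable name.
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


# Tiny in-memory stand-in for a real annotation file (hypothetical values).
sample = {
    "images": [{"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "Bottle"}, {"id": 2, "name": "Can"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 80]},
        {"id": 11, "image_id": 1, "category_id": 2, "bbox": [100, 40, 30, 30]},
        {"id": 12, "image_id": 1, "category_id": 1, "bbox": [200, 60, 45, 90]},
    ],
}

counts = count_annotations_per_category(sample)
print(counts.most_common())

# With a downloaded file, load it first (the path is an assumption):
# with open("annotations.json", encoding="utf-8") as f:
#     counts = count_annotations_per_category(json.load(f))
```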

TACO bboxes

Additional hand-labelled annotations for images from TACO dataset. There are seven recognized waste categories:

Read more about it in the paper Deep learning-based waste detection in natural and urban environments.

Download: Directly from detect waste repository

UAVVaste

Drone rubbish detection intelligent technology.

The UAVVaste dataset consists to date of 772 images and 3716 annotations. The main motivation for creating the dataset was the lack of domain-specific data in the datasets that are widely used for object detection evaluation and benchmarking. The dataset is publicly available and is intended to be expanded.

Available annotations for detection and segmentation: https://github.com/UAVVaste/UAVVaste

Download: Directly from annotations json on github https://github.com/UAVVaste/UAVVaste

Trashnet

The dataset spans six classes: glass, paper, cardboard, plastic, metal, and trash. Currently, the dataset consists of 2527 images:

Download: Directly from github https://github.com/garythung/trashnet

Also known as Garbage Classification Data

The Garbage Classification Dataset contains 2467 images from 6 categories: cardboard (393), glass (491), metal (400), paper (584), plastic (472) and trash (127).

Download: Directly from kaggle https://www.kaggle.com/asdasdasasdas/garbage-classification
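The per-class counts listed for the Garbage Classification Dataset are uneven, which is worth checking before training a classifier. The sketch below only redoes the arithmetic on the numbers quoted above: it sums the counts and prints each class's share of the total.

```python
# Per-class image counts as listed above.
class_counts = {
    "cardboard": 393,
    "glass": 491,
    "metal": 400,
    "paper": 584,
    "plastic": 472,
    "trash": 127,
}

total = sum(class_counts.values())
shares = {name: n / total for name, n in class_counts.items()}

# Print classes from the largest share to the smallest.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {class_counts[name]:>4}  {share:6.1%}")
print(f"{'total':<10} {total:>4}")
```

The counts add up to the stated 2467 images, and "trash" is by far the smallest class.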

Plastic Waste DataBase of Images – WaDaBa

4000 images with a detailed description of the plastic type (PET, PP, PE-HD...), object color, deformation level, dirtiness and others. [classification]

Each object was placed on the research stand and photographed under a first and a second type of light. For every object, a series of 10 photographs was taken, differing in the angle of rotation about the vertical axis. The object was then damaged to varying degrees (small, medium and large), and 10 photographs were taken for each degree of damage. Counting all variants, 40 photographs were taken of every object; multiplied by the number of objects, this gives the 4 000 photographs in the database.

Download: Images free-to-download directly from website. Annotations available after signing license http://wadaba.pcz.pl/#download

GLASSENSE-VISION

Home-supplies classification. It is not strictly a litter dataset, but it gathers over 2000 images of objects well separated from the background. It covers 7 main categories (Banknotes, Cereals, Medicines, Cans, Tomato sauces, Water bottle, Deodorant stick) and 136 subcategories.

Glassense-Vision is a set of data we acquired and annotated for the purpose of providing a quantitative and repeatable assessment of the proposed method. The dataset includes 7 different use cases, meaning different object categories, where for each one of them we provide training (reference images used also to build dictionaries) and test images. All images in the dataset are manually annotated. The different use cases (object categories) can be grouped in three main geometrical types:

Download: http://www.slipguru.unige.it/Data/glassense_vision/

Waste Classification data

Over 25k images, already divided into training data (22564 images) and test data (2513 images). Two main categories: organic and recyclable.

Download: Directly from kaggle https://www.kaggle.com/techsash/waste-classification-data

Waste Classification Data v2

A variation of the Waste Classification data, extended with a new category "N" (nonrecyclable).

Over 25k images, already divided into training data (22564 + 2508 "N" images) and test data (2513 + 397 "N" images). Three main categories: organic (O), recyclable (R) and nonrecyclable (N). The TRAIN folder contains 2508 images in the "N" directory. The TEST folder contains 397 images in the "N" directory.

Download: Directly from kaggle https://www.kaggle.com/sapal6/waste-classification-data-v2
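After unpacking the archive, the split sizes quoted above can be verified by counting files per split and class directory. This is a minimal sketch that assumes a `<root>/<SPLIT>/<CLASS>/` layout such as `TRAIN/N`; the `O` and `R` directory names and the image file suffixes are assumptions, and the demo runs on a small temporary tree rather than the real data.

```python
import tempfile
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed file types


def count_images(root: Path) -> Counter:
    """Count image files per (split, class) under root/<SPLIT>/<CLASS>/."""
    counts = Counter()
    for path in root.glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            split, label = path.parent.parent.name, path.parent.name
            counts[(split, label)] += 1
    return counts


# Build a tiny stand-in directory tree so the example runs without the dataset.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for split, label, n in [("TRAIN", "O", 3), ("TRAIN", "N", 2), ("TEST", "N", 1)]:
        folder = root / split / label
        folder.mkdir(parents=True)
        for i in range(n):
            (folder / f"img_{i}.jpg").touch()
    demo_counts = count_images(root)

print(sorted(demo_counts.items()))
```

Pointing `count_images` at the unpacked dataset root should report 2508 for `("TRAIN", "N")` and 397 for `("TEST", "N")` if the layout matches the description.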

Open litter map

The biggest dataset, with over 100k images in total, 11 main categories and 187 subcategories. [multilabel] [classification] https://openlittermap.com/

Download: Only from json with scraper - detectwaste scraper

Litter

The Litter dataset contains 14k images with 20k annotations (bounding boxes) and 24 classes. Each class represents an object (cup), while subclasses determine its size, shape, or material (long paper cup/short paper cup).

Download: After buying a license https://www.imageannotation.ai/litter-dataset

Drinking Waste Classification

The dataset contains ~10k images grouped into 4 classes of drinking waste: Aluminium Cans, Glass bottles, PET (plastic) bottles and HDPE (plastic) Milk bottles. Pictures were taken with a 12 MP phone camera as part of a final-year Individual Project at University College London. The dataset also uses some of the manually collected images from TrashNet.

Download: Directly from kaggle https://www.kaggle.com/arkadiyhacks/drinking-waste-classification

waste_pictures

The dataset contains ~24k images grouped into 34 classes of waste for classification purposes. The images are divided into train and test subsets.

Download: Directly from kaggle https://www.kaggle.com/wangziang/waste-pictures

spotgarbage - GINI dataset

The Garbage in Images (GINI) dataset contains 2561 images of unspecified resolution, of which 1496 were annotated with bounding boxes (one class: trash). The Bing Image Search API was used to create the dataset.

Download: Directly from github https://github.com/spotgarbage/spotgarbage-GINI

DeepSeaWaste

This dataset consists of ~3k images taken under water and divided into 4 categories. Annotations are provided in a CSV file as:

Download: Directly from kaggle https://www.kaggle.com/henryhaefliger/deepseawaste

MJU-Waste v1.0

This dataset was created by photographing waste items collected from a university campus against a lab background (people hold the waste items in their hands). All images in the dataset were captured using a Microsoft Kinect RGBD camera. All annotations are provided in PASCAL VOC and COCO format.

MJU-Waste v1 contains 2475 co-registered RGB and depth image pairs. Images are randomly split into a training set, a validation set and a test set of 1485, 248 and 742 images, respectively. The authors used a single class label for all waste objects.

Download: From Google Drive link placed on https://github.com/realwecan/mju-waste/
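A random split with the sizes quoted for MJU-Waste (1485 / 248 / 742) can be reproduced in spirit as follows. This is only a sketch: the identifiers and the seed are hypothetical, and it does not recreate the authors' actual split, which should be taken from the dataset itself.

```python
import random


def split_items(items, n_train, n_val, n_test, seed=0):
    """Shuffle items reproducibly and cut them into three disjoint subsets."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must add up to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical identifiers standing in for the 2475 RGB-D image pairs.
pair_ids = [f"pair_{i:04d}" for i in range(2475)]
train, val, test = split_items(pair_ids, 1485, 248, 742)
print(len(train), len(val), len(test))  # 1485 248 742
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without affecting other code that uses the global random generator.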

Domestic Trash Dataset

Domestic Trash Dataset consists of images of common domestic trash objects. Images were captured and crowdsourced under a wide variety of lighting conditions and weather, both indoors and outdoors. This dataset can be used to build trash/litter detection models, eco-friendly alternative suggestions, carbon footprint generation, etc.

Dataset Features

Dataset Format

Download: Images are available for download after buying a license. Contact the authors via the support details at: https://github.com/datacluster-labs/Datacluster-Datasets

Cigarette butt dataset

This dataset consists of a set of 2200 synthetically composed images of cigarettes on the ground. It is designed for training CNNs (convolutional neural networks). You must read and accept the terms of the Non-Commercial, Educational License Agreement to download and use its content.

Dataset Features

Download: Images are available for download after accepting the terms of the Non-Commercial, Educational License Agreement at: https://www.immersivelimit.com/datasets/cigarette-butts

TrashBox dataset

Dataset of trash objects for waste classification and detection (no detection annotations are provided in the repository). Contains 17785 waste object images scraped from the web.

Waste categories are as follows:

  1. Medical waste: Syringes, Surgical Gloves, Surgical Masks, Medicines (Drugs and Pills) [Number of images: 2010]
  2. E-Waste: Electronic chips, Laptops and Smartphones, Appliances, Electric wires, cords and cables [Number of images: 2883]
  3. Plastic: Bags, Bottles, Containers, Cups, Cigarette Butts (which have a plastic filter) [Number of images: 2669]
  4. Paper: Tetra Pak, Newspapers, Paper Cups, Paper Tissues [Number of images: 2695]
  5. Metal: Beverage Cans, Construction Scrap, Spray Cans, Food Grade Cans, Other metal objects [Number of images: 2586]
  6. Glass [Number of images: 2528]
  7. Cardboard [Number of images: 2414]

Download: Images are available for download from the GitHub repository: nikhilvenkatkumsetty/TrashBox

<img src="https://github.com/nikhilvenkatkumsetty/TrashBox/blob/main/Trash_dataset/e-waste/e-waste%201.jpg" width="300"> <img src="https://github.com/nikhilvenkatkumsetty/TrashBox/blob/main/Trash_dataset/plastic/plastic%201001.jpg" width="300"> <img src="https://github.com/nikhilvenkatkumsetty/TrashBox/blob/main/Trash_dataset/medical/medical%201005.jpg" width="300">