🏷 Open Source Data Annotation & Labeling Tools
At ZenML we believe that annotation and labeling workflows are a core part of the machine learning lifecycle. As maintainers of an open-source tool ourselves, we want to highlight and recognize the variety of tools available to help your workflows become more data-centric. We used three core criteria to decide whether a particular tool could make it into the list:
- The tool has an open-source licence.
- The tool is actively maintained.
- The tool is functional and fit for purpose.
We welcome contributions to this list, so if you know of a tool that we've missed or if you've built one yourself, please do create a PR!
🔥 Do you use these tools or do you want to add one to your MLOps stack? At ZenML, we are looking for design partnerships and collaboration to develop the integrations and workflows around using annotation within the MLOps lifecycle. If you'd like to learn more, please join our Slack and leave us a message!
Multi Modal / Multi Domain
Name | Description | License |
---|---|---|
Acharya | A Data Centric MLOps tool for your Named Entity Recognition projects | ? |
Adala | An Autonomous Data (Labeling) Agent framework. | Apache-2 |
Classifai | A comprehensive open-source data annotation platform | Apache-2 |
Computer Vision Annotation Tool (CVAT) | A free, online, interactive video and image annotation tool for computer vision | MIT |
Data Annotator for Machine Learning (DAML) | An application that helps machine learning teams create and manage annotations | Apache-2 |
DataGym | Open source annotation and labeling tool for image and video assets | MIT |
Diffgram | Training Data (Data Labeling, Annotation, Workflow) for all Data Types (Image, Video, 3D, Text, Geo, Audio, more) at scale | ELv2 |
Hover | Explore and label on a map of raw data. Handles text, audio and images. | MIT |
Label Studio | A multi-type data labeling and annotation tool with standardized output format | Apache-2 |
Pigeon | A simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort of your Jupyter notebook | Apache-2 |
QSL: Quick and Simple Labeler | A quick and simple tool for labeling images, videos and time series data, right from Jupyter | MIT |
Shoonya | Platform to Annotate and label data at scale | MIT |
Tator | Video analytics web platform | AGPL-3 |
TornadoAi | A human-in-the-loop machine learning framework | AGPL-3 |
Universal Data Tool | A web/desktop app for editing and annotating images, text, audio, documents and to view and edit any data defined in the extensible .udt.json and .udt.csv standard | MIT |
VGG Image Annotator (VIA) | A standalone image annotator application packaged as a single HTML file (< 400 KB) that runs on most modern web browsers | BSD-2 |
VIAME | Video and Image Analytics for Multiple Environments | Custom |
Xtreme1 | An all-in-one data labeling and annotation platform for multimodal data training that supports 3D LiDAR point clouds, images, and LLMs | Apache-2 |
Text
Name | Description | License |
---|---|---|
Annotation Lab | An NLP annotation tool included in spark-nlp | Apache-2 |
Argilla | A production-ready Python framework for exploring, annotating, and managing data in NLP projects | Apache-2 |
bulk | Bulk is a quick developer tool to apply some bulk labels | MIT |
CoreNLP | A Java suite of core NLP tools | GPL-3 |
DataQA | Labeling platform for text using weak supervision | GPL-3 |
doccano | An open source text annotation tool supporting text classification, sequence labeling and sequence to sequence tasks | MIT |
FLAT - FoLiA Linguistic Annotation Tool | A web-based linguistic annotation environment based around the FoLiA format, an XML-based format for linguistic annotation | GPL-3 |
INCEpTION | A semantic annotation platform offering intelligent annotation assistance and knowledge management | Apache-2 |
knodle | Knodle (Knowledge-supervised Deep Learning Framework) | Apache-2 |
Markup | A web-based document annotation tool, powered by GPT-4 | Unknown |
NER Annotator for Spacy | NER Annotator for SpaCy allows you to create training data for creating a custom NER Model with custom tags. | MIT |
NPLM | Noisy Partial Label Model(NPLM) | N/A |
Potato | An annotation framework with 20+ templates, editable UI, quality control, data management and an option to add a survey for crowdsourcing | PolyForm Shield |
refinery | The data scientist's open-source choice to scale, assess and maintain natural language data. | Apache-2 |
Slate | A Super-Lightweight Annotation Tool for Experts: Label text in a terminal with just Python | ISC |
SMART | A tool for building labeled training datasets for supervised machine learning tasks in NLP | MIT |
SpaCy annotator | Spacy NER annotator using ipywidgets | N/A |
Small-Text | Active Learning for Text Classification | MIT |
Snorkel | Programmatically Build and Manage Training Data | Apache-2 |
skweak | skweak: Weak supervision for NLP | MIT |
TALEN | A way to do annotations for NER | Custom |
Theme | Minimalistic CLI labeling tool for text classification | MIT |
YEDDA | A lightweight collaborative text span annotation tool | Apache-2 |
WeaSEL | WeaSEL: Weakly Supervised End-to-end Learning | Apache-2 |
Images
Name | Description | License |
---|---|---|
3D Slicer | Visualization, processing, segmentation, registration, and analysis of medical, biomedical, and other 3D images and meshes | BSD |
Annotate Lab | Simplifying Image Annotation | MIT |
Annotorious | A JavaScript library for image annotation | BSD-3 |
AnyLabeling | Effortless AI-assisted data labeling with AI support from YOLO, Segment Anything, MobileSAM | GPL-3 |
autodistill | Images to inference with no labeling (use foundation models to train supervised models) | Apache-2 |
bbox-visualizer | Make drawing and labeling bounding boxes easy as cake | MIT |
Bounding Box Editor | A JavaFX desktop application for creating image-object-annotations with bounding boxes | GPL-3 |
CATMAID | The Collaborative Annotation Toolkit for Massive Amounts of Image Data | GPL-3 |
COCO Annotator | A web-based image segmentation tool for object detection, localization, and keypoints | MIT |
DeepLabel | A cross-platform desktop image annotation tool for machine learning | MIT |
ilastik | Segment, classify, track and count your cells or other experimental data | Custom |
ImageTagger | An open source online platform for collaborative image labeling | MIT |
imglab | A web based tool to label images for objects that can be used to train dlib or other object detectors | MIT |
KNOSSOS | A software tool for the visualization and annotation of 3D image data, developed for the rapid reconstruction of neural morphology and connectivity | GPL-2 |
labelCloud | A lightweight tool for labeling 3D bounding boxes in point clouds | GPL-3 |
LabelFlow | An open platform for image labeling | Custom |
labelme | Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation) | Custom |
LabelImg | A graphical image annotation tool for labeling object bounding boxes in images | MIT |
LOST | A flexible web-based framework for semi-automatic image annotation | MIT |
Make Sense | A free-to-use online tool for labeling photos | GPL-3 |
MyVision | Computer vision based ML training data generation tool | GPL-3 |
OHIF Medical Imaging Viewer | OHIF zero-footprint DICOM viewer and oncology specific Lesion Tracker | MIT |
OpenLabeler | An open source desktop application for annotating objects for AI applications | Apache-2 |
Pixano | A web-based smart-annotation tool for computer vision applications | CeCILL-C |
Scalabel | A web-based visual data annotation tool, supporting both 2D and 3D data labeling | Apache-2 |
webKnossos | A fully cloud- and browser-based 3D annotation tool for distributed large-scale data analysis in light- and electron-microscopy based Connectomics | AGPL-3 |
Yolo_Label | GUI for marking bounding boxes of objects in images for training YOLO neural networks | MIT |
Video
Name | Description | License |
---|---|---|
DIVE | Media annotation and analysis tools for web and desktop | Apache-2 |
UltimateLabeling | A multi-purpose Video Labeling GUI in Python with integrated SOTA detector and tracker | MIT |
Audio
Name | Description | License |
---|---|---|
aubio | A library for audio and music analysis | GPL-3 |
audino | Open source audio annotation tool | MIT |
Praat | Annotation tool for phonetics analysis | GPL-3 |
Peaks.js | JavaScript UI component for interacting with audio waveforms | LGPL-3 |
Wavesurfer.js | Navigable waveform built on Web Audio and Canvas | BSD-3 |
Time Series
Name | Description | License |
---|---|---|
sktime | A framework for machine learning with time series | BSD-3 |
Other
Name | Description | License |
---|---|---|
Compose | Automated prediction engineering. Allows you to easily structure prediction problems and generate labels for supervised learning | BSD-3 |
Encord Active | Toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling | Apache-2 |
NeuroTrALE | Annotation software for brain mapping, supporting 3D imaging and annotation | BSD-2 |
OpenCRAVAT | A modular annotation tool for genomic variants | MIT |
PatchSorter | An open-source digital pathology tool for histologic object labeling | BSD-3 |
Personal Cancer Genome Reporter (PCGR) | A stand-alone software package for translation of individual tumor genomes for precision cancer medicine | MIT |
Quepid | Gather Human Judgements (aka Explicit Ratings) for Search Quality. Also a safe space to play with your search algorithm. | Apache-2 |
Acknowledgements
Thanks to the creators of these other repositories (and this one!) for getting us going down the path of creating our own. We used these efforts to get started in our survey of the space before adding, updating and pruning entries as per the open-source and other criteria specified above.