🏷 Open Source Data Annotation & Labeling Tools

At ZenML we believe that annotation and labeling workflows are a core part of the machine learning lifecycle. As maintainers of an open-source tool ourselves, we wanted to highlight and recognize the variety of tools available to help make your workflows more data-centric. We had three core criteria to decide whether a particular tool could make it into the list:

We welcome contributions to this list, so if you know of a tool that we've missed or if you've built one yourself, please do create a PR!

🔥 Do you use these tools or do you want to add one to your MLOps stack? At ZenML, we are looking for design partnerships and collaboration to develop the integrations and workflows around using annotation within the MLOps lifecycle. If you'd like to learn more, please join our Slack and leave us a message!

Multi Modal / Multi Domain

| Name | Description | License |
| --- | --- | --- |
| Acharya | A Data Centric MLOps tool for your Named Entity Recognition projects | ? |
| Adala | An Autonomous Data (Labeling) Agent framework. | Apache-2 |
| Classifai | A comprehensive open-source data annotation platform | Apache-2 |
| Computer Vision Annotation Tool (CVAT) | A free, online, interactive video and image annotation tool for computer vision | MIT |
| Data Annotator for Machine Learning (DAML) | An application that helps machine learning teams facilitating the creation and management of annotations | Apache-2 |
| DataGym | Open source annotation and labeling tool for image and video assets | MIT |
| Diffgram | Training Data (Data Labeling, Annotation, Workflow) for all Data Types (Image, Video, 3D, Text, Geo, Audio, more) at scale | ELv2 |
| Hover | Explore and label on a map of raw data. Handles text, audio and images. | MIT |
| Label Studio | A multi-type data labeling and annotation tool with standardized output format | Apache-2 |
| Pigeon | A simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort of your Jupyter notebook | Apache-2 |
| QSL: Quick and Simple Labeler | A quick and simple tool for labeling images, videos and time series data, right from Jupyter | MIT |
| Shoonya | Platform to Annotate and label data at scale | MIT |
| Tator | Video analytics web platform | AGPL-3 |
| TornadoAi | A human-in-the-loop machine learning framework | AGPL-3 |
| Universal Data Tool | A web/desktop app for editing and annotating images, text, audio, documents and to view and edit any data defined in the extensible .udt.json and .udt.csv standard | MIT |
| VGG Image Annotator (VIA) | A standalone image annotator application packaged as a single HTML file (< 400 KB) that runs on most modern web browsers | BSD-2 |
| VIAME | Video and Image Analytics for Multiple Environments | Custom |
| Xtreme1 | An all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM | Apache-2 |

Text

| Name | Description | License |
| --- | --- | --- |
| Annotation Lab | An NLP annotation tool included in spark-nlp | Apache-2 |
| Argilla | A production-ready Python framework for exploring, annotating, and managing data in NLP projects | Apache-2 |
| bulk | Bulk is a quick developer tool to apply some bulk labels | MIT |
| CoreNLP | A Java suite of core NLP tools | GPL-3 |
| DataQA | Labeling platform for text using weak supervision | GPL-3 |
| doccano | An open source text annotation tool supporting text classification, sequence labeling and sequence to sequence tasks | MIT |
| FLAT - FoLiA Linguistic Annotation Tool | A web-based linguistic annotation environment based around the FoLiA format, an XML-based format for linguistic annotation | GPL-3 |
| INCEpTION | A semantic annotation platform offering intelligent annotation assistance and knowledge management | Apache-2 |
| knodle | Knodle (Knowledge-supervised Deep Learning Framework) | Apache-2 |
| Markup | A web-based document annotation tool, powered by GPT-4 | Unknown |
| NER Annotator for Spacy | NER Annotator for SpaCy allows you to create training data for creating a custom NER Model with custom tags. | MIT |
| NPLM | Noisy Partial Label Model (NPLM) | N/A |
| Potato | An annotation framework with 20+ templates, editable UI, quality control, data management and an option to add a survey for crowdsourcing | PolyForm Shield |
| refinery | The data scientist's open-source choice to scale, assess and maintain natural language data. | Apache-2 |
| Slate | A Super-Lightweight Annotation Tool for Experts: Label text in a terminal with just Python | ISC |
| SMART | A tool for building labeled training datasets for supervised machine learning tasks in NLP | MIT |
| SpaCy annotator | Spacy NER annotator using ipywidgets | N/A |
| Small-Text | Active Learning for Text Classification | MIT |
| Snorkel | Programmatically Build and Manage Training Data | Apache-2 |
| skweak | skweak: Weak supervision for NLP | MIT |
| TALEN | A way to do annotations for NER | Custom |
| Theme | Minimalistic CLI labeling tool for text classification | MIT |
| YEDDA | A lightweight collaborative text span annotation tool | Apache-2 |
| WeaSEL | WeaSEL: Weakly Supervised End-to-end Learning | Apache-2 |

Images

| Name | Description | License |
| --- | --- | --- |
| 3D Slicer | Visualization, processing, segmentation, registration, and analysis of medical, biomedical, and other 3D images and meshes | BSD |
| Annotate Lab | Simplifying Image Annotation | MIT |
| Annotorious | A JavaScript library for image annotation | BSD-3 |
| AnyLabeling | Effortless AI-assisted data labeling with AI support from YOLO, Segment Anything, MobileSAM | GPL-3 |
| autodistill | Images to inference with no labeling (use foundation models to train supervised models) | Apache-2 |
| bbox-visualizer | Make drawing and labeling bounding boxes easy as cake | MIT |
| Bounding Box Editor | A JavaFX desktop application for creating image-object-annotations with bounding boxes | GPL-3 |
| CATMAID | The Collaborative Annotation Toolkit for Massive Amounts of Image Data | GPL-3 |
| COCO Annotator | A web-based image segmentation tool for object detection, localization, and keypoints | MIT |
| DeepLabel | A cross-platform desktop image annotation tool for machine learning | MIT |
| ilastik | Segment, classify, track and count your cells or other experimental data | Custom |
| ImageTagger | An open source online platform for collaborative image labeling | MIT |
| imglab | A web based tool to label images for objects that can be used to train dlib or other object detectors | MIT |
| KNOSSOS | A software tool for the visualization and annotation of 3D image data and was developed for the rapid reconstruction of neural morphology and connectivity | GPL-2 |
| labelCloud | A lightweight tool for labeling 3D bounding boxes in point clouds | GPL-3 |
| LabelFlow | An open platform for image labeling | Custom |
| labelme | Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation) | Custom |
| LabelImg | A graphical image annotation tool and label object bounding boxes in images | MIT |
| LOST | A flexible web-based framework for semi-automatic image annotation | MIT |
| Make Sense | A free-to-use online tool for labeling photos | GPL-3 |
| MyVision | Computer vision based ML training data generation tool | GPL-3 |
| OHIF Medical Imaging Viewer | OHIF zero-footprint DICOM viewer and oncology specific Lesion Tracker | MIT |
| OpenLabeler | An open source desktop application for annotating objects for AI applications | Apache-2 |
| Pixano | A web-based smart-annotation tool for computer vision applications | CeCILL-C |
| Scalabel | A web-based visual data annotation tool, supporting both 2D and 3D data labeling | Apache-2 |
| webKnossos | A fully cloud- and browser-based 3D annotation tool for distributed large-scale data analysis in light- and electron-microscopy based Connectomics | AGPL-3 |
| Yolo_Label | GUI for marking bounded boxes of objects in images for training neural network YOLO | MIT |

Video

| Name | Description | License |
| --- | --- | --- |
| DIVE | Media annotation and analysis tools for web and desktop | Apache-2 |
| UltimateLabeling | A multi-purpose Video Labeling GUI in Python with integrated SOTA detector and tracker | MIT |

Audio

| Name | Description | License |
| --- | --- | --- |
| aubio | A library for audio and music analysis | GPL-3 |
| audino | Open source audio annotation tool | MIT |
| Praat | Annotation tool for phonetics analysis | GPL-3 |
| Peaks.js | JavaScript UI component for interacting with audio waveforms | LGPL-3 |
| Wavesurfer.js | Navigable waveform built on Web Audio and Canvas | BSD-3 |

Time Series

| Name | Description | License |
| --- | --- | --- |
| sktime | A framework for machine learning with time series | BSD-3 |

Other

| Name | Description | License |
| --- | --- | --- |
| Compose | Automated prediction engineering. Allows you to easily structure prediction problems and generate labels for supervised learning | BSD-3 |
| Encord Active | Toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling | Apache-2 |
| NeuroTrALE | Annotation software for brain mapping, supporting 3D imaging and annotation | BSD-2 |
| OpenCRAVAT | A modular annotation tool for genomic variants | MIT |
| PatchSorter | An open-source digital pathology tool for histologic object labeling | BSD-3 |
| Personal Cancer Genome Reporter (PCGR) | A stand-alone software package for translation of individual tumor genomes for precision cancer medicine | MIT |
| Quepid | Gather Human Judgements (aka Explicit Ratings) for Search Quality. Also a safe space to play with your search algorithm. | Apache-2 |

Acknowledgements

Thanks to the creators of these other repositories (and this one!) for getting us going down the path of creating our own. I used these efforts to get started in my survey of the space before adding, updating and pruning as per the open-source and other criteria specified above.