
<div align="center"> <h1>Tokenize Anything via Prompting</h1>

Ting Pan<sup>1,2*</sup>,   Lulu Tang<sup>2*</sup>,   Xinlong Wang<sup>2&#8224;</sup>,   Shiguang Shan<sup>1</sup>

<sup>1</sup>ICT-CAS,   <sup>2</sup>BAAI<br> <sup>*</sup>Equal Contribution, <sup>&#8224;</sup>Project Lead

[[Paper](https://arxiv.org/abs/2312.09128)] [🤗 Demo] <br><br><img src="assets/model_overview.png"/>

</div>

We present Tokenize Anything via Prompting (TAP), a unified, promptable model capable of simultaneously segmenting, recognizing, and captioning arbitrary regions given flexible visual prompts (point, box, and sketch). The model is trained with exhaustive segmentation masks sourced from SA-1B, coupled with semantic priors from a pre-trained EVA-CLIP model with 5 billion parameters.

## Installation

### Preliminaries

- torch >= 2.1
- flash-attn >= 2.3.3 (for TextGeneration)
- gradio-image-prompter (for GradioApp, install from URL)
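
One possible way to set up the first two prerequisites (a sketch only; pick the PyTorch build that matches your CUDA toolkit, and install `gradio-image-prompter` from its URL as noted above):

```bash
# PyTorch >= 2.1 (choose the wheel/index appropriate for your CUDA version).
pip install "torch>=2.1"

# flash-attn builds CUDA extensions and is only needed for text generation.
pip install "flash-attn>=2.3.3" --no-build-isolation
```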

### Installing Package

Clone this repository to local disk and install:

```bash
cd tokenize-anything && pip install .
```

You can also install from the remote repository:

```bash
pip install git+ssh://git@github.com/baaivision/tokenize-anything.git
```
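
An optional sanity check after installation; it only verifies that the package and the registry used in the Quick Start below import cleanly:

```bash
python -c "from tokenize_anything import model_registry; print('tokenize-anything is importable')"
```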

## Quick Start

### Development

The TAP models can be used for diverse vision and language tasks.

We adopt a modular design that decouples all components and predictors.

As a best practice, implement your custom predictor and asynchronous pipeline as follows:

```python
from tokenize_anything import model_registry

with <distributed_actor>:
    model = model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
    results = <custom_predictor>(model, *args, **kwargs)

server.collect_results()
```
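
For a non-distributed quick test, a minimal sketch might look like the following. The `tap_vit_l` registry key comes from the model table below; the checkpoint path and the predictor body are placeholders you should adapt:

```python
from tokenize_anything import model_registry

def simple_predictor(model, image_batch):
    """Stand-in predictor: pre-process, run the model, post-process."""
    # Replace this stub with real pre/post-processing and model calls
    # (see the Inference Guide); here it only illustrates the wiring.
    return {"model": type(model).__name__, "num_images": len(image_batch)}

# Hypothetical local path to a downloaded ViT-L checkpoint (see "Model weights" below).
model = model_registry["tap_vit_l"](checkpoint="weights/tap_vit_l_v1_1.pkl")
results = simple_predictor(model, image_batch=[])
```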

See the built-in examples (web demo and evaluations) provided in `scripts/` for more details.

### Inference

- See the Inference Guide.
- See the Concept Guide.

### Evaluation

- See the Evaluation Guide for TAP-H.
- See the Evaluation Guide for TAP-L.
- See the Evaluation Guide for TAP-B.

## Models

### Model weights

#### V1.1 Release Notes

| Model | Description | Schedule | MD5 | Weights |
|:---|:---|:---|:---|:---|
| tap_vit_h | ViT-H TAP v1.1 model | (100% SA-1B, 180k), (VG, 50ep) | 4bdfb9 | 🤗 HF link |
| tap_vit_l | ViT-L TAP v1.1 model | (100% SA-1B, 180k), (VG, 50ep) | c1d41f | 🤗 HF link |
| tap_vit_b | ViT-B TAP v1.1 model | (100% SA-1B, 180k), (VG, 50ep) | 707f80 | 🤗 HF link |

#### V1.0 Release Notes

| Model | Description | Schedule | MD5 | Weights |
|:---|:---|:---|:---|:---|
| tap_vit_l | ViT-L TAP v1.0 model | (50% SA-1B, 90k), (VG, 25ep) | 03f8ec | 🤗 HF link |
| tap_vit_b | ViT-B TAP v1.0 model | (50% SA-1B, 90k), (VG, 25ep) | b45cbf | 🤗 HF link |

### Concept weights

Note: You can generate these weights following the Concept Guide.

| Concept | Description | Weights |
|:---|:---|:---|
| Merged-2560 | Merged concepts | 🤗 HF link |
| LVIS-1203 | LVIS concepts | 🤗 HF link |
| COCO-80 | COCO concepts | 🤗 HF link |

## License

Apache License 2.0

## Citation

```bibtex
@article{pan2023tap,
  title={Tokenize Anything via Prompting},
  author={Pan, Ting and Tang, Lulu and Wang, Xinlong and Shan, Shiguang},
  journal={arXiv preprint arXiv:2312.09128},
  year={2023}
}
```

## Acknowledgement

We thank the following repositories: SAM, EVA, LLaMA, FlashAttention, Gradio, Detectron2, and CodeWithGPU.