Awesome-Open-Vocabulary-Perception

Papers and code for open-vocabulary perception (3D & 2D). 😎

This repo focuses on open-vocabulary perception tasks, both 3D and 2D. To recommend a paper, please open a pull request or email me at yangcao.cs@gmail.com.

3D

Open-Vocabulary 3D Object Detection

[CoDAv2] Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection, arXiv2024. [Code]
[ImOV3D] ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images, NeurIPS2024. [Code]
[INHA] Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image, ECCV2024.
[GLIS] Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection, ECCV2024. [Code]
[CoDA] Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection, NeurIPS2023. [Code]
[OV-3DET] Open-Vocabulary Point-Cloud Object Detection without 3D Annotation, CVPR2023. [Code]
[FM-OV3D] FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, AAAI2024. [Code]

Open-Vocabulary 3D Segmentation

[OpenMask3D] OpenMask3D: Open-Vocabulary 3D Instance Segmentation, NeurIPS2023. [Code]
[OpenScene] OpenScene: 3D Scene Understanding with Open Vocabularies, CVPR2023. [Code]
[3D-OVS] Weakly Supervised 3D Open-vocabulary Segmentation, CVPR2023. [Code]
[PLA] PLA: Language-Driven Open-Vocabulary 3D Scene Understanding, CVPR2023. [Code]
[Open3DIS] Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance, CVPR2024. [Code]
[MaskClustering] MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation, CVPR2024. [Code]
[LEGaussians] LEGaussians: Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding, CVPR2024. [Code]

2D

Open-Vocabulary 2D Object Detection

[DetCLIP] Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-World Detection, NeurIPS2022.
[DetCLIPv2] DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment, CVPR2023.
[DetCLIPv3] DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection, CVPR2024.
[YOLO-World] YOLO-World: Real-Time Open-Vocabulary Object Detection, CVPR2024. [Code]

Open-Vocabulary 2D Segmentation

[ODISE] Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models, CVPR2023 Highlight. [Code]
[FreeDA] Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation, CVPR2024. [Code]
[OVAM] Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models, CVPR2024. [Code]
[PnP-OVSS] Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models, CVPR2024. [Code]
[OVFoodSeg] OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation, CVPR2024.
[SED] SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation, CVPR2024.