

Awesome-Open-Vocabulary-Perception

Papers and codes for open-vocabulary perception (3D&2D). 😎

This repo focuses on open-vocabulary perception tasks in both 3D and 2D. If you would like to recommend a paper, please open a pull request or email me at yangcao.cs@gmail.com.

3D

Open-Vocabulary 3D Object Detection

  1. [CoDAv2] Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection, arXiv2024. [Code]
  2. [ImOV3D] ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images, NeurIPS2024. [Code]
  3. [INHA] Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image, ECCV2024.
  4. [GLIS] Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection, ECCV2024. [Code]
  5. [CoDA] Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection, NeurIPS2023. [Code]
  6. [OV-3DET] Open-Vocabulary Point-Cloud Object Detection without 3D Annotation, CVPR2023. [Code]
  7. [FM-OV3D] FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, AAAI2024. [Code]

Open-Vocabulary 3D Segmentation

  1. [OpenMask3D] OpenMask3D: Open-Vocabulary 3D Instance Segmentation, NeurIPS2023. [Code]
  2. [OpenScene] OpenScene: 3D Scene Understanding with Open Vocabularies, CVPR2023. [Code]
  3. [3D-OVS] Weakly Supervised 3D Open-vocabulary Segmentation, NeurIPS2023. [Code]
  4. [PLA] PLA: Language-Driven Open-Vocabulary 3D Scene Understanding, CVPR2023. [Code]
  5. [Open3DIS] Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance, CVPR2024. [Code]
  6. [MaskClustering] MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation, CVPR2024. [Code]
  7. [LEGaussians] LEGaussians: Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding, CVPR2024. [Code]

2D

Open-Vocabulary 2D Object Detection

  1. [DetCLIP] DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-World Detection, NeurIPS2022.
  2. [DetCLIPv2] DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment, CVPR2023.
  3. [DetCLIPv3] DetCLIPv3: Towards Versatile Generative Open-Vocabulary Object Detection, CVPR2024.
  4. [YOLO-World] YOLO-World: Real-Time Open-Vocabulary Object Detection, CVPR2024. [Code]

Open-Vocabulary 2D Segmentation

  1. [ODISE] Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models, CVPR2023 Highlight. [Code]
  2. [FreeDA] Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation, CVPR2024. [Code]
  3. [OVAM] Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models, CVPR2024. [Code]
  4. [PnP-OVSS] Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models, CVPR2024. [Code]
  5. [OVFoodSeg] OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation, CVPR2024.
  6. [SED] SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation, CVPR2024.