Awesome-Open-Vocabulary-Perception
Papers and code for open-vocabulary perception (3D & 2D). 😎
This repo focuses on open-vocabulary perception tasks (both 3D and 2D). To recommend a paper, please open a pull request or email me at yangcao.cs@gmail.com.
3D
Open-Vocabulary 3D Object Detection
- <span id = "16001">[CoDAv2]</span> Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection, Arxiv2024. [Code]
- <span id = "16001">[ImOV3D]</span> ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images, NeurIPS2024. [Code]
- <span id = "16001">[INHA]</span> Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image, ECCV2024.
- <span id = "16001">[GLIS]</span> Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection, ECCV2024. [Code]
- <span id = "16001">[CoDA]</span> Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection, NeurIPS2023. [Code]
- <span id = "18001">[OV-3DET]</span> Open-Vocabulary Point-Cloud Object Detection without 3D Annotation, CVPR2023. [Code]
- <span id = "16001">[FM-OV3D]</span> FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, AAAI2024. [Code]
Open-Vocabulary 3D Segmentation
- <span id = "16001">[OpenMask3D]</span> OpenMask3D: Open-Vocabulary 3D Instance Segmentation, NeurIPS2023. [Code]
- <span id = "16001">[OpenScene]</span> OpenScene: 3D Scene Understanding with Open Vocabularies, CVPR2023. [Code]
- <span id = "16001">[3D-OVS]</span> Weakly Supervised 3D Open-vocabulary Segmentation, CVPR2023. [Code]
- <span id = "16001">[PLA]</span> PLA: Language-Driven Open-Vocabulary 3D Scene Understanding, CVPR2023. [Code]
- <span id = "16001">[Open3DIS]</span> Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance, CVPR2024. [Code]
- <span id = "16001">[MaskClustering]</span> MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation, CVPR2024. [Code]
- <span id = "16001">[LEGaussians]</span> LEGaussians: Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding, CVPR2024. [Code]
2D
Open-Vocabulary 2D Object Detection
- <span id = "16001">[DetCLIP]</span> Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-World Detection, NeurIPS2022.
- <span id = "16001">[DetCLIPv2]</span> DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment, CVPR2023.
- <span id = "16001">[DetCLIPv3]</span> DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection, CVPR2024.
- <span id = "16001">[YOLO-World]</span> YOLO-World: Real-Time Open-Vocabulary Object Detection, CVPR2024. [Code]
Open-Vocabulary 2D Segmentation
- <span id = "16001">[ODISE]</span> Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models, CVPR2023 Highlight. [Code]
- <span id = "16001">[FreeDA]</span> Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation, CVPR2024. [Code]
- <span id = "16001">[OVAM]</span> Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models, CVPR2024. [Code]
- <span id = "16001">[PnP-OVSS]</span> Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models, CVPR2024. [Code]
- <span id = "16001">[OVFoodSeg]</span> OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation, CVPR2024.
- <span id = "16001">[SED]</span> SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation, CVPR2024.