Awesome-Referring-Image-Segmentation

A collection of referring image segmentation papers and datasets.

Feel free to create a PR or an issue.

Outline

1. Datasets

| Short name | Paper | Source | Code/Project Link |
| --- | --- | --- | --- |
| MeViS | MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | ICCV 2023 | [dataset] [project] |
| gRefCOCO | GRES: Generalized Referring Expression Segmentation | CVPR 2023 | [dataset] [project] |
| ClevrTex | ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation | NeurIPS Datasets and Benchmarks 2021 | [project] |
| ScanRefer | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ECCV 2020 | [project] |
| VGPhraseCut | PhraseCut: Language-based Image Segmentation in the Wild | CVPR 2020 | [project] |
| CLEVR-Ref+ | CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | CVPR 2019 | [project] |
| UNC | Modeling context in referring expressions | ECCV 2016 | [dataset] |
| UNC+ | Modeling context in referring expressions | ECCV 2016 | [dataset] |
| Google-Ref | Generation and comprehension of unambiguous object descriptions | CVPR 2016 | [dataset] |
| ReferIt | Referit game: Referring to objects in photographs of natural scenes | EMNLP 2014 | [project] |

2. Challenges

| Name | Workshop | Date | Submission Link |
| --- | --- | --- | --- |
| 1st MeViS Challenge | CVPR 2024 Workshop: Pixel-level Video Understanding in the Wild | May 2024 | [CodaLab] |
| RVOS Challenge | ECCV 2024 Workshop: The 6th Large-scale Video Object Segmentation Challenge | Aug 2024 | [CodaLab] |

3. Traditional Referring Image Segmentation

| Short name | Paper | Source | Code/Project Link |
| --- | --- | --- | --- |
| VATEX | Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | WACV 2025 | [code] [webpage] |
| Shared-RIS | A Simple Baseline with Single-encoder for Referring Image Segmentation | arXiv 24.08 | [code] |
| ASDA | Adaptive Selection based Referring Image Segmentation | ACM MM 2024 | [code] |
| NeMo | Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation | ECCV 2024 | [webpage] [code] |
| ReMamber | ReMamber: Referring Image Segmentation with Mamba Twister | ECCV 2024 | [code] |
| GTMS | GTMS: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method | ECCV 2024 | [code] |
| SAM4MLLM | SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation | ECCV 2024 | [code] |
| Pseudo-RIS | Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation | ECCV 2024 | [code] |
| SafaRi | SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | ECCV 2024 | [webpage] |
| CM-MaskSD | CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation | TMM 2024 | |
| Prompt-RIS | Prompt-Driven Referring Image Segmentation with Instance Contrasting | CVPR 2024 | |
| LQMFormer | LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation | CVPR 2024 | |
| PPT | Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation | CVPR 2024 | |
| GSVA | GSVA: Generalized Segmentation via Multimodal Large Language Models | CVPR 2024 | [code] |
| RMSIN | Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation | CVPR 2024 | [code] |
| MRES | Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | CVPR 2024 | [code] [webpage] |
| MagNet | Mask Grounding for Referring Image Segmentation | CVPR 2024 | [webpage] |
| LISA | LISA: Reasoning Segmentation via Large Language Model | CVPR 2024 | [code] |
| RefSegformer | Towards Robust Referring Image Segmentation | TIP 2024 | [code] |
| JMCELN | Referring Image Segmentation via Joint Mask Contextual Embedding Learning and Progressive Alignment Network | EMNLP 2023 | [code] |
| CVMN | Unsupervised Domain Adaptation for Referring Semantic Segmentation | ACM MM 2023 | [code] |
| CARIS | CARIS: Context-Aware Referring Image Segmentation | ACM MM 2023 | [code] |
| TAS | Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | EMNLP 2023 | |
| BKINet | Bilateral Knowledge Interaction Network for Referring Image Segmentation | TMM 2023 | [code] |
| Group-RES | Advancing Referring Expression Segmentation Beyond Single Image | ICCV 2023 | [code] |
| - | Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency | ICCV 2023 | |
| - | Shatter and Gather: Learning Referring Image Segmentation with Text Supervision | ICCV 2023 | |
| TRIS | Referring Image Segmentation Using Text Supervision | ICCV 2023 | [code] |
| RIS-DMMI | Beyond One-to-One: Rethinking the Referring Image Segmentation | ICCV 2023 | [code] |
| ETRIS | Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation | ICCV 2023 | [code] |
| SEEM | Segment Everything Everywhere All at Once | arXiv 23.04 | [code] |
| SLViT | SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation | IJCAI 2023 | [code] |
| WiCo | WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation | IJCAI 2023 | |
| M3Att | Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation | TIP 2023 | |
| X-Decoder | X-Decoder: Generalized Decoding for Pixel, Image and Language | CVPR 2023 | [code] [project] |
| Partial-RES | Learning to Segment Every Referring Object Point by Point | CVPR 2023 | [code] |
| MCRES | Meta Compositional Referring Expression Segmentation | CVPR 2023 | |
| Global-Local CLIP | Zero-shot Referring Image Segmentation with Global-Local Context Features | CVPR 2023 | [code] |
| PolyFormer | PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | CVPR 2023 | [code] [project] |
| GRES | GRES: Generalized Referring Expression Segmentation | CVPR 2023 | [code] [dataset] [project] |
| CGFormer | Contrastive Grouping with Transformer for Referring Image Segmentation | CVPR 2023 | [code] |
| SADLR | Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation | AAAI 2023 | |
| R-RIS | Towards Robust Referring Image Segmentation | arXiv 22.09 | [code] [project] |
| - | Learning From Box Annotations for Referring Image Segmentation | TNNLS 2022 | [code] |
| - | Instance-Specific Feature Propagation for Referring Segmentation | TMM 2022 | |
| LAVT | LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | CVPR 2022 | [code] |
| CRIS | CRIS: CLIP-Driven Referring Image Segmentation | CVPR 2022 | [code] |
| ReSTR | ReSTR: Convolution-free Referring Image Segmentation Using Transformers | CVPR 2022 | [project] |
| TV-Net | Two-stage Visual Cues Enhancement Network for Referring Image Segmentation | ACM MM 2021 | [code] |
| VLT | Vision-Language Transformer and Query Generation for Referring Segmentation | ICCV 2021 | [code] |
| MDETR | MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | ICCV 2021 | [code] [project] |
| CEFNet | Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation | CVPR 2021 | [code] |
| BUSNet | Bottom-Up Shift and Reasoning for Referring Image Segmentation | CVPR 2021 | [code] |
| LTS | Locate then Segment: A Strong Pipeline for Referring Image Segmentation | CVPR 2021 | |
| CGAN | Cascade Grouped Attention Network for Referring Expression Segmentation | ACM MM 2020 | |
| LSCM | Linguistic Structure Guided Context Modeling for Referring Image Segmentation | ECCV 2020 | [code] |
| CMPC-Refseg | Referring Image Segmentation via Cross-Modal Progressive Comprehension | CVPR 2020 | [code] |
| BRINet | Bi-directional Relationship Inferring Network for Referring Image Segmentation | CVPR 2020 | [code] |
| PhraseCut | PhraseCut: Language-based Image Segmentation in the Wild | CVPR 2020 | [code] [project] |
| MCN | Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | CVPR 2020 | [code] |
| - | Dual Convolutional LSTM Network for Referring Image Segmentation | TMM 2020 | |
| STEP | See-Through-Text Grouping for Referring Image Segmentation | ICCV 2019 | |
| lang2seg | Referring Expression Object Segmentation with Caption-Aware Consistency | BMVC 2019 | [code] |
| CMSA | Cross-Modal Self-Attention Network for Referring Image Segmentation | CVPR 2019 | [code] |
| KWA | Key-Word-Aware Network for Referring Expression Image Segmentation | ECCV 2018 | [code] |
| DMN | Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries | ECCV 2018 | [code] |
| RRN | Referring Image Segmentation via Recurrent Refinement Networks | CVPR 2018 | [code] |
| MAttNet | MAttNet: Modular Attention Network for Referring Expression Comprehension | CVPR 2018 | [code] [Demo] |
| RMI | Recurrent Multimodal Interaction for Referring Image Segmentation | ICCV 2017 | [code] |
| LSTM-CNN | Segmentation from natural language expressions | ECCV 2016 | [code] [project] |

4. Interactive Referring Image Segmentation

| Short name | Paper | Source | Code/Project Link |
| --- | --- | --- | --- |
| PhraseClick | PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click | ECCV 2020 | |

5. Referring Video Object Segmentation

| Short name | Paper | Source | Code/Project Link |
| --- | --- | --- | --- |
| VD-IT | Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | ECCV 2024 | [code] |
| DsHmp | Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | CVPR 2024 | [code] |
| LoSh | LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation | CVPR 2024 | [code] |
| SOC | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | NeurIPS 2023 | [code] |
| Locater | Local-Global Context Aware Transformer for Language-Guided Video Segmentation | TPAMI 2023 | [code] [dataset] |
| TempCD | Temporal Collection and Distribution for Referring Video Object Segmentation | ICCV 2023 | [project] [code] |
| HTML | HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | ICCV 2023 | [project] |
| LMPM | MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | ICCV 2023 | [code] [project] |
| OnlineRefer | OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation | ICCV 2023 | [code] |
| SgMg | Spectrum-guided Multi-granularity Referring Video Object Segmentation | ICCV 2023 | [code] |
| R2VOS | Towards Robust Referring Video Object Segmentation with Cyclic Relational Consistency | ICCV 2023 | [code] |
| MANet | Multi-Attention Network for Compressed Video Referring Object Segmentation | ACM MM 2022 | [code] |
| MTTR | End-to-End Referring Video Object Segmentation with Multimodal Transformers | CVPR 2022 | [code] |
| ReferFormer | Language as Queries for Referring Video Object Segmentation | CVPR 2022 | [code] |
| LBDT | Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation | CVPR 2022 | [code] |
| - | Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation | CVPR 2022 | |
| YOFO | You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation | AAAI 2022 | |
| CITD | Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation | CVPRW 2021 | |
| ClawCraneNet | ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | arXiv 21.03 | |
| RefVOS | RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation | arXiv 20.10 | |
| URVOS | URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | ECCV 2020 | [code] |
| - | Video Object Segmentation with Language Referring Expressions | ACCV 2018 | |

6. 3D Referring Segmentation

| Short name | Paper | Source | Code/Project Link |
| --- | --- | --- | --- |
| X-RefSeg3D | X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks | AAAI 2024 | [code] |
| 3D-STMN | 3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | AAAI 2024 | [code] |
| SegPoint | SegPoint: Segment Any Point Cloud via Large Language Model | ECCV 2024 | [project] |
| 3D-GRES | 3D-GRES: Generalized 3D Referring Expression Segmentation | ACM MM 2024 | [code] |
| RefMask3D | RefMask3D: Language-Guided Transformer for 3D Referring Segmentation | ACM MM 2024 | [code] |
| TGNN | Text-Guided Graph Neural Networks for Referring 3D Instance Segmentation | AAAI 2021 | |
| InstanceRefer | InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring | ICCV 2021 | [code] |