Awesome

ECCV-2022-Papers


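Official website: https://eccv2022.ecva.net/

Submission deadline: March 7, 2022 (9:59PM CET, 11:59AM PST)

Conference dates: October 24, 2022 to October 28, 2022

For categorized survey papers from past years, see ↘️CV-Surveys (under construction)

Categorized paper collections for 2022: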

↘️CVPR-2022-Papers ↘️WACV-2022-Papers ↘️ECCV-2022-Papers

Categorized paper collections for 2021:

↘️ICCV-2021-Papers ↘️CVPR-2021-Papers

Categorized paper collections for 2020:

↘️CVPR-2020-Papers ↘️ECCV-2020-Papers

❣❣❣To download all ECCV 2022 papers as a bundle, reply "paper" to the 【我爱计算机视觉】 WeChat official account. 1645 papers in total. Categorization is complete.

|:cat:|:dog:|:tiger:|:wolf:|
|------|------|------|------|
|1.Others|2.Image Segmentation|3.Image Processing|4.Image Captioning|
|5.Image/Video Retrieval|6.Object Detection|7.Object Tracking|8.3D (3D vision)|
|9.Human Pose Estimation|10.Object Pose Estimation|11.Video|12.Action Detection (human action detection and recognition)|
|13.Human-Object Interaction|14.Visual Question Answering|15.Vision-Language|16.Transformer|
|17.GAN|18.Image-to-Image Translation|19.Image Synthesis/Generation|20.Face|
|21.Semi/self-supervised learning|22.OCR|23.Medical Image|24.UAV/Remote Sensing/Satellite Image|
|25.Autonomous vehicles|26.Video/Image Super-Resolution|27.Image Classification|28.Neural Architecture Search|
|29.Re-identification|30.Optical Flow|31.SLAM/Augmented Reality/Virtual Reality/Robotics|32.Point Cloud|
|33.Model Compression/Knowledge Distillation/Pruning|34.Meta-Learning|35.Federated Learning|36.Machine Learning|
|37.Open-set Recognition|38.Contrastive Learning|39.Transfer Learning|40.Adversarial Learning|
|41.Incremental Learning|42.Reinforcement Learning|43.Lifelong Learning|44.Active Learning|
|45.Metric Learning|46.Continual Learning|47.GNN/GCN (graph neural networks)|48.Semantic Correspondence|
|49.Few/Zero-Shot Learning/Domain Generalization/Adaptation|50.Neural Rendering|51.Anomaly Detection|52.Scene Flow Estimation|
|53.Dataset|54.View Generation|55.Style Transfer|56.Sound|
|57.Scene Graph Generation|58.Human Motion Prediction|59.Image Matching|60.Data Augmentation|
|61.Light Field (optics, geometry, light-field imaging)||||

:trophy::trophy::trophy: Award-Winning Papers

Best Paper Award
- On the Versatile Uses of Partial Distance Correlation in Deep Learning<br>:star:code
Best Paper Honorable Mention
- A Level Set Theory for Neural Implicit Evolution under Explicit Flows
- Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields<br>:star:code
Koenderink Prize (test of time)
- A naturalistic open source movie for optical flow evaluation
- Indoor Segmentation and Support Inference from RGBD Images
Best Demo Award
- [Using a Smartphone for Augmented Reality in a Classroom]<br>:tv:video
Everingham Prize
- The UCF101 and HMDB51 dataset teams & Walter J. Scheirer

<a name="61"/>

61.Light Field (optics, geometry, light-field imaging)

Camera-related
- Learned Monocular Depth Priors in Visual-Inertial Initialization
- Camera pose estimation
  - E-Graph: Minimal Solution for Rigid Rotation with Extensibility Graphs
  - Camera Pose Estimation and Localization with Active Audio Sensing
Camera pose
- Camera Pose Auto-Encoders for Improving Pose Regression<br>:star:code
Camera estimation
- A Reliable Online Method for Joint Estimation of Focal Length and Camera Rotation<br>:star:code
Camera auto-calibration
- Camera Auto-Calibration from the Steiner Conic of the Fundamental Matrix
Event cameras
- DVS-Voltmeter: Stochastic Process-Based Event Simulator for Dynamic Vision Sensors<br>:star:code
- Selection and Cross Similarity for Event-Image Deep Stereo<br>:star:code
Camera re-localization
- SC-wLS: Towards Interpretable Feed-forward Camera Re-localization<br>:star:code
Camera localization
- Towards Accurate Active Camera Localization<br>:star:code
Light field

<a name="60"/>

60.Data Augmentation

<a name="59"/>

59.Image Matching

ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer<br>:house:project
ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement<br>:star:code:house:project

<a name="58"/>

58.Human Motion Prediction

ERA: Expert Retrieval and Assembly for Early Action Prediction
Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction
GIMO: Gaze-Informed Human Motion Prediction in Context<br>:star:code
Diverse Human Motion Prediction Guided by Multi-level Spatial-Temporal Anchors<br>:star:code
Action anticipation
- Rethinking Learning Approaches for Long-Term Action Anticipation<br>:star:code
Motion estimation
- PREF: Predictability Regularized Neural Motion Fields<br>:open_mouth:oral
Human motion synthesis
- Learning Uncoupled-Modulation CVAE for 3D Action-Conditioned Human Motion Synthesis
- MotionCLIP: Exposing Human Motion Generation to CLIP Space<br>:star:code

<a name="57"/>

57.Scene Graph Generation

<a name="56"/>

56.Sound

Learning Visual Styles from Audio-Visual Associations<br>:house:project
Active Audio-Visual Separation of Dynamic Sound Sources<br>:house:project
Sound source localization
- Localizing Visual Sounds the Easy Way<br>:star:code
Active speaker detection
- End-to-End Active Speaker Detection
Audio-driven video portrait generation
- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation<br>:open_mouth:oral:house:project
Audio-visual segmentation
- Audio-Visual Segmentation<br>:star:code
Speech synthesis
- VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
Sound separation
- AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
- VoViT: Low Latency Graph-Based Audio-Visual Voice Separation Transformer<br>:house:project

<a name="55"/>

55.Style Transfer

CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer<br>:open_mouth:oral:star:code
Learning Graph Neural Networks for Image Style Transfer
ARF: Artistic Radiance Fields<br>:house:project
Image stylization
- WISE: Whitebox Image Stylization by Example-based Learning<br>:star:code
Hairstyle transfer
- Style Your Hair: Latent Optimization for Pose-Invariant Hairstyle Transfer via Local-Style-Aware Hair Alignment<br>:star:code

<a name="54"/>

54.View Generation

<a name="53"/>

53.Dataset

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning<br>:sunflower:dataset
Responsive Listening Head Generation: A Benchmark Dataset and Baseline<br>:sunflower:dataset
Online Segmentation of LiDAR Sequences: Dataset and Algorithm<br>:sunflower:dataset
COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts<br>:star:code
BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis<br>:sunflower:dataset
CelebV-HQ: A Large-Scale Video Facial Attributes Dataset<br>:sunflower:dataset:house:project
UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture<br>:star:code:house:project
BEAT: A Large-Scale Semantic and Emotional Multi-modal Dataset for Conversational Gestures Synthesis<br>:sunflower:dataset<br>:newspaper:ECCV 2022 | 76 hours of motion capture: the largest-scale multi-modal digital human dataset is open-sourced
MovieCuts: A New Dataset and Benchmark for Cut Type Recognition<br>:sunflower:dataset<br>cut type recognition
A Real World Dataset for Multi-View 3D Reconstruction<br>:sunflower:dataset<br>3D reconstruction
Capturing, Reconstructing, and Simulating: The UrbanScene3D Dataset<br>:sunflower:dataset<br>urban scene reconstruction
PartImageNet: A Large, High-Quality Dataset of Parts<br>segmentation
A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge<br>:sunflower:dataset<br>VQA
OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing<br>:sunflower:dataset<br>video editing
ClearPose: Large-Scale Transparent Object Dataset and Benchmark<br>:sunflower:dataset<br>depth estimation
AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment<br>:sunflower:dataset
A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing
MimicME: A Large Scale Diverse 4D Database for Facial Expression Analysis
Delving into Universal Lesion Segmentation: Method, Dataset, and Benchmark<br>:sunflower:dataset<br>lesion segmentation

<a name="52"/>

52.Scene Flow Estimation

<a name="51"/>

51.Anomaly Detection

<a name="50"/>

50.Neural Rendering

Relighting4D: Neural Relightable Human from Videos<br>:star:code:house:project:tv:video
MPIB: An MPI-Based Bokeh Rendering Framework for Realistic Partial Occlusion Effects<br>:star:code:house:project
NeuMan: Neural Human Radiance Field from a Single Video<br>:star:code
Approximate Differentiable Rendering with Algebraic Surfaces<br>:star:code:house:project
AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields<br>:star:code:house:project
Generalizable Patch-Based Neural Rendering<br>:open_mouth:oral:star:code:house:project
Deforming Radiance Fields with Cages<br>:star:code:house:project
NeuMesh: Learning Disentangled Neural Mesh-based Implicit Field for Geometry and Texture Editing<br>:open_mouth:oral:star:code:house:project
ActiveNeRF: Learning where to See with Uncertainty Estimation<br>:star:code
ARAH: Animatable Volume Rendering of Articulated Human SDFs<br>:star:code:house:project
LaTeRF: Label and Text Driven Object Radiance Fields
MoFaNeRF: Morphable Facial Neural Radiance Field<br>:star:code
Conditional-Flow NeRF: Accurate 3D Modelling with Reliable Uncertainty Quantification
Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields<br>:star:code
KeypointNeRF: Generalizing Image-Based Volumetric Avatars Using Relative Spatial Encoding of Keypoints<br>:house:project
ViewFormer: NeRF-Free Neural Rendering from Few Images Using Transformers<br>:star:code
GeoAug: Data Augmentation for Few-Shot NeRF with Geometry Constraints
SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image<br>:star:code
BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-Scale Scene Rendering

<a name="49"/>

49.Few/Zero-Shot Learning/Domain Generalization/Adaptation

<a name="48"/>

48.Semantic Correspondence

<a name="47"/>

47.GNN/GCN (graph neural networks)

<a name="46"/>

46.Continual Learning

<a name="45"/>

45.Metric Learning

<a name="44"/>

44.Active Learning

<a name="43"/>

43.Lifelong Learning

Anti-Retroactive Interference for Lifelong Learning<br>:star:code

<a name="42"/>

42.Reinforcement Learning

<a name="41"/>

41.Incremental Learning

<a name="40"/>

40.Adversarial Learning

<a name="39"/>

39.Transfer Learning

<a name="38"/>

38.Contrastive Learning

<a name="37"/>

37.Open-set Recognition

<a name="36"/>

36.Machine Learning

Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning

<a name="35"/>

35.Federated Learning

<a name="34"/>

34.Meta-Learning

<a name="33"/>

33.Model Compression/Knowledge Distillation/Pruning

<a name="32"/>

32.Point Cloud

<a name="31"/>

31.SLAM/Augmented Reality/Virtual Reality/Robotics

Augmented reality
- LaMAR: Benchmarking Localization and Mapping for Augmented Reality<br>:star:code:house:project
VR
- LiP-Flow: Learning Inference-Time Priors for Codec Avatars via Normalizing Flows in Latent Space
- Human volumetric capture
  - AvatarCap: Animatable Avatar Conditioned Monocular Human Volumetric Capture<br>:star:code:house:project
Virtual try-on
Visual localization (camera pose estimation)
- MeshLoc: Mesh-Based Visual Localization<br>:star:code
Robotics
- Visual Cross-View Metric Localization with Dense Uncertainty Estimates<br>:star:code

<a name="30"/>

30.Optical Flow

<a name="29"/>

29.Re-identification

<a name="28"/>

28.Neural Architecture Search

<a name="27"/>

27.Image Classification

<a name="26"/>

26.Video/Image Super-Resolution

<a name="25"/>

25.Autonomous vehicles

<a name="24"/>

24.UAV/Remote Sensing/Satellite Image

Remote sensing
- Tomography of Turbulence Strength Based on Scintillation Imaging
- TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction
Aerial video recognition
- FAR: Fourier Aerial Video Recognition<br>:house:project

<a name="23"/>

23.Medical Image

The Surprisingly Straightforward Scene Text Removal Method With Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis<br>:star:code:house:project
Medical image segmentation
Radiology report generation
- Cross-modal Prototype Driven Network for Radiology Report Generation<br>:star:code
Dense prediction
- ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images<br>:star:code
Retinal image matching
- Semi-Supervised Keypoint Detector and Descriptor for Retinal Image Matching<br>:star:code
Stent tracking
- Robust Landmark-based Stent Tracking in X-ray Fluoroscopy
Lesion detection
- Check and Link: Pairwise Lesion Correspondence Guides Mammogram Mass Detection
Medical image analysis
- UniMiSS: Universal Medical Self-Supervised Learning via Breaking Dimensionality Barrier<br>:star:code
- K-SALSA: K-Anonymous Synthetic Averaging of Retinal Images via Local Style Alignment<br>:star:code
Medical image classification
- Differentiable Zooming for Multiple Instance Learning on Whole-Slide Images<br>:star:code
- Disease classification
  - RadioTransformer: A Cascaded Global-Focal Transformer for Visual Attention-Guided Disease Classification<br>:star:code
Medical landmark localization
- One-Shot Medical Landmark Localization by Edge-Guided Transform and Noisy Landmark Refinement<br>:star:code

<a name="22"/>

22.OCR

Levenshtein OCR
Text recognition
- Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and Document Enhancement
Handwritten mathematical expression recognition
- CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition<br>:star:code
- When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition<br>:star:code
Scene text detection
Video text detection
- Real-time End-to-End Video Text Spotter with Contrastive Representation Learning<br>:star:code
Text detection
- Unitail: Detecting, Reading, and Matching in Retail Scene<br>:house:project
Document image rectification
- Geometric Representation Learning for Document Image Rectification<br>:star:code
document unwarping
- Learning an Isometric Surface Parameterization for Texture Unwrapping<br>:star:code

<a name="21"/>

21.Semi/self-supervised learning

<a name="20"/>

20.Face

Effective Presentation Attack Detection Driven by Face Related Task<br>:star:code
Facial Depth and Normal Estimation Using Single Dual-Pixel Camera<br>:star:code
StyleFace: Towards Identity-Disentangled Face Generation on Megapixels
Augmentation of rPPG Benchmark Datasets: Learning to Remove and Embed rPPG Signals via Double Cycle Consistent Learning from Unpaired Facial Videos<br>:star:code
Custom Structure Preservation in Face Aging
Deepfake detection
- Detecting and Recovering Sequential DeepFake Manipulation<br>:star:code:house:project
- Explaining Deepfake Detection by Analysing Image Matching<br>:star:code
- Hierarchical Contrastive Inconsistency Learning for Deepfake Video Detection
3D face
- Structure-aware Editable Morphable Model for 3D Facial Detail Animation and Manipulation<br>:star:code
Liveness detection (face anti-spoofing)
- Generative Domain Adaptation for Face Anti-Spoofing
- Multi-domain Learning for Updating Face Anti-spoofing Models<br>:star:code
- Source-Free Domain Adaptation with Contrastive Domain Alignment and Self-Supervised Exploration for Face Anti-Spoofing<br>:star:code
- Adaptive Transformers for Robust Few-Shot Cross-Domain Face Anti-Spoofing
Face recognition
- Controllable and Guided Face Synthesis for Unconstrained Face Recognition<br>:star:code:house:project
- Towards Robust Face Recognition with Comprehensive Search
- BoundaryFace: A mining framework with noise label self-correction for Face Recognition<br>:star:code
- Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain<br>:star:code
- OneFace: One Threshold for All
- AgeTransGAN for Facial Age Transformation with Rectified Performance Metrics<br>:star:code
- Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition<br>:star:code
- CoupleFace: Relation Matters for Face Recognition Distillation
- Towards Racially Unbiased Skin Tone Estimation via Scene Disambiguation<br>:house:project
- Pre-training Strategies and Datasets for Facial Representation Learning
- Unsupervised and Semi-Supervised Bias Benchmarking in Face Recognition
Face clustering
- On Mitigating Hard Clusters for Face Clustering<br>:open_mouth:oral:star:code
Talking face synthesis
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN<br>:star:code
Talking head synthesis
- Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis<br>:star:code:house:project
Facial pose estimation
- Towards Unbiased Label Distribution Learning for Facial Pose Estimation Using Anisotropic Spherical Gaussian
Face swapping
- StyleSwap: Style-Based Generator Empowers Robust Face Swapping<br>:star:code:house:project
- Designing One Unified Framework for High-Fidelity Face Reenactment and Swapping<br>:star:code
Face forgery detection
- UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection<br>:open_mouth:oral
- An Information Theoretic Approach for Attention-Driven Face Forgery Detection
- Exploring Disentangled Content Information for Face Forgery Detection
- Adaptive Face Forgery Detection in Cross Domain
Facial capture
- Practical and Scalable Desktop-Based High-Quality Facial Capture
Facial expression recognition
- How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset?<br>:star:code
- Teaching with Soft Label Smoothing for Mitigating Noisy Labels in Facial Expressions<br>:star:code
- Order Learning Using Partially Ordered Data via Chainization<br>:star:code
- Emotion-Aware Multi-View Contrastive Learning for Facial Emotion Recognition<br>:star:code
- Learn-to-Decompose: Cascaded Decomposition Network for Cross-Domain Few-Shot Facial Expression Recognition<br>:star:code
- Learn from All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition<br>:star:code
3D face reconstruction
- REALY: Rethinking the Evaluation of 3D Face Reconstruction<br>:star:code:house:project
- AU-Aware 3D Face Reconstruction through Personalized AU-Specific Blendshape Learning
- 3D Face Reconstruction with Dense Landmarks
- Towards Metrical Reconstruction of Human Faces<br>:house:project
Face reenactment
- Face2Faceρ: Real-Time High-Resolution One-Shot Face Reenactment
Facial identity manipulation
- MFIM: Megapixel Facial Identity Manipulation
Facial texture generation and reconstruction
- Unsupervised High-Fidelity Facial Texture Generation and Reconstruction<br>:star:code
Face restoration
- VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder<br>:star:code
Emotion recognition
- Emotion Recognition for Multiple Context Awareness<br>:house:project
- S2-VER: Semi-Supervised Visual Emotion Recognition<br>:star:code

<a name="19"/>

19.Image Synthesis/Generation

Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis<br>:star:code:house:project
GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing
Generalized Brain Image Synthesis with Transferable Convolutional Sparse Coding Networks
Auto-regressive Image Synthesis with Integrated Quantization<br>:open_mouth:oral
Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing<br>:star:code
Improved Masked Image Generation with Token-Critic
Weakly-Supervised Stitching Network for Real-World Panoramic Image Generation
SCAM! Transferring humans between images with Semantic Cross Attention Modulation<br>:house:project
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation<br>:star:code
Adaptive Feature Interpolation for Low-Shot Image Generation
Few-Shot Image Generation with Mixup-Based Distance Learning<br>:star:code
Multimodal Conditional Image Synthesis with Product-of-Experts GANs<br>:house:project
Any-Resolution Training for High-Resolution Image Synthesis<br>:house:project
3D-Aware Indoor Scene Synthesis with Depth Priors<br>:house:project
Image generation
Exemplar-guided image generation
- DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation<br>:star:code
Text-to-image synthesis
Generating diverse human motions from textual descriptions
- TEMOS: Generating Diverse Human Motions from Textual Descriptions<br>:open_mouth:oral:star:code:house:project

<a name="18"/>

18.Image-to-Image Translation

<a name="17"/>

17.GAN

VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Quantized GAN for Complex Music Generation from Dance Videos<br>:star:code
RepMix: Representation Mixing for Robust Attribution of Synthesized Images<br>:star:code
FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs<br>:star:code
Generative Multiplane Images: Making a 2D GAN 3D-Aware<br>:star:code:house:project
Generator Knows What Discriminator Should Learn in Unconditional GANs<br>:star:code
Hierarchical Semantic Regularization of Latent Spaces in StyleGANs<br>:star:code:house:project
Mind the Gap in Distilling StyleGANs<br>:star:code
FurryGAN: High Quality Foreground-aware Image Synthesis<br>:house:project
Improving GANs for Long-Tailed Data through Group Spectral Regularization<br>:star:code:house:project
3D-FM GAN: Towards 3D-Controllable Face Manipulation<br>:house:project
Exploring Gradient-based Multi-directional Controls in GANs<br>:star:code
Studying Bias in GANs through the Lens of Race
FairStyle: Debiasing StyleGAN2 with Style Channel Manipulations<br>:house:project
FingerprintNet: Synthesized Fingerprints for Generated Image Detection
Detecting Generated Images by Real Images<br>:star:code
High-Fidelity GAN Inversion with Padding Space<br>:house:project
A Style-Based GAN Encoder for High Fidelity Reconstruction of Images and Videos<br>:star:code
BlobGAN: Spatially Disentangled Scene Representations<br>:house:project
GAN with Multivariate Disentangling for Controllable Hair Editing<br>:star:code
StyleGAN-Human: A Data-Centric Odyssey of Human Generation<br>:star:code
EAGAN: Efficient Two-Stage Evolutionary Architecture Search for GANs<br>:star:code
JoJoGAN: One Shot Face Stylization
HairNet: Hairstyle Transfer with Pose Changes
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer<br>:star:code
Editing Out-of-Domain GAN Inversion via Differential Activations<br>:star:code
On the Robustness of Quality Measures for GANs<br>:star:code
Diverse Generation from a Single Video Made Possible<br>:house:project
Rayleigh EigenDirections (REDs): Nonlinear GAN Latent Space Traversals for Multidimensional Features
Generating Natural Images with Direct Patch Distributions Matching<br>:star:code
TREND: Truncated Generalized Normal Density Estimation of Inception Embeddings for GAN Evaluation
Neural Scene Decoration from a Single Photograph<br>:star:code
ChunkyGAN: Real Image Inversion via Segments
GAN Cocktail: Mixing GANs without Dataset Access<br>:house:project
DuelGAN: A Duel between Two Discriminators Stabilizes the GAN Training<br>:star:code
Line-art colorization
- Eliminating Gradient Conflict in Reference-based Line-Art Colorization<br>:star:code
Image generation
- WaveGAN: Frequency-aware GAN for High-Fidelity Few-shot Image Generation<br>:star:code
GAN inversion
- IntereStyle: Encoding an Interest Region for Robust StyleGAN Inversion
Makeup and hairstyle transfer
- RamGAN: Region Attentive Morphing GAN for Region-Level Makeup Transfer
Text removal
- Don't Forget Me: Accurate Background Recovery for Text Removal via Modeling Local-Global Context<br>:star:code

<a name="16"/>

16.Transformer

<a name="15"/>

15.Vision-Language

<a name="14"/>

14.Visual Question Answering

<a name="13"/>

13.Human-Object Interaction

Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection<br>:star:code
Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos<br>:star:code
IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition
Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection<br>:star:code
Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows
SAGA: Stochastic Whole-Body Grasping with Contact<br>:house:project
Chairs Can Be Stood On: Overcoming Object Bias in Human-Object Interaction Detection
Discovering Human-Object Interaction Concepts via Self-Compositional Learning<br>:star:code
Interactive object segmentation
- Self-Supervised Interactive Object Segmentation Through a Singulation-and-Grasping Approach<br>:house:project
HOS
- Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications<br>:star:code
Hand-object interaction
- TOCH: Spatio-Temporal Object-to-Hand Correspondence for Motion Refinement
- Grasp synthesis (hand-object interaction)
  - Grasp'D: Differentiable Contact-Rich Grasp Synthesis for Multi-Fingered Hands<br>:house:project
Human-chair interaction
- COUCH: Towards Controllable Human-Chair Interactions

<a name="12"/>

12.Action Detection (human action detection and recognition)

<a name="11"/>

11.Video

Dynamic Temporal Filtering in Video Models<br>:star:code
Delta Distillation for Efficient Video Processing
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks<br>:star:code
Video synthesis
- Layered Controllable Video Generation<br>:house:project
- Sound-Guided Semantic Video Generation<br>:house:project
- Controllable Video Generation through Global and Local Motion Dynamics<br>:house:project
- Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer<br>:house:project
Video-to-video synthesis
- Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis<br>:star:code:house:project
Video frame interpolation
- A Perceptual Quality Metric for Video Frame Interpolation<br>:star:code
Video generation
- RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos<br>:open_mouth:oral:star:code
Video quality assessment
- FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling<br>:star:code
- Telepresence Video Quality Assessment
Video inpainting and restoration
- Error Compensation Framework for Flow-Guided Video Inpainting
- Flow-Guided Transformer for Video Inpainting<br>:star:code
- Video Restoration Framework and Its Meta-Adaptations to Data-Poor Conditions<br>:star:code
Video deblurring
- Spatio-Temporal Deformable Attention Network for Video Deblurring<br>:star:code:house:project
- Efficient Video Deblurring Guided by Motion Magnitude<br>:star:code
Video dialog
- Video Dialog as Conversation about Objects Living in Space-Time<br>:star:code
Active speaker detection (video conferencing)
- Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection<br>:star:code
VOS
- XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model<br>:star:code:house:project:tv:video
- Tackling Background Distraction in Video Object Segmentation<br>:star:code
- BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation
- Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation<br>:star:code
- Learning Quality-aware Dynamic Memory for Video Object Segmentation<br>:star:code
- Global Spectral Filter Memory Network for Video Object Segmentation<br>:star:code
VIS
- In Defense of Online Models for Video Instance Segmentation<br>:open_mouth:oral:star:code
- Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation<br>:star:code
- Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
- Less than Few: Self-Shot Video Instance Segmentation<br>:star:code
- Video Mask Transfiner for High-Quality Video Instance Segmentation
- SeqFormer: Sequential Transformer for Video Instance Segmentation<br>:star:code
VSS
- Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation<br>:star:code
- Domain Adaptive Video Segmentation via Temporal Pseudo Supervision<br>:star:code
- Is It Necessary to Transfer Temporal Knowledge for Domain Adaptive Video Semantic Segmentation?<br>:star:code
VPS
- Waymo Open Dataset: Panoramic Video Panoptic Segmentation<br>:house:project
- PolyphonicFormer: Unified Query Learning for Depth-Aware Video Panoptic Segmentation<br>:star:code
Video matting
- One-Trimap Video Matting<br>:star:code:tv:video
Video representation
- E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context<br>:star:code
- Static and Dynamic Concepts for Self-supervised Video Representation Learning
Video delivery
- Efficient Meta-Tuning for Content-aware Neural Video Delivery<br>:star:code
Motion segmentation
- ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild<br>:star:code:house:project
Video anomaly detection
- Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles<br>:star:code
- Towards Open Set Video Anomaly Detection
- Scale-Aware Spatio-Temporal Relation Learning for Video Anomaly Detection<br>:star:code
- Self-Supervised Sparse Representation for Video Anomaly Detection<br>:star:code
Video recognition
- Temporal Saliency Query Network for Efficient Video Recognition<br>:house:project
- NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition<br>:house:project
- Expanding Language-Image Pretrained Models for General Video Recognition<br>:open_mouth:oral:star:code
- AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition
- DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition<br>:star:code
- K-Centered Patch Sampling for Efficient Video Recognition
Video understanding
- Spotting Temporally Precise, Fine-Grained Events in Video<br>:star:code:house:project
- Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding
- Panoramic Vision Transformer for Saliency Detection in 360° Videos<br>:star:code
- Streaming Multiscale Deep Equilibrium Models<br>:house:project
- Learning Shadow Correspondence for Video Shadow Detection
- Federated Self-Supervised Learning for Video Understanding<br>:star:code
- Prompting Visual-Language Models for Efficient Video Understanding
- GraphVid: It Only Takes a Few Nodes to Understand a Video
Video classification
- Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments<br>:star:code
- MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning<br>:star:code
Video rolling shutter
- Combining Internal and External Constraints for Unrolling Shutter in Videos
Video Transition Effects
- AutoTransition: Learning to Recommend Video Transition Effects<br>:star:code
Image and video coding
- AlphaVC: High-Performance and Efficient Learned Video Compression
- A Cloud 3D Dataset and Application-Specific Learned Image Compression in Cloud 3D<br>:star:code
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression<br>:star:code
- Expanded Adaptive Scaling Normalization for End to End Image Compression
- Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction<br>:star:code
- Content Adaptive Latents and Decoder for Neural Image Compression
- Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression
- RAWtoBit: A Fully End-to-End Camera ISP Network
- Content-Oriented Learned Image Compression
- Implicit Neural Representations for Image Compression
- Neural Video Compression Using GANs for Detail Synthesis and Propagation
Video summarization
- TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency<br>:star:code:house:project
Video Grounding
- Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization<br>:open_mouth:oral
Frame interpolation
- FILM: Frame Interpolation for Large Motion<br>:house:project
- Real-Time Intermediate Flow Estimation for Video Frame Interpolation<br>:star:code
- Deep Bayesian Video Frame Interpolation<br>:star:code
- Improving the Perceptual Quality of 2D Animation Interpolation
Video analysis
- Event Neural Networks
- Sports Video Analysis on Large-Scale Data<br>:star:code
Video editing
- Temporally Consistent Semantic Video Editing
Video enhancement
- Learning Cross-Video Neural Representations for High-Quality Frame Interpolation<br>:house:project
Video object re-identification
- CAViT: Contextual Alignment Vision Transformer for Video Object Re-identification<br>:star:code
Image and video editing
- Text2LIVE: Text-Driven Layered Image and Video Editing
Video upscaling
- Learning Spatio-Temporal Downsampling for Effective Video Upscaling
Video color propagation
- Learned Variational Video Color Propagation<br>:star:code
Audio-Visual Event Localization
- Dual Perspective Network for Audio-Visual Event Localization
Video Activity Localization
- Video Activity Localisation with Uncertainties in Temporal Boundary
Audio-Visual Video Parsing
- Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing<br>:star:code
Video Highlight Detection
- PAC-Net: Highlight Your Video via History Preference Modeling
Video Clip Classification
- Long Movie Clip Classification with State-Space Video Models<br>:star:code
Video Relation Grounding
- Asymmetric Relation Consistency Reasoning for Video Relation Grounding
Video Moment Retrieval
- Selective Query-Guided Debiasing for Video Corpus Moment Retrieval

<a name="10"/>

10.Pose Estimation (Object Pose Estimation)

<a name="9"/>

9.Human Pose Estimation

Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation
Pose for Everything: Towards Category-Agnostic Pose Estimation<br>:open_mouth:oral:star:code
BodySLAM: Joint Camera Localisation, Mapping, and Human Motion Tracking
PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation
Learning Visibility for Robust Dense Human Body Estimation<br>:star:code
D&D: Learning Human Dynamics from Dynamic Camera<br>:open_mouth:oral:star:code
PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation<br>:star:code
DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation<br>:star:code
SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos<br>:star:code
Poseur: Direct Human Pose Regression with Transformers<br>:star:code
SimCC: A Simple Coordinate Classification Perspective for Human Pose Estimation<br>:star:code
Regularizing Vector Embedding in Bottom-Up Human Pose Estimation<br>:star:code
Hallucinating Pose-Compatible Scenes
A Unified Framework for Domain Adaptive Pose Estimation<br>:star:code
Motion Capture
- TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts<br>:star:code:house:project
- HULC: 3D HUman Motion Capture with Pose Manifold SampLing and Dense Contact Guidance<br>:house:project
Point-Based Clothed Human Modeling
- Learning Implicit Templates for Point-Based Clothed Human Modeling<br>:star:code:house:project
Dynamic Human Digitization
- NDF: Neural Deformable Fields for Dynamic Human Modelling<br>:star:code
Human Pose and Shape Estimation
- CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation<br>:open_mouth:oral:star:code
- Super-Resolution 3D Human Shape from a Single Low-Resolution Image<br>:star:code:house:project
3D Human Pose Estimation
- DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation<br>:star:code
- Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection
- Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation<br>:star:code
- PoseScript: 3D Human Poses from Natural Language<br>:house:project
- Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement
- 3D Human Pose Estimation Using Möbius Graph Convolutional Networks
- P-STMO: Pre-trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation<br>:star:code
- C3P: Cross-Domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation<br>:star:code
- Structural Triangulation: A Closed-Form Solution to Constrained 3D Human Pose Estimation<br>:star:code
- VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data<br>:star:code
- Learning to Fit Morphable Models
- EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices<br>:house:project
- AutoAvatar: Autoregressive Neural Fields for Dynamic Avatar Modeling<br>:house:project
- FLEX: Extrinsic Parameters-Free Multi-View 3D Human Motion Reconstruction<br>:house:project
Multi-Person Pose Estimation
- Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation<br>:star:code
3D Human Reconstruction
- 3D Clothed Human Reconstruction in the Wild<br>:star:code
- DiffuStereo: High Quality Human Reconstruction via Diffusion-Based Stereo Using Sparse Cameras
- UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation<br>:star:code
- The One Where They Reconstructed 3D Humans and Environments in TV Shows<br>:star:code:house:project
- Neural Capture of Animatable 3D Human from Monocular Video
- SUPR: A Sparse Unified Part-Based Human Representation<br>:star:code:house:project
- IntegratedPIFu: Integrated Pixel Aligned Implicit Function for Single-view Human Reconstruction<br>:star:code
- Learned Vertex Descent: A New Direction for 3D Human Model Fitting<br>:star:code:house:project
3D Interacting Hand Pose Estimation
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal<br>:star:code
- S2Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning<br>:star:code:house:project
Pose Synthesis
- TIPS: Text-Induced Pose Synthesis<br>:star:code:house:project
Hand-Object Reconstruction
- AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction<br>:star:code:house:project
Human-Scene Interaction
- Compositional Human-Scene Interaction Synthesis with Semantic Control<br>:star:code
Human Pose Modeling
- Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields<br>:open_mouth:oral:house:project
Pose Tracking
- AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing<br>:star:code
3D Human Mesh Recovery
- Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers<br>:star:code
3D Human Motion Prediction and Generation
- Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction<br>:star:code
- PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting<br>:house:project
Pose Transfer
- Skeleton-free Pose Transfer for Stylized 3D Characters<br>:star:code:house:project
- Cross Attention Based Style Distribution for Controllable Person Image Synthesis<br>:star:code
Human Pose Forecasting
- Pose Forecasting in Industrial Human-Robot Collaboration<br>:star:code
4D
- LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling<br>:star:code:house:project
Human Mesh Recovery
- Self-supervised Human Mesh Recovery with Cross-Representation Alignment
Hand Mesh Estimation
- Identity-Aware Hand Mesh Estimation and Personalization from RGB Images<br>:star:code
Head Mesh Reconstruction
- Realistic One-Shot Mesh-Based Head Avatars<br>:house:project
Human Mesh Animation
- CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes
Audio-Driven Stylized Gesture Generation
- Audio-Driven Stylized Gesture Generation with Flow-Based Model

<a name="8"/>

8.3D Vision

<a name="7"/>

7.Object Tracking

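Towards Grand Unification of Object Tracking<br>:open_mouth:oral:star:code<br>:newspaper:ECCV 2022 Oral "Unicorn" is the first to unify the network architecture and learning paradigm across four object tracking tasks, reaching SOTA on 8 challenging datasets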
HVC-Net: Unifying Homography, Visibility, and Confidence Learning for Planar Object Tracking
Tracking by Associating Clips
ByteTrack: Multi-Object Tracking by Associating Every Detection Box<br>:star:code
Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework<br>:star:code
Backbone Is All Your Need: A Simplified Architecture for Visual Object Tracking<br>:star:code
Robust Visual Tracking by Segmentation<br>:star:code
FEAR: Fast, Efficient, Accurate and Robust Visual Tracker<br>:star:code
3D Tracking
Multi-Object Tracking
Visual Tracking
- AiATrack: Attention in Attention for Transformer Visual Tracking<br>:star:code
- Towards Sequence-Level Training for Visual Tracking<br>:star:code
- Hierarchical Feature Embedding for Visual Tracking<br>:star:code
Cell Tracking
- Graph Neural Network for Cell Tracking in Microscopy Videos<br>:star:code

<a name="6"/>

6.Object Detection

Should All Proposals be Treated Equally in Object Detection?<br>:star:code
TIDEE: Tidying Up Novel Rooms Using Visuo-Semantic Commonsense Priors<br>:house:project
TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices Using Submodular Mutual Information<br>:star:code
HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors<br>:star:code
Adversarially-Aware Robust Object Detector<br>:open_mouth:oral:star:code
ObjectBox: From Centers to Boxes for Anchor-Free Object Detection<br>:open_mouth:oral:star:code
Point-to-Box Network for Accurate Object Detection via Single Point Supervision<br>:star:code
You Should Look at All Objects<br>:star:code
Class-agnostic Object Detection with Multi-modal Transformer<br>:star:code<br>Uses multi-modal ViTs and human-understandable text queries to generate high-quality object proposals (OP)
Exploiting Unlabeled Data with Vision and Language Models for Object Detection<br>:star:code
PoserNet: Refining Relative Camera Poses Exploiting Object Detections<br>:star:code
Robust Object Detection With Inaccurate Bounding Boxes<br>:star:code
UC-OWOD: Unknown-Classified Open World Object Detection<br>:star:code
Unifying Visual Perception by Dispersible Points Learning<br>:star:code
A Large-scale Multiple-objective Method for Black-box Attack against Object Detection<br>:star:code
Distilling Object Detectors With Global Knowledge<br>:star:code
PANDORA: A Panoramic Detection Dataset for Object with Orientation<br>:star:code
Exploring Plain Vision Transformer Backbones for Object Detection<br>:star:code
Long-Tail Detection with Effective Class-Margins<br>:star:code
Detecting Twenty-Thousand Classes Using Image-Level Supervision<br>:star:code
Exploring Resolution and Degradation Clues As Self-Supervised Signal for Low Quality Object Detection<br>:star:code
Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection
MTTrans: Cross-Domain Object Detection with Mean Teacher Transformer
PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images<br>:house:project
Cornerformer: Purifying Instances for Corner-Based Detectors
Efficient Decoder-Free Object Detection with Transformers<br>:star:code
W2N: Switching from Weak Supervision to Noisy Supervision for Object Detection<br>:star:code
Towards Data-Efficient Detection Transformers<br>:star:code
Open-Vocabulary DETR with Conditional Matching<br>:star:code
Prediction-Guided Distillation for Dense Object Detection<br>:star:code
Multimodal Object Detection via Probabilistic Ensembling<br>:star:code
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
GLAMD: Global and Local Attention Mask Distillation for Object Detectors
Object Detection As Probabilistic Set Prediction
Out-of-Distribution Identification: Let Detector Tell Which I Am Not Sure
Simple Open-Vocabulary Object Detection with Vision Transformers<br>:star:code
A Simple Approach and Benchmark for 21,000-Category Object Detection<br>:star:code
EAutoDet: Efficient Architecture Search for Object Detection<br>:star:code
Few-Shot End-to-End Object Detection via Constantly Concentrated Encoding across Heads
3D Object Detection
- DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection<br>:star:code
- Rethinking IoU-based Optimization for Single-stage 3D Object Detection<br>:star:code
- Densely Constrained Depth Estimator for Monocular 3D Object Detection<br>:star:code
- Learning Ego 3D Representation As Ray Tracing<br>:house:project
- LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection<br>:star:code
- SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention<br>:star:code
- AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection<br>:star:code
- DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection<br>:star:code
- Label-Guided Auxiliary Training Improves 3D Object Detector<br>:star:code
- Monocular 3D Object Detection with Depth from Motion<br>:open_mouth:oral:star:code
- MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones<br>:open_mouth:oral:star:code
- Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph<br>:open_mouth:oral:star:code
- Multimodal Transformer for Automatic 3D Annotation and Object Detection<br>:star:code
- Semi-Supervised 3D Object Detection with Proficient Teachers<br>:star:code
- ProposalContrast: Unsupervised Pre-training for LiDAR-Based 3D Object Detection<br>:star:code
- CenterFormer: Center-based Transformer for 3D Object Detection<br>:open_mouth:oral:star:code
- SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds
- Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction<br>:star:code
- CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection
- Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection
- Plausibility Verification For 3D Object Detectors Using Energy-Based Optimization
- Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection<br>:star:code
- PETR: Position Embedding Transformation for Multi-View 3D Object Detection<br>:star:code
- Lidar Point Cloud Guided Monocular 3D Object Detection<br>:star:code
- INT: Towards Infinite-Frames 3D Detection with an Efficient Framework
- Semi-Supervised Monocular 3D Object Detection by Multi-View Consistency
- Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training<br>:star:code
- MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection<br>:star:code
- PillarNet: Real-Time and High-Performance Pillar-Based 3D Object Detection<br>:star:code
- Improving the Intra-Class Long-Tail in 3D Detection via Rare Example Mining
- 3D Object Detection with a Self-Supervised Lidar Scene Flow Backbone<br>:star:code
- DetMatch: Two Teachers Are Better than One for Joint 2D and 3D Semi-Supervised Object Detection<br>:star:code
- FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection<br>:star:code
- Enhancing Multi-modal Features Using Local Self-Attention for 3D Object Detection
Semi-Supervised Object Detection
- Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection<br>:star:code
- Semi-Supervised Object Detection via Virtual Category Learning<br>:star:code
- Open-Set Semi-Supervised Object Detection<br>:star:code:house:project
- PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection<br>:star:code
- Diverse Learner: Exploring Diverse Supervision for Semi-Supervised Object Detection
Few-Shot Object Detection
- Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark<br>:star:code
- Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object Detection<br>:star:code
- AcroFOD: An Adaptive Method for Cross-domain Few-shot Object Detection<br>:star:code
- Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot Object Detection<br>:star:code
- AirDet: Few-Shot Detection without Fine-Tuning for Autonomous Exploration<br>:star:code
- Few-Shot Object Detection by Knowledge Distillation Using Bag-of-Visual-Words Representations
- Few-Shot Object Detection with Model Calibration<br>:star:code
- Few-Shot Video Object Detection<br>:star:code
- Mutually Reinforcing Structure with Proposal Contrastive Consistency for Few-Shot Object Detection<br>:star:code
Salient Object Detection
- SESS: Saliency Enhancing with Scaling and Sliding<br>:star:code
- SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection<br>:star:code
- Salient Object Detection for Point Clouds<br>:star:code
- KD-SCFNet: Towards More Accurate and Efficient Salient Object Detection via Knowledge Distillation<br>:star:code
- Saliency Hierarchy Modeling via Generative Kernels for Salient Object Detection
- MVSalNet: Multi-View Augmentation for RGB-D Salient Object Detection
Weakly Supervised Object Detection
- Active Learning Strategies for Weakly-supervised Object Detection<br>:star:code
- W2N: Switching from Weak Supervision to Noisy Supervision for Object Detection<br>:star:code
- Object Discovery via Contrastive Learning for Weakly Supervised Object Detection<br>:star:code
- End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution
Object Localization
- Object Manipulation via Visual Target Localization<br>:house:project
- On Label Granularity and Object Localization
- Weakly Supervised Object Localization
One-Stage Object Detection
- Unsupervised Domain Adaptation for One-stage Object Detector using Offsets to Bounding Box
Object Counting
- Few-shot Object Counting and Detection<br>:star:code
- Class-Agnostic Object Counting Robust to Intraclass Diversity<br>:star:code
OOD
- Out-of-Distribution Detection with Semantic Mismatch under Masking<br>:star:code
- Out-of-Distribution Detection with Boundary Aware Learning
- DICE: Leveraging Sparsification for Out-of-Distribution Detection<br>:star:code
- Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-of-Distribution Generalization<br>:star:code
- Data Invariants to Understand Unsupervised Out-of-Distribution Detection
VOD
- PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer towards Video Object Detection<br>:star:code
- SALISA: Saliency-Based Input Sampling for Efficient Video Object Detection
- Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection
- Efficient One-Stage Video Object Detection by Exploiting Temporal Consistency<br>:star:code
Small Object Detection
- RFLA: Gaussian Receptive Field based Label Assignment for Tiny Object Detection<br>:star:code
Image Detection
- Discovering Transferable Forensic Features for CNN-generated Images Detection<br>:open_mouth:oral:star:code:house:project
Object Discovery
- Object Discovery and Representation Networks
Change Detection
- Objects Can Move: 3D Change Detection by Geometric Transformation Consistency<br>:star:code

<a name="5"/>

5.Image/Video Retrieval

Text-Based Temporal Localization of Novel Events
Cross-Domain Retrieval
- Feature Representation Learning for Unsupervised Cross-domain Image Retrieval<br>:star:code
Image Retrieval
Video Retrieval
- LocVTP: Video-Text Pre-training for Temporal Localization<br>:star:code
- Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval
- Multi-Query Video Retrieval<br>:star:code
- Learning Audio-Video Modalities from Image Captions<br>:house:project
- Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment
- ECLIPSE: Efficient Long-Range Video Retrieval Using Sight and Sound<br>:star:code
- Video Geo-localization (Retrieval)
  - GAMa: Cross-view Video Geo-localization<br>:star:code
Text-Video Retrieval
Image-Text Retrieval
- CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
Fine-Grained Image Retrieval
- SEMICON: A Learning-to-hash Solution for Large-scale Fine-grained Image Retrieval<br>:star:code
Video Moment Retrieval
- Selective Query-guided Debiasing Network for Video Corpus Moment Retrieval
Video-Text Retrieval
- VTC: Improving Video-Text Retrieval with User Comments<br>:star:code:house:project
- MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
Nearest Neighbor Search
- Connecting Compression Spaces with Transformer for Approximate Nearest Neighbor Search<br>:star:code

<a name="4"/>

4.Video/Image Captioning

<a name="3"/>

3.Image Processing

Image Quality Assessment
- Shift-tolerant Perceptual Similarity Metric<br>:star:code
Image Retouching
- Neural Color Operators for Sequential Image Retouching<br>:star:code
Image Warping
- Learning Local Implicit Fourier Representation for Image Warping<br>:star:code
Image Restoration
- D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration<br>:star:code
- Simple Baselines for Image Restoration<br>:star:code
- Improving Image Restoration by Revisiting Global Information Aggregation<br>:star:code
- Seeing through a Black Box: Toward High-Quality Terahertz Imaging via Subspace-and-Attention Guided Restoration
- JPEG Artifacts Removal via Contrastive Representation Learning<br>:star:code
- TAPE: Task-Agnostic Prior Embedding for Image Restoration
- Spectrum-Aware and Transferable Architecture Search for Hyperspectral Image Restoration
- DRCNet: Dynamic Image Restoration Contrastive Network
Image Inpainting
- Learning Prior Feature and Attention Enhanced Image Inpainting<br>:star:code
- Inpainting at Modern Camera Resolution by Guided PatchMatch with Auto-Curation<br>:star:code
- High-Fidelity Image Inpainting with GAN Inversion
- Unbiased Multi-Modality Guidance for Image Inpainting
- Image Inpainting with Cascaded Modulation GAN and Object-Aware Training<br>:star:code
- Perceptual Artifacts Localization for Inpainting<br>:star:code
- Hourglass Attention Network for Image Inpainting<br>:star:code
- Diverse Image Inpainting with Normalizing Flow
Image Enhancement
- SepLUT: Separable Image-adaptive Lookup Tables for Real-time Image Enhancement
- Uncertainty Inspired Underwater Image Enhancement<br>:star:code
- Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression<br>:star:code
- LEDNet: Joint Low-Light Enhancement and Deblurring in the Dark<br>:star:code:house:project
- NEST: Neural Event Stack for Event-Based Image Enhancement<br>:star:code
- Seeing Far in the Dark with Patterned Flash<br>:star:code
- Local Color Distributions Prior for Image Enhancement<br>:house:project
- SemAug: Semantically Meaningful Image Augmentations for Object Detection through Language Grounding
Image Harmonization
- DCCF: Deep Comprehensible Color Filter Learning Framework for High-Resolution Image Harmonization<br>:open_mouth:oral:star:code
- Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization<br>:star:code
- Semantic-Guided Multi-Mask Image Harmonization<br>:star:code
Image Deconvolution
- Learning Discriminative Shrinkage Deep Networks for Image Deconvolution
Dehazing
- Boosting Supervised Dehazing Methods via Bi-Level Patch Reweighting
- Unpaired Deep Image Dehazing Using Contrastive Disentanglement Learning
- Perceiving and Modeling Density for Image Dehazing<br>:star:code
- Frequency and Spatial Dual Guidance for Image Dehazing<br>:star:code
Denoising
- Deep Semantic Statistics Matching (D2SM) Denoising Network<br>:star:code:house:project
- Optimizing Image Compression via Joint Learning with Denoising<br>:star:code
- Fast and High Quality Image Denoising via Malleable Convolution<br>:house:project
- Unidirectional Video Denoising by Mimicking Backward Recurrent Modules with Look-Ahead Forward Ones<br>:star:code
- TempFormer: Temporally Consistent Transformer for Video Denoising
Desnowing
- SLiDE: Self-supervised LiDAR De-snowing through Reconstruction Difficulty
Deraining
- Not Just Streaks: Towards Ground Truth for Single Image Deraining<br>:house:project
- Blind Image Decomposition<br>:star:code
- ART-SS: An Adaptive Rejection Technique for Semi-Supervised Restoration for Adverse Weather-Affected Images<br>:star:code
- Rethinking Video Rain Streak Removal: A New Synthesis Model and a Deraining Network with Video Rain Prior<br>:star:code
Deblurring
- Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance<br>:star:code
- United Defocus Blur Detection and Deblurring via Adversarial Promoting Learning<br>:star:code
- Learning Degradation Representations for Image Deblurring<br>:star:code
- Learning Deep Non-Blind Image Deconvolution without Ground Truths
- DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting<br>:star:code
- Realistic Blur Synthesis for Learning Image Deblurring<br>:house:project
- Stripformer: Strip Transformer for Fast Image Deblurring<br>:star:code
- Event-Based Fusion for Motion Deblurring with Cross-Modal Attention<br>:house:project
- ERDN: Equivalent Receptive Field Deformable Network for Video Deblurring<br>:star:code
- Event-Guided Deblurring of Unknown Exposure Time Videos<br>:house:project
Demoiréing
- Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing<br>:star:code:house:project
Reflection Removal
- Zero-Shot Learning for Reflection Removal of Single 360-Degree Image
Shadow Removal
- Style-Guided Shadow Removal<br>:star:code
Semantic Image Editing
- Context-Consistent Semantic Image Editing with Style-Preserved Modulation<br>:star:code
Image Colorization
- PalGAN: Image Colorization with Palette Generative Adversarial Networks<br>:star:code
- Semantic-Sparse Colorization Network for Deep Exemplar-Based Colorization
- CT2: Colorization Transformer via Color Tokens
- BigColor: Colorization Using a Generative Color Prior for Natural Images
- Colorization for In Situ Marine Plankton Images
- ColorFormer: Image Colorization via Color Memory Assisted Hybrid-Attention Transformer<br>:star:code
- Bridging the Domain Gap towards Generalization in Automatic Colorization<br>:star:code
- L-CoDer: Language-Based Colorization with Color-Object Decoupling Transformer
Image Cropping
- Human-Centric Image Cropping with Partition-Aware and Content-Preserving Features<br>:star:code
Image Fusion
- Neural Image Representations for Multi-Image Fusion and Layer Separation<br>:house:project
Rolling Shutter
- Bringing Rolling Shutter Images Alive with Dual Reversed Distortion<br>:star:code

<a name="2"/>

2.Image Segmentation

<a name="1"/>

1.Others

Scan the QR code to add CV君 on WeChat (mention "CVPR") and join the WeChat discussion group: