CVPR2024-Papers-with-Code-Demo

:star_and_crescent: Add WeChat: nvshenj125 and note your research direction to join the discussion and study group

Follow our WeChat official account: AI算法与图像处理

:star2: CVPR 2024: continuously updated with the latest papers and their open-source code!

Bilibili demos: https://space.bilibili.com/288489574

:hand: Note: you are welcome to open an issue to share CVPR 2024 papers and open-source projects and help improve this list!

Paper collections from previous top conferences:

CVPR2021

CVPR2022

CVPR2023

ICCV2021

ECCV2022

:fireworks: Join the group | Welcome

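The CVPR 2024 paper discussion group is now open! If your paper has been accepted, add WeChat: nvshenj125 with the note: CVPR + name + school/company. Please follow this format exactly so that you can be added to the group.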

<a name="Contents"></a>

:hammer: Table of Contents (click to jump)

<details open> <summary> Table of Contents (click on the right to collapse)</summary> </details>

<a name="Backbone"></a>

Backbone

Back to contents

<a name="Dataset"></a>

Dataset

HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

Traffic Scene Parsing through the TSP6K Dataset

Back to contents

<a name="DiffusionModel"></a>

Diffusion Model

Balancing Act: Distribution-Guided Debiasing in Diffusion Models

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

Few-shot Learner Parameterization by Diffusion Time-steps

MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Face2Diffusion for Fast and Editable Face Personalization

MACE: Mass Concept Erasure in Diffusion Models

It's All About Your Sketch: Democratising Sketch Control in Diffusion Models

SemCity: Semantic Scene Generation with Triplane Diffusion

Back to contents

<a name="T2I"></a>

Text-to-Image

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging

Discriminative Probing and Tuning for Text-to-Image Generation

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation

Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers

Back to contents

<a name="NAS"></a>

NAS

Back to contents

<a name="NeRF"></a>

NeRF

GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes

Back to contents

<a name="KnowledgeDistillation"></a>

Knowledge Distillation

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

Logit Standardization in Knowledge Distillation

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

$V_kD$: Improving Knowledge Distillation using Orthogonal Projections

Back to contents

<a name="Multimodal"></a>

Multimodal

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework

Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations

Back to contents

<a name="ContrastiveLearning"></a>

Contrastive Learning

Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning

Back to contents

<a name="CapsuleNetwork"></a>

Capsule Network

Back to contents

<a name="ImageClassification"></a>

Image Classification

Back to contents

<a name="ObjectDetection"></a>

Object Detection

UniMODE: Unified Monocular 3D Object Detection

CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images

Memory-based Adapters for Online 3D Scene Perception

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

Back to contents

<a name="ObjectTracking"></a>

Object Tracking

DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

Back to contents

3D Object Tracking

Back to contents

<a name="TrajectoryPrediction"></a>

Trajectory Prediction

Back to contents

<a name="Segmentation"></a>

Segmentation

PEM: Prototype-based Efficient MaskFormer for Image Segmentation

Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation

Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation

Back to contents

<a name="WSSS"></a>

Weakly Supervised Semantic Segmentation

Back to contents

<a name="MedicalImageSegmentation"></a>

Medical Image

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

Back to contents

<a name="VideoObjectSegmentation"></a>

Video Object Segmentation

Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

Back to contents

<a name="InteractiveVideoObjectSegmentation"></a>

Interactive Video Object Segmentation

Back to contents

<a name="VisualTransformer"></a>

Visual Transformer

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

Back to contents

<a name="DepthEstimation"></a>

Depth Estimation

Back to contents

<a name="Retrieval"></a>

Image Retrieval / Video Retrieval

Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?

Back to contents

<a name="SuperResolution"></a>

Super Resolution

SeD: Semantic-Aware Discriminator for Image Super-Resolution

Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts

CAMixerSR: Only Details Need More "Attention"

Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

Back to contents

<a name="ImageRestoration"></a>

Image Restoration

Boosting Image Restoration via Priors from Pre-trained Models

Back to contents

<a name="ImageDenoising"></a>

Image Denoising

Back to contents

<a name="ImageEditing"></a>

Image Editing

Doubly Abductive Counterfactual Inference for Text-based Image Editing

Back to contents

<a name="ImageCompression"></a>

Image Compression

Back to contents

<a name="ImageDeblur"></a>

Image Deblur

A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning

Back to contents

<a name="AutonomousDriving"></a>

Autonomous Driving

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

Back to contents

<a name="FaceRecognition"></a>

Face Recognition

Back to contents

<a name="FaceDetection"></a>

Face Detection

Back to contents

<a name="FaceAnti-Spoofing"></a>

Face Anti-Spoofing

Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing

Back to contents

<a name="FaceReconstruction"></a>

Face Reconstruction

Back to contents

<a name="VideoActionDetection"></a>

Video Action Detection

Back to contents

<a name="SignLanguageTranslation"></a>

Sign Language Translation

Back to contents

<a name="PersonRe-identification"></a>

Person Re-identification

Back to contents

<a name="TalkingFace"></a>

Talking Face

Back to contents

<a name="HumanPoseEstimation"></a>

Pose Estimation

FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Back to contents

<a name="GAN"></a>

GAN

Back to contents

<a name="AgeEstimation"></a>

Age Estimation

Back to contents

<a name="FacialExpressionRecognition"></a>

Facial Expression Recognition

Back to contents

<a name="HandPoseEstimation"></a>

Hand Pose Estimation (Hand Mesh Recovery)

Back to contents

<a name="3DReconstruction"></a>

3D Reconstruction

UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Data Sets

DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction

Memory-based Adapters for Online 3D Scene Perception

Bayesian Diffusion Models for 3D Shape Reconstruction

Back to contents

<a name="FrameInterpolation"></a>

Frame Interpolation

Back to contents

<a name="3DPointCloud"></a>

3D Point Cloud

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds

Back to contents

<a name="AnomalyDetection"></a>

Anomaly Detection

Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts

RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection

Back to contents

<a name="Other"></a>

Other

DisCo: Disentangled Control for Realistic Human Dance Generation

Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing

Misalignment-Robust Frequency Distribution Loss for Image Transformation

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Boosting Neural Representations for Videos with a Conditional Decoder

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

SeMoLi: What Moves Together Belongs Together

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

MemoNav: Working Memory Model for Visual Navigation

VideoMAC: Video Masked Autoencoders Meet ConvNets

Theoretically Achieving Continuous Representation of Oriented Bounding Boxes

OHTA: One-shot Hand Avatar via Data-driven Implicit Priors

WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts

Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation

SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition

NARUTO: Neural Active Reconstruction from Uncertain Target Observations

Towards Generalizable Tumor Synthesis

Rethinking Multi-domain Generalization with A General Learning Objective

Rethinking Inductive Biases for Surface Normal Estimation

SURE: SUrvey REcipes for building reliable and robust deep networks

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Deformable One-shot Face Stylization via DINO Semantic Guidance

CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes

Learning Group Activity Features Through Person Attribute Prediction

Interactive Continual Learning: Fast and Slow Thinking

Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation

DART: Implicit Doppler Tomography for Radar Novel View Synthesis

MeaCap: Memory-Augmented Zero-shot Image Captioning

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

Continual Segmentation with Disentangled Objectness Learning and Class Recognition

HDRFlow: Real-Time HDR Video Reconstruction with Large Motions

LEAD: Learning Decomposition for Source-free Universal Domain Adaptation

F$^3$Loc: Fusion and Filtering for Floorplan Localization

Enhancing Vision-Language Pre-training with Rich Supervisions

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning

Learning to Remove Wrinkled Transparent Film with Polarized Prior

LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking

Active Generalized Category Discovery

MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

Seamless Human Motion Composition with Blended Positional Encodings

DiffusionLight: Light Probes for Free by Painting a Chrome Ball

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

Real-Time Simulated Avatar from Head-Mounted Sensors

DiaLoc: An Iterative Approach to Embodied Dialog Localization

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

EarthLoc: Astronaut Photography Localization by Indexing Earth from Space

CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective

Distributionally Generative Augmentation for Fair Facial Attribute Classification

Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

MoST: Motion Style Transformer between Diverse Action Contents

Coherent Temporal Synthesis for Incremental Action Segmentation

Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor

SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis

Robust Synthetic-to-Real Transfer for Stereo Matching

CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers

Masked AutoDecoder is Effective Multi-Task Vision Generalist

PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution

Unleashing Network Potentials for Semantic Scene Completion

Open-World Semantic Segmentation Including Class Similarity

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

FSC: Few-point Shape Completion

Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture

A Bayesian Approach to OOD Robustness in Image Classification

Back to contents