Awesome

CVPR2024-Papers-with-Code-Demo

:star_and_crescent: Add WeChat nvshenj125 and note your research area to join the discussion and study group.

Follow our WeChat official account: AI算法与图像处理 (AI Algorithms and Image Processing)

:star2: Continuously updated with the latest CVPR 2024 papers and their open-source code!

Bilibili demos: https://space.bilibili.com/288489574

:hand: Note: issues that share CVPR 2024 papers and open-source projects are welcome. Help us improve this project together!

Paper collections from previous top conferences:

CVPR2021

CVPR2022

CVPR2023

ICCV2021

ECCV2022

:fireworks: Join the Group | Welcome

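The CVPR 2024 paper discussion group is now open! If your paper has been accepted, add WeChat nvshenj125 with the note: CVPR + name + school/company. Requests must follow this format to be added to the group.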

:hammer: Table of Contents (click to jump)

<details open> <summary> Table of Contents (click on the right to collapse)</summary>

Backbone
数据集/Dataset
Diffusion Model
Text-to-Image
NAS
NeRF
Knowledge Distillation
多模态 / Multimodal
对比学习/Contrastive Learning
图神经网络 / Graph Neural Networks
胶囊网络 / Capsule Network
图像分类 / Image Classification
目标检测/Object Detection
目标跟踪/Object Tracking
轨迹预测/Trajectory Prediction
语义分割/Segmentation
弱监督语义分割/Weakly Supervised Semantic Segmentation
医学图像分割/Medical Image Segmentation
视频目标分割/Video Object Segmentation
交互式视频目标分割/Interactive Video Object Segmentation
Visual Transformer
深度估计/Depth Estimation
人脸识别/Face Recognition
人脸检测/Face Detection
人脸活体检测/Face Anti-Spoofing
人脸年龄估计/Age Estimation
人脸表情识别/Facial Expression Recognition
人脸属性识别/Facial Attribute Recognition
人脸编辑/Facial Editing
人脸重建/Face Reconstruction
Talking Face
换脸/Face Swap
姿态估计/Pose Estimation
手势姿态估计（重建）/Hand Pose Estimation (Hand Mesh Recovery)
视频动作检测/Video Action Detection
手语翻译/Sign Language Translation
3D人体重建/3D Human Reconstruction
行人重识别/Person Re-identification
行人搜索/Person Search
人群计数 / Crowd Counting
GAN
彩妆迁移 / Color-Pattern Makeup Transfer
字体生成 / Font Generation
场景文本检测、识别/Scene Text Detection/Recognition
图像、视频检索 / Image Retrieval/Video Retrieval
Image Animation
抠图/Image Matting
超分辨率/Super Resolution
图像复原/Image Restoration
图像补全/Image Inpainting
图像去噪/Image Denoising
图像编辑/Image Editing
图像拼接/Image stitching
图像匹配/Image Matching
图像融合/Image Blending
图像去雾/Image Dehazing
图像去模糊/Image Deblur
图像压缩/Image Compression
反光去除/Reflection Removal
车道线检测/Lane Detection
自动驾驶 / Autonomous Driving
流体重建/Fluid Reconstruction
场景重建 / Scene Reconstruction
3D Reconstruction
视频插帧/Frame Interpolation
视频超分 / Video Super-Resolution
3D点云/3D point cloud
标签噪声 / Label-Noise
对抗样本/Adversarial Examples
Anomaly Detection
其他/Other

</details>

Backbone

返回目录/back

数据集/Dataset

HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

论文/Paper: http://arxiv.org/pdf/2403.02640
代码/Code: None

Traffic Scene Parsing through the TSP6K Dataset

论文/Paper: https://arxiv.org/pdf/2303.02835.pdf
代码/Code: https://github.com/PengtaoJiang/TSP6K

返回目录/back

Diffusion Model

Balancing Act: Distribution-Guided Debiasing in Diffusion Models

论文/Paper: http://arxiv.org/pdf/2402.18206
代码/Code: None

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

论文/Paper: http://arxiv.org/pdf/2402.19481
代码/Code: https://github.com/mit-han-lab/distrifuser

DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

论文/Paper: http://arxiv.org/pdf/2402.19302
代码/Code: https://github.com/iit-pavis/diffassemble

Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

论文/Paper: http://arxiv.org/pdf/2403.00644
代码/Code: None

Few-shot Learner Parameterization by Diffusion Time-steps

论文/Paper: http://arxiv.org/pdf/2403.02649
代码/Code: https://github.com/yue-zhongqi/tif

MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

论文/Paper: http://arxiv.org/pdf/2403.04290
代码/Code: None

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

论文/Paper: https://arxiv.org/abs/2403.06951
代码/Code: https://github.com/Tianhao-Qi/DEADiff_code

Face2Diffusion for Fast and Editable Face Personalization

论文/Paper: http://arxiv.org/pdf/2403.05094
代码/Code: https://github.com/mapooon/Face2Diffusion

MACE: Mass Concept Erasure in Diffusion Models

论文/Paper: http://arxiv.org/pdf/2403.06135
代码/Code: https://github.com/Shilin-LU/MACE

It's All About Your Sketch: Democratising Sketch Control in Diffusion Models

论文/Paper: http://arxiv.org/pdf/2403.07234
代码/Code: https://github.com/subhadeepkoley/demosketch2rgb

SemCity: Semantic Scene Generation with Triplane Diffusion

论文/Paper: http://arxiv.org/pdf/2403.07773
代码/Code: https://github.com/zoomin-lee/semcity

返回目录/back

Text-to-Image

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

论文/Paper: http://arxiv.org/pdf/2403.00483
代码/Code: None

NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging

论文/Paper: http://arxiv.org/pdf/2403.03485
代码/Code: https://github.com/univ-esuty/noisecollage

Discriminative Probing and Tuning for Text-to-Image Generation

论文/Paper: http://arxiv.org/pdf/2403.04321
代码/Code: None

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

论文/Paper: http://arxiv.org/pdf/2403.05239
代码/Code: None

Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation

论文/Paper: http://arxiv.org/pdf/2403.06452
代码/Code: https://github.com/mulns/Text2QR

Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers

论文/Paper: http://arxiv.org/pdf/2403.07214
代码/Code: None

返回目录/back

NAS

返回目录/back

NeRF

GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

论文/Paper: http://arxiv.org/pdf/2403.03608
代码/Code: None

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

论文/Paper: http://arxiv.org/pdf/2403.06912
代码/Code: https://github.com/fictionarry/dngaussian

S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes

论文/Paper: http://arxiv.org/pdf/2403.06205
代码/Code: None

返回目录/back

Knowledge Distillation

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

论文/Paper: http://arxiv.org/pdf/2403.02781
代码/Code: https://github.com/zhengli97/PromptKD

Logit Standardization in Knowledge Distillation

论文/Paper: https://arxiv.org/abs/2403.01427
代码/Code: https://github.com/sunshangquan/logit-standardization-KD

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

论文/Paper: http://arxiv.org/pdf/2403.05061
代码/Code: None

$V_kD$: Improving Knowledge Distillation using Orthogonal Projections

论文/Paper: http://arxiv.org/pdf/2403.06213
代码/Code: https://github.com/roymiles/vkd

返回目录/back

多模态 / Multimodal

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

论文/Paper: https://arxiv.org/abs/2312.07472
代码/Code: https://github.com/IranQin/MP5
主页/Website: https://iranqin.github.io/MP5.github.io/

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

论文/Paper: http://arxiv.org/pdf/2402.18091
代码/Code: None

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

论文/Paper: http://arxiv.org/pdf/2403.02991
代码/Code: None

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

论文/Paper: http://arxiv.org/pdf/2403.05105
代码/Code: https://github.com/hhc1997/L2RM

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

论文/Paper: http://arxiv.org/pdf/2403.07839
代码/Code: None

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework

论文/Paper: http://arxiv.org/pdf/2403.07636
代码/Code: https://github.com/hieuphan33/mavl

Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations

论文/Paper: http://arxiv.org/pdf/2403.07241
代码/Code: None

返回目录/back

Contrastive Learning

Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning

论文/Paper: http://arxiv.org/pdf/2403.06122
代码/Code: https://github.com/root0yang/blindnet

返回目录/back

胶囊网络 / Capsule Network

返回目录/back

图像分类 / Image Classification

返回目录/back

目标检测/Object Detection

UniMODE: Unified Monocular 3D Object Detection

论文/Paper: http://arxiv.org/pdf/2402.18573
代码/Code: None

CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images

论文/Paper: http://arxiv.org/pdf/2403.04198
代码/Code: https://github.com/SerCharles/CN-RMA

Memory-based Adapters for Online 3D Scene Perception

论文/Paper: https://arxiv.org/abs/2403.06974
代码/Code: https://github.com/xuxw98/Online3D

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

论文/Paper: https://arxiv.org/abs/2403.16131
代码/Code: https://github.com/xiuqhou/Salience-DETR

Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

论文/Paper: http://arxiv.org/pdf/2403.06093
代码/Code: https://github.com/nullmax-vision/QAF2D

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

论文/Paper: http://arxiv.org/pdf/2403.05817
代码/Code: https://github.com/zhanggang001/hednet

返回目录/back

目标跟踪/Object Tracking

DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking

论文/Paper: http://arxiv.org/pdf/2403.02767
代码/Code: None

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

论文/Paper: http://arxiv.org/pdf/2403.04700
代码/Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT

返回目录/back

3D Object Tracking

返回目录/back

轨迹预测/Trajectory Prediction

返回目录/back

语义分割/Segmentation

PEM: Prototype-based Efficient MaskFormer for Image Segmentation

论文/Paper: http://arxiv.org/pdf/2402.19422
代码/Code: https://github.com/niccolocavagnero/pem

Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation

论文/Paper: http://arxiv.org/pdf/2403.06462
代码/Code: https://github.com/Gavinwxy/DDFP

Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation

论文/Paper: http://arxiv.org/pdf/2403.06247
代码/Code: None

返回目录/back

弱监督语义分割/Weakly Supervised Semantic Segmentation

返回目录/back

医学图像/Medical Image

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

论文/Paper: http://arxiv.org/pdf/2402.18933
代码/Code: None

返回目录/back

视频目标分割/Video Object Segmentation

Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

论文/Paper: http://arxiv.org/pdf/2403.04258
代码/Code: None

返回目录/back

交互式视频目标分割/Interactive Video Object Segmentation

返回目录/back

Visual Transformer

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

论文/Paper: http://arxiv.org/pdf/2403.05419
代码/Code: https://github.com/techmn/satmae_pp

返回目录/back

深度估计/Depth Estimation

Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

论文/Paper: https://arxiv.org/pdf/2403.07535.pdf
代码/Code: https://github.com/Junda24/AFNet

返回目录/back

图像、视频检索 / Image Retrieval/Video Retrieval

Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval

论文/Paper: http://arxiv.org/pdf/2403.00272
代码/Code: None

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

论文/Paper: http://arxiv.org/pdf/2403.05105
代码/Code: https://github.com/hhc1997/L2RM

How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?

论文/Paper: http://arxiv.org/pdf/2403.07203
代码/Code: None

返回目录/back

超分辨率/Super Resolution

SeD: Semantic-Aware Discriminator for Image Super-Resolution

论文/Paper: http://arxiv.org/pdf/2402.19387
代码/Code: None

Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts

论文/Paper: http://arxiv.org/pdf/2402.19215
代码/Code: https://github.com/mandalinadagi/wgsr

CAMixerSR: Only Details Need More "Attention"

论文/Paper: http://arxiv.org/pdf/2402.19289
代码/Code: https://github.com/icandle/camixersr

Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

论文/Paper: http://arxiv.org/pdf/2403.02601
代码/Code: None

返回目录/back

图像复原/Image Restoration

Boosting Image Restoration via Priors from Pre-trained Models

论文/Paper: http://arxiv.org/pdf/2403.06793
代码/Code: None

返回目录/back

图像去噪/Image Denoising

返回目录/back

图像编辑/Image Editing

Doubly Abductive Counterfactual Inference for Text-based Image Editing

论文/Paper: http://arxiv.org/pdf/2403.02981
代码/Code: https://github.com/xuesong39/DAC

返回目录/back

图像压缩/Image Compression

返回目录/back

图像去模糊/Image Deblur

A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning

论文/Paper: http://arxiv.org/pdf/2403.02611
代码/Code: https://github.com/PieceZhang/MPT-CataBlur

返回目录/back

自动驾驶 / Autonomous Driving

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

论文/Paper: http://arxiv.org/pdf/2403.00436
代码/Code: None

Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

论文/Paper: http://arxiv.org/pdf/2403.07535
代码/Code: https://github.com/Junda24/AFNet/

返回目录/back

人脸识别/Face Recognition

返回目录/back

人脸检测/Face Detection

返回目录/back

人脸活体检测/Face Anti-Spoofing

Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing

论文/Paper: http://arxiv.org/pdf/2402.19298
代码/Code: https://github.com/omggggg/mmdg

返回目录/back

人脸重建/Face Reconstruction

返回目录/back

视频动作检测/Video Action Detection

返回目录/back

手语翻译/Sign Language Translation

返回目录/back

行人重识别/Person Re-identification

返回目录/back

Talking Face

返回目录/back

姿态估计/Pose Estimation

FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation

论文/Paper: http://arxiv.org/pdf/2403.03221
代码/Code: None

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

论文/Paper: http://arxiv.org/pdf/2403.04381
代码/Code: https://github.com/MickeyLLG/S2DHand

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

论文/Paper: https://arxiv.org/pdf/2311.12028.pdf
代码/Code: https://github.com/NationalGAILab/HoT

返回目录/back

GAN

返回目录/back

人脸年龄估计/Age Estimation

返回目录/back

人脸表情识别/Facial Expression Recognition

返回目录/back

手势姿态估计（重建）/Hand Pose Estimation (Hand Mesh Recovery)

返回目录/back

3D Reconstruction

UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Data Sets

论文/Paper: http://arxiv.org/pdf/2403.05086
代码/Code: https://github.com/Youngju-Na/UFORecon

DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction

论文/Paper: http://arxiv.org/pdf/2403.05005
代码/Code: None

Memory-based Adapters for Online 3D Scene Perception

论文/Paper: http://arxiv.org/pdf/2403.06974
代码/Code: https://github.com/xuxw98/Online3D

Bayesian Diffusion Models for 3D Shape Reconstruction

论文/Paper: http://arxiv.org/pdf/2403.06973
代码/Code: None

返回目录/back

视频插帧/Frame Interpolation

返回目录/back

3D点云/3D point cloud

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

论文/Paper: http://arxiv.org/pdf/2403.00592
代码/Code: https://github.com/ZhaochongAn/COSeg

Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

论文/Paper: http://arxiv.org/pdf/2403.03532
代码/Code: https://github.com/liuquan98/eyoc

Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds

论文/Paper: http://arxiv.org/pdf/2403.05247
代码/Code: https://github.com/TRLou/HiT-ADV

返回目录/back

Anomaly Detection

Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts

论文/Paper: http://arxiv.org/pdf/2403.06495
代码/Code: https://github.com/mala-lab/inctrl

RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection

论文/Paper: http://arxiv.org/pdf/2403.05897
代码/Code: https://github.com/cnulab/realnet

返回目录/back

其他/Other

DisCo: Disentangled Control for Realistic Human Dance Generation

论文/Paper: https://arxiv.org/abs/2307.00040
代码/Code: https://github.com/Wangt-CN/DisCo

Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

论文/Paper: http://arxiv.org/pdf/2402.18528
代码/Code: None

TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

论文/Paper: http://arxiv.org/pdf/2402.18490
代码/Code: None

Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

论文/Paper: http://arxiv.org/pdf/2402.18330
代码/Code: https://github.com/tho-kn/egotap

Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing

论文/Paper: http://arxiv.org/pdf/2402.18277
代码/Code: None

Misalignment-Robust Frequency Distribution Loss for Image Transformation

论文/Paper: http://arxiv.org/pdf/2402.18192
代码/Code: https://github.com/eezkni/FDL

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

论文/Paper: http://arxiv.org/pdf/2402.18146
代码/Code: https://github.com/jiangchaokang/3dsflabelling

OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction

论文/Paper: http://arxiv.org/pdf/2402.18140
代码/Code: None

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

论文/Paper: http://arxiv.org/pdf/2402.18115
代码/Code: https://github.com/minghanli/univs

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

论文/Paper: http://arxiv.org/pdf/2402.18078
代码/Code: https://github.com/YanzuoLu/CFLD

Boosting Neural Representations for Videos with a Conditional Decoder

论文/Paper: http://arxiv.org/pdf/2402.18152
代码/Code: None

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

论文/Paper: http://arxiv.org/pdf/2402.18133
代码/Code: None

QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction

论文/Paper: http://arxiv.org/pdf/2402.17951
代码/Code: None

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

论文/Paper: http://arxiv.org/pdf/2402.19479
代码/Code: None

SeMoLi: What Moves Together Belongs Together

论文/Paper: http://arxiv.org/pdf/2402.19463
代码/Code: None

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

论文/Paper: http://arxiv.org/pdf/2402.19326
代码/Code: None

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

论文/Paper: http://arxiv.org/pdf/2402.19231
代码/Code: https://github.com/lu-feng/cricavpr

MemoNav: Working Memory Model for Visual Navigation

论文/Paper: http://arxiv.org/pdf/2402.19161
代码/Code: None

VideoMAC: Video Masked Autoencoders Meet ConvNets

论文/Paper: http://arxiv.org/pdf/2402.19082
代码/Code: https://github.com/nust-machine-intelligence-laboratory/videomac

Theoretically Achieving Continuous Representation of Oriented Bounding Boxes

论文/Paper: http://arxiv.org/pdf/2402.18975
代码/Code: https://github.com/Jittor/JDet

OHTA: One-shot Hand Avatar via Data-driven Implicit Priors

论文/Paper: http://arxiv.org/pdf/2402.18969
代码/Code: None

WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts

论文/Paper: http://arxiv.org/pdf/2402.18956
代码/Code: None

Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation

论文/Paper: http://arxiv.org/pdf/2402.18920
代码/Code: None

SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

论文/Paper: http://arxiv.org/pdf/2402.18848
代码/Code: None

ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

论文/Paper: http://arxiv.org/pdf/2402.18842
代码/Code: None

OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition

论文/Paper: http://arxiv.org/pdf/2402.18786
代码/Code: None

NARUTO: Neural Active Reconstruction from Uncertain Target Observations

论文/Paper: http://arxiv.org/pdf/2402.18771
代码/Code: None

Towards Generalizable Tumor Synthesis

论文/Paper: http://arxiv.org/pdf/2402.19470
代码/Code: None

Rethinking Multi-domain Generalization with A General Learning Objective

论文/Paper: http://arxiv.org/pdf/2402.18853
代码/Code: None

Rethinking Inductive Biases for Surface Normal Estimation

论文/Paper: http://arxiv.org/pdf/2403.00712
代码/Code: https://github.com/baegwangbin/DSINE

SURE: SUrvey REcipes for building reliable and robust deep networks

论文/Paper: http://arxiv.org/pdf/2403.00543
代码/Code: https://github.com/YutingLi0606/SURE

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

论文/Paper: http://arxiv.org/pdf/2403.00486
代码/Code: https://github.com/Windsrain/Selective-Stereo

Deformable One-shot Face Stylization via DINO Semantic Guidance

论文/Paper: http://arxiv.org/pdf/2403.00459
代码/Code: https://github.com/zichongc/DoesFS

CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

论文/Paper: http://arxiv.org/pdf/2403.00274
代码/Code: None

NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

论文/Paper: http://arxiv.org/pdf/2403.03122
代码/Code: None

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

论文/Paper: http://arxiv.org/pdf/2403.02782
代码/Code: None

HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes

论文/Paper: http://arxiv.org/pdf/2403.02769
代码/Code: None

Learning Group Activity Features Through Person Attribute Prediction

论文/Paper: http://arxiv.org/pdf/2403.02753
代码/Code: https://github.com/chihina/GAFL-CVPR2024

Interactive Continual Learning: Fast and Slow Thinking

论文/Paper: http://arxiv.org/pdf/2403.02628
代码/Code: None

Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation

论文/Paper: http://arxiv.org/pdf/2403.03890
代码/Code: None

DART: Implicit Doppler Tomography for Radar Novel View Synthesis

论文/Paper: http://arxiv.org/pdf/2403.03896
代码/Code: None

MeaCap: Memory-Augmented Zero-shot Image Captioning

论文/Paper: http://arxiv.org/pdf/2403.03715
代码/Code: https://github.com/joeyz0z/MeaCap

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

论文/Paper: http://arxiv.org/pdf/2403.03561
代码/Code: None

Continual Segmentation with Disentangled Objectness Learning and Class Recognition

论文/Paper: http://arxiv.org/pdf/2403.03477
代码/Code: https://github.com/jordangong/CoMasTRe

HDRFlow: Real-Time HDR Video Reconstruction with Large Motions

论文/Paper: http://arxiv.org/pdf/2403.03447
代码/Code: None

LEAD: Learning Decomposition for Source-free Universal Domain Adaptation

论文/Paper: http://arxiv.org/pdf/2403.03421
代码/Code: https://github.com/ispc-lab/lead

F$^3$Loc: Fusion and Filtering for Floorplan Localization

论文/Paper: http://arxiv.org/pdf/2403.03370
代码/Code: None

Enhancing Vision-Language Pre-training with Rich Supervisions

论文/Paper: http://arxiv.org/pdf/2403.03346
代码/Code: None

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

论文/Paper: http://arxiv.org/pdf/2403.04765
代码/Code: None

Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning

论文/Paper: http://arxiv.org/pdf/2403.04492
代码/Code: https://github.com/rashindrie/dipa

Learning to Remove Wrinkled Transparent Film with Polarized Prior

论文/Paper: http://arxiv.org/pdf/2403.04368
代码/Code: https://github.com/jqtangust/filmremoval

LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking

论文/Paper: http://arxiv.org/pdf/2403.04303
代码/Code: None

Active Generalized Category Discovery

论文/Paper: http://arxiv.org/pdf/2403.04272
代码/Code: https://github.com/mashijie1028/activegcd

MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection

论文/Paper: http://arxiv.org/pdf/2403.04149
代码/Code: https://github.com/ispc-lab/map

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

论文/Paper: http://arxiv.org/pdf/2403.04245
代码/Code: https://github.com/dalision/modalbiasavsr

Seamless Human Motion Composition with Blended Positional Encodings

论文/Paper: https://arxiv.org/abs/2402.15509
代码/Code: https://github.com/BarqueroGerman/FlowMDM

DiffusionLight: Light Probes for Free by Painting a Chrome Ball

论文/Paper: https://arxiv.org/abs/2312.09168
代码/Code: https://github.com/DiffusionLight/DiffusionLight

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

论文/Paper: http://arxiv.org/pdf/2403.05087
代码/Code: https://github.com/initialneil/SplattingAvatar

Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

论文/Paper: http://arxiv.org/pdf/2403.06946
代码/Code: https://github.com/tl-uestc/unimos

Real-Time Simulated Avatar from Head-Mounted Sensors

论文/Paper: http://arxiv.org/pdf/2403.06862
代码/Code: None

DiaLoc: An Iterative Approach to Embodied Dialog Localization

论文/Paper: http://arxiv.org/pdf/2403.06846
代码/Code: None

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

论文/Paper: http://arxiv.org/pdf/2403.06775
代码/Code: https://github.com/modelscope/facechain

EarthLoc: Astronaut Photography Localization by Indexing Earth from Space

论文/Paper: http://arxiv.org/pdf/2403.06758
代码/Code: https://github.com/gmberton/earthloc

CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective

论文/Paper: http://arxiv.org/pdf/2403.06676
代码/Code: https://github.com/snskysk/cam-back-again

Distributionally Generative Augmentation for Fair Facial Attribute Classification

论文/Paper: http://arxiv.org/pdf/2403.06606
代码/Code: https://github.com/heqianpei/diga

Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

论文/Paper: http://arxiv.org/pdf/2403.06592
代码/Code: None

MoST: Motion Style Transformer between Diverse Action Contents

论文/Paper: http://arxiv.org/pdf/2403.06225
代码/Code: https://github.com/Boeun-Kim/MoST

Coherent Temporal Synthesis for Incremental Action Segmentation

论文/Paper: http://arxiv.org/pdf/2403.06102
代码/Code: None

Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?

论文/Paper: http://arxiv.org/pdf/2403.06092
代码/Code: None

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

论文/Paper: http://arxiv.org/pdf/2403.05854
代码/Code: None

PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor

论文/Paper: http://arxiv.org/pdf/2403.06668
代码/Code: None

SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

论文/Paper: http://arxiv.org/pdf/2403.03170
代码/Code: None

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

论文/Paper: https://arxiv.org/abs/2403.17749
代码/Code: https://github.com/YuqiYang213/MLoRE

Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

论文/Paper: http://arxiv.org/pdf/2403.07874
代码/Code: https://github.com/zh460045050/v2l-tokenizer

Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis

论文/Paper: http://arxiv.org/pdf/2403.07719
代码/Code: https://github.com/wonderlandxd/wikg

Robust Synthetic-to-Real Transfer for Stereo Matching

论文/Paper: http://arxiv.org/pdf/2403.07705
代码/Code: https://github.com/jiaw-z/dkt-stereo

CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers

论文/Paper: http://arxiv.org/pdf/2403.07700
代码/Code: https://github.com/shahaf-arica/cuvler

Masked AutoDecoder is Effective Multi-Task Vision Generalist

论文/Paper: http://arxiv.org/pdf/2403.07692
代码/Code: https://github.com/hanqiu-hq/mad

PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution

论文/Paper: http://arxiv.org/pdf/2403.07589
代码/Code: None

Unleashing Network Potentials for Semantic Scene Completion

论文/Paper: http://arxiv.org/pdf/2403.07560
代码/Code: https://github.com/fereenwong/ammnet

Open-World Semantic Segmentation Including Class Similarity

论文/Paper: http://arxiv.org/pdf/2403.07532
代码/Code: https://github.com/PRBonn/ContMAV

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

论文/Paper: http://arxiv.org/pdf/2403.07392
代码/Code: https://github.com/Traffic-X/ViT-CoMer

FSC: Few-point Shape Completion

论文/Paper: http://arxiv.org/pdf/2403.07359
代码/Code: None

Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture

论文/Paper: http://arxiv.org/pdf/2403.07347
代码/Code: https://github.com/jiafei127/fd4mm

A Bayesian Approach to OOD Robustness in Image Classification

论文/Paper: http://arxiv.org/pdf/2403.07277
代码/Code: None

返回目录/back