CVPR2024-Papers-with-Code-Demo
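:star_and_crescent: Add WeChat nvshenj125 and note your research area to join the discussion group.
Follow our WeChat official account: AI算法与图像处理 (AI Algorithms and Image Processing)
:star2: Continuously updated with the latest CVPR 2024 papers and their open-source code!
Bilibili demos: https://space.bilibili.com/288489574
:hand: Note: Everyone is welcome to open an issue to share CVPR 2024 papers and open-source projects and help improve this project together!
Paper collections from previous top conferences:
:fireworks: Join Our Group | Welcome
The CVPR 2024 paper discussion group is now open! If your paper was accepted, add WeChat nvshenj125 with the note: CVPR+Name+School/Company. Please follow this format exactly so we can add you to the group.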
<a name="Contents"></a>
:hammer: 目录 | Table of Contents (click an entry to jump to it)
<details open> <summary> Table of Contents (click to collapse)</summary>

- Backbone
- 数据集/Dataset
- Diffusion Model
- Text-to-Image
- NAS
- NeRF
- Knowledge Distillation
- 多模态 / Multimodal
- 对比学习/Contrastive Learning
- 图神经网络 / Graph Neural Networks
- 胶囊网络 / Capsule Network
- 图像分类 / Image Classification
- 目标检测/Object Detection
- 目标跟踪/Object Tracking
- 轨迹预测/Trajectory Prediction
- 语义分割/Segmentation
- 弱监督语义分割/Weakly Supervised Semantic Segmentation
- 医学图像分割/Medical Image Segmentation
- 视频目标分割/Video Object Segmentation
- 交互式视频目标分割/Interactive Video Object Segmentation
- Visual Transformer
- 深度估计/Depth Estimation
- 人脸识别/Face Recognition
- 人脸检测/Face Detection
- 人脸活体检测/Face Anti-Spoofing
- 人脸年龄估计/Age Estimation
- 人脸表情识别/Facial Expression Recognition
- 人脸属性识别/Facial Attribute Recognition
- 人脸编辑/Facial Editing
- 人脸重建/Face Reconstruction
- Talking Face
- 换脸/Face Swap
- 姿态估计/Pose Estimation
- 手势姿态估计(重建)/Hand Pose Estimation (Hand Mesh Recovery)
- 视频动作检测/Video Action Detection
- 手语翻译/Sign Language Translation
- 3D人体重建/3D Human Reconstruction
- 行人重识别/Person Re-identification
- 行人搜索/Person Search
- 人群计数 / Crowd Counting
- GAN
- 彩妆迁移 / Color-Pattern Makeup Transfer
- 字体生成 / Font Generation
- 场景文本检测、识别/Scene Text Detection/Recognition
- 图像、视频检索 / Image Retrieval/Video Retrieval
- Image Animation
- 抠图/Image Matting
- 超分辨率/Super Resolution
- 图像复原/Image Restoration
- 图像补全/Image Inpainting
- 图像去噪/Image Denoising
- 图像编辑/Image Editing
- 图像拼接/Image Stitching
- 图像匹配/Image Matching
- 图像融合/Image Blending
- 图像去雾/Image Dehazing
- 图像去模糊/Image Deblur
- 图像压缩/Image Compression
- 反光去除/Reflection Removal
- 车道线检测/Lane Detection
- 自动驾驶 / Autonomous Driving
- 流体重建/Fluid Reconstruction
- 场景重建 / Scene Reconstruction
- 3D Reconstruction
- 视频插帧/Frame Interpolation
- 视频超分 / Video Super-Resolution
- 3D点云/3D Point Cloud
- 标签噪声 / Label-Noise
- 对抗样本/Adversarial Examples
- Anomaly Detection
- 其他/Other
<a name="Backbone"></a>
Backbone
<a name="Dataset"></a>
数据集/Dataset
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
- 论文/Paper: http://arxiv.org/pdf/2403.02640
- 代码/Code: None
Traffic Scene Parsing through the TSP6K Dataset
- 论文/Paper: https://arxiv.org/pdf/2303.02835.pdf
- 代码/Code: https://github.com/PengtaoJiang/TSP6K
<a name="DiffusionModel"></a>
Diffusion Model
Balancing Act: Distribution-Guided Debiasing in Diffusion Models
- 论文/Paper: http://arxiv.org/pdf/2402.18206
- 代码/Code: None
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
- 论文/Paper: http://arxiv.org/pdf/2402.19481
- 代码/Code: https://github.com/mit-han-lab/distrifuser
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
- 论文/Paper: http://arxiv.org/pdf/2402.19302
- 代码/Code: https://github.com/iit-pavis/diffassemble
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
- 论文/Paper: http://arxiv.org/pdf/2403.00644
- 代码/Code: None
Few-shot Learner Parameterization by Diffusion Time-steps
- 论文/Paper: http://arxiv.org/pdf/2403.02649
- 代码/Code: https://github.com/yue-zhongqi/tif
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
- 论文/Paper: http://arxiv.org/pdf/2403.04290
- 代码/Code: None
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
- 论文/Paper: https://arxiv.org/abs/2403.06951
- 代码/Code: https://github.com/Tianhao-Qi/DEADiff_code
Face2Diffusion for Fast and Editable Face Personalization
- 论文/Paper: http://arxiv.org/pdf/2403.05094
- 代码/Code: https://github.com/mapooon/Face2Diffusion
MACE: Mass Concept Erasure in Diffusion Models
- 论文/Paper: http://arxiv.org/pdf/2403.06135
- 代码/Code: https://github.com/Shilin-LU/MACE
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
- 论文/Paper: http://arxiv.org/pdf/2403.07234
- 代码/Code: https://github.com/subhadeepkoley/demosketch2rgb
SemCity: Semantic Scene Generation with Triplane Diffusion
- 论文/Paper: http://arxiv.org/pdf/2403.07773
- 代码/Code: https://github.com/zoomin-lee/semcity
<a name="T2I"></a>
Text-to-Image
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
- 论文/Paper: http://arxiv.org/pdf/2403.00483
- 代码/Code: None
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
- 论文/Paper: http://arxiv.org/pdf/2403.03485
- 代码/Code: https://github.com/univ-esuty/noisecollage
Discriminative Probing and Tuning for Text-to-Image Generation
- 论文/Paper: http://arxiv.org/pdf/2403.04321
- 代码/Code: None
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
- 论文/Paper: http://arxiv.org/pdf/2403.05239
- 代码/Code: None
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
- 论文/Paper: http://arxiv.org/pdf/2403.06452
- 代码/Code: https://github.com/mulns/Text2QR
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
- 论文/Paper: http://arxiv.org/pdf/2403.07214
- 代码/Code: None
<a name="NAS"></a>
NAS
<a name="NeRF"></a>
NeRF
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
- 论文/Paper: http://arxiv.org/pdf/2403.03608
- 代码/Code: None
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
- 论文/Paper: http://arxiv.org/pdf/2403.06912
- 代码/Code: https://github.com/fictionarry/dngaussian
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
- 论文/Paper: http://arxiv.org/pdf/2403.06205
- 代码/Code: None
<a name="KnowledgeDistillation"></a>
Knowledge Distillation
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
- 论文/Paper: http://arxiv.org/pdf/2403.02781
- 代码/Code: https://github.com/zhengli97/PromptKD
Logit Standardization in Knowledge Distillation
- 论文/Paper: https://arxiv.org/abs/2403.01427
- 代码/Code: https://github.com/sunshangquan/logit-standardization-KD
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
- 论文/Paper: http://arxiv.org/pdf/2403.05061
- 代码/Code: None
$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections
- 论文/Paper: http://arxiv.org/pdf/2403.06213
- 代码/Code: https://github.com/roymiles/vkd
<a name="Multimodal"></a>
多模态 / Multimodal
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
- 论文/Paper: https://arxiv.org/abs/2312.07472
- 代码/Code: https://github.com/IranQin/MP5
- 主页/Website: https://iranqin.github.io/MP5.github.io/
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
- 论文/Paper: http://arxiv.org/pdf/2402.18091
- 代码/Code: None
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
- 论文/Paper: http://arxiv.org/pdf/2403.02991
- 代码/Code: None
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
- 论文/Paper: http://arxiv.org/pdf/2403.05105
- 代码/Code: https://github.com/hhc1997/L2RM
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
- 论文/Paper: http://arxiv.org/pdf/2403.07839
- 代码/Code: None
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework
- 论文/Paper: http://arxiv.org/pdf/2403.07636
- 代码/Code: https://github.com/hieuphan33/mavl
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
- 论文/Paper: http://arxiv.org/pdf/2403.07241
- 代码/Code: None
<a name="ContrastiveLearning"></a>
对比学习/Contrastive Learning
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
- 论文/Paper: http://arxiv.org/pdf/2403.06122
- 代码/Code: https://github.com/root0yang/blindnet
<a name="CapsuleNetwork"></a>
胶囊网络 / Capsule Network
<a name="ImageClassification"></a>
图像分类 / Image Classification
<a name="ObjectDetection"></a>
目标检测/Object Detection
UniMODE: Unified Monocular 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2402.18573
- 代码/Code: None
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images
- 论文/Paper: http://arxiv.org/pdf/2403.04198
- 代码/Code: https://github.com/SerCharles/CN-RMA
Memory-based Adapters for Online 3D Scene Perception
- 论文/Paper: https://arxiv.org/abs/2403.06974
- 代码/Code: https://github.com/xuxw98/Online3D
Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
- 论文/Paper: https://arxiv.org/abs/2403.16131
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
- 论文/Paper: http://arxiv.org/pdf/2403.06093
- 代码/Code: https://github.com/nullmax-vision/QAF2D
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2403.05817
- 代码/Code: https://github.com/zhanggang001/hednet
<a name="ObjectTracking"></a>
目标跟踪/Object Tracking
DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking
- 论文/Paper: http://arxiv.org/pdf/2403.02767
- 代码/Code: None
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
- 论文/Paper: http://arxiv.org/pdf/2403.04700
- 代码/Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT
3D Object Tracking
<a name="TrajectoryPrediction"></a>
轨迹预测/Trajectory Prediction
<a name="Segmentation"></a>
语义分割/Segmentation
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
- 论文/Paper: http://arxiv.org/pdf/2402.19422
- 代码/Code: https://github.com/niccolocavagnero/pem
Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2403.06462
- 代码/Code: https://github.com/Gavinwxy/DDFP
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
- 论文/Paper: http://arxiv.org/pdf/2403.06247
- 代码/Code: None
<a name="WSSS"></a>
弱监督语义分割/Weakly Supervised Semantic Segmentation
<a name="MedicalImageSegmentation"></a>
医学图像/Medical Image
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
- 论文/Paper: http://arxiv.org/pdf/2402.18933
- 代码/Code: None
<a name="VideoObjectSegmentation"></a>
视频目标分割/Video Object Segmentation
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
- 论文/Paper: http://arxiv.org/pdf/2403.04258
- 代码/Code: None
<a name="InteractiveVideoObjectSegmentation"></a>
交互式视频目标分割/Interactive Video Object Segmentation
<a name="VisualTransformer"></a>
Visual Transformer
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
- 论文/Paper: http://arxiv.org/pdf/2403.05419
- 代码/Code: https://github.com/techmn/satmae_pp
<a name="DepthEstimation"></a>
深度估计/Depth Estimation
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
- 论文/Paper: https://arxiv.org/pdf/2403.07535.pdf
- 代码/Code: https://github.com/Junda24/AFNet
<a name="Retrieval"></a>
图像、视频检索 / Image Retrieval/Video Retrieval
Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
- 论文/Paper: http://arxiv.org/pdf/2403.00272
- 代码/Code: None
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
- 论文/Paper: http://arxiv.org/pdf/2403.05105
- 代码/Code: https://github.com/hhc1997/L2RM
How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
- 论文/Paper: http://arxiv.org/pdf/2403.07203
- 代码/Code: None
<a name="SuperResolution"></a>
超分辨率/Super Resolution
SeD: Semantic-Aware Discriminator for Image Super-Resolution
- 论文/Paper: http://arxiv.org/pdf/2402.19387
- 代码/Code: None
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
- 论文/Paper: http://arxiv.org/pdf/2402.19215
- 代码/Code: https://github.com/mandalinadagi/wgsr
CAMixerSR: Only Details Need More "Attention"
- 论文/Paper: http://arxiv.org/pdf/2402.19289
- 代码/Code: https://github.com/icandle/camixersr
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
- 论文/Paper: http://arxiv.org/pdf/2403.02601
- 代码/Code: None
<a name="ImageRestoration"></a>
图像复原/Image Restoration
Boosting Image Restoration via Priors from Pre-trained Models
- 论文/Paper: http://arxiv.org/pdf/2403.06793
- 代码/Code: None
<a name="ImageDenoising"></a>
图像去噪/Image Denoising
<a name="ImageEditing"></a>
图像编辑/Image Editing
Doubly Abductive Counterfactual Inference for Text-based Image Editing
- 论文/Paper: http://arxiv.org/pdf/2403.02981
- 代码/Code: https://github.com/xuesong39/DAC
<a name="ImageCompression"></a>
图像压缩/Image Compression
<a name="ImageDeblur"></a>
图像去模糊/Image Deblur
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
- 论文/Paper: http://arxiv.org/pdf/2403.02611
- 代码/Code: https://github.com/PieceZhang/MPT-CataBlur
<a name="AutonomousDriving"></a>
自动驾驶 / Autonomous Driving
Abductive Ego-View Accident Video Understanding for Safe Driving Perception
- 论文/Paper: http://arxiv.org/pdf/2403.00436
- 代码/Code: None
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
- 论文/Paper: http://arxiv.org/pdf/2403.07535
- 代码/Code: https://github.com/Junda24/AFNet
<a name="FaceRecognition"></a>
人脸识别/Face Recognition
<a name="FaceDetection"></a>
人脸检测/Face Detection
<a name="FaceAnti-Spoofing"></a>
人脸活体检测/Face Anti-Spoofing
Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
- 论文/Paper: http://arxiv.org/pdf/2402.19298
- 代码/Code: https://github.com/omggggg/mmdg
<a name="FaceReconstruction"></a>
人脸重建/Face Reconstruction
<a name="VideoActionDetection"></a>
视频动作检测/Video Action Detection
<a name="SignLanguageTranslation"></a>
手语翻译/Sign Language Translation
<a name="PersonRe-identification"></a>
行人重识别/Person Re-identification
<a name="TalkingFace"></a>
Talking Face
<a name="HumanPoseEstimation"></a>
姿态估计/Pose Estimation
FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
- 论文/Paper: http://arxiv.org/pdf/2403.03221
- 代码/Code: None
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
- 论文/Paper: http://arxiv.org/pdf/2403.04381
- 代码/Code: https://github.com/MickeyLLG/S2DHand
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
- 论文/Paper: https://arxiv.org/pdf/2311.12028.pdf
- 代码/Code: https://github.com/NationalGAILab/HoT
<a name="GAN"></a>
GAN
<a name="AgeEstimation"></a>
人脸年龄估计/Age Estimation
<a name="FacialExpressionRecognition"></a>
人脸表情识别/Facial Expression Recognition
<a name="HandPoseEstimation"></a>
手势姿态估计(重建)/Hand Pose Estimation (Hand Mesh Recovery)
<a name="3DReconstruction"></a>
3D Reconstruction
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Data Sets
- 论文/Paper: http://arxiv.org/pdf/2403.05086
- 代码/Code: https://github.com/Youngju-Na/UFORecon
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
- 论文/Paper: http://arxiv.org/pdf/2403.05005
- 代码/Code: None
Memory-based Adapters for Online 3D Scene Perception
- 论文/Paper: http://arxiv.org/pdf/2403.06974
- 代码/Code: https://github.com/xuxw98/Online3D
Bayesian Diffusion Models for 3D Shape Reconstruction
- 论文/Paper: http://arxiv.org/pdf/2403.06973
- 代码/Code: None
<a name="FrameInterpolation"></a>
视频插帧/Frame Interpolation
<a name="3DPointCloud"></a>
3D点云/3D Point Cloud
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2403.00592
- 代码/Code: https://github.com/ZhaochongAn/COSeg
Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension
- 论文/Paper: http://arxiv.org/pdf/2403.03532
- 代码/Code: https://github.com/liuquan98/eyoc
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
- 论文/Paper: http://arxiv.org/pdf/2403.05247
- 代码/Code: https://github.com/TRLou/HiT-ADV
<a name="AnomalyDetection"></a>
Anomaly Detection
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
- 论文/Paper: http://arxiv.org/pdf/2403.06495
- 代码/Code: https://github.com/mala-lab/inctrl
RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection
- 论文/Paper: http://arxiv.org/pdf/2403.05897
- 代码/Code: https://github.com/cnulab/realnet
<a name="Other"></a>
其他/Other
DisCo: Disentangled Control for Realistic Human Dance Generation
- 论文/Paper: https://arxiv.org/abs/2307.00040
- 代码/Code: https://github.com/Wangt-CN/DisCo
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
- 论文/Paper: http://arxiv.org/pdf/2402.18528
- 代码/Code: None
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
- 论文/Paper: http://arxiv.org/pdf/2402.18490
- 代码/Code: None
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
- 论文/Paper: http://arxiv.org/pdf/2402.18330
- 代码/Code: https://github.com/tho-kn/egotap
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
- 论文/Paper: http://arxiv.org/pdf/2402.18277
- 代码/Code: None
Misalignment-Robust Frequency Distribution Loss for Image Transformation
- 论文/Paper: http://arxiv.org/pdf/2402.18192
- 代码/Code: https://github.com/eezkni/FDL
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
- 论文/Paper: http://arxiv.org/pdf/2402.18146
- 代码/Code: https://github.com/jiangchaokang/3dsflabelling
OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction
- 论文/Paper: http://arxiv.org/pdf/2402.18140
- 代码/Code: None
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
- 论文/Paper: http://arxiv.org/pdf/2402.18115
- 代码/Code: https://github.com/minghanli/univs
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
- 论文/Paper: http://arxiv.org/pdf/2402.18078
- 代码/Code: https://github.com/YanzuoLu/CFLD
Boosting Neural Representations for Videos with a Conditional Decoder
- 论文/Paper: http://arxiv.org/pdf/2402.18152
- 代码/Code: None
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
- 论文/Paper: http://arxiv.org/pdf/2402.18133
- 代码/Code: None
QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction
- 论文/Paper: http://arxiv.org/pdf/2402.17951
- 代码/Code: None
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
- 论文/Paper: http://arxiv.org/pdf/2402.19479
- 代码/Code: None
SeMoLi: What Moves Together Belongs Together
- 论文/Paper: http://arxiv.org/pdf/2402.19463
- 代码/Code: None
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
- 论文/Paper: http://arxiv.org/pdf/2402.19326
- 代码/Code: None
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
- 论文/Paper: http://arxiv.org/pdf/2402.19231
- 代码/Code: https://github.com/lu-feng/cricavpr
MemoNav: Working Memory Model for Visual Navigation
- 论文/Paper: http://arxiv.org/pdf/2402.19161
- 代码/Code: None
VideoMAC: Video Masked Autoencoders Meet ConvNets
- 论文/Paper: http://arxiv.org/pdf/2402.19082
- 代码/Code: https://github.com/nust-machine-intelligence-laboratory/videomac
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
- 论文/Paper: http://arxiv.org/pdf/2402.18975
- 代码/Code: https://github.com/Jittor/JDet
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
- 论文/Paper: http://arxiv.org/pdf/2402.18969
- 代码/Code: None
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
- 论文/Paper: http://arxiv.org/pdf/2402.18956
- 代码/Code: None
Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
- 论文/Paper: http://arxiv.org/pdf/2402.18920
- 代码/Code: None
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
- 论文/Paper: http://arxiv.org/pdf/2402.18848
- 代码/Code: None
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
- 论文/Paper: http://arxiv.org/pdf/2402.18842
- 代码/Code: None
OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
- 论文/Paper: http://arxiv.org/pdf/2402.18786
- 代码/Code: None
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
- 论文/Paper: http://arxiv.org/pdf/2402.18771
- 代码/Code: None
Towards Generalizable Tumor Synthesis
- 论文/Paper: http://arxiv.org/pdf/2402.19470
- 代码/Code: None
Rethinking Multi-domain Generalization with A General Learning Objective
- 论文/Paper: http://arxiv.org/pdf/2402.18853
- 代码/Code: None
Rethinking Inductive Biases for Surface Normal Estimation
- 论文/Paper: http://arxiv.org/pdf/2403.00712
- 代码/Code: https://github.com/baegwangbin/DSINE
SURE: SUrvey REcipes for building reliable and robust deep networks
- 论文/Paper: http://arxiv.org/pdf/2403.00543
- 代码/Code: https://github.com/YutingLi0606/SURE
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
- 论文/Paper: http://arxiv.org/pdf/2403.00486
- 代码/Code: https://github.com/Windsrain/Selective-Stereo
Deformable One-shot Face Stylization via DINO Semantic Guidance
- 论文/Paper: http://arxiv.org/pdf/2403.00459
- 代码/Code: https://github.com/zichongc/DoesFS
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
- 论文/Paper: http://arxiv.org/pdf/2403.00274
- 代码/Code: None
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
- 论文/Paper: http://arxiv.org/pdf/2403.03122
- 代码/Code: None
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
- 论文/Paper: http://arxiv.org/pdf/2403.02782
- 代码/Code: None
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
- 论文/Paper: http://arxiv.org/pdf/2403.02769
- 代码/Code: None
Learning Group Activity Features Through Person Attribute Prediction
- 论文/Paper: http://arxiv.org/pdf/2403.02753
- 代码/Code: https://github.com/chihina/GAFL-CVPR2024
Interactive Continual Learning: Fast and Slow Thinking
- 论文/Paper: http://arxiv.org/pdf/2403.02628
- 代码/Code: None
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
- 论文/Paper: http://arxiv.org/pdf/2403.03890
- 代码/Code: None
DART: Implicit Doppler Tomography for Radar Novel View Synthesis
- 论文/Paper: http://arxiv.org/pdf/2403.03896
- 代码/Code: None
MeaCap: Memory-Augmented Zero-shot Image Captioning
- 论文/Paper: http://arxiv.org/pdf/2403.03715
- 代码/Code: https://github.com/joeyz0z/MeaCap
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
- 论文/Paper: http://arxiv.org/pdf/2403.03561
- 代码/Code: None
Continual Segmentation with Disentangled Objectness Learning and Class Recognition
- 论文/Paper: http://arxiv.org/pdf/2403.03477
- 代码/Code: https://github.com/jordangong/CoMasTRe
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
- 论文/Paper: http://arxiv.org/pdf/2403.03447
- 代码/Code: None
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
- 论文/Paper: http://arxiv.org/pdf/2403.03421
- 代码/Code: https://github.com/ispc-lab/lead
F$^3$Loc: Fusion and Filtering for Floorplan Localization
- 论文/Paper: http://arxiv.org/pdf/2403.03370
- 代码/Code: None
Enhancing Vision-Language Pre-training with Rich Supervisions
- 论文/Paper: http://arxiv.org/pdf/2403.03346
- 代码/Code: None
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
- 论文/Paper: http://arxiv.org/pdf/2403.04765
- 代码/Code: None
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
- 论文/Paper: http://arxiv.org/pdf/2403.04492
- 代码/Code: https://github.com/rashindrie/dipa
Learning to Remove Wrinkled Transparent Film with Polarized Prior
- 论文/Paper: http://arxiv.org/pdf/2403.04368
- 代码/Code: https://github.com/jqtangust/filmremoval
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
- 论文/Paper: http://arxiv.org/pdf/2403.04303
- 代码/Code: None
Active Generalized Category Discovery
- 论文/Paper: http://arxiv.org/pdf/2403.04272
- 代码/Code: https://github.com/mashijie1028/activegcd
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
- 论文/Paper: http://arxiv.org/pdf/2403.04149
- 代码/Code: https://github.com/ispc-lab/map
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
- 论文/Paper: http://arxiv.org/pdf/2403.04245
- 代码/Code: https://github.com/dalision/modalbiasavsr
Seamless Human Motion Composition with Blended Positional Encodings
- 论文/Paper: https://arxiv.org/abs/2402.15509
- 代码/Code: https://github.com/BarqueroGerman/FlowMDM
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
- 论文/Paper: http://arxiv.org/pdf/2403.05087
- 代码/Code: https://github.com/initialneil/SplattingAvatar
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
- 论文/Paper: http://arxiv.org/pdf/2403.06946
- 代码/Code: https://github.com/tl-uestc/unimos
Real-Time Simulated Avatar from Head-Mounted Sensors
- 论文/Paper: http://arxiv.org/pdf/2403.06862
- 代码/Code: None
DiaLoc: An Iterative Approach to Embodied Dialog Localization
- 论文/Paper: http://arxiv.org/pdf/2403.06846
- 代码/Code: None
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
- 论文/Paper: http://arxiv.org/pdf/2403.06775
- 代码/Code: https://github.com/modelscope/facechain
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
- 论文/Paper: http://arxiv.org/pdf/2403.06758
- 代码/Code: https://github.com/gmberton/earthloc
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
- 论文/Paper: http://arxiv.org/pdf/2403.06676
- 代码/Code: https://github.com/snskysk/cam-back-again
Distributionally Generative Augmentation for Fair Facial Attribute Classification
- 论文/Paper: http://arxiv.org/pdf/2403.06606
- 代码/Code: https://github.com/heqianpei/diga
Exploiting Style Latent Flows for Generalizing Deepfake Video Detection
- 论文/Paper: http://arxiv.org/pdf/2403.06592
- 代码/Code: None
MoST: Motion Style Transformer between Diverse Action Contents
- 论文/Paper: http://arxiv.org/pdf/2403.06225
- 代码/Code: https://github.com/Boeun-Kim/MoST
Coherent Temporal Synthesis for Incremental Action Segmentation
- 论文/Paper: http://arxiv.org/pdf/2403.06102
- 代码/Code: None
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
- 论文/Paper: http://arxiv.org/pdf/2403.06092
- 代码/Code: None
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
- 论文/Paper: http://arxiv.org/pdf/2403.05854
- 代码/Code: None
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor
- 论文/Paper: http://arxiv.org/pdf/2403.06668
- 代码/Code: None
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
- 论文/Paper: http://arxiv.org/pdf/2403.03170
- 代码/Code: None
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
- 论文/Paper: https://arxiv.org/abs/2403.17749
- 代码/Code: https://github.com/YuqiYang213/MLoRE
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
- 论文/Paper: http://arxiv.org/pdf/2403.07874
- 代码/Code: https://github.com/zh460045050/v2l-tokenizer
Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis
- 论文/Paper: http://arxiv.org/pdf/2403.07719
- 代码/Code: https://github.com/wonderlandxd/wikg
Robust Synthetic-to-Real Transfer for Stereo Matching
- 论文/Paper: http://arxiv.org/pdf/2403.07705
- 代码/Code: https://github.com/jiaw-z/dkt-stereo
CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers
- 论文/Paper: http://arxiv.org/pdf/2403.07700
- 代码/Code: https://github.com/shahaf-arica/cuvler
Masked AutoDecoder is Effective Multi-Task Vision Generalist
- 论文/Paper: http://arxiv.org/pdf/2403.07692
- 代码/Code: https://github.com/hanqiu-hq/mad
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
- 论文/Paper: http://arxiv.org/pdf/2403.07589
- 代码/Code: None
Unleashing Network Potentials for Semantic Scene Completion
- 论文/Paper: http://arxiv.org/pdf/2403.07560
- 代码/Code: https://github.com/fereenwong/ammnet
Open-World Semantic Segmentation Including Class Similarity
- 论文/Paper: http://arxiv.org/pdf/2403.07532
- 代码/Code: https://github.com/PRBonn/ContMAV
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
- 论文/Paper: http://arxiv.org/pdf/2403.07392
- 代码/Code: https://github.com/Traffic-X/ViT-CoMer
FSC: Few-point Shape Completion
- 论文/Paper: http://arxiv.org/pdf/2403.07359
- 代码/Code: None
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
- 论文/Paper: http://arxiv.org/pdf/2403.07347
- 代码/Code: https://github.com/jiafei127/fd4mm
A Bayesian Approach to OOD Robustness in Image Classification
- 论文/Paper: http://arxiv.org/pdf/2403.07277
- 代码/Code: None