WACV-2024-Papers
Conference dates: January 3-7, 2024
Conference website: https://wacv2024.thecvf.com/
❣❣❣ The categorized list of WACV 2024 papers is complete
📢📢📢 Award-Winning Papers
🏆 Best Paper Award (Algorithms)
Conditional Velocity Score Estimation for Image Restoration
🏆 Best Paper Award (Applications)
WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification
🏆 Best Student Paper
🏆 Best Paper Honorable Mention
ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields
Click here for 2024 survey papers ↘️2024-CV-Surveys
Click here for the categorized 2024 paper collections
Click here for the categorized 2023 paper collections
↘️CVPR-2023-Papers ↘️WACV-2023-Papers ↘️ICCV-2023-Papers ↘️2023-CV-Surveys
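Click here for the categorized 2022 paper collections
Click here for the categorized 2021 paper collections
Click here for the categorized 2020 paper collections
Contents
61.Computed Imaging (computational imaging, e.g., optical, geometric, and light-field imaging)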
- Motion Matters: Neural Motion Transfer for Better Camera Physiological Measurement
- On the Quantification of Image Reconstruction Uncertainty without Training Data
- Deep Optics for Optomechanical Control Policy Design
- From Chaos to Calibration: A Geometric Mutual Information Approach To Target-Free Camera LiDAR Extrinsic Calibration
- Joint 3D Shape and Motion Estimation From Rolling Shutter Light-Field Images
- CGAPoseNet+GCAN: A Geometric Clifford Algebra Network for Geometry-Aware Camera Pose Regression
- Camera Calibration
60.Graphic Layout
59.Rendering
- LensNeRF: Rethinking Volume Rendering Based on Thin-Lens Camera Model
- Specular Object Reconstruction Behind Frosted Glass by Differentiable Rendering
58.Novel View Synthesis
- Ray Deformation Networks for Novel View Synthesis of Refractive Objects
- Stereo Conversion With Disparity-Aware Warping, Compositing and Inpainting
57.Neural Radiance Fields(NeRF)
- EvDNeRF: Reconstructing Event Data With Dynamic Neural Radiance Fields
- Hyb-NeRF: A Multiresolution Hybrid Encoding for Neural Radiance Fields
- Fast Sun-aligned Outdoor Scene Relighting based on TensoRF
- ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields
- MoRF: Mobile Realistic Fullbody Avatars From a Monocular Video
- ZIGNeRF: Zero-Shot 3D Scene Representation With Invertible Generative Neural Radiance Fields
- Point-DynRF: Point-Based Dynamic Radiance Fields From a Monocular Video
- A Generic and Flexible Regularization Framework for NeRFs
56.Event Cameras
55.Biometrics
- Deep Visual-Genetic Biometrics for Taxonomic Classification of Rare Species
- Fingervein Verification using Convolutional Multi-Head Attention Network
- FarSight: A Physics-Driven Whole-Body Biometric System at Large Distance and Altitude
- Vikriti-ID: A Novel Approach for Real Looking Fingerprint Data-Set Generation
- Fingerprint Generation
54.Style Transfer
- Optical Flow Domain Adaptation via Target Style Transfer
- Multimodality-guided Image Style Transfer using Cross-modal GAN Inversion<br>:star:code
- FastCLIPstyler: Optimisation-Free Text-Based Image Style Transfer Using Style Representations
- SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer From a Spectral Perspective
- Neural Style Protection: Counteracting Unauthorized Neural Style Transfer
- LipAT: Beyond Style Transfer for Controllable Neural Simulation of Lipstick Using Cosmetic Attributes
53.Crack Segmentation
52.Gaze Estimation
51.Sound (Speech)
- Lip Sync
- Sound Source Localization
- Audio Separation
- 3D Sound Source Detection
- Audio-Visual Segmentation
- Speech-Video Synthesis
- Generating Interactive Drum Sounds from Body Beats
50.Datasets
- HaGRID -- HAnd Gesture Recognition Image Dataset
- Beyond RGB: A Real World Dataset for Multispectral Imaging in Mobile Devices
- IKEA Ego 3D Dataset: Understanding Furniture Assembly Actions From Ego-View 3D Point Clouds
- PsyMo: A Dataset for Estimating Self-Reported Psychological Traits From Gait
- The Growing Strawberries Dataset: Tracking Multiple Objects With Biological Development Over an Extended Period
- UOW-Vessel: A Benchmark Dataset of High-Resolution Optical Satellite Images for Vessel Detection and Segmentation
- NITEC: Versatile Hand-Annotated Eye Contact Dataset for Ego-Vision Interaction<br>:star:code
- FishTrack23: An Ensemble Underwater Dataset for Multi-Object Tracking
- Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
- NOMAD: A Natural, Occluded, Multi-Scale Aerial Dataset, for Emergency Response Scenarios
- Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
- SphereCraft: A Dataset for Spherical Keypoint Detection, Matching and Camera Pose Estimation
- Ego2HandsPose: A Dataset for Egocentric Two-Hand 3D Global Pose Estimation
- MarsLS-Net: Martian Landslides Segmentation Network and Benchmark Dataset
- Beyond Document Page Classification: Design, Datasets, and Challenges
- MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis
- SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection<br>:sunflower:dataset
- IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting<br>:star:code
- SICKLE: A Multi-Sensor Satellite Imagery Dataset Annotated with Multiple Key Cropping Parameters
- SeaTurtleID2022: A Long-Span Dataset for Reliable Sea Turtle Re-Identification
- Amodal Intra-Class Instance Segmentation: Synthetic Datasets and Benchmark
- Towards Accurate Disease Segmentation in Plant Images: A Comprehensive Dataset Creation and Network Evaluation
- AssemblyNet: A Point Cloud Dataset and Benchmark for Predicting Part Directions in an Exploded Layout
- MAdVerse: A Hierarchical Dataset of Multi-Lingual Ads From Diverse Sources and Categories
- InfraParis: A Multi-Modal and Multi-Task Autonomous Driving Dataset
- ZRG: A Dataset for Multimodal 3D Residential Rooftop Understanding
- Benchmarks
- ISAR: A Benchmark for Single- and Few-Shot Object Instance Segmentation and Re-Identification
- ConeQuest: A Benchmark for Cone Segmentation on Mars<br>:star:code
- dacl10k: Benchmark for Semantic Bridge Damage Segmentation
- IDD-AW: A Benchmark for Safe and Robust Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather
- A Multimodal Benchmark and Improved Architecture for Zero Shot Learning
- RobustCLEVR: A Benchmark and Framework for Evaluating Robustness in Object-Centric Learning
49.Vision Transformers
- Grafting Vision Transformers
- Efficient MAE Towards Large-Scale Vision Transformers
- SimA: Simple Softmax-Free Attention for Vision Transformers
- Open-NeRF: Towards Open Vocabulary NeRF Decomposition
- Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders
- Triplet Attention Transformer for Spatiotemporal Predictive Learning
- Query-Guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch
- GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation<br>:star:code
- Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspective
- SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers<br>:star:code
- Robust Eye Blink Detection Using Dual Embedding Video Vision Transformer
- Semantic Labels-Aware Transformer Model for Searching Over a Large Collection of Lecture-Slides
48.Image/Video Editing
- Unified Concept Editing in Diffusion Models
- Iterative Multi-Granular Image Editing Using Diffusion Models
- Discovering and Mitigating Biases in CLIP-Based Image Editing
- Revisiting Latent Space of GAN Inversion for Robust Real Image Editing
- ProxEdit: Improving Tuning-Free Real Image Editing With Proximal Guidance
- Image Stitching
- Video Editing
- Text-Guided Image Editing
- 3D Scene Editing
47.Edge Detection
46.Dense Prediction
- PolyMaX: General Dense Prediction with Mask Transformer
- Convolutional Masked Image Modeling for Dense Prediction Tasks on Pathology Images
45.Visual Tampering Detection
- Package Anti-Counterfeiting Detection
- Video Forgery Detection
- Deepfakes
44.Visual Industrial Inspection
- ReConPatch: Contrastive Patch Representation Learning for Industrial Anomaly Detection
- High-Fidelity Zero-Shot Texture Anomaly Localization Using Feature Correspondence Analysis
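- Image Anomaly Detection
- Surface Anomaly Detection
- Image Anomaly Localization
- Visual Anomaly Detection
- Zero-Shot Anomaly Detection
- Trajectory Anomaly Detection
- Human Behavior Understanding
- OOD
43.Image Fusion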
- Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion<br>:star:code
42.Image Classification
- Semantic Generative Augmentations for Few-Shot Counting
- Learning Quality Labels for Robust Image Classification
- Visual Narratives: Large-Scale Hierarchical Classification of Art-Historical Images
- Benchmark Generation Framework With Customizable Distortions for Image Classifier Robustness
- Deep Subdomain Alignment for Cross-Domain Image Classification
- Online Class-Incremental Learning for Real-World Food Image Classification
- An Empirical Investigation Into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification
- Letting 3D Guide the Way: 3D Guided 2D Few-Shot Image Classification
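- Long-Tailed Visual Recognition
- Multi-Label Image Classification
- Few-Shot Classification
- Multi-View Classification
- Seagrass Classification
- Fine-Grained
- Bird Species Classification
41.Image Processing (low-level image processing, quality assessment)
- Image Restoration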
- Conditional Velocity Score Estimation for Image Restoration
- UGPNet: Universal Generative Prior for Image Restoration
- PAIR: Perception Aided Image Restoration for Natural Driving Conditions
- LatentDR: Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration
- Efficient Layout-Guided Image Inpainting for Mobile Use
- Image Inpainting
- Image Rectification
- Image Enhancement
- Image Denoising
- Self-Supervised Denoising Transformer With Gaussian Process
- Spiking Denoising Diffusion Probabilistic Models
- Image Denoising and the Generative Accumulation of Photons
- Fixed Pattern Noise Removal for Multi-View Single-Sensor Infrared Camera
- LIVENet: A Novel Network for Real-World Low-Light Image Denoising and Enhancement
- Image Dehazing
- Image Flare Removal
- Image Reflection Removal
- Image Deblurring
- Sharp-NeRF: Grid-Based Fast Deblurring Neural Radiance Fields Using Sharpness Prior
- Deep Plug-and-Play Nighttime Non-Blind Deblurring With Saturated Pixel Handling Schemes
- Deblur-NSFF: Neural Scene Flow Fields for Blurry Dynamic Scenes
- Single-Image Deblurring, Trajectory and Shape Recovery of Fast Moving Objects With Denoising Diffusion Probabilistic Models
- Image Shadow Removal
- Image Quality Assessment
- Image Color Editing
40.Self/Semi-supervised learning
- Unsupervised Learning
- Semi-Supervised Learning
- SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning<br>:star:code
- Debiasing, calibrating, and improving Semi-supervised Learning performance via simple Ensemble Projector<br>:star:code
- Universal Semi-Supervised Model Adaptation via Collaborative Consistency Training
- Improving Open-Set Semi-Supervised Learning With Self-Supervision
- Appearance-Based Curriculum for Semi-Supervised Learning With Multi-Angle Unlabeled Data
- Self-Supervised Learning
- Self-Supervised Learning of Semantic Correspondence Using Web Videos
- CycleCL: Self-supervised Learning for Periodic Videos
- Self-Supervised Representation Learning With Cross-Context Learning Between Global and Hypercolumn Features
- Self-Supervised Learning for Visual Relationship Detection through Masked Bounding Box Reconstruction<br>:star:code
- Self-Supervised Learning for Place Representation Generalization Across Appearance Changes
- Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where
- MGM-AE: Self-Supervised Learning on 3D Shape Using Mesh Graph Masked Autoencoders
39.Few/Zero-Shot Learning/Domain Generalization/Adaptation
- Zero-Shot Learning
- Few-Shot Learning
- DG
- Learning Class and Domain Augmentations for Single-Source Open-Domain Generalization
- On the Fly Neural Style Smoothing for Risk-Averse Domain Generalization
- Domain Generalization With Correlated Style Uncertainty
- Randomized Adversarial Style Perturbations for Domain Generalization
- Domain Generalisation via Risk Distribution Matching
- Domain Generalization by Rejecting Extreme Augmentations
- Single Domain Generalization via Normalised Cross-Correlation Based Convolutions
- STYLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-Based Domain Generalization
- DA
- Gradual Source Domain Expansion for Unsupervised Domain Adaptation
- Continual Test-Time Domain Adaptation via Dynamic Sample Selection
- Bridging Generalization Gaps in High Content Imaging Through Online Self-Supervised Domain Adaptation<br>:star:code
- GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap<br>:star:code
- Aligning Non-Causal Factors for Transformer-Based Source-Free Domain Adaptation<br>:house:project
- Robust Unsupervised Domain Adaptation Through Negative-View Regularization
- ReCLIP: Refine Contrastive Language Image Pre-Training With Source Free Domain Adaptation
- Stochastic Binary Network for Universal Domain Adaptation
- D3GU: Multi-Target Active Domain Adaptation via Enhancing Domain Alignment
- Feed-Forward Latent Domain Adaptation
38.Visual Representation Learning
37.Machine Learning
- Meta-Learning
- Continual Learning/Incremental Learning
- MoP-CLIP: A Mixture of Prompt-Tuned CLIP Models for Domain Incremental Learning
- Efficient Expansion and Gradient Based Task Inference for Replay Free Incremental Learning
- Class-Incremental Learning
- Expanding Hyperspherical Space for Few-Shot Class-Incremental Learning
- Overcoming Catastrophic Forgetting for Multi-Label Class-Incremental Learning
- An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning
- Wakening Past Concepts without Past Data: Class-Incremental Learning from Online Placebos<br>:star:code
- Robust Feature Learning and Global Variance-Driven Classifier Alignment for Long-Tail Class Incremental Learning
- TCP: Triplet Contrastive-Relationship Preserving for Class-Incremental Learning
- MICS: Midpoint Interpolation To Learn Compact and Separated Representations for Few-Shot Class-Incremental Learning
- CL
- Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning
- Kaizen: Practical Self-Supervised Continual Learning With Continual Fine-Tuning
- Evolve: Enhancing Unsupervised Continual Learning With Multiple Experts
- Steering Prototypes With Prompt-Tuning for Rehearsal-Free Continual Learning
- Metric Learning
- Adversarial Learning
- Active Learning
- Federated Learning
- Gradient Coreset for Federated Learning
- Late to the Party? On-Demand Unlabeled Personalized Federated Learning
- MetaVers: Meta-Learned Versatile Representations for Personalized Federated Learning
- Maximum Knowledge Orthogonality Reconstruction With Gradients in Federated Learning
- Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning
- TransFed: A Way To Epitomize Focal Modulation Using Transformer-Based Federated Learning
- Mixing Gradients in Neural Networks as a Strategy To Enhance Privacy in Federated Learning
- Contrastive Learning
- Reinforcement Learning
- Transfer Learning
- Multi-Task Learning
36.NLP
35.Model Compression/Knowledge Distillation/Pruning
- Wino Vidi Vici: Conquering Numerical Instability of 8-Bit Winograd Convolution for Accurate Inference Acceleration on Edge
- Quantization
- Reducing the Side-Effects of Oscillations in Training of Quantized YOLO Networks
- Improved Techniques for Quantizing Deep Networks With Adaptive Bit-Widths
- Evidential Uncertainty Quantification: A Variance-Based Perspective
- Edge Inference With Fully Differentiable Quantized Mixed Precision Neural Networks
- Pruning
- Token Fusion: Bridging the Gap between Token Pruning and Token Merging
- Torque Based Structured Pruning for Deep Neural Network
- Pruning From Scratch via Shared Pruning Module and Nuclear Norm-Based Regularization
- Towards Better Structured Pruning Saliency by Reorganizing Convolution
- PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks
- KD
- Frequency Attention for Knowledge Distillation
- Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-Free Continual Learning
- Towards Domain-Aware Knowledge Distillation for Continual Model Generalization
- Reverse Knowledge Distillation: Training a Large Model Using a Small One for Retinal Image Matching on Limited Data
34.NAS
- FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer<br>:star:code
- Hardware Aware Evolutionary Neural Architecture Search Using Representation Similarity Metric
33.Optical Flow Estimation
- Detection Defenses: An Empty Promise against Adversarial Patch Attacks on Optical Flow<br>:star:code
- CCMR: High Resolution Optical Flow Estimation via Coarse-to-Fine Context-Guided Motion Reasoning<br>:star:code
32.Scene Flow Estimation
31.Automated Driving
- Lane Detection
- Autonomous Driving
- Re-Evaluating LiDAR Scene Flow for Autonomous Driving
- NVAutoNet: Fast and Accurate 360deg 3D Visual Perception for Self Driving
- Driving Through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving
- StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction<br>:star:code
- Driver Impairment Assessment
- Traffic Sign Detection
- Obstacle Detection
- Driver Action Intention Recognition
30.GNN/GCN
- GNN
- Graph Networks
29.Scene Graph Generation
- Self-Supervised Relation Alignment for Scene Graph Generation
- Refine and Redistribute: Multi-Domain Fusion and Dynamic Label Assignment for Unbiased Scene Graph Generation
28.Point Clouds
- MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds
- Cross-Domain Few-Shot Incremental Learning for Point-Cloud Recognition
- Sparse Convolutional Networks for Surface Reconstruction From Noisy Point Clouds
- LidarCLIP or: How I Learned To Talk to Point Clouds
- FinderNet: A Data Augmentation Free Canonicalization Aided Loop Detection and Closure Technique for Point Clouds in 6-DOF Separation
- Indoor Visual Localization Using Point and Line Correspondences in Dense Colored Point Cloud
- SSP: Semi-Signed Prioritized Neural Fitting for Surface Reconstruction From Unoriented Point Clouds
- 3D Point Clouds
- Point Cloud Registration
- Point Cloud Completion
- Point Cloud Segmentation
- Point Cloud Classification
27.Human-Object Interactions
- Exploiting CLIP for Zero-Shot HOI Detection Requires Knowledge Distillation at Multiple Levels
- Task-Oriented Human-Object Interactions Generation With Implicit Neural Representations
- Beyond Active Learning: Leveraging the Full Potential of Human Interaction via Auto-Labeling, Human Correction, and Human Verification
- Bipartite Graph Diffusion Model for Human Interaction Generation
26.Human Motion Prediction
- Incorporating Physics Principles for Precise Human Motion Prediction
- Context-Based Interpretable Spatio-Temporal Graph Convolutional Network for Human Motion Forecasting
- Human Motion Synthesis
25.Multimodal
- Dynamic Multimodal Information Bottleneck for Multimodality Classification<br>:star:code
- CoD: Coherent Detection of Entities From Images With Multiple Modalities
- Multimodal Deep Learning for Remote Stress Estimation Using CCT-LSTM
- Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining<br>:star:code
- OmniVec: Learning robust representations with cross modal sharing
- Complementary-Contradictory Feature Regularization Against Multimodal Overfitting<br>:star:code
- Learning Intra-Class Multimodal Distributions With Orthonormal Matrices
- EASUM: Enhancing Affective State Understanding Through Joint Sentiment and Emotion Modeling for Multimodal Tasks
- CLIP
24.Large Language Models
23.Vision-Language
- Multitask Vision-Language Prompt Tuning
- Improving Fairness Using Vision-Language Driven Image Augmentation
- Empowering Unsupervised Domain Adaptation With Large-Scale Pre-Trained Vision-Language Models
- Can Vision-Language Models Be a Good Guesser? Exploring VLMs for Times and Location Reasoning
- Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding
- Improving Vision-and-Language Reasoning via Spatial Relations Modeling
- MIVC: Multiple Instance Visual Component for Visual-Language Models
22.Visual Question Answering
- RankDVQA: Deep VQA Based on Ranking-Inspired Hybrid Training
- POP-VQA - Privacy Preserving, On-Device, Personalized Visual Question Answering
- Benchmarking Out-of-Distribution Detection in Visual Question Answering
- Can You Even Tell Left From Right? Presenting a New Challenge for VQA
- Visual Dialog
- AVQA
- ArtVQA
21.SLAM/Augmented Reality/Virtual Reality/Robotics
- Virtual Try-On
- Virtual Avatars
- Robotics
- Navigation
- Visual Localization
- Trajectory Prediction
20.GAN/Generation
- FacadeNet: Conditional Facade Synthesis via Selective Editing
- Synthesizing Anyone, Anywhere, in Any Pose
- GAN
- Consistent Multimodal Generation via a Unified GAN Framework
- StyleGenes: Discrete and Efficient Latent Distributions for GANs
- Improving the Fairness of the Min-Max Game in GANs Training
- StyleGAN-Fusion: Diffusion Guided Domain Adaptation of Image Generators
- PlantPlotGAN: A Physics-Informed Generative Adversarial Network for Plant Disease Prediction
- P2D: Plug and Play Discriminator for Accelerating GAN Frameworks
- Soft Curriculum for Learning Conditional GANs With Noisy-Labeled and Uncurated Unlabeled Data
- What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion
- Improving the Leaking of Augmentations in Data-Efficient GANs via Adaptive Negative Data Augmentation
- PETIT-GAN: Physically Enhanced Thermal Image-Translating Generative Adversarial Network
- Image Generation
- Image Synthesis
- Text-to-Image
- CLIPAG: Towards Generator-Free Text-to-Image Generation
- Customizing 360-Degree Panoramas Through Text-to-Image Diffusion Models
- Text-to-Image Models for Counterfactual Explanations: A Black-Box Approach
- TIAM - A Metric for Evaluating Alignment in Text-to-Image Generation
- Localization and Manipulation of Immoral Visual Cues for Safe Text-to-Image Generation
- Unsupervised Co-Generation of Foreground-Background Segmentation From Text-to-Image Synthesis
- Image-to-Text
- Video Synthesis
- Diffusion Models
- Fast Diffusion EM: A Diffusion Model for Blind Inverse Problems With Application to Deconvolution
- Expanding Expressiveness of Diffusion Models with Limited Data via Self-Distillation based Fine-Tuning
- Preserving Image Properties Through Initializations in Diffusion Models
- Exploiting the Signal-Leak Bias in Diffusion Models
- Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation
- Common Diffusion Noise Schedules and Sample Steps Are Flawed
- Training-Free Content Injection Using H-Space in Diffusion Models
- PoseDiff: Pose-Conditioned Multimodal Diffusion Model for Unbounded Scene Synthesis From Sparse Inputs
- Diffusion Models Meet Image Counter-Forensics
- PathLDM: Text Conditioned Latent Diffusion Model for Histopathology
- Synthesizing Coherent Story With Auto-Regressive Latent Diffusion Models
- Towards More Realistic Membership Inference Attacks on Large Diffusion Models
- Dual Domain Diffusion Guidance for 3D CBCT Metal Artifact Reduction
- Image Translation
- Image-to-Image Translation
- Text-to-3D
- Text-to-Video
- Synthetic Image Detection
19.Object Pose Estimation
- 6D
- Object Counting
- Object Re-Identification
18.Animal
- Canine Pose Analysis
- Animal Re-Identification
17.Human Pose Estimation
- Re-VoxelDet: Rethinking Neck and Head Architectures for High-Performance Voxel-Based 3D Detection
- DiffBody: Diffusion-Based Pose and Shape Editing of Human Images
- Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation
- Rethinking Visibility in Human Pose Estimation: Occluded Pose Reasoning via Transformers
- Active Transfer Learning for Efficient Video-Specific Human Pose Estimation<br>:star:code
- LInKs "Lifting Independent Keypoints" - Partial Pose Lifting for Occlusion Handling With Improved Accuracy in 2D-3D Human Pose Estimation
- 3D HPE
- 3D Human Pose Estimation With Two-Step Mixed-Training Strategy
- Unsupervised 3D Pose Estimation With Non-Rigid Structure-From-Motion Modeling
- Back to Optimization: Diffusion-Based Zero-Shot 3D Human Pose Estimation
- MotionAGFormer: Enhancing 3D Human Pose Estimation With a Transformer-GCNFormer Network
- UNSPAT: Uncertainty-Guided SpatioTemporal Transformer for 3D Human Pose and Shape Estimation on Videos
- A Geometry Loss Combination for 3D Human Pose Estimation
- Robust Category-Level 3D Pose Estimation From Diffusion-Enhanced Synthetic Data
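- Multi-Body Mesh Detection
- Human Localization and Pose Classification
- 3D Human Mesh Recovery
- Human Pose and Mesh Reconstruction
- Clothed Human Reconstruction
- Hands
- Hand Reconstruction
- Sign Language Translation
- Sign Language Production
- Hand Pose Estimation
- Gesture Detection
- Scribal Hand Identification
- Interactive Segmentation
- Human Silhouette Extraction
- Motion Capture
- Human Animation
16.Action Detection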
- Context in Human Action Through Motion Complementarity
- Few-Shot Action Detection
- Fine-Grained Action Recognition
- Temporal Action Segmentation
- Temporal Action Detection
- Action Detection
- Embodied Human Activity Recognition
- JOADAA: Joint Online Action Detection and Action Anticipation
- A Hybrid Graph Network for Complex Activity Detection in Video
- Differentially Private Video Activity Recognition
- Embedding Task Structure for Action Detection
- Egocentric Action Recognition by Capturing Hand-Object Contact and Object State
- Exploring the Impact of Rendering Method and Motion Quality on Model Performance When Using Multi-View Synthetic Data for Action Recognition
- Learnable Cube-Based Video Encryption for Privacy-Preserving Action Recognition
- Action Anticipation
- Action Segmentation
- Action Classification
- Action Synthesis
- Action Quality Assessment
- Repetitive Action Counting
15.Video
- Detecting Content Segments From Online Sports Streaming Events: Challenges and Solutions
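- Video Understanding
- Video Segmentation
- Video Recognition
- Video Stabilization
- Video Reconstruction
- Video Surveillance
- Video Analysis
- Video Harmonization
- Videotape Restoration
- Video Moment Retrieval
- Video Object Localization
- Movie Genre Classification
- Video Quality Enhancement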
- VAD
14.OCR (text detection and recognition)
- DTrOCR: Decoder-only Transformer for Optical Character Recognition
- On Manipulating Scene Text in the Wild with Diffusion Models
- DECDM: Document Enhancement using Cycle-Consistent Diffusion Models
- Text Detection
- Text Spotting
- Scene-Text Spotting
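- Document Dewarping
- Scene Text Understanding
- Document Layout Segmentation
- Font Generation
- Information Extraction
13.Reid (person re-identification / gait recognition / pedestrian detection)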
- Reid
- Privacy-Enhancing Person Re-Identification Framework - A Dual-Stage Approach
- HashReID: Dynamic Network with Binary Codes for Efficient Person Re-identification
- Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID
- Source-Guided Similarity Preservation for Online Person Re-Identification
- Contrastive Viewpoint-Aware Shape Learning for Long-Term Person Re-Identification
- Visible-Infrared Reid
- Pedestrian Recognition
- Person Search
- Pedestrian Detection
- HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information
- Beyond Fusion: Modality Hallucination-Based Multispectral Fusion for Pedestrian Detection
- Booster-SHOT: Boosting Stacked Homography Transformations for Multiview Pedestrian Detection With Attention
- Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling
- Favoring One Among Equals - Not a Good Idea: Many-to-One Matching for Robust Transformer Based Pedestrian Detection
- Crowd Counting
- Gait Recognition
- Crowd Flow Estimation
- Pedestrian Attribute Recognition
12.UAV/Remote Sensing/Satellite Images
- CHAI: Craters in Historical Aerial Images
- Tree Crown Detection
- Change Detection
- Implicit Neural Representation for Change Detection
- Semi-Supervised Scene Change Detection by Distillation From Feature-Metric Alignment
- Effective Restoration of Source Knowledge in Continual Test Time Adaptation
- SeaDSC: A video-based unsupervised method for dynamic scene change detection in unmanned surface vehicles
- Image Segmentation
- Satellite Image Classification
- UAV Image Detection
- Aerial Video Action Recognition
- Remote Sensing Salient Object Detection
11.Object Tracking
- So You Think You Can Track?
- MFT: Long-Term Tracking of Every Pixel
- RGB-D Mapping and Tracking in a Plenoxel Radiance Field
- Separable Self and Mixed Attention Transformers for Efficient Object Tracking<br>:star:code
- Tracking Tiny Insects in Cluttered Natural Environments Using Refinable Recurrent Neural Networks
- Tracking Skiers from the Top to the Bottom<br>:star:code
- Leveraging the Power of Data Augmentation for Transformer-Based Tracking
- Automated Monitoring of Ear Biting in Pigs by Tracking Individuals and Events
- VEATIC: Video-Based Emotion and Affect Tracking in Context Dataset
- MOT
- CAMOT: Camera Angle-Aware Multi-Object Tracking
- Beyond SOT: Tracking Multiple Generic Objects at Once
- Contrastive Learning for Multi-Object Tracking with Transformers
- ConfTrack: Kalman Filter-Based Multi-Person Tracking by Utilizing Confidence Score of Detection Box
- FRoG-MOT: Fast and Robust Generic Multiple-Object Tracking by IoU and Motion-State Associations
10.Object Detection
- Label-Free Synthetic Pretraining of Object Detectors
- BALF: Simple and Efficient Blur Aware Local Feature Detector
- Robust Object Detection in Challenging Weather Conditions
- Gradient-Guided Knowledge Distillation for Object Detectors
- Defending Object Detection Models Against Image Distortions
- RGB-X Object Detection via Scene-Specific Fusion Modules
- Interpretable Object Recognition by Semantic Prototype Analysis
- MultIOD: Rehearsal-free Multihead Incremental Object Detector
- On the Importance of Large Objects in CNN Based Object Detection Algorithms
- Towards Few-Annotation Learning for Object Detection: Are Transformer-based Models More Efficient ?
- TPSeNCE: Towards Artifact-Free Realistic Rain Generation for Deraining and Object Detection in Rain<br>:star:code
- Beyond Classification: Definition and Density-based Estimation of Calibration in Object Detection
- Time To Shine: Fine-Tuning Object Detection Models With Synthetic Adverse Weather Images
- BoostRad: Enhancing Object Detection by Boosting Radar Reflections
- Patch-Based Selection and Refinement for Early Object Detection
- Identifying Label Errors in Object Detection Datasets by Loss Inspection
- Efficient Feature Distillation for Zero-Shot Annotation Object Detection
- Data Augmentation for Object Detection via Controllable Diffusion Models
- Investigating the Role of Attribute Context in Vision-Language Models for Object Recognition and Detection
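- Co-Salient Object Detection
- Open-Vocabulary Object Detection
- Semi-Supervised Object Detection
- Weakly Supervised Object Detection
- Domain-Adaptive Object Detection
- Camouflaged Object Detection
- Salient Object Detection
- Open-Set Object Detection
- Small Object Detection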
- 3D OD
- Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection
- A Robust Diffusion Modeling Framework for Radar Camera 3D Object Detection
- Attentive Prototypes for Source-Free Unsupervised Domain Adaptive 3D Object Detection
- Monocular 3D Object Detection With LiDAR Guided Semi Supervised Active Learning
- SOAP: Cross-Sensor Domain Adaptation for 3D Object Detection Using Stationary Object Aggregation Pseudo-Labelling
- VOD
- Subject Relation Detection
- Cattle Identification
- Apple Detection
- Pebble Recognition
9.Image Segmentation
- Segment Anything, From Space?
- Object Aware Contrastive Prior for Interactive Image Segmentation
- Robust Source-Free Domain Adaptation for Fundus Image Segmentation<br>:star:code
- High-Fidelity Pseudo-Labels for Boosting Weakly-Supervised Segmentation
- Panoptic Segmentation
- Instance Segmentation
- Semantic Segmentation
- Unsupervised Domain Adaptation for Semantic Segmentation with Pseudo Label Self-Refinement
- Learning To Generate Training Datasets for Robust Semantic Segmentation
- Single Frame Semantic Segmentation Using Multi-Modal Spherical Images
- MetaSeg: MetaFormer-Based Global Contexts-Aware Network for Efficient Semantic Segmentation
- Dynamic Token-Pass Transformers for Semantic Segmentation
- Joint Depth Prediction and Semantic Segmentation With Multi-View SAM
- OVeNet: Offset Vector Network for Semantic Segmentation
- TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation
- Location-Aware Self-Supervised Transformers for Semantic Segmentation
- Rethinking Knowledge Distillation With Raw Features for Semantic Segmentation
- Classifying Cable Tendency With Semantic Segmentation by Utilizing Real and Simulated RGB Data
- Uncertainty-Weighted Loss Functions for Improved Adversarial Attacks on Semantic Segmentation
- BPKD: Boundary Privileged Knowledge Distillation for Semantic Segmentation
- FAKD: Feature Augmented Knowledge Distillation for Semantic Segmentation
- Residual Graph Convolutional Network for Bird's-Eye-View Semantic Segmentation
- What's Outside the Intersection? Fine-Grained Error Analysis for Semantic Segmentation Beyond IoU
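- 3D Semantic Segmentation
- Open-Set Semantic Segmentation
- Fine-Grained Semantic Segmentation
- Semi-Supervised Semantic Segmentation
- Weakly Supervised Semantic Segmentation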
- Foundation Model Assisted Weakly Supervised Semantic Segmentation
- Small Objects Matters in Weakly-supervised Semantic Segmentation
- Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation
- Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation
- Prompting Classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation
- PrivObfNet: A Weakly Supervised Semantic Segmentation Model for Data Protection
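- Domain-Generalized Semantic Segmentation
- Multimodal Semantic Segmentation
- Open-Vocabulary Semantic Segmentation
- Scene Understanding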
- TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding<br>:star:code
- Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
- RS2G: Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception and Scenario Understanding
- Panoramic Images
- Few-Shot Segmentation
- Semantic Scene Segmentation
- VOS
- VSS
- VIS
- VPS
- Matting
8.Face(Face Technologies)
- ProS: Facial Omni-Representation Learning via Prototype-based Self-Distillation
- Improving Fairness using Vision-Language Driven Image Augmentation<br>:star:code
- FPAD
- 3D Face
- Talking Head
- Age Classification
- Face Analysis
- Face Verification
- Face Recognition
- Face Detection
- Face Generation
- Face Restoration
- Diffuse and Restore: A Region-Adaptive Diffusion Model for Identity-Preserving Blind Face Restoration
- ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-Based Blind Face Restoration
- Show Your Face: Restoring Complete Facial Images From Partial Observations for VR Meeting
- Personalized Face Inpainting With Diffusion Models by Parallel Visual Attention
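- Face Style Transfer
- Facial Expression Recognition
- Facial Expression Editing
- Face Super-Resolution
- Face Morphing Attack Detection
- Facial Action Unit Detection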
- Face Relighting
7.3D(3D Reconstruction/3D Vision)
- BEVMap: Map-Aware BEV Modeling for 3D Perception
- Slice and Conquer: A Planar-to-3D Framework for Efficient Interactive Segmentation of Volumetric Images
- ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes
- Depth Estimation
- Semi-Supervised Semantic Depth Estimation using Symbiotic Transformer and NearFarMix Augmentation
- Camera-Independent Single Image Depth Estimation From Defocus Blur
- Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation
- Continual Learning of Unsupervised Monocular Depth from Videos
- MonoProb: Self-Supervised Monocular Depth Estimation with Interpretable Uncertainty<br>:star:code
- 3D Reconstruction
- Toward Planet-Wide Traffic Camera Calibration<br>:house:project
- Registered and Segmented Deformable Object Reconstruction from a Single View Point Cloud
- Multi-View 3D Object Reconstruction and Uncertainty Modelling With Neural Shape Prior
- 3D Reconstruction of Interacting Multi-Person in Clothing From a Single Image
- SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction
- Room Layout
- Stereo Matching
- MVS
- 3D Scene Reconstruction
6.Medical Image(Medical Image Processing)
- SC-MIL: Supervised Contrastive Multiple Instance Learning for Imbalanced Classification in Pathology
- 3D
- CT
- MRI
- Brainomaly: Unsupervised Neurologic Disease Detection Utilizing Unannotated T1-Weighted Brain MR Images
- Controllable Text-to-Image Synthesis for Multi-Modality MR Images
- Constrained Probabilistic Mask Learning for Task-Specific Undersampled MRI Reconstruction
- IR-FRestormer: Iterative Refinement With Fourier-Based Restormer for Accelerated MRI Reconstruction
- Continual Atlas-Based Segmentation of Prostate MRI
- Longformer: Longitudinal Transformer for Alzheimer's Disease Classification With Structural MRIs
- Unsupervised Domain Adaptation of MRI Skull-Stripping Trained on Adult Data to Newborns
- Unsupervised Exemplar-Based Image-to-Image Translation and Cascaded Vision Transformers for Tagged and Untagged Cardiac Cine MRI Registration
- X-Ray
- GazeGNN: A Gaze-Guided Graph Neural Network for Chest X-Ray Classification
- CXR-IRGen: An Integrated Vision and Language Model for the Generation of Clinically Accurate Chest X-Ray Image-Report Pairs
- I-AI: A Controllable & Interpretable AI System for Decoding Radiologists' Intense Focus for Accurate CXR Diagnoses
- Report Generation
- Polyp Detection
- Medical Image Segmentation
- G-CASCADE: Efficient Cascaded Graph Convolutional Decoding for 2D Medical Image Segmentation
- From Denoising Training To Test-Time Adaptation: Enhancing Domain Generalization for Medical Image Segmentation
- Self-Sampling Meta SAM: Enhancing Few-Shot Medical Image Segmentation With Meta-Learning
- SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images
- FreMIM: Fourier Transform Meets Masked Image Modeling for Medical Image Segmentation
- AFTer-SAM: Adapting SAM With Axial Fusion Transformer for Medical Imaging Segmentation
- Med-DANet V2: A Flexible Dynamic Architecture for Efficient Medical Volumetric Segmentation
- CSAM: A 2.5D Cross-Slice Attention Module for Anisotropic Volumetric Medical Image Segmentation
- Beyond Self-Attention: Deformable Large Kernel Attention for Medical Image Segmentation
- SynergyNet: Bridging the Gap between Discrete and Continuous Representations for Precise Medical Image Segmentation
- MIST: Medical Image Segmentation Transformer with Convolutional Attention Mixing (CAM) Decoder
- Medical Image Classification
- Brain Age Prediction
- Diabetic Retinopathy Classification
- Pathology Slide Images
- Tooth Segmentation
- Polyp Segmentation
5.Image/Video Compression
- Differentiable JPEG: The Devil is in the Details<br>:star:code
- IC
- VC
4.Image/Video Captioning
- Simple Token-Level Confidence Improves Caption Correctness
- CLID: Controlled-Length Image Descriptions With Limited Data
- Describe Images in a Boring Way: Towards Cross-Modal Sarcasm Generation
- Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models<br>:star:code
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
3.Image/Video Retrieval
- Image Retrieval
- Recipe Retrieval
- 3D Shape Retrieval
- Text-to-Outfit Retrieval (Fashion Recommendation)
- Text-to-Shape Retrieval
2.Super-Resolution
- PDA-RWSR: Pixel-Wise Degradation Adaptive Real-World Super-Resolution
- BSRAW: Improving Blind RAW Image Super-Resolution
- Scene Text Image Super-resolution based on Text-conditional Diffusion Models
- Best of Both Worlds: Learning Arbitrary-Scale Blind Super-Resolution via Dual Degradation Representations and Cycle-Consistency
- Lightweight Thermal Super-Resolution and Object Detection for Robust Perception in Adverse Weather Conditions
- SupeRVol: Super-Resolution Shape and Reflectance Estimation in Inverse Volume Rendering
- ICF-SRSR: Invertible Scale-Conditional Function for Self-Supervised Real-World Single Image Super-Resolution
- WaveMixSR: Resource-Efficient Neural Network for Image Super-Resolution
- FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices With a Simple Super-Resolution Pipeline
- 3D Super-Resolution Model for Vehicle Flow Field Enrichment
- Meta-Learned Kernel for Blind Super-Resolution Kernel Estimation
- VCISR: Blind Single Image Super-Resolution With Video Compression Synthetic Data
- VSR
1.Other
- Collage Diffusion
- Taming Normalizing Flows
- Learning Saliency From Fixations
- FIRE: Food Image to REcipe Generation
- WATCH: Wide-Area Terrestrial Change Hypercube
- Design Choices for Enhancing Noisy Student Self-Training
- Pixel-Grounded Prototypical Part Networks
- Bag of Tricks for Fully Test-Time Adaptation
- Fixing Overconfidence in Dynamic Neural Networks
- Learning to Read Analog Gauges from Synthetic Data
- DISCO: Distributed Inference With Sparse Communications
- Learning To Recognize Occluded and Small Objects With Partial Inputs
- Towards a Dynamic Vision Sensor-Based Insect Camera Trap<br>:star:code
- Framework-Agnostic Semantically-Aware Global Reasoning for Segmentation
- ArcAid: Analysis of Archaeological Artifacts Using Drawings
- Controlling Character Motions Without Observable Driving Source
- Assessing Neural Network Robustness via Adversarial Pivotal Tuning
- FIRe: Fast Inverse Rendering Using Directional and Signed Distance Functions
- INCODE: Implicit Neural Conditioning With Prior Knowledge Embeddings
- Robust Learning via Conditional Prevalence Adjustment
- Link Prediction for Flow-Driven Spatial Networks
- Lightweight Portrait Matting via Regional Attention and Refinement
- NCIS: Neural Contextual Iterative Smoothing for Purifying Adversarial Perturbations
- SynthProv: Interpretable Framework for Profiling Identity Leakage
- Analyzing the Domain Shift Immunity of Deep Homography Estimation
- RSMPNet: Relationship Guided Semantic Map Prediction
- Universal Test-Time Adaptation Through Weight Ensembling, Diversity Weighting, and Prior Correction
- 360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View
- Composite Diffusion: whole >= Σparts
- CARE: Counterfactual-Based Algorithmic Recourse for Explainable Pose Correction
- Tunable Hybrid Proposal Networks for the Open World
- Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos
- Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis
- Revisiting Pixel-Level Contrastive Pre-Training on Scene Images
- Enforcing Sparsity on Latent Space for Robust and Explainable Representations
- Spiking Neural Networks for Active Time-Resolved SPAD Imaging
- Increasing Biases Can Be More Efficient Than Increasing Weights
- Membership Inference Attack Using Self Influence Functions
- Using Early Readouts To Mediate Featural Bias in Distillation
- Depth From Asymmetric Frame-Event Stereo: A Divide-and-Conquer Approach
- HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities
- Assist Is Just As Important as the Goal: Image Resurfacing To Aid Model's Robust Prediction
- Improving Normalization With the James-Stein Estimator
- Understanding Dark Scenes by Contrasting Multi-Modal Observations
- Causal Analysis for Robust Interpretability of Neural Networks
- Causal Feature Alignment: Learning To Ignore Spurious Background Features
- Estimating Fog Parameters From an Image Sequence Using Non-Linear Optimisation
- Rethinking Multimodal Content Moderation From an Asymmetric Angle With Mixed-Modality
- CATS: Combined Activation and Temporal Suppression for Efficient Network Inference
- SEMA: Semantic Attention for Capturing Long-Range Dependencies in Egocentric Lifelogs
- Cross-Feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data
- Neural Echos: Depthwise Convolutional Filters Replicate Biological Receptive Fields
- SLoSH: Set Locality Sensitive Hashing via Sliced-Wasserstein Embeddings
- Auto-BPA: An Enhanced Ball-Pivoting Algorithm With Adaptive Radius Using Contextual Bandits
- Unsupervised Model-Based Learning for Simultaneous Video Deflickering and Deblotching
- USDN: A Unified Sample-Wise Dynamic Network With Mixed-Precision and Early-Exit
- pSTarC: Pseudo Source Guided Target Clustering for Fully Test-Time Adaptation
- PressureVision++: Estimating Fingertip Pressure From Diverse RGB Images
- Approximating Intersections and Differences Between Linear Statistical Shape Models Using Markov Chain Monte Carlo
- FOUND: Foot Optimization With Uncertain Normals for Surface Deformation Using Synthetic Data
- PatchRefineNet: Improving Binary Segmentation by Incorporating Signals From Optimal Patch-Wise Binarization
- Adaptive Deep Neural Network Inference Optimization With EENet
- Partial Binarization of Neural Networks for Budget-Aware Efficient Learning
- What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection
- Concept-Centric Transformers: Enhancing Model Interpretability Through Object-Centric Concept Learning Within a Shared Global Workspace
- Solving the Plane-Sphere Ambiguity in Top-Down Structure-From-Motion
- Guided Cluster Aggregation: A Hierarchical Approach to Generalized Category Discovery
- Recognition of Unseen Bird Species by Learning From Field Guides
- Leveraging Task-Specific Pre-Training To Reason Across Images and Videos
- Graph(Graph): A Nested Graph-Based Framework for Early Accident Anticipation
- Training-Free Layout Control With Cross-Attention Guidance
- Shape-Guided Diffusion With Inside-Outside Attention
- ArcGeo: Localizing Limited Field-of-View Images Using Cross-View Matching
- A Visual Active Search Framework for Geospatial Exploration
- TriPlaneNet: An Encoder for EG3D Inversion
- Learning Visual Body-Shape-Aware Embeddings for Fashion Compatibility
- Simple Post-Training Robustness Using Test Time Augmentations and Random Forest
- ClusterFix: A Cluster-Based Debiasing Approach Without Protected-Group Supervision
- DREAM: Visual Decoding From Reversing Human Visual System
- Volumetric Disentanglement for 3D Scene Manipulation
- Seeing Stars: Learned Star Localization for Narrow-Field Astrometry
- Hybrid Neural Diffeomorphic Flow for Shape Representation and Generation via Triplane
- REALM: Robust Entropy Adaptive Loss Minimization for Improved Single-Sample Test-Time Adaptation
- Occlusion Sensitivity Analysis with Augmentation Subspace Perturbation in Deep Feature Space
- SENetV2: Aggregated dense layer for channelwise and global representations
- RecycleNet: Latent Feature Recycling Leads to Iterative Decision Refinement
- Learning Low-Rank Latent Spaces with Simple Deterministic Autoencoder: Theoretical and Empirical Insights
- Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling With IoU-Aware Calibration
- Learning Robust Deep Visual Representations from EEG Brain Recordings
- MACP: Efficient Model Adaptation for Cooperative Perception<br>:star:code
- The Background Also Matters: Background-Aware Motion-Guided Objects Discovery
- FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data<br>:star:code
- Instruct Me More! Random Prompting for Visual In-Context Learning<br>:star:code
- Mini but Mighty: Finetuning ViTs with Mini Adapters
- Efficient Semantic Matching with Hypercolumn Correlation
- MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters<br>:star:code
- Layer-wise Auto-Weighting for Non-Stationary Test-Time Adaptation
- A Neural Height-Map Approach for the Binocular Photometric Stereo Problem
- CrashCar101: Procedural Generation for Damage Assessment
- Self-Annotated 3D Geometric Learning for Smeared Points Removal<br>:star:code<br>:house:project
- Few-shot Shape Recognition by Learning Deep Shape-aware Features
- Learning to Compose SuperWeights for Neural Parameter Allocation Search
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection
- Hybrid Sample Synthesis-based Debiasing of Classifier in Limited Data Setting
- Label Shift Estimation for Class-Imbalance Problem: A Bayesian Approach
- AMEND: Adaptive Margin and Expanded Neighborhood for Efficient Generalized Category Discovery
- PreciseDebias: An Automatic Prompt Engineering Approach for Generative AI To Mitigate Image Demographic Biases
- Shape-Biased CNNs Are Not Always Superior in Out-of-Distribution Robustness
- Efficient Transferability Assessment for Selection of Pre-Trained Detectors
- TEGLO: High Fidelity Canonical Texture Mapping From Single-View Images
- Adversarial Likelihood Estimation With One-Way Flows
- Panelformer: Sewing Pattern Reconstruction From 2D Garment Images
- Concurrent Band Selection and Traversability Estimation From Long-Wave Hyperspectral Imagery in Off-Road Settings
- Diverse Imagenet Models Transfer Better
- CL-MAE: Curriculum-Learned Masked Autoencoders
- Improved Topological Preservation in 3D Axon Segmentation and Centerline Detection Using Geometric Assessment-Driven Topological Smoothing (GATS)
- Think Before You Simulate: Symbolic Reasoning To Orchestrate Neural Computation for Counterfactual Question Answering
- MIM
Categorized 2020 paper collections: click here
↘️CVPR-2020-Papers ↘️ECCV-2020-Papers
<a name="00"/>Categorized 2021 paper collections: click here
↘️ICCV-2021-Papers ↘️CVPR-2021-Papers
<a name="000"/>Categorized 2022 paper collections: click here
↘️CVPR-2022-Papers ↘️WACV-2022-Papers ↘️ECCV-2022-Papers