Awesome
ECCV-2022-Papers
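Official website: https://eccv2022.ecva.net/
Submission deadline: March 7, 2022 (9:59PM CET, 11:59AM PST)
Conference dates: October 24, 2022 to October 28, 2022
Survey papers from previous years, by category: ↘️CV-Surveys (under construction)
2022 papers by category: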
↘️CVPR-2022-Papers ↘️WACV-2022-Papers ↘️ECCV-2022-Papers
2021 papers by category:
↘️ICCV-2021-Papers ↘️CVPR-2021-Papers
2020 papers by category:
↘️CVPR-2020-Papers ↘️ECCV-2020-Papers
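❣❣❣ To download all ECCV 2022 papers in one package, reply "paper" to the 【我爱计算机视觉】 (I Love Computer Vision) WeChat official account. 1,645 papers in total; categorization complete.
:trophy::trophy::trophy: Award-Winning Papers
- Best Paper Award
- Best Paper Honorable Mention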
- Koenderink Prize (test of time)
- Best Demo Award
- [Using a Smartphone for Augmented Reality in a Classroom]<br>:tv:video
- Everingham Prize
- 【The UCF101 and HMDB51 dataset teams】&【Walter J. Scheirer】
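61.Light Field(Optics, Geometry, Light Field Imaging)
- Camera-related
- Camera pose
- Camera estimation
- Camera auto-calibration
- Event cameras
- Camera re-identification
- Camera localization
- Light field
60.Data Augmentation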
- TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers<br>:star:code
- Neuromorphic Data Augmentation for Training Spiking Neural Networks
- 3D Random Occlusion and Multi-layer Projection for Deep Multi-Camera Pedestrian Localization<br>:star:code
59.Image Matching
- ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer<br>:house:project
- ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement<br>:star:code:house:project
58.Human Motion Prediction
- ERA: Expert Retrieval and Assembly for Early Action Prediction
- Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction
- GIMO: Gaze-Informed Human Motion Prediction in Context<br>:star:code
- Diverse Human Motion Prediction Guided by Multi-level Spatial-Temporal Anchors<br>:star:code
- Action prediction
- Motion estimation
- PREF: Predictability Regularized Neural Motion Fields<br>:open_mouth:oral
- Human motion synthesis
57.Scene Graph Generation
- Panoptic Scene Graph Generation<br>:star:code:house:project
- Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
- Hierarchical Memory Learning for Fine-Grained Scene Graph Generation
- Fine-Grained Scene Graph Generation with Data Transfer<br>:star:code
- Towards Open-Vocabulary Scene Graph Generation with Prompt-Based Finetuning
56.Sound
- Learning Visual Styles from Audio-Visual Associations<br>:house:project
- Active Audio-Visual Separation of Dynamic Sound Sources<br>:house:project
- Sound source localization
- Active speaker detection
- Audio-driven video portrait generation
- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation<br>:open_mouth:oral:house:project
- Audio-visual segmentation
- Audio-Visual Segmentation<br>:star:code
- Speech synthesis
- Sound separation
55.Style Transfer
- CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer<br>:open_mouth:oral:star:code
- Learning Graph Neural Networks for Image Style Transfer
- ARF: Artistic Radiance Fields<br>:house:project
- Image stylization
- Hairstyle transfer
54.View Generation
- InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images<br>:open_mouth:oral
- CompNVS: Novel View Synthesis with Scene Completion
- HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields<br>:star:code
- Neural Radiance Transfer Fields for Relightable Novel-View Synthesis with Global Illumination
- R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis<br>:house:project
- NeXT: Towards High Quality Neural Radiance Fields via Multi-Skip Transformer<br>:star:code
53.Dataset
- The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning<br>:sunflower:dataset
- Responsive Listening Head Generation: A Benchmark Dataset and Baseline<br>:sunflower:dataset
- Online Segmentation of LiDAR Sequences: Dataset and Algorithm<br>:sunflower:dataset
- COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts<br>:star:code<br>A comic onomatopoeia dataset for recognizing arbitrary or truncated text
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis<br>:sunflower:dataset<br>A breakdancing competition dataset for dance motion synthesis
- CelebV-HQ: A Large-Scale Video Facial Attributes Dataset<br>:sunflower:dataset:house:project<br>A large-scale video facial attributes dataset
- UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture<br>:star:code:house:project<br>A new dataset for robust egocentric 3D human motion capture
- BEAT: A Large-Scale Semantic and Emotional Multi-modal Dataset for Conversational Gestures Synthesis<br>:sunflower:dataset<br>:newspaper:ECCV 2022 | 76 hours of motion capture: the largest multimodal digital-human dataset, open-sourced
- MovieCuts: A New Dataset and Benchmark for Cut Type Recognition<br>:sunflower:dataset<br>Cut type recognition
- A Real World Dataset for Multi-View 3D Reconstruction<br>:sunflower:dataset<br>3D reconstruction
- Capturing, Reconstructing, and Simulating: The UrbanScene3D Dataset<br>:sunflower:dataset<br>Urban scene reconstruction
- PartImageNet: A Large, High-Quality Dataset of Parts<br>Segmentation
- A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge<br>:sunflower:dataset<br>VQA
- OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
- The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing<br>:sunflower:dataset<br>Video editing
- ClearPose: Large-Scale Transparent Object Dataset and Benchmark<br>:sunflower:dataset<br>Depth estimation
- AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment<br>:sunflower:dataset<br>Animated celebrity head dataset
- A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing<br>A dense material segmentation dataset for indoor and outdoor scene parsing
- MimicME: A Large Scale Diverse 4D Database for Facial Expression Analysis<br>A large-scale, diverse 4D database for facial expression analysis
- Delving into Universal Lesion Segmentation: Method, Dataset, and Benchmark<br>:sunflower:dataset<br>Lesion segmentation
52.Scene Flow Estimation
- Bi-PointFlowNet: Bidirectional Learning for Point Cloud Based Scene Flow Estimation<br>:star:code
- What Matters for 3D Scene Flow Network<br>:star:code
- MonoPLFlowNet: Permutohedral Lattice FlowNet for Real-Scale 3D Scene Flow Estimation with Monocular Images
51.Anomaly Detection
- Registration based Few-Shot Anomaly Detection<br>:open_mouth:oral:star:code
- Dynamic Local Aggregation Network with Adaptive Clusterer for Anomaly Detection<br>:star:code
- DSR – A Dual Subspace Re-Projection Network for Surface Anomaly Detection<br>:star:code
- Locally Varying Distance Transform for Unsupervised Visual Anomaly Detection
- SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation<br>:star:code
- HaloAE: An HaloNet based Local Transformer Auto-Encoder for Anomaly Detection and Localization
- Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection<br>:star:code
- Surface anomaly detection
50.Neural Rendering
- Relighting4D: Neural Relightable Human from Videos<br>:star:code:house:project:tv:video
- MPIB: An MPI-Based Bokeh Rendering Framework for Realistic Partial Occlusion Effects<br>:star:code:house:project
- NeuMan: Neural Human Radiance Field from a Single Video<br>:star:code
- Approximate Differentiable Rendering with Algebraic Surfaces<br>:star:code:house:project
- AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields<br>:star:code:house:project
- Generalizable Patch-Based Neural Rendering<br>:open_mouth:oral:star:code:house:project
- Deforming Radiance Fields with Cages<br>:star:code:house:project
- NeuMesh: Learning Disentangled Neural Mesh-based Implicit Field for Geometry and Texture Editing<br>:open_mouth:oral:star:code:house:project
- ActiveNeRF: Learning where to See with Uncertainty Estimation<br>:star:code
- ARAH: Animatable Volume Rendering of Articulated Human SDFs<br>:star:code:house:project
- LaTeRF: Label and Text Driven Object Radiance Fields
- MoFaNeRF: Morphable Facial Neural Radiance Field<br>:star:code
- Conditional-Flow NeRF: Accurate 3D Modelling with Reliable Uncertainty Quantification
- Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields<br>:star:code
- KeypointNeRF: Generalizing Image-Based Volumetric Avatars Using Relative Spatial Encoding of Keypoints<br>:house:project
- ViewFormer: NeRF-Free Neural Rendering from Few Images Using Transformers<br>:star:code
- GeoAug: Data Augmentation for Few-Shot NeRF with Geometry Constraints
- SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image<br>:star:code
- BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-Scale Scene Rendering
49.Few/Zero-Shot Learning/Domain Generalization/Adaptation
- Few-shot
- Cross-Domain Cross-Set Few-Shot Learning via Learning Compact and Aligned Representations<br>:star:code
- Self-Supervision Can Be a Good Few-Shot Learner<br>:star:code
- VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments<br>:house:project
- Contrastive Prototypical Network with Wasserstein Confidence Penalty<br>:star:code
- tSF: Transformer-Based Semantic Filter for Few-Shot Learning
- Worst Case Matters for Few-Shot Recognition
- Learning Instance and Task-Aware Dynamic Kernels for Few-Shot Learning
- Self-Promoted Supervision for Few-Shot Transformer<br>:star:code
- Coarse-to-Fine Incremental Few-Shot Learning
- Improving Few-Shot Learning through Multi-task Representation Learning Theory
- TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning
- Kernel Relative-Prototype Spectral Filtering for Few-Shot Learning<br>:star:code
- Uncertainty-DTW for Time Series and Sequences
- Zero-shot
- Domain adaptation
- Prior Knowledge Guided Unsupervised Domain Adaptation<br>:star:code
- MoDA: Map Style Transfer for Self-Supervised Domain Adaptation of Embodied Agents
- CoSMix: Compositional Semantic Mix for Domain Adaptation in 3D LiDAR Segmentation<br>:star:code
- GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation<br>:star:code
- Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation<br>:star:code
- MemSAC: Memory Augmented Sample Consistency for Large Scale Domain Adaptation<br>:star:code:house:project
- Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation<br>:star:code:house:project
- Combating Label Distribution Shift for Active Domain Adaptation
- Uncertainty-guided Source-free Domain Adaptation<br>:star:code
- Learning Unbiased Transferability for Domain Adaptation by Uncertainty Modeling
- Unknown-Oriented Learning for Open Set Domain Adaptation
- Burn after Reading: Online Adaptation for Cross-Domain Streaming Data<br>:star:code
- Adversarial Partial Domain Adaptation by Cycle Inconsistency
- A Broad Study of Pre-training for Domain Generalization and Adaptation
- Interpretable Open-Set Domain Adaptation via Angular Margin Separation
- Contrastive Vicinal Space for Unsupervised Domain Adaptation<br>:star:code
- Incomplete Multi-View Domain Adaptation via Channel Enhancement and Knowledge Transfer
- BMD: A General Class-Balanced Multicentric Dynamic Prototype Strategy for Source-Free Domain Adaptation<br>:star:code
- Domain generalization
- Grounding Visual Representations with Texts for Domain Generalization<br>:star:code
- Improving Test-Time Adaptation via Shift-agnostic Weight Regularization and Nearest Source Prototypes
- Attention Diversification for Domain Generalization<br>:star:code
- Cross-Domain Ensemble Distillation for Domain Generalization
- Domain Generalization by Mutual-Information Regularization with Pre-trained Models<br>:star:code
- MVDG: A Unified Multi-View Framework for Domain Generalization<br>:star:code
48.Semantic Correspondence
- Demystifying Unsupervised Semantic Correspondence Estimation<br>:star:code:house:project
- Learning Semantic Correspondence with Sparse Annotations
47.GNN/GCN(Graph Neural Networks)
- GCN
- GNN
46.Continual Learning
- Balancing Stability and Plasticity through Advanced Null Space in Continual Learning<br>:open_mouth:oral
- Online Continual Learning with Contrastive Vision Transformer
- Helpful or Harmful: Inter-Task Association in Continual Learning
- Theoretical Understanding of the Information Flow on Continual Learning Performance<br>:star:code
- Transfer without Forgetting<br>:star:code
- incDFM: Incremental Deep Feature Modeling for Continual Novelty Detection
- Online Task-Free Continual Learning with Dynamic Sparse Distributed Memory<br>:star:code
- Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective
- CoSCL: Cooperation of Small Continual Learners Is Stronger than a Big One<br>:star:code
- DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning<br>:star:code
45.Metric Learning
- DAS: Densely-Anchored Sampling for Deep Metric Learning<br>:star:code
- Posterior Refinement on Metric Matrix Improves Generalization Bound in Metric Learning
- A Non-Isotropic Probabilistic Take On Proxy-Based Deep Metric Learning<br>:star:code
44.Active Learning
- When Active Learning Meets Implicit Semantic Data Augmentation
- PT4AL: Using Self-Supervised Pretext Tasks for Active Learning<br>:star:code
43.Lifelong Learning
<a name="42"/>42.Reinforcement Learning
- Style-Agnostic Reinforcement Learning<br>:star:code
- StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning<br>:star:code
- Learning Efficient Multi-agent Cooperative Visual Exploration<br>:house:project
- DexMV: Imitation Learning for Dexterous Manipulation from Human Videos<br>:house:project
41.Incremental Learning
- Learning with Recoverable Forgetting
- Incremental Task Learning with Incremental Rank Updates<br>:star:code
- DLCFT: Deep Linear Continual Fine-Tuning for General Incremental Learning
- Class-incremental
- Class-incremental Novel Class Discovery<br>:star:code
- Long-Tailed Class Incremental Learning<br>:star:code
- Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay
- Few-Shot Class-Incremental Learning from an Open-Set Perspective<br>:star:code
- Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer<br>:star:code:house:project
- R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning<br>:star:code
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning
- S3C: Self-Supervised Stochastic Classifiers for Few-Shot Class-Incremental Learning<br>:star:code
40.Adversarial Learning
- Prior-Guided Adversarial Initialization for Fast Adversarial Training<br>:star:code<br>:newspaper:ECCV 2022 | A prior-guided initialization method for adversarial examples
- BIPS: Bi-modal Indoor Panorama Synthesis via Residual Depth-Aided Adversarial Learning<br>:star:code
- Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness<br>:open_mouth:oral:star:code
- RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN<br>:star:code
- Adversarial Coreset Selection for Efficient Robust Training
- Shape Matters: Deformable Patch Attack
- Enhanced Accuracy and Robustness via Multi-Teacher Adversarial Distillation<br>:star:code
- GradAuto: Energy-Oriented Attack on Dynamic Neural Networks<br>:star:code
- Learning Energy-Based Models with Adversarial Training
- Revisiting Outer Optimization in Adversarial Training
- One Size Does NOT Fit All: Data-Adaptive Adversarial Training<br>:star:code
- UniCR: Universally Approximated Certified Robustness via Randomized Smoothing
- ℓ∞-Robustness and Beyond: Unleashing Efficient Adversarial Training
- Towards Efficient Adversarial Training on Vision Transformers
- FrequencyLowCut Pooling - Plug & Play against Catastrophic Overfitting<br>:star:code
- TAFIM: Targeted Adversarial Attacks against Facial Image Manipulations<br>:house:project
- Adversarial attacks
- Frequency Domain Model Augmentation for Adversarial Attack<br>:star:code
- Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal<br>:star:code
- SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness
- Scaling Adversarial Training to Large Perturbation Bounds<br>:star:code
- Towards Effective and Robust Neural Trojan Defenses via Input Filtering
- Exploiting the Local Parabolic Landscapes of Adversarial Losses to Accelerate Black-Box Adversarial Attack<br>:star:code
- Robust Network Architecture Search via Feature Distortion Restraining
- Triangle Attack: A Query-Efficient Decision-Based Adversarial Attack<br>:star:code
- Adaptive Image Transformations for Transfer-Based Adversarial Attack
- Black-box
- White-box
- Adversarial examples
39.Transfer Learning
- Factorizing Knowledge in Neural Networks<br>:star:code
- SecretGen: Privacy Recovery on Pre-trained Models via Distribution Discrimination<br>:star:code
- How Stable Are Transferability Metrics Evaluations?
- Language-Driven Artistic Style Transfer
- MultiMAE: Multi-modal Multi-task Masked Autoencoders<br>:house:project
38.Contrastive Learning
- Network Binarization via Contrastive Learning<br>:star:code
- Adversarial Contrastive Learning via Asymmetric InfoNCE<br>:star:code
- Fast-MoCo: Boost Momentum-based Contrastive Learning with Combinatorial Patches<br>:star:code
- Contrastive Learning for Diverse Disentangled Foreground Generation
- Decoupled Contrastive Learning
- Joint Learning of Localized Representations from Medical Images and Reports
- Contrasting Quadratic Assignments for Set-Based Representation Learning
- Generative Subgraph Contrast for Self-Supervised Graph Representation Learning<br>:star:code
37.Open-set Recognition
- DenseHybrid: Hybrid Anomaly Detection for Dense Open-set Recognition
- Difficulty-Aware Simulator for Open Set Recognition<br>:star:code
36.Machine Learning
<a name="35"/>35.Federated Learning
- SphereFed: Hyperspherical Federated Learning
- Image Coding for Machines with Omnipotent Feature Learning
- Addressing Heterogeneity in Federated Learning via Distributional Transformation<br>:star:code
- FedLTN: Federated Learning for Sparse and Personalized Lottery Ticket Networks
- Improving Generalization in Federated Learning by Seeking Flat Minima
- AdaBest: Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation
34.Meta-Learning
- Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach
- Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions
- Learning to Weight Samples for Dynamic Early-exiting Networks<br>:star:code
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning<br>:star:code
33.Model Compression/Knowledge Distillation/Pruning
- Knowledge distillation
- Knowledge Condensation Distillation<br>:star:code
- FedX: Unsupervised Federated Learning with Cross Knowledge Distillation<br>:star:code
- Black-box Few-shot Knowledge Distillation<br>:star:code
- Efficient One Pass Self-distillation with Zipf's Label Smoothing<br>:star:code
- MixSKD: Self-Knowledge Distillation from Mixup for Image Recognition<br>:star:code
- Switchable Online Knowledge Distillation<br>:star:code
- Distilling the Undistillable: Learning from a Nasty Teacher<br>:star:code
- Masked Generative Distillation<br>:star:code
- DistPro: Searching a Fast Knowledge Distillation Process via Meta Optimization<br>:star:code
- Personalized Education: Blind Knowledge Distillation<br>:star:code
- Prune Your Model before Distill It<br>:star:code
- IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors<br>:star:code
- Deep Ensemble Learning by Diverse Knowledge Distillation for Fine-Grained Object Classification
- A Fast Knowledge Distillation Framework for Visual Recognition<br>:house:project
- Self-Regulated Feature Learning via Teacher-Free Feature Distillation<br>:house:project
- Quantization
- Synergistic Self-supervised and Quantization Learning<br>:open_mouth:oral:star:code
- PalQuant: Accelerating High-precision Networks on Low-precision Accelerators<br>:star:code
- Fine-Grained Data Distribution Alignment for Post-Training Quantization<br>:star:code
- Symmetry Regularization and Saturating Nonlinearity for Robust Quantization
- Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance
- Non-uniform Step Size Quantization for Accurate Post-Training Quantization<br>:star:code
- Towards Accurate Network Quantization with Equivalent Smooth Regularizer
- Explicit Model Size Control and Relaxation via Smooth Regularization for Mixed-Precision Quantization
- BASQ: Branch-Wise Activation-Clipping Search Quantization for Sub-4-Bit Neural Networks<br>:star:code
- RDO-Q: Extremely Fine-Grained Channel-Wise Quantization via Rate-Distortion Optimization
- PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization
- Pruning
- FairGRAPE: Fairness-aware GRAdient Pruning mEthod for Face Attribute Classification<br>:star:code
- Bayesian Optimization with Clustering and Rollback for CNN Auto Pruning<br>:star:code
- Trainability Preserving Neural Structured Pruning<br>:star:code
- Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps<br>:star:code
- Data-Free Backdoor Removal Based on Channel Lipschitzness<br>:star:code
- Multi-Granularity Pruning for Model Acceleration on Mobile Devices
- Ensemble Knowledge Guided Sub-network Search and Fine-Tuning for Filter Pruning
- Soft Masking for Cost-Constrained Channel Pruning<br>:star:code
- Towards Ultra Low Latency Spiking Neural Networks for Vision and Sequential Tasks Using Temporal Pruning
- CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution
- Filter Pruning via Feature Discrimination in Deep Neural Networks
- Lightweight models
- MC
32.Point Cloud
- Few 'Zero Level Set'-Shot Learning of Shape Signed Distance Functions in Feature Space
- FH-Net: A Fast Hierarchical Network for Scene Flow Estimation on Real-World Point Clouds<br>:star:code
- Dynamic 3D Scene Analysis by Point Cloud Accumulation<br>:star:code:house:project
- Point Cloud Compression with Sibling Context and Surface Priors
- LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds
- Point MixSwap: Attentional Point Cloud Mixing via Swapping Matched Structural Divisions<br>:star:code
- MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes<br>:star:code
- Bottom Up Top down Detection Transformers for Language Grounding in Images and Point Clouds<br>:house:project
- PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees<br>:star:code
- Learning to Generate Realistic LiDAR Point Clouds<br>:house:project
- PseudoAugment: Learning to Use Unlabeled Data for Data Augmentation in Point Clouds
- SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement<br>:star:code
- Resolution-Free Point Cloud Sampling Network with Data Distillation<br>:star:code
- diffConv: Analyzing Irregular Point Clouds with an Irregular View<br>:star:code
- GraphFit: Learning Multi-Scale Graph-Convolutional Representation for Point Cloud Normal Estimation<br>:star:code
- Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap<br>:star:code
- PD-Flow: A Point Cloud Denoising Framework with Normalizing Flows<br>:star:code
- Shape-Pose Disentanglement Using SE(3)-Equivariant Vector Neurons
- Revisiting Point Cloud Simplification: A Learnable Feature Preserving Approach
- Masked Autoencoders for Point Cloud Self-Supervised Learning<br>:star:code
- Masked Discrimination for Self-Supervised Learning on Point Clouds<br>:star:code
- Meta-Sampler: Almost-Universal yet Task-Oriented Sampling for Point Clouds<br>:star:code
- Efficient Point Cloud Analysis Using Hilbert Curve
- RFNet-4D: Joint Object Reconstruction and Flow Estimation from 4D Point Clouds<br>:star:code
- 3D point clouds
- Autoregressive 3D Shape Generation via Canonical Mapping<br>:star:code
- Point Cloud Domain Adaptation via Masked Local 3D Structure Prediction<br>:star:code
- Exploring the Devil in Graph Spectral Domain for 3D Point Cloud Attacks<br>:star:code
- Unsupervised Learning of 3D Semantic Keypoints with Mutual Reconstruction<br>:star:code
- Few-Shot Class-Incremental Learning for 3D Point Cloud Objects<br>:star:code
- Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models<br>:star:code
- Manifold Adversarial Learning for Cross-Domain 3D Shape Representation
- Point cloud localization
- Point cloud segmentation
- Point cloud completion
- SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer<br>:star:code
- FBNet: Feedback Network for Point Cloud Completion<br>:open_mouth:oral:star:code
- Optimization over Disentangled Encoding: Unsupervised Cross-Domain Point Cloud Completion via Occlusion Factor Manipulation<br>:star:code
- Point cloud registration
- SuperLine3D: Self-supervised Line Segmentation and Description for LiDAR Point Cloud<br>:star:code
- Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation<br>:star:code
- PointCLM: A Contrastive Learning-Based Framework for Multi-Instance Point Cloud Registration<br>:star:code
- PCR-CG: Point Cloud Registration via Deep Explicit Color and Geometry
- Point cloud reconstruction
- Point cloud classification
- Point cloud understanding
31.SLAM/Augmented Reality/Virtual Reality/Robotics
- Augmented reality
- VR
- LiP-Flow: Learning Inference-Time Priors for Codec Avatars via Normalizing Flows in Latent Space
- Human volumetric capture
- Virtual try-on
- Visual localization (camera pose estimation)
- Robotics
30.Optical Flow
- Secrets of Event-Based Optical Flow<br>:star:code
- Deep 360° Optical Flow Estimation Based on Multi-Projection Fusion
- Learning Omnidirectional Flow in 360° Video via Siamese Representation<br>:house:project
- Video Interpolation by Event-driven Anisotropic Adjustment of Optical Flow
- FlowFormer: A Transformer Architecture for Optical Flow
- Complementing Brightness Constancy with Deep Networks for Optical Flow Prediction
- Disentangling Architecture and Training for Optical Flow<br>:house:project
- A Perturbation-Constrained Adversarial Attack for Evaluating the Robustness of Optical Flow<br>:open_mouth:oral:star:code
- Optical Flow Training under Limited Label Budget via Active Learning<br>:star:code
- S2F2: Single-Stage Flow Forecasting for Future Multiple Trajectories Prediction
- Semi-Supervised Learning of Optical Flow by Flow Supervisor<br>:star:code
29.Re-identification
- Re-identification
- Negative Samples are at Large: Leveraging Hard-distance Elastic Loss for Re-identification
- PASS: Part-Aware Self-Supervised Pre-training for Person Re-identification<br>:star:code
- Adaptive Cross-Domain Learning for Generalizable Person Re-identification<br>:star:code
- Dynamically Transformed Instance Normalization Network for Generalizable Person Re-identification
- Mimic Embedding via Adaptive Aggregation: Learning Generalizable Person Re-identification<br>:star:code
- Modality Synergy Complement Learning with Cascaded Aggregation for Visible-Infrared Person Re-identification<br>:star:code
- Cross-Modality Transformer for Visible-Infrared Person Re-identification
- Optimal Transport for Label-Efficient Visible-Infrared Person Re-identification<br>:star:code
- Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification
- Person search
- Crowd counting
- Visual Search
- Target-Absent Human Attention<br>:star:code
- Gait recognition
28.Neural Architecture Search
- SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning<br>:star:code
- UniNet: Unified Architecture Search with Convolution, Transformer, and MLP<br>:star:code
- ScaleNet: Searching for the Model to Scale<br>:star:code
- CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS<br>:star:code
- Towards Regression-Free Neural Networks for Diverse Compute Platforms
- LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds
- U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search<br>:star:code
- A Max-Flow Based Approach for Neural Architecture Search
- ViTAS: Vision Transformer Architecture Search
- Learning Where to Look – Generative NAS Is Surprisingly Efficient<br>:star:code
- Neural Architecture Search for Spiking Neural Networks
- Data-Free Neural Architecture Search via Recursive Label Calibration<br>:star:code
27.Image Classification
- Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset<br>:star:code
- Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels<br>:star:code
- Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space<br>:star:code
- Constructing Balance from Imbalance for Long-tailed Image Recognition<br>:star:code
- No Token Left Behind: Explainability-Aided Image Classification and Generation<br>:star:code
- Interpretable Image Classification with Differentiable Prototypes Assignment<br>:star:code
- Rotation Regularization without Rotation<br>:star:code
- Revisiting a kNN-based Image Classification System with High-capacity Storage
- In Defense of Image Pre-training for Spatiotemporal Recognition<br>:star:code
- Augmenting Deep Classifiers with Polynomial Neural Networks
- A Dataset Generation Framework for Evaluating Megapixel Image Classifiers & their Explanations
- Cartoon Explanations of Image Classifiers
- Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification<br>:house:project
- SSBNet: Improving Visual Recognition Efficiency by Adaptive Sampling
- AutoMix: Unveiling the Power of Mixup for Stronger Classifiers
- MaxViT: Multi-axis Vision Transformer<br>:star:code
- Self-Feature Distillation with Uncertainty Modeling for Degraded Image Recognition
- Three Things Everyone Should Know about Vision Transformers
- RealPatch: A Statistical Matching Framework for Model Patching with Real Samples<br>:star:code
- TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs<br>:star:code
- Automatic Check-Out via Prototype-Based Classifier Learning from Single-Product Exemplars<br>:star:code
- Embedding Contrastive Unsupervised Features to Cluster in- and Out-of-Distribution Noise in Corrupted Image Datasets
- Unsupervised Few-Shot Image Classification by Learning Features into Clustering Space
- Few-shot image classification
- Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation<br>:star:code
- Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification<br>:star:code
- Adversarial Feature Augmentation for Cross-domain Few-shot Classification<br>:star:code
- Few-Shot Classification with Contrastive Learning
- Multi-label classification
- Long-tailed classification
- SAFA: Sample-Adaptive Feature Augmentation for Long-Tailed Image Classification
- Invariant Feature Learning for Generalized Long-Tailed Classification<br>:star:code
- Tackling Long-Tailed Category Distribution Under Domain Shifts<br>:star:code:house:project
- Identifying Hard Noise in Long-Tailed Sample Distribution<br>:open_mouth:oral:star:code
- On Multi-Domain Long-Tailed Recognition, Imbalanced Domain Generalization and Beyond<br>:star:code
- Visual classification
- Visual Knowledge Tracing<br>:star:code
- Fine-grained recognition
- Long-tailed recognition
- Towards Calibrated Hyper-Sphere Representation via Distribution Overlap Coefficient for Long-tailed Learning<br>:star:code
- Breadcrumbs: Adversarial Class-Balanced Sampling for Long-Tailed Recognition<br>:star:code
- VL-LTR: Learning Class-Wise Visual-Linguistic Representation for Long-Tailed Visual Recognition<br>:star:code
26.Video/Image Super-Resolution
- Cross-modal super-resolution
- Image super-resolution
- Image Super-Resolution with Deep Dictionary<br>:star:code
- CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution<br>:star:code
- Reference-based Image Super-Resolution with Deformable Attention Transformer<br>:star:code
- KXNet: A Model-Driven Deep Neural Network for Blind Super-Resolution<br>:star:code
- Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images
- Boosting Event Stream Super-Resolution with a Recurrent Neural Network
- Learning Series-Parallel Lookup Tables for Efficient Image Super-Resolution<br>:star:code
- Efficient Long-Range Attention Network for Image Super-Resolution<br>:star:code
- Metric Learning Based Interactive Modulation for Real-World Super-Resolution<br>:star:code
- Dynamic Dual Trainable Bounds for Ultra-Low Precision Super-Resolution Networks<br>:star:code
- Perception-Distortion Balanced ADMM Optimization for Single-Image Super-Resolution<br>:star:code
- Uncertainty Learning in Kernel Estimation for Multi-stage Blind Image Super-Resolution
- MuLUT: Cooperating Multiple Look-Up Tables for Efficient Image Super-Resolution
- Adaptive Patch Exiting for Scalable Single Image Super-Resolution<br>:star:code
- From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution<br>:star:code
- Unfolded Deep Kernel Estimation for Blind Image Super-Resolution<br>:star:code
- Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution<br>:star:code
- Self-Supervised Learning for Real-World Super-Resolution from Dual Zoomed Observations<br>:star:code
- Restore Globally, Refine Locally: A Mask-Guided Scheme to Accelerate Super-Resolution Networks<br>:star:code
- Compiler-Aware Neural Architecture Search for On-Mobile Real-Time Super-Resolution<br>:star:code
- ARM: Any-Time Super-Resolution Method<br>:star:code
- D2C-SR: A Divergence to Convergence Approach for Real-World Image Super-Resolution<br>:star:code
- RRSR: Reciprocal Reference-Based Image Super-Resolution with Progressive Feature Alignment and Selection
- Video Super-Resolution
- Towards Interpretable Video Super-Resolution via Alternating Optimization<br>:star:code
- Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution<br>:star:code
- Real-RawVSR: Real-World Raw Video Super-Resolution with a Benchmark Dataset<br>:star:code
- A Codec Information Assisted Framework for Efficient Compressed Video Super-Resolution
25.Autonomous Vehicles (Autonomous Driving)
- Vehicle Trajectory Prediction
- Hierarchical Latent Structure for Multi-Modal Vehicle Trajectory Forecasting<br>:star:code
- Action-based Contrastive Learning for Trajectory Prediction
- D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights<br>:star:code
- AdvDO: Realistic Adversarial Attacks for Trajectory Prediction<br>:house:project
- Autonomous Driving
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning<br>:star:code
- Resolving Copycat Problems in Visual Imitation Learning via Residual Action Prediction
- Differentiable Raycasting for Self-supervised Occupancy Forecasting<br>:star:code
- Self-Distillation for Robust LiDAR Semantic Segmentation in Autonomous Driving<br>:star:code
- V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer<br>:star:code
- Radatron: Accurate Detection Using Multi-Resolution Cascaded MIMO Radar<br>:house:project
- Rethinking Closed-Loop Training for Autonomous Driving
- Motion Inspired Unsupervised Perception and Prediction in Autonomous Driving
- KING: Generating Safety-Critical Driving Scenarios for Robust Imitation via Kinematics Gradients<br>:open_mouth:oral:house:project
- InAction: Interpretable Action Decision Making for Autonomous Driving<br>:star:code
- CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving<br>:house:project
- Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation<br>:open_mouth:oral:house:project
- StretchBEV: Stretching Future Instance Prediction Spatially and Temporally<br>:house:project
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers<br>:star:code
- Point Cloud Compression with Range Image-Based Entropy Model for Autonomous Driving
- Trajectory Prediction
- Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction<br>:star:code
- Aware of the History: Trajectory Forecasting with the Local Behavior Data<br>:star:code
- Social-SSL: Self-Supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction<br>:star:code
- Sequential Multi-View Fusion Network for Fast LiDAR Point Motion Estimation
- Social-Implicit: Rethinking Trajectory Prediction Evaluation and the Effectiveness of Implicit Maximum Likelihood Estimation<br>:star:code
- View Vertically: A Hierarchical Network for Trajectory Prediction via Fourier Spectrums<br>:star:code
- PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map<br>:star:code
- Lane Detection
- Pedestrian Trajectory Prediction
- Vehicle Re-Identification
24.UAV/Remote Sensing/Satellite Images
- Remote Sensing
- Aerial Video Recognition
- FAR: Fourier Aerial Video Recognition<br>:house:project
23.Medical Imaging
- The Surprisingly Straightforward Scene Text Removal Method With Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis<br>:star:code:house:project
- Medical Image Segmentation
- Personalizing Federated Medical Image Segmentation via Local Calibration<br>:star:code
- Learning Topological Interactions for Multi-Class Medical Image Segmentation<br>:open_mouth:oral:star:code
- Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration<br>:star:code
- PointScatter: Point Set Representation for Tubular Structure Extraction<br>:open_mouth:oral:star:code
- Dual Contrastive Learning with Anatomical Auxiliary Supervision for Few-Shot Medical Image Segmentation<br>:star:code
- Auto-FedRL: Federated Hyperparameter Optimization for Multi-Institutional Medical Image Segmentation<br>:star:code
- Med-DANet: Dynamic Architecture Network for Efficient Medical Volumetric Segmentation<br>:star:code
- CXR Segmentation by AdaIN-Based Domain Adaptation and Knowledge Distillation
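- Radiology Report Generation
- Dense Prediction
- Retinal Image Matching
- Stent Tracking
- Lesion Detection
- Medical Image Analysis
- Medical Image Classification
- Medical Landmark Localization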
22.OCR
- Levenshtein OCR
- Text Recognition
- Handwritten Mathematical Expression Recognition
- Scene Text Detection
- Scene Text Recognition with Permuted Autoregressive Sequence Models<br>:star:code
- Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting<br>:star:code
- SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition
- Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning
- Contextual Text Block Detection towards Scene Text Understanding<br>:house:project
- Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition<br>:open_mouth:oral:star:code
- GLASS: Global to Local Attention for Scene-Text Spotting<br>:star:code
- Multi-Granularity Prediction for Scene Text Recognition
- Pure Transformer with Integrated Experts for Scene Text Recognition
- Background-Insensitive Scene Text Recognition with Text Semantic Segmentation
- Detecting Tampered Scene Text in the Wild<br>:star:code
- Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
- TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers
- Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features<br>:star:code
- OCR-Free Document Understanding Transformer<br>:star:code
- Video Text Detection
- Text Detection
- Document Image Rectification
- Document Unwarping
21.Semi-/Self-Supervised Learning
- Unsupervised
- Contrastive Positive Mining for Unsupervised 3D Action Representation Learning
- Dense Siamese Network for Dense Unsupervised Learning<br>:star:code
- Relative Contrastive Loss for Unsupervised Representation Learning
- DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model
- Weakly Supervised
- Self-Supervised
- GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning<br>:star:code
- RegionCL: Exploring Contrastive Region Pairs for Self-Supervised Representation Learning<br>:star:code
- Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment
- Differentiable Raycasting for Self-Supervised Occupancy Forecasting
- How Severe Is Benchmark-Sensitivity in Video Self-Supervised Learning?<br>:star:code
- MaCLR: Motion-Aware Contrastive Learning of Representations for Videos
- Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing
- Mc-BEiT: Multi-Choice Discretization for Image BERT Pre-training<br>:star:code
- What to Hide from Your Students: Attention-Guided Masked Image Modeling<br>:star:code
- Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning<br>:star:code
- Semantic-Aware Fine-Grained Correspondence
- Self-Supervised Classification Network<br>:star:code
- Dual-Domain Self-Supervised Learning and Model Adaption for Deep Compressive Imaging
- SdAE: Self-distillated Masked Autoencoder<br>:star:code
- Motion Sensitive Contrastive Learning for Self-supervised Video Representation
- Towards Efficient and Effective Self-Supervised Learning of Visual Representations<br>:star:code
- Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective
- The Challenges of Continuous Self-Supervised Learning
- GeoRefine: Self-Supervised Online Depth Refinement for Accurate Dense Mapping
- Fusion from Decomposition: A Self-Supervised Decomposition Approach for Image Fusion
- DNA: Improving Few-Shot Transfer Learning with Low-Rank Decomposition and Alignment<br>:star:code
- Self-Supervised Learning of Visual Graph Matching<br>:star:code
- DisCo: Remedying Self-Supervised Learning on Lightweight Models with Distilled Contrastive Learning<br>:star:code
- SLIP: Self-Supervision Meets Language-Image Pre-training<br>:star:code
- Domain Invariant Masked Autoencoders for Self-Supervised Learning from Multi-Domains
- Improving Self-Supervised Lightweight Model Learning via Hard-Aware Metric Distillation<br>:star:code
- Masked Siamese Networks for Label-Efficient Learning<br>:star:code
- Natural Synthetic Anomalies for Self-Supervised Anomaly Detection and Localization<br>:star:code
- Understanding Collapse in Non-Contrastive Siamese Representation Learning
- Discovering Deformable Keypoint Pyramids<br>:star:code
- Semi-Supervised
- Towards Realistic Semi-Supervised Learning
- OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning
- Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning<br>:star:code
- ConMatch: Semi-Supervised Learning with Confidence-Guided Consistency Regularization<br>:star:code
- Vibration-Based Uncertainty Estimation for Learning from Limited Supervision
- Unsupervised Selective Labeling for More Effective Semi-Supervised Learning
- RDA: Reciprocal Distribution Alignment for Robust Semi-Supervised Learning<br>:star:code
- Semi-Supervised Vision Transformers<br>:star:code
- CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation
- RVSL: Robust Vehicle Similarity Learning in Real Hazy Scenes Based on Semi-supervised Learning<br>:star:code
- Semi-Supervised Keypoint Detector and Descriptor for Retinal Image Matching
- PSS: Progressive Sample Selection for Open-World Visual Representation Learning<br>:star:code
- Stochastic Consensus: Enhancing Semi-Supervised Learning with Consistency of Stochastic Classifiers
- Supervised Learning
20.Face
- Effective Presentation Attack Detection Driven by Face Related Task<br>:star:code
- Facial Depth and Normal Estimation Using Single Dual-Pixel Camera<br>:star:code
- StyleFace: Towards Identity-Disentangled Face Generation on Megapixels
- Augmentation of rPPG Benchmark Datasets: Learning to Remove and Embed rPPG Signals via Double Cycle Consistent Learning from Unpaired Facial Videos<br>:star:code
- Custom Structure Preservation in Face Aging
- Deepfake Detection
- 3D Face
- Face Anti-Spoofing (Liveness Detection)
- Generative Domain Adaptation for Face Anti-Spoofing
- Multi-domain Learning for Updating Face Anti-spoofing Models<br>:star:code
- Source-Free Domain Adaptation with Contrastive Domain Alignment and Self-Supervised Exploration for Face Anti-Spoofing<br>:star:code
- Adaptive Transformers for Robust Few-Shot Cross-Domain Face Anti-Spoofing
- Face Recognition
- Controllable and Guided Face Synthesis for Unconstrained Face Recognition<br>:star:code:house:project
- Towards Robust Face Recognition with Comprehensive Search
- BoundaryFace: A mining framework with noise label self-correction for Face Recognition<br>:star:code
- Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain<br>:star:code
- OneFace: One Threshold for All
- AgeTransGAN for Facial Age Transformation with Rectified Performance Metrics<br>:star:code
- Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition<br>:star:code
- CoupleFace: Relation Matters for Face Recognition Distillation
- Towards Racially Unbiased Skin Tone Estimation via Scene Disambiguation<br>:house:project
- Pre-training Strategies and Datasets for Facial Representation Learning
- Unsupervised and Semi-Supervised Bias Benchmarking in Face Recognition
- Face Clustering
- On Mitigating Hard Clusters for Face Clustering<br>:open_mouth:oral:star:code
- Talking Face Synthesis
- Talking Head Synthesis
- Face Pose Estimation
- Face Swapping
- Fake Face Detection
- UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection<br>:open_mouth:oral
- An Information Theoretic Approach for Attention-Driven Face Forgery Detection
- Exploring Disentangled Content Information for Face Forgery Detection
- Adaptive Face Forgery Detection in Cross Domain
- Face Capture
- Facial Expression Recognition
- How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset?<br>:star:code
- Teaching with Soft Label Smoothing for Mitigating Noisy Labels in Facial Expressions<br>:star:code
- Order Learning Using Partially Ordered Data via Chainization<br>:star:code
- Emotion-Aware Multi-View Contrastive Learning for Facial Emotion Recognition<br>:star:code
- Learn-to-Decompose: Cascaded Decomposition Network for Cross-Domain Few-Shot Facial Expression Recognition<br>:star:code
- Learn from All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition<br>:star:code
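- 3D Face Reconstruction
- Face Reenactment
- Face Identity Manipulation
- Face Texture Synthesis and Reconstruction
- Face Restoration
- Expression Recognition
19.Image Synthesis/Generation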
- Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis<br>:star:code:house:project
- GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing
- Generalized Brain Image Synthesis with Transferable Convolutional Sparse Coding Networks
- Auto-regressive Image Synthesis with Integrated Quantization<br>:open_mouth:oral
- Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing<br>:star:code
- Improved Masked Image Generation with Token-Critic
- Weakly-Supervised Stitching Network for Real-World Panoramic Image Generation
- SCAM! Transferring humans between images with Semantic Cross Attention Modulation<br>:house:project
- PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation<br>:star:code
- Adaptive Feature Interpolation for Low-Shot Image Generation
- Few-Shot Image Generation with Mixup-Based Distance Learning<br>:star:code
- Multimodal Conditional Image Synthesis with Product-of-Experts GANs<br>:house:project
- Any-Resolution Training for High-Resolution Image Synthesis<br>:house:project
- 3D-Aware Indoor Scene Synthesis with Depth Priors<br>:house:project
- Image Generation
- DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta<br>:star:code
- Scraping Textures from Natural Images for Synthesis and Editing<br>:house:project
- Word-Level Fine-Grained Story Visualization
- CoGS: Controllable Generation and Search from Sketch and Style
- Unsupervised Learning of Efficient Geometry-Aware Neural Articulated Representations
- Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes<br>:star:code
- Exemplar-Guided Image Generation
- Text-to-Image Synthesis
- Generating Diverse Human Motions from Textual Descriptions
- TEMOS: Generating Diverse Human Motions from Textual Descriptions<br>:open_mouth:oral:star:code:house:project
18.Image-to-Image Translation
- VecGAN: Image-to-Image Translation with Interpretable Latent Directions
- Vector Quantized Image-to-Image Translation<br>:star:code:house:project
- Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization<br>:star:code
- Unpaired Image Translation via Vector Symbolic Architectures<br>:open_mouth:oral:star:code
- Bi-Level Feature Alignment for Versatile Image Translation and Manipulation
- ManiFest: Manifold Deformation for Few-Shot Image Translation<br>:star:code
- Image Translation
17.GAN
- VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
- Quantized GAN for Complex Music Generation from Dance Videos<br>:star:code
- RepMix: Representation Mixing for Robust Attribution of Synthesized Images<br>:star:code
- FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs<br>:star:code
- Generative Multiplane Images: Making a 2D GAN 3D-Aware<br>:star:code:house:project
- Generator Knows What Discriminator Should Learn in Unconditional GANs<br>:star:code
- Hierarchical Semantic Regularization of Latent Spaces in StyleGANs<br>:star:code:house:project
- Mind the Gap in Distilling StyleGANs<br>:star:code
- FurryGAN: High Quality Foreground-aware Image Synthesis<br>:house:project
- Improving GANs for Long-Tailed Data through Group Spectral Regularization<br>:star:code:house:project
- 3D-FM GAN: Towards 3D-Controllable Face Manipulation<br>:house:project
- Exploring Gradient-based Multi-directional Controls in GANs<br>:star:code
- Studying Bias in GANs through the Lens of Race
- FairStyle: Debiasing StyleGAN2 with Style Channel Manipulations<br>:house:project
- FingerprintNet: Synthesized Fingerprints for Generated Image Detection
- Detecting Generated Images by Real Images<br>:star:code
- High-Fidelity GAN Inversion with Padding Space<br>:house:project
- A Style-Based GAN Encoder for High Fidelity Reconstruction of Images and Videos<br>:star:code
- BlobGAN: Spatially Disentangled Scene Representations<br>:house:project
- GAN with Multivariate Disentangling for Controllable Hair Editing<br>:star:code
- StyleGAN-Human: A Data-Centric Odyssey of Human Generation<br>:star:code
- EAGAN: Efficient Two-Stage Evolutionary Architecture Search for GANs<br>:star:code
- JoJoGAN: One Shot Face Stylization
- HairNet: Hairstyle Transfer with Pose Changes
- EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer<br>:star:code
- Editing Out-of-Domain GAN Inversion via Differential Activations<br>:star:code
- On the Robustness of Quality Measures for GANs<br>:star:code
- Diverse Generation from a Single Video Made Possible<br>:house:project
- Rayleigh EigenDirections (REDs): Nonlinear GAN Latent Space Traversals for Multidimensional Features
- Generating Natural Images with Direct Patch Distributions Matching<br>:star:code
- TREND: Truncated Generalized Normal Density Estimation of Inception Embeddings for GAN Evaluation
- Neural Scene Decoration from a Single Photograph<br>:star:code
- ChunkyGAN: Real Image Inversion via Segments
- GAN Cocktail: Mixing GANs without Dataset Access<br>:house:project
- DuelGAN: A Duel between Two Discriminators Stabilizes the GAN Training<br>:star:code
- Line Art Colorization
- Image Generation
- GAN Inversion
- Makeup and Hairstyle Transfer
- Text Removal
16.Transformer
- k-means Mask Transformer<br>:star:code
- Outpainting by Queries<br>:star:code
- Laplacian Mesh Transformer: Dual Attention and Topology Aware Network for 3D Mesh Classification and Segmentation
- Locality Guidance for Improving Vision Transformers on Tiny Datasets<br>:star:code
- ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer<br>:star:code
- MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning
- TinyViT: Fast Pretraining Distillation for Small Vision Transformers<br>:star:code
- MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis
- An Impartial Take to the CNN vs Transformer Robustness Contest
- Ghost-free High Dynamic Range Imaging with Context-aware Transformer<br>:star:code
- EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers<br>:star:code
- Adaptive Token Sampling for Efficient Vision Transformers<br>:open_mouth:oral:house:project
- Self-Slimmed Vision Transformer<br>:star:code
- Are Vision Transformers Robust to Patch Perturbations?
- Selective TransHDR: Transformer-Based Selective HDR Imaging Using Ghost Region Mask
- BLT: Bidirectional Layout Transformer for Controllable Layout Generation<br>:house:project
- Convolutional Embedding Makes Hierarchical Vision Transformer Stronger
- AMixer: Adaptive Weight Mixing for Self-Attention Free Vision Transformers<br>:star:code
- Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation
- VIP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers
- Improving Vision Transformers by Revisiting High-Frequency Components<br>:star:code
- VSA: Learning Varied-Size Window Attention in Vision Transformers<br>:star:code
- DaViT: Dual Attention Vision Transformers<br>:star:code
- KVT: k-NN Attention for Boosting Vision Transformers<br>:star:code
- ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer<br>:star:code
- DeiT III: Revenge of the ViT
- Sliced Recursive Transformer<br>:star:code
- Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers<br>:house:project
- Training Vision Transformers with Only 2040 Images
15.Vision-Language
- FashionViL: Fashion-Focused Vision-and-Language Representation Learning<br>:star:code
- NewsStories: Illustrating articles with visual summaries<br>:star:code
- Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding<br>:star:code
- Frozen CLIP Models are Efficient Video Learners<br>:star:code
- Generative Negative Text Replay for Continual Vision-Language Pretraining
- “This Is My Unicorn, Fluffy”: Personalizing Frozen Vision-Language Representations<br>:star:code
- Contrastive Vision-Language Pre-training with Limited Resources<br>:star:code
- ASSISTER: Assistive Navigation via Conditional Instruction Generation
- X-DETR: A Versatile Architecture for Instance-Wise Vision-Language Tasks<br>:star:code
- UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
- Single-Stream Multi-level Alignment for Vision-Language Pretraining
- Most and Least Retrievable Images in Visual-Language Query Systems
- Visual Representation Learning
- VLN
- Learning from Unlabeled 3D Environments for Vision-and-Language Navigation<br>:house:project
- Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments<br>:house:project
- A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility
- Learning Disentanglement with Decoupled Labels for Vision-Language Navigation<br>:star:code
- Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation<br>:star:code
- FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation<br>:star:code
- Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions
- Visual Relocalization
14.Visual Question Answering
- Weakly Supervised Grounding for VQA in Vision-Language Transformers<br>:star:code
- Rethinking Data Augmentation for Robust Visual Question Answering<br>:star:code
- Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly<br>:star:code
- New Datasets and Models for Contextual Reasoning in Visual Dialog<br>:star:code
- Classification-Regression for Chart Comprehension
- AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant<br>:house:project
- Video-QA
13.Human-Object Interaction
- Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection<br>:star:code
- Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos<br>:star:code
- IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition
- Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection<br>:star:code
- Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows
- SAGA: Stochastic Whole-Body Grasping with Contact<br>:house:project
- Chairs Can Be Stood On: Overcoming Object Bias in Human-Object Interaction Detection
- Discovering Human-Object Interaction Concepts via Self-Compositional Learning<br>:star:code
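- Interactive Object Segmentation
- HOS
- Hand-Object Interaction
- Human-Chair Interaction
12.Action Detection (Human Action Detection and Recognition)
- Action Recognition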
- PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens<br>:house:project
- Source-Free Video Domain Adaptation by Learning Temporal Consistency for Action Recognition<br>:star:code
- Efficient Video Transformers with Spatial-Temporal Token Selection<br>:star:code
- Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition<br>:house:project
- Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning<br>:star:code
- An Efficient Spatio-Temporal Pyramid Transformer for Action Detection
- Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition<br>:star:code
- Privacy-Preserving Action Recognition via Motion Difference Quantization<br>:star:code
- SOS! Self-Supervised Learning over Sets of Handled Objects in Egocentric Action Recognition
- Real-time Online Video Detection with Temporal Smoothing Transformers<br>:star:code
- CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video
- Uncertainty-Based Spatial-Temporal Attention for Online Action Detection
- Is Appearance Free Action Recognition Possible?
- Panoramic Human Activity Recognition
- Delving into Details: Synopsis-to-Detail Networks for Video Recognition<br>:star:code
- Fine-Grained Action Recognition
- Zero-Shot Action Recognition
- Few-Shot Action Recognition
- 3D Action Recognition
- Collaborating Domain-shared and Target-specific Feature Clustering for Cross-domain 3D Action Recognition
- CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation<br>:open_mouth:oral:star:code
- Continual 3D Convolutional Neural Networks for Real-Time Processing of Videos<br>:star:code
- Egocentric Activity Recognition and Localization on a 3D Map
- Skeleton-Based Action Recognition
- Social Group Activity Recognition
- Hunting Group Clues with Transformers for Social Group Activity Recognition
- Entry-Flipped Transformer for Inference and Prediction of Participant Behavior
- Self-Supervised Social Relation Representation for Human Group Detection<br>:star:code
- COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality<br>:star:code
- Temporal Action Detection
- Semi-Supervised Temporal Action Detection with Proposal-Free Masking<br>:star:code
- ReAct: Temporal Action Detection with Relational Queries<br>:star:code
- Zero-Shot Temporal Action Detection via Vision-Language Prompting<br>:star:code
- Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions<br>:star:code
- TALLFormer: Temporal Action Localization with a Long-Memory Transformer<br>:star:code
- A Sliding Window Scheme for Online Temporal Action Localization
- Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning<br>:star:code
- Temporal Action Localization
- Temporal Action Segmentation
- Action Quality Assessment
- Action Localization
11.Video
- Dynamic Temporal Filtering in Video Models<br>:star:code
- Delta Distillation for Efficient Video Processing
- TDViT: Temporal Dilated Video Transformer for Dense Video Tasks<br>:star:code
- Video Synthesis
- Video-to-Video Synthesis
- Video Frame Interpolation
- Video Generation
- RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos<br>:open_mouth:oral:star:code
- Video Quality Assessment
- Video Inpainting
- Video Deblurring
- Video Dialog
- Active Speaker Detection (Video Conferencing)
- VOS
- XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model<br>:star:code:house:project:tv:video
- Tackling Background Distraction in Video Object Segmentation<br>:star:code
- BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation
- Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation<br>:star:code
- Learning Quality-aware Dynamic Memory for Video Object Segmentation<br>:star:code
- Global Spectral Filter Memory Network for Video Object Segmentation<br>:star:code
- VIS
- In Defense of Online Models for Video Instance Segmentation<br>:open_mouth:oral:star:code
- Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation<br>:star:code
- Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
- Less than Few: Self-Shot Video Instance Segmentation<br>:star:code
- Video Mask Transfiner for High-Quality Video Instance Segmentation
- SeqFormer: Sequential Transformer for Video Instance Segmentation<br>:star:code
- VSS
- VPS
- Video Matting
- One-Trimap Video Matting<br>:star:code:tv:video
- Video Representation
- Video Transmission
- Motion Segmentation
- Video Anomaly Detection
- Video Recognition
- Temporal Saliency Query Network for Efficient Video Recognition<br>:house:project
- NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition<br>:house:project
- Expanding Language-Image Pretrained Models for General Video Recognition<br>:open_mouth:oral:star:code
- AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition
- DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition<br>:star:code
- K-Centered Patch Sampling for Efficient Video Recognition
- Video Understanding
- Spotting Temporally Precise, Fine-Grained Events in Video<br>:star:code:house:project
- Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding
- Panoramic Vision Transformer for Saliency Detection in 360° Videos<br>:star:code
- Streaming Multiscale Deep Equilibrium Models<br>:house:project
- Learning Shadow Correspondence for Video Shadow Detection
- Federated Self-Supervised Learning for Video Understanding<br>:star:code
- Prompting Visual-Language Models for Efficient Video Understanding
- GraphVid: It Only Takes a Few Nodes to Understand a Video
- Video Classification
- Video Rolling Shutter
- Video Transition Effects
- Image/Video Compression
- AlphaVC: High-Performance and Efficient Learned Video Compression
- A Cloud 3D Dataset and Application-Specific Learned Image Compression in Cloud 3D<br>:star:code
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression<br>:star:code
- Expanded Adaptive Scaling Normalization for End to End Image Compression
- Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction<br>:star:code
- Content Adaptive Latents and Decoder for Neural Image Compression
- Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression
- RAWtoBit: A Fully End-to-End Camera ISP Network
- Content-Oriented Learned Image Compression
- Implicit Neural Representations for Image Compression
- Neural Video Compression Using GANs for Detail Synthesis and Propagation
- Video Summarization
- Video Grounding
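- Frame Interpolation
- Video Analysis
- Video Editing
- Video Enhancement
- Video Object Re-Identification
- Image and Video Editing
- Video Slow Motion
- Video Color Propagation
- Audio-Visual Event Localization
- Video Activity Localization
- Audio-Visual Video Parsing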
- Video Highlight Detection
- Video Clip Classification
- Video Relation Grounding
- Video Moment Retrieval
10.Pose Estimation (Object Pose Estimation)
- Object Pose
- Neural Correspondence Field for Object Pose Estimation<br>:star:code:house:project
- Zero-Shot Category-Level Object Pose Estimation<br>:star:code
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects<br>:star:code:house:project
- A Visual Navigation Perspective for Category-Level Object Pose Estimation<br>:star:code
- Polarimetric Pose Prediction
- RayTran: 3D Pose Estimation and Shape Reconstruction of Multiple Objects from Videos with Ray-Traced Transformers
- Gaussian Activated Neural Radiance Fields for High Fidelity Reconstruction & Pose Estimation
- Object Pose Transformation
- Grasped Object Pose Estimation
- 4D
- 6D
- Category-Level 6D Object Pose and Size Estimation using Self-Supervised Deep Prior Deformation Networks<br>:star:code
- Sim-to-Real 6D Object Pose Estimation via Iterative Self-Training for Robotic Bin Picking<br>:house:project
- Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images<br>:house:project
- Affine Correspondences between Multi-Camera Systems for 6DOF Relative Pose Estimation<br>:star:code
- ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization<br>:house:project
- RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation<br>:star:code
- Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features
- Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World<br>:star:code
- Perspective Flow Aggregation for Data-Limited 6D Object Pose Estimation<br>:star:code
- Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image
- DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation<br>:star:code
- WeLSA: Learning to Predict 6D Pose from Weakly Labeled Data Using Shape Alignment
- DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation<br>:star:code
- DISP6D: Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation<br>:star:code
- Vote from the Center: 6 DoF Pose Estimation in RGB-D Images by Radial Keypoint Voting<br>:star:code
- 9D
9.Human Pose Estimation
- Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation
- Pose for Everything: Towards Category-Agnostic Pose Estimation<br>:open_mouth:oral:star:code
- BodySLAM: Joint Camera Localisation, Mapping, and Human Motion Tracking
- PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation
- Learning Visibility for Robust Dense Human Body Estimation<br>:star:code
- D&D: Learning Human Dynamics from Dynamic Camera<br>:open_mouth:oral:star:code
- PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation<br>:star:code
- DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation<br>:star:code
- SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos<br>:star:code
- Poseur: Direct Human Pose Regression with Transformers<br>:star:code
- SimCC: A Simple Coordinate Classification Perspective for Human Pose Estimation<br>:star:code
- Regularizing Vector Embedding in Bottom-Up Human Pose Estimation<br>:star:code
- Hallucinating Pose-Compatible Scenes
- A Unified Framework for Domain Adaptive Pose Estimation<br>:star:code
- Motion capture
- Point-based clothed human modeling
- Dynamic human digitization
- Human pose and shape estimation
- CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation<br>:open_mouth:oral:star:code
- Super-Resolution 3D Human Shape from a Single Low-Resolution Image<br>:star:code:house:project
- 3D human pose estimation
- DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation<br>:star:code
- Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection
- Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation<br>:star:code
- PoseScript: 3D Human Poses from Natural Language<br>:house:project
- Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement
- 3D Human Pose Estimation Using Möbius Graph Convolutional Networks
- P-STMO: Pre-trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation<br>:star:code
- C3P: Cross-Domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation<br>:star:code
- Structural Triangulation: A Closed-Form Solution to Constrained 3D Human Pose Estimation<br>:star:code
- VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data<br>:star:code
- Learning to Fit Morphable Models
- EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices<br>:house:project
- AutoAvatar: Autoregressive Neural Fields for Dynamic Avatar Modeling<br>:house:project
- FLEX: Extrinsic Parameters-Free Multi-View 3D Human Motion Reconstruction<br>:house:project
- Mul-Pose
- 3D human reconstruction
- 3D Clothed Human Reconstruction in the Wild<br>:star:code
- DiffuStereo: High Quality Human Reconstruction via Diffusion-Based Stereo Using Sparse Cameras
- UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation<br>:star:code
- The One Where They Reconstructed 3D Humans and Environments in TV Shows<br>:star:code:house:project
- Neural Capture of Animatable 3D Human from Monocular Video
- SUPR: A Sparse Unified Part-Based Human Representation<br>:star:code:house:project
- IntegratedPIFu: Integrated Pixel Aligned Implicit Function for Single-view Human Reconstruction<br>:star:code
- Learned Vertex Descent: A New Direction for 3D Human Model Fitting<br>:star:code:house:project
- 3D interacting hand pose estimation
- Pose synthesis
- TIPS: Text-Induced Pose Synthesis<br>:star:code:house:project
- Hand-object reconstruction
- Human-scene interaction
- Human pose modeling
- Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields<br>:open_mouth:oral:house:project
- Pose tracking
- 3D human mesh recovery
- 3D human motion prediction and generation
- Pose transfer
- Human pose forecasting
- 4D
- Human mesh recovery
- Hand mesh estimation
- Head mesh reconstruction
- Human mesh animation
- Audio-driven stylized gesture generation
8.3D (3D Vision)
- DeepPS2: Revisiting Photometric Stereo Using Two Differently Illuminated Images<br>:star:code
- Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes<br>:star:code
- Self-calibrating Photometric Stereo by Neural Inverse Rendering<br>:star:code
- 3DG-STFM: 3D Geometric Guided Student-Teacher Feature Matching<br>:star:code
- Learning Online Multi-sensor Depth Fusion<br>:star:code
- Stereo Matching
- MVS
- MVSTER: Epipolar Transformer for Efficient Multi-View Stereo<br>:star:code
- KD-MVS: Knowledge Distillation Based Self-Supervised Learning for Multi-View Stereo<br>:star:code
- RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering<br>:star:code
- Multiview Stereo with Cascaded Epipolar RAFT<br>:star:code
- MVPS
- 3D scene synthesis
- Scene reconstruction
- Depth estimation
- Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches
- Relationship Spatialization for Depth Estimation
- BRNet: Exploring Comprehensive Features for Monocular Depth Estimation<br>:star:code
- Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics<br>:star:code
- Stereo Depth Estimation with Echoes
- JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes<br>:star:code
- RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation<br>:star:code
- PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation
- Depth Field Networks for Generalizable Multi-view Scene Representation<br>:house:project
- Structure and Motion from Casual Videos
- MODE: Multi-View Omnidirectional Depth Estimation with 360° Cameras<br>:star:code
- Gradient-based Uncertainty for Monocular Depth Estimation<br>:star:code
- DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction<br>:star:code
- Self-distilled Feature Aggregation for Self-supervised Monocular Depth Estimation<br>:star:code
- 3D-PL: Domain Adaptive Depth Estimation with 3D-aware Pseudo-Labeling<br>:star:code
- DELTAR: Depth Estimation from a Light-weight ToF Sensor and RGB Image<br>:star:code:house:project
- FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras
- Context-Enhanced Stereo Transformer
- Adaptive Co-Teaching for Unsupervised Monocular Depth Estimation<br>:star:code
- PanoFormer: Panorama Transformer for Indoor 360° Depth Estimation<br>:star:code
- Towards Comprehensive Representation Enhancement in Semantics-Guided Self-Supervised Monocular Depth Estimation
- LocalBins: Improving Depth Estimation by Learning Local Distributions<br>:star:code
- Depth Map Decomposition for Monocular Depth Estimation
- Uncertainty Quantification in Depth Estimation via Constrained Ordinal Regression<br>:star:code
- Spike Transformer: Monocular Depth Estimation for Spiking Camera<br>:star:code
- Learning Phase Mask for Privacy-Preserving Passive Depth Estimation
- Depth completion
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs<br>:star:code
- RigNet: Repetitive Image Guided Network for Depth Completion
- Multi-modal Masked Pre-training for Monocular Panoramic Depth Completion
- Monitored Distillation for Positive Congruent Depth Completion<br>:star:code
- CostDCNet: Cost Volume Based Depth Completion for a Single RGB-D Image<br>:star:code
- 3D vision
- A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision
- Neural Density-Distance Fields<br>:star:code
- DANBO: Disentangled Articulated Neural Body Representations via Graph Neural Networks<br>:house:project
- CryoAI: Amortized Inference of Poses for Ab Initio Reconstruction of 3D Molecular Volumes from Real Cryo-EM Images
- 3D room layout
- 3D reconstruction
- Object-Compositional Neural Implicit Surfaces<br>:star:code:house:project:tv:video
- Perspective Phase Angle Model for Polarimetric 3D Reconstruction<br>:star:code
- Monocular 3D Object Reconstruction with GAN Inversion<br>:star:code:house:project
- Structural Causal 3D Reconstruction
- 2D GANs Meet Unsupervised Single-view 3D Reconstruction<br>:star:code:house:project
- Few-shot Single-view 3D Reconstruction with Memory Prior Contrastive Network
- NeuRIS: Neural Reconstruction of Indoor Scenes Using Normal Priors<br>:house:project
- SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views<br>:house:project
- Disentangling Object Motion and Occlusion for Unsupervised Multi-Frame Monocular Depth<br>:star:code
- SNeS: Learning Probably Symmetric Neural Surfaces from Incomplete Data
- CIRCLE: Convolutional Implicit Reconstruction and Completion for Large-Scale Indoor Scene
- IS-MVSNet: Importance Sampling-Based MVSNet<br>:star:code
- Unbiased Gradient Estimation for Differentiable Surface Splatting via Poisson Sampling<br>:star:code
- Towards Learning Neural Representations from Shadows
- PlaneFormers: From Sparse View Planes to 3D Reconstruction<br>:star:code:house:project:tv:video
- SimpleRecon: 3D Reconstruction Without 3D Convolutions<br>:star:code
- Share with Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency
- SketchSampler: Sketch-Based 3D Reconstruction via View-Dependent Depth Sampling
- Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors<br>:star:code
- Bilateral Normal Integration<br>:star:code
- CHORE: Contact, Human and Object REconstruction from a Single RGB Image<br>:star:code:house:project
- Directed Ray Distance Functions for 3D Scene Reconstruction<br>:house:project
- Object Wake-Up: 3D Object Rigging from a Single Image<br>:star:code:house:project
- Latent Partition Implicit with Surface Codes for 3D Representation<br>:star:code
- 3D Equivariant Graph Implicit Functions
- Projective Parallel Single-Pixel Imaging to Overcome Global Illumination in 3D Structure Light Scanning
- EvAC3D: From Event-Based Apparent Contours to 3D Models via Continuous Visual Hulls<br>:house:project
- 3D CoMPaT: Composition of Materials on Parts of 3D Things<br>:house:project
- 3D shape
- Texturify: Generating Textures on 3D Shape Surfaces
- Implicit Field Supervision for Robust Non-rigid Shape Matching<br>:star:code
- 3D Shape Sequence of Human Comparison and Classification using Current and Varifolds<br>:star:code
- The Shape Part Slot Machine: Contact-Based Reasoning for Generating 3D Shapes from Parts
- 3D shape matching
- 3D shape synthesis
- Cross-Modal 3D Shape Generation and Manipulation<br>:star:code:house:project
- Shape completion
- Shape parsing
- Shape repair
- Depth restoration
- Scene understanding
- Spatially Invariant Unsupervised 3D Object-Centric Learning and Scene Decomposition
- Pose2Room: Understanding 3D Scenes from Human Activities
- Inverted Pyramid Multi-task Transformer for Dense Scene Understanding<br>:star:code
- Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation<br>:star:code
- Weakly Supervised 3D Scene Segmentation with Region-Level Boundary Awareness and Instance Discrimination
- 4DContrast: Contrastive Learning with Dynamic Correspondences for 3D Scene Understanding
- Point Scene Understanding via Disentangled Instance Mesh Reconstruction<br>:star:code
- PIP: Physical Interaction Prediction via Mental Simulation with Span Selection<br>:house:project
7.Object Tracking
- Towards Grand Unification of Object Tracking<br>:open_mouth:oral:star:code<br>:newspaper:ECCV 2022 Oral: Unicorn is the first to unify the network architecture and learning paradigm of four object tracking tasks, achieving SOTA on 8 challenging datasets
- HVC-Net: Unifying Homography, Visibility, and Confidence Learning for Planar Object Tracking
- Tracking by Associating Clips
- ByteTrack: Multi-Object Tracking by Associating Every Detection Box<br>:star:code
- Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework<br>:star:code
- Backbone Is All Your Need: A Simplified Architecture for Visual Object Tracking<br>:star:code
- Robust Visual Tracking by Segmentation<br>:star:code
- FEAR: Fast, Efficient, Accurate and Robust Visual Tracker<br>:star:code
- 3D tracking
- 3D Siamese Transformer Network for Single Object Tracking on Point Clouds<br>:star:code
- SpOT: Spatiotemporal Modeling for 3D Object Tracking
- Large-displacement 3D Object Tracking with Hybrid Non-local Optimization<br>:star:code
- CMT: Context-Matching-Guided Transformer for 3D Tracking in Point Clouds
- Towards Generic 3D Tracking in RGBD Videos: Benchmark and Baseline<br>:star:code
- Multi-object tracking
- Tracking Objects as Pixel-wise Distributions<br>:open_mouth:oral
- The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting
- MOTCOM: The Multi-Object Tracking Dataset Complexity Metric<br>:star:code:house:project
- Tracking Every Thing in the Wild
- PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking?
- SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset
- Robust Multi-Object Tracking by Marginal Inference
- MOTR: End-to-End Multiple-Object Tracking with TRansformer<br>:star:code
- Large Scale Real-World Multi-person Tracking<br>:star:code
- Particle Video Revisited: Tracking through Occlusions Using Point Trajectories<br>:house:project
- Visual tracking
- Cell tracking
6.Object Detection
- Should All Proposals be Treated Equally in Object Detection?<br>:star:code
- TIDEE: Tidying Up Novel Rooms Using Visuo-Semantic Commonsense Priors<br>:house:project
- TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices Using Submodular Mutual Information<br>:star:code
- HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors<br>:star:code
- Adversarially-Aware Robust Object Detector<br>:open_mouth:oral:star:code
- ObjectBox: From Centers to Boxes for Anchor-Free Object Detection<br>:open_mouth:oral:star:code
- Point-to-Box Network for Accurate Object Detection via Single Point Supervision<br>:star:code
- You Should Look at All Objects<br>:star:code
- Class-agnostic Object Detection with Multi-modal Transformer<br>:star:code<br>Uses multi-modal ViTs and human-understandable text queries to generate high-quality object proposals (OPs)
- Exploiting Unlabeled Data with Vision and Language Models for Object Detection<br>:star:code
- PoserNet: Refining Relative Camera Poses Exploiting Object Detections<br>:star:code
- Robust Object Detection With Inaccurate Bounding Boxes<br>:star:code
- UC-OWOD: Unknown-Classified Open World Object Detection<br>:star:code
- Unifying Visual Perception by Dispersible Points Learning<br>:star:code
- A Large-scale Multiple-objective Method for Black-box Attack against Object Detection<br>:star:code
- Distilling Object Detectors With Global Knowledge<br>:star:code
- PANDORA: A Panoramic Detection Dataset for Object with Orientation<br>:star:code
- Exploring Plain Vision Transformer Backbones for Object Detection<br>:star:code
- Long-Tail Detection with Effective Class-Margins<br>:star:code
- Detecting Twenty-Thousand Classes Using Image-Level Supervision<br>:star:code
- Exploring Resolution and Degradation Clues As Self-Supervised Signal for Low Quality Object Detection<br>:star:code
- Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection
- MTTrans: Cross-Domain Object Detection with Mean Teacher Transformer
- PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images<br>:house:project
- Cornerformer: Purifying Instances for Corner-Based Detectors
- Efficient Decoder-Free Object Detection with Transformers<br>:star:code
- W2N: Switching from Weak Supervision to Noisy Supervision for Object Detection<br>:star:code
- Towards Data-Efficient Detection Transformers<br>:star:code
- Open-Vocabulary DETR with Conditional Matching<br>:star:code
- Prediction-Guided Distillation for Dense Object Detection<br>:star:code
- Multimodal Object Detection via Probabilistic Ensembling<br>:star:code
- Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
- GLAMD: Global and Local Attention Mask Distillation for Object Detectors
- Object Detection As Probabilistic Set Prediction
- Out-of-Distribution Identification: Let Detector Tell Which I Am Not Sure
- Simple Open-Vocabulary Object Detection with Vision Transformers<br>:star:code
- A Simple Approach and Benchmark for 21,000-Category Object Detection<br>:star:code
- EAutoDet: Efficient Architecture Search for Object Detection<br>:star:code
- Few-Shot End-to-End Object Detection via Constantly Concentrated Encoding across Heads
- 3D object detection
- DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection<br>:star:code
- Rethinking IoU-based Optimization for Single-stage 3D Object Detection<br>:star:code
- Densely Constrained Depth Estimator for Monocular 3D Object Detection<br>:star:code
- Learning Ego 3D Representation As Ray Tracing<br>:house:project
- LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection<br>:star:code
- SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention<br>:star:code
- AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection<br>:star:code
- DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection<br>:star:code
- Label-Guided Auxiliary Training Improves 3D Object Detector<br>:star:code
- Monocular 3D Object Detection with Depth from Motion<br>:open_mouth:oral:star:code
- MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones<br>:open_mouth:oral:star:code
- Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph<br>:open_mouth:oral:star:code
- Multimodal Transformer for Automatic 3D Annotation and Object Detection<br>:star:code
- Semi-Supervised 3D Object Detection with Proficient Teachers<br>:star:code
- ProposalContrast: Unsupervised Pre-training for LiDAR-Based 3D Object Detection<br>:star:code
- CenterFormer: Center-based Transformer for 3D Object Detection<br>:open_mouth:oral:star:code
- SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds
- Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction<br>:star:code
- CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection
- Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection
- Plausibility Verification For 3D Object Detectors Using Energy-Based Optimization
- Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection<br>:star:code
- PETR: Position Embedding Transformation for Multi-View 3D Object Detection<br>:star:code
- Lidar Point Cloud Guided Monocular 3D Object Detection<br>:star:code
- INT: Towards Infinite-Frames 3D Detection with an Efficient Framework
- Semi-Supervised Monocular 3D Object Detection by Multi-View Consistency
- Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training<br>:star:code
- MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection<br>:star:code
- PillarNet: Real-Time and High-Performance Pillar-Based 3D Object Detection<br>:star:code
- Improving the Intra-Class Long-Tail in 3D Detection via Rare Example Mining
- 3D Object Detection with a Self-Supervised Lidar Scene Flow Backbone<br>:star:code
- DetMatch: Two Teachers Are Better than One for Joint 2D and 3D Semi-Supervised Object Detection<br>:star:code
- FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection<br>:star:code
- Enhancing Multi-modal Features Using Local Self-Attention for 3D Object Detection
- Semi-supervised object detection
- Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection<br>:star:code
- Semi-Supervised Object Detection via Virtual Category Learning<br>:star:code
- Open-Set Semi-Supervised Object Detection<br>:star:code:house:project
- PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection<br>:star:code
- Diverse Learner: Exploring Diverse Supervision for Semi-Supervised Object Detection
- Few-shot object detection
- Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark<br>:star:code
- Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object Detection<br>:star:code
- AcroFOD: An Adaptive Method for Cross-domain Few-shot Object Detection<br>:star:code
- Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot Object Detection<br>:star:code
- AirDet: Few-Shot Detection without Fine-Tuning for Autonomous Exploration<br>:star:code
- Few-Shot Object Detection by Knowledge Distillation Using Bag-of-Visual-Words Representations
- Few-Shot Object Detection with Model Calibration<br>:star:code
- Few-Shot Video Object Detection<br>:star:code
- Mutually Reinforcing Structure with Proposal Contrastive Consistency for Few-Shot Object Detection<br>:star:code
- Salient object detection
- SESS: Saliency Enhancing with Scaling and Sliding<br>:star:code
- SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection<br>:star:code
- Salient Object Detection for Point Clouds<br>:star:code
- KD-SCFNet: Towards More Accurate and Efficient Salient Object Detection via Knowledge Distillation<br>:star:code
- Saliency Hierarchy Modeling via Generative Kernels for Salient Object Detection
- MVSalNet: Multi-View Augmentation for RGB-D Salient Object Detection
- Weakly supervised object detection
- Active Learning Strategies for Weakly-supervised Object Detection<br>:star:code
- W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection<br>:star:code
- Object Discovery via Contrastive Learning for Weakly Supervised Object Detection<br>:star:code
- End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution
- Object localization
- Object Manipulation via Visual Target Localization<br>:house:project
- On Label Granularity and Object Localization
- Weakly supervised object localization
- Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization<br>:star:code
- Weakly Supervised Object Localization through Inter-class Feature Similarity and Intra-Class Appearance Consistency
- Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration<br>:star:code
- Single-stage object detection
- Object counting
- OOD (out-of-distribution)
- Out-of-Distribution Detection with Semantic Mismatch under Masking<br>:star:code
- Out-of-Distribution Detection with Boundary Aware Learning
- DICE: Leveraging Sparsification for Out-of-Distribution Detection<br>:star:code
- Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-of-Distribution Generalization<br>:star:code
- Data Invariants to Understand Unsupervised Out-of-Distribution Detection
- VOD (video object detection)
- PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer towards Video Object Detection<br>:star:code
- SALISA: Saliency-Based Input Sampling for Efficient Video Object Detection
- Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection
- Efficient One-Stage Video Object Detection by Exploiting Temporal Consistency<br>:star:code
- Small object detection
- Image detection
- Discovering Transferable Forensic Features for CNN-generated Images Detection<br>:open_mouth:oral:star:code:house:project
- Object discovery
- Change detection
5.Image/Video Retrieval
- Text-Based Temporal Localization of Novel Events
- Cross-domain retrieval
- Image retrieval
- Hierarchical Average Precision Training for Pertinent Image Retrieval<br>:star:code
- Adaptive Fine-Grained Sketch-Based Image Retrieval<br>:star:code
- A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch<br>:star:code:house:project
- Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
- Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval<br>:star:code
- StyleBabel: Artistic Style Tagging and Captioning
- Deep Hash Distillation for Image Retrieval<br>:star:code
- Conditional Stroke Recovery for Fine-Grained Sketch-Based Image Retrieval
- Fine-Grained Fashion Representation Learning by Online Deep Clustering
- Video retrieval
- LocVTP: Video-Text Pre-training for Temporal Localization<br>:star:code
- Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval
- Multi-Query Video Retrieval<br>:star:code
- Learning Audio-Video Modalities from Image Captions<br>:house:project
- Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment
- ECLIPSE: Efficient Long-Range Video Retrieval Using Sight and Sound<br>:star:code
- Video Geo-localization (retrieval)
- Text-video retrieval
- Image-text retrieval
- Fine-grained image retrieval
- Video moment retrieval
- Video-text retrieval
- Nearest neighbor search
4.Video/Image Captioning
- D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding<br>:house:project
- Image captioning
- GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features<br>:star:code
- Explicit Image Caption Editing<br>:star:code
- ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-Verified Image-Caption Associations for MS-COCO<br>:star:code
- GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval<br>:star:code
- Object-Centric Unsupervised Image Captioning<br>:star:code
- Unifying Event Detection and Captioning as Sequence Generation via Pre-training<br>:star:code
3.Image Processing
- Image quality assessment
- Image retouching
- Image warping
- Image restoration
- D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration<br>:star:code
- Simple Baselines for Image Restoration<br>:star:code
- Improving Image Restoration by Revisiting Global Information Aggregation<br>:star:code
- Seeing through a Black Box: Toward High-Quality Terahertz Imaging via Subspace-and-Attention Guided Restoration
- JPEG Artifacts Removal via Contrastive Representation Learning<br>:star:code
- TAPE: Task-Agnostic Prior Embedding for Image Restoration
- Spectrum-Aware and Transferable Architecture Search for Hyperspectral Image Restoration
- DRCNet: Dynamic Image Restoration Contrastive Network
- Image inpainting
- Learning Prior Feature and Attention Enhanced Image Inpainting<br>:star:code
- Inpainting at Modern Camera Resolution by Guided PatchMatch with Auto-Curation<br>:star:code
- High-Fidelity Image Inpainting with GAN Inversion
- Unbiased Multi-Modality Guidance for Image Inpainting
- Image Inpainting with Cascaded Modulation GAN and Object-Aware Training<br>:star:code
- Perceptual Artifacts Localization for Inpainting<br>:star:code
- Hourglass Attention Network for Image Inpainting<br>:star:code
- Diverse Image Inpainting with Normalizing Flow
- Image enhancement
- SepLUT: Separable Image-adaptive Lookup Tables for Real-time Image Enhancement
- Uncertainty Inspired Underwater Image Enhancement<br>:star:code
- Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression<br>:star:code
- LEDNet: Joint Low-Light Enhancement and Deblurring in the Dark<br>:star:code:house:project
- NEST: Neural Event Stack for Event-Based Image Enhancement<br>:star:code
- Seeing Far in the Dark with Patterned Flash<br>:star:code
- Local Color Distributions Prior for Image Enhancement<br>:house:project
- SemAug: Semantically Meaningful Image Augmentations for Object Detection through Language Grounding
- Image harmonization
- Image deconvolution
- Dehazing
- Denoising
- Deep Semantic Statistics Matching (D2SM) Denoising Network<br>:star:code:house:project
- Optimizing Image Compression via Joint Learning with Denoising<br>:star:code
- Fast and High Quality Image Denoising via Malleable Convolution<br>:house:project
- Unidirectional Video Denoising by Mimicking Backward Recurrent Modules with Look-Ahead Forward Ones<br>:star:code
- TempFormer: Temporally Consistent Transformer for Video Denoising
- Desnowing
- Deraining
- Not Just Streaks: Towards Ground Truth for Single Image Deraining<br>:house:project
- Blind Image Decomposition<br>:star:code
- ART-SS: An Adaptive Rejection Technique for Semi-Supervised Restoration for Adverse Weather-Affected Images<br>:star:code
- Rethinking Video Rain Streak Removal: A New Synthesis Model and a Deraining Network with Video Rain Prior<br>:star:code
- Deblurring
- Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance<br>:star:code
- United Defocus Blur Detection and Deblurring via Adversarial Promoting Learning<br>:star:code
- Learning Degradation Representations for Image Deblurring<br>:star:code
- Learning Deep Non-Blind Image Deconvolution without Ground Truths
- DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting<br>:star:code
- Realistic Blur Synthesis for Learning Image Deblurring<br>:house:project
- Stripformer: Strip Transformer for Fast Image Deblurring<br>:star:code
- Event-Based Fusion for Motion Deblurring with Cross-Modal Attention<br>:house:project
- ERDN: Equivalent Receptive Field Deformable Network for Video Deblurring<br>:star:code
- Event-Guided Deblurring of Unknown Exposure Time Videos<br>:house:project
- Demoiréing
- Reflection removal
- Shadow removal
- Style-Guided Shadow Removal<br>:star:code
- Semantic image editing
- Image colorization
- PalGAN: Image Colorization with Palette Generative Adversarial Networks<br>:star:code
- Semantic-Sparse Colorization Network for Deep Exemplar-Based Colorization
- CT2: Colorization Transformer via Color Tokens
- BigColor: Colorization Using a Generative Color Prior for Natural Images
- Colorization for In Situ Marine Plankton Images
- ColorFormer: Image Colorization via Color Memory Assisted Hybrid-Attention Transformer<br>:star:code
- Bridging the Domain Gap towards Generalization in Automatic Colorization<br>:star:code
- L-CoDer: Language-Based Colorization with Color-Object Decoupling Transformer
- Image cropping
- Image fusion
- Rolling shutter (jello effect)
2.Image Segmentation
- PseudoClick: Interactive Image Segmentation with Click Imitation
- GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation
- Pixel-Wise Energy-Biased Abstention Learning for Anomaly Segmentation on Complex Urban Driving Scenes<br>:open_mouth:oral:star:code
- Highly Accurate Dichotomous Image Segmentation<br>:house:project
- Graph-Constrained Contrastive Regularization for Semi-Weakly Volumetric Segmentation
- Slim Scissors: Segmenting Thin Object from Synthetic Background<br>:star:code
- RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation<br>:star:code
- Unsupervised Segmentation in Real-World Images via Spelke Object Inference
- Learning Instance-Specific Adaptation for Cross-Domain Segmentation<br>:house:project
- Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
- Uncertainty-Aware Multi-modal Learning via Cross-Modal Random Network Prediction
- Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition<br>:star:code
- Semantic segmentation
- Multi-Exit Semantic Segmentation Networks
- Language-Grounded Indoor 3D Semantic Segmentation in the Wild<br>:house:project
- Where in the World Is This Image? Transformer-Based Geo-Localization in the Wild
- Open-World Semantic Segmentation for LIDAR Point Clouds<br>:star:code
- SiamDoGe: Domain Generalizable Semantic Segmentation Using Siamese Network<br>:star:code
- TACS: Taxonomy Adaptive Cross-Domain Semantic Segmentation<br>:star:code
- Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation<br>:star:code
- RBC: Rectifying the Biased Context in Continual Semantic Segmentation<br>:star:code
- ESS: Learning Event-Based Semantic Segmentation from Still Images<br>:house:project
- Learning Implicit Feature Alignment Function for Semantic Segmentation<br>:star:code
- Data Efficient 3D Learner via Knowledge Transferred from 2D Model<br>:star:code
- Multi-Scale and Cross-Scale Contrastive Learning for Semantic Segmentation<br>:star:code
- 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds<br>:star:code
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
- ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation
- Union-Set Multi-source Model Adaptation for Semantic Segmentation
- Continual Semantic Segmentation via Structure Preserving and Projected Feature Alignment
- Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation<br>:star:code
- LiDAL: Inter-Frame Uncertainty Based Active Learning for 3D LiDAR Semantic Segmentation<br>:star:code
- DODA: Data-Oriented Sim-to-Real Domain Adaptation for 3D Semantic Segmentation<br>:star:code
- SQN: Weakly-Supervised Semantic Segmentation of Large-Scale 3D Point Clouds<br>:star:code
- Learning Semantic Segmentation from Multiple Datasets with Label Shifts<br>:house:project
- CAR: Class-Aware Regularizations for Semantic Segmentation<br>:star:code
- Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation<br>:star:code
- A Transformer-Based Decoder for Semantic Segmentation with Multi-level Context Mining<br>:star:code
- Extract Free Dense Labels from CLIP
- A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-Language Model<br>:star:code
- UCTNet: Uncertainty-Aware Cross-Modal Transformer Network for Indoor RGB-D Semantic Segmentation
- CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation<br>:star:code
- Domain-Adaptive Semantic Segmentation
- DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation<br>:star:code
- HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation<br>:star:code
- D2ADA: Dynamic Density-Aware Active Domain Adaptation for Semantic Segmentation
- Online Domain Adaptation for Semantic Segmentation in Ever-Changing Conditions<br>:star:code
- Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation<br>:star:code:house:project
- Few-Shot Semantic Segmentation
- Weakly Supervised Semantic Segmentation
- Adversarial Erasing Framework via Triplet with Gated Pyramid Pooling Layer for Weakly Supervised Semantic Segmentation<br>:star:code
- Unsupervised Semantic Segmentation
- Instance Segmentation
- OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers<br>:star:code
- Geodesic-Former: A Geodesic-Guided Few-Shot 3D Point Cloud Instance Segmenter<br>:star:code
- Learning Regional Purity for Instance Segmentation on 3D Point Clouds
- 3D Instances as 1D Kernels<br>:star:code
- 2D Amodal Instance Segmentation Guided by 3D Shape Prior
- Box-supervised Instance Segmentation with Level Set Evolution<br>:star:code
- Long-tailed Instance Segmentation using Gumbel Optimized Loss<br>:star:code
- Active Pointly-Supervised Instance Segmentation
- Trapped in Texture Bias? A Large Scale Comparison of Deep Instance Segmentation<br>:star:code
- Learning with Free Object Segments for Long-Tailed Instance Segmentation<br>:star:code
- A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation<br>:star:code
- Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes<br>:house:project
- Learning to Detect Every Thing in an Open World<br>:house:project
- Panoptic Segmentation
- Motion Segmentation
- Few-Shot Segmentation
- Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation<br>:star:code
- Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation<br>:star:code:house:project
- Doubly Deformable Aggregation of Covariance Matrices for Few-shot Segmentation<br>:star:code
- Interclass Prototype Relation for Few-Shot Segmentation
- HM: Hybrid Masking for Few-Shot Segmentation<br>:star:code
- Adaptive Agent Transformer for Few-Shot Segmentation
- Dense Gaussian Processes for Few-Shot Segmentation<br>:star:code
- Image Matting
- 3D Segmentation
- Hand Segmentation
- Part Segmentation
- Scene Segmentation
1.Others
- Generative Meta-Adversarial Network for Unseen Object Navigation<br>:star:code
- Housekeep: Tidying Virtual Households Using Commonsense Reasoning<br>:house:project
- OPD: Single-View 3D Openable Part Detection<br>:house:project
- Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance
- Webly Supervised Concept Expansion for General Purpose Vision Models<br>:house:project
- PACS: A Dataset for Physical Audiovisual Commonsense Reasoning
- Fabric Material Recovery from Video Using Multi-Scale Geometric Auto-Encoder
- MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment<br>:star:code
- Bandwidth-Aware Adaptive Codec for DNN Inference Offloading in IoT
- Efficient Deep Visual and Inertial Odometry with Adaptive Visual Modality Selection<br>:star:code
- Flow Graph to Video Grounding for Weakly-Supervised Multi-step Localization<br>:star:code
- Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration
- SeqTR: A Simple Yet Universal Network for Visual Grounding<br>:star:code
- Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding<br>:star:code
- Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
- Fine-Grained Visual Entailment<br>:star:code
- FindIt: Generalized Localization with Natural Language Queries<br>:house:project
- Decomposing the Tangent of Occluding Boundaries according to Curvatures and Torsions
- Real-Time Neural Character Rendering with Pose-Guided Multiplane Images
- TensoRF: Tensorial Radiance Fields<br>:house:project
- TAVA: Template-Free Animatable Volumetric Actors<br>:star:code
- Relative Pose from SIFT Features<br>:star:code
- Solution Space Analysis of Essential Matrix Based on Algebraic Error Minimization
- CoVisPose: Co-Visibility Pose Transformer for Wide-Baseline Relative Pose Estimation in 360° Indoor Panoramas
- Space-Partitioning RANSAC<br>:star:code
- Correspondence Reweighted Translation Averaging
- Beyond Periodicity: Towards a Unifying Framework for Activations in Coordinate-MLPs<br>:star:code
- GigaDepth: Learning Depth from Structured Light with Branching Neural Networks
- Visual Prompt Tuning<br>:star:code
- Cross-Modal Knowledge Transfer without Task-Relevant Source Data
- PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks
- CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation<br>:star:code
- SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas
- MVP: Multimodality-Guided Visual Pre-training
- Self-Filtering: A Noise-Aware Sample Selection for Label Noise with Confidence Penalization
- Learning to Learn with Smooth Regularization<br>:star:code
- Ensemble Learning Priors Driven Deep Unfolding for Scalable Video Snapshot Compressive Imaging<br>:star:code
- Approximate Discrete Optimal Transport Plan with Auxiliary Measure Method
- A Comparative Study of Graph Matching Algorithms in Computer Vision
- Semidefinite Relaxations of Truncated Least-Squares in Robust Rotation Search: Tight or Not
- Dynamic Metric Learning with Cross-Level Concept Distillation<br>:star:code
- MENet: A Memory-Based Network with Dual-Branch for Efficient Event Stream Processing
- Improving Robustness by Enhancing Weak Subnets
- Learning from Multiple Annotator Noisy Labels via Sample-Wise Label Fusion<br>:star:code
- Unbiased Manifold Augmentation for Coarse Class Subdivision<br>:star:code
- OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses<br>:star:code
- ERA: Enhanced Rational Activations<br>:star:code
- Active Label Correction Using Robust Parameter Update and Entropy Propagation
- Revisiting Batch Norm Initialization<br>:star:code
- Differentiable Rendering for Synthetic Aperture Radar Imagery
- Batch-efficient EigenDecomposition for Small and Medium Matrices<br>:star:code
- Accelerating Score-based Generative Models with Preconditioned Diffusion Sampling
- Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality<br>:star:code
- Contrastive Deep Supervision<br>:star:code
- Organic Priors in Non-Rigid Structure from Motion<br>:open_mouth:oral
- Bootstrapped Masked Autoencoders for Vision BERT Pretraining<br>:star:code
- Lipschitz Continuity Retained Binary Neural Network<br>:star:code
- NeFSAC: Neurally Filtered Minimal Samples<br>:star:code
- Towards Understanding The Semidefinite Relaxations of Truncated Least-Squares in Robust Rotation Search
- Latency-Aware Collaborative Perception<br>:star:code
- MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views
- SelectionConv: Convolutional Neural Networks for Non-rectilinear Image Data
- Overcoming Shortcut Learning in a Target Domain by Generalizing Basic Visual Factors from a Source Domain<br>:star:code
- Discrete-Constrained Regression for Local Counting Models
- Streamable Neural Fields<br>:star:code
- Contributions of Shape, Texture, and Color in Visual Recognition<br>:star:code
- Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model
- Latent Discriminant deterministic Uncertainty<br>:star:code
- SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks<br>:star:code
- UFO: Unified Feature Optimization<br>:star:code
- POP: Mining POtential Performance of new fashion products via webly cross-modal query expansion<br>:star:code
- My View is the Best View: Procedure Learning from Egocentric Videos<br>:star:code:house:project
- Equivariance and Invariance Inductive Bias for Learning from Insufficient Data<br>:star:code
- Contrastive Monotonic Pixel-Level Modulation<br>:open_mouth:oral:star:code
- Neural-Sim: Learning to Generate Training Data with NeRF<br>:star:code
- Learning Hierarchy Aware Features for Reducing Mistake Severity<br>:star:code
- Translating a Visual LEGO Manual to a Machine-Executable Plan<br>:star:code:house:project
- Hardly Perceptible Trojan Attack against Neural Networks with Bit Flips<br>:star:code
- LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity
- MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud<br>:star:code:house:project
- Neural Strands: Learning Hair Geometry and Appearance from Multi-View Images<br>:house:project
- A Repulsive Force Unit for Garment Collision Handling in Neural Networks<br>:house:project
- Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion<br>:star:code
- Revisiting the Critical Factors of Augmentation-Invariant Representation Learning<br>:star:code
- Fast Two-step Blind Optical Aberration Correction<br>:star:code
- Transformers as Meta-Learners for Implicit Neural Representations<br>:star:code:house:project
- Neighborhood Collective Estimation for Noisy Label Identification and Correction<br>:star:code
- Rethinking Robust Representation Learning Under Fine-grained Noisy Faces
- Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast<br>:star:code
- RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild<br>:house:project
- PRIF: Primary Ray-based Implicit Function<br>:house:project
- Context-Aware Streaming Perception in Dynamic Environments<br>:star:code
- AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments<br>:star:code
- L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training<br>:star:code
- GCISG: Guided Causal Invariant Learning for Improved Syn-to-real Generalization
- Learning Continuous Implicit Representation for Near-Periodic Patterns<br>:star:code:house:project
- A Deep Moving-camera Background Model<br>:star:code
- NashAE: Disentangling Representations through Adversarial Covariance Minimization<br>:star:code
- FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion
- Diversified Dynamic Routing for Vision Tasks
- Fast-ParC: Position Aware Global Kernel for ConvNets and ViTs
- Improving the Reliability for Confidence Estimation
- Attaining Class-level Forgetting in Pretrained Model using Few Samples
- Overexposure Mask Fusion: Generalizable Reverse ISP Multi-Step Refinement
- Photo-realistic Neural Domain Randomization
- Editable indoor lighting estimation<br>:house:project
- A Kendall Shape Space Approach to 3D Shape Estimation from 2D Landmarks
- DeepShadow: Neural Shape from Shadow<br>:star:code:house:project
- Intrinsic Neural Fields: Learning Functions on Manifolds
- Unsupervised Pose-Aware Part Decomposition for Man-Made Articulated Objects
- MeshUDF: Fast and Differentiable Meshing of Unsigned Distance Field Networks
- S2N: Suppression-Strengthen Network for Event-Based Recognition under Variant Illuminations<br>:star:code
- A Spectral View of Randomized Smoothing under Common Corruptions: Benchmarking and Improving Certified Robustness
- Transform Your Smartphone into a DSLR Camera: Learning the ISP in the Wild<br>:star:code
- Data Association between Event Streams and Intensity Frames under Diverse Baselines
- Instance Contour Adjustment via Structure-Driven CNN
- 3D Scene Inference from Transient Histograms
- Neural Space-Filling Curves<br>:house:project
- LWGNet – Learned Wirtinger Gradients for Fourier Ptychographic Phase Retrieval<br>:star:code
- PANDORA: Polarization-Aided Neural Decomposition of Radiance
- Benchmarking Omni-Vision Representation through the Lens of Visual Realms<br>:house:project
- When Deep Classifiers Agree: Analyzing Correlations between Learning Order and Image Statistics
- MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration<br>:house:project
- The Missing Link: Finding Label Relations across Datasets
- Domain Adaptive Hand Keypoint and Pixel Localization in the Wild<br>:house:project
- DFNet: Enhance Absolute Pose Regression with Direct Feature Matching<br>:house:project
- GTCaR: Graph Transformer for Camera Re-Localization
- Is Geometry Enough for Matching in Visual Localization?<br>:star:code
- Reducing Information Loss for Spiking Neural Networks
- Deep Partial Updating: Towards Communication Efficient Updating for On-Device Inference
- SP-Net: Slowly Progressing Dynamic Inference Networks
- Meta-GF: Training Dynamic-Depth Neural Networks Harmoniously<br>:star:code
- You Already Have It: A Generator-Free Low-Precision DNN Training Framework Using Stochastic Rounding
- Real Spike: Learning Real-Valued Spikes for Spiking Neural Networks
- Exploring Lottery Ticket Hypothesis in Spiking Neural Networks<br>:star:code
- On the Angular Update and Hyperparameter Tuning of a Scale-Invariant Network
- LANA: Latency Aware Network Acceleration
- Understanding the Dynamics of DNNs Using Graph Modularity<br>:star:code
- MIME: Minority Inclusion for Majority Group Enhancement of AI Performance<br>:house:project
- Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness<br>:star:code
- Learning to Censor by Noisy Sampling
- Anti-Neuron Watermarking: Protecting Personal Data against Unauthorized Neural Networks
- Recover Fair Deep Classification Models via Altering Pre-trained Structure
- Decouple-and-Sample: Protecting Sensitive Information in Task Agnostic Data Release<br>:star:code
- Latent Space Smoothing for Individually Fair Representations
- Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration<br>:star:code
- Image-Based CLIP-Guided Essence Transfer<br>:star:code
- End-to-End Visual Editing with a Generatively Pre-trained Artist<br>:house:project
- Sobolev Training for Implicit Neural Representations with Approximated Image Derivatives<br>:star:code
- L-Tracing: Fast Light Visibility Estimation on Neural Surfaces by Sphere Tracing
- Temporal-MPI: Enabling Multi-Plane Images for Dynamic Scene Modelling via Temporal Basis Learning
- 3D-Aware Semantic-Guided Generative Model for Human Synthesis<br>:house:project
- Unified Implicit Neural Stylization<br>:house:project
- Deep Portrait Delighting<br>:house:project
- Free-Viewpoint RGB-D Human Performance Capture and Rendering<br>:house:project
- Multiview Regenerative Morphing with Dual Flows<br>:star:code
- NeRF for Outdoor Scene Relighting<br>:house:project
- Intelli-Paint: Towards Developing More Human-Intelligible Painting Agents
- Motion Transformer for Unsupervised Image Animation
- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
- Implicit Neural Representations for Variable Length Human Motion Generation<br>:star:code
- Learning Object Placement via Dual-Path Graph Completion
- Compositional Visual Generation with Composable Diffusion Models<br>:house:project
- Spatial-Frequency Domain Information Integration for Pan-Sharpening
- ReCoNet: Recurrent Correction Network for Fast and Efficient Multi-Modality Image Fusion<br>:star:code
- Rethinking Generic Camera Models for Deep Single Image Camera Calibration to Recover Rotation and Fisheye Distortion
- Modeling Mask Uncertainty in Hyperspectral Image Reconstruction<br>:star:code
- Deep Fourier-Based Exposure Correction Network with Spatial-Frequency Interaction<br>:star:code
- Towards Real-World HDRTV Reconstruction: A Data Synthesis-Based Approach
- Attention-Aware Learning for Hyperparameter Prediction in Image Processing Pipelines
- Memory-Augmented Model-Driven Network for Pansharpening<br>:star:code
- All You Need Is RAW: Defending against Adversarial Attacks with Camera Image Pipelines
- GRIT-VLP: Grouped Mini-Batch Sampling for Efficient Vision and Language Pre-training<br>:star:code
- Transformer with Implicit Edges for Particle-Based Physics Simulation<br>:star:code
- LA3: Efficient Label-Aware AutoAugment
- BA-Net: Bridge Attention for Deep Convolutional Neural Networks<br>:star:code
- SAU: Smooth Activation Function Using Convolution with Approximate Identities
- Almost-Orthogonal Layers for Efficient General-Purpose Lipschitz Networks<br>:star:code
- DLME: Deep Local-Flatness Manifold Embedding
- Accurate Detection of Proteins in Cryo-Electron Tomograms from Sparse Labels
- Social ODE: Multi-agent Trajectory Forecasting with Neural Ordinary Differential Equations
- Entropy-Driven Sampling and Training Scheme for Conditional Diffusion Generation<br>:star:code
- Geometry-Guided Progressive NeRF for Generalizable and Efficient Neural Human Rendering
- Controllable Shadow Generation Using Pixel Height Maps
- Subspace Diffusion Generative Models<br>:star:code
- MINER: Multiscale Implicit Neural Representation<br>:house:project
- An Embedded Feature Whitening Approach to Deep Neural Network Optimization<br>:star:code
- Q-FW: A Hybrid Classical-Quantum Frank-Wolfe for Quadratic Binary Optimization
- Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models<br>:star:code
- QISTA-ImageNet: A Deep Compressive Image Sensing Framework Solving ℓq-Norm Optimization Problem
- Rethinking Confidence Calibration for Failure Prediction<br>:star:code
- PRIME: A Few Primitives Can Boost Robustness to Common Corruptions<br>:star:code
- Learning with Noisy Labels by Efficient Transition Matrix Estimation to Combat Label Miscorrection
- Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining<br>:house:project
- Balancing between Forgetting and Acquisition in Incremental Subpopulation Learning<br>:star:code
- Sound Localization by Self-Supervised Time Delay Estimation<br>:house:project
- X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
- A Contrastive Objective for Learning Disentangled Representations<br>:star:code
- A Gyrovector Space Approach for Symmetric Positive Semi-Definite Matrix Learning
- Trading Positional Complexity vs Deepness in Coordinate Networks<br>:house:project
- TO-Scene: A Large-Scale Dataset for Understanding 3D Tabletop Scenes<br>:star:code
- Primitive-Based Shape Abstraction via Nonparametric Bayesian Inference
- S2Net: Stochastic Sequential Pointcloud Forecasting
- LaLaLoc++: Global Floor Plan Comprehension for Layout Localisation in Unvisited Environments
- Variance-Aware Weight Initialization for Point Convolutional Neural Networks
- AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects via Few-Shot Interactions<br>:house:project
- Human Relighting
- Novelty Detection
- Multi-attribute Learning
- Bias Recognition
- Novel Class Discovery
- Dense Prediction
- Variational Autoencoders (VAEs)
- Open-Set Recognition
- Sketches
- Clustering
- Visual Grounding
- Interactive Structure Understanding
- HDR Panorama Generation
- Sign Language Recognition
- Lip Reading
- BNN
- Recurrent Bilinear Optimization for Binary Neural Networks<br>:open_mouth:oral:star:code
- Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies<br>:star:code
- Image Forensics
- Image Alignment
- Visual Hand Pressure Estimation
- Lighting Estimation
- Indoor Scene Lighting Editing
- HDR
- Keypoint Localization
- XAI
- STEEX: Steering Counterfactual Explanations with Semantics<br>:star:code
- Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals<br>:star:code
- HIVE: Evaluating the Human Interpretability of Visual Explanations<br>:house:project
- Shap-CAM: Visual Explanations for Convolutional Neural Networks Based on Shapley Value
- Palmprint Recognition
- Gaze Estimation
- Look Both Ways: Self-Supervising Driver Gaze Estimation and Road Scene Saliency<br>:open_mouth:oral:star:code
- Motion Transfer
- Remote Respiration Monitoring
- Image-to-Graphics Generation
Scan the QR code to add CV君 on WeChat (note: CVPR) and join the WeChat discussion group: