Awesome
52CV-WACV-Papers
Official website: https://wacv2023.thecvf.com/home
Conference dates: January 3-7, 2023
Categorized survey papers from past years: ↘️CV-Surveys (under construction)
Categorized paper lists for 2023:
↘️CVPR-2023-Papers ↘️WACV-2023-Papers
Categorized paper lists for 2022:
↘️CVPR-2022-Papers ↘️WACV-2022-Papers ↘️ECCV-2022-Papers
Categorized paper lists for 2021:
↘️ICCV-2021-Papers ↘️CVPR-2021-Papers
Categorized paper lists for 2020:
↘️CVPR-2020-Papers ↘️ECCV-2020-Papers
:exclamation::exclamation::exclamation::star2::star2::star2:All 638 papers accepted to WACV 2023 have been released. To download them, reply “paper” to the 【我爱计算机视觉】 account.
Table of Contents
67.Sketches (Sketch Recognition)
<a name="66"/>66.Scene Flow Estimation
<a name="65"/>65.Open Set Recognition
<a name="64"/>64.Visual Odometry
<a name="63"/>63.Place Recognition
- ETR: An Efficient Transformer for Re-ranking in Visual Place Recognition
- MixVPR: Feature Mixing for Visual Place Recognition
62.Dense Prediction
<a name="61"/>61.Geo-localization (Urban Geo-localization)
<a name="60"/>60.Image-to-Image Translation
- Panoptic-Aware Image-to-Image Translation
- Image translation
- Domain-to-domain translation
59.Meta Learning
<a name="58"/>58.Human Object Interaction
<a name="57"/>57.Federated Learning
- Learning Across Domains and Devices: Style-Driven Source-Free Domain Adaptation in Clustered Federated Learning<br>:star:code
- Federated Learning for Commercial Image Sources
56.Vision-Language
- VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models<br>:star:code
- Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision
- Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention<br>:star:code
- GAFNet: A Global Fourier Self Attention Based Novel Network for multi-modal downstream tasks
- VLN
55.Clustering
- Self-Supervised Clustering based on Manifold Learning and Graph Convolutional Networks<br>:star:code
54.Optical Flow
- Weakly-Supervised Optical Flow Estimation for Time-of-Flight
- Rebalancing Gradient To Improve Self-Supervised Co-Training of Depth, Odometry and Optical Flow Predictions<br>:star:code
- DCVNet: Dilated Cost Volume Networks for Fast Optical Flow
- MFCFlow: A Motion Feature Compensated Multi-Frame Recurrent Network for Optical Flow Estimation
- BrightFlow: Brightness-Change-Aware Unsupervised Learning of Optical Flow<br>:star:code
- Towards Equivariant Optical Flow Estimation with Deep Learning<br>:star:[code](https://github.com/stsavian/equivariant_of_estimation)
- Learning Lightweight Neural Networks via Channel-Split Recurrent Convolution
- Meta-Learning for Adaptation of Deep Optical Flow Networks
53.Gaze Estimation
- Searching Efficient Neural Architecture with Multi-resolution Fusion Transformer for Appearance-based Gaze Estimation
- Iris localization
- Gaze following
- Gaze redirection
52.Human Motion Prediction
- Multi-view Tracking Using Weakly Supervised Human Motion Prediction<br>:star:code
- Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation<br>:star:code
- GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
- Back to MLP: A Simple Baseline for Human Motion Prediction<br>:star:code
- Intention-Conditioned Long-Term Human Egocentric Action Anticipation
- Pedestrian trajectory prediction
51.Scene Graph Generation
- Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing<br>:star:code:house:project
- Improving Predicate Representation in Scene Graph Generation by Self-Supervised Learning
- More Knowledge, Less Bias: Unbiasing Scene Graph Generation with Explicit Ontological Adjustment<br>:star:code
- Composite Relationship Fields with Transformers for Scene Graph Generation<br>:star:code
50.Contrastive Learning
- Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning<br>:star:code
- Representation Disentanglement in Generative Models with Contrastive Learning
- Addressing Feature Suppression in Unsupervised Visual Representations
- Ego-Vehicle Action Recognition based on Semi-Supervised Contrastive Learning
49.Neural Radiance (Rendering)
- Ev-NeRF: Event Based Neural Radiance Field
- DDNeRF: Depth Distribution Neural Radiance Fields
- X-NeRF: Explicit Neural Radiance Field for Multi-Scene 360° Insufficient RGB-D Views<br>:star:code
- Fast Differentiable Transient Rendering for Non-Line-of-Sight Reconstruction<br>:star:code
- Compressing Explicit Voxel Grid Representations: fast NeRFs become also small
- Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation<br>:house:project
- Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields<br>:star:code
48.Light Fields
- Light fields
- Cameras
- Interest point detection
47.Data Augmentation
- Rethinking Rotation in Self-Supervised Contrastive Learning: Adaptive Positive or Negative Data Augmentation<br>:star:code
46.Metric Learning
<a name="45"/>45.Class-Incremental Learning
- AdvisIL - A Class-Incremental Learning Advisor<br>:star:code
- FeTrIL: Feature Translation for Exemplar-Free Class-Incremental Learning<br>:star:code
- Incremental learning
44.Multi-Task Learning
<a name="43"/>43.Active Learning
- Randomness is the Root of All Evil: More Reliable Evaluation of Deep Active Learning<br>:house:project
42.Landmark Detection
<a name="41"/>41.Action Generation
- Whole-body motion synthesis
40.Anomaly Detection
- Asymmetric Student-Teacher Networks for Industrial Anomaly Detection<br>:star:code
- Zero-Shot Versus Many-Shot: Unsupervised Texture Anomaly Detection<br>:star:code
- No Shifted Augmentations (NSA): compact distributions for robust self-supervised Anomaly Detection
- GLAD: A Global-to-Local Anomaly Detector
- Road anomaly detection
- Anomaly clustering
39.Style Transfer
- Line Search-Based Feature Transformation for Fast, Stable, and Tunable Content-Style Control in Photorealistic Style Transfer<br>:star:code
- RAST: Restorable Arbitrary Style Transfer via Multi-Restoration
- Dance Style Transfer with Cross-modal Transformer<br>:tv:video
- Is Bigger Always Better? An Empirical Study on Efficient Architectures for Style Transfer and Beyond
38.Sound (Audio Processing)
- AudioViewer: Learning to Visualize Sounds<br>:house:project
- Audio-visual event localization
- Audio denoising
- Audio-visual segmentation
- Sound source localization
- Speech recognition
- Audio separation
37.Object Tracking
- Efficient Visual Tracking with Exemplar Transformers<br>:star:code
- Hard to Track Objects with Irregular Motions and Similar Appearances? Make It Easier by Buffering the Matching Space
- HOOT: Heavy Occlusions in Object Tracking Benchmark
- VirtualHome Action Genome: A Simulated Spatio-Temporal Scene Graph Dataset With Consistent Relationship Labels
- Tracking Growth and Decay of Plant Roots in Minirhizotron Images<br>:star:code
- Planar Object Tracking via Weighted Optical Flow
- Multi-Frame Attention with Feature-Level Warping for Drone Crowd Tracking
- Multi-object tracking
- AttTrack: Online Deep Attention Transfer for Multi-object Tracking
- Detection Recovery in Online Multi-Object Tracking With Sparse Graph Tracker<br>:star:code
- MMPTRACK: Large-scale Densely Annotated Multi-camera Multiple People Tracking Benchmark
- TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking
36.Soft Biometrics
- Finger vein recognition
- Misclassifications of contact lens iris PAD algorithms
- Biometric recognition
- Iris
35.VQA (Visual Question Answering)
- DRAMA: Joint Risk Localization and Captioning in Driving
- VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge<br>:star:code
- Barlow constrained optimization for Visual Question Answering<br>:star:code
- How To Practice VQA on a Resource-Limited Target Domain<br>:house:project
- Guiding Visual Question Answering With Attention Priors
- VideoQA
- Visual question generation
34.SLAM\Robots
- SLAM
- AR
33.View Synthesis
- Vision Transformer for NeRF-Based View Synthesis From a Single Input Image
- Self-improving Multiplane-to-layer Images for Novel View Synthesis<br>:house:project
32.Continual Learning
- Continual Learning with Dependency Preserving Hypernetworks
- Do Pre-trained Models Benefit Equally in Continual Learning<br>:star:code
- Saliency Guided Experience Packing for Replay in Continual Learning
31.Deepfake Detection
<a name="30"/>30.Reinforcement Learning
- Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning<br>:star:code
29.Image Classification
- Wavelength-Aware 2D Convolutions for Hyperspectral Imaging<br>:star:code
- ML-Decoder: Scalable and Versatile Classification Head
- CNN2Graph: Building Graphs for Image Classification
- Token Pooling in Vision Transformers for Image Classification
- Augmentation by Counterfactual Explanation - Fixing an Overconfident Classifier
- Treatment Learning Causal Transformer for Noisy Image Classification<br>:star:code
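- Long-tailed recognition
- Open-Set Classification
- Fine-grained classification
- Multi-label classification
- Few-shot classification
28.Pose Estimation
<a name="27"/>27.Person ReID (Person Re-Identification)
- Pedestrian analysis
- Person search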
- Re-id
- Camera Alignment and Weighted Contrastive Learning for Domain Adaptation in Video Person ReID<br>:star:code
- MEVID: Multi-view Extended Videos with Identities for Video Person Re-Identification<br>:star:code
- Feature Disentanglement Learning with Switching and Aggregation for Video-based Person Re-Identification
- Graph-Based Self-Learning for Robust Person Re-Identification
- Body Part-Based Representation Learning for Occluded Person Re-Identification<br>:star:code
- Gait recognition
- Gait transfer
- Suspect identification
- Crowd counting
26.Dataset\Benchmark
- OpenEarthMap: A Benchmark Dataset for Global High-Resolution Land Cover Mapping<br>:sunflower:dataset
- A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials<br>:sunflower:dataset
- The CropAndWeed Dataset: a Multi-Modal Learning Approach for Efficient Crop and Weed Manipulation<br>:sunflower:dataset
- IDD-3D: A Dataset for Driving in Unstructured Road Scenes<br>:sunflower:dataset
- Vis2Rec: A Large-Scale Visual Dataset for Visit Recommendation<br>:sunflower:dataset
- Creating a Forensic Database of Shoeprints from Online Shoe-Tread Photos<br>:sunflower:dataset
- Object detection, segmentation, and tracking
25.Image Captioning
- Human image analysis
- Image captioning
- Video captioning
24.Image Retrieval
- Boosting vision transformers for image retrieval<br>:star:code
- Certified Defense for Content Based Image Retrieval
- Fashion Image Retrieval with Text Feedback by Additive Attention Compositional Learning
- Content-Based Music-Image Retrieval Using Self- and Cross-Modal Feature Embedding Memory
- Image-sentence retrieval
- Image-text retrieval
- Cross-domain retrieval
- Image-text matching
23.Autonomous Driving
- IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes<br>:star:code
- PP4AV: A Benchmarking Dataset for Privacy-Preserving Autonomous Driving<br>:star:code
- Benchmarking Visual Localization for Autonomous Navigation<br>:star:code
- Vehicle re-identification
- Lane detection
- Trajectory prediction
22.Human Action Recognition (Action Recognition and Detection)
- Action recognition
- Modality Mixer for Multi-modal Action Recognition
- STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition
- Holistic Interaction Transformer Network for Action Detection<br>:star:code
- Reconstructing Humpty Dumpty: Multi-feature Graph Autoencoder for Open Set Action Recognition<br>:star:code
- DA-AIM: Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection<br>:star:code
- Spatio-Temporal Action Detection Under Large Motion<br>:star:code
- Efficient Skeleton-Based Action Recognition via Joint-Mapping Strategies
- Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
- Semantics Guided Contrastive Learning of Transformers for Zero-Shot Temporal Activity Detection
- Adaptive Local-Component-Aware Graph Convolutional Network for One-Shot Skeleton-Based Action Recognition
- Multi-View Action Recognition using Contrastive Learning<br>:star:code
- Stop or Forward: Dynamic Layer Skipping for Efficient Action Recognition
- A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector<br>:star:code
- Temporal action localization
21.Point Cloud
- PointNeuron: 3D Neuron Reconstruction via Geometry and Topology Learning of Point Clouds
- Visualizing Global Explanations of Point Cloud DNNs<br>:star:code
- RSF: Optimizing Rigid Scene Flow From 3D Point Clouds Without Labels
- Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis
- Explainability-Aware One Point Attack for Point Cloud Neural Networks<br>:star:code
- Centroid Distance Keypoint Detector for Colored Point Clouds<br>:star:code
- Point cloud classification
- Point cloud segmentation
- Point cloud registration
- Point cloud reconstruction
- 3D point clouds
20.Transformer
- EmbryosFormer: Deformable Transformer and Collaborative Encoding-Decoding for Embryos Stage Development Classification<br>:star:code
- Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification<br>:star:code
- Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets<br>:star:code
- Couplformer: Rethinking Vision Transformer With Coupling Attention
- Trans4Map: Revisiting Holistic Bird's-Eye-View Mapping From Egocentric Images to Allocentric Semantics With Vision Transformers<br>:star:code
- PatchDropout: Economizing Vision Transformers Using Patch Dropout<br>:star:code
- OutfitTransformer: Learning Outfit Representations for Fashion Recommendation
- Discrete Cosin TransFormer: Image Modeling From Frequency Domain
- Orthogonal Transforms For Learning Invariant Representations In Equivariant Neural Networks
19.Model Compression\Knowledge Distillation\Pruning
- Pruning
- Knowledge distillation
- Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks
- Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study<br>:star:code
- TinyHD: Efficient Video Saliency Prediction with Heterogeneous Decoders using Hierarchical Maps Distillation
- [Online Knowledge Distillation for Multi-task Learning](https://openaccess.thecvf.com/content/WACV2023/papers/Jacob_Online_Knowledge_Distillation_for_Multi-Task_Learning_WACV_2023_paper.pdf)
- Adversarial local distribution regularization for knowledge distillation
- Self-distillation
- DC
- Quantization
- Lightweight models
18.NAS (Neural Architecture Search)
- Revisiting Training-free NAS Metrics: An Efficient Training-based Method<br>:star:code
- SVD-NAS: Coupling Low-Rank Approximation and Neural Architecture Search<br>:star:code
- FreeREA: Training-Free Evolution-based Architecture Search<br>:star:code
- Toward Edge-Efficient Dense Predictions with Synergistic Multi-Task Neural Architecture Search
17.OCR (Text Detection)
- OCR-VQGAN: Taming Text-within-Image Generation<br>:star:code
- Efficient few-shot learning for pixel-precise handwritten document layout analysis
- D-Extract: Extracting Dimensional Attributes From Product Images<br>:star:code
- Text recognition
- Table detection
- Logo detection
- Document detection
- Document understanding
- Text removal
16.Super-Resolution
- Single Image Super-Resolution via a Dual Interactive Implicit Neural Network
- HIME: Efficient Headshot Image Super-Resolution with Multiple Exemplars
- Deep Model-Based Super-Resolution With Non-Uniform Blur<br>:star:code
- Kernel-Aware Burst Blind Super-Resolution
- Enriched CNN-Transformer Feature Aggregation Networks for Super-Resolution<br>:star:code
- Joint Video Rolling Shutter Correction and Super-Resolution
- Video super-resolution
15.Image Synthesis
- One-Shot Synthesis of Images and Segmentation Masks<br>:star:code
- Style-Guided Inference of Transformer for High-resolution Image Synthesis
- Evaluating Generative Networks Using Gaussian Mixtures of Image Features
- More Control for Free! Image Synthesis with Semantic Diffusion Guidance
- Image generation
- Text-to-image synthesis
- Text-guided image manipulation
14.Un\Self\Semi-Supervised Learning
- Self-supervised
- Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond<br>:star:code
- Global-Local Self-Distillation for Visual Representation Learning<br>:star:code
- Accelerating Self-Supervised Learning via Efficient Training Strategies
- FUSSL: Fuzzy Uncertain Self Supervised Learning
- Self-Supervised Correspondence Estimation via Multiview Registration<br>:house:project
- Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
- Self-Supervised Relative Pose With Homography Model-Fitting in the Loop
- Self-Distilled Self-supervised Representation Learning<br>:star:code
- Multi-Level Contrastive Learning for Self-Supervised Vision Transformers
- Self-Supervised Distilled Learning for Multi-modal Misinformation Identification
- An Embedding-Dynamic Approach to Self-Supervised Learning
- Semi-supervised
- Class-Level Confidence Based 3D Semi-Supervised Learning
- Dynamic Re-Weighting for Long-Tailed Semi-Supervised Learning
- Unifying Distribution Alignment as a Loss for Imbalanced Semi-supervised Learning<br>:star:code
- Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth
- Semi-Supervised Learning for Sparsely-Labeled Sequential Data: Application to Healthcare Video Processing
- Unsupervised
13.Image Segmentation
- Image Segmentation-based Unsupervised Multiple Objects Discovery
- WSNet: Towards An Effective Method for Wound Image Segmentation<br>:star:code
- Autoencoder-based background reconstruction and foreground segmentation with background noise estimation
- Unsupervised multi-object segmentation using attention and soft-argmax
- VOS
- VSS
- Semantic segmentation
- Attribution-aware Weight Transfer: A Warm-Start Initialization for Class-Incremental Semantic Segmentation<br>:star:code
- Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation
- Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation
- LoopDA: Constructing Self-loops to Adapt Nighttime Semantic Segmentation<br>:star:code
- Empirical Generalization Study: Unsupervised Domain Adaptation vs. Domain Generalization Methods for Semantic Segmentation in the Wild
- Semantic Segmentation with Active Semi-Supervised Learning
- Self-supervised Learning with Local Contrastive Loss for Detection and Semantic Segmentation
- Semantic Segmentation of Degraded Images Using Layer-Wise Feature Adjustor
- Reducing Annotation Effort by Identifying and Labeling Contextually Diverse Classes for Semantic Segmentation Under Domain Shift<br>:star:code
- Cooperative Self-Training for Multi-Target Adaptive Semantic Segmentation<br>:star:code
- Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions<br>:star:code
- BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs
- ProtoSeg: Interpretable Semantic Segmentation with Prototypical Parts<br>:star:code
- Complementary Bi-directional Feature Compression for Indoor 360° Semantic Segmentation with Self-distillation
- Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification
- Weakly supervised semantic segmentation
- [Single Stage Weakly Supervised Semantic Segmentation of Complex Scenes](https://openaccess.thecvf.com/content/WACV2023/papers/Akiva_Single_Stage_Weakly_Supervised_Semantic_Segmentation_of_Complex_Scenes_WACV_2023_paper.pdf)
- Semi-supervised semantic segmentation
- Multi-class part parsing
- BEV segmentation
- Panoptic segmentation
- Instance segmentation
- From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments
- CellTranspose: Few-shot Domain Adaptation for Cellular Instance Segmentation
- Weakly Supervised Cell-Instance Segmentation With Two Types of Weak Labels by Single Instance Pasting
- Self-Supervised Learning With Masked Image Modeling for Teeth Numbering, Detection of Dental Restorations, and Instance Segmentation in Dental Panoramic Radiographs<br>:star:code
- Weakly-Supervised Point Cloud Instance Segmentation With Geometric Priors
- NeuralBF: Neural Bilateral Filtering for Top-down Instance Segmentation on Point Clouds<br>:house:project
- SCTS: Instance Segmentation of Single Cells Using a Transformer-Based Semantic-Aware Model and Space-Filling Augmentation<br>:star:code
- Few-shot segmentation
- Leaf disease segmentation
- Cell segmentation
- Object segmentation
- Image matting
12.One\Few-Shot Learning or Domain Adaptation\Generalization\Shift
- Domain adaptation
- Self-Distillation for Unsupervised 3D Domain Adaptation<br>:house:project
- CoNMix for Source-free Single and Multi-target Domain Adaptation<br>:star:code:house:project
- Learning Classifiers of Prototypes and Reciprocal Points for Universal Domain Adaptation
- Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation<br>:house:project
- TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation<br>:star:code
- Backprop Induced Feature Weighting for Adversarial Domain Adaptation with Iterative Label Distribution Alignment
- Generative Alignment of Posterior Probabilities for Source-free Domain Adaptation
- Domain generalization
- Intra-Source Style Augmentation for Improved Domain Generalization
- Center-aware Adversarial Augmentation for Single Domain Generalization
- FFM: Injecting Out-of-Domain Knowledge via Factorized Frequency Modification
- Improving Diversity with Adversarially Learned Transformations for Domain Generalization<br>:star:code
- Zero-shot
- Few-shot
- Aggregating Bilateral Attention for Few-Shot Instance Localization
- HyperShot: Few-Shot Learning by Kernel HyperNetworks<br>:star:code
- Few-Shot Learning of Compact Models via Task-Specific Meta Distillation
- Semantic Guided Latent Parts Embedding for Few-Shot Learning<br>:star:code
- Self-Attention Message Passing for Contrastive Few-Shot Learning<br>:star:code
11.Face
- My Face My Choice: Privacy Enhancing Deepfakes for Social Media Anonymization
- Improving Deep Facial Phenotyping for Ultra-rare Disorder Verification Using Model Ensembles<br>:star:code
- Lip reading
- 3D face
- Face recognition
- DigiFace-1M: 1 Million Digital Face Images for Face Recognition<br>:star:code
- CAST: Conditional Attribute Subsampling Toolkit for Fine-Grained Evaluation<br>:star:code
- CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning-Based Synthetic Face Detection
- Unifying Margin-Based Softmax Losses in Face Recognition
- Harnessing Unrecognizable Faces for Improving Face Recognition
- QMagFace: Simple and Accurate Quality-Aware Face Recognition<br>:star:code
- A Quality Aware Sample-to-Sample Comparison for Face Recognition
- Face inpainting/restoration
- Face swapping
- Facial expression recognition
- Face reenactment
- Audio-Visual Face Reenactment<br>:house:project
- Face naming
- Face reconstruction
- Face synthesis
- Deepfake
- Facial Action Unit Detection
- Face quality assessment
- Face liveness detection
- Domain Invariant Vision Transformer Learning for Face Anti-Spoofing
- Expression-based facial wrinkle synthesis
- Text- and image-guided 3D avatar generation
- Talking face
- Lip reading
10.Adversarial Learning
- Leveraging Local Patch Differences in Multi-Object Scenes for Generative Adversarial Attacks
- Inducing Data Amplification Using Auxiliary Datasets in Adversarial Training<br>:star:code
- Interpreting Disparate Privacy-Utility Tradeoff in Adversarial Learning via Attribute Correlation
- FLOAT: Fast Learnable Once-for-All Adversarial Training for Tunable Trade-off between Accuracy and Robustness
- Adversarial robustness in discontinuous spaces via alternating sampling & descent
- PatchZero: Defending against Adversarial Patch Attacks by Detecting and Zeroing the Patch
- Avoiding Lingering in Learning Active Recognition by Adversarial Disturbance
- Adversarial examples
- Active attacks
9.Remote Sensing\Satellite Image
- RS
- Change detection
- Aerial image detection
- Aerial image segmentation
- International border detection
8.Image Processing
- Image quality assessment
- Image restoration
- Large-to-small Image Resolution Asymmetry in Deep Metric Learning<br>:star:code
- DSTrans: Dual-Stream Transformer for Hyperspectral Image Restoration<br>:star:code
- Semi-Supervised Learning for Low-light Image Restoration through Quality Assisted Pseudo-Labeling<br>:star:code
- Real-Time Restoration of Dark Stereo Images<br>:house:project
- Image inpainting
- Image enhancement
- Perceptual Image Enhancement for Smartphone Real-Time Applications<br>:star:code
- Robust Real-World Image Enhancement Based on Multi-Exposure LDR Images
- End-to-End Single-Frame Image Signal Processing for High Dynamic Range Scenes
- PSENet: Progressive Self-Enhancement Network for Unsupervised Extreme-Light Image Enhancement<br>:star:code
- Image colorization
- Guiding Users to Where to Give Color Hints for Efficient Interactive Sketch Colorization via Unsupervised Region Prioritization
- Generative Colorization of Structured Mobile Web Pages<br>:star:code
- iColoriT: Towards Propagating Local Hints to the Right Region in Interactive Colorization by Leveraging Vision Transformer<br>:house:project
- Pik-Fix: Restoring and Colorizing Old Photos<br>:star:code
- Image completion
- Image rescaling
- HDR reconstruction
- Denoising
- Dehazing
- Reflection removal
- De-fencing
- Deconvolution
- Shadow removal
7.Human Pose
- Kinematic-aware Hierarchical Attention Network for Human Pose Estimation in Videos<br>:star:code
- HuPR: A Benchmark for Human Pose Estimation Using Millimeter Wave Radar<br>:star:code
- Computer Vision to the Rescue: Infant Postural Symmetry Estimation from Incongruent Annotations<br>:star:code
- Multi-person pose estimation
- 3D human body
- Placing Human Animations into 3D Scenes by Learning Interaction- and Geometry-Driven Keyframes
- Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers<br>:star:code
- Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats<br>:house:project
- GarSim: Particle Based Neural Garment Simulator
- Learnable Human Mesh Triangulation for 3D Human Pose and Shape Estimation
- ElliPose: Stereoscopic 3D Human Pose Estimation by Fitting Ellipsoids
- Rethinking the Data Annotation Process for Multi-view 3D Pose Estimation with Active Learning and Self-Training
- CameraPose: Weakly-Supervised Monocular 3D Human Pose Estimation by Leveraging In-the-wild 2D Annotations
- Hand pose
- 3D hand
- Hand reconstruction
- Hand-object pose estimation
6.Video
- A Deep Neural Framework to Detect Individual Advertisement (Ad) from Videos
- TCAM: Temporal Class Activation Maps for Object Localization in Weakly-Labeled Unconstrained Videos<br>:star:code
- Recipe2Video: Synthesizing Personalized Videos from Recipe Texts
- Video enhancement
- Video understanding
- Video summarization
- Multi-person detection
- Scene recognition
- Video Grounding
- Video anomaly detection (VAD)
- DyAnNet: A Scene Dynamicity Guided Self-Trained Video Anomaly Detection Network
- Cross-Domain Video Anomaly Detection without Target Domain Adaptation
- Bi-Directional Frame Interpolation for Unsupervised Video Anomaly Detection
- Towards Interpretable Video Anomaly Detection
- Normality Guided Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
- Image and video coding
- Universal Deep Image Compression via Content-Adaptive Optimization with Adapters<br>:star:code
- A neural video codec with spatial rate-distortion control
- Boosting Neural Video Codecs by Exploiting Hierarchical Redundancy
- Neural Distributed Image Compression with Cross-Attention Feature Alignment<br>:star:code
- Lossy Image Compression with Quantized Hierarchical VAEs
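- Video portrait synthesis
- Video frame interpolation
- Video motion retargeting
- Video motion magnification
- Video stabilization
- Video classification
- Video segmentation
- Video forgery detection
- Video tracking
5.Object Detection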
- ConfMix: Unsupervised Domain Adaptation for Object Detection via Confidence-based Mixing<br>:star:code
- The Box Size Confidence Bias Harms Your Object Detector<br>:star:code
- Resolving Class Imbalance for LiDAR-based Object Detector by Dynamic Weight Average and Contextual Ground Truth Sampling
- Is Your Noise Correction Noisy? PLS: Robustness To Label Noise With Two Stage Detection<br>:star:code
- Phantom Sponges: Exploiting Non-Maximum Suppression to Attack Deep Object Detectors
- Domain Adaptive Object Detection for Autonomous Driving under Foggy Weather<br>:star:code
- ROMA: Run-Time Object Detection To Maximize Real-Time Accuracy
- Towards Online Domain Adaptive Object Detection<br>:star:code
- MT-DETR: Robust End-to-end Multimodal Detection with Confidence Fusion<br>:star:code
- Towards Few-Annotation Learning for Object Detection: Are Transformer-based Models More Efficient?
- Scaling Novel Object Detection with Weakly Supervised Detection Transformers
- Mobile Robot Manipulation using Pure Object Detection<br>:star:code
- Domain Adaptation Using Self-Training With Mixup for One-Stage Object Detection
- Gradient-Based Quantification of Epistemic Uncertainty for Deep Object Detectors<br>:star:code
- Few-shot object detection
- Weakly supervised object detection
- 3D object detection
- TransPillars: Coarse-To-Fine Aggregation for Multi-Frame 3D Object Detection
- Adaptive Feature Fusion for Cooperative Perception Using LiDAR Point Clouds
- ImpDet: Exploring Implicit Fields for 3D Object Detection
- Li3DeTr: A LiDAR based 3D Detection Transformer
- Far3Det: Towards Far-Field 3D Detection
- Dense Voxel Fusion for 3D Object Detection
- MonoEdge: Monocular 3D Object Detection Using Local Perspectives
- Multivariate Probabilistic Monocular 3D Object Detection<br>:star:code
- SAILOR: Scaling Anchors via Insights into Latent Object Representation
- VOD
- OOD
- Out-of-distribution Detection via Frequency-regularized Generative Models<br>:star:code
- Heatmap-based Out-of-Distribution Detection<br>:star:code
- Out-of-Distribution Detection with Reconstruction Error and Typicality-based Penalty
- Mixture Outlier Exposure: Towards Out-of-Distribution Detection in Fine-grained Environments<br>:star:code
- Hyperdimensional Feature Fusion for Out-of-Distribution Detection<br>:star:code
- Task Agnostic and Post-hoc Unseen Distribution Detection
- WSOD
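- Camouflaged object detection
- Object discovery
- Change detection
- Real-time concealed weapon detection in 3D radar images for walk-through security screening systems
- Image recognition
- Invasive species detection
- Ocean eddy detection in infrared images
4.GAN (Generative Adversarial Networks)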
- HoechstGAN: Virtual Lymphocyte Staining Using Generative Adversarial Networks
- Image Completion with Heterogeneously Filtered Spectral Hints<br>:star:code
- Indirect Adversarial Losses via an Intermediate Distribution for Training GANs
- SLI-pSp: Injecting Multi-Scale Spatial Layout in pSp
- Multi-scale Contrastive Learning for Complex Scene Generation
- Realistic Full-Body Anonymization with Surface-Guided GANs
- Fantastic Style Channels and Where to Find Them: A Submodular Framework for Discovering Diverse Directions in GANs
- 3D GAN Inversion with Pose Optimization<br>:house:project
- UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation<br>:star:code
- SketchInverter: Multi-Class Sketch-Based Image Generation via GAN Inversion
- Style editing
- Fashion attribute editing
- Anonymization
- Fingerprint generation
- Open-set recognition
3.3D (3D Vision)
- Generative Range Imaging for Learning Scene Priors of 3D LiDAR Data<br>:star:code
- Seg&Struct: The Interplay Between Part Segmentation and Structure Inference for 3D Shape Parsing
- Surface normal estimation from optimized and distributed light sources using DNN-based photometric stereo
- Meta-Auxiliary Learning for Future Depth Prediction in Videos
- 3D Neural Sculpting (3DNS): Editing Neural Signed Distance Functions
- Improving the Robustness of Point Convolution on k-Nearest Neighbor Neighborhoods with a Viewpoint-Invariant Coordinate Transform
- CountNet3D: A 3D Computer Vision Approach to Infer Counts of Occluded Objects
- Learning Graph Variational Autoencoders with Constraints and Structured Priors for Conditional Indoor 3D Scene Generation
- 3D reconstruction
- Surface reconstruction
- Depth estimation
- Frequency-Aware Self-Supervised Monocular Depth Estimation<br>:star:code
- Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention<br>:star:code
- High-Resolution Depth Estimation for 360-degree Panoramas through Perspective and Panoramic Depth Images Registration
- Improving Pixel-Level Contrastive Learning by Leveraging Exogenous Depth Information
- Temporally Consistent Online Depth Estimation in Dynamic Scenes<br>:star:code
- Self-Supervised Monocular Depth Estimation: Solving the Edge-Fattening Problem<br>:star:code
- Self-supervised Monocular Depth Estimation from Thermal Images via Adversarial Multi-spectral Adaptation
- Depth completion
- MVS
- Multi-View Photometric Stereo Revisited
- DELS-MVS: Deep Epipolar Line Search for Multi-View Stereo
- nLMVS-Net: Deep Non-Lambertian Multi-View Stereo<br>:house:project
- 360MVSNet: Deep Multi-view Stereo Network with 360° Images for Indoor Scene Reconstruction
- Improving the Pair Selection and the Model Fusion Steps of Satellite Multi-View Stereo Pipelines
- RGB-D reconstruction
- Stereo Matching
- Neural radiance fields
- 3D localization
- CAD
2.Medical Image
- DBCE: A Saliency Method for Medical Deep Learning Through Anatomically-Consistent Free-Form Deformations
- Representation Recovering for Self-Supervised Pre-training on Medical Images
- Magnification Prior: A Self-Supervised Method for Learning Representations on Breast Cancer Histopathological Images<br>:star:code
- A Morphology Focused Diffusion Probabilistic Model for Synthesis of Histopathology Images
- 3D Medical Image Analysis
- Chest X-ray Classification
- CT Image Fusion
- Medical Image Localization
- Medical Image Segmentation
- Few-shot Medical Image Segmentation with Cycle-resemblance Attention
- Medical Image Segmentation via Cascaded Attention Decoding
- HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation
- Training Auxiliary Prototypical Classifiers for Explainable Anomaly Detection in Medical Image Segmentation
- The Fully Convolutional Transformer for Medical Image Segmentation<br>:star:code
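- Lesion Segmentation
- Medical Image Classification
- Medical Image Super-Resolution
- Cardiovascular Detection
- Remote Heart Rate Estimation
- CT Reconstruction
- MRI
- Melanocyte Detection
1.Others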
- Instance-Dependent Noisy Label Learning via Graphical Modelling
- Color Recommendation for Vector Graphic Documents based on Multi-Palette Representation
- TeST: Test-time Self-Training under Distribution Shift
- Simultaneous Acquisition of High Quality RGB Image and Polarization Information using a Sparse Polarization Sensor<br>:star:code
- Exemplar Guided Deep Neural Network for Spatial Transcriptomics Analysis of Gene Expression Prediction
- AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs<br>:star:code
- Composite Learning for Robust and Effective Dense Predictions
- SAILOR: Scaling Anchors via Insights into Latent Object<br>:star:code
- Modeling the Lighting in Scenes as Style for Auto White-Balance Correction<br>:star:code
- DE-CROP: Data-efficient Certified Robustness for Pretrained Classifiers<br>:house:project
- Anisotropic Multi-Scale Graph Convolutional Network for Dense Shape Correspondence
- ATCON: Attention Consistency for Vision Models<br>:star:code
- LAVA: Label-efficient Visual Learning and Adaptation
- Interpolated SelectionConv for Spherical Images and Surfaces
- Augmentation by Counterfactual Explanation -- Fixing an Overconfident Classifier
- Weakly Supervised Annotations for Multi-modal Greeting Cards Dataset
- Multimodal Vision Transformers with Forced Attention for Behavior Analysis<br>:star:code
- Compact and Optimal Deep Learning with Recurrent Parameter Generators
- Motif Mining: Finding and Summarizing Remixed Image Content
- LINEEX: Data Extraction from Scientific Line Charts<br>:star:code
- Neural Implicit Representations for Physical Parameter Inference From a Single Video<br>:house:project
- Physically Plausible Animation of Human Upper Body from a Single Image
- Partially Calibrated Semi-Generalized Pose From Hybrid Point Correspondences
- Learning How to MIMIC: Using Model Explanations To Guide Deep Learning Training<br>:star:code
- Robust and Efficient Alignment of Calcium Imaging Data through Simultaneous Low Rank and Sparse Decomposition
- Improving Multi-Fidelity Optimization With a Recurring Learning Rate for Hyperparameter Tuning
- What can we Learn by Predicting Accuracy?
- Enabling ISPless Low-Power Computer Vision<br>:star:code
- Jointly Learning Band Selection and Filter Array Design for Hyperspectral Imaging
- LCS: Learning Compressible Subspaces for Efficient, Adaptive, Real-Time Network Compression at Inference Time<br>:star:code
- Self-Attentive Pooling for Efficient Deep Learning<br>:star:code
- Fine-Grained Activities of People Worldwide<br>:house:project
- Relaxing Contrastiveness in Multimodal Representation Learning
- Spike-Based Anytime Perception
- Towards Disturbance-Free Visual Mobile Manipulation<br>:house:project
- SERF: Towards Better Training of Deep Neural Networks Using Log-Softplus ERror Activation Function
- RADIANT: Better rPPG Estimation Using Signal Embeddings and Transformer<br>:star:code
- Dataset Condensation With Distribution Matching
- HyperPosePDF - Hypernetworks Predicting the Probability Distribution on SO(3)
- RANCER: Non-Axis Aligned Anisotropic Certification with Randomized Smoothing
- Match Cutting: Finding Cuts with Smooth Visual Transitions
- SIRA: Relightable Avatars from a Single Image
- Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?<br>:star:code
- Patch-based Privacy Preserving Neural Network for Vision Tasks
- Adaptive Sample Selection for Robust Learning under Label Noise<br>:star:code
- Concept Correlation and Its Effects on Concept-Based Models
- Improving Saliency Models' Predictions of the Next Fixation With Humans' Intrinsic Cost of Gaze Shifts
- Mapping DNN Embedding Manifolds for Network Generalization Prediction
- GEMS: Generating Efficient Meta-Subnets
- Learning incoherent light emission steering from metasurfaces using generative models<br>:star:code
- EfficientPhys: Enabling Simple, Fast and Accurate Camera-Based Cardiac Measurement
- Performance comparison of DVS data spatial downscaling methods using Spiking Neural Networks<br>:star:code
- Encouraging Disentangled and Convex Representation with Controllable Interpolation Regularization
- A Protocol for Evaluating Model Interpretation Methods from Visual Explanations
- Learning Latent Structural Relations with Message Passing Prior
- Bootstrapping the Relationship Between Images and Their Clean and Noisy Labels<br>:star:code
- ImPosing: Implicit Pose Encoding for Efficient Visual Localization
- GEMS: Scene Expansion using Generative Models of Graphs
- SONGs: Self-Organizing Neural Graphs<br>:star:code
- Context-empowered Visual Attention Prediction in Pedestrian Scenarios
- Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs
- BNN
- Image Registration
- Visual Reconstruction