Awesome
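CVPR 2021: Latest Information and Accepted Papers/Code (continuously updated)
Official website: http://cvpr2021.thecvf.com<br> Conference dates: June 19-25, 2021<br> Paper acceptance announcement: February 28, 2021<br>
Accepted paper IDs:<br>
:exclamation::exclamation::exclamation:🌟🌟🌟 All papers accepted to CVPR 2021 have been released, 1660 in total. To download them, reply "CVPR2021" in the backend of the official account 【我爱计算机视觉】.
:exclamation::exclamation::exclamation:🌟🌟🌟 All papers have been roughly categorized; please take a look.
:exclamation::exclamation::exclamation:Note: finer-grained categorized summaries of the papers will be published later on the official account 【OpenCV中文网】; stay tuned.
For categorized summaries of survey papers from past years, see ↘️CV-Surveys (under construction)
Categorized summaries of 2022 papers: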
↘️CVPR-2022-Papers ↘️WACV-2022-Papers
Categorized summaries of 2021 papers:
↘️ICCV-2021-Papers ↘️CVPR-2021-Papers
Categorized summaries of 2020 papers:
↘️CVPR-2020-Papers ↘️ECCV-2020-Papers
Table of Contents
74.Place Recognition
- SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud Based Place Recognition<br>:open_mouth:oral:star:code
73.Object Re-identification
<a name="72"/>72.Gaze Estimation
- Weakly-Supervised Physically Unconstrained Gaze Estimation<br>:open_mouth:oral:star:code
- Gaze target detection
71.Image-to-Image Translation
- High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network<br>:star:code
- CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation<br>:open_mouth:oral:house:project<br>Explainer: CoCosNet v2 unlocks a "premium" version of image translation
- Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation
- Saliency-Guided Image Translation
- Not Just Compete, but Collaborate: Local Image-to-Image Translation via Cooperative Mask Prediction
- Unpaired Image-to-Image Translation via Latent Energy Transport<br>:star:code
- Image translation
70.NLP
<a name="69"/>69.Transfer learning
- Domain transfer
68.Crowd Counting
- Learning To Count Everything<br>:star:code
67.Defect Detection
<a name="66"/>66.Optical Flow Estimation
- UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning<br>Brief explainer: 8
- Learning Optical Flow from a Few Matches<br>:star:code
- Learning optical flow from still images<br>:star:code:house:project
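- AutoFlow: Learning a Better Training Set for Optical Flow<br>:house:project<br>AutoFlow: CVPR 2021 oral. The authors devise a data rendering method designed specifically for training optical flow algorithms; PWC-Net and RAFT models trained with it reach SOTA results. The code and data will be open-sourced.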
- UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning<br>:star:code
65.Style Transfer
- Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes<br>:star:code
- ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows<br>:star:code
- Lipstick ain't enough: Beyond Color Matching for In-the-Wild Makeup Transfer<br>:star:code
- Rethinking and Improving the Robustness of Image Style Transfer<br>:open_mouth:oral<br>Explainer: CVPR 2021 best paper candidate: improving the robustness of image style transfer
- Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer<br>:star:code
- Style-Aware Normalized Loss for Improving Arbitrary Style Transfer<br>:open_mouth:oral
- In the Light of Feature Distributions: Moment Matching for Neural Style Transfer<br>:star:code:house:project
- ArtCoder: An End-to-End Method for Generating Scanning-Robust Stylized QR Codes
- Adaptive Convolutions for Structure-Aware Style Transfer
- Learning To Warp for Style Transfer<br>:star:code
- Single-Shot Freestyle Dance Reenactment
- CT-Net: Complementary Transfering Network for Garment Transfer With Arbitrary Geometric Changes
- DualAST: Dual Style-Learning Networks for Artistic Style Transfer<br>:star:code
- What Can Style Transfer and Paintings Do For Model Robustness?<br>:star:code
- Motion transfer
64.Speech processing
- Can audio-visual integration strengthen robustness under multimodal attacks?<br>:star:code
- Robust Audio-Visual Instance Discrimination
- Stereo audio generation
- Visually Informed Binaural Audio Generation without Binaural Audios<br>:star:code:house:project:tv:video
- Audio-visual separation
- Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation<br>:house:project:tv:video
- Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation<br>:star:code
- VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency<br>:star:code:house:project
- Audio-video parsing
- A-V
- Voice-face association
63.Image Processing
- Image signal processing
- Invertible Image Signal Processing<br>:star:code:house:project
- Spectral reconstruction
- Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB<br>:open_mouth:oral
62.Free-Hand Sketch Recognition
<a name="61"/>61.Algorithms
- Causal reasoning algorithms
- ACRE: Abstract Causal REasoning Beyond Covariation<br>:star:code:house:project
- Abstract spatio-temporal reasoning algorithms
60. SLAM/AR/Robotics
- Tangent Space Backpropagation for 3D Transformation Groups<br>:star:code
- Visual odometry
- Robotics
- Visual Room Rearrangement<br>:open_mouth:oral:house:project:tv:video
- GATSBI: Generative Agent-centric Spatio-temporal Object Interaction<br>:open_mouth:oral:star:code:tv:video
- DexYCB: A Benchmark for Capturing Hand Grasping of Objects<br>:star:code:house:project:tv:video
- ContactOpt: Optimizing Contact to Improve Grasps<br>:star:code<br>Robotic hand grasping
- ManipulaTHOR: A Framework for Visual Object Manipulation<br>:open_mouth:oral:star:code:house:project:tv:video
- Visual navigation
- AR
- Stay Positive: Non-Negative Image Synthesis for Augmented Reality<br>:open_mouth:oral:star:code
- HDR Environment Map Estimation for Real-Time Augmented Reality:tv:video
- NeuralHumanFVV: Real-Time Neural Volumetric Human Performance Rendering Using RGB Cameras
- Virtual try-on
- VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization
- Self-Supervised Collision Handling via Generative 3D Garment Models for Virtual Try-On<br>:house:project:tv:video
- Toward Accurate and Realistic Outfits Visualization with Attention to Details
- ANR: Articulated Neural Rendering for Virtual Avatars<br>:house:project
- Parser-Free Virtual Try-On via Distilling Appearance Flows<br>:star:code
59.Capsule Network (deep learning models)
- Dynamic Slimmable Network<br>:open_mouth:oral:star:code
- Towards Evaluating and Training Verifiably Robust Neural Networks<br>:open_mouth:oral:star:code
- Activate or Not: Learning Customized Activation<br>:star:code<br>Brief explainer: 4<br>Explainer: CVPR 2021 | The adaptive activation function ACON: a new paradigm unifying ReLU and Swish
- DISCO: Dynamic and Invariant Sensitive Channel Obfuscation for Deep Neural Networks<br>:star:code
- Capsule Network
58.Metric Learning/Similarity Learning
- Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales<br>:star:code
- Embedding Transfer with Label Relaxation for Improved Metric Learning
- Noise-resistant Deep Metric Learning with Ranking-based Instance Selection<br>:star:code
- Unsupervised Hyperbolic Metric Learning
- Deep Compositional Metric Learning<br>:star:code
- SLADE: A Self-Training Framework for Distance Metric Learning
- Asymmetric Metric Learning for Knowledge Transfer<br>:star:code
- Relative Order Analysis and Optimization for Unsupervised Deep Metric Learning
57.Sign Language Recognition
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints<br>:star:code
- Read and Attend: Temporal Localisation in Sign Language Videos<br>:house:project
- Fingerspelling Detection in American Sign Language
- Sign language translation
56.Computational Photography (optics, geometry, light-field imaging)
- Deep Gaussian Scale Mixture Prior for Spectral Compressive Imaging<br>:star:code:house:project
- Mask-ToF: Learning Microlens Masks for Flying Pixel Correction in Time-of-Flight Imaging<br>:house:project
- Passive Inter-Photon Imaging<br>:open_mouth:oral
- Shape and Material Capture at Home<br>:star:code:house:project
- Event-based Synthetic Aperture Imaging with a Hybrid Network<br>Sharing session
- High-Speed Image Reconstruction Through Short-Term Plasticity for Spiking Cameras
- Leveraging the Availability of Two Cameras for Illuminant Estimation
- Camera pose
- Fusing the Old with the New: Learning Relative Camera Pose with Geometry-Guided Uncertainty<br>:open_mouth:oral
- Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis<br>:star:code
- Uncertainty-Aware Camera Pose Estimation From Points and Lines<br>:star:code:house:project
- Neural Reprojection Error: Merging Feature Learning and Camera Pose Estimation<br>:star:code:house:project
- Wide-Baseline Relative Camera Pose Estimation with Directional Learning
- Camera Pose Matters: Improving Depth Prediction by Mitigating Pose Distribution Bias<br>:open_mouth:oral
- Indoor lighting estimation
- Phase retrieval algorithms
55.Graph Matching
<a name="54"/>54.Emotion Perception/Emotion Prediction
- Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality<br>:house:project
- Human Multimodal Emotion Recognition
53.Dataset
- Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts<br>:sunflower:dataset
- Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark<br>:house:project
- Benchmarking Representation Learning for Natural World Image Collections<br>:sunflower:dataset
- SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data<br>:open_mouth:oral:sunflower:dataset
- Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback<br>:sunflower:dataset
- Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges<br>:sunflower:dataset:tv:video<br>
- Portrait photo retouching dataset
- PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency<br>:star:code
- Outdoor scenes
- OpenRooms: An Open Framework for Photorealistic Indoor Scene Datasets<br>:open_mouth:oral:sunflower:dataset:house:project
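- Visual art
- ArtEmis: Affective Language for Visual Art<br>:house:project (the project page includes everything: dataset, code, videos, etc.)
- UGC video quality assessment
- Indoor localization dataset
- Dataset (human intent research)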
- Intentonomy: A Dataset and Study Towards Human Intent Understanding<br>:open_mouth:oral:star:code
- Face recognition dataset
- Visual attribute prediction dataset
- Learning To Predict Visual Attributes in the Wild<br>:sunflower:dataset
- Dataset (Object-Centric Videos)
- Video scene parsing
- VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild<br>:sunflower:dataset:house:project
- Dataset (sign language)
52. Image Generation/Synthesis
- Spatially-Adaptive Pixelwise Networks for Fast Image Translation<br>:house:project<br>Uses hypernetworks and implicit functions to achieve very fast image-to-image translation (18x faster than the baseline)
- Image Generators with Conditionally-Independent Pixel Synthesis<br>:open_mouth:oral:star:code
- Im2Vec: Synthesizing Vector Graphics without Vector Supervision<br>:open_mouth:oral:star:code:house:project
- Context-Aware Layout to Image Generation with Enhanced Object Appearance<br>:star:code
- StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis
- Learning Semantic Person Image Generation by Region-Adaptive Normalization<br>:star:code
- MUST-GAN: Multi-Level Statistics Transfer for Self-Driven Person Image Generation
- Combining Semantic Guidance and Deep Reinforcement Learning for Generating Human Level Paintings<br>:star:code
- Diverse Semantic Image Synthesis via Probability Distribution Modeling<br>:star:code
- Mol2Image: Improved Conditional Flow Models for Molecule to Image Synthesis
51.Contrastive Learning
- AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries<br>:star:code<br>Explainer: CVPR 2021 accepted paper: AdCo, adversarial contrastive learning
- LAFEAT: Piercing Through Adversarial Defenses with Latent Features<br>:open_mouth:oral:star:code
- Distilling Audio-Visual Knowledge by Compositional Contrastive Learning<br>:star:code
- Mining Better Samples for Contrastive Learning of Temporal Correspondence
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels<br>:star:code
50.OCR
- Fourier Contour Embedding for Arbitrary-Shaped Text Detection
- Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter
- Sequence-to-Sequence Contrastive Learning for Text Recognition
- A Multiplexed Network for End-to-End, Multilingual OCR
- TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption
- Scene text detection
- What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels<br>:star:code
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition<br>:open_mouth:oral:star:code
- MOST: A Multi-Oriented Scene Text Detector with Localization Refinement
- Scene Text Retrieval via Joint Text Detection and Similarity Learning<br>:star:code
- TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text<br>:house:project
- Progressive Contour Regression for Arbitrary-Shape Scene Text Detection<br>:star:code
- Dictionary-guided Scene Text Recognition<br>:star:code
- Primitive Representation Learning for Scene Text Recognition
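- Handwritten text recognition
- Text segmentation
- Video text detection
- Text detection
49.Adversarial Learning
- Simulating Unknown Target Models for Query-Efficient Black-box Attacks<br>:star:code<br>Black-box adversarial attack
- Delving into Data: Effectively Substitute Training for Black-box Attack<br>A black-box attack method based on efficiently training substitute models<br>Explainer: 8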
- LiBRe: A Practical Bayesian Approach to Adversarial Detection<br>:star:code
- Invisible Perturbations: Physical Adversarial Examples Exploiting the Rolling Shutter Effect
- Enhancing the Transferability of Adversarial Attacks Through Variance Tuning<br>:star:code
- Natural Adversarial Examples<br>:star:code
- SurFree: A Fast Surrogate-Free Black-Box Attack<br>:star:code
- Regularizing Neural Networks via Adversarial Model Perturbation<br>:star:code
- Adversarial Imaging Pipelines
- MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation
- Universal Spectral Adversarial Attacks for Deformable Shapes
- Adversarial Robustness Across Representation Spaces<br>:star:code
- Protecting Intellectual Property of Generative Adversarial Networks From Ambiguity Attacks<br>:star:code
- Dual Attention Suppression Attack: Generate Adversarial Camouflage in Physical World<br>:open_mouth:oral:star:code
- Learning Compositional Representation for 4D Captures with Neural ODE
- Adversarial attacks
48.Image Representation
- Learning Continuous Image Representation with Local Implicit Image Function<br>:open_mouth:oral:star:code:house:project:tv:video
47.Vision-Language
- Kaleido-BERT: Vision-Language Pre-training on Fashion Domain<br>
- Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning<br>:star:code
- UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training
- VinVL: Revisiting Visual Representations in Vision-Language Models<br>:star:code
- Connecting What To Say With Where To Look by Modeling Human Attention Traces<br>:star:code:house:project
- Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval
- VLN BERT: A Recurrent Vision-and-Language BERT for Navigation<br>:open_mouth:oral:star:code
- Transitional Adaptation of Pretrained Models for Visual Storytelling
- Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation<br>:star:code
- Causal Attention for Vision-Language Tasks<br>:star:code
46.Human-Object Interaction
- Learning Asynchronous and Sparse Human-Object Interaction in Videos
- QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information<br>:star:code
- Reformulating HOI Detection as Adaptive Set Prediction<br>:star:code
- Detecting Human-Object Interaction via Fabricated Compositional Learning<br>:star:code
- Affordance Transfer Learning for Human-Object Interaction Detection<br>:star:code
- Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection<br>:star:code
- Hierarchical Video Prediction Using Relational Layouts for Human-Object Interactions
45.Camera Localization
- Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments<br>:open_mouth:oral
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose<br>:star:code
- Learning Camera Localization via Dense Scene Matching<br>:star:code
- Privacy Preserving Localization and Mapping From Uncalibrated Cameras
- Visual localization
- VS-Net: Voting with Segmentation for Visual Localization<br>:star:code:house:project:tv:video
44. Image/video Captioning
- Scan2Cap: Context-aware Dense Captioning in RGB-D Scans<br>:star:code:house:project:tv:video
- VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs<br>A multimodal framework for video captioning, video question answering, and video dialog tasks
- Open-book Video Captioning with Retrieve-Copy-Generate Network
- Image captioning
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles<br>:star:code
- Towards Accurate Text-based Image Captioning with Content Diversity Exploration<br>:star:code
- Image Change Captioning by Learning From an Auxiliary Task
- FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation<br>:star:code
- Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning
- Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship
43.Active Learning
<a name="42"/>42.Scene Flow Estimation
- Scene flow estimation
- Self-Supervised Multi-Frame Monocular Scene Flow<br>:star:code
- HCRF-Flow: Scene Flow from Point Clouds with Continuous High-order CRFs and Position-aware Flow Embedding
- Self-Point-Flow: Self-Supervised Scene Flow Estimation from Point Clouds with Optimal Transport and Random Walk<br>:open_mouth:oral
- FlowStep3D: Model Unrolling for Self-Supervised Scene Flow Estimation<br>:star:code
- RAFT-3D: Scene Flow Using Rigid-Motion Embeddings
41. Representation Learning (images + captions)
- VirTex: Learning Visual Representations from Textual Annotations<br>:star:code
- Exploring Simple Siamese Representation Learning<br>:open_mouth:oral:star:code
- Representation Learning via Global Temporal Alignment and Cycle-Consistency<br>:star:code
- SelfDoc: Self-Supervised Document Representation Learning
- CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models
- Unsupervised Hyperbolic Representation Learning via Message Passing Auto-Encoders<br>:star:code
- Boosting Video Representation Learning With Multi-Faceted Integration
40.Superpixel
<a name="39"/>39.Debiasing
- Fair Attribute Classification through Latent Space De-biasing<br>:star:code:house:project<br>
- Reducing Domain Gap by Reducing Style Bias<br>:star:code
38.Class-Incremental learning
- IIRC: Incremental Implicitly-Refined Classification<br>:house:project<br>
- Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning<br>:star:code
- DER: Dynamically Expandable Representation for Class Incremental Learning<br>:star:code
- Distilling Causal Effect of Data in Class-Incremental Learning<br>:star:code
- Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning
37. Continual Learning
- Training Networks in Null Space for Continual Learning<br>:open_mouth:oral:star:code
- Efficient Feature Transformations for Discriminative and Generative Continual Learning
- Rainbow Memory: Continual Learning with a Memory of Diverse Samples
- Rectification-based Knowledge Retention for Continual Learning
- Layerwise Optimization by Gradient Decomposition for Continual Learning
- Continual Learning via Bit-Level Information Preserving<br>:star:code
- Training Networks in Null Space of Feature Covariance for Continual Learning<br>:open_mouth:oral
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-Supervised Continual Learning
36.Action Detection and Recognition
- Coarse-Fine Networks for Temporal Activity Detection in Videos<br>:star:code
- 3D CNNs with Adaptive Temporal Feature Resolutions<br>:star:code:house:project
- Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack<br>:tv:video
- BASAR: Black-box Attack on Skeletal Action Recognition<br>:house:project:tv:video<br>Explainer: A new direction in adversarial attack and defense: action recognition algorithms are easily attacked!
- TDN: Temporal Difference Networks for Efficient Action Recognition<br>:star:code
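- ACTION-Net: Multipath Excitation for Action Recognition<br>:star:code<br>Explainer: CVPR 2021 | ACTION, a plug-and-play hybrid-attention module for action recognition<br>Explainer: CVPR 2021 | ACTION, a plug-and-play hybrid-attention module for strong temporal dependencies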
- No frame left behind: Full Video Action Recognition
- Recognizing Actions in Videos from Unseen Viewpoints
- Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories
- Motion Representations for Articulated Animation<br>:star:code:house:project:tv:video
- Home Action Genome: Cooperative Compositional Action Understanding
- Anticipating human actions by correlating past with the future with Jaccard similarity measures
- Graph-Based High-Order Relation Modeling for Long-Term Action Recognition
- Representing Videos As Discriminative Sub-Graphs for Action Recognition
- Three Birds with One Stone: Multi-Task Temporal Action Detection via Recycling Temporal Annotations
- Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization<br>:star:code
- Spatio-temporal Contrastive Domain Adaptation for Action Recognition
- Deep Analysis of CNN-Based Spatio-Temporal Representations for Action Recognition<br>:star:code
- Semi-Supervised Action Recognition With Temporal Contrastive Learning<br>:star:code:house:project:tv:video
- WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos
- BABEL: Bodies, Action and Behavior With English Labels<br>:star:code:house:project:tv:video
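- Action localization
- Temporal action localization
- Modeling Multi-Label Action Dependencies for Temporal Action Localization<br>:open_mouth:oral:star:code<br>Proposes an attention-based network architecture that learns action dependencies in videos, targeting multi-label temporal action localization.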
- The Blessings of Unlabeled Background in Untrimmed Videos<br>:star:code
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization<br>Explainer: 10
- CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
- Action Unit Memory Network for Weakly Supervised Temporal Action Localization
- Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization<br>:star:code
- Uncertainty Guided Collaborative Training for Weakly Supervised Temporal Action Detection
- Video Actor Segmentation
- Action segmentation
- Learning To Segment Actions From Visual and Language Instructions via Differentiable Weak Sequence Alignment
- Temporal action segmentation
- Unsupervised action segmentation
- Supervised action segmentation
- Anchor-Constrained Viterbi for Set-Supervised Action Segmentation
- Video action segmentation
- Global2Local: Efficient Structure Search for Video Action Segmentation<br>:star:code<br>Explainer: 19
- Video Moment Localization
- Spatio-temporal event localization
- Multi-Shot Temporal Event Localization: A Benchmark<br>:star:code:house:project
35.Image Clustering
- Improving Unsupervised Image Clustering With Robust Learning<br>:star:code<br>
- Jigsaw Clustering for Unsupervised Visual Representation Learning<br>:open_mouth:oral:star:code
- COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction<br>:star:code
34.Image Classification
- Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels<br>:star:code<br>
- Differentiable Patch Selection for Image Recognition<br>:star:code
- Achieving Robustness in Classification Using Optimal Transport With Hinge Regularization
- Are Labels Always Necessary for Classifier Accuracy Evaluation?
- Fine-grained classification
- Fine-grained Angular Contrastive Learning with Coarse Labels<br>:open_mouth:oral<br>:star:code<br>Uses self-supervision for fine-grained classification with coarse labels. Coarse labels are easier and cheaper to obtain than fine-grained labels, which usually require domain experts.
- Graph-based High-Order Relation Discovery for Fine-grained Recognition<br>Explainer: 20
- Few-Shot Classification with Feature Map Reconstruction Networks<br>:star:code:tv:video
- A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification<br>:open_mouth:oral
- GLAVNet: Global-Local Audio-Visual Cues for Fine-Grained Material Recognition
- Learning Deep Classifiers Consistent With Fine-Grained Novelty Detection
- Your "Flamingo" is My "Bird": Fine-Grained, or Not<br>:open_mouth:oral:star:code
- Discrimination-Aware Mechanism for Fine-Grained Representation Learning
- Neural Prototype Trees for Interpretable Fine-grained Image Recognition<br>:star:code
- Image classification
- MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition<br>:star:code
- PML: Progressive Margin Loss for Long-tailed Age Classification<br>
- Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification<br>:house:project
- Capsule Network is Not More Robust than Convolutional Network
- Model-Contrastive Federated Learning<br>:star:code
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets<br>:open_mouth:oral:star:code:house:project
- Correlated Input-Dependent Label Noise in Large-Scale Image Classification<br>:open_mouth:oral:star:code
- Towards Robust Classification Model by Counterfactual and Invariant Data Generation<br>:star:code
- Dual-Stream Multiple Instance Learning Network for Whole Slide Image Classification With Self-Supervised Contrastive Learning<br>:star:code
- Generative Classifiers as a Basis for Trustworthy Image Classification<br>:star:code
- Synthesize-It-Classifier: Learning a Generative Classifier Through Recurrent Self-Analysis
- Background Splitting: Finding Rare Classes in a Sea of Background
- Permuted AdaIN: Reducing the Bias Towards Global Statistics in Image Classification
- Self-Supervised Wasserstein Pseudo-Labeling for Semi-Supervised Image Classification
- DAP: Detection-Aware Pre-training with Weak Supervision
- Semi-supervised image classification
- Visual recognition
- Fair Feature Distillation for Visual Recognition
- Long-tailed visual recognition
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition<br>:star:code
- Improving Calibration for Long-Tailed Recognition<br>:star:code
- Adversarial Robustness under Long-Tailed Distribution<br>:open_mouth:oral:star:code
- Disentangling Label Distribution for Long-Tailed Visual Recognition<br>:star:code
- Long-Tailed Multi-Label Visual Recognition by Collaborative Training on Uniform and Re-Balanced Samplings
- Object classification
- Nearest Neighbor Matching
- OOD detection
33. 6D Pose Estimation
- FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation<br>:open_mouth:oral:star:code<br>
- GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation<br>:star:code
- FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism<br>:open_mouth:oral:star:code
- Wide-Depth-Range 6D Object Pose Estimation in Space<br>:star:code
- DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency
- Single-view robot pose and joint angle estimation via render & compare<br>:open_mouth:oral:star:code:house:project:tv:video
- Keypoint-Graph-Driven Learning Framework for Object Pose Estimation
- StablePose: Learning 6D Object Poses From Geometrically Stable Patches
32.View Synthesis
- ID-Unet: Iterative Soft and Hard Deformation for View Synthesis<br>:open_mouth:oral:star:code
- NeX: Real-time View Synthesis with Neural Basis Expansion<br>:open_mouth:oral:house:project:tv:video
- Layout-Guided Novel View Synthesis from a Single Indoor Panorama<br>:star:code
- Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes<br>:house:project
- Stable View Synthesis<br>:star:code
31.Open-Set Recognition
- Counterfactual Zero-Shot and Open-Set Visual Recognition<br>:star:code<br>
- Few-shot Open-set Recognition by Transformation Consistency<br>
- Learning Placeholders for Open-Set Recognition<br>:open_mouth:oral
30.Neural rendering
- DeRF: Decomposed Radiance Fields<br>:house:project<br>
- D-NeRF: Neural Radiance Fields for Dynamic Scenes<br>:house:project<br>
- Neural Lumigraph Rendering<br>:sunflower:dataset:house:project:tv:video<br>Stanford University
- AutoInt: Automatic Integration for Fast Neural Volume Rendering<br>:open_mouth:oral:house:project:tv:video<br>Stanford University
- pixelNeRF: Neural Radiance Fields from One or Few Images<br>:star:code:house:project:tv:video
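- IBRNet: Learning Multi-View Image-Based Rendering<br>:house:project<br>Note: some researchers have commented that pixelNeRF and IBRNet are built on similar ideas, but IBRNet appears to be more mature.
- Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans<br>:star:code:house:project:tv:video<br>Neural Body, developed by researchers from Zhejiang University and other institutions, takes multi-view video as input and outputs a 3D human body and novel views.
- NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis<br>:house:project:tv:video<br>Generates a complete 3D scene under arbitrary lighting conditions from a set of input images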
- Self-Supervised Visibility Learning for Novel View Synthesis<br>:star:code
- STaR: Self-Supervised Tracking and Reconstruction of Rigid Objects in Motion With Neural Rendering<br>:star:code:house:project:tv:video
- Pulsar: Efficient Sphere-Based Neural Rendering
- Learning Compositional Radiance Fields of Dynamic Human Heads<br>:open_mouth:oral:house:project
- NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
- Neural Geometric Level of Detail: Real-Time Rendering With Implicit 3D Shapes<br>:star:code:house:project
- Space-Time Neural Irradiance Fields for Free-Viewpoint Video<br>:house:project:tv:video
- Neural Scene Graphs for Dynamic Scenes<br>:open_mouth:oral:house:project:tv:video
- NeuTex: Neural Texture Mapping for Volumetric Neural Rendering
29.Human Pose Estimation
- Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration<br>:star:code<br>
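- Monocular Real-time Full Body Capture with Inter-part Correlations<br>:tv:video<br>Human motion capture is a key technology for action effects in film. High-quality capture often requires special equipment; if an ordinary RGB camera could be used instead, anyone could become a visual effects artist. This video shows results from a CVPR 2021 paper by researchers at Tsinghua University, the Max Planck Institute, and other institutions, which performs motion capture with a monocular RGB camera.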
- Behavior-Driven Synthesis of Human Dynamics<br>:star:code:house:project
- Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation<br>:star:code<br>Brief explainer: 2
- Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression<br>:star:code
- SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks<br>:open_mouth:oral:house:project
- On Self-Contact and Human Pose<br>:house:project
- Lite-HRNet: A Lightweight High-Resolution Network<br>:star:code<br>Explainer: Lite-HRNet: a lightweight HRNet with greatly reduced FLOPs
- Deep Dual Consecutive Network for Human Pose Estimation<br>:star:code
- 3D Human Action Representation Learning via Cross-View Consistency Pursuit<br>:star:code
- Body Meshes as Points<br>:star:code
- Unsupervised Human Pose Estimation through Transforming Shape Templates<br>:house:project
- When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks
- Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking
- 3D hand reconstruction
- Human motion transfer
- Human Volumetric Capture
- 3D human pose estimation
- CanonPose: Self-supervised Monocular 3D Human Pose Estimation in the Wild<br>:star:code
- Context Modeling in 3D Human Pose Estimation: A Unified Perspective
- PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers<br>:star:code:tv:video<br>Improves 3D human pose estimation by removing location-dependent perspective effects.<br>
- Graph Stacked Hourglass Networks for 3D Human Pose Estimation
- Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors<br>:open_mouth:oral:house:project
- SimPoE: Simulated Character Control for 3D Human Pose Estimation<br>:open_mouth:oral:house:project
- Reconstructing 3D Human Pose by Watching Humans in the Mirror<br>:open_mouth:oral:star:code:house:project
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo<br>:star:code
- PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation<br>:open_mouth:oral:star:code
- AGORA: Avatars in Geography Optimized for Regression Analysis<br>:house:project
- Intelligent Carpet: Inferring 3D Human Pose From Tactile Signals
- HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation<br>:star:code
- Neural Descent for Visual 3D Human Pose and Shape
- Probabilistic 3D Human Shape and Pose Estimation from Multiple Unconstrained Images in the Wild
- Animal pose estimation
- From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation<br>:open_mouth:oral:star:code:tv:video
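- 3D human mesh registration
- Multi-person human reconstruction
- 3D human motion
- We are More than Our Joints: Predicting how 3D Bodies Move<br>:house:project:tv:video<br>Talk
- Human motion capture
- Multi-person pose estimation
- FCPose: Fully Convolutional Multi-Person Pose Estimation with Dynamic Instance-Aware Convolutions<br>:star:code<br>FCPose is an end-to-end trainable human pose estimator that needs neither ROIs nor grouping and achieves better accuracy and speed. On COCO, the real-time version of FCPose with a DLA-34 backbone runs 4.5 times faster than Mask R-CNN (ResNet-101) (41.67 FPS vs. 9.26 FPS) while also improving performance. Compared with recent top-down and bottom-up methods, FCPose also achieves a better speed/accuracy trade-off.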
- Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks<br>:star:code
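- Hand-object interaction pose estimation
- Human keypoint detection
- 3D human shape
- Human animation (pose transfer)
- Automatic 3D fitness training system based on human sensing
- 3D human motion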
- Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes<br>:star:code:house:project:tv:video
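- 3D human reconstruction
- Gesture-to-gesture translation
- 3D human motion prediction
- Gesture recognition
- 3D human mesh reconstruction
- Micro-gesture emotion analysis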
- Dense Human Correspondences
28.Dense Prediction
- Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks<br>:star:code<br>The proposed D3Net outperforms SOTA networks on semantic segmentation and music source separation.<br>
- Dense Contrastive Learning for Self-Supervised Visual Pre-Training<br>:open_mouth:oral:star:code
- Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning<br>:star:code
27.Semantic Line Detection
<a name="26"/>26.Video Processing
- Skip-Convolutions for Efficient Video Processing
- VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples<br>:star:code
- Learning by Aligning Videos in Time
- Hierarchical Motion Understanding via Motion Programs<br>:house:project:tv:video
- Stochastic Image-to-Video Synthesis using cINNs<br>:star:code:house:project
- Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions<br>:house:project
- Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
- Learning To Reconstruct High Speed and High Dynamic Range Videos From Events
- Video summarization
- Learning Discriminative Prototypes with Dynamic Time Warping<br>:star:code
- Learning Triadic Belief Dynamics in Nonverbal Communication from Videos<br>:open_mouth:oral:star:code
- Video codecs
- MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing<br>:star:code
- FVC: A New Framework towards Deep Video Compression in Feature Space<br>:open_mouth:oral
- Memory-Efficient Network for Large-Scale Video Compressive Sensing<br>:star:code
- Deep Learning in Latent Space for Video Prediction and Compression<br>:star:code
- Video frame interpolation
- FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation<br>:star:code:house:project<br>
- Deep Animation Video Interpolation in the Wild<br>:star:code
- Time Lens: Event-based Video Frame Interpolation<br>:star:code:house:project:sunflower:dataset:tv:video
- Video-and-language learning
- Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling<br>:open_mouth:oral:star:code<br>
- Video prediction
- Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction<br>:house:project:tv:video
- Learning Semantic-Aware Dynamics for Video Prediction
- Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning<br>:star:code<br>Commentary: Introducing a memory module to break the performance bottleneck of video prediction with long-range dependencies
- Learning Goals from Failure<br>:star:code:house:project
- MotionRNN: A Flexible Model for Video Prediction With Spacetime-Varying Motions
- Video understanding
- Context-aware Biaffine Localizing Network for Temporal Sentence Grounding<br>:star:code
- Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos<br>:house:project
- Visual Semantic Role Labeling for Video Understanding<br>:house:project
- Temporal Query Networks for Fine-grained Video Understanding<br>:open_mouth:oral:house:project
- Shot Contrastive Self-Supervised Learning for Scene Boundary Detection
- FrameExit: Conditional Early Exiting for Efficient Video Recognition<br>:open_mouth:oral
- Towards Long-Form Video Understanding
- Video rescaling
- Video anomaly detection
- Video sound source localization
- Localizing Visual Sounds the Hard Way<br>:star:code:house:project
- Video analysis
- Video generation
- Playable Video Generation<br>:open_mouth:oral:star:code:house:project:tv:video
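- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing<br>:open_mouth:oral:star:code:house:project:tv:video<br>Commentary: What upends video compression may not be a new compression algorithm but a GAN: NVIDIA's new algorithm cuts bandwidth by up to 90%<br>New research from NVIDIA reconstructs video calls from facial keypoints plus a GAN, saving 90% of the bandwidth compared with traditional H.264. The code is not open source, but NVIDIA's GAN framework is.
- Video viewpoint switching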
- Action Selection Learning
- Video description
- Video classification
- Video captioning
- Video Grounding
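- Video restoration
- Progressive Temporal Feature Alignment Network for Video Inpainting<br>:star:code<br>The authors propose the Progressive Temporal Feature Alignment Network, which uses optical flow to progressively enrich the current frame's features with features extracted from neighboring frames. It corrects spatial misalignment in the temporal feature propagation stage and greatly improves the visual quality and temporal consistency of inpainted videos. It achieves state-of-the-art performance among existing deep learning methods on the DAVIS and FVI datasets.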
- Restore From Restored: Video Restoration With Pseudo Clean Video<br>:star:code
- Video deblurring
- Video denoising
- Video quality assessment
- Video action counting
- Repetitive Activity Counting by Sight and Sound<br>:star:code:tv:video
- Video stabilization
- 3D Video Stabilization With Depth Estimation by CNN-Based Optimization<br>:tv:video
- Real-Time Selfie Video Stabilization<br>:star:code
- Video deraining
- video looping technique
- Animating Pictures with Eulerian Motion Fields<br>:house:project:tv:video
- Video recognition
- Action recognition
- Video representation learning
- Video coding
25.3D Vision
- A Deep Emulator for Secondary Motion of 3D Characters<br>:open_mouth:oral:house:project
- Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction<br>:open_mouth:oral:house:project:tv:video<br>
- Deep Implicit Templates for 3D Shape Representation<br>:open_mouth:oral:star:code:house:project:tv:video<br>CVPR 2021 Oral: Tsinghua researchers propose Deep Implicit Templates, greatly extending the capabilities of DIF<br>
- SMPLicit: Topology-aware Generative Model for Clothed People<br>:house:project
- Picasso: A CUDA-based Library for Deep Learning over 3D Meshes<br>:star:code
- Semi-supervised Synthesis of High-Resolution Editable Textures for 3D Humans
- RGB-D Local Implicit Function for Depth Completion of Transparent Objects<br>:house:project
- Deep Two-View Structure-from-Motion Revisited
- Deformed Implicit Field: Modeling 3D Shapes with Learned Dense Correspondence
- S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling
- Deep Polarization Imaging for 3D Shape and SVBRDF Acquisition<br>:open_mouth:oral:house:project:tv:video
- Learning Feature Aggregation for Deep 3D Morphable Models<br>:star:code
- Plan2Scene: Converting Floorplans to 3D Scenes<br>:star:code:house:project:tv:video
- View Generalization for Single Image Textured 3D Models<br>:house:project:tv:video
- Mirror3D: Depth Refinement for Mirror Surfaces<br>:star:code:house:project
- Learning To Recover 3D Scene Shape From a Single Image<br>:star:code
- Normal Integration via Inverse Plane Fitting With Minimum Point-to-Plane Distance<br>:star:code
- Shelf-Supervised Mesh Prediction in the Wild<br>:house:project
- Unsupervised Learning of 3D Object Categories From Videos in the Wild
- DeepVideoMVS: Multi-View Stereo on Video With Recurrent Spatio-Temporal Fusion<br>:star:code:tv:video
- NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go
- Learning Monocular 3D Reconstruction of Articulated Categories From Motion<br>:star:code:house:project
- Deep Active Surface Models
- Neural Splines: Fitting 3D Surfaces With Infinitely-Wide Neural Networks<br>:open_mouth:oral:star:code
- Learning View Selection for 3D Scenes
- StruMonoNet: Structure-Aware Monocular 3D Prediction
- Physically-Aware Generative Network for 3D Shape Modeling
- Hybrid Rotation Averaging: A Fast and Robust Rotation Averaging Approach
- DeepSurfels: Learning Online Appearance Fusion<br>:star:code:house:project:tv:video
- Depth estimation
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss
- Beyond Image to Depth: Improving Depth Prediction using Echoes<br>:star:code:house:project
- Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos<br>:open_mouth:oral:star:code:house:project:tv:video
- LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering<br>:open_mouth:oral:star:code:house:project
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation<br>:open_mouth:oral
- Depth Completion with Twin Surface Extrapolation at Occlusion Boundaries<br>:star:code
- Self-supervised Learning of Depth Inference for Multi-view Stereo<br>:star:code
- SMD-Nets: Stereo Mixture Density Networks<br>:star:code
- The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth<br>:star:code
- Single Image Depth Estimation using Wavelet Decomposition<br>:star:code
- Differentiable Diffusion for Dense Depth Estimation from Multi-view Images<br>:star:code:house:project:tv:video
- SliceNet: Deep Dense Depth Estimation From a Single Indoor Panorama Using a Slice-Based Representation
- AdaBins: Depth Estimation Using Adaptive Bins
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion<br>:star:code
- S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks
- Robust Consistent Video Depth Estimation<br>:house:project:tv:video
- Monocular depth estimation
- Monocular Depth Estimation via Listwise Ranking Using the Plackett-Luce Model
- Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging<br>:star:code:house:project:tv:video
- 3D Packing for Self-Supervised Monocular Depth Estimation<br>:open_mouth:oral:star:code
- Depth prediction
- 3D reconstruction
- Deep Implicit Moving Least-Squares Functions for 3D Reconstruction<br>:star:code
- Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction<br>:house:project
- Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction<br>:star:code
- Fostering Generalization in Single-view 3D Reconstruction by Learning a Hierarchy of Local and Global Shape Priors
- NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video<br>:open_mouth:oral:star:code:house:project
- Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction<br>:star:code:house:project:tv:video
- CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo<br>:open_mouth:oral
- SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements<br>:house:project:tv:video
- LASR: Learning Articulated Shape Reconstruction from a Monocular Video<br>:house:project
- Sketch2Model: View-Aware 3D Modeling from Single Free-Hand Sketches
- Birds of a Feather: Capturing Avian Shape Models from Images<br>:house:project:tv:video
- Multi-view 3D Reconstruction of a Texture-less Smooth Surface of Unknown Generic Reflectance<br>:star:code
- Generative PointNet: Deep Energy-Based Learning on Unordered Point Sets for 3D Generation, Reconstruction and Classification<br>:star:code:house:project
- From Points to Multi-Object 3D Reconstruction
- DI-Fusion: Online Implicit 3D Reconstruction With Deep Priors<br>:star:code
- D2IM-Net: Learning Detail Disentangled Implicit Fields From Single Images
- Residential Floor Plan Recognition and Reconstruction
- Indoor Panorama Planar 3D Reconstruction via Divide and Conquer
- Single-View 3D Object Reconstruction from Shape Priors in Memory
- Deep Optimized Priors for 3D Shape Modeling and Reconstruction
- MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera<br>:star:code:house:project:tv:video
- PluckerNet: Learn to Register 3D Line Reconstructions
- 3D mesh reconstruction
- Semantic scene completion
- 3D keypoints
- KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control<br>:open_mouth:oral:star:code:house:project:tv:video
- 3D shape completion
- Unsupervised 3D Shape Completion through GAN Inversion<br>:star:code:house:project
- 3D shape fitting
- 3D compression
- Stereo matching
- Depth completion
- 3D meshes
- DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes With Biharmonic Coordinates<br>:open_mouth:oral:star:code
- 3D shape
- depth map fusion
- Mesh reconstruction
- 3D morphable model
24.Reinforcement Learning
- Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph<br>:star:code:house:project
- Unsupervised Visual Attention and Invariance for Reinforcement Learning
- Unsupervised Learning for Robust Fitting: A Reinforcement Learning Approach<br>:star:code
- Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning<br>:star:code
23.Autonomous Driving
- Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition<br>:star:code<br>Winning solution of the ECCV 2020 Facebook Mapillary Visual Place Recognition Challenge
- AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles
- Self-Supervised Pillar Motion Learning for Autonomous Driving<br>:star:code
- Learning by Watching
- Binary TTC: A Temporal Geofence for Autonomous Navigation<br>:star:code:tv:video
- GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving<br>:open_mouth:oral:house:project:tv:video
- Lane prediction
- LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents
- Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction<br>:open_mouth:oral
- Focus on Local: Detecting Lane Marker from Bottom Up via Key Point
- Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection<br>:star:code
- Trajectory prediction
- SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction<br>:star:code
- Pedestrian and Ego-Vehicle Trajectory Prediction From Monocular Camera<br>:star:code
- Trajectory Prediction With Latent Belief Energy-Based Model<br>:star:code
- Shared Cross-Modal Trajectory Prediction for Autonomous Driving<br>:open_mouth:oral
- Human trajectory prediction
- Traffic scenes
- Vehicle re-identification
- HD map reconstruction
- HD map generation
- Vehicle detection
- Vehicle pose estimation
22.Medical Imaging
- 3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management<br>Using multimodal CT imaging alone can replace JHMI's current comprehensive multimodal diagnostic workflow, which requires tumor chemistry tests and DNA sequencing in addition to medical imaging; diagnostic accuracy is comparable and quantitative diagnostic precision is better.<br>
- Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies<br>:star:code<br>An important capability for intelligent PACS that assists physicians in reading tumor imaging studies.<br>
- Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization<br>A CT-based system for fractures/osteoporosis.<br>
- Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning<br>:star:code
- DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images<br>:open_mouth:oral:star:code
- Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles
- XProtoNet: Diagnosis in Chest Radiography with Global and Local Explanations
- Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation
- Medical image segmentation
- FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space<br>:star:code
- DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets<br>:star:code:sunflower:dataset
- DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation<br>:open_mouth:oral
- DARCNN: Domain Adaptive Region-based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images<br>
- Every Annotation Counts: Multi-label Deep Supervision for Medical Image Segmentation
- Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling<br>:star:code
- clDice - A Novel Topology-Preserving Loss Function for Tubular Structure Segmentation
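- Medical image synthesis
- Surgical skill assessment
- Minimally invasive surgery
- Radiology report generation
- MR image reconstruction
- Keypoint detection and tracking
- X-ray detection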
21.Transformer
- Transformer Interpretability Beyond Attention Visualization<br>:star:code<br>
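- MIST: Multiple Instance Spatial Transformer Network<br>:star:code<br>MIST attempts differentiable top-K selection from heatmaps (there are now some results on natural images too). It enables detection and classification without any localization supervision (and that is not the only thing it can do!).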
- Variational Transformer Networks for Layout Generation
- Lesion-Aware Transformers for Diabetic Retinopathy Grading
- Gaussian Context Transformer
- Few-shot action recognition
- Object detection
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers<br>:open_mouth:oral:star:code
- One-shot object detection
- Image processing
- Pre-Trained Image Processing Transformer<br>:star:code:star:gitee
- Human-computer interaction
- Image segmentation
- Semantic segmentation
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers<br>:star:code:house:project<br>Commentary: 16<br>Commentary: Transformers applied to semantic segmentation; formerly ranked first on the ADE20K leaderboard (44.42% mIoU)
- Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation
- Video instance segmentation
- VisTR: End-to-End Video Instance Segmentation with Transformers<br>:open_mouth:oral:star:code
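- Panoptic segmentation
- Semantic segmentation
- Tracking
- Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking<br>:open_mouth:oral:star:code<br>More: Transformers gain further momentum, setting new highs in tracking by bridging independent frames and passing temporal information across frames (CVPR 2021 Oral)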
- Transformer Tracking<br>:star:code
- Motion prediction
- Multimodal Motion Prediction with Stacked Transformers<br>:star:code:house:project:tv:video
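- Self-attention
- Scaling Local Self-Attention For Parameter Efficient Visual Backbones<br>:open_mouth:oral<br>Commentary: A self-attention model that surpasses convolution: Google and UC Berkeley propose HaloNet
- Retrieval
- Feature matching
- Pose recognition
- Autonomous driving
- Visual recognition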
- Video Hashing
- Vision-and-language navigation
- Human pose and mesh reconstruction
- Line segment detection
- Line Segment Detection Using Transformers Without Edges<br>:open_mouth:oral:star:code
- Image classification
- Temporal language grounding
- Scene layout
- Facial action unit detection
- High-resolution image synthesis
- Taming Transformers for High-Resolution Image Synthesis<br>:open_mouth:oral:star:code
20.Person Re-Identification
- Meta Batch-Instance Normalization for Generalizable Person Re-Identification<br>:star:code
- Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification<br>:star:code
- Intra-Inter Camera Similarity for Unsupervised Person Re-Identification<br>:star:code<br>Paper publicly available
- Anchor-Free Person Search<br>:star:code
- Lifelong Person Re-Identification via Adaptive Knowledge Accumulation<br>:star:code
- Group-aware Label Transfer for Domain Adaptive Person Re-identification<br>:star:code|code
- Neural Feature Search for RGB-Infrared Person Re-Identification
- Combined Depth Space based Architecture Search For Person Re-identification
- Unsupervised Multi-Source Domain Adaptation for Person Re-Identification<br>:open_mouth:oral
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos<br>:open_mouth:oral
- BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video Person Re-Identification<br>:star:code
- Generalizable Person Re-identification with Relevance-aware Mixture of Experts
- Person30K: A Dual-Meta Generalization Network for Person Re-Identification
- Prototype-Guided Saliency Feature Learning for Person Search
- UnrealPerson: An Adaptive Pipeline Towards Costless Person Re-Identification<br>:star:code
- Learning to Generalize Unseen Domains via Memory-based Multi-Source Meta-Learning for Person Re-Identification<br>:star:code
- Farewell to Mutual Information: Variational Distillation for Cross-Modal Person Re-Identification
- Learning 3D Shape Feature for Texture-Insensitive Person Re-Identification
- Partial Person Re-Identification With Part-Part Correspondence Learning
- Coarse-To-Fine Person Re-Identification With Auxiliary-Domain Classification and Second-Order Information Bottleneck
- Unsupervised Pre-Training for Person Re-Identification
- Joint Generative and Contrastive Learning for Unsupervised Person Re-Identification<br>:star:code:tv:video
- Wide-Baseline Multi-Camera Calibration Using Person Re-Identification
- Watching You: Global-Guided Reciprocal Learning for Video-Based Person Re-Identification<br>:star:code
- Discover Cross-Modality Nuances for Visible-Infrared Person Re-Identification
- Person Re-identification using Heterogeneous Local Graph Attention Networks
- Fine-Grained Shape-Appearance Mutual Learning for Cloth-Changing Person Re-Identification
- Crowd counting
- Transformer-based
- Pedestrian detection
- Pedestrian tracking
- Tracking Pedestrian Heads in Dense Crowd<br>:star:code:house:project
- Gait recognition
19.Quantization/Pruning/Knowledge Distillation/Model Compression (including model scaling and optimization)
- Learning Student Networks in the Wild<br>:star:code
- ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network<br>:star:code<br>
- RepVGG: Making VGG-style ConvNets Great Again<br>:star:code<br>
- Coordinate Attention for Efficient Mobile Network Design<br>:star:code
- Pruning
- Manifold Regularized Dynamic Network Pruning
- Neural Response Interpretation through the Lens of Critical Pathways<br>:star:code|code
- Riggable 3D Face Reconstruction via In-Network Optimization<br>:star:code
- Towards Compact CNNs via Collaborative Compression
- BCNet: Searching for Network Width with Bilaterally Coupled Network
- The Lottery Ticket Hypothesis for Object Recognition
- Network Pruning via Performance Maximization
- Convolutional Neural Network Pruning With Structural Redundancy Reduction
- Model scaling
- Fast and Accurate Model Scaling<br>:star:code
- Quantization
- Learnable Companding Quantization for Accurate Low-bit Neural Networks
- Diversifying Sample Generation for Accurate Data-Free Quantization
- Zero-shot Adversarial Quantization<br>:open_mouth:oral:star:code
- Network Quantization with Element-wise Gradient Scaling<br>:star:code:house:project
- Automated Log-Scale Quantization for Low-Cost Deep Neural Networks
- Optimal Quantization Using Scaled Codebook
- QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks
- Distribution-Aware Adaptive Multi-Bit Quantization
- Scalability vs. Utility: Do We Have to Sacrifice One for the Other in Data Importance Quantification?<br>:star:code
- Permute, Quantize, and Fine-Tune: Efficient Compression of Neural Networks<br>:star:code
- Knowledge distillation
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation<br>:star:code
- Complementary Relation Contrastive Distillation
- Distilling Knowledge via Knowledge Review<br>:star:code
- Learning From the Master: Distilling Cross-Modal Advanced Knowledge for Lip Reading
- Multi-Scale Aligned Distillation for Low-Resolution Detection<br>:star:code
- Tree-Like Decision Distillation
- Revisiting Knowledge Distillation: An Inheritance and Exploration Framework
- Wasserstein Contrastive Representation Distillation
- Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation
- EvDistill: Asynchronous Events To End-Task Learning via Bidirectional Reconstruction-Guided Cross-Modal Knowledge Distillation
- Invertible neural networks
- Model compression
- Model optimization
18.Aerial/Drones/Satellite/RS Image
- UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
- Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark<br>:star:code
- SIPSA-Net: Shift-Invariant Pan Sharpening with Moving Object Alignment for Satellite Imagery<br>:star:code
- Aerial image segmentation
- Aerial image detection
- Drone detection
- Multi-view satellite photogrammetry
17.Super-Resolution
- Data-Free Knowledge Distillation For Image Super-Resolution<br>:star:code
- AdderSR: Towards Energy Efficient Image Super-Resolution<br>:star:code<br>
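- Cross-MPI: Cross-scale Stereo for Image Super-Resolution using Multiplane Images<br>:house:project:tv:video<br>CVPR 2021: Cross-MPI is an end-to-end network that uses the underlying scene structure as a cue and achieves high-fidelity super-resolution even across a large resolution gap (x8)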
- ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic<br>:star:code
- Robust Reference-based Super-Resolution via C²-Matching<br>:star:code
- GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution<br>:open_mouth:oral:house:project<br>Commentary: CVPR 2021 Oral | GLEAN: large-factor image super-resolution based on a generative latent bank
- BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond<br>:star:code:house:project
- Temporal Modulation Network for Controllable Space-Time Video Super-Resolution<br>:star:code|Author homepage<br>A video super-resolution network based on controllable interpolation of spatio-temporal features<br>Commentary: 18
- Unsupervised Degradation Representation Learning for Blind Super-Resolution<br>:star:code
- SRWarp: Generalized Image Super-Resolution under Arbitrary Transformation<br>:star:code
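- MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Resolution<br>:star:code<br>The authors propose a new method for RefSR, the MASA network, which contains two newly designed modules. The Match & Extraction module greatly reduces computational cost. The Spatial Adaptation module learns the distribution difference between the LR and Ref images and remaps the distribution of the reference features to that of the LR features in a spatially adaptive way, so that different reference images are handled more robustly.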
- Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
- Exploring Sparsity in Image Super-Resolution for Efficient Inference<br>:star:code
- Neural Side-by-Side: Predicting Human Preferences for No-Reference Super-Resolution Evaluation<br>:star:code
- Tackling the Ill-Posedness of Super-Resolution Through Adaptive Target Generation<br>:star:code
- LAU-Net: Latitude Adaptive Upscaling Network for Omnidirectional Image Super-Resolution<br>:star:code
- Image Super-Resolution With Non-Local Sparse Attention
- Unsupervised Real-World Image Super Resolution via Domain-Distance Aware Training<br>:star:code
- Single Pair Cross-Modality Super Resolution
- End-to-End Learning for Joint Image Demosaicing, Denoising and Super-Resolution
- Learning Scene Structure Guidance via Cross-Task Knowledge Transfer for Single Depth Super-Resolution
- Deep Burst Super-Resolution
- Light Field Super-Resolution With Zero-Shot Learning
- Fast Bayesian Uncertainty Estimation and Reduction of Batch Normalized Single Image Super-Resolution Network<br>:star:code:house:project
- Practical Single-Image Super-Resolution Using Look-Up Table<br>:star:code
- Interpreting Super-Resolution Networks With Local Attribution Maps
- Scene Text Telescope: Text-Focused Scene Image Super-Resolution
- Blind super-resolution
- Video super-resolution
16.Visual Question Answering
- Counterfactual VQA: A Cause-Effect Look at Language Bias<br>:star:code
- AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning<br>:house:project:tv:video
- Domain-robust VQA with diverse datasets and methods but no target labels<br>:house:project
- Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules<br>:star:code
- Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing
- Roses Are Red, Violets Are Blue... but Should VQA Expect Them To?<br>:sunflower:dataset
- Predicting Human Scanpaths in Visual Question Answering
- Separating Skills and Concepts for Novel Visual Question Answering<br>:star:code
- How Transferable Are Reasoning Patterns in VQA?<br>:star:code:house:project:tv:video
- Explicit Knowledge Incorporation for Visual Reasoning
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
- Image-Text Matching
- Video question answering
- TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events<br>:star:code
- Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions<br>:star:code
- Look Before You Speak: Visually Contextualized Utterances
- Traffic-related VQA
15.GAN
- Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing<br>:star:code
- Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs<br>:star:code
- Efficient Conditional GAN Transfer with Knowledge Propagation across Classes<br>:star:code
- Anycost GANs for Interactive Image Synthesis and Editing<br>:star:code:house:project:tv:video<br>Anycost GAN adapts to a wide range of hardware and latency requirements and enables interactive image editing
- TediGAN: Text-Guided Diverse Image Generation and Manipulation<br>:star:code:house:project:tv:video
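- Generative Hierarchical Features from Synthesizing Images<br>:open_mouth:oral:star:code:house:project<br>The authors argue that a pre-trained GAN generator can be treated as a learned multi-scale loss. Training with it yields highly competitive hierarchical and disentangled visual features, termed Generative Hierarchical Features (GH-Feat). They further show that GH-Feat benefits not only generative tasks but, more importantly, discriminative tasks, including face verification, landmark detection, layout prediction, transfer learning, style mixing, and image editing.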
- Teachers Do More Than Teach: Compressing Image-to-Image Models<br>:star:code
- PISE: Person Image Synthesis and Editing with Decoupled GAN<br>:star:code
- LOHO: Latent Optimization of Hairstyles via Orthogonalization<br>:star:code
- HumanGAN: A Generative Model of Humans Images
- HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms<br>:star:code
- DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network<br>:star:code
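- pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis<br>:open_mouth:oral:house:project:tv:video<br>More: Stanford researchers propose periodic implicit generative adversarial networks (π-GAN or pi-GAN) for high-quality 3D-aware image synthesis<br>Stanford University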
- ReMix: Towards Image-to-Image Translation with Limited Data
- Unsupervised Disentanglement of Linear-Encoded Facial Semantics
- Content-Aware GAN Compression
- Regularizing Generative Adversarial Networks under Limited Data<br>:star:code:house:project
- Where and What? Examining Interpretable Disentangled Representations<br>:star:code
- Few-shot Image Generation via Cross-domain Correspondence<br>:house:project
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort<br>:open_mouth:oral
- Surrogate Gradient Field for Latent Space Manipulation
- StylePeople: A Generative Model of Fullbody Human Avatars<br>:house:project
- Ensembling with Deep Generative Views<br>:star:code:house:project
- Continuous Face Aging via Self-estimated Residual Age Embedding
- Blur, Noise, and Compression Robust Generative Adversarial Networks
- Adaptive Weighted Discriminator for Training Generative Adversarial Networks<br>:star:code
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort<br>:house:project
- House-GAN++: Generative Adversarial Layout Refinement Network towards Intelligent Computational Agent for Professional Architects<br>:star:code:house:project
- Roof-GAN: Learning To Generate Roof Geometry and Relations for Residential Houses<br>:star:code
- Exploring Adversarial Fake Images on Face Manifold
- Hyper-LifelongGAN: Scalable Lifelong Learning for Image Conditioned Generation
- GANmut: Learning Interpretable Conditional Space for Gamut of Emotions<br>:star:code
- StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
- Positional Encoding As Spatial Inductive Bias in GANs<br>:star:code:house:project
- Partition-Guided GANs
- 3D Shape Generation With Grid-Based Implicit Functions
- Linear Semantics in Generative Adversarial Networks<br>:star:code:house:project:tv:video
- Cross-Modal Contrastive Learning for Text-to-Image Generation
- Lifting 2D StyleGAN for 3D-Aware Face Generation
- Unsupervised Learning of Depth and Depth-of-Field Effect From Natural Images With Aperture Rendering Generative Adversarial Networks<br>:open_mouth:oral:house:project
- Training Generative Adversarial Networks in One Stage<br>:star:code
- Self-Supervised Video GANs: Learning for Appearance Consistency and Motion Coherency
- Closed-Form Factorization of Latent Semantics in GANs<br>:open_mouth:oral:star:code:house:project:tv:video
- Discovering Interpretable Latent Space Directions of GANs Beyond Binary Attributes<br>:star:code
- Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement
- L2M-GAN: Learning To Manipulate Latent Space Semantics for Facial Attribute Editing
- Spatially-invariant Style-codes Controlled Makeup Transfer<br>:star:code
- Unsupervised image synthesis
- Image-to-image translation
- Memory-guided Unsupervised Image-to-image Translation
- Image-to-image Translation via Hierarchical Style Disentanglement<br>:open_mouth:oral:star:code<br>Achieves hierarchical style disentanglement in image-to-image translation
- CoMoGAN: continuous model-guided image-to-image translation<br>:open_mouth:oral:star:code
- Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation<br>:star:code:house:project<br>
- Image Editing
- Face Image Synthesis
14.Few-Shot/Zero-Shot Learning, Domain Generalization/Adaptation
- Few-Shot Learning
- Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning<br>
- Learning Dynamic Alignment via Meta-filter for Few-shot Learning<br>Author homepage<br>Explainer: 17
- ECKPN: Explicit Class Knowledge Propagation Network for Transductive Few-shot Learning
- Mutual CRF-GNN for Few-Shot Learning
- Rethinking Class Relations: Absolute-Relative Supervised and Unsupervised Few-Shot Learning
- Pareto Self-Supervised Training for Few-Shot Learning
- Reinforced Attention for Few-Shot Learning and Beyond
- Using Shape to Categorize: Low-Shot Learning with an Explicit Shape Bias<br>:house:project
- Prototype Completion With Primitive Knowledge for Few-Shot Learning<br>:star:code
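- Domain Generalization
- FSDR: Frequency Space Domain Randomization for Domain Generalization<br>Inspired by how JPEG converts a spatial image into multiple frequency components (FCs), the authors propose Frequency Space Domain Randomization (FSDR), which randomizes images in frequency space by keeping the domain-invariant FCs (DIFs) and randomizing only the domain-variant FCs (DVFs).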
- Domain Generalization via Inference-time Label-Preserving Target Projections
- Adaptive Methods for Real-World Domain Generalization<br>:open_mouth:oral
- Progressive Domain Expansion Network for Single Domain Generalization<br>:star:code
- A Fourier-based Framework for Domain Generalization<br>:open_mouth:oral:star:code
- Adversarially Adaptive Normalization for Single Domain Generalization
- Generalization on Unseen Domains via Inference-Time Label-Preserving Target Projections
- Uncertainty-Guided Model Generalization to Unseen Domains<br>:star:code
- Open Domain Generalization with Domain-Augmented Meta-Learning
- Zero-Shot Learning
- Goal-Oriented Gaze Estimation for Zero-Shot Learning<br>:star:code
- Contrastive Embedding for Generalized Zero-Shot Learning<br>:star:code
- Open World Compositional Zero-Shot Learning
- Learning Graph Embeddings for Compositional Zero-Shot Learning<br>:star:code
- Hardness Sampling for Self-Training Based Transductive Zero-Shot Learning<br>:star:code
- Domain Adaptation
- Dynamic Transfer for Multi-Source Domain Adaptation<br>:star:code
- Transferable Semantic Augmentation for Domain Adaptation<br>:star:code
- MetaAlign: Coordinating Domain Alignment and Classification for Unsupervised Domain Adaptation
- DRANet: Disentangling Representation and Adaptation Networks for Unsupervised Cross-Domain Adaptation
- Dynamic Domain Adaptation for Efficient Inference<br>:star:code
- Prototypical Cross-domain Self-supervised Learning for Few-shot Unsupervised Domain Adaptation<br>:house:project
- Domain Consensus Clustering for Universal Domain Adaptation<br>:star:code
- Divergence Optimization for Noisy Universal Domain Adaptation
- Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation<br>:star:code:house:project
- Unsupervised Multi-source Domain Adaptation Without Access to Source Data
- Domain Adaptation with Auxiliary Target Domain-Oriented Classifier<br>:star:code
- Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation
- Generalized Domain Adaptation<br>:star:code
- Multi-Target Domain Adaptation with Collaborative Consistency Learning<br>:star:code
- Wasserstein Barycenter for Multi-Source Domain Adaptation
- Conditional Bures Metric for Domain Adaptation
- Partial Feature Selection and Alignment for Multi-Source Domain Adaptation
- Transferable Query Selection for Active Domain Adaptation<br>:star:code
- Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation<br>:star:code
- Unsupervised Domain Adaptation
- Cross-Domain Gradient Discrepancy Minimization for Unsupervised Domain Adaptation<br>:star:code
- Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation<br>:star:code
- Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation<br>:star:code
- PixMatch: Unsupervised Domain Adaptation via Pixelwise Consistency Training
- FixBi: Bridging Domain Spaces for Unsupervised Domain Adaptation<br>:star:code
- Dynamic Weighted Learning for Unsupervised Domain Adaptation
13.Image/Video Retrieval
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
- Convolutional Hough Matching<br>:open_mouth:oral:house:project
- T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
- M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
- VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval<br>:star:code
- Image Retrieval
- Probabilistic Embeddings for Cross-Modal Retrieval<br>:star:code
- QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval
- More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval<br>:star:code
- StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval
- Prototype-supervised Adversarial Network for Targeted Attack of Deep Hashing<br>:star:code
- CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback<br>:star:code
- Efficient Object Embedding for Spliced Image Retrieval
- You See What I Want You To See: Exploring Targeted Black-Box Transferability Attack for Hash-Based Image Retrieval Systems
- Video Retrieval
- On Semantic Similarity in Video Retrieval<br>:star:code:house:project:tv:video
- Visual Search
- Cross-Modal Retrieval
- Retrieval (joint learning of 3D shape retrieval and deformation)
12.Image Quality Assessment
- Image Restoration
- Multi-Stage Progressive Image Restoration<br>:star:code<br>
- See through Gradients: Image Batch Recovery via GradInversion
- Controllable Image Restoration for Under-Display Camera in Smartphones
- Zero-Shot Single Image Restoration Through Controlled Perturbation of Koschmieder's Model<br>:house:project
- High-Quality Stereo Image Restoration From Double Refraction
- Image Restoration for Under-Display Camera
- Manga Restoration
- Shadow Removal
- Deblurring
- DeFMO: Deblurring and Shape Recovery of Fast Moving Objects<br>:star:code:tv:video<br>
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring
- Explore Image Deblurring via Blur Kernel Space
- Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes<br>:star:code
- Learning a Non-Blind Deblurring Network for Night Blurry Images
- Ultra-High-Definition Image Dehazing via Multi-Guided Bilateral Learning
- Test-Time Fast Adaptation for Dynamic Scene Deblurring via Meta-Auxiliary Learning
- Blind Deblurring for Saturated Images
- Explore Image Deblurring via Encoded Blur Kernel Space<br>:star:code
- Learning Spatially-Variant MAP Models for Non-Blind Image Deblurring
- Reflection Removal
- Dehazing
- Learning to Restore Hazy Video: A New Real-World Dataset and A New Method<br>Explainer: 9
- Contrastive Learning for Compact Single Image Dehazing<br>:star:code<br>Explainer: 5
- PSD: Principled Synthetic-to-Real Dehazing Guided by Physical Priors<br>:star:code
- Denoising
- Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images<br>:star:code<br>Explainer: CVPR 2021 | Neighbor2Neighbor: training any denoising network using only noisy images
- NBNet: Noise Basis Learning for Image Denoising with Subspace Projection<br>:star:code<br>Brief notes: 9
- Invertible Denoising Network: A Light Solution for Real Noise Removal<br>:star:code
- FBI-Denoiser: Fast Blind Image Denoiser for Poisson-Gaussian Noise<br>:star:code
- Recorrupted-to-Recorrupted: Unsupervised Deep Learning for Image Denoising
- The Neural Tangent Link Between CNN Denoisers and Non-Local Filters<br>:star:code
- Deep Denoising of Flash and No-Flash Pairs for Photography in Low-Light Environments<br>:house:project
- Adaptive Consistency Prior Based Deep Network for Image Denoising
- EventZoom: Learning To Denoise and Super Resolve Neuromorphic Events<br>:house:project:tv:video
- Extreme Low-Light Environment-Driven Image Denoising Over Permanently Shadowed Lunar Regions With a Physical Noise Model
- Guided Integrated Gradients: An Adaptive Path Method for Removing Noise
- Effective Snapshot Compressive-Spectral Imaging via Deep Denoising and Total Variation Priors<br>:star:code
- Deep Convolutional Dictionary Learning for Image Denoising<br>:star:code
- Learning An Explicit Weighting Scheme for Adapting Complex HSI Noise
- Pseudo 3D Auto-Correlation Network for Real Image Denoising
- Deraining
- Semi-Supervised Video Deraining with Dynamic Rain Generator
- Closing the Loop: Joint Rain Generation and Removal via Disentangled Image Translation
- Robust Representation Learning With Feedback for Single Image Deraining<br>:star:code
- Multi-Decoding Deraining Network and Quasi-Sparsity Based Training
- Image De-Raining via Continual Learning
- From Rain Generation to Rain Removal<br>:star:code
- Memory Oriented Transfer Learning for Semi-Supervised Image Deraining
- Removing Raindrops and Rain Streaks in One Go
- Rain Intensity Control
- Exposure Correction
- Image Inpainting
- Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE<br>:star:code
- TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations<br>:house:project
- Image Inpainting with External-internal Learning and Monochromic Bottleneck<br>:star:code
- PD-GAN: Probabilistic Diverse GAN for Image Inpainting<br>:star:code
- Image Inpainting Guided by Coherence Priors of Semantics and Textures
- Image Editing
- Image Compression
- Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton
- Slimmable Compressive Autoencoders for Practical Neural Image Compression<br>:star:code
- Checkerboard Context Model for Efficient Learned Image Compression
- Learning Scalable ℓ∞-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression<br>:star:code
- Deep Homography for Efficient Stereo Image Compression<br>:star:code<br>Sharing session
- iVPF: Numerical Invertible Volume Preserving Flow for Efficient Lossless Compression
- What's in the Image? Explorable Decoding of Compressed Images
- Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation
- De-rendering
- Image Artifact Removal
- Image Alignment
- Image Harmonization
- Image Enhancement
- CAMERAS: Enhanced Resolution and Sanity Preserving Class Activation Mapping for Image Saliency<br>:star:code
- Retinex-Inspired Unrolling With Cooperative Prior Architecture Search for Low-Light Image Enhancement<br>:star:code:house:project
- Debiased Subjective Assessment of Real-World Image Enhancement
- Learning Temporal Consistency for Low Light Video Enhancement From Single Images<br>:star:code
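- Image Stabilization
- Defocus Deblurring
- De-occlusion
- Nighttime Visibility Enhancement
- Image Completion
- Image Steganography
- Image Blending
- Image Rectification
- Defocus Blur Detection (detecting regions blurred by defocus)
- Scene Restoration (different weather and imaging conditions)
- Image Cropping
- Image Stitching
- Depth Estimation + Image Inpainting
- Image Extrapolation
- Image Editing
- Image Quality
- HDR Deghosting
- Image Brightening
- Image Degradation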
- DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows<br>:open_mouth:oral:star:code
- Specular Highlight Detection and Removal
11. Face
- Towards High Fidelity Face Relighting with Realistic Shadows<br>:star:code
- IronMask: Modular Architecture for Protecting Deep Face Template
- Everything's Talkin': Pareidolia Face Reenactment<br>:star:code:house:project:tv:video
- Face Recognition
- A 3D GAN for Improved Large-pose Facial Recognition<br>
- When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework<br>:open_mouth:oral:star:code<br>
- MagFace: A Universal Representation for Face Recognition and Quality Assessment<br>:open_mouth:oral:star:code<br>Face recognition plus quality assessment; an oral presentation this year. Code is being prepared.
- WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition<br>:house:project
- ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis<br>:open_mouth:oral:house:project:tv:video
- Spherical Confidence Learning for Face Recognition<br>:open_mouth:oral:star:code<br>Face recognition via confidence learning on a hyperspherical manifold
- CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
- Cross-Domain Similarity Learning for Face Recognition in Unseen Domains
- HLA-Face: Joint High-Low Adaptation for Low Light Face Detection<br>:house:project
- FACESEC: A Fine-grained Robustness Evaluation Framework for Face Recognition Systems
- Dynamic Class Queue for Large Scale Face Recognition In the Wild<br>:star:code
- Consistent Instance False Positive Improves Fairness in Face Recognition<br>:star:code<br>Explainer: 7
- VirFace: Enhancing Face Recognition via Unlabeled Shallow Data
- Variational Prototype Learning for Deep Face Recognition
- Mitigating Face Recognition Bias via Group Adaptive Classifier<br>:star:code
- Pseudo Facial Generation With Extreme Poses for Face Recognition
- Improving Transferability of Adversarial Patches on Face Recognition With Generative Models
- Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition
- Deepfake/Face Forgery Detection
- Multi-attentional Deepfake Detection<br>:star:code
- Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection
- MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes
- Face Forensics in the Wild<br>:open_mouth:oral:star:code
- Improving the Efficiency and Robustness of Deepfakes Detection through Precise Geometric Features<br>:star:code
- Lips Don't Lie: A Generalisable and Robust Approach To Face Forgery Detection
- Representative Forgery Mining for Fake Face Detection<br>:star:code
- Exploring Adversarial Fake Images on Face Manifold
- Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain
- Generalizing Face Forgery Detection With High-Frequency Features
- Face Forgery Detection by 3D Decomposition
- Face Image Quality Assessment
- SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance<br>:star:code<br>Explainer: 6
- 3D Face Reconstruction
- 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction<br>:star:code:house:project
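- Riggable 3D Face Reconstruction via In-Network Optimization<br>:star:code<br>This paper tackles riggable 3D face reconstruction from monocular RGB images with an end-to-end trainable network that embeds in-network optimization. It achieves state-of-the-art reconstruction accuracy with reasonable robustness and generalization, and can be used in standard face-rig applications such as retargeting.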
- Pixel Codec Avatars<br>:open_mouth:oral
- Inverting Generative Adversarial Renderer for Face Reconstruction<br>:star:code<br>Explainer: SenseTime and CUHK achieve a breakthrough in monocular face reconstruction with a generative-network-based renderer, giving more accurate geometry and more realistic rendering
- Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection<br>:open_mouth:oral:star:code
- Monocular Reconstruction of Neural Face Reflectance Fields<br>:house:project
- Learning Complete 3D Morphable Face Models From Images and Videos<br>:house:project
- Facial Expression Recognition
- Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition<br>
- Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition
- Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition
- Learning a Facial Expression Embedding Disentangled from Identity
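- Face Clustering
- Face Editing
- Face Tracking
- Wide-Angle Face Correction
- Face Anti-Spoofing (liveness detection)
- Audio-Driven Emotional Face Synthesis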
- Audio-Driven Emotional Video Portraits<br>:star:code:house:project
- Face Swapping
- Face Restoration
- FaceInpainter: High Fidelity Face Adaptation to Heterogeneous Domains<br>Sharing session
- Progressive Semantic-Aware Style Transformation for Blind Face Restoration<br>:star:code
- GAN Prior Embedded Network for Blind Face Restoration in the Wild<br>:star:code
- Towards Real-World Blind Face Restoration With Generative Facial Prior<br>:star:code
- Face Animation
- 3D Talking Faces
- Face Verification
- Face Texture Completion
- OSTeC: One-Shot Texture Completion<br>:star:code
- Face Alignment
- Face Aging
- Facial Action Unit Detection
- Face Reconstruction
- Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction<br>:open_mouth:oral:star:code:house:project:tv:video
- Face Attribute Recognition
- Face Blurring
- Face Generation
10.Neural Architecture Search
- HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens<br>:star:code
- ReNAS: Relativistic Evaluation of Neural Architecture Search<br>
- OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
- Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search<br>Center for Machine Learning Research, Institute for Artificial Intelligence, Peking University
- Contrastive Neural Architecture Search with Neural Architecture Comparators<br>:star:code
- Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator<br>:star:code
- Prioritized Architecture Sampling with Monto-Carlo Tree Search<br>:star:code
- One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking<br>:star:code
- NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization<br>:house:project
- Neural Architecture Search with Random Labels<br>Brief notes: 1<br>Explainer: neural architecture search with random labels
- Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search<br>:star:code
- ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
- TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search<br>:star:code:sunflower:dataset
- HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers<br>:open_mouth:oral:star:code
- DOTS: Decoupling Operation and Topology in Differentiable Architecture Search<br>:star:code
- NPAS: A Compiler-Aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration
- DSRNA: Differentiable Search of Robust Neural Architectures
- Rethinking Graph Neural Architecture Search From Message-Passing<br>:star:code
- FP-NAS: Fast Probabilistic Neural Architecture Search
- FBNetV3: Joint Architecture-Recipe Search Using Predictor Pretraining
- AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling<br>:star:code
9.Object Tracking
- Rotation Equivariant Siamese Networks for Tracking<br>:star:code
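- LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search<br>:star:code<br>LightTrack: a lightweight tracking network found by neural architecture search. It is more accurate than SiamRPN++ and Ocean while running 12x faster, with 1/13 of the parameters and 1/38 of the FLOPs. Code will be open-sourced.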
- Track, Check, Repeat: An EM Approach to Unsupervised Tracking<br>:house:project:tv:video
- Learning To Filter: Siamese Relation Network for Robust Tracking<br>:star:code
- Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation<br>:star:code
- CapsuleRRT: Relationships-Aware Regression Tracking via Capsules
- Siamese Natural Language Tracker: Tracking by Natural Language Descriptions With Siamese Trackers<br>:star:code
- MeanShift++: Extremely Fast Mode-Seeking With Applications to Segmentation and Object Tracking
- Learning To Fuse Asymmetric Feature Maps in Siamese Trackers<br>:star:code
- Multi-Object Tracking
- Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking<br>:star:code
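- Track to Detect and Segment: An Online Multi-Object Tracker<br>:star:code:house:project:tv:video<br>TraDeS: a CVPR 2021 multi-object tracking algorithm that improves on current online joint detection-and-tracking methods by using tracking cues to assist detection, yielding large accuracy gains on multiple datasets. The authors are from the State University of New York. Code is open source.
- Multiple Object Tracking with Correlation Learning<br>Proposes CorrTracker, a unified correlation tracker that densely models the correlations between targets and propagates information through them. It achieves state-of-the-art 76.5% MOTA and 73.6% IDF1 on MOT17.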
- Learning a Proposal Classifier for Multiple Object Tracking<br>:star:code
- Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking<br>:star:code
- Online Multiple Object Tracking with Cross-Task Synergy<br>:star:code
- SiamMOT: Siamese Multi-Object Tracking<br>:star:code
- DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking<br>:star:code
- Quasi-Dense Similarity Learning for Multiple Object Tracking<br>:open_mouth:oral:star:code
- Discriminative Appearance Modeling With Multi-Track Pooling for Real-Time Multi-Object Tracking<br>:star:code
- GMOT-40: A Benchmark for Generic Multiple Object Tracking<br>:star:code
- Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy<br>:star:code
- Improving Multiple Object Tracking With Single Object Tracking
- 3D Multi-Object Tracking
- Visual Object Tracking
- Single Object Tracking
- Visual Tracking
- STMTrack: Template-Free Visual Tracking With Space-Time Memory Networks<br>:star:code
- Pose Tracking
- Pedestrian Tracking
8.Image Segmentation
- Information-Theoretic Segmentation by Inpainting Error Maximization<br>
- Capturing Omni-Range Context for Omnidirectional Segmentation<br>:star:code
- Boundary IoU: Improving Object-Centric Image Segmentation Evaluation<br>:star:code:house:project
- Locate then Segment: A Strong Pipeline for Referring Image Segmentation
- InverseForm: A Loss Function for Structured Boundary-Aware Segmentation<br>:open_mouth:oral
- Omnimatte: Associating Objects and Their Effects in Video<br>:open_mouth:oral:house:project
- Unsupervised Part Segmentation through Disentangling Appearance and Shape
- Encoder Fusion Network With Co-Attention Embedding for Referring Image Segmentation
- Bottom-Up Shift and Reasoning for Referring Image Segmentation<br>:star:code
- DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation
- ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Image Segmentation
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping
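- Instance Segmentation
- Zero-Shot Instance Segmentation<br>:star:code<br>AInnovation proposes zero-shot instance segmentation for the first time, helping address the data bottleneck in industrial scenarios
- Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers<br>:star:code<br>Explainer: bilayer instance segmentation greatly improves occlusion handling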
- Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency
- FAPIS: A Few-shot Anchor-free Part-based Instance Segmenter
- Weakly-supervised Instance Segmentation via Class-agnostic Learning with Salient Images<br>:star:code
- Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation<br>:star:code
- RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features<br>:star:code
- A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation
- Incremental Few-Shot Instance Segmentation<br>:star:code
- Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segmentation<br>:open_mouth:oral:star:code
- Point Cloud Instance Segmentation Using Probabilistic Embeddings<br>:house:project
- DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
- Robust Instance Segmentation Through Reasoning About Multi-Object Occlusion<br>:star:code
- Deeply Shape-Guided Cascade for Instance Segmentation<br>:star:code
- ColorRL: Reinforced Coloring for End-to-End Instance Segmentation<br>:star:code
- Unsupervised Discovery of the Long-Tail in Instance Segmentation Using Hierarchical Self-Supervision
- DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution<br>:star:code
- BoxInst: High-Performance Instance Segmentation With Box Annotations<br>:star:code
- Panoptic Segmentation
- 4D Panoptic LiDAR Segmentation<br>:star:code
- Cross-View Regularization for Domain Adaptive Panoptic Segmentation<br>:open_mouth:oral
- Part-aware Panoptic Segmentation<br>:star:code
- Toward Joint Thing-and-Stuff Mining for Weakly Supervised Panoptic Segmentation<br>Explainer: 15
- Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation<br>:star:code
- Fully Convolutional Networks for Panoptic Segmentation<br>:open_mouth:oral:star:code<br>Brief notes: 11
- Panoptic Segmentation Forecasting<br>:star:code
- Exemplar-Based Open-Set Panoptic Segmentation Network<br>:star:code:house:project
- Hierarchical Lovasz Embeddings for Proposal-free Panoptic Segmentation
- VIP-DeepLab: Learning Visual Perception With Depth-Aware Video Panoptic Segmentation<br>:star:code
- Learning To Associate Every Segment for Video Panoptic Segmentation
- LiDAR-Based Panoptic Segmentation via Dynamic Shifting Network<br>:star:code
- LPSNet: A Lightweight Solution for Fast Panoptic Segmentation
- Improving Panoptic Segmentation at All Scales
- Semantic Segmentation
- PLOP: Learning without Forgetting for Continual Semantic Segmentation<br>:star:code
- Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
- Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
- Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing<br>:open_mouth:oral:star:code
- Learning Statistical Texture for Semantic Segmentation
- MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation<br>:star:code
- Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations<br>:star:code:tv:video
- Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion<br>:star:code
- Rethinking BiSeNet For Real-time Semantic Segmentation<br>:star:code
- BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation<br>:star:code
- Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation<br>:star:code
- Cross-Dataset Collaborative Learning for Semantic Segmentation
- Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
- Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation<br>:star:code
- Source-Free Domain Adaptation for Semantic Segmentation
- PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering<br>:star:code
- Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation<br>:house:project
- Progressive Semantic Segmentation<br>:star:code
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization<br>:house:project
- DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation<br>:open_mouth:oral:star:code<br>Achieves state-of-the-art performance on nighttime semantic segmentation; code is open source.
- Self-supervised Augmentation Consistency for Adapting Semantic Segmentation<br>:star:code
- Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation<br>:star:code
- Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision<br>:star:code
- Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation
- Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency
- Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation
- Uncertainty Reduction for Model Adaptation in Semantic Segmentation<br>:star:code
- HyperSeg: Patch-Wise Hypernetwork for Real-Time Semantic Segmentation<br>:star:code:house:project
- Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds
- Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation<br>:star:code
- Few-Shot 3D Point Cloud Semantic Segmentation<br>:star:code
- Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation<br>:star:code
- Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation<br>:star:code
- (AF)2-S3Net: Attentive Feature Fusion With Adaptive Feature Selection for Sparse Semantic Segmentation Network
- One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation
- Exploit Visual Dependency Relations for Semantic Segmentation
- Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs
- ABMDRNet: Adaptive-Weighted Bi-Directional Modality Difference Reduction Network for RGB-T Semantic Segmentation
- CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation
- Scene Understanding/Scene Parsing
- Bidirectional Projection Network for Cross Dimension Scene Understanding<br>:open_mouth:oral:star:code
- RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening<br>:open_mouth:oral:star:code
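- CoCoNets: Continuous Contrastive 3D Scene Representations<br>:house:project:tv:video<br>Researchers from CMU propose a 3D scene representation learned with self-supervised contrastive learning from RGB and RGB-D scene data. The resulting features perform well on downstream tasks such as object tracking and detection.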
- RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction
- 3D-to-2D Distillation for Indoor Scene Parsing
- Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts
- Scene Graph Generation/Analysis
- SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences<br>:house:project
- Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation<br>Scene graph generation; scene parsing
- Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis<br>:house:project<br>3D point-based scene graph analysis; scene understanding
- Fully Convolutional Scene Graph Generation<br>:open_mouth:oral
- Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation<br>:star:code
- Linguistic Structures as Weak Supervision for Visual Scene Graph Generation<br>:star:code
- Energy-Based Learning for Scene Graph Generation<br>:star:code
- 3D Scene Understanding
- Holistic 3D Scene Understanding From a Single Image With Implicit Representation<br>:star:code:house:project:tv:video
- Monte Carlo Scene Search for 3D Scene Understanding<br>:house:project:tv:video
- Exploring Data Efficient 3D Scene Understanding with Contrastive Scene Contexts<br>:open_mouth:oral:house:project:tv:video
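- Image Matting
- Real-Time High Resolution Background Matting<br>:star:code:house:project:tv:video<br>Newly open-sourced matting technique that runs in real time at high resolution: 4K at 30 fps and HD at 60 fps on a modern GPU<br>Explainer: 4K at 30 fps on a single GPU; the University of Washington's real-time video matting gets an upgrade, with fine hair detail preserved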
- Mask Guided Matting via Progressive Refinement Network<br>:star:code
- Semantic Image Matting<br>:star:code
- Improved Image Matting via Real-Time User Clicks and Uncertainty Estimation<br>:tv:video
- Learning Affinity-Aware Upsampling for Deep Image Matting
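- LiDAR Segmentation
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation<br>:open_mouth:oral:star:code<br>Ranked first on the SemanticKITTI leaderboard (as of the CVPR deadline) and achieves state-of-the-art results on nuScenes. It also generalizes well to other LiDAR-based tasks, including LiDAR panoptic segmentation and LiDAR 3D detection; work built on it also ranks first on the SemanticKITTI panoptic segmentation leaderboard.
- Video Object Segmentation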
- Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild<br>:star:code
- Efficient Regional Memory Network for Video Object Segmentation<br>:star:code:house:project
- Learning Position and Target Consistency for Memory-based Video Object Segmentation<br>Achieves state-of-the-art performance on both the DAVIS and YouTube-VOS benchmarks, and ranked first in the semi-supervised VOS track of the DAVIS 2020 Challenge.
- Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps<br>:open_mouth:oral:star:code
- Reciprocal Transformations for Unsupervised Video Object Segmentation<br>:star:code
- Delving Deep Into Many-to-Many Attention for Few-Shot Video Object Segmentation<br>:star:code
- Video Object Segmentation Using Global and Instance Embedding Learning
- SwiftNet: Real-Time Video Object Segmentation<br>:star:code
- SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation<br>:star:code
- Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion<br>:star:code:house:project:tv:video
- Learning Dynamic Network Using a Reuse Gate Function in Semi-Supervised Video Object Segmentation<br>:star:code
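- Point Set Tracking
- Video Multi-Object Segmentation
- Video Instance Segmentation
- SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation<br>:star:code:tv:video<br>The paper introduces SG-Net, a simple and effective one-stage framework that improves mask quality and inference speed compared with conventional two-stage frameworks.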
- Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation<br>:star:code
- Few-Shot Segmentation
- Camouflaged Object Segmentation
- Video Matting
- Point Cloud Segmentation
- Semantic Part Segmentation
- Repurposing GANs for One-Shot Semantic Part Segmentation<br>:open_mouth:oral:house:project
- Mirror Segmentation
- Depth-Aware Mirror Segmentation<br>:house:project:tv:video
- Motion Segmentation
- Fine-Grained Segmentation
7.Object Detection
- Multiple Instance Active Learning for Object Detection<br>:star:code<br>
- Positive-Unlabeled Data Purification in the Wild for Object Detection<br>
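- Depth from Camera Motion and Object Detection<br>:star:github:tv:video<br>Using only ordinary phone camera motion plus object detection bounding boxes as input, the authors design an RNN that estimates object depth with state-of-the-art accuracy.<br>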
- Towards Open World Object Detection<br>:open_mouth:oral:star:code<br>
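- General Instance Distillation for Object Detection<br>Knowledge distillation has proven to be an effective approach to model compression, letting a lightweight student model acquire knowledge extracted from a cumbersome teacher model. Previous distillation methods for detection, however, generalize poorly across detection frameworks, rely heavily on ground truth (GT), and ignore valuable relational information between instances. The authors propose a new distillation method for detection based on discriminative instances, which does not depend on the positive/negative split defined by GT, named General Instance Distillation (GID). It includes a General Instance Selection Module (GISM) that makes full use of feature-based, relation-based, and response-based knowledge for distillation. In experiments, student models gain significant AP across detection frameworks and can even outperform their teachers. Specifically, RetinaNet with ResNet-50 reaches 39.1% mAP on COCO with GID, 2.9 points above the 36.2% baseline and better than the ResNet-101-based teacher at 38.1% AP.<br>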
- MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection<br>
- Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection<br>:open_mouth:oral
- You Only Look One-level Feature<br>:star:code<br>Open-source YOLOF: no FPN required, 13% faster than YOLOv4<br>Explainer: The YOLOF object detection algorithm: You Only Look One-level Feature
- Sparse R-CNN: End-to-End Object Detection with Learnable Proposals<br>:star:code
- End-to-End Object Detection with Fully Convolutional Network<br>:star:code<br>Explainer: Dropping the Transformer, an FCN can also achieve end-to-end detection
- Robust and Accurate Object Detection via Adversarial Learning
- Distilling Object Detectors via Decoupled Features<br>:star:code
- OTA: Optimal Transport Assignment for Object Detection<br>:star:code
- Scale-aware Automatic Augmentation for Object Detection<br>:star:code
- A Closer Look at Fourier Spectrum Discrepancies for CNN-generated Images Detection<br>:open_mouth:oral:house:project
- IQDet: Instance-wise Quality Distribution Sampling for Object Detection<br>Brief notes: 20
- Domain-Specific Suppression for Adaptive Object Detection
- PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
- Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
- Dynamic Head: Unifying Object Detection Heads with Attentions<br>:star:code:tv:video
- Open-Vocabulary Object Detection Using Captions<br>:open_mouth:oral:star:code
- MobileDets: Searching for Object Detection Architectures for Mobile Accelerators<br>:star:code
- Layer-Wise Searching for 1-Bit Detectors
- OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection<br>:star:code
- GAIA: A Transfer Learning System of Object Detection That Fits Your Needs<br>:star:code
- DetectoRS: Detecting Objects With Recursive Feature Pyramid and Switchable Atrous Convolution<br>:star:code
- RankDetNet: Delving Into Ranking Constraints for Object Detection
- AQD: Towards Accurate Quantized Object Detection<br>:open_mouth:oral:star:code
- Class-Aware Robust Adversarial Training for Object Detection
- Scaled-YOLOv4: Scaling Cross Stage Partial Network
- Improved Handling of Motion Blur in Online Object Detection<br>:house:project
- The Translucent Patch: A Physical and Universal Attack on Object Detectors<br>:tv:video
- Unbiased Mean Teacher for Cross-Domain Object Detection<br>:star:code
- Interpolation-Based Semi-Supervised Learning for Object Detection<br>:star:code
- Neural Auto-Exposure for High-Dynamic Range Object Detection
- Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
- Black-Box Explanation of Object Detectors via Saliency Maps<br>:open_mouth:oral:house:project:tv:video
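- Few-Shot Object Detection
- Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection<br>The first work to study semantic relation reasoning for few-shot detection, showing its potential to improve strong baselines.<br>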
- Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection<br>:star:code<br>Center for Machine Learning Research, Institute for Artificial Intelligence, Peking University<br>
- FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding<br>:star:code
- Generalized Few-Shot Object Detection without Forgetting<br>Brief notes: 16
- Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss
- Hallucination Improves Few-Shot Object Detection<br>:star:code
- Few-Shot Object Detection via Classification Refinement and Distractor Retreatment
- Transformation Invariant Few-Shot Object Detection
- Beyond Max-Margin: Class Margin Equilibrium for Few-shot Object Detection<br>:star:code
- Multi-Object Detection
- 3D Object Detection
- Categorical Depth Distribution Network for Monocular 3D Object Detection<br>:open_mouth:oral:star:code
- 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection<br>:star:code:house:project:tv:video<br>More: CVPR 2021 | Semi-supervised 3D object detection using IoU prediction
- Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection<br>:star:code
- M3DSSD: Monocular 3D Single Stage Object Detector<br>:star:code
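- GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection<br>:star:code:tv:video<br>Proposes and integrates GrooMeD-NMS for monocular 3D object detection. It resolves the mismatch between the training and inference pipelines and achieves state-of-the-art monocular 3D object detection results on the KITTI benchmark, on par with monocular video-based methods.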
- LiDAR R-CNN: An Efficient and Universal 3D Object Detector<br>:star:code
- Delving into Localization Errors for Monocular 3D Object Detection<br>:star:code
- HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection<br>:house:project
- Objects are Different: Flexible Monocular 3D Object Detection<br>:star:code
- Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds<br>:star:code
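- PointAugmenting: Cross-Modal Augmentation for 3D Object Detection<br>Sharing session
- SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud<br>:star:code<br>Proposes the Self-Ensembling Single-Stage object Detector (SE-SSD) for accurate and efficient 3D object detection in outdoor point clouds. The key idea is to jointly optimize the model with soft and hard targets under the formulated constraints, without adding any computation at inference time. SE-SSD outperforms all prior work. It also attains the highest precision for car detection on the KITTI benchmark (ranked first on the BEV leaderboard and second on the 3D leaderboard) with very fast inference.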
- Offboard 3D Object Detection From Point Cloud Sequences
- Monocular 3D Object Detection: An Extrinsic Parameter Free Approach
- SRDAN: Scale-Aware and Range-Aware Domain Adaptation Network for Cross-Dataset 3D Object Detection
- PVGNet: A Bottom-Up One-Stage 3D Object Detector With Integrated Multi-Level Features
- MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation<br>:star:code
- LiDAR-Aug: A General Rendering-Based Augmentation Framework for 3D Object Detection
- ST3D: Self-Training for Unsupervised Domain Adaptation on 3D Object Detection<br>:star:code
- RangeIoUDet: Range Image Based Real-Time 3D Object Detector Optimized by Intersection Over Union
- Center-Based 3D Object Detection and Tracking<br>:star:code
- 3D Object Detection with Pointformer<br>:star:code
- To the Point: Efficient 3D Object Detection in the Range Image With Graph Convolution Kernels
- RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection
- 3D-MAN: 3D Multi-Frame Attention Network for Object Detection
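- Rotated Object Detection
- Weakly Supervised Object Localization
- Dense Object Detection
- Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection<br>:star:code<br>Explainer: Painless accuracy gains for object detection with Generalized Focal Loss V2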
- VarifocalNet: An IoU-Aware Dense Object Detector<br>:open_mouth:oral:star:code
- Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection
- Salient Object Detection
- Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion<br>:open_mouth:oral
- Weakly Supervised Video Salient Object Detection<br>:star:code
- Uncertainty-aware Joint Salient Object and Camouflaged Object Detection<br>:star:code
- Calibrated RGB-D Salient Object Detection<br>:star:code
- From Semantic Categories to Fixations: A Novel Weakly-Supervised Visual-Auditory Saliency Detection Approach<br>:star:code
- Co-Saliency Detection
- Semi-Supervised Object Detection
- Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection
- Points As Queries: Weakly Semi-Supervised Object Detection by Points<br>Brief notes: 6
- Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection
- Humble Teachers Teach Better Students for Semi-Supervised Object Detection
- Long-Tailed Object Detection
- Single-Stage Object Detection
- Shadow Detection
- Triple-Cooperative Video Shadow Detection<br>:star:code
- Single-Stage Instance Shadow Detection with Bidirectional Relation Learning<br>:open_mouth:oral:star:code
- Unsupervised Object Detection
- Domain-Adaptive Object Detection
- glass surface detection
- Camouflaged Object Detection
- Any-Shot Object Detection
6.Data Augmentation
- SuperMix: Supervising the Mixing Data Augmentation<br>:star:code
- On Feature Normalization and Data Augmentation<br>:star:code
- StyleMix: Separating Content and Style for Enhanced Data Augmentation<br>:star:code
5.Anomaly Detection
- Multiresolution Knowledge Distillation for Anomaly Detection<br>:star:code
- PANDA: Adapting Pretrained Features for Anomaly Detection and Segmentation<br>:star:code
- Glancing at the Patch: Anomaly Localization with Global and Local Feature Comparison
- Pixel-Level Anomaly Detection in Driving Scenes
4.Weakly Supervised/Semi-Supervised/Self-supervised/Unsupervised Learning
- Weakly Supervised
- Weakly Supervised Learning of Rigid 3D Scene Flow<br>:open_mouth:oral:star:code:house:project<br>
- Relation-aware Instance Refinement for Weakly Supervised Visual Grounding<br>:star:code
- Semi-Supervised
- Adaptive Consistency Regularization for Semi-Supervised Transfer Learning<br>:star:code<br>
- SSLayout360: Semi-Supervised Indoor Layout Estimation from 360° Panorama
- CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning<br>:star:code
- AlphaMatch: Improving Consistency for Semi-Supervised Learning With Alpha-Divergence
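- Self-Supervised
- Self-supervised Geometric Perception<br>:open_mouth:oral:star:code<br>The authors describe SGP as the first general framework for feature learning in geometric perception that needs no supervision from ground-truth geometric labels. SGP runs in an EM fashion: it alternates between robustly estimating geometric models to generate pseudo-labels and learning features under the supervision of those noisy pseudo-labels. Applied to camera pose estimation and point cloud registration, SGP performs on par with or better than supervised oracles on large-scale real datasets.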
- Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting<br>:star:code
- Self-supervised Motion Learning from Static Images
- SOLD2: Self-supervised Occlusion-aware Line Description and Detection<br>:open_mouth:oral:star:code
- All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training<br>:star:code
- Global Transport for Fluid Reconstruction with Learned Self-Supervision<br>:open_mouth:oral:star:code
- Task Programming: Learning Data Efficient Behavior Representations<br>:open_mouth:oral:star:code:house:project
- Audio-Visual Instance Discrimination with Cross-Modal Agreement
- Safe Local Motion Planning With Self-Supervised Freespace Forecasting<br>:star:code
- Back to Event Basics: Self-Supervised Learning of Image Reconstruction for Event Cameras via Photometric Constancy<br>:star:code:house:project
- Exponential Moving Average Normalization for Self-Supervised and Semi-Supervised Learning
- How Well Do Self-Supervised Models Transfer?<br>:star:code
- The Lottery Tickets Hypothesis for Supervised and Self-Supervised Pre-Training in Computer Vision Models<br>:star:code
- OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning<br>:star:code
- Instance Localization for Self-supervised Detection Pretraining<br>:star:code<br>
- CASTing Your Model: Learning to Localize Improves Self-Supervised Representations<br>:star:code
- SPSG: Self-Supervised Photometric Scene Generation From RGB-D Scans
- SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning
- Unsupervised
- Unsupervised Visual Representation Learning by Tracking Patches in Video<br>:star:code
- SMURF: Self-Teaching Multi-Frame Unsupervised RAFT with Full-Image Warping<br>:star:code
- PAUL: Procrustean Autoencoder for Unsupervised Lifting
- Progressive Stage-Wise Learning for Unsupervised Feature Representation Enhancement
- VDSM: Unsupervised Video Disentanglement With State-Space Modeling and Deep Mixtures of Experts
- Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination<br>:star:code
- Recurrent Multi-View Alignment Network for Unsupervised Surface Registration<br>:star:code
- Feature-Level Collaboration: Joint Unsupervised Learning of Optical Flow, Stereo Depth and Camera Motion
3.Point Cloud
- Style-based Point Generator with Adversarial Rendering for Point Cloud Completion<br>
- MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization<br>:open_mouth:oral:star:code
- TPCN: Temporal Point Cloud Networks for Motion Forecasting<br>
- How Privacy-Preserving are Line Clouds? Recovering Scene Details from 3D Lines<br>:star:code
- PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds<br>:star:code
- Point2Skeleton: Learning Skeletal Representations from Point Clouds<br>:open_mouth:oral:star:code:house:project
- FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
- RPSRNet: End-to-End Trainable Rigid Point Set Registration Network using Barnes-Hut 2D-Tree Representation
- Point Cloud Upsampling via Disentangled Refinement<br>:star:code
- Regularization Strategy for Point Cloud via Rigidly Mixed Sample<br>:star:code
- Verifiability and Predictability: Interpreting Utilities of Network Architectures for Point Cloud Processing
- Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
- PointNetLK Revisited<br>:open_mouth:oral:star:code
- PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds<br>:star:code
- Point Cloud Registration
- PREDATOR: Registration of 3D Point Clouds with Low Overlap<br>:open_mouth:oral:star:code:house:project<br>
- SpinNet: Learning a General Surface Descriptor for 3D Point Cloud Registration<br>:star:code
- Robust Point Cloud Registration Framework Based on Deep Graph Matching<br>:star:code
- PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency<br>:star:code
- ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning<br>:star:code
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification<br>:star:code
- StickyPillars: Robust and Efficient Feature Matching on Point Clouds Using Graph Neural Networks
- UnsupervisedR&R: Unsupervised Point Cloud Registration via Differentiable Rendering<br>:open_mouth:oral:star:code
- Point Cloud Completion
- Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding<br>:star:code
- Denoise and Contrast for Category Agnostic Shape Completion<br>:star:code
- Variational Relational Point Completion Network<br>:open_mouth:oral:star:code:house:project
- Unsupervised 3D Shape Completion through GAN Inversion<br>:star:code:house:project
- PMP-Net: Point Cloud Completion by Learning Multi-Step Point Moving Paths<br>:star:code
- View-Guided Point Cloud Completion
- Point Cloud Keypoint Detection
- 3D Point Cloud
- Diffusion Probabilistic Models for 3D Point Cloud Generation<br>:open_mouth:oral:star:code<br>
- PointGuard: Provably Robust 3D Point Cloud Classification
- Equivariant Point Network for 3D Point Cloud Analysis<br>:star:code
- CorrNet3D: Unsupervised End-to-End Learning of Dense Correspondence for 3D Point Clouds<br>:star:code
- Self-Supervised Learning on 3D Point Clouds by Learning Discrete Generative Models
- PWCLO-Net: Deep LiDAR Odometry in 3D Point Clouds Using Hierarchical Embedding Mask Optimization<br>:star:code
- 3D Point Cloud Generation
- Point Cloud Compression
- Point Cloud Recognition
- Point Cloud Segmentation
2.Graph Neural Networks(GNN, GCN, GMN)
- Sequential Graph Convolutional Network for Active Learning<br>
- Quantifying Explainers of Graph Neural Networks in Computational Pathology<br>:star:code
- Binary Graph Neural Networks<br>:star:code
- Amalgamating Knowledge from Heterogeneous Graph Neural Networks
- GCN
- SelfSAGCN: Self-Supervised Semantic Alignment for Graph Convolution Network
- Bi-GCN: Binary Graph Convolutional Network
- PU-GCN: Point Cloud Upsampling Using Graph Convolutional Networks<br>:star:code
- A Hyperbolic-to-Hyperbolic Graph Convolutional Network<br>:open_mouth:oral
- TSGCNet: Discriminative Geometric Feature Learning With Two-Stream Graph Convolutional Network for 3D Dental Model Segmentation
- Hierarchical Layout-Aware Graph Convolutional Network for Unified Aesthetics Assessment<br>:star:code
- Graph Matching Networks(GMN)
1.Unknown(Uncategorized)
- Reconsidering Representation Alignment for Multi-view Clustering<br>:star:code
- Self-supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map<br>
- Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces<br>:open_mouth:oral:star:code:house:project<br>
- Data-Free Model Extraction<br>:star:code<br>
- Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning<br>:open_mouth:oral
- PatchmatchNet: Learned Multi-View Patchmatch Stereo<br>:open_mouth:oral:star:code
- Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning<br>:star:code:house:project<br>
- Semantic Palette: Guiding Scene Generation with Class Proportions
- Multi-Objective Interpolation Training for Robustness to Label Noise<br>:star:code
- Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations<br>:star:code
- Simpler Certified Radius Maximization by Propagating Covariances<br>:open_mouth:oral:star:code:tv:video
- Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food<br>:star:code
- Discovering Hidden Physics Behind Transport Dynamics<br>:open_mouth:oral
- Soft-IntroVAE: Analyzing and Improving the Introspective Variational Autoencoder<br>:open_mouth:oral:star:code:house:project
- Deep Gradient Projection Networks for Pan-sharpening<br>:star:code
- Consensus Maximisation Using Influences of Monotone Boolean Functions<br>:open_mouth:oral:star:code
- Forecasting Irreversible Disease via Progression Learning
- Causal Hidden Markov Model for Time Series Disease Forecasting<br>:star:code:house:project
- Knowledge Evolution in Neural Networks<br>:open_mouth:oral:star:code
- RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words<br>:star:code<br>RSTNet: an image captioning model built on adaptive attention that distinguishes visual words from non-visual words<br>Explainer: 14
- Removing the Background by Adding the Background: Towards a Background Robust Self-supervised Video Representation Learning<br>Explainer: 11
- Representative Batch Normalization with Feature Calibration<br>:open_mouth:oral:star:code:house:project<br>Author homepage<br>Explainer: 4
- Involution: Inverting the Inherence of Convolution for Visual Recognition<br>:star:code<br>Explainer: CVPR'21 | involution: a new neural network operator that goes beyond convolution and self-attention
- Spatially Consistent Representation Learning<br>:star:code
- Limitations of Post-Hoc Feature Alignment for Robustness<br>:star:code
- AutoDO: Robust AutoAugment for Biased Data with Label Noise via Scalable Probabilistic Implicit Differentiation<br>:star:code
- Augmentation Strategies for Learning with Noisy Labels<br>:star:code
- PGT: A Progressive Method for Training Models on Long Videos<br>:open_mouth:oral:star:code
- Generic Perceptual Loss for Modeling Structured Output Dependencies
- Masksembles for Uncertainty Estimation<br>:star:code:house:project
- Student-Teacher Learning from Clean Inputs to Noisy Inputs
- Scene-Intuitive Agent for Remote Embodied Visual Grounding
- Meta-Mining Discriminative Samples for Kinship Verification<br>
- Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression<br>:star:code:tv:video<br>Paper publicly available
- Diverse Branch Block: Building a Convolution as an Inception-like Unit<br>:star:code
- OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations
- Disentangled Cycle Consistency for Highly-realistic Virtual Try-On<br>:star:code
- Stylized Neural Painting<br>:star:code:house:project:tv:video<br>Proposes an image-to-painting translation method that generates vivid, realistic paintings with controllable styles.
- Confluent Vessel Trees with Accurate Bifurcations<br>:star:code
- Repopulating Street Scenes
- Can We Characterize Tasks Without Labels or Features?<br>:star:code
- Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding
- Online Learning of a Probabilistic and Adaptive Scene Representation
- Generative Modelling of BRDF Textures from Flash Images<br>:star:code:house:project
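- PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting<br>:house:project<br>The authors' inverse rendering algorithm, PhySG, reconstructs object geometry, materials, and illumination from a set of RGB input images, running end to end.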
- Self-supervised Video Representation Learning by Context and Motion Decoupling
- Dynamic Region-Aware Convolution<br>Brief notes: 14
- Meta Pseudo Labels<br>:star:code:tv:video
- PQA: Perceptual Question Answering
- CondenseNet V2: Sparse Feature Reactivation for Deep Networks<br>:star:code
- CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching<br>:star:code
- Neural Camera Simulators<br>:star:code
- Lighting, Reflectance and Geometry Estimation from 360° Panoramic Stereo<br>:star:code
- MetricOpt: Learning to Optimize Black-Box Evaluation Metrics<br>:open_mouth:oral
- Deep Stable Learning for Out-of-Distribution Generalization<br>Sharing session
- Learning a Self-Expressive Network for Subspace Clustering<br>Sharing session
- Heterogeneous Grid Convolution for Adaptive, Efficient, and Controllable Computation
- Extreme Rotation Estimation using Dense Correlation Volumes<br>:house:project
- Decoupled Dynamic Filter Networks<br>:house:project:tv:video
- MongeNet: Efficient Sampler for Geometric Deep Learning<br>:star:code:house:project:tv:video
- Multi-Perspective LSTM for Joint Visual Representation Learning<br>:star:code
- Quantum Permutation Synchronization
- A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts
- DriveGAN: Towards a Controllable High-Quality Neural Simulation<br>:open_mouth:oral
- Faster Meta Update Strategy for Noise-Robust Deep Learning<br>:star:code
- NeRD: Neural 3D Reflection Symmetry Detector<br>:star:code
- SSAN: Separable Self-Attention Network for Video Representation Learning
- Scene-aware Generative Network for Human Motion Synthesis
- Stochastic Whitening Batch Normalization
- CLCC: Contrastive Learning for Color Constancy<br>:star:code
- Magic Layouts: Structural Prior for Component Detection in User Interface Designs
- GIRAFFE: Representing Scenes As Compositional Generative Neural Feature Fields<br>:open_mouth:oral:star:code:house:project
- Polygonal Building Extraction by Frame Field Learning<br>:star:code
- MP3: A Unified Model To Map, Perceive, Predict and Plan
- NewtonianVAE: Proportional Control and Goal Identification From Pixels via Physical Latent Spaces
- Fast End-to-End Learning on Protein Surfaces
- Flow Guided Transformable Bottleneck Networks for Motion Retargeting
- Polka Lines: Learning Structured Illumination and Reconstruction for Active Stereo
- Patch2Pix: Epipolar-Guided Pixel-Level Correspondences<br>:star:code:tv:video
- Pixel-Aligned Volumetric Avatars
- Learnable Motion Coherence for Correspondence Pruning<br>:star:code:house:project
- DualGraph: A Graph-Based Method for Reasoning About Label Noise
- Automatic Correction of Internal Units in Generative Neural Networks
- Adaptive Rank Estimate in Robust Principal Component Analysis
- Cluster-Wise Hierarchical Generative Model for Deep Amortized Clustering
- 3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding
- Ranking Neural Checkpoints
- On Focal Loss for Class-Posterior Probability Estimation: A Theoretical Perspective
- Learning Deep Latent Variable Models by Short-Run MCMC Inference With Optimal Transport Correction
- Learning the Best Pooling Strategy for Visual Semantic Embedding<br>:star:code:house:project
- Backdoor Attacks Against Deep Learning Systems in the Physical World
- Relevance-CAM: Your Model Already Knows Where To Look<br>:star:code
- On Robustness and Transferability of Convolutional Neural Networks
- Square Root Bundle Adjustment for Large-Scale Reconstruction<br>:house:project:tv:video
- Crossing Cuts Polygonal Puzzles: Models and Solvers
- Sparse Multi-Path Corrections in Fringe Projection Profilometry
- Understanding the Behaviour of Contrastive Loss
- Dual Contradistinctive Generative Autoencoder<br>:star:code
- Metadata Normalization<br>:star:code
- End-to-End Rotation Averaging With Multi-Source Propagation<br>:star:code
- UV-Net: Learning From Boundary Representations
- Mixed-Privacy Forgetting in Deep Networks
- Double Low-Rank Representation With Projection Distance Penalty for Clustering
- Building Reliable Explanations of Unreliable Neural Networks: Locally Smoothing Perspective of Model Interpretation
- DAT: Training Deep Networks Robust To Label-Noise by Matching the Feature Distributions<br>:star:code
- End-to-End High Dynamic Range Camera Pipeline Optimization
- Dual-GAN: Joint BVP and Noise Modeling for Remote Physiological Measurement
- User-Guided Line Art Flat Filling With Split Filling Mechanism
- KSM: Fast Multiple Task Adaption via Kernel-Wise Soft Mask Learning
- Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization
- Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression<br>:star:code
- Group Whitening: Balancing Learning Efficiency and Representational Capacity<br>:star:code
- Privacy-Preserving Collaborative Learning With Automatic Transformation Search<br>:open_mouth:oral
- Post-Hoc Uncertainty Calibration for Domain Drift Scenarios<br>:star:code
- Efficient Initial Pose-Graph Generation for Global SfM<br>:star:code
- Spk2ImgNet: Learning To Reconstruct Dynamic Scene From Continuous Spike Stream
- A Dual Iterative Refinement Method for Non-Rigid Shape Matching<br>:star:code
- Improving Accuracy of Binary Neural Networks Using Unbalanced Activation Distribution
- Rotation-Only Bundle Adjustment<br>:star:code
- HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features<br>:star:code
- Cross-Iteration Batch Normalization<br>:star:code
- Multimodal Contrastive Training for Visual Representation Learning
- Spatially-Varying Outdoor Lighting Estimation From Intrinsics
- Personalized Outfit Recommendation With Learnable Anchors
- Architectural Adversarial Robustness: The Case for Deep Pursuit
- SetVAE: Learning Hierarchical Composition for Generative Modeling of Set-Structured Data<br>:star:code
- Truly shift-invariant convolutional neural networks<br>:star:code
- Scalable Differential Privacy With Sparse Network Finetuning
- OpenMix: Reviving Known Knowledge for Discovering Novel Visual Categories in an Open World
- Event-Based Bispectral Photometry Using Temporally Modulated Illumination
- Towards Extremely Compact RNNs for Video Recognition With Fully Decomposed Hierarchical Tucker Structure
- Enriching ImageNet With Human Similarity Judgments and Psychological Embeddings
- A Quasiconvex Formulation for Radial Cameras
- BRepNet: A Topological Message Passing System for Solid Models<br>:open_mouth:oral
- Exploiting & Refining Depth Distributions With Triangulation Light Curtains<br>:house:project:tv:video
- Multispectral Photometric Stereo for Spatially-Varying Spectral Reflectances: A Well Posed Problem?<br>:star:code
- SOON: Scenario Oriented Object Navigation With Graph-Based Exploration
- Mesoscopic Photogrammetry With an Unstabilized Phone Camera<br>:star:code
- Convolutional Hough Matching Networks<br>:open_mouth:oral:star:code:house:project
- Learned Initializations for Optimizing Coordinate-Based Neural Representations<br>:house:project:tv:video
- Patchwise Generative ConvNet: Training Energy-Based Models From a Single Natural Image for Internal Learning
- LQF: Linear Quadratic Fine-Tuning
- Positive-Congruent Training: Towards Regression-Free Model Updates
- Shape from Sky: Polarimetric Normal Recovery Under The Sky
- Orthogonal Over-Parameterized Training<br>:open_mouth:oral
- Optimal Gradient Checkpoint Search for Arbitrary Computation Graphs<br>:open_mouth:oral:star:code
- T-vMF Similarity for Regularizing Intra-Class Feature Distribution<br>:star:code
- Defending Multimodal Fusion Models Against Single-Source Adversaries
- Rotation Coordinate Descent for Fast Globally Optimal Rotation Averaging<br>:open_mouth:oral:star:code
- Iso-Points: Optimizing Neural Implicit Surfaces with Hybrid Representations
- How Does Topology Influence Gradient Propagation and Model Performance of Deep Networks With DenseNet-Type Skip Connections?<br>:star:code
- TrafficSim: Learning To Simulate Realistic Multi-Agent Behaviors
- Sign-Agnostic Implicit Learning of Surface Self-Similarities for Shape Modeling and Reconstruction From Raw Point Clouds
- Effective Sparsification of Neural Networks With Global Sparsity Constraint
- Hyperdimensional computing as a framework for systematic aggregation of image descriptors<br>:house:project
- Time Adaptive Recurrent Neural Network<br>:star:code
- 4D Hyperspectral Photoacoustic Data Restoration with Reliability Analysis
- Neighborhood Normalization for Robust Geometric Feature Learning<br>:star:code
- Neural Surface Maps<br>:star:code:house:project:tv:video
- Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods
- NormalFusion: Real-Time Acquisition of Surface Normals for High-Resolution RGB-D Scanning
- Bilinear Parameterization for Non-Separable Singular Value Penalties
- On the Difficulty of Membership Inference Attacks<br>:star:code
- ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks<br>:star:code
- Multi-Label Learning From Single Positive Labels
- CompositeTasking: Understanding Images by Spatial Composition of Tasks<br>:star:code
- Searching for Fast Model Families on Datacenter Accelerators<br>:star:code
- Understanding and Simplifying Perceptual Distances
- Bayesian Nested Neural Networks for Uncertainty Calibration and Adaptive Compression<br>:star:code
- An Alternative Probabilistic Interpretation of the Huber Loss
- Scale-Localized Abstract Reasoning<br>:star:code:sunflower:dataset
- Inferring CAD Modeling Sequences Using Zone Graphs
- Partially View-Aligned Representation Learning With Noise-Robust Contrastive Loss
- Blocks-World Cameras
- The Affective Growth of Computer Vision
- Polarimetric Normal Stereo
- Uncalibrated Neural Inverse Rendering for Photometric Stereo of General Surfaces
- RSG: A Simple but Effective Module for Learning Imbalanced Datasets<br>:star:code
- Fast Sinkhorn Filters: Using Matrix Scaling for Non-Rigid Shape Correspondence With Functional Maps<br>:star:code
- MetaSets: Meta-Learning on Point Sets for Generalizable Representations<br>:star:code
- Isometric Multi-Shape Matching
- Efficient Deformable Shape Correspondence via Multiscale Spectral Manifold Wavelets Preservation
- TearingNet: Point Cloud Autoencoder To Learn Topology-Friendly Representations
- Boosting Ensemble Accuracy by Revisiting Ensemble Diversity Metrics
- Convolutional Dynamic Alignment Networks for Interpretable Classifications<br>:open_mouth:oral:star:code
- EDNet: Efficient Disparity Estimation With Cost Volume Combination and Attention-Based Spatial Residual
- How Robust are Randomized Smoothing based Defenses to Data Poisoning?<br>:star:code
- Generative Interventions for Causal Learning
- Learning to Identify Correct 2D-2D Line Correspondences on Sphere
- Domain-Independent Dominance of Adaptive Methods<br>:star:code
- Combinatorial Learning of Graph Edit Distance via Dynamic Embedding
- IMODAL: Creating Learnable User-Defined Deformation Models<br>:star:code
- Robust Bayesian Neural Networks by Spectral Expectation Bound Regularization<br>:star:code
- Neural Cellular Automata Manifold
- MultiLink: Multi-Class Structure Recovery via Agglomerative Clustering and Model Selection<br>:star:code
- A Sliced Wasserstein Loss for Neural Texture Synthesis
- A Second-Order Approach to Learning with Instance-Dependent Label Noise<br>:open_mouth:oral:star:code
- Hilbert Sinkhorn Divergence for Optimal Transport
- The Multi-Temporal Urban Development SpaceNet Dataset
- Inverse Simulation: Reconstructing Dynamic Geometry of Clothed Humans via Optimal Control
- Learning Decision Trees Recurrently Through Communication<br>:star:code
- Learning the Predictability of the Future<br>:star:code:house:project
- RaScaNet: Learning Tiny Models by Raster-Scanning Images
- Joint Negative and Positive Learning for Noisy Labels
- The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures<br>:star:code
- Understanding Failures of Deep Networks via Robust Feature Extraction<br>:star:code
- Gradient-based Algorithms for Machine Teaching
- Geo-FARM: Geodesic Factor Regression Model for Misaligned Pre-Shape Responses in Statistical Shape Analysis
- A Functional Approach to Rotation Equivariant Non-Linearities for Tensor Field Networks
- Real-Time Sphere Sweeping Stereo From Multiview Fisheye Images
- Taskology: Utilizing Task Relations at Scale
- Soteria: Provable Defense against Privacy Leakage in Federated Learning from Representation Perspective<br>:star:code
- Spatial Assembly Networks for Image Representation Learning
- SKFAC: Training Neural Networks With Faster Kronecker-Factored Approximate Curvature<br>:star:code
- Student-Teacher Learning from Clean Inputs to Noisy Inputs
- Adversarial Invariant Learning
- S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-Bit Neural Networks via Guided Distribution Calibration<br>:star:code
- MaxUp: Lightweight Adversarial Training With Data Augmentation Improves Neural Network Training
- Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball
- Visual Grounding
- Semantic Matching
- Gradient Compression
- Automatic Comic Generation
- Joint Learning
- DL
- Pose Estimation (Non-Human)
- All-in-One (Multiple Tasks)
- Inception Convolution With Efficient Dilation Search<br>Image recognition, human pose estimation, object detection, instance segmentation
- Visual Reasoning
- Transformation Driven Visual Reasoning<br>:star:code:house:project
- Mesh Saliency
- 3D Scene Interaction
- Populating 3D Scenes by Learning Human-Scene Interaction<br>:star:code:house:project:tv:video
- Stereo Matching
- Image-to-Video Synthesis
- Audio-Visual Navigation
- Semantic Audio-Visual Navigation<br>:star:code:house:project:tv:video
- Font Generation
- Multi-Task Learning
- Visual Navigation
- Image Matching
- Co-Attention for Conditioned Image Matching<br>:star:code:house:project
- Texture Recognition
- Hyperspectral Image Reconstruction
- Visual Odometry
- Image Registration
- Semantic Scene Completion
- Pedestrian-Vehicle Interaction
- Affective Computing
- Dense Image-to-Image Correspondence Estimation With Confidence Estimation
- Learning Accurate Dense Correspondences and When To Trust Them<br>:open_mouth:oral:star:code:house:project:tv:video
- Seeing Objects in the Dark With Deep-Red Flash