Awesome
ICCV 2021: Latest News and Accepted Papers/Code
<div align="center"> <img src="image/ICCV2021.png"/> </div>Official website: http://iccv2021.thecvf.com/home<br> Conference dates: October 11 to 17, 2021<br>
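:exclamation::exclamation::exclamation::star2::star2::star2:📗📗📗All papers accepted to ICCV 2021 have been released, 1612 in total. To download them, send the reply "paper" to 【我爱计算机视觉】 and you will receive them.
:exclamation::exclamation::exclamation::star2::star2::star2:All papers have been roughly categorized; please take a look.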
Categorized survey papers from past years: ↘️CV-Surveys (under construction)
Categorized 2022 papers:
↘️CVPR-2022-Papers ↘️WACV-2022-Papers
Categorized 2021 papers:
↘️ICCV-2021-Papers ↘️CVPR-2021-Papers
Categorized 2020 papers:
↘️CVPR-2020-Papers ↘️ECCV-2020-Papers
Table of Contents
65.Optical Flow Estimation
- Separable Flow: Learning Motion Cost Volumes for Optical Flow Estimation<br>:star:code
- High-Resolution Optical Flow from 1D Attention and Correlation<br>:open_mouth:oral:star:code
- GyroFlow: Gyroscope-Guided Unsupervised Optical Flow Learning<br>:star:code
- Sensor-Guided Optical Flow<br>:star:code
64.Anomaly Detection
- Surface Anomaly Detection
- Anomaly Detection
63.Data Augmentation
- DivAug: Plug-In Automated Data Augmentation With Explicit Diversity Maximization<br>:star:code
- TrivialAugment: Tuning-Free Yet State-of-the-Art Data Augmentation<br>:open_mouth:oral:star:code
- Semantic Aware Data Augmentation for Cell Nuclei Microscopical Images With Artificial Neural Networks
- A Simple Baseline for Semi-Supervised Semantic Segmentation With Strong Data Augmentation
62.Open-Set Recognition
- OpenGAN: Open-Set Recognition via Open Data Generation<br>:trophy:Best Paper Honorable Mention
- Conditional Variational Capsule Network for Open Set Recognition<br>:star:code
61.Metric Learning
- Do Different Deep Metric Learning Losses Lead to Similar Learned Features?<br>:star:code
- Learning With Memory-Based Virtual Classes for Deep Metric Learning<br>:star:code
60.Federated Learning
- Federated Learning for Non-IID Data via Unified Feature Learning and Optimization Objective Alignment
- Ensemble Attention Distillation for Privacy-Preserving Federated Learning
59.Graph Neural Networks
- Meta-Aggregator: Learning to Aggregate for 1-bit Graph Neural Networks
- PoGO-Net: Pose Graph Optimization With Graph Neural Networks<br>:star:code
- Dynamic Dual Gating Neural Networks<br>:star:code
58.Computational Photography (Optics, Geometry, Light Field Imaging)
- An Asynchronous Kalman Filter for Hybrid Event Cameras<br>:star:code
- 4D Cloud Scattering Tomography
- Snapshot Compressive Imaging
- Light Field
- Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance
- Fast Light-Field Disparity Estimation With Multi-Disparity-Scale Cost Aggregation<br>:star:code
- SeLFVi: Self-supervised Light-Field Video Reconstruction from Stereo Video
- SIGNET: Efficient Neural Representation for Light Fields
- Light Field Reconstruction
- Compressive Imaging
- Homography Estimation
- Computational Imaging
- Optical Aberration Correction
57.Image Matching
<a name="56"/>56.Dataset
- Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm Under Mixed Illumination<br>:sunflower:dataset
- FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters<br>:sunflower:dataset
- FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting<br>:house:project
- Biomedical Images
- 3D Reconstruction
- Aerial Imagery Datasets
- Beyond Road Extraction: A Dataset for Map Update using Aerial Images<br>:star:code:house:project
- Action Recognition
- Object Recognition
- ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition<br>:star:code:sunflower:dataset
- Lane Detection
- Autonomous Driving
- Vision-Language Datasets
- DeepFake Detection
- KoDF: A Large-Scale Korean DeepFake Detection Dataset<br>:sunflower:dataset
- High-Quality Video
55.Activity Recognition
<a name="54"/>54.Sketch Recognition
- SketchLattice: Latticed Representation for Sketch Manipulation
- SketchAA: Abstract Representation for Abstract Sketches
53.Visual Localization
- Continual Learning for Image-Based Camera Localization<br>:star:code
- CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization<br>:sunflower:dataset
- Pose Correction for Highly Accurate Visual Localization in Large-Scale Indoor Spaces<br>:star:code
- Cross-Descriptor Visual Localization and Mapping
52.Vision-and-Language
- YouRefIt: Embodied Reference Understanding with Language and Gesture<br>:open_mouth:oral:house:project
- VLGrammar: Grounded Grammar Induction of Vision and Language<br>:star:code
- COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-Training for Vision-Language Representation<br>:star:code
- Panoptic Narrative Grounding<br>:open_mouth:oral:star:code
- AESOP: Abstract Encoding of Stories, Objects, and Pictures<br>:star:code:tv:video
- Adaptive Hierarchical Graph Reasoning With Semantic Coherence for Video-and-Language Inference
- Visual Reasoning
- Semantic Navigation
- Vision-and-Language Navigation
- Airbert: In-domain Pretraining for Vision-and-Language Navigation<br>:house:project
- Waypoint Models for Instruction-guided Navigation in Continuous Environments<br>:open_mouth:oral:star:code:house:project:tv:video
- The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation<br>:star:code
- Vision-Language Navigation With Random Environmental Mixup
- Vision-and-Dialog Navigation
- Visual Navigation
- Visual Grounding
- Visual Dialog
51.View Synthesis
- Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization<br>:star:code
- Deep 3D Mask Volume for View Synthesis of Dynamic Scenes<br>:house:project
- Embedding Novel Views in a Single JPEG Image
- Video Autoencoder: self-supervised disentanglement of static 3D structure and motion<br>:open_mouth:oral:star:code:house:project:tv:video
- Geometry-Free View Synthesis: Transformers and No 3D Priors<br>:star:code
- Dynamic View Synthesis From Dynamic Monocular Video<br>:house:project:tv:video
- Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis<br>:house:project:tv:video
- Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image<br>:open_mouth:oral:star:code:house:project:tv:video
- Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image<br>:open_mouth:oral:star:code:house:project:tv:video
50.Continual Learning
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data<br>:star:code
- Continual Learning on Noisy Data Streams via Self-Purified Replay<br>:star:code
- Rehearsal Revealed: The Limits and Merits of Revisiting Samples in Continual Learning<br>:star:code
- Co2L: Contrastive Continual Learning<br>:star:code
49.Human-Object Interaction
- Exploiting Scene Graphs for Human-Object Interaction Detection<br>:star:code
- Spatially Conditioned Graphs for Detecting Human-Object Interactions<br>:star:code:tv:video
- Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction
- Detecting Human-Object Relationships in Videos
- Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions<br>:star:code:house:project:sunflower:dataset
- Discovering Human Interactions With Large-Vocabulary Objects via Query and Multi-Scale Detection<br>:star:code
- Visual Relationship Detection Using Part-and-Sum Transformers With Composite Queries (VRD and HOI)
- Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations<br>:star:code
- H2O
- Human Interaction Understanding
- Hand-Object Interaction
- HOI (Behavior Understanding)
48.6DoF
- SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation<br>:star:code
- StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
- SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation
- RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering<br>:star:code
- DualPoseNet: Category-Level 6D Object Pose and Size Estimation Using Dual Pose Network With Refined Learning of Pose Consistency<br>:star:code
- PR-GCN: A Deep Graph Convolutional Network With Point Refinement for 6D Pose Estimation
- Object Pose Estimation
- CAPTRA: CAtegory-Level Pose Tracking for Rigid and Articulated Objects From Point Clouds<br>:open_mouth:oral:star:code:house:project:tv:video
47.NAS
- BN-NAS: Neural Architecture Search with Batch Normalization<br>:star:code
- RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving
- Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift<br>:star:code
- Evolving Search Space for Neural Architecture Search<br>:star:code:tv:video
- FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search<br>:star:code
- GLiT: Neural Architecture Search for Global and Local Image Transformer<br>:star:code
- Neural Architecture Search for Joint Human Parsing and Pose Estimation<br>:star:code
- Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces
- Learning Latent Architectural Distribution in Differentiable Neural Architecture Search via Variational Information Maximization
- Not All Operations Contribute Equally: Hierarchical Operation-Adaptive Predictor for Neural Architecture Search
- Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition<br>:star:code
- BossNAS: Exploring Hybrid CNN-Transformers With Block-Wisely Self-Supervised Neural Architecture Search<br>:star:code
- NAS-OoD: Neural Architecture Search for Out-of-Distribution Generalization
- AutoSpace: Neural Architecture Search With Less Human Interference<br>:star:code
- IDARTS: Interactive Differentiable Architecture Search
46.Defect Detection
<a name="45"/>45.Image Caption
- Who's Waldo? Linking People Across Text and Images<br>:open_mouth:oral:house:project<br>:newspaper:Explainer: ICCV 2021 Oral: a new task and a new dataset! Cornell University proposes PVG, a task similar to VG yet distinct from it
- Partial Off-Policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning
- Topic Scene Graph Generation by Attention Distillation From Caption<br>:star:code
- Understanding and Evaluating Racial Biases in Image Captioning<br>:star:code:house:project
- In Defense of Scene Graphs for Image Captioning<br>:star:code
- Art Description Generation
- Change Captioning
44.Human Motion Prediction
- MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction<br>:star:code
- Stochastic Scene-Aware Motion Prediction<br>:star:code:house:project
- Generating Smooth Pose Sequences for Diverse Human Motion Prediction<br>:open_mouth:oral:star:code
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild<br>:house:project
- Motion Prediction using Trajectory Cues
- 3D Human Motion Prediction
43.Dense Prediction
- FaPN: Feature-aligned Pyramid Network for Dense Image Prediction<br>:star:code
- Multi-Task Dense Prediction
42.Representation Learning
- Learning From Noisy Data With Robust Representation Learning<br>:star:code
- Self-Supervised Representation Learning From Flow Equivariance
- Exploring Visual Engagement Signals for Representation Learning<br>:star:code
- Switchable K-class Hyperplanes for Noise-Robust Representation Learning<br>:star:code
- Region Similarity Representation Learning<br>:star:code
- Curious Representation Learning for Embodied Intelligence<br>:star:code:house:project
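- Visual Representation Learning
- Self-Supervised Visual Representations Learning by Contrastive Mask Prediction<br>:newspaper:Explainer: ICCV 2021: a contrastive learning paradigm more general than MoCo; USTC & MSRA propose MaskCo, a new contrastive learning method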
- Temporal Knowledge Consistency for Unsupervised Visual Representation Learning
- Contrasting Contrastive Self-Supervised Representation Learning Pipelines<br>:star:code
- Concept Generalization in Visual Representation Learning<br>:house:project
- Collaborative Unsupervised Visual Representation Learning from Decentralized Data
- Episodic Transformer for Vision-and-Language Navigation<br>:star:code
- Multi-VAE: Learning Disentangled View-Common and View-Peculiar Visual Representations for Multi-View Clustering
- Video Representation Learning
- ASCNet: Self-Supervised Video Representation Learning With Appearance-Speed Consistency
- ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning<br>:house:project
- Time-Equivariant Contrastive Video Representation Learning
- Space-Time Crop & Attend: Improving Cross-Modal Video Representation Learning<br>:star:code
41.Out-of-Distribution Detection(OOD)
- CODEs: Chamfer Out-of-Distribution Examples against Overconfidence Issue
- Semantically Coherent Out-of-Distribution Detection<br>:star:code:house:project
- The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization<br>:star:code
40.Metric Learning
- Towards Interpretable Deep Metric Learning with Structural Matching<br>:star:code
- Deep Relational Metric Learning<br>:star:code
- LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning<br>:star:code
- Manifold Matching via Deep Metric Learning for Generative Modeling<br>:star:code
39.Incremental Learning
- Class-Incremental Learning
- Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning<br>:newspaper:Explainer: Enabling "lifelong learning" for models: Georgia Tech proposes data-free incremental learning
- Striking a Balance Between Stability and Plasticity for Class-Incremental Learning
- Synthesized Feature Based Few-Shot Class-Incremental Learning on a Mixture of Subspaces<br>:star:code
38.Weakly/Semi-Supervised/Self-supervised/Unsupervised Learning
- Semi-Supervised
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning
- Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments With Support Samples<br>:star:code
- Semi-Supervised Active Learning for Semi-Supervised Models: Exploit Adversarial Examples With Graph-Based Virtual Labels
- CoMatch: Semi-Supervised Learning With Contrastive Graph Regularization<br>:star:code
- Multiview Pseudo-Labeling for Semi-supervised Learning from Video
- Self-Supervised
- Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring<br>:star:code
- Self-supervised Neural Networks for Spectral Snapshot Compressive Imaging<br>:star:code
- ISD: Self-Supervised Learning by Iterative Similarity Distillation<br>:star:code
- Contrast and Order Representations for Video Self-Supervised Learning
- On Feature Decorrelation in Self-Supervised Learning<br>:open_mouth:oral
- Geography-Aware Self-Supervised Learning
- Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
- Efficient Visual Pretraining with Contrastive Detection
- Broaden Your Views for Self-Supervised Video Learning
- CDS: Cross-Domain Self-supervised Pre-training
- On Compositions of Transformations in Contrastive Self-Supervised Learning<br>:star:code
- Solving Inefficiency of Self-Supervised Representation Learning<br>:star:code
- Divide and Contrast: Self-supervised Learning from Uncurated Data
- Emerging Properties in Self-Supervised Vision Transformers<br>:star:code
- Mean Shift for Self-Supervised Learning<br>:star:code
- Weakly Supervised
37.Multitask Learning
- MultiTask-CenterNet (MCN): Efficient and Diverse Multitask Learning using an Anchor Free Approach<br>:newspaper:Explainer: ICCV 2021 "MultiTask CenterNet": new progress in multitask CV! One model that beats three
- Multi-Task Self-Training for Learning General Representations<br>:newspaper:Explainer: ICCV 2021 MuST: still struggling to squeeze out points on a single task? Google's researchers have already moved on to multi-task training
- UniT: Multimodal Multitask Learning With a Unified Transformer<br>:star:code
- Learning Multiple Pixelwise Tasks Based on Loss Scale Balancing<br>:star:code
- Learning With Privileged Tasks
- Task Switching Network for Multi-Task Learning
36.SLAM/AR/VR/Robotics
- Robotics
- Indoor Navigation
- Robotic Hand Grasping
- Hand-Object Contact Consistency Reasoning for Human Grasps Generation<br>:open_mouth:oral:star:code:house:project:tv:video
- VR/AR
- The Power of Points for Modeling Humans in Clothing<br>:star:code:house:project:tv:video
- Virtual Try-On
- M3D-VTON: A Monocular-to-3D Virtual Try-On Network<br>:star:code
- ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors
- Dressing in Order: Recurrent Person Image Generation for Pose Transfer, Virtual Try-On and Outfit Editing
- FashionMirror: Co-Attention Feature-Remapping Virtual Try-On With Sequential Template Poses
- Structure-transformed Texture-enhanced Network for Person Image Synthesis
- SLAM
- On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation<br>:star:code
- Transfusion: A Novel SLAM Method Focused on Transparent Objects
- iMAP: Implicit Mapping and Positioning in Real-Time
- Learning To Bundle-Adjust: A Graph Network Approach to Faster Optimization of Bundle Adjustment for Vehicular SLAM
- R-SLAM: Optimizing Eye Tracking From Rolling Shutter Video of the Retina
- Place Recognition
35.Quantization/Pruning/Knowledge Distillation/Model Compression (Model Scaling and Optimization)
- Knowledge Distillation
- Distilling Holistic Knowledge with Graph Neural Networks<br>:star:code
- Lipschitz Continuity Guided Knowledge Distillation<br>:star:code
- Densely Guided Knowledge Distillation Using Multiple Teacher Assistants<br>:star:code
- Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better<br>:star:code
- Compressing Visual-linguistic Model via Knowledge Distillation
- Self-Knowledge Distillation With Progressive Refinement of Targets<br>:star:code:tv:video
- Student Customized Knowledge Distillation: Bridging the Gap Between Student and Teacher
- Channel-Wise Knowledge Distillation for Dense Prediction<br>:star:code
- Exploring Inter-Channel Correlation for Diversity-Preserved Knowledge Distillation<br>:star:code
- Quantization
- Distance-aware Quantization<br>:star:code:house:project
- Dynamic Network Quantization for Efficient Video Inference<br>:star:code:house:project
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss
- Improving Low-Precision Network Quantization via Bin Regularization
- Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
- Integer-arithmetic-only Certified Robustness for Quantized Neural Networks
- RMSMP: A Novel Deep Neural Network Quantization Framework With Row-Wise Mixed Schemes and Multiple Precisions
- Improving Neural Network Efficiency via Post-Training Quantization With Adaptive Floating-Point<br>:star:code
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search<br>:star:code
- Model Compression
- Pruning
34.Super-Resolution
- ISR
- Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution<br>:star:code
- Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling<br>:star:code
- Deep Reparametrization of Multi-Frame Super-Resolution and Denoising<br>:open_mouth:oral
- Dual-Camera Super-Resolution with Aligned Attention Modules<br>:star:code:house:project:tv:video
- Attention-Based Multi-Reference Learning for Image Super-Resolution<br>:star:code:house:project
- Learning a Single Network for Scale-Arbitrary Super-Resolution
- Fourier Space Losses for Efficient Perceptual Image Super-Resolution<br>:star:code
- Achieving On-Mobile Real-Time Super-Resolution With Neural Architecture and Pruning Search
- Designing a Practical Degradation Model for Deep Blind Image Super-Resolution<br>:star:code
- Event Stream Super-Resolution via Spatiotemporal Constraint Learning
- Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution
- Super-Resolving Cross-Domain Face Miniatures by Peeking at One-Shot Exemplar
- Context Reasoning Attention Network for Image Super-Resolution
- EvIntSR-Net: Event Guided Multiple Latent Frames Reconstruction and Super-Resolution
- Super Resolve Dynamic Scene from Continuous Spike Streams
- Deep Blind Video Super-Resolution
- Benchmarking Ultra-High-Definition Image Super-Resolution
- Lucas-Kanade Reloaded: End-to-End Super-Resolution From Raw Image Bursts
- Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective
- Real-World Video Super-Resolution: A Benchmark Dataset and a Decomposition Based Learning Scheme<br>:star:code<br>:newspaper:Explainer: ICCV 2021: Hong Kong PolyU and Alibaba DAMO Academy propose RealVSR, a new dataset and loss scheme for video super-resolution
- VSR
- Omniscient Video Super-Resolution<br>:star:code
- COMISR: Compression-Informed Video Super-Resolution<br>:star:code<br>:newspaper:Explainer: Google proposes the COMISR algorithm: compression-aware super-resolution for compressed video
- Learning Frequency-Aware Dynamic Network for Efficient Super-Resolution
- Efficient Video Compression via Content-Adaptive Super-Resolution<br>:star:code
33.Remote Sensing Images
- SUNet: Symmetric Undistortion Network for Rolling Shutter Correction
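- Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery<br>:star:code<br>:newspaper:Explainer: ICCV 2021 | The RSIDEA team at Wuhan University proposes STAR, a novel weakly supervised change detection algorithm for remote sensing
- Panoramic Video Synthesis from Satellite Imagery
- Traffic Accident Detection from Satellite Imagery
- Remote Sensing Data
- Segmentation
- 3D Reconstruction
32.Speech and Audio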
- The Right to Talk: An Audio-Visual Transformer Approach<br>:star:code
- Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis<br>:star:code:house:project
- Audio Separation
- Audio-Gesture
- Active Speaker Detection (ASD)
- Recovering Audio from Face Videos
- Audio-Visual Source Localization
- Audio-Visual Source Separation
- Move2Hear: Active Audio-Visual Source Separation<br>:star:code:house:project
- Audio-Visual Floorplan Reconstruction
- Audio-Visual Floorplan Reconstruction<br>:star:code:house:project:tv:video
31.Style Transfer
- AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer<br>:star:code
- Domain-Aware Universal Style Transfer<br>:star:code
- Diverse Image Style Transfer via Invertible Cross-Space Mapping
- StyleFormer: Real-Time Arbitrary Style Transfer via Parametric Style Composition
- Manifold Alignment for Semantically Aligned Style Transfer<br>:star:code
30.Image Generation/Synthesis
- ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models<br>:open_mouth:oral
- Image Synthesis via Semantic Composition<br>:star:code:house:project
- Image Synthesis From Layout With Locality-Aware Mask Adaption
- Image Fusion
29.Image Retrieval
- DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features<br>:star:code
- Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models<br>:star:code:house:project
- Self-supervised Product Quantization for Deep Unsupervised Image Retrieval<br>:star:code
- Instance-Level Image Retrieval Using Reranking Transformers<br>:star:code
- Learning Attribute-Driven Disentangled Representations for Interactive Fashion Retrieval<br>:star:code
- Telling the What While Pointing to the Where: Multimodal Queries for Image Retrieval
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
- Learning Deep Local Features With Multiple Dynamic Attentions for Large-Scale Image Retrieval<br>:star:code
- Bayesian Triplet Loss: Uncertainty Quantification in Image Retrieval
- Cross-Domain Retrieval
- Visual Geolocalization
- Cross-Modal Retrieval
- Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval With Partial Query<br>:star:code
- Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining<br>:star:code
- Wasserstein Coupled Graph Learning for Cross-Modal Retrieval
- Adversarial Attack on Deep Cross-Modal Hamming Retrieval
- Text-to-Video Retrieval
- Video-to-Text Retrieval
- Image-Based 3D Shape Retrieval
- Nearest Neighbor Search
28.Contrastive Learning
- Improving Contrastive Learning by Visualizing Feature Transformation<br>:open_mouth:oral:star:code
- TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment<br>:newspaper:Explainer: ICCV 2021 TACo: Microsoft & CMU propose a token-aware cascade contrastive learning method that soundly beats other SOTA methods on video-text alignment
- A Broad Study on the Transferability of Visual Representations With Contrastive Learning<br>:star:code
- Vi2CLR: Video and Image for Visual Contrastive Learning of Representation
- LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions<br>:star:code
- CrossCLR: Cross-Modal Contrastive Learning for Multi-Modal Video Representations
- Social NCE: Contrastive Learning of Socially-Aware Motion Representations<br>:star:code:tv:video
- With a Little Help From My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations
- Contrastive Learning of Image Representations With Cross-Video Cycle-Consistency<br>:house:project
- Weakly Supervised Contrastive Learning
27.Multi-Label Image Recognition
- Residual Attention: A Simple but Effective Method for Multi-Label Recognition<br>:star:code
- Transformer-based Dual Relation Graph for Multi-label Image Recognition
26.Image Processing
- Aligning Latent and Image Spaces to Connect the Unconnectable<br>:star:code:house:project
- Image Shape Manipulation
- Image Shape Manipulation from a Single Augmented Training Sample<br>:open_mouth:oral:star:code:house:project
- Edge Detection
- Image Recognition
- Image Deblurring
- Rethinking Coarse-to-Fine Approach in Single Image Deblurring<br>:star:code
- Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions
- Defocus Map Estimation and Deblurring From a Single Dual-Pixel Image
- Motion Deblurring with Real Events
- Pyramid Architecture Search for Real-Time Image Deblurring
- Motion Deblurring
- Video Deblurring
- Image Quality Assessment (IQA)
- Image Harmonization
- Shadow Removal
- Denoising
- Rethinking Deep Image Prior for Denoising<br>:star:code
- Rethinking Noise Synthesis and Modeling in Raw Denoising<br>:star:code
- C2N: Practical Generative Noise Modeling for Real-World Denoising
- The Benefit of Distraction: Denoising Camera-Based Physiological Measurements Using Inverse Attention<br>:star:code
- Hyperspectral Image Denoising with Realistic Data<br>:star:code
- End-to-End Unsupervised Document Image Blind Denoising
- Cross-Patch Graph Convolutional Network for Image Denoising
- Video Denoising
- Image Colorization
- Image Enhancement
- Real-time Image Enhancer via Learnable Spatial-aware 3D Lookup Tables
- Adaptive Unfolding Total Variation Network for Low-Light Image Enhancement<br>:star:code
- Representative Color Transform for Image Enhancement
- STAR: A Structure-Aware Lightweight Transformer for Real-Time Image Enhancement
- Deep Symmetric Network for Underexposed Image Enhancement With Recurrent Attentional Learning<br>:star:code:house:project
- StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement
- Image Restoration
- Spatially-Adaptive Image Restoration using Distortion-Guided Networks<br>:star:code
- Dynamic Attentive Graph Learning for Image Restoration<br>:star:code
- Self-Supervised Cryo-Electron Tomography Volumetric Image Restoration From Single Noisy Volume With Sparsity Constraint<br>:star:code
- Searching for Controllable Image Restoration Networks<br>:star:code
- Image Compression
- Image Inpainting
- Image Inpainting via Conditional Texture and Structure Dual Generation<br>:star:code
- CR-Fill: Generative Image Inpainting With Auxiliary Contextual Reconstruction<br>:star:code
- Parallel Multi-Resolution Fusion Network for Image Inpainting
- Painting from Part<br>:star:code
- WaveFill: A Wavelet-Based Generation Network for Image Inpainting
- Distillation-Guided Image Inpainting
- Learning a Sketch Tensor Space for Image Inpainting of Man-made Scenes<br>:star:code:house:project
- Image extrapolation
- Reversible Image Conversion
- Artifact Removal
- De-rendering
- De-rendering Stylized Texts<br>:star:code:house:project
- Halo Removal
- Panorama Stitching
- Flare Removal
- How to Train Neural Networks for Flare Removal<br>:house:project:tv:video
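- Image Cropping
- Reflection Removal
- Deraining
- Image Distortion Removal
- Removing Refractive Distortion from Underwater Images
- Image Completion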
- Image Decomposition
- Distortion Correction
- HDR
- Image Desnowing
- Image Harmonization
- Image Harmonization With Transformer<br>:star:code
- Image Editing
- Image Hiding
25.Medical Image
- Equivariant Imaging: Learning Beyond the Range Space<br>:open_mouth:oral:star:code
- Deep Survival Analysis With Longitudinal X-Rays for COVID-19
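- Medical Image Segmentation
- Pathology Image Representation
- Medical Image Analysis
- Medical Image Denoising
- Video Translation
- Nuclei Detection and Segmentation in Pathology Images
- Medical Report Generation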
- CT
- Medical Image Recognition
- Medical Image Classification
24.Face
- VariTex: Variational Neural Face Textures<br>:star:code:house:project:tv:video
- Face Forgery Detection
- Face Synthesis
- Disentangled Lifespan Face Synthesis<br>:star:code:house:project:tv:video
- Face Recognition
- PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition
- SynFace: Face Recognition with Synthetic Data
- Adaptive Label Noise Cleaning With Meta-Supervision for Deep Face Recognition
- Disentangled Representation for Age-Invariant Face Recognition: A Mutual Information Minimization Perspective
- Teacher-Student Adversarial Depth Hallucination To Improve Face Recognition<br>:star:code
- DAM: Discrepancy Alignment Metric for Face Recognition
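- De-identification
- Face Perception
- Talking Face Generation
- Talking Head Synthesis
- Facial Expression Recognition
- Face Presentation Attack Detection
- Face Editing
- Face Alignment
- Face Image Reconstruction
- 3D Face Reconstruction
- 3D Face Animation
- Remote Photoplethysmography (rPPG)
- Face Encryption
- Deepfake Detection
- Face Texture Completion
- Facial Action Unit Detection
- Face Analysis
- 3D Head Reconstruction
- Facial Landmark Detection
- Face Image Retrieval
23.Gaze Estimation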
- Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation<br>:star:code
- Gaze Tracking
- Viewpoint Estimation
22.GAN
- Sketch Your Own GAN<br>:star:code:house:project
- Online Multi-Granularity Distillation for GAN Compression<br>:star:code
- Dual Projection Generative Adversarial Networks for Conditional Image Generation<br>:star:code
- InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images
- ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement<br>:star:code:house:project:tv:video
- WarpedGANSpace: Finding non-linear RBF paths in GAN latent space<br>:star:code
- Toward a Visual Concept Vocabulary for GAN Latent Space
- Collaging Class-specific GANs for Semantic Image Synthesis<br>:house:project
- Latent Transformations via NeuralODEs for GAN-Based Image Editing
- Reality Transform Adversarial Generators for Image Splicing Forgery Detection and Localization
- GAN-Control: Explicitly Controllable GANs (https://alonshoshan10.github.io/gan_control/)<br>:house:project
- Omni-GAN: On the Secrets of cGANs and Beyond<br>:star:code
- Unsupervised Image Generation with Infinite Generative Adversarial Networks<br>:star:code
- DAE-GAN: Dynamic Aspect-Aware GAN for Text-to-Image Synthesis
- Detail Me More: Improving GAN’s photo-realism of complex scenes
- Unsupervised Segmentation Incorporating Shape Prior via Generative Adversarial Networks
- DRB-GAN: A Dynamic ResBlock Generative Adversarial Network for Artistic Style Transfer<br>:star:code
- Dual Contrastive Loss and Attention for GANs
- Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation
- Gradient Normalization for Generative Adversarial Networks<br>:star:code
- EigenGAN: Layer-Wise Eigen-Learning for GANs<br>:star:code
- Retrieve in Style: Unsupervised Facial Feature Transfer and Retrieval<br>:star:code
- HeadGAN: One-shot Neural Head Synthesis and Editing<br>:house:project:tv:video
- Explaining in Style: Training a GAN To Explain a Classifier in StyleSpace<br>:star:code:house:project:tv:video
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery<br>:open_mouth:oral:star:code:tv:video
- Towards Discovery and Attribution of Open-World GAN Generated Images
- Diagonal Attention and Style-Based GAN for Content-Style Disentanglement in Image Generation and Translation
- Re-Aging GAN: Toward Personalized Face Age Transformation
- When do GANs replicate? On the choice of dataset size<br>:star:code
- LoFGAN: Fusing Local Representations for Few-shot Image Generation
- Multi-Class Multi-Instance Count Conditioned Adversarial Image Generation<br>:star:code
- Generative Adversarial Registration for Improved Conditional Deformable Templates<br>:star:code
- F-Drop&Match: GANs with a Dead Zone in the High-Frequency Domain
- GAN Inversion
- Image-to-Image Translation
- Unaligned Image-to-Image Translation by Learning to Reweight<br>:star:code
- Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation<br>:star:code
- Instance-Wise Hard Negative Example Generation for Contrastive Learning in Unpaired Image-to-Image Translation
- TransferI2I: Transfer Learning for Image-to-Image Translation from Small Datasets<br>:star:code
- Rethinking the Truly Unsupervised Image-to-Image Translation
- SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised Image-to-Image Translation
- Image translation
- Scaling-up Disentanglement for Image Translation<br>:star:code:house:project
- Harnessing the Conditioning Sensorium for Improved Image Translation
- Frequency Domain Image Translation: More Photo-Realistic, Better Identity-Preserving<br>:star:code
- Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation<br>:star:code
- Semantically Robust Unpaired Image Translation for Data with Unmatched Semantics Statistics
21.Active Learning
- Semi-Supervised Active Learning with Temporal Output Discrepancy<br>:star:code
- Influence Selection for Active Learning
- Active Domain Adaptation via Clustering Uncertainty-weighted Embeddings<br>:star:code
- Contrastive Coding for Active Learning under Class Distribution Mismatch<br>:star:code
20.Adversarial Learning
- Low Curvature Activations Reduce Overfitting in Adversarial Training
- Removing Adversarial Noise in Class Activation Feature Space
- Sample Efficient Detection and Classification of Adversarial Attacks via Self-Supervised Embeddings
- Invisible Backdoor Attack With Sample-Specific Triggers<br>:star:code
- Defending Against Universal Adversarial Patches by Clipping Feature Norms
- Adversarial Attacks
- Feature Importance-aware Transferable Adversarial Attacks<br>:star:code
- TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning<br>:star:code
- Meta Gradient Adversarial Attack
- AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning<br>:star:code
- Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes
- AdvDrop: Adversarial Attack to DNNs by Dropping Information<br>:star:code
- Adversarial Attacks Are Reversible With Natural Supervision
- Attack As the Best Defense: Nullifying Image-to-Image Translation GANs via Limit-Aware Adversarial Attack
- Learnable Boundary Guided Adversarial Training<br>:star:code
- Augmented Lagrangian Adversarial Attacks<br>:star:code
- Meta-Attack: Class-Agnostic and Model-Agnostic Physical Adversarial Attack
- On Generating Transferable Targeted Perturbations<br>:star:code
- Admix: Enhancing the Transferability of Adversarial Attacks<br>:star:code
- Consistency-Sensitivity Guided Ensemble Black-Box Adversarial Attacks in Low-Dimensional Spaces
- Adversarial Attacks On Multi-Agent Communication
- Interpreting Attributions and Interactions of Adversarial Attacks
- RDA: Robust Domain Adaptation via Fourier Adversarial Attacking
- Adversarial Examples
- Black-Box
19.Self-Driving Vehicles
- End-to-End Urban Driving by Imitating a Reinforcement Learning Coach<br>:star:code
- MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving<br>:star:code
- NEAT: Neural Attention Fields for End-to-End Autonomous Driving<br>:star:code
- Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving<br>:star:code
- Social-NCE: Contrastive Learning of Socially-aware Motion Representations<br>:star:code:tv:video
- Learning To Drive From a World on Rails<br>:open_mouth:oral:star:code:house:project
- DRIVE: Deep Reinforced Accident Anticipation With Visual Explanation<br>:star:code:house:project:tv:video
- LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving
- Prediction by Anticipation: An Action-Conditional Prediction Method Based on Interaction Learning<br>:star:code:tv:video
- TMCOSS: Thresholded Multi-Criteria Online Subset Selection for Data-Efficient Autonomous Driving
- FIERY: Future Instance Prediction in Bird's-Eye View From Surround Monocular Cameras<br>:star:code
- On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors<br>:star:code
- MGNet: Monocular Geometric Scene Understanding for Autonomous Driving<br>:star:code:tv:video
- Human Trajectory Prediction
- Trajectory Prediction
- Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction
- LOKI: Long Term and Key Intentions for Trajectory Prediction
- MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction<br>:star:code
- DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets
- Where Are You Heading? Dynamic Trajectory Prediction With Expert Goal Examples<br>:star:code
- Three Steps to Multimodal Trajectory Prediction: Modality Clustering, Classification and Synthesis
- Spatial-Temporal Consistency Network for Low-Latency Trajectory Forecasting
- Likelihood-Based Diverse Sampling for Trajectory Forecasting<br>:star:code
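- Motion Prediction
- Autonomous Navigation
- Traffic Scene Understanding
- Vehicle License Plate Recognition
- Autonomous Racing
- Driver Visual Attention Prediction
- Pose Prediction
- Vehicle Tracking
- Vehicle Detection and Analysis from Arbitrary Camera Views
- Lane Detection
- Vehicle Speed Estimation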
18.Transformers
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers<br>:open_mouth:oral:star:code<br>:newspaper:explainer: ICCV2021 Oral: TAU and Facebook propose a generic explainability method for attention models
- Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction<br>:star:code
- PlaneTR: Structure-Guided Transformers for 3D Plane Recovery<br>:star:code
- Rethinking and Improving Relative Position Encoding for Vision Transformer<br>:star:code
- Vision Transformer with Progressive Sampling<br>:star:code
- Paint Transformer: Feed Forward Neural Painting with Stroke Prediction<br>:open_mouth:oral:star:code
- Rethinking Spatial Dimensions of Vision Transformers<br>:star:code<br>:newspaper:explainer: ICCV2021 PiT: pooling is not exclusive to CNNs, and ViT says "I can do it too"; the Pooling-based Vision Transformer (PiT)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers<br>:star:code
- Describing and Localizing Multiple Changes With Transformers<br>:star:code:house:project
- LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference<br>:star:code
- VidTr: Video Transformer Without Convolutions
- Visformer: The Vision-Friendly Transformer<br>:star:code
- Going Deeper With Image Transformers<br>:star:code
- Multiscale Vision Transformers<br>:star:code
- Learning Multi-Scene Absolute Pose Regression With Transformers<br>:star:code
- Visual Saliency Transformer<br>:star:code
- Event-Based Video Reconstruction Using Transformer<br>:star:code
- Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows<br>:star:code
- An Empirical Study of Training Self-Supervised Vision Transformers<br>:open_mouth:oral:star:code
- Tokens-to-Token ViT: Training Vision Transformers From Scratch on ImageNet<br>:star:code
- CvT: Introducing Convolutions to Vision Transformers<br>:star:code
- COTR: Correspondence Transformer for Matching Across Images
- ViViT: A Video Vision Transformer<br>:star:code
- AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting<br>:star:code:house:project
- Incorporating Convolution Designs into Visual Transformers<br>:star:code
- LayoutTransformer: Layout Generation and Completion with Self-attention<br>:star:code:house:project
- AutoFormer: Searching Transformers for Visual Recognition<br>:star:code
- Scalable Vision Transformers With Hierarchical Pooling<br>:star:code
- Visual Transformers: Where Do Transformers Really Belong in Vision Models?
- Anticipative Video Transformer<br>:star:code:house:project
- Dense Prediction
- Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions<br>:open_mouth:oral:star:code<br>:newspaper:explainer: Pyramid Vision Transformer in plain language
- Vision Transformers for Dense Prediction<br>:star:code
- 3D Human Texture Estimation
- 3D Human Texture Estimation from a Single Image with Transformers<br>:open_mouth:oral:house:project
- Image Editing
- OCR
- Music-Driven Dance Generation
17.3D Vision
- Discovering 3D Parts from Image Collections<br>:open_mouth:oral:star:code:house:project:tv:video
- PixelSynth: Generating a 3D-Consistent Experience from a Single Image<br>:star:code:house:project
- Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision<br>:star:code:house:project
- Pixel-Perfect Structure-from-Motion with Featuremetric Refinement<br>:star:code
- Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction<br>:open_mouth:oral:star:code
- LSD-StructureNet: Modeling Levels of Structural Detail in 3D Part Hierarchies
- Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching<br>:star:code
- Where2Act: From Pixels to Actions for Articulated 3D Objects<br>:tv:video
- BuildingNet: Learning to Label 3D Buildings<br>:open_mouth:oral:star:code:house:project
- SurfGen: Adversarial 3D Shape Synthesis With Explicit Surface Discriminators
- Deep Virtual Markers for Articulated 3D Shapes<br>:star:code:tv:video
- Learning Efficient Photometric Feature Transform for Multi-view Stereo<br>:house:project
- Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing
- Just a Few Points Are All You Need for Multi-View Stereo: A Novel Semi-Supervised Learning Method for Multi-View Stereo
- 3D-FRONT: 3D Furnished Rooms With layOuts and semaNTics
- Learning Generative Models of Textured 3D Meshes from Real-World Images<br>:star:code
- Self-Supervised Pretraining of 3D Features on any Point-Cloud
- High Quality Disparity Remapping with Two-Stage Warping
- Structure-From-Sherds: Incremental 3D Reassembly of Axially Symmetric Pots From Unordered and Mixed Fragment Collections<br>:star:code
- Interpolation-Aware Padding for 3D Sparse Convolutional Neural Networks
- Depth Estimation
- StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation<br>:star:code
- Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision<br>:star:code:house:project
- Augmenting Depth Estimation with Geospatial Context
- Can Scale-Consistent Monocular Depth Be Learned in a Self-Supervised Scale-Invariant Manner?
- Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective With Transformers<br>:open_mouth:oral:star:code
- Adaptive Surface Normal Constraint for Depth Estimation<br>:star:code
- Event-Intensity Stereo: Estimating Depth by the Best of Both Worlds
- DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes
- DepthInSpace: Exploitation and Fusion of Multiple Video Frames for Structured-Light Depth Estimation<br>:house:project
- Boosting Monocular Depth Estimation With Lightweight 3D Point Fusion
- Monocular Depth Estimation
- Revealing the Reciprocal Relations Between Self-Supervised Stereo and Monocular Depth Estimation
- MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
- Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark<br>:star:code
- Towards Interpretable Deep Networks for Monocular Depth Estimation<br>:star:code:tv:video
- Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation<br>:star:code
- Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation<br>:open_mouth:oral:star:code
- Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation<br>:star:code
- R-MSFM: Recurrent Multi-Scale Feature Modulation for Monocular Depth Estimating<br>:star:code
- Adaptive Confidence Thresholding for Monocular Depth Estimation(https://github.com/megvii-research/OMNet)
- SaccadeCam: Adaptive Visual Attention for Monocular Depth Sensing
- Depth Completion
- Omnidirectional Localization
- 3D Reconstruction
- Learning Signed Distance Field for Multi-view Surface Reconstruction<br>:open_mouth:oral
- 3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations<br>:open_mouth:oral:house:project
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension
- In-the-Wild Single Camera 3D Reconstruction Through Moving Water Surfaces<br>:open_mouth:oral:star:code:tv:video
- Gaussian Fusion: Accurate 3D Reconstruction via Geometry-Guided Displacement Interpolation
- RetrievalFuse: Neural 3D Scene Reconstruction With a Database<br>:star:code:house:project:tv:video
- Multi-View 3D Reconstruction With Transformers
- Polarimetric Helmholtz Stereopsis
- MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis<br>:star:code:tv:video
- Toward Realistic Single-View 3D Object Reconstruction With Unsupervised Learning From Multiple Images<br>:star:code
- CryoDRGN2: Ab initio neural reconstruction of 3D protein structures from real cryo-EM images
- 3D Scene Reconstruction
- 3D Shape Reconstruction
- 3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces<br>:house:project
- Multiresolution Deep Implicit Functions for 3D Shape Representation
- MVTN: Multi-View Transformation Network for 3D Shape Recognition<br>:star:code:tv:video
- Sketch2Mesh: Reconstructing and Editing 3D Shapes from Sketches
- 3D Mesh Reconstruction
- 3D Scenes
- Camera Calibration
- Surface Reconstruction
- 3D Scene Synthesis
- 3D Shape Recognition
- Image Reconstruction
- Multi-view Stereo(MVS)
- Digging into Uncertainty in Self-supervised Multi-view Stereo<br>:star:code
- PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility<br>:star:code
- A Confidence-Based Iterative Solver of Depths and Surface Normals for Deep Multi-View Stereo<br>:star:code
- EPP-MVSNet: Epipolar-Assembling Based Depth Prediction for Multi-View Stereo
16.Re-Identification
Object Re-Identification
Person Re-Identification
- Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
- Learning Instance-level Spatial-Temporal Patterns for Person Re-identification<br>:star:code
- Towards Discriminative Representation Learning for Unsupervised Person Re-identification
- Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences<br>:star:code:house:project
- Video-based Person Re-identification with Spatial and Temporal Memory Networks<br>:house:project
- Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency
- Clothing Status Awareness for Long-Term Person Re-Identification
- Dense Interaction Learning for Video-Based Person Re-Identification<br>:open_mouth:oral
- Explainable Person Re-Identification With Attribute-Guided Metric Distillation<br>:house:project
- Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-Identification
- Pyramid Spatial-Temporal Aggregation for Video-Based Person Re-Identification<br>:star:code
- ICE: Inter-Instance Contrastive Encoding for Unsupervised Person Re-Identification<br>:star:code:tv:video
- Learning To Know Where To See: A Visibility-Aware Approach for Occluded Person Re-Identification
- Attack-Guided Perceptual Data Generation for Real-world Re-Identification
- BV-Person: A Large-Scale Dataset for Bird-View Person Re-Identification<br>:sunflower:dataset
- CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification<br>:star:code
- Meta Pairwise Relationship Distillation for Unsupervised Person Re-Identification
- Syncretic Modality Collaborative Learning for Visible Infrared Person Re-Identification
- Weakly Supervised Text-Based Person Re-Identification<br>:star:code
- Occlude Them All: Occlusion-Aware Attention Network for Occluded Person Re-ID
- Occluded Person Re-Identification with Single-scale Global Representations<br>:star:code
- Domain Adaptive Person Re-Identification
- IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID<br>:open_mouth:oral:star:code
- Crowd Counting
- Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework<br>:open_mouth:oral:star:code
- Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting<br>:star:code
- Spatial Uncertainty-Aware Semi-Supervised Crowd Counting<br>:star:code
- Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting<br>:star:code
- Exploiting Sample Correlation for Crowd Counting With Multi-Expert Network<br>:star:code
- Crowd Counting With Partial Annotations in an Image<br>:star:code
- Towards A Universal Model for Cross-Dataset Crowd Counting
- Cross-Modality Person Re-Identification
- Pedestrian Detection
- Pedestrian Attribute Recognition
- Person Search
- Weakly Supervised Person Search with Region Siamese Networks
- End-to-End Trainable Trident Person Search Network Using Adaptive Gradient Propagation
- ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer<br>:house:project
- Pedestrian Behavior Prediction
- Gait Recognition
- Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation
- Context-Sensitive Temporal Feature Learning for Gait Recognition<br>:star:code
- 3D Local Convolutional Neural Networks for Gait Recognition<br>:star:code
- Gait Recognition in the Wild: A Benchmark<br>:star:code:house:project
15.Object Tracking
- Saliency-Associated Object Tracking
- Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds<br>:star:code
- Learning to Track Objects from Unlabeled Videos<br>:star:code
- DepthTrack: Unveiling the Power of RGBD Tracking<br>:star:code
- Learning Target Candidate Association To Keep Track of What Not To Track<br>:star:code
- Transparent Object Tracking Benchmark<br>:house:project
- Object Tracking by Jointly Exploiting Frame and Event Domain
- High-Performance Discriminative Tracking with Transformers
- Visio-Temporal Attention for Multi-Camera Multi-Target Association
- Visual Object Tracking
- Learning to Adversarially Blur Visual Object Tracking<br>:star:code
- Learn to Match: Automatic Matching Network Design for Visual Tracking<br>:star:code
- Video Annotation for Visual Tracking via Selection and Refinement<br>:star:code
- Learning Spatio-Temporal Transformer for Visual Tracking<br>:star:code
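- 3D Visual Tracking
- Satellite Image Tracking
- 3D Multi-Object Tracking
- Multi-Object Tracking and Segmentation
- Multi-Object Tracking
- Video Object Tracking
14.Object Detection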
- Rank & Sort Loss for Object Detection and Instance Segmentation<br>:open_mouth:oral:star:code
- MDETR - Modulated Detection for End-to-End Multi-Modal Understanding<br>:open_mouth:oral:star:code
- SimROD: A Simple Adaptation Method for Robust Object Detection<br>:open_mouth:oral:house:project<br>:newspaper:explainer: ICCV2021 Oral SimROD: simple and efficient data augmentation! Huawei proposes a simple adaptation method for robust object detection
- GraphFPN: Graph Feature Pyramid Network for Object Detection
- Fast Convergence of DETR with Spatially Modulated Co-Attention<br>:star:code
- Oriented R-CNN for Object Detection<br>:star:code
- Conditional DETR for Fast Training Convergence<br>:newspaper:explainer: Speeding up DETR convergence by explicitly locating object extremity regions: Conditional DETR
- Vector-Decomposed Disentanglement for Domain-Invariant Object Detection<br>:star:code
- G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
- ODAM: Object Detection, Association, and Mapping using Posed RGB Video<br>:open_mouth:oral
- Reconcile Prediction Consistency for Balanced Object Detection
- Deep Structured Instance Graph for Distilling Object Detectors<br>:star:code
- Towards Rotation Invariance in Object Detection<br>:star:code
- Morphable Detector for Object Detection on Demand<br>:star:code
- DetCo: Unsupervised Contrastive Learning for Object Detection<br>:star:code
- Domain-Invariant Disentangled Network for Generalizable Object Detection
- Detecting Persuasive Atypicality by Modeling Contextual Compatibility<br>:star:code
- Wanderlust: Online Continual Object Detection in the Real World<br>:house:project
- PreDet: Large-Scale Weakly Supervised Pre-Training for Detection
- FMODetect: Robust Detection of Fast Moving Objects
- Multi-Source Domain Adaptation for Object Detection
- Self-Supervised Object Detection via Generative Image Synthesis<br>:star:code
- Naturalistic Physical Adversarial Patch for Object Detectors<br>:star:code
- Rethinking Transformer-Based Set Prediction for Object Detection<br>:star:code
- Detecting Invisible People<br>:house:project:tv:video
- Dynamic DETR: End-to-End Object Detection With Dynamic Attention
- CrossDet: Crossline Representation for Object Detection<br>:star:code
- Robust Object Detection via Instance-Level Temporal Cycle Confusion<br>:star:code
- End-to-End Semi-Supervised Object Detection With Soft Teacher<br>:star:code
- Parallel Rectangle Flip Attack: A Query-Based Black-Box Attack Against Object Detection
- Fooling LiDAR Perception via Adversarial Trajectory Perturbation<br>:star:code:house:project
- TOOD: Task-Aligned One-Stage Object Detection<br>:open_mouth:oral:star:code
- Active Learning for Deep Object Detection via Probabilistic Modeling<br>:star:code<br>:newspaper:explainer: ICCV2021: Still brute-force training models on massive data? Active learning shows how to pick the most valuable samples in a dataset
- Dual Bipartite Graph Learning: A General Approach for Domain Adaptive Object Detection
- WB-DETR: Transformer-Based Detector without Backbone
- 3D Object Detection
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection<br>:star:code
- Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather<br>:star:code:house:project
- Is Pseudo-Lidar needed for Monocular 3D Object detection?<br>:star:code
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
- LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector<br>:star:code:house:project
- Improving 3D Object Detection with Channel-wise Transformer
- 4D-Net for Learned Multi-Modal Alignment
- Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
- Voxel Transformer for 3D Object Detection
- An End-to-End Transformer Model for 3D Object Detection<br>:open_mouth:oral:star:code:house:project
- Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency
- Group-Free 3D Object Detection via Transformers<br>:star:code
- VENet: Voting Enhancement Network for 3D Object Detection
- Multi-Echo LiDAR for 3D Object Detection
- RangeDet: In Defense of Range View for LiDAR-Based 3D Object Detection<br>:star:code
- The Devil Is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection
- Gated3D: Monocular 3D Object Detection From Temporal Illumination Cues<br>:house:project
- Are We Missing Confidence in Pseudo-LiDAR Methods for Monocular 3D Object Detection?
- SPG: Unsupervised Domain Adaptation for 3D Object Detection via Semantic Point Generation
- You Don't Only Look Once: Constructing Spatial-Temporal Memory for Integrated 3D Object Detection and Tracking<br>:star:code:house:project:tv:video
- Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection<br>:star:code
- Geometry-Based Distance Decomposition for Monocular 3D Object Detection<br>:star:code
- Object Localization
- Anomaly Detection
- Weakly Supervised Object Detection
- OOD Detection
- Salient Object Detection
- Disentangled High Quality Salient Object Detection
- Specificity-preserving RGB-D Saliency Detection<br>:star:code
- Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance
- MFNet: Multi-Filter Directive Network for Weakly Supervised Salient Object Detection<br>:star:code
- Scene Context-Aware Salient Object Detection<br>:star:code
- Dynamic Context-Sensitive Filtering Network for Video Salient Object Detection<br>:star:code
- iNAS: Integral NAS for Device-Aware Salient Object Detection<br>:house:project
- RGB-D Salient Object Detection
- Co-Saliency Detection
- Prohibited Item Detection
- Few-Shot Object Detection
- Visual Relationship Co-localization
- Few-shot Visual Relationship Co-localization<br>:star:code:house:project
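- Dense Object Detection
- Domain Adaptive Object Detection
- Image Tampering Detection
- Visual Relationship Detection (VRD)
- Long-Tailed Object Detection
- Salient Object Ranking
- Small Object Detection
- Object Detection in the Dark
- 3D Object Prediction
- Multi-Object Detection
- 3D Object Grounding
- Fine-Grained Crack Detection
- Line Segment Detection
- Cell Detection and Classification
- Shadow Detection
- Social Distancing Detection
- Camouflaged Object Detection
13.Image Segmentation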
- Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation<br>:open_mouth:oral:star:code:tv:video
- TransForensics: Image Forgery Localization with Dense Self-Attention
- From Contexts to Locality: Ultra-high Resolution Image Segmentation via Locality-aware Contextual Correlation<br>:star:code
- Labels4Free: Unsupervised Segmentation using StyleGAN<br>:house:project:tv:video
- Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency
- Scaling up instance annotation via label propagation<br>:star:code:house:project
- Robust Trust Region for Weakly Supervised Segmentation<br>:star:code:tv:video
- HPNet: Deep Primitive Segmentation Using Hybrid Representations<br>:star:code
- Weakly Supervised Segmentation of Small Buildings With Point Labels
- BAPA-Net: Boundary Adaptation and Prototype Alignment for Cross-Domain Semantic Segmentation<br>:star:code
- Conditional Diffusion for Interactive Segmentation
- Human Detection and Segmentation via Multi-view Consensus<br>:star:code
- Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
- Enhanced Boundary Learning for Glass-Like Object Segmentation<br>:star:code
- PARTS: Unsupervised segmentation with slots, attention and independence maximization
- Predictive Feature Learning for Future Segmentation Prediction
- Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation<br>:star:code
- Segmenter: Transformer for Semantic Segmentation<br>:star:code
- C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing
- Panoptic Segmentation
- Semantic Segmentation
- Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation<br>:open_mouth:oral:star:code
- Personalized Image Semantic Segmentation<br>:star:code
- RECALL: Replay-based Continual Learning in Semantic Segmentation<br>:star:code
- Deep Metric Learning for Open World Semantic Segmentation
- LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation<br>:open_mouth:oral
- Dual Path Learning for Domain Adaptation of Semantic Segmentation<br>:star:code
- Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
- Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation<br>:star:code:house:project
- Multi-Anchor Active Domain Adaptation for Semantic Segmentation<br>:open_mouth:oral:star:code
- Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation
- Self-Regulation for Semantic Segmentation<br>:star:code
- ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation<br>:star:code
- Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation<br>:house:project
- Mining Contextual Information Beyond Image for Semantic Segmentation<br>:star:code
- ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation<br>:star:code
- Pseudo-mask Matters in Weakly-supervised Semantic Segmentation<br>:star:code
- SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation
- Region-Aware Contrastive Learning for Semantic Segmentation
- GP-S3Net: Graph-Based Panoptic Sparse Semantic Segmentation Network
- Domain Adaptive Semantic Segmentation With Self-Supervised Depth Estimation<br>:star:code
- Scribble-Supervised Semantic Segmentation by Uncertainty Reduction on Neural Representation and Self-Supervision on Neural Eigenspace
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation<br>:open_mouth:oral:star:code
- Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation<br>:star:code
- Uncertainty-Aware Pseudo Label Refinery for Domain Adaptive Semantic Segmentation
- Contrastive Learning for Label Efficient Semantic Segmentation
- Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU<br>:star:code
- Prototypical Matching and Open Set Rejection for Zero-Shot Semantic Segmentation
- Geometric Unsupervised Domain Adaptation for Semantic Segmentation
- Calibrated Adversarial Refinement for Stochastic Semantic Segmentation<br>:star:code
- Multi-View Radar Semantic Segmentation<br>:star:code
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation<br>:open_mouth:oral:star:code
- Specialize and Fuse: Pyramidal Output Representation for Semantic Segmentation
- Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals<br>:star:code
- Scribble-Supervised Semantic Segmentation Inference
- Semi-Supervised Semantic Segmentation With Pixel-Level Contrastive Learning From a Class-Wise Memory Bank<br>:star:code
- Few-Shot Semantic Segmentation
- 3D Semantic Segmentation
- VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation<br>:open_mouth:oral:star:code
- Sparse-to-Dense Feature Matching: Intra and Inter Domain Cross-Modal Learning in Domain Adaptation for 3D Semantic Segmentation<br>:star:code
- Weakly Supervised 3D Semantic Segmentation Using Cross-Image Consensus and Inter-Voxel Affinity Relations
- Video Semantic Segmentation
- Weakly Supervised Semantic Segmentation
- Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation<br>:star:code
- Complementary Patch for Weakly Supervised Semantic Segmentation
- ECS-Net: Improving Weakly Supervised Semantic Segmentation by Using Connections Between Class Activation Maps
- Unlocking the Potential of Ordinary Classifier: Class-Specific Adversarial Erasing Framework for Weakly Supervised Semantic Segmentation<br>:star:code
- Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation<br>:star:code
- Seminar Learning for Click-Level Weakly Supervised Semantic Segmentation
- Point Cloud Semantic Segmentation
- ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation<br>:tv:video
- Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation
- TempNet: Online Semantic Segmentation on Large-Scale Point Cloud Series
- Guided Point Contrastive Learning for Semi-Supervised Point Cloud Semantic Segmentation
- Learning With Noisy Labels for Robust Point Cloud Segmentation<br>:star:code:house:project
- OOD
- Instance Segmentation
- Rank & Sort Loss for Object Detection and Instance Segmentation<br>:open_mouth:oral:star:code
- SOTR: Segmenting Objects with Transformers<br>:star:code
- A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation
- Instances as Queries<br>:star:code:tv:video
- CrossVIS: Crossover Learning for Fast Online Video Instance Segmentation<br>:star:code:tv:video
- CDNet: Centripetal Direction Network for Nuclear Instance Segmentation<br>:star:code
- PrimitiveNet: Primitive Instance Segmentation With Local Primitive Embedding Under Adversarial Metric<br>:star:code
- FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation<br>:star:code:house:project
- Prior to Segment: Foreground Cues for Weakly Annotated Classes in Partially Supervised Instance Segmentation<br>:star:code
- DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence From Box Supervision
- End-to-End Video Instance Segmentation via Spatial-Temporal Graph Neural Networks
- The Surprising Impact of Mask-Head Architecture on Novel Class Segmentation<br>:house:project
- How Shift Equivariance Impacts Metric Learning for Instance Segmentation<br>:star:code
- Parallel Detection-and-Segmentation Learning for Weakly Supervised Instance Segmentation
- Real-Time Instance Segmentation With Discriminative Orientation Maps<br>:star:code
- Video Instance Segmentation
- 3D Instance Segmentation
- Few-Shot Segmentation
- Mining Latent Classes for Few-shot Segmentation<br>:open_mouth:oral:star:code
- Human Motion Segmentation
- Point Cloud Segmentation
- Video Object Segmentation (VOS)
- Full-Duplex Strategy for Video Object Segmentation<br>:house:project
- Joint Inductive and Transductive Learning for Video Object Segmentation<br>:star:code
- Hierarchical Memory Matching Network for Video Object Segmentation<br>:star:code
- Self-supervised Video Object Segmentation by Motion Grouping<br>:star:code:house:project:tv:video
- Deep Transport Network for Unsupervised Video Object Segmentation
- Generating Masks From Boxes by Mining Spatio-Temporal Consistencies in Videos<br>:star:code
- Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation<br>:star:code
- Video Object Segmentation With Dynamic Memory Networks and Adaptive Object Alignment<br>:star:code
- Semantic Scene Segmentation
- Referring Segmentation
- Scene Understanding
- DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization<br>:open_mouth:oral:star:code:house:project:tv:video
- ACDC: The Adverse Conditions Dataset With Correspondences for Semantic Driving Scene Understanding<br>:house:project
- Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding<br>:star:code
- CMA
- Multi-Object Segmentation
- Action Segmentation
- Scene Parsing
- Image Matting
- Motion Segmentation
12.Image/Fine-Grained Classification
- DiagViB-6: A Diagnostic Benchmark Suite for Vision Models in the Presence of Shortcut and Generalization Opportunities
- Online Continual Learning For Visual Food Classification
- A Unified Objective for Novel Class Discovery<br>:open_mouth:oral:star:code:house:project<br>:newspaper:Explainer: ICCV2021 Oral | UNO: a unified objective function for novel class discovery that simplifies the training pipeline, now open source
- Improving Generalization of Batch Whitening by Convolutional Unit Optimization<br>:star:code
- Towards Learning Spatially Discriminative Feature Representations
- CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification<br>:star:code<br>:newspaper:Explainer: ICCV2021 | MIT-IBM Watson open-sources CrossViT: Transformers go multi-branch and multi-scale
- SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition<br>:star:code
- Influence-Balanced Loss for Imbalanced Visual Classification<br>:star:code
- Explanations for Occluded Images<br>:star:code:house:project:tv:video
- Understanding Robustness of Transformers for Image Classification
- Learning Rare Category Classifiers on a Tight Labeling Budget
- Discover the Unknown Biased Attribute of an Image Classifier<br>:star:code
- Co-Scale Conv-Attentional Image Transformers<br>:open_mouth:oral:star:code
- Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance<br>:star:code
- Do Image Classifiers Generalize Across Time?<br>:house:project
- Interpretable Image Recognition by Constructing Transparent Embedding Space<br>:star:code
- The Pursuit of Knowledge: Discovering and Localizing Novel Categories using Dual Memory
- Long-Tailed Recognition
- Parametric Contrastive Learning<br>:star:code
- ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot<br>:open_mouth:oral:star:code
- Self Supervision to Distillation for Long-Tailed Visual Recognition<br>:star:code
- Distilling Virtual Examples for Long-Tailed Recognition
- Distributional Robustness Loss for Long-Tail Learning
- GistNet: A Geometric Structure Transfer Network for Long-Tailed Recognition
- Long-Tailed Visual Relationship Recognition
- Fine-Grained Recognition
- Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach<br>:star:code
- Learning Canonical 3D Object Representation for Fine-Grained Recognition
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification<br>:star:code
- N-ImageNet: Towards Robust, Fine-Grained Object Recognition With Event Cameras
- Grafit: Learning fine-grained image representations with coarse labels
- Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-Grained Recognition<br>:star:code
- Few-Shot Classification
- Transductive Few-Shot Classification on the Oblique Manifold
- Relational Embedding for Few-Shot Classification<br>:star:code:house:project
- Binocular Mutual Learning for Improving Few-shot Classification<br>:star:code
- Partner-Assisted Learning for Few-Shot Image Classification
- On the Importance of Distractors for Few-Shot Classification<br>:star:code
- Few-Shot Image Classification: Just Use a Library of Pre-Trained Feature Extractors and a Simple Classifier
- Universal Representation Learning From Multiple Domains for Few-Shot Classification<br>:star:code
- A Multi-Mode Modulator for Multi-Domain Few-Shot Classification
- Variational Feature Disentangling for Fine-Grained Few-Shot Classification<br>:star:code
- Mixture-Based Feature Space Learning for Few-Shot Image Classification<br>:star:code:house:project:tv:video
- Multi-Label Classification
11.Visual Question Answering
- Greedy Gradient Ensemble for Robust Visual Question Answering<br>:star:code
- Weakly Supervised Relative Spatial Reasoning for Visual Question Answering<br>:star:code
- Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images<br>:star:code
- Unshuffling Data for Improved Generalization in Visual Question Answering
- TRAR: Routing the Attention Spans in Transformer for Visual Question Answering<br>:star:[code](https://github.com/rentainhe/TRAR-VQA/)
- Contrast and Classify: Training Robust VQA Models
- Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering
- Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering<br>:star:code
- Auto-Parsing Network for Image Captioning and Visual Question Answering
- Video Question Answering
- Just Ask: Learning to Answer Questions from Millions of Narrated Videos<br>:open_mouth:oral:star:code:house:project
- Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models<br>:house:project
- Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments<br>:sunflower:dataset
- On the Hidden Treasure of Dialog in Video Question Answering<br>:star:code:house:project
- HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering
- Video Question Answering Using Language-Guided Deep Compressed-Domain Video Feature<br>:star:code
- A-VQA
10.OCR
- Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition<br>:tv:video
- Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation<br>:tv:video
- Towards the Unseen: Iterative Text Recognition by Distilling from Errors<br>:tv:video
- Arbitrary-Shaped Text Detection
- Scene Text Recognition
- Scene Text Replacement
- Document Image Extraction
- Handwritten Text Generation
- Handwriting Transformers<br>:star:code
- Table Structure Recognition
9.Video
- Action Detection and Recognition
- Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition<br>:star:code
- MGSampler: An Explainable Sampling Strategy for Video Action Recognition<br>:star:code
- Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning
- Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition
- Class Semantics-based Attention for Action Detection
- MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions<br>:star:code
- AdaSGN: Adapting Joint Number and Model Size for Efficient Skeleton-Based Action Recognition<br>:star:code
- OadTR: Online Action Detection With Transformers<br>:star:code
- Self-Supervised 3D Skeleton Action Representation Learning With Motion Consistency and Continuity
- Interactive Prototype Learning for Egocentric Action Recognition
- Efficient Action Recognition via Dynamic Knowledge Propagation
- Else-Net: Elastic Semantic Network for Continual Action Recognition From Skeleton Data
- Learning Self-Similarity in Space and Time As Generalized Motion for Video Action Recognition<br>:star:code:house:project
- Temporal Action Detection With Multi-Level Supervision<br>:star:code
- Watch Only Once: An End-to-End Video Action Detection Framework<br>:star:code
- Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation<br>:open_mouth:oral
- Geometric Deep Neural Network Using Rigid and Non-Rigid Transformations for Human Action Recognition
- Just One Moment: Structural Vulnerability of Deep Action Recognition Against One Frame Attack
- Evidential Deep Learning for Open Set Action Recognition<br>:star:code:house:project:tv:video
- Learning an Augmented RGB Representation With Cross-Modal Knowledge Distillation for Action Detection
- Class-Incremental Learning for Action Recognition in Videos
- D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations<br>:star:code
- Zero-Shot Action Recognition
- Temporal Action Localization
- Enriching Local and Global Contexts for Temporal Action Localization<br>:star:code
- Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization<br>:open_mouth:oral:star:code
- Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization<br>:star:code
- Video Self-Stitching Graph Network for Temporal Action Localization
- Divide and Conquer for Single-Frame Temporal Action Localization
- CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization
- Temporal Action Proposal Generation
- Action Quality Assessment
- Video Rescaling
- Video Activity Localisation
- Video Inpainting
- Internal Video Inpainting by Implicit Long-range Propagation<br>:star:code:house:project
- Occlusion-Aware Video Object Inpainting<br>:house:project
- FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting<br>:star:code
- Flow-Guided Video Inpainting with Scene Templates<br>:star:code
- Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection
- Video Analysis
- Video Editing
- Learning to Cut by Watching Movies<br>:star:code:house:project
- Video Captioning
- Motion Guided Region Message Passing for Video Captioning
- Aligning Subtitles in Sign Language Videos<br>:house:project:tv:video
- Dense Video Captioning
- Video Coding
- Video Generation
- Video Relation Detection
- Video Grounding
- Video Highlight Detection
- Cross-category Video Highlight Detection via Set-based Learning<br>:star:code
- PR-Net: Preference Reasoning for Personalized Video Highlight Detection
- HighlightMe: Detecting Highlights from Human-Centric Videos
- Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion
- Joint Visual and Audio Learning for Video Highlight Detection
- Video Recognition
- Searching for Two-Stream Models in Multivariate Space for Video Recognition
- Adaptive Focus for Efficient Video Recognition<br>:open_mouth:oral:star:code
- AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition<br>:star:code:house:project
- TAM: Temporal Adaptive Module for Video Recognition<br>:star:code
- Condensing a Sequence to One Informative Frame for Video Recognition
- VideoLT: Large-Scale Long-Tailed Video Recognition<br>:star:code
- Motion-Augmented Self-Training for Video Recognition at Smaller Scale
- Multi-Modal Multi-Action Video Recognition<br>:star:code
- Motion Retargeting
- Video Prediction
- Video Synthesis
- Video Frame Interpolation
- Training Weakly Supervised Video Frame Interpolation With Events<br>:star:code
- Asymmetric Bilateral Motion Estimation for Video Frame Interpolation<br>:star:code
- XVFI: eXtreme Video Frame Interpolation<br>:open_mouth:oral:star:code:tv:video
- Deepfake Video Detection
- Video Stabilization
- Hybrid Neural Fusion for Full-Frame Video Stabilization<br>:star:code:house:project:tv:video
- Video Frame-level Similarity
- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective<br>:open_mouth:oral:star:code:house:project:tv:video
- Video Compression
- Video Moment Retrieval
- Video Summarization
- Video Quality Assessment
- Video Grounding
- Video Localization
- Zero-Shot Natural Language Video Localization<br>:open_mouth:oral:star:code
- Video Reasoning
- Video-Related
- Video Anomaly Detection
- Dance With Self-Attention: A New Look of Conditional Random Fields on Anomaly Detection in Videos
- A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction<br>:star:code<br>:newspaper:Explainer: ICCV 2021 Oral | Reconstruction plus prediction: a two-pronged approach to improving video anomaly detection
- Weakly-Supervised Video Anomaly Detection With Robust Temporal Feature Magnitude Learning<br>:star:code
- Video Denoising
- Unsupervised Deep Video Denoising<br>:star:code:house:project
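- Video Portrait Relighting
- Temporal Video Localization
- Video Correspondence
- Video Matting
- Video Coding
- Recognizing Interactions in Videos
- Video Deblurring
- Video Understanding
- Video Reconstruction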
- HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset<br>:star:code:house:project:tv:video:sunflower:dataset
8.Human Pose Estimation
- Human Pose Regression with Residual Log-likelihood Estimation<br>:open_mouth:oral:star:code
- Online Knowledge Distillation for Efficient Pose Estimation
- DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders<br>:open_mouth:oral:star:code
- Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation<br>:open_mouth:oral:star:code
- Dynamical Pose Estimation<br>:star:code:tv:video
- Multi-Instance Pose Networks: Rethinking Top-Down Pose Estimation<br>:star:code:house:project
- Egocentric Pose Estimation From Human Vision Span
- Learning Privacy-Preserving Optics for Human Pose Estimation<br>:open_mouth:oral:star:code:house:project:tv:video
- TokenPose: Learning Keypoint Tokens for Human Pose Estimation<br>:star:code
- Motion Adaptive Pose Estimation from Compressed Videos
- 3D Human Pose Estimation
- PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop<br>:open_mouth:oral:star:code:house:project
- HuMoR: 3D Human Motion Model for Robust Pose Estimation<br>:open_mouth:oral:house:project:tv:video
- Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows<br>:star:code:tv:video
- Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation<br>:star:code
- EventHPE: Event-based 3D Human Pose and Shape Estimation<br>:star:code
- imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose
- Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
- Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition
- Learning to Regress Bodies from Images using Differentiable Semantic Rendering<br>:house:project
- Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild<br>:star:code
- 3D Human Pose Estimation With Spatial and Temporal Transformers<br>:star:code:tv:video
- PARE: Part Attention Regressor for 3D Human Body Estimation<br>:star:code:house:project:tv:video
- Learning Causal Representation for Training Cross-Domain Pose Estimator via Generative Interventions
- UltraPose: Synthesizing Dense Pose With 1 Billion Points by Human-Body Decoupling 3D Model<br>:star:code
- Modulated Graph Convolutional Network for 3D Human Pose Estimation<br>:star:code
- Revitalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation<br>:star:code:house:project:tv:video
- Estimating Egocentric 3D Human Pose in Global Space<br>:house:project:tv:video
- Camera Distortion-Aware 3D Human Pose Estimation in Video With Optimization-Based Meta-Learning<br>:star:code
- EM-POSE: 3D Human Pose Estimation From Sparse Electromagnetic Trackers<br>:star:code:house:project:tv:video
- Towards Alleviating the Modeling Ambiguity of Unsupervised Monocular 3D Human Pose Estimation<br>:house:project
- SPEC: Seeing People in the Wild with an Estimated Camera<br>:star:code:house:project:tv:video
- Encoder-Decoder With Multi-Level Attention for 3D Human Shape and Pose Estimation<br>:star:code
- 3D Pose Transfer
- Hand Pose
- Gesture Synthesis
- Gesture Recognition
- 3D Hand Pose
- HandFoldingNet: A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton<br>:star:code
- EventHands: Real-Time Neural 3D Hand Pose Estimation From an Event Stream<br>:star:code:house:project:tv:video
- Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning<br>:star:code:house:project:tv:video
- Interacting Hand Pose Estimation
- 3D Hand Mesh Modeling
- Towards Accurate Alignment in Real-Time 3D Hand-Mesh Reconstruction
- Hand Mesh Recovery
- Gesture Learning
- Gesture Reconstruction
- 3D Mesh Synthesis
- Deep Hybrid Self-Prior for Full 3D Mesh Generation<br>:house:project
- Mesh Graphormer<br>:star:code
- Human Reconstruction
- ARCH++: Animation-Ready Clothed Human Reconstruction Revisited<br>:tv:video
- 3D Human Reconstruction
- Probabilistic Modeling for Human Mesh Recovery<br>:star:code:house:project
- Gravity-Aware Monocular 3D Human-Object Reconstruction<br>:house:project
- THUNDR: Transformer-Based 3D Human Reconstruction With Markers
- NPMs: Neural Parametric Models for 3D Deformable Shapes<br>:star:code:house:project:tv:video
- 4D Human Capture
- Learning Motion Priors for 4D Human Body Capture in 3D Scenes<br>:star:code:house:project:tv:video
- Human Pose Estimation and Synthesis
- Multi-Person Pose Estimation
- Human/Object Pose Keypoint Detection
- Keypoint Communities<br>:star:code
- Human Motion Capture
- 2D Human Pose Estimation
- Human Action Video Alignment
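- 3D Pose Transfer
- Human Mesh Recovery
- Distance Estimation from Human Pose
- 3D Human Body
- Motion Synthesis
- 3D Animation
- Category-Level Garment Pose Estimation
- Clothed Human Modeling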
- Point-Based Modeling of Human Clothing<br>:star:code:house:project:tv:video
- Keypoint Localization
7.Scene Graph Generation
- Spatial-Temporal Transformer for Dynamic Scene Graph Generation<br>:star:code:tv:video
- Unconditional Scene Graph Generation<br>:house:project
- Target Adaptive Context Aggregation for Video Scene Graph Generation<br>:star:code
- Learning to Generate Scene Graph from Natural Language Supervision<br>:star:code
- Segmentation-Grounded Scene Graph Generation<br>:star:code
- Context-aware Scene Graph Generation with Seq2Seq Transformer<br>:star:code
- A Simple Baseline for Weakly-Supervised Scene Graph Generation<br>:star:code
- Generative Compositional Augmentations for Scene Graph Prediction<br>:star:code
- From General to Specific: Informative Scene Graph Generation via Balance Adjustment<br>:star:code
- Scene Synthesis
6.Point Cloud
- AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds<br>:star:code:house:project
- Adaptive Graph Convolution for Point Cloud Analysis<br>:star:code
- Learning Inner-Group Relations on Point Clouds
- CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds<br>:star:code
- Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks<br>:star:code:tv:video
- PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds<br>:star:code
- 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds
- Differentiable Convolution Search for Point Cloud Processing
- Superpoint Network for Point Cloud Oversegmentation<br>:star:code
- PU-EVA: An Edge-Vector Based Approximation Solution for Flexible-Scale Point Cloud Upsampling
- SGMNet: Learning Rotation-Invariant Point Cloud Representations via Sorted Gram Matrix
- DWKS: A Local Descriptor of Deformations Between Meshes and Point Clouds<br>:star:code
- Robustness Certification for Point Cloud Models<br>:star:code
- Vector Neurons: A General Framework for SO(3)-Equivariant Networks<br>:star:code
- Unsupervised Point Cloud Pre-Training via Occlusion Completion<br>:star:code
- Towards Efficient Graph Convolutional Networks for Point Cloud Handling<br>:star:code
- Progressive Seed Generation Auto-Encoder for Unsupervised Point Cloud Learning
- Point Cloud Denoising
- Score-Based Point Cloud Denoising<br>:star:code
- Point Cloud Registration
- HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration<br>:star:code:house:project
- (Just) A Spoonful of Refinements Helps the Registration Error Go Down<br>:open_mouth:oral:star:code
- A Robust Loss for Point Cloud Registration
- Deep Hough Voting for Robust Global Registration
- Sampling Network Guided Cross-Entropy Method for Unsupervised Point Cloud Registration<br>:star:code
- Feature Interactive Representation for Point Cloud Registration
- LSG-CPD: Coherent Point Drift With Local Surface Geometry for Point Cloud Registration<br>:star:code:tv:video
- OMNet: Learning Overlapping Mask for Partial-to-Partial Point Cloud Registration<br>:star:code
- DeepPRO: Deep Partial Point Cloud Registration of Objects
- Provably Approximated Point Cloud Registration
- Bootstrap Your Own Correspondences
- Distinctiveness Oriented Positional Equilibrium for Point Cloud Registration
- 3D Point Clouds
- Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching<br>:star:code
- Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds<br>:star:code:house:project
- Point Transformer
- Point-Set Distances for Learning Representations of 3D Point Clouds
- PointBA: Towards Backdoor Attacks in 3D Point Cloud
- Minimal Adversarial Examples for Deep Learning on 3D Point Clouds
- 3D Point Cloud Reconstruction
- MonteFloor: Extending MCTS for Reconstructing Accurate Large-Scale Floor Plans<br>:open_mouth:oral:house:project:tv:video
- Point Cloud Completion
- SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer<br>:open_mouth:oral:star:code
- ME-PCN: Point Completion Conditioned on Mask Emptiness
- PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers<br>:open_mouth:oral:star:code
- Voxel-based Network for Shape Completion by Leveraging Edge Generation<br>:star:code
- RFNet: Recurrent Forward Network for Dense Point Cloud Completion
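- Point Cloud Augmentation
- Point Cloud Shape Analysis
- Point Cloud Analysis
- 3D Point Cloud Classification
- 3D Point Cloud Generation and Completion
- Point Cloud Object Co-Segmentation
- Point Cloud Understanding
5.Few-Shot/Zero-Shot Learning;Domain Generalization/Adaptation
- Domain Adaptation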
- Transporting Causal Mechanisms for Unsupervised Domain Adaptation<br>:open_mouth:oral:star:code
- Generalized Source-free Domain Adaptation<br>:star:code
- Semantic Concentration for Domain Adaptation<br>:star:code
- PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation<br>:star:code
- Learning Cross-modal Contrastive Features for Video Domain Adaptation
- Zero-Shot Day-Night Domain Adaptation With a Physics Prior<br>:open_mouth:oral:star:code
- Active Universal Domain Adaptation
- Re-energizing Domain Discriminator with Sample Relabeling for Adversarial Domain Adaptation
- OVANet: One-vs-All Network for Universal Domain Adaptation<br>:star:code
- Collaborative Optimization and Aggregation for Decentralized Domain Generalization and Adaptation
- Partial Video Domain Adaptation with Partial Adversarial Temporal Attentive Network<br>:star:code
- Information-Theoretic Regularization for Multi-Source Domain Adaptation
- Gradient Distribution Alignment Certificates Better Adversarial Domain Adaptation
- Adaptive Adversarial Network for Source-Free Domain Adaptation<br>:star:code
- T-SVDNet: Exploring High-Order Prototypical Correlations for Multi-Source Domain Adaptation<br>:star:code
- Self-Supervised Domain Adaptation for Forgery Localization of JPEG Compressed Images
- ECACL: A Holistic Framework for Semi-Supervised Domain Adaptation<br>:star:code
- STEM: An approach to Multi-source Domain Adaptation with Guarantees
- Towards Novel Target Discovery Through Open-Set Domain Adaptation<br>:star:code
- Deep Co-Training With Task Decomposition for Semi-Supervised Domain Adaptation<br>:star:code
- mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets
- Geometry-Aware Self-Training for Unsupervised Domain Adaptation on Object Point Clouds<br>:star:code
- Unsupervised Domain Adaptation
- Adversarial Unsupervised Domain Adaptation with Conditional and Label Shift: Infer, Align and Iterate
- Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation<br>:open_mouth:oral
- Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density<br>:star:code
- Adversarial Robustness for Unsupervised Domain Adaptation<br>:house:project
- SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised Domain Adaptation<br>:star:code
- Zero-Shot Domain Adaptation
- Domain Generalization
- Domain Generalization via Gradient Surgery<br>:star:code
- Learning to Diversify for Single Domain Generalization<br>:star:code
- Shape-Biased Domain Generalization via Shock Graph Embeddings
- SelfReg: Self-Supervised Contrastive Regularization for Domain Generalization
- A Style and Semantic Memory Mechanism for Domain Generalization
- Confidence Calibration for Domain Generalization Under Covariate Shift
- A Simple Feature Augmentation for Domain Generalization
- Few-Shot Learning
- Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder
- Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting<br>:star:code
- Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning
- Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning<br>:open_mouth:oral:star:code
- Z-Score Normalization, Hubness, and Few-Shot Learning
- Pseudo-Loss Confidence Metric for Semi-Supervised Few-Shot Learning
- Curvature Generation in Curved Spaces for Few-Shot Learning<br>:star:code
- Task-Aware Part Mining Network for Few-Shot Learning
- Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning<br>:star:code
- UVStyle-Net: Unsupervised Few-Shot Learning of 3D Style Similarity Measure for B-Reps<br>:star:code
- Shallow Bayesian Meta Learning for Real-World Few-Shot Recognition<br>:star:code
- Iterative Label Cleaning for Transductive and Semi-Supervised Few-Shot Learning<br>:star:code
- Coarsely-labeled Data for Better Few-shot Transfer<br>:star:code
- Few-Shot Anomaly Detection
- Zero-Shot Learning
- Discriminative Region-based Multi-Label Zero-Shot Learning<br>:star:code
- Field-Guide-Inspired Zero-Shot Learning
- Generalized Zero-Shot Learning
4.Neural Rendering
- In-Place Scene Labelling and Understanding with Implicit Scene Representation<br>:open_mouth:oral:house:project:tv:video
- Differentiable Surface Rendering via Non-Differentiable Sampling
- Self-Calibrating Neural Radiance Fields<br>:star:code
- NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo<br>:open_mouth:oral:star:code:house:project
- Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering<br>:star:code:house:project
- CodeNeRF: Disentangled Neural Radiance Fields for Object Categories<br>:star:code
- MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo<br>:star:code:house:project:tv:video
- PlenOctrees for Real-Time Rendering of Neural Radiance Fields<br>:open_mouth:oral:star:Conversion Code:star:Viewer Code:house:project:tv:video
- Neural Radiance Flow for 4D View Synthesis and Video Processing<br>:star:code:house:project
- Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies<br>:star:code:house:project:tv:video<br>:newspaper:Explainer: Zhejiang University's 3D vision group proposes Animatable NeRF, reconstructing animatable human models from RGB video (ICCV'21)
- GNeRF: GAN-Based Neural Radiance Field Without Posed Camera<br>:open_mouth:oral
- BARF: Bundle-Adjusting Neural Radiance Fields<br>:open_mouth:oral:star:code:house:project
- FastNeRF: High-Fidelity Neural Rendering at 200FPS<br>:house:project:tv:video
- PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering<br>:star:code:tv:video
- NeRD: Neural Reflectance Decomposition from Image Collections<br>:star:code:house:project:tv:video
- Editing Conditional Radiance Fields<br>:star:code:house:project:tv:video
- GRF: Learning a General Radiance Field for 3D Representation and Rendering<br>:star:code
- 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface<br>:star:code:tv:video
- KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs<br>:star:code
- Neural Articulated Radiance Field<br>:star:code
- Baking Neural Radiance Fields for Real-Time View Synthesis<br>:house:project:tv:video
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video<br>:star:code:house:project
- Nerfies: Deformable Neural Radiance Fields<br>:star:code:house:project:tv:video
- Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
- UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction<br>:open_mouth:oral:star:code:house:project:tv:video
- 3D Rendering
- GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds<br>:open_mouth:oral:star:code:house:project:tv:video
- 3D Photography
- SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting<br>:open_mouth:oral:house:project:tv:video
- Rendering
3.Image Clustering
- Clustering by Maximizing Mutual Information Across Views
- Learning Hierarchical Graph Neural Networks for Image Clustering<br>:star:code
- One-Pass Multi-View Clustering for Large-Scale Data
- End-to-End Robust Joint Unsupervised Image Alignment and Clustering
- Graph Contrastive Clustering<br>:star:code
- Face Clustering
2.Sign Language
- Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
- SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
- Self-Mutual Distillation Learning for Continuous Sign Language Recognition
- Visual Alignment Constraint for Continuous Sign Language Recognition<br>:star:code
- Sign Language Translation
1.Other
- Bias Loss for Mobile Neural Networks<br>:star:code
- Improve Unsupervised Pretraining for Few-label Transfer
- Temporal-wise Attention Spiking Neural Networks for Event Streams Classification
- Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space Transform
- Energy-Based Open-World Uncertainty Modeling for Confidence Calibration
- Robustness via Cross-Domain Ensembles<br>:open_mouth:oral:star:code:house:project:tv:video
- Warp Consistency for Unsupervised Learning of Dense Correspondences<br>:open_mouth:oral:star:code
- Few-Shot and Continual Learning with Attentive Independent Mechanisms<br>:star:code
- Out-of-Core Surface Reconstruction via Global TGV Minimization
- ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
- Multi-scale Matching Networks for Semantic Correspondence<br>:star:code
- Learning with Noisy Labels via Sparse Regularization<br>:star:code
- CanvasVAE: Learning to Generate Vector Graphic Documents
- Toward Spatially Unbiased Generative Models<br>:star:code
- Learning Compatible Embeddings<br>:star:code
- Instance Similarity Learning for Unsupervised Feature Representation<br>:star:code
- Generalizable Mixed-Precision Quantization via Attribution Rank Preservation<br>:star:code
- Unifying Nonlocal Blocks for Neural Networks<br>:star:code
- Impact of Aliasing on Generalization in Deep Convolutional Networks
- NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models<br>:star:code
- ProAI: An Efficient Embedded AI Hardware for Automotive Applications - a Benchmark Study
- m-RevNet: Deep Reversible Neural Networks with Momentum (suspected of academic misconduct; withdrawal has been requested)
- Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
- MT-ORL: Multi-Task Occlusion Relationship Learning<br>:star:code
- Finding Representative Interpretations on Convolutional Neural Networks
- Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation<br>:star:code
- PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion
- Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks<br>:star:code
- Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision<br>:star:code
- Structured Outdoor Architecture Reconstruction by Exploration and Classification
- Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs<br>:star:code
- A New Journey from SDRTV to HDRTV<br>:star:code
- A Simple Framework for 3D Lensless Imaging with Programmable Masks<br>:star:code
- Causal Attention for Unbiased Visual Recognition<br>:star:code
- Learning to Match Features with Seeded Graph Matching Network<br>:star:code
- Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
- PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility<br>:open_mouth:oral
- Towards Understanding the Generative Capability of Adversarially Robust Classifiers<br>:open_mouth:oral
- Ranking Models in Unlabeled New Environments<br>:star:code
- Learning of Visual Relations: The Devil is in the Tails<br>:house:project
- BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies<br>:star:code
- Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image
- Debiasing
- Full-Velocity Radar Returns by Radar-Camera Fusion
- CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing<br>:star:code:house:project
- NGC: A Unified Framework for Learning with Open-World Noisy Data
- LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision<br>:house:project
- Unsupervised Dense Deformation Embedding Network for Template-Free Shape Correspondence
- Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process<br>:star:code
- Digging into Uncertainty in Self-supervised Multi-view Stereo
- Learning to Discover Reflection Symmetry via Polar Matching Convolution<br>:star:code:house:project
- A Dual Adversarial Calibration Framework for Automatic Fetal Brain Biometry
- The Animation Transformer: Visual Correspondence via Segment Matching
- Parsing Table Structures in the Wild<br>:star:code
- Square Root Marginalization for Sliding-Window Bundle Adjustment<br>:star:code:house:project:tv:video
- Hierarchical Object-to-Zone Graph for Object Navigation<br>:star:code:tv:video
- Robustness and Generalization via Generative Adversarial Training
- Learning Fast Sample Re-weighting Without Reward Data<br>:star:code
- ReconfigISP: Reconfigurable Camera Image Processing Pipeline<br>:house:project
- Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting<br>:open_mouth:oral
- Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
- DisUnknown: Distilling Unknown Factors for Disentanglement Learning<br>:star:code:house:project
- S3VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation<br>:house:project
- ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity<br>:tv:video
- Photon-Starved Scene Inference using Single Photon Cameras<br>:tv:video
- OSCAR-Net: Object-centric Scene Graph Attention for Image Attribution<br>:star:code:house:project
- Learning to Estimate Hidden Motions with Global Motion Aggregation<br>:star:code:tv:video
- Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
- Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness<br>:star:code
- Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning<br>:open_mouth:oral
- Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice<br>:open_mouth:oral:star:code
- Neural Strokes: Stylized Line Drawing of 3D Shapes<br>:star:code
- Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency
- Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans<br>:house:project
- Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation<br>:star:code
- Exploiting Explanations for Model Inversion Attacks
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization
- RDI-Net: Relational Dynamic Inference Networks<br>:star:code
- ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators<br>:star:code
- T-Net: Effective Permutation-Equivariant Network for Two-View Correspondence Learning<br>:star:code
- Learning To Stylize Novel Views<br>:star:code:house:project
- A Lazy Approach to Long-Horizon Gradient-Based Meta-Learning
- Viewing Graph Solvability via Cycle Consistency<br>:open_mouth:oral:star:code<br>:trophy:Best Paper Honorable Mention
- SACoD: Sensor Algorithm Co-Design Towards Efficient CNN-Powered Intelligent PhlatCam<br>:star:code
- Rethinking 360° Image Visual Attention Modelling with Unsupervised Learning
- Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection<br>:star:code
- Batch Normalization Increases Adversarial Vulnerability and Decreases Adversarial Transferability: A Non-Robust Feature Perspective
- DeepCAD: A Deep Generative Network for Computer-Aided Design Models<br>:house:project
- Better Aggregation in Test-Time Augmentation
- Self-Born Wiring for Neural Trees
- Detector-Free Weakly Supervised Grounding by Separation
- Motion-Aware Dynamic Architecture for Efficient Frame Interpolation
- Relating Adversarially Robust Generalization to Flat Minima
- Bit-Mixer: Mixed-Precision Networks With Runtime Bit-Width Selection
- AINet: Association Implantation for Superpixel Segmentation<br>:star:code
- Orthogonal Projection Loss<br>:star:code
- Knowledge-Enriched Distributional Model Inversion Attacks<br>:star:code
- Architecture Disentanglement for Deep Neural Networks<br>:star:code
- On Equivariant and Invariant Learning of Object Landmark Representations<br>:star:code:house:project
- Predicting with Confidence on Unseen Distributions
- Embed Me If You Can: A Geometric Perceptron<br>:star:code
- Persistent Homology Based Graph Convolution Network for Fine-Grained 3D Shape Segmentation
- HIRE-SNN: Harnessing the Inherent Robustness of Energy-Efficient Deep Spiking Neural Networks by Training With Crafted Input Noise<br>:star:code
- Towards Memory-Efficient Neural Networks via Multi-Level In Situ Generation
- From Culture to Clothing: Discovering the World Events Behind a Century of Fashion Images<br>:house:project
- MBA-VO: Motion Blur Aware Visual Odometry<br>:star:code
- STR-GQN: Scene Representation and Rendering for Unknown Cameras Based on Spatial Transformation Routing
- Explaining Local, Global, And Higher-Order Interactions In Deep Learning
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations<br>:star:code
- Homogeneous Architecture Augmentation for Neural Predictor<br>:star:code
- SS-IL: Separated Softmax for Incremental Learning
- VSAC: Efficient and Accurate Estimator for H and F
- Fusion Moves for Graph Matching<br>:star:code:house:project
- Geometric Granularity Aware Pixel-To-Mesh
- Modulated Periodic Activations for Generalizable Local Functional Representations<br>:house:project
- Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents<br>:star:code:house:project:tv:video
- A Dark Flash Normal Camera<br>:house:project:tv:video
- Pri3D: Can 3D Priors Help 2D Representation Learning?<br>:star:code:tv:video
- Membership Inference Attacks Are Easier on Difficult Problems
- Auxiliary Tasks and Exploration Enable ObjectGoal Navigation<br>:star:code:house:project
- MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
- Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery<br>:house:project
- DCT-SNN: Using DCT To Distribute Spatial Information Over Time for Low-Latency Spiking Neural Networks<br>:star:code
- Learning To Resize Images for Computer Vision Tasks
- Field of Junctions: Extracting Boundary Structure at Low SNR
- DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling
- Learning To Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data<br>:star:code
- Graph-based Asynchronous Event Processing for Rapid Object Recognition
- A Hybrid Frequency-Spatial Domain Model for Sparse Image Reconstruction in Scanning Transmission Electron Microscopy<br>:star:code
- MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing
- Efficient Large Scale Inlier Voting for Geometric Vision Problems<br>:star:code
- Aggregation With Feature Detection
- ReCU: Reviving the Dead Weights in Binary Neural Networks<br>:star:code
- Deep Halftoning With Reversible Binary Pattern
- FFT-OT: A Fast Algorithm for Optimal Transportation
- Progressive Correspondence Pruning by Consensus Learning<br>:star:code:house:project<br>:newspaper:Explainer: Progressive Correspondence Pruning by Consensus Learning (ICCV 2021)
- Multispectral Illumination Estimation Using Deep Unrolling Network
- Distilling Global and Local Logits With Densely Connected Relations
- Learning specialized activation functions with the Piecewise Linear Unit
- Adaptive Convolutions With Per-Pixel Dynamic Filter Atom
- Deep Matching Prior: Test-Time Optimization for Dense Correspondence<br>:star:code
- Calibrated and Partially Calibrated Semi-Generalized Homographies<br>:star:code
- The Spatio-Temporal Poisson Point Process: A Simple Model for the Alignment of Event Camera Data<br>:star:code
- EC-DARTS: Inducing Equalized and Consistent Optimization Into DARTS
- Refining activation downsampling with SoftPool
- FATNN: Fast and Accurate Ternary Neural Networks<br>:star:code
- GTT-Net: Learned Generalized Trajectory Triangulation
- Deep Permutation Equivariant Structure from Motion<br>:star:code
- Extending Neural P-frame Codecs for B-frame Coding
- Hierarchical Graph Attention Network for Few-Shot Visual-Semantic Learning
- SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks<br>:star:code
- AA-RMVSNet: Adaptive Aggregation Recurrent Multi-View Stereo Network<br>:star:code
- Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective<br>:star:code
- Orthographic-Perspective Epipolar Geometry
- Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?<br>:star:code
- PixelPyramids: Exact Inference Models From Lossless Image Pyramids<br>:star:code
- SurfaceNet: Adversarial SVBRDF Estimation from a Single Image<br>:star:code
- Adaptive Curriculum Learning
- Sparse-Shot Learning With Exclusive Cross-Entropy for Extremely Many Localisations
- Graspness Discovery in Clutters for Fast and Accurate Grasp Detection
- RobustNav: Towards Benchmarking Robustness in Embodied Navigation<br>:star:code
- Generating Attribution Maps With Disentangled Masked Backpropagation
- Spectral Leakage and Rethinking the Kernel Size in CNNs<br>:star:code
- What You Can Learn by Staring at a Blank Wall
- Neural TMDlayer: Modeling Instantaneous Flow of Features via SDE Generators
- CLEAR: Clean-up Sample-Targeted Backdoor in Neural Networks
- Learning To Hallucinate Examples From Extrinsic and Intrinsic Supervision
- Single-shot Hyperspectral-Depth Imaging with Learned Diffractive Optics
- GridToPix: Training Embodied Agents With Minimal Supervision<br>:house:project:tv:video
- Differentiable Dynamic Wirings for Neural Networks
- JEM++: Improved Techniques for Training JEM<br>:star:code
- X-World: Accessibility, Vision, and Autonomy Meet
- Memory-augmented Dynamic Neural Relational Inference
- Physics-based Differentiable Depth Sensor Simulation
- Hypergraph Neural Networks for Hypergraph Matching<br>:star:code
- Visual Grounding
- Cortical Surface Shape Analysis Based on Alexandrov Polyhedra
- FcaNet: Frequency Channel Attention Networks<br>:star:code
- Structured Outdoor Architecture Reconstruction by Exploration and Classification<br>:star:code
- ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
- Testing Using Privileged Information by Adapting Features With Statistical Dependence
- Virtual Light Transport Matrices for Non-Line-of-Sight Imaging<br>:open_mouth:oral
- DecentLaM: Decentralized Momentum SGD for Large-batch Deep Training
- Contrastive Multimodal Fusion with TupleInfoNCE
- Learning Better Visual Data Similarities via New Grouplet Non-Euclidean Embedding
- An Elastica Geodesic Approach With Convexity Shape Prior
- Inverting a Rolling Shutter Camera: Bring Rolling Shutter Images to High Framerate Global Shutter Video
- Multimodal Knowledge Expansion<br>:star:code
- Direct Differentiable Augmentation Search<br>:star:code
- The Functional Correspondence Problem<br>:house:project
- Joint Topology-Preserving and Feature-Refinement Network for Curvilinear Structure Segmentation<br>:star:code
- Generative Layout Modeling Using Constraint Graphs
- Self-Supervised Image Prior Learning with GMM from a Single Noisy Image<br>:star:code
- Deep Implicit Surface Point Prediction Networks<br>:star:code:house:project:tv:video
- Poly-NL: Linear Complexity Non-local Layers With 3rd Order Polynomials
- Factorizing Perception and Policy for Interactive Instruction Following<br>:star:code
- Group-Wise Inhibition Based Feature Regularization for Robust Classification<br>:star:code
- Searching for Robustness: Loss Learning for Noisy Classification Tasks
- Statistically Consistent Saliency Estimation
- Practical Relative Order Attack in Deep Ranking<br>:star:code
- Q-Match: Iterative Shape Matching via Quantum Annealing<br>:star:code:house:project
- Learning To Better Segment Objects From Unseen Classes With Unlabeled Videos<br>:house:project:tv:video
- Globally Optimal and Efficient Manhattan Frame Estimation by Delimiting Rotation Search Space
- Cross-Encoder for Unsupervised Gaze Representation Learning
- Hierarchical Disentangled Representation Learning for Outdoor Illumination Estimation and Editing
- NeuSpike-Net: High Speed Video Reconstruction via Bio-Inspired Neuromorphic Cameras
- Local Temperature Scaling for Probability Calibration
- LIRA: Learnable, Imperceptible and Robust Backdoor Attacks
- Conformer: Local Features Coupling Global Representations for Visual Recognition<br>:star:code
- Reliably fast adversarial training via latent adversarial perturbation
- PX-NET: Simple and Efficient Pixel-Wise Training of Photometric Stereo Networks
- A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation<br>:star:code:house:project:tv:video
- ICON: Learning Regular Maps Through Inverse Consistency
- Video Geo-Localization Employing Geo-Temporal Feature Learning and GPS Trajectory Smoothing<br>:star:code
- Kernel Methods in Hyperbolic Spaces
- Cross-Camera Convolutional Color Constancy
- BlockPlanner: City Block Generation with Vectorized Graph Representation
- A Machine Teaching Framework for Scalable Recognition
- Clothed Human Bodies
- Dynamic Surface Function Networks for Clothed Human Bodies<br>:star:code:house:project:tv:video
- Transfer Learning
- Active Recognition(AR)
- 3D Photography
- SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting<br>:open_mouth:oral:house:project:tv:video
- Sub-Bit Neural Networks: Learning To Compress and Accelerate Binary Neural Networks<br>:star:code
- When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes<br>:star:code
- Physics-Enhanced Machine Learning for Virtual Fluorescence Microscopy<br>:star:code
- Ground-truth or DAER: Selective Re-query of Secondary Information<br>:star:code
- Can Shape Structure Features Improve Model Robustness Under Diverse Adversarial Settings?
- Joint Representation Learning and Novel Category Discovery on Single- and Multi-Modal Data
- Sparse Needlets for Lighting Estimation with Spherical Transport Loss
- Semantic Perturbations with Normalizing Flows for Improved Generalization
- Differentiable Surface Rendering via Non-Differentiable Sampling
- Towards Robustness of Deep Neural Networks via Regularization
- Objects as Cameras: Estimating High-Frequency Illumination from Shadows
- Inference of Black Hole Fluid-Dynamics From Sparse Interferometric Measurements
- Removing the Bias of Integral Pose Regression
- A Light Stage on Every Desk<br>:house:project
- Multi-Level Curriculum for Training a Distortion-Aware Barrel Distortion Rectification Model
- Generic Event Boundary Detection: A Benchmark for Event Segmentation
- Extreme Structure from Motion for Indoor Panoramas without Visual Overlaps<br>:star:code
- Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams<br>:star:code
- VaPiD: A Rapid Vanishing Point Detector via Learned Optimizers
- Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images<br>:star:code
- Efficient and Differentiable Shadow Computation for Inverse Problems
- Minimal Cases for Computing the Generalized Relative Pose using Affine Correspondences
- Radial Distortion Invariant Factorization for Structure from Motion
- LaLaLoc: Latent Layout Localisation in Dynamic, Unvisited Environments
- Transforms Based Tensor Robust PCA: Corrupted Low-Rank Tensors Recovery via Convex Optimization
- Synchronization of Group-labelled Multi-graphs
- Robust Watermarking for Deep Neural Networks via Bi-Level Optimization
- CrossNorm and SelfNorm for Generalization under Distribution Shifts<br>:star:code
- Learning Temporal Dynamics from Cycles in Narrated Video<br>:house:project
- von Mises-Fisher Loss: An Exploration of Embedding Geometries for Supervised Learning
- Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts<br>:star:code
- Me-Momentum: Extracting Hard Confident Examples From Noisily Labeled Data<br>:star:code
- ProFlip: Targeted Trojan Attack with Progressive Bit Flips
- Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion
- AdvRush: Searching for Adversarially Robust Neural Architectures
- Improving robustness against common corruptions with frequency biased models
- UASNet: Uncertainty Adaptive Sampling Network for Deep Stereo Matching
- Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration<br>:star:code
- Field Convolutions for Surface CNNs<br>:open_mouth:oral:star:code
- SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks
- Learning Icosahedral Spherical Probability Map Based on Bingham Mixture Model for Vanishing Point Estimation
- Incorporating Learnable Membrane Time Constant to Enhance Learning of Spiking Neural Networks<br>:star:code
- Real-Time Vanishing Point Detector Integrating Under-Parameterized RANSAC and Hough Transform
- Low-Rank Tensor Completion by Approximating the Tensor Average Rank
- Rotation Averaging in a Split Second: A Primal-Dual Method and a Closed-Form for Cycle Graphs<br>:star:code
- Effectively Leveraging Attributes for Visual Similarity<br>:star:code
- Localized Simple Multiple Kernel K-means<br>:star:code
- SmartShadow: Artistic Shadow Drawing Tool for Line Drawings
- PT-CapsNet: A Novel Prediction-Tuning Capsule Network Suitable for Deeper Architectures<br>:star:code
- Generalized Shuffled Linear Regression<br>:star:code
- Weak Adaptation Learning: Addressing Cross-Domain Data Insufficiency With Weak Annotator
- Building-GAN: Graph-Conditioned Architectural Volumetric Design Generation
- Procrustean Training for Imbalanced Deep Learning