Awesome
CVPR-2023-Papers
❣❣❣ The categorized collection of CVPR 2023 papers is now complete
:loudspeaker::loudspeaker::loudspeaker:Award-winning papers
:trophy:Best Paper
- Planning-oriented Autonomous Driving<br>:house:project
- Visual Programming: Compositional visual reasoning without training
:trophy:Best Student Paper
:trophy:Honorable Mention
:trophy:Honorable Mention(Student)
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation<br>:house:project
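Categorized collection of survey papers from past years: ↘️CV-Surveys (under construction)
Categorized collection of 2024 papers: click here
Categorized collection of 2023 papers: click here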
↘️CVPR-2023-Papers ↘️WACV-2023-Papers ↘️ICCV-2023-Papers
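Categorized collection of 2022 papers: click here
Categorized collection of 2021 papers: click here
Categorized collection of 2020 papers: click here
Contents
80.Computer Graphics(计算机图形学)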
- Learning Anchor Transformations for 3D Garment Animation<br>:star:code
- Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion<br>:star:code
- CloSET: Modeling Clothed Humans on Continuous Surface with Explicit Template Decomposition<br>:house:project
- FLEX: Full-Body Grasping Without Full-Body Grasps<br>:house:project
79.thermal imaging technology(热敏成像技术)
<a name="78"/>78.Image/Video Editing(图像/视频编辑)
- PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image<br>:house:project
- Text-driven video editing
- Image Editing(图像编辑)
- CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing
- SIEDOB: Semantic Image Editing by Disentangling Object and Background
- NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models
- InstructPix2Pix: Learning To Follow Image Editing Instructions<br>:house:project
- Local 3D Editing via 3D Distillation of CLIP Knowledge
- Deep Curvilinear Editing: Commutative and Nonlinear Image Manipulation for Pretrained Deep Generative Model
- Imagic: Text-Based Real Image Editing With Diffusion Models
- Exemplar-based image editing
77.sketch(草图)
- Photo Pre-Training, but for Sketch<br>:star:code
- Restoration of Hand-Drawn Architectural Drawings Using Latent Space Mapping With Degradation Generator
- SECAD-Net: Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations<br>:star:code
76.IP protection(知识产权保护)
- Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection
- Effective Ambiguity Attack Against Passport-Based DNN Intellectual Property Protection Schemes Through Fully Connected Layer Substitution
75.Semantic Scene Completion(语义场景补全)
- Semantic Scene Completion With Cleaner Self
- VoxFormer: Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion<br>:star:code
74.Machine Learning(机器学习)
- Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets
- Multi-Agent Automated Machine Learning
- Towards Better Decision Forests: Forest Alternating Optimization
- ERM-KTP: Knowledge-Level Machine Unlearning via Knowledge Transfer<br>:star:code
- A Whac-a-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others<br>:star:code
- Novel category discovery
- Transfer learning
73.Neural Radiance Fields(神经辐射场)
- Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization
- Discriminating Known From Unknown Objects via Structure-Enhanced Recurrent Variational AutoEncoder
- Occlusion-Free Scene Recovery via Neural Radiance Fields
- Grid-guided Neural Radiance Fields for Large Urban Scenes<br>:house:project
- NeRFLight: Fast and Light Neural Radiance Fields using a Shared Feature Grid
- GazeNeRF: 3D-Aware Gaze Redirection With Neural Radiance Fields<br>:star:code
- SPARF: Neural Radiance Fields from Sparse and Noisy Poses<br>:star:code
- Masked Wavelet Representation for Compact Neural Radiance Fields<br>:star:code
- MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures<br>:star:code
- AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training<br>:house:project
- JacobiNeRF: NeRF Shaping With Mutual Information Gradients
- Robust Dynamic Radiance Fields<br>:house:project
- Exact-NeRF: An Exploration of a Precise Volumetric Parameterization for Neural Radiance Fields
- PaletteNeRF: Palette-Based Appearance Editing of Neural Radiance Fields
- EditableNeRF: Editing Topologically Varying Neural Radiance Fields by Key Points<br>:house:project
- SinGRAF: Learning a 3D Generative Radiance Field for a Single Scene<br>:house:project
- ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision<br>:star:code
- Flow supervision for Deformable NeRF
- Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields<br>:house:project
- EventNeRF: Neural Radiance Fields From a Single Colour Event Camera<br>:house:project
- SeaThru-NeRF: Neural Radiance Fields in Scattering Media
- SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory
- Complementary Intrinsics From Neural Radiance Fields and CNNs for Outdoor Scene Relighting
- Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields
- Removing Objects From Neural Radiance Fields
- Grid-guided Neural Radiance Fields for Large Urban Scenes<br>:star:code
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands
- NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects<br>:star:code
- JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields<br>:house:project
- Multi-Space Neural Radiance Fields<br>:star:code
- DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields<br>:star:code
- StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields<br>:house:project
- Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields<br>:house:project
- SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting With Neural Radiance Fields
- F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories<br>:house:project
- Clothed Human Performance Capture with a Double-layer Neural Radiance Fields
- DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models
- Deblurring
72.open-set recognition(开集识别)
<a name="71"/>71.visual reasoning(视觉推理)
- Visual Programming: Compositional visual reasoning without training<br>:trophy:Best Paper
- Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices<br>:star:code
- Super-CLEVR: A Virtual Benchmark To Diagnose Domain Robustness in Visual Reasoning<br>:star:code
- Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge
70.Image Forgery Detection
- Hierarchical Fine-Grained Image Forgery Detection and Localization<br>:star:code
- Detecting and Grounding Multi-Modal Media Manipulation<br>:star:code<br>:star:code<br>Misinformation detection
- Evading DeepFake Detectors via Adversarial Statistical Consistency
- Edge-Aware Regional Message Passing Controller for Image Forgery Localization
- TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization<br>:house:project
- Towards Universal Fake Image Detectors That Generalize Across Generative Models
- Deepfake Detection
69.Reinforcement learning(强化学习)
- PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav
- Local-Guided Global: Paired Similarity Representation for Visual Reinforcement Learning
- Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning<br>:star:code
- Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-per-Second<br>:star:code
- Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning<br>:house:project
68.Lifelong Learning(终身学习)
<a name="67"/>67.Active Learning(主动学习)
- Re-thinking Federated Active Learning based on Inter-class Diversity
- Box-Level Active Detection<br>:star:code
- Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning<br>:star:code
66.Clustering(聚类)
- DivClust: Controlling Diversity in Deep Clustering
- MVC
- On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering<br>:star:code
- GCFAgg: Global and Cross-View Feature Aggregation for Multi-View Clustering
- Sample-Level Multi-View Graph Clustering
- Deep Incomplete Multi-View Clustering With Cross-View Partial Sample and Prototype Alignment
- Highly Confident Local Structure Based Consensus Graph Learning for Incomplete Multi-View Clustering
65.Scene flow estimation(场景流估计)
- Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision<br>:star:code
- Self-Supervised 3D Scene Flow Estimation Guided by Superpoints
- Unsupervised Cumulative Domain Adaptation for Foggy Scene Optical Flow
64.Motion Retargeting(动作重定向)
<a name="63"/>63.edge detection(边缘检测)
- edge detection
62.Object Counting(物体计数)
- Zero-shot Object Counting<br>:star:code
- Indiscernible Object Counting in Underwater Scenes<br>:star:code
61.Object Re-identification(物体重识别)
- MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID<br>:star:code
- Large-scale Training Data Search for Object Re-identification<br>:star:code
- Adaptive Sparse Pairwise Loss for Object Re-Identification<br>:star:code
60.Industrial Anomaly Detection(工业缺陷检测)
- Defect localization
- Industrial anomaly detection
- Anomaly segmentation
59.Image/Video Compression(图像/视频压缩)
- Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
- Context-Based Trit-Plane Coding for Progressive Image Compression<br>:star:code
- Learned Image Compression with Mixed Transformer-CNN Architectures<br>:star:code
- LVQAC: Lattice Vector Quantization Coupled with Spatially Adaptive Companding for Efficient Learned Image Compression
- Optimization-Inspired Cross-Attention Transformer for Compressive Sensing<br>:star:code
- Multi-Realism Image Compression With a Conditional Generator
- AccelIR: Task-aware Image Compression for Accelerating Neural Restoration
- Video compression
- Towards Scalable Neural Representation for Diverse Videos
- HNeRV: A Hybrid Neural Representation for Videos<br>:star:code<br>:star:code
- Video Compression With Entropy-Constrained Neural Representations
- Complexity-Guided Slimmable Decoder for Efficient Deep Video Compression
- EfficientSCI: Densely Connected Network with Space-time Factorization for Large-scale Video Snapshot Compressive Imaging<br>:star:code
- MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding
- Neural Video Compression With Diverse Contexts<br>:star:code
- Motion Information Propagation for Neural Video Compression
- Hierarchical B-Frame Video Coding Using Two-Layer CANF Without Motion Coding
- Vector quantization
58.Neural rendering(神经渲染)
- TMO: Textured Mesh Acquisition of Objects With a Mobile Device by Using Differentiable Rendering
- Tensor4D: Efficient Neural 4D Decomposition for High-Fidelity Dynamic Reconstruction and Rendering
- Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur<br>:house:project
- NeUDF: Leaning Neural Unsigned Distance Fields With Volume Rendering
- DiffRF: Rendering-Guided 3D Radiance Field Diffusion<br>:house:project
- Unsupervised Continual Semantic Adaptation Through Neural Rendering
- Neural Fields Meet Explicit Geometric Representations for Inverse Rendering of Urban Scenes<br>:house:project
- UV Volumes for Real-Time Rendering of Editable Free-View Human Performance<br>:house:project
- Inverse Rendering of Translucent Objects Using Physical and Neural Renderers
- ORCa: Glossy Objects As Radiance-Field Cameras<br>:house:project
- MAIR: Multi-View Attention Inverse Rendering With 3D Spatially-Varying Lighting Estimation<br>:house:project
- FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views<br>:house:project
- Learning To Render Novel Views From Wide-Baseline Stereo Pairs<br>:house:project
- NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer<br>:house:project
- FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization<br>:house:project
- Local Implicit Ray Function for Generalizable Radiance Field Representation<br>:star:code
- FitMe: Deep Photorealistic 3D Morphable Model Avatars<br>:star:code
- Pointersect: Neural Rendering with Cloud-Ray Intersection
- Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention<br>:star:code
- ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field
- WildLight: In-the-wild Inverse Rendering with a Flashlight<br>:star:code
- FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views<br>:star:code
- NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination
- MonoHuman: Animatable Human Neural Field from Monocular Video<br>:star:code
- Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos<br>:star:code
- PlenVDB: Memory Efficient VDB-Based Radiance Fields for Fast Training and Rendering<br>Reaches 30 frames per second at 1280x720 output resolution on an iPhone 12.
57.Gaze Estimation(视线估计)
- NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation
- Source-free Adaptive Gaze Estimation by Uncertainty Reduction<br>:star:code
- ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection
56.Sound + Vision(声音与视觉)
- Conditional Generation of Audio from Video via Foley Analogies<br>:star:code
- Vision Transformers Are Parameter-Efficient Audio-Visual Learners
- Speaker detection
- Audio-visual speech recognition
- Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring<br>:star:code
- Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception<br>:star:code
- AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
- SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
- Audio-visual localization
- Audio source separation
- Sound synthesis
- Movie audio description
- AutoAD: Movie Description in Context<br>:house:project
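- Scene image generation from sound
- Audio-visual anomaly detection
- Movie dubbing
- Dance generation
- Video saliency prediction
- Audio-driven portrait animation
- Auditory localization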
55.Novel View Synthesis(视图合成)
- Neural Pixel Composition for 3D-4D View Synthesis From Multi-Views
- MixNeRF: Modeling a Ray with Mixture Density for Novel View Synthesis from Sparse Inputs<br>:house:project
- NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis<br>:house:project
- NeRDi: Single-View NeRF Synthesis With Language-Guided Diffusion As General Image Priors
- Novel-View Acoustic Synthesis<br>:house:project
- Cross-Guided Optimization of Radiance Fields With Multi-View Image Super-Resolution for High-Resolution Novel View Synthesis
- Frequency-Modulated Point Cloud Rendering with Easy Editing<br>:star:code
- Learning Neural Duplex Radiance Fields for Real-Time View Synthesis<br>:house:project
- ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects<br>:star:code
- Balanced Spherical Grid for Egocentric View Synthesis
- Progressively Optimized Local Radiance Fields for Robust View Synthesis<br>:star:code
- F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories<br>:star:code
- Enhanced Stable View Synthesis
- Consistent View Synthesis with Pose-Guided Diffusion Models<br>:star:code
- Learning to Render Novel Views from Wide-Baseline Stereo Pairs<br>:star:code
- Painting 3D Nature in 2D: View Synthesis of Natural Scenes From a Single Semantic Mask<br>:house:project
- NoPe-NeRF: Optimising Neural Radiance Field With No Pose Prior<br>:house:project
- Multiscale Tensor Decomposition and Rendering Equation Encoding for View Synthesis<br>:star:code
- Efficient View Synthesis and 3D-Based Multi-Frame Denoising With Multiplane Feature Representations
- NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds
- DINER: Depth-aware Image-based NEural Radiance fields<br>:house:project
- RefSR-NeRF: Towards High Fidelity and Super Resolution View Synthesis<br>:star:code
- VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization<br>:star:code
- DynIBaR: Neural Dynamic Image-Based Rendering<br>:house:project<br>:trophy:Honorable Mention
- Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories
54.Benchmark/Dataset(基准/数据集)
- Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset
- A New Dataset Based on Images Taken by Blind People for Testing the Robustness of Image Classification Models Trained for ImageNet Categories
- Benchmarking Self-Supervised Learning on Diverse Pathology Datasets
- Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline
- ScaleDet: A Scalable Multi-Dataset Object Detector
- JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking<br>:sunflower:dataset
- Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning
- DF-Platter: Multi-Face Heterogeneous Deepfake Dataset<br>:sunflower:dataset
- HandsOff: Labeled Dataset Generation With No Additional Human Annotations<br>:sunflower:dataset
- M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis<br>:star:code
- ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations<br>:sunflower:dataset
- NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation<br>:star:code
- MISC210K: A Large-Scale Dataset for Multi-Instance Semantic Correspondence<br>:star:code
- StarCraftImage: A Dataset for Prototyping Spatial Reasoning Methods for Multi-Agent Environments<br>:house:project
- Habitat-Matterport 3D Semantics Dataset
- CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset<br>:star:code<br>Large-scale public Chinese video-text dataset
- FLAG3D: A 3D Fitness Activity Dataset With Language Instruction<br>:house:project
- Multi-Label Compound Expression Recognition: C-EXPR Database & Network
- ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation<br>:house:project<br>Dataset for hand-object manipulation
- xFBD: Focused Building Damage Dataset and Analysis<br>Building damage dataset
- Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo<br>:sunflower:dataset
- Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
- HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling<br>:sunflower:dataset
- CUDA: Convolution-based Unlearnable Datasets<br>:sunflower:dataset
- MVImgNet: A Large-scale Dataset of Multi-view Images<br>:sunflower:dataset
- V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception<br>:sunflower:dataset<br>Vehicle-to-Vehicle (V2V) perception
- Polynomial Implicit Neural Representations For Large Diverse Datasets<br>:sunflower:dataset
- MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset<br>:sunflower:dataset
- RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset<br>:sunflower:dataset
- Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts<br>:star:code
- ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data<br>:star:code
- CelebV-Text: A Large-Scale Facial Text-Video Dataset<br>:star:code<br>Facial text-to-video generation
- Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method<br>:star:code<br>Artistic image aesthetics assessment
- CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions<br>:house:project<br>Climbing motion dataset
- Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
- AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection<br>:star:code<br>:house:project<br>Public short-video dataset for shot boundary detection
- V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting<br>:star:code
- WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models<br>:star:code<br>Synthetic dataset for object detection and weather classification under extreme weather conditions
- CLOTH4D: A Dataset for Clothed Human Reconstruction<br>:sunflower:dataset<br>Dataset for clothed human reconstruction
- OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images<br>:sunflower:dataset<br>A new dataset for omnipotent city understanding from multi-level and multi-view images.
- RealImpact: A Dataset of Impact Sound Fields for Real Objects<br>:star:code
- BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion<br>:house:project
- GFIE: A Dataset and Baseline for Gaze-Following From 2D to 3D in Indoor Environments<br>:house:project
- Benchmark(基准)
- A Soma Segmentation Benchmark in Full Adult Fly Brain<br>:star:code
- A New Comprehensive Benchmark for Semi-Supervised Video Anomaly Detection and Anticipation
- A Large-Scale Homography Benchmark
- Toward RAW Object Detection: A New Benchmark and a New Model
- MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding
- Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild<br>:house:project
- Advancing Visual Grounding With Scene Knowledge: Benchmark and Method<br>:star:code
- The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects<br>:star:code
- Meta Omnium: A Benchmark for General-Purpose Learning-to-Learn<br>:star:code
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation<br>:star:code
- GeoNet: Benchmarking Unsupervised Adaptation across Geographies<br>:star:code
- PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout<br>:star:code
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
- ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos<br>:house:project
- Image Similarity
- ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos<br>:star:code
- Ultra-High Resolution Segmentation with Ultra-Rich Context: A Novel Benchmark<br>:star:code
- RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension<br>:house:project
53.Sign Language (手语)
- Ham2Pose: Animating Sign Language Notation Into Pose Sequences<br>:house:project
- Sign language translation
- Sign language recognition
- Continuous Sign Language Recognition with Correlation Network<br>:star:code
- Reconstructing Signing Avatars From Video Using Linguistic Priors<br>:house:project
- Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition
- CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational Alignment<br>:star:code
- Natural Language-Assisted Sign Language Recognition<br>:star:code
- Sign language retrieval
52.Human Motion(人体运动)
- Semi-Weakly Supervised Object Kinematic Motion Prediction
- The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
- MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion
- Human motion prediction
- Human motion synthesis
- Generating Human Motion From Textual Descriptions With Discrete Representations<br>:house:project
- UDE: A Unified Driving Engine for Human Motion Generation<br>:star:code
- Mofusion: A Framework for Denoising-Diffusion-Based Motion Synthesis<br>:house:project
- MoDi: Unconditional Motion Synthesis From Diverse Data
- 3D HM
51.Computational Imaging (e.g. optical, geometric, and light-field imaging)
- Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography<br>:star:code
- TRACE: 5D Temporal Regression of Avatars With Dynamic Cameras in 3D Environments<br>:star:code
- High-Fidelity Event-Radiance Recovery via Transient Event Frequency<br>:star:code
- Real-Time Neural Light Field on Mobile Devices<br>:house:project
- Accidental Light Probes<br>:house:project
- DyLiN: Making Light Field Networks Dynamic<br>:star:code
- Learning Rotation-Equivariant Features for Visual Correspondence<br>:house:project
- Role of Transients in Two-Bounce Non-Line-of-Sight Imaging
- Revisiting Rolling Shutter Bundle Adjustment: Toward Accurate and Fast Solution
- Camera pose estimation
- Shutter correction
- Camera calibration
- Geometry estimation
- Camera localization
- NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization<br>:star:code
50.Anomaly Detection(异常检测)
- Revisiting Reverse Distillation for Anomaly Detection
- SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection
- Prototypical Residual Networks for Anomaly Detection and Localization
- OpenMix: Exploring Outlier Samples for Misclassification Detection<br>:star:code
- Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection<br>:star:code
- Diversity-Measurable Anomaly Detection
- SimpleNet: A Simple Network for Image Anomaly Detection and Localization<br>:star:code
- DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection
- WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
- OOD
- Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection<br>:star:code
- Mind the Label Shift of Augmentation-Based Graph OOD Generalization
- Block Selection Method for Using Feature Norm in Out-of-Distribution Detection<br>:star:code
- Distribution Shift Inversion for Out-of-Distribution Prediction<br>:star:code
- Are Data-Driven Explanations Robust Against Out-of-Distribution Data?
- LINe: Out-of-Distribution Detection by Leveraging Important Neurons
- Rethinking Out-of-Distribution (OOD) Detection: Masked Image Modeling Is All You Need<br>:star:code
- Balanced Energy Regularization Loss for Out-of-Distribution Detection
- Decoupling MaxLogit for Out-of-Distribution Detection
- Detection of Out-of-Distribution Samples Using Binary Neuron Activation Patterns
- GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection<br>:star:code
49.Image Geo-localization(图像地理位置识别)
<a name="48"/>48.NLP(自然语言处理)
- Images Speak in Images: A Generalist Painter for In-Context Visual Learning<br>:star:code
- CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes From Natural Language
- Sarcasm detection (detecting whether sarcasm is present in text, or in other modalities such as images, e.g. comics)
- NLQ
- Visual Grounding(视觉指代)
- Referring Expression Comprehension(指代表达理解)
47.Few/Zero-Shot Learning/Domain Generalization/Adaptation(小/零样本/域泛化/域适应)
- DG
- Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View
- Meta-Causal Learning for Single Domain Generalization
- Bi-Level Meta-Learning for Few-Shot Domain Generalization
- Promoting Semantic Connectivity: Dual Nearest Neighbors Contrastive Learning for Unsupervised Domain Generalization
- Federated Domain Generalization With Generalization Adjustment<br>:star:code
- Decompose, Adjust, Compose: Effective Normalization by Playing With Frequency for Domain Generalization
- NICO++: Towards Better Benchmarking for Domain Generalization<br>:star:code
- Improved Test-Time Adaptation for Domain Generalization<br>:star:code
- Modality-Agnostic Debiasing for Single Domain Generalization
- Neuron Structure Modeling for Generalizable Remote Physiological Measurement<br>:star:code
- Sharpness-Aware Gradient Matching for Domain Generalization<br>:star:code
- Improving Generalization with Domain Convex Game
- Generalist: Decoupling Natural and Robust Generalization<br>:star:code
- ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization<br>:star:code
- Deep Frequency Filtering for Domain Generalization
- Progressive Random Convolutions for Single Domain Generalization
- DA
- Guiding Pseudo-labels with Uncertainty Estimation for Test-Time Adaptation<br>:star:code
- Class Relationship Embedded Learning for Source-Free Unsupervised Domain Adaptation
- Semi-Supervised Domain Adaptation With Source Label Adaptation
- SCoDA: Domain Adaptive Shape Completion for Real Scans
- Divide and Adapt: Active Domain Adaptation via Customized Learning
- Source-Free Video Domain Adaptation With Spatial-Temporal-Historical Consistency Learning
- DARE-GRAM: Unsupervised Domain Adaptation Regression by Aligning Inverse Gram Matrices<br>:star:code
- Dual-Bridging With Adversarial Noise Generation for Domain Adaptive rPPG Estimation
- MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation<br>:star:code
- DATE: Domain Adaptive Product Seeker for E-commerce<br>:star:code
- Adjustment and Alignment for Unbiased Open Set Domain Adaptation<br>:star:code
- Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective
- MHPL: Minimum Happy Points Learning for Active Source Free Domain Adaptation
- COT: Unsupervised Domain Adaptation with Clustering and Optimal Transport
- Upcycling Models under Domain and Category Shift<br>:star:code
- C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation
- TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation<br>:star:code
- OSAN: A One-Stage Alignment Network to Unify Multimodal Alignment and Unsupervised Domain Adaptation
- MOT: Masked Optimal Transport for Partial Domain Adaptation
- Feature Alignment and Uniformity for Test Time Adaptation
- ZSL
- Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning<br>:star:code
- Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning<br>:star:code
- Learning Attention as Disentangler for Compositional Zero-shot Learning<br>:star:code
- Zero-shot Model Diagnosis
- Learning Conditional Attributes for Compositional Zero-Shot Learning<br>:star:code
- (ML)$^2$P-Encoder: On Exploration of Channel-Class Correlation for Multi-Label Zero-Shot Learning<br>:star:code
- Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning
- FSL
- Transductive Few-shot Learning with Prototype-based Label Propagation by Iterative Graph Refinement
- Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners<br>:star:code
- Revisiting Prototypical Network for Cross Domain Few-Shot Learning<br>:star:code
- Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning With Multimodal Models<br>:house:project
- Open-Set Likelihood Maximization for Few-Shot Learning
- StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning<br>:star:code
46.Scene Graph Generation(场景图生成)
- Unbiased Scene Graph Generation in Videos
- IS-GGT: Iterative Scene Graph Generation With Generative Transformers
- Prototype-based Embedding Network for Scene Graph Generation<br>:star:code
- Devil's on the Edges: Selective Quad Attention for Scene Graph Generation<br>:house:project
- Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space
- Panoptic Video Scene Graph Generation
- Fast Contextual Scene Graph Generation With Unbiased Context Augmentation
45.Dense Prediction(密集预测)
- Ensemble-Based Blackbox Attacks on Dense Prediction<br>:star:code
- DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
- Probabilistic Prompt Learning for Dense Prediction
- 1% VS 100%: Parameter-Efficient Low Rank Adapter for Dense Predictions
- DPF: Learning Dense Prediction Fields With Weak Supervision<br>:star:code
- Dense Detection
- Dense Object Localization
44.Federated Learning(联邦学习)
- Confidence-Aware Personalized Federated Learning via Variational Expectation Maximization
- Federated Learning With Data-Agnostic Distribution Fusion
- How To Prevent the Poor Performance Clients for Personalized Federated Learning?
- GradMA: A Gradient-Memory-Based Accelerated Federated Learning With Alleviated Catastrophic Forgetting
- Bias-Eliminating Augmentation Learning for Debiased Federated Learning
- Make Landscape Flatter in Differentially Private Federated Learning
- The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning
- Rethinking Federated Learning With Domain Shift: A Prototype View<br>:star:code
- On the Effectiveness of Partial Variance Reduction in Federated Learning With Heterogeneous Data
- Elastic Aggregation for Federated Optimization
- FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning
- Adaptive Channel Sparsity for Federated Learning Under System Heterogeneity
- ScaleFL: Resource-Adaptive Federated Learning With Heterogeneous Clients
- Reliable and Interpretable Personalized Federated Learning
43.Multi-Task Learning(多任务学习)
- Independent Component Alignment for Multi-Task Learning
- Dynamic Neural Network for Multi-Task Learning Searching Across Diverse Network Topologies
- AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task Learning<br>:star:code
- Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners<br>:house:project
- Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing With Non-Learnable Primitives<br>:star:code
- Hierarchical Prompt Learning for Multi-Task Learning
42.Metric Learning(度量学习)
- Advancing Deep Metric Learning Through Multiple Batch Norms And Multi-Targeted Adversarial Examples
- Deep Factorized Metric Learning<br>:star:code
- Deep Semi-Supervised Metric Learning With Mixed Label Propagation
- Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning
41.Incremental Learning(增量学习)
- Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning<br>:star:code
- AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning<br>:star:code
- GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task
- Class-Incremental Learning
- Dense Network Expansion for Class Incremental Learning
- Class-Incremental Exemplar Compression for Class-Incremental Learning<br>:star:code
- Rebalancing Batch Normalization for Exemplar-based Class-Incremental Learning
- Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning<br>:star:code
- On the Stability-Plasticity Dilemma of Class-Incremental Learning
- Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation<br>:star:code
- Multi-Centroid Task Descriptor for Dynamic Class Incremental Inference
- DKT: Diverse Knowledge Transfer Transformer for Class Incremental Learning<br>:star:code
- CafeBoost: Causal Feature Boost To Eliminate Task-Induced Bias for Class Incremental Learning
40.Adversarial Learning(对抗学习)
- Adversarial Robustness via Random Projection Filters
- Seasoning Model Soups for Robustness to Adversarial and Natural Distribution Shifts
- Dynamic Generative Targeted Attacks With Pattern Injection
- FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits
- Enhancing the Self-Universality for Transferable Targeted Attacks<br>:star:code
- Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization<br>:house:project
- Revisiting Residual Networks for Adversarial Robustness<br>:star:code
- Feature Separation and Recalibration for Adversarial Robustness<br>:star:code
- CFA: Class-wise Calibrated Fair Adversarial Training<br>:star:code
- Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations<br>:house:project
- Efficient Loss Function by Minimizing the Detrimental Effect of Floating-Point Errors on Gradient-Based Attacks
- Black-Box
- Adversarial Examples
- Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
- Introducing Competition To Boost the Transferability of Targeted Adversarial Examples Through Clean Feature Mixup<br>:star:code
- Towards Transferable Targeted Adversarial Examples
- Improving the Transferability of Adversarial Samples by Path-Augmented Method
- Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples<br>:star:code
- Backdoor Attacks
- Single Image Backdoor Inversion via Robust Smoothed Classifiers<br>:star:code
- Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency
- You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?
- MEDIC: Remove Model Backdoors via Importance Driven Cloning<br>:star:code
- Backdoor Defense via Adaptively Splitting Poisoned Dataset<br>:star:code
- Detecting Backdoors in Pre-trained Encoders<br>:star:code
- Color Backdoor: A Robust Poisoning Attack in Color Space
- Adversarial Attacks
- Adversarial Attack with Raindrops
- Progressive Backdoor Erasing via Connecting Backdoor and Adversarial Attacks
- Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks
- The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks
- Robust Single Image Reflection Removal Against Adversarial Attacks
- Transferable Adversarial Attacks on Vision Transformers With Token Gradient Regularization<br>:star:code
- StyLess: Boosting the Transferability of Adversarial Examples
- Re-thinking Model Inversion Attacks Against Deep Neural Networks<br>:star:code
- Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning<br>:star:code
- Jedi: Entropy-based Localization and Removal of Adversarial Patches
- Backdoor Defense
- Adversarial Training
39.Continual Learning(持续学习)
- Dealing With Cross-Task Class Discrimination in Online Continual Learning
- Heterogeneous Continual Learning
- Batch Model Consolidation: A Multi-Task Model Consolidation Framework
- CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning<br>:star:code
- Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning<br>:star:code
- Computationally Budgeted Continual Learning: What Does Matter?<br>:star:code
- Preserving Linear Separability in Continual Learning by Backward Feature Projection
- Regularizing Second-Order Influences for Continual Learning<br>:star:code
- Rethinking Gradient Projection Continual Learning: Stability / Plasticity Feature Space Decoupling
- MetaMix: Towards Corruption-Robust Continual Learning With Temporally Self-Adaptive Data Transformation
- Exploring Data Geometry for Continual Learning
- PCR: Proxy-based Contrastive Replay for Online Class-Incremental Continual Learning
- Bilateral Memory Consolidation for Continual Learning
- Adaptive Plasticity Improvement for Continual Learning
- Real-Time Evaluation in Online Continual Learning: A New Hope
- PIVOT: Prompting for Video Continual Learning
38.Meta-Learning(元学习)
- Meta-Learning with a Geometry-Adaptive Preconditioner<br>:star:code
- Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level
- Ground-Truth Free Meta-Learning for Deep Compressive Sampling
- HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization
- Panoptic Compositional Feature Field for Editable Scene Rendering With Network-Inferred Labels via Metric Learning
37.Contrastive Learning(对比学习)
- Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning
- Difficulty-Based Sampling for Debiased Contrastive Representation Learning
- MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining<br>:star:code
- Twin Contrastive Learning with Noisy Labels<br>:star:code
- Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
- Best of Both Worlds: Multimodal Contrastive Learning With Tabular and Imaging Data
- CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose
- ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-Real Novel View Synthesis via Contrastive Learning
- Hyperbolic Contrastive Learning for Visual Representations beyond Objects<br>:star:code
- Non-Contrastive Learning
36.Optical Flow(光流估计)
- Rethinking Optical Flow from Geometric Matching Consistent Perspective<br>:star:code
- DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling
- AnyFlow: Arbitrary Scale Optical Flow with Implicit Neural Representation
- TransFlow: Transformer as Flow Learner
- Tangentially Elongated Gaussian Belief Propagation for Event-Based Incremental Optical Flow Estimation<br>:star:code
- FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
35.OCR
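- Text Recognition
- Scene Text Detection
- Table Structure Recognition
- Font Generation
- Handwritten Text Generation
- Vector Font Synthesis
- Graphic Document Generation
- Text Detection
- Document Processing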
- Scene Text Spotting
34.Model Compression/Knowledge Distillation/Pruning(模型压缩/知识蒸馏/剪枝)
- Network Expansion for Practical Training Acceleration<br>:star:code
- Accelerating Dataset Distillation via Model Augmentation
- Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks<br>:star:code
- Quantization
- Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective<br>:star:code
- Adaptive Data-Free Quantization<br>:star:code
- Defining and Quantifying the Emergence of Sparse Concepts in DNNs
- NIPQ: Noise Proxy-Based Integrated Pseudo-Quantization
- Bit-Shrinking: Limiting Instantaneous Sharpness for Improving Post-Training Quantization
- Genie: Show Me the Data for Quantization
- One-Shot Model for Mixed-Precision Quantization
- Post-training Quantization on Diffusion Models<br>:star:code
- Q-DETR: An Efficient Low-Bit Quantized Detection Transformer<br>:star:code
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
- PD-Quant: Post-Training Quantization Based on Prediction Difference Metric<br>:star:code
- Boost Vision Transformer with GPU-Friendly Sparsity and Quantization
- Pruning
- CP$^3$: Channel Pruning Plug-in for Point-based Networks
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures
- Global Vision Transformer Pruning With Hessian-Aware Saliency
- X-Pruner: eXplainable Pruning for Vision Transformers<br>:star:code
- DepGraph: Towards Any Structural Pruning
- Progressive Neighbor Consistency Mining for Correspondence Pruning<br>:star:code
- Training Debiased Subnetworks With Contrastive Weight Pruning
- MC
- KD
- DisWOT: Student Architecture Search for Distillation WithOut Training<br>:star:code
- Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint
- Supervised Masked Knowledge Distillation for Few-Shot Transformers<br>:star:code
- Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation<br>:star:code
- KD-DLGAN: Data Limited Image Generation via Knowledge Distillation
- TinyMIM: An Empirical Study of Distilling MIM Pre-Trained Models<br>:star:[code](https://github.com/OliverRensu/TinyMIM)
- Masked Autoencoders Enable Efficient Knowledge Distillers<br>:star:code
- Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning
- Class Attention Transfer Based Knowledge Distillation<br>:star:code
- DaFKD: Domain-Aware Federated Knowledge Distillation
- Multi-Level Logit Distillation<br>:star:code
- A Unified Knowledge Distillation Framework for Deep Directed Graphical Models<br>:star:code
- Enhanced Multimodal Representation Learning with Cross-modal KD
- Constructing Deep Spiking Neural Networks From Artificial Neural Networks With Knowledge Distillation
- Learning To Retain While Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
- Adversarial Distillation
- Lightweight Networks
- Dequantization
33.Human-Object Interaction(人物交互)
- Affordance Diffusion: Synthesizing Hand-Object Interactions
- HOICLIP: Efficient Knowledge Transfer for HOI Detection With Vision-Language Models<br>:star:code
- ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection<br>:star:code
- Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework
- Detecting Human-Object Contact in Images<br>:house:project
- Category Query Learning for Human-Object Interaction Classification
- Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
- Relational Context Learning for Human-Object Interaction Detection
- Visibility Aware Human-Object Interaction Tracking from Single RGB Camera<br>:house:project
- Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
- A Neural Modeling Pipeline on Multi-View Human-Object Interactions
- Two-Hand Interaction
- Hand-Object Interaction
32.Data Augmentation(数据增强)
- Full or Weak annotations? An adaptive strategy for budget-constrained annotation campaigns
- SLACK: Stable Learning of Augmentations With Cold-Start and KL Regularization<br>:star:code
- Learning Libraries
- Keypoint Localization
- Keypoint Detection
31.Vision-Language(视觉语言)
- Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
- InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
- GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods
- Learning To Exploit Temporal Structure for Biomedical Vision-Language Processing
- REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory
- Policy Adaptation from Foundation Model Feedback<br>:house:project
- Learning Visual Representations via Language-Guided Sampling
- LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models
- Scaling Language-Image Pre-Training via Masking
- MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model
- Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles<br>:star:code
- Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment<br>:star:code
- ConStruct-VL: Data-Free Continual Structured VL Concepts Learning<br>:star:code
- Teaching Structured Vision & Language Concepts to Vision & Language Models<br>:star:code
- Leveraging per Image-Token Consistency for Vision-Language Pre-Training
- Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks<br>:house:project
- CREPE: Can Vision-Language Foundation Models Reason Compositionally?
- Open-vocabulary Attribute Detection<br>:house:project
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training<br>:star:code
- FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training
- Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language
- Task Residual for Tuning Vision-Language Models<br>:star:code
- Masked Autoencoding Does Not Help Natural Language Supervision at Scale
- Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator
- Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization
- Position-Guided Text Prompt for Vision-Language Pre-Training<br>:star:code
- RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training
- FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks<br>:star:code
- Seeing What You Miss: Vision-Language Pre-Training With Semantic Completion Learning
- You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
- DeAR: Debiasing Vision-Language Models with Additive Residuals
- Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
- Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding<br>:star:code
- MAGVLT: Masked Generative Vision-and-Language Transformer
- Top-Down Visual Attention from Analysis by Synthesis<br>:house:project
- Accelerating Vision-Language Pretraining with Free Language Modeling<br>:star:code
- Multi-Modal Representation Learning with Text-Driven Soft Masks
- Fine-tuned CLIP models are efficient video learners<br>:star:code
- MaPLe: Multi-modal Prompt Learning<br>:star:code
- Learning to Name Classes for Vision and Language Models
- Dynamic Inference With Grounding Based Vision and Language Models
- Connecting Vision and Language with Video Localized Narratives<br>:house:project
- Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models<br>:star:code
- Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks<br>:star:code
- VILA: Learning Image Aesthetics From User Comments With Vision-Language Pretraining<br>:star:code
- VLN
- Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding<br>:house:project
- Lana: A Language-Capable Navigator for Instruction Following and Generation<br>:star:code
- KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation<br>:star:code
- Improving Vision-and-Language Navigation by Generating Future-View Image Semantics<br>:star:code
- Iterative Vision-and-Language Navigation
- Behavioral Analysis of Vision-and-Language Navigation Agents
- Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation
- GeoVLN: Learning Geometry-Enhanced Visual Representation With Slot Attention for Vision-and-Language Navigation
- A New Path: Scaling Vision-and-Language Navigation With Synthetic Instructions and Imitation Learning
- Layout-Based Causal Inference for Object Navigation<br>:star:code
- Video-Language
- Test of Time: Instilling Video-Language Models with a Sense of Time<br>:house:project
- All in One: Exploring Unified Video-Language Pre-Training<br>:star:code
- HierVL: Learning Hierarchical Video-Language Embeddings
- An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling<br>:star:code
- Clover: Towards A Unified Video-Language Alignment and Fusion Model<br>:star:code<br>Clover, a video-text pre-training model, achieves the best zero-shot and fine-tuned performance on three text-to-video retrieval tasks (DiDeMo, MSRVTT, and LSMDC) and sets a new state of the art on 8 mainstream video question answering benchmarks.
- VindLU: A Recipe for Effective Video-and-Language Pretraining<br>:star:code
- LLM
- visual grounding
- Visual Dialog
30.Visual Question Answering(视觉问答)
- VQA
- SimVQA: Exploring Simulated Environments for Visual Question Answering<br>:house:project
- From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models
- Logical Implications for Visual Question Answering Consistency
- S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
- RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases<br>:star:code
- VQACL: A Novel Visual Question Answering Continual Learning Setting<br>:star:code
- Q: How To Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!<br>:star:code
- Improving Selective Visual Question Answering by Learning From Your Peers<br>:star:code
- MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
- Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering<br>:star:code
- MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
- Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning<br>:star:code
- Generative Bias for Robust Visual Question Answering
- Video-QA
29.SLAM/Augmented Reality/Virtual Reality/Robotics(增强/虚拟现实/机器人)
- Robotics
- PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
- Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer
- Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation From Image Sequence<br>:house:project
- Phone2Proc: Bringing Robust Robots Into Our Chaotic World<br>:house:project
- DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects<br>:house:project
- Learning Human-to-Robot Handovers from Point Clouds<br>:star:code
- Neural Volumetric Memory for Visual Locomotion Control<br>:star:code
- Affordances from Human Videos as a Versatile Representation for Robotics<br>:star:code
- NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
- Robotic Grasping
- UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy<br>:house:project
- UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
- Target-Referenced Reactive Grasping for Dynamic Objects<br>:house:project
- Visual Navigation(视觉导航)
- SLAM
- Efficient Map Sparsification Based on 2D and 3D Discretized Grids<br>:star:code
- Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM<br>:star:code
- ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields<br>:house:project
- ObjectMatch: Robust Registration Using Canonical Object Correspondences<br>:house:project
- vMAP: Vectorised Object Mapping for Neural Field SLAM<br>:house:project
- Virtual Try-On
- GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning<br>:star:code
- TryOnDiffusion: A Tale of Two UNets
- Linking Garment With Person via Semantically Associated Landmarks for Virtual Try-On<br>:house:project
- Synthesizing Photorealistic Virtual Humans Through Cross-Modal Disentanglement
- AR/VR
- Affordance Grounding from Demonstration Video to Target Image<br>:star:code
- GarmentTracking: Category-Level Garment Pose Tracking<br>:house:project
- Object Pop-Up: Can We Infer 3D Objects and Their Poses From Human Interactions Alone?
- Learning to Zoom and Unzoom<br>:star:code
- Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence
- Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model<br>:star:code
- Hand Avatar: Free-Pose Hand Animation and Rendering from Monocular Video<br>:house:project
- Mixed Reality
- Visual Localization(视觉定位)
- OrienterNet: Visual Localization in 2D Public Maps with Neural Matching
- Visual Localization using Imperfect 3D Models from the Internet
- SFD2: Semantic-Guided Feature Detection and Description<br>:star:code
- SegLoc: Learning Segmentation-Based Representations for Privacy-Preserving Visual Localization
- Long-term Visual Localization with Mobile Sensors
- Paired-Point Lifting for Enhanced Privacy-Preserving Visual Localization
- VPR(Visual Place Recognition)
- Visual Odometry
28.Style Transfer(风格迁移)
- CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer
- StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer<br>:star:code
- Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer<br>:star:code
- Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer
- Neural Preset for Color Style Transfer<br>:house:project
- Learning Dynamic Style Kernels for Artistic Style Transfer
- Inversion-Based Style Transfer With Diffusion Models<br>:star:code
- QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity<br>:star:code
- Text-Driven Indoor Stylization
27.Pose Estimation(物体姿势估计)
- Object Pose Estimation
- Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation
- SMOC-Net: Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation
- HS-Pose: Hybrid Scope Feature Extraction for Category-level Object Pose Estimation
- TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation<br>:house:project
- IMP: Iterative Matching and Pose Estimation with Adaptive Pooling<br>:star:code
- 6D
- Rigidity-Aware Detection for 6D Object Pose Estimation
- Neural Texture Learning for Self-Supervised 6D Object Pose Estimation
- Shape-Constraint Recurrent Flow for 6D Object Pose Estimation
- Knowledge Distillation for 6D Pose Estimation by Aligning Distributions of Local Predictions
- HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling<br>:house:project
- 4D
- Animal Pose Estimation
26.GCN/GNN
- GNN
25.Fine-Grained/Image Classification(细粒度/图像分类)
- Quantum-Inspired Spectral-Spatial Pyramid Network for Hyperspectral Image Classification
- Learning Partial Correlation Based Deep Visual Representation for Image Classification
- iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition
- I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
- Soft Augmentation for Image Classification<br>:star:code
- Explaining Image Classifiers With Multiscale Directional Image Representation
- Equiangular Basis Vectors<br>:star:code
- Prefix Conditioning Unifies Language and Label Supervision
- Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
- Boosting Verified Training for Robust Image Classifications via Abstraction<br>:star:code
- Semantic Prompt for Few-Shot Image Recognition
- Regularization of polynomial networks for image recognition<br>:star:code
- Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm<br>:star:code
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery<br>:star:code
- Learning Bottleneck Concepts in Image Classification<br>:house:project<br>:star:code
- PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification<br>:star:code
- Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
- Few-Shot Image Classification
- Few-Shot Classification
- Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners<br>:star:code
- Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings<br>:star:code
- Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation
- Fine-Grained
- Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems<br>:star:code
- Fine-Grained Classification with Noisy Labels
- An Erudite Fine-Grained Visual Classification Model<br>:star:code
- Weakly Supervised Posture Mining for Fine-Grained Classification
- Learning Attribute and Class-Specific Representation Duet for Fine-Grained Fashion Analysis
- Visual Recognition
- Long-Tailed Classification
- Long-Tailed Visual Recognition
- SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail
- Balanced Product of Calibrated Experts for Long-Tailed Recognition<br>:star:code
- FCC: Feature Clusters Compression for Long-Tailed Visual Recognition<br>:star:code
- Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment<br>:star:code
- Global and Local Mixture Consistency Cumulative Learning for Long-tailed Visual Recognitions<br>:star:code
- Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
- Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition<br>:star:code
- No One Left Behind: Improving the Worst Categories in Long-Tailed Learning
- Multi-Label Classification
- Multi-Label Recognition
- Multi-View Classification
- Superclass Learning(超类学习)
- Material Classification
24.Super-Resolution(超分辨率)
- Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution
- Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit
- N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution<br>:star:code
- Perception-Oriented Single Image Super-Resolution Using Optimal Objective Estimation<br>:star:code
- Toward Stable, Interpretable, and Lightweight Hyperspectral Super-Resolution
- CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution
- Zero-Shot Dual-Lens Super-Resolution<br>:star:code
- Non-Line-of-Sight Imaging With Signal Superresolution Network
- Kernel Aware Resampler
- RobustNeRF: Ignoring Distractors With Robust Losses
- Light Field Super-Resolution
- ISR
- OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution
- Activating More Pixels in Image Super-Resolution Transformer<br>:star:code
- Equivalent Transformation and Dual Stream Network Construction for Mobile Image Super-Resolution<br>:star:code
- Learning Generative Structure Prior for Blind Text Image Super-Resolution
- Human Guided Ground-Truth Generation for Realistic Image Super-Resolution<br>:star:code
- OSRT: Omnidirectional Image Super-Resolution With Distortion-Aware Transformer<br>:star:code
- CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input
- Memory-Friendly Scalable Super-Resolution via Rewinding Lottery Ticket Hypothesis
- B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution<br>:star:code
- Rethinking Image Super Resolution From Long-Tailed Distribution Learning Perspective
- Correspondence Transformers With Asymmetric Feature Learning and Matching Flow Super-Resolution<br>:star:code
- Toward Accurate Post-Training Quantization for Image Super Resolution<br>:star:code
- Image Super-Resolution Using T-Tetromino Pixels
- Spectral Bayesian Uncertainty for Image Super-Resolution
- Super-Resolution Neural Operator<br>:star:code
- Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution
- Better "CMOS" Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution
- Implicit Diffusion Models for Continuous Super-Resolution
- Guided Depth Super-Resolution by Deep Anisotropic Diffusion<br>:star:code
- Omni Aggregation Networks for Lightweight Image Super-Resolution<br>:star:code
- VSR
- Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting<br>:star:code
- Compression-Aware Video Super-Resolution<br>:star:code
- Structured Sparsity Learning for Efficient Video Super-Resolution<br>:star:code
- Consistent Direct Time-of-Flight Video Depth Super-Resolution<br>:star:code
- Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution
- Text Image Super-Resolution
- Image Resampling
23.Image Retrieval
- Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval
- Asymmetric Feature Fusion for Image Retrieval
- Improving Image Recognition by Retrieving From Web-Scale Image-Text Data
- Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval<br>:star:code
- Revisiting Self-Similarity: Structural Embedding for Image Retrieval<br>:star:code
- Train/Test-Time Adaptation With Retrieval
- Pic2Word: Mapping Pictures to Words for Zero-Shot Composed Image Retrieval<br>:star:code
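- Sketch-Based Image Retrieval
- Video-Text Retrieval
- Video-Text
- SViTT: Temporal Learning of Sparse Video-Text Transformers<br>:house:project (video-text retrieval and question answering)
- Multi-Modal Retrieval
- Cross-Modal Retrieval
- Text-Image Matching
- Image-Text Retrieval
- Text-Video Retrieval
- Video-Language Retrieval
22.Image Synthesis/Generation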
- LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation<br>:star:code
- Zero-shot Generative Model Adaptation via Image-specific Prompt Learning<br>:star:code
- TopNet: Transformer-based Object Placement Network for Image Compositing
- Sketch-Based Generation
- Image-Video Synthesis
- Poster Generation
- Text-to-Image Synthesis
- Variational Distribution Learning for Unsupervised Text-to-Image Generation
- ReCo: Region-Controlled Text-to-Image Generation
- Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style
- Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
- DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
- Multi-Concept Customization of Text-to-Image Diffusion<br>:house:project
- Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models<br>:house:project
- Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models<br>:star:code
- GLIGEN: Open-Set Grounded Text-to-Image Generation
- RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts<br>:star:code
- GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis<br>:star:code
- Shifted Diffusion for Text-to-image Generation<br>:star:code
- Conditional Text Image Generation With Diffusion Models
- Scaling Up GANs for Text-to-Image Synthesis<br>:house:project
- prompting
- Image Generation
- LayoutDM: Discrete Diffusion Model for Controllable Layout Generation<br>:house:project
- Private Image Generation With Dual-Purpose Auxiliary Classifier
- Unsupervised Domain Adaption With Pixel-Level Discriminator for Image-Aware Layout Generation
- SpaText: Spatio-Textual Representation for Controllable Image Generation<br>:house:project
- MaskSketch: Unpaired Structure-Guided Masked Image Generation
- Where Is My Spot? Few-Shot Image Generation via Latent Subspace Optimization<br>:star:code
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation<br>:star:code
- Controllable Mesh Generation Through Sparse Latent Point Diffusion Models<br>:house:project
- NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs<br>:house:project
- Exploring Incompatible Knowledge Transfer in Few-shot Image Generation
- Wavelet Diffusion Models Are Fast and Scalable Image Generators<br>:star:code
- Picture That Sketch: Photorealistic Image Generation From Abstract Sketches<br>:house:project
- DiffCollage: Parallel Generation of Large Content with Diffusion Models<br>:house:project
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization<br>:star:code
- LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation<br>:star:code
- Domain Expansion of Image Generators<br>:house:project
- Video Generation
- Image Synthesis
- Learning 3D-aware Image Synthesis with Unknown Pose Distribution<br>:house:project
- Few-Shot Semantic Image Synthesis With Class Affinity Transfer
- Fake It Till You Make It: Learning Transferable Representations From Synthetic ImageNet Clones
- 3D-Aware Conditional Image Synthesis
- SceneComposer: Any-Level Semantic Image Synthesis<br>:house:project
- RWSC-Fusion: Region-Wise Style-Controlled Fusion Network for the Prohibited X-Ray Security Image Synthesis
- Exploring Intra-Class Variation Factors With Learnable Cluster Prompts for Semi-Supervised Image Synthesis
- Quantitative Manipulation of Custom Attributes on 3D-Aware Image Synthesis
- Inferring and Leveraging Parts From Object Shape for Improving Semantic Image Synthesis<br>:star:code
- MAGE: MAsked Generative Encoder To Unify Representation Learning and Image Synthesis<br>:star:code
- Person Image Synthesis via Denoising Diffusion Model
- Freestyle Layout-to-Image Synthesis<br>:star:code
- Regularized Vector Quantization for Tokenized Image Synthesis
- High-Fidelity Guided Image Synthesis with Latent Diffusion Models<br>:house:project
- PixHt-Lab: Pixel Height Based Light Effect Generation for Image Compositing<br>:house:project
- Text-to-Motion Generation
- Texture Synthesis
21.UAV/Remote Sensing/Satellite Image
- TopDiG: Class-Agnostic Topological Directional Graph Extraction From Remote Sensing Images
- Change-Aware Sampling and Contrastive Learning for Satellite Images
- MethaneMapper: Spectral Absorption aware Hyperspectral Transformer for Methane Detection
- ViTs for SITS: Vision Transformers for Satellite Image Time Series
- Adaptive Sparse Convolutional Networks With Global Context Enhancement for Faster Object Detection on Drone Images<br>:star:code
- Image Detection
- Tracking
- Radar Localization
- UAV Object Detection
20.Autonomous vehicles
- Autonomous Driving
- UniSim: A Neural Closed-Loop Sensor Simulator<br>:house:project
- Planning-Oriented Autonomous Driving
- Temporal Consistent 3D LiDAR Representation Learning for Semantic Perception in Autonomous Driving
- TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving<br>:star:code
- Weakly Supervised Class-Agnostic Motion Prediction for Autonomous Driving
- Learning and Aggregating Lane Graphs for Urban Automated Driving<br>:star:code
- RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving<br>:star:code
- Azimuth Super-Resolution for FMCW Radar in Autonomous Driving
- Unsupervised 3D Point Cloud Representation Learning by Triangle Constrained Contrast for Autonomous Driving<br>:house:project
- Localized Semantic Feature Mixers for Efficient Pedestrian Detection in Autonomous Driving
- DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
- ReasonNet: End-to-End Driving with Temporal and Global Reasoning
- LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation<br>:star:code
- Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction<br>:star:code
- Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving
- MSeg3D: Multi-Modal 3D Semantic Segmentation for Autonomous Driving<br>:star:code
- Trajectory Prediction
- IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
- ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries
- Query-Centric Trajectory Prediction
- Leapfrog Diffusion Model for Stochastic Trajectory Prediction<br>:star:code
- Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction<br>:star:code
- FEND: A Future Enhanced Distribution-Aware Contrastive Learning Framework for Long-tail Trajectory Prediction
- Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction
- Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction
- Place Recognition
- Lane Detection
- Bird's-Eye-View Recognition
19.Neural Architecture Search
- PA&DA: Jointly Sampling PAth and DAta for Consistent NAS<br>:star:code
- Differentiable Architecture Search With Random Features
- Adversarially Robust Neural Architecture Search for Graph Neural Networks
- MDL-NAS: A Joint Multi-Domain Learning Framework for Vision Transformer
- HOTNAS: Hierarchical Optimal Transport for Neural Architecture Search
- EMT-NAS: Transferring Architectural Knowledge Between Tasks From Different Datasets
18.Person Re-Identification
- Towards Modality-Agnostic Person Re-Identification With Descriptive Query
- Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning<br>:star:code
- Patch-Wise High-Frequency Augmentation for Transformer-Based Person Re-Identification<br>:star:code
- TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification<br>:star:code
- Person Retrieval
- Clothes-Changing Re-Identification
- Visible-Infrared Person Re-Identification (VIReID)
- Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification<br>:star:code
- Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
- PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification
- Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning
- G-ReID
- Pedestrian Detection
- Crowd Counting
- Gait Recognition
- Dynamic Aggregated Network for Gait Recognition<br>:star:code
- LidarGait: Benchmarking 3D Gait Recognition With Point Clouds<br>:house:project
- GaitGCI: Generative Counterfactual Intervention for Gait Recognition
- OpenGait: Revisiting Gait Recognition Towards Better Practicality
- Multi-Modal Gait Recognition via Effective Spatial-Temporal Feature Fusion
17.Medical Image
- Geometric Visual Similarity Learning in 3D Medical Image Self-Supervised Pre-Training
- Interventional Bag Multi-Instance Learning on Whole-Slide Pathological Images<br>:star:code
- Causally-Aware Intraoperative Imputation for Overall Survival Time Prediction
- Flexible-Cm GAN: Towards Precise 3D Dose Prediction in Radiotherapy
- Towards Trustable Skin Cancer Diagnosis via Rewriting Model’s Decision
- Hierarchical discriminative learning improves visual representations of biomedical microscopy<br>:house:project
- Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
- Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
- METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
- 3D Medical Imaging
- Image Registration
- Image Classification
- RankMix: Data Augmentation for Weakly Supervised Learning of Classifying Whole Slide Images With Diverse Sizes and Imbalanced Categories
- Grounding Counterfactual Explanation of Image Classifiers to Textual Concept Space
- PEFAT: Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation and Feature Adversarial Training<br>:star:code
- Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whole Slide Image Classification<br>:star:code
- A Loopback Network for Explainable Microvascular Invasion Classification
- Report Generation
- Medical Image Segmentation
- Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation
- SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation
- Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation<br>:star:code
- Fair Federated Medical Image Segmentation via Client Contribution Estimation
- Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation<br>:star:code
- Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization
- Weakly supervised segmentation with point annotations for histopathology images via contrast-based variational model
- MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation<br>:star:code
- Rethinking Few-Shot Medical Segmentation: A Vector Quantization View
- MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery<br>:star:code
- Ambiguous Medical Image Segmentation Using Diffusion Models
- Directional Connectivity-Based Segmentation of Medical Images
- Medical Image Analysis
- Tumor Segmentation
- Medical Imaging Report Generation
- Interactive and Explainable Region-guided Radiology Report Generation<br>:star:code
- Slide Analysis
- Cell Detection, Tracking, and Counting
- DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting
- Overlapped Cell on Tissue Dataset for Histopathology
- Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses<br>:star:code
- Masked Autoencoder Guided Segmentation at Pixel Resolution for Accurate, Self-Supervised Subcellular Structure Recognition<br>:star:code
- Monocular Endoscope Tracking
- Skin Cancer Diagnosis
- MRI Reconstruction
- Biomedicine
16.Semi/self-supervised learning
- Unsupervised Learning
- Self-Supervised
- Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning
- StepFormer: Self-Supervised Step Discovery and Localization in Instructional Videos
- DLBD: A Self-Supervised Direct-Learned Binary Descriptor
- Canonical Fields: Self-Supervised Learning of Pose-Canonicalized Neural Fields
- Self-Supervised Learning From Images With a Joint-Embedding Predictive Architecture
- Defending Against Patch-Based Backdoor Attacks on Self-Supervised Learning<br>:star:code
- DrapeNet: Garment Generation and Self-Supervised Draping<br>:star:code
- Neural Congealing: Aligning Images to a Joint Semantic Atlas<br>:house:project
- Self-Supervised AutoFlow
- Towards Realistic Long-Tailed Semi-Supervised Learning: Consistency Is All You Need<br>:star:code
- Siamese Image Modeling for Self-Supervised Vision Representation Learning<br>:star:code
- SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow<br>:star:code
- Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning<br>:star:code
- Look, Radiate, and Learn: Self-Supervised Localisation via Radio-Visual Correspondence
- Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss
- Evolved Part Masking for Self-Supervised Learning
- Towards Professional Level Crowd Annotation of Expert Domain Data
- ALSO: Automotive Lidar Self-Supervision by Occupancy Estimation<br>:star:code
- Correlational Image Modeling for Self-Supervised Visual Pre-Training
- Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks<br>:star:code<br>:thumbsup:CVPR 2023: Digging deep into the value of unlabeled data! SOLIDER, a self-supervised learning framework for human-centric vision
- Mixed Autoencoder for Self-supervised Visual Representation Learning
- Siamese DETR<br>:star:code
- Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
- Self-Supervised Representation Learning for CAD
- Semi-Supervised
- Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data<br>:star:code
- HyperMatch: Noise-Tolerant Semi-Supervised Learning via Relaxed Contrastive Constraint
- Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning
- DualRel: Semi-Supervised Mitochondria Segmentation From a Prototype Perspective
- CHMATCH: Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervised Learning<br>:star:code
- ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-supervised Learning
- Class Balanced Adaptive Pseudo Labeling for Federated Semi-Supervised Learning<br>:star:code
- MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins<br>:star:code
- Semi-Supervised Learning Made Simple With Self-Supervised Clustering
- Weakly Supervised
15.Vision Transformers
- Transformer-Based Learned Optimization
- Teaching Matters: Investigating the Role of Supervision in Vision Transformers
- Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
- PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers
- NLOST: Non-Line-of-Sight Imaging with Transformer
- SVGformer: Representation Learning for Continuous Vector Graphics Using Transformers
- Adversarial Normalization: I Can visualize Everything (ICE)<br>:star:code
- Hint-Aug: Drawing Hints From Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning<br>:star:code
- PanoSwin: A Pano-Style Swin Transformer for Panorama Understanding
- D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers
- NAR-Former: Neural Architecture Representation Learning Towards Holistic Attributes Prediction<br>:star:code
- DropKey for Vision Transformer
- Integrally Pre-Trained Transformer Pyramid Networks<br>:star:code
- DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets
- Trade-Off Between Robustness and Accuracy of Vision Transformers
- A Light Touch Approach to Teaching Transformers Multi-view Geometry
- Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers<br>:star:code
- RGB no more: Minimally-decoded JPEG Vision Transformers
- Making Vision Transformers Efficient from A Token Sparsification View<br>:star:code
- Blur Interpolation Transformer for Real-World Motion from Blur<br>:star:code
- Neighborhood Attention Transformer<br>:star:code
- MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers<br>:star:code
- Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers<br>:house:project
- Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch Corruptions
- Latency Matters: Real-Time Action Forecasting Transformer<br>:star:code
- OmniMAE: Single Model Masked Pretraining on Images and Videos<br>:star:code
- MAGVIT: Masked Generative Video Transformer<br>:house:project
- Learning Imbalanced Data with Vision Transformers<br>:star:code
- Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves<br>:house:project
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images<br>:star:code
- Generic-to-Specific Distillation of Masked Autoencoders<br>:star:code
- BiFormer: Vision Transformer with Bi-Level Routing Attention<br>:star:code
- Dual-path Adaptation from Image to Video Transformers<br>:star:code
- Spherical Transformer for LiDAR-based 3D Recognition<br>:star:code
- MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models<br>:star:code
- Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
- Learning Expressive Prompting With Residuals for Vision Transformers
- SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer<br>:house:project
- Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
- Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
- Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention<br>:star:code
- RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer<br>:star:code
- DropKey<br>:thumbsup:CVPR 2023 | Two lines of code efficiently mitigate overfitting in vision Transformers: Meitu and UCAS jointly propose the DropKey regularization method
- Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers<br>:star:code
- EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention<br>:star:code
- TrojViT: Trojan Insertion in Vision Transformers
- Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference<br>:house:project
- ResFormer: Scaling ViTs with Multi-Resolution Training<br>:star:code
- Vision Transformer With Super Token Sampling<br>:star:code
- Vision Transformers Are Good Mask Auto-Labelers
14.Video
- PointAvatar: Deformable Point-Based Head Avatars From Videos
- Video Probabilistic Diffusion Models in Projected Latent Space
- Masked Motion Encoding for Self-Supervised Video Representation Learning<br>:star:code
- Modular Memorability: Tiered Representations for Video Memorability Prediction<br>:star:code
- Language-Guided Music Recommendation for Video via Prompt Analogies
- Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
- 1000 FPS HDR Video With a Spike-RGB Hybrid Camera<br>:house:project
- Egocentric Video Task Translation<br>:house:project
- Relational Space-Time Query in Long-Form Videos
- Spatial-Then-Temporal Self-Supervised Learning for Video Correspondence<br>:star:code
- Few-Shot Referring Relationships in Videos<br>:house:project
- Aligning Step-by-Step Instructional Diagrams to Video Demonstrations<br>:house:project
- 3D Video Loops From Asynchronous Input<br>:house:project
- VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking<br>:star:code
- Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos<br>:star:code
- StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
- Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates<br>:house:project<br>:tv:video
- How You Feelin'? Learning Emotions and Mental States in Movie Scenes<br>:house:project
- Video Moment Retrieval
- Video Highlight Detection
- Video Frame Interpolation
- Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation<br>:star:code
- AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation<br>:star:code
- Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation
- Exploring Discontinuity for Video Frame Interpolation
- Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields<br>:star:code
- A Unified Pyramid Recurrent Network for Video Frame Interpolation<br>:star:code
- Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time<br>:star:code
- BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation<br>:star:code
- Frame Interpolation Transformer and Uncertainty Guidance
- Range-Nullspace Video Frame Interpolation With Focalized Motion Estimation
- Video Synthesis
- Video Prediction
- Video Understanding
- Selective Structured State-Spaces for Long-Form Video Understanding
- How you feelin'? Learning Emotions and Mental States in Movie Scenes<br>:star:code
- System-status-aware Adaptive Network for Online Streaming Video Understanding
- LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling<br>:star:code
- Therbligs in Action: Video Understanding Through Motion Primitives
- Streaming Video Model<br>:star:code
- Procedure-Aware Pretraining for Instructional Video Understanding<br>:star:code
- Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations<br>:star:code
- Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring<br>:star:code
- Video Classification
- Video Captioning
- Video Summarization
- Video Recognition
- Video Deflickering
- Temporal Sentence Grounding (TSG)
- VAD
- Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection
- Video Event Restoration Based on Keyframes for Video Anomaly Detection
- Generating Anomalies for Video Anomaly Detection With Prompt-Based Feature Mapping
- Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
- Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection
- Look Around for Anomalies: Weakly-Supervised Anomaly Detection via Context-Motion Relational Learning
- Video Anomaly Localization
- Video Mirror Detection
- Learning To Detect Mirrors From Videos via Dual Correspondences<br>:house:project
- Video Representation Learning
- Weakly Supervised Video Representation Learning With Unaligned Text for Sequential Videos<br>:star:code
- Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations<br>:star:code
- Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning<br>:star:code
- Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning<br>:star:code
- Video Paragraph Grounding
- Video Grounding
- Text-Visual Prompting for Efficient 2D Temporal Video Grounding
- WINNER: Weakly-Supervised hIerarchical decompositioN and aligNment for Spatio-tEmporal Video gRounding
- Iterative Proposal Refinement for Weakly-Supervised Video Grounding
- Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding
- ProTeGe: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding
- Video Shadow Detection
- Video Keypoint Detection
- Video Emotion Detection
- Scene Detection
13.GAN
- AdaptiveMix: Improving GAN Training via Feature Space Shrinkage
- Masked Auto-Encoders Meet Generative Adversarial Networks and Beyond
- Spider GAN: Leveraging Friendly Neighbors To Accelerate GAN Training
- Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection<br>:star:code
- MoStGAN-V: Video Generation With Temporal Motion Styles<br>:star:code
- Sequential Training of GANs Against GAN-Classifiers Reveals Correlated "Knowledge Gaps" Present Among Independently Trained GAN Instances
- Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration<br>:star:code
- HumanGen: Generating Human Radiance Fields With Explicit Priors
- Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological Image Virtual Re-Staining
- GlassesGAN: Eyewear Personalization Using Synthetic Appearance Discovery and Targeted Subspace Modeling
- Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint<br>:house:project
- 3DAvatarGAN: Bridging Domains for Personalized Editable Avatars<br>:house:project
- GLeaD: Improving GANs With a Generator-Leading Task<br>:house:project
- Transforming the Residuals for Real Image Editing With StyleGAN<br>:star:code
- Improving GAN Training via Feature Space Shrinkage<br>:star:code
- NoisyTwins: Class-Consistent and Diverse Image Generation through StyleGANs<br>:star:code
- Graph Transformer GANs for Graph-Constrained House Generation
- Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models<br>:star:code
- Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis<br>:star:code
- VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs<br>:star:code
- Discriminator-Cooperated Feature Map Distillation for GAN Compression<br>:star:code
- Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field<br>:star:code
- Image-Text Synthesis
- Diffusion Models
- How to Backdoor Diffusion Models?<br>:star:code
- Diffusion Probabilistic Model Made Slim
- VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
- Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
- Seeing Beyond the Brain: Conditional Diffusion Model With Sparse Masked Modeling for Vision Decoding
- Self-Guided Diffusion Models
- ObjectStitch: Object Compositing With Diffusion Model
- Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models
- Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
- RGBD2: Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models<br>:house:project
- Dimensionality-Varying Diffusion Process
- TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets<br>:star:code
- Towards Practical Plug-and-Play Diffusion Models<br>:star:code
- All Are Worth Words: A ViT Backbone for Diffusion Models
- Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models<br>:house:project
- Binary Latent Diffusion
- Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
- Lookahead Diffusion Probabilistic Models for Refining Mean Estimation
- EDICT: Exact Diffusion Inversion via Coupled Transformations<br>:star:code
- ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts
- GAN Inversion
- 3D GAN Inversion With Facial Symmetry Prior<br>:house:project
- NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation
- Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion<br>:house:project
- High-Fidelity 3D GAN Inversion by Pseudo-Multi-View Optimization<br>:house:project
12.Image-to-Image Translation
- 3D-Aware Multi-Class Image-to-Image Translation With NeRFs
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
- DSI2I: Dense Style for Unpaired Image-to-Image Translation
- Fix the Noise: Disentangling Source Feature for Controllable Domain Translation<br>:star:code
- LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data<br>:house:project
- Unpaired Image-to-Image Translation With Shortest Path Regularization<br>:star:code
- BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models
- Image Translation
- Video Translation
11.Face
- Rethinking Feature-Based Knowledge Distillation for Face Recognition
- Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning
- Learning a 3D Morphable Face Reflectance Model From Low-Cost Data
- CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent Search
- Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis From Monocular Image<br>:house:project
- Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in the Wild
- Evading Forensic Classifiers With Attribute-Conditioned Adversarial Faces<br>:star:code
- Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues
- Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization<br>:star:code
- Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation<br>:star:code
- Privacy-Preserving Adversarial Facial Features
- BioNet: A Biologically-Inspired Network for Face Recognition<br>:star:code
- High-Res Facial Appearance Capture From Polarized Smartphone Images
- MARLIN: Masked Autoencoder for Facial Video Representation LearnINg<br>:star:code
- Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition<br>:house:project
- Disentanglement of Pose and Expression for General Video Portrait Editing
- BlendFields: Few-Shot Example-Driven Facial Modeling<br>:star:code
- Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition<br>:thumbsup:CVPR 2023 | Face recognition has a long way to go: Tsinghua, Peking University, and others propose AT3D, an attack method against face recognition systems
- Collaborative Diffusion for Multi-Modal Face Generation and Editing<br>:star:code<br>:thumbsup:CVPR 2023 | Collaborative Diffusion: How can different diffusion models collaborate?
- Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation
- DiffusionRig: Learning Personalized Priors for Facial Appearance Editing<br>:star:code
- Probabilistic Knowledge Distillation of Face Ensembles
- DCFace: Synthetic Face Generation with Dual Condition Diffusion Model<br>:star:code
- Discrete Point-Wise Attack Is Not Enough: Generalized Manifold Adversarial Attack for Face Recognition
- 3D Face
- Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images
- Physical-World Optical Adversarial Attacks on 3D Face Recognition
- Learning a 3D Morphable Face Reflectance Model from Low-cost Data<br>:house:project
- NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images<br>:star:code
- FaceLit: Neural 3D Relightable Faces
- Learning Neural Proto-face Field for Disentangled 3D Face Modeling In the Wild
- High-Fidelity 3D Face Generation From Natural Language Descriptions<br>:star:code
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior<br>:star:code
- PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
- Face Reconstruction
- A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images<br>:house:project
- Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images
- FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction<br>:star:code
- Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation
- AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction<br>:house:project
- Face Restoration
- Face Alignment
- Face Anonymization
- Face Super-Resolution
- Naked-Eye Age Recognition
- Emotion Recognition
- Context De-confounded Emotion Recognition
- Decoupled Multimodal Distilling for Emotion Recognition<br>:star:code
- Multivariate, Multi-Frequency and Multimodal: Rethinking Graph Neural Networks for Emotion Recognition in Conversation
- Learning Emotion Representations from Verbal and Nonverbal Communication<br>:star:code
- Portrait Lighting
- Face Liveness Detection
- Talking Head
- OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering<br>:star:code
- High-Fidelity Generalized Emotional Talking Face Generation With Multi-Modal Emotion Space Learning
- Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
- LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation<br>:star:code
- Implicit Neural Head Synthesis via Controllable Local Deformation Fields
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors<br>:star:code
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
- High-Fidelity and Freely Controllable Talking Head Video Generation<br>:house:project
- GANHead: Towards Generative Animatable Neural Head Avatars<br>:star:code
- One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field<br>:house:project
- MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation<br>:house:project
- Face Segmentation
- Eye Blink Detection
- 3D Head Avatar Generation
- Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos<br>:star:code
- Instant Volumetric Head Avatars<br>:house:project
- Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars<br>:house:project
- OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
- PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
- Facial Expression Recognition
- Micro-Expression Recognition
- Face Synthesis
- Fake Face Detection
- Facial Action Unit Detection
- Face Video Editing
- Face Quality Assessment
- Face Swapping
- 3D-Aware Face Swapping<br>:star:code
- Implicit Identity Driven Deepfake Face Swapping Detection
- StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping<br>:star:code
- Fine-Grained Face Swapping via Regional GAN Inversion<br>:house:project
- DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion
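- Face Clustering
- Face Retouching
- 3D Digital Avatar
- Audio-Driven Face Reenactment
- Privacy Protection
- Facial Landmark Detection
- Head Capture
- Age Estimation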
10.3D(3D Reconstruction\3D Vision)
- Structured 3D Features for Reconstructing Controllable Avatars<br>:house:project
- In-Hand 3D Object Scanning from an RGB Sequence
- Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs<br>:star:code
- 3D Concept Learning and Reasoning from Multi-View Images<br>:house:project
- LP-DIF: Learning Local Pattern-Specific Deep Implicit Function for 3D Objects and Scenes<br>:house:project
- DynamicStereo: Consistent Dynamic Depth From Stereo Videos<br>:house:project
- ARO-Net: Learning Implicit Fields from Anchored Radial Observations
- G-MSM: Unsupervised Multi-Shape Matching With Graph-Based Affinity Priors<br>:star:code
- Magic3D: High-Resolution Text-to-3D Content Creation<br>:house:project
- PointListNet: Deep Learning on 3D Point Lists
- Omnimatte3D: Associating Objects and Their Effects in Unconstrained Monocular Video
- HexPlane: A Fast Representation for Dynamic Scenes<br>:house:project
- Energy-Efficient Adaptive 3D Sensing<br>:house:project
- Objaverse: A Universe of Annotated 3D Objects<br>:house:project
- Level-S2fM: Structure from Motion on Neural Level Set of Implicit Surfaces<br>:house:project
- 3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions<br>:star:code
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation<br>:house:project<br>:thumbsup:CVPR 2023 Award Candidate | OmniObject3D, a real-world high-precision 3D object dataset
- Neural Scene Chronology<br>:house:project
- 3D Neural Field Generation Using Triplane Diffusion<br>:house:project
- Learning Adaptive Dense Event Stereo From the Image Domain
- GANmouflage: 3D Object Nondetection With Texture Fields<br>:house:project
- Learning Accurate 3D Shape Based on Stereo Polarimetric Imaging
- Sphere-Guided Training of Neural Implicit Surfaces<br>:house:project
- PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision<br>:house:project
- Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning<br>:star:code
- Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching<br>:star:code
- SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field<br>:star:code
- 3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process<br>:star:code
- PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°<br>:star:code
- Persistent Nature: A Generative Model of Unbounded 3D Worlds<br>:house:project
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
- Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization
- Robust Outlier Rejection for 3D Registration With Variational Bayes<br>:star:code
- On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks<br>:star:code
- SUDS: Scalable Urban Dynamic Scenes<br>:house:project
- Understanding and Improving Features Learned in Deep Functional Maps<br>:star:code
- TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering<br>:star:code
- Generalizable Local Feature Pre-training for Deformable Shape Analysis<br>:star:code
- CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects<br>:house:project
- CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes<br>:house:project
- HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images<br>:star:code
- Multi-View Azimuth Stereo via Tangent Space Consistency<br>:star:code
- 3D Line Mapping Revisited<br>:star:code
- NeRF-Supervised Deep Stereo<br>:star:code<br>:star:code
- Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
- Stereo Matching
- Iterative Geometry Encoding Volume for Stereo Matching<br>:star:code
- Masked representation learning for domain generalized stereo matching
- Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation
- Domain Generalized Stereo Matching via Hierarchical Visual Transformation
- Unsupervised Deep Asymmetric Stereo Matching With Spatially-Adaptive Self-Similarity
- High-frequency Stereo Matching Network<br>:star:code
- 3D Vision
- 3D Reconstruction
- Neural Lens Modeling<br>:star:code
- Self-Supervised Super-Plane for Neural 3D Reconstruction<br>:star:code
- Multi-Sensor Large-Scale Dataset for Multi-View 3D Reconstruction
- ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction
- Towards Unbiased Volume Rendering of Neural Implicit Surfaces With Geometry Priors
- Multiview Compressive Coding for 3D Reconstruction<br>:house:project
- Multi-View Reconstruction Using Signed Ray Distance Functions (SRDF)
- PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction<br>:house:project
- RealFusion: 360° Reconstruction of Any Object From a Single Image<br>:house:project
- Deep Polarization Reconstruction With PDAVIS Events<br>:star:code
- RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation<br>:star:code
- Distilling Neural Fields for Real-Time Articulated Shape Reconstruction<br>:house:project
- High-Fidelity Clothed Avatar Reconstruction from a Single Image
- Efficient Second-Order Plane Adjustment
- SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation<br>:house:project
- Reconstructing Animatable Categories From Videos<br>:house:project
- OReX: Object Reconstruction From Planar Cross-Sections Using Neural Fields
- Learning Articulated Shape with Keypoint Pseudo-labels from Web Images<br>:star:code
- SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction<br>:house:project
- 3D Shape Reconstruction of Semi-Transparent Worms
- Power Bundle Adjustment for Large-Scale 3D Reconstruction
- PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces<br>:star:code
- AutoRecon: Automated 3D Object Discovery and Reconstruction<br>:star:code
- 3D Registration with Maximal Cliques
- VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos<br>:star:code
- NeUDF: Leaning Neural Unsigned Distance Fields with Volume Rendering<br>:house:project
- ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency<br>:house:project
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects<br>:star:code
- PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters<br>:star:code
- Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices<br>:house:project
- Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container<br>:star:code
- SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates<br>:star:code
- MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision<br>:house:project
- Scalable, Detailed and Mask-Free Universal Photometric Stereo<br>:star:code
- Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
- NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images<br>:house:project
- Behind the Scenes: Density Fields for Single View Reconstruction<br>:house:project
- VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction
- Surface Reconstruction
- NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
- Octree Guided Unoriented Surface Reconstruction
- Neuralangelo: High-Fidelity Neural Surface Reconstruction<br>:house:project
- Neural Kernel Surface Reconstruction
- Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections<br>:star:code
- Depth Estimation
- Fully Self-Supervised Depth Estimation from Defocus Clue<br>:star:code
- Gated Stereo: Joint Depth Estimation From Gated and Wide-Baseline Active Stereo Cues<br>:house:project
- OmniVidar: Omnidirectional Depth Estimation From Multi-Fisheye Images
- SfM-TTR: Using Structure From Motion for Test-Time Refinement of Single-View Depth Networks<br>:star:code
- Shakes on a Plane: Unsupervised Depth Estimation From Unstabilized Photography<br>:house:project
- Depth Estimation From Camera Image and mmWave Radar Point Cloud<br>:star:code
- Deep Depth Estimation From Thermal Image<br>:star:code
- LightedDepth: Video Depth Estimation in Light of Limited Inference View Angles<br>:star:code
- Trap Attention: Monocular Depth Estimation With Manual Traps<br>:star:code
- PlaneDepth: Self-supervised Depth Estimation via Orthogonal Planes<br>:star:code
- Depth Estimation From Indoor Panoramas With Neural Scene Representation<br>:star:code
- Polarimetric iToF: Measuring High-Fidelity Depth Through Scattering Media
- SCADE: NeRFs from Space Carving With Ambiguity-Aware Depth Estimates<br>:star:code
- iDisc: Internal Discretization for Monocular Depth Estimation<br>:house:project
- HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions<br>:house:project
- Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes<br>:star:code
- Temporally Consistent Online Depth Estimation Using Point-Based Fusion<br>:house:project
- DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium<br>:star:code<br>:star:code
- Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation<br>:star:code<br>:thumbsup:CVPR2023 | Lite-Mono, a lightweight and efficient self-supervised depth estimation framework
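- Depth Completion
- Indoor Scene Reconstruction
- Scene Reconstruction
- 3D Scene Generation
- MVS
- 3D Shape Classification
- 3D Images
- 3D Shapes
- 3D Shape Generation
- Diffusion-Based Signed Distance Fields for 3D Shape Generation
- 3D Shape Reconstruction
- 3D Animation
- Indoor Layout
- Video Reconstruction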
9.Human Pose Estimation
- Hand Gesture
- A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image<br>:star:code
- Neural Voting Field for Camera-Space 3D Hand Pose Estimation
- AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation<br>:star:code
- Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos<br>:house:project
- Cross-Domain 3D Hand Pose Estimation with Dual Modalities
- Audio-Driven Co-Speech Gesture Generation
- Gesture Synthesis
- Hand Reconstruction
- ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction<br>:star:code
- High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition<br>:star:code
- HARP: Personalized Hand Reconstruction From a Monocular RGB Video<br>:house:project
- Overcoming the Trade-off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction
- A Probabilistic Attention Model with Occlusion-aware Texture Regression for 3D Hand Reconstruction from a Single RGB Image
- gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction<br>:star:code
- MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction
- POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo<br>:star:code
- Handy: Towards a high fidelity 3D hand shape and appearance model<br>:star:code
- 3D Hand Recovery
- Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild<br>:star:code
- Recovering 3D Hand Mesh Sequence from a Single Blurry Image: A New Dataset and Temporal Unfolding<br>:star:code
- Semi-Supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination<br>:house:project
- H2ONet: Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction
- Hand-Object Pose Estimation
- 3D Gesture Prediction
- Human Body
- HPE
- DistilPose: Tokenized Pose Regression with Heatmap Distillation
- Towards Stable Human Pose Estimation via Cross-View Fusion and Foot Stabilization
- Human Pose As Compositional Tokens<br>:star:code
- Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo Label Correction Module<br>:star:code
- TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers<br>:star:code
- A Characteristic Function-Based Method for Bottom-Up Human Pose Estimation
- Analyzing and Diagnosing Pose Estimation With Attributions<br>:house:project
- PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation<br>:star:code
- Unified Pose Sequence Modeling
- Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video
- Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation
- HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation<br>:star:code
- Human Pose Estimation in Extremely Low-Light Conditions
- 3D HPE
- PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers
- PLIKS: A Pseudo-Linear Inverse Kinematic Solver for 3D Human Body Estimation
- NIKI: Neural Inverse Kinematics With Invertible Neural Networks for 3D Human Pose and Shape Estimation<br>:star:code
- DiffPose: Toward More Reliable 3D Pose Estimation<br>:house:project
- Scene-Aware Egocentric 3D Human Pose Estimation
- Self-Supervised 3D Keypoint Discovery From Multi-View Videos<br>:house:project
- Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation<br>:star:code
- 3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention
- Ego-Body Pose Estimation via Ego-Head Pose Estimation<br>Award candidate
- Listening Human Behavior: 3D Human Pose Estimation With Acoustic Signals
- GFPose: Learning 3D Human Pose Prior With Gradient Fields<br>:house:project
- PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation<br>:star:code<br>:star:code
- 3D Human Pose Estimation via Intuitive Physics<br>:house:project
- 3D Human Keypoint Estimation
- 4D HPE
- Mesh Recovery
- POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery<br>:star:code
- Deformable Mesh Transformer for 3D Human Mesh Recovery<br>:star:code
- One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer<br>:house:project
- Learning Human Mesh Recovery in 3D Scenes<br>:star:code
- One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer<br>:star:code<br>:thumbsup:CVPR2023 | IDEA and Tsinghua propose OSX, the first one-stage 3D whole-body human mesh reconstruction algorithm
- Learning Analytical Posterior Probability for Human Mesh Recovery<br>:star:code
- Implicit 3D Human Mesh Recovery Using Consistency With Pose and Shape From Unseen-View
- 3D Human Mesh Estimation
- 3D Human Mesh Reconstruction
- 3D Human Reconstruction
- High-fidelity 3D Human Digitization from Single 2K Resolution Images<br>:star:code
- Crowd3D: Towards Hundreds of People Reconstruction From a Single Image
- PersonNeRF: Personalized Reconstruction From Photo Collections<br>:house:project
- NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action<br>:house:project
- FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER<br>:house:project
- CloSET: Modeling Clothed Humans on Continuous Surface With Explicit Template Decomposition<br>:star:code
- Learning Visibility Field for Detailed 3D Human Reconstruction and Relighting
- Learning Semantic-Aware Disentangled Representation for Flexible 3D Human Body Editing<br>:house:project
- Complete 3D Human Reconstruction From a Single Incomplete Image
- BAAM: Monocular 3D Pose and Shape Reconstruction With Bi-Contextual Attention Module and Attention-Guided Modeling<br>:star:code
- Humans As Light Bulbs: 3D Human Reconstruction From Thermal Reflection
- Clothed Human Reconstruction
- Human Shape Completion
- HPE
- Multi-Person Pose Prediction
- Human Parsing
- Pose Transfer
- Avatar
8.Action Detection(Human Action Detection and Recognition)
- Video Test-Time Adaptation for Action Recognition
- A Large-Scale Robustness Analysis of Video Action Recognition Models
- How Can Objects Help Action Recognition?
- MMG-Ego4D: Multimodal Generalization in Egocentric Action Recognition<br>:star:code
- Dual-Path Adaptation From Image to Video Transformers<br>:star:code
- Hybrid Active Learning via Deep Clustering for Video Action Detection<br>:house:project
- Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
- Learning Action Changes by Measuring Verb-Adverb Textual Relationships<br>:star:code
- STMixer: A One-Stage Sparse Action Detector
- AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation
- Search-Map-Search: A Frame Selection Paradigm for Action Recognition
- On the Benefits of 3D Pose and Tracking for Human Action Recognition<br>:star:code
- SVFormer: Semi-Supervised Video Transformer for Action Recognition
- Skeleton-Based Action Recognition
- Learning Discriminative Representations for Skeleton Based Action Recognition
- Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition<br>:house:project
- 3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
- HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions<br>:star:code
- Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition
- Keypoint-Based Action Recognition
- Temporal Action Recognition
- TriDet: Temporal Action Detection with Relative Boundary Modeling<br>:star:code
- Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization<br>:star:code
- Post-Processing Temporal Action Detection<br>:star:code
- Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
- PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization
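- Open-Set Action Recognition
- MoCap-Based Action Recognition
- Few-Shot Action Recognition
- Semi-Supervised Action Recognition
- Temporal Action Localization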
- Boosting Weakly-Supervised Temporal Action Localization with Text Information<br>:star:code
- Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
- Two-Stream Networks for Weakly-Supervised Temporal Action Localization With Semantic-Aware Mechanisms
- Cascade Evidential Learning for Open-World Weakly-Supervised Temporal Action Localization
- Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels
- Distilling Vision-Language Pre-Training To Collaborate With Weakly-Supervised Temporal Action Localization
- AdamsFormer for Spatial Action Localization in the Future
- Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
- Group Action Quality Assessment
- Group Activity Recognition
7.Point Cloud
- FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
- Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
- PointVector: A Vector Representation in Point Cloud Analysis
- CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data
- PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering
- Adversarially Masking Synthetic To Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation
- Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching Between Parts and Words
- Attention-Based Point Cloud Edge Sampling
- Meta Architecture for Point Cloud Analysis
- Building Rearticulable Models for Arbitrary 3D Objects From 4D Point Clouds<br>:house:project
- Implicit Surface Contrastive Clustering for LiDAR Point Clouds
- Poly-PC: A Polyhedral Network for Multiple Point Cloud Tasks at Once
- TriVol: Point Cloud Rendering via Triple Volumes
- PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
- PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos
- GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds
- Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting<br>:star:code
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
- SE-ORNet: Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondence
- GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training
- Neural Intrinsic Embedding for Non-rigid Point Cloud Matching
- 3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud<br>:star:code
- SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds
- SCPNet: Semantic Scene Completion on Point Cloud
- NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds<br>:star:code
- Rotation-Invariant Transformer for Point Cloud Matching
- Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives<br>:house:project
- VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud<br>:star:code
- Unsupervised Inference of Signed Distance Functions from Single Sparse Point Clouds without Learning Priors<br>:star:code
- Grad-PU: Arbitrary-Scale Point Cloud Upsampling via Gradient Descent with Learned Distance Functions<br>:star:code
- Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
- Spatiotemporal Self-supervised Learning for Point Clouds in the Wild<br>:star:code
- NerVE: Neural Volumetric Edges for Parametric Curve Extraction from Point Cloud<br>:star:code
- IterativePFN: True Iterative Point Cloud Filtering<br>:star:code
- Fast Point Cloud Generation With Straight Flows
- GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds
- 3D Point Cloud
- Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis<br>:star:code
- ToThePoint: Efficient Contrastive Learning of 3D Point Clouds via Recycling
- PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models<br>:house:project
- Learnable Skeleton-Aware 3D Point Cloud Sampling
- GraVoS: Voxel Selection for 3D Point-Cloud Detection
- MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds<br>:star:code
- NeuralPCI: Spatio-temporal Neural Field for 3D Point Cloud Multi-frame Non-linear Interpolation<br>:star:code<br>:star:code
- Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation<br>:star:code
- Point Cloud Instance Segmentation
- Point Cloud Classification
- Point Cloud Completion
- ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer<br>:star:code
- Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion
- ACL-SPC: Adaptive Closed-Loop system for Self-Supervised Point Cloud Completion<br>:star:code
- AnchorFormer: Point Cloud Completion From Discriminative Nodes<br>:star:code
- Hyperspherical Embedding for Point Cloud Completion
- Point Cloud Registration
- Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration<br>:star:code
- PEAL: Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration
- Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration
- Robust Multiview Point Cloud Registration with Reliable Pose Graph Initialization and History Reweighting<br>:star:code
- BUFFER: Balancing Accuracy, Efficiency, and Generalizability in Point Cloud Registration<br>:star:code
- Point Cloud Understanding
- Point Cloud Reconstruction
- Point Cloud Matching
- Point Cloud Segmentation
- Improving Graph Representation for Point Cloud Segmentation via Attentive Filtering
- Point Cloud Compression
6.Object Tracking
- Data-Driven Feature Tracking for Event Cameras
- Autoregressive Visual Tracking<br>:star:code
- Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking<br>:house:project
- Unifying Short and Long-Term Tracking With Graph Hierarchies<br>:house:project
- VideoTrack: Learning To Track Objects via Video Transformer
- Tracking Through Containers and Occluders in the Wild<br>:house:project
- Frame-Event Alignment and Fusion Network for High Frame Rate Tracking
- Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking<br>:star:code
- Joint Visual Grounding and Tracking with Natural Language Specification<br>:star:code
- Generalized Relation Modeling for Transformer Tracking<br>:star:code
- SeqTrack: Sequence to Sequence Learning for Visual Object Tracking
- DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks<br>:star:code
- CXTrack: Improving 3D Point Cloud Tracking With Contextual Information
- Representation Learning for Visual Object Tracking by Masked Appearance Transfer<br>:star:code
- 3D-POP - An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds With Marker-Based Motion Capture
- Multi-Object Tracking
- Referring Multi-Object Tracking<br>:star:code
- Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking
- Simple Cues Lead to a Strong Multi-Object Tracker
- Tracking Multiple Deformable Objects in Egocentric Videos<br>:house:project
- MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors<br>:star:code
- UTM: A Unified Multiple Object Tracking Model With Identity-Aware Feature Enhancement
- Focus on Details: Online Multi-Object Tracking With Diverse Fine-Grained Representation
- Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking<br>:star:code
- MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking
- OVTrack: Open-Vocabulary Multiple Object Tracking<br>:house:project
- Multi-Modal Tracking
- Visual Prompt Multi-Modal Tracking<br>:star:code
- RGB-T tracking(object tracking that combines visible-light (RGB) and thermal infrared (T) images)
5.Object Detection
- Angelic Patches for Improving Third-Party Object Detector Performance
- Enhanced Training of Query-Based Object Detection via Selective Query Recollection
- The Differentiable Lens: Compound Lens Search Over Glass Surfaces and Materials for Object Detection
- Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains<br>:star:code
- Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
- NeRF-RPN: A General Framework for Object Detection in NeRFs<br>:star:code
- Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors
- Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration<br>:star:code
- Gaussian Label Distribution Learning for Spherical Image Object Detection
- Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments
- Towards Unsupervised Object Detection From LiDAR Point Clouds<br>:house:project
- Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation<br>:star:code
- Recurrent Vision Transformers for Object Detection With Event Cameras
- Learned Two-Plane Perspective Prior Based Image Resampling for Efficient Object Detection
- Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection
- YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors<br>:star:code
- MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding From Object Detection<br>:star:code
- Doubly Right Object Recognition: A Why Prompt for Visual Rationales
- Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection<br>:star:code
- Unbalanced Optimal Transport: A Unified Framework for Object Detection
- CLIP the Gap: A Single Domain Generalization Approach for Object Detection
- Learning Transformations To Reduce the Geometric Shift in Object Detection
- Object Detection With Self-Supervised Scene Adaptation<br>:star:code
- Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR<br>:star:code
- SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency<br>:star:code
- Multiclass Confidence and Localization Calibration for Object Detection<br>:star:code
- Mobile User Interface Element Detection Via Adaptively Prompt Tuning
- DynamicDet: A Unified Dynamic Architecture for Object Detection<br>:star:code
- ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection<br>:star:code
- Curricular Object Manipulation in LiDAR-based Object Detection<br>:star:code
- STDLens: Model Hijacking-resilient Federated Learning for Object Detection<br>:star:code
- What Can Human Sketches Do for Object Detection?<br>:star:code
- Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects<br>:star:code
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection<br>:star:code
- T-SEA: Transfer-based Self-Ensemble Attack on Object Detection<br>:star:code<br>:thumbsup:CVPR 2023 | Peking University proposes T-SEA: a self-ensemble strategy for stronger black-box attack transferability
- Knowledge Combination to Learn Rotated Detection Without Rotated Annotation
- Universal Instance Perception as Object Discovery and Retrieval<br>:star:code
- Continual Detection Transformer for Incremental Object Detection
- Open-Vocabulary Object Detection
- Aligning Bag of Regions for Open-Vocabulary Object Detection<br>:star:code
- Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers
- DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment
- OvarNet: Towards Open-vocabulary Object Attribute Recognition<br>:thumbsup:CVPR 2023 | Xiaohongshu proposes the OvarNet model: a new SOTA for open-set prediction and a new take on "recognize anything"
- Learning To Detect and Segment for Open Vocabulary Object Detection
- Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection<br>:star:code
- CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
- Open-World Object Detection
- Annealing-Based Label-Transfer Learning for Open World Object Detection<br>:star:code
- CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
- PROB: Probabilistic Objectness for Open World Object Detection<br>:star:code
- CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection
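- Detecting Everything in the Open World: Towards Universal Object Detection<br>:star:code<br>:thumbsup:CVPR 2023 | Annotate 500 classes, detect 7,000! Tsinghua University and collaborators propose the universal object detection algorithm UniDetector
- Object Localization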
- LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding<br>:house:project
- Egocentric Audio-Visual Object Localization
- Unsupervised Object Localization: Observing the Background To Discover Objects<br>:star:code
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization
- 3D Object Detection
- Virtual Sparse Convolution for Multimodal 3D Object Detection<br>:star:code
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection
- BEVHeight: A Robust Framework for Vision-Based Roadside 3D Object Detection
- UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View
- PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection<br>:star:code
- AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers<br>:star:code
- BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks
- 3D Video Object Detection With Learnable Object-Centric Global Optimization<br>:star:code
- ConQueR: Query Contrast Voxel-DETR for 3D Object Detection<br>:house:project
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection<br>:star:code
- Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection<br>:star:code
- Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection<br>:star:code
- Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark<br>:star:code
- Deep Dive Into Gradients: Better Optimization for 3D Object Detection With Gradient-Corrected IoU Supervision<br>:star:code
- AeDet: Azimuth-invariant Multi-view 3D Object Detection<br>:star:code
- FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection<br>:star:code
- itKD: Interchange Transfer-Based Knowledge Distillation for 3D Object Detection
- OcTr: Octree-Based Transformer for 3D Object Detection
- MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences
- Semi-Supervised Stereo-Based 3D Object Detection via Cross-View Consensus
- LinK: Linear Kernel for LiDAR-based 3D Perception<br>:star:code
- PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds
- PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer<br>:star:code
- Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection<br>:star:code
- X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection<br>:star:code
- Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency<br>:star:code
- Viewpoint Equivariance for Multi-View 3D Object Detection<br>:star:code
- Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving<br>:star:code
- Collaboration Helps Camera Overtake LiDAR in 3D Detection<br>:star:code
- MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences<br>:star:code
- MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training<br>:star:code
- NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
- VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking<br>:star:code
- Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection<br>:star:code
- LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion<br>:star:code
- PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection<br>:star:code
- CAPE: Camera View Position Embedding for Multi-View 3D Object Detection<br>:star:code
- Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection<br>:star:code
- End-to-End Object Detection
- Semi-Supervised Object Detection
- Active Teacher for Semi-Supervised Object Detection<br>:star:code
- Semi-DETR: Semi-Supervised Object Detection With Detection Transformers
- Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection<br>:star:code
- SOOD: Towards Semi-Supervised Oriented Object Detection<br>:star:code
- MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection<br>:star:code
- Weakly Supervised Object Detection
- Few-Shot Object Detection
- NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging
- Generating Features with Increased Crop-related Diversity for Few-Shot Object Detection
- Meta-tuning Loss Functions and Data Augmentation for Few-shot Object Detection
- DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection<br>:star:code
- Domain Adaptive Object Detection
- 2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection
- AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection<br>:star:code
- CIGAR: Cross-Modality Graph Reasoning for Domain Adaptive Object Detection
- Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection<br>:house:project
- Domain Adaptive Detection Transformer With Information Fusion
- Harmonious Teacher for Cross-Domain Object Detection
- Contrastive Mean Teacher for Domain Adaptive Object Detectors
- Weak-Shot Object Detection
- Salient Object Detection
- Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
- Pixels, Regions, and Objects: Multiple Enhancement for Salient Object Detection
- Modeling the Distributional Uncertainty for Salient Object Detection Models<br>:star:code
- Test Time Adaptation With Regularized Loss for Weakly Supervised Salient Object Detection
- Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection
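- Infrared Object Detection
- Camouflaged Object Detection
- Dense Object Detection
- Collaborative Object Detection
- Point Cloud Object Detection
- Object Discovery
- Video Object Detection
- Small Object Detection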
- Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection<br>:star:code
- Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection With Single Point Supervision<br>:star:code
- Distilling Scale-Aware Knowledge in Small Object Detector
- LSTFE-Net: Long Short-Term Feature Enhancement Network for Video Small Object Detection<br>:star:code
- Infrared Small Target Detection
- Line Segment Detection
- Object Navigation
4.Image Captioning
- Video Captioning
- Image Captioning
- Cross-Domain Image Captioning with Discriminative Finetuning
- Crossing the Gap: Domain Generalization for Image Captioning
- Model-Agnostic Gender Debiased Image Captioning
- A-CAP: Anticipation Captioning with Commonsense Knowledge
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation<br>:star:code
- HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
- Semantic-Conditional Diffusion Networks for Image Captioning<br>:star:code
- ConZIC: Controllable Zero-Shot Image Captioning by Sampling-Based Polishing
- SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation
- Visual Story Generation
- 3D Dense Captioning
3.Image Processing (low-level image processing and quality assessment)
- Initialization Noise in Image Gradients and Saliency Maps
- Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models<br>:star:code
- Tunable Convolutions with Parametric Multi-Loss Optimization
- Image Colorization
- Shadow Removal
- Image Restoration
- Efficient and Explicit Modelling of Image Hierarchies for Image Restoration<br>:star:code
- Visual Recognition-Driven Image Restoration for Multiple Degradation With Intrinsic Semantics Recovery
- Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions
- Generating Aligned Pseudo-Supervision From Non-Aligned Data for Image Restoration in Under-Display Camera<br>:star:code
- Comprehensive and Delicate: An Efficient Transformer for Image Restoration
- Ingredient-Oriented Multi-Degradation Learning for Image Restoration
- All-in-One Image Restoration for Unknown Degradations Using Adaptive Discriminative Filters for Specific Degradations
- Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank<br>:star:code
- Burstormer: Burst Image Restoration and Enhancement Transformer
- Generative Diffusion Prior for Unified Image Restoration and Enhancement
- Bitstream-Corrupted JPEG Images are Restorable: Two-stage Compensation and Alignment Framework for Image Restoration<br>:star:code
- Learning Distortion Invariant Representation for Image Restoration From a Causality Perspective<br>:star:code
- Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack
- Robust Unsupervised StyleGAN Image Restoration<br>:house:project
- Image Inpainting
- Video Restoration
- Video Inpainting
- Image Lighting
- Image Quality Assessment
- Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective<br>:star:code
- Quality-Aware Pre-Trained Models for Blind Image Quality Assessment
- An Image Quality Assessment Dataset for Portraits<br>:star:code
- Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
- Dehazing
- Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior<br>:star:code
- Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing
- Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring<br>:star:code
- RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors
- Deraining
- Denoising
- Masked Image Training for Generalizable Deep Image Denoising
- Real-Time Controllable Denoising for Image and Video
- Patch-Craft Self-Supervised Training for Correlated Image Denoising
- Polarized Color Image Denoising
- sRGB Real Noise Synthesizing With Neighboring Correlation-Aware Noise Model<br>:star:code
- Zero-Shot Noise2Noise: Efficient Image Denoising Without Any Data<br>:house:project
- HouseDiffusion: Vector Floorplan Generation via a Diffusion Model With Discrete and Continuous Denoising<br>:house:project
- Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising<br>:star:code
- Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising<br>:star:code
- Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising<br>:star:code
- LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising<br>:star:code
- Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations
- Learning with Noisy labels via Self-supervised Adversarial Noisy Masking
- Learning from Noisy Labels with Decoupled Meta Label Purifier
- Deblurring
- HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering<br>:star:code
- Neumann Network With Recursive Kernels for Single Image Defocus Deblurring
- K3DN: Disparity-Aware Kernel Estimation for Dual-Pixel Defocus Deblurring
- Uncertainty-Aware Unsupervised Image Deblurring With Deep Residual Prior
- $\text{DC}^2$: Dual-Camera Defocus Control by Learning to Refocus<br>:star:code
- Self-Supervised Non-Uniform Kernel Estimation With Flow-Based Motion Prior for Blind Image Deblurring<br>:house:project
- Joint Video Multi-Frame Interpolation and Deblurring Under Unknown Exposure Time<br>:star:code
- Event-Based Frame Interpolation With Ad-Hoc Deblurring
- Deep Discriminative Spatial and Temporal Network for Efficient Video Deblurring
- Deghosting
- Reflective Flare Removal
- image deweathering
- Image Rescaling
- Instantaneous Restoration and Enhancement
- Image Enhancement
- Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement
- Realistic Saliency Guided Image Enhancement
- Learning a Simple Low-Light Image Enhancer From Paired Low-Light Instances<br>:star:code
- Low-Light Image Enhancement via Structure Modeling and Guidance
- You Do Not Need Additional Priors or Regularizers in Retinex-Based Low-Light Image Enhancement
- Image Harmonization
- Image Exposure Correction
- Object Removal
- Image Decomposition
- Image Reconstruction
- Raw Image Reconstruction With Learned Compact Metadata<br>:star:code
- Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder
- High-Resolution Image Reconstruction With Latent Diffusion Models From Human Brain Activity<br>:house:project
- PermutoSDF: Fast Multi-View Reconstruction with Implicit Surfaces using Permutohedral Lattices<br>:house:project
- Text-Driven Image Processing
- Motion Blur
- Image Cropping
- Image Relighting
- Blurry Frame Interpolation
2.Image Segmentation
- MED-VT: Multiscale Encoder-Decoder Video Transformer With Application To Object Segmentation
- SimpSON: Simplifying Photo Cleanup With Single-Click Distracting Object Segmentation Network
- Towards Open-World Segmentation of Parts
- Heat Diffusion Based Multi-Scale and Geometric Structure-Aware Transformer for Mesh Segmentation
- MOVES: Manipulated Objects in Video Enable Segmentation
- Decoupled Semantic Prototypes Enable Learning From Diverse Annotation Types for Semi-Weakly Segmentation in Expert-Driven Domains
- Compositor: Bottom-Up Clustering and Compositing for Robust Part and Object Segmentation
- VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation<br>:star:code
- Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
- OneFormer: One Transformer To Rule Universal Image Segmentation<br>:house:project
- PanelNet: Understanding 360 Indoor Environment via Panel Representation
- AutoFocusFormer: Image Segmentation off the Grid
- MP-Former: Mask-Piloted Transformer for Image Segmentation<br>:star:code
- Explicit Visual Prompting for Low-Level Structure Segmentations<br>:star:code
- Focused and Collaborative Feedback Integration for Interactive Image Segmentation<br>:star:code
- FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation<br>:house:project<br>On five datasets across three downstream video segmentation tasks (VIS, VOS, and MOTS), plugging InstMove into existing SOTA models yields a further gain of 1 to 5 points.
- Zero-Shot Segmentation
- 3D Segmentation
- Panoptic Segmentation
- Instance Segmentation
- DynaMask: Dynamic Mask Selection for Instance Segmentation<br>:star:code
- Tree Instance Segmentation With Temporal Contour Graph
- Hi4D: 4D Instance Segmentation of Close Human Interaction
- Beyond mAP: Towards Better Evaluation of Instance Segmentation
- Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt
- Cut and Learn for Unsupervised Object Detection and Instance Segmentation<br>:star:code
- PartDistillation: Learning Parts From Instance Segmentation<br>:star:code
- Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections<br>:star:code
- AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation
- DoNet: Deep De-overlapping Network for Cytology Instance Segmentation<br>:star:code
- FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation<br>:star:code
- Camouflaged Instance Segmentation via Explicit De-Camouflaging
- Unsupervised Instance Segmentation
- Weakly Supervised Instance Segmentation
- SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation<br>:star:code
- BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation<br>:star:code
- The Devil is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation<br>:star:code
- Open-Vocabulary Instance Segmentation
- Zero-Shot Instance Segmentation
- Semantic Segmentation
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model<br>:star:code
- Transformer Scale Gate for Semantic Segmentation
- Towards Better Stability and Adaptability: Improve Online Self-Training for Model Adaptation in Semantic Segmentation
- BAEFormer: Bi-Directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation
- Combining Implicit-Explicit View Correlation for Light Field Semantic Segmentation
- Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge
- SemiCVT: Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation
- PIDNet: A Real-Time Semantic Segmentation Network Inspired by PID Controllers
- Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions
- PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation<br>:star:code
- Understanding Imbalanced Semantic Segmentation Through Neural Collapse<br>:star:code
- Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation<br>:star:code
- Single Domain Generalization for LiDAR Semantic Segmentation<br>:star:code
- FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation
- Proximal Splitting Adversarial Attack for Semantic Segmentation<br>:star:code
- On Calibrating Semantic Segmentation Models: Analyses and an Algorithm
- Incrementer: Transformer for Class-Incremental Semantic Segmentation With Knowledge Distillation Focusing on Old Class
- Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers
- Endpoints Weight Fusion for Class Incremental Semantic Segmentation
- Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures<br>:star:code
- ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation
- Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation
- Dynamic Focus-Aware Positional Queries for Semantic Segmentation<br>:star:code
- Continual Semantic Segmentation With Automatic Memory Sample Selection
- Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision
- Dynamically Instance-Guided Adaptation: A Backward-Free Approach for Test-Time Domain Adaptive Semantic Segmentation<br>:star:code
- Federated Incremental Semantic Segmentation<br>:star:code
- Delivering Arbitrary-Modal Semantic Segmentation<br>:star:code
- Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
- A Simple Framework for Text-Supervised Semantic Segmentation<br>:star:code<br>Clearly outperforms previous state-of-the-art methods on the PASCAL VOC 2012, PASCAL Context, and COCO datasets.
- Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
- Generative Semantic Segmentation<br>:star:code
- Reliability in Semantic Segmentation: Are We on the Right Track?<br>:star:code
- Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
- Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation<br>:star:code
- Instant Domain Augmentation for LiDAR Semantic Segmentation<br>:house:project
- Delving into Shape-aware Zero-shot Semantic Segmentation<br>:star:code
- Open-Vocabulary Semantic Segmentation
- Open Vocabulary Semantic Segmentation With Patch Aligned Contrastive Learning
- Open-Vocabulary Semantic Segmentation With Mask-Adapted CLIP<br>:house:project
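- Side Adapter Network for Open-Vocabulary Semantic Segmentation<br>:star:code<br>:thumbsup:CVPR 2023 Highlight | Side Adapter Network: an extremely lightweight yet high-performing open-vocabulary semantic segmenter
- Open-World Semantic Segmentation
- Domain Adaptive Semantic Segmentation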
- DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation<br>:star:code
- Weakly-Supervised Domain Adaptive Semantic Segmentation With Prototypical Contrastive Learning
- Continuous Pseudo-Label Rectified Domain Adaptive Semantic Segmentation With Implicit Neural Representations<br>:star:code
- Domain Generalized Semantic Segmentation
- Unsupervised Semantic Segmentation
- Semi-Supervised Semantic Segmentation
- Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation<br>:star:code
- Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation
- Hunting Sparsity: Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation<br>:star:code
- Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation
- LaserMix for Semi-Supervised LiDAR Semantic Segmentation<br>:star:code
- Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation
- Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation
- Weakly Supervised Semantic Segmentation
- Token Contrast for Weakly-Supervised Semantic Segmentation<br>:star:code
- CLIP Is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation
- Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation<br>:star:code
- Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation
- Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier and Reconstructor<br>:star:code
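- Self-Supervised Semantic Segmentation
- Point Cloud Semantic Segmentation
- Zero-Shot Semantic Segmentation
- Few-Shot Semantic Segmentation
- Long-Tailed Semantic Segmentation
- 3D Semantic Segmentation
- Open-Set Semantic Segmentation
- Interactive Segmentation
- Few-Shot Segmentation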
- VSS
- Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos<br>:star:code
- Simultaneously Short- and Long-Term Temporal Modeling for Semi-Supervised Video Semantic Segmentation
- Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation<br>:star:code
- VOS
- InstMove: Instance Motion for Object-centric Video Segmentation<br>:star:code
- Breaking the "Object" in Video Object Segmentation
- Look Before You Match: Instance Understanding Matters in Video Object Segmentation
- MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
- Boosting Video Object Segmentation via Space-time Correspondence Learning<br>:star:code
- Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation<br>:star:code
- Two-shot Video Object Segmentation<br>:star:code
- Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping
- VIS
- Mask-Free Video Instance Segmentation<br>:star:code<br>:house:project
- MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos<br>:star:code
- A Generalized Framework for Video Instance Segmentation<br>:star:code
- Scene Understanding
- FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
- SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text<br>:house:project
- Movies2Scenes: Using Movie Metadata To Learn Scene Representation
- Seeing With Sound: Long-range Acoustic Beamforming for Multimodal Scene Understanding
- Single View Scene Scale Estimation Using Scale Field
- Neural Part Priors: Learning To Optimize Part-Based Object Completion in RGB-D Scans
- 3D Scene Understanding
- OpenScene: 3D Scene Understanding With Open Vocabularies
- Long Range Pooling for 3D Large-Scale Scene Understanding
- Panoptic Lifting for 3D Scene Understanding With Neural Fields<br>:house:project
- FAC: 3D Representation Learning via Foreground Aware Feature Contrast
- Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding
- CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP<br>:star:code
- PLA: Language-driven Open-Vocabulary 3D Scene Understanding<br>:star:code<br>:house:project
- MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency
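- Image Matting
- Referring Image Segmentation
- Referring Expression Segmentation
- Motion Segmentation
- Video Segmentation
- Action Segmentation
1.Other (miscellaneous, to be classified)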
- CIRCLE: Capture in Rich Contextual Environments
- Trainable Projected Gradient Method for Robust Fine-Tuning
- HDR Imaging With Spatially Varying Signal-to-Noise Ratios
- Are Deep Neural Networks SMARTer Than Second Graders?
- Blowing in the Wind: CycleNet for Human Cinemagraphs From Still Images
- Uncertainty-Aware Vision-Based Metric Cross-View Geolocalization
- pCON: Polarimetric Coordinate Networks for Neural Scene Representations
- Two-Stage Co-Segmentation Network Based on Discriminative Representation for Recovering Human Mesh From Videos
- Ranking Regularization for Critical Rare Classes: Minimizing False Positives at a High True Positive Rate
- Implicit View-Time Interpolation of Stereo Videos Using Multi-Plane Disparities and Non-Uniform Coordinates
- LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction
- Stare at What You See: Masked Image Modeling Without Reconstruction
- Neural Kaleidoscopic Space Sculpting
- Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders
- Edges to Shapes to Concepts: Adversarial Augmentation for Robust Vision
- Improved Distribution Matching for Dataset Condensation
- Slimmable Dataset Condensation
- LEGO-Net: Learning Regular Rearrangements of Objects in Rooms
- Neuralizer: General Neuroimage Analysis Without Re-Training
- DETRs With Hybrid Matching<br>:star:code
- A Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization
- A-La-Carte Prompt Tuning (APT): Combining Distinct Data via Composable Prompting
- Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
- Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery From Sparse Image Ensemble
- Decentralized Learning With Multi-Headed Distillation
- On the Convergence of IRLS and Its Variants in Outlier-Robust Estimation
- Learning Joint Latent Space EBM Prior Model for Multi-Layer Generator
- FlowGrad: Controlling the Output of Generative ODEs With Gradients
- Recurrent Homography Estimation Using Homography-Guided Image Warping and Focus Transformer
- Multi-View Inverse Rendering for Large-Scale Real-World Indoor Scenes
- Boosting Transductive Few-Shot Fine-Tuning With Margin-Based Uncertainty Weighting and Probability Regularization
- BiasAdv: Bias-Adversarial Augmentation for Model Debiasing
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
- Why Is the Winner the Best?
- HGNet: Learning Hierarchical Geometry From Points, Edges, and Surfaces
- Revisiting the P3P Problem
- RiDDLE: Reversible and Diversified De-Identification With Latent Encryptor
- BASiS: Batch Aligned Spectral Embedding Space
- CRAFT: Concept Recursive Activation FacTorization for Explainability
- Infinite Photorealistic Worlds using Procedural Generation
- All-in-Focus Imaging From Event Focal Stack
- Learning 3D Scene Priors With 2D Supervision
- NeuWigs: A Neural Dynamic Model for Volumetric Hair Capture and Animation
- CLIPPO: Image-and-Language Understanding from Pixels Only<br>:star:code
- Towards Bridging the Performance Gaps of Joint Energy-Based Models
- expOSE: Accurate Initialization-Free Projective Factorization Using Exponential Regularization
- Learning Debiased Representations via Conditional Attribute Interpolation
- Learning Neural Volumetric Representations of Dynamic Humans in Minutes
- Bayesian Posterior Approximation With Stochastic Ensembles
- RILS: Masked Visual Reconstruction in Language Semantic Space
- RepMode: Learning to Re-Parameterize Diverse Experts for Subcellular Structure Prediction
- Zero-Shot Model Diagnosis
- Improving Visual Grounding by Encouraging Consistent Gradient-Based Explanations<br>:star:code
- AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning With Masked Autoencoders
- Understanding and Improving Visual Prompting: A Label-Mapping Perspective
- DegAE: A New Pretraining Paradigm for Low-Level Vision
- LiDAR-in-the-Loop Hyperparameter Optimization
- Understanding Deep Generative Models With Generalized Empirical Likelihoods
- Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning
- Compressing Volumetric Radiance Fields to 1 MB<br>:star:code
- Label Information Bottleneck for Label Enhancement<br>:star:code
- DNF: Decouple and Feedback Network for Seeing in the Dark
- Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World
- How To Prevent the Continuous Damage of Noises To Model Training?
- ActMAD: Activation Matching To Align Distributions for Test-Time-Training<br>:house:project
- Leveraging Temporal Context in Low Representational Power Regimes<br>:house:project
- Guided Recommendation for Model Fine-Tuning
- OT-Filter: An Optimal Transport Filter for Learning With Noisy Labels
- E2PN: Efficient SE(3)-Equivariant Point Network<br>:star:code
- Understanding Masked Image Modeling via Learning Occlusion Invariant Feature
- Fine-Tuned CLIP Models Are Efficient Video Learners<br>:star:code
- Visual Recognition by Request
- Stitchable Neural Networks<br>:house:project
- RUST: Latent Neural Scene Representations From Unposed Imagery<br>:star:code
- Spatio-Focal Bidirectional Disparity Estimation From a Dual-Pixel Image
- Four-View Geometry With Unknown Radial Distortion
- Learning Optical Expansion From Scale Matching<br>:star:code
- Don't Lie to Me! Robust and Efficient Explainability With Verified Perturbation Analysis<br>:star:code
- Learning Transformation-Predictive Representations for Detection and Description of Local Features
- Two-Way Multi-Label Loss<br>:star:code
- Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization<br>:star:code
- Dionysus: Recovering Scene Structures by Dividing Into Semantic Pieces
- Noisy Correspondence Learning With Meta Similarity Correction
- HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
- Modeling Entities As Semantic Points for Visual Information Extraction in the Wild<br>:house:project
- NeAT: Learning Neural Implicit Surfaces With Arbitrary Topologies From Multi-View Images
- Learning a Deep Color Difference Metric for Photographic Images
- DINN360: Deformable Invertible Neural Network for Latitude-Aware 360° Image Rescaling<br>:star:code
- Finetune Like You Pretrain: Improved Finetuning of Zero-Shot Vision Models<br>:star:code
- Learning a Practical SDR-to-HDRTV Up-Conversion Using New Dataset and Degradation Models<br>:star:code
- DynaFed: Tackling Client Data Heterogeneity With Global Dynamics
- CUF: Continuous Upsampling Filters
- Learning Decorrelated Representations Efficiently Using Fast Fourier Transform
- Practical Network Acceleration With Tiny Sets
- AstroNet: When Astrocyte Meets Artificial Neural Network
- NeuralLift-360: Lifting an In-the-Wild 2D Photo to a 3D Object With 360° Views<br>:star:code
- Command-Driven Articulated Object Understanding and Manipulation<br>:star:code
- HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes With Iterative Intertwined Regularization<br>:star:code
- Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction<br>:star:code
- Gradient-Based Uncertainty Attribution for Explainable Bayesian Deep Learning
- Class Adaptive Network Calibration<br>:star:code
- OCTET: Object-Aware Counterfactual Explanations<br>:star:code
- DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos
- FFF: Fragment-Guided Flexible Fitting for Building Complete Protein Structures
- Open-Set Representation Learning Through Combinatorial Embedding
- A Unified HDR Imaging Method With Pixel and Patch Level
- Accelerated Coordinate Encoding: Learning to Relocalize in Minutes Using RGB and Poses<br>:star:code
- Switchable Representation Learning Framework With Self-Compatibility
- Exploring and Utilizing Pattern Imbalance
- Top-Down Visual Attention From Analysis by Synthesis<br>:house:project
- Interactive Cartoonization With Controllable Perceptual Factors
- Regularize Implicit Neural Representation by Itself
- Delving Into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling
- Re-Basin via Implicit Sinkhorn Differentiation<br>:star:code
- Towards Effective Visual Representations for Partial-Label Learning
- Samples With Low Loss Curvature Improve Data Efficiency<br>:star:code
- Learning Correspondence Uncertainty via Differentiable Nonlinear Least Squares
- Tunable Convolutions With Parametric Multi-Loss Optimization
- RelightableHands: Efficient Neural Relighting of Articulated Hand Models<br>:house:project
- DyNCA: Real-Time Dynamic Texture Synthesis Using Neural Cellular Automata<br>:house:project
- Token Turing Machines<br>:star:code
- Probabilistic Debiasing of Scene Graphs<br>:star:code
- Few-Shot Non-Line-of-Sight Imaging With Signal-Surface Collaborative Regularization
- The Dark Side of Dynamic Routing Neural Networks: Towards Efficiency Backdoor Injection
- Generalized Decoding for Pixel, Image, and Language<br>:house:project
- EC2: Emergent Communication for Embodied Control
- Generalizable Local Feature Pre-Training for Deformable Shape Analysis<br>:star:code
- On-the-Fly Category Discovery<br>:star:code
- PyramidFlow: High-Resolution Defect Contrastive Localization Using Pyramid Normalizing Flow
- Efficient Verification of Neural Networks Against LVM-Based Specifications
- TensoIR: Tensorial Inverse Rendering<br>:house:project
- Learning From Unique Perspectives: User-Aware Saliency Modeling
- LargeKernel3D: Scaling Up Kernels in 3D Sparse CNNs<br>:star:code
- Learning Transferable Spatiotemporal Representations From Natural Script Knowledge<br>:star:code
- FFCV: Accelerating Training by Removing Data Bottlenecks<br>:house:project
- Semidefinite Relaxations for Robust Multiview Triangulation
- GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency<br>:star:code
- Polynomial Implicit Neural Representations for Large Diverse Datasets<br>:star:code
- Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption
- Learning To Zoom and Unzoom<br>:house:project
- Neural Vector Fields: Implicit Representation by Explicit Learning<br>:star:code
- Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks<br>:star:code
- Critical Learning Periods for Multisensory Integration in Deep Networks
- Imitation Learning as State Matching via Differentiable Physics<br>:star:code
- Probing Sentiment-Oriented Pre-Training Inspired by Human Sentiment Perception Mechanism<br>:star:code
- Relightable Neural Human Assets From Multi-View Gradient Illuminations<br>:star:code
- DINER: Disorder-Invariant Implicit Neural Representation
- Robust Mean Teacher for Continual and Gradual Test-Time Adaptation<br>:star:code
- A Probabilistic Framework for Lifelong Test-Time Adaptation<br>:star:code
- Probing Neural Representations of Scene Perception in a Hippocampally Dependent Task Using Artificial Neural Networks
- Decoupling Human and Camera Motion From Videos in the Wild<br>:house:project
- DISC: Learning From Noisy Labels via Dynamic Instance-Specific Selection and Correction<br>:star:code
- DC2: Dual-Camera Defocus Control by Learning To Refocus
- FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs
- "Seeing" Electric Network Frequency From Events<br>:house:project
- Confidential and Private Decentralized Learning Based on Encryption-Friendly Distillation Loss<br>:star:code
- Revealing the Dark Secrets of Masked Image Modeling
- RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
- Adaptive Graph Convolutional Subspace Clustering
- Graph Representation for Order-Aware Visual Transformation
- Train-Once-for-All Personalization
- Learning Sample Relationship for Exposure Correction
- EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata<br>:house:project
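- Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization<br>:star:code<br>:thumbsup:CVPR 2023 | Tsinghua University proposes GAM: a "first-order flatness optimizer" for neural networks that significantly improves model generalization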
- InstantAvatar: Learning Avatars From Monocular Video in 60 Seconds
- GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
- Deep Deterministic Uncertainty: A New Simple Baseline
- WIRE: Wavelet Implicit Neural Representations
- Learning From Noisy Labels With Decoupled Meta Label Purifier
- Architectural Backdoors in Neural Networks
- Event-Based Shape From Polarization
- Deep Hashing With Minimal-Distance-Separated Hash Centers
- Progressive Spatio-Temporal Alignment for Efficient Event-Based Motion Estimation<br>:star:code
- Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation<br>:house:project
- MetaCLUE: Towards Comprehensive Visual Metaphors Research<br>:house:project
- EVA: Exploring the Limits of Masked Visual Representation Learning at Scale<br>:star:code
- Sliced Optimal Partial Transport
- Deep Learning of Partial Graph Matching via Differentiable Top-K<br>:star:code
- Unsupervised Volumetric Animation<br>:house:project
- Passive Micron-Scale Time-of-Flight With Sunlight Interferometry
- Generalizable Implicit Neural Representations via Instance Pattern Composers<br>:star:code
- On the Pitfall of Mixup for Uncertainty Calibration
- UMat: Uncertainty-Aware Single Image High Resolution Material Capture
- On Data Scaling in Masked Image Modeling
- End-to-End Vectorized HD-Map Construction With Piecewise Bezier Curve<br>:star:code
- Boundary Unlearning: Rapid Forgetting of Deep Networks via Shifting the Decision Boundary
- MobileOne: An Improved One millisecond Mobile Backbone<br>:star:code
- Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization
- Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning<br>:star:code
- Residual Degradation Learning Unfolding Framework With Mixing Priors Across Spectral and Spatial for Compressive Spectral Imaging
- Robust and Scalable Gaussian Process Regression and Its Applications<br>:star:code
- NeuralUDF: Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces With Arbitrary Topologies<br>:house:project
- Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations
- Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations<br>:star:code
- Multiplicative Fourier Level of Detail
- VGFlow: Visibility guided Flow Network for Human Reposing
- Neural Dependencies Emerging From Learning Massive Categories
- MaLP: Manipulation Localization Using a Proactive Scheme<br>:house:project
- Efficient Robust Principal Component Analysis via Block Krylov Iteration and CUR Decomposition
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders<br>:star:code
- Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders<br>:star:code
- MEGANE: Morphable Eyeglass and Avatar Network<br>:house:project
- Solving relaxations of MAP-MRF problems: Combinatorial in-face Frank-Wolfe directions
- EXCALIBUR: Encouraging and Evaluating Embodied Exploration
- Learning To Predict Scene-Level Implicit 3D From Posed RGBD Data
- SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries<br>:house:project
- Learning Neural Parametric Head Models<br>:house:project
- Integral Neural Networks
- Simulated Annealing in Early Layers Leads to Better Generalization
- Fresnel Microfacet BRDF: Unification of Polari-Radiometric Surface-Body Reflection
- Improving Visual Representation Learning Through Perceptual Understanding
- Probability-Based Global Cross-Modal Upsampling for Pansharpening<br>:star:code
- SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy
- Megahertz Light Steering Without Moving Parts
- TempSAL - Uncovering Temporal Information for Deep Saliency Prediction<br>:house:project
- Affection: Learning Affective Explanations for Real-World Visual Data<br>:house:project
- Metadata-Based RAW Reconstruction via Implicit Neural Functions
- Coaching a Teachable Student
- Progressive Transformation Learning for Leveraging Virtual Images in Training
- NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling
- Spatial-Temporal Concept Based Explanation of 3D ConvNets
- Overlooked Factors in Concept-Based Explanations: Dataset Choice, Concept Learnability, and Human Capability<br>:star:code
- Neural Fourier Filter Bank<br>:star:code
- ECON: Explicit Clothed Humans Optimized via Normal Integration<br>:star:code
- Autonomous Manipulation Learning for Similar Deformable Objects via Only One Demonstration
- Plateau-Reduced Differentiable Path Tracing<br>:house:project
- Test Time Adaptation With Transformation Invariance<br>:star:code
- Learning To Exploit the Sequence-Specific Prior Knowledge for Image Processing Pipelines Optimization
- Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric<br>:house:project
- CUDA: Convolution-based Unlearnable Datasets
- Efficient On-Device Training via Gradient Filtering
- Transfer Knowledge From Head to Tail: Uncertainty Calibration Under Long-Tailed Distribution
- Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning
- Disentangled Representation Learning for Unsupervised Neural Quantization
- DA Wand: Distortion-Aware Selection Using Neural Mesh Parameterization<br>:star:code<br>:house:project
- On Distillation of Guided Diffusion Models
- Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes<br>:star:code
- K-Planes: Explicit Radiance Fields in Space, Time, and Appearance<br>:house:project
- Understanding Masked Autoencoders via Hierarchical Latent Variable Models
- Co-Training 2^L Submodels for Visual Recognition
- Masked Images Are Counterfactual Samples for Robust Fine-Tuning<br>:star:code
- Learning Customized Visual Models With Retrieval-Augmented Knowledge
- A Unified Spatial-Angular Structured Light for Single-View Acquisition of Shape and Reflectance
- PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery<br>:star:code
- Reproducible Scaling Laws for Contrastive Language-Image Learning<br>:star:code
- Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models
- Invertible Neural Skinning<br>:house:project
- Multi-Object Manipulation via Object-Centric Neural Scattering Functions
- Fair Scratch Tickets: Finding Fair Sparse Networks Without Weight Training
- Backdoor Cleansing With Unlabeled Data<br>:star:code
- Full or Weak Annotations? An Adaptive Strategy for Budget-Constrained Annotation Campaigns
- Executing Your Commands via Motion Diffusion in Latent Space
- Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations<br>:house:project
- Learning To Generate Image Embeddings With User-Level Differential Privacy
- Revisiting the Stack-Based Inverse Tone Mapping
- PACO: Parts and Attributes of Common Objects<br>:star:code
- Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models
- A General Regret Bound of Preconditioned Gradient Method for DNN Training<br>:star:code
- A Practical Upper Bound for the Worst-Case Attribution Deviations
- Perception and Semantic Aware Regularization for Sequential Confidence Calibration<br>:star:code
- Deep Random Projector: Accelerated Deep Image Prior<br>:star:[code](https://github.com/sun-umn/DeepRandom-Projector)
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation<br>:star:code
- DeCo: Decomposition and Reconstruction for Compositional Temporal Grounding via Coarse-To-Fine Contrastive Ranking
- Structured Kernel Estimation for Photon-Limited Deconvolution<br>:star:code
- FlexiViT: One Model for All Patch Sizes<br>:star:code
- BiasBed - Rigorous Texture Bias Evaluation<br>:star:code
- GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction<br>:star:code
- Finding Geometric Models by Clustering in the Consensus Space<br>:star:code
- Hierarchical Neural Memory Network for Low Latency Event Processing<br>:house:project
- Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries<br>:star:code
- PointConvFormer: Revenge of the Point-Based Convolution
- A Practical Stereo Depth System for Smart Glasses
- Differentiable Shadow Mapping for Efficient Inverse Graphics
- Multi Domain Learning for Motion Magnification<br>:star:code
- Re-Thinking Model Inversion Attacks Against Deep Neural Networks<br>:star:code
- DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects<br>:house:project
- Two-View Geometry Scoring Without Correspondences<br>:house:project
- ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images<br>:star:code
- Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation
- Analyzing Physical Impacts Using Transient Surface Wave Imaging
- Adaptive Global Decay Process for Event Cameras<br>:star:code
- Leveraging Inter-Rater Agreement for Classification in the Presence of Noisy Labels
- Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment<br>:star:code
- Swept-Angle Synthetic Wavelength Interferometry
- Shape, Pose, and Appearance From a Single Image via Bootstrapped Radiance Field Inversion<br>:house:project
- Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples<br>:star:code
- 3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification
- EcoTTA: Memory-Efficient Continual Test-Time Adaptation via Self-Distilled Regularization
- Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation
- Minimizing the Accumulated Trajectory Error To Improve Dataset Distillation<br>:star:code
- DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis<br>:house:project
- Virtual Occlusions Through Implicit Depth
- StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator<br>:star:code
- Inverting the Imaging Process by Learning an Implicit Camera Model<br>:star:code
- Visual DNA: Representing and Comparing Images using Distributions of Neuron Activations<br>:star:code
- Efficient Multimodal Fusion via Interactive Prompting
- Representing Volumetric Videos as Dynamic MLP Maps<br>:star:code
- Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
- Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
- DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
- A Meta-Learning Approach to Predicting Performance and Data Requirements
- Multimodal Prompting with Missing Modalities for Visual Recognition<br>:star:code
- UniHCP: A Unified Model for Human-Centric Perceptions<br>:star:code
- DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network<br>:star:code
- Progressive Open Space Expansion for Open-Set Model Attribution<br>:star:code
- TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets<br>:star:code
- HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining<br>:star:code
- 3D Cinemagraphy from a Single Image<br>:house:project
- Masked Image Modeling with Local Multi-Scale Reconstruction<br>:star:code
- Revisiting Rotation Averaging: Uncertainties and Robust Losses<br>:star:code
- Unifying Layout Generation with a Decoupled Diffusion Model
- Adversarial Counterfactual Visual Explanations<br>:star:code
- Trainable Projected Gradient Method for Robust Fine-tuning<br>:star:code
- Partial Network Cloning<br>:star:code
- Extracting Class Activation Maps from Non-Discriminative Features as well<br>:star:code
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization<br>:star:code
- Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark<br>:star:code
- PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment<br>:star:code
- Boundary Unlearning<br>:house:project
- ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals
- VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions
- Learning a Depth Covariance Function<br>:star:code
- A Bag-of-Prototypes Representation for Dataset-Level Applications
- CrOC: Cross-View Online Clustering for Dense Visual Representation Learning<br>:star:code
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels<br>:star:code
- Marching-Primitives: Shape Abstraction from Signed Distance Function<br>:star:code
- Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
- Robust Test-Time Adaptation in Dynamic Scenarios<br>:star:code
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck<br>:star:code
- IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
- Compacting Binary Neural Networks by Sparse Kernel Selection
- PDPP: Projected Diffusion for Procedure Planning in Instructional Videos<br>:star:code
- Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph<br>:star:code
- Quantum Multi-Model Fitting<br>:star:code
- Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
- PMatch: Paired Masked Image Modeling for Dense Geometric Matching<br>:star:code
- ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing<br>:star:code
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
- Why is the winner the best?
- Disorder-invariant Implicit Neural Representation<br>:star:code
- HypLiLoc: Towards Effective LiDAR Pose Regression with Hyperbolic Fusion<br>:star:code
- Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints<br>:house:project
- SMPConv: Self-moving Point Representations for Continuous Convolution<br>:star:code
- VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution<br>:star:code
- Wide-Angle Rectification via Content-Aware Conformal Mapping<br>:house:project
- Large-capacity and Flexible Video Steganography via Invertible Neural Network<br>:star:code
- SketchXAI: A First Look at Explainability for Human Sketches<br>:star:code
- Hard Patches Mining for Masked Image Modeling<br>:thumbsup:CVPR 2023 | HPM: mining hard samples in masked learning for solid performance gains
- Learning Geometry-aware Representations by Sketching
- DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training<br>:star:code
- Investigating the Nature of 3D Generalization in Deep Neural Networks<br>:star:code
- Generalizing Dataset Distillation via Deep Generative Prior<br>:star:code<br>:house:project
- Learning Locally Editable Virtual Humans<br>:house:project
- Class-Balancing Diffusion Models
- SFD2: Semantic-guided Feature Detection and Description<br>:star:code
- Computational Flash Photography Through Intrinsics
- Deep Graph Reprogramming
- LayoutDM: Transformer-based Diffusion Model for Layout Generation
- MetaViewer: Towards a Unified Multi-View Representation
- Learning Compact Representations for LiDAR Completion and Generation<br>:house:project
- Multimodal
- Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning
- PMR: Prototypical Modal Rebalance for Multimodal Learning
- Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling
- Towards Flexible Multi-Modal Document Models
- Multi-Modal Representation Learning With Text-Driven Soft Masks
- Align and Attend: Multimodal Summarization With Dual Contrastive Losses<br>:house:project
- Improving Zero-Shot Generalization and Robustness of Multi-Modal Models<br>:star:code
- BEV-Guided Multi-Modality Fusion for Driving Perception<br>:star:code
- BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
- Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information<br>:star:code
- Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce (multimodal pre-training)
- MMANet: Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning<br>:star:code
- Affordance Learning
- Feature Matching
- PATS: Patch Area Transportation with Subdivision for Local Feature Matching<br>:house:project
- Adaptive Spot-Guided Transformer for Consistent Local Feature Matching<br>:star:code<br>:star:code
- Adaptive Assignment for Geometry Aware Local Feature Matching<br>:star:code (feature matching)
- DKM: Dense Kernelized Feature Matching for Geometry Estimation<br>:star:code
- UV Prediction
- Vector Quantization
Click here for the categorized collection of 2020 papers
↘️CVPR-2020-Papers ↘️ECCV-2020-Papers
<a name="00"/>Click here for the categorized collection of 2021 papers
↘️ICCV-2021-Papers ↘️CVPR-2021-Papers
<a name="000"/>Click here for the categorized collection of 2022 papers
↘️CVPR-2022-Papers ↘️WACV-2022-Papers ↘️ECCV-2022-Papers