Awesome

ECCV 2024 Papers and Open-Source Projects (Papers with Code)

ECCV 2024 decisions are now available!

Note 1: Contributions are welcome. Please open an issue to share ECCV 2024 papers and open-source projects!

Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

CVPR 2024

ECCV 2022

ECCV 2020

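To follow ECCV 2024 and the latest top-conference work, scan the QR code to join the CVer Academic Exchange Group, the largest computer vision Knowledge Planet (知识星球) community. It is updated daily and shares the latest learning materials on computer vision, deep learning, autonomous driving, medical imaging, AIGC, and related topics.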

ECCV 2024 Open-Source Papers: Table of Contents

3DGS(Gaussian Splatting)
Mamba / SSM
Avatars
Backbone
CLIP
MAE
Embodied AI
GAN
GNN
Multimodal Large Language Models (MLLM)
Large Language Models (LLM)
NAS
OCR
NeRF
DETR
Prompt
Diffusion Models
ReID (Re-Identification)
Long-Tail
Vision Transformer
Vision-Language
Self-supervised Learning
Data Augmentation
Object Detection
Anomaly Detection
Visual Tracking
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Medical Image
Medical Image Segmentation
Video Object Segmentation
Video Instance Segmentation
Referring Image Segmentation
Image Matting
Image Editing
Low-level Vision
Super-Resolution
Denoising
Deblur
Autonomous Driving
3D Point Cloud
3D Object Detection
3D Semantic Segmentation
3D Object Tracking
3D Semantic Scene Completion
3D Registration
3D Human Pose Estimation
3D Human Mesh Estimation
Image Generation
Video Generation
3D Generation
Video Understanding
Action Recognition
Action Detection
Text Detection
Knowledge Distillation
Model Pruning
Image Compression
3D Reconstruction
Depth Estimation
Trajectory Prediction
Lane Detection
Image Captioning
Visual Question Answering
Sign Language Recognition
Video Prediction
Novel View Synthesis
Zero-Shot Learning
Stereo Matching
Feature Matching
Scene Graph Generation
Counting
Implicit Neural Representations
Image Quality Assessment
Video Quality Assessment
Datasets
New Tasks
Others

3DGS(Gaussian Splatting)

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Project: https://donydchen.github.io/mvsplat
Paper: https://arxiv.org/abs/2403.14627
Code: https://github.com/donydchen/mvsplat

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

Paper: https://arxiv.org/abs/2404.01133
Code: https://github.com/DekuLiuTesla/CityGaussian

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Project: https://zehaozhu.github.io/FSGS/
Paper: https://arxiv.org/abs/2312.00451
Code: https://github.com/VITA-Group/FSGS

Mamba / SSM

VideoMamba: State Space Model for Efficient Video Understanding

Paper: https://arxiv.org/abs/2403.06977
Code: https://github.com/OpenGVLab/VideoMamba

ZIGMA: A DiT-style Zigzag Mamba Diffusion Model

Paper: https://arxiv.org/abs/2403.13802
Code: https://taohu.me/zigma/

Avatars

Backbone

CLIP

MAE

Embodied AI

GAN

OCR

Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors

Paper: https://arxiv.org/pdf/2312.05286
Code: https://github.com/SJTU-DeepVisionLab/FreeReal

PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer

Paper: https://arxiv.org/abs/2407.07764
Code: https://github.com/SJTU-DeepVisionLab/PosFormer

Occupancy

Fully Sparse 3D Occupancy Prediction

Paper: https://arxiv.org/abs/2312.17118
Code: https://github.com/MCG-NJU/SparseOcc

NeRF

NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

Project: https://nerf-mae.github.io/
Paper: https://arxiv.org/pdf/2404.01300
Code: https://github.com/zubair-irshad/NeRF-MAE

DETR

Prompt

Multimodal Large Language Models (MLLM)

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Paper: https://arxiv.org/abs/2403.11299
Code: https://github.com/heliossun/SQ-LLaVA

ControlCap: Controllable Region-level Captioning

Paper: https://arxiv.org/abs/2401.17910
Code: https://github.com/callsys/ControlCap

Large Language Models (LLM)

NAS

ReID (Re-Identification)

Diffusion Models

ZIGMA: A DiT-style Zigzag Mamba Diffusion Model

Paper: https://arxiv.org/abs/2403.13802
Code: https://taohu.me/zigma/

Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

Paper: https://arxiv.org/abs/2403.16394
Code: https://github.com/zdxdsw/skewed_relations_T2I

The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization

Project: https://ut-mao.github.io/noise.github.io/
Paper: https://arxiv.org/abs/2312.08872
Code: https://github.com/UT-Mao/Initial-Noise-Construction

Vision Transformer

GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper: https://arxiv.org/abs/2403.09394
Code: https://github.com/Haiyang-W/GiT

Vision-Language

GalLoP: Learning Global and Local Prompts for Vision-Language Models

Paper: https://arxiv.org/abs/2407.01400

Object Detection

Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

Paper: https://arxiv.org/abs/2407.11699v1
Code: https://github.com/xiuqhou/Relation-DETR
Dataset: https://huggingface.co/datasets/xiuqhou/SA-Det-100k

Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

Project: http://yuqianfu.com/CDFSOD-benchmark/
Paper: https://arxiv.org/pdf/2402.03094
Code: https://github.com/lovelyqian/CDFSOD-benchmark

Anomaly Detection

Object Tracking

Semantic Segmentation

Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

Paper: https://arxiv.org/abs/2405.06228
Code: https://github.com/nizhenliang/CGRSeg

Medical Image

Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging

Paper: https://arxiv.org/abs/2311.16914
Code: https://github.com/peirong26/Brain-ID

FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

Medical Image Segmentation

ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image

Project: https://scribbleprompt.csail.mit.edu/
Paper: https://arxiv.org/abs/2312.07381
Code: https://github.com/halleewong/ScribblePrompt

AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking

Paper: https://arxiv.org/abs/2407.06468
Code: https://github.com/ricklisz/AnatoMask

Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures

Video Object Segmentation

DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries

Project: https://zhang-tao-whu.github.io/projects/DVIS_DAQ/
Paper: https://arxiv.org/abs/2404.00086
Code: https://github.com/zhang-tao-whu/DVIS_Plus

Autonomous Driving

Fully Sparse 3D Occupancy Prediction

Paper: https://arxiv.org/abs/2312.17118
Code: https://github.com/MCG-NJU/SparseOcc

milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing

Paper: https://arxiv.org/abs/2306.17010
Code: https://github.com/Toytiny/milliFlow/

4D Contrastive Superflows are Dense 3D Representation Learners

Paper: https://arxiv.org/abs/2407.06190
Code: https://github.com/Xiangxu-0103/SuperFlow

3D Point Cloud

3D Object Detection

3D Small Object Detection with Dynamic Spatial Pruning

Project: https://xuxw98.github.io/DSPDet3D/
Paper: https://arxiv.org/abs/2305.03716
Code: https://github.com/xuxw98/DSPDet3D

Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection

Paper: https://arxiv.org/abs/2402.03634
Code: https://github.com/LiewFeng/RayDN

3D Semantic Segmentation

Image Editing

Image Inpainting

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

Project: https://tencentarc.github.io/BrushNet/
Paper: https://arxiv.org/abs/2403.06976
Code: https://github.com/TencentARC/BrushNet

Video Editing

Low-level Vision

Restoring Images in Adverse Weather Conditions via Histogram Transformer

Paper: https://arxiv.org/abs/2407.10172
Code: https://github.com/sunshangquan/Histoformer

OneRestore: A Universal Restoration Framework for Composite Degradation

Project: https://gy65896.github.io/projects/ECCV2024_OneRestore
Paper: https://arxiv.org/abs/2407.04621
Code: https://github.com/gy65896/OneRestore

Super-Resolution

Denoising

Image Denoising

3D Human Pose Estimation

Image Generation

Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models

Paper: https://arxiv.org/abs/2404.07389
Code: https://github.com/YasminZhang/EBAMA

Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization

Project: https://kaminyou.com/Dense-Normalization/
Paper: https://arxiv.org/abs/2407.04245
Code: https://github.com/Kaminyou/Dense-Normalization

ZIGMA: A DiT-style Zigzag Mamba Diffusion Model

Paper: https://arxiv.org/abs/2403.13802
Code: https://taohu.me/zigma/

Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

Paper: https://arxiv.org/abs/2403.16394
Code: https://github.com/zdxdsw/skewed_relations_T2I

Video Generation

VideoStudio: Generating Consistent-Content and Multi-Scene Videos

Project: https://vidstudio.github.io/
Code: https://github.com/FuchenUSTC/VideoStudio

3D Generation

Video Understanding

VideoMamba: State Space Model for Efficient Video Understanding

Paper: https://arxiv.org/abs/2403.06977
Code: https://github.com/OpenGVLab/VideoMamba

C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

Paper: https://arxiv.org/abs/2407.06113
Code: https://github.com/RongchangLi/ZSCAR_C2C

Action Recognition

SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders

Paper: https://arxiv.org/abs/2407.13460
Code: https://github.com/pha123661/SA-DVAE

Knowledge Distillation

Image Compression

Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation

Paper: http://arxiv.org/abs/2407.09853
Code: https://github.com/qingshi9974/ECCV2024-AdpatICMH

Stereo Matching

Scene Graph Generation

Counting

Zero-shot Object Counting with Good Exemplars

Paper: https://arxiv.org/abs/2407.04948
Code: https://github.com/HopooLinZ/VA-Count

Video Quality Assessment

Datasets

Others

Multi-branch Collaborative Learning Network for 3D Visual Grounding

Paper: https://arxiv.org/abs/2407.05363v2
Code: https://github.com/qzp2018/MCLN

PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers

Paper: https://arxiv.org/abs/2407.04538
Code: https://github.com/ananthu-aniraj/pdiscoformer

SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments

Project: https://fraunhoferhhi.github.io/spvloc/
Paper: https://arxiv.org/abs/2404.10527
Code: https://github.com/fraunhoferhhi/spvloc

REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices

Project: https://xdimlab.github.io/REFRAME/
Paper: https://arxiv.org/abs/2403.16481
Code: https://github.com/MARVELOUSJI/REFRAME