Awesome-ICCV2023-Low-Level-Vision

A collection of ICCV 2023 papers and code related to low-level vision

[Under Construction] If you find missing papers or typos, feel free to open an issue or a pull request.

Related collections for low-level vision

Overview

Image Restoration

SYENet: A Simple Yet Effective Network for Multiple Low-Level Vision Tasks with Real-time Performance on Mobile Device

DiffIR: Efficient Diffusion Model for Image Restoration

PIRNet: Privacy-Preserving Image Restoration Network via Wavelet Lifting

Focal Network for Image Restoration

Learning Image-Adaptive Codebooks for Class-Agnostic Image Restoration

Under-Display Camera Image Restoration with Scattering Effect

FSI: Frequency and Spatial Interactive Learning for Image Restoration in Under-Display Cameras

Multi-weather Image Restoration via Domain Translation

Adverse Weather Removal with Codebook Priors

Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond

Improving Lens Flare Removal with General Purpose Pipeline and Multiple Light Sources Recovery

High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net

Boundary-Aware Divide and Conquer: A Diffusion-Based Solution for Unsupervised Shadow Removal

Leveraging Inpainting for Single-Image Shadow Removal

Fine-grained Visible Watermark Removal

Physics-Driven Turbulence Image Restoration with Stochastic Refinement

Building Bridge Across the Time: Disruption and Restoration of Murals In the Wild

DDS2M: Self-Supervised Denoising Diffusion Spatio-Spectral Model for Hyperspectral Image Restoration

Fingerprinting Deep Image Restoration Models

Self-supervised Monocular Underwater Depth Recovery, Image Restoration, and a Real-sea Video Dataset

Image Reconstruction

Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction

Video Restoration

Snow Removal in Video: A New Dataset and A Novel Method

Video Adverse-Weather-Component Suppression Network via Weather Messenger and Adversarial Backpropagation

Fast Full-frame Video Stabilization with Iterative Optimization

Minimum Latency Deep Online Video Stabilization

Task Agnostic Restoration of Natural Video Dynamics

[Back-to-Overview]

Super Resolution

Image Super Resolution

On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement

SRFormer: Permuted Self-Attention for Single Image Super-Resolution

DLGSANet: Lightweight Dynamic Local and Global Self-Attention Network for Image Super-Resolution

Dual Aggregation Transformer for Image Super-Resolution

MSRA-SR: Image Super-resolution Transformer with Multi-scale Shared Representation Acquisition

Content-Aware Local GAN for Photo-Realistic Super-Resolution

Boosting Single Image Super-Resolution via Partial Channel Shifting

Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution

Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution

Lightweight Image Super-Resolution with Superpixel Token Interaction

Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution

Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution

MetaF2N: Blind Image Super-Resolution by Learning Efficient Model Adaptation from Faces

Learning Correction Filter via Degradation-Adaptive Regression for Blind Single Image Super-Resolution

LMR: A Large-Scale Multi-Reference Dataset for Reference-Based Super-Resolution

Real-CE: A Benchmark for Chinese-English Scene Text Image Super-resolution

Learning Non-Local Spatial-Angular Correlation for Light Field Image Super-Resolution

Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution

HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models

ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution

Rethinking Multi-Contrast MRI Super-Resolution: Rectangle-Window Cross-Attention Transformer and Arbitrary-Scale Upsampling

Decomposition-Based Variational Network for Multi-Contrast MRI Super-Resolution and Reconstruction

CuNeRF: Cube-Based Neural Radiance Field for Zero-Shot Medical Image Arbitrary-Scale Super Resolution

Burst Super Resolution

Towards Real-World Burst Image Super-Resolution: Benchmark and Method

Self-Supervised Burst Super-Resolution

Video Super Resolution

Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution

Multi-Frequency Representation Enhancement with Privilege Information for Video Super-Resolution

Spatial-Temporal Video Super-Resolution

MoTIF: Learning Motion Trajectories with Local Implicit Neural Functions for Continuous Space-Time Video Super-Resolution

[Back-to-Overview]

Image Rescaling

Downscaled Representation Matters: Improving Image Rescaling with Collaborative Downscaled Images

[Back-to-Overview]

Denoising

Image Denoising

Random Sub-Samples Generation for Self-Supervised Real Image Denoising

Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising

Unsupervised Image Denoising in Real-World Scenarios via Self-Collaboration Parallel Generative Adversarial Branches

Self-supervised Image Denoising with Downsampled Invariance Loss and Conditional Blind-Spot Network

Multi-view Self-supervised Disentanglement for General Image Denoising

Iterative Denoiser and Noise Estimator for Self-Supervised Image Denoising

Noise2Info: Noisy Image to Information of Noise for Self-Supervised Image Denoising

The Devil is in the Upsampling: Architectural Decisions Made Simpler for Denoising with Deep Image Prior

Lighting Every Darkness in Two Pairs: A Calibration-Free Pipeline for RAW Denoising

ExposureDiffusion: Learning to Expose for Low-light Image Enhancement

Towards General Low-Light Raw Noise Synthesis and Modeling

Hybrid Spectral Denoising Transformer with Guided Attention

[Back-to-Overview]

Deblurring

Image Deblurring

Multiscale Structure Guided Diffusion for Image Deblurring

Multi-Scale Residual Low-Pass Filter Network for Image Deblurring

Single Image Defocus Deblurring via Implicit Neural Inverse Kernels

Single Image Deblurring with Row-dependent Blur Magnitude

Non-Coaxial Event-Guided Motion Deblurring with Spatial Alignment

Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Video Deblurring

Exploring Temporal Frequency Spectrum in Deep Video Deblurring

[Back-to-Overview]

Deraining

From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal

Learning Rain Location Prior for Nighttime Deraining

Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks

Unsupervised Video Deraining with An Event Camera

Both Diverse and Realism Matter: Physical Attribute and Style Alignment for Rainy Image Generation

[Back-to-Overview]

Dehazing

MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing

[Back-to-Overview]

Demosaicing

Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors

[Back-to-Overview]

HDR Imaging / Multi-Exposure Image Fusion

Alignment-free HDR Deghosting with Semantics Consistent Transformer

MEFLUT: Unsupervised 1D Lookup Tables for Multi-exposure Image Fusion

RawHDR: High Dynamic Range Image Reconstruction from a Single Raw Image

Learning Continuous Exposure Value Representations for Single-Image HDR Reconstruction

LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction

Joint Demosaicing and Deghosting of Time-Varying Exposures for Single-Shot HDR Imaging

GlowGAN: Unsupervised Learning of HDR Images from LDR Images in the Wild

Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Prediction

[Back-to-Overview]

Frame Interpolation

Video Object Segmentation-aware Video Frame Interpolation

Rethinking Video Frame Interpolation from Shutter Mode Induced Degradation

[Back-to-Overview]

Image Enhancement

Iterative Prompt Learning for Unsupervised Backlit Image Enhancement

Low-Light Image Enhancement

ExposureDiffusion: Learning to Expose for Low-light Image Enhancement

Implicit Neural Representation for Cooperative Low-light Image Enhancement

Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network

Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model

Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement

Low-Light Image Enhancement with Multi-Stage Residue Quantization and Brightness-Aware Attention

Dancing in the Dark: A Benchmark towards General Low-light Video Enhancement

NIR-assisted Video Enhancement via Unpaired 24-hour Data

Coherent Event Guided Low-Light Video Enhancement

[Back-to-Overview]

Image Harmonization/Composition

Deep Image Harmonization with Learnable Augmentation

Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation

TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

[Back-to-Overview]

Image Completion/Inpainting

Diverse Inpainting and Editing with GAN Inversion

Rethinking Fast Fourier Convolution in Image Inpainting

Continuously Masked Transformer for Image Inpainting

MI-GAN: A Simple Baseline for Image Inpainting on Mobile Devices

PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting

Video Inpainting

ProPainter: Improving Propagation and Transformer for Video Inpainting

Semantic-Aware Dynamic Parameter for Video Inpainting Transformer

CIRI: Curricular Inactivation for Residue-aware One-shot Video Inpainting

[Back-to-Overview]

Image Stitching

Parallax-Tolerant Unsupervised Deep Image Stitching

[Back-to-Overview]

Image Compression

RFD-ECNet: Extreme Underwater Image Compression with Reference to Feature Dictionary

COMPASS: High-Efficiency Deep Image Compression with Arbitrary-scale Spatial Scalability

Computationally-Efficient Neural Image Compression with Shallow Decoders

Dec-Adapter: Exploring Efficient Decoder-Side Adapter for Bridging Screen Content and Natural Image Compression

Semantically Structured Image Compression via Irregular Group-Based Decoupling

TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception

AdaNIC: Towards Practical Neural Image Compression via Dynamic Transform Routing

COOL-CHIC: Coordinate-based Low Complexity Hierarchical Image Codec

Video Compression

Scene Matters: Model-based Deep Video Compression

[Back-to-Overview]

Image Quality Assessment

Thinking Image Color Aesthetics Assessment: Models, Datasets and Benchmarks

Test Time Adaptation for Blind Image Quality Assessment

Troubleshooting Ethnic Quality Bias with Curriculum Domain Adaptation for Face Image Quality Assessment

SQAD: Automatic Smartphone Camera Quality Assessment and Benchmarking

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

[Back-to-Overview]

Style Transfer

AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks

Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers

All-to-key Attention for Arbitrary Style Transfer

StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models

StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model

Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer

HairNeRF: Geometry-Aware Image Synthesis for Hairstyle Transfer

[Back-to-Overview]

Image Editing

Adaptive Nonlinear Latent Transformation for Conditional Face Editing

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation

HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending

StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces

Diverse Inpainting and Editing with GAN Inversion

Effective Real Image Editing with Accelerated Iterative Diffusion Inversion

Conceptual and Hierarchical Latent Space Decomposition for Face Editing

Editing Implicit Assumptions in Text-to-Image Diffusion Models

Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models

A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance

Out-of-domain GAN inversion via Invertibility Decomposition for Photo-Realistic Human Face Manipulation

Video Editing

RIGID: Recurrent GAN Inversion and Editing of Real Face Videos

Pix2Video: Video Editing using Image Diffusion

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs

[Back-to-Overview]

Image Generation/Synthesis / Image-to-Image Translation

Text-to-Image / Text Guided / Multi-Modal

Adding Conditional Control to Text-to-Image Diffusion Models

MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

Unleashing Text-to-Image Diffusion Models for Visual Perception

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Ablating Concepts in Text-to-Image Diffusion Models

Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation

Story Visualization by Online Text Augmentation with Context Memory

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

Dense Text-to-Image Generation with Attention Modulation

ITI-GEN: Inclusive Text-to-Image Generation

Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis

Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models

Human Preference Score: Better Aligning Text-to-Image Models with Human Preference

Zero-shot spatial layout conditioning for text-to-image diffusion models

A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis

Evaluating Data Attribution for Text-to-Image Models

Expressive Text-to-Image Generation with Rich Text

Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis

Localizing Object-level Shape Variations with Text-to-Image Diffusion Models

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models

HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

Anti-DreamBooth: Protecting Users from Personalized Text-to-image Synthesis

Discriminative Class Tokens for Text-to-Image Diffusion Models

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

Image-to-Image / Image Guided

Reinforced Disentanglement for Face Swapping without Skip Connection

BlendFace: Re-designing Identity Encoders for Face-Swapping

General Image-to-Image Translation with One-Shot Image Guidance

GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images

Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis

Controllable Person Image Synthesis with Pose-Constrained Latent Diffusion

Others for image generation

Conditional 360-degree Image Synthesis for Immersive Indoor Scene Decoration

Masked Diffusion Transformer is a Strong Image Synthesizer

Q-Diffusion: Quantizing Diffusion Models

The Euclidean Space is Evil: Hyperbolic Attribute Editing for Few-shot Image Generation

LFS-GAN: Lifelong Few-Shot Image Generation

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations

Smoothness Similarity Regularization for Few-Shot GAN Adaptation

UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation

Ray Conditioning: Trading Photo-consistency for Photo-realism in Multi-view Image Generation

Personalized Image Generation for Color Vision Deficiency Population

EGC: Image Generation and Classification via a Diffusion Energy-Based Model

Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers

Neural Characteristic Function Learning for Conditional Image Generation

LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

Perceptual Artifacts Localization for Image Synthesis Tasks

SVDiff: Compact Parameter Space for Diffusion Fine-Tuning

Erasing Concepts from Diffusion Models

A Complete Recipe for Diffusion Generative Models

Efficient Diffusion Training via Min-SNR Weighting Strategy

Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption

AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

Video Generation

Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer

MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation

The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion

SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Text2Performer: Text-Driven Human Video Generation

StyleLipSync: Style-based Personalized Lip-sync Video Generation

Mixed Neural Voxels for Fast Multi-view Video Synthesis

WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

DreamPose: Fashion Video Synthesis with Stable Diffusion

Structure and Content-Guided Video Synthesis with Diffusion Models

[Back-to-Overview]

Others [back]

DDColor: Towards Photo-Realistic and Semantic-Aware Image Colorization via Dual Decoders

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

Name Your Colour For the Task: Artificially Discover Colour Naming via Colour Quantisation Transformer

Unfolding Framework with Prior of Convolution-Transformer Mixture and Uncertainty Estimation for Video Snapshot Compressive Imaging

Deep Optics for Video Snapshot Compressive Imaging

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning

Single Image Reflection Separation via Component Synergy

Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion

Talking Head Generation

Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation

Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

Handwriting/Font Generation

Few shot font generation via transferring similarity guided global style and quantization local style
