Osmosis: RGBD Diffusion Prior for Underwater Image Restoration [ECCV 2024]

Paper | Project Page

Paper accepted at ECCV 2024!

Opher Bar Nathan | Deborah Levy | Tali Treibitz | Dan Rosenbaum

This repository contains the official PyTorch implementation of Osmosis: RGBD Diffusion Prior for Underwater Image Restoration, ECCV 2024.

This code is based on guided-diffusion and DPS.

Abstract

Underwater image restoration is a challenging task because of water effects that increase dramatically with distance. This is worsened by the lack of ground truth data of clean scenes without water. Diffusion priors have emerged as strong image restoration priors. However, they are often trained with a dataset of the desired restored output, which is not available in our case. We also observe that using only color data is insufficient, and therefore augment the prior with a depth channel. We train an unconditional diffusion model prior on the joint space of color and depth, using standard RGBD datasets of natural outdoor scenes in air. Using this prior together with a novel guidance method based on the underwater image formation model, we generate posterior samples of clean images, removing the water effects. Even though our prior did not see any underwater images during training, our method outperforms state-of-the-art baselines for image restoration on very challenging scenes.

RGBD Prior

In this work, an unconditional denoising diffusion probabilistic model (DDPM) is trained on RGBD data (a color image plus a depth map). The training follows improved-diffusion and guided-diffusion. To adapt the model to RGBD data (instead of RGB), the UNet input layer is adjusted to accept 4 channels and the output layer to produce 8 channels.
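
To illustrate this change, here is a minimal sketch (not the repository's actual code; function and variable names are ours) of widening a pretrained convolution while reusing its weights:

import torch
import torch.nn as nn

# Widen a pretrained conv layer: e.g. the UNet's first conv from 3 to 4
# input channels, or the final conv to 8 output channels (mean and
# variance per RGBD channel when learning sigma). Pretrained weights are
# copied into the overlapping slice; new channels keep their random init.
def widen_conv(conv: nn.Conv2d, in_ch: int, out_ch: int) -> nn.Conv2d:
    new = nn.Conv2d(in_ch, out_ch, conv.kernel_size,
                    stride=conv.stride, padding=conv.padding)
    with torch.no_grad():
        i = min(conv.in_channels, in_ch)
        o = min(conv.out_channels, out_ch)
        new.weight[:o, :i] = conv.weight[:o, :i]
        new.bias[:o] = conv.bias[:o]
    return new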

The new prior is trained using 4 outdoor RGBD datasets: DIODE (only outdoor scenes), HRWSI, KITTI and ReDWeb-S.

The trained RGBD prior, named "osmosis_outdoor.pt", can be downloaded from the provided link.

Datasets

The method is designed specifically for underwater scenes.

Accordingly, real underwater images are supplied, and simulated data is also provided for quantitative analysis.

Furthermore, the algorithm generalizes to additional tasks such as dehazing, so a set of hazy images is included as well.

<br />

Underwater images - real data - link

This directory contains the real-world underwater images showcased in both the paper and the appendix.

This folder contains two datasets with identical images: the Low-Resolution Set is a cropped and resized version of the High-Resolution Set.

Our method accepts input images of any resolution; it standardizes them by resizing the smaller side to 256 pixels and then center cropping.
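
A minimal sketch of that preprocessing using torchvision (the file name is illustrative; the repository may implement this differently):

from PIL import Image
from torchvision import transforms

# resize the smaller side to 256 (keeping aspect ratio), then center crop
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

img = preprocess(Image.open("my_underwater_image.png").convert("RGB"))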

The underwater images are sourced from three datasets (SQUID, SeaThru, and SeaThru-Nerf), along with additional scenes captured by Dr. Derya Akkaynak, Matan Yuval, and Deborah Levy.

The images are linear (they did not undergo any non-linear processing) and were white balanced.

<br />

Underwater images - Simulated data with Ground Truth - link

As part of this study, underwater scenes were simulated to facilitate quantitative comparisons.

Images are sourced from the indoor dataset NYUv2, each accompanied by its corresponding depth map. This dataset comprises a total of 1449 images.

Each simulation includes 3 folders:

  1. input - the simulated images
  2. gt_rgb - Ground Truth color images
  3. gt_depth - Ground Truth depth maps
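
As a sketch, the three folders could be iterated jointly like this (assuming matching file names across folders, which is our assumption, not a documented guarantee; the root path follows the config example further below):

from pathlib import Path

root = Path("./data/simulation_1")
for inp in sorted((root / "input").iterdir()):
    rgb = root / "gt_rgb" / inp.name         # ground truth color image
    depth = root / "gt_depth" / inp.name     # ground truth depth map
    print(inp.name, rgb.exists(), depth.exists())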
<br />

Hazy images - link

We present preliminary results of applying the method to the dehazing task; therefore, several images captured in hazy conditions are provided.

<br />

Using your own data

If you would like to try this method on your own data, place the images in a directory under ./data/ and point the configuration file's data section to it.
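
For example, assuming your images sit in a hypothetical folder ./data/my_images, the data section of the config (see "Structure of the configuration file" below) would look like:

data:
  name: osmosis                 # dataset name
  root: ./data/my_images        # hypothetical folder with your images
  ground_truth: False           # your own data has no ground truth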

<br />

Prerequisites

See the environment file: link

<br />

Getting started

1) Clone the repository

git clone https://github.com/osmosis-diffusion/osmosis-diffusion-code

cd osmosis-diffusion-code
<br />

2) Download pretrained checkpoint and data

Checkpoint

Create a new directory named ./models if it does not already exist.

From the link, download the checkpoint "osmosis_outdoor.pt" into the ./models/ directory.

<br />

Datasets

Create a new directory named ./data if it does not already exist.

Download the relevant dataset into the ./data/ directory.

<br />

3) Set up the environment

For this step there are two options:

  1. Set up a local environment
  2. Build a Docker image

Option 1 - Local environment setup

Install dependencies

conda create -n osmosis python=3.8

conda activate osmosis

See the dependencies in the environment.yml file - link
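
Alternatively, conda can usually install the dependencies directly from that file into the environment created above (a standard conda command, shown for illustration):

conda env update -n osmosis -f environment.yml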

Option 2 - Build a Docker image

Before executing the following commands, ensure that the Docker engine, a GPU driver, and an appropriate CUDA version are installed.

If using the Docker image, ensure that the data paths, model path, and results path are in the working directory.

Navigate to the osmosis-diffusion-code directory (where the project was cloned), and run the following commands in the command line:

Build a Docker image

docker build -t osmosis_docker .

Run the Docker image on Windows:

docker run -v %cd%:/home/osmosis-diffusion-code --gpus all -it --rm osmosis_docker

Run the Docker image on Linux:

docker run -v $(pwd):/home/osmosis-diffusion-code --gpus all -it --rm osmosis_docker
<br />

4) Inference

The configuration file structure is outlined in the "Structure of the configuration file" section below, enabling users to modify configurations and tune parameters for experimentation.

By default, results are saved in the directory ./results/<operator name>/<dataset name>/<date>/<run#>.

Additionally, both a log file and configuration file are stored in the same path.

To execute inference from the command line, navigate to the running directory and specify the Python file to run along with two arguments:

python <script_name>.py -c <path to config file> -d <device id> 
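
For instance, running the underwater restoration config (one of the examples below) on GPU device 0:

python osmosis_sampling.py -c ./configs/osmosis_sample_config.yaml -d 0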

There are several examples below.

<br />

There are 4 possible configurations:

a) Underwater Image Restoration and Depth Estimation - real data

Relevant for real underwater images.

python osmosis_sampling.py -c ./configs/osmosis_sample_config.yaml

On the left is an underwater image, serving as the input to our method. In the middle is the restored RGB image, and on the right is the depth estimation, where blue represents close distances and yellow farther distances.

<img src='figures/MTN_1099_g0_grid.png' width='600'> <br />

b) Underwater Image Restoration and Depth Estimation - simulated data

Applicable to simulated underwater images, where both the Ground Truth RGB image and depth map are provided.

python osmosis_sampling.py -c ./configs/osmosis_simulation_sample_config.yaml

The first row is the same as above: on the left is the simulated underwater image serving as the input to our method, in the middle the restored RGB image, and on the right the depth estimation, where blue represents close distances and yellow farther distances. The second row shows the ground truth RGB image and depth map.

<img src='figures/226_Image_.bmp_g0_grid.png' width='600'> <br />

c) Hazy Image Restoration and Depth Estimation

Relevant for images captured in hazy conditions.

python osmosis_sampling.py -c ./configs/osmosis_haze_sample_config.yaml

On the left is a hazy image, serving as the input to our method. In the middle is the restored RGB image, and on the right is the depth estimation, where blue represents close distances and yellow farther distances.

<img src='figures/lviv_input_g0_grid.png' width='600'> <br />

d) Sample from RGBD Prior - Without guidance

In this scenario, no guidance is applied during sampling, and the model produces an RGB image together with its corresponding depth map.

With no guidance, there are no constraints on the sample beyond the prior itself, so the outputs are free draws from the prior rather than reconstructions of any input image.

python RGBD_prior_sampling.py -c ./configs/RGBD_sample_config.yaml

Each pair of images (RGB image and depth map) is generated from the prior without any guidance on the sampling process. Here, black indicates close distances, and white represents farther distances.

<img src='figures/Picture1.png' width='600'> <br />

Structure of the configuration file

This section explains the structure of the configuration file and its relevant fields.

save_dir: ./results    # saving directory path

degamma_input: False # set True for images that are NOT linear and NOT simulated; otherwise False
manual_seed: 0       # manual seed for the diffusion sampling process
rgb_guidance: False  # relevant only for RGB-guided sampling, which produces a depth map for the input image

save_singles: True   # save individual result images: 1) reference image (input), 2) restored RGB image, 3) depth estimation image
save_grids: True     # save a grid of the results, side by side

record_process: True # record the sampling process
record_every: 200    # in case "record_process: True" - record every <value> steps (in this case - 200)

# change unet input and output - for RGBD
change_input_output_channels: True
input_channels: 4   # RGBD
output_channels: 8  # RGBD * 2 when learning sigma is True; 4 if False

sample_pattern:     # the diffusion sampling pattern
  pattern: pcgs     # options: original, pcgs (from GibbsDDRM)

  # relevant only for "pattern: pcgs"
  # update phi's
  update_start: 0.7    # start optimizing the phi's at <value>*T
  update_end: 0
  n_iter: 20           # for each t step, the number of optimization steps for the phi's
  
unet_model:                               # unet model configurations
  model_path: ./models/osmosis_outdoor.pt # pretrained model path
  pretrain_model: osmosis                 # pretrained model name

conditioning:
  method: osmosis                       # conditioning method - osmosis, ps 

params:
    loss_weight: depth                 # options: none, depth; if "none", the fields below have no effect
    weight_function: gamma,1.4,1.4,1   # function; original - [0,1], gamma = ((x+value[0])*value[1])^value[2]
    scale: 7,7,7,0.9                   # guidance scale for each channel (RGBD)
    gradient_clip: True,0.005          # enable gradient clipping and set the clipping value

# specify the auxiliary losses and their weights/scales; if not specified, no auxiliary loss is used
# see the paper for details on the losses
aux_loss:
  aux_loss:
    avrg_loss: 0.5        # scale of average loss
    val_loss: 20          # scale of value loss

data:
  name: osmosis                      # dataset name
  root: ./data/underwater/high_res          # path of the dataset
  ground_truth: False                       # True if the dataset includes ground truth, else False
  gt_rgb: ./data/simulation_1/gt_rgb        # dataset ground truth paths - comment out when no GT data
  gt_depth: ./data/simulation_1/gt_depth    # dataset ground truth paths - comment out when no GT data


measurement:
  operator:

    name: underwater_physical_revised # underwater_physical_revised, haze_physical, noise (for check prior)
    optimizer: sgd                    # water parameters optimizer - options are adam, sgd

    depth_type: gamma                 # original- [0,1], gamma=((x+value[0])*value[1])^value[2]
    value: 1.4,1.4,1

    phi_a: 1.1,0.95,0.95              # initialized values
    phi_a_eta: 1e-5                   # step size for the optimization
    phi_a_learn_flag: True            # optimization flag - if False, this parameter is not optimized

    phi_b: 0.95, 0.8, 0.8             # initialized values
    phi_b_eta: 1e-5                   # step size for the optimization
    phi_b_learn_flag: True            # optimization flag - if False, this parameter is not optimized

    phi_inf: 0.14, 0.29, 0.49         # initialized values
    phi_inf_eta: 1e-5                 # step size for the optimization
    phi_inf_learn_flag: True          # optimization flag - if False, this parameter is not optimized

  noise:                              # added noise
    name: clean                       # clean - osmosis, gaussian - ps
    # sigma: 0                      # comment out in case of "clean"; uncomment in case of "gaussian"
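
For reference, the phi parameters above appear to play the roles of the revised underwater image formation model (our reading of the operator, not a transcription of the official code): phi_a acts as the per-channel attenuation coefficient, phi_b as the backscatter coefficient, and phi_inf as the veiling light. A sketch:

import torch

# I_c = J_c * exp(-phi_a_c * z) + phi_inf_c * (1 - exp(-phi_b_c * z))
# J: clean image (N,3,H,W), z: depth map (N,1,H,W), phi_*: per-channel (3,)
def underwater_forward(J, z, phi_a, phi_b, phi_inf):
    direct = J * torch.exp(-phi_a.view(1, 3, 1, 1) * z)
    backscatter = phi_inf.view(1, 3, 1, 1) * (1 - torch.exp(-phi_b.view(1, 3, 1, 1) * z))
    return direct + backscatter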

Results Directory

The results are saved in the directory specified by the save_dir: <path> parameter in the configuration file.

Subdirectories are created within the specified <path> based on measurement: operator: name: <operator_name>, data: name: <data_name>, current date and run number, to prevent overwriting existing files.

Individual images are stored in the single_images directory, while grid results (the input image, restored RGB, and predicted depth displayed side by side) and process records are saved under grid_results.

The path for single images will be: <path>/<operator_name>/<data_name>/<today_date>/<run_number>/single_images/.

For example: <path>/underwater_physical_revised/simulation/21-6-24/run2/single_images/.

Process Records

If you would like to see what the sampling process looks like, set the fields below to True and specify a value in the record_every: <value> field.

save_grids: True     
record_process: True 
record_every: 150    

An example (the process runs from left to right, sampled every 150 time steps):

The first row shows the predicted image on that step, and the second row shows the depth map in that step, where blue represents close distances and yellow farther distances.

<img src='figures/sampling_process.png' width='1000'> <br />

Citation

If you find this project useful, please consider citing:

@article{nathan2024osmosis,
  title={Osmosis: RGBD Diffusion Prior for Underwater Image Restoration}, 
  author={Bar Nathan, Opher and Levy, Deborah and Treibitz, Tali and Rosenbaum, Dan},
  journal={arXiv preprint arXiv:2403.14837},
  year={2024}
}