Awesome

MF-LBM: A Portable, Scalable and High-performance Lattice Boltzmann Code for DNS of Flow in Porous Media

<img src="images/core-dns.png" alt="drawing" width="900"/>  <details open="open"> <summary>Table of Contents</summary> <ol> <li><a href="#about-the-code">About The Code</a></li> <ul> <li><a href="#features">Features</a></li> </ul> <ul> <li><a href="#components">Components</a></li> </ul> <ul> <li><a href="#technical-details-of-the-main-simulation-code">Technical details of the main simulation code</a></li> </ul> <ul> <li><a href="#citing-mf-lbm">Citing MF-LBM</a></li> </ul> <li><a href="#performance-benchmarking">Performance Benchmarking</a></li> <li><a href="#build-instructions">Build Instructions</a></li> <ul> <li><a href="#prerequisites">Prerequisites</a></li> </ul> <ul> <li><a href="#installation">Installation</a></li> </ul> <li><a href="#usage">Usage</a></li> <ul> <li><a href="#pre-processing-code">Pre-processing code</a></li>  </ul> <ul> <li><a href="#the-main-simulation-code">The main simulation code</a></li> </ul> <ul> <li><a href="#output-files">Output files</a></li> </ul> <li><a href="#important-notes">Important Notes</a></li> <li><a href="#acknowledgements">Acknowledgements</a></li> <li><a href="#license">License</a></li> <li><a href="#contact">Contact</a></li> </ol> </details>

About The Code

MF-LBM 1 2 is a high-performance lattice Boltzmann (LB) code for direct numerical simulation (DNS) of flow in porous media, primarily developed by Dr. Yu Chen (LANL), under the supervision of Prof. Albert Valocchi (UIUC), Dr. Qinjun Kang (LANL) and Dr. Hari Viswananthan (LANL). 'MF' refers to microfluidics or 'Magic Find'. The code was first developed at University of Illinois at Urbana-Champaign based on a mainstream LB color-gradient multiphase model and further improved at Los Alamos National Laboratory by implementing the Continuum-Surface-Force and geometrical wetting models to reduce spurious currents so that the inertial effects in scCO2 and brine displacement in porous media can be accounted for 2. In addition, a single-phase flow solver is also provided for absolute permeability measurement or DNS of turbulent flow. Only double precision is supported.

Features

exploring multiple levels of parallelism
extensively optimized for vectorization
directive-based parallel programing model supporting CPU, GPU, MIC and ARM
advanced LB multiphase model (CSF model + geometrical wetting model) ensuring relatively small spurious currents
overlapped communication and computation
sample geometry files, pre-processing and post-processing code included

Components

Sample geometry files (in submodule):
- simple_geometry_tube_sphere: a tube with a spherical obstacle in the center
- rock_sample_text_images: text images from cropped Bentheimer sandstone scans 3
- rock_sample_wall_array_converted: binary file of the wall array converted from text images
- rock_sample_wall_array_processed: binary file of the wall array converted from text images with added buffer layers
Pre-processing code:
- convert_textimages_to_WallArray: converting text images from rock scans to a single 3D wall array stored in binary format
- create_geometry_to_WallArray: creating simple geometries
- geometry_modification: modifying the 3D wall array, i.e., adding inlet/outlet buffer layers or cropping
- wall_boundary_preprocess: computing normal directions of the solid surfaces
Main simulation code:
- multiphase_3D: 3D multiphase flow simulation code
- singlephase_3D: 3D single-phase flow simulation code
Post-processing code:
- distributed_phi_to_vtk_conversion: for extremely large simulation only, converting distributed phase field data to vtk file for visualization
- distributed_full_flow_field_process: for extremely large simulation only, converting distributed full flow field data for further analysis

Technical details of the main simulation code

Modern manycore processors/coprocessors, such as GPUs, are developing rapidly and greatly boost computing power. These processors not only provide much higher FLOPS (floating-point operations per second) but also much higher memory bandwidth compared with traditional CPU. One of the most attractive features of the lattice Boltzmann method (LBM) is that it is explicit in time, has nearest neighbor communication, and the computational effort is in the collision step, which is localized at a grid node. For these reasons, the LBM is well suited for manycore processors which require a higher degree of explicit parallelism. The data movement in the LBM is much more intensive than for traditional CFD considering that the D3Q19 lattice model has 19 lattice velocities. Given the current state of computational hardware, in particular the relative speed and capacity of processors and memory, the LBM is a memory-bandwidth-bound numerical method. The high memory bandwidth provided by GPUs greatly benefits the LBM.

The code is written on Fortran 90 and employs MPI-OpenACC/OpenMP hybrid programing model. The main reason that we chose OpenACC/OpenMP (directive-based parallel programming models) over CUDA is that we want to keep the code portable across different computing platforms so that we are not limited by the NVIDIA GPU solution. As GPU and Intel Xeon Phi processor (and even latest CPU from Intel with AVX512 instructions) rely heavily on SIMT/SIMD, the optimization strategy for these manycore processors/coprocessors are similar, which enables us to achieve reasonable performance across different platforms:

The AA pattern streaming method is employed to significantly reduce memory access and memory consumption.
The structure of arrays (SoA) data layout is used to achieve coalesced memory access and maximize vectorization.
Communication and computation is overlapped to achieve good parallel efficiency, particularly for heterogenous computing platforms.

Citing MF-LBM

Chen, Y., Valocchi, A., Kang, Q., & Viswanathan, H. S. (2019). Inertial effects during the process of supercritical CO2 displacing brine in a sandstone: Lattice Boltzmann simulations based on the continuum‐surface‐force and geometrical wetting models. Water Resources Research, 55, 11144– 11165. https://doi.org/10.1029/2019WR025746

Build Instructions

Prerequisites

A Fortran compiler
A MPI implementation (most of the run scripts in this repo are based on OpenMPI)
NVIDIA Fortran compiler (free from NVIDIA HPC SDK, for NVIDIA GPU platform)
Intel Fortran compiler (free from Intel oneAPI HPC Toolkit, for Intel Xeon Phi platform)
Make

Installation

Clone the repo

git clone https://github.com/lanl/MF-LBM.git

Initiate submodules for external non-code files (geometry files used in the test suites)
```
cd path-to-MF-LBM
git submodule init
git submodule update
```

Make

cd path-to-MF-LBM/multiphase_3D

CPU version

# Make necessary changes to the makefile:
# Choose CPU as the architecture option.
# Choose a proper compiler.
# Enabling OpenMP is recommended.
# See instructions in the makefile for more information.  
your-preferred-editor makefile
make
# MF_LBM.cpu will be generated

GPU version

# Make necessary changes to the makefile:
# Choose GPU as the architecture option.
# Choose the NVIDIA compiler (recommended for NVIDIA GPU).
# OpenMP must be disabled.
# See instruction in the makefile for more information.  
your-preferred-editor makefile
make
# MF_LBM.gpu will be generated

MIC (Intel Xeon Phi) version

# Make necessary changes to the makefile:
# Choose MIC as the architecture option.
# Choose a proper compiler (Intel compiler is recommended).
# OpenMP and AVX512 must be enabled for the MIC version.
# See instructions in the makefile for more information.  
your-preferred-editor makefile   
make
# MF_LBM.mic will be generated

Configure the run script (for CPU platform)

cd working_directory
cp path-to-MF-LBM/multiphase_3D/run_template/template-config.sh ./config.sh
# Make necessary changes to config.sh (i.e., paths and run command). 
# See instructions in template-config.sh for more information.
your-preferred-editor config.sh
./config.sh

# A script, irun.sh, will be generated.  

# If OpenMP is enabled (recommended for CPU, MIC, and ARM platform), then run the following command:
# export OMP_NUM_THREADS=n, where n is recommended to be the core or thread count of the UMA domain of the CPU. Hyperthreading may help in some cases.

# At least one MPI rank per UMA domain is recommended.

# For GPU platform, the number of MPI processes should equal to the total number of GPUs: one MPI rank per GPU.

Modify the simulation control file

# Make necessary changes to simulation_control.txt. See instructions in simulation_control.txt for more information.
your-preferred-editor simulation_control.txt

Run the program

# See instructions in template-config.sh and irun.sh for more information.
./irun.sh new

Usage

Pre-processing code

convert text images to wall array
```
cd path-to-MF-LBM/preprocessing/1.convert_textimages_to_WallArray
./compile.sh
./a.out
```
This example converts rock_sample_text_images to a single wall array stored in binary format. No cropping and modification of the rock geometry are made. Loading a binary wall array is much faster than loading large number of text images. This shall be the first step to read in text images before further modification.
create geometry
```
cd path-to-MF-LBM/preprocessing/1.create_geometry_to_WallArray
./compile.sh
./a.out
```
This example creates a tube with a spherical obstacle in the center and stored the corresponding wall array in binary format. One can modify the code to create different geometries.
geometry modification
```
cd path-to-MF-LBM/preprocessing/2.geometry_modification
./compile.sh
./a.out
```
This example reads in a wall array file rock_sample_wall_array_converted which is converted from rock_sample_text_images, and add buffer layers at the inlet and outlet so that proper boundary conditions can be applied. This code can also be used to crop large samples.
wall boundary pre-processing
```
cd path-to-MF-LBM/preprocessing/3.wall_boundary_preprocess
./compile.sh
./a.out
```
This example reads in the processed wall array file rock_sample_wall_array_processed, computes the normal directions of the solid surface on all solid boundary nodes and stores the boundary information in binary format which can be later loaded into the main flow simulation program. Read compile.sh for the compiler information (extremely important) for this example. The same boundary info calculation can be completed in the main flow simulation code (Geometry_preprocessing.F90). However, when the rock sample is relatively large, it is recommended to perform the boundary info calculation before the main simulation and load the boundary info from the external file into the main simulation.

The main simulation code

Check out template-simulation_control.txt for more information regarding simulation control. The units used in the simulation control file are all lattice units. One can control capillary number, contact angle, absolute values of surface tension and viscosities to link the simulation with a physical system. In particular, the absolute values of surface tension and viscosities will affect Reynolds number even when the capillary number is fixed. The Ohnesorge number is recommended to control the parameters when inertial effects are not negligible 2.

Contact angle measurement

cd working_directory
cp path-to-MF-LBM/test_suites/3D_simulation/1.drop_attached_wall/config.sh ./
# edit config.sh (path and run command based on your system; path does not need to be changed if using the default folder)
your-preferred-editor config.sh
./config.sh  
./irun.sh new

This example is used to measure contact angle, where a nonwetting drop (fluid1 drop) is attached to a flat wall (y=1 plane). The default run script is for a CPU system with two NUMA domains.

Drainage in a square duct

cd working_directory
cp path-to-MF-LBM/test_suites/3D_simulation/2.drainage/config.sh ./
# edit config.sh (path and run command based on your system; path does not need to be changed if using the default folder)
your-preferred-editor config.sh
./config.sh  
./irun.sh new

This example simulates nonwetting fluid1 displacing wetting fluid2 in a square duct until one pore-volume of fluid1 is injected. The default run script is for a single GPU card.

Drainage in a tube with a spherical obstacle

# The geometry is created inside the main flow simulation code for convenience (Misc.F90/modify_geometry subroutine). Not the recommended way to run the program. 
cd working_directory
cp path-to-MF-LBM/test_suites/3D_simulation/3.drainage_external_geometry/config.sh ./
# edit config.sh (path and run command based on your system; path does not need to be changed if using the default folder)
your-preferred-editor config.sh
./config.sh  
./irun.sh new

This example simulates nonwetting fluid1 displacing wetting fluid2 in a tube with a spherical obstacle in the center. The tube and spherical obstacle are created inside the simulation code. Simulation stops when the nonwetting fluid reaches outlet. The default run script is for a single GPU card.

Imbibition in a real rock sample using external rock geometry file

# The geometry file is created from the pre-processing code example (MF-LBM-extFiles/geometry_files/sample_rock_geometry_wallarray/bentheimer_in10_240_240_240_out10.dat)
cd working_directory
cp path-to-MF-LBM/test_suites/3D_simulation/4.imbibition_external_geometry/config.sh ./
# edit config.sh (path and run command based on your system; path does not need to be changed if using the default folder)
your-preferred-editor config.sh
./config.sh  
./irun.sh new

This example simulates wetting fluid2 displacing nonwetting fluid1 in a real rock sample using an external rock geometry file. Simulation stops when one pore-volume of fluid2 is injected. The default run script is for a single GPU card.

Steady state relative permeability measurement

# The geometry file is created from the pre-processing code example (MF-LBM-extFiles/geometry_files/sample_rock_geometry_wallarray/bentheimer_in10_240_240_240_out10.dat). The boundary info file need to be created using the wall_boundary_preprocess code (preprocessing/3.wall_boundary_preprocess)
cd path-to-MF-LBM/preprocessing/3.wall_boundary_preprocess
./compile.sh
./a.out

cd working_directory
cp path-to-MF-LBM/test_suites/3D_simulation/5.fractional_flow_external_geometry_preprocessed/config.sh ./
# edit config.sh (path and run command based on your system; path does not need to be changed if using the default folder)
# specify the paths of the geometry file and solid-boundary-info file on config.sh
your-preferred-editor config.sh
./config.sh  
./irun.sh new

This example simulates body force driven fractional flow for steady state relative permeability measurement, using an external rock geometry file and the corresponding pre-computed boundary info file. The default run script is for a single GPU card.

Performance benchmarking

# The geometry file is created from pre-processing code example (MF-LBM-extFiles/geometry_files/sample_rock_geometry_wallarray/bentheimer_in10_240_240_240_out10.dat). The boundary info file need to be created use the wall_boundary_preprocess code (preprocessing/3.wall_boundary_preprocess)
cd path-to-MF-LBM/preprocessing/3.wall_boundary_preprocess
./compile.sh
./a.out

cd working_directory
cp path-to-MF-LBM/test_suites/3D_simulation/6.performance_benchmarking/config.sh ./
# edit config.sh (path and run command based on your system; path does not need to be changed if using the default folder)
# specify the paths of the geometry file and solid-boundary-info file on config.sh
your-preferred-editor config.sh
./config.sh  
./irun.sh new

This example is identical to the previous example except that the benchmarking command is enabled in configuration file. The simulation will run 300 time steps and give the computational performance in MFLUPS (million fluid lattices update per second) every 100 time steps. Due to the size of the sample, this example is suitable for benchmarking performance on a single CPU computing node or GPU card. The default run script is for a single GPU card.

Absolute permeability measurement

# The geometry file is created from the pre-processing code example (MF-LBM-extFiles/geometry_files/sample_rock_geometry_wallarray/bentheimer_in10_240_240_240_out10.dat).)
cd working_directory
cp path-to-MF-LBM/test_suites/3D_simulation/7.singlephase_abs_perm/config.sh ./
# edit config.sh (path and run command based on your system; path does not need to be changed if using the default folder)
# specify the paths of the geometry file on config.sh
your-preferred-editor config.sh
./config.sh  
./irun.sh new

This example simulates body force driven single-phase flow for absolute permeability measurement, using an external rock geometry file. The value of the body force should be adjusted so that the flow is in the Stokes flow regime. The default run script is for a single GPU card.

Performance benchmarking - singlephase

# The geometry file is created from pre-processing code example (MF-LBM-extFiles/geometry_files/sample_rock_geometry_wallarray/bentheimer_in10_240_240_240_out10.dat)
cd working_directory
cp path-to-MF-LBM/test_suites/3D_simulation/6.performance_benchmarking/config.sh ./
# edit config.sh (path and run command based on your system; path does not need to be changed if using the default folder)
# specify the paths of the geometry file on config.sh
your-preferred-editor config.sh
./config.sh  
./irun.sh new

Output files

Three output directories will be created:

out1.output: bulk properties (i.e., saturation, flow rate) against time. See Monitor.F90 for more information.
out2.checkpoint: checkpoint data used to restart simulation. See IO_multiphase.F90 for more information.
out3.field_data: legacy vtk files for flow analysis. See IO_multiphase.F90 for more information. For extremely large simulation (extreme_large_sim_cmd=1 in template-simulation_control.txt), distributed flow field data will be stored for performance consideration. Post-processing code is provided to process those distributed data.

Important Notes

Future development

There is no plan to further develop this code in the future. Main limitations of this code are:
- Fortran as a dying language
- AA pattern streaming method
- No OOP design
Potential future routes for interested developers/researchers:
- C++ MPI/OpenMP/CUDA hybrid programming
- SSS streaming technique for direct addressing data structure
- Esoteric type streaming technique for semi-direct or indirect addressing data structure
Best practice of running MF-LBM on different platforms:
1. AVX512 is recommended to be enabled for Intel CPUs that support AVX512. AVX512 must be enabled for Intel Xeon Phi processors.
2. Multithreading generally improves performance of the code. However, for small problems, if there are already many CPU cores (i.e., an AMD 64-core CPU), multithreading many not bring any benefits.
3. GPUs require a high degree of parallelism, where a small domain problem may not utilize the full potential of a GPU. Recommended domain size per GPU: from 2003 until GPU memory full.
Contact angle:

The contact angle in the control file must be less or equal to 90 degrees due to the particular numerical scheme used, meaning that fluid1 and fluid2 will always be the nonwetting phase and wetting phase, respectively. Drainage and imbibition can be completed by injecting fluid1 and fluid2 respectively. The contact angle alteration scheme from Akai et al., 2018 has been implemented and the above limitation no longer exists.
Run command:

Several sample run commands are listed in template-config.sh. This code employs MPI-OpenMP hybrid programing model for the non-GPU versions, where memory affinity on NUMA architectures is very important to achieve expected performance. One should use one or more MPI ranks per UMA domain to avoid OpenMP parallelization across NUMA domains, and set appropriate socket/NUMA affinity in OpenMPI (or other MPI implementations). Number of threads in OpenMP should not exceed the core count or thread count (if multithreading is enabled) of corresponding UMA.
Domain decomposition:

Domain decomposition along X direction is no longer supported in the main simulation code for the moment, due to the consideration of non-vectorized data packing and halo area computation. Domain decompositions along Y and Z direction are usually sufficient as single MPI rank corresponds to tens of CPU cores or a full GPU.
GCC10 compiler issue:

Building the code with GCC10 may show error messages like

Type mismatch between actual argument at (1) and actual argument at (2)

This is a known issue with GCC10. Use the new GCC10 option -fallow-argument-mismatch to turn these errors to warnings for the moment.

Acknowledgements

This work was supported by LANL's LDRD program and was supported as part of the Center for Geologic Storage of CO2, an Energy Frontier Research Center funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, under Award DE-SC0C12504.

License

Distributed under the BSD-3 License. See LICENSE for more information.

This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.

Contact

Dr. Yu Chen - yu_chen_007@outlook.com

~~MF-LBM-live which is forked from this repo will be used to develop new features by Dr. Yu Chen.~~

Dr. Qinjun Kang - qkang@lanl.gov