Nonlinear Causal Discovery with Confounders

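This repository contains an implementation of the paper "Nonlinear causal discovery with confounders" by Chunlin Li, Xiaotong Shen, and Wei Pan (Journal of the American Statistical Association, 2023).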

The method is named Deconfounded Functional Structure Estimation (DeFuSE).

<img src="defuse.png" alt="DeFuSE" width="400"/>

Contents

The simulations of DeFuSE are in Jupyter Notebooks:

The implementation of DeFuSE is in the directory ./defuse/.

The code for the full simulations (including other methods) is in the directory ./simulation/.

Preliminaries

Environments

For Python, use conda to create an environment named defuse.

git clone https://github.com/chunlinli/defuse.git
cd defuse
conda env create -f environment.yml
conda activate defuse

Installing DeFuSE

To install DeFuSE, run the following command from the repository root.

pip install .

Installing other packages

To install NOTEARS, run the following command from the repository root.

pip install simulation/Python/notears 
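
To confirm that both installs succeeded, a quick import check can help. The sketch below assumes the packages are importable as `defuse` and `notears`; these module names are inferred from the directory names and are not stated in this README, so adjust them if the check reports a false negative.

```python
import importlib.util

# Assumed import names (hypothetical: inferred from the directory names).
ASSUMED_MODULES = ["defuse", "notears"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All assumed modules are importable.")
```

Run it inside the activated defuse environment so that the check uses the same interpreter as the installs.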

For R, the version is 4.1.1 and the following packages are used.

pkg <- c(
    "CAM","lrpsadmm","pcalg","bnlearn","mvtnorm", # required
    "dplyr","tidyr","progress","ggplot2","tidyverse","glue","scales","kableExtra" # suggested
)
install.packages(pkg)

NOTE: some packages have dependencies unavailable from CRAN. The user may need to install them manually.

System information

The code was tested on a server with the following specifications:

System Version:             Ubuntu 18.04.6 LTS 4.15.0-176-generic x86_64
Model name:                 Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
Total Number of Cores:      64
Memory:                     528 GB

No GPU is required.

Usage

For DeFuSE simulations, run the following notebooks.

For complete simulations, first run the following script to generate data.

python simulation/data.py

Then run the following scripts.

python simulation/defuse_simulation.py
python simulation/notears_simulation.py # requires NOTEARS
Rscript simulation/simulation.R         # requires other R packages

NOTE: the complete simulations take more than 100 hours to run.
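
Because the run is long, it may be convenient to drive the steps above from one script that stops at the first failure. The following is a minimal sketch, not part of the repository: the `run_pipeline` helper and its `dry_run` flag are hypothetical, the commands are copied from this README, and it assumes the scripts are run from the repository root in the order listed.

```python
import subprocess

# Commands copied from this README; data generation comes first.
PIPELINE = [
    ["python", "simulation/data.py"],
    ["python", "simulation/defuse_simulation.py"],
    ["python", "simulation/notears_simulation.py"],  # requires NOTEARS
    ["Rscript", "simulation/simulation.R"],          # requires the R packages
]


def run_pipeline(commands=PIPELINE, dry_run=True):
    """Process each command in order and return the command lines handled.

    With dry_run=True the commands are only printed, not executed.
    With dry_run=False a failing command raises CalledProcessError,
    so later steps do not run on missing or partial results.
    """
    done = []
    for cmd in commands:
        line = " ".join(cmd)
        print(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
        done.append(line)
    return done


if __name__ == "__main__":
    run_pipeline(dry_run=True)  # set dry_run=False to actually run the steps
```

The dry run is the default so that the order of the commands can be inspected before committing to the full run.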

Citing information

If you find the code useful, please consider citing

@article{li2023nonlinear,
    author = {Chunlin Li and Xiaotong Shen and Wei Pan},
    title = {Nonlinear causal discovery with confounders},
    year = {2023},
    journal = {Journal of the American Statistical Association}
}

The code is maintained on GitHub. This project is in development.

Implementing structure learning algorithms is error-prone. If you spot an error, please file an issue on GitHub or contact me by email; I would be grateful to be informed.

References

[1] Frot, B., Nandy, P., & Maathuis, M. H. (2019). Robust causal structure learning with some hidden variables, JRSSB. Open-source software: LRpS+GES is implemented by lrpsadmm and pcalg.

[2] Zheng, X., Dan, C., Aragam, B., Ravikumar, P., & Xing, E. P. (2020). Learning sparse nonparametric DAGs, AISTATS 2020. Open-source software: NOTEARS.

[3] Bühlmann, P., Peters, J., & Ernest, J. (2014). CAM: Causal additive models, high-dimensional order search and penalized regression, AOS. Open-source software: CAM.

[4] Colombo, D., Maathuis, M. H., Kalisch, M., & Richardson, T. S. (2012). Learning high-dimensional directed acyclic graphs with latent and selection variables, AOS. Open-source software: RFCI is implemented by pcalg.

[5] Kalisch, M., Mächler, M., Colombo, D., Maathuis, M. H., & Bühlmann, P. (2012). Causal Inference Using Graphical Models with the R Package pcalg, JSS. Open-source software: pcalg.

In addition, part of the simulation code is adapted from Frot's code and Zheng's code.

I would like to thank the authors of the above open-source software.