LUAD analysis using DeepProfiler
This repository contains the source code to run the cmVIP analysis on the LUAD dataset. Our internal repository is here.
Profiling
1. Install requirements
This folder is a DeepProfiler project. Experiments reported in the paper used the c91b9d8 commit.
To install the dependencies, including the DeepProfiler version we used, run:
$ pip install -r requirements.txt
2. Download the data
Be aware that this script will overwrite any previously downloaded data. To download the data, run:
$ utils/download_all.sh
3. Prepare the data
- Run the extract_locations.py script to generate the location files.
- Use DeepProfiler to prepare the dataset:
$ python3 -m deepprofiler --root=./ --config luad.json --gpu 0 prepare
The --gpu option sets the ID of the GPU to use.
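DeepProfiler crops single cells around the coordinates listed in location files. The sketch below only illustrates the general idea of writing one CSV of cell centers per image. It assumes DeepProfiler's usual `<plate>/<well>-<site>-<object>.csv` layout with `Nuclei_Location_Center_X`/`Nuclei_Location_Center_Y` columns (in practice these names come from the project config); the input record layout and the `write_location_files` function are hypothetical and do not reproduce extract_locations.py.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumed column names; DeepProfiler reads the real ones from the config file.
X_COL = "Nuclei_Location_Center_X"
Y_COL = "Nuclei_Location_Center_Y"


def write_location_files(cells, out_dir, object_name="Nuclei"):
    """Write one CSV of cell centers per (plate, well, site) image.

    `cells` is an iterable of dicts with the hypothetical keys
    "plate", "well", "site", "x" and "y". Returns the written paths.
    """
    # Group cell centers by the image they belong to.
    per_image = defaultdict(list)
    for cell in cells:
        key = (cell["plate"], cell["well"], cell["site"])
        per_image[key].append((cell["x"], cell["y"]))

    written = []
    for (plate, well, site), centers in sorted(per_image.items()):
        # Assumed layout: <out_dir>/<plate>/<well>-<site>-<object>.csv
        plate_dir = Path(out_dir) / str(plate)
        plate_dir.mkdir(parents=True, exist_ok=True)
        path = plate_dir / f"{well}-{site}-{object_name}.csv"
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow([X_COL, Y_COL])
            writer.writerows(centers)
        written.append(path)
    return written


# Toy example: three cells spread over two images of one plate.
cells = [
    {"plate": "P1", "well": "A01", "site": 1, "x": 10.5, "y": 20.0},
    {"plate": "P1", "well": "A01", "site": 1, "x": 30.0, "y": 40.0},
    {"plate": "P1", "well": "B02", "site": 2, "x": 5.0, "y": 6.0},
]
with tempfile.TemporaryDirectory() as tmp:
    files = write_location_files(cells, tmp)
    with open(files[0], newline="") as handle:
        first_rows = list(csv.reader(handle))
print([f.name for f in files])  # ['A01-1-Nuclei.csv', 'B02-2-Nuclei.csv']
```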
4. Extract features
Use DeepProfiler to extract features:
$ python3 -m deepprofiler --gpu 0 --exp efn_pretrained --root ./ --config luad.json profile
5. Create well profiles
To create the well-based profiles, run:
$ python3 utils/create_profiles.py
The script writes the profiles as a pd.DataFrame stored in Parquet format.
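Well-level profiles summarize the single-cell features measured in each well. A minimal sketch of that aggregation, assuming the per-well median of each feature (a common choice in image-based profiling; utils/create_profiles.py may aggregate differently, for example by mean) and a hypothetical in-memory input of (well, feature vector) pairs instead of DeepProfiler's output files:

```python
from collections import defaultdict
from statistics import median


def aggregate_wells(cells):
    """Collapse single-cell feature vectors into one profile per well.

    `cells` is an iterable of (well_id, features) pairs, where every
    `features` sequence has the same length. Returns a dict mapping
    each well to the per-feature medians of its cells.
    """
    per_well = defaultdict(list)
    for well, features in cells:
        per_well[well].append(features)

    profiles = {}
    for well, vectors in per_well.items():
        # zip(*vectors) transposes cells x features into features x cells.
        profiles[well] = [median(column) for column in zip(*vectors)]
    return profiles


cells = [
    ("A01", [1.0, 10.0]),
    ("A01", [3.0, 30.0]),
    ("A01", [2.0, 50.0]),
    ("B02", [4.0, 0.0]),
]
print(aggregate_wells(cells))  # {'A01': [2.0, 30.0], 'B02': [4.0, 0.0]}
```

To obtain a table like the one the script writes, such a dict could be loaded with `pd.DataFrame.from_dict(profiles, orient="index")` and saved with `DataFrame.to_parquet`, which requires pyarrow or fastparquet.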
VIP analysis
The analysis is split into three notebooks:
- 1-Expression-VIP.ipynb: Run the baseline analysis using L1000 profiling.
- 2-Cell-Morphology-VIP.ipynb: Run the Cell Morphology VIP method.
- 3-Aggregation-plots.ipynb: Create the plots summarizing results.
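Both VIP notebooks compare variant profiles against their wild-type (WT) reference. As a toy illustration of that core comparison only, the sketch below asks whether WT-vs-mutant replicate correlations fall below WT-vs-WT replicate correlations. It is not the notebooks' implementation: the `impact_gap` score, the helper names and the data are hypothetical, and the notebooks apply their own statistics.

```python
from itertools import combinations, product
from math import sqrt
from statistics import fmean


def pearson(a, b):
    """Pearson correlation between two equal-length, non-constant vectors."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def impact_gap(wt_replicates, mut_replicates):
    """Mean WT-WT correlation minus mean WT-mutant correlation.

    A large positive gap means the variant departs from WT more than
    WT replicates depart from each other (hypothetical score).
    """
    wt_wt = [pearson(a, b) for a, b in combinations(wt_replicates, 2)]
    wt_mut = [pearson(a, b) for a, b in product(wt_replicates, mut_replicates)]
    return fmean(wt_wt) - fmean(wt_mut)


# Toy profiles: two WT replicates, a WT-like variant and a reversed one.
wt = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2]]
neutral = [[0.9, 2.0, 3.1, 3.9]]
impactful = [[4.0, 3.0, 2.0, 1.0]]
print(impact_gap(wt, neutral), impact_gap(wt, impactful))
```

A real analysis would replace the single gap with a statistical test over replicate correlations and correct for multiple variants.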
Notes about the dataset
From the paper:
An additional 88 constructs are included in the dataset, representing TP53 alleles that inadvertently had double mutations. A comprehensive description of the process for selecting the constructs that were analyzed is presented in Supplementary Figure 2.
We have filtered out these constructs in the "Filter quality control status" section of the 2-Cell-Morphology-VIP.ipynb notebook.