FIDDLE
An integrative deep learning framework for functional genomic data inference.
A project from the Churchman Lab, Harvard Medical School Department of Genetics.
Based on: [http://biorxiv.org/content/early/2016/10/17/081380.full.pdf]
Ongoing:
- Generalized data preparation pipeline
- GUI interface
On this page:
- Installation and Quick Start
- Input File Details
- HMS Orchestra HPC Instructions
Installation and Quick Start
The quick start can be run on a local machine, although an HPC environment is preferable.
1) Set up FIDDLE environment:
NOTE: Requires Python 2.7 and pip. Anaconda can interfere with this setup: comment out any "export PATH" lines that point to Anaconda in your ~/.bash_profile or ~/.bashrc, then re-source the file (or restart the current terminal session).
a) Install Python package manager pip:
$ sudo easy_install pip
b) Install virtualenv, which creates isolated Python environments:
$ sudo pip install virtualenv
c) Clone this repository to an appropriate location (for instance ~/Desktop):
$ git clone https://github.com/ueser/FIDDLE.git
d) Create the FIDDLE virtual environment (without sudo, so that the environment stays writable by your user), then source it:
$ virtualenv venvFIDDLE
$ source venvFIDDLE/bin/activate
e) Install necessary Python packages to FIDDLE virtual environment:
$ pip install -r requirements.txt
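The requirements in the NOTE for step 1 can be checked from Python once the environment is sourced. The following is a minimal sketch and not part of FIDDLE: the helper names (find_anaconda_entries, in_virtualenv, is_python27) are hypothetical, and matching PATH entries on the substring "conda" is an assumption about where Anaconda is installed.

```python
import os
import sys


def find_anaconda_entries(path_value):
    """Return the PATH entries that look like Anaconda/Miniconda directories.

    Assumption: such directories contain the substring "conda".
    """
    entries = [e for e in path_value.split(os.pathsep) if e]
    return [e for e in entries if "conda" in e.lower()]


def in_virtualenv():
    """Best-effort check for an active virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer releases and the
    # standard-library venv make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def is_python27(version_info):
    """True when the given version tuple is Python 2.7.x."""
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    print("Python 2.7: {0}".format(is_python27(sys.version_info)))
    print("Virtual environment active: {0}".format(in_virtualenv()))
    for entry in find_anaconda_entries(os.environ.get("PATH", "")):
        print("Anaconda-like PATH entry: {0}".format(entry))
```

Any printed "Anaconda-like PATH entry" line points at an export that the NOTE suggests commenting out.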
2) Download training/validation/test datasets:
a) Create data directory:
$ cd FIDDLE/
$ mkdir -p data/hdf5datasets/
b) Download quickstart datasets:
Place the following datasets in /FIDDLE/data/hdf5datasets/
WARNING: the datasets total several GB.
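Because the downloads are large, it can help to confirm that the files landed in the right directory before training. This is a rough sketch using only the standard library; list_hdf5_files is a hypothetical helper, and the ".h5"/".hdf5" extensions are an assumption about how the quickstart datasets are named.

```python
import os


def list_hdf5_files(data_dir):
    """Return sorted (filename, size_in_bytes) pairs for HDF5 files in data_dir.

    Returns an empty list when the directory does not exist.
    """
    if not os.path.isdir(data_dir):
        return []
    found = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        # Assumed extensions; adjust if the datasets are named differently.
        if os.path.isfile(path) and name.lower().endswith((".h5", ".hdf5")):
            found.append((name, os.path.getsize(path)))
    return found


if __name__ == "__main__":
    # Run from the FIDDLE/ directory entered in step 2a.
    for name, size in list_hdf5_files(os.path.join("data", "hdf5datasets")):
        print("{0}\t{1:.2f} GB".format(name, size / 1e9))
```

An empty listing means either the directory is missing or nothing with those extensions has been placed in it yet.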
3) Run FIDDLE
$ cd fiddle
Documentation Interlude
There are two (of many) methods to examine FIDDLE's internal documentation and docstrings:
a) Instantiating a Python session and using the help() function:
$ python
>>> import main # or any other FIDDLE Python script
>>> help(main)
b) Employing the --help (or -h) flag (only shows information about flags):
$ python main.py --help
$ python main.py
4) Create visualization of training:
$ python visualization.py
5) Create representations and predictions datasets:
$ python analysis.py
6) Examine training trajectory:
Change directories to FIDDLE/results/<runName>/, where <runName> is the value passed to --runName (default: experiment). The training trajectory visualization files (.png and .gif) are found in this directory. The representations and predictions created in step 5 are stored in the HDF5 files "representations.h5" and "predictions.h5".
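The layout described in step 6 can be checked with a short script. The sketch below is illustrative only: results_dir and summarize_results are hypothetical helpers, and the only facts taken from this README are the results/ directory, the default run name "experiment", the .png/.gif visualizations, and the two .h5 file names.

```python
import os

# File names stated in step 6 of this README.
EXPECTED_HDF5 = ("representations.h5", "predictions.h5")


def results_dir(run_name="experiment", root="FIDDLE"):
    """Build the path to a run's results directory."""
    return os.path.join(root, "results", run_name)


def summarize_results(directory):
    """Report which of the outputs named in step 6 exist in directory."""
    names = sorted(os.listdir(directory)) if os.path.isdir(directory) else []
    return {
        "images": [n for n in names if n.lower().endswith((".png", ".gif"))],
        "hdf5_present": [n for n in EXPECTED_HDF5 if n in names],
        "hdf5_missing": [n for n in EXPECTED_HDF5 if n not in names],
    }


if __name__ == "__main__":
    # Run from the directory that contains FIDDLE/.
    summary = summarize_results(results_dir())
    for key in ("images", "hdf5_present", "hdf5_missing"):
        print("{0}: {1}".format(key, summary[key]))
```

A non-empty "hdf5_missing" entry suggests that step 5 (analysis.py) has not yet been run for that run name.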
7) Plot results:
Change directories to FIDDLE/fiddle and start a Jupyter Notebook session. Open 'predictions_visualization.ipynb' and follow the instructions outlined in the Markdown cells.
To install Jupyter Notebook, start here: http://jupyter.readthedocs.io/en/latest/install.html
$ jupyter notebook
Input File Details:
For more complete instructions on file types and FIDDLE's workflow, open the 'guide.ipynb' Jupyter notebook.
$ cd FIDDLE/fiddle
$ jupyter notebook
HMS Orchestra HPC Instructions:
1) Start interactive session, enter FIDDLE directory:
$ bsub -Is -q interactive bash
$ cd FIDDLE/
2) Load the correct TensorFlow module:
$ module load dev/tensorflow/1.0-GPU
3) Set up virtual environment:
Orchestra's TensorFlow module does not work well with virtual environments: the module above must be loaded before creating and then sourcing a virtual environment. More here: https://wiki.med.harvard.edu/Orchestra/PersonalPythonPackages
a) Instantiate, then source the virtual environment:
$ virtualenv venvFIDDLE --system-site-packages
$ source venvFIDDLE/bin/activate
b) Comment out the 'tensorflow==1.0.1' line in the requirements.txt file:
$ vim requirements.txt
tensorflow==1.0.1 --> #tensorflow==1.0.1
c) pip install remaining requirements:
$ pip install -r requirements.txt
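The manual edit in step 3b can also be scripted, which is convenient when the environment is rebuilt often. The following is a minimal sketch: comment_out_requirement is a hypothetical helper, and the sample lines other than 'tensorflow==1.0.1' are made up for illustration rather than taken from FIDDLE's requirements.txt.

```python
import re


def comment_out_requirement(lines, package):
    """Return a copy of lines with the requirement for package commented out.

    Only lines that start with the exact package name are changed, so a
    different package such as "tensorflow-gpu" is left alone. Lines that are
    already commented out are not touched, so the function is idempotent.
    """
    pattern = re.compile(
        r"^\s*" + re.escape(package) + r"\s*([=<>!~]|$)", re.IGNORECASE
    )
    return [("#" + line) if pattern.match(line) else line for line in lines]


if __name__ == "__main__":
    # Made-up sample content, not the real requirements.txt.
    sample = ["examplepkg==0.1\n", "tensorflow==1.0.1\n"]
    for line in comment_out_requirement(sample, "tensorflow"):
        print(line.rstrip("\n"))
```

To apply it to the real file, read requirements.txt with readlines(), pass the list through the function, and write the result back; keeping a backup copy first is prudent.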
4) Put those dwindling GPUs on blast:
A submission template is located in FIDDLE/fiddle/; modify it as needed. More on GPU usage here: https://wiki.med.harvard.edu/Orchestra/OrchestraNvidiaGPUs
$ vim orchestra_job_submit.sh
$ bash orchestra_job_submit.sh