

SBG-CGC course 2018

Join the chat at https://gitter.im/IARCbioinfo/SBG-CGC_course2018

IARC course on analyzing TCGA data in the SevenBridges Genomics Cancer Genomics Cloud (SBG-CGC). Slides are in the slide folder.

Description of the course

Learning objectives
After completing this workshop, participants will be able to run their own computational tools in the cloud on TCGA data, using:

Main topics

Agenda and slides

Wednesday 28 February
09:00-10:00 Introduction to cloud computing and the SevenBridges architecture
10:00-10:30 Introduction to TCGA data
10:30-11:00 Break
11:00-11:30 Introduction to the SevenBridges web interface to run analyses
11:30-12:30 Practical application: run your first basic analysis in the cloud

Thursday 1 March
09:00-09:30 Introduction to Docker and Docker Hub
09:30-11:00 Practical application: building your own Docker container and running it in the cloud
11:00-11:30 Break
11:30-12:30 Introduction to the R API and the Common Workflow Language (CWL)

Friday 2 March
09:00-12:30 Practical application: running your own project in the cloud using the R API, CWL and Docker
12:30-14:00 Lunch Break
14:00-17:00 Practical application: running your own project in the cloud using the R API, CWL and Docker

Gitter Chat

A Gitter channel is open for the course. Participants can use it to discuss their projects and to ask any questions about the course.

IARC project presentation

The scientific projects carried out on the last day of the course were presented at the IARC omics discussion (6 April 2018). The slides are hosted here.

Laptop setup

The course laptops run Ubuntu 16.04.

Docker is already installed. If you are curious, the Docker website explains how to install it.

If you need a good text editor, Atom is also installed.

Participants need to install R and RStudio. One option is to follow the steps in this gist.
Caution: the R package sevenbridges-r is also needed:

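source("https://bioconductor.org/biocLite.R")  # biocLite only works up to Bioconductor 3.7
biocLite("sevenbridges")
# From Bioconductor 3.8 on, BiocManager replaces biocLite: install.packages("BiocManager"); BiocManager::install("sevenbridges")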

Useful links

Projects

Important: your CGC token gives full access to your CGC account, including the protected TCGA data if you have access to it. It is equivalent to your username and password: never share it with anyone, and keep it only in a secure location (not on a USB key, an insecure computer, or a laptop that leaves IARC).
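One way to follow this advice is to keep the token out of your scripts entirely and read it from an environment variable, for example one defined in your ~/.Renviron file, which R reads at startup (that file must of course also stay on a secure machine). The sketch below is only one possible approach: the variable name CGC_TOKEN and the helper get_cgc_token() are hypothetical choices, not something defined by the CGC or by sevenbridges-r.

```r
# Hypothetical helper: read the CGC token from an environment variable
# (CGC_TOKEN is an assumed name) so that the token itself never appears
# in a script that might be shared or committed to a repository.
get_cgc_token <- function(var = "CGC_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set; add a line such as ",
         var, "=<your token> to ~/.Renviron and restart R.", call. = FALSE)
  }
  token
}

# Usage (requires the sevenbridges package and network access):
# library(sevenbridges)
# a <- Auth(platform = "cgc", token = get_cgc_token())
```

Failing early with an explicit error avoids silently sending an empty token to the API.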

Main steps to think about:

For each project, we opened an issue for discussion and added a folder to host the code.

Project 1: needlestack variant calling. Issue. Code.

Project 2: neutral tumor evolution. Issue. Code.

Project 3: cell populations from RNA-seq. Issue. Code.

Tips and tricks

Add public reference files to a project

In the web interface, select the file and copy it to your project.

You can also do this easily with the R client for the API:

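a <- Auth(platform = "cgc", token = "<your CGC token>")  # requires library(sevenbridges)
p <- a$project(id = "<username>/<project>")              # placeholder project ID
a$copyFile(id = a$public_file(name = "Homo_sapiens_assembly38.fasta", exact = TRUE)$id, project = p$id)
a$copyFile(id = a$public_file(name = "Homo_sapiens_assembly38.fasta.fai", exact = TRUE)$id, project = p$id)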

You can use the interface to get the precise name of the file you need.
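When several reference files are needed (for example a FASTA file and its index), the two copyFile() calls above can be wrapped in a small loop. This is a sketch that assumes, as in the example above, that a is an authenticated sevenbridges Auth object and that the destination project ID is known; the helper name copy_public_files() is hypothetical, and it uses only the public_file() and copyFile() methods already shown.

```r
# Hypothetical helper: copy a set of public CGC files, looked up by exact
# name, into one project. `a` is assumed to be an authenticated
# sevenbridges Auth object, as in the example above.
copy_public_files <- function(a, project_id, file_names) {
  copied <- lapply(file_names, function(nm) {
    f <- a$public_file(name = nm, exact = TRUE)  # look up the public file by its exact name
    a$copyFile(id = f$id, project = project_id)  # copy it into the target project
  })
  names(copied) <- file_names  # label each result with the file it came from
  invisible(copied)
}

# Usage:
# copy_public_files(a, p$id, c("Homo_sapiens_assembly38.fasta",
#                              "Homo_sapiens_assembly38.fasta.fai"))
```

Keeping the list of file names in one vector makes it easy to add other references (for example a dictionary or known-sites file) without repeating the lookup code.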

Use the R API client to query data and add it to a project

This R script gives an example of how to use the sevenbridges-r R package to query data on the CGC platform and copy the resulting files to your project.

Create your Docker container

A good starting point is to run a container from a base image on your machine (docker run) and then install the software you need interactively inside it. Keep note of the commands you use, then write a Dockerfile with them. Once done, try building an image from your Dockerfile with docker build. See the Docker tutorial for more details.