scTCRseq

Introduction

For specific questions/problems please email David Redmond at: dar2042@med.cornell.edu

This project is a Python implementation of a pipeline for recovering TCR data from single-cell RNA-seq reads.

Github Project

Configuration and Dependencies

The pipeline requires the following programs to be installed:

### SEQTK: https://github.com/lh3/seqtk

### Blastall: http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/blast/executables/release/2.2.15/

### GapFiller: http://www.baseclear.com/genomics/bioinformatics/basetools/gapfiller

### Vidjil: https://github.com/vidjil/vidjil

Their installation paths must then be set in the script cmd_line_sctcrseq.py:

seqTkDir="/path/to/seqtk/"

blastallDir="/path/to/blastall/"

gapFillerDir="/path/to/GapFiller_v1-10_linux-x86_64/"

vidjildir="/path/to/vidjil/"

lengthScript="/path/to/calc.median.read.length.pl"
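Before a first run, it can help to confirm that every configured path actually exists. The following is a minimal sketch, not part of the pipeline itself: the dictionary `TOOL_PATHS` and the function `find_missing_paths` are hypothetical names, and the dictionary simply mirrors the variables listed above.

```python
import os

# Hypothetical copy of the settings from cmd_line_sctcrseq.py; replace the
# placeholder paths with your own installation locations.
TOOL_PATHS = {
    "seqTkDir": "/path/to/seqtk/",
    "blastallDir": "/path/to/blastall/",
    "gapFillerDir": "/path/to/GapFiller_v1-10_linux-x86_64/",
    "vidjildir": "/path/to/vidjil/",
    "lengthScript": "/path/to/calc.median.read.length.pl",
}


def find_missing_paths(paths):
    """Return the names of settings whose path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


if __name__ == "__main__":
    missing = find_missing_paths(TOOL_PATHS)
    for name in missing:
        print("{} points to a missing path: {}".format(name, TOOL_PATHS[name]))
    if not missing:
        print("All configured paths exist.")
```

Running this once after editing the paths catches typos early, before the pipeline fails partway through a sample.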

### Reference TCR sequences:

Users can also choose their own TCR alpha and beta V and C reference databases (we recommend downloading them from imgt.org) and enter the file locations in the same script.

Location of the FASTA BLAST reference sequences, downloadable from imgt.org (must be changed manually):

humanTRAVblast="/path/to/TRAV.human.fa"

humanTRBVblast="/path/to/TRBV.human.fa"

humanTRACblast="/path/to/TRAC.human.fa"

humanTRBCblast="/path/to/TRBC.human.fa"

mouseTRAVblast="/path/to/TRAV.mouse.fa"

mouseTRBVblast="/path/to/TRBV.mouse.fa"

mouseTRACblast="/path/to/TRAC.mouse.fa"

mouseTRBCblast="/path/to/TRBC.mouse.fa"

Location of the Vidjil BLAST reference sequences in the Vidjil program directory (must be changed manually):

humanVidjilRef="/path/to/tr_germline/human"

mouseVidjilRef="/path/to/tr_germline/mouse"
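A quick check of the downloaded reference files can confirm that each one is present and contains sequences. The sketch below is an illustration only: `count_fasta_records` and `summarize_references` are hypothetical helper names, and the check assumes plain-text FASTA files in which each record starts with a `>` header line.

```python
import os


def count_fasta_records(fasta_path):
    """Count header lines (those starting with '>') in a FASTA file."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_references(references):
    """Map each setting name to its record count, or None if the file is missing."""
    summary = {}
    for name, path in references.items():
        if os.path.isfile(path):
            summary[name] = count_fasta_records(path)
        else:
            summary[name] = None
    return summary


if __name__ == "__main__":
    # Placeholder paths copied from the settings above; edit before use.
    human_refs = {
        "humanTRAVblast": "/path/to/TRAV.human.fa",
        "humanTRBVblast": "/path/to/TRBV.human.fa",
        "humanTRACblast": "/path/to/TRAC.human.fa",
        "humanTRBCblast": "/path/to/TRBC.human.fa",
    }
    for name, count in sorted(summarize_references(human_refs).items()):
        print("{}: {}".format(name, "missing" if count is None else count))
```

A count of zero or a missing file usually means the download or the path needs another look.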

Example Command Line

We recommend running the pipeline on paired-end Fluidigm single-cell RNA-seq data.

The usage is as follows:

#### python cmd_line_sctcrseq.py --fastq1 FASTQ1 --fastq2 FASTQ2 --species human/mouse --outdir OUTPUT_DIRECTORY --label OUTPUT_LABEL

Running the script without arguments:

#### python cmd_line_sctcrseq.py

prints the available command-line options.
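Single-cell experiments often produce one FASTQ pair per cell, so it can be convenient to wrap the documented command in a small loop. The following sketch assumes one pipeline invocation per cell. `build_command` and `run_cells` are hypothetical helper names, and only the flags shown in the usage line above are used.

```python
import subprocess


def build_command(fastq1, fastq2, species, outdir, label,
                  script="cmd_line_sctcrseq.py"):
    """Assemble the documented command line as an argument list."""
    if species not in ("human", "mouse"):
        raise ValueError("species must be 'human' or 'mouse'")
    return [
        "python", script,
        "--fastq1", fastq1,
        "--fastq2", fastq2,
        "--species", species,
        "--outdir", outdir,
        "--label", label,
    ]


def run_cells(cells, species, outdir):
    """Run the pipeline once per cell.

    `cells` maps an output label to a (fastq1, fastq2) tuple.
    check_call raises CalledProcessError if a run exits with an error.
    """
    for label, (fastq1, fastq2) in sorted(cells.items()):
        subprocess.check_call(
            build_command(fastq1, fastq2, species, outdir, label)
        )
```

For example, `run_cells({"cell01": ("cell01_R1.fastq", "cell01_R2.fastq")}, "human", "results")` would launch one run labelled `cell01`; the file names here are made up for illustration.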

To test the program, we recommend using either your own single-cell RNA sequencing data or a public dataset such as:

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1104129