OVERVIEW

SQUID is designed to detect both fusion-gene and non-fusion-gene transcriptomic structural variations from RNA-seq alignment.

The SQUID paper is published in Genome Biology. To reproduce the results of applying SQUID to simulated data and previously studied cell lines, follow the instructions in squidtest.

INSTALLING PRE-COMPILED BINARIES

You do NOT need to install SQUID before using it; just download the binary release here.

BUILDING FROM SOURCE

You only need to build from source if either the pre-built binaries (see above) don't work on your system or you want to make a change to the SQUID code.

Compiling SQUID requires Boost, GLPK, and BamTools. Step-by-step installation instructions can be found here for Linux, and here for Mac.

On Mac, you additionally need to run the following commands so that the dependent libraries are linked dynamically:

export DYLD_LIBRARY_PATH=<bamtools_folder>/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=<glpk_folder>/lib:$DYLD_LIBRARY_PATH

USAGE

SQUID takes a sorted BAM file of RNA-seq alignments as input and outputs the detected transcriptomic structural variations (TSVs). When the concordant and chimeric alignments are separated into two BAM files, as is the case with STAR alignment, the concordant BAM file must be sorted. The command to run SQUID and its parameters are as follows.

squid [options] -b <Input_sorted_BAM> -o <Output_Prefix>
| Parameters | Default value | Data type | Description |
| --- | --- | --- | --- |
| -c | | string | |
| -f | | string | |
| -pt | 0 | bool | Phred type: 0 for Phred33, 1 for Phred64 |
| -pl | 10 | int | Maximum length of continuous low Phred score to filter alignment |
| -pm | 4 | int | Threshold to count as low Phred score |
| -mq | 1 | int | Minimum mapping quality |
| -dp | 50000 | int | Maximum paired-end aligning distance to be counted as concordant alignment |
| -di | 20 | int | Maximum distance of segment indexes to be counted as read-through |
| -w | 5 | int | Minimum edge weight |
| -r | 8 | double | Discordant edge ratio multiplier (normal/tumor cell ratio) |
| -a | 5 | int | Maximum allowed degree |
| -G | 0 | bool | Whether to output the graph file (0 for no, 1 for yes) |
| -CO | 0 | bool | Whether to output the ordering of connected components (0 for no, 1 for yes) |
| -TO | 0 | bool | Whether to output the ordering of all segments (0 for no, 1 for yes) |
| -RG | 0 | bool | Whether to output the rearranged genome sequence (0 for no, 1 for yes) |
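
The three Phred-related parameters (-pt, -pl, -pm) can be hard to picture from the table alone. The following is a minimal Python sketch of how such a filter could work. It is not SQUID's actual implementation. The function names are hypothetical, and two details are assumptions: that a score counts as "low" when it is strictly below the -pm threshold, and that an alignment is filtered when its longest run of low scores exceeds -pl.

```python
from typing import List


def phred_scores(quality: str, phred_type: int = 0) -> List[int]:
    """Decode an ASCII quality string.

    phred_type follows the -pt convention: 0 for Phred33, 1 for Phred64.
    """
    offset = 33 if phred_type == 0 else 64
    return [ord(ch) - offset for ch in quality]


def longest_low_quality_run(scores: List[int], low_threshold: int = 4) -> int:
    """Length of the longest run of consecutive scores below low_threshold.

    Assumption: "low" means strictly below the threshold (cf. -pm).
    """
    longest = current = 0
    for score in scores:
        current = current + 1 if score < low_threshold else 0
        longest = max(longest, current)
    return longest


def passes_phred_filter(quality: str, phred_type: int = 0,
                        max_low_run: int = 10, low_threshold: int = 4) -> bool:
    """Keep a read unless its longest low-quality run exceeds max_low_run.

    Assumption: the comparison against max_low_run (cf. -pl) is "greater than".
    """
    scores = phred_scores(quality, phred_type)
    return longest_low_quality_run(scores, low_threshold) <= max_low_run
```

With the default values from the table, a read whose quality string starts with eleven `!` characters (Phred33 score 0) would be filtered in this sketch, while one with ten would be kept.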

OUTPUT SPECIFICATION

EXAMPLE WORKFLOW

Suppose you have the alignment BAM file and the chimeric BAM file generated by STAR (https://github.com/alexdobin/STAR). Run SQUID with:

squid -b alignment.bam -c chimeric.bam -o squidout

Or, if you have a combined BAM file of both concordant and discordant alignments generated by BWA (http://bio-bwa.sourceforge.net/) or SpeedSeq (https://github.com/hall-lab/speedseq), run SQUID with:

squid --bwa -b combined_alignment.bam -o squidout
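
When SQUID is called from a pipeline script, it can help to assemble the command line in one place. The sketch below builds the two invocations shown above as argument lists. The function name `build_squid_command` and its keyword arguments are hypothetical. Only flags that appear in this document (`--bwa`, `-b`, `-c`, `-o`, `-mq`, `-w`) are used, and it assumes the `squid` binary is on your PATH.

```python
from typing import List, Optional


def build_squid_command(bam: str, output_prefix: str,
                        chimeric_bam: Optional[str] = None,
                        bwa: bool = False,
                        min_mapq: Optional[int] = None,
                        min_edge_weight: Optional[int] = None) -> List[str]:
    """Assemble a squid argument list: squid [options] -b <BAM> -o <prefix>."""
    cmd = ["squid"]
    if bwa:
        # Combined BAM from BWA or SpeedSeq
        cmd.append("--bwa")
    if min_mapq is not None:
        cmd += ["-mq", str(min_mapq)]
    if min_edge_weight is not None:
        cmd += ["-w", str(min_edge_weight)]
    cmd += ["-b", bam]
    if chimeric_bam is not None:
        # Separate chimeric BAM, as produced by STAR
        cmd += ["-c", chimeric_bam]
    cmd += ["-o", output_prefix]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to execute it.
    print(" ".join(build_squid_command("alignment.bam", "squidout",
                                       chimeric_bam="chimeric.bam")))
```

Returning a list rather than a single string avoids shell quoting problems when file names contain spaces.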

To run an example, download the sample data (sampledata.tgz) from https://cmu.box.com/s/e9u6alp73rfdhfve2a51p6v391vweodq into the example folder and decompress it with:

tar -xzvf sampledata.tgz

Run the SQUID command in example/SQUIDcommand.sh. Alternatively, to test the combined workflow of STAR and SQUID, make sure STAR is in your path and run the bash script example/STARnSQUIDcommand.sh.

cd example
./SQUIDcommand.sh
./STARnSQUIDcommand.sh

Annotate SQUID output

To label the predicted TSVs as fusion-gene or non-fusion-gene type, and to retrieve the corresponding gene names of fusion-gene TSVs, you can use the following Python script.

Python dependencies:

Usage:

python <squid_folder>/utils/AnnotateSQUIDOutput.py [options] <GTFfile> <SquidPrediction> <OutputFile>

Note that the GTF file must use the same chromosome names as the SQUID output, and each transcript record must contain 3 attributes: transcript ID, gene ID, and gene symbol (or gene name).

| Options | Default value | Data type | Description |
| --- | --- | --- | --- |
| --geneid | gene_id | string | GTF gene ID attribute string, the attribute name in GTF record that corresponds to the gene ID |
| --genesymbol | gene_name | string | GTF gene symbol attribute string, the attribute name in GTF record that corresponds to the gene symbol |
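
Before running the annotation script, it may be worth checking that the transcript records in your GTF file carry the three required attributes. The following sketch is not part of SQUID and its function names are hypothetical. It assumes the standard 9-column tab-separated GTF layout with quoted attribute values, and that the transcript ID is stored under the usual `transcript_id` key. The gene ID and gene symbol keys default to the same values as --geneid and --genesymbol.

```python
import re
from typing import Dict, List

# Matches GTF attributes of the form: key "value";
ATTRIBUTE_PATTERN = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_gtf_attributes(attribute_field: str) -> Dict[str, str]:
    """Parse the 9th GTF column into a dict of quoted attributes."""
    return dict(ATTRIBUTE_PATTERN.findall(attribute_field))


def missing_transcript_attributes(gtf_line: str,
                                  gene_id_attr: str = "gene_id",
                                  gene_symbol_attr: str = "gene_name") -> List[str]:
    """Return the required attribute names missing from a transcript record.

    Lines that are not 9-column transcript records yield an empty list.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "transcript":
        return []
    attributes = parse_gtf_attributes(fields[8])
    required = ["transcript_id", gene_id_attr, gene_symbol_attr]
    return [name for name in required if name not in attributes]
```

If your GTF stores the gene symbol under a different key, pass that key as `gene_symbol_attr` here and as --genesymbol to the annotation script.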