Genotyping imputation: Pipeline V1.0

A Nextflow pipeline to perform genotyping imputation of a target dataset


Workflow representation

Description

This pipeline performs the imputation of several target datasets processed with standardised input.

Here is a summary of the method:

See the Usage section to test the full pipeline with your target dataset.

Dependencies

The pipeline works under Linux distributions.

  1. This pipeline is based on nextflow. As we have several nextflow pipelines, we have centralized the common information in the IARC-nf repository. Please read it carefully as it contains essential information for the installation, basic usage and configuration of nextflow and our pipelines.

  2. External software:

     - Files to download:
     - Other things to know:

You can avoid installing all the external software required by the main script by installing Docker alone. See the IARC-nf repository for more information.

Input

| Type | Description |
|------|-------------|
| Plink dataset | The target dataset to be analysed, composed of the following files: .bed, .bim & .fam |
| Input environment | Path to your input directory |
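Because the pipeline matches the plink trio to its parent directory by name, a quick pre-flight check can catch naming mistakes before a run. The sketch below is illustrative and not part of the pipeline; `check_plink_trio` and the `/tmp` paths are hypothetical names used only for the demo.

```shell
# check_plink_trio DIR TARGET: verify that DIR/TARGET/TARGET.{bed,bim,fam}
# all exist, reporting any missing file.
check_plink_trio() {
  dir=$1; target=$2; missing=0
  for ext in bed bim fam; do
    f="$dir/$target/$target.$ext"
    [ -f "$f" ] || { echo "missing: $f"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "ok: $target trio complete"
}

# Demo against a scratch layout created here only for illustration:
mkdir -p /tmp/imp_demo/my_target
touch /tmp/imp_demo/my_target/my_target.bed \
      /tmp/imp_demo/my_target/my_target.bim \
      /tmp/imp_demo/my_target/my_target.fam
check_plink_trio /tmp/imp_demo my_target   # -> ok: my_target trio complete
```

If any of the three files is absent or misnamed, the function prints a `missing:` line for it instead.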

Parameters

| Name | Example value | Description |
|------|---------------|-------------|
| --target | my_target | Pattern of the target dataset, linking to the plink .bed/.bim/.fam files |
| --input | user/main_data/ | Path of the main directory containing the two directories my_target/ and files/ |
| --output | user/my_result/ | Path of the main directory where you want your results to be placed |

| Name | Default value | Description |
|------|---------------|-------------|
| --script | my/directory/script/bin | Path of the bin script directory, needed to run the annex programs from the pipeline |
| --geno1 | 0.03 | First plink genotyping call-rate threshold, applied to the full target dataset |
| --geno2 | 0.03 | Second plink genotyping call-rate threshold, applied to the target dataset split by population |
| --maf | 0.01 | Plink minor allele frequency threshold, applied to the full target dataset |
| --pihat | 0.185 | Minimum PI_HAT value used for the relatedness test; 0.185 is halfway between the expected IBD for third- and second-degree relatives |
| --hwe | 1e-8 | Plink Hardy-Weinberg equilibrium p-value threshold |
| --legend | ALL.chr_GRCh38.genotypes.20170504.legend | File to use as .legend |
| --fasta | GRCh38_full_analysis_set_plus_decoy_hla.fa | File to use as the fasta reference |
| --chain | hg18ToHg38.over.chain | File to use for the liftOver conversion |
| --VCFref | my/directory/ref/vcf/ | Directory to use as the VCF reference |
| --BCFref | my/directory/ref/bcf/ | Directory to use as the BCF reference |
| --M3VCFref | my/directory/ref/m3vcf/ | Directory to use as the M3VCF reference |
| --conversion | hg38 | Option to convert the data between genome builds (hg18/hg19/hg38); the default is hg38 |
| --cloud | on | Set to "on" to run the imputation on an imputation server (Michigan and/or TOPMed) |
| --token_Michighan | path/to/my_token.txt | File containing your access token for the Michigan Imputation Server |
| --token_TOPMed | path/to/my_token.txt | File containing your access token for the TOPMed Imputation Server |
| --QC_cloud | my/directory/download_imputation_server | Directory with the data downloaded from the imputation server, used to run the end of the QC analysis |
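The --pihat default can be sanity-checked against the expected IBD proportions it is derived from: second-degree relatives share an expected PI_HAT of 0.25 and third-degree relatives 0.125, whose midpoint is 0.1875, just above the 0.185 default.

```shell
# Midpoint between the expected PI_HAT of second-degree (0.25) and
# third-degree (0.125) relatives; the pipeline's default of 0.185
# sits just below this value.
awk 'BEGIN { printf "%.4f\n", (0.25 + 0.125) / 2 }'   # -> 0.1875
```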

Flags are special parameters without value.

| Name | Description |
|------|-------------|
| --help | Display help |

Usage

  1. Prepare the environment to run the imputation pipeline:

```
mkdir data
cd data
nextflow run IARCbioinfo/Imputation-nf/bin/Preparation.nf --out /data/
```

  2. Paste the .bed/.bim/.fam plink target files into a directory, and place that directory in your "data/" directory. The plink files and their directory must share the same pattern, as in the following example: data/target/target{.bed,.bim,.fam}. You now have two directories in your "data/" directory:

- data/my_target/ : the plink target files (my_target.bed, my_target.bim, my_target.fam).

- data/files/ : all the dependencies.

  3. Run the imputation pipeline:

```
nextflow run IARCbioinfo/Imputation.nf --target my_target --input /data/ --output /results/ -r v1.0 -profile singularity
```

  4. If you want to run the imputation on one of the imputation servers (Michigan and/or TOPMed), write your access token to a file and pass it as an argument. For example:

```
nextflow run IARCbioinfo/Imputation.nf --target my_target --input /data/ --output /results/ --cloud on --token_Michighan /folder/my_token_Michighan.txt --token_TOPMed /folder/my_token_TOPMed.txt -r v1.0 -profile singularity
```

Once your imputation data is downloaded, you can run the end of the QC analysis :

```
nextflow run IARCbioinfo/Imputation.nf --target my_target --input /data/ --output /results/ --QC_cloud /downloaded_imputation_server_file/ -r v1.0 -profile singularity
```
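A cloud run fails at submission time if a token file is absent or empty, so it can be worth checking the token files before launching. This is a hypothetical pre-flight helper, not part of the pipeline; `check_token` and the `/tmp` paths are placeholder names for the demo.

```shell
# check_token FILE: report whether the token file exists and is non-empty.
check_token() {
  if [ -s "$1" ]; then echo "ok: $1"; else echo "missing or empty: $1"; fi
}

printf 'my-secret-token' > /tmp/demo_token.txt    # stand-in for a real token file
check_token /tmp/demo_token.txt                   # -> ok: /tmp/demo_token.txt
check_token /tmp/absent_token.txt                 # -> missing or empty: /tmp/absent_token.txt
```

Run the same check on the files you pass to --token_Michighan and --token_TOPMed before starting a --cloud run.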

Output

| Type | Description |
|------|-------------|
| output1 | ... |
| output2 | ... |

Detailed description (optional section)

...

Directed Acyclic Graph

DAG

Contributions

| Name | Email | Description |
|------|-------|-------------|
| Gabriel Aurélie | gabriela@students.iarc.fr | Developer to contact for support |
| Lipinski Boris | LipinskiB@students.iarc.fr / boris.lipinski@etu.univ-lyon1.fr | Developer to contact for support |

References (optional)

FAQ (optional)
