giab_data_indexes

This repository contains data indexes from NIST's Genome in a Bottle (GIAB) project. The sequence and alignment indexes are also available at: https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_indexes

<br /> <strong>AshkenazimTrio</strong><br /> <br /> Son:HG002 &nbsp; &nbsp; <sub>https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/ </sub><br> Father:HG003&nbsp; &nbsp; <sub> https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG003_NA24149_father/ </sub><br> Mother:HG004 &nbsp; &nbsp; <sub> https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG004_NA24143_mother/ </sub> <br /> <br />
<table>
<tr><th><sub>Sequencing Platform</sub></th><th><sub>Sequence</sub></th><th><sub>Alignment</sub></th></tr>
<tr><td><sub>Illumina WGS 2x150bp 300X per individual</sub></td><td><sub>All HG002 HG003 HG004</sub></td><td><sub>novoalign: All HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>Illumina 6KB Matepair</sub></td><td><sub>All HG002 HG003 HG004</sub></td><td><sub>bwamem:hg19 All HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>Illumina WGS 2x250bp</sub></td><td><sub>All HG002 HG003 HG004</sub></td><td><sub>isaac:hg19 All HG002 HG003 HG004 <br /> novoalign: All HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>Moleculo</sub></td><td><sub>All HG002 HG003 HG004</sub></td><td><sub> </sub></td></tr>
<tr><td><sub>Illumina Whole Exome</sub></td><td><sub> - </sub></td><td><sub>bwamem:hg19 All HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>SOLiD 60x for son</sub></td><td><sub>All HG002</sub></td><td><sub>LifeScope:hg19 All HG002</sub></td></tr>
<tr><td><sub>CompleteGenomics</sub></td><td><sub> - </sub></td><td><sub>CGAtools:hg19 All HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>Ion Proton 1000x Exome</sub></td><td><sub> - </sub></td><td><sub>TMAP:hg19 All HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>10X Genomics</sub></td><td><sub> - </sub></td><td><sub>bwamem:hg19 All HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>10X Genomics ChromiumGenome</sub></td><td><sub>All HG002</sub></td><td><sub>LongRanger2.0:hg19 All HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>BioNano</sub></td><td><sub>All:bnx HG002:bnx HG003:bnx HG004:bnx</sub></td><td><sub>All:cmap HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>PacBio 70x/30x/30x</sub></td><td><sub>All HG002 HG003 HG004 <br /> All:hdf5 HG002 HG003 HG004</sub></td><td><sub>NGMLR:hg19 All HG002 HG003 HG004 <br /> minimap2: All HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>PacBio CCS 10kb</sub></td><td><sub>All HG002</sub></td><td><sub>pbmm2:hg19 All HG002</sub></td></tr>
<tr><td><sub>PacBio CCS 11kb</sub></td><td><sub>All HG002</sub></td><td><sub>pbmm2:hg19 All HG002</sub></td></tr>
<tr><td><sub>PacBio CCS 15kb</sub></td><td><sub>All HG002</sub></td><td><sub>pbmm2:hg19 All HG002</sub></td></tr>
<tr><td><sub>PacBio CCS 15kb_20kb chemistry2</sub></td><td><sub>All HG002</sub></td><td><sub>pbmm2: All HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>Oxford Nanopore 2D</sub></td><td><sub>All HG002</sub></td><td><sub> - </sub></td></tr>
<tr><td><sub>Oxford Nanopore ultralong (guppy-V3.2.4_2020-01-22)</sub></td><td><sub>All HG002</sub></td><td><sub>minimap2:whatshap:hg19 All HG002</sub></td></tr>
<tr><td><sub>Oxford Nanopore ultralong Promethion</sub></td><td><sub>All HG002 HG003 HG004</sub></td><td><sub> - </sub></td></tr>
<tr><td><sub>BGI BGISEQ500</sub></td><td><sub>All HG002</sub></td><td><sub> - </sub></td></tr>
<tr><td><sub>BGI MGISEQ PCR-free</sub></td><td><sub>All HG002</sub></td><td><sub> - </sub></td></tr>
<tr><td><sub>BGI stLFR</sub></td><td><sub>All HG002 HG003 HG004</sub></td><td><sub>All:bwamem:hg19 HG002 HG003 HG004</sub></td></tr>
<tr><td><sub>Strand-Seq HG002 by BCCRC</sub></td><td><sub>All HG002</sub></td><td><sub> - </sub></td></tr>
</table>

<sub> * Raw and alignment data for CompleteGenomics LFR are not available, but analysis results are available under: https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/CompleteGenomics_newLFR_CGAtools_06122015/ </sub> <br /> <br /> <br /> <strong>ChineseTrio</strong><br /> <br /> Son:HG005     <sub> https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG005_NA24631_son/ </sub><br> Father:HG006     <sub> https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG006_NA24694-huCA017E_father/ </sub><br> Mother:HG007     <sub> https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG007_NA24695-hu38168_mother/ </sub> <br />

<table>
<tr><th><sub>Sequencing Platform</sub></th><th><sub>Sequence</sub></th><th><sub>Alignment</sub></th></tr>
<tr><td><sub>Illumina WGS 2x250bp 300X for son; <br /> 2x150bp 100x for parents</sub></td><td><sub>All HG005 HG006 HG007</sub></td><td><sub>novoalign: All:hg19-hg38 HG005:hg19-hg38 HG006:hg19-hg38 HG007:hg19-hg38</sub></td></tr>
<tr><td><sub>Illumina 6KB Matepair</sub></td><td><sub>All HG005 HG006 HG007</sub></td><td><sub> </sub></td></tr>
<tr><td><sub>Moleculo</sub></td><td><sub>All HG005 HG006 HG007</sub></td><td><sub> </sub></td></tr>
<tr><td><sub>SOLiD 60x for son</sub></td><td><sub>All:xsq HG005:xsq</sub></td><td><sub>LifeScope: All:hg19 HG005:hg19</sub></td></tr>
<tr><td><sub>CompleteGenomics</sub></td><td><sub> </sub></td><td><sub>CGAtools: All:hg19 (RMDNA) HG005:hg19 HG006:hg19 HG007:hg19 <br /> CGAtools: All:hg19 (cellsDNA) HG005:hg19</sub></td></tr>
<tr><td><sub>Illumina Whole Exome</sub></td><td><sub> </sub></td><td><sub>bwamem: All:hg19 HG005:hg19</sub></td></tr>
<tr><td><sub>Ion Proton 1000x Exome</sub></td><td><sub> </sub></td><td><sub>TMAP: All:hg19 HG005:hg19</sub></td></tr>
<tr><td><sub>BioNano for son</sub></td><td><sub>All:bnx HG005:bnx</sub></td><td><sub>All:hg19 (cmap) HG005:hg19 (cmap)</sub></td></tr>
<tr><td><sub>PacBio Sequel for the trio</sub></td><td><sub>All HG005 HG006 HG007</sub></td><td><sub> </sub></td></tr>
<tr><td><sub>PacBio SequelII CCS 11kb</sub></td><td><sub> </sub></td><td><sub> </sub></td></tr>
<tr><td><sub>BGI BGISEQ500, MGISEQ, stLFR</sub></td><td><sub> </sub></td><td><sub> </sub></td></tr>
</table>

<br /> <br /> <strong>NA12878</strong><br /> <br /> NA12878:HG001 &nbsp; &nbsp; <sub> https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/ </sub> <br /> <br />
<table>
<tr><th><sub>Sequencing Platform</sub></th><th><sub>Sequence</sub></th><th><sub>Alignment</sub></th></tr>
<tr><td><sub>Illumina WGS 2x150bp 300X</sub></td><td><sub>HG001</sub></td><td><sub>bwamem: HG001:hg19 (downsampled30x) <br /> novoalign: HG001</sub></td></tr>
<tr><td><sub>Illumina HiSeq Exome</sub></td><td><sub>HG001 <br /> HG001:trimmed_fastq</sub></td><td><sub>bwamem: HG001:hg19</sub></td></tr>
<tr><td><sub>Illumina TruSeq Exome</sub></td><td><sub> </sub></td><td><sub>bwamem: HG001:hg19</sub></td></tr>
<tr><td><sub>10X Genomics</sub></td><td><sub> </sub></td><td><sub>bwamem: HG001:hg19 <br /> bwamem: HG001:hg19 (size_selected)</sub></td></tr>
<tr><td><sub>10X Genomics ChromiumGenome</sub></td><td><sub> </sub></td><td><sub>LongRanger2.0: HG001:hg19-hg38 <br /> LongRanger2.1: HG001:hg19-hg38</sub></td></tr>
<tr><td><sub>CompleteGenomics</sub></td><td><sub> </sub></td><td><sub>CGAtools: HG001:hg19</sub></td></tr>
<tr><td><sub>Ion Proton 1000x Exome</sub></td><td><sub> </sub></td><td><sub>TMAP: HG001:hg19</sub></td></tr>
<tr><td><sub>NA12878 SOLiD5500W</sub></td><td><sub> </sub></td><td><sub>LifeScope: HG001:hg19</sub></td></tr>
<tr><td><sub>BGI BGISEQ500, MGISEQ, stLFR</sub></td><td><sub> </sub></td><td><sub> </sub></td></tr>
<tr><td><sub>PacBio 40x</sub></td><td><sub>HG001:hdf5</sub></td><td><sub> </sub></td></tr>
<tr><td><sub>PacBio SequelII CCS 11kb</sub></td><td><sub> </sub></td><td><sub> </sub></td></tr>
<tr><td><sub>Ultralong_OxfordNanopore</sub></td><td><sub> - </sub></td><td><sub>minimap2: HG001</sub></td></tr>
</table>

<br /> <br /> <br /> <strong>Please Note:</strong><br /> <sub>1. To work with raw sequencing data (fastq, fasta, hdf5, xsq, bnx, etc.), use the sequence.index.* files to locate and download the data.</sub> <br /> <sub>2. To work with aligned data (bam, xmap/cmap, etc.), use the alignment.index.* files to locate and download the data.</sub>
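
As a rough illustration of the two notes above, the Python sketch below collects download URLs from an index file. It assumes that an index is tab-delimited text in which some fields hold full `ftp://` or `https://` URLs; check this against the actual index you are using. The function name `extract_urls`, the `SAMPLE_INDEX` content, its column names, and the `example.org` URLs are all hypothetical and only serve the example.

```python
import csv
import io

# Prefixes treated as "this field is a download URL" (an assumption).
URL_PREFIXES = ("ftp://", "http://", "https://")


def extract_urls(index_text, delimiter="\t"):
    """Return every URL-like field found in a delimited index, in file order."""
    urls = []
    reader = csv.reader(io.StringIO(index_text), delimiter=delimiter)
    for row in reader:
        for field in row:
            value = field.strip()
            # Header names and checksum fields do not start with a URL
            # prefix, so they are skipped without special handling.
            if value.startswith(URL_PREFIXES):
                urls.append(value)
    return urls


# Hypothetical index content; real index files may use different columns.
SAMPLE_INDEX = (
    "FASTQ\tFASTQ_MD5\tPAIRED_FASTQ\tPAIRED_FASTQ_MD5\n"
    "ftp://example.org/reads_R1.fastq.gz\tabc123\t"
    "ftp://example.org/reads_R2.fastq.gz\tdef456\n"
)

if __name__ == "__main__":
    for url in extract_urls(SAMPLE_INDEX):
        print(url)
```

Each URL returned this way can then be passed to a downloader of your choice, for example `wget` or Python's `urllib.request.urlretrieve`. The same function applies to both sequence.index.* and alignment.index.* files, provided the assumption about their layout holds.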