SeqKit - a cross-platform and ultrafast toolkit for FASTA/Q file manipulation

<a href="https://doi.org/10.1002/imt2.191"><img src="seqkit2.jpg" alt="Subcommands of SeqKit2" width="700"/></a>

Features

Installation

Go to Download Page for more download options and changelogs, or install via conda:

conda install -c bioconda seqkit

Subcommands

| Category | Command | Function | Input | Strand-sensitivity | Multi-threads |
|:---|:---|:---|:---|:---|:---|
| Basic operation | seq | Transform sequences: extract ID/seq, filter by length/quality, remove gaps… | FASTA/Q | | |
| | stats | Simple statistics: #seqs, min/max_len, N50, Q20%, Q30%… | FASTA/Q | | |
| | subseq | Get subsequences by region/gtf/bed, including flanking sequences | FASTA/Q | + or/and - | |
| | sliding | Extract subsequences in sliding windows | FASTA/Q | + only | |
| | faidx | Create the FASTA index file and extract subsequences (with more features than samtools faidx) | FASTA | + or/and - | |
| | translate | Translate DNA/RNA to protein sequence | FASTA/Q | + or/and - | |
| | watch | Monitoring and online histograms of sequence features | FASTA/Q | | |
| | scat | Real time concatenation and streaming of fastx files | FASTA/Q | | |
| Format conversion | fq2fa | Convert FASTQ to FASTA format | FASTQ | | |
| | fx2tab | Convert FASTA/Q to tabular format | FASTA/Q | | |
| | fa2fq | Retrieve corresponding FASTQ records by a FASTA file | FASTA/Q | + only | |
| | tab2fx | Convert tabular format to FASTA/Q format | TSV | | |
| | convert | Convert FASTQ quality encoding between Sanger, Solexa and Illumina | FASTA/Q | | |
| Searching | grep | Search sequences by ID/name/sequence/sequence motifs, mismatch allowed | FASTA/Q | + and - | partly, -m |
| | locate | Locate subsequences/motifs, mismatch allowed | FASTA/Q | + and - | partly, -m |
| | amplicon | Extract amplicon (or specific region around it), mismatch allowed | FASTA/Q | + and - | partly, -m |
| | fish | Look for short sequences in larger sequences | FASTA/Q | + and - | |
| Set operation | sample | Sample sequences by number or proportion | FASTA/Q | | |
| | rmdup | Remove duplicated sequences by ID/name/sequence | FASTA/Q | + and - | |
| | common | Find common sequences of multiple files by id/name/sequence | FASTA/Q | + and - | |
| | duplicate | Duplicate sequences N times | FASTA/Q | | |
| | split | Split sequences into files by id/seq region/size/parts (mainly for FASTA) | FASTA preferred | | |
| | split2 | Split sequences into files by size/parts (FASTA, PE/SE FASTQ) | FASTA/Q | | |
| | head | Print first N FASTA/Q records | FASTA/Q | | |
| | head-genome | Print sequences of the first genome with common prefixes in name | FASTA/Q | | |
| | range | Print FASTA/Q records in a range (start:end) | FASTA/Q | | |
| | pair | Patch up paired-end reads from two fastq files | FASTA/Q | | |
| Edit | replace | Replace name/sequence by regular expression | FASTA/Q | + only | |
| | rename | Rename duplicated IDs | FASTA/Q | | |
| | concat | Concatenate sequences with same ID from multiple files | FASTA/Q | + only | |
| | restart | Reset start position for circular genome | FASTA/Q | + only | |
| | mutate | Edit sequence (point mutation, insertion, deletion) | FASTA/Q | + only | |
| | sana | Sanitize broken single line FASTQ files | FASTQ | | |
| Ordering | sort | Sort sequences by id/name/sequence/length | FASTA preferred | | |
| | shuffle | Shuffle sequences | FASTA preferred | | |
| BAM processing | bam | Monitoring and online histograms of BAM record features | BAM | | |
| Miscellaneous | sum | Compute message digest for all sequences in FASTA/Q files | FASTA/Q | | |
| | merge-slides | Merge sliding windows generated from seqkit sliding | TSV | | |
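
As a rough illustration of how a few of the subcommands listed above are typically invoked, the sketch below builds a tiny FASTA file and runs `stats`, `seq`, `grep`, `rmdup` and `fx2tab` on it. The file name `seqkit_demo.fa` and its contents are hypothetical, made up for this example; the commands are skipped if `seqkit` is not on the `PATH`. Check each subcommand's `--help` output for the authoritative list of options.

```shell
# Create a tiny demo FASTA file (hypothetical data, for illustration only).
printf '>seq1\nACGTACGTAC\n>seq2\nACGT\n>seq3\nACGTACGTAC\n' > seqkit_demo.fa

# Run the examples only if seqkit is installed and on PATH.
if command -v seqkit >/dev/null 2>&1; then
    # stats: summary statistics; -a adds extra columns such as N50.
    seqkit stats -a seqkit_demo.fa

    # seq: keep only sequences at least 5 bases long (-m sets the minimum length).
    seqkit seq -m 5 seqkit_demo.fa

    # grep: search by sequence (-s) for a pattern (-p).
    seqkit grep -s -p ACGTAC seqkit_demo.fa

    # rmdup: remove records with duplicated sequences (-s compares by sequence).
    seqkit rmdup -s seqkit_demo.fa

    # fx2tab: tabular output with names only (-n) plus a length column (-l).
    seqkit fx2tab -n -l seqkit_demo.fa
else
    echo "seqkit not found on PATH; install it first" >&2
fi
```

Each command writes to standard output by default, so the results can be piped into another `seqkit` subcommand or redirected to a file.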

Notes:

Citation

  1. Wei Shen*, Botond Sipos, and Liuyang Zhao. 2024. SeqKit2: A Swiss Army Knife for Sequence and Alignment Processing. iMeta e191. doi:10.1002/imt2.191. <span class="__dimensions_badge_embed__" data-doi="10.1002/imt2.191" data-style="small_rectangle"></span>
  2. Wei Shen, Shuai Le, Yan Li*, and Fuquan Hu*. 2016. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11(10): e0163962. doi:10.1371/journal.pone.0163962. <span class="__dimensions_badge_embed__" data-doi="10.1371/journal.pone.0163962" data-style="small_rectangle"></span>

Contributors

Acknowledgements

We thank all users for their valuable feedback and suggestions. We thank all contributors for improving the code and documentation.

We appreciate Klaus Post for his fantastic packages (compress and pgzip), which accelerate gzip file reading and writing.

Contact

Create an issue to report bugs, propose new functions or ask for help.

License

MIT License

Starchart

<img src="https://starchart.cc/shenwei356/seqkit.svg" alt="Stargazers over time" style="max-width: 100%">