
Note

This repository was started before Heng Li wrote his article "Fast high-level programming languages", which contains a native Nim implementation (see klib below). That implementation is about as fast as the one here, depending on whether memory is reused, and could simply be used instead.

nimreadfq

A Nim wrapper for Heng Li's kseq/readfq, a fast parser for FastQ and Fasta files. nimreadfq reads FastQ and Fasta data from gzipped or flat files, or from stdin (use "-" as the file name). See the benchmark below for timings.

The main function is readFQ(), an iterator that yields FQRecord objects. An alternative is readFQPtr(), which yields FQRecordPtr objects. The difference is that the latter uses ptr char instead of strings and is thus potentially faster, but the underlying memory is reused between iterations, so copy anything you need to keep beyond the current iteration.

See example.nim and tests/tester.nim for code examples.
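
As a rough sketch of the two iterators described above (not a substitute for example.nim): the module name `readfq`, the field names `name` and `sequence`, and the file name `reads.fq.gz` are assumptions made for illustration and should be checked against readfq.nim.

```nim
import readfq  # assumed module name, after readfq.nim in this repository

# readFQ yields FQRecord values that hold ordinary Nim strings,
# so they can be stored and used after the loop has moved on.
# The field names `name` and `sequence` are assumptions.
var longReadNames: seq[string]
for rec in readFQ("reads.fq.gz"):        # hypothetical input; "-" would read stdin
  if rec.sequence.len >= 100:
    longReadNames.add rec.name

# readFQPtr yields FQRecordPtr values whose fields are `ptr char`
# pointing into memory that is reused on the next iteration.
# Copy into a Nim string before keeping anything.
var keptSequences: seq[string]
for rec in readFQPtr("reads.fq.gz"):
  keptSequences.add $cast[cstring](rec.sequence)  # `$` copies the C string

echo longReadNames.len, " long reads, ", keptSequences.len, " sequences copied"
```

Storing the raw pointers from readFQPtr instead of copying would leave every stored entry referring to whatever record was parsed last, which is why the copy is made inside the loop.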

The initial Nim integration (and hard work) was done by Haibao Tang as part of his bio-pipeline repo. Haibao generously granted full rights to his code base, after which I started this separate package called nimreadfq for integration into nimble.

Benchmark

nimreadfq is roughly five times faster than fastx and bioseq on uncompressed FastQ and comparable to klib; readFQPtr is faster still. Below are example timings for reading 5,682,010 sequences from M_abscessus_HiSeq.fq (source; see also ./benchmark/get_fq.sh), run on my 2019 MacBook Pro:

fastq:

  • readfqPtr: 2.3s
  • klib: 7.0s
  • readfq: 7.6s
  • fastx: 39.6s
  • bioseq: 42.1s

fastq.gz:

  • readfq gz: 15.6s
  • klib gz: 15.8s
  • bioseq gz: 150.0s

How to reproduce results:

cd ./benchmark
nimble build
./benchmark
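
For a quick timing of a single file outside the benchmark suite, something like the following could be used. It is a minimal sketch and not the code in ./benchmark; the module name `readfq` is an assumption, and the input path is taken from the first command-line argument.

```nim
import std/[monotimes, os, times]
import readfq  # assumed module name, after readfq.nim in this repository

proc countRecords(path: string): int =
  ## Drain the readFQ iterator and count the records it yields.
  for rec in readFQ(path):
    inc result

when isMainModule:
  # Use the first argument as input, or "-" (stdin) if none is given.
  let path = if paramCount() >= 1: paramStr(1) else: "-"
  let start = getMonoTime()
  let n = countRecords(path)
  let elapsed = getMonoTime() - start      # a times.Duration
  echo n, " records in ", elapsed.inMilliseconds, " ms"
```

Compiling with `-d:release` matters for any such measurement, since debug builds of Nim programs are considerably slower.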