Comparing FASTQ compression with gzip, bzip2 and xz


Storage is expensive. Lossless compression reduces storage requirements without discarding any data.
Sébastien Boisvert
2012-11-05


Table 1: Comparison of compression methods on SRR001665_1.fastq, which contains 10408224 DNA sequences of length 36. Tests were run on Fedora 17 x86_64 with an Intel Core i5-3360M processor and an Intel SSDSC2BW180A3 drive. Tests were not run in parallel. The reported time is the "real" value from the time command. Each test was done twice.
Compression                                                    Time (run 1)  Time (run 2)  Size (bytes)  Size (%)
none                                                           0             -             2085533064    100%
time cat SRR001665_1.fastq | gzip -9 > SRR001665_1.fastq.gz    7m31.519s     7m20.340s     376373619     18%
time cat SRR001665_1.fastq | bzip2 -9 > SRR001665_1.fastq.bz2  3m12.601s     3m25.243s     295000140     14%
time cat SRR001665_1.fastq | xz -9 > SRR001665_1.fastq.xz      32m45.933s    -             257621508     12%
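The percentages in Table 1 are each compressed size divided by the uncompressed size. A minimal sketch of that arithmetic, using only the byte counts from the table (the names `ORIGINAL_SIZE`, `COMPRESSED_SIZES` and `percent_of_original` are illustrative, not part of the original tests):

```python
# Sizes in bytes, copied from Table 1.
ORIGINAL_SIZE = 2085533064
COMPRESSED_SIZES = {
    "gzip -9": 376373619,
    "bzip2 -9": 295000140,
    "xz -9": 257621508,
}


def percent_of_original(compressed: int, original: int = ORIGINAL_SIZE) -> float:
    """Return the compressed size as a percentage of the original size."""
    return 100.0 * compressed / original


for method, size in COMPRESSED_SIZES.items():
    pct = percent_of_original(size)
    ratio = ORIGINAL_SIZE / size  # how many times smaller the output is
    print(f"{method:9s} {size:>10d} bytes  {pct:5.1f}%  ratio {ratio:.2f}:1")
```

Rounded to whole numbers, these values match the 18%, 14% and 12% reported in the table.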



Table 2: Decompression tests. Each test was run twice.
Decompression                                                 Time (run 1)  Time (run 2)
time cat SRR001665_1.fastq.gz | gunzip > SRR001665_1.fastq    0m14.612s     0m13.247s
time cat SRR001665_1.fastq.bz2 | bunzip2 > SRR001665_1.fastq  1m3.412s      1m4.337s
time cat SRR001665_1.fastq.xz | unxz > SRR001665_1.fastq      0m24.194s     0m23.923s
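To compare the tools on one scale, the "real" values can be converted to seconds, averaged over the two runs, and divided into the uncompressed size. The sketch below does this for the Table 2 timings. It assumes the values follow the "XmY.YYYs" layout shown in the tables, and the helper names `real_seconds` and `mean_seconds` are made up for illustration.

```python
import re

ORIGINAL_SIZE = 2085533064  # bytes, uncompressed FASTQ size from Table 1

# Decompression times from Table 2 (two runs each).
DECOMPRESSION_RUNS = {
    "gunzip": ["0m14.612s", "0m13.247s"],
    "bunzip2": ["1m3.412s", "1m4.337s"],
    "unxz": ["0m24.194s", "0m23.923s"],
}


def real_seconds(stamp: str) -> float:
    """Convert a time value such as '7m31.519s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def mean_seconds(stamps) -> float:
    """Average several time values, in seconds."""
    values = [real_seconds(s) for s in stamps]
    return sum(values) / len(values)


for tool, runs in DECOMPRESSION_RUNS.items():
    mean = mean_seconds(runs)
    rate = ORIGINAL_SIZE / mean / 1e6  # MB of decompressed output per second
    print(f"{tool:8s} mean {mean:6.2f} s  about {rate:6.1f} MB/s of output")
```

The same helpers apply to the Table 1 timings, where only one xz run is available.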


It is strange that bzip2 -9 is faster than gzip -9 for compression (a little over 3 minutes versus more than 7 minutes), while also producing a smaller file.
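One way to probe this observation is to time gzip at more than one compression level next to bzip2 on the same input. The sketch below is only a stand-in for the tests above. It uses Python's standard gzip and bz2 modules rather than the command-line tools, and it runs on small synthetic FASTQ-like records (random bases, an arbitrary set of quality characters) rather than SRR001665_1.fastq. Its timings are therefore not comparable with Table 1, and all names in it are hypothetical.

```python
import bz2
import gzip
import random
import time


def synthetic_fastq(n_reads: int, read_length: int = 36, seed: int = 0) -> bytes:
    """Build FASTQ-like records with random bases and qualities (made-up data)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        bases = "".join(rng.choices("ACGT", k=read_length))
        quals = "".join(rng.choices("5:?DI", k=read_length))
        records.append(f"@read_{i}\n{bases}\n+\n{quals}\n")
    return "".join(records).encode("ascii")


# Each entry maps a label to a (compress, decompress) pair.
CODECS = {
    "gzip level 6": (lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
    "gzip level 9": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bz2 level 9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
}


def benchmark(data: bytes) -> dict:
    """Return {label: (compression seconds, compressed size in bytes)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Lossless compression must return the exact input bytes.
        assert decompress(packed) == data
        results[name] = (elapsed, len(packed))
    return results


if __name__ == "__main__":
    data = synthetic_fastq(20000)
    for name, (elapsed, size) in benchmark(data).items():
        pct = 100 * size / len(data)
        print(f"{name:13s} {elapsed:7.3f} s  {size:>8d} bytes  {pct:5.1f}%")
```

Running the same comparison with the real tools and the real file would be needed before drawing any conclusion about the timings in Table 1.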
