2012-11-06

Comparing fastq compression with gzip, bzip2 and xz


Storage is expensive. Lossless compression reduces storage requirements without discarding any data.
Sébastien Boisvert
2012-11-05


Table 1: Comparison of compression methods on SRR001665_1.fastq (10408224 DNA sequences of length 36). Tests were run on Fedora 17 x86_64 with an Intel Core i5-3360M processor and an Intel SSDSC2BW180A3 drive. Tests were not run in parallel. The time is the real entry reported by the time command. Each test was run twice.
Compression                                                     Time                    Size (bytes)
none                                                            0                       2085533064 (100%)
time cat SRR001665_1.fastq | gzip -9 > SRR001665_1.fastq.gz     7m31.519s, 7m20.340s    376373619 (18%)
time cat SRR001665_1.fastq | bzip2 -9 > SRR001665_1.fastq.bz2   3m12.601s, 3m25.243s    295000140 (14%)
time cat SRR001665_1.fastq | xz -9 > SRR001665_1.fastq.xz       32m45.933s              257621508 (12%)
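The measurements in Table 1 could be reproduced with a script along the following lines. This is a minimal sketch, not the script used for this post: it assumes bash (for the time keyword) and that gzip, bzip2 and xz are on the PATH, and the function names size_percent and bench_compress are hypothetical. The percentage column is the compressed size divided by the original size, rounded to the nearest whole percent; for example, 376373619 / 2085533064 is about 0.18, hence 18%.

```shell
#!/usr/bin/env bash
# Minimal sketch of the compression benchmark in Table 1.
# Assumptions (not from the original post): bash for the `time` keyword,
# gzip/bzip2/xz on the PATH. size_percent and bench_compress are hypothetical names.
set -euo pipefail

# Print compressed size as a percentage of the original size,
# rounded to the nearest whole percent (integer arithmetic only).
size_percent() {
    local compressed=$1 original=$2
    echo $(( (compressed * 100 + original / 2) / original ))
}

# Compress file $1 with command $2 (e.g. "gzip -9") into "$1.$3",
# print the wall-clock time (on stderr) and the resulting size.
bench_compress() {
    local input=$1 cmd=$2 ext=$3
    local output="$input.$ext"
    local orig_size new_size
    # Same pipeline shape as in the table; $cmd is unquoted so "gzip -9" splits into words.
    time ( cat "$input" | $cmd > "$output" )
    # wc -c may pad with spaces; arithmetic expansion strips them.
    orig_size=$(( $(wc -c < "$input") ))
    new_size=$(( $(wc -c < "$output") ))
    echo "$cmd: $new_size bytes ($(size_percent "$new_size" "$orig_size")%)"
}

# Usage: bench_compress.sh SRR001665_1.fastq
if (( $# > 0 )); then
    input=$1
    bench_compress "$input" "gzip -9" gz
    bench_compress "$input" "bzip2 -9" bz2
    bench_compress "$input" "xz -9" xz
fi
```

The cat pipe is kept only to mirror the commands in the table; redirecting the file directly into each compressor would produce the same output without the extra process.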



Table 2: Decompression tests. Each test was run twice.
Decompression                                                   Time
time cat SRR001665_1.fastq.gz | gunzip > SRR001665_1.fastq      0m14.612s, 0m13.247s
time cat SRR001665_1.fastq.bz2 | bunzip2 > SRR001665_1.fastq    1m3.412s, 1m4.337s
time cat SRR001665_1.fastq.xz | unxz > SRR001665_1.fastq        0m24.194s, 0m23.923s
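The timings in Table 2 do not by themselves show that each round trip is lossless. A minimal sketch that pairs each decompression timing with a byte-for-byte check using cmp might look like this; it assumes bash, gunzip/bunzip2/unxz and cmp on the PATH, and the helper name bench_decompress is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of the decompression test in Table 2, plus a round-trip check.
# Assumptions: bash (for the `time` keyword), gunzip/bunzip2/unxz and cmp on
# the PATH. bench_decompress is a hypothetical helper, not from the post.
set -euo pipefail

# Decompress archive $1 with command $2 into $3, print the wall-clock time
# (on stderr), then check with cmp that $3 is byte-identical to reference $4.
bench_decompress() {
    local archive=$1 cmd=$2 output=$3 reference=$4
    time ( cat "$archive" | $cmd > "$output" )
    if cmp -s "$output" "$reference"; then
        echo "$cmd: round trip OK"
    else
        echo "$cmd: $output differs from $reference" >&2
        return 1
    fi
}

# Usage: bench_decompress.sh SRR001665_1.fastq
# Expects SRR001665_1.fastq.gz, .bz2 and .xz next to the original file.
if (( $# > 0 )); then
    ref=$1
    bench_decompress "$ref.gz"  gunzip  "$ref.out" "$ref"
    bench_decompress "$ref.bz2" bunzip2 "$ref.out" "$ref"
    bench_decompress "$ref.xz"  unxz    "$ref.out" "$ref"
fi
```

Unlike the commands in the table, this sketch writes to a separate .out file instead of overwriting SRR001665_1.fastq, so the original stays available for comparison.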


Surprisingly, bzip2 -9 compressed this file faster than gzip -9.