Comparing fastq compression with gzip, bzip2 and xz
Storage is expensive. Lossless compression reduces storage requirements without discarding any data.
Sébastien Boisvert
2012-11-05
Table 1: Comparison of compression methods on SRR001665_1.fastq -- 10408224 DNA sequences of length 36. Tests were run on Fedora 17 x86_64 with an Intel Core i5-3360M processor and an Intel SSDSC2BW180A3 drive. Tests were not run in parallel. The time is the real entry from the time command. Each test was done twice.
Compression | Time | Size (bytes) |
none | 0 | 2085533064 (100%) |
time cat SRR001665_1.fastq | gzip -9 > SRR001665_1.fastq.gz | 7m31.519s, 7m20.340s | 376373619 (18%) |
time cat SRR001665_1.fastq | bzip2 -9 > SRR001665_1.fastq.bz2 | 3m12.601s, 3m25.243s | 295000140 (14%) |
time cat SRR001665_1.fastq | xz -9 > SRR001665_1.fastq.xz | 32m45.933s | 257621508 (12%) |
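The percentages in the Size column are the compressed size divided by the original size. A minimal sketch of that arithmetic follows, using the byte counts from Table 1; `percent_of` is a hypothetical helper name, and rounding to the nearest integer is an assumption about how the figures were obtained.

```shell
#!/bin/sh
# percent_of COMPRESSED ORIGINAL
# Prints COMPRESSED as a rounded percentage of ORIGINAL (integers only).
# Hypothetical helper for illustration; not part of gzip, bzip2 or xz.
percent_of() {
    compressed=$1
    original=$2
    # Adding original/2 before the division rounds to the nearest integer.
    echo $(( (compressed * 100 + original / 2) / original ))
}

# Byte counts copied from Table 1.
original=2085533064
for entry in gzip:376373619 bzip2:295000140 xz:257621508; do
    tool=${entry%%:*}   # text before the colon
    size=${entry#*:}    # text after the colon
    echo "$tool: $(percent_of "$size" "$original")%"
done
```

For a freshly compressed file, the byte count could be taken from `wc -c < SRR001665_1.fastq.gz` instead of a hard-coded value.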
Table 2: Decompression tests. Each test was run twice.
Decompression | Time |
time cat SRR001665_1.fastq.gz | gunzip > SRR001665_1.fastq | 0m14.612s, 0m13.247s |
time cat SRR001665_1.fastq.bz2 | bunzip2 > SRR001665_1.fastq | 1m3.412s, 1m4.337s |
time cat SRR001665_1.fastq.xz | unxz > SRR001665_1.fastq | 0m24.194s, 0m23.923s |
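Since all three tools are lossless, a compress-then-decompress round trip should reproduce the input byte for byte. The sketch below checks that without writing temporary files. `roundtrip_ok` and `check_all` are hypothetical helper names, the optional level argument defaults to 9 to match the tests above, and tools that are not installed are simply skipped.

```shell
#!/bin/sh
# roundtrip_ok TOOL FILE [LEVEL]
# Compresses FILE with TOOL, decompresses the stream with the same tool,
# and compares the result with FILE. Succeeds only if they are identical.
# Hypothetical helper for illustration.
roundtrip_ok() {
    tool=$1
    file=$2
    level=${3:-9}
    # -c writes to standard output, -d decompresses; cmp reads "-" as stdin.
    "$tool" "-$level" -c < "$file" | "$tool" -d -c | cmp -s - "$file"
}

# check_all FILE [LEVEL]
# Runs the round trip for gzip, bzip2 and xz, skipping missing tools.
check_all() {
    file=$1
    level=${2:-9}
    for tool in gzip bzip2 xz; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: not installed, skipped"
            continue
        fi
        if roundtrip_ok "$tool" "$file" "$level"; then
            echo "$tool: round trip OK"
        else
            echo "$tool: MISMATCH"
            return 1
        fi
    done
}
```

Calling `check_all SRR001665_1.fastq` would repeat the full-size compression, so on a file of this size it would take about as long as the Table 1 runs combined; a lower level such as `check_all SRR001665_1.fastq 1` is quicker if only integrity matters.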
It is strange that bzip2 -9 is faster than gzip -9 for compression on this file: about 3m20s versus about 7m25s.