Whole genome assembly from 454 sequencing output via modified DNA graph concept
Computational Biology and Chemistry doi:10.1016/j.compbiolchem.2009.04.005 (2009)
The Human Genome Project was a scientific success that allowed bioinformatics to grow. During that project, only Sanger sequencing was in use. Recently, however, new platforms based on pyrosequencing have emerged, and they provide much more data at lower cost, in a few hours. This data storm has prompted the need for novel assembly algorithms.
The authors provide a new computational framework for genome assembly -- SR-ASM (Short Reads ASseMbly). They use the 'recently available' Roche/454 technology, which was released in 2005. They evaluate their tool against Velvet and Newbler. Velvet is designed solely for Illumina (the authors say it runs on 454), whereas Newbler is sold along with the 454 sequencer from Roche. The authors say Newbler cannot load FASTA files, but it can. With this argument, they avoided the need to compare their tool with Roche's software -- the mighty Newbler. Roche/454 is a commercial success, mainly because of Newbler. The authors also write that Phrap is not 454-aware; this is incorrect. The 454 sequencer takes 7 hours to read DNA, and Newbler assembles the reads in about 15 minutes. Meanwhile, SR-ASM takes 80 hours.
Tests were done on the Prochlorococcus marinus 1.84 Mbp genome and an 11,717 nt segment of human chromosome 15. The authors write that a lack of coverage in 454 experiments causes assembly gaps. In principle, this is incorrect: repeated elements are most likely to be responsible, because novel sequencing technologies offer large depth and breadth of coverage. The tables in the paper are incomprehensible, and they don't capture the true value of the assemblies. The authors only check contig lengths; they don't assess assembly errors.
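To put a rough number on the coverage argument, here is a minimal sketch using the classical Poisson (Lander-Waterman style) approximation, in which the probability that a given base is hit by no read is about exp(-c) at c-fold coverage. This is not taken from the paper: the function name and the coverage depths are illustrative assumptions, and the model assumes reads land uniformly at random and ignores repeats, read length, and sequencing bias.

```python
import math


def expected_uncovered_bases(genome_size: int, coverage: float) -> float:
    """Expected number of bases hit by no read at all.

    Idealized Poisson approximation: reads are assumed to fall uniformly
    at random, so a base stays uncovered with probability exp(-coverage).
    Repeats and sequencing bias are deliberately ignored.
    """
    return genome_size * math.exp(-coverage)


# Genome size quoted in the review (Prochlorococcus marinus, 1.84 Mbp).
GENOME_SIZE = 1_840_000

# Hypothetical coverage depths, chosen only for illustration.
for depth in (1, 5, 10, 20):
    uncovered = expected_uncovered_bases(GENOME_SIZE, depth)
    print(f"{depth:>2}x coverage: ~{uncovered:.4g} bases expected uncovered")
```

Under these assumptions, the expected number of uncovered bases in a 1.84 Mbp genome drops below one at 20-fold coverage, so gaps that persist at that depth are more plausibly explained by repeats than by missing reads.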
In summary, the paper is hard to read, not scientific, and its novelty is very weak, if not nonexistent.