I don't understand why Ray was not included in the paper Bioinformatics 27, 2031–2037

In the paper

Lin, Y. et al. Comparative studies of de novo assembly tools for next-generation sequencing technologies. Bioinformatics 27, 2031–2037 (2011).

Ray is not mentioned.

Why is that so?
We have been working on Ray for quite a while, starting with an early prototype of the assembly engine called OpenAssembler (started on 2009-01-21).

We published Ray in 2010 in the Journal of Computational Biology.

We have also presented Ray in a few places.

Cool facts about Ray:

- We are participating in Assemblathon 2.
- We assembled a genome on 512 compute cores in 18 hours from more than 3 000 000 000 Illumina TruSeq paired reads (that is a lot of reads!)
- Ray is a high-performance peer-to-peer assembler
- Can assemble mixtures of technologies
- Is open source, licensed with the GNU GPL
- Ray is free.
- Works on POSIX systems (Linux, Mac, and others) and on Microsoft Windows
- Compiles cleanly with gcc and with Microsoft Visual Studio

- Ray uses the Message Passing Interface (MPI)

- Works well with Open MPI or MPICH2, the two main open source implementations of the MPI standard.

- Ray makes very few assembly errors.

- Ray is a single executable called Ray
- Implemented in C++
- Ray is object-oriented
- Ray is modular
- Ray is scalable

- Ray is easy to use.

- All the code is on GitHub

I think the paper should have compared Ray with the other assemblers...



Torsten Seemann said…
Sebastian, as you know I supported the development of Ray in the early days as I saw the need for a de novo assembler with a more scalable architecture, so I hope you take my comments the right way. Here are my suggestions as to why Ray may not have been included:

1. Ray was published in J.Comp.Bio. This is not a major bioinformatics journal, it is not read as widely, and it has an impact factor of only 1.6 and hence a smaller audience.

2. Ray, due to its (necessary) use of MPI, is much more difficult to get up and running properly. The earlier versions of Ray often deadlocked and didn't have much verbose output to explain what was happening. The truth is, there is so much software out there to test that if something fails the first time, people often don't go back.

3. You have always claimed it works with a mixture of technologies, but I am still not convinced that it works with 454 or Ion Torrent data alone, especially as you increase k.

5. It has not been used in a highly cited journal paper to assemble a major genome. This is where the publicity really kicks in!

If you do well in Assemblathon2, then I expect your publicity will rise, and Ray will be used more and more. You should contact IBM about optimising it for the BlueGene architecture.
