I don't understand why Ray was not included in the paper Bioinformatics 27, 2031–2037

- July 19, 2011

In the paper

Lin, Y. et al. Comparative studies of de novo assembly tools for next-generation sequencing technologies. Bioinformatics 27, 2031–2037 (2011).

Ray is not mentioned.

Why is that?

We have been working on Ray for quite a while, beginning with an early prototype of the assembly engine called OpenAssembler (started on 2009-01-21).

We published Ray in 2010 in the Journal of Computational Biology.

We have also presented Ray at a few places.

Cool facts about Ray:

- We are participating in Assemblathon 2.
- We assembled a genome on 512 compute cores in 18 hours from more than 3,000,000,000 Illumina TruSeq paired reads (that is a lot of reads!)
- Ray is a high-performance peer-to-peer assembler
- Can assemble mixtures of technologies
- Is open source, licensed with the GNU GPL
- Ray is free.
- Works on POSIX systems (Linux, Mac, and others) and on Microsoft Windows
- Compiles cleanly with gcc and with Microsoft Visual Studio

- Ray uses the Message Passing Interface (MPI)
- Works well with Open MPI or MPICH2, the two main open-source implementations of the MPI standard.
- Ray makes very few assembly errors.

- Ray is a single executable called Ray
- Implemented in C++
- Ray is object-oriented
- Ray is modular
- Ray is scalable

- Ray is easy to use.
- All the code is on GitHub

I think the paper should have compared Ray with the other assemblers...

Sébastien

Comments

Torsten Seemann said…

Sébastien, as you know, I supported the development of Ray in the early days because I saw the need for a de novo assembler with a more scalable architecture, so I hope you take my comments the right way. Here are my suggestions as to why Ray may not have been included:

1. Ray was published in J. Comp. Bio. This is not a major bioinformatics journal; it is not read as widely and has an impact factor of only 1.6, hence a smaller audience.

2. Ray, due to its (necessary) use of MPI, is much more difficult to get up and running properly. The earlier versions of Ray often deadlocked and didn't have much verbose output to explain what was happening. The truth is, there is so much software out there to test, that if it fails first time, people often don't go back.

3. You have always claimed it works with a mixture of technologies, but I am still not convinced that it works with 454 or Ion Torrent data alone, especially as you increase k.

5. It has not been used in a highly cited journal paper to assemble a major genome. This is where the publicity really kicks in!

If you do well in Assemblathon2, then I expect your publicity will rise, and Ray will be used more and more. You should contact IBM about optimising it for the BlueGene architecture.

Sunday, July 31, 2011 at 2:58:00 AM EDT
