Posts

Showing posts from August, 2014

Profiling an high-performance actor application for metagenomics

Image
I am currently in an improvement phase where I break, build and improve various components of the system.


The usual way of doing things is to have a static view of one node among all the nodes inside an actor computation. The graphs look like this:

512x16

1024x16


1536x16


2048x16



But with 2048 nodes, the one single selected node may not be an accurate representation of what is going on. This is why, using Thorium profiles, we are generating 3D graphs instead. They look like this:


512x16

1024x16

1536x16

2048x16

The public datasets from the DOE/JGI Great Prairie Soil Metagenome Grand Challenge

I am working on a couple of very large public metagenomics datasets from the Department of Energy (DOE) Joint Genome Institute (JGI). These datasets were produced in the context of the Grand Challenge program.

Professor Janet Jansson was the Principal Investigator for the proposal named Great Prairie Soil Metagenome Grand Challenge ( Proposal ID: 949 ).


Professor C. Titus Brown wrote a blog article about this Grand Challenge.
Moreover, the Brown research group published at least one paper using these Grand Challenge datasets (assembly with digital normalization and partitioning).

Professor James Tiedje presented the Great Challenge at the 2012 Metagenomics Workshop.

Alex Copeland presented interesting work at Sequencing, Finishing and Analysis in the Future (SFAF) in 2012 related to this Grand Challenge.



Jansson's Grand Challenge included 12 projects. Below I made a list with colors (one color for the sample site and one for the type of soil).

Great Pra…

The Thorium actor engine is operational now, we can start to work on actor applications for metagenomics

I have been very busy during the last months. In particular, I completed my doctorate on April 10th, 2014 and we moved from Canada to the United States on April 15th, 2014. I started a new occupation on April 21st, 2014 at Argonne National Laboratory (a U.S. Department of Energy laboratory).

But the biggest change, perhaps, was not one listed in the enumeration above. The biggest change was to stop working on Ray. Ray is built on top of RayPlatform, which in turn uses MPI for the parallelism and distribution. But this approach is not an easy way of devising applications because message passing alone is a very leaky, not self-contained, abstraction. Ray usually works fine, but it has some bugs.

The problem with leaky abstractions is that they lack simplicity and are way too complex to scale out.

For example, it is hard to add new code to an existing code base without breaking anything. This is the case because MPI only offers a fixed number of ranks. Sure, the MPI standard has some fea…