Showing posts from 2011

Who is in charge?

Construction of a building

2012 will begin soon and the Centre de génomique de Québec is still in a sorry state. Renovations here, unbearable noise there. And it was supposed to be delivered (or set free) in 2005. Its major defects already need to be repaired.

According to the Journal de Québec, this project will have cost (at least) $22 million.

The former director of the CHUQ research centre, Jean-Claude Forest, says that the financial impact is being calculated.

But it is (a bit) late for the researchers.

Student information management

Another failure is Université Laval's Capsule information system, a web portal that manages student information (very badly). This project will have cost $26 million.

André Armstrong, the director of the study modernization project, attributes its failure to the lack of student participation in monitoring committees.

I find it hard to understand how a project with such a large budget can be allowed to fail so badly because of seats …

Code review: What happens in Open-MPI's MPI_Iprobe?

Update 1

Subject: Re: [OMPI devel] Implementation of MPI_Iprobe
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-09-27 15:34:05

Your analysis is correct in case the checkpoint/restart approach maintained by ORNL is enabled. This is not the code path of the "normal" MPI processes, where the PML OB1 is used. In this generic case the function mca_pml_ob1_iprobe, defined in the file ompi/mca/pml/ob1/pml_ob1_iprobe.c is used.


End of update 1

The message-passing interface (MPI) standard defines an interface for passing messages between processes.

These processes are not necessarily running on the same physical computer.

Open-MPI is an implementation of the MPI standard.

Here, I used openmpi-1.4.3 to find out what happens when MPI_Iprobe is called.

According to the MPI 2.2 standard


The VirtualProcessor technology in Ray

The VirtualProcessor I developed enables any MPI rank to have thousands of workers working
on different tasks. In reality, only one worker can work at any given moment, but
the VirtualProcessor schedules the workers fairly on the only instruction pipeline
available, so that is not a problem at all.

The VirtualProcessor is a technology that makes thousands of workers compute tasks
in parallel on a single MPI rank. Obviously, only one such worker is active at
any point, but they all get to work.
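The fair scheduling described above can be illustrated with a minimal round-robin sketch. This is not the VirtualProcessor code from Ray; the names `Worker`, `RoundRobinScheduler`, and `runOneSlice` are hypothetical, and each worker is assumed to simply count down a fixed number of work units.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worker: it has a fixed number of work units left to do.
struct Worker {
    int remaining;  // units of work left

    bool done() const { return remaining == 0; }

    // Perform one small slice of work.
    void work() {
        assert(remaining > 0);
        --remaining;
    }
};

// Hypothetical scheduler: only one worker runs at a time, and each
// unfinished worker gets one slice per pass over the worker list.
class RoundRobinScheduler {
public:
    RoundRobinScheduler() : next_(0), slices_(0) {}

    void add(int units) { workers_.push_back(Worker{units}); }

    // Run a single slice on the next unfinished worker.
    // Returns false when every worker is done (or there are none).
    bool runOneSlice() {
        for (std::size_t tried = 0; tried < workers_.size(); ++tried) {
            Worker& w = workers_[next_];
            next_ = (next_ + 1) % workers_.size();
            if (!w.done()) {
                w.work();
                ++slices_;
                return true;
            }
        }
        return false;
    }

    int slices() const { return slices_; }

private:
    std::vector<Worker> workers_;
    std::size_t next_;  // index of the worker to try next
    int slices_;        // total number of slices executed so far
};
```

Calling `runOneSlice()` once per main-loop iteration lets every worker make progress, even though only one of them is active at any point.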

The idea is that when a worker pushes a message on the VirtualCommunicator, it has
to wait for a reply. And this reply may arrive later. The idea of the
VirtualProcessor is to easily submit communication-intensive tasks.

Basically, the VirtualCommunicator groups smaller messages into larger messages
to send fewer messages on the physical network.

But to achieve that, an easy way of generating a lot of small messages is
needed. That is what the VirtualProcessor is for.
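The grouping of small messages into larger ones can be sketched as follows. This is only an illustration under stated assumptions, not the VirtualCommunicator from Ray: `MessageAggregator`, `push`, `flush`, and the element-count `capacity` are hypothetical names, payloads are plain integers, and a vector stands in for the physical network.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical aggregator: buffers small payloads per destination rank and
// emits one larger message whenever a buffer reaches `capacity` elements.
class MessageAggregator {
public:
    struct Message {
        int destination;
        std::vector<int> payload;
    };

    explicit MessageAggregator(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Queue one small element for `destination`; flush when the buffer is full.
    void push(int destination, int element) {
        std::vector<int>& buffer = buffers_[destination];
        buffer.push_back(element);
        if (buffer.size() >= capacity_) flush(destination);
    }

    // Emit whatever is buffered for `destination` as a single message.
    void flush(int destination) {
        std::map<int, std::vector<int> >::iterator it = buffers_.find(destination);
        if (it == buffers_.end() || it->second.empty()) return;
        sent_.push_back(Message{destination, it->second});
        it->second.clear();
    }

    // Emit all partially filled buffers, for instance when no work remains.
    void flushAll() {
        for (std::map<int, std::vector<int> >::iterator it = buffers_.begin();
             it != buffers_.end(); ++it) {
            flush(it->first);
        }
    }

    const std::vector<Message>& sent() const { return sent_; }

private:
    std::size_t capacity_;
    std::map<int, std::vector<int> > buffers_;
    std::vector<Message> sent_;  // stands in for the physical network
};
```

With a capacity of 4, ten elements pushed to the same destination leave as three messages (two full ones, plus one partial message after `flushAll()`) instead of ten.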

== Implementation ==


Understand the main loop of message-passing-interface software

In video games, the main loop usually looks like this:

    while(running){
        processInput();
        updateGameState();
        drawScreen();
    }


Message-passing-interface (MPI) software can be designed in a similar fashion.

Each rank of an MPI program has its own message inbox and its own message outbox.

As with email, received messages go in the inbox and sent messages go in the outbox.

The main loop of an MPI rank usually looks like this:
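A minimal sketch of such a loop, by analogy with the game loop, is given here. It does not call MPI at all: an in-memory `Network` queue stands in for the interconnect so that the example is self-contained. The step names (`receiveMessages`, `processMessages`, `processData`, `sendMessages`) and the toy reply rule are assumptions made for illustration. In a real program the receive step would typically poll with MPI_Iprobe and the send step would use a call such as MPI_Isend.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical message: an integer payload with a source and a destination rank.
struct Message {
    int source;
    int destination;
    int value;
};

// Hypothetical in-memory "network" shared by all simulated ranks.
typedef std::deque<Message> Network;

class SimulatedRank {
public:
    explicit SimulatedRank(int id) : id_(id), received_(0), ticks_(0) {}

    // Put a message in the outbox; it leaves on the next sendMessages().
    void post(int destination, int value) {
        assert(destination != id_);
        outbox_.push_back(Message{id_, destination, value});
    }

    // One iteration of the main loop.
    void step(Network& network) {
        receiveMessages(network);
        processMessages();
        processData();
        sendMessages(network);
    }

    int received() const { return received_; }
    int ticks() const { return ticks_; }

private:
    // Move every message addressed to this rank from the network to the inbox.
    void receiveMessages(Network& network) {
        for (Network::iterator it = network.begin(); it != network.end();) {
            if (it->destination == id_) {
                inbox_.push_back(*it);
                it = network.erase(it);
            } else {
                ++it;
            }
        }
    }

    // Toy rule: reply to the sender with value - 1 while the value is positive.
    void processMessages() {
        for (const Message& m : inbox_) {
            ++received_;
            if (m.value > 0) outbox_.push_back(Message{id_, m.source, m.value - 1});
        }
        inbox_.clear();
    }

    // Placeholder for the rank's own computation.
    void processData() { ++ticks_; }

    // Move every message from the outbox to the network.
    void sendMessages(Network& network) {
        for (const Message& m : outbox_) network.push_back(m);
        outbox_.clear();
    }

    int id_;
    int received_;  // number of messages processed so far
    int ticks_;     // number of main-loop iterations so far
    std::vector<Message> inbox_;
    std::vector<Message> outbox_;
};
```

Each call to `step()` plays the role of one pass through `while(running){ ... }`: nothing blocks, so a rank keeps computing while messages are in transit.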

I don't understand why Ray was not included in the paper Bioinformatics 27, 2031–2037

In the paper

Lin, Y. et al. Comparative studies of de novo assembly tools for next-generation sequencing technologies. Bioinformatics 27, 2031–2037 (2011).

Ray is not mentioned.

Why is that so?
We have been working on Ray for quite a while, with an early prototype of the assembly engine called OpenAssembler (started on 2009-01-21).

We published Ray in 2010 in Journal of Computational Biology.

We have also presented Ray at a few places.

Cool facts about Ray:

- We are participating in Assemblathon 2.
- We assembled a genome on 512 compute cores in 18 hours from more than 3 000 000 000 Illumina TruSeq paired reads (that is a lot of reads!)
- Ray is a high-performance peer-to-peer assembler
- Can assemble mixtures of technologies
- Is open source, licensed under the GNU GPL
- Ray is free.
- Works on POSIX systems (Linux, Mac, and others) and on Microsoft Windows
- Compiles cleanly with gcc and with Microsoft Visual Studio
- Ray utilises the message-passing interface
- Works well with …

More on virtual communication with the message-passing interface.

The message-passing interface (MPI) is a standard that allows numerous computers to communicate, enabling large-scale peer-to-peer communication in a high-performance manner.

OK, let's face it, writing computer programs with MPI can be hard and tedious. However, one can devise a set of techniques that collectively enhance one's software craftsmanship.

I write C++ (1998) using MPI -- you already know that if you read my blog. In Ray, I implemented a few tricks to make the message-passing thing a lot easier.

Slave modes and master modes

First, each MPI rank (a rank is usually mapped to a process, if that matters) has a slave mode and a master mode. A slave mode can be called SLAVE_MODE_COUNT_ENTRIES, which would obviously count entries. Master modes follow the same principle. If an MPI rank is in a state such that it must wait for others, then there is this very special slave mode called SLAVE_MODE_DO_NOTHING.

Each of these modes has an associated callback method. SLAVE_…
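The idea of one callback per mode can be sketched with a table of member-function pointers. The two mode names come from the text above; everything else (the `Machine` class, the `NUMBER_OF_SLAVE_MODES` sentinel, the `addPending` helper, and the behaviour of the counting callback) is a hypothetical illustration and not Ray's actual code.

```cpp
#include <cassert>

// Mode names are taken from the text; the enum layout is an assumption.
enum SlaveMode {
    SLAVE_MODE_DO_NOTHING,
    SLAVE_MODE_COUNT_ENTRIES,
    NUMBER_OF_SLAVE_MODES  // hypothetical sentinel used to size the table
};

class Machine {
public:
    Machine() : mode_(SLAVE_MODE_DO_NOTHING), entries_(0), pending_(0) {
        // Associate each slave mode with its callback method.
        handlers_[SLAVE_MODE_DO_NOTHING] = &Machine::doNothing;
        handlers_[SLAVE_MODE_COUNT_ENTRIES] = &Machine::countEntries;
    }

    void setMode(SlaveMode mode) {
        assert(mode < NUMBER_OF_SLAVE_MODES);
        mode_ = mode;
    }

    // Hypothetical helper: register n entries that still have to be counted.
    void addPending(int n) { pending_ += n; }

    // Called once per main-loop iteration: runs the callback of the current mode.
    void runSlaveMode() { (this->*handlers_[mode_])(); }

    SlaveMode mode() const { return mode_; }
    int entries() const { return entries_; }

private:
    typedef void (Machine::*Handler)();

    // Waiting for others: nothing to do in this iteration.
    void doNothing() {}

    // Count one entry per call, then fall back to waiting when finished.
    void countEntries() {
        if (pending_ > 0) {
            --pending_;
            ++entries_;
        } else {
            mode_ = SLAVE_MODE_DO_NOTHING;
        }
    }

    Handler handlers_[NUMBER_OF_SLAVE_MODES];
    SlaveMode mode_;
    int entries_;  // entries counted so far
    int pending_;  // entries still to count
};
```

Because each callback does only a small amount of work per call, the rank returns to its main loop quickly and can keep receiving and sending messages between calls.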

de novo assembly of Illumina CEO genome in 11.5 h

THE initial aims of our group regarding de novo assembly of genomes were:

1. To assemble genomes using mixes of sequencing technologies simultaneously.
2. To assemble large repeat-rich genomes.
3. To devise novel approaches to deal with repeats.

1. To assemble genomes using mixes of sequencing technologies simultaneously.

We showed the feasibility of using mixes of sequencing technologies simultaneously using Ray -- a de novo genome assembler -- see Journal of Computational Biology 17(11): 1519-1533. Ray follows the Single Program, Multiple Data approach. It is implemented using the Message-Passing Interface.

With this method, a computation is separated into data-dependent parts. Each part is given to a processor and any processor communicates with others to access remote information.
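One common way to split data across processors in this style is to let a hash of each item decide which rank owns it, so that any rank can compute where remote information lives. The sketch below assumes this hashing scheme for illustration only; the text does not say how Ray assigns data to ranks, and `hashKey` and `ownerOf` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>
#include <string>

// FNV-1a hash: deterministic, so every rank computes the same value for a key.
inline uint64_t hashKey(const std::string& key) {
    uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < key.size(); ++i) {
        h ^= static_cast<unsigned char>(key[i]);
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Rank that owns `key` when the data is spread over `numberOfRanks` ranks.
inline int ownerOf(const std::string& key, int numberOfRanks) {
    assert(numberOfRanks > 0);
    return static_cast<int>(hashKey(key) % static_cast<uint64_t>(numberOfRanks));
}
```

A rank that needs information about a key it does not own would send a message to `ownerOf(key, numberOfRanks)` and wait for the reply.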

2. To assemble large repeat-rich genomes.

It was thought that message transit in Ray would interfere strongly with the feasibility of genome assembly of large repeat-rich genomes…

The Virtual Communicator

IT WAS a wintry day in January, in a coldly-tempered land. On this island lived peculiar citizens whose main everyday occupation involved producing goods and getting back some credits for the hard work. Arnie was one of them. He was cultivating vegetables, in a greenhouse in the cold season, obviously. He would prepare and send a shipment whenever his plums were ready. But most often his daily production, which departed his ranch daily, did not fill all of the available space provided by the Raw Communicator. The Raw Communicator is the entity whose personhood solely involves delivering a worker's goods to the markets, and bringing back the gained assets to the farmer. And because he was so raw, the Raw Communicator would not even bother visiting other folks after visiting Arnie. He would go directly to the markets. Afterwards, he would visit Arnie's first neighbour, return to the markets, and so on.

Arnie, although being just a potent grower of greens,…