2013-09-26

Paperwork and learning about actors

Hello dear readers:

Over the last month or so, I have been writing my doctoral thesis (now at version 2.0.0!). I edited it to implement the changes my director requested, and it is now on his desk. I hope he likes what I wrote. I look forward to my initial deposit.

Aside from that, I formally applied for a postdoctoral appointment at Argonne National Laboratory (requisition # 321236 MCS). Indeed, I will start a postdoctoral appointment there next year. I am also applying for my own funding, and I am submitting applications to these four programs (deadlines are in parentheses):

  • Fonds de recherche du Québec, Postdoctoral research scholarship (B3) (2013-10-02),
  • Argonne National Laboratory Named Postdoctoral Fellowships (2013-10-15),
  • Banting Postdoctoral Fellowships (2013-10-23),
  • Canadian Institutes of Health Research Fellowship Awards (2013-11-15).

I made a page with the deadlines because I like lists and backlogs. As a matter of fact, I think I would be a good manager with all the management skills I have developed.



Furthermore, my director delegated to me the task of preparing his application for the 2014 Compute Canada Resource Allocation call for proposals, which is due on 2013-10-16. How wonderful!

I now have access to Titan (a Cray XK7). I will soon have access to Mira (an IBM Blue Gene/Q) too! My tests on Titan are advancing well. I have read a lot of papers lately to prepare my proposals adequately. I have learnt about the actor model, which was invented by Carl Hewitt, Peter Bishop, and Richard Steiger in 1973. Carl Hewitt explained actors in a video on Channel 9. In the end, the 1986 Ph.D. thesis of Gul Agha is probably the definitive source of information for the actor model. Another good read is the 2003 Ph.D. thesis of Joe Armstrong, who designed Erlang -- a concurrent functional language with actors. I bought the book Learn You Some Erlang for Great Good! and I also wrote my first program in Erlang!

I programmed a little bit lately, less than I would have liked. I made a prototype framework that uses MPI and actors in C++. I was inspired by Padraig O Conbhui. However, in his implementation, the main loop in director.h iterates over every live actor. From what I understand, an actor can only react when it receives a message. It cannot do anything otherwise. When receiving a message, an actor can (1) send messages, (2) spawn actors, or (3) change its behavior for the next message. I also added a -debug option to Ray, and fixed an issue (reported by two members of the Ray community) in the buggy Bloom filter. Finally, today I wrote a roadmap to modify Ray so that it uses the actor model.
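
To make the three rules concrete, here is a minimal single-process sketch in Python. It is not my C++/MPI prototype and it is not code from Ray: the ActorSystem class and the greeter, printer, and silent behaviors are hypothetical names made up for illustration. The point it tries to show is that the scheduler loops over pending messages, not over live actors, so an actor runs only when a message addressed to it is delivered.

```python
from collections import deque


class ActorSystem:
    """Toy scheduler (hypothetical): it only delivers queued messages."""

    def __init__(self):
        self.behaviors = {}     # actor address -> current behavior (a callable)
        self.mailbox = deque()  # pending (address, message) pairs
        self.next_address = 0

    def spawn(self, behavior):
        """Create an actor with an initial behavior and return its address."""
        address = self.next_address
        self.next_address += 1
        self.behaviors[address] = behavior
        return address

    def send(self, address, message):
        """Queue a message; nothing runs until it is delivered."""
        self.mailbox.append((address, message))

    def become(self, address, behavior):
        """Replace the behavior used for the actor's next message."""
        self.behaviors[address] = behavior

    def run(self):
        """Deliver messages until none are pending; return how many were delivered."""
        delivered = 0
        while self.mailbox:
            address, message = self.mailbox.popleft()
            self.behaviors[address](self, address, message)
            delivered += 1
        return delivered


log = []


def greeter(system, me, message):
    child = system.spawn(printer)                   # (2) spawn an actor
    system.send(child, "hello from actor %d" % me)  # (1) send a message
    system.become(me, silent)                       # (3) change behavior


def printer(system, me, message):
    log.append(message)


def silent(system, me, message):
    log.append("actor %d ignores %r" % (me, message))


system = ActorSystem()
root = system.spawn(greeter)
system.send(root, "start")
system.send(root, "again")
delivered = system.run()
print(delivered, log)
```

The second message sent to the root actor is handled by the silent behavior, because the actor changed its behavior while processing the first one. A real implementation would give each actor its own mailbox and distribute actors across MPI ranks; the single shared queue here is only a simplification.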

But from now (2013-09-26) to Halloween (2013-10-31), I will basically postpone any task that is not related to paperwork (planning, writing, and editing), as I already have too much paperwork on my plate.

I will end this post with a somewhat funny fact. We have been working with Cray Inc. to test and run Ray on their Cray XE6 product. We published some results over the course of this project. Today, I found (I did not know it existed before today!) an Application Brief about Ray on the Cray XE6 that says "The Ray development team is led by Jacques Corbeil, a full professor in the molecular medicine department at the Faculty of Medicine at Université Laval." From my point of view, I am the lead of the development team, with my 2499 commits. But hey, I don't blame anyone for this sentence. I just think that it is poorly worded.

Sébastien

2013-09-06

Debugging an MPI application is sometimes like finding a needle in a haystack

I am running some tests as usual before releasing a new version of Ray. This time, I will be releasing Ray 2.3.0.

My job 10446216 on colosse (colossus in English) -- the supercomputer of Laval University operated by Calcul Québec (Compute Quebec in English) -- failed, and I did not know why.

All the jobs that run on colosse are automatically profiled by a formidable array of tools. And the nice thing is that I don't need to do anything fancy to get these runtime profiles.

All I need is my job identifier. With my job identifier, I then go to https://www.clumeq.ca/users/common/report/myjobs/10446216/0/#

The runtime report that Calcul Québec generated automatically for me is shown below. The job was a 512-core job, running on 64 8-core machines.
Just from the first figure, one can see that one of the machines crashed (the blue line). Such an event may be caused by a bug in the software I used (in this case, a development version of Ray).

Basically, for each machine, I get 4 metrics sampled throughout the computation:


  • CPU usage, 
  • Memory usage, 
  • Input/Output usage, and 
  • Communication.

We can see that something strange happened to r101-n57 (# 19).

Usage status overview

Job status and usage overview

Information

Job ID: 10446216
Task ID: 0
Project identifier (RAPI): nne-790-ac
Number of cores: 512
Queue: med
Wallclock: 11h 19m 24s
Submit time: 2013-09-04 16:06:50
Start time: 2013-09-04 20:36:48
End time: 2013-09-05 07:56:12
Submit script:
#PBS -W x=ENVREQUESTED:TRUE
#PBS -q med
#PBS -S /bin/bash
#PBS -N HiSeq-2500-NA12878-demo-2x150-1
#PBS -o HiSeq-2500-NA12878-demo-2x150-1.stdout
#PBS -e HiSeq-2500-NA12878-demo-2x150-1.stderr
#PBS -A nne-790-ac
#PBS -l walltime=02:00:00:00

#PBS -l nodes=64:ppn=8
 
cd "$PBS_O_WORKDIR"

module use /rap/nne-790-ab/modulefiles
module load nne-790-ab/Ray/2.3.0-devel-b3e6b07764f71318408de5fbe632a41ae29c2105-1


mpiexec -n 512 \
Ray -k 31 \
-o HiSeq-2500-NA12878-demo-2x150-1 \
-read-write-checkpoints HiSeq-2500-NA12878-demo-2x150.SavedState \
-route-messages \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
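
The report describes a 512-core job on 64 8-core machines, and the submit script requests nodes=64:ppn=8 while launching mpiexec -n 512. A quick way to check that the two agree is sketched below in Python. This is only an illustration: the helper names requested_cores and launched_ranks are hypothetical, and the script string contains just the relevant lines copied from the submit script above.

```python
import re

# Relevant lines copied from the submit script; the rest is omitted.
script = """\
#PBS -l nodes=64:ppn=8
mpiexec -n 512 \\
Ray -k 31 \\
"""


def requested_cores(text):
    """Return nodes * ppn from a '#PBS -l nodes=N:ppn=P' directive."""
    match = re.search(r"^#PBS\s+-l\s+nodes=(\d+):ppn=(\d+)", text, re.MULTILINE)
    if match is None:
        raise ValueError("no nodes/ppn directive found")
    nodes, ppn = map(int, match.groups())
    return nodes * ppn


def launched_ranks(text):
    """Return the rank count given to 'mpiexec -n'."""
    match = re.search(r"^mpiexec\s+-n\s+(\d+)", text, re.MULTILINE)
    if match is None:
        raise ValueError("no mpiexec -n option found")
    return int(match.group(1))


cores = requested_cores(script)
ranks = launched_ranks(script)
print(cores, ranks, cores == ranks)
```

With the values from this job, both numbers come out to 512, so each core gets exactly one MPI rank.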





2013-09-01

My future postdoctoral appointment

Next year, I will begin a postdoctoral appointment at Argonne National Laboratory under the leadership of Professor Rick Stevens.

Argonne National Laboratory has three educational programs for postdoctoral fellows:
  • Argonne Named Postdoctoral Fellowships
  • Director's Postdoctoral Fellowships
  • Division Postdoctoral Appointments

I suppose I will have a Division Postdoctoral Appointment.

According to Glassdoor, the average salary is $66,458!

This will be an upgrade from my current $30,000 / yr scholarship from the Canadian Institutes of Health Research.

Regardless, I will apply for these postdoctoral fellowships:



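Program: Fellowships
Organization: Canadian Institutes of Health Research
Address: https://www.researchnet-recherchenet.ca/rnr16/vwOpprtntyDtls.do?prog=1907
Annual stipend ($ CAD): 40 000 (+5 000 if outside Canada)
Annual research allowance ($ CAD): 5 000
Maximum duration (years): 3
Deadline: 2013-11-15

Program: Louis-Berlinguet Postdoctoral Research Scholarships in Genomics (B6)
Organization: Fonds de recherche du Québec
Address:
Annual stipend ($ CAD): 50 000
Annual research allowance ($ CAD): 0
Maximum duration (years): 2
Deadline: 2013-10-05 (for 2011, to be confirmed)

Program: Banting Postdoctoral Fellowships
Organization: Canadian Institutes of Health Research
Address: http://banting.fellowships-bourses.gc.ca/home-accueil-eng.html
Annual stipend ($ CAD): 70 000
Annual research allowance ($ CAD): 0
Maximum duration (years): 2
Deadline: 2013-10-23

Program: Postdoctoral Research Scholarships (B3)
Organization: Fonds de recherche du Québec
Address: http://www.fqrnt.gouv.qc.ca/bourses/Fiches_programmes/index_B3.htm
Annual stipend ($ CAD): 30 000
Annual research allowance ($ CAD): 0
Maximum duration (years): 2
Deadline: 2013-10-02 16:00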


I am looking forward to working at Argonne National Laboratory.

I expect that I will be programming in C++ and MPI to create highly scalable applications that compute solutions to major societal problems.

I already have my email address at Argonne:

  Sébastien Boisvert

