Posts

Profiling a high-performance actor application for metagenomics

I am currently in an improvement phase where I break, build and improve various components of the system. The usual approach is to look at a static view of one node among all the nodes inside an actor computation. The graphs look like this (figures: 512x16, 1024x16, 1536x16, 2048x16). But with 2048 nodes, a single selected node may not accurately represent what is going on. This is why, using Thorium profiles, we are generating 3D graphs instead. They look like this (figures: 512x16, 1024x16, 1536x16, 2048x16).

The public datasets from the DOE/JGI Great Prairie Soil Metagenome Grand Challenge

I am working on a couple of very large public metagenomics datasets from the Department of Energy (DOE) Joint Genome Institute (JGI). These datasets were produced in the context of the Grand Challenge program. Professor Janet Jansson was the Principal Investigator for the proposal named Great Prairie Soil Metagenome Grand Challenge (Proposal ID: 949). Professor C. Titus Brown wrote a blog article about this Grand Challenge. Moreover, the Brown research group published at least one paper using these Grand Challenge datasets (assembly with digital normalization and partitioning). Professor James Tiedje presented the Grand Challenge at the 2012 Metagenomics Workshop. Alex Copeland presented interesting work related to this Grand Challenge at Sequencing, Finishing and Analysis in the Future (SFAF) in 2012. Jansson's Grand Challenge included 12 projects. Below I made a list with colors (one color for the sample site and one for t...

The Thorium actor engine is now operational; we can start to work on actor applications for metagenomics

I have been very busy during the last few months. In particular, I completed my doctorate on April 10th, 2014 and we moved from Canada to the United States on April 15th, 2014. I started a new position on April 21st, 2014 at Argonne National Laboratory (a U.S. Department of Energy laboratory). But the biggest change, perhaps, is not among those listed above. The biggest change was to stop working on Ray. Ray is built on top of RayPlatform, which in turn uses MPI for parallelism and distribution. But this approach is not an easy way of devising applications because message passing alone is a very leaky abstraction that is not self-contained. Ray usually works fine, but it has some bugs. The problem with leaky abstractions is that they lack simplicity and are way too complex to scale out. For example, it is hard to add new code to an existing code base without breaking anything. This is the case because MPI only offers a fixed number of ranks. Sure, the MPI standard has s...

Is it required to use different priorities in a high-performance actor system?

I was reading a log file from an actor computation. In particular, I was looking at the outcome of a k-mer counting computation performed with Argonnite, which runs on top of Thorium. Argonnite is an application in the BIOSAL project and Thorium is the engine of the BIOSAL project (which means that all BIOSAL applications run on top of Thorium). In BIOSAL, everything is an actor or a message, and these are handled by the Thorium engine. Thorium is a distributed engine. A computation with Thorium is distributed across BIOSAL runtime nodes. Each node has one pacing thread and a group of worker threads (for example, with 32 threads, you get 1 pacing thread and 31 workers). Each worker is responsible for a subset of the actors that live inside a given BIOSAL node. Obviously, you want each worker to have its own actors to keep every worker busy. Each worker has a scheduling queue with 4 priorities: max, high, normal, and low (these are the priorities used by the Erlang ERTS called BE...
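The four-level scheduling queue described above can be sketched in Python as follows. This is a minimal illustration, not Thorium's C implementation: the class name `SchedulingQueue`, its methods, and the strict highest-priority-first policy are all assumptions made for the example. The excerpt only states that each worker has a queue with the priorities max, high, normal, and low.

```python
from collections import deque

# Priority names taken from the post, ordered from most to least urgent.
PRIORITIES = ("max", "high", "normal", "low")


class SchedulingQueue:
    """Hypothetical per-worker queue with one FIFO per priority level."""

    def __init__(self):
        self._queues = {priority: deque() for priority in PRIORITIES}

    def enqueue(self, actor, priority="normal"):
        """Add an actor to the FIFO of the given priority."""
        if priority not in self._queues:
            raise ValueError("unknown priority: %r" % (priority,))
        self._queues[priority].append(actor)

    def dequeue(self):
        """Return the next actor, or None if every FIFO is empty.

        Assumption: strict ordering, so a lower level is only served
        when all higher levels are empty. A real engine may choose a
        different policy, for example to avoid starving low priorities.
        """
        for priority in PRIORITIES:
            if self._queues[priority]:
                return self._queues[priority].popleft()
        return None

    def __len__(self):
        return sum(len(queue) for queue in self._queues.values())
```

With this sketch, an actor enqueued with `"max"` is returned before any `"normal"` actor that was enqueued earlier, and actors of the same priority leave the queue in arrival order.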

The actor model of computation

This video features Professor Carl Hewitt. Hewitt is the original creator of the concept of actors; he created it in 1973 with Bishop and Steiger. Everything is in this paper, and it is 11 pages long. If you have little time, you can read only the bottom of page 12 of Gul Agha's PhD thesis. http://dspace.mit.edu/bitstream/handle/1721.1/6952/AITR-844.pdf page 12, section "2.1.3 Actors" If you have more time: Baker was a PhD student with Hewitt. He and Hewitt wrote the "laws of actors" in 1977 (see below). Agha was also a PhD student with Hewitt and he reformulated the concepts in his thesis. After that, Joe Armstrong (at Ericsson) advanced the subject (he also did his PhD on the topic). Anyway, below are the 4 most important pieces of work in the literature (according to me). Hewitt, Bishop, Steiger 1973 Hewitt, C., Bishop, P. & Steiger, R. A universal modular ACTOR formalism for artificial intelligence. In Proceedings of the 3...

Early prototype of our work with actors for genomics

Our work on biosal (bsal) is advancing well. Here is an example of what you can do easily with the actor model. This actor machine emulator contains 4 biosal nodes, each with 24 threads (23 worker threads and 1 communication thread). The actor machine executes actors, and the behavior of an actor is specified with scripts (in C 1999). The command I used on the supercomputer to launch the app:

    # 4 physical computers, with a fast interconnect
    # 4 bsal nodes
    # 24 threads per bsal node
    #       -d depth       Depth of each processor (number of threads)
    #       -N pes         PEs per node
    #       -n width       Number of processors needed
    aprun -n 4 -N 1 -d 24 biosal/example_controller -threads-per-node 24 datasets/Iowa_Continuous_Corn/*

This is the standard output. This actor computat...
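The thread arithmetic behind the launch described above can be checked with a small Python sketch. The function name `machine_layout` and the returned field names are hypothetical. The only fact taken from the post is that each node reserves 1 communication thread and uses the remaining threads as workers.

```python
def machine_layout(nodes, threads_per_node):
    """Summarize threads and workers for an actor machine (illustrative).

    Assumes, as in the post, that every node keeps exactly 1 thread for
    communication and runs workers on all the other threads.
    """
    if nodes < 1:
        raise ValueError("at least 1 node is needed")
    if threads_per_node < 2:
        raise ValueError("a node needs 1 communication thread and 1 worker")
    workers_per_node = threads_per_node - 1
    return {
        "nodes": nodes,
        "total_threads": nodes * threads_per_node,
        "workers_per_node": workers_per_node,
        "total_workers": nodes * workers_per_node,
    }


# Values from the launch above: 4 nodes, 24 threads per node.
layout = machine_layout(nodes=4, threads_per_node=24)
print(layout["workers_per_node"], layout["total_workers"])  # prints "23 92"
```

For the 4-node, 24-thread configuration this gives 96 threads in total, of which 92 are workers available to run actors.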