A list of interesting papers on optical mapping

At GYDLE inc., I work on optical mapping problems, and sometimes I find solutions to these problems. To tackle efficiently any problem, the error model for the problem must be understood. To write software that deals well with a given type of data, this type of data has to be understood, and one must be able to separate the good data from the bad data.

Once in a while, I read scientific papers to keep myself up to date with the optical mapping technologies. I created a Mendeley group where I maintain a list of papers on optical mapping. The list is available here. Some of these papers describe error models for optical mapping.


The Bioinformatics Adventure continues.

According to  the social network LinkedIn, I have been working on optical mapping problems at Gydle Inc. (with Philippe and Marc-Alexandre) for 8 months so far. I previously worked at Argonne National Laboratory (with Fangfang and Rick and other people).

The Bioinformatics Adventure continues. Yes, I am still doing bioinformatics. No, I no longer work on assemblers.

I have not worked on any genome assemblers for a while. As of now, I am more an aligner person than an assembler person (see "Sense from sequence reads: methods for alignment and assembly").

At Gydle Inc, we are 3 employees (the CEO, a computer scientist, and myself). Our CEO worked at Illumina at some point. Working in a small company is very fun. We get to do a lot of different things:

  • algorithm design,
  • software design,
  • software development,
  • software quality control,
  • test development,
  • ticket management,
  • data analysis,
  • data curation.

I schedule my time around activities related to the development of optical mapping software -- mainly an aligner and tools to parse / analyze the resulting alignments. We use g++ and clang++, CMake, and Qt. Right now, we are using the option -std=c++11 (short for C++ 2011).

Here is a list of Gydle optical tools.

  • gydle-optical-aligner
  • gydle-optical-alignment-aligner
  • gydle-optical-alignment-viewer
  • gydle-optical-asset-viewer
  • gydle-optical-checker
  • gydle-optical-extender
  • gydle-optical-filter
  • gydle-optical-joiner
  • gydle-optical-recaller
  • gydle-optical-splitter
  • gydle-optical-tester
  • gydle-optical-tiler
  • gydle-optical-validator
  • gydle-optical-xmap-comparator

Our main tool is gydle-optical-aligner. It is multi-threaded, and aligns sequences to references. The sequences are typically optical molecules and the references are optical maps.

Here are some examples of optical alignments (with the format of BioNano Genomics, and the format of Gydle).

SequenceName: Wheat_7AS_mol_FC1_00000170

XMAP alignment:

271 Wheat_7AS_mol_FC1_00000170 Wheat_7AS_map_00415 7715.8 143567.6 677658.2 812605.7 + 13.60 11M1D3M 156833.3 898716.1 1 (73,2)(74,3)(75,4)(76,5)(77,6)(78,7)(79,8)(80,9)(81,10)(82,11)(83,12)(85,13)(86,14)(87,15)

ALI alignments (1)

### GammaScore: 0.53
Wheat_7AS_map_00415 898256 96 Wheat_7AS_mol_FC1_00000170 156833 16 T 1 + 11M1D1G2g2M1e 0 134960 13266 0.66 11.36 16.36 73 677230 87 812189 2 7715 16 143567

==> The reference is the same.

Everyday, we need to fix the behavior of our software. This is achieved using a test suite with many test cases which in turn contain many assertions. So far, this approach is paying off because we rarely have regressions. However, there is of course a cost (human time) to maintain a test suite. I found the article Good Coding Practices written by Nicholas Nethercote very interesting. In it, the author introduces the concept of envelope of known behaviour. For the Gydle Inc optical mapping business case, our envelope of known behavior for our optical aligner mostly consists in the capability of aligning things to other things, within a range of scaling factors.

Even with its very small size, Gydle Inc. does have a culture. The tenets at Gydle Inc are not numerous, but they are important. I tabulated a list that I find to be true.

  1. The reference is always right, but the reference can evolve according to newly obtained evidence.
  2. Everything must have a name: sequences, assets, maps, references, contigs, and so on. We use these name to better communicate and describe what is going on.
  3. We always test something on a whole dataset and not on just a sample.
  4. The paradigms implemented in our aligners use several phases: indexing, searching, baby-HSP (high-scoring segment pair) generation, HSP sorting, HSP binning, HSP optimization, filtering.
  5. It is generally better to combine different data types to alleviate their respective weaknesses.
The number 2. is important. If you take, for example, the PDF (printable document format) specifications of the BNX, CMAP, and XMAP formats from the next-generation mapping (NGM) company BioNano Genomics, all the names of molecules, maps, or whatnot, are stored as integers (the type int). So it follows that you have to say things like:

"Hey Joe, how are you ? Can you align the molecule "1" on the map "1" and report back to me if you are getting anything good at all. So, say the mapping project used many flow cells, then you are possibly stuck with many optical assets with the same exact name. Yeah, you could have 3 molecules with the name "1". Adding a namespace, like "FC1:1", "FC2:1", and "FC3:1" promptly solves the problem, but this is not possible directly because, as stated above, the BNX, CMAP, and XMAP format specify that integers are to be used for the names.

If you look at Illumina sequences, they also have names that are not integers.

The Bioinformatics Adventure continues.


The Spark Notebook from creator Kate Matsudaira

I received my Spark Notebook. What is the Spark Notebook, you may ask.
The Spark Notebook combines form and function. The Spark Notebook project raised funding on Kickstarter too.

The way I see it, the Spark Notebook is an agenda (a 6-month agenda) with additional features. These features include (from the guide) the yearly planning pages, accomplishments, the monthly planning pages, the weekly planning pages, the inspiration pages, and the note pages. The monthly planning pages include something called the 30-day challenge. According to creator Kate Matsudaira, the 30-day challenge feature is useful to help start (or break) a habit.

The Spark Notebook comes with a guide. On this website, some of the text is white with a background image which makes it hard to read. For example, consider the screenshot below.

How did I learn about the existence of the Spark Notebook ?

One of my hobbies consists in watching videos on the Amazon Web Services (AWS) YouTube channel. I like these videos because they are usually easy to understand although the subject may be a bit abstract.

In particular, I watch the videos about AWS products. One example of this is the video called Introduction to Amazon S3 which explains in 3 minutes the general idea of the Internet storage service called S3 (Simple Storage Service).

The other types of videos that I like on the AWS YouTube channel are those about satisfied AWS customers. The customer experience videos are usually animated by AWS Chief Evangelist Jeff Barr.

One of the videos I watched was Kate Matsudaira, CTO of Decide.com. I like the customer experience videos because they usually mention what the customer is doing with AWS (value proposition / business model) and also the AWS building blocks that the customer is using to make things happen are also listed.

So, in that video, I got to know more about Decide.com. Then, I continued my adventure. I searched for Decide.com to try their offering. However, I was not able to do so because Decide.com was acquired in 2013.

The next logical step was to look for the next accomplishment of Kate Matsudaira because I was not able to experiment with Decide.com because it had been acquired. That's when I found popforms.

The company popforms provides "bite-size career development
for the modern leader" through online courses. These courses are called sparks. I found this interesting, because I think self-reflection is important for improving who we want to be.

This company (popforms) was also acquired by a bigger fish (Safari Books Online).

At that point, I was thinking that this entrepreneur probably has a secret sauce, and that perhaps she shared it in the form of a book. This is how I discovered the Spark Notebook.

Also, when I discussed the Spark Notebook concept with my significant other, she said that she has been reading Kate Matsudaira's blog for years. So, Kate Matsudaira is an entrepreneur, technologist, creator, and also a role model.

I think that the Spark Notebook is a great product.

Product: Spark Notebook
Price: $US 28.00
Purchase links: Manufacturer or Amazon
Score: 9/10

- innovative form, impressive function
- compact size
- self-contained / self-explanatory
- online guide at http://www.thesparknotebook.com/guide
- expensive for a 6-month agenda
- only 6 months


The convergence of HPC and cloud computing

An exascale computer is an imaginary system that can sustain one exaflops (10^18 floating point operations per second.) Such an object is needed in science and engineering, mostly for simulating virtual versions of objects found in the real world, such as proteins, planes, and cities. Important requirements for such a computer are 1) memory bandwidth, 2) floating point operation throughput, 3) low network latency, and so on.

2 of the many challenges for possibly having exascale supercomputers by 2020 are 1) improving fault-tolerance and 2) lowering energy consumption. (see "No Exascale for You!" An Interview with Berkeley Lab's Horst Simon).

One typical solution to implement fault tolerance in HPC is the use of the checkpoint/restart cycle whereas in most cloud technologies fault tolerance is instead implemented using different principles/abstractions such as load balancing and replication (see the CAP theorem). The checkpoint/restart can not work at the exa scale because there will almost always be a failing component at this scale. So, an exascale computation would need to survive such failures. In that regard, Facebook is a very large system that is fault-tolerant and that is based on cloud technologies rather than HPC.

The fact that fault tolerance has been figured out for a while now in cloud technologies allowed the cloud community to solve other important problems. One active area of development in cloud computing in 2015 has been without a doubt that of orchestration and provisioning. HPC is still making progress on solving the fault-tolerance problem in the HPC context.


A significant body of research output is coming endlessly from UC Berkeley's AMPLab and other research groups and also from Internet companies (Google, Facebook, Amazon, Microsoft, and others). The "cloud stack" (see all the Apache projects, like Spark, Mesos, ZooKeeper, Cassandra) is covering a significant part of today's market needs (datacenter abstraction, distributed databases, map-reduce abstractions at scale). What I mean here is that anyone can get started very quickly with all these off-the-shelf components, typically using high levels of abstractions (such as Spark's Resilient Distributed Datasets or RDD). Further, in addition to having these off-the-shelf building blocks available, they can be deployed very easily in various cloud environments, whereas this is rarely the case in HPC.

One observation that can be made is that HPC always want the highest processing speed, usually on bare metal. This low level of abstraction comes with the convenience that things are built on a very low number of abstractions (typically MPI libraries and job schedulers).

On the other hand, abstractions abound in the cloud world. Things are evolving much faster in the cloud than in HPC. (see "HPC is dying, and MPI is killing it").

But... I need a fast network for my HPC workflow

One thing that is typically associated to HPC and not with the cloud is the concept of having a very fast network. But this fast-network gap is closing, and the cloud is catching on in that regard. Recently, Microsoft added RDMA in Windows Azure. Thus, now the cloud technically offers a low latency (in microseconds) and high bandwidth (40 Gbps). This is no longer an exclusive feature of HPC.

The network is the computer

In the end, as Sun Microsystems's John Cage said, "The Network is the Computer."  The HPC stack is already converging to what is being found in the web/cloud/big data stack (see this piece). There are significant advances in cloud networking too (such as software-defined networks, convenient/automated network provisioning, and performance improvements. So, the prediction that can perhaps be made today is that HPC and cloud will no longer be 2 solitudes in a not-so-distant future. HPC will benefit from the cloud and vice-versa.

What the future hold in this ongoing convergence will be very exciting.


Daniel A. Reed, Jack Dongarra
Exascale Computing and Big Data
Communications of the ACM, Vol. 58 No. 7, Pages 56-68, 10.1145/2699414
This survey paper is very comprehensive and highlights how HPC (called exascale computing even though there is no operational exascale computer as of today) and cloud can meet at the crossroads.

Tiffany Trader
Fastest Supercomputer Runs Ubuntu, OpenStack
HPCwire  May 27, 2014

This article reports on a very large supercomputer that is running OpenStack instead of the classic HPC schedulers (like MOAB, SGE, Cobalt, Maui).

Jonathan Dursi
HPC is dying, and MPI is killing it
R&D computing at scale, 2015-04-03

This piece is a provocative, yet realistic, depiction of the current state of popularity of various hpc and cloud technologies (surveyed using Google Trends).


Profiling the Thorium actor model engine with LTTng UST

Thorium is an actor model engine in C 1999. It uses MPI and Pthreads.

The latency (in Thorium) when sending small messages between actors recently came to my attention.

In this post, LTTng-UST is used to generate actor message delivery paths annotated with time deltas in each step.


I have been working with perf for a while now, but found it only useful mostly for hardware counters.

I typically use the following command to record events with perf. Note that ($thread is the Linux LWP (lightweight process) thread number.

perf record -g \
    -e cache-references,cache-misses,cpu-cycles,ref-cycles,instructions,branch-instructions,branch-misses \
    -t $thread -o $thread.perf.data


 As far as I know, perf can not trace things like message delivery paths in userspace.

Tracing with LTTng-UST

This week, I started to read about tracepoints (perf does support "Tracepoint Events"). In particular, I wanted to use tracepoints to understand some erratic behaviors in Thorium.

LTTng-UST, the Linux Trace Toolkit Next Generation Userspace Tracer, has quite a long name.

A friend of mine (Francis Giraldeau) is a researcher in the field of tracing. He helped me getting started with LTTng-UST. I also got some help from the lttng-dev mailing list.

The data model for defining and storing tracepoint events (called CTF or Common Trace Format) is probably the big difference between LTTng and the other tracers. The LWN papers (part 1 and part 2) about LTTng are very interesting too and they discuss the CTF format.

According to the LTTng documentation (which is great by the way), tracing is done today in 3 easy steps:
  1. Instrument (add tracepoints in the source code);
  2. Trace (run the application while recording tracepoint events);
  3. Investigate (analyze tracepoint data using various techniques).

In the BIOSAL project, we already have (primitive) tracepoints, but the data gathered is basically analyzed (printed or discarded) in real time, which is probably a bad idea in the first place.

Regardless of where the tracepoint data go, we are using the same semantic as the one in LTTng-UST (LTTng was the inspiration). We insert a tracepoint in our code with something like this:

    /* trace message:actor_send events */
    thorium_tracepoint(message, actor_send, message);

Here is a little explanation: message (the first argument) is the provider, actor_send is the event name, and message (the third argument) is the data that we want to submit for tracing.

Adding my first tracepoint

To try LTTng, I added a define (#define thorium_tracepoint tracepoint) to use LTTng's tracepoint().

I also added a tracepoint in thorium_actor_send (actor.c) with the line below.

tracepoint(hello_world, my_first_tracepoint, name * name, "x^2");

I also added a rule for engine/thorium/tracepoints/lttng/hello-tp.o
in the Makefile (option: THORIUM_USE_LTTNG).

I then started the Spate application, and ran lttng list.

[boisvert@bigmem biosal]$ lttng list --userspace
Spawning a session daemon
UST events:

I ran it again since the LTTng daemon was not running.

[boisvert@bigmem biosal]$ lttng list --userspace
UST events:

PID: 23298 - Name: ./applications/spate_metagenome_assembler/spate
      ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      hello_world:my_first_tracepoint (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)

PID: 23300 - Name: ./applications/spate_metagenome_assembler/spate
      ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      hello_world:my_first_tracepoint (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)

PID: 23299 - Name: ./applications/spate_metagenome_assembler/spate
      ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      hello_world:my_first_tracepoint (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)

PID: 23297 - Name: ./applications/spate_metagenome_assembler/spate
      ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      hello_world:my_first_tracepoint (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)

Great. This is very simple to use.

Tracing inside a LTTng session

Now, I am ready to do actual tracing (with the tracepoint "hello_world:my_first_tracepoint"). The first thing to do now is to create a session.

[boisvert@bigmem biosal]$ lttng create my-userspace-session
Session my-userspace-session created.
Traces will be written in /home/boisvert/lttng-traces/my-userspace-session-20141017-154714

Then, I enable all userspace tracepoints for my session and start tracing.

[boisvert@bigmem biosal]$ lttng enable-event --userspace --all
All UST events are enabled in channel channel0

[boisvert@bigmem biosal]$ lttng start
Tracing started for session my-userspace-session

Then I run my application.

[boisvert@bigmem biosal]$ mpiexec -n 4 ./applications/spate_metagenome_assembler/spate -k 51 -threads-per-node 8 ~/dropbox/S.aureus.fasta.gz

Once my application terminates, I stopped tracing and destroyed my session.

[boisvert@bigmem biosal]$ lttng stop
Waiting for data availability.
Tracing stopped for session my-userspace-session
[boisvert@bigmem biosal]$
[boisvert@bigmem biosal]$ lttng destroy
Session my-userspace-session destroyed

Tracepoint data files were written in a directory in my home.

[boisvert@bigmem biosal]$ ls ~/lttng-traces/

There are now 135 tracepoint files (all related to "channel0" for this single LTTng tracepoint session).

[boisvert@bigmem biosal]$ find ~/lttng-traces/my-userspace-session-20141017-154714/|wc -l

So far, I did step 1) Instrument and step 2) Trace. The third and last step is 3) Investigate.

One easy way to inspect LTTng traces is a tool named babeltrace. By default, this tool dumps the events in ASCII text in the standard output.

[boisvert@bigmem ~]$ babeltrace ~/lttng-traces/my-userspace-session-20141017-154714/ | head
[15:50:56.347344203] (+?.?????????) bigmem hello_world:my_first_tracepoint: { cpu_id = 38 }, { my_string_field = "x^2", my_integer_field = -689379607 }
[15:50:56.347520299] (+0.000176096) bigmem hello_world:my_first_tracepoint: { cpu_id = 39 }, { my_string_field = "x^2", my_integer_field = -695379712 }
[15:50:56.347582196] (+0.000061897) bigmem hello_world:my_first_tracepoint: { cpu_id = 38 }, { my_string_field = "x^2", my_integer_field = -691379644 }
[15:50:56.347698898] (+0.000116702) bigmem hello_world:my_first_tracepoint: { cpu_id = 36 }, { my_string_field = "x^2", my_integer_field = -695379712 }
[15:50:56.347765853] (+0.000066955) bigmem hello_world:my_first_tracepoint: { cpu_id = 38 }, { my_string_field = "x^2", my_integer_field = -693379679 }
[15:50:56.348014813] (+0.000248960) bigmem hello_world:my_first_tracepoint: { cpu_id = 7 }, { my_string_field = "x^2", my_integer_field = -695379712 }
[15:50:56.348053162] (+0.000038349) bigmem hello_world:my_first_tracepoint: { cpu_id = 38 }, { my_string_field = "x^2", my_integer_field = -683379484 }
[15:50:56.348172916] (+0.000119754) bigmem hello_world:my_first_tracepoint: { cpu_id = 46 }, { my_string_field = "x^2", my_integer_field = -695379712 }
[15:50:56.348228416] (+0.000055500) bigmem hello_world:my_first_tracepoint: { cpu_id = 38 }, { my_string_field = "x^2", my_integer_field = -683379484 }
[15:50:56.348305385] (+0.000076969) bigmem hello_world:my_first_tracepoint: { cpu_id = 46 }, { my_string_field = "x^2", my_integer_field = -695379712 }

Let's create useful tracepoints now.

Concrete tracepoints in Thorium

Below is a list of tracepoints in the delivery path of any message in Thorium.

- thorium_message:actor_send
- thorium_message:worker_send
- thorium_message:worker_send_enqueue
- thorium_message:node_send
- thorium_message:node_send_system
- thorium_message:node_send_dispatch
- thorium_message:node_dispatch_message
- thorium_message:worker_pool_enqueue
- thorium_message:transport_send
- thorium_message:transport_receive
- thorium_message:node_receive
- thorium_message:worker_receive
- thorium_message:actor_receive

Here is a delivery trace for a message delivered within a node (suspicious time delta is shown in red).

[boisvert at bigmem biosal]$ babeltrace  ~/lttng-traces/auto-20141018-104939/ |grep "message_number = 8000," | awk '{print $4" "$2}'|sed 's/: / /g'|sed 's/(//g'|sed 's/)//g'
thorium_message:actor_send +0.000010835
thorium_message:worker_send +0.000000876
thorium_message:worker_enqueue_message +0.000001410
thorium_message:worker_dequeue_message +0.003886256  
thorium_message:worker_pool_dequeue +0.000001434
thorium_message:node_send +0.000001438
thorium_message:node_send_system +0.000001006
thorium_message:node_dispatch_message +0.000001231
thorium_message:worker_pool_enqueue +0.000001827
thorium_message:node_send_dispatch +0.000001100
thorium_message:worker_receive +0.000003944
thorium_message:actor_receive +0.000003028

A message sent between nodes also shows this high delta (4 ms).

[boisvert at bigmem biosal]$ babeltrace  ~/lttng-traces/auto-20141018-104939/ |grep "message_number = 8800," | awk '{print $4" "$2}'|sed 's/: / /g'|sed 's/(//g'|sed 's/)//g'
thorium_message:actor_send +0.004115537
thorium_message:worker_send +0.000001402
thorium_message:worker_enqueue_message +0.000001457
thorium_message:worker_dequeue_message +0.004062639  
thorium_message:worker_pool_dequeue +0.000001536
thorium_message:node_send +0.000000934
thorium_message:node_send_system +0.000001185
thorium_message:transport_send +0.000001039
thorium_message:node_receive +0.000061961
thorium_message:node_dispatch_message +0.000001502
thorium_message:worker_pool_enqueue +0.000002299
thorium_message:worker_receive +0.000001632
thorium_message:actor_receive +0.000003461


According to the LTTng traces, the way messages are dequeued in Thorium is not very efficient.

Edit 2014-10-18: added actual tracepoints
Edit 2014-10-18: fixed format


Profiling an high-performance actor application for metagenomics

I am currently in an improvement phase where I break, build and improve various components of the system.

The usual way of doing things is to have a static view of one node among all the nodes inside an actor computation. The graphs look like this:





But with 2048 nodes, the one single selected node may not be an accurate representation of what is going on. This is why, using Thorium profiles, we are generating 3D graphs instead. They look like this:






The public datasets from the DOE/JGI Great Prairie Soil Metagenome Grand Challenge

I am working on a couple of very large public metagenomics datasets from the Department of Energy (DOE) Joint Genome Institute (JGI). These datasets were produced in the context of the Grand Challenge program.

Professor Janet Jansson was the Principal Investigator for the proposal named Great Prairie Soil Metagenome Grand Challenge ( Proposal ID: 949 ).

Professor C. Titus Brown wrote a blog article about this Grand Challenge.
Moreover, the Brown research group published at least one paper using these Grand Challenge datasets (assembly with digital normalization and partitioning).

Professor James Tiedje presented the Great Challenge at the 2012 Metagenomics Workshop.

Alex Copeland presented interesting work at Sequencing, Finishing and Analysis in the Future (SFAF) in 2012 related to this Grand Challenge.

Jansson's Grand Challenge included 12 projects. Below I made a list with colors (one color for the sample site and one for the type of soil).

  1. Great Prairie Soil Metagenome Grand Challenge: Kansas, Cultivated corn soil metagenome reference core (402463)
  2. Great Prairie Soil Metagenome Grand Challenge: Kansas, Native Prairie metagenome reference core (402464)
  3. Great Prairie Soil Metagenome Grand Challenge: Kansas, Native Prairie metagenome reference core (402464) (I don't know why it's listed twice)
  4. Great Prairie Soil Metagenome Grand Challenge: Kansas soil pyrotag survey (402466)
  5. Great Prairie Soil Metagenome Grand Challenge: Iowa, Continuous corn soil metagenome reference core (402461)
  6. Great Prairie Soil Metagenome Grand Challenge: Iowa, Native Prairie soil metagenome reference core (402462)
  7. Great Prairie Soil Metagenome Grand Challenge: Iowa soil pyrotag survey (402465)
  8. Great Prairie Soil Metagenome Grand Challenge: Wisconsin, Continuous corn soil metagenome reference core (402460)
  9. Great Prairie Soil Metagenome Grand Challenge: Wisconsin, Native Prairie soil metagenome reference core (402459)
  10. Great Prairie Soil Metagenome Grand Challenge: Wisconsin, Restored Prairie soil metagenome reference core (402457)
  11. Great Prairie Soil Metagenome Grand Challenge: Wisconsin, Switchgrass soil metagenome reference core (402458)
  12. Great Prairie Soil Metagenome Grand Challenge: Wisconsin soil pyrotag survey (402456)

I thank the Jansson research group for making these datasets public so that I don't have to look further for large politics-free metagenomics datasets.

Table 1: number of files, reads, and bases in the Grand Challenge datasets. Most of the sequences are paired reads.
File count
Read count
Base count
Iowa_Continuous_Corn_Soil (details)
252 055 601 258196 708 830 076
Iowa_Native_Prairie_Soil (details)253 750 844 486326 986 888 235
Kansas_Cultivated_Corn_Soil (details)302 677 222 281272 276 185 410
Kansas_Native_Prairie_Soil (details)335 126 775 452597 933 511 278
Wisconsin_Continuous_Corn_Soil (details)181 912 865 700192 128 891 088
Wisconsin_Native_Prairie_Soil (details)202 098 317 886211 016 377 208
Wisconsin_Restored_Prairie_Soil (details)6347 778 67052 514 579 170
Wisconsin_Switchgrass_Soil (details)7448 382 76658 323 428 574
Total16418 417 788 4991 907 888 691 039

At Argonne we are using these datasets to develop a next-generation metagenomics assembler named "Spate" built on top of the Thorium actor engine. The word spate means a large number of similar things or events appearing or occurring in quick succession. With the actor model, every single message is an active message. Active messages are very neat and there is a lot of them with the actor model.

Similar posts: