2013-07-22

Easing the programming of distributed systems

One-line summary: control messages are useful

According to the paper entitled "Messaging Design Pattern and Pattern Implementation", the business of sending a message involves four (4) participants: 
  • Message Sender,
  • Message Recipient (Receiver),
  • Messenger, and
  • Message.
And also according to this document, a message can trigger a cascading effect.

In distributed systems such as Dynamo, Cassandra, or other similar key-value stores, the storage system is distributed. And just like Lustre, clients are distributed too.

A lot of use cases of these key-value storage systems is to prepare content for a customer, in general a web page or something like that. Once the content is prepared, it can be sent to the consumer and the process that prepared it can then discard its local copy of whatever was fetched from the key-value store. For the web, the client of these key-value stores (for example, the worker processes (or threads) of a web server) are not tightly coupled together.

In high performance computing, generating results from data assets can take hours, so it can be useful to allow and encourage tightly-coupled communication between the clients of the distributed key-value storage system (to distribute the data). And sometimes the distributed storage system is also bundled inside the clients (using the  so-called interlacement of communication and processing).

In some of these systems, the clients don't talk to each other without using the storage store. But if you go inside the implementation of a key-value store, I suppose that sometimes objects can move around or be copied for redundancy. So the actors inside a key-value distributed store probably discuss together.

In distributed programming, something that is cool is when a process can receive an order from an other process. An example can be the case where

      Process A asks process B to send item "kmers/19123890" to process C.

Here, process A does not want the item "kmers/19123890". Process A is just asking that the object be copied from process B to process C. Such a query implies that any process has the ability to reach out and talk to some other processes, which is not always the case.



Obviously, with virtual processes running on processes, you can have many such commanders giving orders.

In video games, scripting is used a lot so that not every person involved with game development has to be a C++ game engine guru. So this blog post is about an idea, about scripting a genomics storage engine, but with high performance computing in mind.

In Globus Online, the end user is the commander and dispatch commands such as

    Copy file /home/ernest/data/graph.txt from endpoint sebhtml#laptop to endpoint sebhtml#moon.

So such scripting (maybe with the interpreter design pattern) would be useful to reduce the difficulties implied in programming a software with the message passing interface (MPI).

I have not yet figured out the best way to integrate such a feature in RayPlatform­. In Ray, I added the interface CarriageableItem for objects that can be transported. So far, three (3) classes implement this:

Kmer,
GraphPath, and
GraphSarchResult.



Séb

No comments:

There was an error in this gadget