2012-12-04

Showcasing a pre-alpha version of Ray Cloud Browser

Dear genomic enthusiasts,

Explaining the de Bruijn graph, or the de novo assembly process for that matter, to people can be a daunting task. All biologists have a web browser ready to fire up at any time. Furthermore, all modern browsers support HTML5, which makes it possible to build nice, portable user interfaces.

Ray Cloud Browser is a visualizer for genomic data. But unlike classical genome browsers, Ray Cloud Browser is dynamic: objects are animated by a physics engine, and you can move them around yourself if you want to.

The current version is really pre-alpha, but the core features that are hardest to implement are there. The client is written in JavaScript (ECMAScript). The web server is Apache httpd, but any web server that supports CGI will do. The server-side application code is written in C++ 1998 (C++98) and runs under Apache httpd through the standard CGI 1.1 (Common Gateway Interface).

The stateful HTML5 client provides the graph layout engine, the physics engine, the rendering engine, the active-object engine, and the communication engine. All the client code was written from scratch for efficiency; no external JavaScript library is required.
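The post does not describe how the physics engine works internally. Purely as an illustration, here is one step of a classic force-directed layout, written in C++ rather than the client's JavaScript, assuming Coulomb-like repulsion between every pair of objects and Hooke springs along graph edges. `Vertex`, `physicsStep` and all constants are hypothetical, not taken from Ray Cloud Browser.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical 2-D state of one displayed object.
struct Vertex {
    double x, y;    // position
    double vx, vy;  // velocity
};

// One explicit-Euler step of a simple force-directed layout (unit mass).
// Assumed model: all pairs repel (~1/d^2), edges act as springs.
void physicsStep(std::vector<Vertex>& v,
                 const std::vector<std::pair<std::size_t, std::size_t> >& edges,
                 double dt) {
    assert(dt > 0.0);
    const double repulsion = 100.0;   // illustrative constants
    const double springK = 0.1;
    const double restLength = 20.0;
    const double damping = 0.9;

    std::vector<double> fx(v.size(), 0.0), fy(v.size(), 0.0);

    // Pairwise repulsion: O(n^2), acceptable for a small sketch.
    for (std::size_t i = 0; i < v.size(); ++i) {
        for (std::size_t j = i + 1; j < v.size(); ++j) {
            double dx = v[j].x - v[i].x;
            double dy = v[j].y - v[i].y;
            double d2 = dx * dx + dy * dy + 1e-9;  // avoid division by zero
            double d = std::sqrt(d2);
            double f = repulsion / d2;
            fx[i] -= f * dx / d; fy[i] -= f * dy / d;  // push i away from j
            fx[j] += f * dx / d; fy[j] += f * dy / d;  // push j away from i
        }
    }
    // Spring force along each edge, pulling toward restLength.
    for (std::size_t e = 0; e < edges.size(); ++e) {
        std::size_t a = edges[e].first, b = edges[e].second;
        double dx = v[b].x - v[a].x;
        double dy = v[b].y - v[a].y;
        double d = std::sqrt(dx * dx + dy * dy) + 1e-9;
        double f = springK * (d - restLength);  // > 0 when stretched
        fx[a] += f * dx / d; fy[a] += f * dy / d;
        fx[b] -= f * dx / d; fy[b] -= f * dy / d;
    }
    // Integrate velocity then position; damping lets the layout settle.
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i].vx = (v[i].vx + fx[i] * dt) * damping;
        v[i].vy = (v[i].vy + fy[i] * dt) * damping;
        v[i].x += v[i].vx * dt;
        v[i].y += v[i].vy * dt;
    }
}
```

Calling such a step once per animation frame lets the graph relax into a readable shape; the damping factor keeps the motion from oscillating forever.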

The stateless server is written in C++ 1998. It receives HTTP GET queries from clients of the form:

/cgi-bin/RayCloudBrowser.webServer.cgi?tag=RAY_MESSAGE_TAG_GET_KMER_FROM_STORE&object=TCGTCTTCGTCTCGGCCATCGGCGTGACGCT&depth=512

RayCloudBrowser.webServer.cgi is the single executable that needs to be deployed on the web server. The query string (everything after the ?) is passed by the web server to RayCloudBrowser.webServer.cgi in the QUERY_STRING environment variable, as specified by CGI 1.1.

The tag parameter is mandatory. It is a message tag, and it tells the program which action to perform.
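Under CGI 1.1, the web server hands the query string to the executable in the QUERY_STRING environment variable. A minimal C++ sketch of splitting it into parameters and dispatching on the tag could look like this. `parseQueryString`, `handleRequest`, `respondFromEnvironment` and the JSON bodies are hypothetical, not the actual Ray Cloud Browser code.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Split "tag=X&object=ACGT&depth=1" into key/value pairs.
// Percent-decoding is omitted: tags, DNA letters and integers need none.
std::map<std::string, std::string> parseQueryString(const std::string& query) {
    std::map<std::string, std::string> params;
    std::string::size_type start = 0;
    while (start <= query.size()) {
        std::string::size_type end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        std::string pair = query.substr(start, end - start);
        std::string::size_type eq = pair.find('=');
        if (eq != std::string::npos)
            params[pair.substr(0, eq)] = pair.substr(eq + 1);
        start = end + 1;
    }
    return params;
}

// Hypothetical dispatcher: pick an action based on the message tag.
std::string handleRequest(const std::map<std::string, std::string>& params) {
    std::map<std::string, std::string>::const_iterator tag = params.find("tag");
    if (tag == params.end())
        return "{\"error\": \"missing tag\"}";
    if (tag->second == "RAY_MESSAGE_TAG_GET_KMER_FROM_STORE")
        return "{\"tag\": \"" + tag->second + "\"}";  // placeholder body
    return "{\"error\": \"unknown tag\"}";
}

// What a CGI main() would print: a header, a blank line, then the body.
std::string respondFromEnvironment() {
    const char* q = std::getenv("QUERY_STRING");
    std::string body = handleRequest(parseQueryString(q ? q : ""));
    return "Content-type: application/json\r\n\r\n" + body;
}
```

Because each request carries everything the server needs in its query string, the executable can stay stateless and exit after every response.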

For instance,

http://ec2-54-242-197-219.compute-1.amazonaws.com/cgi-bin/RayCloudBrowser.webServer.cgi?tag=RAY_MESSAGE_TAG_GET_KMER_FROM_STORE&object=TCGTCTTCGTCTCGGCCATCGGCGTGACGCT&depth=1

gives you information about the requested object.

Readahead is also implemented; it is enabled by increasing the depth parameter.

For example:

http://ec2-54-242-197-219.compute-1.amazonaws.com/cgi-bin/RayCloudBrowser.webServer.cgi?tag=RAY_MESSAGE_TAG_GET_KMER_FROM_STORE&object=TCGTCTTCGTCTCGGCCATCGGCGTGACGCT&depth=512

returns the objects at a depth of at most 512 from the origin, which is the object given by the object parameter. The maximum depth is 4096, which limits the work done for any single query and helps protect the server against denial of service.
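Readahead with a depth limit amounts to a bounded breadth-first traversal of the de Bruijn graph. The sketch below assumes objects are k-mers, follows only outgoing edges (drop the first base, append A, C, G or T), uses an in-memory std::set as a stand-in for Database.dat, and clamps the depth to 4096. `successors`, `readahead` and `kMaximumDepth` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <utility>
#include <vector>

const int kMaximumDepth = 4096;  // cap stated in the post

// Candidate successors of a k-mer: shift left by one base, append A/C/G/T.
std::vector<std::string> successors(const std::string& kmer) {
    assert(!kmer.empty());
    static const char bases[] = {'A', 'C', 'G', 'T'};
    std::vector<std::string> out;
    for (int i = 0; i < 4; ++i)
        out.push_back(kmer.substr(1) + bases[i]);
    return out;
}

// Collect every stored k-mer reachable from origin in at most depth steps.
std::vector<std::string> readahead(const std::string& origin, int depth,
                                   const std::set<std::string>& store) {
    if (depth < 0) depth = 0;
    if (depth > kMaximumDepth) depth = kMaximumDepth;  // bound work per query
    std::vector<std::string> result;
    if (store.count(origin) == 0) return result;

    std::set<std::string> visited;
    std::deque<std::pair<std::string, int> > queue;
    visited.insert(origin);
    queue.push_back(std::make_pair(origin, 0));
    while (!queue.empty()) {
        std::pair<std::string, int> current = queue.front();
        queue.pop_front();
        result.push_back(current.first);
        if (current.second == depth) continue;  // do not expand past the limit
        std::vector<std::string> next = successors(current.first);
        for (std::size_t i = 0; i < next.size(); ++i) {
            // Keep only k-mers that exist in the store and are new.
            if (store.count(next[i]) && !visited.count(next[i])) {
                visited.insert(next[i]);
                queue.push_back(std::make_pair(next[i], current.second + 1));
            }
        }
    }
    return result;
}
```

With depth=1 only the requested object and its immediate neighbours come back; larger depths trade a bigger response for fewer round trips as the user pans along the graph.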

All communication between the client and the server is done in JSON (JavaScript Object Notation). At any moment, the number of active communication pipes between the client and the server is capped; the default maximum is 8.
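The cap on active pipes lives in the client, which is JavaScript. Only to illustrate the idea in the same language as the other sketches, here is a C++ model of a scheduler that keeps at most 8 requests in flight and queues the rest until a response arrives. `RequestScheduler` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Keeps at most maximumActive requests in flight; the rest wait in a queue.
class RequestScheduler {
public:
    explicit RequestScheduler(std::size_t maximumActive = 8)
        : maximumActive_(maximumActive), active_(0) {}

    // Queue a request URL and start it right away if a pipe is free.
    void submit(const std::string& url) {
        pending_.push_back(url);
        pump();
    }

    // Called when the JSON response for an active request has arrived.
    void complete() {
        assert(active_ > 0 && "complete() called with no active request");
        --active_;
        pump();
    }

    std::size_t active() const { return active_; }
    std::size_t pending() const { return pending_.size(); }

private:
    // Move queued requests into free pipes.
    void pump() {
        while (active_ < maximumActive_ && !pending_.empty()) {
            pending_.pop_front();  // a real client would send the HTTP GET here
            ++active_;
        }
    }

    std::size_t maximumActive_;
    std::size_t active_;
    std::deque<std::string> pending_;
};
```

Bounding the number of pipes keeps a fast-scrolling user from flooding a small server with hundreds of simultaneous CGI processes.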


Now, more about the really sophisticated production server in Amazon EC2 (a free micro instance):

Physical memory (RAM):

[root@ip-10-194-103-146 cgi-bin]# head -n1 /proc/meminfo
MemTotal:         608740 kB

No swap partition:

[root@ip-10-194-103-146 cgi-bin]# cat /proc/swaps
Filename                Type        Size    Used    Priority


The full specification from the cloud provider (Amazon Web Services, LLC) for a micro instance is:

613 MiB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
EBS-Optimized Available: No
API name: t1.micro


Here is what the web server needs. At the moment, there is one data file, plus the CGI executable:

[root@ip-10-194-103-146 cgi-bin]# file *
Database.dat:                  data
RayCloudBrowser.webServer.cgi: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped


[root@ip-10-194-103-146 cgi-bin]# ls -lh
total 2.6G
-rw-r--r-- 1 sebhtml sebhtml 2.6G Dec  2 06:48 Database.dat
-rwxr-xr-x 1 root    root     31K Dec  3 19:21 RayCloudBrowser.webServer.cgi


With some magic, the 613-MiB instance can serve queries using a 2.6-GB data file.
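The post does not say what the magic is. One common technique, assumed here and not confirmed by the post, is to memory-map the data file with POSIX mmap: the kernel then loads only the pages that queries actually touch, and because those pages are clean and file-backed they can be evicted and re-read from EBS even without a swap partition. The `MappedFile` class is hypothetical, and mapping a 2.6-GB file needs a 64-bit process (the executable above is x86-64).

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only memory map of a (possibly huge) file; pages load on demand.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) : data_(0), size_(0) {
        int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = mmap(0, static_cast<std::size_t>(st.st_size),
                           PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) munmap(const_cast<char*>(data_), size_);
    }
    bool ok() const { return data_ != 0; }
    std::size_t size() const { return size_; }
    // Random access to any byte; only the page containing it is read from disk.
    char at(std::size_t offset) const {
        assert(offset < size_);
        return data_[offset];
    }

private:
    MappedFile(const MappedFile&);             // non-copyable (C++98 style)
    MappedFile& operator=(const MappedFile&);
    const char* data_;
    std::size_t size_;
};
```

Resident memory then tracks the working set of recent queries rather than the size of Database.dat.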
