2013-02-22

Big milestone reached for Ray Cloud Browser

It's almost March, and yet another milestone for Ray Cloud Browser was reached this week. The data model of this software is composed of 4 types of objects: a map (a DNA k-mer graph with a name), a section (a group of DNA sequences, which are called regions), a region (a DNA sequence), and a location (a position in a region).

Ray Cloud Browser is a distributed application: some parts run in your browser, and others run in the cloud (or your other favorite place to host your infrastructure). The client is in JavaScript and HTML5 and runs in a web browser. The web service is in C++ and runs atop a web server.

The web service is implemented in C++ and is really efficient. There are 3 binary file formats (each with an ASCII version that can be converted). The first is the map, which contains all the k-mers of a sample, their coverage, their parents and their children. Any k-mer can be obtained in logarithmic time using the C++ API of this file format. The second file format is the region file format. It allows the retrieval of parts of any region in constant time (each operation is constant time, so fetching N locations of a section obviously performs O(N) operations). The last format implements annotations. Annotations allow a reverse search: with annotations, it's easy and fast to get a list of locations (map, section, region, location) for any k-mer. This is necessary to have a rich user experience in the HTML5 client, where several regions are to be rendered in the user interface.
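
A logarithmic-time lookup like the one described for the map is what a binary search over sorted records provides. The Python sketch below is only an illustration of that idea: the real format is binary and is read through the C++ API, and the in-memory list, the record layout, the sample values and the name `find_kmer` are all assumptions made for this example.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for a map file: records sorted by k-mer.
# Each record is (kmer, coverage, parents, children), mirroring the fields
# the map is said to store. The values are made up for illustration.
RECORDS = sorted([
    ("ACGTA", 12, ["T"], ["C"]),
    ("CGTAC", 15, ["A"], ["G", "T"]),
    ("GTACG", 9, ["C"], []),
    ("TACGT", 11, [], ["A"]),
])
KEYS = [record[0] for record in RECORDS]


def find_kmer(kmer):
    """Return the record for kmer, or None, using O(log n) comparisons."""
    index = bisect_left(KEYS, kmer)
    if index < len(KEYS) and KEYS[index] == kmer:
        return RECORDS[index]
    return None


print(find_kmer("CGTAC"))
print(find_kmer("AAAAA"))
```

With fixed-size records stored in sorted order, the same search could run directly on the file contents without loading them into a list first.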

For the end user, the starting point is

http://smart.cloud.raytrek.com:55001/client/

Port 55001 is used only because I am hosting on IBM SmartCloud. Usually, the port is implicit and it's 80.

Below, each HTTP query is followed by the HTTP response. In some cases, I truncated the message body.

The HTTP query for the first communication follows.

GET /client/ HTTP/1.1
Host: smart.cloud.raytrek.com:55001

HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 04:56:58 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Wed, 20 Feb 2013 03:37:53 GMT
ETag: "102a33-9f5-4d61faedea640"
Accept-Ranges: bytes
Content-Length: 2549
Connection: close
Content-Type: text/html; charset=UTF-8







Ray Cloud Browser: interactively skim processed genomics data with energy


(message body is truncated)

This returns an HTML document, and the client then fetches all the required JavaScript files and so on.

The first HTTP query performed by the client returns the list of maps and associated sections for each map.

GET /server/?tag=RAY_MESSAGE_TAG_GET_MAPS HTTP/1.1
Host: smart.cloud.raytrek.com:55001
 
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 04:54:03 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json

{"maps": [
{    "name": "Sample 2-3 2013-02-19-1",
    "sections": [
        { "name": "contigs" } ,
        { "name": "scaffolds" } ,
        { "name": "seeds" } ,
        { "name": "extensions" }
] },
{    "name": "American eel 2013-01-31-8",
    "sections": [
        { "name": "contigs" }
] }
]}
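
To make the shape of this exchange concrete, here is a small Python sketch that builds the query URLs and reads the list of maps. The host and the tag names come from the transcripts in this post; the helper name `build_query` and the abridged response string are assumptions for illustration, and no network call is made.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://smart.cloud.raytrek.com:55001/server/"


def build_query(tag, **parameters):
    """Build the URL of one API call; the tag parameter comes first."""
    return BASE_URL + "?" + urlencode({"tag": tag, **parameters})


# Abridged copy of the RAY_MESSAGE_TAG_GET_MAPS response shown above.
RESPONSE = """
{"maps": [
{ "name": "Sample 2-3 2013-02-19-1",
  "sections": [ { "name": "contigs" }, { "name": "scaffolds" } ] },
{ "name": "American eel 2013-01-31-8",
  "sections": [ { "name": "contigs" } ] }
]}
"""

maps = json.loads(RESPONSE)["maps"]
for map_index, entry in enumerate(maps):
    # A map is later referred to by its position in this list.
    sections = [section["name"] for section in entry["sections"]]
    print(map_index, entry["name"], sections)

print(build_query("RAY_MESSAGE_TAG_GET_MAPS"))
print(build_query("RAY_MESSAGE_TAG_GET_MAP_INFORMATION", map=0))
```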



The next query fetches information about a particular map.

GET /server/?tag=RAY_MESSAGE_TAG_GET_MAP_INFORMATION&map=0 HTTP/1.1
Host: smart.cloud.raytrek.com:55001

HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:00:36 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json

{
"map": 0,
"kmerLength": 61,
"entries": 177593546
}

The next query lists the regions of a section.


GET /server/?tag=RAY_MESSAGE_TAG_GET_REGIONS&map=0&section=0&first=0&readahead=4096 HTTP/1.1
Host: smart.cloud.raytrek.com:55001

Date: Fri, 22 Feb 2013 05:03:23 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json

{ "map": 0,
"section": 0,
"count": 31701,
"first": 0,
"readahead": 4096,
"regions": [
{"name":"contig-256000092 485463 nucleotides", "nucleotides":485463},
{"name":"contig-207000075 447363 nucleotides", "nucleotides":447363},
{"name":"contig-255000091 320321 nucleotides", "nucleotides":320321},
{"name":"contig-17 290352 nucleotides", "nucleotides":290352},
{"name":"contig-80 255554 nucleotides", "nucleotides":255554},
{"name":"contig-269000011 233955 nucleotides", "nucleotides":233955},
{"name":"contig-5 207507 nucleotides", "nucleotides":207507},
{"name":"contig-253000001 203979 nucleotides", "nucleotides":203979},
{"name":"contig-24 176868 nucleotides", "nucleotides":176868},
{"name":"contig-51 139462 nucleotides", "nucleotides":139462},
{"name":"contig-79 134613 nucleotides", "nucleotides":134613},
{"name":"contig-93 132985 nucleotides", "nucleotides":132985},
{"name":"contig-105 125302 nucleotides", "nucleotides":125302},


(message body is truncated)
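
The response reports 31701 regions in total and echoes the `first` and `readahead` values of the query. Assuming that `first` is the index of the first region returned and that `readahead` is the maximum number of regions per response (the post does not spell this out), a client could cover the whole section with a few requests. The function name `page_starts` is hypothetical.

```python
def page_starts(count, readahead):
    """Return the values of `first` needed to cover `count` regions."""
    return list(range(0, count, readahead))


# Values taken from the response above.
starts = page_starts(31701, 4096)
print(len(starts), starts[-1])
```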

The client can then ask for a bunch of k-mers for a given region.

GET /server/?tag=RAY_MESSAGE_TAG_GET_REGION_KMER_AT_LOCATION&map=0&section=0&region=4&location=2000&readahead=512 HTTP/1.1
Host: smart.cloud.raytrek.com:55001

HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:05:29 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json

{
"map": 0,
"section": 0,
"region": 4,
"kmerLength": 61,
"location": 2000,
"name":"contig-80 255554 nucleotides",
"nucleotides":255554,
"readahead": 512,
"vertices": [
{"position":1744,"value":"CCGGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTG"},
{"position":1745,"value":"CGGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGC"},
{"position":1746,"value":"GGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCA"},
{"position":1747,"value":"GTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCAT"},
{"position":1748,"value":"TCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATG"},
{"position":1749,"value":"CAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGA"},
{"position":1750,"value":"AAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAA"},
{"position":1751,"value":"AACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAG"},
{"position":1752,"value":"ACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGC"},
{"position":1753,"value":"CGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCG"},
{"position":1754,"value":"GTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGT"},
{"position":1755,"value":"TACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTT"},
{"position":1756,"value":"ACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTTA"},
{"position":1757,"value":"CATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTTAT"},
{"position":1758,"value":"ATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTTATC"},


(message body is truncated)
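
In the vertices above, each k-mer starts one nucleotide after the previous one, so two neighbours share 60 of their 61 symbols. The Python sketch below uses the first three vertices of the response to rebuild the underlying stretch of the region. The function name `assemble` is made up for this example, and the code is an illustration rather than what the client actually does.

```python
# First three vertices of the response above.
VERTICES = [
    {"position": 1744, "value": "CCGGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTG"},
    {"position": 1745, "value": "CGGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGC"},
    {"position": 1746, "value": "GGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCA"},
]


def assemble(vertices):
    """Merge k-mers at consecutive positions into one sequence."""
    ordered = sorted(vertices, key=lambda vertex: vertex["position"])
    sequence = ordered[0]["value"]
    for previous, current in zip(ordered, ordered[1:]):
        if current["position"] != previous["position"] + 1:
            raise ValueError("positions are not consecutive")
        if current["value"][:-1] != previous["value"][1:]:
            raise ValueError("k-mers do not overlap by k-1 symbols")
        # Only the last symbol of each following k-mer is new.
        sequence += current["value"][-1]
    return sequence


sequence = assemble(VERTICES)
print(len(sequence))
```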

The last two queries in the HTTP API of Ray Cloud Browser allow the client to get the attributes of a k-mer and to get the annotations of a k-mer.

GET /server/?tag=RAY_MESSAGE_TAG_GET_KMER_FROM_STORE&map=0&object=CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC&depth=512 HTTP/1.1
Host: smart.cloud.raytrek.com:55001

HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:07:59 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json

{
"map": 0,
"object": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
"vertices": [
{
        "value": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
        "coverage": 144,
        "parents": ["G"],
        "children": ["G"]
},
{
        "value": "GCGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATAT",
        "coverage": 155,
        "parents": ["C", "T"],
        "children": ["A", "C"]
},


(message body is truncated)
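
The parents and children are given as single nucleotides. Assuming the usual k-mer graph convention (a parent symbol is prepended after dropping the last symbol, a child symbol is appended after dropping the first one), the neighbouring k-mers can be rebuilt as in the Python sketch below. This convention is an assumption about the API, although it is consistent with the two vertices shown above. The function names are hypothetical.

```python
# The two vertices shown in the response above.
FIRST = {
    "value": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
    "coverage": 144,
    "parents": ["G"],
    "children": ["G"],
}
SECOND = {
    "value": "GCGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATAT",
    "coverage": 155,
    "parents": ["C", "T"],
    "children": ["A", "C"],
}


def parent_kmers(vertex):
    """Prepend each parent symbol and drop the last symbol."""
    return [symbol + vertex["value"][:-1] for symbol in vertex["parents"]]


def child_kmers(vertex):
    """Drop the first symbol and append each child symbol."""
    return [vertex["value"][1:] + symbol for symbol in vertex["children"]]


# The only parent of the first vertex is the second vertex, and the
# first vertex is one of the two children of the second vertex.
print(parent_kmers(FIRST) == [SECOND["value"]])
print(FIRST["value"] in child_kmers(SECOND))
```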

GET /server/?tag=RAY_MESSAGE_TAG_GET_OBJECT_ANNOTATIONS&map=0&object=CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC HTTP/1.1
Host: smart.cloud.raytrek.com:55001

HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:10:08 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json

{
"results": [
{ "object": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
"annotations": [
{ "type": "LocationAnnotation", "section": 0,  "region": 4,  "location": 2000 }
]
}]}
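
This response is the reverse search in action: from a k-mer, the client gets back places where it occurs. A minimal Python sketch of turning the response into (map, section, region, location) tuples follows. The function name `locations` is hypothetical, and the map index is passed in by the caller because the response shown above does not repeat it.

```python
import json

# Copy of the response shown above.
RESPONSE = """
{
"results": [
{ "object": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
"annotations": [
{ "type": "LocationAnnotation", "section": 0,  "region": 4,  "location": 2000 }
]
}]}
"""


def locations(response_text, map_index):
    """Collect (map, section, region, location) tuples from a response."""
    found = []
    for result in json.loads(response_text)["results"]:
        for annotation in result["annotations"]:
            if annotation["type"] == "LocationAnnotation":
                found.append((map_index, annotation["section"],
                              annotation["region"], annotation["location"]))
    return found


print(locations(RESPONSE, 0))
```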



The data inside the web service are currently added and managed with RayCloudBrowser-client -- a command-line client that uses the Ray Cloud Browser C++ API. The available commands are:

RayCloudBrowser-client add-map
RayCloudBrowser-client add-section
RayCloudBrowser-client create-map
RayCloudBrowser-client create-map-annotations-with-section
RayCloudBrowser-client create-section
RayCloudBrowser-client describe-configuration
RayCloudBrowser-client describe-json-file
RayCloudBrowser-client describe-map
RayCloudBrowser-client describe-map-annotations
RayCloudBrowser-client describe-map-object
RayCloudBrowser-client describe-map-object-annotations
RayCloudBrowser-client describe-map-objects
RayCloudBrowser-client describe-map-with-region
RayCloudBrowser-client describe-section


Running any of these commands without arguments will give you a help page.


I think this visualization project is exciting. Eventually, the command-line client for managing a deployment will be totally replaced by new actions available in the endpoint of the web service, like pushing new maps or new sections.

A really cool feature for the long-term vision is a web action in the HTTP API of Ray Cloud Browser that allows end users to push their FASTQ sequences directly into the cloud.

Something that I am really proud of with the HTTP API of Ray Cloud Browser is that it completely abstracts how the objects are actually stored by the web service.

For instance, a RAY_MESSAGE_TAG_GET_MAP_INFORMATION query with map=0 just tells the endpoint that the request is for map number 0 in the list of maps returned by RAY_MESSAGE_TAG_GET_MAPS.

Right now, the storage engine uses memory-mapped files with O_RDONLY for open(), and PROT_READ and MAP_SHARED for mmap().
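
For readers who have not used memory-mapped files, here is a minimal Python sketch of the same combination of flags. It is not the C++ storage engine: the temporary file and its contents are stand-ins invented for this example, and the `flags` and `prot` arguments of Python's `mmap` module are only available on Unix-like systems.

```python
import mmap
import os
import tempfile

# Create a small throwaway file to stand in for a data file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"ACGTACGTACGT")
    path = handle.name

descriptor = os.open(path, os.O_RDONLY)
try:
    # Length 0 maps the whole file; the mapping is read-only and shared.
    view = mmap.mmap(descriptor, 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
    try:
        # Random access by offset, without an explicit read() call.
        chunk = view[4:8]
    finally:
        view.close()
finally:
    os.close(descriptor)
    os.remove(path)

print(chunk)
```

Because the mapping is read-only, several processes can map the same file and let the operating system share the cached pages between them.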
