Showing posts from February, 2013

Introducing genome subway maps

It's no secret that data visualization is more appealing than bare tables of floating-point numbers and integers. Visualization can also be dynamic and responsive, if designed correctly. In November 2012, I started to work on a pet project called Ray Cloud Browser. From the name, you might guess that it's something to browse stuff related to astronomy: rays and clouds. In fact, that's untrue. Ray Cloud Browser is a data browser that can run in the cloud, an abstraction for virtualized hardware that you pay for by the hour. Ray is just the brand name of the products I am working on during my doctoral projects. Ray Cloud Browser is open source and free software. It's all on GitHub with nice documentation and all that. Anyway, enough with the chitchat. The first picture I want to share is this view that illustrates repeated regions in a genome. It looks very much like a subway map, hence the title of this post. You can visit this subway location by yourself here. The dem

Building a client for visualizing graphs, in a browser

A graph has a set of vertices and a set of edges. An edge is a relationship between two vertices. If you take Facebook, the vertices are people and the edges are friendships. If you take two people on Facebook, they are probably connected by just a few links, as in pretty much every discrete system known to mankind. A path (like the path between two people) is the second class of interesting objects for visualizing a given system, the first class being the graphs. With graphs and paths, it is possible to describe numerous discrete systems. In Ray Cloud Browser, a graph visualizer for genomics, vocabulary terms were carefully selected. The 4 main object types are maps, sections, regions, and locations. A map is a graph in genomics. The vertices of a map are DNA sequences (like GATTACA), and edges are direct neighbourhood relationships (such as GATTACA -> ATTACAG). A section is really just a bunch of paths in the graph. The paths in the map are
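
The neighbourhood relationship described above (GATTACA -> ATTACAG) can be sketched in a few lines of Python. This is only an illustration, not code from Ray Cloud Browser: the function names `children`, `parents`, and `build_map` are hypothetical, and the sketch assumes that two sequences of the same length are neighbours when one is obtained from the other by dropping the first symbol and appending a new one.

```python
def children(kmer, alphabet="ACGT"):
    """Candidate successors of kmer: drop the first symbol, append one symbol."""
    return [kmer[1:] + symbol for symbol in alphabet]


def parents(kmer, alphabet="ACGT"):
    """Candidate predecessors of kmer: prepend one symbol, drop the last symbol."""
    return [symbol + kmer[:-1] for symbol in alphabet]


def build_map(sequence, k):
    """Build a tiny map: each observed k-mer points to the k-mers that follow it.

    Hypothetical helper for illustration; a real map is built from many
    sequences and also stores coverage.
    """
    edges = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        edges.setdefault(kmer, set())
        if i + k < len(sequence):
            # The next k-mer overlaps the current one by k - 1 symbols.
            edges[kmer].add(sequence[i + 1:i + k + 1])
    return edges


if __name__ == "__main__":
    print("ATTACAG" in children("GATTACA"))
    print(build_map("GATTACAG", 7))
```

With this representation, a path in the map is simply a list of vertices in which each consecutive pair is joined by an edge.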

Using Cost Allocation Report on Amazon Web Services (AWS)

AWS offers web services like compute instances. Lately, I have been using one cc2.8xlarge instance for 3 hours on a weekly basis to give training sessions. My 14 students connect to a canonical name that points to my AWS instance during every training session. The instance has one additional 300 GiB EBS volume attached to it so that my students keep their data for the whole duration of the training program. On AWS, I can tag anything I use: EC2 instances, EC2 EBS volumes, S3 buckets, and so on. A tag is a key and a value (key=value), for example Project=Ray-Cloud-Browser-public-demo. On AWS, it's possible to activate a feature called Cost Allocation Report. This feature deposits detailed usage reports in one S3 bucket that I own. These reports include costs. I tagged my Cost Allocation Report S3 bucket with Project=Billing to get a grasp on how much it costs to use the Cost Allocation Report feature. The cost of getting my Cost Allocation Report reports is only $

Big milestone reached for Ray Cloud Browser

It's almost March, and yet another milestone for Ray Cloud Browser was successfully reached this week. The data model of this software is composed of 4 types of objects: a map (a DNA k-mer graph with a name), a section (a group of DNA sequences, which are called regions), a region (a DNA sequence), and a location (a position in a region). Ray Cloud Browser is a distributed application: some parts run in your browser, and other parts run in the cloud (or your other favorite place to host your infrastructure). The client is written in JavaScript and HTML5 and runs in a web browser. The web service is implemented in C++, runs atop a web server, and is really efficient. There are 3 binary file formats (each with an ASCII version that can be converted). The first is the map, which contains all the k-mers of a sample, their coverage, their parents and their children. Any k-mer can be obtained in logarithmic time using the C++ API of this file format. The second
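
One common way to get a logarithmic-time lookup is a binary search over records kept sorted by k-mer. The Python sketch below illustrates that idea only. The excerpt does not describe the actual layout of the map file or its C++ API, so the sorted order, the `find_kmer` function, and the sample records (coverage, parents, children) are all assumptions made up for illustration.

```python
from bisect import bisect_left


def find_kmer(sorted_kmers, records, query):
    """Return the record for query using binary search, or None if absent.

    sorted_kmers must be in ascending order and records[i] must describe
    sorted_kmers[i]. Each lookup inspects O(log n) entries.
    """
    index = bisect_left(sorted_kmers, query)
    if index < len(sorted_kmers) and sorted_kmers[index] == query:
        return records[index]
    return None


# Made-up sample data; not taken from any real map file.
SORTED_KMERS = ["ACAGGAT", "ATTACAG", "GATTACA"]
RECORDS = [
    {"coverage": 12, "parents": "T", "children": "C"},
    {"coverage": 31, "parents": "G", "children": "G"},
    {"coverage": 29, "parents": "", "children": "G"},
]

if __name__ == "__main__":
    print(find_kmer(SORTED_KMERS, RECORDS, "ATTACAG"))
    print(find_kmer(SORTED_KMERS, RECORDS, "CCCCCCC"))
```

In a binary file with fixed-size records, the same search could presumably be done by seeking to the middle record instead of indexing a list, which avoids loading the whole file into memory.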

Using canonical names for cloud instances

I am using these public cloud services:

Product | Service Provider
Amazon Elastic Compute Cloud (EC2) | Amazon Web Services, Inc. (AWS)
Windows Azure Linux Virtual Machines | Microsoft Corporation
Rackspace Cloud Servers | Rackspace, U.S. Inc.
IBM SmartCloud® | IBM Corporation

My canonical names (Name, Type, Value): six records, all of type CNAME.

Testing a Silver instance on IBM SmartCloud

IBM SmartCloud is free for 90 days. My Silver instance runs Red Hat Enterprise Linux v5.4 and has 4 CPUs, 8 GiB of RAM, and 1060 GiB of disk. I can connect to my free-of-charge instance with the following command:

ssh -i ibmcloud_seb@boisvert.info_rsa

In the documentation, it says:

Open ports
22 is the SSH port for the idcuser account
523 is the DB2 Administration Server port
50001 is the DB2 instance port for the db2inst1 user
55001 is the DB2 Text Search port
60000:60003 are the DPF ports for the FCM protocol
Warning: Every additional port is a potential security risk.

I usually like my port 80 to be open. Another thing that I don't like is that there is no vim (the editor).

[idcuser@vhost0147 conf]$ vim
-bash: vim: command not found

And you cannot install it either.

[idcuser@vhost0147 conf]$ sudo yum install -y vim
Loaded plugins: rhnplugin, security
This system is not registered with R

Computing architecture for buying tickets to the Festival d'été de Québec

Today, buying tickets for the Festival d'été de Québec was painful for customers. From the site it is possible to go to the ticket purchasing site. The site is The site is in Ontario (Brampton), whereas is in the cloud, hosted by Amazon Web Services, Inc., in the city of Ashburn in the United States. The A record in the DNS zone file for infofestival:    31    IN    A Here are the CNAME records in the DNS zone file for 300     IN      CNAME 60 IN A 60 IN A 6

A catalog of IBM Blue Gene/Q errors (for science)

Once in a while, I get to run things on an IBM Blue Gene/Q. And when that happens, some of my jobs invariably crash with random errors. For science, here are some of them.

Update 2013-02-27 with MPI I/O:

This requires fcntl(2) to be implemented. As of 8/25/2011 it is not.
Generic MPICH Message: File locking failed in ADIOI_Set_lock(fd 4,cmd F_SETLKW/E,type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 23.
- If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
- If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.
ADIOI_Set_lock:: Resource deadlock avoided
ADIOI_Set_lock:offset 25625204815, length 6282003
Abort(1) on node 2501 (rank 2501 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2501

SRA056234-Picea-glauca-2013-02-12-1 2013-02-13 04:56:26.627 (F

Ray is a software robot

The quoted text snippets below are from Wikipedia. According to Wikipedia, Ray

"comes in two variants: a manned prototype version (...) and an unmanned, computer-controlled version (...)."

This part basically means that our product can be launched in an interactive terminal (manned) or with a job scheduler using a job description (unmanned).

"RAY differs from previous Metal Gears in that it is not a nuclear launch platform, but instead a weapon of conventional warfare."

Ray is versatile software for conventional workflows, although it can also perform assembly from nuclear DNA (DNA from the nucleus of a cell).

"The Metal Gear RAY is more organic in appearance and in function than previous models."

Ray is appealing not only in its exterior look, but also in its design blueprints and source code.

"Its streamlined shape helps to deflect enemy fire and allows for greater maneuverability both on land and