2013-06-19

A survey of the burgeoning industry of cloud genomics

I like lists. Here is a list of companies in the cloud genomics industry.

Table 1: Companies active in the cloud genomics industry. The list is in no particular order.

Company: GenoSpace, LLC
Products:
- GenoSpace platform
People:
- John Quackenbush (CEO)
Founded: 2011

Company: DNAnexus, Inc.
Products:
- DNAnexus Platform
- DNAnexus Platform SDK (a.k.a. dx-toolkit)
People:
- Serafim Batzoglou (co-founder)
- Andreas Sundquist (co-founder and CTO)
- a bunch of other famous people in the field
Founded: 2009

Company: Seven Bridges Genomics, Inc.
Products:
- IGOR
- IGOR Python SDK
People:
- Igor Bogicevic (CTO)
Founded: 2009

Company: Illumina, Inc.
Products:
- BaseSpace™
- BaseSpace API
People:
- Alex Dickinson (SVP of Cloud Genomics)

Company: Era7
Products:
- MG7
- CG7
People:
- Raquel Tobes (CSO)
Founded: 2004
Link: http://era7.com/



Some opinions:

Era7 and GenoSpace do not really have products that people can launch themselves using a platform. I think they are service companies.


In EasyGenomics, the customer can choose (top right-hand side) the region that is the nearest geographically. Choices are Hong Kong and Shenzhen.




It is strange to have igor as a member of the team of every one of my projects in IGOR (a product of Seven Bridges Genomics, Inc.).


I tried DNAnexus and it’s my favorite so far. I even published an app (Ray).


It is actually fun to start an app in DNAnexus.


Illumina, Inc. should allow other sequencing vendors in their data warehouse.


Illumina, Inc. is mostly a monopoly in the DNA sequencing market. So BaseSpace (if it gains momentum at some point for analyzing data) is presumably an antitrust case (Wikipedia, IMDb). Illumina, Inc. should also focus on native apps to provide a consistent user interface for consumers. Otherwise, I just feel like closing my web browser when BaseSpace redirects me to a third-party vendor.


If I am not mistaken, all these products (except maybe EasyGenomics, I don't know) use Amazon EC2 and/or Amazon S3.

Amazon EC2 (Elastic Compute Cloud) is used to compute stuff and Amazon S3 (Simple Storage Service) to store stuff.

Other similar (but open) projects include Galaxy and GenomeSpace.


Micro-blogging accounts:

http://weibo.com/hibgi (12460 fans)
https://twitter.com/dnanexus (1550 followers)
https://twitter.com/sbgenomics (204 followers)
https://twitter.com/Era7bioinfo (140 followers)
https://twitter.com/genospace (125 followers)
https://twitter.com/basespace (13 followers)


Spot instances that stop (or reboot) on their own: a new feature in AWS?

So I have a spot instance for my visualization project. I am using one t1.micro instance in the spot market.

My instance is i-05cabc6a and its spot request is sir-4e01da35. It has a 64 GB EBS volume attached to it (vol-0eba4a7f) and an 8 GB EBS volume for the operating system (vol-b23d42e9).

According to the EC2 Management Console, its launch time was 2013-06-06 11:00 EDT (314 hours ago). Because this is a spot instance, it cannot be stopped. It can only be terminated, or of course it can keep running.


So today I was showing people my visualization project, but it did not work.

I use http://browser.cloud.raytrek.com/client for the permanent address.

In the DNS:

;; ANSWER SECTION:
browser.cloud.raytrek.com. 14113 IN     CNAME   browser-10.raytrek.com.
browser-10.raytrek.com. 14113   IN      CNAME   ec2-54-226-13-244.compute-1.amazonaws.com.
ec2-54-226-13-244.compute-1.amazonaws.com. 155620 IN A 54.226.13.244
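
The CNAME chain above can also be followed by a script instead of by eye. This is only a minimal sketch: the function name resolve_from_answer is hypothetical, and it assumes the dig answer lines use the five-column layout shown above (name, TTL, class, type, value).

```shell
# Follow a CNAME chain in dig answer lines (read from stdin) and print
# the IPv4 address that the given name finally resolves to.
# $1: the name to look up, with or without the trailing dot.
resolve_from_answer() {
    awk -v name="$1" '
        $4 == "CNAME" { cname[$1] = $5 }   # alias -> canonical name
        $4 == "A"     { addr[$1]  = $5 }   # name  -> IPv4 address
        END {
            # dig prints fully qualified names with a trailing dot
            if (name !~ /\.$/) name = name "."
            hops = 0
            # follow aliases, with a hop limit to guard against loops
            while ((name in cname) && hops < 16) {
                name = cname[name]
                hops++
            }
            if (name in addr) print addr[name]
            else exit 1
        }
    '
}
```

For example, `dig +noall +answer browser.cloud.raytrek.com | resolve_from_answer browser.cloud.raytrek.com` would print 54.226.13.244 for the answer section shown above. The function exits with a non-zero status when the chain does not end in an A record.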

So there was something wrong. I decided from this point to connect to my instance.

I was surprised to see that the uptime of the operating system on my instance was only 1 day and 20 hours. Today is 2013-06-19, and my spot instance started on 2013-06-06.

[web@ip-10-125-9-5 ~]$ w
 17:52:05 up 1 day, 20:08,  1 user,  load average: 0.00, 0.01, 0.05
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
ec2-user pts/0    132.203.106.196  17:45    5.00s  0.03s  0.01s sshd: ec2-user [priv]
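
The comparison between uptime and launch time can be automated. The sketch below is an assumption-laden illustration, not something AWS provides: the function names and the 600-second slack are hypothetical, it relies on Linux's /proc/uptime, and the launch time has to be converted to seconds since the epoch by hand (for example with GNU date).

```shell
# Print the boot time of this machine in seconds since the epoch,
# computed from the first field of /proc/uptime (seconds since boot).
boot_epoch() {
    echo $(( $(date +%s) - $(cut -d. -f1 /proc/uptime) ))
}

# Succeed when the boot time ($1) is later than the launch time ($2)
# by more than a slack in seconds ($3, default 600). A first boot
# happens shortly after launch, so the slack avoids false alarms.
booted_well_after() {
    [ "$1" -gt $(( $2 + ${3:-600} )) ]
}

# Succeed when this machine seems to have rebooted since its launch.
# $1: launch time in seconds since the epoch.
rebooted_since() {
    booted_well_after "$(boot_epoch)" "$1"
}
```

With GNU date, the launch time above (11:00 EDT is UTC-4) can be converted with `date -d "2013-06-06 11:00 -0400" +%s`, and the result passed to rebooted_since on the instance.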
 

My web server was not running either.

[web@ip-10-125-9-5 ~]$ ps aux|grep lighttpd|grep -v grep

This is the second time that this has happened to me; see my Twitter. A Senior Product Manager from Amazon Web Services even contacted me about this in the past. See the message below (I redacted names and addresses):


Hi Sébastien,

I’m the Sr. Product Manager focusing on Amazon EC2 Spot Instances and I noticed your tweet that your Spot instance was stopped instead of terminated. We definitely didn’t introduce a new feature—I’d love to understand what happened. Would you be willing to share some details to help us figure it out (e.g., Spot instance request id)?

Thanks!
############

########### | Amazon Web Services—EC2 | Sr. Product Manager | #############| ###########

In my case, starting my stuff takes less than 5 seconds since all my data and software are in Amazon EBS.

Here are the commands I used to start Ray Cloud Browser (on TCP port 80):

[ec2-user@ip-10-125-9-5 ~]$ sudo su
[root@ip-10-125-9-5 ec2-user]# cd /mnt/
[root@ip-10-125-9-5 mnt]# ls
vol-0a34c87b  x
[root@ip-10-125-9-5 mnt]# mkdir vol-0eba4a7f
[root@ip-10-125-9-5 mnt]# mount /dev/sdf vol-0eba4a7f/
[root@ip-10-125-9-5 mnt]# exit
[ec2-user@ip-10-125-9-5 ~]$ sudo su web
find: failed to restore initial working directory: Permission denied
[web@ip-10-125-9-5 ec2-user]$ cd
[web@ip-10-125-9-5 ~]$ ls
Ray-Cloud-Browser-instance
[web@ip-10-125-9-5 ~]$ rm -rf Ray-Cloud-Browser-instance
[web@ip-10-125-9-5 ~]$ ln -s /mnt/vol-0eba4a7f/
lost+found/       vol-0a34c87b.tar  web/             
[web@ip-10-125-9-5 ~]$ ln -s /mnt/vol-0eba4a7f/web/Ray-Cloud-Browser-instance
[web@ip-10-125-9-5 ~]$ ls
Ray-Cloud-Browser-instance
[web@ip-10-125-9-5 ~]$ cd Ray-Cloud-Browser-instance/
[web@ip-10-125-9-5 Ray-Cloud-Browser-instance]$ ls
lighttpd.conf  logs  pid.txt  Ray-Cloud-Browser  Ray-Cloud-Browser-data
[web@ip-10-125-9-5 Ray-Cloud-Browser-instance]$ exit
[ec2-user@ip-10-125-9-5 ~]$ sudo su
[root@ip-10-125-9-5 ec2-user]# cd /home/web/Ray-Cloud-Browser-instance/
[root@ip-10-125-9-5 Ray-Cloud-Browser-instance]# lighttpd -f lighttpd.conf
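
Since the instance can apparently restart on its own, the manual steps above could be collected in a script that runs as root at boot. The following is a hedged sketch, not a tested deployment: the device, mount point and directory come from the session above, while the function names and the DRY_RUN variable are hypothetical. It assumes the mountpoint and pgrep utilities are installed.

```shell
DEVICE=/dev/sdf
MOUNT_POINT=/mnt/vol-0eba4a7f
INSTANCE_DIR=$MOUNT_POINT/web/Ray-Cloud-Browser-instance

# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The body runs in a subshell so that cd does not affect the caller.
start_ray_cloud_browser() (
    # Mount the data volume unless it is already mounted.
    if ! mountpoint -q "$MOUNT_POINT"; then
        run mkdir -p "$MOUNT_POINT"
        run mount "$DEVICE" "$MOUNT_POINT" || exit 1
    fi
    # Start the web server unless a lighttpd process already exists.
    if ! pgrep -x lighttpd > /dev/null; then
        run cd "$INSTANCE_DIR" || exit 1
        run lighttpd -f lighttpd.conf
    fi
)
```

Calling `DRY_RUN=1 start_ray_cloud_browser` prints the commands without running them, which is a cheap way to check the script before wiring it into the boot sequence. The script uses the directory on the volume directly, so the symbolic link in the home directory of the web user is not needed.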

2013-06-11

Open access doctoral theses on de novo genome assembly


I am presently writing my doctoral thesis.

So far, I have found four open access doctoral theses on de novo genome assembly.


Table 1: Doctoral theses on de novo assembly in the next-generation sequencing era. Links to citeulike entries are provided on dates.


Person: Mark Chaisson
Institution: University of California, San Diego, United States of America

Person: Daniel Robert Zerbino
Institution: Darwin College, University of Cambridge, United Kingdom

Person: Rayan Chikhi
Institution: École normale supérieure de Cachan - Antenne de Bretagne, France

Person: Jared Thomas Simpson
Institution: Queens’ College, University of Cambridge, United Kingdom



