Best Amazon EC2 instance type for Ray Cloud Browser on metagenomes and bacterial genomes


I am using spot instances on Amazon Elastic Compute Cloud (EC2) to deploy a few installations of Ray Cloud Browser. Initially, I opted for m1.small instances because the Ray Cloud Browser web service, which answers a bunch of HTTP GET API calls, was not optimized. Namely, the C++ back-end code was memory-mapping a huge file (~16-20 GiB). This bloated binary file was the index -- the source of information about the huge graph describing a given biological sample. A recent patch packs information into every available bit of the binary file, which reduces the number of blocks by 75% and thereby improves performance.
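
To give an idea of why bit packing shrinks a file, here is a toy sketch in Python. It is not the C++ code or the file layout of Ray Cloud Browser; it only assumes, for illustration, that each nucleotide (A, C, G, T) is stored in 2 bits instead of one 8-bit character. The names pack and unpack are made up for this example.

```python
# Toy illustration of bit packing (hypothetical, not the Ray Cloud Browser format):
# each nucleotide takes 2 bits, so 4 nucleotides fit in one byte.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def pack(sequence: str) -> bytes:
    """Pack a DNA string into bytes, 4 nucleotides per byte."""
    packed = bytearray((len(sequence) + 3) // 4)  # round up to whole bytes
    for i, base in enumerate(sequence):
        # Nucleotide i goes in byte i // 4, at bit offset 2 * (i % 4).
        packed[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(packed)


def unpack(packed: bytes, length: int) -> str:
    """Recover the first `length` nucleotides from packed bytes."""
    return "".join(
        LETTERS[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


sequence = "ACGTACGTACGTACGT"
packed = pack(sequence)
print(len(sequence), "characters ->", len(packed), "bytes")
```

In this toy encoding, 4 nucleotides fit in 1 byte instead of 4, so the packed form is a quarter of the size of the plain text.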

Recently, there were a lot of peaks in the m1.small spot pricing, and I figured out that my use case was all about bursts -- discrete HTTP API calls.

I then looked at the pricing history for the last 3 months, and these pesky peaks seem to be a 2013 thing.

The t1.micro spot instance pricing history shows the same kind of peaks.

With all these recent pricing surges, I decided to provision some standard Amazon Elastic Block Store (EBS) volumes so that my data stays in the cloud when the spot market price exceeds what I want to pay as a customer and my instances get terminated.

Before today, I was using ephemeral volumes (instance store), which are local disks whose content vanishes when the instance is stopped or terminated.
I don't use ephemeral volumes anymore as they are not useful for my use case.

Running one m1.small on-demand instance costs 47.48 $ per month whereas one single 64-GiB EBS standard volume costs 6.40 $ per month (excluding input/output requests). And running one m1.small spot instance costs around 5.11 $ per month whereas running one t1.micro spot instance costs around 2.19 $ per month.

Therefore, running Ray Cloud Browser on one t1.micro spot instance using one 64-GiB EBS volume costs 8.59 $ per month. Basically, it costs almost nothing considering the cost of other things in genomics research, such as sequencing runs on instruments, bioinformatician-hours, developer-hours, and so on.
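
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the monthly figures are the ones quoted in this post, stored as integer cents to avoid rounding surprises, and the names (MONTHLY_CENTS, EBS_STANDARD_64_GIB_CENTS, monthly_total) are made up for the example.

```python
# Monthly prices quoted in this post, in US cents (integers avoid float rounding).
# The dictionary keys and the function name are hypothetical, chosen for this example.
MONTHLY_CENTS = {
    "m1.small on-demand": 4748,
    "m1.small spot": 511,
    "t1.micro spot": 219,
}
EBS_STANDARD_64_GIB_CENTS = 640  # excludes input/output requests


def monthly_total(instance: str, ebs_volumes: int = 1) -> float:
    """Return the monthly cost in dollars for one instance plus its EBS volumes."""
    cents = MONTHLY_CENTS[instance] + ebs_volumes * EBS_STANDARD_64_GIB_CENTS
    return cents / 100


for name in MONTHLY_CENTS:
    print(f"{name} + one 64-GiB EBS volume: {monthly_total(name):.2f} $ per month")
```

With these numbers, the t1.micro spot setup comes out at 8.59 $ per month, against 11.51 $ for one m1.small spot instance and 53.88 $ for one m1.small on-demand instance with the same volume.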

And the software called Ray Cloud Browser costs nothing too -- I authored it and it's free software distributed under the GNU General Public License version 3.

I hope that people everywhere use Ray Cloud Browser to visualize their DNA samples!

The cloud is really a nice thing: it removes barriers. But you have to use it and gain experience in order to lower your costs.
