Cost Effectiveness Analysis (CEA) of running Ray on Amazon EC2

Sample: SRA001125
URL: http://trace.ddbj.nig.ac.jp/DRASearch/submission?acc=SRA001125
DNA reads: 34911784 (2 * 17455892)
Read length (nt): 36
Technology: Illumina Genome Analyzer

API name: m1.large
2 Ray processes
Running time: 05:28:46
Pricing: 0.260 $ / h
Cost: 1.560 $

API name: m3.xlarge
4 Ray processes
Running time: 02:31:34
Pricing: 0.580 $ / h
Cost: 1.740 $

API name: cc2.8xlarge
32 Ray processes
Running time: 00:54:06
Pricing: 2.400 $ / h
Cost: 2.400 $
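
The cost figures above look like they come from billing every started hour in full. That billing rule is my assumption; the post does not state it. Here is a minimal sketch of that arithmetic, where `to_seconds` and `billed_cost` are hypothetical helper names introduced only for illustration:

```python
import math

def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def billed_cost(hms, price_per_hour):
    """Cost in dollars, ASSUMING every started hour is billed in full."""
    hours = math.ceil(to_seconds(hms) / 3600)
    return round(hours * price_per_hour, 3)

# (running time, price in $ / h) as reported above
runs = {
    "m1.large": ("05:28:46", 0.260),
    "m3.xlarge": ("02:31:34", 0.580),
    "cc2.8xlarge": ("00:54:06", 2.400),
}

for name, (running_time, price) in runs.items():
    print(f"{name}: {billed_cost(running_time, price):.3f} $")
```

Under this assumption the sketch gives 1.560 $, 1.740 $ and 2.400 $ for the three instance types.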

Conclusions:

1. You get your results faster if you pay more.

2. For cc2.8xlarge, about 36% (00:19:40) of the running time was spent loading sequences from EBS.
That's a lot!

3. The scalability on this problem is not that good because the
problem size is not very large: going from 2 to 32 Ray processes
(16 times more) only makes the run about 6 times faster.

4. Amazon EC2 is really affordable for de novo assemblies of bacterial genomes. 
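
The scalability and loading-time remarks in the conclusions can be checked with a few lines of arithmetic. This is only a rough sketch: it takes the m1.large run as the baseline, and the helper names `to_seconds` and `scaling` are hypothetical. The three instance types differ in CPU, memory and network, so this is not a clean strong-scaling measurement.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

# (instance, Ray processes, running time) as reported above
runs = [
    ("m1.large", 2, "05:28:46"),
    ("m3.xlarge", 4, "02:31:34"),
    ("cc2.8xlarge", 32, "00:54:06"),
]

# Baseline choice (m1.large) is an assumption made for this sketch.
_, base_procs, base_time = runs[0]
base_seconds = to_seconds(base_time)

def scaling(procs, hms):
    """Return (speedup, efficiency) relative to the m1.large run."""
    speedup = base_seconds / to_seconds(hms)
    efficiency = speedup / (procs / base_procs)
    return speedup, efficiency

for name, procs, hms in runs:
    speedup, efficiency = scaling(procs, hms)
    print(f"{name}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")

# Share of the cc2.8xlarge run spent loading sequences from EBS
load_fraction = to_seconds("00:19:40") / to_seconds("00:54:06")
print(f"cc2.8xlarge loading fraction: {load_fraction:.0%}")
```

With the reported times, cc2.8xlarge comes out at roughly a 6x speedup for 16 times more processes (about 38% efficiency), and loading takes about 36% of its run. The m3.xlarge efficiency comes out slightly above 100%, which presumably reflects faster hardware rather than better scaling.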
 
 
 
If you want to try these tests yourself => http://github.com/sebhtml/Ray-in-Amazon-EC2-CLOUD
