Cost Effectiveness Analysis (CEA) of running Ray on Amazon EC2

Sample: SRA001125
URL: http://trace.ddbj.nig.ac.jp/DRASearch/submission?acc=SRA001125
DNA reads: 34911784 (2 * 17455892)
Read length (nt): 36
Technology: Illumina Genome Analyzer

API name: m1.large
2 Ray processes
Running time: 05:28:46
Pricing: 0.260 $ / h
Cost: 1.560 $

API name: m3.xlarge
4 Ray processes
Running time: 02:31:34
Pricing: 0.580 $ / h
Cost: 1.740 $

API name: cc2.8xlarge
32 Ray processes
Running time: 00:54:06
Pricing: 2.400 $ / h
Cost: 2.400 $
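
The costs above look like they were billed per started hour. That is my reading of the numbers, not something stated in the EC2 output: 05:28:46 rounds up to 6 hours at 0.260 $ / h, which gives 1.560 $, and 00:54:06 rounds up to 1 hour at 2.400 $ / h. Here is a small sketch of that arithmetic. The helper names (parse_hms, billed_cost) are made up for this example.

```python
import math


def parse_hms(text):
    """Convert a HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def billed_cost(running_time, price_per_hour):
    """Cost of one instance, ASSUMING every started hour is billed in full."""
    billed_hours = math.ceil(parse_hms(running_time) / 3600)
    return billed_hours * price_per_hour


# (API name, running time, price in $ / h) taken from the runs above.
runs = [
    ("m1.large", "05:28:46", 0.260),
    ("m3.xlarge", "02:31:34", 0.580),
    ("cc2.8xlarge", "00:54:06", 2.400),
]

for name, running_time, price in runs:
    print(f"{name}: {billed_cost(running_time, price):.3f} $")
```

If your account is billed differently (per second, for instance), replace the math.ceil step with the exact fraction of an hour.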

Conclusions:

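1. You get your results faster if you pay more: cc2.8xlarge finished in
under an hour for 2.400 $, while m1.large took about five and a half
hours for 1.560 $.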

2. For cc2.8xlarge, about 36% of the time (00:19:40 out of 00:54:06) was
spent loading sequences from EBS. That's a lot!

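3. The scalability on this problem is not that good because the
problem size is not very large: going from 2 to 32 Ray processes
(16 times more) only made the run about 6 times faster
(05:28:46 down to 00:54:06).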
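
To put a number on the scalability, here is a rough sketch that computes the speedup and parallel efficiency relative to the 2-process run, using only the running times listed above. Take it with a grain of salt: the three instance types have different CPUs and I/O, so this is not a clean strong-scaling experiment. The function and variable names are made up for this example.

```python
def to_seconds(hms):
    """Convert a HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Number of Ray processes -> running time, from the runs above.
running_times = {2: "05:28:46", 4: "02:31:34", 32: "00:54:06"}

baseline_processes = 2
baseline_seconds = to_seconds(running_times[baseline_processes])

results = {}
for processes, hms in running_times.items():
    # Speedup: how many times faster than the 2-process run.
    speedup = baseline_seconds / to_seconds(hms)
    # Efficiency: speedup divided by the increase in process count.
    efficiency = speedup / (processes / baseline_processes)
    results[processes] = (speedup, efficiency)
    print(f"{processes:2d} processes: speedup {speedup:.2f}, "
          f"efficiency {efficiency:.2f}")
```

An efficiency near 1 means the extra processes are fully used; a value well below 1 for 32 processes is what you would expect when a small dataset and a long loading step dominate the run.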

4. Amazon EC2 is really affordable for de novo assemblies of bacterial genomes. 
 
 
 
If you want to try these tests yourself, the scripts are here: http://github.com/sebhtml/Ray-in-Amazon-EC2-CLOUD
