Ray and large datasets
Ray is a de novo de Bruijn genome assembler that uses the Message Passing Interface (MPI). Sébastien Boisvert reports the memory utilization for the Illumina CEO Jay T. Flatley dataset (SRA010766) with k=19.

The Short Read Archive hosts several large sequencing datasets. Among these are SRA000271 and SRA010766, both of which contain reads from human genomes: the first is from an African male individual (HapMap: NA18507) and the second is from Illumina CEO Jay T. Flatley.

Variations in the genomes of these individuals can be found by mapping the reads onto a reference sequence, a set of files containing the genetic blueprint as determined by the investigators of the Human Genome Project.

But when a reference is not available, de novo assembly of the sequencing reads into contiguous sequences is one of the first steps toward acquiring a high-quality reference sequence.

Unparalleled parallelism

In parallel computation, powerful processors can be combined to form an alliance stronger and fa...