Showing posts from 2010

Ray and large datasets

Ray is a de novo de Bruijn genome assembler that uses the Message Passing Interface (MPI). Sébastien Boisvert reports Ray's memory utilization on the reads of Illumina CEO Jay T. Flatley (SRA010766) with k=19.

The Short Read Archive hosts several large sequencing datasets. Among these are SRA000271 and SRA010766, both sets of reads from human genomes: the first is from an African male individual (HapMap: NA18507) and the second is from Illumina CEO Jay T. Flatley.

Variations in the genomes of these individuals can be identified by mapping the reads onto a reference sequence, that is, a set of files containing the genetic blueprint as determined by the investigators of the Human Genome Project.

But when a reference is not available, de novo assembly of the sequencing reads into contiguous sequences is one of the first steps toward acquiring a high-quality reference sequence.
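
To make the idea of de novo assembly more concrete, here is a minimal single-process sketch of a de Bruijn graph built from reads and walked into one contiguous sequence. It is not Ray's implementation: the function names `build_de_bruijn` and `extend_contig`, the toy reads, and k=4 are all assumptions chosen for illustration, and the sketch ignores reverse complements, sequencing errors, and distribution over MPI.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk from `start` for as long as the path is unambiguous."""
    # Count incoming edges so that the walk stops where paths merge.
    in_degree = defaultdict(int)
    for successors in graph.values():
        for successor in successors:
            in_degree[successor] += 1

    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if in_degree[nxt] != 1 or nxt in visited:
            break  # merge point or cycle: stop extending
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical toy reads that overlap each other (illustration only).
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = extend_contig(graph, "ATG")
print(contig)  # ATGGCGTGCAAT
```

With real data the graph is far too large for a plain dictionary on one machine, which is where a larger k such as 19 and a distributed design come in.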

Unparalleled parallelism

In parallel computation, powerful processors can be combined to form an alliance stronger and faster at analyzing proble…