Showing posts from 2010

Ray and large datasets

Ray is a de novo de Bruijn genome assembler that uses the Message Passing Interface (MPI). Sébastien Boisvert reports Ray's memory utilization on the reads of Illumina CEO Jay T. Flatley (SRA010766) with k=19.

The Short Read Archive hosts several large sequencing datasets. Among these are SRA000271 and SRA010766, both sets of reads from human genomes: the first is from an African male individual (HapMap: NA18507) and the second is from Illumina CEO Jay T. Flatley. Variations in the genomes of these individuals can be found by mapping the reads onto a reference sequence, a set of files containing the genetic blueprint as determined by the investigators of the Human Genome Project. But when a reference is not available, de novo assembly of the sequencing reads into contiguous sequences is one of the first steps toward acquiring a high-quality reference sequence.

Unparalleled parallelism

In parallel computation, powerful processors can be combined to form an alliance stronger and faster at a