The polytope router and human genomes
My test job on a human genome (dataset: HiSeq-2500-NA12878-demo-2x150) just completed on colosse.calculquebec.ca. The dataset is gzip-compressed and takes 145 GiB on disk:
18G HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R1_001.fastq.gz
19G HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R1_002.fastq.gz
19G HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R2_001.fastq.gz
19G HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R2_002.fastq.gz
18G HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R1_001.fastq.gz
18G HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R1_002.fastq.gz
19G HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R2_001.fastq.gz
19G HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R2_002.fastq.gz
145G HiSeq-2500-NA12878-demo-2x150/
It took 46 hours to assemble 1 171 357 300 short sequences into a human genome. The longest step is still "Merging of redundant paths". Here is the running time of each step:
Step | Date | Elapsed time | Since Beginning |
Network testing | 2013-10-03T11:33:31 | 2 seconds | 2 seconds |
Counting sequences to assemble | 2013-10-03T11:41:49 | 8 minutes, 18 seconds | 8 minutes, 20 seconds |
Sequence loading | 2013-10-03T12:52:06 | 1 hours, 10 minutes, 17 seconds | 1 hours, 18 minutes, 37 seconds |
K-mer counting | 2013-10-03T13:18:58 | 26 minutes, 52 seconds | 1 hours, 45 minutes, 29 seconds |
Coverage distribution analysis | 2013-10-03T13:19:13 | 15 seconds | 1 hours, 45 minutes, 44 seconds |
Graph construction | 2013-10-03T13:54:47 | 35 minutes, 34 seconds | 2 hours, 21 minutes, 18 seconds |
Null edge purging | 2013-10-03T14:01:09 | 6 minutes, 22 seconds | 2 hours, 27 minutes, 40 seconds |
Selection of optimal read markers | 2013-10-03T14:51:05 | 49 minutes, 56 seconds | 3 hours, 17 minutes, 36 seconds |
Detection of assembly seeds | 2013-10-03T17:43:04 | 2 hours, 51 minutes, 59 seconds | 6 hours, 9 minutes, 35 seconds |
Estimation of outer distances for paired reads | 2013-10-03T17:59:38 | 16 minutes, 34 seconds | 6 hours, 26 minutes, 9 seconds |
Bidirectional extension of seeds | 2013-10-04T00:14:34 | 6 hours, 14 minutes, 56 seconds | 12 hours, 41 minutes, 5 seconds |
Merging of redundant paths | 2013-10-05T01:43:40 | 1 days, 1 hours, 29 minutes, 6 seconds | 1 days, 14 hours, 10 minutes, 11 seconds |
Generation of contigs | 2013-10-05T02:12:52 | 29 minutes, 12 seconds | 1 days, 14 hours, 39 minutes, 23 seconds |
Scaffolding of contigs | 2013-10-05T09:27:55 | 7 hours, 15 minutes, 3 seconds | 1 days, 21 hours, 54 minutes, 26 seconds |
Counting sequences to search | 2013-10-05T09:27:55 | 0 seconds | 1 days, 21 hours, 54 minutes, 26 seconds |
Graph coloring | 2013-10-05T09:28:09 | 14 seconds | 1 days, 21 hours, 54 minutes, 40 seconds |
Counting contig biological abundances | 2013-10-05T09:34:05 | 5 minutes, 56 seconds | 1 days, 22 hours, 36 seconds |
Counting sequence biological abundances | 2013-10-05T09:34:05 | 0 seconds | 1 days, 22 hours, 36 seconds |
Loading taxons | 2013-10-05T09:34:17 | 12 seconds | 1 days, 22 hours, 48 seconds |
Loading tree | 2013-10-05T09:34:31 | 14 seconds | 1 days, 22 hours, 1 minutes, 2 seconds |
Processing gene ontologies | 2013-10-05T09:34:55 | 24 seconds | 1 days, 22 hours, 1 minutes, 26 seconds |
Computing neighbourhoods | 2013-10-05T09:34:55 | 0 seconds | 1 days, 22 hours, 1 minutes, 26 seconds |
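To see where the time goes, the elapsed times in the table can be converted to seconds and expressed as a share of the total. The following is a small sketch, not part of Ray: the function names (to_seconds, shares) are made up for this post, and only the five longest steps are copied from the table.

```python
import re

# Elapsed times copied from the table above (five longest steps only).
ELAPSED = {
    "Merging of redundant paths": "1 days, 1 hours, 29 minutes, 6 seconds",
    "Scaffolding of contigs": "7 hours, 15 minutes, 3 seconds",
    "Bidirectional extension of seeds": "6 hours, 14 minutes, 56 seconds",
    "Detection of assembly seeds": "2 hours, 51 minutes, 59 seconds",
    "Sequence loading": "1 hours, 10 minutes, 17 seconds",
}
# Last value of the "Since Beginning" column.
TOTAL = "1 days, 22 hours, 1 minutes, 26 seconds"

UNIT_SECONDS = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}


def to_seconds(text):
    """Convert a duration written like '1 days, 1 hours, 29 minutes' to seconds."""
    pairs = re.findall(r"(\d+) (days|hours|minutes|seconds)", text)
    return sum(int(count) * UNIT_SECONDS[unit] for count, unit in pairs)


def shares(elapsed, total):
    """Return the fraction of the total running time spent in each step."""
    total_seconds = to_seconds(total)
    return {step: to_seconds(t) / total_seconds for step, t in elapsed.items()}


if __name__ == "__main__":
    ranked = sorted(shares(ELAPSED, TOTAL).items(), key=lambda item: -item[1])
    for step, share in ranked:
        print(f"{step:35s} {share:6.1%}")
```

With these numbers the merger accounts for a bit more than half of the run (about 55%) and the scaffolder for about 16%.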
Basically, the distributed merger groups identical paths together to remove redundancy from the assembly. This is necessary because 512 ranks explore the same distributed graph, and they sometimes meet each other and produce the same path more than once.
My director Jacques Corbeil wants to assemble a human genome in 1 hour (using titan.ccs.ornl.gov). To reach that goal, the step of merging redundant paths must be improved.
Here is the submission script. For starters, the new -detect-sequence-files option of "Smart Ray" automatically works out the list of required options.
$ cat HiSeq-2500-NA12878-demo-2x150-11.sh
#PBS -S /bin/bash
#PBS -N HiSeq-2500-NA12878-demo-2x150-11
#PBS -o HiSeq-2500-NA12878-demo-2x150-11.stdout
#PBS -e HiSeq-2500-NA12878-demo-2x150-11.stderr
#PBS -A nne-790-ac
#PBS -l walltime=02:00:00:00
#PBS -l nodes=64:ppn=8
#PBS -l gattr=ckpt
cd $PBS_O_WORKDIR
module use /rap/nne-790-ab/modulefiles
module load nne-790-ab/seb-devtools/1.0.0
mpiexec -n 512 \
-output-filename HiSeq-2500-NA12878-demo-2x150-11 \
apps/ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray \
-k 31 \
-o HiSeq-2500-NA12878-demo-2x150-11 \
-read-write-checkpoints HiSeq-2500-NA12878-demo-2x150.SavedState \
-route-messages \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150
The -route-messages option activates the polytope software message router, which reduces latency on low-end supercomputers.
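The router report printed at the end of the run (AlphabetSize: 8, WordLength: 3) suggests how the 512 ranks are laid out: each rank is a 3-digit word in base 8, and two ranks are linked when their words differ in exactly one digit. The sketch below illustrates that idea. It is not Ray's code: the function names are invented, the digit order (least significant digit first) is inferred from the report, and the route function fixes digits from left to right, which is only an assumption about how messages might be forwarded.

```python
ALPHABET_SIZE = 8  # from the MessageRouter report
WORD_LENGTH = 3    # from the MessageRouter report (8 ** 3 == 512 ranks)


def rank_to_vertex(rank):
    """Write a rank in base ALPHABET_SIZE, least significant digit first."""
    digits = []
    for _ in range(WORD_LENGTH):
        digits.append(rank % ALPHABET_SIZE)
        rank //= ALPHABET_SIZE
    return tuple(digits)


def vertex_to_rank(vertex):
    """Inverse of rank_to_vertex."""
    return sum(digit * ALPHABET_SIZE ** i for i, digit in enumerate(vertex))


def neighbours(rank):
    """Ranks whose word differs from this rank's word in exactly one digit."""
    vertex = rank_to_vertex(rank)
    result = []
    for position in range(WORD_LENGTH):
        for symbol in range(ALPHABET_SIZE):
            if symbol != vertex[position]:
                other = vertex[:position] + (symbol,) + vertex[position + 1:]
                result.append(vertex_to_rank(other))
    return result


def route(source, destination):
    """Hypothetical route: correct one mismatching digit per hop, left to right."""
    current = list(rank_to_vertex(source))
    target = rank_to_vertex(destination)
    path = [source]
    for position in range(WORD_LENGTH):
        if current[position] != target[position]:
            current[position] = target[position]
            path.append(vertex_to_rank(current))
    return path


if __name__ == "__main__":
    print(len(neighbours(0)))  # number of direct links of rank 0
    print(route(0, 511))       # a route between the two opposite corners
```

Under this layout a rank keeps 21 direct links instead of 511, and any message reaches its destination in at most 3 hops.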
As usual, Ray reports a summary of what it did:
Scaffolds >= 500 nt
Number: 460241
Total length: 2690216650
Average: 5845
N50: 10495
Median: 3356
Largest: 160698
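For readers unfamiliar with N50, here is a minimal sketch of the usual definition: sort the sequences from longest to shortest and report the length at which the running sum reaches half of the total length. Ray's own implementation may differ in details such as tie handling, and the toy lengths below are made up for illustration.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one sequence")


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6]  # hypothetical scaffold lengths
    print(n50(toy_lengths))

    # The reported average is simply total length divided by scaffold count.
    print(2690216650 // 460241)
```

An N50 of 10495 therefore means that half of the 2690216650 assembled nucleotides sit in scaffolds of at least 10495 nt.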
Another interesting feature is the MessageRouter final report on the polytope values:
[MessageRouter] Rank 0 will die in 16 seconds, will not route anything after that point.
[Polytope] Load values:
AlphabetSize: 8
WordLength: 3
Self: 0,0,0
0,0,0 (0) -> 1,0,0 (1) Load: 69520331
0,0,0 (0) -> 2,0,0 (2) Load: 67191666
0,0,0 (0) -> 3,0,0 (3) Load: 68039340
0,0,0 (0) -> 4,0,0 (4) Load: 68091189
0,0,0 (0) -> 5,0,0 (5) Load: 68122849
0,0,0 (0) -> 6,0,0 (6) Load: 67318393
0,0,0 (0) -> 7,0,0 (7) Load: 70024675
0,0,0 (0) -> 0,1,0 (8) Load: 67981950
0,0,0 (0) -> 0,2,0 (16) Load: 66576237
0,0,0 (0) -> 0,3,0 (24) Load: 67868556
0,0,0 (0) -> 0,4,0 (32) Load: 67772883
0,0,0 (0) -> 0,5,0 (40) Load: 68159708
0,0,0 (0) -> 0,6,0 (48) Load: 67893735
0,0,0 (0) -> 0,7,0 (56) Load: 67753182
0,0,0 (0) -> 0,0,1 (64) Load: 66892153
0,0,0 (0) -> 0,0,2 (128) Load: 73895416
0,0,0 (0) -> 0,0,3 (192) Load: 75511892
0,0,0 (0) -> 0,0,4 (256) Load: 71191911
0,0,0 (0) -> 0,0,5 (320) Load: 73344055
0,0,0 (0) -> 0,0,6 (384) Load: 73852478
0,0,0 (0) -> 0,0,7 (448) Load: 68183524
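A quick way to judge how evenly rank 0 spreads its traffic is to compare the 21 load values above. The sketch below copies them from the report and groups them by which of the three coordinates changes. The helper name differing_position is invented here, and the unit of a load value is assumed to be a count of routed messages.

```python
from statistics import mean

ALPHABET_SIZE = 8  # from the report above

# Load on each link of rank 0, keyed by the neighbour's rank (copied from the report).
LOADS = {
    1: 69520331, 2: 67191666, 3: 68039340, 4: 68091189,
    5: 68122849, 6: 67318393, 7: 70024675,
    8: 67981950, 16: 66576237, 24: 67868556, 32: 67772883,
    40: 68159708, 48: 67893735, 56: 67753182,
    64: 66892153, 128: 73895416, 192: 75511892, 256: 71191911,
    320: 73344055, 384: 73852478, 448: 68183524,
}


def differing_position(rank):
    """Index of the single non-zero digit of a neighbour of rank 0."""
    position = 0
    while rank % ALPHABET_SIZE == 0:
        rank //= ALPHABET_SIZE
        position += 1
    return position


def mean_load_per_position(loads):
    """Average load of the links that change each coordinate."""
    groups = {}
    for rank, load in loads.items():
        groups.setdefault(differing_position(rank), []).append(load)
    return {position: mean(values) for position, values in sorted(groups.items())}


if __name__ == "__main__":
    print("least loaded link:", min(LOADS, key=LOADS.get), min(LOADS.values()))
    print("most loaded link: ", max(LOADS, key=LOADS.get), max(LOADS.values()))
    print("max/min ratio:    ", round(max(LOADS.values()) / min(LOADS.values()), 2))
    for position, average in mean_load_per_position(LOADS).items():
        print(f"coordinate {position}: mean load {average:.0f}")
```

The busiest link carries about 13% more than the quietest one, and the links along the third coordinate are on average somewhat more loaded than the other two groups.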
To conclude, this dataset requires 46 hours on 512 Xeon cores. Of those 46 hours, about 25 are spent in the merger and about 7 in the scaffolder.