Plant genome (white spruce), Illumina HiSeq 2000, IBM Blue Gene/Q, and Ray
The SRA056234 dataset contains reads for Picea glauca (white spruce), obtained with the Illumina HiSeq 2000. The uncompressed FASTQ files total 2.8 TiB.
$ du -sh blocks/
2.8T blocks/
We are using Ray on an IBM Blue Gene/Q. In particular, we are using 1024 nodes with 1 IBM PowerPC A2 processor and 16 GiB of DDR3 memory per node. Each processor has 16 cores, and each core has 4 threads.
Because of the memory limit per core, we are only using 16 MPI ranks per node for the time being. Therefore, we are using 16384 Ray processes, each with a maximum of 1 GiB of memory, on the Blue Gene/Q. In total, we have 16 TiB of distributed memory.
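The memory figures above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are mine, and the inputs are the numbers quoted in the text (16384 ranks, 16 ranks per node, 16 GiB per node).

```python
GIB = 2**30
TIB = 2**40

ranks_total = 16384       # Ray processes (MPI ranks), as quoted in the text
ranks_per_node = 16       # MPI ranks per node, as quoted in the text
node_memory = 16 * GIB    # DDR3 memory per node, as quoted in the text

# Each rank gets an equal share of its node's memory.
memory_per_rank = node_memory // ranks_per_node
# Distributed memory is the sum over all ranks.
total_memory = ranks_total * memory_per_rank

print(memory_per_rank / GIB)  # GiB available to each Ray process
print(total_memory / TIB)     # TiB of distributed memory
```

With these inputs the two values come out at 1 GiB per process and 16 TiB overall, matching the figures above.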
First, we can list the Ray plugins we are using.
$ ls -1 SRA056234-Picea-glauca-2012-12-18-4/Plugins/
plugin_Amos.txt
plugin_CoverageGatherer.txt
plugin_DummySun.txt
plugin_EdgePurger.txt
plugin_FusionData.txt
plugin_FusionTaskCreator.txt
plugin_GeneOntology.txt
plugin_GenomeNeighbourhood.txt
plugin_JoinerTaskCreator.txt
plugin_KmerAcademyBuilder.txt
plugin_Library.txt
plugin_MachineHelper.txt
plugin_MessageProcessor.txt
plugin_Mock.txt
plugin_NetworkTest.txt
plugin_Partitioner.txt
plugin_PhylogenyViewer.txt
plugin_Scaffolder.txt
plugin_Searcher.txt
plugin_SeedExtender.txt
plugin_SeedingData.txt
plugin_SequencesIndexer.txt
plugin_SequencesLoader.txt
plugin_SwitchMan.txt
plugin_VerticesExtractor.txt
The file partition has a nice layout. On the Blue Gene/Q, I needed to split my files into 4288 FASTQ files with a maximum of 2000000 sequences each, because I/O operations are offloaded to I/O drawers. Aside from that, Ray works as is.
$ head SRA056234-Picea-glauca-2012-12-18-4/FilePartition.txt
#File Name FirstSequence LastSequence NumberOfSequences
0 blocks/SRR525188_1-block-0.fastq 0 1999999 2000000
1 blocks/SRR525188_2-block-0.fastq 2000000 3999999 2000000
2 blocks/SRR525188_1-block-1.fastq 4000000 5999999 2000000
3 blocks/SRR525188_2-block-1.fastq 6000000 7999999 2000000
4 blocks/SRR525188_1-block-10.fastq 8000000 9999999 2000000
5 blocks/SRR525188_2-block-10.fastq 10000000 11999999 2000000
6 blocks/SRR525188_1-block-11.fastq 12000000 13999999 2000000
7 blocks/SRR525188_2-block-11.fastq 14000000 15999999 2000000
8 blocks/SRR525188_1-block-12.fastq 16000000 17999999 2000000
$ tail SRA056234-Picea-glauca-2012-12-18-4/FilePartition.txt
4278 blocks/SRR525214_1-block-98.fastq 8498092212 8500092211 2000000
4279 blocks/SRR525214_2-block-98.fastq 8500092212 8502092211 2000000
4280 blocks/SRR525214_1-block-99.fastq 8502092212 8504092211 2000000
4281 blocks/SRR525214_2-block-99.fastq 8504092212 8506092211 2000000
4282 blocks/SRR525215_1-block-0.fastq 8506092212 8508092211 2000000
4283 blocks/SRR525215_2-block-0.fastq 8508092212 8510092211 2000000
4284 blocks/SRR525215_1-block-1.fastq 8510092212 8512092211 2000000
4285 blocks/SRR525215_2-block-1.fastq 8512092212 8514092211 2000000
4286 blocks/SRR525215_1-block-2.fastq 8514092212 8515006210 913999
4287 blocks/SRR525215_2-block-2.fastq 8515006211 8515920209 913999
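Splitting a large FASTQ file into blocks like the ones listed above could be done along these lines. This is a hypothetical sketch, not the tool that was actually used here: the function name `split_fastq` is made up, the `-block-N.fastq` naming simply imitates the listing, and the code assumes every FASTQ record is exactly 4 lines.

```python
import os

def split_fastq(path, out_dir, max_sequences=2_000_000):
    """Split one FASTQ file into blocks of at most max_sequences records.

    Assumption: each record is exactly 4 lines (no wrapped sequences).
    Returns the list of block paths written, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]
    lines_per_block = 4 * max_sequences
    blocks = []
    out = None
    with open(path) as reads:
        for line_number, line in enumerate(reads):
            # Start a new block every 4 * max_sequences lines.
            if line_number % lines_per_block == 0:
                if out is not None:
                    out.close()
                block = os.path.join(
                    out_dir, f"{stem}-block-{len(blocks)}.fastq")
                out = open(block, "w")
                blocks.append(block)
            out.write(line)
    if out is not None:
        out.close()
    return blocks
```

Calling `split_fastq("SRR525188_1.fastq", "blocks")` would write `blocks/SRR525188_1-block-0.fastq`, `blocks/SRR525188_1-block-1.fastq`, and so on, with only the last block holding fewer than 2000000 sequences. The input file name in this call is illustrative.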
The 8515920210 input sequences are uniformly distributed across 16384 MPI ranks. There are 519770 sequences per MPI rank, except for the last rank, which also takes the remainder and holds 528300 sequences.
$ head SRA056234-Picea-glauca-2012-12-18-4/SequencePartition.txt
#Rank FirstSequence LastSequence NumberOfSequences
0 0 519769 519770
1 519770 1039539 519770
2 1039540 1559309 519770
3 1559310 2079079 519770
4 2079080 2598849 519770
5 2598850 3118619 519770
6 3118620 3638389 519770
7 3638390 4158159 519770
8 4158160 4677929 519770
$ tail SRA056234-Picea-glauca-2012-12-18-4/SequencePartition.txt
16374 8510713980 8511233749 519770
16375 8511233750 8511753519 519770
16376 8511753520 8512273289 519770
16377 8512273290 8512793059 519770
16378 8512793060 8513312829 519770
16379 8513312830 8513832599 519770
16380 8513832600 8514352369 519770
16381 8514352370 8514872139 519770
16382 8514872140 8515391909 519770
16383 8515391910 8515920209 528300
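The listing is consistent with a simple scheme: every rank gets the integer quotient of sequences over ranks, and the last rank also absorbs the remainder. The sketch below reproduces the rows under that assumption. It is inferred from the output above, not taken from Ray's source code, and the function name is mine.

```python
def sequence_partition(total_sequences, ranks):
    """Yield (rank, first, last, count) for a contiguous partition.

    Assumption: each rank gets total_sequences // ranks sequences and
    the last rank also takes whatever remains.
    """
    per_rank = total_sequences // ranks
    for rank in range(ranks):
        first = rank * per_rank
        if rank < ranks - 1:
            last = first + per_rank - 1
        else:
            last = total_sequences - 1
        yield rank, first, last, last - first + 1

rows = list(sequence_partition(8515920210, 16384))
print(rows[0])   # (0, 0, 519769, 519770)
print(rows[-1])  # (16383, 8515391910, 8515920209, 528300)
```

The remainder is 8530 sequences, which is why the last rank holds 528300 instead of 519770.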
Finally, the 42831057656 k-mers are distributed uniformly on 16384 MPI ranks, with around 2614200 k-mers per MPI rank.
$ head SRA056234-Picea-glauca-2012-12-18-4/GraphPartition.txt
#Rank NumberOfKmers IdealNumberOfKmers Difference RelativeDifference
#TotalKmers: 42831057656
#Ranks: 16384
#IdealNumberOfKmers: 2614200
0 2611430 2614200 -2770 -0.10596%
1 2612276 2614200 -1924 -0.073598%
2 2613476 2614200 -724 -0.0276949%
3 2611618 2614200 -2582 -0.0987683%
4 2616320 2614200 2120 0.0810956%
5 2615454 2614200 1254 0.0479688%
$ tail SRA056234-Picea-glauca-2012-12-18-4/GraphPartition.txt
16374 2612820 2614200 -1380 -0.0527886%
16375 2615682 2614200 1482 0.0566904%
16376 2610428 2614200 -3772 -0.144289%
16377 2617236 2614200 3036 0.116135%
16378 2615978 2614200 1778 0.0680132%
16379 2614000 2614200 -200 -0.00765052%
16380 2619520 2614200 5320 0.203504%
16381 2614508 2614200 308 0.0117818%
16382 2614830 2614200 630 0.0240992%
16383 2614080 2614200 -120 -0.00459031%
The RelativeDifference column stays within roughly 0.2% of the ideal for the ranks shown, which indicates really good automated load balancing.
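The columns of GraphPartition.txt can be recomputed from the k-mer counts. The sketch below assumes that IdealNumberOfKmers is the integer quotient of the total over the number of ranks and that RelativeDifference is the difference divided by that ideal. Both assumptions agree with the rows shown, but the function itself is mine, not Ray's.

```python
def relative_difference(kmers_on_rank, total_kmers, ranks):
    """Return (ideal, difference, relative difference in percent)."""
    ideal = total_kmers // ranks            # assumed: integer division
    difference = kmers_on_rank - ideal
    return ideal, difference, 100.0 * difference / ideal

# Rank 0 from the listing above.
ideal, diff, rel = relative_difference(2611430, 42831057656, 16384)
print(ideal, diff, f"{rel:.5f}%")  # 2614200 -2770 -0.10596%
```

The largest deviation in the rows shown is rank 16380, with 5320 k-mers above the ideal, or about 0.2%.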
Ray is currently running the slave mode RAY_SLAVE_MODE_INDEX_SEQUENCES.