The public datasets from the DOE/JGI Great Prairie Soil Metagenome Grand Challenge
I am working on a couple of very large public metagenomics datasets from the Department of Energy (DOE) Joint Genome Institute (JGI). These datasets were produced in the context of the Grand Challenge program.
Professor Janet Jansson was the Principal Investigator for the proposal named Great Prairie Soil Metagenome Grand Challenge (Proposal ID: 949).
Professor C. Titus Brown wrote a blog article about this Grand Challenge.
Moreover, the Brown research group published at least one paper using these Grand Challenge datasets (assembly with digital normalization and partitioning).
Professor James Tiedje presented the Grand Challenge at the 2012 Metagenomics Workshop.
Alex Copeland presented interesting work related to this Grand Challenge at the 2012 Sequencing, Finishing and Analysis in the Future (SFAF) meeting.
Jansson's Grand Challenge included 12 projects. In the list below, each project name gives the sample site and the type of soil, followed by the project identifier in parentheses.
- Great Prairie Soil Metagenome Grand Challenge: Kansas, Cultivated corn soil metagenome reference core (402463)
- Great Prairie Soil Metagenome Grand Challenge: Kansas, Native Prairie metagenome reference core (402464)
- Great Prairie Soil Metagenome Grand Challenge: Kansas, Native Prairie metagenome reference core (402464) (I don't know why it's listed twice)
- Great Prairie Soil Metagenome Grand Challenge: Kansas soil pyrotag survey (402466)
- Great Prairie Soil Metagenome Grand Challenge: Iowa, Continuous corn soil metagenome reference core (402461)
- Great Prairie Soil Metagenome Grand Challenge: Iowa, Native Prairie soil metagenome reference core (402462)
- Great Prairie Soil Metagenome Grand Challenge: Iowa soil pyrotag survey (402465)
- Great Prairie Soil Metagenome Grand Challenge: Wisconsin, Continuous corn soil metagenome reference core (402460)
- Great Prairie Soil Metagenome Grand Challenge: Wisconsin, Native Prairie soil metagenome reference core (402459)
- Great Prairie Soil Metagenome Grand Challenge: Wisconsin, Restored Prairie soil metagenome reference core (402457)
- Great Prairie Soil Metagenome Grand Challenge: Wisconsin, Switchgrass soil metagenome reference core (402458)
- Great Prairie Soil Metagenome Grand Challenge: Wisconsin soil pyrotag survey (402456)
I thank the Jansson research group for making these datasets public so that I don't have to look further for large politics-free metagenomics datasets.
Table 1: Number of files, reads, and bases in the Grand Challenge datasets. Most of the sequences are paired reads.

| Dataset | File count | Read count | Base count |
|---|---|---|---|
| Iowa_Continuous_Corn_Soil | 25 | 2 055 601 258 | 196 708 830 076 |
| Iowa_Native_Prairie_Soil | 25 | 3 750 844 486 | 326 986 888 235 |
| Kansas_Cultivated_Corn_Soil | 30 | 2 677 222 281 | 272 276 185 410 |
| Kansas_Native_Prairie_Soil | 33 | 5 126 775 452 | 597 933 511 278 |
| Wisconsin_Continuous_Corn_Soil | 18 | 1 912 865 700 | 192 128 891 088 |
| Wisconsin_Native_Prairie_Soil | 20 | 2 098 317 886 | 211 016 377 208 |
| Wisconsin_Restored_Prairie_Soil | 6 | 347 778 670 | 52 514 579 170 |
| Wisconsin_Switchgrass_Soil | 7 | 448 382 766 | 58 323 428 574 |
| Total | 164 | 18 417 788 499 | 1 907 888 691 039 |
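As a quick sanity check on Table 1, the short Python script below sums the three numeric columns and derives a mean read length for each dataset (base count divided by read count). The numbers are copied from Table 1 and the dataset names follow the project list above; the mean read length is a derived figure of my own, not something reported in the table, and the helper names are illustrative only.

```python
# Recompute the Table 1 totals and a derived mean read length per dataset.
# Mean read length = bases / reads (a derived metric, not part of Table 1).
from typing import List, NamedTuple, Tuple


class Dataset(NamedTuple):
    name: str
    files: int
    reads: int
    bases: int


DATASETS: List[Dataset] = [
    Dataset("Iowa_Continuous_Corn_Soil", 25, 2_055_601_258, 196_708_830_076),
    Dataset("Iowa_Native_Prairie_Soil", 25, 3_750_844_486, 326_986_888_235),
    Dataset("Kansas_Cultivated_Corn_Soil", 30, 2_677_222_281, 272_276_185_410),
    Dataset("Kansas_Native_Prairie_Soil", 33, 5_126_775_452, 597_933_511_278),
    Dataset("Wisconsin_Continuous_Corn_Soil", 18, 1_912_865_700, 192_128_891_088),
    Dataset("Wisconsin_Native_Prairie_Soil", 20, 2_098_317_886, 211_016_377_208),
    Dataset("Wisconsin_Restored_Prairie_Soil", 6, 347_778_670, 52_514_579_170),
    Dataset("Wisconsin_Switchgrass_Soil", 7, 448_382_766, 58_323_428_574),
]


def totals(datasets: List[Dataset]) -> Tuple[int, int, int]:
    """Return (file count, read count, base count) summed over all datasets."""
    return (
        sum(d.files for d in datasets),
        sum(d.reads for d in datasets),
        sum(d.bases for d in datasets),
    )


def mean_read_length(d: Dataset) -> float:
    """Average number of bases per read in one dataset."""
    return d.bases / d.reads


if __name__ == "__main__":
    files, reads, bases = totals(DATASETS)
    print(f"Total: {files} files, {reads:,} reads, {bases:,} bases")
    for d in DATASETS:
        print(f"{d.name:32s} {mean_read_length(d):7.1f} bases/read")
    print(f"{'All datasets':32s} {bases / reads:7.1f} bases/read")
```

The recomputed totals match the Total row (164 files, 18 417 788 499 reads, 1 907 888 691 039 bases), and the overall mean comes out at about 103.6 bases per read. Wisconsin_Restored_Prairie_Soil divides out to exactly 151 bases per read.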
At Argonne, we are using these datasets to develop a next-generation metagenomics assembler named "Spate", built on top of the Thorium actor engine. The word spate means a large number of similar things or events appearing or occurring in quick succession. With the actor model, every single message is an active message: receiving it is what triggers a computation in the receiving actor. Active messages are very neat, and there are a lot of them with the actor model.
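To make the "every message is an active message" idea concrete, here is a minimal single-threaded sketch in Python. Each message names the action its receiver must perform, and delivering the message is the only thing that makes an actor compute. This is a hypothetical illustration, not Thorium's API: the classes `ActorSystem`, `Actor`, and `Message`, the action names, and the toy k-mer counter are all assumptions made for this example.

```python
# Hypothetical toy actor system (NOT Thorium's API) illustrating active messages:
# each Message names the action its receiver runs, and delivery triggers that code.
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Message:
    action: str          # selects the handler run by the receiving actor
    payload: Any = None  # data carried with the message
    source: str = ""     # name of the sending actor ("" if posted externally)


class ActorSystem:
    # Owns the actors and a single FIFO queue of (destination, message) pairs.
    def __init__(self) -> None:
        self.actors: Dict[str, "Actor"] = {}
        self.queue: deque = deque()

    def spawn(self, actor_class, name: str) -> "Actor":
        actor = actor_class(name, self)
        self.actors[name] = actor
        return actor

    def post(self, destination: str, message: Message) -> None:
        self.queue.append((destination, message))

    def run(self) -> int:
        # Deliver messages until the queue is empty; return how many were delivered.
        delivered = 0
        while self.queue:
            destination, message = self.queue.popleft()
            self.actors[destination].receive(message)
            delivered += 1
        return delivered


class Actor:
    def __init__(self, name: str, system: ActorSystem) -> None:
        self.name = name
        self.system = system
        self.handlers: Dict[str, Callable[[Message], None]] = {}

    def receive(self, message: Message) -> None:
        # The action carried by the message decides which code runs here.
        handler = self.handlers.get(message.action)
        if handler is None:
            raise ValueError(f"{self.name}: no handler for {message.action!r}")
        handler(message)

    def send(self, destination: str, action: str, payload: Any = None) -> None:
        self.system.post(destination, Message(action, payload, self.name))


class KmerCounter(Actor):
    # Hypothetical example actor: counts k-mers in the sequences it receives.
    def __init__(self, name: str, system: ActorSystem, k: int = 3) -> None:
        super().__init__(name, system)
        self.k = k
        self.counts: Dict[str, int] = {}
        self.handlers = {"COUNT_KMERS": self.on_count, "REPORT": self.on_report}

    def on_count(self, message: Message) -> None:
        seq = message.payload
        for i in range(len(seq) - self.k + 1):
            kmer = seq[i:i + self.k]
            self.counts[kmer] = self.counts.get(kmer, 0) + 1

    def on_report(self, message: Message) -> None:
        self.send(message.source, "REPORT_REPLY", dict(self.counts))


class Driver(Actor):
    # Sends reads to the counter, then asks it for the final counts.
    def __init__(self, name: str, system: ActorSystem) -> None:
        super().__init__(name, system)
        self.result = None
        self.handlers = {"START": self.on_start, "REPORT_REPLY": self.on_reply}

    def on_start(self, message: Message) -> None:
        for read in message.payload:
            self.send("counter", "COUNT_KMERS", read)
        self.send("counter", "REPORT")

    def on_reply(self, message: Message) -> None:
        self.result = message.payload


if __name__ == "__main__":
    system = ActorSystem()
    system.spawn(KmerCounter, "counter")
    driver = system.spawn(Driver, "driver")
    system.post("driver", Message("START", ["ACGTA", "CGTAC"]))
    print(system.run(), "messages delivered")
    print(driver.result)
```

Running it delivers 5 messages (one START, two COUNT_KMERS, one REPORT, one REPORT_REPLY), and the driver ends up with `{'ACG': 1, 'CGT': 2, 'GTA': 2, 'TAC': 1}`. This toy relies on its single FIFO queue to handle REPORT after both COUNT_KMERS messages; in a parallel runtime with many workers, that ordering would not come for free.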