Pulling next-generation sequences from the European Nucleotide Archive (ENA)
Today I read the abstract (and the methods) of a PNAS (Proceedings of the National Academy of Sciences) paper entitled "Evolutionary dynamics of Staphylococcus aureus during progression from carriage to disease". In this paper, researchers sequenced the DNA of a large number of Staphylococcus aureus isolates taken from patients over a period of time. This is science. That is nice and all, but after reading some bits of it, I wanted the data so I could run my own tests.

Getting sequence data is actually very easy, thanks to the International Nucleotide Sequence Database Collaboration (INSDC). The INSDC has 3 members (in alphabetical order):

- DNA Data Bank of Japan (DDBJ)
- European Nucleotide Archive (ENA)
- GenBank (raw sequence data is stored in the Sequence Read Archive (SRA))

In a nutshell, everything submitted to one of these gets mirrored to the others. The Metadata Model is also quite nice. For me, SRA is unusable because I have to download the data files in the SRA format. ...
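To give an idea of how little is needed on the ENA side, here is a minimal sketch that asks the ENA Portal API file report for the FASTQ locations of every run under a study. The function names `filereport_url` and `parse_filereport` are my own, and the accession `PRJEB0000` is a made-up placeholder, not the accession of the paper above. The sketch assumes the file report returns the `fastq_ftp` field as semicolon-separated paths without a scheme.

```python
import csv
import io
from urllib.parse import urlencode

# ENA Portal API endpoint that lists the files behind an accession.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build the file report URL for a study, sample or run accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    # Keep the comma in the field list readable instead of percent-encoding it.
    return ENA_FILEREPORT + "?" + urlencode(params, safe=",")


def parse_filereport(tsv_text):
    """Map each run accession to its list of FASTQ URLs.

    Assumes fastq_ftp holds semicolon-separated paths without a scheme,
    so "ftp://" is prepended to each of them.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    runs = {}
    for row in reader:
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        runs[row["run_accession"]] = ["ftp://" + p for p in paths]
    return runs
```

To use it against the live service, the response can be fetched with `urllib.request.urlopen(filereport_url("PRJEB0000")).read().decode()` (with a real accession in place of the placeholder) and passed to `parse_filereport`. The resulting URLs point at plain gzipped FASTQ files, so no SRA-specific tooling is needed.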