Metagenome lumps & artifactual mutations

Hey!

A recent arXiv manuscript by the group of Professor C. Titus Brown revealed "sequencing artefacts" in metagenomes. In this work, the authors discovered topological features in de Bruijn graphs that they called "lumps."

In the Microbiome40 sample (confidential stuff I guess ;-)), a collaborator observed these metagenome lumps using Ray Cloud Browser (public demo on Amazon EC2; source code on GitHub).

Here are some nice pictures of this system.

Ray assembly plugins are shielded against these as the algorithms are well-designed.

Another recent paper in Nucleic Acids Research by a group at the Broad Institute carefully documented "artifactual mutations" due to particular events during sample preparation.

Also using Ray Cloud Browser, we observed these topological features in the de Bruijn graph on Illumina MiSeq data, namely the public dataset called "E.Coli-DH10B-2x250".

Here is a nice picture of a bubble, a branching point in the graph where two high-coverage paths are competing. One of them (the strong part, in blue) has a high coverage (>200 X), whereas the weak part has a lower coverage of around 20-30 X.
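To make the idea of a branching point with a strong and a weak side concrete, here is a minimal Python sketch. It is not how Ray or Ray Cloud Browser work internally; it is a toy k-mer counter, and the function names (`count_kmers`, `find_branches`) and the `min_ratio` threshold are made up for illustration. It also ignores reverse complements, which a real assembler would handle.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, a simplification)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def find_branches(counts, min_ratio=5.0):
    """Report k-mers that have exactly two successor k-mers in the graph,
    where the stronger successor has at least min_ratio times the coverage
    of the weaker one. min_ratio is an illustrative threshold."""
    branches = []
    for kmer in counts:
        suffix = kmer[1:]
        # Successors share a (k-1)-overlap with the current k-mer.
        successors = [(suffix + base, counts[suffix + base])
                      for base in "ACGT" if suffix + base in counts]
        if len(successors) == 2:
            successors.sort(key=lambda item: item[1], reverse=True)
            strong, weak = successors
            if strong[1] >= min_ratio * weak[1]:
                branches.append((kmer, strong, weak))
    return branches


# Toy data: ten reads support one path, a single read supports the other.
reads = ["ACGTAC"] * 10 + ["ACGTTC"]
for kmer, strong, weak in find_branches(count_kmers(reads, 4)):
    print(kmer, "strong:", strong, "weak:", weak)
```

On real data the interesting cases are the ones where the weak side is far too well covered to be dismissed, as in the picture above.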

20-30 X cannot be obtained with random sequencing errors. (Edited the word "not".)

edit: "Yes, it can. GGCxG error in Illumina data."

This will look like genuine SNPs.

edit: "remember when you sequence bacteria you are sequencing a population"
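For the 20-30 X question above, a quick back-of-the-envelope check helps. The sketch below assumes a uniform per-base substitution rate of 1% (my assumption, not a measured number for this dataset), with each wrong base equally likely, and a total depth of about 225 X at the site (200 X strong plus 25 X weak). Under those assumptions it computes the expected coverage of one specific wrong base and the chance of seeing at least 20 reads with it.

```python
from math import comb


def expected_error_coverage(depth, error_rate):
    """Expected number of reads carrying one specific wrong base at a site,
    assuming errors are uniform and each of the 3 wrong bases is equally likely."""
    return depth * error_rate / 3.0


def prob_at_least(depth, error_rate, min_reads):
    """Binomial probability that at least min_reads of depth reads carry
    the same specific wrong base, under the same uniform-error assumption."""
    p = error_rate / 3.0
    return sum(comb(depth, x) * p ** x * (1.0 - p) ** (depth - x)
               for x in range(min_reads, depth + 1))


depth = 225          # assumed total depth: ~200 X strong + ~25 X weak
error_rate = 0.01    # assumed uniform substitution rate (illustrative)

print("expected coverage of one specific error:",
      expected_error_coverage(depth, error_rate))
print("P(at least 20 reads with that error):",
      prob_at_least(depth, error_rate, 20))
print("fraction of reads needed on the weak branch:", 25 / depth)
```

With uniform random errors the expected coverage stays below 1 X, so a 20-30 X branch needs something hitting roughly one read in ten at that position: either a sequence-specific systematic error like the one mentioned in the edit, or a real variant in the sequenced population.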


Unknown said...

The lump looks like a tumor!

Charles Joly Beauparlant.

Peter Ruzanov said...

Awesome, thank you for sharing this! Very interesting
