Simulated datasets for the evaluation of metagenomic analysis programs are presented online in Nature Methods.
Metagenomics is the study of the microbial organisms that share a habitat, using genomic techniques such as sequencing. Rather than isolating and cultivating individual strains, scientists work with material taken directly from the natural environment, which contains a complex mix of different microbes. To assemble the data and identify individual species and their genes, researchers use programs developed for the analysis of individual genomes. The main problem with this approach is that the error rate cannot be determined, because it is not known which microbes, and how many, a given sample contains. It is therefore currently not possible to compare the performance of different programs.
To address this problem, Konstantinos Mavromatis and colleagues created three complex simulated datasets. They ran three commonly used programs on these datasets and discuss the strengths and limitations of each. The datasets are publicly available so that the scientific community can use them as standards to test and improve metagenomic analysis programs.
Konstantinos Mavromatis (Joint Genome Institute Genome Biology, Walnut Creek, CA, USA)
Abstract available online.
(C) Nature Methods press release.
Message posted by: Trevor M. D'Souza