Home
Welcome to the curated gut microbiome-metabolome data resource wiki!

The resource was prepared by the Borenstein lab at Tel-Aviv University. Please contact us if you have any questions or would like to add your own dataset to the collection.

This dataset collection includes curated data from multiple studies where both metagenomic and metabolomic profiles were obtained from human fecal samples [1-14]. It was made publicly available for the benefit of the microbiome research community to facilitate integrative microbiome-metabolome meta-analysis and cross-study comparisons. Overall, the collection currently contains 14 datasets, including 2900 samples from 1849 subjects.
This wiki contains details about how the data is organized in the repository, the original studies that generated the data, how the data was processed, and a quick example of how the data could be used for cross-study comparisons. Use the Wiki's sidebar to navigate to the relevant sections. For transparency and reproducibility, scripts used for manipulating the originally-published data are available in the repository as well (and referred to within the relevant sections of this wiki).
📌 We encourage users to review both the original publications and the processing notes provided in this wiki and in our supplementary tables before using and analyzing these data.
- Some of the datasets are from longitudinal studies, meaning that they include multiple samples per subject. Depending on the analysis, users may want to handle such samples differently.
- Users can relate metabolites across studies using either HMDB or KEGG IDs. For microbiome comparisons, genus-level tables can be used as is (all genus names are derived from GTDB). When analyzing only shotgun datasets, species-level tables can also be used as is.
- A simple example of a cross-study comparison using this data collection can be found in the following R notebook: meta-analysis_of_genus_metabolite_associations.Rmd. The rendered HTML of the R notebook can be viewed here.
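The first two points above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the field names (`sample`, `subject`, `timepoint`), the study names, and the identifiers are hypothetical placeholders and do not reflect the actual file layout of the repository. Keeping the earliest sample per subject is just one of several reasonable ways to handle longitudinal data.

```python
from collections import Counter


def one_sample_per_subject(samples):
    """Keep the earliest sample of each subject, so that repeated
    measures are not treated as independent observations."""
    earliest = {}
    for row in samples:
        kept = earliest.get(row["subject"])
        if kept is None or row["timepoint"] < kept["timepoint"]:
            earliest[row["subject"]] = row
    return sorted(earliest.values(), key=lambda r: r["sample"])


def shared_metabolite_ids(id_maps, min_studies=2):
    """Return the metabolite IDs annotated in at least `min_studies` studies."""
    counts = Counter()
    for mapping in id_maps.values():
        # Count each ID once per study; skip compounds without an ID.
        counts.update({met_id for met_id in mapping.values() if met_id})
    return sorted(i for i, n in counts.items() if n >= min_studies)


# Toy sample metadata (hypothetical field names and values).
samples = [
    {"sample": "S1", "subject": "A", "timepoint": 2},
    {"sample": "S2", "subject": "A", "timepoint": 1},
    {"sample": "S3", "subject": "B", "timepoint": 1},
]

# Toy compound-name -> HMDB ID maps; the IDs are placeholders, not real HMDB IDs.
id_maps = {
    "study_1": {"butyrate": "HMDB-A", "unknown_123": None},
    "study_2": {"Butyric acid": "HMDB-A", "cholate": "HMDB-B"},
}

kept = one_sample_per_subject(samples)
shared = shared_metabolite_ids(id_maps)

print([r["sample"] for r in kept])  # ['S2', 'S3']
print(shared)                       # ['HMDB-A']
```

Matching on identifiers rather than compound names is what allows "butyrate" and "Butyric acid" to be treated as the same metabolite in this toy example. The same approach would apply to KEGG IDs.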
📌 Importantly, comparisons and result interpretation should be made with caution, as there is substantial heterogeneity between studies in terms of cohort characteristics (ages, geography, medical backgrounds, etc.) as well as study protocols and data generation (sample collection and storage protocols, metagenomics and metabolomics platforms, etc.). These factors are expected to introduce variation in both fecal microbiome and fecal metabolome profiles [26-31].
To add your own dataset to the collection, please follow the steps described here.
We thank all the authors of the studies included in this collection for making their data publicly available and for responding to inquiries we had during the processing of this collection. We also thank Shira Limon for the illustration at the top of this page, and past and present Borenstein lab members for helpful inputs.
If you use the data provided here, please cite both the original publications that generated and published the data (see Data overview) and: [TBC]
- Parks, Donovan H., et al. "GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy." Nucleic acids research 50.D1 (2022): D785-D794.
- Chen, Shifu, et al. "fastp: an ultra-fast all-in-one FASTQ preprocessor." Bioinformatics 34.17 (2018): i884-i890.
- Langmead, Ben, and Steven L. Salzberg. "Fast gapped-read alignment with Bowtie 2." Nature methods 9.4 (2012): 357-359.
- Wood, Derrick E., Jennifer Lu, and Ben Langmead. "Improved metagenomic analysis with Kraken 2." Genome biology 20.1 (2019): 1-13.
- Lu, Jennifer, et al. "Bracken: estimating species abundance in metagenomics data." PeerJ Computer Science 3 (2017): e104.
- Youngblut, Nicholas D., and Ruth E. Ley. "Struo2: efficient metagenome profiling database construction for ever-expanding microbial genome datasets." PeerJ 9 (2021): e12198.
- Bolyen, Evan, et al. "Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2." Nature biotechnology 37.8 (2019): 852-857.
- Callahan, Benjamin J., et al. "DADA2: high-resolution sample inference from Illumina amplicon data." Nature methods 13.7 (2016): 581-583.
- Swedish Biodiversity Infrastructure (SBDI; 2021). SBDI Sativa curated 16S GTDB database. https://doi.org/10.17044/scilifelab.14869077
- Pang, Zhiqiang, et al. "MetaboAnalystR 3.0: toward an optimized workflow for global metabolomics." Metabolites 10.5 (2020): 186.
- Pham, Nhung, et al. "Consistency, inconsistency, and ambiguity of metabolite names in biochemical databases used for genome-scale metabolic modelling." Metabolites 9.2 (2019): 28.
- Smirnov, Kirill S., et al. "Challenges of metabolomics in human gut microbiota research." International Journal of Medical Microbiology 306.5 (2016): 266-279.
- Jovel, Juan, et al. "Characterization of the gut microbiome using 16S or shotgun metagenomics." Frontiers in microbiology 7 (2016): 459.
- Liang, Yali, et al. "Systematic analysis of impact of sampling regions and storage methods on fecal gut microbiome and metabolome profiles." mSphere 5.1 (2020): e00763-19.
- Debelius, Justine, et al. "Tiny microbes, enormous impacts: what matters in gut microbiome studies?" Genome biology 17.1 (2016): 1-12.
- Yatsunenko, Tanya, et al. "Human gut microbiome viewed across age and geography." Nature 486.7402 (2012): 222-227.
- Falony, Gwen, et al. "Population-level analysis of gut microbiome variation." Science 352.6285 (2016): 560-564.