Metagenome analysis using the Kraken software suite

ABSTRACT

Metagenomic experiments expose the wide range of microscopic organisms in any microbial environment through high-throughput DNA sequencing. The computational analysis of the sequencing data is critical for the accurate and complete characterization of the microbial community. To facilitate efficient and reproducible metagenomic analysis, we introduce a step-by-step protocol for the Kraken suite, an end-to-end pipeline for the classification, quantification and visualization of metagenomic datasets. Our protocol describes the execution of the Kraken programs, via a sequence of easy-to-use scripts, in two scenarios: (1) quantification of the species in a given metagenomics sample; and (2) detection of a pathogenic agent from a clinical sample taken from a human patient. The protocol, which is executed within 1–2 h, is targeted to biologists and clinicians working in microbiome or metagenomics analysis who are familiar with the Unix command-line environment.
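As an illustration of the classification and quantification steps named in the abstract, the following Python sketch assembles Kraken 2 and Bracken command lines for one paired-end sample. It is a minimal sketch, not the authors' scripts: the function names, the file layout (`reads/<sample>_1.fastq`, `results/`), the output suffixes, the database directory and the default values (read length 100, species level, threshold 10) are all assumptions chosen for illustration. The commands are only printed, so the block runs without either tool installed.

```python
import shlex
from pathlib import Path


def kraken2_command(db, reads_1, reads_2, out_prefix, threads=4):
    """Build a kraken2 call that classifies one paired-end sample."""
    return [
        "kraken2",
        "--db", str(db),                        # Kraken 2 database directory
        "--threads", str(threads),
        "--paired", str(reads_1), str(reads_2),
        "--report", f"{out_prefix}.k2report",   # per-taxon summary report
        "--output", f"{out_prefix}.kraken2",    # per-read classifications
    ]


def bracken_command(db, out_prefix, read_length=100, level="S", threshold=10):
    """Build a bracken call that re-estimates abundances from the Kraken 2 report."""
    return [
        "bracken",
        "-d", str(db),
        "-i", f"{out_prefix}.k2report",   # input: report written by kraken2
        "-o", f"{out_prefix}.bracken",    # abundance table
        "-w", f"{out_prefix}.breport",    # Kraken-style report with new counts
        "-r", str(read_length),           # assumed read length
        "-l", level,                      # "S" = species level
        "-t", str(threshold),             # assumed minimum read count per taxon
    ]


def plan_sample(db, sample, reads_dir="reads", out_dir="results"):
    """Return the ordered commands for one sample (hypothetical file layout)."""
    reads_1 = Path(reads_dir) / f"{sample}_1.fastq"
    reads_2 = Path(reads_dir) / f"{sample}_2.fastq"
    prefix = Path(out_dir) / sample
    return [
        kraken2_command(db, reads_1, reads_2, prefix),
        bracken_command(db, prefix),
    ]


if __name__ == "__main__":
    # Print the planned commands instead of running them.
    for cmd in plan_sample("k2_db", "sample1"):
        print(shlex.join(cmd))
```

To execute the planned steps, each list could be passed to `subprocess.run(cmd, check=True)`, provided that `kraken2` and `bracken` are on the PATH and the database directory contains the Bracken files for the chosen read length.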


DATA AVAILABILITY

The microbiome analysis used three samples from Taur et al. (ref. 8), and the pathogen identification used ten samples from Li et al. (ref. 9), all of which can be found on NCBI with their SRA IDs. Source data are provided with this paper.

CODE AVAILABILITY

The following website details and links all software and databases used in this protocol: http://ccb.jhu.edu/data/kraken2_protocol/. We also provide easy-to-use Jupyter notebooks for both workflows, which can be executed in the browser using Google Colab: https://github.com/martin-steinegger/kraken-protocol/.

CHANGE HISTORY

* 29 August 2024: A Correction to this paper has been published: https://doi.org/10.1038/s41596-024-01064-1


REFERENCES

1. Rappé, M. S. & Giovannoni, S. J. The uncultured microbial majority. _Annu. Rev. Microbiol._ 57, 369–394 (2003).
2. Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. _Genome Biol._ 15, R46 (2014).
3. Breitwieser, F. P., Baker, D. N. & Salzberg, S. L. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. _Genome Biol._ 19, 198 (2018).
4. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. _Genome Biol._ 20, 257 (2019).
5. Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. _PeerJ Comput. Sci._ 3, e104 (2017).
6. Breitwieser, F. P. & Salzberg, S. L. Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification. _Bioinformatics_ 36, 1303–1304 (2020).
7. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. _Nat. Methods_ 9, 357–359 (2012).
8. Taur, Y. et al. Reconstitution of the gut microbiota of antibiotic-treated patients by autologous fecal microbiota transplant. _Sci. Transl. Med._ 10, eaap9489 (2018).
9. Li, Z. et al. Identifying corneal infections in formalin-fixed specimens using next generation sequencing. _Invest. Ophthalmol. Vis. Sci._ 59, 280–288 (2018).
10. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. _J. Mol. Biol._ 215, 403–410 (1990).
11. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. _Nucleic Acids Res._ 35, D61–D65 (2007).
12. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. _Nucleic Acids Res._ 44, D733–D745 (2016).
13. Ounit, R., Wanamaker, S., Close, T. J. & Lonardi, S. CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative _k_-mers. _BMC Genomics_ 16, 236 (2015).
14. Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. _Genome Res._ 26, 1721–1729 (2016).
15. Menzel, P., Ng, K. L. & Krogh, A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. _Nat. Commun._ 7, 11257 (2016).
16. Ye, S. H., Siddle, K. J., Park, D. J. & Sabeti, P. C. Benchmarking metagenomics tools for taxonomic classification. _Cell_ 178, 779–794 (2019).
17. Seppey, M., Manni, M. & Zdobnov, E. M. LEMMI: a continuous benchmarking platform for metagenomics classifiers. _Genome Res._ 30, 1208–1216 (2020).
18. Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. _Nat. Methods_ 9, 811–814 (2012).
19. Vervier, K., Mahé, P., Tournoud, M., Veyrieras, J. B. & Vert, J. P. Large-scale machine learning for metagenomics sequence classification. _Bioinformatics_ 32, 1023–1032 (2016).
20. Luo, Y., Yu, Y. W., Zeng, J., Berger, B. & Peng, J. Metagenomic binning through low-density hashing. _Bioinformatics_ 35, 219–226 (2019).
21. Breitwieser, F. P., Lu, J. & Salzberg, S. L. A review of methods and databases for metagenomic classification and assembly. _Brief. Bioinform._ 20, 1125–1136 (2017).
22. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at _arXiv_ https://doi.org/10.48550/arXiv.1303.3997 (2013).
23. Li, H. Minimap2: pairwise alignment for nucleotide sequences. _Bioinformatics_ 34, 3094–3100 (2018).
24. Stephens, Z. et al. Exogene: a performant workflow for detecting viral integrations from paired-end next-generation sequencing data. _PLoS ONE_ 16, e0250915 (2021).
25. Breitwieser, F. P., Pertea, M., Zimin, A. V. & Salzberg, S. L. Human contamination in bacterial genomes has created thousands of spurious proteins. _Genome Res._ 29, 954–960 (2019).
26. Steinegger, M. & Salzberg, S. L. Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. _Genome Biol._ 21, 115 (2020).
27. Lu, J. & Salzberg, S. L. Removing contaminants from databases of draft genomes. _PLoS Comput. Biol._ 14, e1006277 (2018).
28. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. _Nat. Methods_ 12, 59–60 (2015).
29. Mirdita, M., Steinegger, M., Breitwieser, F., Söding, J. & Levy Karin, E. Fast and sensitive taxonomic assignment to metagenomic contigs. _Bioinformatics_ 37, 3029–3031 (2021).
30. Nasko, D. J., Koren, S., Phillippy, A. M. & Treangen, T. J. RefSeq database growth influences the accuracy of _k_-mer-based lowest common ancestor species identification. _Genome Biol._ 19, 165 (2018).
31. Yang, C. et al. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. _Comput. Struct. Biotechnol. J._ 19, 6301–6314 (2021).
32. Whittaker, R. H. Evolution and measurement of species diversity. _Taxon_ 21, 213–251 (1972).
33. Berger, W. H. & Parker, F. L. Diversity of planktonic foraminifera in deep-sea sediments. _Science_ 168, 1345–1347 (1970).
34. Fisher, R. A., Corbet, A. S. & Williams, C. B. The relation between the number of species and the number of individuals in a random sample of an animal population. _J. Anim. Ecol._ 12, 42–58 (1943).
35. Simpson, E. H. Measurement of diversity. _Nature_ 163, 688 (1949).
36. Shannon, C. E. A mathematical theory of communication. _Bell Syst. Tech. J._ 27, 379–423 (1948).
37. Bray, J. R. & Curtis, J. T. An ordination of the upland forest communities of southern Wisconsin. _Ecol. Monogr._ 27, 325–349 (1957).
38. Ondov, B. D., Bergman, N. H. & Phillippy, A. M. Interactive metagenomic visualization in a web browser. _BMC Bioinform._ 12, 385 (2011).
39. Danecek, P. et al. Twelve years of SAMtools and BCFtools. _Gigascience_ 10, giab008 (2021).
40. Grüning, B. et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. _Nat. Methods_ 15, 475–476 (2018).


ACKNOWLEDGEMENTS

Indexes for tools in the Kraken suite, including the indexes used in this protocol, are made freely available on Amazon Web Services thanks to the AWS Public Dataset Program. B.L. was supported by NIH/NIHMS grant R35GM139602. S.L.S. was supported by NIH grants R35-GM130151 and R01-HG006677. M.S. acknowledges support from the National Research Foundation of Korea grants (2019R1A6A1A10073437, 2020M3A9G7103933, 2021R1C1C102065 and 2021M3A9I4021220); New Faculty Startup Fund; and the Creative-Pioneering Researchers Program through Seoul National University.

AUTHOR INFORMATION

Author notes

* These authors contributed equally: Jennifer Lu, Natalia Rincon.

AUTHORS AND AFFILIATIONS

* Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA: Jennifer Lu, Natalia Rincon & Steven L. Salzberg
* Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA: Jennifer Lu, Natalia Rincon, Derrick E. Wood, Florian P. Breitwieser, Christopher Pockrandt & Steven L. Salzberg
* Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA: Derrick E. Wood, Ben Langmead & Steven L. Salzberg
* Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA: Steven L. Salzberg
* School of Biological Sciences and Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea: Martin Steinegger

CONTRIBUTIONS

J.L. and M.S. led the development of the protocol. N.R. executed and designed the microbiome analysis protocol and is the author of the KrakenTools α-diversity tools. J.L. developed the pathogen identification protocol and is the author of Bracken and KrakenTools. M.S. authored the Jupyter notebooks for the protocol. D.E.W. is the senior author of Kraken and Kraken 2. F.B. is the author of KrakenUniq. C.P. is an author of the KrakenTools β-diversity script. B.L. supervised the development of Kraken 2. S.L.S. supervised the development of Kraken, KrakenUniq and Bracken. B.L. and S.L.S. supervised the development of this protocol. All authors contributed to the writing of the manuscript.

CORRESPONDING AUTHORS

Correspondence to Jennifer Lu or Martin Steinegger.

ETHICS DECLARATIONS

COMPETING INTERESTS

The authors declare no competing interests.

PEER REVIEW

PEER REVIEW INFORMATION

_Nature Protocols_ thanks the anonymous reviewers for their contribution to the peer review of this work.


ADDITIONAL INFORMATION

PUBLISHER’S NOTE

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

RELATED LINKS

KEY REFERENCES USING THIS PROTOCOL

* Salzberg, S. et al. _Neurol. Neuroimmunol. Neuroinflamm._ 3, e251 (2016): https://doi.org/10.1212/NXI.0000000000000251
* Wood, D. et al. _Genome Biol._ 15, R46 (2014): https://doi.org/10.1186/gb-2014-15-3-r46
* Lu, J. et al. _PeerJ Comput. Sci._ 3, e104 (2017): https://doi.org/10.7717/peerj-cs.104
* Breitwieser, F. et al. _Genome Biol._ 19, 198 (2018): https://doi.org/10.1186/s13059-018-1568-0
* Wood, D. et al. _Genome Biol._ 20, 257 (2019): https://doi.org/10.1186/s13059-019-1891-0
* Breitwieser, F. et al. _Bioinformatics_ 36, 1303–1304 (2020): https://doi.org/10.1093/bioinformatics/btz715

KEY DATA USED IN THIS PROTOCOL

* Taur, Y. et al. _Sci. Transl. Med._ 10, eaap9489 (2018): https://doi.org/10.1126/scitranslmed.aap9489
* Li, Z. et al. _Invest. Ophthalmol. Vis. Sci._ 59, 280–288 (2018): https://doi.org/10.1167/iovs.17-21617

SUPPLEMENTARY INFORMATION

* Supplementary Table 1
* Supplementary Table 2

SOURCE DATA

* Source Data Fig. 2: Breport text for plotting Sankey diagrams, and Krona counts for plotting Krona plots.
* Source Data Fig. 6: Alpha diversity table text, Bray–Curtis equation text, and heatmap values for beta diversity.
* Source Data Fig. 7: Pathogen sample species heat map data.

RIGHTS AND PERMISSIONS

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Lu, J., Rincon, N., Wood, D.E. _et al._ Metagenome analysis using the Kraken software suite. _Nat Protoc_ 17, 2815–2839 (2022). https://doi.org/10.1038/s41596-022-00738-y

* Received: 29 June 2021
* Accepted: 16 June 2022
* Published: 28 September 2022
* Issue Date: December 2022
* DOI: https://doi.org/10.1038/s41596-022-00738-y