DNA mismatch repair promotes APOBEC3-mediated diffuse hypermutation in human cancers


ABSTRACT Certain mutagens, including the APOBEC3 (A3) cytosine deaminase enzymes, can create multiple genetic changes in a single event. Activity of A3s results in striking ‘mutation


showers’ occurring near DNA breakpoints; however, less is known about the mechanisms underlying the majority of A3 mutations. We classified the diverse patterns of clustered mutagenesis in


tumor genomes, which identified a new A3 pattern: nonrecurrent, diffuse hypermutation (omikli). This mechanism occurs independently of the known focal hypermutation (kataegis), and is


associated with activity of the DNA mismatch-repair pathway, which can provide the single-stranded DNA substrate needed by A3, and contributes to a substantial proportion of A3 mutations


genome wide. Because mismatch repair is directed towards early-replicating, gene-rich chromosomal domains, A3 mutagenesis has a high propensity to generate impactful mutations, which exceeds


that of other common carcinogens such as tobacco smoke and ultraviolet exposure. Cells direct their DNA repair capacity towards more important genomic regions; thus, carcinogens that


subvert DNA repair can be remarkably potent.

DATA AVAILABILITY Whole-genome sequences from the TCGA project were available through the Cancer Genomics Hub repository (now superseded by


the NCI Genomic Data Commons; https://gdc.cancer.gov/). Corresponding SNP array data were downloaded from the GDC legacy portal (https://portal.gdc.cancer.gov/legacy-archive). WGS data from


the Hartwig Medical Foundation are available at https://www.hartwigmedicalfoundation.nl/en. The whole-exome sequencing data of TCGA cohort are available through the MC3 dataset at


https://gdc.cancer.gov/about-data/publications/mc3-2017. Data generated by the analyses in this study are available in the Supplementary Tables. CODE AVAILABILITY Code to generate clustered


mutation calls was implemented in Python (version 3.6) and R environments (version 3.6). Relevant packages are biopython (version 1.73) and numpy (version 1.15.4) for Python, and Biostrings


(2.52.0), VariantAnnotation (1.30.1) and GenomicRanges (1.36.0) for R. Code is available at https://github.com/davidmasp/hyperclust. Statistical analysis of the data was performed using


custom scripts in R (version 3.6). Relevant packages are mclust (version 5.4.4), mixtools (version 1.1.0), MASS (version 7.3-51.4) and flexmix (version 2.3-15).

REFERENCES

* Harris, K. & Nielsen, R. Error-prone polymerase activity causes multinucleotide mutations in humans. _Genome Res._ 24, 1445–1454 (2014).
* Rogozin, I. B. et al. DNA polymerase η mutational signatures are found in a variety of different types of cancer. _Cell Cycle_ 17, 348–355 (2018).
* Seplyarskiy, V. B. et al. Error-prone bypass of DNA lesions during lagging-strand replication is a common source of germline and cancer mutations. _Nat. Genet._ 51, 36–41 (2019).
* Supek, F. & Lehner, B. Clustered mutation signatures reveal that error-prone DNA repair targets mutations to active genes. _Cell_ 170, 534–547.e23 (2017).
* Moris, A., Murray, S. & Cardinaud, S. AID and APOBECs span the gap between innate and adaptive immunity. _Front. Microbiol._ 5, 534 (2014).
* Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. _Nature_ 500, 415–421 (2013).
* Burns, M. B., Temiz, N. A. & Harris, R. S. Evidence for APOBEC3B mutagenesis in multiple human cancers. _Nat. Genet._ 45, 977–983 (2013).
* Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. _Cell_ 149, 979–993 (2012).
* Roberts, S. A. et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. _Nat. Genet._ 45, 970–976 (2013).
* Roberts, S. A. et al. Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. _Mol. Cell_ 46, 424–435 (2012).
* Landry, S., Narvaiza, I., Linfesty, D. C. & Weitzman, M. D. APOBEC3A can activate the DNA damage response and cause cell-cycle arrest. _EMBO Rep._ 12, 444–450 (2011).
* Suspène, R. et al. Somatic hypermutation of human mitochondrial and nuclear DNA by APOBEC3 cytidine deaminases, a pathway for DNA catabolism. _Proc. Natl Acad. Sci. USA_ 108, 4858–4863 (2011).
* Byeon, I.-J. L. et al. NMR structure of human restriction factor APOBEC3A reveals substrate binding and enzyme specificity. _Nat. Commun._ 4, 1890 (2013).
* Holtz, C. M., Sadler, H. A. & Mansky, L. M. APOBEC3G cytosine deamination hotspots are defined by both sequence context and single-stranded DNA secondary structure. _Nucleic Acids Res._ 41, 6139–6148 (2013).
* Nik-Zainal, S. et al. Association of a germline copy number polymorphism of _APOBEC3A_ and _APOBEC3B_ with burden of putative APOBEC-dependent mutations in breast cancer. _Nat. Genet._ 46, 487–491 (2014).
* Glaser, A. P. et al. APOBEC-mediated mutagenesis in urothelial carcinoma is associated with improved survival, mutations in DNA damage response genes, and immune response. _Oncotarget_ 9, 4537–4548 (2017).
* Cortez, L. M. et al. APOBEC3A is a prominent cytidine deaminase in breast cancer. _PLoS Genet._ 15, e1008545 (2019).
* Sakofsky, C. J. et al. Break-induced replication is a source of mutation clusters underlying kataegis. _Cell Rep._ 7, 1640–1648 (2014).
* Sakofsky, C. J. et al. Repair of multiple simultaneous double-strand breaks causes bursts of genome-wide clustered hypermutation. _PLoS Biol._ 17, e3000464 (2019).
* Kazanov, M. D. et al. APOBEC-induced cancer mutations are uniquely enriched in early-replicating, gene-dense, and active chromatin regions. _Cell Rep._ 13, 1103–1109 (2015).
* Buisson, R. et al. Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. _Science_ 364, eaaw2872 (2019).
* Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. _Nature_ 521, 81–84 (2015).
* Zheng, C. L. et al. Transcription restores DNA repair to heterochromatin, determining regional mutation rates in cancer genomes. _Cell Rep._ 9, 1228–1234 (2014).
* Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. _Cell_ 164, 538–549 (2016).
* Morganella, S. et al. The topography of mutational processes in breast cancer genomes. _Nat. Commun._ 7, 11383 (2016).
* Seplyarskiy, V. B. et al. APOBEC-induced mutations in human cancers are strongly enriched on the lagging DNA strand during replication. _Genome Res._ 26, 174–182 (2016).
* Green, A. M. et al. APOBEC3A damages the cellular genome during DNA replication. _Cell Cycle_ 15, 998–1008 (2016).
* Kanu, N. et al. DNA replication stress mediates APOBEC3 family mutagenesis in breast cancer. _Genome Biol._ 17, 185 (2016).
* Nikkilä, J. et al. Elevated APOBEC3B expression drives a kataegic-like mutation signature and replication stress-related therapeutic vulnerabilities in p53-defective cells. _Br. J. Cancer_ 117, 113–123 (2017).
* Bhagwat, A. S. et al. Strand-biased cytosine deamination at the replication fork causes cytosine to thymine mutations in _Escherichia coli_. _Proc. Natl Acad. Sci. USA_ 113, 2176–2181 (2016).
* Hoopes, J. I. et al. APOBEC3A and APOBEC3B preferentially deaminate the lagging strand template during DNA replication. _Cell Rep._ 14, 1273–1282 (2016).
* Chen, J., Miller, B. F. & Furano, A. V. Repair of naturally occurring mismatches can induce mutations in flanking DNA. _eLife_ 3, e02001 (2014).
* Cannataro, V. L. et al. APOBEC-induced mutations and their cancer effect size in head and neck squamous cell carcinoma. _Oncogene_ 38, 3475–3487 (2019).
* Henderson, S., Chakravarthy, A., Su, X., Boshoff, C. & Fenton, T. R. APOBEC-mediated cytosine deamination links PIK3CA helical domain mutations to human papillomavirus-driven tumor development. _Cell Rep._ 7, 1833–1841 (2014).
* Li, Z. et al. APOBEC signature mutation generates an oncogenic enhancer that drives _LMO1_ expression in T-ALL. _Leukemia_ 31, 2057–2064 (2017).
* De Bruin, E. C. et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. _Science_ 346, 251–256 (2014).
* McGranahan, N. et al. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. _Sci. Transl. Med._ 7, 283ra54 (2015).
* Ullah, I. et al. Evolutionary history of metastatic breast cancer reveals minimal seeding from axillary lymph nodes. _J. Clin. Invest._ 128, 1355–1370 (2018).
* Reijns, M. A. M. et al. Lagging strand replication shapes the mutational landscape of the genome. _Nature_ 518, 502–506 (2015).
* Taylor, B. J. et al. DNA deaminases induce break-associated mutation showers with implication of APOBEC3B and 3A in breast cancer kataegis. _eLife_ 2, e00534 (2013).
* D’Antonio, M., Tamayo, P., Mesirov, J. P. & Frazer, K. A. Kataegis expression signature in breast cancer is associated with late onset, better prognosis, and higher HER2 levels. _Cell Rep._ 16, 672–683 (2016).
* Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. _Cell_ 176, 1282–1294.e20 (2019).
* Zhang, Y. et al. A pan-cancer compendium of genes deregulated by somatic genomic rearrangement across more than 1,400 cases. _Cell Rep._ 24, 515–527 (2018).
* Yang, Y., Sterling, J., Storici, F., Resnick, M. A. & Gordenin, D. A. Hypermutability of damaged single-strand DNA formed at double-strand breaks and uncapped telomeres in yeast _Saccharomyces cerevisiae_. _PLoS Genet._ 4, e1000264 (2008).
* Chan, K. et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. _Nat. Genet._ 47, 1067–1072 (2015).
* De, S. & Michor, F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. _Nat. Biotechnol._ 29, 1103–1108 (2011).
* Tomkova, M., Tomek, J., Kriaucionis, S. & Schuster-Böckler, B. Mutational signature distribution varies with DNA replication timing and strand asymmetry. _Genome Biol._ 19, 129 (2018).
* Woo, Y. H. & Li, W.-H. DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. _Nat. Commun._ 3, 1004 (2012).
* Zou, X. et al. Validating the concept of mutational signatures with isogenic cell models. _Nat. Commun._ 9, 1744 (2018).
* Li, F. et al. The histone mark H3K36me3 regulates human DNA mismatch repair through its interaction with MutSα. _Cell_ 153, 590–600 (2013).
* Barski, A. et al. High-resolution profiling of histone methylations in the human genome. _Cell_ 129, 823–837 (2007).
* Vavouri, T. & Lehner, B. Human genes with CpG island promoters have a distinct transcription-associated chromatin organization. _Genome Biol._ 13, R110 (2012).
* Huang, Y., Gu, L. & Li, G.-M. H3K36me3-mediated mismatch repair preferentially protects actively transcribed genes from mutation. _J. Biol. Chem._ 293, 7811–7823 (2018).
* Mugal, C. F., von Grünberg, H.-H. & Peifer, M. Transcription-induced mutational strand bias and its effect on substitution rates in human genes. _Mol. Biol. Evol._ 26, 131–142 (2009).
* Pfister, S. X. et al. SETD2-dependent histone H3K36 trimethylation is required for homologous recombination repair and genome stability. _Cell Rep._ 7, 2006–2018 (2014).
* Chen, J. & Furano, A. V. Breaking bad: the mutagenic effect of DNA repair. _DNA Repair_ 32, 43–51 (2015).
* Andrianova, M. A., Bazykin, G. A., Nikolaev, S. I. & Seplyarskiy, V. B. Human mismatch repair system balances mutation rates between strands by removing more mismatches from the lagging strand. _Genome Res._ 27, 1336–1343 (2017).
* Shinbrot, E. et al. Exonuclease mutations in DNA polymerase epsilon reveal replication strand specific mutation patterns and human origins of replication. _Genome Res._ 24, 1740–1750 (2014).
* Jiricny, J. The multifaceted mismatch-repair system. _Nat. Rev. Mol. Cell Biol._ 7, 335–346 (2006).
* Tran, P. T., Erdeniz, N., Symington, L. S. & Liskay, R. M. EXO1-A multi-tasking eukaryotic nuclease. _DNA Repair_ 3, 1549–1559 (2004).
* Cortes-Ciriano, I., Lee, S., Park, W.-Y., Kim, T.-M. & Park, P. J. A molecular portrait of microsatellite instability across multiple cancers. _Nat. Commun._ 8, 15180 (2017).
* Hause, R. J., Pritchard, C. C., Shendure, J. & Salipante, S. J. Classification and characterization of microsatellite instability across 18 cancer types. _Nat. Med._ 22, 1342–1350 (2016).
* Maruvka, Y. E. et al. Analysis of somatic microsatellite indels identifies driver events in human tumors. _Nat. Biotechnol._ 35, 951–959 (2017).
* Hombauer, H., Srivatsan, A., Putnam, C. D. & Kolodner, R. D. Mismatch repair, but not heteroduplex rejection, is temporally coupled to DNA replication. _Science_ 334, 1713–1716 (2011).
* Hombauer, H., Campbell, C. S., Smith, C. E., Desai, A. & Kolodner, R. D. Visualization of eukaryotic DNA mismatch repair reveals distinct recognition and repair intermediates. _Cell_ 147, 1040–1053 (2011).
* Jeon, Y. et al. Dynamic control of strand excision during human DNA mismatch repair. _Proc. Natl Acad. Sci. USA_ 113, 3281–3286 (2016).
* Smith, D. J. & Whitehouse, I. Intrinsic coupling of lagging-strand synthesis to chromatin assembly. _Nature_ 483, 434–438 (2012).
* Bowen, N. et al. Reconstitution of long and short patch mismatch repair reactions using _Saccharomyces cerevisiae_ proteins. _Proc. Natl Acad. Sci. USA_ 110, 18472–18477 (2013).
* Brosey, C. A. et al. A new structural framework for integrating replication protein A into DNA processing machinery. _Nucleic Acids Res._ 41, 2313–2327 (2013).
* Fan, J. & Pavletich, N. P. Structure and conformational change of a replication protein A heterotrimer bound to ssDNA. _Genes Dev._ 26, 2337–2347 (2012).
* Supek, F. & Lehner, B. Scales and mechanisms of somatic mutation rate variation across the human genome. _DNA Repair_ 81, 102647 (2019).
* Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. _Cell_ 173, 371–385.e18 (2018).
* Pich, O. et al. The mutational footprints of cancer therapies. _Nat. Genet._ 51, 1732–1740 (2019).
* Hodis, E. et al. A landscape of driver mutations in melanoma. _Cell_ 150, 251–263 (2012).
* Drost, J. et al. Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. _Science_ 358, 234–238 (2017).
* Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. _Science_ 359, 555–559 (2018).
* Verheijen, B. M., Vermulst, M. & van Leeuwen, F. W. Somatic mutations in neurons during aging and neurodegeneration. _Acta Neuropathol._ 135, 811–826 (2018).
* Lei, L. et al. APOBEC3 induces mutations during repair of CRISPR–Cas9-generated DNA breaks. _Nat. Struct. Mol. Biol._ 25, 45–52 (2018).
* Belfield, E. J. et al. DNA mismatch repair preferentially protects genes from mutation. _Genome Res._ 28, 66–74 (2018).
* Lujan, S. A. et al. Heterogeneous polymerase fidelity and mismatch repair bias genome variation and composition. _Genome Res._ 24, 1751–1764 (2014).
* Peña-Diaz, J. et al. Noncanonical mismatch repair as a source of genomic instability in human cells. _Mol. Cell_ 47, 669–680 (2012).
* Zlatanou, A. et al. The hMSH2–hMSH6 complex acts in concert with monoubiquitinated PCNA and pol η in response to oxidative DNA damage in human cells. _Mol. Cell_ 43, 649–662 (2011).
* Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. _Bioinformatics_ 28, 1811–1817 (2012).
* Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. _Nature_ 575, 210–216 (2019).
* Huang, M. N. et al. MSIseq: software for assessing microsatellite instability from catalogs of somatic mutations. _Sci. Rep._ 5, 13321 (2015).
* Wang, J. et al. Clonal evolution of glioblastoma under therapy. _Nat. Genet._ 48, 768–776 (2016).
* Hayward, N. K. et al. Whole-genome landscapes of major melanoma subtypes. _Nature_ 545, 175–180 (2017).
* Campbell, P. J. et al. Pan-cancer analysis of whole genomes. _Nature_ 578, 82–93 (2020).
* Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. _Cell Syst._ 6, 271–281.e7 (2018).
* Grün, B. & Leisch, F. FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters. _J. Stat. Softw._ 28, 1–35 (2008).
* Khodabakhshi, A. H. et al. Recurrent targets of aberrant somatic hypermutation in lymphoma. _Oncotarget_ 3, 1308–1319 (2012).
* Krüger, S. et al. Rare variants in neurodegeneration associated genes revealed by targeted panel sequencing in a German ALS cohort. _Front. Mol. Neurosci._ 9, 92 (2016).
* Hart, T. et al. Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens. _G3 (Bethesda)_ 7, 2719–2727 (2017).
* Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. _Cell_ 173, 400–416.e11 (2018).


ACKNOWLEDGEMENTS We thank the members of the Genome Data Science group and B. Lehner for comments and discussions. This work was funded by the ERC Starting Grant HYPER-INSIGHT (757700) and


the Spanish Ministry of Economy and Competitiveness (REGIOMUT, grant number BFU2017-89833-P). The results published here are in whole or part based on data generated by the TCGA Research


Network (https://www.cancer.gov/tcga). This publication and the underlying research are partly facilitated by the Hartwig Medical Foundation and Center for Personalized Cancer Treatment


(CPCT), which have generated, analyzed and made available data for this research. D.M.P. was funded by a Severo Ochoa FPI fellowship (MCIU/Fondo Social Europeo; BES-2017-079820). F.S. was


funded by the ICREA Research Professor program and is a member of the EMBO Young Investigator Program. The authors acknowledge support from the Severo Ochoa Centre of Excellence program to


IRB Barcelona.

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS
* Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain: David Mas-Ponte & Fran Supek
* Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain: Fran Supek

CONTRIBUTIONS F.S. and D.M.-P. conceptualized the study


and devised the methodology. D.M.-P. carried out the formal analysis and the investigation, operated the software and performed data visualization. D.M.-P. and F.S. wrote and edited the


draft manuscript. F.S. acquired the funding and supervised the study. CORRESPONDING AUTHOR Correspondence to Fran Supek. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no


competing interests. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. EXTENDED


DATA EXTENDED DATA FIG. 1 DETECTING CLUSTERED MUTATIONS AND SIMULATING PROCESSES THAT GENERATE CLUSTERED MUTATIONS. A, Method to determine significant mutation clustering using HyperClust. A


baseline distribution is generated by shuffling mutations within 1 Mbp windows multiple times (R1, R2, …, Rn) to loci with matching trinucleotide contexts. For every mutation, the observed


intermutational distance to its nearest neighbour (nIMD) is compared with distributions of expected IMDs (from randomized data) to determine a local FDR (lfdr). Thresholding by lfdr yields


clustered mutation calls (blue). B, Overview of study. C, Precision-recall curves for models in Fig. 1a, derived from simulated data with spiked-in mutation clusters: kataegis (top; with


five mutations per cluster at an average 600 bp pairwise distance) or omikli_M (bottom; two mutations at 101 bp). Two examples of high mutation burden tumors (TCGA-AP-A0LD, TCGA-AP-A0LE)


were used to generate the background mutation distributions. D, E, Testing accuracy of mutation cluster calling methods using simulated data. Points represent randomized tumor samples into


which spiked-in mutation clusters were introduced. Samples are ordered according to total mutation burden (panel D). Columns show different performance metrics: F1 score, precision, and


recall, all at lfdr = 20%. Rows represent different types of spiked-in mutation clusters (IMD distributions plotted in panel E, where kataegis clusters have five mutations and omikli_K/M/O clusters two mutations). Boxplots compare cluster calling methods, including implementations of some previous methodologies (details in Methods). The “strand-clonality-lfdr” (blue) is the HyperClust


method used throughout our work. F, G, Poisson mixture modelling (related to Fig. 1d) of the number of mutations per cluster, showing the relative likelihood (panel F) of models with an increasing number of components and the density functions (panel G) of a model with two Poisson components. The solid line represents the mean and the dashed lines the 95% C.I. H, Number of mutation


events per tumor sample (_x_ axis, n) per local hypermutation type (rows), either the A3 context TCW>K mutations, or the remaining mutations (columns). EXTENDED DATA FIG. 2


TETRANUCLEOTIDE CONTEXT SUGGESTS A ROLE FOR THE A3A ENZYME IN GENERATING OMIKLI AND A3B IN KATAEGIS MUTATIONS. A, C, Ratios of the YTCA (A3A-like) and RTCA (A3B-like) mutation frequencies


suggest differential mutagenic activity of A3A versus A3B enzymes in cancer samples. The C>T and the C>G changes in the two A3 contexts are shown in a pan-cancer analysis (panel A) and


broken down by cancer type (panel C). At least 100 TCW mutations of a certain type across all tumor samples in a tissue were required to perform analyses on that tissue (number of mutations


in brackets). Error bars are the bootstrap 95% C.I. of the ratio. KICH and THCA cancer types are not shown due to low overall number of A3-context mutations. B, Across multiple cancer


types, omikli shows a tendency towards A3A-like, lower RTCA/YTCA-ratios than does kataegis. Difference tested by Fisher’s exact test (per tumor type), two-tailed; p-values were adjusted for


multiple testing. Dashed line is FDR=20%. Lower odds ratios (<1) denote relative enrichment of YTCA (A3A-like) mutations in omikli compared to kataegis; see schematic above plot. EXTENDED


DATA FIG. 3 ASSOCIATION OF CLUSTERED MUTATION RATES WITH REPLICATION TIME (RT). A, RT association per cancer type. Number of mutations per RT bin: A3 context (top row) and the non-A3


control context at C:G nucleotide pairs (bottom row). RT bins are ordered from the latest-replicating quartile to the earliest-replicating quartile; mutation rates are shown relative to the


latest RT bin. Enrichments are not shown when the mutation count was lower than 10. B, Trinucleotide composition of the human reference genome in four RT bins, normalized to the latest RT


quartile (leftmost point). The A3 trinucleotide contexts (TCW, green) are similarly abundant in the late and in the early-replicating regions of the genome. C, D, Enrichment of A3-context


kataegis clusters, considering only RT (C), or jointly considering RT, mRNA levels and the H3K36me3 histone mark levels (D); points are coefficients from negative binomial regression, and


error bars are 95% C.I. E, Mutation rates in genomic bins with different CpG density (determined per 10 kb segment), stratified by RT quartiles. _y_ axis shows mutation densities relative to


the first bin (‘t1’, lowest tertile by CpG content). F, Spearman correlation between mRNA expression of A3A, A3B and MMR genes, and the TCW context enrichment of clustered mutations in a


tumor. Error bars are 95% C.I. from the Fisher transformation of the correlation coefficient. G, Association of A3 mutation burden (clustered and unclustered) with copy number alterations of


MMR genes. Significance by a two-tailed Mann-Whitney test, comparing tumor samples with neutral (0) versus gain/amplification (+1 and +2) states (blue stars, showing p-values according to


legend), and independently, comparing samples with neutral (0) versus loss (−1 and −2) states (purple stars). P-values were not adjusted. EXTENDED DATA FIG. 4 SIMULATIONS ESTIMATE POWER TO


DETECT MUTATION CLUSTERS AND DECONVOLUTE THEIR IMD DISTRIBUTIONS. A, B, An analysis of somatic hypermutation (SHM) events in lymphoid cancers suggests the length of MMR excision tracts in human cells. The distance from the initiating AID mutation (here, WNCYN>N context) to the flanking mutation introduced by error-prone MMR (here, any mutation at an A:T pair) is plotted in known


SHM off-target regions (blue) and, as a control, in intergenic regions (red) (panel A). A statistically significant enrichment is seen in the bins of the distance to central AID mutation


(_x_ axis) between 400–1000 nt (panel B). Numbers above/below bars are p-values by Chi-square test on the standardized residuals. C, Gamma mixture modelling of the IMD distributions.


Log-likelihood values for different number of components when modelling IMD of the A3 kataegis and omikli mutations. D, The alpha and beta parameters of the three fitted gamma distributions


(‘comp.1’, ‘comp.2’ and ‘comp. 3’) approximately match the alpha and beta parameters expected from simulated distributions with IMD at 30 bp, 800 bp and 200 bp, respectively. E, F,


Simulations using spiked-in clustered mutations into genomes obtained by randomizing and subsampling mutations from MSI-H hypermutated tumors (panel E) and other hypermutators (panel F),


with the goal of determining the recall (or sensitivity; _y_ axis) of recovering mutation clusters at various global mutation burdens (_x_ axis). Dashed line is a loess fit and shaded area


is its 95% C.I. Vertical lines are residuals of the fit. G, Difference between MSI and MSS tumor samples in the absolute burden of clustered A3 _omikli_ mutations; significance by


Mann-Whitney test (two-tailed). EXTENDED DATA FIG. 5 VALIDATION ANALYSES USING INDEPENDENT GENOMIC DATA SETS. A–C, Fitting a Poisson distribution mixture to the number of mutations per


cluster in the Hartwig Medical Foundation (HMF) dataset. The near-maximum log likelihood (LL) is obtained with two components (panel C) and the increase to three components is not


statistically supported; p-values are from a two-sided bootstrap test. D, E, The relative density of A3-context clustered mutations (left) is higher in MSS (MMR-proficient) than in MSI


(MMR-deficient) samples of the same tumor type in the HMF data. The difference is smaller for the non-A3 control context (right). Significance by Mann-Whitney test (two-tailed); n


is the number of samples; *** denotes p < 0.001. Numbers show the fold-difference between MSS and MSI samples. The ‘other A3 tissues’ are lung, head-and-neck, skin, pancreas and bladder cancers. F,


In HMF data, the A3-context _omikli_ clustered mutations are enriched in tumors with amplified MMR genes; significance by Mann-Whitney test (two-tailed) comparing the neutral state (0) with the


gain states (+1 and +2, considered jointly); n is the number of samples. G, In HMF data, A3-context _omikli_ are enriched in early replicating, H3K36me3-marked genomic regions; error bars


are 95% C.I. H, Intermutational distance distributions for kataegis (top) and omikli (bottom) A3 context mutations in the HMF data. Dashed lines show peaks of the simulated distributions


(Fig. 2) with segment lengths of 25 bp (green), 200 bp (purple) and 800 bp (orange). I, J, Whole-exome sequences in the TCGA data show a higher fraction of A3-context (TCW) mutations in MSS


than in MSI cancers (panel I), and an excess of TCW mutations at distances <1000 bp, normalized to longer distances, in MSS over MSI samples (panel J). ‘MSI-exp’ (_n_ = 152) denotes


experimentally established MSI-H status, ‘MSI-pred’ (_n_ = 18) denotes MSI status predicted using machine learning (ref. 61), and ‘nonMSI’ (_n_ = 5,661) denotes samples in neither category.
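The Poisson mixture comparison described for panels A–C of Extended Data Fig. 5 can be illustrated with a minimal sketch. It fits mixtures with one, two and three components to per-cluster mutation counts by expectation-maximization and reports the log likelihood of each fit. Everything here is an assumption made for illustration: the function names, the initialization, the fixed iteration count and the simulated counts are hypothetical, and the sketch reproduces neither the authors' implementation nor their bootstrap test.

```python
import math
import random


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at count k with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def component_logs(k, weights, rates):
    """Log of (weight * Poisson pmf) for each mixture component at count k."""
    return [math.log(w) + poisson_logpmf(k, lam) for w, lam in zip(weights, rates)]


def mixture_loglik(counts, weights, rates):
    """Total log likelihood of the counts under the mixture (log-sum-exp per count)."""
    total = 0.0
    for k in counts:
        logs = component_logs(k, weights, rates)
        m = max(logs)
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total


def fit_poisson_mixture(counts, n_components, n_iter=200):
    """EM fit of a Poisson mixture; returns (weights, rates, log_likelihood)."""
    lo, hi = min(counts), max(counts)
    # Hypothetical initialization: spread starting rates across the data range.
    rates = [lo + (hi - lo) * (j + 0.5) / n_components + 1e-3
             for j in range(n_components)]
    weights = [1.0 / n_components] * n_components
    n = len(counts)
    for _ in range(n_iter):
        sum_resp = [0.0] * n_components
        sum_resp_k = [0.0] * n_components
        # E-step: accumulate responsibilities of each component for each count.
        for k in counts:
            logs = component_logs(k, weights, rates)
            m = max(logs)
            exps = [math.exp(v - m) for v in logs]
            s = sum(exps)
            for j, e in enumerate(exps):
                sum_resp[j] += e / s
                sum_resp_k[j] += (e / s) * k
        # M-step: update mixing weights and rates (floored to stay positive).
        weights = [max(sr / n, 1e-12) for sr in sum_resp]
        rates = [max(srk / max(sr, 1e-12), 1e-6)
                 for srk, sr in zip(sum_resp_k, sum_resp)]
    return weights, rates, mixture_loglik(counts, weights, rates)


def sample_poisson(rng, lam):
    """Draw one Poisson variate with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


# Simulated (not real) per-cluster mutation counts from two Poisson processes.
rng = random.Random(0)
counts = [sample_poisson(rng, 2.0) for _ in range(600)]
counts += [sample_poisson(rng, 10.0) for _ in range(200)]
for n_components in (1, 2, 3):
    _, rates, ll = fit_poisson_mixture(counts, n_components)
    print(n_components, round(ll, 1), [round(r, 2) for r in rates])
```

On data of this kind, the log likelihood would be expected to rise sharply from one to two components and change little with a third, which is the pattern the legend describes; whether a further increase is statistically supported requires a separate test such as the bootstrap mentioned in the legend.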


EXTENDED DATA FIG. 6 CONTRIBUTION OF THE _OMIKLI_ AND THE _KATAEGIS_ MECHANISMS TO THE UNCLUSTERED A3 MUTATION BURDEN IN VARIOUS TISSUES. A, The omikli mechanism generates many unclustered


mutations (‘A3-O’) in various cancer types. B, The kataegis mechanism generates comparatively few unclustered mutations (‘A3-K’). Panels show the fit (red line) of the unclustered A3 burden


(_y_ axis) to the clustered A3 burden (_x_ axis) (see Methods). Error bars are 95% prediction intervals at x = 0 and at x = mean burden of A3 clustered mutations for that cancer type.
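As an illustration of fitting unclustered burden against clustered burden while limiting the influence of outlying samples, the following sketch fits a straight line by iteratively reweighted least squares with Huber weights, then evaluates it at x = 0 and at the mean of x. It is an assumption-laden stand-in, not the authors' procedure: the tuning constant 1.345 and the MAD-based scale are chosen here as conventional defaults, all function and variable names are hypothetical, the data are made up, and prediction intervals are not computed.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line_fit(x, y, k=1.345, n_iter=100, tol=1e-9):
    """Robust line fit: IRLS with Huber weights and a MAD-based residual scale."""
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(n_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0:
            break  # at least half of the points lie exactly on the line
        # Points with small residuals keep weight 1; large residuals are downweighted.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope


def predict(intercept, slope, x0):
    """Predicted y at x0 from the fitted line."""
    return intercept + slope * x0


# Made-up burdens: one value pair per hypothetical tumor sample.
clustered = [0, 2, 3, 5, 8, 10, 12, 15, 20, 25]
unclustered = [40, 95, 130, 180, 270, 330, 2000, 480, 640, 790]  # one outlier
b0, b1 = huber_line_fit(clustered, unclustered)
mean_x = sum(clustered) / len(clustered)
print("predicted unclustered burden at x = 0:", round(predict(b0, b1, 0), 1))
print("predicted unclustered burden at mean x:", round(predict(b0, b1, mean_x), 1))
```

The difference between the two predictions gives a rough sense of how many unclustered mutations track with the clustered burden in a cancer type, which is the comparison the dashed lines in the panels are described as showing.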


Horizontal dashed lines are the predicted numbers of unclustered A3 mutations at those two points (for clarity also shown in blue/green bars next to each plot). Fits use robust regression


(rlm function in R). For visual clarity, only the part of the plot up to the mean unclustered mutation burden plus a margin is shown; however, the fit uses all data points (that is, tumor


samples), including those not visualized. EXTENDED DATA FIG. 7 MECHANISMS UNDERLYING A3 CLUSTERED MUTATIONS GENERATE MANY IMPACTFUL CHANGES, AFFECTING DISEASE GENES. A, Coding regions in the


human genome are enriched for CpG dinucleotides (NCG), but not for the A3-context TCW trinucleotides, compared to random expectation. B, Enrichment of mutations in exons _versus_ introns


(estimate of selection strength, _x_ axis) and the enrichment in intergenic regions versus introns (estimate of redistribution of mutations towards regions containing genic DNA, _y_ axis;


flipped). The comparison of mutagenic agents against APOBEC was performed for selected tissues, matching the relevant tissue with the particular mutagen (tumor samples listed in


Supplementary Table 7). Error bars are 95% C.I. from negative binomial regression; numbers in parentheses are the tally of mutations. C, The differential functional impact of the tested


mutagens across replication time (RT) bins. Left: total length of coding sequences (CDS) in the late and early RT bins, shaded by the RT sextiles that were merged to create the two bins


(where 1 is the latest and 6 is the earliest RT). Middle: expected number of cancer gene CDS-affecting mutations in an average tumor sample (same sets of samples, genes and mutations as in


Fig. 5a; _y_ axis) for the late versus early RT bin (_x_ axis), for various mutagens (colors); error bars are s.e.m. Right: fold-difference between the functional impact at the late versus


early bin, for various mutagen types. D, E, The functional impact density (FID) of various mutational processes in a set of cell-essential genes (panel D) and neurodegenerative


disease-associated genes (panel E). The slope shows the fraction of impactful genetic changes, i.e., those affecting the CDS of at least one gene in the set. Points show the expected number of


impactful changes resulting from a mutational process, on average, in a tumor genome affected by that mutational process. Error bars are s.e.m. ‘APOBEC-O4’ is A3 mutagenesis in omikli-rich


tumors. ‘APOBEC-K2’ is A3 mutagenesis in kataegis-rich tumors. EXTENDED DATA FIG. 8 ASSOCIATIONS BETWEEN GENIC MUTATIONS AND GLOBAL BURDEN OF CLUSTERED MUTATIONS. A, Associations between


A3-context TCW>K mutations in coding regions of each cancer gene, and the global burden of A3 kataegis (top left) or omikli (middle left) and their interaction term (bottom left). The right


panel is the same as the middle-left panel, but shows only the significant genes, with labels. Volcano plots show logistic regression coefficients (transformed to odds ratios) on the _x_ axis and


the log FDR on the _y_ axis. Genes that bore coding mutations in at least three tumor samples were tested. B, Number of TCW sites in a gene coding sequence (CDS; _x_ axis) predicts the


association of cancer gene mutations (_y_ axis) with A3 omikli burden (bottom) but not with A3 kataegis burden (top). Error bands are 95% C.I. of the linear fit. C, Same association analysis


as panel A but for the control, non-A3 context VCN>K mutations in the gene CDS. D, Early RT cancer genes are more affected by A3 mutagenesis. Cancer genes were stratified into RT


quartiles (_x_ axis), and the logistic regression coefficient (log odds ratio, _y_ axis) linking A3 _omikli_ burden with the presence of a mutation in the CDS of any cancer gene in that RT bin


was determined. Error bars are 95% C.I. from logistic regression (on n=593 tumor samples). SUPPLEMENTARY INFORMATION Supplementary Note, Reporting Summary,


Supplementary Tables 1–10 ABOUT THIS ARTICLE CITE THIS ARTICLE Mas-Ponte, D., Supek, F. DNA mismatch repair promotes


APOBEC3-mediated diffuse hypermutation in human cancers. _Nat Genet_ 52, 958–968 (2020). https://doi.org/10.1038/s41588-020-0674-6 * Received: 22 August 2019 * Accepted: 30


June 2020 * Published: 03 August 2020 * Issue Date: September 2020 * DOI: https://doi.org/10.1038/s41588-020-0674-6