ABSTRACT Certain mutagens, including the APOBEC3 (A3) cytosine deaminase enzymes, can create multiple genetic changes in a single event. Activity of A3s results in striking ‘mutation
showers’ occurring near DNA breakpoints; however, less is known about the mechanisms underlying the majority of A3 mutations. We classified the diverse patterns of clustered mutagenesis in
tumor genomes, which identified a new A3 pattern: nonrecurrent, diffuse hypermutation (omikli). This mechanism occurs independently of the known focal hypermutation (kataegis), and is
associated with activity of the DNA mismatch-repair pathway, which can provide the single-stranded DNA substrate needed by A3, and contributes to a substantial proportion of A3 mutations
genome wide. Because mismatch repair is directed towards early-replicating, gene-rich chromosomal domains, A3 mutagenesis has a high propensity to generate impactful mutations, which exceeds
that of other common carcinogens such as tobacco smoke and ultraviolet exposure. Cells direct their DNA repair capacity towards more important genomic regions; thus, carcinogens that
subvert DNA repair can be remarkably potent.
DATA AVAILABILITY Whole-genome sequences from the TCGA project were available through the Cancer Genomics Hub repository (now superseded by
the NCI Genomic Data Commons; https://gdc.cancer.gov/). Corresponding SNP array data were downloaded from the GDC legacy portal (https://portal.gdc.cancer.gov/legacy-archive). WGS data from
the Hartwig Medical Foundation are available at https://www.hartwigmedicalfoundation.nl/en. The whole-exome sequencing data of TCGA cohort are available through the MC3 dataset at
https://gdc.cancer.gov/about-data/publications/mc3-2017. Data generated by the analyses in this study are available in the Supplementary Tables. CODE AVAILABILITY Code to generate clustered
mutation calls was implemented in Python (version 3.6) and R environments (version 3.6). Relevant packages are biopython (version 1.73) and numpy (version 1.15.4) for Python, and Biostrings
(2.52.0), VariantAnnotation (1.30.1) and GenomicRanges (1.36.0) for R. Code is available at https://github.com/davidmasp/hyperclust. Statistical analysis of the data was performed using
custom scripts in R (version 3.6). Relevant packages are mclust (version 5.4.4), mixtools (version 1.1.0), MASS (version 7.3-51.4) and flexmix (version 2.3-15). REFERENCES * Harris, K. &
Nielsen, R. Error-prone polymerase activity causes multinucleotide mutations in humans. _Genome Res._ 24, 1445–1454 (2014). * Rogozin, I. B. et
al. DNA polymerase η mutational signatures are found in a variety of different types of cancer. _Cell Cycle_ 17, 348–355 (2018). * Seplyarskiy,
V. B. et al. Error-prone bypass of DNA lesions during lagging-strand replication is a common source of germline and cancer mutations. _Nat. Genet._ 51, 36–41 (2019).
* Supek, F. & Lehner, B. Clustered mutation signatures reveal that error-prone DNA repair targets mutations to active genes. _Cell_ 170, 534–547.e23 (2017).
* Moris, A., Murray, S. & Cardinaud, S. AID and APOBECs span the gap between innate and adaptive immunity. _Front. Microbiol._ 5, 534 (2014).
* Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. _Nature_ 500, 415–421 (2013). * Burns, M. B., Temiz, N.
A. & Harris, R. S. Evidence for APOBEC3B mutagenesis in multiple human cancers. _Nat. Genet._ 45, 977–983 (2013). * Nik-Zainal, S. et al.
Mutational processes molding the genomes of 21 breast cancers. _Cell_ 149, 979–993 (2012). * Roberts, S. A. et al. An APOBEC cytidine deaminase
mutagenesis pattern is widespread in human cancers. _Nat. Genet._ 45, 970–976 (2013). * Roberts, S. A. et al. Clustered mutations in yeast and in
human cancers can arise from damaged long single-strand DNA regions. _Mol. Cell_ 46, 424–435 (2012). * Landry, S., Narvaiza, I., Linfesty, D. C.
& Weitzman, M. D. APOBEC3A can activate the DNA damage response and cause cell‐cycle arrest. _EMBO Rep._ 12, 444–450 (2011). * Suspène, R.
et al. Somatic hypermutation of human mitochondrial and nuclear DNA by APOBEC3 cytidine deaminases, a pathway for DNA catabolism. _Proc. Natl Acad. Sci. USA_ 108, 4858–4863 (2011).
* Byeon, I.-J. L. et al. NMR structure of human restriction factor APOBEC3A reveals substrate binding and enzyme specificity. _Nat. Commun._ 4, 1890 (2013).
* Holtz, C. M., Sadler, H. A. & Mansky, L. M. APOBEC3G cytosine deamination hotspots are defined by both sequence context and single-stranded DNA secondary
structure. _Nucleic Acids Res._ 41, 6139–6148 (2013). * Nik-Zainal, S. et al. Association of a germline copy number polymorphism of _APOBEC3A_
and _APOBEC3B_ with burden of putative APOBEC-dependent mutations in breast cancer. _Nat. Genet._ 46, 487–491 (2014). * Glaser, A. P. et al.
APOBEC-mediated mutagenesis in urothelial carcinoma is associated with improved survival, mutations in DNA damage response genes, and immune response. _Oncotarget_ 9, 4537–4548 (2017).
* Cortez, L. M. et al. APOBEC3A is a prominent cytidine deaminase in breast cancer. _PLoS Genet._ 15, e1008545 (2019).
* Sakofsky, C. J. et al. Break-induced replication is a source of mutation clusters underlying kataegis. _Cell Rep._ 7, 1640–1648 (2014).
* Sakofsky, C. J. et al. Repair of multiple simultaneous double-strand breaks causes bursts of genome-wide clustered hypermutation. _PLoS Biol._ 17, e3000464 (2019).
* Kazanov, M. D. et al. APOBEC-induced cancer mutations are uniquely enriched in early-replicating, gene-dense, and active chromatin regions. _Cell Rep._ 13,
1103–1109 (2015). * Buisson, R. et al. Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. _Science_ 364,
eaaw2872 (2019). * Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome.
_Nature_ 521, 81–84 (2015). * Zheng, C. L. et al. Transcription restores DNA repair to heterochromatin, determining regional mutation rates in
cancer genomes. _Cell Rep._ 9, 1228–1234 (2014). * Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of
DNA damage and repair. _Cell_ 164, 538–549 (2016). * Morganella, S. et al. The topography of mutational processes in breast cancer genomes. _Nat.
Commun._ 7, 11383 (2016). * Seplyarskiy, V. B. et al. APOBEC-induced mutations in human cancers are strongly enriched on the lagging DNA strand
during replication. _Genome Res._ 26, 174–182 (2016). * Green, A. M. et al. APOBEC3A damages the cellular genome during DNA replication. _Cell
Cycle_ 15, 998–1008 (2016). * Kanu, N. et al. DNA replication stress mediates APOBEC3 family mutagenesis in breast cancer. _Genome Biol._ 17, 185
(2016). * Nikkilä, J. et al. Elevated APOBEC3B expression drives a kataegic-like mutation signature and replication stress-related therapeutic
vulnerabilities in p53-defective cells. _Br. J. Cancer_ 117, 113–123 (2017). * Bhagwat, A. S. et al. Strand-biased cytosine deamination at the
replication fork causes cytosine to thymine mutations in _Escherichia coli_. _Proc. Natl Acad. Sci. USA_ 113, 2176–2181 (2016). * Hoopes, J. I. et al. APOBEC3A
and APOBEC3B preferentially deaminate the lagging strand template during DNA replication. _Cell Rep._ 14, 1273–1282 (2016). * Chen, J., Miller,
B. F. & Furano, A. V. Repair of naturally occurring mismatches can induce mutations in flanking DNA. _eLife_ 3, e02001 (2014). * Cannataro, V. L.
et al. APOBEC-induced mutations and their cancer effect size in head and neck squamous cell carcinoma. _Oncogene_ 38, 3475–3487 (2019). *
Henderson, S., Chakravarthy, A., Su, X., Boshoff, C. & Fenton, T. R. APOBEC-mediated cytosine deamination links PIK3CA helical domain mutations to human papillomavirus-driven tumor
development. _Cell Rep._ 7, 1833–1841 (2014). * Li, Z. et al. APOBEC signature mutation generates an oncogenic enhancer that drives _LMO1_ expression in T-ALL.
_Leukemia_ 31, 2057–2064 (2017). * De Bruin, E. C. et al. Spatial and temporal diversity in genomic instability processes defines lung cancer
evolution. _Science_ 346, 251–256 (2014). * McGranahan, N. et al. Clonal status of actionable driver events and the timing of mutational
processes in cancer evolution. _Sci. Transl. Med._ 7, 283ra54 (2015). * Ullah, I. et al. Evolutionary history of metastatic breast cancer reveals
minimal seeding from axillary lymph nodes. _J. Clin. Invest._ 128, 1355–1370 (2018). * Reijns, M. A. M. et al. Lagging strand replication shapes the
mutational landscape of the genome. _Nature_ 518, 502–506 (2015). * Taylor, B. J. et al. DNA deaminases induce break-associated mutation showers
with implication of APOBEC3B and 3A in breast cancer kataegis. _eLife_ 2, e00534 (2013). * D’Antonio, M., Tamayo, P., Mesirov, J. P. & Frazer, K.
A. Kataegis expression signature in breast cancer is associated with late onset, better prognosis, and higher HER2 levels. _Cell Rep._ 16, 672–683 (2016).
* Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. _Cell_ 176, 1282–1294.e20 (2019).
* Zhang, Y. et al. A pan-cancer compendium of genes deregulated by somatic genomic rearrangement across more than 1,400 cases. _Cell Rep._ 24, 515–527 (2018).
* Yang, Y., Sterling, J., Storici, F., Resnick, M. A. & Gordenin, D. A. Hypermutability of damaged single-strand DNA formed at double-strand
breaks and uncapped telomeres in yeast _Saccharomyces cerevisiae_. _PLoS Genet._ 4, e1000264 (2008). * Chan, K. et al. An APOBEC3A hypermutation
signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. _Nat. Genet._ 47, 1067–1072 (2015). * De,
S. & Michor, F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. _Nat. Biotechnol._ 29, 1103–1108 (2011).
* Tomkova, M., Tomek, J., Kriaucionis, S. & Schuster-Böckler, B. Mutational signature distribution varies with DNA replication timing and strand asymmetry.
_Genome Biol._ 19, 129 (2018). * Woo, Y. H. & Li, W.-H. DNA replication timing and selection shape the landscape of nucleotide variation in cancer
genomes. _Nat. Commun._ 3, 1004 (2012). * Zou, X. et al. Validating the concept of mutational signatures with isogenic cell models. _Nat. Commun._ 9, 1744 (2018).
* Li, F. et al. The histone mark H3K36me3 regulates human DNA mismatch repair through its interaction with MutSα. _Cell_ 153, 590–600 (2013).
* Barski, A. et al. High-resolution profiling of histone methylations in the human genome. _Cell_ 129, 823–837 (2007). *
Vavouri, T. & Lehner, B. Human genes with CpG island promoters have a distinct transcription-associated chromatin organization. _Genome Biol._ 13, R110 (2012).
* Huang, Y., Gu, L. & Li, G.-M. H3K36me3-mediated mismatch repair preferentially protects actively transcribed genes from mutation. _J. Biol. Chem._ 293, 7811–7823
(2018). * Mugal, C. F., von Grünberg, H.-H. & Peifer, M. Transcription-induced mutational strand bias and its effect on substitution rates in
human genes. _Mol. Biol. Evol._ 26, 131–142 (2009). * Pfister, S. X. et al. SETD2-dependent histone H3K36 trimethylation is required for homologous
recombination repair and genome stability. _Cell Rep._ 7, 2006–2018 (2014). * Chen, J. & Furano, A. V. Breaking bad: the mutagenic effect of
DNA repair. _DNA Repair_ 32, 43–51 (2015). * Andrianova, M. A., Bazykin, G. A., Nikolaev, S. I. & Seplyarskiy, V. B. Human mismatch repair system
balances mutation rates between strands by removing more mismatches from the lagging strand. _Genome Res._ 27, 1336–1343 (2017). * Shinbrot, E.
et al. Exonuclease mutations in DNA polymerase epsilon reveal replication strand specific mutation patterns and human origins of replication. _Genome Res._ 24, 1740–1750 (2014).
* Jiricny, J. The multifaceted mismatch-repair system. _Nat. Rev. Mol. Cell Biol._ 7, 335–346 (2006). * Tran, P. T., Erdeniz,
N., Symington, L. S. & Liskay, R. M. EXO1-A multi-tasking eukaryotic nuclease. _DNA Repair_ 3, 1549–1559 (2004). * Cortes-Ciriano, I., Lee, S., Park, W.-Y.,
Kim, T.-M. & Park, P. J. A molecular portrait of microsatellite instability across multiple cancers. _Nat. Commun._ 8, 15180 (2017). * Hause,
R. J., Pritchard, C. C., Shendure, J. & Salipante, S. J. Classification and characterization of microsatellite instability across 18 cancer types. _Nat. Med._ 22, 1342–1350 (2016).
* Maruvka, Y. E. et al. Analysis of somatic microsatellite indels identifies driver events in human tumors. _Nat. Biotechnol._ 35, 951–959 (2017).
* Hombauer, H., Srivatsan, A., Putnam, C. D. & Kolodner, R. D. Mismatch repair, but not heteroduplex rejection, is temporally coupled to DNA replication. _Science_ 334,
1713–1716 (2011). * Hombauer, H., Campbell, C. S., Smith, C. E., Desai, A. & Kolodner, R. D. Visualization of eukaryotic DNA mismatch repair
reveals distinct recognition and repair intermediates. _Cell_ 147, 1040–1053 (2011). * Jeon, Y. et al. Dynamic control of strand excision during
human DNA mismatch repair. _Proc. Natl Acad. Sci. USA_ 113, 3281–3286 (2016). * Smith, D. J. & Whitehouse, I. Intrinsic coupling of lagging-strand synthesis
to chromatin assembly. _Nature_ 483, 434–438 (2012). * Bowen, N. et al. Reconstitution of long and short patch mismatch repair reactions using
_Saccharomyces cerevisiae_ proteins. _Proc. Natl Acad. Sci. USA_ 110, 18472–18477 (2013). * Brosey, C. A. et al. A new structural framework for integrating
replication protein A into DNA processing machinery. _Nucleic Acids Res._ 41, 2313–2327 (2013). * Fan, J. & Pavletich, N. P. Structure and
conformational change of a replication protein A heterotrimer bound to ssDNA. _Genes Dev._ 26, 2337–2347 (2012). * Supek, F. & Lehner, B.
Scales and mechanisms of somatic mutation rate variation across the human genome. _DNA Repair_ 81, 102647 (2019). * Bailey, M. H. et al. Comprehensive
characterization of cancer driver genes and mutations. _Cell_ 173, 371–385.e18 (2018). * Pich, O. et al. The mutational footprints of cancer
therapies. _Nat. Genet._ 51, 1732–1740 (2019). * Hodis, E. et al. A landscape of driver mutations in melanoma. _Cell_ 150, 251–263 (2012).
* Drost, J. et al. Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. _Science_ 358, 234–238
(2017). * Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. _Science_ 359,
555–559 (2018). * Verheijen, B. M., Vermulst, M. & van Leeuwen, F. W. Somatic mutations in neurons during aging and neurodegeneration. _Acta Neuropathol._
135, 811–826 (2018). * Lei, L. et al. APOBEC3 induces mutations during repair of CRISPR–Cas9-generated DNA breaks. _Nat. Struct. Mol. Biol._ 25,
45–52 (2018). * Belfield, E. J. et al. DNA mismatch repair preferentially protects genes from mutation. _Genome Res._ 28, 66–74 (2018).
* Lujan, S. A. et al. Heterogeneous polymerase fidelity and mismatch repair bias genome variation and composition. _Genome Res._ 24, 1751–1764 (2014).
* Peña-Diaz, J. et al. Noncanonical mismatch repair as a source of genomic instability in human cells. _Mol. Cell_ 47, 669–680 (2012).
* Zlatanou, A. et al. The hMSH2–hMSH6 complex acts in concert with monoubiquitinated PCNA and pol η in response to oxidative DNA damage in human cells. _Mol. Cell_ 43, 649–662 (2011).
* Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. _Bioinformatics_ 28, 1811–1817 (2012).
* Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. _Nature_ 575, 210–216 (2019). *
Huang, M. N. et al. MSIseq: software for assessing microsatellite instability from catalogs of somatic mutations. _Sci. Rep._ 5, 13321 (2015). * Wang, J. et al.
Clonal evolution of glioblastoma under therapy. _Nat. Genet._ 48, 768–776 (2016). * Hayward, N. K. et al. Whole-genome landscapes of major
melanoma subtypes. _Nature_ 545, 175–180 (2017). * Campbell, P. J. et al. Pan-cancer analysis of whole genomes. _Nature_ 578, 82–93 (2020). *
Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. _Cell Syst._ 6, 271–281.e7 (2018).
* Grün, B. & Leisch, F. FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters. _J. Stat. Softw._ 28, 1–35 (2008).
* Khodabakhshi, A. H. et al. Recurrent targets of aberrant somatic hypermutation in lymphoma. _Oncotarget_ 3, 1308–1319 (2012). * Krüger, S.
et al. Rare variants in neurodegeneration associated genes revealed by targeted panel sequencing in a German ALS cohort. _Front. Mol. Neurosci._ 9, 92 (2016).
* Hart, T. et al. Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens. _G3 (Bethesda)_ 7, 2719–2727 (2017). * Liu, J. et al. An integrated TCGA
pan-cancer clinical data resource to drive high-quality survival outcome analytics. _Cell_ 173, 400–416.e11 (2018).
ACKNOWLEDGEMENTS We thank the members of the Genome Data Science group and B. Lehner for comments and discussions. This work was funded by the ERC Starting Grant HYPER-INSIGHT (757700) and
the Spanish Ministry of Economy and Competitiveness (REGIOMUT, grant number BFU2017-89833-P). The results published here are in whole or part based on data generated by the TCGA Research
Network (https://www.cancer.gov/tcga). This publication and the underlying research are partly facilitated by the Hartwig Medical Foundation and Center for Personalized Cancer Treatment
(CPCT), which have generated, analyzed and made available data for this research. D.M.P. was funded by a Severo Ochoa FPI fellowship (MCIU/Fondo Social Europeo; BES-2017-079820). F.S. was
funded by the ICREA Research Professor program and is a member of the EMBO Young Investigator Program. The authors acknowledge support from the Severo Ochoa Centre of Excellence program to
IRB Barcelona. AUTHOR INFORMATION AUTHORS AND AFFILIATIONS * Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain David
Mas-Ponte & Fran Supek * Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain Fran Supek CONTRIBUTIONS F.S. and D.M.-P. conceptualized the study
and devised the methodology. D.M.-P. carried out the formal analysis and the investigation, operated the software and performed data visualization. D.M.-P. and F.S. wrote and edited the
draft manuscript. F.S. acquired the funding and supervised the study. CORRESPONDING AUTHOR Correspondence to Fran Supek. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no
competing interests. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. EXTENDED
DATA EXTENDED DATA FIG. 1 DETECTING CLUSTERED MUTATIONS AND SIMULATING PROCESSES THAT GENERATE CLUSTERED MUTATIONS. A, Method to determine significant mutation clustering using HyperClust. A
baseline distribution is generated by shuffling mutations within 1 Mbp windows multiple times (R1, R2, …, Rn) to loci with matching trinucleotide contexts. For every mutation, the observed
intermutational distance to its nearest neighbour (nIMD) is compared with distributions of expected IMDs (from randomized data) to determine a local FDR (lfdr). Thresholding by lfdr yields
clustered mutation calls (blue). B, Overview of study. C, Precision-recall curves for models in Fig. 1a, derived from simulated data with spiked-in mutation clusters: kataegis (top; with
five mutations per cluster at an average 600 bp pairwise distance) or omikli_M (bottom; two mutations at 101 bp). Two examples of high mutation burden tumors (TCGA-AP-A0LD, TCGA-AP-A0LE)
were used to generate the background mutation distributions. D, E, Testing accuracy of mutation cluster calling methods using simulated data. Points represent randomized tumor samples into
which spiked-in mutation clusters were introduced. Samples are ordered according to total mutation burden (panel D). Columns show different performance metrics: F1 score, precision, and
recall, all at lfdr = 20%. Rows represent different types of spiked-in mutation clusters (IMD distributions are plotted in panel E, where kataegis clusters have five mutations and omikli_K/M/O clusters have two
mutations). Boxplots compare cluster calling methods, including implementations of some previous methodologies (details in Methods). The “strand-clonality-lfdr” method (blue) is the HyperClust
method used throughout our work. F, G, Poisson mixture modelling (related to Fig. 1d) of the number of mutations per cluster, showing the relative likelihood (panel F) of models with
increasing number of components and the density functions (panel G) of a model with two Poisson components. The solid line represents the mean and the dashed lines the 95% C.I. H, Number of mutation
events per tumor sample (_x_ axis, n) per local hypermutation type (rows), either the A3 context TCW>K mutations, or the remaining mutations (columns). EXTENDED DATA FIG. 2
TETRANUCLEOTIDE CONTEXT SUGGESTS A ROLE FOR THE A3A ENZYME IN GENERATING OMIKLI AND A3B IN KATAEGIS MUTATIONS. A, C, Ratios of the YTCA (A3A-like) and RTCA (A3B-like) mutation frequencies
suggest differential mutagenic activity of A3A versus A3B enzymes in cancer samples. The C>T and the C>G changes in the two A3 contexts are shown in a pan-cancer analysis (panel A) and
broken down by cancer type (panel C). At least 100 TCW mutations of a certain type across all tumor samples in a tissue were required to perform analyses on that tissue (number of mutations
in brackets). Error bars are the bootstrap 95% C.I. of the ratio. KICH and THCA cancer types are not shown due to low overall number of A3-context mutations. B, Across multiple cancer
types, omikli tends towards lower, A3A-like RTCA/YTCA ratios than kataegis does. The difference was tested by a two-tailed Fisher’s exact test (per tumor type); p-values were adjusted for
multiple testing. Dashed line is FDR=20%. Lower odds ratios (<1) denote relative enrichment of YTCA (A3A-like) mutations in omikli compared to kataegis; see schematic above plot. EXTENDED
DATA FIG. 3 ASSOCIATION OF CLUSTERED MUTATION RATES WITH REPLICATION TIME (RT). A, RT association per cancer type. Number of mutations per RT bin: A3 context (top row) and the non-A3
control context at C:G nucleotide pairs (bottom row). RT bins are ordered from the latest-replicating quartile to the earliest-replicating quartile; mutation rates are shown relative to the
latest RT bin. Enrichments are not shown when the mutation count was lower than 10. B, Trinucleotide composition of the human reference genome in four RT bins, normalized to the latest RT
quartile (leftmost point). The A3 trinucleotide contexts (TCW, green) are similarly abundant in the late and in the early-replicating regions of the genome. C, D, Enrichment of A3-context
kataegis clusters, considering only RT (C), or jointly considering RT, mRNA levels and the H3K36me3 histone mark levels (D); points are coefficients from negative binomial regression, and
error bars are 95% C.I. E, Mutation rates in genomic bins with different CpG density (determined per 10 kb segment), stratified by RT quartiles. _y_ axis shows mutation densities relative to
the first bin (‘t1’, lowest tertile by CpG content). F, Spearman correlation between mRNA expression of A3A, A3B and MMR genes, and the TCW context enrichment of clustered mutations in a
tumor. Error bars are 95% C.I. from the Fisher transformation of the correlation coefficient. G, Association of A3 mutation burden (clustered and unclustered) with copy number alterations of
MMR genes. Significance by a two-tailed Mann-Whitney test, comparing tumor samples with neutral (0) versus gain/amplification (+1 and +2) states (blue stars, showing p-values according to
legend), and independently, comparing samples with neutral (0) versus loss (−1 and −2) states (purple stars). P-values were not adjusted. EXTENDED DATA FIG. 4 SIMULATIONS ESTIMATE POWER TO
DETECT MUTATION CLUSTERS AND DECONVOLUTE THEIR IMD DISTRIBUTIONS. A, B, An analysis of somatic hypermutation (SHM) events in lymphoid cancers suggests the length of MMR excision tracts in human
cells. The distance from the initiating AID mutation (here, WNCYN>N context) to the flanking mutation introduced by error-prone MMR (here, any mutation at an A:T pair) is plotted in known
SHM off-target regions (blue) and, as a control, in intergenic regions (red) (panel A). A statistically significant enrichment is seen in the bins where the distance to the central AID mutation
(_x_ axis) is 400–1000 nt (panel B). Numbers above/below bars are p-values from a Chi-square test on the standardized residuals. C, Gamma mixture modelling of the IMD distributions.
Log-likelihood values for different numbers of components when modelling the IMDs of the A3 kataegis and omikli mutations. D, The alpha and beta parameters of the three fitted gamma distributions
(‘comp.1’, ‘comp.2’ and ‘comp.3’) approximately match the alpha and beta parameters expected from simulated distributions with IMD at 30 bp, 800 bp and 200 bp, respectively. E, F,
Simulations in which clustered mutations were spiked into genomes obtained by randomizing and subsampling mutations from MSI-H hypermutated tumors (panel E) and other hypermutators (panel F),
with the goal of determining the recall (or sensitivity; _y_ axis) of recovering mutation clusters at various global mutation burdens (_x_ axis). The dashed line is a loess fit and the shaded area
is its 95% C.I. Vertical lines are residuals of the fit. G, Difference between MSI and MSS tumor samples in the absolute burden of clustered A3 _omikli_ mutations; significance by
Mann-Whitney test (two-tailed). EXTENDED DATA FIG. 5 VALIDATION ANALYSES USING INDEPENDENT GENOMIC DATA SETS. A–C, Fitting a Poisson distribution mixture to the number of mutations per
cluster in the Hartwig Medical Foundation (HMF) dataset. The near-maximum log likelihood (LL) is obtained with two components (panel C) and the increase to three components is not
statistically supported; p-values are from a two-sided bootstrap test. D, E, The relative density of A3-context clustered mutations (left column) is higher in MSS (MMR-proficient) than in MSI
(MMR-deficient) samples of the same tumor type in the HMF data. The difference is smaller for the non-A3 control context (right). Significance by a two-tailed Mann-Whitney test; n
is the number of samples and *** denotes p < 0.001. Numbers show the fold difference between MSS and MSI samples. The ‘other A3 tissues’ are lung, head-and-neck, skin, pancreas and bladder cancer. F,
In HMF data, the A3-context _omikli_ clustered mutations are enriched in tumors with amplified MMR genes; significance by Mann-Whitney test (two-tailed) comparing the neutral (0) versus the
gain states (+1 and +2, considered jointly); n is the number of samples. G, In HMF data, A3-context _omikli_ are enriched in early replicating, H3K36me3-marked genomic regions; error bars
are 95% C.I. H, Intermutational distance distributions for kataegis (top) and omikli (bottom) A3 context mutations in the HMF data. Dashed lines show peaks of the simulated distributions
(Fig. 2) with segment lengths of 25 bp (green), 200 bp (purple) and 800 bp (orange). I, J, Whole-exome sequences in the TCGA data show an excess of A3 context (TCW) mutation fraction in MSS
compared to MSI cancers (panel I), and an excess of TCW mutations at distances <1000 bp, normalized to longer distances, in MSS over MSI samples (panel J). ‘MSI-exp’ (_n_ = 152) denotes
the experimentally established MSI-H status, whereas ‘MSI-pred’ (_n_ = 18) is the MSI status predicted using machine learning (ref. 61); ‘nonMSI’ (_n_ = 5,661) covers samples in neither category.
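
The mixture-model comparison described for panels A–C can be illustrated with a minimal sketch. It assumes a plain (untruncated) Poisson mixture fitted by expectation-maximization to a toy list of mutations-per-cluster counts. The function names, the toy data and the initialization scheme are hypothetical and are not taken from the study's code, and the bootstrap test used for the reported p-values is not reproduced here.

```python
import math


def poisson_logpmf(k, lam):
    """Log probability of observing count k under a Poisson with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_poisson_mixture(counts, n_components, n_iter=300):
    """Fit a Poisson mixture by EM; returns (weights, rates, log_likelihood)."""
    lo, hi = min(counts), max(counts)
    # Assumption: deterministic start with rates spread over the observed range.
    rates = [max(lo + (hi - lo) * (j + 0.5) / n_components, 1e-3)
             for j in range(n_components)]
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each count.
        resp = []
        for k in counts:
            logp = [math.log(w) + poisson_logpmf(k, lam)
                    for w, lam in zip(weights, rates)]
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: update mixing weights and component rates.
        for j in range(n_components):
            total = sum(r[j] for r in resp)
            weighted = sum(r[j] * k for r, k in zip(resp, counts))
            weights[j] = max(total / len(counts), 1e-12)
            rates[j] = max(weighted / max(total, 1e-12), 1e-3)

    log_likelihood = sum(
        logsumexp([math.log(w) + poisson_logpmf(k, lam)
                   for w, lam in zip(weights, rates)])
        for k in counts
    )
    return weights, rates, log_likelihood


# Hypothetical toy data: mostly small clusters plus a few large ones.
toy_counts = [2, 2, 3, 2, 2, 3, 2, 4, 2, 3, 9, 11, 12, 10, 14, 8]

for n_components in (1, 2, 3):
    w, lam, ll = fit_poisson_mixture(toy_counts, n_components)
    print(n_components, [round(x, 2) for x in lam], round(ll, 2))
```

Comparing the printed log likelihoods across one, two and three components mirrors the logic of the panels: a gain that levels off after two components suggests that two cluster types are sufficient to describe the data.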

EXTENDED DATA FIG. 6 CONTRIBUTION OF THE _OMIKLI_ AND THE _KATAEGIS_ MECHANISMS TO THE UNCLUSTERED A3 MUTATION BURDEN IN VARIOUS TISSUES.

A, The omikli mechanism generates many unclustered
mutations (‘A3-O’) in various cancer types. B, The kataegis mechanism generates comparatively few unclustered mutations (‘A3-K’). Panels show the fit (red line) of the unclustered A3 burden
(_y_ axis) to the clustered A3 burden (_x_ axis; see Methods). Error bars are 95% prediction intervals at _x_ = 0 and at _x_ = mean burden of A3 clustered mutations for that cancer type.
Horizontal dashed lines are the predicted numbers of unclustered A3 mutations at those two points (for clarity also shown in blue/green bars next to each plot). Fits use robust regression
(rlm function in R). For visual clarity, only the part of the plot up to the mean unclustered mutation burden plus a margin is shown; however, the fit uses all data points (that is, tumor
samples), including those not visualized.

EXTENDED DATA FIG. 7 MECHANISMS UNDERLYING A3 CLUSTERED MUTATIONS GENERATE MANY IMPACTFUL CHANGES, AFFECTING DISEASE GENES.

A, Coding regions in the
human genome are enriched for CpG dinucleotides (NCG), but not for the A3-context TCW trinucleotides, compared to random expectation. B, Enrichment of mutations in exons _versus_ introns
(estimate of selection strength, _x_ axis) and the enrichment in intergenic regions versus introns (estimate of redistribution of mutations towards regions containing genic DNA, _y_ axis;
flipped). The comparison of mutagenic agents against APOBEC was performed for selected tissues, matching the relevant tissue with the particular mutagen (tumor samples listed in
Supplementary Table 7). Error bars are 95% C.I. from negative binomial regression; numbers in parentheses are the tally of mutations. C, The differential functional impact of the tested
mutagens across replication time (RT) bins. Left: total length of coding sequences (CDS) in the late and early RT bins, shaded by the RT sextiles that were merged to create the two bins
(where 1 is the latest and 6 is the earliest RT). Middle: expected number of cancer gene CDS-affecting mutations in an average tumor sample (same sets of samples, genes and mutations as in
Fig. 5a; _y_ axis) for the late versus early RT bin (_x_ axis), for various mutagens (colors); error bars are s.e.m. Right: fold-difference between the functional impact at the late versus
early bin, for various mutagen types. D, E, The functional impact density (FID) of various mutational processes in a set of cell-essential genes (panel D) and neurodegenerative
disease-associated genes (panel E). Slope shows the fraction of impactful genetic changes, that is, those affecting the CDS of at least one gene in the set. Points show the expected number of
impactful changes resulting from a mutational process, on average, in a tumor genome affected by that mutational process. Error bars are s.e.m. ‘APOBEC-O4’ is A3 mutagenesis in omikli-rich
tumors. ‘APOBEC-K2’ is A3 mutagenesis in kataegis-rich tumors.

EXTENDED DATA FIG. 8 ASSOCIATIONS BETWEEN GENIC MUTATIONS AND GLOBAL BURDEN OF CLUSTERED MUTATIONS.

A, Associations between
A3-context TCW>K mutations in coding regions of each cancer gene, and the global burden of A3 kataegis (top left) or omikli (middle left) and their interaction term (bottom left). The right
panel is the same as the middle-left panel, but shows only the significant genes, with labels. Volcano plots show logistic regression coefficients (transformed to odds ratio) on the _x_ axis and
the log FDR on the _y_ axis. Genes that bore coding mutations in at least three tumor samples were tested. B, Number of TCW sites in a gene coding sequence (CDS; _x_ axis) predicts the
association of cancer gene mutations (_y_ axis) with A3 omikli burden (bottom) but not with A3 kataegis burden (top). Error bands are 95% C.I. of the linear fit. C, Same association analysis
as panel A but for the control, non-A3 context VCN>K mutations in the gene CDS. D, Early RT cancer genes are more affected by A3 mutagenesis. Cancer genes were stratified into RT
quartiles (_x_ axis), and the logistic regression coefficient (log odds ratio, _y_ axis) linking A3 _omikli_ burden with the presence of a mutation in the CDS of any cancer gene in that RT bin
was determined. Error bars are 95% C.I. from logistic regression (on n=593 tumor samples).

SUPPLEMENTARY INFORMATION Supplementary Note, Reporting Summary, Supplementary Tables 1–10

CITE THIS ARTICLE Mas-Ponte, D., Supek, F. DNA mismatch repair promotes APOBEC3-mediated diffuse hypermutation in human cancers. _Nat Genet_ 52, 958–968 (2020). https://doi.org/10.1038/s41588-020-0674-6

Received: 22 August 2019 * Accepted: 30 June 2020 * Published: 03 August 2020 * Issue Date: September 2020