Inferring replication timing and proliferation dynamics from single-cell dna sequencing data

Play all audios:

ABSTRACT Dysregulated DNA replication is a cause and a consequence of aneuploidy in cancer, yet the interplay between copy number alterations (CNAs), replication timing (RT) and cell cycle

dynamics remain understudied in aneuploid tumors. We developed a probabilistic method, PERT, for simultaneous inference of cell-specific replication and copy number states from single-cell

whole genome sequencing (scWGS) data. We used PERT to investigate clone-specific RT and proliferation dynamics in >50,000 cells obtained from aneuploid and clonally heterogeneous cell

lines, xenografts and primary cancers. We observed bidirectional relationships between RT and CNAs, with CNAs affecting X-inactivation producing the largest RT shifts. Additionally, we found

that clone-specific S-phase enrichment positively correlated with ground-truth proliferation rates in genomically stable but not unstable cells. Together, these results demonstrate robust

computational identification of S-phase cells from scWGS data, and highlight the importance of RT and cell cycle properties in studying the genomic evolution of aneuploid tumors. SIMILAR

CONTENT BEING VIEWED BY OTHERS UNRAVELLING SINGLE-CELL DNA REPLICATION TIMING DYNAMICS USING MACHINE LEARNING REVEALS HETEROGENEITY IN CANCER PROGRESSION Article Open access 08 February 2025

INFERENCE OF CHROMOSOME SELECTION PARAMETERS AND MISSEGREGATION RATE IN CANCER FROM DNA-SEQUENCING DATA Article Open access 31 July 2024 REPLICATION TIMING ALTERATIONS ARE ASSOCIATED WITH

MUTATION ACQUISITION DURING BREAST AND LUNG CANCER EVOLUTION Article Open access 18 July 2024 INTRODUCTION DNA replication and cell cycle regulation are frequently disrupted as part of a

cancer’s progression toward uncontrolled proliferation1,2,3. The resulting dysregulation creates a positive feedback loop of increasing replication stress and genomic instability4,5,6,

generating new somatic copy number alterations (CNAs) which drives intratumoral heterogeneity and subsequent clonal evolution7,8,9. One key component of DNA replication is the relative

timing at which different genomic regions replicate during synthesis (S)-phase of the cell cycle, known as replication timing (RT). Genome-wide RT profiles are visible at resolutions ranging

from 10kb to 10MB and have well-defined associations with chromatin organization, mutation rate, and transcription, and cellular phenotype3,10,11,12,13,14,15,16,17,18. Structural variations

and CNAs have been shown to impact transcription and epigenetic state19,20,21 but little is known about their associations with RT as technical limitations in isolating S-phase cells have

prevented measurement of both bulk and single-cell RT in primary tumors with subclonal CNAs3,13,14,17,22,23,24,25,26,27,28. This limitation has forced previous studies of RT to compare CNAs

from tumor samples against RT profiles from unmatched cell lines, often of different tissue types12,29,30,31, or RT profiles predicted from matched chromatin accessibility data18. Therefore,

there remain open questions about the bi-directional interplay between ongoing genomic instability and RT dynamics. Outside of studying RT, quantifying S-phase cells in tumors can be used

to better understand proliferation and clonal fitness dynamics. The most accurate way to measure clonal fitness is through fitting population genetics models to time-series data32,33,34;

however, such analysis may be challenging for clinical samples, where often only a single time point surgical resection is available for study. Our current toolkit for forecasting clonal

evolution is therefore limited to comparing different oncogenic mutations between clones. Indeed, there are many known driver and passenger mutations that confer fitness advantages35,36, but

it is challenging to determine how individual events interact with one another when they co-occur in the same cell and with the tumor microenvironment to determine clonal

fitness37,38,39,40,41. Forecasting clonal fitness via quantifying the proportion of replicating cells per clone offers hope to address these gaps. It is well-known that the quiescent

(G0)-phase and growth (G1 and G2)-phases of the cell cycle are more variable in duration relative to mitosis (M)-phase and S-phase1,42. This phenomenon has enabled pathologists to use S- and

M-phase fractions, along with other markers of proliferation, for tumor staging and grading43,44,45,46. It remains an open question as to whether quantifying S-phase cells from sequencing

data has prognostic utility or can be used to forecast clonal evolution in genomically unstable cancers. A fundamental limitation to studying RT and cell cycle dynamics is the inability to

isolate S-phase cells from solid tumors with subclonal CNAs. Experimental sorting of S-phase cells often relies on feeding cells bromodeoxyuridine (BrdU), a thymidine analog that gets

incorporated during S-phase, followed by fluorescence-activated cell sorting (FACS)47,48. This approach works well for dissociated cell lines and blood cancers which rapidly uptake BrdU but

does not work for solid tumors since their dissociation would disrupt the replication dynamics we care to study and administering BrdU in vivo is toxic46. Another FACS approach is to stain

for DNA content as S-phase cells have intermediate amounts of DNA relative to G1- and G2-phase cells10,46. Staining DNA does not require dissociation but sorting based on DNA content fails

for genomically heterogeneous tumors as subclonal CNAs produce variation in total DNA content per cell. These limitations have constrained all previous studies of RT in cancer to leukemia

and cell lines with no subclonal CNAs22,23,27,28. Finding better ways to isolate S-phase cells would enable the measurement of replication dynamics in genomically unstable solid tumors.

Unsorted single-cell whole genome sequencing (scWGS) is a powerful method for studying clonal heterogeneity and CNAs, and can potentially overcome the technical limitations of experimental

sorting to provide unbiased insight into DNA replication dynamics in aneuploid populations49,50,51,52,53,54. Such potential hinges upon computational identification of S-phase cells and

downstream calculation of their single-cell replication timing (scRT) profiles. These steps are challenging due to difficulty in distinguishing inherited somatic CNAs from transient DNA

replication changes. One property that helps to distinguish these two signals is that RT is often conserved within cell types. Existing scWGS S-phase classifiers50,55 attempt to leverage RT

conservation by using the correlation between ENCODE RT profiles10,23,56 and cell read count profiles as important features in random forest classifiers. However, such a classification

approach is heavily biased in favor of finding S-phase cells that follow the RT profiles used to train the classifier. Furthermore, the performance of a pre-trained classifier depends on the

extent to which sequencing and sample properties, especially GC content bias and the proportion of CNAs in early vs late replicating regions, match those of the training data. Even with

successful isolation of S-phase cells, none of the existing methods for estimating scRT from sorted scWGS data25,26,27,28 would work for samples with subclonal CNAs as they divide each

S-phase cell’s read depth profile by the average read depth profile across all G1-phase cells to determine which genomic loci should be replicated vs unreplicated in S-phase cells. We posit

that improved approaches for disentangling copy number (CN) from replication signals in scWGS data would unlock the ability to study replication dynamics of individual genetic subclones,

leading to better understanding of how DNA replication drives and is further modulated by genomic instability. We present a statistical method, P robabilistic E stimation of single-cell R

eplication T iming (PERT), to infer S-phase cells and their scRT profiles from unsorted scWGS data. PERT jointly models RT and CN at a subclonal level which critically enables for high

accuracy when analyzing samples with previously unseen RT and CN profiles. After extensive benchmarking, we used PERT to extract previously unquantifiable scRT profiles and cell cycle

distributions from a collection of genomically unstable cell lines, high-grade serous ovarian cancer (HGSOC), and triple negative breast cancer (TNBC) human tumors totaling >50,000 scWGS

cells. First, we investigated the relationship between ancestral RT and the emergence of CNAs, confirming findings from unmatched cell types12 that gains and losses preferentially emerge

from different loci. Second, we modeled the relative impact of cell type, mutational signature, and ploidy on RT and the distribution of early vs late S-phase cells to illustrate that cell

type is the most significant determinant of RT and that replication stress can lead to an accumulation of cells near the S/G2 boundary. Third, we leverage known biology involving late

replication of the inactive chromosome X allele (Xi)26,57 to identify recurrent patterns of Xi loss and reactivation in TNBC and HGSOC tumors. Finally, we investigated the relationship

between cell cycle distributions and clonal fitness. In particular, we used time-series sampling of cell lines and xenografts, along with multi-site sampling of a primary HGSOC tumor, to

define the contexts in which S-phase enrichment (SPE) scores accurately approximate fitness, observing that high SPE associates with greater cisplatin sensitivity and a depletion of S-phase

cells in highly fit whole genome doubled clones. RESULTS ACCURATE ESTIMATION OF SINGLE-CELL REPLICATION TIMING AND CELL CYCLE PHASE WITH PERT The main objective of PERT is to infer transient

replication states and inherited somatic CN states from scWGS data. To do so, PERT implements a Bayesian probabilistic graphical model and a multi-step learning procedure to infer latent

variables. Observed read depth (_Z_) is modeled as dependent on both latent CN (_X_) and replication (_Y_) states across all genomic bins (_N_ cells × _M_ loci) where the replication state

depends on each cell’s time within S-phase (_τ_), each locus’s average RT (_ρ_), and a replication stochasticity parameter (_α_) (Fig. 1a–f, S1). After learning replication states in all

bins, PERT then assigns cells to either S- or G1/2-phases based on the fraction of replicated loci per cell. The main assumptions of the model are that 1) S-phase cells within the same

sample should follow similar RT profiles with RT stochasticity controlled by a latent variable _α_, 2) cells belonging to the same clone should follow similar CN profiles, regardless of

phase, and 3) sequencing bias batch effects are library-specific but shared amongst both S- and G1/2-phase cells from the same library. Such modeling choices critically enable PERT to

maintain high accuracy when analyzing samples with previously unseen CN and RT profiles. PERT is implemented using Pyro58 and is freely available online with user tutorials. All terms in the

graphical model as well as additional mathematical, inference, and implementation details can be found in Fig. S1 and the Methods. QUANTITATIVE BENCHMARKING IN SIMULATED DATA We benchmarked

PERT’s accuracy at inferring somatic CN, replication states, and cell cycle phase through quantitative simulation experiments. We used PERT as a generative model to simulate low-coverage (1

million reads per cell, 0.0003x coverage depth) scWGS data with parameter sweeps over the number of clones, number of cells, the cell-specific CNA rate, read depth overdispersion (_λ_),

replication stochasticity (_α_), ENCODE RT profiles10,23,56, and GC bias coefficients (_β__μ_), (Table 1). We ran PERT with both CN prior settings, clone and composite (Methods), for all

simulated datasets and compared these two results to the Laks et al. classifier50 for cell cycle phase prediction and Kronos28 for scRT estimation. The performance gap between PERT and

Kronos was significant (_p__a__d__j_ < 10−4) for all parameter combinations and increased as a function of cell CNA rate, number of clones, and noise _λ_ (Fig. 1g, S2a–j). PERT’s

replication and CN accuracies decayed at high cell CNA rate and overdispersion (_λ_) values but remained robust to replication stochasticity (_α_), GC bias (_β__μ_), number of clones, and

number of cells (Fig. S2a–t). Comparing the two PERT CN priors, we found higher accuracies when the composite prior was used, particularly at high cell CNA rates, and thus only used the

composite CN prior throughout the rest of our analysis (Fig. S2a–t). The agreement between each cell’s true and inferred fraction of replicated bins enabled PERT to classify cell cycle phase

with 93% overall accuracy (97% accuracy excluding low quality, LQ, cells) when considering all cells with 5-95% replicated bins as S-phase (Fig. 1h). Accordingly, PERT significantly

outperformed (_p__a__d__j_ < 10−4) the Laks et al. classifier in phase accuracy for all simulated datasets and was critically robust to fluctuations in GC bias (_β__μ_,1), indicating that

PERT’s performance will not fluctuate between DLP+49,50, 10X Chromium scDNA51,52, and other scWGS modalities53,54 with unique GC bias relationships (Fig. S2u–ad). Additional benchmarking

information can be found in Supplementary Notes 1–2. These data indicate that PERT significantly improves inference of scRT and phase, particularly in cases where CNAs arise with subclonal

structure, allowing for exploration of replication dynamics in unsorted scWGS data of heterogeneous aneuploid populations. VALIDATION OF PERT IN SORTED DIPLOID AND ANEUPLOID CELL LINES Next

we sought to validate PERT’s accuracy at predicting cell cycle phase and RT in real, clonally heterogeneous scWGS data. To do so, we performed in silico mixing experiments of scWGS data of

two unrelated cell lines that were experimentally sorted into cell cycle phases according to cellular DNA content using FACS. We combined diploid lymphoblastoid GM18507 cells (657 G1 cells,

585 S cells, 337 G2 cells, 1 clone, ~ 0.01-0.1x coverage depth) and aneuploid breast cancer T47D cells (703 G1 cells, 623 S cells, 522 G2 cells, 5 clones, ~ 0.01-0.1x coverage depth)50

into one merged sample for analysis with PERT (Fig. S3a). Critically, these two cell lines have unique CN profiles and should have unique RT profiles. PERT found distinct CN profiles for

S-phase cells of each line and its predicted phases were highly concordant with FACS (Fig. 2a, b). Both GM18507 and T47D samples were enriched for mid-S-phase cells which was expected based

on the sorting protocol described in Laks et al.50 (Fig. S3b). Cell line ‘pseudobulk’ RT profiles showed that 15% (794/5258) of loci had an absolute RT difference >0.25 between GM18507

and T47D, consistent with each cell line having a unique RT program (Fig. 2c). To confirm that these differentially replicating loci were accurately inferred, we compared the PERT RT

profiles with publicly available DNase-seq predicted-RT (predRT) profiles of select ENCODE primary cell samples where RT is predicted from DNase-seq data using the Replicon software18,59. We

found cell type specific correlations between the inferred PERT RT profiles and predRT profiles of breast epithelium and B-cell ENCODE samples which most closely matched T47D and GM18507

cell lines, respectively (Fig. 2d), suggesting that PERT inferred accurate RT differences between the two cell lines in this mixture experiment. We next tested whether the accuracy of

clone-specific RT inference. To test this, we ran PERT independently for each cell line and compared the inferred RT profiles to their results in the previously described in silico mixing

experiment. Encouragingly, we found that both cell lines had RT Pearson correlations of ≥0.99 between their merged and split PERT runs (Fig. 2e, S3c). We performed a similar experiment with

simulated data to ensure that PERT infers accurate clone RT profiles in samples where multiple RT profiles are present. For this experiment, we simulated data in which each clone’s RT

profile matched a unique ENCODE cell line RT profile and compared the true clone RT profiles to their inferred values23. Again, we found true and inferred clone RT profiles all had Pearson

correlations ≥0.99 (Fig. S3d, e). These experiments demonstrate that PERT can accurately identify clone-specific RT profiles within the same sample, giving us confidence to interrogate

clone-specific RT profiles in unsorted tumor scWGS data. The final validation we performed with cell cycle sorted data was to test PERT’s robustness to suboptimal initialization of cell

cycle phase. Given that PERT's initialized G1/2-phase population is often constructed using a conservative subset of all G1/2-phase cells, we wanted to know whether inclusion of too

many true G1/2-phase cells in the initialized S-phase population would bias RT and phase estimates. We thus devised a permutation experiment to test this concern. We generated 23 permuted

datasets in which a subset of GM18507 and T47D FACS G1/2-phase cells were intentionally mislabeled as S-phase during initialization with permutation rates ranging from 1% to 75%. We found

that >90% of all mislabeled cells were accurately recovered (predicted G1/2) across all permutation datasets without compromising identification of non-mislabeled S-phase cells and cell

line-specific RT profiles (Fig. 2f, g, S3f). Mislabeled cells which were erroneously predicted to be in S-phase were disproportionately G2 by FACS and 80–95% replicated by PERT. Orthogonal

per-cell features for these cells were concordant with late S-phase (Fig. 2f, S3g), suggesting they were actually in late S-phase but mistakenly sorted into G2-phase by FACS. We also found

many cell-specific CNAs in the set of FACS S-phase cells predicted to be G1/2-phase (Fig. 2h). Together, our results suggest that prediction of cell cycle phase using unsorted scWGS and PERT

is highly robust to poor initialization. Our results also highlight pitfalls of experimental sorting based on total DNA content, which is known to have S-phase purities ranging from

73–93%60, due to its inability to account for cellular CN variation. REPLICATION TIMING ASSOCIATES WITH CNA BIAS IN MATCHED SAMPLES After validation of PERT’s accuracy in heterogeneous

aneuploid populations, we next investigated how replication timing influences accrual of CNAs. Tumor CNA frequencies have been shown to vary between different RT zones12,29,30,31; however,

none of these associations between RT and other genomic covariates have come from matched samples as cell cycle sorting cannot be accurately performed in solid tumors with subclonal CNAs.

Thus, we sought to confirm previously reported relationships between CNAs and RT by applying PERT to previously published scWGS data of mammary epithelial 184-hTERT cell lines (hereafter

referred to as hTERTs)61. The hTERT samples were engineered using CRISPR/Cas9 to ablate _TP53_, _TP53_/_BRCA1_, and _TP53_/_BRCA2_, and passaged ~60 times with intermediate scWGS sampling to

capture accrual of aneuploidies (Fig. 3a). The initial investigation of this dataset revealed clonal expansions of cell populations with increasing levels of CNAs but excluded analysis of

S-phase cells61. Here, we analyzed all cells with PERT and inferred RT profiles for all clones and samples. We first compared the 500kb resolution RT profiles from these unsorted scWGS

libraries to ENCODE RepliSeq data, confirming that PERT RT profiles were of high-quality before proceeding with our analysis (Fig. 3b, S4, Supplementary Note 3). With high-quality RT and CN

profiles in-hand from this unsorted time-series scWGS data, we interrogated whether RT influences CNA acquisition. To test this, we computed a reference RT profile from the ancestral hTERT

WT (_TP53_ and _BRCA1/2_ WT) cells with no CNAs (SA039 clone A) and then counted the number of gains, losses, and CNA breakpoints in CN profiles that descended from this ancestral WT

population (Fig. 3c,d, S5a–c). We found that gains preferentially arose from early RT loci, losses from late RT loci, and CNA breakpoints from late RT loci (_p__a__d__j_ < 10−4) (Fig.

3e,f, S5d,e). The association between RT and CNAs in matched samples significantly strengthens the hypotheses that these two phenomena are governed by the same underlying processes.

REPLICATION TIMING SHIFTS AT CNAS We then proceeded to ask the inverse question about whether CNAs produce shifts in RT. CNAs have been shown to undergo buffering as changes in DNA copy

number do not proportionally scale with changes in RNA and protein expression62,63,64,65. These studies have shown that while the majority of buffering happens when going from RNA to protein

due to post-transcriptional regulation, the relationship between DNA and RNA is still buffered for many genes. It is possible that shifts in chromatin organization, and thus RT, might be

related to transcriptional buffering. Previous RT studies have shown that shifts to earlier RT represent shifts to more open chromatin and higher per-copy transcriptional activity while

shifts to later RT coincide with the opposite effects3,13,14,17. Therefore, we hypothesized that there might be directional RT shifts at CNA gains and losses as part of a transcriptional

buffering response at the level of chromatin organization. To determine if CNAs produced RT shifts, we identified sets of clone-specific gains and losses which had the same CN state present

in >98% of cells in their respective clones. These changes approximate isogenic events where there is no cellular CN variation within a given clone. Two representative examples of

isogenic CNAs are the 19p gain and 19q loss which are both specific to clone G in sample SA906b (Fig. S6a). We focused on isogenic CNAs as they represent regions of the genome where PERT is

the most accurate (i.e. cell-specific CNA rate <2%). We then computed the RT shift between each clone containing an isogenic CNA and all other reference clones in the sample without the

isogenic CNA, and compared distribution of all RT shifts to a background distribution of unaltered sites where a given clone and reference have matching CN states. Across all isogenic

events, we found that losses significantly shifted to earlier RT (_p_ = 3.85_e_ − 37 per bin, _p_ = 1.32_e_ − 5 per event) while gains shifted to later RT (_p_ = 5.97_e_ − 6 per bin, _p_ =

3.32_e_ − 1 per event) (Fig. 3g, S6b-d). Given the relationships between RT, chromatin organization, and transcriptional activity, this data suggests that losses are buffered towards higher

per-copy transcriptional activity and gains are buffered towards lower per-copy transcriptional activity. These results also indicate that CNA heterogeneity produce clones with distinct RT

profiles. IMPACTS OF CELL TYPE, PLOIDY, AND MUTATIONAL PROCESS ON REPLICATION TIMING AND STRESS DYNAMICS We next sought to quantify the relative impact of cell type, mutational process

signature, and ploidy on replication timing dynamics and identify markers which might be linked to replication stress. We thus applied PERT to a wider cohort of scWGS datasets with diverse

genomic properties and investigated their associations with PERT measurements such as the average time within S-phase and RT profiles of distinct clones. The assembled metacohort comprised 6

TNBC tumors, 13 HGSOC tumors, 3 cancer cell lines, and the previously described hTERTs, totalling over 50,000 single-cell genomes32,50,61. Samples had been labeled with homologous

recombination deficiency (HRD), fold-back inversion (FBI), and tandem duplicator (TD) mutational process signatures in previous analysis by joint decomposition of structural and

single-nucleotide variants66, and contained both whole genome doubled (WGD) and non-genome doubled (NGD) clones. We focused our analysis on the 102 unique clones with >20 S-phase cells

(Fig. 4a). We first investigated the time distribution of S-phase cells across signature and ploidy. We hypothesized that double-strand breaks and fork stalling in clones undergoing

replication stress should increase the number of cells which appear stuck in late S-phase as they have activated checkpoints at the S/G2 boundary or fail to complete replication prior to

cell division31,67,68,69,70. Accordingly, we found that both WGD and mutational processes associated with replication stress, such as TD and FBI, contained higher fractions of late S-phase

cells (Fig. 4b, c). This data suggests that the time distribution of S-phase cells might provide value in detecting ongoing replication stress, DNA damage, and repair processes. We then

proceeded to investigate the RT profiles of these clones. Clustering the pairwise Pearson correlations between clone RT profiles revealed striking sample and cell type specificity (Fig. 4d).

To quantify the relative importance of these covariates and obtain covariate-specific RT profiles, we implemented a factor model which took the matrix of clone RT profiles (_K_ clones × _M_

loci) as input and jointly learned importance weights (_β_s) and latent covariate-specific RT profiles (Methods). As expected, we found the constant term representing global RT was

estimated to have the highest importance with sample and cell type covariates coming in second and third place, respectively; however, ploidy and signature were both an order of magnitude

lower in importance than sample or cell type (Fig. 4e). These results indicate that RT is largely conserved across all clones in our metacohort with most variation being attributed to cell

type or sample (individual) while ploidy and signature have negligible impacts on RT. Curious about which regions of the genome had the highest cell type RT variability, we compared the

latent cell type RT profiles to the 15 ENCODE cell lines with RepliSeq data available10,23. Investigating these RT profiles inferred by either RepliSeq or PERT at chromosome-level

resolution, we found high agreement across cell types on most autosomes with chromosome X having the highest variability in mean RT (Fig. 4f, S7). Since most of the scWGS data was from

female individuals, these results prompted further analysis on the relationship between X-inactivation and RT. CHROMOSOME X REPLICATION TIMING SHIFTS MEASURE THE RATIO OF ACTIVE TO INACTIVE

ALLELES AND REFLECT SELECTION BIAS IN HGSOC AND TNBC TUMORS Given that X-inactivation produces a late replicating inactive allele (Xi) and an early replicating active allele (Xa)26,57, we

hypothesized that the greatest RT shifts would occur from CNAs which disrupted the 1:1 balance of Xa to Xi alleles. To study the relationship between RT and X-inactivation we ran PERT and

SIGNALS, a single-cell allele-specific CN caller which has been extensively characterized and benchmarked in Funnell61, on unsorted scWGS data and compared SIGNALS B-allele frequencies

(BAFs) to PERT chrX RT. SIGNALS defines A and B allele labels such that the major (most prevalent) allele in a sample is A and the minor allele is B; however, it is not known whether the A

or B allele on chrX is Xi. Thus, integrating SIGNALS results with PERT should enable assignment of A- and B-allele labels to Xa and Xi epigenetic states. We defined chrX relative RT as the

mean RT difference between chrX and all autosomes with negative values being later chrX replication. Applying this strategy first to the hTERT cell lines, we found that chrX RT and DNA BAF

were negatively correlated at both sample and clone resolution, with delayed RT associated with balanced allele-specific CN (BAF=0.5), and loss of the B-allele (BAF < 0.5) shifting RT

earlier (Fig. 5a–c). The direction of this correlation suggested that the SIGNALS A-allele label corresponded to Xa and B to Xi for all hTERT samples. This correspondence was further

supported by chrX having a lower BAF in S-phase cells than G1/2-phase cells, indicating that the B-allele was replicating later than the A-allele in all hTERT samples (Fig. S8a–c). These

results demonstrate PERT’s ability to associate the SIGNALS A- and B-allele labels with Xa and Xi epigenetic states, enabling further interrogation of the clonality and recurrence of genomic

events which disrupt X-inactivation using scWGS data alone. We then assessed the degree of chrX RT allelic imbalance in 19 TNBC and HGSOC samples. It has been long-established that many

female reproductive tumors have Xa > Xi imbalances and chrX overexpression based on analysis of bulk DNA and RNA sequencing data71,72; however, these events have yet to be characterized

in scWGS data, meaning that their evolutionary timing, phasing between clones, and the genomic characterization have yet to be jointly analyzed. As expected, all 19 tumors had earlier chrX

RT than the allelically balanced (BAF=0.5) hTERT samples, with a negative correlation between chrX RT and DNA BAF, confirming that many of these tumors had more copies of Xa than Xi (Fig.

5d, e, S8d). Interestingly, many of these allelic imbalances were fully clonal events and arose from either loss of Xi (4/19 clonal full chrX LOH, 4/19 clonal Xq LOH, 2/19 clonal partial

LOH, 1/19 subclonal LOH) or gain of Xa (1/19 clonal full chrX, 4/19 subclonal and/or partial) (Supplementary Data 1), suggesting that Xa > Xi imbalances emerge early in tumor evolution

and that both Xa gain and Xi loss may be independently favorable events. We next investigated whether tumors with earlier chrX RT but minimal allelic imbalance (BAF≈0.5) indeed had higher

chrX transcription. To answer this question, we focused on the subset of samples with matching scRNA-seq data. We compared SIGNALS RNA BAF with DNA BAF to investigate the relationship

between allele-specific transcription and CN of each chromosome. Autosomes maintained a 1:1 relationship between RNA and DNA BAF, indicating that each allelic copy is equally transcribed,

and samples with Xi alleles present had lower RNA BAF than DNA BAF, in line with less per-copy transcription from the Xi allele (Fig. S8e). We termed the difference between chrX RNA and DNA

BAF as the transcription gap where positive values correspond to samples with more transcription of the B-allele than expected based on DNA BAF. We found that the transcription gap

positively correlated with chrX RT (Fig. 5f). This data confirmed that earlier replication of chrX in samples with minimal BAF imbalance coincided with unexpectedly high B-allele

transcription, indicating that RT shifts captured by PERT better reflect transcriptional activity than chrX allele-specific CN ratios. Finally, we sought to identify mechanisms that would

explain unexpected transcription of the inactive X allele. Given that X-inactivation proceeds through the transcription of _XIST_ on the Xq arm71, a long non-coding RNA which silences its

chrX allele in cis, we searched for genomic events in which loss of the B-allele on the Xq arm could lead to re-activation of the B-allele on the Xp arm. We identified clonal loss of

heterozygosity (LOH, BAF=0) on Xq but not Xp in 4 of 19 tumors (2 HGSOC, 2 TNBC, Fig. S8f). Using PERT, we found that the Xp arm of these samples replicated much earlier than the

corresponding locus in the hTERT WT sample (SA039) which had intact _XIST_ transcription on the B-allele (Fig. 5g), indicating that the B-alleles on the Xp arm were reactivated. We then

validated Xp reactivation in these four samples by comparing their DNA and RNA BAFs at chromosome-arm level resolution. This analysis showed that Xp arms maintained a 1:1 ratio between gene

dosage and transcription (Fig. 5h), confirming that both alleles on the Xp arm were transcriptionally active. The clonal nature of these events indicate that they all happened early in tumor

evolution (Fig. S8f). These results demonstrate that using scWGS and PERT to study X-inactivation dynamics is uniquely powerful as RT not only serves as a proxy for transcriptional

activity, but the scWGS data itself also provides orthogonal information to identify the genomic determinants of such overexpression (Fig. 5i). SPE APPROXIMATES PROLIFERATIVE FITNESS IN

STABLE GENOMES After deep exploration of replication timing with PERT, we investigated whether the proportion of S-phase cells in genetic clones could accurately approximate their

proliferative fitness. Experimental attempts to approximate clone proliferation rates through cell cycle distributions have been limited due to the fact that FACS isolation of S-phase cells

is challenging for solid tumors with subclonal CNAs46. Unbiased scRNA-seq is capable of estimating cell cycle phase distributions through the use of cell cycle marker genes; however, these

distributions can only be mapped onto genetic clones when there are many arm-level CNAs that differentiate the clones73,74,75. To determine whether S-phase cells can be used to approximate

proliferative fitness in genetically stable cancer and epithelial cell lines, we ran PERT on gastric cancer cell lines with co-registered doubling times and performed time-series analysis of

the aforementioned hTERT WT data. The gastric cancer data consisted of 3 unique cell lines with co-registered doubling times, 10X scWGS, and 10X scRNA measurements51. Encouragingly, high

PERT G1/2-phase fractions correlated with high doubling times and high scRNA G1-phase fractions (Fig. S9a–c, Supplementary Data 2). Interestingly, the scWGS-defined cell cycle fractions had

higher correlation with doubling time (_r_ = 1.00, _p_ = 3.3_e_ − 3) than the scRNA-defined cell cycle fractions (_r_ = 0.88, _p_ = 3.1_e_ − 1), suggesting that direct measurement of DNA

replication in scWGS might be a better proxy measurement for the proliferation rate than expression of cell cycle marker genes in scRNA (Fig. S9a–c). To assess the relationship between

S-phase enrichment and fitness in time-series data, we used the relative abundance of each clone within each cell cycle phase to compute continuous S-phase enrichment (SPE) scores for each

clone (_c_) × timepoint (_t_) and compared these scores to the clone’s corresponding change in abundance between timepoints _t_ and _t_ + 1 (Fig. 6a–d, Methods). It was critically important

to use clone × timepoint SPE scores instead of the raw S-phase fraction for this data as technical variation between different timepoints (i.e. cell confluence at time of sequencing, sample

handling) could lead to dramatically different S-phase fractions. In the hTERT WT data, we found that SPE scores had a positive trend with expansion (_r_ = 0.51, _p_ = 2.5_e_ − 1, Fig. S9d,

e), indicating that more S-phase cells corresponded to faster proliferation in normal epithelial cells. These results suggested that quantifying the number of S-phase cells in a particular

clone serves as an appropriate proxy for fitness when there is limited genomic instability. GENOMIC INSTABILITY AND CISPLATIN COMPLICATE RELATIONSHIP BETWEEN SPE AND FITNESS We next

investigated whether clone S-phase enrichment still approximated proliferative fitness in genomically unstable populations. Faster cell division can explain why one tumor clone has higher

fitness than others and would be captured by SPE; however, avoiding cell death (via increased tolerance of deleterious mutations, reduced DNA damage, etc.) can also lead to an increase in

proliferative fitness and such processes would not be captured by SPE7,8. To determine whether SPE corresponded to high fitness of genomically unstable cells, we analyzed time-series scWGS

data generated from _TP53_−/− hTERT cell lines and serially propagated TNBC patient derived xenografts (PDXs) (Fig. 6e, S9d). We found no consistent correlation between SPE scores and

observed fitness with intra-sample correlations ranging from _r_ = 0.20 to _r_ = − 0.43 (Fig. 6f, S9f,g, S10a–d). These results suggest that SPE alone might be insufficient to approximate

clone proliferative fitness in TNBC or p53-null genomically unstable contexts, implying that more successful efforts for approximating fitness should also account for clone death rates. We

next asked whether S-phase estimates from PERT help predict the fitness of TNBC PDX clones undergoing cisplatin treatment. Previous analysis of the TNBC PDX data had revealed an inversion of

the clonal fitness landscape upon cisplatin exposure but had not identified any genotypic or phenotypic features to explain why certain clones were more sensitive to cisplatin than

others32. Platinum chemotherapies such as cisplatin are known to selectively target fast-dividing cancer cells by inflicting DNA damage during S-phase, and also elongate S-phase in sensitive

cells by stalling many replication forks76. Therefore, we hypothesized that enrichment of S-phase cells would be a sign of lower fitness in on-cisplatin samples. To test this hypothesis, we

investigated the correlation between SPE and clonal expansion in the cisplatin-treated TNBC PDXs and compared this relationship to the untreated baseline (Fig. 6e). Indeed, we found this

correlation to be significantly more negative in the on-treatment than the untreated context (Wald _p_ = 5.12_e_ − 2, Fig. 6f, S10e–h). These results show that S-phase enriched TNBC clones

are more likely to contract than expand in the presence of cisplatin, implying that SPE scores may be an informative feature for forecasting clone trajectories from cisplatin-treated

samples. WHOLE GENOME DOUBLED CLONES ARE DEPLETED OF S-PHASE CELLS Finally, we sought to better understand the relationship between cell cycle dynamics and proliferative fitness in WGD human

tumors. WGD is a common event in many genomically unstable tumor types including TNBC and HGSOC and its association with poor patient outcomes reflects its high proliferative fitness77. To

determine whether increased fitness of WGD tumor cells could be attributed to faster progression through the cell cycle, we looked at clone SPE scores from scWGS data of an untreated HGSOC

patient with subclonal WGD in the MSK SPECTRUM cohort41,78. Patient OV-081 presented with a primary tumor in the left adnexa consisting of exclusively NGD tumor cells and a metastasis in the

omentum consisting of mostly WGD tumor cells (Fig. 6g, S11a). Running PERT on the tumor and contaminating normal scWGS cells from both sites, we found that the NGD tumor clones (B-D) had

the highest SPE scores, followed by WGD tumor clone (A) and the normal clone (E) (Fig. 6h, S11c). Importantly, we were confident that this discrepancy in SPE was not a site-specific batch

effect as there was a mix of WGD and NGD tumor clones in the omentum. The discrepancy in clone SPE was validated with site-specific scRNA data showing that 53% of tumor cells are cycling in

the NGD-dominant left adnexa (22% S-phase, 31% G2/M-phase) but only 33% of tumor cells are cycling in the WGD-dominant omentum (16% S-phase, 17% G2/M-phase, Fig. S11b). To validate this

phenomenon in the in vitro setting, we looked at the relationship between S-phase enrichment and fitness of the two WGD clones in _TP53_−/− WT sample SA906a. Here, we found that both WGD

clones were depleted for S-phase cells (rank 7/13 and 13/13 in overall clone SPE scores; 7 of 8 clone × timepoint combinations with >10 G1/2-phase cells have negative SPE) despite

having the highest overall fitness (rank 1/13 and 2/13 in fitness scores) (Fig. S12). These results demonstrate that WGD tumor cells achieve high fitness despite having slower progression

through the cell cycle than their NGD counterparts. DISCUSSION Here we demonstrate that S-phase cells from genomically unstable tumors and their replication timing profiles can be inferred

from unsorted single-cell whole genome sequence data using PERT. Our computational advance enabled us to investigate replication dynamics from tumor types such as TNBC and HGSOC, a setting

where previous experimental and computational methods could not properly isolate S-phase populations or account for subclonal copy number variation. After extensive benchmarking on simulated

and in silico mixing experiments of cell cycle sorted data, we ran PERT on a metacohort of >50,000 cells which consisted of data from engineered epithelial cells, cancer cell lines, PDX

models, primary human tumors, time-series experiments, and multiple scWGS sequencing platforms. This data enabled us to interrogate the interplay between RT and genomic instability and

investigate when cell cycle distributions can help forecast clonal evolution. PERT analysis of time-series scWGS from engineered hTERT epithelial cell lines highlighted the complex

relationship between somatic CNAs and RT. Analysis of RT in relation to subsequent CNAs revealed that losses and breakpoints preferentially emerged from late RT loci while gains emerged from

early RT loci. These results are in agreement with correlations between unmatched CNA and RT data12,29,30,31. The relationship between RT and subsequent genomic alterations could be

explained by mechanisms of over- and under-replication2,5,79,80 or reflect differential fitness of gains vs losses in gene-rich early vs gene-poor late RT loci, respectively10. Separately,

shifts to earlier RT at losses and later RT at gains aligns with the directional effects of CNA buffering62,63,64,65 and attributes a portion of said buffering to rewired chromatin

organization. Across our metacohort, we found that RT is highly conserved across various mutational signatures and ploidies, with most RT variation being attributed to cell type and

X-inactivation. It was well known that RT is imprinted during early stages of development and heritable thereafter in normal tissues and genomically stable cancers15,16,17,22. However, the

high cell type RT specificity in our data supports an extension of the same model to genomically unstable cancers with massive structural variation. Instead of mutational signature and

ploidy impacting RT, we found that clones with WGD, the tandem duplication signature, or the fold-back inversion signature all had an enrichment of late over early S-phase cells. Given that

these groups are known to associate with replication stress, this accumulation in late S-phase likely reflects the ongoing repair of double-strand breaks and stalled replication

forks67,68,69,81. CNAs affecting chromosome X had the largest impact on RT due to Xi replicating later than Xa. This enabled PERT to identify recurrent Xa > Xi allelic imbalances in our

HGSOC and TNBC tumors using scWGS alone instead of requiring DNA and RNA data71,72,82. Quantification of clone-specific SPE scores across various datasets enabled us to investigate the

relationship between cell cycle dynamics and proliferative fitness. We found a strong positive correlation between SPE and proliferative fitness in genomically stable hTERT WT epithelial

cells and gastric cancer cell lines. However, a similar correlation was not present in the genomically unstable _TP53_−/− hTERTs and untreated TNBC PDXs. Two confounders may explain the lack

of correlation. First, death rate, which SPE does not measure, may be variable across the clones from the genomically unstable samples. Without measuring the death rate, we are left with an

incomplete picture of fitness, since fitness should be viewed as a composite of birth minus death rates7,8. Second, high SPE scores are unable to distinguish between two scenarios: an

increased birth rate and a longer time to complete S-phase. The duration of S-phase can be variable, especially in the context of replication stress2. Future attempts to predict fitness

would benefit from measuring both cell death and replication stress rates of individual clones – especially in the context of genomically unstable cancers which are prone to high

clone-to-clone variation of these properties. In spite of these potential confounders, we still observed a negative correlation between SPE and fitness in cisplatin-treated TNBC PDXs. High

SPE should correspond to high chemosensitivity regardless of whether higher birth rate or longer S-phase is the underlying reason behind high SPE because S-phase cells themselves are the

target of platinum chemotherapies and a result of their toxicity76. Our data suggests that SPE may serve as an informative biomarker for identifying treatment-resistant clones in patients

receiving neoadjuvant cisplatin. Viewing the depletion of S-phase cells in highly fit WGD clones with these confounders in mind also enables us to deduce that WGD confers high fitness by

reducing death rate, likely via enhanced buffering of deleterious mutations83, implying that WGD cells perform slower but more robust cell division. METHODS PERT MODEL The input for PERT is

scWGS read depth (_Z_) and called CN states for all bins (_N_ cells × _M_ loci) (Fig. S1a). Input CN states are obtained through single-cell CN callers such as HMMcopy49,84 or 10X

CellRanger-DNA51,52. PERT first identifies a set of high-confidence G1/2-phase cells where the input CN states reflect accurate somatic CN. All remaining cells have their input CN states

dropped as they are initially considered to have unknown CN states and cell cycle phase. Most S-phase cells should be present in the unknown initial set. High-confidence G1/2-phase cells are

phylogenetically clustered into clones based on CN using methods such as sitka85 or MEDICC286. Optionally, users can provide the set of high-quality G1/2-phase cells and their mapping to CN

clones as input. The initialized sets of cells are passed into a probabilistic model which infers somatic CN (_X_ ) and replication states (_Y_ ) through three distinct learning steps (Fig.

S1c–e). In Step 1, PERT learns parameters associated with library-level GC bias (_β__μ_, _β__σ_) and sequencing overdispersion (_λ_) by training on high-confidence G1/2 cells (Fig. S1c).

Step 1 conditions on CN (_X_ ), replication (_Y_ ), and coverage/ploidy scaling terms (_μ_) because input CN states are assumed to accurately reflect somatic CN states and all bins are

unreplicated (_Y_ = 0) in high-confidence G1/2 cells. Once _β__μ_, _β__σ_ and _μ_ have been learned in Step 1, we can condition on them in Step 2 (Fig. S1d). Step 2 learns latent parameters

representing each cell’s time in S-phase (_τ__n_), each locus’s replication timing (_ρ__m_), and global replication stochasticity (_α_) to compute the probability that a given bin is

replicated (_Y__n_,_m_ = 1) or unreplicated (_Y__n_,_m_ = 0). Only unknown cells are included in Step 2. Prior belief on each unknown cell’s CN state is encapsulated using a prior

distribution (_π_) which has concentration parameters (_η_) conditioned on the input CN of the most similar high-confidence G1/2 cells. CN prior concentrations are set for each cell by using

the consensus CN profile of the most similar G1/2 clone or a composite scoring of the most similar G1/2 clone and cell CN profiles (Fig. S1f) A full list of model parameters, domains, and

distributions can be found in Fig. S1b. Step 3 is an optional final step which learns CN and replication states for high-confidence G1/2 cells (Fig. S1e). This step is necessary to determine

if any S-phase cells are present in the initial set of high-confidence G1/2 cells. Step 3 conditions on replication timing (_ρ_) and stochasticity (_α_) values learned in Step 2 to ensure

that such properties are conserved between both sets of cells. PERT is designed for scWGS data with coverage depths on the order of 0.01-0.1x and thus 500kb bin sizes are used by default in

this manuscript; however, the model can be run on count data of any bin size as long as sufficient memory and runtime are allocated. We demonstrate PERT’s ability to run on 10X scWGS data at

20kb resolution in Additional File 2. EQUATIONS FOR STEP 1 Given that we have accurate CN caller results for high-confidence G1/2 cells, we can solve for each cell’s coverage/ploidy scaling

term _μ__n_ and condition on it, $${\mu }_{n}=\frac{{\sum }_{m=0}^{M}{Z}_{n , m}}{\mathop{\sum }_{m=0}^{M}{X}_{n , m}}.$$ (1) The latent variables are arranged together in function block

_f_ through the following equations to produce the bin-specific negative binomial event counts _δ__n_,_m_. The GC bias rate of each individual bin (_ω__n_,_m_) depends on the GC content of

the locus (_γ__m_) and the GC bias coefficients (_β__n_,_k_) for the cell, $${\omega }_{n,m}={e}^{\mathop{\sum }_{k=0}^{K}{\beta }_{n,k} * {\gamma }_{m}^{k}}.$$ (2) The expected read count

per bin is computed as follows: $${\theta }_{n,m}={X}_{n,m} * {\omega }_{n,m} * {\mu }_{n}.$$ (3) The expected read count per bin is then used in conjunction with the negative binomial event

success probability term (_λ_) to produce a number of negative binomial event count for each bin, $${\delta }_{n,m}=f({X}_{n,m},{\gamma }_{m},\lambda,{\mu }_{n},{\beta

}_{n,k})=\frac{{\theta }_{n,m} * (1-\lambda )}{\lambda },$$ (4) where we place the constraint _δ__n_,_m_ ≥ 1 to avoid sampling errors in bins with _θ__n_,_m_ ≈ 0. Finally, the read count at

a bin is sampled from an overdispersed negative binomial distribution _Z__n_,_m_ ~ NB(_δ__n_,_m_, _λ_) where the expected read count for _Z__n_,_m_ is _θ__n_,_m_ and the variance is

$\frac{{\theta }_{n,m}}{(1-\lambda )}$. EQUATIONS FOR STEPS 2–3 Steps 2-3 have equations which differ from Step 1 since it must account for replicated bins and cannot solve for _μ__n_

analytically. The probability of each bin being replicated (_ϕ__n_,_m_) is a function of the cell’s time in S-phase (_τ__n_), the locus’s replication timing (_ρ__m_), and the replication

stochasticity term (_α_). Replication stochasticity (_α_) controls how closely cells follow the global RT profile by adjusting the temperature of a sigmoid function. The following equation

corresponds to function block _g_: $${\phi }_{n,m}=g(\alpha, {\tau }_{n},{\rho }_{m})=\frac{1}{1+{e}^{-\alpha ({\tau }_{n}-{\rho }_{m})}}.$$ (5) Equations corresponding to function block _f_

differ from those in Step 1. The total CN (_χ__n_,_m_) is double the somatic CN (_X__n_,_m_) when a bin is replicated (_Y__n_,_m_ = 1), $${\chi }_{n,m}={X}_{n,m} * (1+{Y}_{n,m}).$$ (6) The

GC rates (_ω__n_,_m_) and negative binomial event counts (_δ__n_,_m_) are computed the same as in Step 1 (Eq (2), Eq (4)). However, the expected read count uses total instead of somatic CN,

$${\theta }_{n,m}={\chi }_{n,m} * {\omega }_{n,m} * {\mu }_{n}.$$ (7) Since CN is learned in Steps 2-3, the coverage/ploidy scaling term (_μ__n_) must also be learned. We use a normal prior

${\mu }_{n} \sim {{\rm{N}}}({\mu }_{\mu }^{n},{\mu }_{\sigma }^{n})$ where the approximate total ploidy and total read counts are used to estimate the mean hyperparameters (\({\mu }_{\mu

}^{n}\)). Total ploidies for each cell are approximated using the CN prior concentrations (_η_) and times within S-phase (_τ_) to account for both somatic and replicated copies of DNA that

are present. We fixed the standard deviation hyperparameters (${\mu }_{\sigma }^{n}$) to always be 10x smaller than the means to ensure that _μ__n_≥0 despite use of a normal distribution

(used for computational expediency), $${\mu }_{\mu }^{n}=\frac{{\sum }_{m=0}^{M}{Z}_{n,m}}{(1+{\tau }_{n})\mathop{\sum }_{m=0}^{M}{{{\rm{argmax}}}}_{p}({\eta }_{n,m,p})},$$ (8) $${\mu

}_{\sigma }^{n}=\frac{{\mu }_{\mu }^{n}}{10}.$$ (9) CONSTRUCTING THE CN PRIOR CONCENTRATIONS There are two ways to construct the CN prior concentrations within PERT. The first is to use the

most similar high-confidence G1/2 clone to define the concentrations for each unknown cell (clone method). We assign each unknown cell its clone (_c__n_) via Pearson correlation between the

cell read depth profile (_Z__n_) and the clone pseudobulk read depth profile (_Z__c_), $${c}_{n}={{{\rm{argmax}}}}_{c}({{\rm{corr}}}({Z}_{n},{Z}_{c})).$$ (10) Clone pseudobulk CN and read

depth profiles represent the median profile across all high-confidence G1/2 cells in a given clone _c_. Once we have clone assignments for each unknown cell, the CN concentration of all

possible states _P_ at each genomic bin (_η__n_,_m_,_p_) is constructed to be _w_ times larger for the state _p_ that matches the clone pseudobulk CN state (${X}_{{c}_{n},m}$) for that

same bin compared to all other states. The default setting is _w_ = 106: $${\eta }_{n,m,p}=\left\{\begin{array}{ll}w\quad &\,{{\rm{if}}}\,\,p={X}_{{c}_{n},m}\\ 1\quad

&\,{\mbox{else}}\,.\hfill\end{array}\right.$$ (11) The second way to construct the prior is to leverage additional information from the most similar high-confidence G1/2 cells when

constructing _η__n_,_m_,_p_ (composite method). The rationale for the composite method is that there might be rare CNAs within a clone which only appear in a handful of cells but do not

appear in the clone pseudobulk CN profile _X__c_. To find the most similar high-confidence G1/2 cells, we compute the read depth correlation between the unknown cell (${Z}_{{n}_{s}}$) and

the high-confidence G1/2 cells from the best matching clone (${Z}_{{n}_{g}}$), $$\psi={{\rm{corr}}}({Z}_{{n}_{s}\!},\,\, {Z}_{{n}_{g}}\!).$$ (12) The consensus clone CN profile and top _J_

matches for each unknown cell are then used to construct the CN prior (_η__n_,_m_,_p_). Each row of _ψ_ is sorted to obtain the top _J_ high-confidence G1/2 matches _n__g_(0), …, _n__g_(

_J_−1). All entries are initialized to 1 (_η__n_,_m_,_k_ = 1) before adding varying levels of weight (_w_) to states where the CN matches a G1/2-phase cell or clone pseudobulk CN profile.

The default settings are _w_ = 105 and _J_ = 5: $${\eta }_{n,m,p}=\left\{\begin{array}{ll}+1\quad \hfill&\,\,{\mbox{everywhere}}\,\\+w * 2 * J\quad

\hfill&{{\rm{if}}}\,\,p={X}_{{c}_{n},m}\\+w * (\,J-0)\quad \hfill&\,\,{{\rm{if}}}\,\,p={X}_{{n}_{g0},m}\\+w * (\,J-1)\quad \hfill&\,\,{{\rm{if}}}\,\,p={X}_{{n}_{g1},m}\\ \ldots

\quad \hfill \\+w\quad \hfill&\,\,\,\,\,\,\,\,\,{{\rm{if}}}\,\,p={X}_{{n}_{g(J-1)},m}.\end{array}\right.$$ (13) By default, the composite method is used during Step 2 and the clone

method is used during Step 3; however, the user may select between both methods during Step 2. Using the clone method during Step 2 should be seen as a ‘vanilla’ version of PERT which should

be used when very few cell-specific CNAs are present. The clone method is used for Step 3 since the composite method would produce many self-matching cells. A comparison of the two methods

can be seen when benchmarking PERT on simulated data (Supplementary Notes 1-2). MODEL INITIALIZATION AND HYPERPARAMETERS Splitting cells into initial sets of high-confidence G1/2-phase and

unknown cells is performed by thresholding heuristic per-cell features known to correlate with cell cycle phase. PERT uses clone-normalized number of input CN breakpoints between neighboring

genomic bins (BKnorm) and clone-normalized median absolute deviation in read depth between neighboring genomic bins (MADNnorm). These features are referred to as ‘HMMcopy breakpoints’ and

‘MADN RPM’, respectively, in the main text and figures. Note that breakpoints between the start and end of adjacently numbered chromosomes are not counted. $${{\mbox{BK}}}_{n}=\mathop{\sum

}_{m=0}^{M-1}\left\{\begin{array}{ll}1\quad &\,{{\rm{if}}}\,\,{X}_{n,m}\ne {X}_{n,m+1}\\ 0\quad &\,{\mbox{else}}\,\hfill\end{array}\right.$$ (14)

$${{\mbox{BKnorm}}}_{n}={{\mbox{BK}}}_{n}-\frac{1}{C}{\sum }_{c=0}^{C}{{\mbox{BK}}}_{c}$$ (15) $${{\mbox{MADN}}}_{n}={{\rm{Med}}}\,\left({\sum

}_{m=0}^{{\rm{M}}-1}{Z}_{n,m}-{Z}_{n,m+1}\right)$$ (16) $${{\mbox{MADNnorm}}}_{n}={{\mbox{MADN}}}_{n}-\frac{1}{C}{\sum }_{c=0}^{C}{{\mbox{MADN}}}_{c}$$ (17) Under default settings, PERT

initializes cells with MADNnorm < 0 and BKnorm < 0 as high-confidence G1/2-phase with all other cells as unknown phase. Initial cell phases can also be input by users based on

experimental measurements or alternative metrics such as 10X CellRanger-DNA’s ‘dimapd’ score (used in27,28,51), the Laks et al. classifiers’ S-phase probability and quality scores50, or read

depth correlation with a reference RT profile55. To improve convergence to global optima, each cell’s time in S-phase (_τ__n_) is initialized using scRT results from a clone-aware

adaptation of Dileep et al.25 which thresholds the clone-normalized read depth profiles into replicated and unreplicated bins. Each unknown cell _n_ is assigned to clone _c_ with the highest

correlation between cell and clone pseudobulk read depth profiles (Eq (10)). The read depth of each cell is then normalized by the CN state with highest probability within the CN prior

(_η__n_,_m_,_p_), $${y}_{n,m}=\frac{{Z}_{n,m}}{{{{\rm{argmax}}}}_{p}({\eta }_{n,m,p})}.$$ (18) The clone-normalized read depth profiles ( _y__n_) are then binarized into replication state

profiles (_Y__n_) using a per-cell threshold (_t__n_ ∈ [0, 1]) that minimizes the Manhattan distance between the real data and its binarized counterpart.

$${t}_{n}={{\mbox{argmin}}}_{t}\Bigg| \, {y}_{n,m}-\left\{\begin{array}{ll}1\quad &\,{{\rm{if}}}\,\,{y}_{n,m}\ge {t}_{n}\\ 0\quad &\,{\mbox{else}}\,\hfill\end{array}\right.\Bigg|$$

(19) $${Y}_{n,m}=\left\{\begin{array}{ll}1\quad &\,{{\rm{if}}}\,\,{y}_{n,m}\ge {t}_{n}\\ 0\quad &\,{\mbox{else}}\,\hfill\end{array}\right.$$ (20) The fraction of replicated bins per

cell from the deterministic replication states _Y__n_,_m_ are then used to initialize the parameter representing each cell’s time in S-phase (_τ__n_) within PERT’s probabilistic model.

$${\tau }_{n}=\frac{1}{M}{\sum }_{m=0}^{M}{Y}_{n,m}.$$ (21) Initialization of _τ__n_ is particularly important because the model might mistake an early S-phase cell (<20% replicated) for

a late S-phase cell (>80% replicated), or vice versa, as both have relatively ‘flat’ read depth profiles compared to mid-S-phase cells. Thus _τ__n_ will rarely traverse mid-S-phase values

during inference when its initial and true values lie far apart. Additional parameter initializations include _λ_ = 0.5 for negative binomial overdispersion and _β__σ_,_k_ = 10−_k_ for the

standard deviation of each GC bias polynomial coefficient _k_. Unlike _τ__n_, the model is unlikely to get stuck at local minima with these parameters so they are initialized to the same

values globally. The latent variables _β__μ_, _ρ_, and _α_ are sampled from prior distributions with fixed hyperparameters. The mean of all GC bias polynomial coefficients (_β__μ_) are drawn

from the prior N(0, 1). Each locus’s replication timing (_ρ_) is drawn from the prior Beta(1, 1) to create a uniform distribution on the domain [0, 1]. The replication stochasticity

parameter (_α_) is drawn from the prior distribution _Γ_(shape = 2, rate = 0.2) which has a mean of $\frac{{{\rm{shape}}}}{{{\rm{rate}}}}=10$ and penalizes extreme values on a positive

real domain. PERT PHASE PREDICTIONS We used the PERT model output to predict ‘G1/2’, ‘S’, and ‘low quality’ (LQ) phases for each cell. G1/2-phase cells were defined by having <5% or

>95% replicated bins. Of the remaining cells with 5-95% replicated bins, those with high read depth autocorrelation (>0.5), replication state autocorrelation (>0.2), or fraction of

homozygous deletions (_X_ = 0, >0.05) were deemed to be low quality. All other cells were deemed to be in S-phase. Using 500kb bins, autocorrelation scores were the average of all

autocorrelations ranging from 10 to 50 bin lag size. Thresholds used for splitting S and LQ phases can be adjusted by users should the default settings produce unexpected output. MODEL

CONSTRUCTION AND INFERENCE PERT is written using Pyro which is a probabilistic programming language written in Python and supported by PyTorch backend58. PERT uses Pyro’s implementation of

Black Box Variational Inference (BBVI) which enables the use of biologically-informed priors instead of being limited to conjugate priors87. Specifically, we use the AutoDelta function which

uses a Taylor approximation around the _maximum a posteriori_ (MAP) to approximate the posterior. Optimization is performed using the Adam optimizer. By default, we set a learning rate of

0.05 and convergence is determined when the relative change in the evidence lower bound (ELBO) is <10−6 or the maximum number of iterations (2000 for step 2, 1000 for steps 1 and 3) is

reached. SIMULATED DATASETS To benchmark PERT’s ability to accurately infer single-cell replication states, somatic CN states, and cell cycle phase against Kronos and the Laks et al. cell

cycle classifier, we simulated datasets with varying clonal structures and cell-specific CNA rates. Somatic CN states are simulated by first drawing clone CN profiles and then drawing

cell-specific CNAs that deviated from said clone CN profile. All CNAs are drawn at the chromosome-arm level. 400 S- and 400 G1/2-phase cells are simulated in each dataset. Once CN states

have been simulated, we simulate the read depth using PERT as a generative model. We condition the model on the provided _β__μ_, _β__σ_, _λ_, _α_, _ρ_, _γ_, and _X_ parameters when

generating cell read depth profiles. All read depth values (_Z_) are in units of reads per million. RepliSeq data for various ENCODE cell lines are used to set _ρ_ values for each clone23.

G1/2-phase cells were conditioned to have all bins as unreplicated _Y_ = 0. S-phase cells had their cell cycle times _τ_ sampled from a Uniform(0, 1) distribution. A table of all the

parameters used in each simulated dataset can be found in Table 1. We called CN on simulated binned read count data using HMMcopy. Given that Kronos was designed as an end-to-end pipeline

that takes in raw BAM files, we forked off the Kronos repository and edited their ‘Kronos RT’ module to accept binned read count and CN states as input. Cells were split into S- and

G1/2-phase Kronos input populations according to their true phase. Code to our forked repository can be found at https://github.com/adamcweiner/Kronos_scRT. Similarly, we removed features

from the Laks et al. cell cycle classifier that used alignment information such as the percentage of overalapping reads per cell. The Laks classifier was retrained and benchmarked with said

features removed prior to deployment on simulated data (Fig. S13). EXPERIMENTAL METHODS AND PARTICIPANT DETAILS CELL CULTURE AND PDXS Cell lines and PDX samples were generated in Laks et

al.50, Funnell et al.61, and Salehi et al.32 studies. Cell line samples included (1) an immortalized normal human female breast epithelial cell line 184-hTERT L9, (2) four sets of 184-hTERT

cell lines with perturbations in TP53−/− passaged over multiple time points, (3) five 184-hTERT cell lines with a variety of genetic perturbations in the repair pathway, including TP53−/−,

BRCA1+/−, BRCA1−/−, BRCA2+/− and BRCA2−/−, (4) lymphoblastoid cell line GM18507, (5) HER2+ breast cancer cell line T47D, and (6) ovarian cancer cell line OV2295. PDX samples included 6

different models of TNBC, three of which had cisplatin-treated replicates, and 15 different models of HGSOC. Serial passaging of cell lines was done by seeding approximately 1 million cells

each time and profiling with scWGS (DLP+) at 4–11 different passage points with a mean of 6,070 cells at each time point. Xenograft-bearing mice were sacrificed when the tumor size

approached 1000 mm3 in volume, according to the limits of the experimental protocol. For the serially passaged PDXs, the harvested tumor was minced finely with scalpels and then mechanically

disaggregated for one minute using a Stomacher 80 Biomaster (Seward) in 1 ml to 2 ml cold DMEM-F12 medium with glucose, L-glutamine and HEPES. Aliquots from the resulting suspension of

cells and fragments were used for xenotransplants in the next generation of mice and cryopreserved for sequencing. Serially transplanted aliquots represented approximately 0.1– –0.3% of the

original tumor volume from the previous mouse. MSK SPECTRUM PATIENT DATA We obtained matched scRNA (10x Genomics 3’-end) and scWGS (DLP+) from HGSOC patient OV-081 from the MSK SPECTRUM

cohort. Single cell suspensions from surgically excised tissues were generated and flow sorted on CD45 to separate the immune component. CD45 negative fractions were then sequenced using the

DLP+ platform. scRNA-seq was performed on both CD45 positive and negative fractions, with the malignant cells used in this study being the CD45 negative fraction. Detailed descriptions of

data generation can be found in the two SPECTRUM studies41,78. ACQUISITION OF SAMPLES FROM PATIENTS Patient samples from University of British Columbia32,66 were acquired with informed

consent, according to procedures approved by the Ethics Committees at the University of British Columbia. Patients with breast cancer undergoing diagnostic biopsy or surgery were recruited

and samples were collected under protocols H06-00289 (BCCA-TTR-BREAST), H11-01887 (Neoadjuvant Xenograft Study), H18-01113 (Large-scale genomic analysis of human tumors) or H20-00170

(Linking clonal genomes to tumor evolution and therapeutics). HGSOC samples were obtained from women undergoing debulking surgery under protocols H18-01652 and H18-01090. Patient samples

from Memorial Sloan Kettering Cancer Center41,61,78 were obtained following Institutional Review Board (IRB) approval and patient informed consent (protocols 15–200 and 06-107 for HGSOC;

18–376 for TNBC). HGSOC and TNBC clinical assignments were performed according to American Society of Clinical Oncology guidelines for ER, PR and HER2 positivity. EXPERIMENTAL CELL CYCLE

SORTING GM18507 and T47D cell lines were experimentally sorted into G1, S, G2, and dead cells prior to scWGS (DLP+) sequencing50. 2 million cells fresh from culture suspended in 1 mL PBS

were stained with Hoechst 33342 (Invitrogen), caspase 3/7 (Essen Biosciences), and propidium iodide (PI, Sigma Aldrich) for flow sorting separation of different cell phases. Flow sorting was

carried out at the Terry Fox Laboratory, (BC Cancer Research Centre) using a BD FACSAria III cell sorter equipped with 375 nm, 405 nm, 488 nm, 561 nm and 640 nm laser. We excluded apoptotic

cells on the live cell gate by gating out Caspase 3/7 high cells. On this live non-apoptotic cell gate, we gated for the cell cycle phases using DNA content of the cells measured by Hoechst

33342 staining to sort the G1, S, and G2 phases of the cell cycle individually. Cells from different cell cycle fractions were stained and dispensed into DLP+ chips for sequencing. SCWGS

DATA PROCESSING Unless otherwise noted, all scWGS data was generated via DLP+. All DLP+ data was passed through https://github.com/shahcompbio/single_cell_pipelineusing Isabl before

downstream analysis88. This pipeline aligned reads to the hg19 reference genome using BWA-MEM. Each cell was then passed through HMMcopy using default arguments for single-cell sequencing.

HMMcopy’s output provided read count and gc-corrected integer CN states for each 500kb genomic bin across all cells and loci. Loci with low mappability (<0.95) and cells with low read

count (<500,000 reads) were removed. Cells were also filtered for contamination using the FastQ Screen which tags reads as matching human, mouse, or salmon reference genomes. If >5%

of reads in a cell are tagged as non-human the cell is flagged as contaminated and subsequently removed. Cells were only passed into phylogenetic trees if they were called as G1/2-phase and

high quality by classifiers described in Laks et al.50. In certain cases, cells might be manually excluded from the phylogenetic tree if they pass the cell cycle and quality filters but have

an abnormally high number of HMMcopy breakpoints. All cells included in the phylogenetic tree are initialized in PERT as the set of high-confidence G1/2 cells; all cells outside the tree

are initialized as unknown cells. PHYLOGENETIC CLUSTERING BASED ON CN PROFILES We used the clone IDs from Funnell et al. for high-confidence G1/2 cells61. These single-cell phylogenetic

trees were generated using sitka85. Sitka uses CN breakpoints (also referred to as changepoints) across the genome as binary input characters to construct the evolutionary relationships

between cells. Sitka was run for 3,000 chains and a consensus tree was computed for downstream analysis. The consensus tree was then cut at an optimized height to assign all cells into

clones (clusters). For datasets with no sitka trees provided or select datasets, cells were clustered into clones using K-means where the number of clones was selected through Akaike

information criterion. We performed a K-means reclustering for the Salehi et al. TNBC PDX data32 as sitka produced small clusters which inhibited robust tracking of S-phase clone fractions

across multiple timepoints. PSEUDOBULK PROFILES Many times in the text we describe pseudobulk replication timing, copy number, or read depth profiles within a subset of interest (i.e. cells

belonging to the same clone or sample). To compute pseudobulk profiles, we group all the cells of interest and then take the median values for all loci in the genome. When computing

pseudobulk CN profiles, we only include the cells of the modal (most common) ploidy state before computing median values for all loci. S-PHASE TIMES When we refer to the time of individual

S-phase cells, such a time is calculated as the fraction of replicated bins per cell. Thus, S-phase times near 1 are late S-phase cells and S-phase times near 0 are early S-phase cells.

COMPARISON OF PERT RT PROFILES TO PREDICTED RT PROFILES Predicted RT (predRT) profiles for ENCODE primary cell samples were obtained from Salvadores et al.18. The authors used the Replicon

software with default settings59 to predict RT from DNase-seq chromatin accessibility data of each ENCODE sample. Human reference hg19 coordinates were used to create RT profiles with 1 MB

bin size for this comparison. PLOIDY-NORMALIZED CN We sometimes opt to show ploidy-normalized CN profiles instead of their integer CN values. The ploidy _p__n_ of a given CN profile _n_ is

defined as the mode of all integer CN states _X__n_,_m_. We then compute a ploidy-normalized CN value _χ__n_,_m_ as follows: $${\chi }_{n,m}=\frac{{X}_{n,m}-{p}_{n}}{{p}_{n}}.$$ (22) Note

that _n_ can be a profile corresponding to a cell, clone, or sample. IDENTIFICATION OF RT SHIFTS All RT profiles were centered to have a mean of 0 and standard deviation of 1 across the

entire genome prior to comparison to one another. This was essential as certain clones might have a skewed distribution of early vs late S-phase cells and thus their RT profiles would have

later or earlier replication across all positions in the genome. BESPOKE FACTOR MODEL WHICH LEARNS FEATURE IMPORTANCE AND RT PROFILES DIRECTLY FROM CLONE RT PROFILES We built a multivariate

regression model which learned importance terms and RT profiles for each feature directly from the matrix of clone RT profiles. The model has the following terms and equations: RT_c_,_m_:

the observed replication timing of clone _c_ at locus _m_ on the domain of [0,1]. This represents the fraction of replicated bins at locus _m_ across all S-phase cells _n_ in clone _c_.

_ρ__k_,_m_: the latent replication timing of feature _k_ at locus _m_. _I__c_,_k_: indicator mask representing which features _k_ are present for clone _c_. _β__k_: importance term for

feature _k_. _σ_: standard deviation term when going from expected to observed replication timing; sampled from a uniform distribution on the domain (0,1). $${{\mbox{RT}}}_{c,m} \sim

N(\frac{1}{1+{e}^{\mathop{\sum }_{k=0}^{K}({\beta }_{k} * {I}_{c,k} * {\rho }_{k,m})}},\sigma ).$$ (23) All latent replication timing terms _ρ__k_ are normalized to have mean of 0 and

variance of 1 and there is only _β_ value per class of features $${\beta }_{k}=\left\{\begin{array}{ll}{\beta }_{t}\quad &\,\,\,\,\, {{\rm{if}}}\,{{\rm{k}}}\,{{\rm{is}}} \, {{\rm{a}}} \,

{{\rm{cell}}} \, {{\rm{type}}} \, {{\rm{feature}}}\,\\ {\beta }_{s}\quad &\,\,\,\,\,\,\,\,{{\rm{if}}}\,k\,{{\rm{is}}} \, {{\rm{a}}} \, {{\rm{signature}}} \, {{\rm{feature}}}\,\\ {\beta

}_{p}\quad &{{\rm{if}}}\,{{\rm{k}}}\,{{\rm{is}}}\, {{\rm{a}}}\, {{\rm{ploidy}}} \, {{\rm{feature}}}\,\\ {\beta }_{d}\quad &\,\,\,{{\rm{if}}}\,k\,{{\rm{is}}} \, {{\rm{a}}} \,

{{\rm{sample}}} \, {{\rm{feature}}}\,\\ {\beta }_{g}\quad &{{\rm{if}}}\,k\,{{\rm{is}}} \, {{\rm{a}}} \, {{\rm{global}}} \, {{\rm{feature}}}\,\end{array}\right.$$ (24) This model is

implemented in pyro and fit using BBVI58,87. We use the AutoNormal function which uses Normal distributions to approximate the posterior. Optimization is performed using the Adam optimizer

with a learning rate of 0.02. Convergence is determined when the relative change in ELBO is <10−3 of the total ELBO change between first and current iteration. USING SIGNALS TO QUANTIFY

ALLELIC RATIOS FROM SCDNA- AND SCRNA-SEQ In brief, SIGNALS uses haplotype blocks genotyped in single cells and implements an hidden Markov model (HMM) based on a Beta-Binomial likelihood to

infer the most probable haplotype-specific state. SHAPEIT was used to generate the haplotype blocks for SIGNALS input89. A full description of SIGNALS can be found in Funnell et al.61.

Within each haplotype block for each sample, the major (most common) allele is labeled as the A-allele with the minor (less common) allele labeled as the B-allele. The B-allele frequency

(BAF) is computed as the fraction of B-allele heterozygouos single nucleotide polymorphisms (SNPs) out of all heterozygous SNPs present in a given bin. SIGNALS is run on scDNA data by

default but when scRNA data is also available, the haplotype blocks derived from the scDNA data can be used to extract A- and B-allele counts in the scRNA data too (albeit with much fewer

counts as there are fewer SNPs sequenced in scRNA data). SIGNALS output for all scDNA samples is shown in Supplementary Data 1. GASTRIC CANCER CELL LINE DATA 10X Chromium single-cell DNA

(10X scWGS) data of gastric cancer cell lines NCI-N87, HGC-27, and SNU-668 were downloaded from SRA (PRJNA498809). CN calling was performed using the CellRanger-DNA pipeline using default

parameters. Data was aggregated from 20kb to 500kb bins for analysis with PERT. Each cell line’s doubling time and fraction of G1-phase scRNA cells were extracted from Andor et al.51. PERT

output for these sames is shown in Supplementary Data 2. CLONE S-PHASE ENRICHMENT SCORES To test whether a clone (_c_) is significantly enriched or depleted for S-phase cells at a given

timepoint (_t_), we must compare that clone’s fraction in both S- and G1/2-phases. We first define the following variables as such: _N__s_,_c_,_t_: Number of S-phase cells belonging to clone

_c_ at time _t_ _N__g_,_c_,_t_: Number of G1/2-phase cells belonging to clone _c_ at time _t_ _N__s_,_t_: Total number of S-phase cells across all clones at time _t_ _N__g_,_t_: Total

number of G1/2-phase cells across all clones at time _t_ _N__t_: Total number of cells in a population at time _t_ (all clones, all phases) We can then define the fractions of S- and

G1/2-phase cells assigned to clone _c_ at time _t_ (_f__s_,_c_,_t_, _f__g_,_c_,_t_): $${f}_{s,c,t}=\frac{{N}_{s,c,t}}{{N}_{s,t}},$$ (25) $${f}_{g,c,t}=\frac{{N}_{g,c,t}}{{N}_{g,t}}.$$ (26)

Using the fraction of G1/2-cells belonging to clone _c_, we can compute the expected total number of cells in clone _c_ and time _t_ across all cell cycle phases, $$E({N}_{c,t})={f}_{g,c,t}

* {N}_{t}.$$ (27) We produce a p-value for enrichment of S-phase cells using a hypergeometric test scipy.stats.hypergeom(M = _N__t_, n = _N__s_,_t_, N = _E_(_N__c_,_t_)).sf(_N__s_,_c_,_t_).

To produce a p-value for S-phase depletion we subtract this enrichment p-value from 1. All p-values are Bonferroni-corrected by dividing by the total number of statistical tests. p-adjusted

thresholds of 10−2 are used for saying a clone is significantly enriched or depleted for S-phase cells within a given library. The enriched (_p_+,_c_,_t_) and depleted (_p_−,_c_,_t_)

p-values from this hypergeometric test are then compared to one another to produce a single continuous SPE score (_ξ__c_,_t_). Positive values indicate enrichment for S-phase cells and

negative values indicate depletion of S-phase cells, $${\xi }_{c,t}= {\log }_{10}({p}_{-,c,t})-{\log }_{10}({p}_{+,c,t}).$$ (28) CLONE EXPANSION SCORES For time-series scWGS experiments, we

computed clone expansion scores for each clone _c_ at time _t_ (_S__c_,_t_) by examining the fraction of G1/2-phase cells belonging to clone _c_ at timepoint _t_ (_f__g_,_c_,_t_) and the

subsequent timepoint (_f__g_,_c_,_t_+1). Positive values indicate the clone expands by the next timepoint, $${S}_{c,t}={f}_{g,c,t+1}-{f}_{g,c,t}.$$ (29) We use the same equation when

computing the overall fitness of clones across multiple timepoints but instead treat the first timepoint as _t_ and the final timepoint as _t_ + 1. COMPARING SPE TO EXPANSION IN TREATED VS

UNTREATED DATA To test that treated clones had a significant difference in their relationship between SPE scores (_ξ__c_,_t_) and expansion scores (_S__c_,_t_) in treated (_T_) vs untreated

(_U_) data, we first fit a linear regression curve to the untreated data, $$S_{c,t}^{U} \sim f({\xi }_{c,t}^{U},\hat{\beta}^{U})=\hat{\beta_{0}^{U}}+\hat{\beta_{1}^{U}} * \xi_{c,t}^{U}.$$

(30) We then computed the residuals between the treated data and this line of best fit, $${S}_{c,t}^{U-T}=(\hat{{\beta }_{0}^{U}}+\hat{{\beta }_{1}^{U}} * {\xi }_{c,t}^{T})-{S}_{c,t}^{T}.$$

(31) We then computed a second linear regression curve to the residuals ${S}_{c,t}^{U-T} \sim f({\xi }_{c,t}^{T},\hat{\beta}^{U-T})$ and computed the p-value for a hypothesis test whose

null hypothesis is that the slope is zero, using Wald Test with t-distribution of the test statistic. Having a _p_ < 0.05 indicated that the slope of the treated and untreated lines are

significantly different. All clone × timepoint combinations with <10 G1/2-phase cells were excluded from such analysis. For each point, both _t_ and _t_ + 1 are either untreated or

on-treatment, thereby excluding points where _t_ is untreated and _t_ + 1 is treated. CELL CYCLE ANALYSIS OF SCRNA DATA When available, we validated PERT cell cycle distributions using the

cell cycle distributions estimated through scRNA sequencing. We determined the cell cycle phase of each scRNA cell using the Seurat CellCycleScoring() function73 which uses a set of S- and

G2M-phase markers derived from Tirosh et al.74. STATISTICAL TESTS When boxplots are presented in the figures, hinges represent the 25% and 75% quantiles and whiskers represent the ±1.5x

interquartile range. Statistical significance is tested using two-sided independent t-tests from scipy.stats unless otherwise noted. Bonferroni correction is implemented for all statistical

tests to limit false discovery. The number of stars is a shorthand for the adjusted p-value of a given statistical test (<10−4: ****, <10−3: ***, <10−2: **, <0.05: *, ≥0.05:

ns). Shaded areas surrounding linear regression lines of best fit represent 95% confidence intervals obtained via boostrapping (n=1000 boostrap resamples). Unless otherwise noted, linear

regressions are annotated with Pearson correlation coefficients (_r_) and the p-value for a hypothesis test whose null hypothesis is that the slope is zero, using the two-sided Wald Test

with t-distribution of the test statistic. REPORTING SUMMARY Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. DATA

AVAILABILITY Source and supplementary data from this study are provided at the following zenodo page 12786373. Raw and pre-processed data from Funnell et al.61 can be found on zenodo page

6998936. Raw scWGS data from Laks et al. and Salehi et al.32,50 are available from the European Genome-Phenome under study IDs EGAS00001004448 and EGAS00001003190, respectively. Publicly

accessible and controlled access data generated and analyzed from MSK SPECTRUM HGSOC patients are documented in Synapse under SynID syn25569736. Raw DNA sequencing data is often subject to

restrictions in order to protect patient privacy. Specific restrictions and instructions to obtain access can be found on the data repository websites for each study listed above. All raw

RNA sequencing data and processed data is available for unrestricted use. CODE AVAILABILITY The following code repositories are publicly available and contain tutorials for installation and

use. Package containing PERT model and other tools for scRT analysis90: https://github.com/shahcompbio/scdna_replication_tools (v1.0.0). DLP+ single-cell whole genome sequencing pipeline:

https://github.com/shahcompbio/single_cell_pipeline. Data preprocessing, analysis scripts, and figure generation: https://github.com/shahcompbio/scdna_replication_paper REFERENCES *

Williams, G. H. & Stoeber, K. The cell cycle and cancer. _J. Pathol._ 226, 352–64 (2012). Article CAS PubMed Google Scholar * Gaillard, H., García-Muse, T. & Aguilera, A.

Replication stress and cancer. _Nat. Rev. Cancer_ 15, 276–289 (2015). Article CAS PubMed Google Scholar * Rivera-Mulia, J. C. et al. Allele-specific control of replication timing and

genome organization during development. _Genome Res_ 28, 800–811 (2018). Article CAS PubMed PubMed Central Google Scholar * Garribba, L. et al. Short-term molecular consequences of

chromosome mis-segregation for genome stability. _Nat. Commun._ 14, 1353 (2023). Article ADS CAS PubMed PubMed Central Google Scholar * Shaikh, N. et al. Replication stress generates

distinctive landscapes of DNA copy number alterations and chromosome scale losses. _Genome Biol._ 23, 223 (2022). Article CAS PubMed PubMed Central Google Scholar * Burrell, R. A. et

al. Replication stress links structural and numerical cancer chromosomal instability. _Nature_ 494, 492–496 (2013). Article ADS CAS PubMed PubMed Central Google Scholar * Nowell, P. C.

The clonal evolution of tumor cell populations. _Science_ 194, 23–8 (1976). Article ADS CAS PubMed Google Scholar * Greaves, M. & Maley, C. C. Clonal evolution in cancer. _Nature_

481, 306–13 (2012). Article ADS CAS PubMed PubMed Central Google Scholar * Shah, S. P. et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers.

_Nature_ 486, 395–399 (2012). Article ADS CAS PubMed Google Scholar * Hansen, R. S. et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing.

_Proc. Natl Acad. Sci. USA_ 107, 139–44 (2010). Article ADS CAS PubMed Google Scholar * Guilbaud, G. et al. Evidence for sequential and increasing activation of replication origins

along replication timing gradients in the human genome. _PLoS Comput Biol._ 7, e1002322 (2011). Article CAS PubMed PubMed Central Google Scholar * De, S. & Michor, F. DNA

replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. _Nat. Biotechnol._ 29, 1103–1108 (2011). Article CAS PubMed PubMed Central Google

Scholar * Rivera-Mulia, J. C. et al. Dynamic changes in replication timing and gene expression during lineage specification of human pluripotent stem cells. _Genome Res._ 25, 1091–1103

(2015). Article CAS PubMed PubMed Central Google Scholar * Rivera-Mulia, J. C. et al. Replication timing networks reveal a link between transcription regulatory circuits and replication

timing control. _Genome Res_ 29, 1415–1428 (2019). Article CAS PubMed PubMed Central Google Scholar * Vouzas, A. E. & Gilbert, D. M. Mammalian DNA replication timing. _Cold Spring

Harbor Perspectives in Biol._ 13, http://cshperspectives.cshlp.org/content/13/7/a040162.abstract. (2021). * Donley, N. & Thayer, M. J. DNA replication timing, genome stability and

cancer: late and/or delayed DNA replication timing is associated with increased genomic instability. _Semin Cancer Biol._ 23, 80–9 (2013). Article CAS PubMed PubMed Central Google

Scholar * Klein, K. N. et al. Replication timing maintains the global epigenetic state in human cells. _Science_ 372, 371–378 (2021). Article ADS CAS PubMed PubMed Central Google

Scholar * Salvadores, M. & Supek, F. Cell cycle gene alterations associate with a redistribution of mutation risk across chromosomal domains in human cancers. _Nat. Cancer_ 5, 330–346

(2024). Article CAS PubMed Google Scholar * Spielmann, M., Lupiáñez, D. G. & Mundlos, S. Structural variation in the 3D genome. _Nat. Rev. Genet._ 19, 453–467 (2018). Article CAS

PubMed Google Scholar * Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. _Nature_ 486, 346–52 (2012). Article CAS PubMed

PubMed Central Google Scholar * Sheltzer, J. M., Torres, E. M., Dunham, M. J. & Amon, A. Transcriptional consequences of aneuploidy. _Proc. Natl Acad. Sci. USA_ 109, 12644–9 (2012).

Article ADS CAS PubMed PubMed Central Google Scholar * Ryba, T. et al. Abnormal developmental control of replication-timing domains in pediatric acute lymphoblastic leukemia. _Genome

Res_ 22, 1833–44 (2012). Article CAS PubMed PubMed Central Google Scholar * The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. _Nature_ 489,

57–74 (2012). * Hulke, M. L., Massey, D. J. & Koren, A. Genomic methods for measuring DNA replication dynamics. _Chromosome Res._ 28, 49–67 (2020). Article CAS PubMed Google Scholar

* Dileep, V. & Gilbert, D. M. Single-cell replication profiling to measure stochastic variation in mammalian replication timing. _Nat. Commun._ 9, 427 (2018). Article ADS PubMed

PubMed Central Google Scholar * Takahashi, S. et al. Genome-wide stability of the DNA replication program in single mammalian cells. _Nat. Genet._ 51, 529–540 (2019). Article CAS PubMed

Google Scholar * Massey, D. J. & Koren, A. High-throughput analysis of single human cells reveals the complex nature of DNA replication timing control. _Nat. Commun._ 13, 2402 (2022).

Article ADS CAS PubMed PubMed Central Google Scholar * Gnan, S. et al. Kronos scrt: a uniform framework for single-cell replication timing analysis. _Nat. Commun._ 13, 2329 (2022).

Article ADS CAS PubMed PubMed Central Google Scholar * Du, Q. et al. Replication timing and epigenome remodelling are associated with the nature of chromosomal rearrangements in

cancer. _Nat. Commun._ 10, 416 (2019). Article ADS CAS PubMed PubMed Central Google Scholar * Sarni, D. et al. 3D genome organization contributes to genome instability at fragile

sites. _Nat. Commun._ 11, 3613 (2020). Article ADS CAS PubMed PubMed Central Google Scholar * Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. _Nature_

578, 112–121 (2020). Article ADS CAS PubMed PubMed Central Google Scholar * Salehi, S. et al. Clonal fitness inferred from time-series modelling of single-cell cancer genomes. _Nature_

595, 585–590 (2021). Article ADS CAS PubMed PubMed Central Google Scholar * Khan, K. H. et al. Longitudinal liquid biopsy and mathematical modeling of clonal evolution forecast time

to treatment failure in the prospect-c phase ii colorectal cancer clinical trial. _Cancer Discov._ 8, 1270–1285 (2018). Article CAS PubMed PubMed Central Google Scholar * Tataru, P.,

Simonsen, M., Bataillon, T. & Hobolth, A. Statistical inference in the wright-fisher model using allele frequency data. _Syst. Biol._ 66, e30–e46 (2017). PubMed Google Scholar *

Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. _Cell_ 173, 371–385.e18 (2018). Article PubMed PubMed Central Google Scholar * Gerstung, M. et

al. The evolutionary history of 2,658 cancers. _Nature_ 578, 122–128 (2020). Article ADS CAS PubMed PubMed Central Google Scholar * Sunshine, A. B. et al. The fitness consequences of

aneuploidy are driven by condition-dependent gene effects. _PLoS Biol._ 13, e1002155 (2015). Article PubMed PubMed Central Google Scholar * Ben-David, U. & Amon, A. Context is

everything: aneuploidy in cancer. _Nat. Rev. Genet_ 21, 44–62 (2020). Article CAS PubMed Google Scholar * Vasudevan, A. et al. Single-chromosomal gains can function as metastasis

suppressors and promoters in colon cancer. _Developmental Cell_ 52, 413–428.e6 (2020). Article PubMed PubMed Central Google Scholar * Martincorena, I. et al. Universal patterns of

selection in cancer and somatic tissues. _Cell_ 171, 1029–1041.e21 (2017). Article PubMed PubMed Central Google Scholar * Vázquez-García, I. et al. Ovarian cancer mutational processes

drive site-specific immune evasion. _Nature_ 612, 778–786 (2022). Article ADS PubMed PubMed Central Google Scholar * Tachibana, K. E., Gonzalez, M. A. & Coleman, N.

Cell-cycle-dependent regulation of DNA replication and its relevance to cancer pathology. _J. Pathol._ 205, 123–9 (2005). Article CAS PubMed Google Scholar * Yerushalmi, R., Woods, R.,

Ravdin, P. M., Hayes, M. M. & Gelmon, K. A. Ki67 in breast cancer: prognostic and predictive potential. _Lancet Oncol._ 11, 174–83 (2010). Article CAS PubMed Google Scholar *

Bogunovic, D. et al. Immune profile and mitotic index of metastatic melanoma lesions enhance clinical staging in predicting patient survival. _Proc. Natl Acad. Sci. USA_ 106, 20429 (2009).

Article ADS CAS PubMed PubMed Central Google Scholar * Ibrahim, A., Lashen, A., Toss, M., Mihai, R. & Rakha, E. Assessment of mitotic activity in breast cancer: revisited in the

digital pathology era. _J. Clin. Pathol._ 75, 365 (2022). Article CAS PubMed Google Scholar * van Diest, P. J., Brugal, G. & Baak, J. P. Proliferation markers in tumours:

interpretation and clinical value. _J. Clin. Pathol._ 51, 716–24 (1998). Article PubMed PubMed Central Google Scholar * Solius, G. M., Maltsev, D. I., Belousov, V. V. & Podgorny, O.

V. Recent advances in nucleotide analogue-based techniques for tracking dividing stem cells: An overview. _J. Biol. Chem._ 297, 101345 (2021). Article CAS PubMed PubMed Central Google

Scholar * Taylor, J. H., Woods, P. S. & Hughes, W. L. The organization and duplication of chromosomes as revealed by autoradiographic studies using tritium-labeled thymidinee. _Proc.

Natl Acad. Sci. USA_ 43, 122–8 (1957). Article ADS CAS PubMed PubMed Central Google Scholar * Zahn, H. et al. Scalable whole-genome single-cell library preparation without

preamplification. _Nat. Methods_ 14, 167–173 (2017). Article CAS PubMed Google Scholar * Laks, E. et al. Clonal decomposition and DNA replication states defined by scaled single-cell

genome sequencing. _Cell_ 179, 1207–1221.e22 (2019). Article PubMed PubMed Central Google Scholar * Andor, N. et al. Joint single cell DNA-seq and RNA-seq of gastric cancer cell lines

reveals rules of in vitro evolution. _NAR Genom. Bioinform_ 2, lqaa016 (2020). Article PubMed PubMed Central Google Scholar * Velazquez-Villarreal, E. I. et al. Single-cell sequencing of

genomic DNA resolves sub-clonal heterogeneity in a melanoma cell line. _Commun. Biol._ 3, 318 (2020). Article CAS PubMed PubMed Central Google Scholar * Navin, N. et al. Tumour

evolution inferred by single-cell sequencing. _Nature_ 472, 90–4 (2011). Article ADS CAS PubMed PubMed Central Google Scholar * Minussi, D. C. et al. Breast tumours maintain a

reservoir of subclonal diversity during expansion. _Nature_ 592, 302–308 (2021). Article ADS CAS PubMed PubMed Central Google Scholar * Schneider, M. P. et al. scAbsolute: measuring

single-cell ploidy and replication status. _Genome Biol._ 25, 62 (2024). Article CAS PubMed PubMed Central Google Scholar * Thurman, R. E., Day, N., Noble, W. S. &

Stamatoyannopoulos, J. A. Identification of higher-order functional domains in the human encode regions. _Genome Res_ 17, 917–27 (2007). Article CAS PubMed PubMed Central Google Scholar

* Edwards, M. M. et al. Delayed dna replication in haploid human embryonic stem cells. _Genome Res_ 31, 2155–2169 (2021). Article PubMed PubMed Central Google Scholar * Bingham, E. et

al. Pyro: deep universal probabilistic programming. _J. Mach. Learn. Res._ 20, 973–978 (2019). Google Scholar * Gindin, Y., Meltzer, P. S. & Bilke, S. Replicon: a software to accurately

predict DNA replication timing in metazoan cells. _Front Genet_ 5, 378 (2014). Article PubMed PubMed Central Google Scholar * Grolmusz, V. K. et al. Fluorescence activated cell sorting

followed by small RNA sequencing reveals stable microRNA expression during cell cycle progression. _BMC Genomics_ 17, 412 (2016). Article PubMed PubMed Central Google Scholar * Funnell,

T. et al. Single-cell genomic variation induced by mutational processes in cancer. _Nature_ 612, 106–115 (2022). Article ADS CAS PubMed PubMed Central Google Scholar * Schukken, K. M.

& Sheltzer, J. M. Extensive protein dosage compensation in aneuploid human cancers. _Genome Res_ 32, 1254–1270 (2022). Article PubMed PubMed Central Google Scholar * Stingele, S. et

al. Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. _Mol. Syst. Biol._ 8, 608 (2012). Article PubMed PubMed Central Google Scholar

* Torres, E. M. et al. Effects of aneuploidy on cellular physiology and cell division in haploid yeast. _Science_ 317, 916–24 (2007). Article ADS CAS PubMed Google Scholar *

Bhattacharya, A. et al. Transcriptional effects of copy number alterations in a large set of human cancers. _Nat. Commun._ 11, 715 (2020). Article ADS CAS PubMed PubMed Central Google

Scholar * Funnell, T. et al. Integrated structural variation and point mutation signatures in cancer genomes using correlated topic models. _PLOS Computational Biol._ 15, 1–24 (2019).

Article Google Scholar * Feng, W. & Jasin, M. BRCA2 suppresses replication stress-induced mitotic and G1 abnormalities through homologous recombination. _Nat. Commun._ 8, 525 (2017).

Article ADS PubMed PubMed Central Google Scholar * Lambuta, R. A. et al. Whole-genome doubling drives oncogenic loss of chromatin segregation. _Nature_ 615, 925–933 (2023). Article ADS

CAS PubMed PubMed Central Google Scholar * Gemble, S. et al. Genetic instability from a single S phase after whole-genome duplication. _Nature_ 604, 146–151 (2022). Article ADS CAS

PubMed PubMed Central Google Scholar * Willis, N. A. et al. Mechanism of tandem duplication formation in BRCA1-mutant cells. _Nature_ 551, 590–595 (2017). Article ADS CAS PubMed

PubMed Central Google Scholar * Pageau, G. J., Hall, L. L., Ganesan, S., Livingston, D. M. & Lawrence, J. B. The disappearing Barr body in breast and ovarian cancers. _Nat. Rev.

Cancer_ 7, 628–633 (2007). Article CAS PubMed Google Scholar * Chaligné, R. et al. The inactive X chromosome is epigenetically unstable and transcriptionally labile in breast cancer.

_Genome Res._ 25, 488–503 (2015). Article PubMed PubMed Central Google Scholar * Hao, Y. et al. Integrated analysis of multimodal single-cell data. _Cell_ 184, 3573–3587.e29 (2021).

Article PubMed PubMed Central Google Scholar * Kowalczyk, M. S. et al. Single-cell rna-seq reveals changes in cell cycle and differentiation programs upon aging of hematopoietic stem

cells. _Genome Res_ 25, 1860–72 (2015). Article CAS PubMed PubMed Central Google Scholar * Shi, H. et al. Allele-specific transcriptional effects of subclonal copy number alterations

enable genotype-phenotype mapping in cancer cells. _Nat. Commun._ 15, 2482 (2024). Article ADS CAS PubMed PubMed Central Google Scholar * Donaldson, K. L., Goolsby, G. L. & Wahl,

A. F. Cytotoxicity of the anticancer agents cisplatin and taxol during cell proliferation and the cell cycle. _Int. J. Cancer_ 57, 847–855 (1994). Article CAS PubMed Google Scholar *

Bielski, C. M. et al. Genome doubling shapes the evolution and prognosis of advanced cancers. _Nat. Genet._ 50, 1189–1195 (2018). Article CAS PubMed PubMed Central Google Scholar *

McPherson, A. et al. Ongoing genome doubling promotes evolvability and immune dysregulation in ovarian cancer. _bioRxiv_

http://biorxiv.org/content/early/2024/07/15/2024.07.11.602772.abstract (2024). * Bertolin, A. P., Hoffmann, J. S. & Gottifredi, V. Under-replicated DNA: The byproduct of large

genomes?_Cancers (Basel)_12 (2020). * Moreno, A. et al. Unreplicated DNA remaining from unperturbed S phases passes through mitosis for resolution in daughter cells. _Proc. Natl Acad. Sci.

USA_ 113, E5757–64 (2016). Article CAS PubMed PubMed Central Google Scholar * Matthews, H. K., Bertoli, C. & de Bruin, R. A. M. Cell cycle control in cancer. _Nat. Rev. Mol. Cell

Biol._ 23, 74–88 (2022). Article CAS PubMed Google Scholar * Achom, M. et al. A genetic basis for cancer sex differences revealed in Xp11 translocation renal cell carcinoma. _bioRxiv_

https://www.biorxiv.org/content/early/2023/08/06/2023.08.04.552029 (2023). * López, S. et al. Interplay between whole-genome doubling and the accumulation of deleterious alterations in

cancer evolution. _Nat. Genet._ 52, 283–293 (2020). Article PubMed PubMed Central Google Scholar * Ha, G. et al. Integrative analysis of genome-wide loss of heterozygosity and

monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer. _Genome Res_ 22, 1995–2007 (2012). Article CAS PubMed PubMed Central Google

Scholar * Salehi, S. et al. Cancer phylogenetic tree inference at scale from 1000s of single cell genomes. _Peer Community J._ 3,

https://peercommunityjournal.org/articles/10.24072/pcjournal.292/ (2023). * Kaufmann, T. L. et al. MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution. _Genome

Biol._ 23, 241 (2022). Article CAS PubMed PubMed Central Google Scholar * Ranganath, R., Gerrish, S. & Blei, D.Kaski, S. & Corander, J. (eds) Black Box Variational Inference.

(eds Kaski, S. & Corander, J.) _Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics_, Vol. 33 of _Proceedings of Machine Learning Research_,

814–822 (PMLR, Reykjavik, Iceland, 2014). https://proceedings.mlr.press/v33/ranganath14.html. * Medina-Martínez, J. S. et al. Isabl platform, a digital biobank for processing multimodal

patient data. _BMC Bioinforma._ 21, 549 (2020). Article Google Scholar * Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. _Nat.

Methods_ 9, 179–181 (2012). Article CAS Google Scholar * Weiner, A. & McPherson, A. Inferring replication timing and proliferation dynamics from single-cell dna sequencing data.

_Github_ shahcompbio/scdna_replication_tools https://doi.org/10.5281/zenodo.13126319 (2024). Download references ACKNOWLEDGEMENTS This project was generously supported by the Cycle for

Survival, by the Marie-Josée and Henry R. Kravis Center for Molecular Oncology and the NCI Cancer Center Core grant P30-CA008748 supporting Memorial Sloan Kettering Cancer Center. S.P.S.

holds the Nicholls Biondi Chair in Computational Oncology and is a Susan G. Komen Scholar (#GC233085). This work was also funded in part by awards to S.P.S.: the Cancer Research UK Grand

Challenge Program (GC-243330), and an NIH RM1 award (RM1-HG011014). A.C.W. is supported by NCI Ruth L. Kirschstein National Research Service Award for Predoctoral Fellows F31-CA271673.

M.J.W. is supported by NCI Pathway to Independence award K99-CA256508. S.S. is supported by NCI Pathway to Independence award K99-CA277562 I.V.-G. is supported by a Mentored Investigator

Award from the Ovarian Cancer Research Alliance (650687). AUTHOR INFORMATION Author notes * These authors contributed equally: Sohrab P. Shah, Andrew McPherson. AUTHORS AND AFFILIATIONS *

Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA Adam C. Weiner, Marc J. Williams, Hongyu Shi, Ignacio

Vázquez-García, Sohrab Salehi, Nicole Rusk, Sohrab P. Shah & Andrew McPherson * Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY,

USA Adam C. Weiner * Gerstner Sloan Kettering Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY, USA Hongyu Shi * Department of Molecular

Oncology, British Columbia Cancer, Vancouver, BC, Canada Samuel Aparicio * Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada Samuel

Aparicio Authors * Adam C. Weiner View author publications You can also search for this author inPubMed Google Scholar * Marc J. Williams View author publications You can also search for

this author inPubMed Google Scholar * Hongyu Shi View author publications You can also search for this author inPubMed Google Scholar * Ignacio Vázquez-García View author publications You

can also search for this author inPubMed Google Scholar * Sohrab Salehi View author publications You can also search for this author inPubMed Google Scholar * Nicole Rusk View author

publications You can also search for this author inPubMed Google Scholar * Samuel Aparicio View author publications You can also search for this author inPubMed Google Scholar * Sohrab P.

Shah View author publications You can also search for this author inPubMed Google Scholar * Andrew McPherson View author publications You can also search for this author inPubMed Google

Scholar CONTRIBUTIONS A.C.W., S.P.S., and A.M. conceived the study and wrote the manuscript. A.C.W. developed the machine learning model and led computational analysis. M.J.W., H.S., I.V.G.,

and S.S. assisted with data acquisition, analysis, or interpretation. N.R. assisted with editing the manuscript. S.A. preserved the original data on which the paper is based and minimized

obstacles to sharing materials. S.P.S. and A.M. jointly supervised this work. CORRESPONDING AUTHORS Correspondence to Sohrab P. Shah or Andrew McPherson. ETHICS DECLARATIONS COMPETING

INTERESTS S.P.S. and S.A. are shareholders of Imagia Canexia Health Inc. S.P.S. has an advisory role to AstraZeneca Inc. H.S. is currently employed by Benchling. The remaining authors

declare no competing interests. All relationships are outside the scope of this work. PEER REVIEW PEER REVIEW INFORMATION _Nature Communications_ thanks Kamila Naxerova and the other,

anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with

regard to jurisdictional claims in published maps and institutional affiliations. SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION PEER REVIEW FILE DESCRIPTION OF ADDITIONAL SUPPLEMENTARY

FILES SUPPLEMENTARY DATA 1 SUPPLEMENTARY DATA 2 REPORTING SUMMARY RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons

Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give

appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission

under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons

licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by

statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

http://creativecommons.org/licenses/by-nc-nd/4.0/. Reprints and permissions ABOUT THIS ARTICLE CITE THIS ARTICLE Weiner, A.C., Williams, M.J., Shi, H. _et al._ Inferring replication timing

and proliferation dynamics from single-cell DNA sequencing data. _Nat Commun_ 15, 8512 (2024). https://doi.org/10.1038/s41467-024-52544-7 Download citation * Received: 29 January 2024 *

Accepted: 11 September 2024 * Published: 01 October 2024 * DOI: https://doi.org/10.1038/s41467-024-52544-7 SHARE THIS ARTICLE Anyone you share the following link with will be able to read

this content: Get shareable link Sorry, a shareable link is not currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt content-sharing initiative