Inferring replication timing and proliferation dynamics from single-cell dna sequencing data

Inferring replication timing and proliferation dynamics from single-cell dna sequencing data

Play all audios:

Loading...

ABSTRACT Dysregulated DNA replication is a cause and a consequence of aneuploidy in cancer, yet the interplay between copy number alterations (CNAs), replication timing (RT) and cell cycle


dynamics remain understudied in aneuploid tumors. We developed a probabilistic method, PERT, for simultaneous inference of cell-specific replication and copy number states from single-cell


whole genome sequencing (scWGS) data. We used PERT to investigate clone-specific RT and proliferation dynamics in  >50,000 cells obtained from aneuploid and clonally heterogeneous cell


lines, xenografts and primary cancers. We observed bidirectional relationships between RT and CNAs, with CNAs affecting X-inactivation producing the largest RT shifts. Additionally, we found


that clone-specific S-phase enrichment positively correlated with ground-truth proliferation rates in genomically stable but not unstable cells. Together, these results demonstrate robust


computational identification of S-phase cells from scWGS data, and highlight the importance of RT and cell cycle properties in studying the genomic evolution of aneuploid tumors. SIMILAR


CONTENT BEING VIEWED BY OTHERS UNRAVELLING SINGLE-CELL DNA REPLICATION TIMING DYNAMICS USING MACHINE LEARNING REVEALS HETEROGENEITY IN CANCER PROGRESSION Article Open access 08 February 2025


INFERENCE OF CHROMOSOME SELECTION PARAMETERS AND MISSEGREGATION RATE IN CANCER FROM DNA-SEQUENCING DATA Article Open access 31 July 2024 REPLICATION TIMING ALTERATIONS ARE ASSOCIATED WITH


MUTATION ACQUISITION DURING BREAST AND LUNG CANCER EVOLUTION Article Open access 18 July 2024 INTRODUCTION DNA replication and cell cycle regulation are frequently disrupted as part of a


cancer’s progression toward uncontrolled proliferation1,2,3. The resulting dysregulation creates a positive feedback loop of increasing replication stress and genomic instability4,5,6,


generating new somatic copy number alterations (CNAs) which drives intratumoral heterogeneity and subsequent clonal evolution7,8,9. One key component of DNA replication is the relative


timing at which different genomic regions replicate during synthesis (S)-phase of the cell cycle, known as replication timing (RT). Genome-wide RT profiles are visible at resolutions ranging


from 10kb to 10MB and have well-defined associations with chromatin organization, mutation rate, and transcription, and cellular phenotype3,10,11,12,13,14,15,16,17,18. Structural variations


and CNAs have been shown to impact transcription and epigenetic state19,20,21 but little is known about their associations with RT as technical limitations in isolating S-phase cells have


prevented measurement of both bulk and single-cell RT in primary tumors with subclonal CNAs3,13,14,17,22,23,24,25,26,27,28. This limitation has forced previous studies of RT to compare CNAs


from tumor samples against RT profiles from unmatched cell lines, often of different tissue types12,29,30,31, or RT profiles predicted from matched chromatin accessibility data18. Therefore,


there remain open questions about the bi-directional interplay between ongoing genomic instability and RT dynamics. Outside of studying RT, quantifying S-phase cells in tumors can be used


to better understand proliferation and clonal fitness dynamics. The most accurate way to measure clonal fitness is through fitting population genetics models to time-series data32,33,34;


however, such analysis may be challenging for clinical samples, where often only a single time point surgical resection is available for study. Our current toolkit for forecasting clonal


evolution is therefore limited to comparing different oncogenic mutations between clones. Indeed, there are many known driver and passenger mutations that confer fitness advantages35,36, but


it is challenging to determine how individual events interact with one another when they co-occur in the same cell and with the tumor microenvironment to determine clonal


fitness37,38,39,40,41. Forecasting clonal fitness via quantifying the proportion of replicating cells per clone offers hope to address these gaps. It is well-known that the quiescent


(G0)-phase and growth (G1 and G2)-phases of the cell cycle are more variable in duration relative to mitosis (M)-phase and S-phase1,42. This phenomenon has enabled pathologists to use S- and


M-phase fractions, along with other markers of proliferation, for tumor staging and grading43,44,45,46. It remains an open question as to whether quantifying S-phase cells from sequencing


data has prognostic utility or can be used to forecast clonal evolution in genomically unstable cancers. A fundamental limitation to studying RT and cell cycle dynamics is the inability to


isolate S-phase cells from solid tumors with subclonal CNAs. Experimental sorting of S-phase cells often relies on feeding cells bromodeoxyuridine (BrdU), a thymidine analog that gets


incorporated during S-phase, followed by fluorescence-activated cell sorting (FACS)47,48. This approach works well for dissociated cell lines and blood cancers which rapidly uptake BrdU but


does not work for solid tumors since their dissociation would disrupt the replication dynamics we care to study and administering BrdU in vivo is toxic46. Another FACS approach is to stain


for DNA content as S-phase cells have intermediate amounts of DNA relative to G1- and G2-phase cells10,46. Staining DNA does not require dissociation but sorting based on DNA content fails


for genomically heterogeneous tumors as subclonal CNAs produce variation in total DNA content per cell. These limitations have constrained all previous studies of RT in cancer to leukemia


and cell lines with no subclonal CNAs22,23,27,28. Finding better ways to isolate S-phase cells would enable the measurement of replication dynamics in genomically unstable solid tumors.


Unsorted single-cell whole genome sequencing (scWGS) is a powerful method for studying clonal heterogeneity and CNAs, and can potentially overcome the technical limitations of experimental


sorting to provide unbiased insight into DNA replication dynamics in aneuploid populations49,50,51,52,53,54. Such potential hinges upon computational identification of S-phase cells and


downstream calculation of their single-cell replication timing (scRT) profiles. These steps are challenging due to difficulty in distinguishing inherited somatic CNAs from transient DNA


replication changes. One property that helps to distinguish these two signals is that RT is often conserved within cell types. Existing scWGS S-phase classifiers50,55 attempt to leverage RT


conservation by using the correlation between ENCODE RT profiles10,23,56 and cell read count profiles as important features in random forest classifiers. However, such a classification


approach is heavily biased in favor of finding S-phase cells that follow the RT profiles used to train the classifier. Furthermore, the performance of a pre-trained classifier depends on the


extent to which sequencing and sample properties, especially GC content bias and the proportion of CNAs in early vs late replicating regions, match those of the training data. Even with


successful isolation of S-phase cells, none of the existing methods for estimating scRT from sorted scWGS data25,26,27,28 would work for samples with subclonal CNAs as they divide each


S-phase cell’s read depth profile by the average read depth profile across all G1-phase cells to determine which genomic loci should be replicated vs unreplicated in S-phase cells. We posit


that improved approaches for disentangling copy number (CN) from replication signals in scWGS data would unlock the ability to study replication dynamics of individual genetic subclones,


leading to better understanding of how DNA replication drives and is further modulated by genomic instability. We present a statistical method, P robabilistic E stimation of single-cell R


eplication T iming (PERT), to infer S-phase cells and their scRT profiles from unsorted scWGS data. PERT jointly models RT and CN at a subclonal level which critically enables for high


accuracy when analyzing samples with previously unseen RT and CN profiles. After extensive benchmarking, we used PERT to extract previously unquantifiable scRT profiles and cell cycle


distributions from a collection of genomically unstable cell lines, high-grade serous ovarian cancer (HGSOC), and triple negative breast cancer (TNBC) human tumors totaling  >50,000 scWGS


cells. First, we investigated the relationship between ancestral RT and the emergence of CNAs, confirming findings from unmatched cell types12 that gains and losses preferentially emerge


from different loci. Second, we modeled the relative impact of cell type, mutational signature, and ploidy on RT and the distribution of early vs late S-phase cells to illustrate that cell


type is the most significant determinant of RT and that replication stress can lead to an accumulation of cells near the S/G2 boundary. Third, we leverage known biology involving late


replication of the inactive chromosome X allele (Xi)26,57 to identify recurrent patterns of Xi loss and reactivation in TNBC and HGSOC tumors. Finally, we investigated the relationship


between cell cycle distributions and clonal fitness. In particular, we used time-series sampling of cell lines and xenografts, along with multi-site sampling of a primary HGSOC tumor, to


define the contexts in which S-phase enrichment (SPE) scores accurately approximate fitness, observing that high SPE associates with greater cisplatin sensitivity and a depletion of S-phase


cells in highly fit whole genome doubled clones. RESULTS ACCURATE ESTIMATION OF SINGLE-CELL REPLICATION TIMING AND CELL CYCLE PHASE WITH PERT The main objective of PERT is to infer transient


replication states and inherited somatic CN states from scWGS data. To do so, PERT implements a Bayesian probabilistic graphical model and a multi-step learning procedure to infer latent


variables. Observed read depth (_Z_) is modeled as dependent on both latent CN (_X_) and replication (_Y_) states across all genomic bins (_N_ cells  ×  _M_ loci) where the replication state


depends on each cell’s time within S-phase (_τ_), each locus’s average RT (_ρ_), and a replication stochasticity parameter (_α_) (Fig. 1a–f, S1). After learning replication states in all


bins, PERT then assigns cells to either S- or G1/2-phases based on the fraction of replicated loci per cell. The main assumptions of the model are that 1) S-phase cells within the same


sample should follow similar RT profiles with RT stochasticity controlled by a latent variable _α_, 2) cells belonging to the same clone should follow similar CN profiles, regardless of


phase, and 3) sequencing bias batch effects are library-specific but shared amongst both S- and G1/2-phase cells from the same library. Such modeling choices critically enable PERT to


maintain high accuracy when analyzing samples with previously unseen CN and RT profiles. PERT is implemented using Pyro58 and is freely available online with user tutorials. All terms in the


graphical model as well as additional mathematical, inference, and implementation details can be found in Fig. S1 and the Methods. QUANTITATIVE BENCHMARKING IN SIMULATED DATA We benchmarked


PERT’s accuracy at inferring somatic CN, replication states, and cell cycle phase through quantitative simulation experiments. We used PERT as a generative model to simulate low-coverage (1


million reads per cell, 0.0003x coverage depth) scWGS data with parameter sweeps over the number of clones, number of cells, the cell-specific CNA rate, read depth overdispersion (_λ_),


replication stochasticity (_α_), ENCODE RT profiles10,23,56, and GC bias coefficients (_β__μ_), (Table 1). We ran PERT with both CN prior settings, clone and composite (Methods), for all


simulated datasets and compared these two results to the Laks et al. classifier50 for cell cycle phase prediction and Kronos28 for scRT estimation. The performance gap between PERT and


Kronos was significant (_p__a__d__j_ < 10−4) for all parameter combinations and increased as a function of cell CNA rate, number of clones, and noise _λ_ (Fig. 1g, S2a–j). PERT’s


replication and CN accuracies decayed at high cell CNA rate and overdispersion (_λ_) values but remained robust to replication stochasticity (_α_), GC bias (_β__μ_), number of clones, and


number of cells (Fig. S2a–t). Comparing the two PERT CN priors, we found higher accuracies when the composite prior was used, particularly at high cell CNA rates, and thus only used the


composite CN prior throughout the rest of our analysis (Fig. S2a–t). The agreement between each cell’s true and inferred fraction of replicated bins enabled PERT to classify cell cycle phase


with 93% overall accuracy (97% accuracy excluding low quality, LQ, cells) when considering all cells with 5-95% replicated bins as S-phase (Fig. 1h). Accordingly, PERT significantly


outperformed (_p__a__d__j_ < 10−4) the Laks et al. classifier in phase accuracy for all simulated datasets and was critically robust to fluctuations in GC bias (_β__μ_,1), indicating that


PERT’s performance will not fluctuate between DLP+49,50, 10X Chromium scDNA51,52, and other scWGS modalities53,54 with unique GC bias relationships (Fig. S2u–ad). Additional benchmarking


information can be found in Supplementary Notes 1–2. These data indicate that PERT significantly improves inference of scRT and phase, particularly in cases where CNAs arise with subclonal


structure, allowing for exploration of replication dynamics in unsorted scWGS data of heterogeneous aneuploid populations. VALIDATION OF PERT IN SORTED DIPLOID AND ANEUPLOID CELL LINES Next


we sought to validate PERT’s accuracy at predicting cell cycle phase and RT in real, clonally heterogeneous scWGS data. To do so, we performed in silico mixing experiments of scWGS data of


two unrelated cell lines that were experimentally sorted into cell cycle phases according to cellular DNA content using FACS. We combined diploid lymphoblastoid GM18507 cells (657 G1 cells,


585 S cells, 337 G2 cells, 1 clone,  ~ 0.01-0.1x coverage depth) and aneuploid breast cancer T47D cells (703 G1 cells, 623 S cells, 522 G2 cells, 5 clones,  ~ 0.01-0.1x coverage depth)50


into one merged sample for analysis with PERT (Fig. S3a). Critically, these two cell lines have unique CN profiles and should have unique RT profiles. PERT found distinct CN profiles for


S-phase cells of each line and its predicted phases were highly concordant with FACS (Fig. 2a, b). Both GM18507 and T47D samples were enriched for mid-S-phase cells which was expected based


on the sorting protocol described in Laks et al.50 (Fig. S3b). Cell line ‘pseudobulk’ RT profiles showed that 15% (794/5258) of loci had an absolute RT difference >0.25 between GM18507


and T47D, consistent with each cell line having a unique RT program (Fig. 2c). To confirm that these differentially replicating loci were accurately inferred, we compared the PERT RT


profiles with publicly available DNase-seq predicted-RT (predRT) profiles of select ENCODE primary cell samples where RT is predicted from DNase-seq data using the Replicon software18,59. We


found cell type specific correlations between the inferred PERT RT profiles and predRT profiles of breast epithelium and B-cell ENCODE samples which most closely matched T47D and GM18507


cell lines, respectively (Fig. 2d), suggesting that PERT inferred accurate RT differences between the two cell lines in this mixture experiment. We next tested whether the accuracy of


clone-specific RT inference. To test this, we ran PERT independently for each cell line and compared the inferred RT profiles to their results in the previously described in silico mixing


experiment. Encouragingly, we found that both cell lines had RT Pearson correlations of ≥0.99 between their merged and split PERT runs (Fig. 2e, S3c). We performed a similar experiment with


simulated data to ensure that PERT infers accurate clone RT profiles in samples where multiple RT profiles are present. For this experiment, we simulated data in which each clone’s RT


profile matched a unique ENCODE cell line RT profile and compared the true clone RT profiles to their inferred values23. Again, we found true and inferred clone RT profiles all had Pearson


correlations ≥0.99 (Fig. S3d, e). These experiments demonstrate that PERT can accurately identify clone-specific RT profiles within the same sample, giving us confidence to interrogate


clone-specific RT profiles in unsorted tumor scWGS data. The final validation we performed with cell cycle sorted data was to test PERT’s robustness to suboptimal initialization of cell


cycle phase. Given that PERT's initialized G1/2-phase population is often constructed using a conservative subset of all G1/2-phase cells, we wanted to know whether inclusion of too


many true G1/2-phase cells in the initialized S-phase population would bias RT and phase estimates. We thus devised a permutation experiment to test this concern. We generated 23 permuted


datasets in which a subset of GM18507 and T47D FACS G1/2-phase cells were intentionally mislabeled as S-phase during initialization with permutation rates ranging from 1% to 75%. We found


that  >90% of all mislabeled cells were accurately recovered (predicted G1/2) across all permutation datasets without compromising identification of non-mislabeled S-phase cells and cell


line-specific RT profiles (Fig. 2f, g, S3f). Mislabeled cells which were erroneously predicted to be in S-phase were disproportionately G2 by FACS and 80–95% replicated by PERT. Orthogonal


per-cell features for these cells were concordant with late S-phase (Fig. 2f, S3g), suggesting they were actually in late S-phase but mistakenly sorted into G2-phase by FACS. We also found


many cell-specific CNAs in the set of FACS S-phase cells predicted to be G1/2-phase (Fig. 2h). Together, our results suggest that prediction of cell cycle phase using unsorted scWGS and PERT


is highly robust to poor initialization. Our results also highlight pitfalls of experimental sorting based on total DNA content, which is known to have S-phase purities ranging from


73–93%60, due to its inability to account for cellular CN variation. REPLICATION TIMING ASSOCIATES WITH CNA BIAS IN MATCHED SAMPLES After validation of PERT’s accuracy in heterogeneous


aneuploid populations, we next investigated how replication timing influences accrual of CNAs. Tumor CNA frequencies have been shown to vary between different RT zones12,29,30,31; however,


none of these associations between RT and other genomic covariates have come from matched samples as cell cycle sorting cannot be accurately performed in solid tumors with subclonal CNAs.


Thus, we sought to confirm previously reported relationships between CNAs and RT by applying PERT to previously published scWGS data of mammary epithelial 184-hTERT cell lines (hereafter


referred to as hTERTs)61. The hTERT samples were engineered using CRISPR/Cas9 to ablate _TP53_, _TP53_/_BRCA1_, and _TP53_/_BRCA2_, and passaged ~60 times with intermediate scWGS sampling to


capture accrual of aneuploidies (Fig. 3a). The initial investigation of this dataset revealed clonal expansions of cell populations with increasing levels of CNAs but excluded analysis of


S-phase cells61. Here, we analyzed all cells with PERT and inferred RT profiles for all clones and samples. We first compared the 500kb resolution RT profiles from these unsorted scWGS


libraries to ENCODE RepliSeq data, confirming that PERT RT profiles were of high-quality before proceeding with our analysis (Fig. 3b, S4, Supplementary Note 3). With high-quality RT and CN


profiles in-hand from this unsorted time-series scWGS data, we interrogated whether RT influences CNA acquisition. To test this, we computed a reference RT profile from the ancestral hTERT


WT (_TP53_ and _BRCA1/2_ WT) cells with no CNAs (SA039 clone A) and then counted the number of gains, losses, and CNA breakpoints in CN profiles that descended from this ancestral WT


population (Fig. 3c,d, S5a–c). We found that gains preferentially arose from early RT loci, losses from late RT loci, and CNA breakpoints from late RT loci (_p__a__d__j_ < 10−4) (Fig. 


3e,f, S5d,e). The association between RT and CNAs in matched samples significantly strengthens the hypotheses that these two phenomena are governed by the same underlying processes.


REPLICATION TIMING SHIFTS AT CNAS We then proceeded to ask the inverse question about whether CNAs produce shifts in RT. CNAs have been shown to undergo buffering as changes in DNA copy


number do not proportionally scale with changes in RNA and protein expression62,63,64,65. These studies have shown that while the majority of buffering happens when going from RNA to protein


due to post-transcriptional regulation, the relationship between DNA and RNA is still buffered for many genes. It is possible that shifts in chromatin organization, and thus RT, might be


related to transcriptional buffering. Previous RT studies have shown that shifts to earlier RT represent shifts to more open chromatin and higher per-copy transcriptional activity while


shifts to later RT coincide with the opposite effects3,13,14,17. Therefore, we hypothesized that there might be directional RT shifts at CNA gains and losses as part of a transcriptional


buffering response at the level of chromatin organization. To determine if CNAs produced RT shifts, we identified sets of clone-specific gains and losses which had the same CN state present


in  >98% of cells in their respective clones. These changes approximate isogenic events where there is no cellular CN variation within a given clone. Two representative examples of


isogenic CNAs are the 19p gain and 19q loss which are both specific to clone G in sample SA906b (Fig. S6a). We focused on isogenic CNAs as they represent regions of the genome where PERT is


the most accurate (i.e. cell-specific CNA rate  <2%). We then computed the RT shift between each clone containing an isogenic CNA and all other reference clones in the sample without the


isogenic CNA, and compared distribution of all RT shifts to a background distribution of unaltered sites where a given clone and reference have matching CN states. Across all isogenic


events, we found that losses significantly shifted to earlier RT (_p_ = 3.85_e_ − 37 per bin, _p_ = 1.32_e_ − 5 per event) while gains shifted to later RT (_p_ = 5.97_e_ − 6 per bin, _p_ = 


3.32_e_ − 1 per event) (Fig. 3g, S6b-d). Given the relationships between RT, chromatin organization, and transcriptional activity, this data suggests that losses are buffered towards higher


per-copy transcriptional activity and gains are buffered towards lower per-copy transcriptional activity. These results also indicate that CNA heterogeneity produce clones with distinct RT


profiles. IMPACTS OF CELL TYPE, PLOIDY, AND MUTATIONAL PROCESS ON REPLICATION TIMING AND STRESS DYNAMICS We next sought to quantify the relative impact of cell type, mutational process


signature, and ploidy on replication timing dynamics and identify markers which might be linked to replication stress. We thus applied PERT to a wider cohort of scWGS datasets with diverse


genomic properties and investigated their associations with PERT measurements such as the average time within S-phase and RT profiles of distinct clones. The assembled metacohort comprised 6


TNBC tumors, 13 HGSOC tumors, 3 cancer cell lines, and the previously described hTERTs, totalling over 50,000 single-cell genomes32,50,61. Samples had been labeled with homologous


recombination deficiency (HRD), fold-back inversion (FBI), and tandem duplicator (TD) mutational process signatures in previous analysis by joint decomposition of structural and


single-nucleotide variants66, and contained both whole genome doubled (WGD) and non-genome doubled (NGD) clones. We focused our analysis on the 102 unique clones with  >20 S-phase cells


(Fig. 4a). We first investigated the time distribution of S-phase cells across signature and ploidy. We hypothesized that double-strand breaks and fork stalling in clones undergoing


replication stress should increase the number of cells which appear stuck in late S-phase as they have activated checkpoints at the S/G2 boundary or fail to complete replication prior to


cell division31,67,68,69,70. Accordingly, we found that both WGD and mutational processes associated with replication stress, such as TD and FBI, contained higher fractions of late S-phase


cells (Fig. 4b, c). This data suggests that the time distribution of S-phase cells might provide value in detecting ongoing replication stress, DNA damage, and repair processes. We then


proceeded to investigate the RT profiles of these clones. Clustering the pairwise Pearson correlations between clone RT profiles revealed striking sample and cell type specificity (Fig. 4d).


To quantify the relative importance of these covariates and obtain covariate-specific RT profiles, we implemented a factor model which took the matrix of clone RT profiles (_K_ clones × _M_


loci) as input and jointly learned importance weights (_β_s) and latent covariate-specific RT profiles (Methods). As expected, we found the constant term representing global RT was


estimated to have the highest importance with sample and cell type covariates coming in second and third place, respectively; however, ploidy and signature were both an order of magnitude


lower in importance than sample or cell type (Fig. 4e). These results indicate that RT is largely conserved across all clones in our metacohort with most variation being attributed to cell


type or sample (individual) while ploidy and signature have negligible impacts on RT. Curious about which regions of the genome had the highest cell type RT variability, we compared the


latent cell type RT profiles to the 15 ENCODE cell lines with RepliSeq data available10,23. Investigating these RT profiles inferred by either RepliSeq or PERT at chromosome-level


resolution, we found high agreement across cell types on most autosomes with chromosome X having the highest variability in mean RT (Fig. 4f, S7). Since most of the scWGS data was from


female individuals, these results prompted further analysis on the relationship between X-inactivation and RT. CHROMOSOME X REPLICATION TIMING SHIFTS MEASURE THE RATIO OF ACTIVE TO INACTIVE


ALLELES AND REFLECT SELECTION BIAS IN HGSOC AND TNBC TUMORS Given that X-inactivation produces a late replicating inactive allele (Xi) and an early replicating active allele (Xa)26,57, we


hypothesized that the greatest RT shifts would occur from CNAs which disrupted the 1:1 balance of Xa to Xi alleles. To study the relationship between RT and X-inactivation we ran PERT and


SIGNALS, a single-cell allele-specific CN caller which has been extensively characterized and benchmarked in Funnell61, on unsorted scWGS data and compared SIGNALS B-allele frequencies


(BAFs) to PERT chrX RT. SIGNALS defines A and B allele labels such that the major (most prevalent) allele in a sample is A and the minor allele is B; however, it is not known whether the A


or B allele on chrX is Xi. Thus, integrating SIGNALS results with PERT should enable assignment of A- and B-allele labels to Xa and Xi epigenetic states. We defined chrX relative RT as the


mean RT difference between chrX and all autosomes with negative values being later chrX replication. Applying this strategy first to the hTERT cell lines, we found that chrX RT and DNA BAF


were negatively correlated at both sample and clone resolution, with delayed RT associated with balanced allele-specific CN (BAF=0.5), and loss of the B-allele (BAF < 0.5) shifting RT


earlier (Fig. 5a–c). The direction of this correlation suggested that the SIGNALS A-allele label corresponded to Xa and B to Xi for all hTERT samples. This correspondence was further


supported by chrX having a lower BAF in S-phase cells than G1/2-phase cells, indicating that the B-allele was replicating later than the A-allele in all hTERT samples (Fig. S8a–c). These


results demonstrate PERT’s ability to associate the SIGNALS A- and B-allele labels with Xa and Xi epigenetic states, enabling further interrogation of the clonality and recurrence of genomic


events which disrupt X-inactivation using scWGS data alone. We then assessed the degree of chrX RT allelic imbalance in 19 TNBC and HGSOC samples. It has been long-established that many


female reproductive tumors have Xa > Xi imbalances and chrX overexpression based on analysis of bulk DNA and RNA sequencing data71,72; however, these events have yet to be characterized


in scWGS data, meaning that their evolutionary timing, phasing between clones, and the genomic characterization have yet to be jointly analyzed. As expected, all 19 tumors had earlier chrX


RT than the allelically balanced (BAF=0.5) hTERT samples, with a negative correlation between chrX RT and DNA BAF, confirming that many of these tumors had more copies of Xa than Xi (Fig. 


5d, e, S8d). Interestingly, many of these allelic imbalances were fully clonal events and arose from either loss of Xi (4/19 clonal full chrX LOH, 4/19 clonal Xq LOH, 2/19 clonal partial


LOH, 1/19 subclonal LOH) or gain of Xa (1/19 clonal full chrX, 4/19 subclonal and/or partial) (Supplementary Data 1), suggesting that Xa > Xi imbalances emerge early in tumor evolution


and that both Xa gain and Xi loss may be independently favorable events. We next investigated whether tumors with earlier chrX RT but minimal allelic imbalance (BAF≈0.5) indeed had higher


chrX transcription. To answer this question, we focused on the subset of samples with matching scRNA-seq data. We compared SIGNALS RNA BAF with DNA BAF to investigate the relationship


between allele-specific transcription and CN of each chromosome. Autosomes maintained a 1:1 relationship between RNA and DNA BAF, indicating that each allelic copy is equally transcribed,


and samples with Xi alleles present had lower RNA BAF than DNA BAF, in line with less per-copy transcription from the Xi allele (Fig. S8e). We termed the difference between chrX RNA and DNA


BAF as the transcription gap where positive values correspond to samples with more transcription of the B-allele than expected based on DNA BAF. We found that the transcription gap


positively correlated with chrX RT (Fig. 5f). This data confirmed that earlier replication of chrX in samples with minimal BAF imbalance coincided with unexpectedly high B-allele


transcription, indicating that RT shifts captured by PERT better reflect transcriptional activity than chrX allele-specific CN ratios. Finally, we sought to identify mechanisms that would


explain unexpected transcription of the inactive X allele. Given that X-inactivation proceeds through the transcription of _XIST_ on the Xq arm71, a long non-coding RNA which silences its


chrX allele in cis, we searched for genomic events in which loss of the B-allele on the Xq arm could lead to re-activation of the B-allele on the Xp arm. We identified clonal loss of


heterozygosity (LOH, BAF=0) on Xq but not Xp in 4 of 19 tumors (2 HGSOC, 2 TNBC, Fig. S8f). Using PERT, we found that the Xp arm of these samples replicated much earlier than the


corresponding locus in the hTERT WT sample (SA039) which had intact _XIST_ transcription on the B-allele (Fig. 5g), indicating that the B-alleles on the Xp arm were reactivated. We then


validated Xp reactivation in these four samples by comparing their DNA and RNA BAFs at chromosome-arm level resolution. This analysis showed that Xp arms maintained a 1:1 ratio between gene


dosage and transcription (Fig. 5h), confirming that both alleles on the Xp arm were transcriptionally active. The clonal nature of these events indicate that they all happened early in tumor


evolution (Fig. S8f). These results demonstrate that using scWGS and PERT to study X-inactivation dynamics is uniquely powerful as RT not only serves as a proxy for transcriptional


activity, but the scWGS data itself also provides orthogonal information to identify the genomic determinants of such overexpression (Fig. 5i). SPE APPROXIMATES PROLIFERATIVE FITNESS IN


STABLE GENOMES After deep exploration of replication timing with PERT, we investigated whether the proportion of S-phase cells in genetic clones could accurately approximate their


proliferative fitness. Experimental attempts to approximate clone proliferation rates through cell cycle distributions have been limited due to the fact that FACS isolation of S-phase cells


is challenging for solid tumors with subclonal CNAs46. Unbiased scRNA-seq is capable of estimating cell cycle phase distributions through the use of cell cycle marker genes; however, these


distributions can only be mapped onto genetic clones when there are many arm-level CNAs that differentiate the clones73,74,75. To determine whether S-phase cells can be used to approximate


proliferative fitness in genetically stable cancer and epithelial cell lines, we ran PERT on gastric cancer cell lines with co-registered doubling times and performed time-series analysis of


the aforementioned hTERT WT data. The gastric cancer data consisted of 3 unique cell lines with co-registered doubling times, 10X scWGS, and 10X scRNA measurements51. Encouragingly, high


PERT G1/2-phase fractions correlated with high doubling times and high scRNA G1-phase fractions (Fig. S9a–c, Supplementary Data 2). Interestingly, the scWGS-defined cell cycle fractions had


higher correlation with doubling time (_r_ = 1.00, _p_ = 3.3_e_ − 3) than the scRNA-defined cell cycle fractions (_r_ = 0.88, _p_ = 3.1_e_ − 1), suggesting that direct measurement of DNA


replication in scWGS might be a better proxy measurement for the proliferation rate than expression of cell cycle marker genes in scRNA (Fig. S9a–c). To assess the relationship between


S-phase enrichment and fitness in time-series data, we used the relative abundance of each clone within each cell cycle phase to compute continuous S-phase enrichment (SPE) scores for each


clone (_c_) × timepoint (_t_) and compared these scores to the clone’s corresponding change in abundance between timepoints _t_ and _t_ + 1 (Fig. 6a–d, Methods). It was critically important


to use clone × timepoint SPE scores instead of the raw S-phase fraction for this data as technical variation between different timepoints (i.e. cell confluence at time of sequencing, sample


handling) could lead to dramatically different S-phase fractions. In the hTERT WT data, we found that SPE scores had a positive trend with expansion (_r_ = 0.51, _p_ = 2.5_e_ − 1, Fig. S9d,


e), indicating that more S-phase cells corresponded to faster proliferation in normal epithelial cells. These results suggested that quantifying the number of S-phase cells in a particular


clone serves as an appropriate proxy for fitness when there is limited genomic instability. GENOMIC INSTABILITY AND CISPLATIN COMPLICATE RELATIONSHIP BETWEEN SPE AND FITNESS We next


investigated whether clone S-phase enrichment still approximated proliferative fitness in genomically unstable populations. Faster cell division can explain why one tumor clone has higher


fitness than others and would be captured by SPE; however, avoiding cell death (via increased tolerance of deleterious mutations, reduced DNA damage, etc.) can also lead to an increase in


proliferative fitness and such processes would not be captured by SPE7,8. To determine whether SPE corresponded to high fitness of genomically unstable cells, we analyzed time-series scWGS


data generated from _TP53_−/− hTERT cell lines and serially propagated TNBC patient derived xenografts (PDXs) (Fig. 6e, S9d). We found no consistent correlation between SPE scores and


observed fitness with intra-sample correlations ranging from _r_ = 0.20 to _r_ = − 0.43 (Fig. 6f, S9f,g, S10a–d). These results suggest that SPE alone might be insufficient to approximate


clone proliferative fitness in TNBC or p53-null genomically unstable contexts, implying that more successful efforts for approximating fitness should also account for clone death rates. We


next asked whether S-phase estimates from PERT help predict the fitness of TNBC PDX clones undergoing cisplatin treatment. Previous analysis of the TNBC PDX data had revealed an inversion of


the clonal fitness landscape upon cisplatin exposure but had not identified any genotypic or phenotypic features to explain why certain clones were more sensitive to cisplatin than


others32. Platinum chemotherapies such as cisplatin are known to selectively target fast-dividing cancer cells by inflicting DNA damage during S-phase, and also elongate S-phase in sensitive


cells by stalling many replication forks76. Therefore, we hypothesized that enrichment of S-phase cells would be a sign of lower fitness in on-cisplatin samples. To test this hypothesis, we


investigated the correlation between SPE and clonal expansion in the cisplatin-treated TNBC PDXs and compared this relationship to the untreated baseline (Fig. 6e). Indeed, we found this


correlation to be significantly more negative in the on-treatment than the untreated context (Wald _p_ = 5.12_e_ − 2, Fig. 6f, S10e–h). These results show that S-phase enriched TNBC clones


are more likely to contract than expand in the presence of cisplatin, implying that SPE scores may be an informative feature for forecasting clone trajectories from cisplatin-treated


samples. WHOLE GENOME DOUBLED CLONES ARE DEPLETED OF S-PHASE CELLS Finally, we sought to better understand the relationship between cell cycle dynamics and proliferative fitness in WGD human


tumors. WGD is a common event in many genomically unstable tumor types including TNBC and HGSOC and its association with poor patient outcomes reflects its high proliferative fitness77. To


determine whether increased fitness of WGD tumor cells could be attributed to faster progression through the cell cycle, we looked at clone SPE scores from scWGS data of an untreated HGSOC


patient with subclonal WGD in the MSK SPECTRUM cohort41,78. Patient OV-081 presented with a primary tumor in the left adnexa consisting of exclusively NGD tumor cells and a metastasis in the


omentum consisting of mostly WGD tumor cells (Fig. 6g, S11a). Running PERT on the tumor and contaminating normal scWGS cells from both sites, we found that the NGD tumor clones (B-D) had


the highest SPE scores, followed by WGD tumor clone (A) and the normal clone (E) (Fig. 6h, S11c). Importantly, we were confident that this discrepancy in SPE was not a site-specific batch


effect as there was a mix of WGD and NGD tumor clones in the omentum. The discrepancy in clone SPE was validated with site-specific scRNA data showing that 53% of tumor cells are cycling in


the NGD-dominant left adnexa (22% S-phase, 31% G2/M-phase) but only 33% of tumor cells are cycling in the WGD-dominant omentum (16% S-phase, 17% G2/M-phase, Fig. S11b). To validate this


phenomenon in the in vitro setting, we looked at the relationship between S-phase enrichment and fitness of the two WGD clones in _TP53_−/− WT sample SA906a. Here, we found that both WGD


clones were depleted for S-phase cells (rank 7/13 and 13/13 in overall clone SPE scores; 7 of 8 clone  × timepoint combinations with  >10 G1/2-phase cells have negative SPE) despite


having the highest overall fitness (rank 1/13 and 2/13 in fitness scores) (Fig. S12). These results demonstrate that WGD tumor cells achieve high fitness despite having slower progression


through the cell cycle than their NGD counterparts. DISCUSSION Here we demonstrate that S-phase cells from genomically unstable tumors and their replication timing profiles can be inferred


from unsorted single-cell whole genome sequence data using PERT. Our computational advance enabled us to investigate replication dynamics from tumor types such as TNBC and HGSOC, a setting


where previous experimental and computational methods could not properly isolate S-phase populations or account for subclonal copy number variation. After extensive benchmarking on simulated


and in silico mixing experiments of cell cycle sorted data, we ran PERT on a metacohort of  >50,000 cells which consisted of data from engineered epithelial cells, cancer cell lines, PDX


models, primary human tumors, time-series experiments, and multiple scWGS sequencing platforms. This data enabled us to interrogate the interplay between RT and genomic instability and


investigate when cell cycle distributions can help forecast clonal evolution. PERT analysis of time-series scWGS from engineered hTERT epithelial cell lines highlighted the complex


relationship between somatic CNAs and RT. Analysis of RT in relation to subsequent CNAs revealed that losses and breakpoints preferentially emerged from late RT loci while gains emerged from


early RT loci. These results are in agreement with correlations between unmatched CNA and RT data12,29,30,31. The relationship between RT and subsequent genomic alterations could be


explained by mechanisms of over- and under-replication2,5,79,80 or reflect differential fitness of gains vs losses in gene-rich early vs gene-poor late RT loci, respectively10. Separately,


shifts to earlier RT at losses and later RT at gains aligns with the directional effects of CNA buffering62,63,64,65 and attributes a portion of said buffering to rewired chromatin


organization. Across our metacohort, we found that RT is highly conserved across various mutational signatures and ploidies, with most RT variation being attributed to cell type and


X-inactivation. It was well known that RT is imprinted during early stages of development and heritable thereafter in normal tissues and genomically stable cancers15,16,17,22. However, the


high cell type RT specificity in our data supports an extension of the same model to genomically unstable cancers with massive structural variation. Instead of mutational signature and


ploidy impacting RT, we found that clones with WGD, the tandem duplication signature, or the fold-back inversion signature all had an enrichment of late over early S-phase cells. Given that


these groups are known to associate with replication stress, this accumulation in late S-phase likely reflects the ongoing repair of double-strand breaks and stalled replication


forks67,68,69,81. CNAs affecting chromosome X had the largest impact on RT due to Xi replicating later than Xa. This enabled PERT to identify recurrent Xa > Xi allelic imbalances in our


HGSOC and TNBC tumors using scWGS alone instead of requiring DNA and RNA data71,72,82. Quantification of clone-specific SPE scores across various datasets enabled us to investigate the


relationship between cell cycle dynamics and proliferative fitness. We found a strong positive correlation between SPE and proliferative fitness in genomically stable hTERT WT epithelial


cells and gastric cancer cell lines. However, a similar correlation was not present in the genomically unstable _TP53_−/− hTERTs and untreated TNBC PDXs. Two confounders may explain the lack


of correlation. First, death rate, which SPE does not measure, may be variable across the clones from the genomically unstable samples. Without measuring the death rate, we are left with an


incomplete picture of fitness, since fitness should be viewed as a composite of birth minus death rates7,8. Second, high SPE scores are unable to distinguish between two scenarios: an


increased birth rate and a longer time to complete S-phase. The duration of S-phase can be variable, especially in the context of replication stress2. Future attempts to predict fitness


would benefit from measuring both cell death and replication stress rates of individual clones – especially in the context of genomically unstable cancers which are prone to high


clone-to-clone variation of these properties. In spite of these potential confounders, we still observed a negative correlation between SPE and fitness in cisplatin-treated TNBC PDXs. High


SPE should correspond to high chemosensitivity regardless of whether higher birth rate or longer S-phase is the underlying reason behind high SPE because S-phase cells themselves are the


target of platinum chemotherapies and a result of their toxicity76. Our data suggests that SPE may serve as an informative biomarker for identifying treatment-resistant clones in patients


receiving neoadjuvant cisplatin. Viewing the depletion of S-phase cells in highly fit WGD clones with these confounders in mind also enables us to deduce that WGD confers high fitness by 


reducing death rate, likely via enhanced buffering of deleterious mutations83, implying that WGD cells perform slower but more robust cell division. METHODS PERT MODEL The input for PERT is


scWGS read depth (_Z_) and called CN states for all bins (_N_ cells  ×  _M_ loci) (Fig. S1a). Input CN states are obtained through single-cell CN callers such as HMMcopy49,84 or 10X


CellRanger-DNA51,52. PERT first identifies a set of high-confidence G1/2-phase cells where the input CN states reflect accurate somatic CN. All remaining cells have their input CN states


dropped as they are initially considered to have unknown CN states and cell cycle phase. Most S-phase cells should be present in the unknown initial set. High-confidence G1/2-phase cells are


phylogenetically clustered into clones based on CN using methods such as sitka85 or MEDICC286. Optionally, users can provide the set of high-quality G1/2-phase cells and their mapping to CN


clones as input. The initialized sets of cells are passed into a probabilistic model which infers somatic CN (_X_ ) and replication states (_Y_ ) through three distinct learning steps (Fig.


 S1c–e). In Step 1, PERT learns parameters associated with library-level GC bias (_β__μ_, _β__σ_) and sequencing overdispersion (_λ_) by training on high-confidence G1/2 cells (Fig. S1c).


Step 1 conditions on CN (_X_ ), replication (_Y_ ), and coverage/ploidy scaling terms (_μ_) because input CN states are assumed to accurately reflect somatic CN states and all bins are


unreplicated (_Y_ = 0) in high-confidence G1/2 cells. Once _β__μ_, _β__σ_ and _μ_ have been learned in Step 1, we can condition on them in Step 2 (Fig. S1d). Step 2 learns latent parameters


representing each cell’s time in S-phase (_τ__n_), each locus’s replication timing (_ρ__m_), and global replication stochasticity (_α_) to compute the probability that a given bin is


replicated (_Y__n_,_m_ = 1) or unreplicated (_Y__n_,_m_ = 0). Only unknown cells are included in Step 2. Prior belief on each unknown cell’s CN state is encapsulated using a prior


distribution (_π_) which has concentration parameters (_η_) conditioned on the input CN of the most similar high-confidence G1/2 cells. CN prior concentrations are set for each cell by using


the consensus CN profile of the most similar G1/2 clone or a composite scoring of the most similar G1/2 clone and cell CN profiles (Fig. S1f) A full list of model parameters, domains, and


distributions can be found in Fig. S1b. Step 3 is an optional final step which learns CN and replication states for high-confidence G1/2 cells (Fig. S1e). This step is necessary to determine


if any S-phase cells are present in the initial set of high-confidence G1/2 cells. Step 3 conditions on replication timing (_ρ_) and stochasticity (_α_) values learned in Step 2 to ensure


that such properties are conserved between both sets of cells. PERT is designed for scWGS data with coverage depths on the order of 0.01-0.1x and thus 500kb bin sizes are used by default in


this manuscript; however, the model can be run on count data of any bin size as long as sufficient memory and runtime are allocated. We demonstrate PERT’s ability to run on 10X scWGS data at


20kb resolution in Additional File 2. EQUATIONS FOR STEP 1 Given that we have accurate CN caller results for high-confidence G1/2 cells, we can solve for each cell’s coverage/ploidy scaling


term _μ__n_ and condition on it, $${\mu }_{n}=\frac{{\sum }_{m=0}^{M}{Z}_{n , m}}{\mathop{\sum }_{m=0}^{M}{X}_{n , m}}.$$ (1) The latent variables are arranged together in function block


_f_ through the following equations to produce the bin-specific negative binomial event counts _δ__n_,_m_. The GC bias rate of each individual bin (_ω__n_,_m_) depends on the GC content of


the locus (_γ__m_) and the GC bias coefficients (_β__n_,_k_) for the cell, $${\omega }_{n,m}={e}^{\mathop{\sum }_{k=0}^{K}{\beta }_{n,k} * {\gamma }_{m}^{k}}.$$ (2) The expected read count


per bin is computed as follows: $${\theta }_{n,m}={X}_{n,m} * {\omega }_{n,m} * {\mu }_{n}.$$ (3) The expected read count per bin is then used in conjunction with the negative binomial event


success probability term (_λ_) to produce a number of negative binomial event count for each bin, $${\delta }_{n,m}=f({X}_{n,m},{\gamma }_{m},\lambda,{\mu }_{n},{\beta


}_{n,k})=\frac{{\theta }_{n,m} * (1-\lambda )}{\lambda },$$ (4) where we place the constraint _δ__n_,_m_ ≥ 1 to avoid sampling errors in bins with _θ__n_,_m_ ≈ 0. Finally, the read count at


a bin is sampled from an overdispersed negative binomial distribution _Z__n_,_m_ ~ NB(_δ__n_,_m_, _λ_) where the expected read count for _Z__n_,_m_ is _θ__n_,_m_ and the variance is


\(\frac{{\theta }_{n,m}}{(1-\lambda )}\). EQUATIONS FOR STEPS 2–3 Steps 2-3 have equations which differ from Step 1 since it must account for replicated bins and cannot solve for _μ__n_


analytically. The probability of each bin being replicated (_ϕ__n_,_m_) is a function of the cell’s time in S-phase (_τ__n_), the locus’s replication timing (_ρ__m_), and the replication


stochasticity term (_α_). Replication stochasticity (_α_) controls how closely cells follow the global RT profile by adjusting the temperature of a sigmoid function. The following equation


corresponds to function block _g_: $${\phi }_{n,m}=g(\alpha, {\tau }_{n},{\rho }_{m})=\frac{1}{1+{e}^{-\alpha ({\tau }_{n}-{\rho }_{m})}}.$$ (5) Equations corresponding to function block _f_


differ from those in Step 1. The total CN (_χ__n_,_m_) is double the somatic CN (_X__n_,_m_) when a bin is replicated (_Y__n_,_m_ = 1), $${\chi }_{n,m}={X}_{n,m} * (1+{Y}_{n,m}).$$ (6) The


GC rates (_ω__n_,_m_) and negative binomial event counts (_δ__n_,_m_) are computed the same as in Step 1 (Eq (2), Eq (4)). However, the expected read count uses total instead of somatic CN,


$${\theta }_{n,m}={\chi }_{n,m} * {\omega }_{n,m} * {\mu }_{n}.$$ (7) Since CN is learned in Steps 2-3, the coverage/ploidy scaling term (_μ__n_) must also be learned. We use a normal prior


\({\mu }_{n} \sim {{\rm{N}}}({\mu }_{\mu }^{n},{\mu }_{\sigma }^{n})\) where the approximate total ploidy and total read counts are used to estimate the mean hyperparameters (\({\mu }_{\mu


}^{n}\)). Total ploidies for each cell are approximated using the CN prior concentrations (_η_) and times within S-phase (_τ_) to account for both somatic and replicated copies of DNA that


are present. We fixed the standard deviation hyperparameters (\({\mu }_{\sigma }^{n}\)) to always be 10x smaller than the means to ensure that _μ__n_≥0 despite use of a normal distribution


(used for computational expediency), $${\mu }_{\mu }^{n}=\frac{{\sum }_{m=0}^{M}{Z}_{n,m}}{(1+{\tau }_{n})\mathop{\sum }_{m=0}^{M}{{{\rm{argmax}}}}_{p}({\eta }_{n,m,p})},$$ (8) $${\mu


}_{\sigma }^{n}=\frac{{\mu }_{\mu }^{n}}{10}.$$ (9) CONSTRUCTING THE CN PRIOR CONCENTRATIONS There are two ways to construct the CN prior concentrations within PERT. The first is to use the


most similar high-confidence G1/2 clone to define the concentrations for each unknown cell (clone method). We assign each unknown cell its clone (_c__n_) via Pearson correlation between the


cell read depth profile (_Z__n_) and the clone pseudobulk read depth profile (_Z__c_), $${c}_{n}={{{\rm{argmax}}}}_{c}({{\rm{corr}}}({Z}_{n},{Z}_{c})).$$ (10) Clone pseudobulk CN and read


depth profiles represent the median profile across all high-confidence G1/2 cells in a given clone _c_. Once we have clone assignments for each unknown cell, the CN concentration of all


possible states _P_ at each genomic bin (_η__n_,_m_,_p_) is constructed to be _w_ times larger for the state _p_ that matches the clone pseudobulk CN state (\({X}_{{c}_{n},m}\)) for that


same bin compared to all other states. The default setting is _w_ = 106: $${\eta }_{n,m,p}=\left\{\begin{array}{ll}w\quad &\,{{\rm{if}}}\,\,p={X}_{{c}_{n},m}\\ 1\quad


&\,{\mbox{else}}\,.\hfill\end{array}\right.$$ (11) The second way to construct the prior is to leverage additional information from the most similar high-confidence G1/2 cells when


constructing _η__n_,_m_,_p_ (composite method). The rationale for the composite method is that there might be rare CNAs within a clone which only appear in a handful of cells but do not


appear in the clone pseudobulk CN profile _X__c_. To find the most similar high-confidence G1/2 cells, we compute the read depth correlation between the unknown cell (\({Z}_{{n}_{s}}\)) and


the high-confidence G1/2 cells from the best matching clone (\({Z}_{{n}_{g}}\)), $$\psi={{\rm{corr}}}({Z}_{{n}_{s}\!},\,\, {Z}_{{n}_{g}}\!).$$ (12) The consensus clone CN profile and top _J_


matches for each unknown cell are then used to construct the CN prior (_η__n_,_m_,_p_). Each row of _ψ_ is sorted to obtain the top _J_ high-confidence G1/2 matches _n__g_(0), …, _n__g_( 


_J_−1). All entries are initialized to 1 (_η__n_,_m_,_k_ = 1) before adding varying levels of weight (_w_) to states where the CN matches a G1/2-phase cell or clone pseudobulk CN profile.


The default settings are _w_ = 105 and _J_ = 5: $${\eta }_{n,m,p}=\left\{\begin{array}{ll}+1\quad \hfill&\,\,{\mbox{everywhere}}\,\\+w * 2 * J\quad


\hfill&{{\rm{if}}}\,\,p={X}_{{c}_{n},m}\\+w * (\,J-0)\quad \hfill&\,\,{{\rm{if}}}\,\,p={X}_{{n}_{g0},m}\\+w * (\,J-1)\quad \hfill&\,\,{{\rm{if}}}\,\,p={X}_{{n}_{g1},m}\\ \ldots


\quad \hfill \\+w\quad \hfill&\,\,\,\,\,\,\,\,\,{{\rm{if}}}\,\,p={X}_{{n}_{g(J-1)},m}.\end{array}\right.$$ (13) By default, the composite method is used during Step 2 and the clone


method is used during Step 3; however, the user may select between both methods during Step 2. Using the clone method during Step 2 should be seen as a ‘vanilla’ version of PERT which should


be used when very few cell-specific CNAs are present. The clone method is used for Step 3 since the composite method would produce many self-matching cells. A comparison of the two methods


can be seen when benchmarking PERT on simulated data (Supplementary Notes 1-2). MODEL INITIALIZATION AND HYPERPARAMETERS Splitting cells into initial sets of high-confidence G1/2-phase and


unknown cells is performed by thresholding heuristic per-cell features known to correlate with cell cycle phase. PERT uses clone-normalized number of input CN breakpoints between neighboring


genomic bins (BKnorm) and clone-normalized median absolute deviation in read depth between neighboring genomic bins (MADNnorm). These features are referred to as ‘HMMcopy breakpoints’ and


‘MADN RPM’, respectively, in the main text and figures. Note that breakpoints between the start and end of adjacently numbered chromosomes are not counted. $${{\mbox{BK}}}_{n}=\mathop{\sum


}_{m=0}^{M-1}\left\{\begin{array}{ll}1\quad &\,{{\rm{if}}}\,\,{X}_{n,m}\ne {X}_{n,m+1}\\ 0\quad &\,{\mbox{else}}\,\hfill\end{array}\right.$$ (14)


$${{\mbox{BKnorm}}}_{n}={{\mbox{BK}}}_{n}-\frac{1}{C}{\sum }_{c=0}^{C}{{\mbox{BK}}}_{c}$$ (15) $${{\mbox{MADN}}}_{n}={{\rm{Med}}}\,\left({\sum


}_{m=0}^{{\rm{M}}-1}{Z}_{n,m}-{Z}_{n,m+1}\right)$$ (16) $${{\mbox{MADNnorm}}}_{n}={{\mbox{MADN}}}_{n}-\frac{1}{C}{\sum }_{c=0}^{C}{{\mbox{MADN}}}_{c}$$ (17) Under default settings, PERT


initializes cells with MADNnorm < 0 and BKnorm < 0 as high-confidence G1/2-phase with all other cells as unknown phase. Initial cell phases can also be input by users based on


experimental measurements or alternative metrics such as 10X CellRanger-DNA’s ‘dimapd’ score (used in27,28,51), the Laks et al. classifiers’ S-phase probability and quality scores50, or read


depth correlation with a reference RT profile55. To improve convergence to global optima, each cell’s time in S-phase (_τ__n_) is initialized using scRT results from a clone-aware


adaptation of Dileep et al.25 which thresholds the clone-normalized read depth profiles into replicated and unreplicated bins. Each unknown cell _n_ is assigned to clone _c_ with the highest


correlation between cell and clone pseudobulk read depth profiles (Eq (10)). The read depth of each cell is then normalized by the CN state with highest probability within the CN prior


(_η__n_,_m_,_p_), $${y}_{n,m}=\frac{{Z}_{n,m}}{{{{\rm{argmax}}}}_{p}({\eta }_{n,m,p})}.$$ (18) The clone-normalized read depth profiles ( _y__n_) are then binarized into replication state


profiles (_Y__n_) using a per-cell threshold (_t__n_ ∈ [0, 1]) that minimizes the Manhattan distance between the real data and its binarized counterpart.


$${t}_{n}={{\mbox{argmin}}}_{t}\Bigg| \, {y}_{n,m}-\left\{\begin{array}{ll}1\quad &\,{{\rm{if}}}\,\,{y}_{n,m}\ge {t}_{n}\\ 0\quad &\,{\mbox{else}}\,\hfill\end{array}\right.\Bigg|$$


(19) $${Y}_{n,m}=\left\{\begin{array}{ll}1\quad &\,{{\rm{if}}}\,\,{y}_{n,m}\ge {t}_{n}\\ 0\quad &\,{\mbox{else}}\,\hfill\end{array}\right.$$ (20) The fraction of replicated bins per


cell from the deterministic replication states _Y__n_,_m_ are then used to initialize the parameter representing each cell’s time in S-phase (_τ__n_) within PERT’s probabilistic model.


$${\tau }_{n}=\frac{1}{M}{\sum }_{m=0}^{M}{Y}_{n,m}.$$ (21) Initialization of _τ__n_ is particularly important because the model might mistake an early S-phase cell (<20% replicated) for


a late S-phase cell (>80% replicated), or vice versa, as both have relatively ‘flat’ read depth profiles compared to mid-S-phase cells. Thus _τ__n_ will rarely traverse mid-S-phase values


during inference when its initial and true values lie far apart. Additional parameter initializations include _λ_ = 0.5 for negative binomial overdispersion and _β__σ_,_k_ = 10−_k_ for the


standard deviation of each GC bias polynomial coefficient _k_. Unlike _τ__n_, the model is unlikely to get stuck at local minima with these parameters so they are initialized to the same


values globally. The latent variables _β__μ_, _ρ_, and _α_ are sampled from prior distributions with fixed hyperparameters. The mean of all GC bias polynomial coefficients (_β__μ_) are drawn


from the prior N(0, 1). Each locus’s replication timing (_ρ_) is drawn from the prior Beta(1, 1) to create a uniform distribution on the domain [0, 1]. The replication stochasticity


parameter (_α_) is drawn from the prior distribution _Γ_(shape = 2, rate = 0.2) which has a mean of \(\frac{{{\rm{shape}}}}{{{\rm{rate}}}}=10\) and penalizes extreme values on a positive


real domain. PERT PHASE PREDICTIONS We used the PERT model output to predict ‘G1/2’, ‘S’, and ‘low quality’ (LQ) phases for each cell. G1/2-phase cells were defined by having  <5% or  


>95% replicated bins. Of the remaining cells with 5-95% replicated bins, those with high read depth autocorrelation (>0.5), replication state autocorrelation (>0.2), or fraction of


homozygous deletions (_X_ = 0, >0.05) were deemed to be low quality. All other cells were deemed to be in S-phase. Using 500kb bins, autocorrelation scores were the average of all


autocorrelations ranging from 10 to 50 bin lag size. Thresholds used for splitting S and LQ phases can be adjusted by users should the default settings produce unexpected output. MODEL


CONSTRUCTION AND INFERENCE PERT is written using Pyro which is a probabilistic programming language written in Python and supported by PyTorch backend58. PERT uses Pyro’s implementation of


Black Box Variational Inference (BBVI) which enables the use of biologically-informed priors instead of being limited to conjugate priors87. Specifically, we use the AutoDelta function which


uses a Taylor approximation around the _maximum a posteriori_ (MAP) to approximate the posterior. Optimization is performed using the Adam optimizer. By default, we set a learning rate of


0.05 and convergence is determined when the relative change in the evidence lower bound (ELBO) is  <10−6 or the maximum number of iterations (2000 for step 2, 1000 for steps 1 and 3) is


reached. SIMULATED DATASETS To benchmark PERT’s ability to accurately infer single-cell replication states, somatic CN states, and cell cycle phase against Kronos and the Laks et al. cell


cycle classifier, we simulated datasets with varying clonal structures and cell-specific CNA rates. Somatic CN states are simulated by first drawing clone CN profiles and then drawing


cell-specific CNAs that deviated from said clone CN profile. All CNAs are drawn at the chromosome-arm level. 400 S- and 400 G1/2-phase cells are simulated in each dataset. Once CN states


have been simulated, we simulate the read depth using PERT as a generative model. We condition the model on the provided _β__μ_, _β__σ_, _λ_, _α_, _ρ_, _γ_, and _X_ parameters when


generating cell read depth profiles. All read depth values (_Z_) are in units of reads per million. RepliSeq data for various ENCODE cell lines are used to set _ρ_ values for each clone23.


G1/2-phase cells were conditioned to have all bins as unreplicated _Y_ = 0. S-phase cells had their cell cycle times _τ_ sampled from a Uniform(0, 1) distribution. A table of all the


parameters used in each simulated dataset can be found in Table 1. We called CN on simulated binned read count data using HMMcopy. Given that Kronos was designed as an end-to-end pipeline


that takes in raw BAM files, we forked off the Kronos repository and edited their ‘Kronos RT’ module to accept binned read count and CN states as input. Cells were split into S- and


G1/2-phase Kronos input populations according to their true phase. Code to our forked repository can be found at https://github.com/adamcweiner/Kronos_scRT. Similarly, we removed features


from the Laks et al. cell cycle classifier that used alignment information such as the percentage of overalapping reads per cell. The Laks classifier was retrained and benchmarked with said


features removed prior to deployment on simulated data (Fig. S13). EXPERIMENTAL METHODS AND PARTICIPANT DETAILS CELL CULTURE AND PDXS Cell lines and PDX samples were generated in Laks et


al.50, Funnell et al.61, and Salehi et al.32 studies. Cell line samples included (1) an immortalized normal human female breast epithelial cell line 184-hTERT L9, (2) four sets of 184-hTERT


cell lines with perturbations in TP53−/− passaged over multiple time points, (3) five 184-hTERT cell lines with a variety of genetic perturbations in the repair pathway, including TP53−/−,


BRCA1+/−, BRCA1−/−, BRCA2+/− and BRCA2−/−, (4) lymphoblastoid cell line GM18507, (5) HER2+ breast cancer cell line T47D, and (6) ovarian cancer cell line OV2295. PDX samples included 6


different models of TNBC, three of which had cisplatin-treated replicates, and 15 different models of HGSOC. Serial passaging of cell lines was done by seeding approximately 1 million cells


each time and profiling with scWGS (DLP+) at 4–11 different passage points with a mean of 6,070 cells at each time point. Xenograft-bearing mice were sacrificed when the tumor size


approached 1000 mm3 in volume, according to the limits of the experimental protocol. For the serially passaged PDXs, the harvested tumor was minced finely with scalpels and then mechanically


disaggregated for one minute using a Stomacher 80 Biomaster (Seward) in 1 ml to 2 ml cold DMEM-F12 medium with glucose, L-glutamine and HEPES. Aliquots from the resulting suspension of


cells and fragments were used for xenotransplants in the next generation of mice and cryopreserved for sequencing. Serially transplanted aliquots represented approximately 0.1– –0.3% of the


original tumor volume from the previous mouse. MSK SPECTRUM PATIENT DATA We obtained matched scRNA (10x Genomics 3’-end) and scWGS (DLP+) from HGSOC patient OV-081 from the MSK SPECTRUM


cohort. Single cell suspensions from surgically excised tissues were generated and flow sorted on CD45 to separate the immune component. CD45 negative fractions were then sequenced using the


DLP+ platform. scRNA-seq was performed on both CD45 positive and negative fractions, with the malignant cells used in this study being the CD45 negative fraction. Detailed descriptions of


data generation can be found in the two SPECTRUM studies41,78. ACQUISITION OF SAMPLES FROM PATIENTS Patient samples from University of British Columbia32,66 were acquired with informed


consent, according to procedures approved by the Ethics Committees at the University of British Columbia. Patients with breast cancer undergoing diagnostic biopsy or surgery were recruited


and samples were collected under protocols H06-00289 (BCCA-TTR-BREAST), H11-01887 (Neoadjuvant Xenograft Study), H18-01113 (Large-scale genomic analysis of human tumors) or H20-00170


(Linking clonal genomes to tumor evolution and therapeutics). HGSOC samples were obtained from women undergoing debulking surgery under protocols H18-01652 and H18-01090. Patient samples


from Memorial Sloan Kettering Cancer Center41,61,78 were obtained following Institutional Review Board (IRB) approval and patient informed consent (protocols 15–200 and 06-107 for HGSOC;


18–376 for TNBC). HGSOC and TNBC clinical assignments were performed according to American Society of Clinical Oncology guidelines for ER, PR and HER2 positivity. EXPERIMENTAL CELL CYCLE


SORTING GM18507 and T47D cell lines were experimentally sorted into G1, S, G2, and dead cells prior to scWGS (DLP+) sequencing50. 2 million cells fresh from culture suspended in 1 mL PBS


were stained with Hoechst 33342 (Invitrogen), caspase 3/7 (Essen Biosciences), and propidium iodide (PI, Sigma Aldrich) for flow sorting separation of different cell phases. Flow sorting was


carried out at the Terry Fox Laboratory, (BC Cancer Research Centre) using a BD FACSAria III cell sorter equipped with 375 nm, 405 nm, 488 nm, 561 nm and 640 nm laser. We excluded apoptotic


cells on the live cell gate by gating out Caspase 3/7 high cells. On this live non-apoptotic cell gate, we gated for the cell cycle phases using DNA content of the cells measured by Hoechst


33342 staining to sort the G1, S, and G2 phases of the cell cycle individually. Cells from different cell cycle fractions were stained and dispensed into DLP+ chips for sequencing. SCWGS


DATA PROCESSING Unless otherwise noted, all scWGS data was generated via DLP+. All DLP+ data was passed through https://github.com/shahcompbio/single_cell_pipelineusing Isabl before


downstream analysis88. This pipeline aligned reads to the hg19 reference genome using BWA-MEM. Each cell was then passed through HMMcopy using default arguments for single-cell sequencing.


HMMcopy’s output provided read count and gc-corrected integer CN states for each 500kb genomic bin across all cells and loci. Loci with low mappability (<0.95) and cells with low read


count (<500,000 reads) were removed. Cells were also filtered for contamination using the FastQ Screen which tags reads as matching human, mouse, or salmon reference genomes. If  >5%


of reads in a cell are tagged as non-human the cell is flagged as contaminated and subsequently removed. Cells were only passed into phylogenetic trees if they were called as G1/2-phase and


high quality by classifiers described in Laks et al.50. In certain cases, cells might be manually excluded from the phylogenetic tree if they pass the cell cycle and quality filters but have


an abnormally high number of HMMcopy breakpoints. All cells included in the phylogenetic tree are initialized in PERT as the set of high-confidence G1/2 cells; all cells outside the tree


are initialized as unknown cells. PHYLOGENETIC CLUSTERING BASED ON CN PROFILES We used the clone IDs from Funnell et al. for high-confidence G1/2 cells61. These single-cell phylogenetic


trees were generated using sitka85. Sitka uses CN breakpoints (also referred to as changepoints) across the genome as binary input characters to construct the evolutionary relationships


between cells. Sitka was run for 3,000 chains and a consensus tree was computed for downstream analysis. The consensus tree was then cut at an optimized height to assign all cells into


clones (clusters). For datasets with no sitka trees provided or select datasets, cells were clustered into clones using K-means where the number of clones was selected through Akaike


information criterion. We performed a K-means reclustering for the Salehi et al. TNBC PDX data32 as sitka produced small clusters which inhibited robust tracking of S-phase clone fractions


across multiple timepoints. PSEUDOBULK PROFILES Many times in the text we describe pseudobulk replication timing, copy number, or read depth profiles within a subset of interest (i.e. cells


belonging to the same clone or sample). To compute pseudobulk profiles, we group all the cells of interest and then take the median values for all loci in the genome. When computing


pseudobulk CN profiles, we only include the cells of the modal (most common) ploidy state before computing median values for all loci. S-PHASE TIMES When we refer to the time of individual


S-phase cells, such a time is calculated as the fraction of replicated bins per cell. Thus, S-phase times near 1 are late S-phase cells and S-phase times near 0 are early S-phase cells.


COMPARISON OF PERT RT PROFILES TO PREDICTED RT PROFILES Predicted RT (predRT) profiles for ENCODE primary cell samples were obtained from Salvadores et al.18. The authors used the Replicon


software with default settings59 to predict RT from DNase-seq chromatin accessibility data of each ENCODE sample. Human reference hg19 coordinates were used to create RT profiles with 1 MB


bin size for this comparison. PLOIDY-NORMALIZED CN We sometimes opt to show ploidy-normalized CN profiles instead of their integer CN values. The ploidy _p__n_ of a given CN profile _n_ is


defined as the mode of all integer CN states _X__n_,_m_. We then compute a ploidy-normalized CN value _χ__n_,_m_ as follows: $${\chi }_{n,m}=\frac{{X}_{n,m}-{p}_{n}}{{p}_{n}}.$$ (22) Note


that _n_ can be a profile corresponding to a cell, clone, or sample. IDENTIFICATION OF RT SHIFTS All RT profiles were centered to have a mean of 0 and standard deviation of 1 across the


entire genome prior to comparison to one another. This was essential as certain clones might have a skewed distribution of early vs late S-phase cells and thus their RT profiles would have


later or earlier replication across all positions in the genome. BESPOKE FACTOR MODEL WHICH LEARNS FEATURE IMPORTANCE AND RT PROFILES DIRECTLY FROM CLONE RT PROFILES We built a multivariate


regression model which learned importance terms and RT profiles for each feature directly from the matrix of clone RT profiles. The model has the following terms and equations: RT_c_,_m_:


the observed replication timing of clone _c_ at locus _m_ on the domain of [0,1]. This represents the fraction of replicated bins at locus _m_ across all S-phase cells _n_ in clone _c_.


_ρ__k_,_m_: the latent replication timing of feature _k_ at locus _m_. _I__c_,_k_: indicator mask representing which features _k_ are present for clone _c_. _β__k_: importance term for


feature _k_. _σ_: standard deviation term when going from expected to observed replication timing; sampled from a uniform distribution on the domain (0,1). $${{\mbox{RT}}}_{c,m} \sim


N(\frac{1}{1+{e}^{\mathop{\sum }_{k=0}^{K}({\beta }_{k} * {I}_{c,k} * {\rho }_{k,m})}},\sigma ).$$ (23) All latent replication timing terms _ρ__k_ are normalized to have mean of 0 and


variance of 1 and there is only _β_ value per class of features $${\beta }_{k}=\left\{\begin{array}{ll}{\beta }_{t}\quad &\,\,\,\,\, {{\rm{if}}}\,{{\rm{k}}}\,{{\rm{is}}} \, {{\rm{a}}} \,


{{\rm{cell}}} \, {{\rm{type}}} \, {{\rm{feature}}}\,\\ {\beta }_{s}\quad &\,\,\,\,\,\,\,\,{{\rm{if}}}\,k\,{{\rm{is}}} \, {{\rm{a}}} \, {{\rm{signature}}} \, {{\rm{feature}}}\,\\ {\beta


}_{p}\quad &{{\rm{if}}}\,{{\rm{k}}}\,{{\rm{is}}}\, {{\rm{a}}}\, {{\rm{ploidy}}} \, {{\rm{feature}}}\,\\ {\beta }_{d}\quad &\,\,\,{{\rm{if}}}\,k\,{{\rm{is}}} \, {{\rm{a}}} \,


{{\rm{sample}}} \, {{\rm{feature}}}\,\\ {\beta }_{g}\quad &{{\rm{if}}}\,k\,{{\rm{is}}} \, {{\rm{a}}} \, {{\rm{global}}} \, {{\rm{feature}}}\,\end{array}\right.$$ (24) This model is


implemented in pyro and fit using BBVI58,87. We use the AutoNormal function which uses Normal distributions to approximate the posterior. Optimization is performed using the Adam optimizer


with a learning rate of 0.02. Convergence is determined when the relative change in ELBO is  <10−3 of the total ELBO change between first and current iteration. USING SIGNALS TO QUANTIFY


ALLELIC RATIOS FROM SCDNA- AND SCRNA-SEQ In brief, SIGNALS uses haplotype blocks genotyped in single cells and implements an hidden Markov model (HMM) based on a Beta-Binomial likelihood to


infer the most probable haplotype-specific state. SHAPEIT was used to generate the haplotype blocks for SIGNALS input89. A full description of SIGNALS can be found in Funnell et al.61.


Within each haplotype block for each sample, the major (most common) allele is labeled as the A-allele with the minor (less common) allele labeled as the B-allele. The B-allele frequency


(BAF) is computed as the fraction of B-allele heterozygouos single nucleotide polymorphisms (SNPs) out of all heterozygous SNPs present in a given bin. SIGNALS is run on scDNA data by


default but when scRNA data is also available, the haplotype blocks derived from the scDNA data can be used to extract A- and B-allele counts in the scRNA data too (albeit with much fewer


counts as there are fewer SNPs sequenced in scRNA data). SIGNALS output for all scDNA samples is shown in Supplementary Data 1. GASTRIC CANCER CELL LINE DATA 10X Chromium single-cell DNA


(10X scWGS) data of gastric cancer cell lines NCI-N87, HGC-27, and SNU-668 were downloaded from SRA (PRJNA498809). CN calling was performed using the CellRanger-DNA pipeline using default


parameters. Data was aggregated from 20kb to 500kb bins for analysis with PERT. Each cell line’s doubling time and fraction of G1-phase scRNA cells were extracted from Andor et al.51. PERT


output for these sames is shown in Supplementary Data 2. CLONE S-PHASE ENRICHMENT SCORES To test whether a clone (_c_) is significantly enriched or depleted for S-phase cells at a given


timepoint (_t_), we must compare that clone’s fraction in both S- and G1/2-phases. We first define the following variables as such: _N__s_,_c_,_t_: Number of S-phase cells belonging to clone


_c_ at time _t_ _N__g_,_c_,_t_: Number of G1/2-phase cells belonging to clone _c_ at time _t_ _N__s_,_t_: Total number of S-phase cells across all clones at time _t_ _N__g_,_t_: Total


number of G1/2-phase cells across all clones at time _t_ _N__t_: Total number of cells in a population at time _t_ (all clones, all phases) We can then define the fractions of S- and


G1/2-phase cells assigned to clone _c_ at time _t_ (_f__s_,_c_,_t_, _f__g_,_c_,_t_): $${f}_{s,c,t}=\frac{{N}_{s,c,t}}{{N}_{s,t}},$$ (25) $${f}_{g,c,t}=\frac{{N}_{g,c,t}}{{N}_{g,t}}.$$ (26)


Using the fraction of G1/2-cells belonging to clone _c_, we can compute the expected total number of cells in clone _c_ and time _t_ across all cell cycle phases, $$E({N}_{c,t})={f}_{g,c,t}


* {N}_{t}.$$ (27) We produce a p-value for enrichment of S-phase cells using a hypergeometric test scipy.stats.hypergeom(M = _N__t_, n = _N__s_,_t_, N = _E_(_N__c_,_t_)).sf(_N__s_,_c_,_t_).


To produce a p-value for S-phase depletion we subtract this enrichment p-value from 1. All p-values are Bonferroni-corrected by dividing by the total number of statistical tests. p-adjusted


thresholds of 10−2 are used for saying a clone is significantly enriched or depleted for S-phase cells within a given library. The enriched (_p_+,_c_,_t_) and depleted (_p_−,_c_,_t_)


p-values from this hypergeometric test are then compared to one another to produce a single continuous SPE score (_ξ__c_,_t_). Positive values indicate enrichment for S-phase cells and


negative values indicate depletion of S-phase cells, $${\xi }_{c,t}= {\log }_{10}({p}_{-,c,t})-{\log }_{10}({p}_{+,c,t}).$$ (28) CLONE EXPANSION SCORES For time-series scWGS experiments, we


computed clone expansion scores for each clone _c_ at time _t_ (_S__c_,_t_) by examining the fraction of G1/2-phase cells belonging to clone _c_ at timepoint _t_ (_f__g_,_c_,_t_) and the


subsequent timepoint (_f__g_,_c_,_t_+1). Positive values indicate the clone expands by the next timepoint, $${S}_{c,t}={f}_{g,c,t+1}-{f}_{g,c,t}.$$ (29) We use the same equation when


computing the overall fitness of clones across multiple timepoints but instead treat the first timepoint as _t_ and the final timepoint as _t_ + 1. COMPARING SPE TO EXPANSION IN TREATED VS


UNTREATED DATA To test that treated clones had a significant difference in their relationship between SPE scores (_ξ__c_,_t_) and expansion scores (_S__c_,_t_) in treated (_T_) vs untreated


(_U_) data, we first fit a linear regression curve to the untreated data, $$S_{c,t}^{U} \sim f({\xi }_{c,t}^{U},\hat{\beta}^{U})=\hat{\beta_{0}^{U}}+\hat{\beta_{1}^{U}} * \xi_{c,t}^{U}.$$


(30) We then computed the residuals between the treated data and this line of best fit, $${S}_{c,t}^{U-T}=(\hat{{\beta }_{0}^{U}}+\hat{{\beta }_{1}^{U}} * {\xi }_{c,t}^{T})-{S}_{c,t}^{T}.$$


(31) We then computed a second linear regression curve to the residuals \({S}_{c,t}^{U-T} \sim f({\xi }_{c,t}^{T},\hat{\beta}^{U-T})\) and computed the p-value for a hypothesis test whose


null hypothesis is that the slope is zero, using Wald Test with t-distribution of the test statistic. Having a _p_ < 0.05 indicated that the slope of the treated and untreated lines are


significantly different. All clone × timepoint combinations with <10 G1/2-phase cells were excluded from such analysis. For each point, both _t_ and _t_ + 1 are either untreated or


on-treatment, thereby excluding points where _t_ is untreated and _t_ + 1 is treated. CELL CYCLE ANALYSIS OF SCRNA DATA When available, we validated PERT cell cycle distributions using the


cell cycle distributions estimated through scRNA sequencing. We determined the cell cycle phase of each scRNA cell using the Seurat CellCycleScoring() function73 which uses a set of S- and


G2M-phase markers derived from Tirosh et al.74. STATISTICAL TESTS When boxplots are presented in the figures, hinges represent the 25% and 75% quantiles and whiskers represent the ±1.5x


interquartile range. Statistical significance is tested using two-sided independent t-tests from scipy.stats unless otherwise noted. Bonferroni correction is implemented for all statistical


tests to limit false discovery. The number of stars is a shorthand for the adjusted p-value of a given statistical test (<10−4: ****,  <10−3: ***,  <10−2: **,  <0.05: *, ≥0.05:


ns). Shaded areas surrounding linear regression lines of best fit represent 95% confidence intervals obtained via boostrapping (n=1000 boostrap resamples). Unless otherwise noted, linear


regressions are annotated with Pearson correlation coefficients (_r_) and the p-value for a hypothesis test whose null hypothesis is that the slope is zero, using the two-sided Wald Test


with t-distribution of the test statistic. REPORTING SUMMARY Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. DATA


AVAILABILITY Source and supplementary data from this study are provided at the following zenodo page 12786373. Raw and pre-processed data from Funnell et al.61 can be found on zenodo page


6998936. Raw scWGS data from Laks et al. and Salehi et al.32,50 are available from the European Genome-Phenome under study IDs EGAS00001004448 and EGAS00001003190, respectively. Publicly


accessible and controlled access data generated and analyzed from MSK SPECTRUM HGSOC patients are documented in Synapse under SynID syn25569736. Raw DNA sequencing data is often subject to


restrictions in order to protect patient privacy. Specific restrictions and instructions to obtain access can be found on the data repository websites for each study listed above. All raw


RNA sequencing data and processed data is available for unrestricted use. CODE AVAILABILITY The following code repositories are publicly available and contain tutorials for installation and


use. Package containing PERT model and other tools for scRT analysis90: https://github.com/shahcompbio/scdna_replication_tools (v1.0.0). DLP+ single-cell whole genome sequencing pipeline:


https://github.com/shahcompbio/single_cell_pipeline. Data preprocessing, analysis scripts, and figure generation: https://github.com/shahcompbio/scdna_replication_paper REFERENCES *


Williams, G. H. & Stoeber, K. The cell cycle and cancer. _J. Pathol._ 226, 352–64 (2012). Article  CAS  PubMed  Google Scholar  * Gaillard, H., García-Muse, T. & Aguilera, A.


Replication stress and cancer. _Nat. Rev. Cancer_ 15, 276–289 (2015). Article  CAS  PubMed  Google Scholar  * Rivera-Mulia, J. C. et al. Allele-specific control of replication timing and


genome organization during development. _Genome Res_ 28, 800–811 (2018). Article  CAS  PubMed  PubMed Central  Google Scholar  * Garribba, L. et al. Short-term molecular consequences of


chromosome mis-segregation for genome stability. _Nat. Commun._ 14, 1353 (2023). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Shaikh, N. et al. Replication stress generates


distinctive landscapes of DNA copy number alterations and chromosome scale losses. _Genome Biol._ 23, 223 (2022). Article  CAS  PubMed  PubMed Central  Google Scholar  * Burrell, R. A. et


al. Replication stress links structural and numerical cancer chromosomal instability. _Nature_ 494, 492–496 (2013). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Nowell, P. C.


The clonal evolution of tumor cell populations. _Science_ 194, 23–8 (1976). Article  ADS  CAS  PubMed  Google Scholar  * Greaves, M. & Maley, C. C. Clonal evolution in cancer. _Nature_


481, 306–13 (2012). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Shah, S. P. et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers.


_Nature_ 486, 395–399 (2012). Article  ADS  CAS  PubMed  Google Scholar  * Hansen, R. S. et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing.


_Proc. Natl Acad. Sci. USA_ 107, 139–44 (2010). Article  ADS  CAS  PubMed  Google Scholar  * Guilbaud, G. et al. Evidence for sequential and increasing activation of replication origins


along replication timing gradients in the human genome. _PLoS Comput Biol._ 7, e1002322 (2011). Article  CAS  PubMed  PubMed Central  Google Scholar  * De, S. & Michor, F. DNA


replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. _Nat. Biotechnol._ 29, 1103–1108 (2011). Article  CAS  PubMed  PubMed Central  Google


Scholar  * Rivera-Mulia, J. C. et al. Dynamic changes in replication timing and gene expression during lineage specification of human pluripotent stem cells. _Genome Res._ 25, 1091–1103


(2015). Article  CAS  PubMed  PubMed Central  Google Scholar  * Rivera-Mulia, J. C. et al. Replication timing networks reveal a link between transcription regulatory circuits and replication


timing control. _Genome Res_ 29, 1415–1428 (2019). Article  CAS  PubMed  PubMed Central  Google Scholar  * Vouzas, A. E. & Gilbert, D. M. Mammalian DNA replication timing. _Cold Spring


Harbor Perspectives in Biol._ 13, http://cshperspectives.cshlp.org/content/13/7/a040162.abstract. (2021). * Donley, N. & Thayer, M. J. DNA replication timing, genome stability and


cancer: late and/or delayed DNA replication timing is associated with increased genomic instability. _Semin Cancer Biol._ 23, 80–9 (2013). Article  CAS  PubMed  PubMed Central  Google


Scholar  * Klein, K. N. et al. Replication timing maintains the global epigenetic state in human cells. _Science_ 372, 371–378 (2021). Article  ADS  CAS  PubMed  PubMed Central  Google


Scholar  * Salvadores, M. & Supek, F. Cell cycle gene alterations associate with a redistribution of mutation risk across chromosomal domains in human cancers. _Nat. Cancer_ 5, 330–346


(2024). Article  CAS  PubMed  Google Scholar  * Spielmann, M., Lupiáñez, D. G. & Mundlos, S. Structural variation in the 3D genome. _Nat. Rev. Genet._ 19, 453–467 (2018). Article  CAS 


PubMed  Google Scholar  * Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. _Nature_ 486, 346–52 (2012). Article  CAS  PubMed 


PubMed Central  Google Scholar  * Sheltzer, J. M., Torres, E. M., Dunham, M. J. & Amon, A. Transcriptional consequences of aneuploidy. _Proc. Natl Acad. Sci. USA_ 109, 12644–9 (2012).


Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Ryba, T. et al. Abnormal developmental control of replication-timing domains in pediatric acute lymphoblastic leukemia. _Genome


Res_ 22, 1833–44 (2012). Article  CAS  PubMed  PubMed Central  Google Scholar  * The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. _Nature_ 489,


57–74 (2012). * Hulke, M. L., Massey, D. J. & Koren, A. Genomic methods for measuring DNA replication dynamics. _Chromosome Res._ 28, 49–67 (2020). Article  CAS  PubMed  Google Scholar 


* Dileep, V. & Gilbert, D. M. Single-cell replication profiling to measure stochastic variation in mammalian replication timing. _Nat. Commun._ 9, 427 (2018). Article  ADS  PubMed 


PubMed Central  Google Scholar  * Takahashi, S. et al. Genome-wide stability of the DNA replication program in single mammalian cells. _Nat. Genet._ 51, 529–540 (2019). Article  CAS  PubMed


  Google Scholar  * Massey, D. J. & Koren, A. High-throughput analysis of single human cells reveals the complex nature of DNA replication timing control. _Nat. Commun._ 13, 2402 (2022).


Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Gnan, S. et al. Kronos scrt: a uniform framework for single-cell replication timing analysis. _Nat. Commun._ 13, 2329 (2022).


Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Du, Q. et al. Replication timing and epigenome remodelling are associated with the nature of chromosomal rearrangements in


cancer. _Nat. Commun._ 10, 416 (2019). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Sarni, D. et al. 3D genome organization contributes to genome instability at fragile


sites. _Nat. Commun._ 11, 3613 (2020). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. _Nature_


578, 112–121 (2020). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Salehi, S. et al. Clonal fitness inferred from time-series modelling of single-cell cancer genomes. _Nature_


595, 585–590 (2021). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Khan, K. H. et al. Longitudinal liquid biopsy and mathematical modeling of clonal evolution forecast time


to treatment failure in the prospect-c phase ii colorectal cancer clinical trial. _Cancer Discov._ 8, 1270–1285 (2018). Article  CAS  PubMed  PubMed Central  Google Scholar  * Tataru, P.,


Simonsen, M., Bataillon, T. & Hobolth, A. Statistical inference in the wright-fisher model using allele frequency data. _Syst. Biol._ 66, e30–e46 (2017). PubMed  Google Scholar  *


Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. _Cell_ 173, 371–385.e18 (2018). Article  PubMed  PubMed Central  Google Scholar  * Gerstung, M. et


al. The evolutionary history of 2,658 cancers. _Nature_ 578, 122–128 (2020). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Sunshine, A. B. et al. The fitness consequences of


aneuploidy are driven by condition-dependent gene effects. _PLoS Biol._ 13, e1002155 (2015). Article  PubMed  PubMed Central  Google Scholar  * Ben-David, U. & Amon, A. Context is


everything: aneuploidy in cancer. _Nat. Rev. Genet_ 21, 44–62 (2020). Article  CAS  PubMed  Google Scholar  * Vasudevan, A. et al. Single-chromosomal gains can function as metastasis


suppressors and promoters in colon cancer. _Developmental Cell_ 52, 413–428.e6 (2020). Article  PubMed  PubMed Central  Google Scholar  * Martincorena, I. et al. Universal patterns of


selection in cancer and somatic tissues. _Cell_ 171, 1029–1041.e21 (2017). Article  PubMed  PubMed Central  Google Scholar  * Vázquez-García, I. et al. Ovarian cancer mutational processes


drive site-specific immune evasion. _Nature_ 612, 778–786 (2022). Article  ADS  PubMed  PubMed Central  Google Scholar  * Tachibana, K. E., Gonzalez, M. A. & Coleman, N.


Cell-cycle-dependent regulation of DNA replication and its relevance to cancer pathology. _J. Pathol._ 205, 123–9 (2005). Article  CAS  PubMed  Google Scholar  * Yerushalmi, R., Woods, R.,


Ravdin, P. M., Hayes, M. M. & Gelmon, K. A. Ki67 in breast cancer: prognostic and predictive potential. _Lancet Oncol._ 11, 174–83 (2010). Article  CAS  PubMed  Google Scholar  *


Bogunovic, D. et al. Immune profile and mitotic index of metastatic melanoma lesions enhance clinical staging in predicting patient survival. _Proc. Natl Acad. Sci. USA_ 106, 20429 (2009).


Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Ibrahim, A., Lashen, A., Toss, M., Mihai, R. & Rakha, E. Assessment of mitotic activity in breast cancer: revisited in the


digital pathology era. _J. Clin. Pathol._ 75, 365 (2022). Article  CAS  PubMed  Google Scholar  * van Diest, P. J., Brugal, G. & Baak, J. P. Proliferation markers in tumours:


interpretation and clinical value. _J. Clin. Pathol._ 51, 716–24 (1998). Article  PubMed  PubMed Central  Google Scholar  * Solius, G. M., Maltsev, D. I., Belousov, V. V. & Podgorny, O.


V. Recent advances in nucleotide analogue-based techniques for tracking dividing stem cells: An overview. _J. Biol. Chem._ 297, 101345 (2021). Article  CAS  PubMed  PubMed Central  Google


Scholar  * Taylor, J. H., Woods, P. S. & Hughes, W. L. The organization and duplication of chromosomes as revealed by autoradiographic studies using tritium-labeled thymidinee. _Proc.


Natl Acad. Sci. USA_ 43, 122–8 (1957). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Zahn, H. et al. Scalable whole-genome single-cell library preparation without


preamplification. _Nat. Methods_ 14, 167–173 (2017). Article  CAS  PubMed  Google Scholar  * Laks, E. et al. Clonal decomposition and DNA replication states defined by scaled single-cell


genome sequencing. _Cell_ 179, 1207–1221.e22 (2019). Article  PubMed  PubMed Central  Google Scholar  * Andor, N. et al. Joint single cell DNA-seq and RNA-seq of gastric cancer cell lines


reveals rules of in vitro evolution. _NAR Genom. Bioinform_ 2, lqaa016 (2020). Article  PubMed  PubMed Central  Google Scholar  * Velazquez-Villarreal, E. I. et al. Single-cell sequencing of


genomic DNA resolves sub-clonal heterogeneity in a melanoma cell line. _Commun. Biol._ 3, 318 (2020). Article  CAS  PubMed  PubMed Central  Google Scholar  * Navin, N. et al. Tumour


evolution inferred by single-cell sequencing. _Nature_ 472, 90–4 (2011). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Minussi, D. C. et al. Breast tumours maintain a


reservoir of subclonal diversity during expansion. _Nature_ 592, 302–308 (2021). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Schneider, M. P. et al. scAbsolute: measuring


single-cell ploidy and replication status. _Genome Biol._ 25, 62 (2024). Article  CAS  PubMed  PubMed Central  Google Scholar  * Thurman, R. E., Day, N., Noble, W. S. &


Stamatoyannopoulos, J. A. Identification of higher-order functional domains in the human encode regions. _Genome Res_ 17, 917–27 (2007). Article  CAS  PubMed  PubMed Central  Google Scholar


  * Edwards, M. M. et al. Delayed dna replication in haploid human embryonic stem cells. _Genome Res_ 31, 2155–2169 (2021). Article  PubMed  PubMed Central  Google Scholar  * Bingham, E. et


al. Pyro: deep universal probabilistic programming. _J. Mach. Learn. Res._ 20, 973–978 (2019). Google Scholar  * Gindin, Y., Meltzer, P. S. & Bilke, S. Replicon: a software to accurately


predict DNA replication timing in metazoan cells. _Front Genet_ 5, 378 (2014). Article  PubMed  PubMed Central  Google Scholar  * Grolmusz, V. K. et al. Fluorescence activated cell sorting


followed by small RNA sequencing reveals stable microRNA expression during cell cycle progression. _BMC Genomics_ 17, 412 (2016). Article  PubMed  PubMed Central  Google Scholar  * Funnell,


T. et al. Single-cell genomic variation induced by mutational processes in cancer. _Nature_ 612, 106–115 (2022). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Schukken, K. M.


& Sheltzer, J. M. Extensive protein dosage compensation in aneuploid human cancers. _Genome Res_ 32, 1254–1270 (2022). Article  PubMed  PubMed Central  Google Scholar  * Stingele, S. et


al. Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. _Mol. Syst. Biol._ 8, 608 (2012). Article  PubMed  PubMed Central  Google Scholar


  * Torres, E. M. et al. Effects of aneuploidy on cellular physiology and cell division in haploid yeast. _Science_ 317, 916–24 (2007). Article  ADS  CAS  PubMed  Google Scholar  *


Bhattacharya, A. et al. Transcriptional effects of copy number alterations in a large set of human cancers. _Nat. Commun._ 11, 715 (2020). Article  ADS  CAS  PubMed  PubMed Central  Google


Scholar  * Funnell, T. et al. Integrated structural variation and point mutation signatures in cancer genomes using correlated topic models. _PLOS Computational Biol._ 15, 1–24 (2019).


Article  Google Scholar  * Feng, W. & Jasin, M. BRCA2 suppresses replication stress-induced mitotic and G1 abnormalities through homologous recombination. _Nat. Commun._ 8, 525 (2017).


Article  ADS  PubMed  PubMed Central  Google Scholar  * Lambuta, R. A. et al. Whole-genome doubling drives oncogenic loss of chromatin segregation. _Nature_ 615, 925–933 (2023). Article  ADS


  CAS  PubMed  PubMed Central  Google Scholar  * Gemble, S. et al. Genetic instability from a single S phase after whole-genome duplication. _Nature_ 604, 146–151 (2022). Article  ADS  CAS 


PubMed  PubMed Central  Google Scholar  * Willis, N. A. et al. Mechanism of tandem duplication formation in BRCA1-mutant cells. _Nature_ 551, 590–595 (2017). Article  ADS  CAS  PubMed 


PubMed Central  Google Scholar  * Pageau, G. J., Hall, L. L., Ganesan, S., Livingston, D. M. & Lawrence, J. B. The disappearing Barr body in breast and ovarian cancers. _Nat. Rev.


Cancer_ 7, 628–633 (2007). Article  CAS  PubMed  Google Scholar  * Chaligné, R. et al. The inactive X chromosome is epigenetically unstable and transcriptionally labile in breast cancer.


_Genome Res._ 25, 488–503 (2015). Article  PubMed  PubMed Central  Google Scholar  * Hao, Y. et al. Integrated analysis of multimodal single-cell data. _Cell_ 184, 3573–3587.e29 (2021).


Article  PubMed  PubMed Central  Google Scholar  * Kowalczyk, M. S. et al. Single-cell rna-seq reveals changes in cell cycle and differentiation programs upon aging of hematopoietic stem


cells. _Genome Res_ 25, 1860–72 (2015). Article  CAS  PubMed  PubMed Central  Google Scholar  * Shi, H. et al. Allele-specific transcriptional effects of subclonal copy number alterations


enable genotype-phenotype mapping in cancer cells. _Nat. Commun._ 15, 2482 (2024). Article  ADS  CAS  PubMed  PubMed Central  Google Scholar  * Donaldson, K. L., Goolsby, G. L. & Wahl,


A. F. Cytotoxicity of the anticancer agents cisplatin and taxol during cell proliferation and the cell cycle. _Int. J. Cancer_ 57, 847–855 (1994). Article  CAS  PubMed  Google Scholar  *


Bielski, C. M. et al. Genome doubling shapes the evolution and prognosis of advanced cancers. _Nat. Genet._ 50, 1189–1195 (2018). Article  CAS  PubMed  PubMed Central  Google Scholar  *


McPherson, A. et al. Ongoing genome doubling promotes evolvability and immune dysregulation in ovarian cancer. _bioRxiv_


http://biorxiv.org/content/early/2024/07/15/2024.07.11.602772.abstract (2024). * Bertolin, A. P., Hoffmann, J. S. & Gottifredi, V. Under-replicated DNA: The byproduct of large


genomes?_Cancers (Basel)_12 (2020). * Moreno, A. et al. Unreplicated DNA remaining from unperturbed S phases passes through mitosis for resolution in daughter cells. _Proc. Natl Acad. Sci.


USA_ 113, E5757–64 (2016). Article  CAS  PubMed  PubMed Central  Google Scholar  * Matthews, H. K., Bertoli, C. & de Bruin, R. A. M. Cell cycle control in cancer. _Nat. Rev. Mol. Cell


Biol._ 23, 74–88 (2022). Article  CAS  PubMed  Google Scholar  * Achom, M. et al. A genetic basis for cancer sex differences revealed in Xp11 translocation renal cell carcinoma. _bioRxiv_


https://www.biorxiv.org/content/early/2023/08/06/2023.08.04.552029 (2023). * López, S. et al. Interplay between whole-genome doubling and the accumulation of deleterious alterations in


cancer evolution. _Nat. Genet._ 52, 283–293 (2020). Article  PubMed  PubMed Central  Google Scholar  * Ha, G. et al. Integrative analysis of genome-wide loss of heterozygosity and


monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer. _Genome Res_ 22, 1995–2007 (2012). Article  CAS  PubMed  PubMed Central  Google


Scholar  * Salehi, S. et al. Cancer phylogenetic tree inference at scale from 1000s of single cell genomes. _Peer Community J._ 3,


https://peercommunityjournal.org/articles/10.24072/pcjournal.292/ (2023). * Kaufmann, T. L. et al. MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution. _Genome


Biol._ 23, 241 (2022). Article  CAS  PubMed  PubMed Central  Google Scholar  * Ranganath, R., Gerrish, S. & Blei, D.Kaski, S. & Corander, J. (eds) Black Box Variational Inference.


(eds Kaski, S. & Corander, J.) _Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics_, Vol. 33 of _Proceedings of Machine Learning Research_,


814–822 (PMLR, Reykjavik, Iceland, 2014). https://proceedings.mlr.press/v33/ranganath14.html. * Medina-Martínez, J. S. et al. Isabl platform, a digital biobank for processing multimodal


patient data. _BMC Bioinforma._ 21, 549 (2020). Article  Google Scholar  * Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. _Nat.


Methods_ 9, 179–181 (2012). Article  CAS  Google Scholar  * Weiner, A. & McPherson, A. Inferring replication timing and proliferation dynamics from single-cell dna sequencing data.


_Github_ shahcompbio/scdna_replication_tools https://doi.org/10.5281/zenodo.13126319 (2024). Download references ACKNOWLEDGEMENTS This project was generously supported by the Cycle for


Survival, by the Marie-Josée and Henry R. Kravis Center for Molecular Oncology and the NCI Cancer Center Core grant P30-CA008748 supporting Memorial Sloan Kettering Cancer Center. S.P.S.


holds the Nicholls Biondi Chair in Computational Oncology and is a Susan G. Komen Scholar (#GC233085). This work was also funded in part by awards to S.P.S.: the Cancer Research UK Grand


Challenge Program (GC-243330), and an NIH RM1 award (RM1-HG011014). A.C.W. is supported by NCI Ruth L. Kirschstein National Research Service Award for Predoctoral Fellows F31-CA271673.


M.J.W. is supported by NCI Pathway to Independence award K99-CA256508. S.S. is supported by NCI Pathway to Independence award K99-CA277562 I.V.-G. is supported by a Mentored Investigator


Award from the Ovarian Cancer Research Alliance (650687). AUTHOR INFORMATION Author notes * These authors contributed equally: Sohrab P. Shah, Andrew McPherson. AUTHORS AND AFFILIATIONS *


Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA Adam C. Weiner, Marc J. Williams, Hongyu Shi, Ignacio


Vázquez-García, Sohrab Salehi, Nicole Rusk, Sohrab P. Shah & Andrew McPherson * Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY,


USA Adam C. Weiner * Gerstner Sloan Kettering Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY, USA Hongyu Shi * Department of Molecular


Oncology, British Columbia Cancer, Vancouver, BC, Canada Samuel Aparicio * Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada Samuel


Aparicio Authors * Adam C. Weiner View author publications You can also search for this author inPubMed Google Scholar * Marc J. Williams View author publications You can also search for


this author inPubMed Google Scholar * Hongyu Shi View author publications You can also search for this author inPubMed Google Scholar * Ignacio Vázquez-García View author publications You


can also search for this author inPubMed Google Scholar * Sohrab Salehi View author publications You can also search for this author inPubMed Google Scholar * Nicole Rusk View author


publications You can also search for this author inPubMed Google Scholar * Samuel Aparicio View author publications You can also search for this author inPubMed Google Scholar * Sohrab P.


Shah View author publications You can also search for this author inPubMed Google Scholar * Andrew McPherson View author publications You can also search for this author inPubMed Google


Scholar CONTRIBUTIONS A.C.W., S.P.S., and A.M. conceived the study and wrote the manuscript. A.C.W. developed the machine learning model and led computational analysis. M.J.W., H.S., I.V.G.,


and S.S. assisted with data acquisition, analysis, or interpretation. N.R. assisted with editing the manuscript. S.A. preserved the original data on which the paper is based and minimized


obstacles to sharing materials. S.P.S. and A.M. jointly supervised this work. CORRESPONDING AUTHORS Correspondence to Sohrab P. Shah or Andrew McPherson. ETHICS DECLARATIONS COMPETING


INTERESTS S.P.S. and S.A. are shareholders of Imagia Canexia Health Inc. S.P.S. has an advisory role to AstraZeneca Inc. H.S. is currently employed by Benchling. The remaining authors


declare no competing interests. All relationships are outside the scope of this work. PEER REVIEW PEER REVIEW INFORMATION _Nature Communications_ thanks Kamila Naxerova and the other,


anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with


regard to jurisdictional claims in published maps and institutional affiliations. SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION PEER REVIEW FILE DESCRIPTION OF ADDITIONAL SUPPLEMENTARY


FILES SUPPLEMENTARY DATA 1 SUPPLEMENTARY DATA 2 REPORTING SUMMARY RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons


Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give


appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission


under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons


licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by


statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


http://creativecommons.org/licenses/by-nc-nd/4.0/. Reprints and permissions ABOUT THIS ARTICLE CITE THIS ARTICLE Weiner, A.C., Williams, M.J., Shi, H. _et al._ Inferring replication timing


and proliferation dynamics from single-cell DNA sequencing data. _Nat Commun_ 15, 8512 (2024). https://doi.org/10.1038/s41467-024-52544-7 Download citation * Received: 29 January 2024 *


Accepted: 11 September 2024 * Published: 01 October 2024 * DOI: https://doi.org/10.1038/s41467-024-52544-7 SHARE THIS ARTICLE Anyone you share the following link with will be able to read


this content: Get shareable link Sorry, a shareable link is not currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt content-sharing initiative