Most proteins assemble into multisubunit complexes1. The persistence of these complexes across evolutionary time is usually explained as the result of natural selection for functional
properties that depend on multimerization, such as intersubunit allostery or the capacity to do mechanical work2. In many complexes, however, multimerization does not enable any known
function3. An alternative explanation is that multimers could become entrenched if substitutions accumulate that are neutral in multimers but deleterious in monomers; purifying selection
would then prevent reversion to the unassembled form, even if assembly per se does not enhance biological function3,4,5,6,7. Here we show that a hydrophobic mutational ratchet systematically
entrenches molecular complexes. By applying ancestral protein reconstruction and biochemical assays to the evolution of steroid hormone receptors, we show that an ancient hydrophobic
interface, conserved for hundreds of millions of years, is entrenched because exposure of this interface to solvent reduces protein stability and causes aggregation, even though the
interface makes no detectable contribution to function. Using structural bioinformatics, we show that a universal mutational propensity drives sites that are buried in multimeric interfaces
to accumulate hydrophobic substitutions to levels that are not tolerated in monomers. In a database of hundreds of families of multimers, most show signatures of long-term hydrophobic
entrenchment. It is therefore likely that many protein complexes persist because a simple ratchet-like mechanism entrenches them across evolutionary time, even when they are functionally
gratuitous.
Data have been deposited in the Open Science Framework (https://osf.io/) under accession GTJ86, including alignment, phylogeny, sequences and posterior probability of ancestral
reconstructions; list of PDB identifiers for coordinates of dimers and monomers in our structural database; and molecular dynamics trajectories.
Scripts and code for structural bioinformatics analysis have been deposited at GitHub (https://github.com/JoeThorntonLab).
We thank J. Bridgham for cell culture training and advice, A. Pillai for assistance with experiments, and members of the Thornton Laboratory for comments. Molecular dynamics computations
were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Projects SNIC 2019/8-36 and SNIC 2019/3-189. Supported
by a Chicago Fellowship (G.K.A.H.), NIH R01GM131128 (J.W.T.) and R01GM121931 (J.W.T.).
Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
Georg K. A. Hochberg, Brian P. H. Metzger & Joseph W. Thornton
Department of Chemistry, Texas A&M University, College Station, TX, USA
Department of Chemistry – BMC, Uppsala University, Uppsala, Sweden
Department of Human Genetics, University of Chicago, Chicago, IL, USA
G.K.A.H. and J.W.T. conceived the project and oversaw the manuscript writing. G.K.A.H. performed phylogenetics, ancestral sequence reconstruction, protein purification, cell culture, and
biophysical experiments. Y.L. and A.L. performed and interpreted native MS experiments. E.G.M. performed and analysed molecular dynamics simulations. G.K.A.H. and B.P.H.M. designed
bioinformatic analyses, which G.K.A.H. performed. G.K.A.H. and J.W.T. interpreted all data. All authors contributed to manuscript writing.
Peer review information Nature thanks Douglas Theobald, Claus Wilke and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, Phylogeny of steroid receptors and related nuclear receptor family members. AR, androgen receptors; PR, progesterone receptors; GR, glucocorticoid receptors; MR, mineralocorticoid
receptors. Sequence identifiers are in brackets. This topology corresponds to the ‘Chordate tree’ in Extended Data Fig. 2. Scale bar, expected substitutions per site. b, Sequence alignment
of the human ER and GR LBDs, with the MAP sequences of AncSR1 and AncSR2. Green, C-terminal extension. Most ERs contain additional sequence on the C terminus that is unalignable, even among
ERs.
a,b, Distribution of posterior probabilities (PP) of the maximum a posteriori (MAP) state at each site in reconstructed LBDs (top) and DBDs (bottom) of AncSR1 (a) and AncSR2 (b). c,
Stoichiometry of purified alternative LBD reconstructions (AltAll) of AncSR1 (pink) and AncSR2 (green), as measured by SEC-MALS. AncSR1 is a dimer, AncSR2 a monomer. AltAll reconstructions
contain the MAP state at unambiguously reconstructed sites and the state with the next highest PP at all ambiguously reconstructed sites. d, The ‘chordate’ phylogeny (top) was used for
primary ancestral reconstructions; it places the gene duplication yielding ERs and kSRs within the chordates. An alternative, less parsimonious tree (bottom; ‘Bilaterian’ because it places the
duplication deep in the Bilateria) has very slightly higher likelihood but requires two additional gene losses (dashed lines). The Bilaterian topology was used for alternative
reconstructions (AltPhy). Node labels, approximate likelihood ratio test statistic and transfer bootstrap value. lnl, log-likelihood. e, Distribution of per-site posterior probabilities for
reconstructed LBDs on the Bilaterian topology for AncSR1 (top) and AncSR2 (bottom). f, Stoichiometry of purified AltPhy versions of AncSR1 (pink) and AncSR2 (green) LBDs, as measured by
SEC-MALS. The average molar mass and elution time of AltPhy-AncSR1-LBD are between those of a dimer and a monomer, indicating that it is a weaker, fast-exchanging dimer compared with other
AncSR1-LBD versions.
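The AltAll rule described in panel c can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `map_and_altall`, the input layout (one dictionary of state-to-PP per site) and the ambiguity cutoff of 0.2 on the second-best state are all assumptions, since the caption does not state how "ambiguously reconstructed" was defined.

```python
from typing import Dict, List, Tuple


def map_and_altall(
    posteriors: List[Dict[str, float]],
    ambiguity_threshold: float = 0.2,  # assumed cutoff, not given in the caption
) -> Tuple[str, str]:
    """Return (MAP sequence, AltAll sequence) from per-site posterior probabilities.

    A site is treated as ambiguous when its second-best state has a posterior
    probability above `ambiguity_threshold` (hypothetical criterion).
    """
    map_seq, alt_seq = [], []
    for site in posteriors:
        # Rank candidate states at this site from most to least probable.
        ranked = sorted(site.items(), key=lambda kv: kv[1], reverse=True)
        best_state = ranked[0][0]
        map_seq.append(best_state)
        if len(ranked) > 1 and ranked[1][1] > ambiguity_threshold:
            # Ambiguous site: AltAll takes the state with the next highest PP.
            alt_seq.append(ranked[1][0])
        else:
            # Unambiguous site: AltAll keeps the MAP state.
            alt_seq.append(best_state)
    return "".join(map_seq), "".join(alt_seq)


# Toy three-site example with made-up posterior probabilities.
example = [
    {"L": 0.98, "M": 0.02},
    {"A": 0.55, "S": 0.40, "T": 0.05},
    {"E": 1.0},
]
map_seq, altall_seq = map_and_altall(example)
print(map_seq, altall_seq)
```

In this toy input only the second site is ambiguous, so the two sequences differ at that position alone.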
a, Activation of AncSR1 from 40 ng ERE response element plasmid as a function of the AncSR1 plasmid concentration. Grey bar, concentration at which assays in Fig. 2f were performed. b, Molar
fraction in the dimeric form measured by nMS as a function of LBD concentration for AncSR1-LBD (purple) and dimerization-interface mutants SR1-LBD(+3) (black) and SR1-LBD(L184E) (grey).
Dissociation constant (Kd) estimated by nonlinear regression is indicated next to each curve. c, Dimeric fraction as a function of LBD concentration for AncSR1-LBD (purple) and
activation-helix mutant SR1-LBD(L126Q) (grey), which affects activation but not dimerization.
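A Kd estimate of the kind reported in panels b and c can be illustrated with a short fit. The sketch below assumes a simple two-state monomer-dimer equilibrium (2M ⇌ D, Kd = [M]²/[D]) and defines the dimeric fraction as the fraction of subunits found in dimers; neither the model nor the fitting procedure is specified in the caption, and the golden-section search on log10(Kd) stands in for whatever nonlinear regression routine was actually used. All function names and the synthetic data are hypothetical.

```python
import math


def dimer_fraction(total_conc: float, kd: float) -> float:
    """Fraction of subunits in dimers for 2M <-> D with Kd = [M]^2 / [D].

    Mass balance: total_conc = [M] + 2[D], which gives a quadratic in [M].
    """
    monomer = (-kd + math.sqrt(kd * kd + 8.0 * kd * total_conc)) / 4.0
    return 1.0 - monomer / total_conc


def fit_kd(concs, fractions, lo=1e-3, hi=1e3):
    """Least-squares Kd estimate via golden-section search over log10(Kd).

    `lo` and `hi` bound the search and share the concentration units of
    `concs` (assumed bounds; the search assumes a single minimum).
    """
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((dimer_fraction(c, kd) - f) ** 2 for c, f in zip(concs, fractions))

    a, b = math.log10(lo), math.log10(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)


# Synthetic, noise-free titration generated with Kd = 5 (arbitrary units).
concs = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
fractions = [dimer_fraction(c, 5.0) for c in concs]
kd_est = fit_kd(concs, fractions)
print(round(kd_est, 3))
```

Under this model the dimeric fraction equals one half when the total subunit concentration equals Kd, which gives a quick visual check of a fitted curve.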
a, SEC of AncSR2 LBD (top) and mutants that delete the CTE (ΔCTE) or contain point mutations that impair CTE-LBD interactions (bottom), when fused to MBP. The mutants elute in the same
fraction as AncSR2, demonstrating that they are monomeric and that re-exposing the patch does not re-establish dimerization. b, TEV cleavage of AncSR2 mutants in the absence (left) and
presence (right) of 2% Triton X-100. The positions of bands corresponding to the uncleaved construct, cleaved MBP, cleaved LBD, and TEV protease are indicated. This experiment was performed
twice, with similar results. See Supplementary Fig. 1 for uncropped gels. c, Average root mean square deviation (r.m.s.d.) from replicate 2-μs molecular dynamics simulations of AncSR2-LBD
(WT) and ΔCTE mutant. The average Cα r.m.s.d. in pairwise comparisons of all simulations is shown as a heatmap. d, SEC-MALS trace of AncSR1-LBD fused to the CTE of AncSR2-LBD. The LBD is
still dimeric.
a, Difference between the fraction of residues that are hydrophobic in dimer interfaces versus that on solvent-exposed surfaces of the same proteins. The histogram shows the distribution of
this difference across every protein in our structural database. b, Fraction of hydrophobic residues in dimer interfaces as a function of the number of interface residues. The variation in
the fraction is caused mostly by very small interfaces. c, Expected equilibrium fraction of hydrophobic amino acids from mutation alone. Black: expectation based on GC content and the
genetic code. Red dots and lines: mean and standard deviation of the hydrophobic fraction of residues observed in 200 replicate simulations using mutational spectra from mutation
accumulation experiments (Fig. 4b), plotted against GC content of the organism tested. d, GC content of organisms represented by proteins in our database.
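The "expectation based on GC content and the genetic code" in panel c can be approximated with a simple calculation. This sketch assumes that codons are built from independent bases whose frequencies depend only on GC content (G = C = GC/2, A = T = (1 − GC)/2), that stop codons are excluded, and that the hydrophobic class is A, V, I, L, M, F and W. The caption does not define any of these choices, so they are illustrative assumptions rather than the authors' definitions.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed hydrophobic class; the source does not list its definition.
HYDROPHOBIC = frozenset("AVILMFW")


def expected_hydrophobic_fraction(gc: float, hydrophobic=HYDROPHOBIC) -> float:
    """Expected hydrophobic fraction among sense codons at a given GC content."""
    base_freq = {
        "G": gc / 2.0,
        "C": gc / 2.0,
        "A": (1.0 - gc) / 2.0,
        "T": (1.0 - gc) / 2.0,
    }
    total = 0.0
    hydro = 0.0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons and renormalize over sense codons
        weight = base_freq[codon[0]] * base_freq[codon[1]] * base_freq[codon[2]]
        total += weight
        if aa in hydrophobic:
            hydro += weight
    return hydro / total


for gc in (0.3, 0.5, 0.7):
    print(gc, round(expected_hydrophobic_fraction(gc), 3))
```

With equal base frequencies the result reduces to the share of sense codons that encode the chosen hydrophobic residues (21 of 61 for this set).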
Supplemental Data 1: Raw gel images. Uncropped gels for data presented in Extended Data Fig. 4b. Boxes are drawn around lanes that were used for the figure.
Supplemental Data 2: Scaled Q matrices based on mutation accumulation experiments. Row indicates the initial state, column the mutated state. a, M. musculus. b, S. cerevisiae. c, E. coli. d, P. aeruginosa.