Disentangling the genetic basis of rhizosphere microbiome assembly in tomato

Play all audios:

ABSTRACT Microbiomes play a pivotal role in plant growth and health, but the genetic factors involved in microbiome assembly remain largely elusive. Here, we map the molecular features of

the rhizosphere microbiome as quantitative traits of a diverse hybrid population of wild and domesticated tomato. Gene content analysis of prioritized tomato quantitative trait loci suggests

a genetic basis for differential recruitment of various rhizobacterial lineages, including a _Streptomyces_-associated 6.31 Mbp region harboring tomato domestication sweeps and encoding,

among others, the iron regulator FIT and the water channel aquaporin SlTIP2.3. Within metagenome-assembled genomes of root-associated _Streptomyces_ and _Cellvibrio_, we identify bacterial

genes involved in metabolism of plant polysaccharides, iron, sulfur, trehalose, and vitamins, whose genetic variation associates with specific tomato QTLs. By integrating ‘microbiomics’ and

quantitative plant genetics, we pinpoint putative plant and reciprocal rhizobacterial traits underlying microbiome assembly, thereby providing a first step towards plant-microbiome breeding

programs. SIMILAR CONTENT BEING VIEWED BY OTHERS IDENTIFYING PLANT GENES SHAPING MICROBIOTA COMPOSITION IN THE BARLEY RHIZOSPHERE Article Open access 16 June 2022 AN INTEGRATED

TRANSCRIPTOME, METABOLOME, AND MICROBIOME DATASET OF _POPULUS_ UNDER NUTRIENT-POOR CONDITIONS Article Open access 30 April 2025 HERITABLE MICROBIOME VARIATION IS CORRELATED WITH SOURCE

ENVIRONMENT IN LOCALLY ADAPTED MAIZE VARIETIES Article 21 March 2024 INTRODUCTION Root and shoot microbiomes are fundamental to plant growth and plant tolerance to (a)biotic stress factors.

The outcome of these beneficial interactions is the emergence of specific microbiome-associated phenotypes (MAPs)1, such as drought resilience2, disease resistance3, development4, and

heterosis (i.e., hybrid vigor)5. The microbes inhabiting the surface or internal tissues of plant roots are selectively nurtured by diverse plant-derived compounds in the form of primary and

secondary metabolites6,7. Microbes reciprocate by supporting plant growth and producing metabolites that mediate processes such as nutrient acquisition and pathogen suppression8,9.

Developing a blueprint of the genetic architecture for this ‘chemical dialog’ and how these interactions lead to specific MAPs is one of the key focal points in current plant microbiome

research. The promise is that these genomic and chemical blueprints can be integrated into crop breeding programs for a new generation of ‘microbiome-assisted’ crops that can rely, at least

in part, on specific members of the microbiome for stress protection, enhanced growth, and higher yields10. Selective breeding for yield-related traits has left a considerable impact on the

taxonomic and functional composition of modern crop microbiomes11,12. Wild plant relatives represent a ‘living library’ of diverse genetic traits that may have been lost during

domestication13. For example, recombinant inbred lines (RILs) of crosses between wild tomato relatives and modern tomato cultivars have been used to identify genetic loci controlling

important agronomic traits, including tolerance to abiotic14 and biotic stress15, as well as nutritional quality and flavor profiles16. To date, microbiome traits are not yet considered for

breeding purposes, except for specific quantitative MAPs such as the number of nodules in legume-rhizobia symbioses17. However, technological advances in sequencing now make it feasible to

treat microbiomes as quantitative traits for selection. Quantitative approaches to map the microbiome as a phenotype have been adopted to investigate the phyllosphere microbiome and,

recently, for the _Arabidopsis_ and sorghum rhizosphere microbiomes18,19. However, actualizing microbiome features into breeding programs at a scale for crop improvement has not yet been

realized. In fact, for most plant species, investigations leveraging diverse plant populations to map microbiome-associated quantitative trait loci (QTL) are still in their infancy18,19,20.

In these recent studies, the microbiomes were characterized by amplicon sequencing to detect loci involved in alpha and beta diversity as well as individual OTU abundances21. These studies

provide strong evidence that microbiome recruitment has a genetic component, but the functional nature of the corresponding plant–microbe interactions cannot be reliably elucidated from

amplicon data. Hence, functional genomic features of the microbiome, as well as intraspecific diversity within microbial species, have not yet been taken into account in QTL analyses22.

Here, we use both amplicon and shotgun metagenome sequencing to generate taxonomic as well as functional microbiome features as quantitative traits. Using an extensive RIL population of a

cross between modern _Solanum lycopersicum_ var. Moneymaker and wild _Solanum pimpinellifolium_23, we identify reciprocal associations between specific plant and microbiome traits and infer

putative mechanisms for rhizosphere microbiome assembly. Using the modern allele as a reference, we find QTLs for numerous taxonomic and metagenomic features of the microbiome with both

positive and negative effects. We observe more positive effects related to increases in microbiome feature abundance for the modern reference allele compared to the wild reference allele,

suggesting that domestication has had a significant impact on rhizosphere microbiome assembly. We identify plant traits related to growth, stress, amino acid metabolism, iron and water

acquisition, hormonal responses, and terpene biosynthesis, whereas the microbial traits we identify are related to the metabolism of plant cell wall polysaccharides, vitamins, sulfur, and

iron. Furthermore, we show that amplicon-based approaches allow detection of QTLs for rarer microbial taxa, whereas shotgun metagenomics allowed mapping to smaller and thus more defined

plant genomic regions. Together, these results demonstrate the power of an integrated approach to disentangle and prioritize specific genomic regions and genes in both plants and microbes

associated with microbiome assembly. RESULTS BASELINE ANALYSES OF THE TOMATO RECOMBINANT INBRED LINE POPULATION Prior to detailed metagenome analyses of the microbiome of the tomato RIL

population, we first investigated whether QTLs previously identified in the same RIL population under sterile in vitro conditions could be replicated in our experiment conducted under

greenhouse conditions with a commercial tomato greenhouse soil (Fig. 1a, b and Supplementary Data 1)24. We identified QTLs for shoot dry weight (SDW) coinciding with a QTL identified

previously on chromosome 924. Similarly, we identified QTLs for rhizosphere mass (RM), defined here as a the total mass of the roots with tightly adhering soil, which coincides with root

trait QTLs previously identified for lateral root number, fresh and dry shoot weight, lateral root density per branched zone and total root size (Fig. 1b)24. An analysis of variance (ANOVA)

yielded significant variation in SDW based on the additivity of alleles linked to SDW (zero, one, or two alleles) (F(2, 186) = 16.02, _p_ = 3.76 e–07) (Fig. 1c, d). A post hoc Tukey test

further demonstrated significant differences between all pairwise comparisons (_p_ < 0.05). For RM, an ANOVA yielded a significant difference (F(2, 186) = 16.02, _p_ = 3.76 e–07); a post

hoc Tukey test demonstrated a statistically significant difference only between the presence of either one or two alleles (_p_ < 0.05), but did not support additivity (_p_ = 0.15) (Fig.

1e, f). Collectively, our results confirm and extend earlier work conducted on the same tomato RIL population in vitro24, providing a solid basis for QTL mapping of taxonomic and genomic

features of the rhizosphere microbiome TAXONOMIC MICROBIOME FEATURES AS QUANTITATIVE TRAITS To investigate molecular features of the microbiome as quantitative traits, we conducted 16S rRNA

gene amplicon sequencing of 225 rhizosphere samples, including unplanted bulk soil, parental tomato genotypes, and all 96 RIL accessions in duplicate (BioProject ID PRJNA787039). We observed

separation between the microbiomes of rhizosphere and bulk soil, between the microbiomes of the two parental tomato genotypes, and the RIL accession microbiomes (Fig. 2a). To limit multiple

testing and to focus on common microbiome features with sufficient coverage across all accessions, we prioritized the rhizosphere-enriched amplicon sequence variants (ASVs) to those present

in 50% or more of the RIL accessions (Fig. 2b). A QTL analysis with these prioritized ASVs was run with R/qtl225 using a high-density tomato genotype map26, harvest date, post-harvest total

bulk soil mass, RM, number of leaves at harvest, and SDW as covariates. We identified 48 QTL peaks, across 45 distinct loci, significantly associated with 33 ASVs (Supplementary Data 6).

Our logarithm of the odds (LOD) thresholds for significance had been determined by pooled permutations from all ASVs to attain a genome-wide threshold of _P_ 0.05 (LOD 3.35) and _P_ 0.2 (LOD

2.64). The modern allele was set at reference, such that negative effects were relatively more associated with the wild allele and positive effects with the modern allele. Of the

significant QTLs, 16 were microbiome features less abundant compared to the reference allele, whereas 32 were microbiome features more abundant in presence of the modern reference allele.

The QTLs on chromosomes 11, 10, 8, and 2 were associated with increases in abundance in presence of the modern reference allele. In contrast, the sole QTL on chromosome 7 was negative

relative to the reference. All other chromosomes contained a mix of QTLs with positive and negative effects on ASV abundance relative to the reference allele (Fig. 3a). While many

rhizobacterial lineages were linked to a single QTL (14 out of 25 unique taxonomies), others were linked to two or more QTLs (7 and 4 taxa, respectively) (Fig. 3b). Of the lineages with

multiple QTLs, most were positive relative to the reference allele. One salient exception was _Methylophilaceae_, with a total of 9 QTLs that were both positive and negative relative to the

reference and distributed across chromosomes 3 (positive, x2), 4 (positive), 7 (negative), 11 (positive x2), and 12 (negative x3) (Fig. 3c). Another salient feature of the QTL analysis was

the hotspot for microbiome assembly identified on chromosome 11, including a significant linkage with ASVs from _Adhaeribacter, Caulobacter, Devosia_, Rhizobiaceae_, Massilia,_ and

_Methylophilaceae_ (Fig. 3c). In addition to individual ASVs, we investigated diversity metrics as quantitative traits using Shannon index and principal coordinate analysis (PCoA) with

Bray–Curtis dissimilarity. For each approach, we calculated diversity statistics first using all ASVs with a relative abundance greater than the effective samples size27, and second using

the rhizosphere-enriched ASVs present in 50% or more of the RIL accessions. For the Shannon index, LOD thresholds for significance were determined by permutations to attain a genome-wide

threshold of _P_ 0.05 (LOD 3.27) and _P_ 0.2 (LOD 2.63). Two QTLs were identified on chromosomes 1 and 3 (Supplementary Figs. 1 and 2) using all, and prioritized, ASVs to calculate Shannon

Diversity respectively. Of note, the QTL on chromosome 1 overlaps with the confidence interval of the _Cellvibrio_ QTL highlighted later in the results section. For the PCoA, the first two

components were mapped as quantitative traits. A LOD threshold for significance was determined by permutations to attain a genome-wide threshold of _P_ 0.05 (LOD 3.41) and _P_ 0.2 (LOD

2.71). A single QTL was identified on chromosome 6 in the same position as the QTL identified previously for _Streptomyces_ ASV 5 (Supplementary Fig. 3). Of further interest is that all

diversity metric QTLs were negative relative to the reference. Thus, while genetic changes during domestication may have made some ASVs more or less abundant, these genetic changes also

impacted overall diversity. Given the non-independence of sequencing-based microbiome features, we suggest caution in interpreting the results of using diversity metrics as microbiome

features. Effect size is an important factor when mapping the genetic architecture of quantitative traits. While some QTLs have large effect sizes, many small effect QTLs may explain a large

proportion of trait variation28. To date, there is little understanding of the distribution of the effect sizes of QTLs for microbiome features. Here we show that the absolute values of the

effect sizes of the 48 QTLs on ASV relative abundance ranged from 1.3 to 17%, with an average effect size of approximately 5%, comparable to the effects seen for SDW and RM (Fig. 1c, e).

The largest QTL effects were positive for an ASV in the genus _Qipengyuania_ (17%), and an ASV in _Edaphobaculum_ (10%). However, no statistical difference was found between the absolute

value of positive and negative effect sizes (_p_ = 0.78, two-tailed _t_-test). Furthermore, for those lineages with sufficient representation at the class level (Bacteroidia,

Alphaproteobacteria, and Gammaproteobacteria), there was no statistically significant difference between effect size (F(3, 16) = 0.072, _p_ = 0.974). However, an ANOVA on the positive effect

size at genus level demonstrated significant differences between lineages (F(3, 16) = 12.94, _p_ = 1.15 e−04). A post hoc Tukey test demonstrated QTLs for _Massilia_ with a larger positive

effect size than other lineages with sufficient sample size for comparison (Fig. 3d). Collectively, our amplicon analysis provided a broad picture, suggesting that the assembly of bacteria

in the tomato rhizosphere is a complex trait governed by a combination of multiple loci, some being ASV specific, some being pleiotropic for different ASVs, and with heterogenous effect

sizes on ASV abundance (Fig. 3d). While QTLs were identified with both positive and negative effects relative to the reference modern allele, the large number of positive effects suggests

domestication impacted rhizosphere microbiome assembly. FUNCTIONAL MICROBIOME FEATURES AS QUANTITATIVE TRAITS To understand the functional traits associated with rhizosphere microbiome

assembly, we generated shotgun metagenomes for the rhizosphere microbiome of each accession in the tomato RIL population (96 total), as well as six samples of the modern tomato parent, five

samples of the wild tomato parent and seven bulk soil samples (BioProject ID PRJNA789467). After pre-processing, a co-assembly strategy using all metagenomes was implemented (see

Supplementary Methods section 4.2.2 for more detail). Subsequently, bin and contig abundances were determined by read depth using CSS normalization, a computational method to adjust for

compositional bias27. QTL mapping was conducted for the rhizosphere-enriched contig and bin abundances. A PCoA analysis of the contigs demonstrated separation between the bulk soil and RIL

rhizosphere microbiomes (Supplementary Fig. 9). Binning was done using Metabat2 (version 2:2.15)29 and genomic quality of the output was evaluated by CheckM30 (Supplementary Data 7). The

bins and assembled contigs larger than 10 kb are publicly available (https://doi.org/10.5281/zenodo.6561541). All contigs of 10 kb and larger were taxonomically assigned using Kraken31

(Supplementary Data 8). With nearly 40 million contigs being assembled, the effects of multiple testing were reduced by prioritizing rhizosphere-enriched contigs (relative to the bulk soil)

which were larger than 10 kb and with an enrichment greater than 4-fold. After these stringent prioritization steps, 1249 contigs were remaining. The functional potential of these

rhizosphere-enriched contigs represented 8.3% of protein clusters identified in all contigs greater than 10 kb by MMseqs2 using a 50% protein identity threshold32. Approximately 25% of all

proteins were contained within these clusters, suggesting that a considerable fraction of functional diversity was maintained during the prioritization. Only bins with greater than 90%

completion and less than 5% contamination were mapped (33 out of 588 bins). As with the ASVs, harvest date, bulk soil mass, RM, number of leaves at harvest, and SDW were used as covariates

in QTL mapping. We identified 7 significant bin QTLs (LOD > 3.40, _P_ < 0.05) (Supplementary Data 9) including _Streptomyces_ bin 72 with a positive effect on tomato chromosomes 6 and

11. For the contigs, a total of 717 QTLs at 26 unique positions on tomato chromosomes 1, 4, 5, 6, 9, and 11 were identified (Supplementary Data 10), corresponding to 476 metagenomic contigs

from 10 different genera (LOD > 3.47, _P_ < 0.05). The largest number of contig QTLs were linked to the _Streptomyces_, _Cellvibrio,_ and _Sphingopyxis_ lineages (Fig. 4a). The

_Streptomyces_ contigs mapped to QTLs on tomato chromosomes 4 (46 contigs, negative), 6 (190 contigs, positive), and 11 (257 contigs, positive), with a subset of contigs mapping to two or

all three of these positions (Fig. 4b). These findings corroborate and expand upon the _Streptomyces_ QTL identified on chromosome 6 using our 16S rRNA gene amplicon data, as well as that of

the bin QTLs identified on chromosomes 6 and 11. The _Cellvibrio_ contigs mapped to chromosome 1 (42 contigs, negative) and chromosome 9 (94 contigs, negative), again corroborating the

findings from our 16S rRNA gene amplicon analysis described above. In contrast, the _Sphingopyxis_ QTLs identified on chromosome 5 (24 contigs, negative) and 9 (49 contigs, positive) did not

correspond to the QTLs identified on chromosomes 8 and 3 in the 16S rRNA gene amplicon analysis. Four contigs for _Devosia_ also corroborated the results of the 16S QTL analysis. The effect

sizes ranged from 9 to 21% and were significantly different (F(14, 702) = 530.9 _p_ < 2e−16) between QTL and lineages (Fig. 4c). As with the 16S rRNA amplicon analysis, some of the

highest LOD scores were for _Devosia_. Also, the effect size of the _Sphingopyxis_ contigs was large (±20% on average), above 15% for _Cellvibrio_, and approximately 10% for _Streptomyces_.

The average QTL region was 51.59 Mbps for the 16S rRNA gene amplicon sequences and 26.64 Mbps for the metagenomic contigs (two-sided _t_-test, _p_ = 3.32E−09) (Fig. 4e). A more striking

contrast was observed in the difference between the median size of amplicon and contig QTL regions which were 58.56 Mbp and only 6.47 Mbp, respectively. In summary, while many more taxa were

identified in the amplicon-based QTL analysis, the metagenome-based QTL analysis provided QTLs with much smaller confidence intervals (Fig. 4e). AMPLICON-BASED BULK SEGREGANT ANALYSIS OF

_STREPTOMYCES_ AND _CELLVIBRIO_ ABUNDANCE The two most abundant rhizosphere taxa with replicated patterns for amplicon and metagenome-based QTLs were _Streptomyces_ and _Cellvibrio_.

Therefore, we sought to provide additional independent support for these QTLs using a bulk segregant analysis of an independent population of parental and RIL genotypes (Supplementary Data

11). In particular, we tested the previously identified amplicon-based QTLs associated with higher _Cellvibrio_ abundance at markers 464 and 3142 on chromosomes 1 and 9, respectively with

higher _Streptomyces_ abundance at marker 2274 on chromosome 6 (Fig. 5). In each case, ANOVA showed a statistical difference between genotypes and bulk soil, respectively (F(4, 396) = 21.56,

_p_ = 4.16 e−16), (F(4, 396) = 18.43, _p_ = 6.68 e−14), (F(4, 396) = 8.423, _p_ = 1.57 e−06). A post hoc Tukey HSD test supported the conclusion that wild allele at markers 464 and 3142 on

chromosomes 1 and 9, respectively, are indeed associated with increased abundance _Cellvibrio_ (_p_ = 3.913 e−04, and _p_ = 0.08, respectively), while the modern allele at markers 2274 on

chromosome 6 was significantly associated with increased abundance of _Streptomyces_ (_p_ = 1.152 e−04). HOST GENETICS AND RHIZOSPHERE MICROBIOME ASSEMBLY A subset of 5 regions consistent

across both the amplicon and metagenome-based analyses were prioritized with an average size of 2.68 Mbps (Supplementary Data 12). These included positions on chromosome 1 (positions

87.36–90.49 Mbps), chromosome 9 (pos 62.03–63.32 Mbps), chromosome 5 (pos 61.54–63.38), chromosome 6 (pos 33.99–40.3 Mbps), and chromosome 11 (pos 53.06–53.89 Mbps). In total, 1359 genes

were identified in these regions. Potential candidate genes with root-specific transcriptional patterns, defined as a 4 fold increase in the roots compared to leaf samples, were further

prioritized using a publicly available RNA-seq dataset33. Based on this analysis, a subset of 192 root specific plant genes were identified (Supplementary Data 13). A total of 98 root

specific plant genes were linked to _Streptomyces_ on chromosome 6 (84 genes) and 11 (14 genes) (Fig. 6). Intriguingly, 61 of these genes were found in regions previously identified to be

subjected to selective sweeps, regions of fixed low genetic diversity, related to tomato domestication as well as to subsequent sweeps related to improvements in fruit quality34

(Supplementary Fig. 4). While it remains unclear whether the relationship between selective sweeps and changes in microbial feature abundance is causal or coincidental; here we reveal a

genomic signature that the domestication process impacted alleles involved in microbiome assembly. Two of the most salient genes in this list included genes with high transcription in the

roots; an aquaporin and a Fer-like iron deficiency-induced transcription factor (FIT). The aquaporin (SlTIP2.3) has the highest fold change of all tonoplast intrinsic proteins in tomato

roots as compared to all other organs32,33, while the FIT gene is a bHLH transcriptional regulator controlling iron homeostasis in tomato34,35. Other genes within this region on chromosome 6

include a glycine rich protein, a receptor-like kinase known to be upregulated during drought36, alcohol dehydrogenase, numerous phosphatases, expansins, ethylene-responsive transcription

factors, gibberellin receptors, aminocyclopropane-1-carboxylate oxidase (ACO), an enzyme involved in the last step of ethylene biosynthesis, and finally, alpha-humulene and

(-)-(E)-beta-caryophyllene, a known tomato terpene and signaling molecule in tomato37,38 and also acting as a volatile in microbiome assembly39. Root specific genes involved in carbohydrate,

protein, and amino metabolism were also identified, including trypsin-alpha amylase inhibitor, prolyl 4-hydroxylase, polygalacturonase, trehalose phosphatase, glycogenin, xyloglucan

fucosyltransferase, and a metallocarboxypeptidase inhibitor, spermidine synthase, acetolactate synthases, alanine aminotransferase, and an amino acid permease. On chromosome 11, a

ferrodoxin, an aluminum-activated malate transporter40, and a cluster of various acetyltransferases and a sulfotransferase were identified. An aluminum-activated malate transporter was also

identified in the QTL region on chromosome 6, which has been linked to increased malate accumulation in both fruit and roots41. A total of 57 root specific genes were identified in the QTL

regions on chromosome 1 and 9 linked to _Cellvibrio_. These include a cytochrome p450 involved in coumarin synthesis, numerous extensins, phosphatases, respiratory burst oxidase-like

protein, iron chelator nicotianamine synthase42,43, and on chromosome 11 phenazine biosynthesis. On chromosome 5, 37 root specific genes were identified including multiple peroxidases,

glutamine synthetase, rhamnogalacturonate lyase, pectinesterase, metacaspase, and trehalose-phosphatase. Furthermore, numerous ethylene responsive transcription factors and receptor-like

kinases were observed. The QTL on chromosome 1 contains genome-wide sweeps associated with the initial tomato domestication and subsequent improvements of fruit quality traits, suggesting

that one or both of these events were connected to or act as a ‘side effect’ on the decreased abundance of _Cellvibrio_ in the tomato rhizosphere. ILLUMINATING METAGENOMIC TRAITS IN

_CELLVIBRIO_ AND _STREPTOMYCES_ To further investigate the potential functional importance of the 476 rhizosphere-enriched metagenomic contigs mapped as QTLs, we performed a deeper analysis

into their functional gene content (Supplementary Data 14, 15, and 16). An antiSMASH44 analysis identified 30 biosynthetic gene clusters (BGCs) across these contigs. These BGCs largely

originated from contigs taxonomically assigned to _Cellvibrio_ and _Streptomyces_. They included several gene clusters potentially associated with root colonization, such as two melanin BGCs

(c00216, NODE_5919; c00255, NODE_7250) from _Streptomyces_ (which have been positively associated with colonization45) and a _Cellvibrio_ aryl polyene BGC (c00185, NODE_4941), which is

thought to protect bacteria against reactive oxygen species generated during immune responses of the host plant46. The contigs also contained gene clusters potentially beneficial to the

host, such as BGCs encoding iron-scavenging siderophores, which have been associated with disease suppression in tomato47; specifically, homologs of coelichelin and desferrioxamine BGCs from

streptomycetes were found (c00269, NODE_7969, and c00122, NODE_3362), three IucA/IucC-like putative siderophore synthetase gene clusters (c00106, NODE_2973; c00041, NODE_1131; c00238,

NODE_6661), as well as a _Cellvibrio_ NRPS-PKS gene cluster (c00001, NODE_101) most likely encoding the production of a siderophore based on the presence of a TonB-dependent siderophore

receptor-encoding gene as well as a putative _tauD_-like siderophore amino acid β-hydroxylase-encoding gene48. The _Cellvibrio_ contigs also contain several genes relevant for carbohydrate

catabolism. For example, homologs of _xyl31a_ (B2R_23365) and _bgl35a_ (B2R_06825-06826) were detected (with 78%, 79 and 65% amino acid identity, respectively), genes that have been shown to

be responsible for utilization of the abundant plant cell wall polysaccharide xyloglucan in _Cellvibrio japonicus_49. In addition, a possible homolog of the β-glucosidase gene _bgl3D_50

(B2R_26663), involved in xyloglucan utilization, was also identified, having high similarity to _bgl3D_ from _Cellvibrio japonicus_ (64% amino acid identity). Also, putative

cellulose-hydrolizing enzymes were detected, such as a homolog (B2R_21082) of the cellobiohydrolase _cel6A_ from _Cellvibrio japonicus_51 encoded in a complex locus of nine

carbohydrate-acting enzymes annotated on this contig (NODE_5090) by DBCAN52 (Supplementary Data 14). Collectively, these results point to a possible role of microbial traits related to iron

acquisition and metabolism of plant polysaccharides in tomato rhizosphere microbiome assembly. Contigs of the metagenome-assembled genome (MAG) associated with _Streptomyces_ ASV5 (the key

taxon associated with tomato QTLs described above) contained a multitude of functional genes potentially relevant for host-microbe interactions. Taxonomically, the ASV5 MAG was most closely

related to a clade of streptomycetes that includes type strains of species such as _arenae_, _flavovariabilis_, _variegatus_, and _chartreusis_. To understand how tomato might differentially

recruit ASV5 streptomycetes, we analyzed the MAG for genes and gene clusters potentially involved in colonization. Intriguingly, we found contigs to be rich in genes associated with plant

cell wall degradation. In particular, we identified a family 6 glycosyl hydrolases (B2R_10154) of which the glycosyl hydrolase domain has 84% amino acid identity to that of the SACTE_0237

protein that was recently shown to be essential for the high cellulolytic activity of _Streptomyces_ sp. SirexAA-E31. Additionally, we detected a homolog (82% amino acid identity) of

_Streptomyces reticuli_ avicelase, a well-studied cellulase enzyme that degrades cellulose into cellobiose53 (B2R_29198). Larger gene clusters associated with degradation of plant cell wall

materials were also found. These included an 8 kb gene cluster coding for multiple pectate lyases and pectinesterases (B2R_31553-31558), and an 8 kb gene cluster encoding a family 43

glycosyl hydrolase, a pectate lyase L, a rhamnogalacturonan acetylesterase RhgT, a GDSL-like lipase/acylhydrolase, a family 53 glycosyl hydrolase, and an endoglucanase A (B2R_15915-15920).

Together, these findings suggest that ASV5 _Streptomyces_ has the capacity to effectively process complex organic materials shed by plant roots during growth. These results are in line with

a recent study on plant-associated streptomycetes that indicated that their colonization success appears to be associated with the ability to utilize complex organic material of plant

roots54. Root exudates also play a key role in the recruitment of microbes. Prominent sugar components of tomato root exudates are glucose, but also xylose and fructose55. The _Streptomyces_

MAG contains _xylA_ and _xylB_ genes (B2R_19014, B2R_19013) and a putative _xylFGH_ import system (B2R_29274, B2R_23438, B2R_23439) facilitating xylose metabolism. Similarly, a _frcBCA_

import system was identified in the genome (B2R_17966- B2R_17968) as well as a glucose permease (B2R_32780) with 91,5% amino acid identity to _glcP1_ SCO5578 of _Streptomyces coelicolor_

A3(2)56. Other genes putatively involved in root exudate catabolism were also found in the ASV5 MAG, such as sarcosine oxidase (_soxBAG_, B2R_20550-20551, and B2R_21105), which has been

shown to be upregulated in the presence of root exudates of various plants57,58. In summary, the _Cellvibrio_ and _Streptomyces_ contigs encoded a range of functions that likely allow them

to profit from tomato root exudates as well as complex organic material shed from growing tomato roots. How these plant traits differ between wild and domesticated tomatoes and if/how these

influence differential colonization of roots of wild and domesticated tomato lines by these two bacterial lineages will require detailed comparative metabolomic analyses of the root exudates

of both tomato lines as well as isolation of the corresponding _Cellvibrio_ and _Streptomyces_ ASVs, analysis of their substrate utilization spectrum followed by site-directed mutagenesis

of the candidate genes, root colonization assays and in situ localization studies. GENOMIC STRUCTURE IN _CELLVIBRIO_ AND _STREPTOMYCES_ PROVIDES INSIGHTS INTO ADAPTATIONS FOR DIFFERENTIAL

RECRUITMENT Bacterial populations often contain significant genomic heterogeneity. This heterogeneity may be associated with differential recruitment through altered nutrient preferences or

host colonization mechanisms. The use of metagenomics enabled us to investigate the population structure within each rhizobacterial lineage and identify intraspecific differences. To do so,

we first identified a unique set of 697,731 microbiome Single Nucleotide Variants (SNVs) in a subset of parental and bulk metagenomes using InStrain22. A set of 15,026 SNVs enriched in

either the wild or modern tomato rhizosphere were selected and the abundance of each allele at each SNV was calculated. Using these abundances, QTL mapping was performed using R/qtl2 as

described in the methods. A total of 3,357 QTL peaks were identified (LOD > 3.01, _P_ < 0.05), to 1229 independent loci. A total of 1354 QTL with positive effects and 2,001 QTL with

negative effects were identified, derived from 2,898 unique SNVs, and corresponding to 810 and 1068 unique rhizobacterial genes respectively (Supplementary Data 17). We investigated the 103

_Streptomyces_ SNV QTLs at 94 unique positions within annotated genes whose mapping coincided with the previously identified QTLs for _Streptomyces_ contigs to tomato chromosomes 4, 6, and

11 (Supplementary Data 17). Numerous _Streptomyces_ SNVs were associated positively with the reference tomato alleles on chromosomes 6 and 11. In particular, alpha-galactosidase (B2R_16136)

and arabinose import (B2R_29105) had the highest LOD and smallest overlapping confidence intervals with chromosomes 6 and 11 (Fig. 7). Indeed, many SNVs in genes involved in the degradation

of xylan59, one of the most dominant non-cellulosic polysaccharides in plant cell-walls60, as well as carbohydrate and protein metabolism were associated positively to QTL on chromosomes 6

and 11, including xyloglucanase Xgh74A (B2R_10589), alpha-xylosidase (B2R_23763), endo-1,4-beta-xylanase (B2R_20609), extracellular exo-alpha-L-arabinofuranosidase (B2R_20608), multiple

protease HtpX (B2R_19218), cutinase (B2R_19356), and putative ABC transporter substrate-binding protein YesO (B2R_09821) which has been implicated in the transport of plant cell wall

pectin-derived oligosaccharides61. A _Streptomyces_ SNV in acetolactate synthase (B2R_28001) was associated positively to QTL on tomato chromosome 6 where a plant acetolactate synthase was

located. Similarly, multiple SNVs in _Streptomyces_ genes involved in putrescine transportation (B2R_25489) were associated positively to QTL on tomato chromosomes 6 and 11, which contain

genes for spermine synthase, suggesting a possible metabolic cross-feeding from plant to microbe. A majority of these SNVs were synonymous having no effect on the produced amino acid

sequence. However, some were non-synonymous, resulting in an altered amino acid sequence, including the histidine decarboxylase SNV (B2R_16511) mapping to both tomato chromosomes 6 and 11

(Fig. 7). _Streptomyces_ SNVs that were associated negatively with the QTL on tomato chromosome 4 included an antibiotic resistance gene (daunorubicin/doxorubicin, B2R_28992) and

maltooligosyl trehalose synthase (B2R_07820) among others. Similarly, we investigated the 324 _Cellvibrio_ SNV QTLs within annotated genes whose mapping coincided with the previously

identified _Cellvibrio_ contig QTLs to chromosomes 1 and 9. Again, numerous SNV QTLs were identified in genes were related to sugar catabolism, including a gene encoding an extracellular

exo-alpha-(1->5)-L-arabinofuranosidase (B2R_16093), fructose import FruK (B2R_22268), a cellulase/esterase-encoding _celE_ homolog (B2R_11067), and genes involved in malate (B2R_18213),

mannonate (B2R_14081), xyloglucan (B2R_10668) and xylulose (B2R_22179) metabolism. Furthermore, many additional SNV QTL were identified in genes related to vitamin and cofactor metabolism as

well as sulfur and iron metabolism. In particular, these included genes for a phosphoadenosine phosphosulfate reductase (B2R_15720), vitamin B12 transporter BtuB (10 different genes, see

Supplementary Data 17), a siroheme synthase (B2R_24033), a pyridoxal phosphate homeostasis protein (B2R_17481), a heme chaperone HemW (B2R_12751), a hemin transport system permease protein

HmuU (B2R_09175), a Fe(2+) transporter FeoB (B2R_19968), a biotin synthase (B2R_30007), a catecholate siderophore receptor Fiu (B2R_17486), and a Fe(3+) dicitrate transport ATP-binding

protein Fec (B2R_09176) (Supplementary Data 17). Taken together, this analysis suggests that a shotgun metagenomic approach integrated with quantitative plant genetics can be instrumental in

a high-throughput manner to discover putative reciprocal genetic links between plant and microbial metabolisms, such as those identified here for polysaccharides, trehalose, iron, vitamin,

amino acid, and polyamine metabolism. DISCUSSION Breeding for microbiome-assisted crops is a daunting task, encompassing ecological, evolutionary, and cultural processes. What constitutes a

desirable trait for selection is context-dependent and differs between societies, crops, and locations62. As society grapples with modern challenges such as a rapidly changing environment,

water scarcity and land degradation, it is becoming increasingly clear that a new era of trait selection is needed with increased focus on sustainability and microbiome

interactions63,64,65,66. In this regard, it is also time to reckon with the consequences of historic yield-centric trait selection and accompanying genomic sweeps34, especially with regards

to plant–microbe interactions (Fig. 8a, b). Current approaches to investigating the genomic architecture determining microbiome assembly rely primarily on mutational studies in known genes

and pathways. More recently, studies leveraging the natural variation within plant populations have been used to conduct GWA and QTL of the leaf20,67 and rhizosphere18. To date, the

microbiome has been primarily characterized through amplicon sequencing, thereby providing limited functional resolution of microbiome structure. Increasing the resolution of phenotyping of

quantitative traits has been shown to improve the precision and detection of QTLs68. Thus, integrating microbial genomics into microbiome QTL analysis plays a dual purpose; increasing the

ecological resolution with which microbial traits may be mapped (e.g., at a community and population level, Fig. 8c), and second, affording the identification of the reciprocal microbial

adaptations that drive plant–microbe interactions (e.g., by using SNVs a microbiome features). In this investigation, we addressed these challenges by integrating amplicon and shotgun

metagenome sequencing to identify microbiome QTLs for the tomato rhizosphere. One major difference between the amplicon and contig QTL analysis is the number of lineages for which QTLs were

identified. Amplicon-based sequencing, which captures more rare taxa per unit sequencing, provided a broader taxonomic picture and was able to capture QTLs of both abundant and relatively

rare rhizobacterial lineages. In contrast, the majority of contig QTLs mapped to the most predominant lineages yet failed to identify QTLs for more rare lineages. Nevertheless, besides the

fact that the shotgun-based approach provided functional insights into the associated bacterial taxa, the size of the 95% confidence interval of the QTL region was significantly smaller

using contig QTLs, with a median size of just 6.47 Mbp compared to 58.56 Mbp for the amplicon-based QTL regions. Furthermore, for _Streptomyces_, the number of unique QTLs identified was

greater in the contig-based approach. Thus, we identified a trade-off between amplicon and shotgun-based technologies, whereby amplicon sequencing provides a deeper view into broad community

structure, whereas shotgun-based approaches provided a more nuanced picture. In particular, the smaller regions identified by our contig-based metagenome mapping provided considerably more

functional insights as it enabled us to analyze the genomic content contained in the regions linked to _Cellvibrio_ and _Streptomyces_. It is possible that less stringent prioritization

steps could be used to increase the number of metagenomic features identified, but this may also increase the false discovery rate. It should be noted that a limitation of the approaches

taken is that both amplicon and shotgun-based approaches produce non-independent measurements. Here we use CSS normalization, one of the top performing computational approaches to address

compositional bias69. Nevertheless, future approaches that provide community level absolute ASV abundances will further minimize compositionality of the microbiome data and likely perform

better when mapping microbiome features as QTLs. Extending these studies to the endophytic compartment and including metatranscriptome analyses may also further improve the identification of

microbiome features, provided that the endophytic microbiome can be separated well from the plant cells to obtain sufficient sequencing depth. The increased QTL mapping resolution provided

by shotgun-based phenotyping of the microbiome combined with SNV analysis provided an approach to leverage both the host diversity of the RIL and the natural microbiome population diversity

to disentangle the reciprocal genomic adaptions between plants and natural microbiomes (Fig. 8d). For example, understanding the forces driving the abundances of rhizospheric _Streptomyces_

is of increasing interest and has been linked to both iron70 and water limitations54. Here, we pinpointed the genetic basis for these interactions among the short list of highly expressed

root-specific tomato genes linked positively to _Streptomyces_ abundance including both aquaporin and FIT. More specifically, the aquaporin (SlTIP2.3) has the highest fold change of all

tonoplast intrinsic proteins in the tomato genome in the roots when compared to all other organs71,72, while the FIT gene has been shown to largely control iron homeostasis in tomato35,73.

Future experiments will focus on functional validation by, among others, transcriptome analyses and site-directed mutagenesis of the microbial and plant genes identified. In addition to

these high priority genes, many other key genes were identified in these regions. Those previously shown to contribute to microbiome assembly included 1-aminocyclopropane-1-carboxylate

oxidase, which plays a central role in plant regulation of various processes including bacterial colonization and root elongation74 and alpha-humulene/(-)-(E)-beta-caryophyllene synthase, a

terpene known to modify microbiome structure39. In addition, numerous genes related to growth, development, and cell wall loosening75 known to be involved in microbial colonization76 and

aluminum-activated malate transporter, which has been linked to microbiome-mediated abiotic stress tolerance40 and selected during tomato domestication resulting in high malate content in

both fruit and roots41. Both low-malate and high-malate haplotypes have been identified in tomato41, which may form the basis of future studies investigating the role of malate exudation in

microbiome assembly. The historic impact of domestication on genomic regions linked to microbiome assembly is also apparent (Fig. 6, Supplementary Data 14, and Supplementary Fig. 4).

However, the processes and consequences of these sweeps, and possible subsequent recombination events on microbiome assembly remain unclear. In particular, the discontinuity of sweeps in

microbiome QTL regions suggests that evolutionary pressure for recombination of key (microbiome associated) traits, such as iron homeostasis and water transport, may have acted against

selective sweeps. The approach developed here provides the means to illuminate such complex eco-evolutionary questions, forming the basis of integrating the microbiome into the classic

genotype by environment model of host phenotype10. From the microbial perspective, the increased resolution in QTL analysis afforded by our shotgun-based approach also provided a window into

the host-specific bacterial adaptations to wild and modern alleles. In particular, the SNV QTL analysis demonstrated that genes related to the degradation of various plant-associated

polysaccharides in _Streptomyces_ were associated positively with the modern reference allele. Many other functions were identified in both plant and microbe, such as trehalose metabolism,

polyamine metabolism, and acetolactate synthase, suggesting either a direct link through cross-feeding77 or signaling78, or perhaps shared ecological pressures. While the microbial

adaptations related to polysaccharides79, vitamins80 and iron metabolism47,70 are well documented in relation to plant colonization, here we demonstrate that the reciprocal adaptations that

drive plant–microbe interactions can be investigated simultaneously to uncover their genetic architecture in both host and microbiome (Fig. 8d). From a societal context, linking quantitative

genetics with community level microbiome data provides us a tool to understand the complex genotype, environment, microbiome, and management interactions that shape our agroecosystems

structure and function. Armed with these tools and molecular insights, we can begin to re-envision the agroecosystem; targeting QTLs for improved plant–microbe interactions, identifying

‘missing microbes’ or functions lost during the domestication process, or pinpointing the molecules that drive these interactions. METHODS RECOMBINANT INBRED LINE POPULATION An F8 RIL

population derived from the parental lines _Solanum lycopersicum_ cv. Moneymaker (modern) and _Solanum pimpinellifolium_ L. accession CGN14498 (wild) consisting of 100 lines were used for

this study23. A high density map produced from this population was used to map QTLs26. GROWTH CONDITIONS FOR RIL The natural soil was collected in June 2017 from a tomato greenhouse in

South-Holland, The Netherlands (51°57’47”N 4°12’16”E). The soil was sieved, air dried, and stored at room temperature until use in 2019. Before the beginning of the experiment, soil moisture

was adjusted to 20% water by volume using deionized water. All soil was homogenized by thorough mixing and allowed to sit, covered by a breathable cloth, in the greenhouse for one week

prior to potting. The soil was then homogenized once again and then potted. Each pot was weighed to ensure all pots were 175 g ± 0.5 (wet weight). Duplicate pots for each accession were

planted, as well as six replicates of each modern and wild parental accession, and 8 bulk soil pots that were left unseeded. Each replicate was prepared simultaneously. Planting was done

separately representing biological replicates. In each pot, 3 seeds were planted in a triangular pattern to ensure the germination success for all pots. The first seedling to emerge in each

pot was retained and others were removed after germination. All pots were randomly distributed in trays containing approximately 10 plants. Throughout growth, careful attention was given to

randomize the distribution of plants. First, tray location and orientation with relation to each other were randomized on a nearly daily basis. In addition, the distribution of plants within

trays was randomized three times during growth. All pots were kept covered with a transparent lid until germination, which was scored daily. After germination, plants were visually

monitored and watered at the same rates. To minimize the impact of environmental differences between pots on microbiome composition, the watering regime for all plants was standardized and

leaks from the bottom of the pot and overflows were completely prevented. To achieve this, a minimal volume (2.5–5.0 mL) of water was used at each watering. This strategy was successful as

washout was never observed. Moisture content was measured by weighing the pots at the middle and end of the experiment to ensure all pots had similar moisture contents. HARVESTING AND

PROCESSING OF PLANT MATERIALS All plants had between 5 and 7 true leaves at harvest (Supplementary Data 1). Plants were gently removed from the pot and roots and were vigorously shaken. Soil

that remained attached to the roots after this stage was considered the rhizosphere. The remaining bulk soil and rhizosphere (plus roots) fractions were weighed. The root and attached

rhizosphere fraction were treated with 4 mL of lifeguard, vortexed, and sonicated. Roots were then removed. The remaining rhizosphere sample was then stored in LifeGuard Soil Preservation

Solution (Qiagen) at −20 °C until DNA extraction. The dry weight of shoots was measured after drying at 60 °C. The dry weight of the bulk soil was measured after storing at room temperature

in open paper bags for 1 month. The DNA was extracted using the DNeasy PowerSoil extraction kit (Qiagen). The protocol was optimized for the soil in the following manner: each sample was

vortexed and then a volume of approximately 1.5 mL was transferred into 2 mL tubes. This subsample was centrifuged at 10,000 × _g_ for 30 s such that a pellet was formed. The supernatant was

removed, and a new subsample was transferred, and centrifuged until the total volume of the original sample, without sand, had been transferred to the 2 mL tubes. The resulting pellet was

recalcitrant to disruption through bead beating, and therefore was physically disrupted by a pipette tip before proceeding with DNA extraction protocol. In test samples, DNA extractions from

the sand fraction yielding no, or marginal levels of DNA. RRNA AMPLICON SEQUENCE PROCESSING All DNA was sent to BaseClear (Leiden, The Netherlands) for 16S rRNA gene 300 bp paired-end

amplicon sequencing (MiSeq platform). MiSeq primers targeted the V3-V4 region of Bacteria: 341FCCTACGGGNGGCWGCAG, 805RGACTACHVGGGTATCTAATCC. In total, 20,542,135 16S rRNA gene amplicon read

pairs over 225 samples were generated. The raw reads were processed using the DADA2 workflow (v1.14.1) to produce amplicon sequence variants (ASV) and to assign taxonomy based on the Silva

database version 13881,82 (Supplementary Data 2). ASVs tagged as non-bacterial, chloroplast, or mitochondria were removed. Next, ASV counts were normalized using the cumulative sum scaling

(CSS) (Supplementary Data 3), which has been shown to be one of the most effective computational transformation techniques69, and filtered based on the effective sample size using the

metagenomeSeq package (v1.28.2)27. Differential abundances between rhizosphere and bulk soil were determined using the eBayes function from the limma package. Enriched rhizosphere ASVs with

a greater than log(2) fold change in abundance were analyzed based on their presence and absence, standard deviation and mean values. Using these statistics, stochastic ASVs (<50% of

samples) were removed from further analysis (Supplementary Data 4). All ASV sequences may be found in Supplementary Data 5. The remaining microbiome features were then mapped as QTLs as

described subsequently. To investigate diversity metrics as quantitative traits, the Shannon diversity of each sample was calculated using all ASV after filtering based on the effective

sample size using the metagenomeSeq package (v1.28.2)27, and using all ASV in greater than 50% of samples (Supplementary Data 21). Similarly, a PCoA analysis using Bray Curtis distances was

conducted, and the values for principle components axis 1 and 2 were extracted (Supplementary Data 22). Both calculations were done in phyloseq version 1.34.083. These diversity-based

microbiome features were then mapped as QTLs as described subsequently. METAGENOMICS ANALYSIS For the one set of replicates for each accession, paired-end sequence read libraries were

generated in the length of 150 bp per read on NovaSeq paired-end platform by BaseClear B.V. Demultiplexing was performed before the following analysis. It is computationally expensive to

assemble the 114 read libraries all at once. Therefore, a strategy of (merging) partial assemblies was undertaken. Two assemblers were used to create the assembled contigs, namely SPAdes

(version 3.13.2)84 and MEGAHIT (version 1.2.9)85. Assembly quality was assessed by running MultiQC (version 1.8)86 with Quast Module87 (Supplementary Figure 5). First, 6 modern parents, 5

wild parents, and 1 bulk soil sample were co-assembled via SPAdes with the metagenomic mode and parameter of -k 21,33,55,99, generating the first assembly (A1). Subsequently, a second

assembly (A2) was done using the unmapped reads from the remaining metagenomes using MEGAHIT with the parameter of --k-list 27,33,55,77,99. The third assembly (A3) was performed similarly as

A2, however, included the unmapped reads, ambiguously mapped reads, and mapped reads with a low mapping quality score (MapQ < 20) (Supplementary Data 18). Read mapping was done with

BWA-MEM with default settings88 and SAMtools was used to convert the resulting SAM files into sorted and indexed BAM files (version 1.10). Extraction of these reads was conducted by samtools

bam2fq. Redundancy between assemblies was evaluated by alignment to A1 via nucmer package of MUMmer with --maxmatch option (version:4.0.0)89. Firstly, 111.5 Gbp of reads from the parental

samples were assembled, labeled as A1, and yielded a total assembly length of 8.6 Gbp with the largest contig of 933.0 kilobase pairs (Kbp). After aligning the reads from RIL samples to A1,

unmapped reads, ambiguously mapped reads, and mapped reads with a low mapping quality score (MapQ < 20) were retrieved and assembled, yielding the second and third assembly (A2 and A3).

Specifically, A2 stemmed from solely the unmapped reads while A3 included the ambiguously mapped reads and mapped reads with MapQ < 20 in addition to the unmapped reads. A2 and A3

produced a total assembly length of 9.6 Gbp and 14.0 Gbp, with the largest contig of 56.2 and 86.3 Kbp respectively. There were 1.2, 2.0, and 2.8 million contigs with the length over 1 Kb

for A1, A2, and A3 respectively. In particular, 912 contigs in A1 were greater or equal to 50 Kbp whereas 1 or 2 such large contigs were successfully assembled in A2 or A3. The detailed

assembly statistics is given in Supplementary Data 18 and the numbers of contigs with different ranges of length for each assembly are presented in Supplementary Fig. 5. The sequence

similarities of the contigs in each assembly (≥1 Kbp) were compared using the nucmer package in MUMer. No contigs in A2 were reported to share an overlapped region with A1, therefore contigs

in A1 and A2 could be merged directly. When A3 was aligned to A1, 1.1% of the total length (≥1 Kbp) of A3 was reported to be overlapped with A1, however, only 18 contigs from A3 were 100%

identical to regions in larger contigs in A1. The sensitivity of filtering the overlapping contigs was evaluated by a benchmarking test using a random RIL sample to calculate the mapping

rates (Supplementary Fig. 6). 83.4% reads were mapped to A1 + A3 at MapQ ≥ 20 without filtering. Excluding the contigs from A3 that were completely and identically covered by A1, the mapping

rate was nearly the same as the one without filtering. Nevertheless, the removal of all aligned contigs in A3 resulted in a slight drop of mapping rate to 82.6%. To conclude, the final

assembly was determined as A1 + A3 with the 18 redundant contigs from A3 removed. To assess the overall assembly quality and quantify the abundance of contigs among all samples, metagenomic

reads were mapped to A1, A1 + A2, and A1 + A3 (deduplicated) respectively. Afterwards, the mapping rates were calculated for the mapped reads with MapQ > 20 in each sample. As shown in

Supplementary Fig. 7, approximately 70% reads among rhizosphere samples could be mapped to A1, while the mapping rates were 55 to 65% in the bulk soil samples. With the unmapped reads

assembled and added to A1, the mapping rates for A1 + A2 increased by 10%. The read recruitment was further improved by assembling and adding ambiguously mapped reads and mapped reads with

low MapQ in the final assembly (A1 + A3). A1, as well as de-replicated A3, were merged to acquire the final assembly. All the ‘contigs’ mentioned below are referring to the contigs in this

final assembly. BINNING OF METAGENOMIC CONTIGS Metabat2 (version 2:2.15)90 was used for assigning the contigs into genomic bins. Based on tetra-nucleotide frequency and abundance scores, 588

genomic bins were generated. Afterwards, genomic quality of those genomes was evaluated by CheckM (version: 1.1.1)30 with the command “checkm linage_wf” (Supplementary Data 8). The 33

genomes displaying the completeness larger than 90% and contamination smaller than 5% were used for further study as quantitative traits. MAKING PHENOTYPE FILES BASED ON CONTIG DEPTH Read

counts for each position on the assembled contigs were acquired using bedtools genomecov (version: 2.29.2)91. A custom Python script was applied to calculate the average depth (defined as

the number of total mapped reads divided by contig length) and coverage (defined as the number of covered base pairs divided by contig length) of every contig. Furthermore, the average

abundance of contigs assigned into a bin was calculated for the high-quality genomic bins detected by CheckM30. FEATURE SELECTION Average depths of the contigs were first normalized using

the CSS and filtered based on the effective sample size using metagenomeSeq package (v1.28.2)27. Differential abundance analysis was performed by moderated _t_-tests between groups using the

makeContrasts and eBayes commands retrieved from the R package Limma (v.3.22.7)92. Obtained _P_-values were adjusted using the Benjamini–Hochberg correction method. Differences in the

abundance of contigs between groups were considered significant when adjusted _P_-values were lower than 0.01 (Supplementary Data 19). In either comparison, the contigs that were

significantly enriched in the rhizosphere were gathered and regarded as the statistically rhizosphere-enriched contigs after removing the replicated ones. To perform QTL analysis for the

abundance of these enriched rhizosphere contigs, only the contigs with biological meanings were kept, i.e., the log (2) fold-change of mean values for the normalized abundances of RIL and

bulk samples should be greater than 2, and the contig should be in enough depth with at least the mean value of a group larger than 1. This selection step resulted in 1249

rhizosphere-enriched contigs. The statistics of the filtered normalized abundance were further inspected based on the presence and absence of contigs, standard deviation, and mean values of

the counts. TAXONOMIC AND FUNCTIONAL ANNOTATION OF THE METAGENOME Taxonomic classifications were assigned to the contigs in the final assembly using Kraken2 (version: 2.0.8)31 based on exact

_k_-mer matches. A custom Kraken2 database was built to contain _RefSeq_ complete genomes/proteins of archaea, bacteria, viral, fungi, and protozoa. Univec_Core was also included in the

custom database (20200308). Using the Kraken2 standard output, a python script based on TaxonKit93 was utilized to add full taxonomic names to each contig in the format of tab-delimited

table. 76.22% of the contigs > 1 kb were classified. Among the contigs >10 kb, up to 99.44% contigs were classified. Prokaryotic microbial genes were predicted by Prodigal (version:

2.6.3)94 with metagenomics mode. 10,246,55 genes were predicted from contigs > 1 kb. Open reading frames (ORFs) on contigs >10 kb were annotated by prokka (v1.14.5) and the

_Streptomyces_ ASV5 bin (MAG.72) was further annotated by DRAM (v1.2.0) integrating UniRef, Pfam, dbCAN and KEGG databases95. To assess the impact of the prioritization on the functional

representation of the metagenome, we identified the fraction of protein clusters represented in the rhizosphere-enriched contigs compared to the rest of the contigs greater than 10 kb.

First, Prodigal was used in metagenomics mode to predict genes in the metagenomic assembly with contigs longer than 10 kbp. Next, MMSeqs2 was used to cluster the protein sequences based on

70% similarity and based on 50% similarity, and with or without partial predicted genes32. To calculate the number of clusters that contained proteins encoded in rhizosphere-enriched

contigs, the clusters were searched for the presence of protein IDs of the 1249 rhizosphere-enriched contigs. In total, approximately 8.3% of protein clusters contained genes from the

rhizosphere-enriched contigs. In addition to proteins contained on rhizosphere-enriched contigs, these clusters contained approximately 25% of all proteins encoded in contigs larger than 10

kb (Supplementary Data 20). SINGLE NUCLEOTIDE VARIANT ANALYSIS To investigate strain level QTLs, we mapped single nucleotide variants (SNVs) identified using inStrain on the 1249

rhizosphere-enriched contigs. A total of 555, 382, and 535,432 SNVs were identified in the modern and wild parental metagenomes respectively. Of these, 162,299 and 142,349 SNVs were unique

to each dataset respectively, as they either contained only reference alleles or did not exceed the inStrain SNV calling thresholds. For each unique SNV locus, coverage in the other dataset

was determined using SAMtools depth after read filtering with settings comparable to inStrain and was considered identical to the reference allele frequency. Including the unique SNVs, this

resulted in a final set of 697,731 SNVs. To select SNVs that showed differential reference allele frequencies between MM and P, first the difference in reference allele frequency (MM–P) was

calculated per SNV. From the distribution of all SNVs, the 95% confidence interval (CI) was determined to select the 5% (30,911) most different SNVs (Supplementary Fig. 8). SNVs were further

selected using a Fisher’s exact test based on the allele read count differences between MM and P. _P_-values were sorted, and a final selection of 15,026 differentially abundant SNVs

distributed over 1037 contigs was obtained using a Benjamini-Hochberg false discovery rate (FDR) correction of 0.01. SNV allele read counts were extracted from the RIL dataset using the

pysam Python package after filtering with settings comparable to inStrain. QUANTITATIVE TRAIT LOCUS ANALYSIS The QTL analysis linking selected amplicon, contig, bin, and SNV features with

plant loci was performed using the R package R/qtl225. Pseudomarkers were added to the genetic map to increase resolution, with a step distance of 1 Mbp between the markers and

pseudomarkers. Plant genome probabilities were calculated using the genetic map with pseudomarkers, plant loci cross data, and error probability of 1E-4. Plant locus kinship matrix was

calculated as proportion of shared alleles using conditional allele probabilities of all plant chromosomes, which were calculated from the plant genome probabilities. A genome scan using a

single-QTL model using a linear mixed model was performed on the SNV allele read counts as phenotypes, plant genotype probabilities as input variables and as covariates the number of leaves,

harvest day, rhizosphere soil weight (g), soil starting weight (g) and plant dry weight (g). The LOD score was determined for each plant locus SNV allele combination. A permutation test

using randomized data was performed with 1000 permutations to assess the distribution of the LOD scores. The 95% quantile was used as threshold for the selection of LOD peaks, as well as a

_P_ = 0.95 Bayes credible interval probability. INDEPENDENT VALIDATION OF QTLS THROUGH BULK SEGREGANT ANALYSIS To validate the QTLs, 33 _Solanum lycopersicum_ cv. Moneymaker (modern), 30

_Solanum pimpinellifolium_ L. accession CGN14498, and 77 RIL accessions (with replicates of 4 each) were grown and their microbiomes characterized through 16S rRNA gene amplicon sequencing.

Parental lines and RIL accessions were germinated in pots filled with 300 g agricultural soil. For each accession, were planted with six plants per replicate pot. The plants were arranged

randomly in the growth chamber (25 °C, 16 h daylight) and watered every day. Bulk soil samples without plants were used as controls (_N_ = 31). Rhizospheric soil was collected according to

standard methods96. In order to synchronize the developmental stage, the plants were harvested after 21 days, or when the 3rd trifoliate leaf was reached. The soil loosely attached to the

roots was removed and the entire root system was transferred to a 15 mL tube containing 5 mL LifeGuard Soil Preservation Solution (MoBio Laboratories). The tubes were vigorously vortexed and

sonicated. Subsequently, the roots were removed and at least 1 g (wet weight) of rhizospheric soil was recovered per sample for DNA extraction. For the bulk soil samples, approximately 1 g

of soil was collected and mixed with 5 mL of LifeGuard solution. To extract rhizospheric DNA, PowerSoil Total DNA/RNA Isolation Kit (MoBio Laboratories, Inc., USA) was used in accordance

with the manufacturer’s instruction. Rhizospheric DNA was obtained using RNA PoweSoil DNA Elution Accessory Kit (MoBio Laboratories, Inc. USA). The quantity and quality of the obtained DNA

was checked by ND1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and Qubit 2.0 fluorometer (ThermoFisher Scientific, USA). DNA samples were stored at −20 °C until further

use. The extracted samples were used for amplification and sequencing of the 16S rRNA gene, targeting the variable V3–V4 (Forward Primer: 5′-CCTACGGGNGGCWGCAG-3′ Reverse Primer:

5′-GACTACHVGGGTATACTAATCC-3′) resulting in amplicons of approximately ~460 bp. Dual indices and Illumina sequencing adapters using the Nextera XT Index Kit were attached to the V3–V4

amplicons. Subsequently, library quantification, normalization, and pooling were performed and MiSeq v3 reagent kits were used to finally load the samples for MiSeq sequencing. For more info

please refer to the guidelines of Illumina MiSeq System. The RDP extension to PANDASeq97, named Assembler98, was used to merge paired-end reads with a minimum overlap of 10 bp and at least

a Phred score of 25. Primer sequences were removed from the per sample FASTQ files using Flexbar version 2.599. Reads were processed as before except the Silva version 132 was used for

taxonomic classification82. REPORTING SUMMARY Further information on research design is available in the Nature Research Reporting Summary linked to this article. DATA AVAILABILITY The 16S

amplicons and shotgun metagenomics sequencing data have been deposited in the NCBI database under BioProject ID PRJNA787039 and PRJNA789467, respectively. Metagenome assembled genomes are

available at Zenodo [https://doi.org/10.5281/zenodo.6561541]. The Silva database was used to assign taxonomy to 16S rRNA amplicon sequences [https://www.arb-silva.de/download/archive/]. A

custom database was used to assign taxonomy for Kraken. Due to size limitation, this database is available upon request (please contact J.M.R. at [email protected] and expect 2

weeks of processing time). Source data are provided with this paper. CODE AVAILABILITY The code used in the analysis can be found at Zenodo [https://doi.org/10.5281/zenodo.6561541].

REFERENCES * Oyserman, B. O., Medema, M. H. & Raaijmakers, J. M. Road MAPs to engineer host microbiomes. _Curr. Opin. Microbiol._ 43, 46–54 (2018). Article CAS PubMed Google Scholar

* Marasco, R. et al. A drought resistance-promoting microbiome is selected by root system under desert farming. _PLoS One_ 7, e48479 (2012). Article CAS ADS PubMed PubMed Central Google

Scholar * Carrión, V. J. et al. Pathogen-induced activation of disease-suppressive functions in the endophytic root microbiome. _Science_ 366, 606–612 (2019). Article ADS CAS PubMed

Google Scholar * Finkel, O. M. et al. A single bacterial genus maintains root growth in a complex microbiome. _Nature_ https://doi.org/10.1038/s41586-020-2778-7 (2020). Article PubMed

Google Scholar * Wagner, M. R. et al. Microbe-dependent heterosis in maize. _Proc. Natl Acad. Sci. USA_ 118, e2021965118 (2021). Article CAS PubMed PubMed Central Google Scholar *

Sasse, J., Martinoia, E. & Northen, T. Feed your friends: Do plant exudates shape the root microbiome? _Trends Plant Sci._ 23, 25–41 (2018). Article CAS PubMed Google Scholar *

Canarini, A., Kaiser, C., Merchant, A., Richter, A. & Wanek, W. Root exudation of primary metabolites: Mechanisms and their roles in plant responses to environmental stimuli. _Front.

Plant Sci._ 10, 157 (2019). Article PubMed PubMed Central Google Scholar * Tracanna, V. et al. Dissecting disease-suppressive rhizosphere microbiomes by functional amplicon sequencing

and 10× metagenomics. _mSystems_ 6, e0111620 (2021). * Crowley, D. E. _Iron Nutrition in Plants and Rhizospheric Microorganisms_ (eds Barton, L. L. & Abadia, J.) 169–198 (Springer

Netherlands, 2006). * Oyserman, B. O. et al. Extracting the GEMs: Genotype, environment, and microbiome interactions shaping host phenotypes. _Front. Microbiol._ 11, 574053 (2021). Article

PubMed PubMed Central Google Scholar * Pérez-Jaramillo, J. E., Carrión, V. J., de Hollander, M. & Raaijmakers, J. M. The wild side of plant microbiomes. _Microbiome_ 6, 143 (2018).

Article PubMed PubMed Central Google Scholar * Favela, A., O. Bohn, M. & D. Kent, A. Maize germplasm chronosequence shows crop breeding history impacts recruitment of the rhizosphere

microbiome. _ISME J._ https://doi.org/10.1038/s41396-021-00923-z (2021). * Gruber, K. Agrobiodiversity: The living library. _Nature_ 544, S8–S10 (2017). Article CAS ADS PubMed Google

Scholar * Lopez-Delacalle, M. et al. Using tomato recombinant lines to improve plant tolerance to stress combination through a more efficient nitrogen metabolism. _Front. Plant Sci._ 10,

1702 (2019). Article PubMed Google Scholar * Vosman, B. et al. QTL mapping of insect resistance components of Solanum galapagense. _Theor. Appl Genet._ 132, 531–541 (2019). Article CAS

PubMed Google Scholar * Liu, Z. et al. Identification of a Solanum pennellii chromosome 4 fruit flavor and nutritional quality-associated metabolite QTL. _Front. Plant Sci._ 7, 1671

(2016). Article PubMed PubMed Central Google Scholar * Pereira, P. A. A., Miranda, B. D., Attewell, J. R., Kmiecik, K. A. & Bliss, F. A. Selection for increased nodule number in

common bean (Phaseolus vulgaris L.). _Plant Soil_ 148, 203–209 (1993). Article Google Scholar * Deng, S. et al. Genome wide association study reveals plant loci controlling heritability of

the rhizosphere microbiome. _ISME J._ https://doi.org/10.1038/s41396-021-00993-z (2021). Article PubMed PubMed Central Google Scholar * Bergelson, J., Mittelstrass, J. & Horton, M.

W. Characterizing both bacteria and fungi improves understanding of the Arabidopsis root microbiome. _Sci. Rep._ 9, 24 (2019). Article ADS CAS PubMed PubMed Central Google Scholar *

Wallace, J. G., Kremling, K. A., Kovar, L. L. & Buckler, E. S. Quantitative genetics of the maize leaf microbiome. _Phytobiomes J._ 2, 208–224 (2018). Article Google Scholar *

Bergelson, J., Brachi, B., Roux, F. & Vailleau, F. Assessing the potential to harness the microbiome through plant genetics. _Curr. Opin. Biotechnol._ 70, 167–173 (2021). Article CAS

PubMed Google Scholar * Olm, M. R. et al. inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains. _Nat. Biotechnol._ 39, 727–736

(2021). Article CAS PubMed Google Scholar * Voorrips, R. E., Verkerke, W., Finkers, R., Jongerius, R. & Kanne, J. Inheritance of taste components in tomato. _Acta Physiol. Plant_

22, 259–261 (2000). Article CAS Google Scholar * Khan, N. et al. Exploring the natural variation for seedling traits and their link with seed dimensions in tomato. _PLoS One_ 7, e43991

(2012). Article CAS ADS PubMed PubMed Central Google Scholar * Broman, K. W. et al. R/qtl2: Software for mapping quantitative trait loci with high-dimensional data and multiparent

populations. _Genetics_ 211, 495–502 (2019). Article CAS PubMed PubMed Central Google Scholar * Sterken, M. G. et al. Plasticity of maternal environment dependent expression-QTLs of

tomato seeds. Preprint at _bioRxiv_ https://doi.org/10.1101/2021.03.29.437558 (2021). * Paulson, J. N., Stine, O. C., Bravo, H. C. & Pop, M. Differential abundance analysis for microbial

marker-gene surveys. _Nat. Methods_ 10, 1200–1202 (2013). Article CAS PubMed PubMed Central Google Scholar * Lorenz, K. & Cohen, B. A. Small- and large-effect quantitative trait

locus interactions underlie variation in yeast sporulation efficiency. _Genetics_ 192, 1123–1132 (2012). Article CAS PubMed PubMed Central Google Scholar * Kang, D. D., Froula, J.,

Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. _PeerJ_ 3, e1165–e1165 (2015). Article CAS PubMed

PubMed Central Google Scholar * Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: Assessing the quality of microbial genomes recovered from isolates,

single cells, and metagenomes. _Genome Res._ 25, 1043–1055 (2015). Article CAS PubMed PubMed Central Google Scholar * Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic

analysis with Kraken 2. _Genome Biol._ 20, 257 (2019). Article CAS PubMed PubMed Central Google Scholar * Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence

searching for the analysis of massive data sets. _Nat. Biotechnol._ 35, 1026–1028 (2017). Article CAS PubMed Google Scholar * The Tomato Genome Consortium. The tomato genome sequence

provides insights into fleshy fruit evolution. _Nature_ 485, 635–641 (2012). Article ADS CAS Google Scholar * Lin, T. et al. Genomic analyses provide insights into the history of tomato

breeding. _Nat. Genet._ 46, 1220–1226 (2014). Article CAS PubMed Google Scholar * Ling, H.-Q., Bauer, P., Bereczky, Z., Keller, B. & Ganal, M. The tomato fer gene encoding a bHLH

protein controls iron-uptake responses in roots. _Proc. Natl Acad. Sci. USA_ 99, 13938–13943 (2002). Article CAS ADS PubMed PubMed Central Google Scholar * Morcillo, R. et al. Plant

transcriptome reprograming and bacterial extracellular metabolites underlying tomato drought resistance triggered by a beneficial soil bacteria. _Metabolites_ 11, 369 (2021). Article CAS

PubMed PubMed Central Google Scholar * Zhou, F. & Pichersky, E. The complete functional characterisation of the terpene synthase family in tomato. _N. Phytol._ 226, 1341–1360 (2020).

Article CAS Google Scholar * Kong, H. G., Song, G. C., Sim, H.-J. & Ryu, C.-M. Achieving similar root microbiota composition in neighbouring plants through airborne signalling. _ISME

J._ 15, 397–408 (2021). Article CAS PubMed Google Scholar * Huang, M. et al. The major volatile organic compound emitted from _Arabidopsis thaliana_ flowers, the sesquiterpene

(_E_)‐β‐caryophyllene, is a defense against a bacterial pathogen. _N. Phytologist_ 193, 997–1008 (2012). Article CAS Google Scholar * Sweeney, C., Lakshmanan, V. & Bais, H. P.

Interplant aboveground signaling prompts upregulation of auxin promoter and malate transporter as part of defensive response in the neighboring plants. _Front. Plant Sci_. 8, 595 (2017). *

Ye, J. et al. An InDel in the promoter of _Al-ACTIVATED MALATE TRANSPORTER9_ selected during tomato domestication determines fruit malate contents and aluminum tolerance. _Plant Cell_ 29,

2249–2268 (2017). Article CAS PubMed PubMed Central Google Scholar * Safdarian, M., Askari, H., Shariati, J. V. & Nematzadeh, G. Transcriptional responses of wheat roots inoculated

with Arthrobacter nitroguajacolicus to salt stress. _Sci. Rep._ 9, 1792 (2019). Article ADS CAS PubMed PubMed Central Google Scholar * Nozoye, T. The nicotianamine synthase gene is a

useful candidate for improving the nutritional qualities and Fe-deficiency tolerance of various crops. _Front. Plant Sci._ 9, 340 (2018). Article PubMed PubMed Central Google Scholar *

Blin, K. et al. antiSMASH 6.0: Improving cluster detection and comparison capabilities. _Nucleic Acids Res._ 49, W29–W35 (2021). Article CAS PubMed PubMed Central Google Scholar *

Chewning, S. S. et al. Root-associated streptomyces isolates harboring melC genes demonstrate enhanced plant colonization. _Phytobiomes J._ 3, 165–176 (2019). Article Google Scholar *

Schöner, T. A. et al. Aryl polyenes, a highly abundant class of bacterial natural products, are functionally related to antioxidative carotenoids. _ChemBioChem_ 17, 247–253 (2016). Article

CAS PubMed Google Scholar * Gu, S. et al. Competition for iron drives phytopathogen control by natural rhizosphere microbiomes. _Nat. Microbiol._ 5, 1002–1010 (2020). Article CAS PubMed

PubMed Central Google Scholar * Crits-Christoph, A., Bhattacharya, N., Olm, M. R., Song, Y. S. & Banfield, J. F. Transporter genes in biosynthetic gene clusters predict metabolite

characteristics and siderophore activity. _Genome Res._ 31, 239–250 (2021). Article PubMed Central Google Scholar * Attia, M. A. et al. In vitro and in vivo characterization of three

Cellvibrio japonicus glycoside hydrolase family 5 members reveals potent xyloglucan backbone-cleaving functions. _Biotechnol. Biofuels_ 11, 45 (2018). Article CAS PubMed PubMed Central

Google Scholar * Nelson, C. E. et al. Comprehensive functional characterization of the glycoside hydrolase family 3 enzymes from _Cellvibrio japonicus_ reveals unique metabolic roles in

biomass saccharification: Complex glucan utilization in _C. japonicus_. _Environ. Microbiol._ 19, 5025–5039 (2017). Article CAS PubMed PubMed Central Google Scholar * Gardner, J. G. et

al. Systems biology defines the biological significance of redox‐active proteins during cellulose degradation in an aerobic bacterium. _Mol. Microbiol._ 94, 1121–1133 (2014). Article CAS

Google Scholar * Yin, Y. et al. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. _Nucleic Acids Res._ 40, W445–W451 (2012). Article CAS PubMed PubMed Central

Google Scholar * Schrempf, H. & Walter, S. The cellulolytic system of Streptomyces reticuli. _Int. J. Biol. Macromolecules_ 17, 353–355 (1995). Article CAS Google Scholar * Worsley,

S. F. et al. Investigating the role of root exudates in recruiting streptomyces bacteria to the Arabidopsis thaliana microbiome. _Front. Mol. Biosci._ 8, 686110 (2021). Article CAS PubMed

PubMed Central Google Scholar * Kamilova, F. et al. Organic acids, sugars, and l -tryptophane in exudates of vegetables growing on stonewool and their effects on activities of

rhizosphere bacteria. _MPMI_ 19, 250–256 (2006). Article CAS PubMed Google Scholar * Bentley, S. D. et al. Complete genome sequence of the model actinomycete Streptomyces coelicolor

A3(2). _Nature_ 417, 141–147 (2002). Article ADS PubMed Google Scholar * Matilla, M. A., Espinosa-Urgel, M., Rodríguez-Herva, J. J., Ramos, J. L. & Ramos-González, M. I. Genomic

analysis reveals the major driving forces of bacterial life in the rhizosphere. _Genome Biol._ 8, R179 (2007). Article CAS PubMed PubMed Central Google Scholar * Chaparro, J. M. et al.

Root exudation of phytochemicals in Arabidopsis follows specific patterns that are developmentally programmed and correlate with soil microbial functions. _PLoS One_ 8, e55731 (2013).

Article CAS ADS PubMed PubMed Central Google Scholar * Polizeli, M. L. T. M. et al. Xylanases from fungi: Properties and industrial applications. _Appl Microbiol. Biotechnol._ 67,

577–591 (2005). Article CAS PubMed Google Scholar * Mellerowicz, E. J. & Gorshkova, T. A. Tensional stress generation in gelatinous fibres: A review and possible mechanism based on

cell-wall structure and composition. _J. Exp. Bot._ 63, 551–565 (2012). Article CAS PubMed Google Scholar * Sugiura, H. et al. Bacterial inducible expression of plant cell wall-binding

protein YesO through conflict between Glycine max and saprophytic Bacillus subtilis. _Sci. Rep._ 10, 18691 (2020). Article CAS ADS PubMed PubMed Central Google Scholar * Meyer, R. S.,

DuVal, A. E. & Jensen, H. R. Patterns and processes in crop domestication: An historical review and quantitative analysis of 203 global food crops: Tansley review. _N. Phytologist_ 196,

29–48 (2012). Article Google Scholar * Gopal, M. & Gupta, A. Microbiome selection could spur next-generation plant breeding strategies. _Front. Microbiol_. 7, 1971 (2016). * Busby, P.

E. et al. Research priorities for harnessing plant microbiomes in sustainable agriculture. _PLoS Biol._ 15, e2001793 (2017). Article CAS PubMed PubMed Central Google Scholar *

Beilsmith, K. et al. Genome-wide association studies on the phyllosphere microbiome: Embracing complexity in host-microbe interactions. _Plant J._ 97, 164–181 (2019). Article CAS PubMed

Google Scholar * Wille, L., Messmer, M. M., Studer, B. & Hohmann, P. Insights to plant–microbe interactions provide opportunities to improve resistance breeding against root diseases in

grain legumes. _Plant, Cell Environ._ 42, 20–40 (2019). Article CAS Google Scholar * Horton, M. W. et al. Genome-wide association study of Arabidopsis thaliana leaf microbial community.

_Nat. Commun._ 5, 5320 (2014). Article ADS PubMed Google Scholar * Sideli, G. M. et al. Quantitative phenotyping of shell suture strength in walnut (Juglans regia L.) enhances precision

for detection of QTL and genome-wide association mapping. _PLoS One_ 15, e0231144 (2020). Article CAS PubMed PubMed Central Google Scholar * Lloréns-Rico, V., Vieira-Silva, S.,

Gonçalves, P. J., Falony, G. & Raes, J. Benchmarking microbiome transformations favors experimental quantitative approaches to address compositionality and sampling depth biases. _Nat.

Commun._ 12, 3562 (2021). Article ADS CAS PubMed PubMed Central Google Scholar * Xu, L. et al. Genome-resolved metagenomics reveals role of iron metabolism in drought-induced

rhizosphere microbiome dynamics. _Nat. Commun._ 12, 3209 (2021). Article CAS ADS PubMed PubMed Central Google Scholar * Sade, N. et al. Improving plant stress tolerance and yield

production: Is the tonoplast aquaporin SlTIP2; 2 a key to isohydric to anisohydric conversion? _N. Phytologist_ 181, 651–661 (2009). Article CAS Google Scholar * Reuscher, S. et al.

Genome-wide identification and expression analysis of aquaporins in tomato. _PLoS One_ 8, e79052 (2013). Article ADS CAS PubMed PubMed Central Google Scholar * Schwarz, B. & Bauer,

P. FIT, a regulatory hub for iron deficiency and stress signaling in roots, and FIT-dependent and -independent gene signatures. _J. Exp. Bot._ 71, 1694–1705 (2020). Article CAS PubMed

PubMed Central Google Scholar * Nascimento, F. X., Rossi, M. J. & Glick, B. R. Ethylene and 1-aminocyclopropane-1-carboxylate (ACC) in plant–bacterial interactions. _Front. Plant Sci._

9, 114 (2018). Article PubMed PubMed Central Google Scholar * Cosgrove, D. J. Catalysts of plant cell wall loosening. _F1000Res_ 5, 119 (2016). Article Google Scholar * Cosgrove, D.

J. Microbial expansins. _Annu. Rev. Microbiol._ 71, 479–497 (2017). Article CAS PubMed Google Scholar * Smith, N. W., Shorten, P. R., Altermann, E., Roy, N. C. & McNabb, W. C. The

classification and evolution of bacterial cross-feeding. _Front. Ecol. Evol._ 7, 153 (2019). Article Google Scholar * Lunn, J. E., Delorge, I., Figueroa, C. M., Van Dijck, P. & Stitt,

M. Trehalose metabolism in plants. _Plant J._ 79, 544–567 (2014). Article CAS PubMed Google Scholar * Beauregard, P. B., Chai, Y., Vlamakis, H., Losick, R. & Kolter, R. Bacillus

subtilis biofilm induction by plant polysaccharides. _Proc. Natl Acad. Sci. USA_ 110, E1621–E1630 (2013). Article CAS ADS PubMed PubMed Central Google Scholar * Streit, W. R. Biotin

and other water-soluble vitamins are key growth factors for alfalfa root colonization by _Rhizobium meliioti_ 1021. _MPMI_ 9, 330 (1996). Article CAS PubMed Google Scholar * Callahan, B.

J. et al. DADA2: High-resolution sample inference from Illumina amplicon data. _Nat. Methods_ 13, 581–583 (2016). Article CAS PubMed PubMed Central Google Scholar * Quast, C. et al.

The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. _Nucleic Acids Res._ 41, D590–D596 (2012). Article CAS PubMed PubMed Central Google Scholar

* McMurdie, P. J. & Holmes, S. phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. _PLoS One_ 8, e61217 (2013). Article CAS ADS PubMed

PubMed Central Google Scholar * Bankevich, A. et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. _J. Comput. Biol.: J. Comput. Mol. Cell

Biol._ 19, 455–477 (2012). Article MathSciNet CAS Google Scholar * Li, D. et al. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community

practices. _Methods_ 102, 3–11 (2016). Article CAS PubMed Google Scholar * Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: Summarize analysis results for multiple tools

and samples in a single report. _Bioinformatics_ 32, 3047–3048 (2016). Article CAS PubMed PubMed Central Google Scholar * Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D. &

Gurevich, A. Versatile genome assembly evaluation with QUAST-LG. _Bioinformatics_ 34, i142–i150 (2018). Article CAS PubMed PubMed Central Google Scholar * Li, H. Aligning sequence

reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997 (2013). * Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. _PLoS Comput

Biol._ 14, e1005944 (2018). Article CAS PubMed PubMed Central Google Scholar * Kang, D. D. et al. MetaBAT 2: An adaptive binning algorithm for robust and efficient genome reconstruction

from metagenome assemblies. _PeerJ_ 7, e7359 (2019). Article PubMed PubMed Central Google Scholar * Quinlan, A. R. & Hall, I. M. BEDTools: A flexible suite of utilities for

comparing genomic features. _Bioinformatics_ 26, 841–842 (2010). Article CAS PubMed PubMed Central Google Scholar * Ritchie, M. E. et al. limma powers differential expression analyses

for RNA-sequencing and microarray studies. _Nucleic Acids Res._ 43, e47–e47 (2015). Article CAS PubMed PubMed Central Google Scholar * Shen, W. & Ren, H. TaxonKit: A practical and

efficient NCBI taxonomy toolkit. _J. Genet. Genomics_ 48, 844–850 (2021). Article PubMed Google Scholar * Hyatt, D. et al. Prodigal: Prokaryotic gene recognition and translation

initiation site identification. _BMC Bioinform._ 11, 119–119 (2010). Article CAS Google Scholar * Shaffer, M. et al. DRAM for distilling microbial metabolism to automate the curation of

microbiome function. _Nucleic Acids Res._ 48, 8883–8900 (2020). Article CAS PubMed PubMed Central Google Scholar * Lundberg, D. S. et al. Defining the core Arabidopsis thaliana root

microbiome. _Nature_ 488, 86–90 (2012). * Masella, A. P., Bartram, A. K., Truszkowski, J. M., Brown, D. G. & Neufeld, J. D. PANDAseq: Paired-end assembler for illumina sequences. _BMC

Bioinform._ 13, 1–7 (2012). Article CAS Google Scholar * Cole, J. R. et al. Ribosomal Database Project: Data and tools for high throughput rRNA analysis. _Nucleic Acids Res._ 42, 633–642

(2014). Article CAS Google Scholar * Dodt, M., Roehr, J., Ahmed, R. & Dieterich, C. FLEXBAR—Flexible barcode and adapter processing for next-generation sequencing platforms. _Biology_

1, 895–905 (2012). Article PubMed PubMed Central Google Scholar Download references ACKNOWLEDGEMENTS The project was financially supported, in part, by the NWO-TTW Perspective program

BackToRoots (TTW-project 14218 to J.M.R., V.J.C., V.C., and B.O.O.), by the NWO-Gravitation program MICRop (to J.M.R., M.H.M.), a National Institutes of Health (NIH) Genome to Natural

Products Network supplementary award (no. U01GM110706 to M.H.M.), a ZonMW Enabling Technologies Hotel project (no. 40-43500-98-210 to M.H.M.), a Senescyt fellowship awarded to S.S.F., and by

internal funding from the Netherlands Institute of Ecology. AUTHOR INFORMATION AUTHORS AND AFFILIATIONS * Department of Microbial Ecology, Netherlands Institute of Ecology, Wageningen, The

Netherlands Ben O. Oyserman, Stalin Sarango Flores, Thom Griffioen, Xinya Pan, Wouter Lokhorst, Azkia Nurfikari, Nejc Stopnisek, Viviane Cordovez, Víctor J. Carrión & Jos M. Raaijmakers

* Bioinformatics Group, Wageningen University, Wageningen, The Netherlands Ben O. Oyserman, Elmar van der Wijk, Lotte Pronk, Anne Kupczok & Marnix H. Medema * Institute of Biology,

Leiden University, Leiden, The Netherlands Stalin Sarango Flores, Víctor J. Carrión, Marnix H. Medema & Jos M. Raaijmakers * Department of Data Sciences, Genentech, Inc. South San

Francisco, South San Francisco, CA, USA Joseph N. Paulson * Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA Mercedeh Movassagh * Department of Data

Sciences Dana Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, USA Mercedeh Movassagh * Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen

University, Wageningen, The Netherlands Wilco Ligterink * Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, The Netherlands Basten L. Snoek Authors * Ben O. Oyserman View

author publications You can also search for this author inPubMed Google Scholar * Stalin Sarango Flores View author publications You can also search for this author inPubMed Google Scholar *

Thom Griffioen View author publications You can also search for this author inPubMed Google Scholar * Xinya Pan View author publications You can also search for this author inPubMed Google

Scholar * Elmar van der Wijk View author publications You can also search for this author inPubMed Google Scholar * Lotte Pronk View author publications You can also search for this author

inPubMed Google Scholar * Wouter Lokhorst View author publications You can also search for this author inPubMed Google Scholar * Azkia Nurfikari View author publications You can also search

for this author inPubMed Google Scholar * Joseph N. Paulson View author publications You can also search for this author inPubMed Google Scholar * Mercedeh Movassagh View author publications

You can also search for this author inPubMed Google Scholar * Nejc Stopnisek View author publications You can also search for this author inPubMed Google Scholar * Anne Kupczok View author

publications You can also search for this author inPubMed Google Scholar * Viviane Cordovez View author publications You can also search for this author inPubMed Google Scholar * Víctor J.

Carrión View author publications You can also search for this author inPubMed Google Scholar * Wilco Ligterink View author publications You can also search for this author inPubMed Google

Scholar * Basten L. Snoek View author publications You can also search for this author inPubMed Google Scholar * Marnix H. Medema View author publications You can also search for this author

inPubMed Google Scholar * Jos M. Raaijmakers View author publications You can also search for this author inPubMed Google Scholar CONTRIBUTIONS The study was conceived and designed by

B.O.O., V.J.C., W.Li, M.H.M., and J.M.R. The greenhouse experimentation and lab work were conducted by B.O.O., S.S.F., V.C., V.J.C., and A.N. Contributions to data analysis came from B.O.O.,

T.G., X.P., E.v.d.W., W.Lo, L.P., N.S., A.K., V.C., V.J.C., B.L.S., M.H.M., J.N.P., and M.M. The manuscript was drafted by B.O.O., B.L.S., M.H.M., and J.M.R. All authors contributed to the

revision and agreed upon the final draft. CORRESPONDING AUTHORS Correspondence to Ben O. Oyserman or Jos M. Raaijmakers. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no

competing interests. PEER REVIEW PEER REVIEW INFORMATION _Nature Communications_ thanks Joëlle Schläpfer, Maggie Wagner, and the other, anonymous, reviewer(s) for their contribution to the

peer review of this work. Peer reviewer reports are available. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps

and institutional affiliations. SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION PEER REVIEW FILE DESCRIPTION OF ADDITIONAL SUPPLEMENTARY FILES SUPPLEMENTARY DATA 1 SUPPLEMENTARY DATA 2

SUPPLEMENTARY DATA 3 SUPPLEMENTARY DATA 4 SUPPLEMENTARY DATA 5 SUPPLEMENTARY DATA 6 SUPPLEMENTARY DATA 7 SUPPLEMENTARY DATA 8 SUPPLEMENTARY DATA 9 SUPPLEMENTARY DATA 10 SUPPLEMENTARY DATA 11

SUPPLEMENTARY DATA 12 SUPPLEMENTARY DATA 13 SUPPLEMENTARY DATA 14 SUPPLEMENTARY DATA 15 SUPPLEMENTARY DATA 16 SUPPLEMENTARY DATA 17 SUPPLEMENTARY DATA 18 SUPPLEMENTARY DATA 19 SUPPLEMENTARY

DATA20 SUPPLEMENTARY DATA 21 SUPPLEMENTARY DATA 22 REPORTING SUMMARY SOURCE DATA SOURCE DATA RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution

4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s)

and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s

Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not

permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

http://creativecommons.org/licenses/by/4.0/. Reprints and permissions ABOUT THIS ARTICLE CITE THIS ARTICLE Oyserman, B.O., Flores, S.S., Griffioen, T. _et al._ Disentangling the genetic

basis of rhizosphere microbiome assembly in tomato. _Nat Commun_ 13, 3228 (2022). https://doi.org/10.1038/s41467-022-30849-9 Download citation * Received: 20 December 2021 * Accepted: 19 May

2022 * Published: 16 June 2022 * DOI: https://doi.org/10.1038/s41467-022-30849-9 SHARE THIS ARTICLE Anyone you share the following link with will be able to read this content: Get shareable

link Sorry, a shareable link is not currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt content-sharing initiative