Conditional and interaction gene-set analysis reveals novel functional pathways for blood pressure

ABSTRACT

Gene-set analysis provides insight into which functional and biological properties of genes are aetiologically relevant for a particular phenotype. But genes have multiple

properties, and these properties are often correlated across genes. This can cause confounding in a gene-set analysis, because one property may be statistically associated even if

biologically irrelevant to the phenotype, by being correlated with gene properties that are relevant. To address this issue we present a novel conditional and interaction gene-set analysis

approach, which attains considerable functional refinement of its conclusions compared to traditional gene-set analysis. We applied our approach to blood pressure phenotypes in the UK

Biobank data (_N_ = 360,243), the results of which we report here. We confirm and further refine several associations with multiple processes involved in heart and blood vessel formation but

also identify novel interactions, among others with cardiovascular tissues involved in regulatory pathways of blood pressure homoeostasis.

INTRODUCTION

The aim of gene-set analysis (GSA) is to uncover functional and biological properties of genes involved in

the genetic aetiology of a phenotype1,2. If a property is relevant to a phenotype, then variants associated with that phenotype will tend to accumulate in genes with that property. For

example, smooth muscle cells (SMC) play a role in blood pressure regulation3,4, and if this has a genetic basis we might expect to find genes involved in the development of SMCs to exhibit

genetic association with blood pressure phenotypes. However, genes typically have numerous different properties, which can be strongly correlated with each other if they involve many of the

same genes. Perhaps SMC development genes are also involved in the development of other types of muscle cell, or they are expressed primarily in muscle tissue. This would result in a

correlation between SMC development and muscle cell development in general, or between SMC development and muscle-specific gene expression. In such scenarios, associated variants will

accumulate in genes with a property that does not itself play a role in the phenotype, but is correlated with another gene property that does. Thus, the SMC development gene set could become

associated simply by muscle-specific gene expression playing a role in the phenotype. Traditional GSA only tests the marginal associations of gene properties5,6, and cannot account for this

kind of confounding. Such GSA is therefore liable to identify gene properties that hold no biological relevance for the phenotype, with potentially very misleading interpretations and

wasted effort in follow-up research as a result. To address this issue we have developed a novel GSA approach, based on and implemented in our existing GSA tool MAGMA5. Central to this

approach is the conditional GSA model, which can evaluate how associations of different gene properties relate to each other. As Fig. 1 illustrates, it can identify confounding where

traditional GSA cannot. The model can also deal with more complex scenarios, in which particular combinations of multiple gene properties are relevant to the phenotype rather than any

individual gene properties on their own. This manifests statistically as interactions between gene properties, which are hard to detect when testing only marginal associations and which

can result in confounding of the gene properties involved (Fig. 1d). A more complete and accurate insight into the phenotype based on GSA therefore requires that such scenarios are taken

into account as well. Our proposed approach works by selecting gene properties with significant marginal associations using a standard GSA, then using a series of follow-up analyses to

discard those which are likely not biologically relevant for the phenotype. A wide range of gene properties is used as input to improve the probability of relevant gene properties being

included, as this allows for the detection of confounding caused by those relevant gene properties. This also improves the specificity of the conclusions that can be drawn because more gene

properties can be ruled out as having no biological relevance to the phenotype, and an absence of confounding where it might have been expected can also be shown. The analysis workflow for

our approach is shown in Fig. 2, with a more detailed overview of this analysis workflow provided in the Methods section and a guide to performing and interpreting the analysis in the

Supplementary Methods. The initial GSA in step 1 can include both binary sets and continuous gene-level variables, and is followed by four follow-up analysis steps that refine the initial

results. The results are first corrected for global effects that are likely to act as general confounders in the GSA (e.g., gene expression levels), after which overlap between significant

associations is evaluated. For gene sets (i.e. binary gene properties) this is followed by additional checks for outliers and signs of further confounding. Finally, post hoc interaction

analyses are performed for all significant gene properties, to refine the interpretation of their effects. In an optional sixth step, exploratory interaction analysis is applied to detect

additional associations that were not picked up in the initial GSA. We performed a simulation study to validate the conditional and interaction GSA models used in our workflow. To

demonstrate the workflow, we then applied it to blood pressure phenotypes. For this we used the UK Biobank7 data, analysing three blood pressure phenotypes: systolic blood

pressure (SBP), diastolic blood pressure (DBP) and pulse pressure (PP). The gene annotation used in these analyses consisted of gene sets from the three Gene Ontology domains3,8, miRNA

target gene sets9, and continuously valued tissue-specific gene expression levels from the GTEx data10. A replication study was also performed to further validate results from the UK Biobank

analysis. High blood pressure is an important risk factor for cardiovascular disease11 and has an estimated heritability of 30–50%12. Recently, large-scale GWAS have identified over

400 loci that regulate blood pressure10,13,14,15,16,17, with many of the identified loci showing associations with different blood pressure phenotypes16. Some GSA was performed as part of

these studies, but only to a limited extent (see Supplementary Methods for a brief overview) and only using traditional GSA approaches. Applying our extended GSA workflow to these

phenotypes may therefore expand our current understanding of the genetic aetiology and biological mechanisms of blood pressure regulation. Our analyses show that confounding and overlap

between associations is widespread, with the majority of initially significant associations found to be due to the effects of general confounders and the associations of other gene

properties. Interactions are also prevalent and often involve gene properties with no detectable marginal association, suggesting that the interaction analysis model can provide additional

insights into the phenotype to complement those of standard GSA. For the blood pressure phenotypes a range of processes involved in heart and blood vessel formation have been identified, as

well as tissue-specific expression in artery, heart and female reproductive organs. Several novel interactions have also been found, among others identifying joint involvement of

cardiovascular development and homoeostatic processes, and involvement of heart-expressed miRNA-145 target genes.

RESULTS

SIMULATIONS DEMONSTRATE RISK OF CONFOUNDING IN GSA

A simulation

study was performed to evaluate the conditional and interaction GSA models, both individually and in relation to the standard marginal GSA (details on the simulation settings are provided in

the Methods and Supplementary Methods). As shown in Supplementary Figure 1, marginal GSA is highly vulnerable to confounding. When a gene set with no biological effect assigned to it is

analysed, it will be statistically significant at a rate far exceeding the significance threshold if it overlaps with another gene set that does have an effect. The conditional analysis

model can effectively account for this however, correcting for the confounding effect of the overlapping set and yielding an error rate at the nominal significance level. This phenomenon is

also clearly illustrated in the blood pressure analyses, for example for the heart development gene set. For PP it is initially significant, with a marginal _p_-value of 1.6 × 10⁻⁶. This

association is entirely explained by the much stronger association of the cardiovascular system development set that contains it, with a conditional _p_-value for heart development of only

0.40. A similar situation is shown in Supplementary Figure 2. Here, two overlapping gene sets were simulated, with one or both of them assigned an effect. This was then analysed in two ways:

analysing the two gene sets and their interaction in an interaction GSA, and analysing the interaction set (containing all genes shared by the two gene sets) by itself with a marginal GSA.
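The contrast between these two ways of analysing an interaction can be illustrated with a minimal sketch. This is not the simulation used in this study: it assumes independent genes, no covariates, and hypothetical values for the number of genes, set membership probability and effect size. Under those simplifying assumptions, the interaction coefficient of a regression on two binary gene sets and their product equals a difference-in-differences of the four cell means, so no regression routine is needed.

```python
import random
from statistics import mean

random.seed(7)

N_GENES = 20000   # hypothetical number of genes
P_MEMBER = 0.2    # hypothetical membership probability for each set
EFFECT = 0.3      # hypothetical main effect on the Z-score scale

# Simulate gene Z-scores with main effects of sets A and B only (no interaction).
genes = []
for _ in range(N_GENES):
    in_a = random.random() < P_MEMBER
    in_b = random.random() < P_MEMBER
    z = random.gauss(EFFECT * in_a + EFFECT * in_b, 1.0)
    genes.append((in_a, in_b, z))

def cell_mean(a, b):
    """Mean Z-score of genes with the given membership pattern."""
    return mean(z for in_a, in_b, z in genes if in_a == a and in_b == b)

# Marginal view: treat the shared genes as a gene set of their own
# and compare them with all remaining genes.
shared = [z for in_a, in_b, z in genes if in_a and in_b]
others = [z for in_a, in_b, z in genes if not (in_a and in_b)]
marginal_diff = mean(shared) - mean(others)

# Interaction view: difference-in-differences of the cell means, which is
# the coefficient of the product term A*B in the saturated model.
interaction = (cell_mean(True, True) - cell_mean(True, False)) - (
    cell_mean(False, True) - cell_mean(False, False)
)

print(f"marginal difference for shared genes: {marginal_diff:.3f}")
print(f"interaction contrast:                 {interaction:.3f}")
```

In this toy setting the shared genes look strongly associated when analysed as a set by themselves, purely because both main effects add up in them, whereas the interaction contrast stays close to zero.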

In these simulations there are no actual interaction associations, and for the interaction analysis the error rates are indeed at the nominal significance level. When testing the marginal

association of the interaction set however, the error rates are strongly inflated. Although normally an interaction would not be analysed in this way, it can happen that gene sets are

defined in terms of a combination of multiple gene properties. For example, a gene set may be defined as all the genes in a particular pathway that are also differentially expressed in the

heart. Such a gene set is therefore actually an interaction between that pathway and differential heart expression, and will be confounded by any main effects that the pathway or

differential heart expression may have. For these kinds of compound gene set, the interaction GSA is therefore required as well.

GENE–PROPERTY ASSOCIATIONS ARE STRONGLY OVERLAPPING

Results

of the blood pressure analyses at different steps of the workflow are summarised in Table 1, with the individual associations retained at the end of the workflow shown in Table 2. Initially

significant associations that were later discarded can be found in Supplementary Tables 1 and 2. As shown, there is a considerable reduction in the number of associations in the final

results, compared to the standard GSA in step 1. A large portion of this is due to the general confounders that are corrected for in step 2, which reduced the number of associations by 75%.

Conditioning the remaining associations on each other in step 3 led to a further reduction of 30%. Moreover, there were multiple instances of gene properties being selected jointly, with

their associations clearly reflecting a single signal but their overlap too strong to be disentangled. The number of distinct signals captured by these significant and retained gene

properties is therefore even lower. This suggests the presence of a great deal of overlap between the associations in the standard GSA, with many of the tested gene properties tapping into a

much smaller subset of shared signals. Moreover, in practice the overlap in associations among different gene properties in particular is even stronger than the reduction in the number of

hits suggests, as shown in Fig. 3a. Looking at the associations of all the gene sets, the effect of conditioning on general confounders is relatively moderate and primarily affects the

strongest associations. However, conditioning on all the significant gene sets retained at the end of the analysis workflow has a much more profound impact. It is most pronounced for PP, for

which almost no marginal association remains, but it strongly affects the other two phenotypes as well (Supplementary Figure 3).

EVIDENCE OF WIDESPREAD GENE-SET INTERACTION

The analyses

show that although not as extensive as for the marginal associations, there is considerable evidence for interactions both between pairs of gene sets (Fig. 3b, Supplementary Figure 4) and

between gene sets and tissue-specific gene expression (Fig. 3c, Supplementary Figure 5). This is also reflected in the individual results for the post hoc interaction analyses, with

significant interactions of both kinds (Tables 3 and 4). It seems unlikely that this is unique to blood pressure phenotypes, which suggests that gene properties probably commonly affect the

phenotypes specifically in combination with other gene properties. It follows that finding these interactions is necessary for gaining a proper insight into the genetics of a phenotype. In

the post hoc analyses, by definition one of the gene properties had a marginal association strong enough to be detected. In some cases this may reflect a genuine main effect, but this can

also happen when there is only a strong interaction. An example of this is the cell proliferation gene set, for which the marginal association can be entirely explained by two interactions

(see below). For the majority of interactions found in the post hoc analyses, the second gene property also shows little or no evidence of any marginal association. The involvement of those

gene properties would therefore be very difficult to detect in a normal GSA. The exploratory interaction results point to the same conclusion: many of the gene properties involved

in the top interactions again show little evidence of marginal association (Supplementary Table 4). It is also clear that such weak marginal associations can hide very strong effects. For the

interactions between tissue expression and gene sets, the _p_-values of the subset of top 25% expressed genes are often very low. Similarly, for the top interactions found in the exploratory

analysis, three of the four negative interactions hide significant main effects of gene properties that are not marginally significant. Although for these the observed marginal associations

were stronger, they were still not strong enough for the GSA in step 1 to detect them. Since negative interactions are relatively prevalent (Fig. 3c), this again suggests that there may be

a considerable amount of association that a normal GSA cannot easily uncover.

VARIABILITY ACROSS GENE-SET DOMAINS

In our results there are considerable differences between the Gene Ontology

and miRNA target gene-set domains, in both the number of significant results (Table 1) and the overall levels of association (Supplementary Figure 6). The majority of the significant results

are found in the Gene Ontology biological process domain, with only a handful of additional associations in the cellular component and molecular function domains. For the miRNA target sets,

no associations are found at all. As Supplementary Figure 6 shows, the miRNA results are not entirely devoid of signal, and the general class of miRNA target genes shows a strong

association for both SBP and PP (Supplementary Table 5). No individual miRNA families emerge from the analysis however, with the three initially significant miRNA target set associations

explained away by the gene expression and general miRNA target gene effects. It may be that the miRNA target sets are too broad and are not involved in the phenotypes as a whole, a

possibility supported by the strong interaction found for miRNA-145 with heart expression (see below). Although the cellular component and molecular function domains do yield some

associations they are dominated by the biological process domain, a feature that is found in the interaction analyses as well (see Tables 3 and 4, Supplementary Table 4). This is in part due

to there being significantly more biological process gene sets to analyse, though for the marginal associations this is compensated by the correspondingly more stringent multiple testing

correction. Moreover, the associations that are found for cellular component and molecular function are not entirely convincing, with almost all of them showing irregularities in their

set-specific QQ-plots (Table 2, Supplementary Figures 7 and 8; see also step 4 of the detailed analysis overview in the Supplementary Methods).

TISSUE EXPRESSION PREDICTS BLOOD PRESSURE ASSOCIATION

Initial analysis of the GTEx gene expression levels shows that overall gene expression is significant for all three phenotypes (Supplementary Table 5), meaning that genes with a

higher average gene expression tend to have stronger genetic associations with the phenotypes as well. This general effect drives the associations found for many of the individual tissues,

with the majority of the associations for these tissues disappearing once the general effect is corrected for (Table 1). Expression in the remaining tissues is still strongly correlated

however, making it difficult to attribute associations to any individual tissue. Conditioning the tissues on each other suggests that there are likely at most three distinct clusters of

association (see Table 2). The first and strongest is in the arterial expression levels, present in both DBP and PP. This arterial association explains a large proportion of the other tissue

associations, but a second cluster of female reproductive organs remains. It is the only association common to all three phenotypes, and manifests most prominently in the uterus expression.

Unique to PP there is also a third association, however, for heart (atrial appendage) expression.

TISSUE-EXPRESSION DEPENDENCY OF GENE-SET ASSOCIATIONS

Post hoc interaction analyses for the

tissue-specific expression show that there is also considerable positive interaction between tissue expression levels and gene sets, with significant interactions for all of the four

analysed tissues and all three phenotypes (Table 3, Fig. 4). The level of interaction association is found to vary across phenotypes and tissues and also seems to be tissue-specific, as

there is little sign of interaction for the overall gene expression level (Fig. 3c). Positive interaction between tissue-specific expression and a gene set represents a scenario where there

is an association specific to the more strongly expressed genes in the gene set. Many of the gene sets involved are quite different in function from those found in the main GSA, and have

generally weak marginal associations. One finding is a set of interactions between uterus-specific expression and three biological processes relating to sexual development for both SBP and

PP, most strongly found for sex differentiation (interaction _p_-values of 5.2 × 10⁻⁶ and 6.3 × 10⁻⁷, respectively). Marginal associations for sex differentiation have _p_-values of only

0.0034 and 0.0052, respectively, but when the subset of more strongly uterus-expressed genes is tested, strong associations emerge (conditional _p_-values of 2.2 × 10⁻⁶ and 4.3 × 10⁻⁷). This

effect is specific to uterus expression, with no sign of interaction for the other analysed tissues. Another novel finding is the interaction between tibial artery expression and miRNA-145

target genes for SBP and PP (interaction _p_-values of 1.6 × 10⁻⁵ and 1.0 × 10⁻⁵). Marginal association is now absent altogether (_p_-values of 0.442 and 0.638), but again the subset of top

expressed genes is highly significant (conditional _p_-values of 1.7 × 10⁻⁷ and 6.2 × 10⁻⁷). There are also several interactions between nucleotide, nucleoside and purine processes, and both

arterial and heart expression, found for all three phenotypes. One other surprising result is the interaction between heart expression and the regulation of blood pressure gene set, highly significant for

both SBP and DBP (interaction _p_-values of 2.6 × 10⁻⁸ and 2.8 × 10⁻⁹). It is also initially significant for PP, but the association is not as strong and does not survive the outlier

correction (Supplementary Table 3). What makes this result surprising is that, in an analysis of blood pressure phenotypes, it only shows up here. It has no marginal associations, nor do any

of its subsets, and also does not interact with artery expression. Yet in conjunction with heart expression its associations are very strong, with conditional _p_-values for the top

expressed subset (1.6 × 10⁻⁹ and 2.0 × 10⁻¹¹, respectively) lower than for any of the marginal gene-set associations.

CARDIOVASCULAR AND MUSCLE CELL INVOLVEMENT

For PP, a number of different

biological processes related to the heart were found to be associated (Table 2). The strongest of these was cardiovascular system development (_p_ = 1.8 × 10⁻⁹) (or circulatory system

development, which is identical), which by itself accounts for much of the association of the other heart-related processes. The association for cardiovascular system development is in turn

partly explained by its two significant interactions (see Table 4), with the nested sets chemical homoeostasis and homoeostatic process (interaction _p_-values of 1.4 × 10⁻⁵ and 2.0 × 10⁻⁵).

These interactions explain part of the marginal cardiovascular system development association (main effect _p_-values of 0.00067 and 0.00098 in the interaction model), but enough of it

remains to suggest that its joint effect with the homoeostasis gene sets is important but is not the whole story of its role in blood pressure genetics. There is also evidence for a related

involvement of muscle cell processes, with cardiocyte differentiation significant for both SBP and PP (_p_-values of 6.3 × 10⁻⁹ and 9.5 × 10⁻⁹) and another association for the nested pair of

sets negative regulation of smooth muscle cell proliferation and regulation of smooth muscle cell proliferation for PP (_p_-values of 6.2 × 10⁻⁷ and 1.0 × 10⁻⁷).

ROLE OF CELL PROLIFERATION AND INTRACELLULAR REGULATION

Another strong association is found for the cell proliferation set for PP (_p_ = 1.6 × 10⁻⁸). Although this set overlaps with the two SMC proliferation sets, it

is much larger and represents an independent additional signal. This signal can be traced to a pair of largely independent interactions of cell proliferation (see Table 4), with the

biological processes regulation of intracellular transport and regulation of intracellular signal transduction (interaction _p_-values of 7.6 × 10⁻⁶ and 0.00020). Although similar in their

function, these two interactions do not strongly overlap. Jointly they do account for almost all of the marginal association of cell proliferation, with its main effect _p_-value reduced to

0.015 when conditioning on both interactions simultaneously.

DISCUSSION

The development of the analysis workflow presented in this paper was motivated by the problem of correlated gene

properties, and the confounding and the multiplicity of redundant overlapping associations that could result from this. The results from the blood pressure analyses show that this can indeed

present a serious problem in practice. General confounding factors, here primarily the involvement of overall and tissue-specific expression, are shown capable of inducing significant

associations in a large number of gene properties. Those gene properties subsequently also overlap with and confound each other, with a subset of the significant gene properties accounting

for the associations of the rest as well as for large amounts of sub-significant associations in the other gene properties. Correcting for these issues drastically reduces the number of

gene-property associations, which implies that traditional GSA lacking such corrections is liable to yield large numbers of associations which are likely not biologically relevant to the

phenotype. Conclusions drawn from such analyses are therefore at considerable risk of being incorrect, and potentially very misleading. These same issues most likely affect other, similar

types of analysis as well, such as network analysis or SNP-set analysis. Our extension to interaction GSA opens up new avenues of analysis. Results for the blood pressure phenotypes suggest

that there may be numerous signals in the annotation that a standard GSA cannot reliably detect, if it can detect them at all. This is perhaps best exemplified by the regulation of blood

pressure gene set. Based on its marginal associations there is little evidence that it is involved in blood pressure genetics, and it would not have been found in a traditional GSA. Yet it has

a very strong interaction with heart-specific expression for both SBP and DBP, and the subset of top expressed genes in the set is highly associated. This same pattern is found for many of

the tissue expression by gene-set interactions, with many of those gene sets having entirely unremarkable marginal _p_-values. The same is suggested by the exploratory interaction analysis,

with negative interactions in particular seen to mask strong associations. Taken together, our results thus show that a traditional GSA is doubly vulnerable. Firstly, due to confounding many

marginal associations are likely to be found that are biologically irrelevant, or the byproduct of more specific interactions. This can lead to potentially very misleading conclusions, and

wasted effort trying to follow them up. Secondly, many gene properties may only affect the phenotype in combination with other gene properties, rather than on their own. Marginal

associations for such gene properties will often be weak or absent altogether, and therefore unlikely to be found in traditional GSA. Our extended GSA approach can address these issues,

pruning away many likely irrelevant associations through conditional analysis and detecting novel additional or more refined signals with the interaction model. Aside from demonstrating the

utility of our proposed analysis workflow, our analyses also provide a variety of insights into the genetics of blood pressure, and many of the individual associations fit well with the

existing blood pressure literature. The tissue expression analyses detected associations for several cardiovascular tissues, which are highly adapted to blood pressure fluctuations. In the

cellular component domain constituents of the (sarcomeric) cytoskeleton, including actin and T-tubules, were identified18, and the majority of detected biological processes are involved in

blood vessel and heart formation. These include cardiovascular and circulatory system development, cardiocyte differentiation, SMC regulation and (cardiac) mesenchyme development. The

interaction analyses provided further detail for these associations. Expression in the heart (atrial appendage) interacted strongly with the regulation of blood pressure gene set for both

SBP and DBP, which possibly reflects the role of atrial natriuretic peptide in the homoeostasis of sodium and water retention19. This is supported by the interactions of cardiovascular

system development with homoeostatic processes for SBP and PP. Heart expression also interacted with cellular response to nitrogen compound for PP, which fits the known natriuretic

peptide–nitric oxide pathway and guanylate cyclase signalling systems that are targeted by nitroglycerin20. Artery tissues were found to exhibit interactions with nucleoside phosphate and

purine-containing compound biosynthetic process for SBP, DBP and PP. Nucleosides and purines are not only constituents of RNA and DNA but are also involved in metabolic processes such as

signal transduction and regulation of enzyme activity21. This therefore aligns with the interactions found between cell proliferation and regulation of intracellular transport and signal

transduction for PP, supporting the role of purinergic signalling in the proliferation of vascular smooth muscle and endothelial cells22. Further evidence for a role of signal transduction

was found in the associations of nitric oxide and cGMP for DBP. Nitric oxide is an important signalling molecule that regulates vascular tone by acting as a vasodilator via the cGMP

signalling cascade and intracellular Ca²⁺ levels23,24. Also found for DBP was reactive oxygen species biosynthesis, which has been implicated in cardiovascular disease including

hypertension25. The miRNA target genes, which regulate various physiological and pathophysiological processes at a post-transcriptional level26, were associated for all three blood pressure

phenotypes. Although none of the individual miRNA target sets was significant, an interaction was found between tibial artery expression and the miRNA-145 target set. This interaction can be

explained by the influence of miRNA-145 on differentiation27 and phenotype switching of vascular SMCs28,29, and the upregulation of miRNA-145 in endothelial cells in response to shear

stress and hypertension30. No associations were found for the kidney cortex or the adrenal gland in the tissue expression analysis, which is surprising considering the regulatory role of the

renin–angiotensin–aldosterone system on blood volume and systemic vascular resistance31 and known associations of variants in renal sodium regulatory genes with blood pressure32. One possible

explanation is that the available kidney cortex expression is too general. It has been shown that unique and highly distinctive patterns of gene expression exist for glomeruli, cortex,

medulla, papillary tips and pelvic tissue33, and associations with blood pressure genetics may only exist in such more specific tissues. Regulation of urine volume was also found to be

associated with both SBP and DBP, which supports the hypothesis that kidney involvement may be quite specific. Also notable were the associations of several female reproductive organ

tissues, most prominently the uterus, for all three phenotypes. This may point to the involvement of an underlying hormonal pathway, correlated with ovarian expression. Such a pathway could

reflect the known protective effects of oestrogens on cardiovascular disease and hypertension34. Alternatively, expression in these tissues may serve as a proxy for placental expression,

which is not available in the GTEx data. The placenta has been shown to play a role in blood pressure regulation during pregnancy35, and placental functioning is directly related to fetal

growth, which has been linked to the development of hypertension in the adult life of the child36,37. The application of traditional GSA has previously led to novel biological hypotheses on

human physiology and the pathophysiology of disease, and the GSA presented in this paper improves on that promise for blood pressure phenotypes. Our results, filtered and refined using the

extended analysis workflow, suggest a variety of possible avenues by which the role of genetics in blood pressure may be explained. Exploring these avenues could advance our understanding of

blood pressure and the identification of therapeutic targets for cardiovascular disease, and our extended analysis can be used generally to provide the same for other phenotypes as well.

METHODS

CORE GSA FRAMEWORK

We use GSA implemented in MAGMA (v1.07), a detailed description of which can be found in De Leeuw et al.5. Briefly, the model is based on a linear regression

framework with genes as data points, with the regression equation \(Z = \beta_0 + B\beta_B + S\beta_S + \varepsilon\), with \(\varepsilon \sim \mathrm{MVN}(0, \sigma_e^2 \hat{\Sigma})\).
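To make the structure of this model concrete, the following minimal sketch fits a regression of the same form to simulated data, with _Z_ the gene-level _Z_-scores, _B_ a single toy covariate and _S_ a binary gene-set indicator. It is not MAGMA's implementation: it assumes independent genes (the gene–gene correlation matrix is taken to be the identity), uses a normal approximation for the test statistic, and the helper names (`p_to_z`, `solve`, `ols`), set size and effect size are hypothetical choices made for illustration only.

```python
import random
from statistics import NormalDist

NORM = NormalDist()  # standard normal distribution

def p_to_z(p):
    """Probit transform of a gene p-value: small p gives a large positive Z."""
    return NORM.inv_cdf(1.0 - p)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares: returns coefficients and their standard errors."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append((sigma2 * solve(xtx, unit)[i]) ** 0.5)  # i-th diagonal of (X'X)^-1
    return beta, se

# Toy data: 2000 genes, the first 100 form the gene set (hypothetical values).
random.seed(1)
n_genes, n_in_set = 2000, 100
S = [1.0 if g < n_in_set else 0.0 for g in range(n_genes)]
B = [random.gauss(0.0, 1.0) for _ in range(n_genes)]  # stand-in technical covariate
p_gene = [1.0 - NORM.cdf(random.gauss(0.5 * s, 1.0)) for s in S]  # simulated gene p-values
Z = [p_to_z(p) for p in p_gene]

# Fit Z = beta_0 + B*beta_B + S*beta_S + error and test beta_S > 0 (one-sided).
beta, se = ols([[1.0, b, s] for b, s in zip(B, S)], Z)
beta_S = beta[2]
p_one_sided = 1.0 - NORM.cdf(beta_S / se[2])
print(f"beta_S = {beta_S:.3f}, one-sided p = {p_one_sided:.2e}")
```

Because every gene is one data point and the comparison is between genes inside and outside the set, a positive coefficient for _S_ here corresponds to set genes having higher _Z_-scores on average than the remaining genes, after adjusting for the covariate.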

Gene _p_-values \(P_g\) are first computed from the SNP data for each gene _g_. These are transformed to _Z_-scores, \(Z_g = {\mathrm{\Phi }}^{-1}( {1 - P_g})\) with \({\mathrm{\Phi }}^{-1}\) the probit function, such

that higher \(Z_g\) correspond to stronger genetic associations with the phenotype. The gene set is encoded in the variable _S_, with \(S_g = 1\) if gene _g_ is in the gene set and \(S_g = 0\)

otherwise. Linkage disequilibrium (LD) between genes is quantified in the gene–gene correlation matrix \({\hat{\mathrm \Sigma }}\), which is scaled by the variance \(\sigma _e^2\) to model

the residuals. Several common technical confounders are included as covariates, represented by the matrix _B_ in the regression. These are: the number of variants in each gene, an estimate

of the LD within each gene, the inverse of the mean minor allele count of variants in each gene, and the sample size on which each gene _p_-value is based. For each of these variables, the

log transformation of the variable is also included as a covariate. A one-sided test is performed on the coefficient \(\beta_S\), testing the null hypothesis \(\beta_S = 0\) against the alternative

\(\beta_S > 0\), that is, testing whether the genes in the gene set are more strongly associated with the phenotype than other genes. This constitutes a competitive test (see De Leeuw et al.1 for a

discussion on key differences with self-contained GSA). The model can also analyse non-binary gene properties, such as gene expression. In this case _S_ is a continuous variable, and the

coefficient \(\beta_S\) reflects the degree to which the genetic association of a gene changes as the value for the tested variable increases. By default, a two-sided test is performed on \(\beta_S\)

when analysing continuous gene properties since, in contrast to gene sets, negative associations may be informative as well. Throughout the text, we use ‘gene property’ to refer to any type

of gene-level variable, and ‘gene set’ to refer specifically to a binary gene property. CONDITIONAL AND INTERACTION GSA MODEL Conditional and interaction GSA is implemented by generalising

the core regression framework. For conditional analysis a matrix of additional covariates _C_ is included in the model, to obtain \(Z = \beta_0 + B\beta_B + C\beta_C + S\beta_S + \varepsilon\). The \(\beta_S\) now

reflects the conditional effect of _S_ on the genetic association _Z_, corrected for the effects that the covariates in _C_ have on _Z_. For the interaction GSA an interaction term \(S_{12}\) is

defined as the product of two gene properties \(S_1\) and \(S_2\), with \(S_{12_g} = S_{1_g} \times S_{2_g}\). Then \(S_{12}\) is tested conditional on \(S_1\) and \(S_2\) to determine if there is any

interaction between them, in the model \(Z = \beta_0 + B\beta_B + S_1\beta_1 + S_2\beta_2 + S_{12}\beta_{12} + \varepsilon\). The test can be either two-sided or one-sided in either direction. An interaction of this type means

that genes that have high values for both gene properties are more strongly (or weakly, if \(\beta_{12}\) is negative) associated with the phenotype than genes that have high values for only one of

the two. This suggests a specific role for that combination of properties. This role may be limited to that combination, but can also be in addition to significant main effects (\(\beta_1\) and

\(\beta_2\)) of the gene properties. For pairs of gene sets, \(S_{12}\) simply corresponds to the set of genes included in both gene sets. The conditional and interaction GSA models are implemented in

MAGMA as part of the GSA framework, and can be used with any of the gene analysis models available in MAGMA. They can therefore be applied to raw genotype data as well as to SNP summary

statistics from any type of single variant analysis. ANALYSIS WORKFLOW The extended GSA workflow consists of six analysis steps (Fig. 2). An initial GSA is first performed to select

significant gene properties, and the subsequent steps are then used to provide further information on their associations. This is then used to aid interpretation of the results, and to

discard likely irrelevant gene properties from consideration. It can also flag some gene properties as requiring further analysis and data before interpreting them, if the evidence for their

biological relevance to the phenotype is ambivalent. The initial GSA results are thus progressively refined, improving the reliability of the conclusions that can be drawn. An overview of

the six steps is provided here. An extensive guideline on performing the analyses and interpreting the results can be found in the Supplementary Methods. The first step of the analysis

workflow is a standard MAGMA GSA (with only the automatic correction for technical confounders). Only gene properties significant in this GSA are directly evaluated in the subsequent steps

(except step 6). In the second step, the significant gene properties are conditioned on likely confounders, and the impact those confounders have on their associations is assessed. Gene

properties that are no longer significant at the significance threshold used in step 1 are then discarded. In step 3, remaining significant gene properties are conditioned on each other.
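Conditioning of this kind, as used in steps 2 and 3, can be illustrated with a small simulation. The sketch below is a simplified stand-in for the MAGMA model: it uses ordinary least squares with independent residuals and omits the technical covariates, and the names (`one_sided_p`, `expression`, `gene_set`) and all simulated values are hypothetical. The gene set is constructed so that it is associated with the gene _Z_-scores only through a confounding property.

```python
import numpy as np
from scipy import stats


def one_sided_p(z, tested, covariates=None):
    """OLS p-value for the alternative that the coefficient of `tested` is > 0,
    optionally conditioning on `covariates` (one column per covariate)."""
    n = len(z)
    cols = [np.ones(n)]
    if covariates is not None:
        cols.append(np.asarray(covariates, dtype=float).reshape(n, -1))
    cols.append(tested)                        # tested property goes last
    x = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(x, z, rcond=None)
    resid = z - x @ beta
    df = n - x.shape[1]
    se = np.sqrt(resid @ resid / df * np.linalg.inv(x.T @ x)[-1, -1])
    return stats.t.sf(beta[-1] / se, df)


# Hypothetical confounding: Z depends on expression only, and the gene set
# is enriched for highly expressed genes.
rng = np.random.default_rng(0)
n_genes = 2000
expression = rng.normal(size=n_genes)
gene_set = (expression + rng.normal(size=n_genes) > 1.0).astype(float)
z = 0.5 * expression + rng.normal(size=n_genes)

p_marginal = one_sided_p(z, gene_set)                             # as in step 1
p_conditional = one_sided_p(z, gene_set, covariates=expression)   # as in step 2
print(f"marginal p = {p_marginal:.2e}, conditional p = {p_conditional:.2e}")
```

In this setup the marginal association of the gene set is driven entirely by its correlation with expression, so it is expected to weaken sharply once expression is included as a covariate. Under the step 2 rule such a gene set would be discarded if its conditional _p_-value no longer passed the step 1 threshold.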

This helps determine the extent to which their associations overlap, and to identify which of those associations are most likely relevant for the phenotype. Gene properties are selected in a

stepwise fashion on the strength of their associations and the way those associations overlap with each other. In each selection step, gene properties are conditioned on both the gene

properties already selected and the general confounders from the second analysis step. Gene properties for which the association is largely or wholly explained by other gene properties are

discarded; gene properties which are found to share a single underlying association that cannot be disentangled are selected and interpreted jointly. The fourth step applies only to gene

sets, and checks for outliers and signs of confounding effects not detected in the previous steps. For each gene set QQ-plots of the residual _Z_-scores of genes in the set are created,

adding a confidence band to visualise the degree of deviation expected by chance. These are inspected for signs that the association of the gene set may be driven by a smaller subset of

genes in the set, indicating possible confounding. If not uncovered in the post hoc interaction analyses, the source of confounding could then be investigated further using targeted analyses

with additional data or annotation. If the likely associated subset is very small the problem is likely one of outliers instead, and the gene set can be discarded altogether. In the fifth

step, interaction analyses are performed for all the remaining significant gene properties. This can narrow down the significant associations to more specific effects that occur only in

combination with other gene properties. Positive interactions are tested with all other available gene properties; for interactions between gene sets, this is restricted to pairs of gene

sets for which the overlap between the sets is not too large or small, as otherwise the interaction term is not meaningfully defined. In the optional sixth step, an exploratory interaction

analysis is performed in order to detect additional interactions. An initial list of gene properties is generated based on their marginal associations, and interactions with all other gene

properties are tested for this list as in step 5. A liberal selection criterion such as FDR-controlled significance is recommended for creating the initial list. In contrast to step 5,

two-sided tests are performed for the interactions. This allows for the detection of negative interactions, which would point to involvement in the phenotype of a particular gene property

only in the absence of another gene property. This step is independent of the previous steps, and therefore requires separate multiple testing correction. GENOTYPE AND PHENOTYPE DATA Primary

quality control and imputation of the UK Biobank (July 2017 release) data was performed by UK Biobank itself7. We applied additional QC and filtering of variants and individuals to obtain a

sample of independent individuals of European ancestry, containing hard-called genotypes with MAF greater than 0.000001 and missingness of at most 5%. Since poorly imputed SNPs can bias the

results, only variants of high imputation quality (info score of at least 0.9, variants imputed on HRC panel only) were included in the analysis. Full details on the data and QC can be found

in the Supplementary Methods. The processed data set used for the blood pressure analyses contained 360,243 individuals and 13,923,638 autosomal variants. In our analyses, three phenotypes

were analysed: SBP, DBP and PP. SBP and DBP were corrected for use of blood pressure-lowering medication, adding 15 and 10 mm Hg respectively to the measured values for individuals known to

use such medication38. PP was computed as PP = SBP−DBP. Thirty principal components were included as covariates to correct for population structure in the data, computed using FlashPCA39.
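The phenotype preparation described above amounts to a few lines of arithmetic. The following Python sketch is only illustrative: the function name `prepare_bp_phenotypes` and its arguments are hypothetical, the medication offsets are passed in explicitly rather than hard-coded, and computing PP from the medication-adjusted values is an assumption, since the text does not state whether adjusted or raw measurements were used.

```python
import numpy as np


def prepare_bp_phenotypes(sbp, dbp, on_medication, sbp_offset, dbp_offset):
    """Add a fixed offset (mm Hg) to SBP and DBP for individuals on blood
    pressure-lowering medication, then derive pulse pressure (PP)."""
    sbp = np.asarray(sbp, dtype=float)
    dbp = np.asarray(dbp, dtype=float)
    treated = np.asarray(on_medication, dtype=bool)
    sbp_adj = sbp + sbp_offset * treated   # offset applied only where treated is True
    dbp_adj = dbp + dbp_offset * treated
    pp = sbp_adj - dbp_adj                 # assumption: PP from the adjusted values
    return sbp_adj, dbp_adj, pp
```

The three returned arrays correspond to the three phenotypes analysed; covariate correction would be applied to them afterwards.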

Other covariates included in the analysis were sex, age, age squared, BMI, Townsend Deprivation index, and genotyping array indicator. To further validate the results from the UK Biobank

analysis, a replication analysis was performed using the 2011 ICBP GWAS data40. Details for this replication analysis can be found in the Supplementary Methods. ANNOTATION Variants were

annotated to genes based on NCBI (37.3) gene definitions41, mapping variants to a gene if they were located in the transcription region of that gene, or within two kilobases upstream or one

kilobase downstream of the transcription region. A total of 18,285 autosomal protein-coding genes had at least one variant mapped to them, and 43.7% of the variants in the data mapped to at

least one gene. Variants not mapped to any gene were not used in the analysis. Gene annotation from five different domains was used in the analysis: tissue-specific gene expression data,

three Gene Ontology domains, and miRNA target sets. Gene Ontology and miRNA target gene sets were obtained from MSigDB (v6.0)8. For the miRNA target sets, an additional gene set of all genes

contained in at least one of the target sets was created, reflecting general miRNA target status. GTEx (v7)9 was used for the gene expression data. Mean RPKM values were computed per

gene and tissue. These were capped at 50, incremented by one, then log-transformed to obtain a per-tissue expression score. Average scores across tissues were computed as a measure

of the overall expression level of each gene. Ensembl gene IDs were mapped to Entrez IDs for the genes in the data, resulting in expression scores for 17,064 genes in the data. SIMULATION

STUDY A random subsample of 10,000 individuals was taken from the UK Biobank data, removing variants with MAF smaller than 1% and variants not mapped to any gene. Continuous phenotypes were

simulated for this data by constructing a genetic component and adding normally distributed noise such that the genetic component explained 10% of the phenotypic variance. The genetic

components were created by designating 1000 genes as causal, then selecting a subset of SNPs from each of these genes as effect SNPs and combining them (see Supplementary Methods for full

details). Simulated phenotypes were analysed in PLINK 1.9 (ref. 42) to obtain SNP _p_-values. Ten genetic components were constructed (designating new causal genes and SNPs), with 100

replicates for each. Multiple phenotypes with new random noise were generated for each replicate, using meta-analysis on the SNP _p_-values to obtain GWAS results representing sample sizes

of 10,000, 50,000, and 100,000. Pairs of overlapping gene sets were then constructed, containing different patterns and proportions of causal genes. In each condition an initial gene set was

created containing a specified proportion of causal genes. Another gene set was then created overlapping with it, as either a subset, a superset, or partially overlapping set. Genes in the

overlap were randomly selected from the initial gene set, with the rest randomly sampled from the remaining genes. For evaluation of the interaction model, only partial overlap conditions

were used. Additional parameters that were varied across conditions were the gene set sizes, the degree of overlap, and the level of association assigned to the initial gene set. For the

interaction model, the level of main effect association assigned to the second gene set was also varied. A full description of the simulation settings and results is given in the

Supplementary Methods. In each condition, ten gene sets overlapping with the initial set were created. For the conditional model simulations the marginal association and association

conditional on the initial set were tested. For the interaction model, the interaction term was tested either as a gene set by itself or using the interaction model. Results were aggregated

per condition over the ten sets and the 1000 GWAS replicates, computing type 1 error rates at different significance thresholds. PRIMARY GSA Analyses were performed using MAGMA (v1.07)5.

Phenotypes were first regressed on the covariates, using the resulting residuals as input for the MAGMA gene analysis. The SNP-wise (multi) model was used for the gene analysis. This model

combines the SNP-wise (mean) model (more sensitive to many smaller SNP associations in a gene) and the SNP-wise (top) model (more sensitive to a single large SNP association in the gene) to

obtain a good distribution of power over different genetic architectures. This model is recommended when the number of SNPs in a data set is very large, as the SNP-wise (mean) and PC

regression models are less sensitive to detecting gene associations when a single strong SNP effect is present in a gene containing many other SNPs. To deal with rare variants, per gene SNPs

with a minor allele count smaller than 100 were aggregated into a weighted burden score. This was then included in the model in the same way as normal SNPs, replacing the rare variants. At

most 25 SNPs were used per burden score. For genes with more than 25 rare variants, multiple burden scores were created. All GSA was performed using this gene analysis output. Bonferroni

correction was used to correct for multiple testing, separately for each phenotype. It was also applied separately for each domain, corrected for the number of domains, for a significance

threshold of \(\alpha _D = \frac{{0.05}}{{5 \times K_D}} = \frac{{0.01}}{{K_D}}\) per domain _D_ with _K__D_ the number of tests for that domain. In all the analyses one-sided tests were

used, testing for positive associations. CONDITIONAL GSA After the initial GSA, analyses were repeated conditioning on potential general confounders. Overall gene expression was included for

all domains. For the four gene set domains, tissue-specific expression for coronary artery, tibial artery, heart (atrial appendage), and uterus were also conditioned on; the miRNA target

set analyses were additionally conditioned on general miRNA target status. For conditional analyses of the gene sets, missing tissue expression values were set to the median expression value

for that tissue. Only gene sets and tissues still significant at the original threshold were retained. Conditional analyses were then performed to evaluate the overlap between the

retained associations. The stepwise procedure was used per domain for the significant and retained gene properties until there were no remaining associations with conditional _p_-values

below 0.05 (see Analysis workflow above and Detailed overview of blood pressure analysis in the Supplementary Methods). For gene sets, associations retained after this selection were then

also conditioned on those from the other domains. All these analyses also included the general confounders as covariates. After this, set-specific QQ-plots were created for all retained gene

sets to inspect them for signs of outliers and hidden confounding. EXPRESSION BY GENE SET INTERACTION ANALYSIS After the conditional analyses, post hoc interaction analyses were performed

for the top tissue expression levels. Genes with no expression values were removed, and interactions were then tested with all gene sets of at least 100 genes. To make the results more

comparable across phenotypes, the same tissues were used for all three phenotypes, testing interactions for coronary and tibial artery, heart (atrial appendage) and uterus, as well as for

overall gene expression. For each tissue, overall expression and its interaction with the tested gene set were included as covariates. For miRNA target sets, general miRNA target status and

its interactions with overall expression and the tissue expression were additionally included. One-sided tests were performed for the interaction terms, testing for positive interactions.
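The structure of the expression by gene set interaction test can be sketched as follows. This is a simplified illustration, not the MAGMA implementation: it assumes ordinary least squares with independent residuals, leaves out the technical covariates and the miRNA-specific terms, and the function name `interaction_p` and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def interaction_p(z, gene_set, tissue_expr, overall_expr):
    """One-sided p-value for a positive gene set x tissue expression interaction,
    conditional on the main effects, overall expression and the
    gene set x overall expression interaction."""
    keep = ~(np.isnan(tissue_expr) | np.isnan(overall_expr))   # drop genes without expression values
    z, s = z[keep], gene_set[keep]
    e, a = tissue_expr[keep], overall_expr[keep]
    x = np.column_stack([np.ones(len(z)), s, a, e, s * a, s * e])  # tested term last
    beta, *_ = np.linalg.lstsq(x, z, rcond=None)
    resid = z - x @ beta
    df = len(z) - x.shape[1]
    se = np.sqrt(resid @ resid / df * np.linalg.inv(x.T @ x)[-1, -1])
    return stats.t.sf(beta[-1] / se, df)


# Hypothetical data: expression in one tissue matters only for genes in the set.
rng = np.random.default_rng(2)
n_genes = 3000
overall = rng.normal(size=n_genes)
tissue = 0.5 * overall + rng.normal(size=n_genes)
gene_set = (rng.random(n_genes) < 0.2).astype(float)
z = 0.5 * gene_set * tissue + rng.normal(size=n_genes)
tissue[:10] = np.nan                        # a few genes lacking expression values

p_interaction = interaction_p(z, gene_set, tissue, overall)
print(f"one-sided interaction p = {p_interaction:.2e}")
```

Placing the product term last in the design matrix means its coefficient is estimated conditional on every other term, which is what distinguishes the interaction test from simply testing the highly expressed part of the gene set on its own.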

Bonferroni correction was performed per tissue, correcting for the 1495 interactions tested per tissue. To check for outliers, scatterplots of residual tissue expression (corrected for the

overall expression) by gene _Z_-scores were created for all significant interactions. Each plot only used genes in the set, and both variables were normalised within those genes. Genes were

marked as an outlier if they were more than two standard deviations from the origin and all genes within two standard deviations were either further from the origin or themselves marked as

outlier. The analysis was then repeated with the marked outliers removed from the gene set. A gene set was also constructed of the top 25% residually expressed genes in the gene set

(excluding outliers), which was then tested conditional on the whole gene set. Interactions for which neither follow-up test was significant were discarded. GENE SET BY GENE SET

INTERACTION ANALYSIS Post hoc interaction analyses were also performed for all significant and retained gene sets, testing interactions with other gene sets. Interactions were only tested

for gene-set pairs if there was meaningful overlap between the gene sets: for each set in the pair, the overlap with the other gene set as well as the part not overlapping with the other

gene set was required to be at least 20 genes, and at least 10% of the genes in the gene set. One-sided tests for positive interactions were performed, conditioning on the general

confounders. Bonferroni correction was applied separately for each of the significant and retained gene sets, correcting for the number of interactions tested for that gene set. An

exploratory interaction analysis was also performed. Gene sets were selected using FDR correction (Benjamini–Hochberg, at _α_ = 0.05), separately for each of the four gene-set domains. For

each of these gene sets, interactions were tested with all other gene sets for which there was meaningful overlap, using the same criteria as in the post hoc interaction analysis. Two-sided

tests were performed on the interactions, conditioning on the general confounders. Bonferroni correction was applied for the total number of interactions tested. CODE AVAILABILITY The MAGMA

analysis software can be obtained for Linux, Windows and Mac platforms from http://ctg.cncr.nl/software/magma. DATA AVAILABILITY The raw genotype and phenotype data analysed in this study

were used under license from UK Biobank (http://www.ukbiobank.ac.uk), and restrictions apply to their availability. However, the data are available from the authors upon reasonable request, if

permission is given by UK Biobank. REFERENCES

1. De Leeuw, C. A., Neale, B. M., Heskes, T. & Posthuma, D. The statistical properties of gene-set analysis. _Nat. Rev. Genet._ 17, 353–364 (2016).
2. Wang, K., Li, M. & Hakonarson, H. Analysing biological pathways in genome-wide association studies. _Nat. Rev. Genet._ 11, 843–854 (2010).
3. The Gene Ontology Consortium. Gene Ontology Consortium: going forward. _Nucleic Acids Res._ 43, D1049–D1056 (2015).
4. McCurley, A. et al. Direct regulation of blood pressure by smooth muscle cell mineralocorticoid receptors. _Nat. Med._ 18, 1429–1433 (2012).
5. de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. _PLoS Comput. Biol._ 11, e1004219 (2015).
6. Lee, P. H., O’Dushlaine, C., Thomas, B. & Purcell, S. M. INRICH: Interval-based enrichment analysis for genome-wide association studies. _Bioinformatics_ 28, 1797–1799 (2012).
7. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. _PLoS Med._ 12, e1001779 (2015).
8. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. _Proc. Natl Acad. Sci. USA_ 102, 15545–15550 (2005).
9. GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. _Science_ 348, 648–660 (2015).
10. Surendran, P. et al. Trans-ancestry meta-analyses identify rare and common variants associated with blood pressure and hypertension. _Nat. Genet._ 48, 1151–1161 (2016).
11. Rapsomaniki, E. et al. Blood pressure and incidence of twelve cardiovascular diseases: lifetime risks, healthy life-years lost, and age-specific associations in 1.25 million people. _Lancet_ 383, 1899–1911 (2014).
12. Levy, D. et al. Framingham Heart Study 100K Project: genome-wide associations for blood pressure and arterial stiffness. _BMC Med. Genet._ 8(Suppl. 1), S3 (2007).
13. Ehret, G. B. et al. The genetics of blood pressure regulation and its target organs from association studies in 342,415 individuals. _Nat. Genet._ 48, 1171–1184 (2016).
14. Liu, C. et al. Meta-analysis identifies common and rare variants influencing blood pressure and overlapping with metabolic trait loci. _Nat. Genet._ 48, 1162–1170 (2016).
15. Hoffmann, T. J. et al. Genome-wide association analyses using electronic health records identify new loci influencing blood pressure variation. _Nat. Genet._ 49, 54–64 (2017).
16. Warren, H. R. et al. Genome-wide association analysis identifies novel blood pressure loci and offers biological insights into cardiovascular risk. _Nat. Genet._ 49, 403–415 (2017).
17. Kraja, A. T. et al. New blood pressure-associated loci identified in meta-analyses of 475 000 individuals. _Circ. Cardiovasc. Genet._ 10, e001778 (2017).
18. Gautel, M. & Djinović-Carugo, K. The sarcomeric cytoskeleton: from molecules to motion. _J. Exp. Biol._ 219, 135–145 (2016).
19. Atlas, S. A. & Laragh, J. H. Atrial natriuretic peptide: a new factor in hormonal control of blood pressure and electrolyte homeostasis. _Annu. Rev. Med._ 37, 397–414 (1986).
20. Murad, F. Shattuck Lecture: nitric oxide and cyclic GMP in cell signaling and drug development. _N. Engl. J. Med._ 355, 2003–2011 (2006).
21. Yegutkin, G. G. Nucleotide- and nucleoside-converting ectoenzymes: important modulators of purinergic signalling cascade. _Biochim. Biophys. Acta_ 1783, 673–694 (2008).
22. Burnstock, G. Purinergic signaling and vascular cell proliferation and death. _Arterioscler. Thromb. Vasc. Biol._ 22, 364–373 (2002).
23. Moncada, S., Palmer, R. M. & Higgs, E. A. Nitric oxide: physiology, pathophysiology, and pharmacology. _Pharmacol. Rev._ 43, 109–142 (1991).
24. Hughes, A. D. Calcium channels in vascular smooth muscle cells. _J. Vasc. Res._ 32, 353–370 (1995).
25. Touyz, R. M. & Briones, A. M. Reactive oxygen species and vascular biology: implications in human hypertension. _Hypertens. Res._ 34, 5–14 (2011).
26. Kozomara, A. & Griffiths-Jones, S. miRBase: annotating high confidence microRNAs using deep sequencing data. _Nucleic Acids Res._ 42, D68–D73 (2014).
27. Wang, Y. S. et al. Role of miR-145 in cardiac myofibroblast differentiation. _J. Mol. Cell Cardiol._ 66, 94–105 (2014).
28. Rangrez, A. Y., Massy, Z. A., Metzinger-Le Meuth, V. & Metzinger, L. miR-143 and miR-145: molecular keys to switch the phenotype of vascular smooth muscle cells. _Circ. Cardiovasc. Genet._ 4, 197–205 (2011).
29. Zhang, Y. N. et al. Phenotypic switching of vascular smooth muscle cells in the ‘normal region’ of aorta from atherosclerosis patients is regulated by miR-145. _J. Cell. Mol. Med._ 20, 1049–1061 (2016).
30. Hergenreider, E. et al. Atheroprotective communication between endothelial cells and smooth muscle cells through miRNAs. _Nat. Cell Biol._ 14, 249–256 (2012).
31. Paul, M., Poyan Mehr, A. & Kreutz, R. Physiology of local renin-angiotensin systems. _Physiol. Rev._ 86, 747–803 (2006).
32. Tobin, M. D. et al. Common variants in genes underlying monogenic hypertension and hypotension and blood pressure in the general population. _Hypertension_ 51, 1658–1664 (2008).
33. Higgins, J. P. T. et al. Gene expression in the normal adult human kidney assessed by complementary DNA microarray. _Mol. Biol. Cell_ 15, 649–656 (2004).
34. Ashraf, M. S. & Vongpatanasin, W. Estrogen and hypertension. _Curr. Hypertens. Rep._ 8, 368–376 (2006).
35. Granger, J. P., Alexander, B. T., Llinas, M. T., Bennett, W. A. & Khalil, R. A. Pathophysiology of hypertension during preeclampsia linking placental ischemia with endothelial dysfunction. _Hypertension_ 38, 718–722 (2001).
36. Alexander, B. T. Placental insufficiency leads to development of hypertension in growth-restricted offspring. _Hypertension_ 41, 457–462 (2003).
37. Alexander, B. T. Fetal programming of hypertension. _Am. J. Physiol. Regul. Integr. Comp. Physiol._ 290, R1–R10 (2006).
38. Tobin, M. D., Sheehan, N. A., Scurrah, K. J. & Burton, P. R. Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. _Stat. Med._ 24, 2911–2935 (2005).
39. Abraham, G. & Inouye, M. Fast principal component analysis of large-scale genome-wide data. _PLoS ONE_ 9, e93766 (2014).
40. Wain et al. Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure. _Nat. Genet._ 43, 1005–1011 (2011).
41. NCBI Resource Coordinators. Database resources of the national center for biotechnology information. _Nucleic Acids Res._ 45, D12–D17 (2017).
42. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. _Gigascience_ 4, 7 (2015).

ACKNOWLEDGEMENTS This work was funded by

The Netherlands Organization for Scientific Research (NWO VICI 453-14-005, 645-000-003). The analyses were carried out on the Genetic Cluster Computer, which is financed by the Netherlands

Scientific Organization (NWO: 480-05-003), by the VU University, Amsterdam, The Netherlands, and by the Dutch Brain Foundation, and is hosted by the Dutch National Computing and Networking

Services SurfSARA. This research has been conducted using the UK Biobank Resource under project 16406. We thank the participants and researchers who collected and contributed to the data.

AUTHOR INFORMATION AUTHORS AND AFFILIATIONS * Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam,

Amsterdam, 1081 HV, The Netherlands Christiaan A. de Leeuw, Sven Stringer & Danielle Posthuma * Department of Radiology, Leiden University Medical Center, Leiden, 2333 ZA, The

Netherlands Ilona A. Dekkers * Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, 6525 EC, The Netherlands Tom Heskes * Department of Clinical Genetics,

Amsterdam Neuroscience, VU University Medical Center, Amsterdam, 1007 MB, The Netherlands Danielle Posthuma CONTRIBUTIONS C.dL., T.H. and D.P. conceived of the study. C.dL. developed the

statistical method and performed the

analyses. S.S. prepared the UK Biobank data for analysis. C.dL., I.A.D. and D.P. wrote the paper. All authors discussed the results and commented on the paper. CORRESPONDING AUTHORS

Correspondence to Christiaan A. de Leeuw or Danielle Posthuma. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL INFORMATION PUBLISHER'S

NOTE: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. ELECTRONIC SUPPLEMENTARY MATERIAL SUPPLEMENTARY INFORMATION PEER

REVIEW FILE DESCRIPTION OF ADDITIONAL SUPPLEMENTARY FILES SUPPLEMENTARY DATA 1 RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0

International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the

source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative

Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by

statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

http://creativecommons.org/licenses/by/4.0/. ABOUT THIS ARTICLE CITE THIS ARTICLE de Leeuw, C.A., Stringer, S., Dekkers, I.A. _et al._ Conditional and interaction

gene-set analysis reveals novel functional pathways for blood pressure. _Nat Commun_ 9, 3768 (2018). https://doi.org/10.1038/s41467-018-06022-6 * Received: 06 October 2017

* Accepted: 31 July 2018 * Published: 14 September 2018 * DOI: https://doi.org/10.1038/s41467-018-06022-6