Conditional and interaction gene-set analysis reveals novel functional pathways for blood pressure

Conditional and interaction gene-set analysis reveals novel functional pathways for blood pressure

Play all audios:

Loading...

ABSTRACT Gene-set analysis provides insight into which functional and biological properties of genes are aetiologically relevant for a particular phenotype. But genes have multiple


properties, and these properties are often correlated across genes. This can cause confounding in a gene-set analysis, because one property may be statistically associated even if


biologically irrelevant to the phenotype, by being correlated with gene properties that are relevant. To address this issue we present a novel conditional and interaction gene-set analysis


approach, which attains considerable functional refinement of its conclusions compared to traditional gene-set analysis. We applied our approach to blood pressure phenotypes in the UK


Biobank data (_N_ = 360,243), the results of which we report here. We confirm and further refine several associations with multiple processes involved in heart and blood vessel formation but


also identify novel interactions, among others with cardiovascular tissues involved in regulatory pathways of blood pressure homoeostasis. SIMILAR CONTENT BEING VIEWED BY OTHERS GENOME-WIDE


ANALYSIS IN OVER 1 MILLION INDIVIDUALS OF EUROPEAN ANCESTRY YIELDS IMPROVED POLYGENIC RISK SCORES FOR BLOOD PRESSURE TRAITS Article Open access 30 April 2024 LARGE-SCALE GENOMIC ANALYSES


REVEAL INSIGHTS INTO PLEIOTROPY ACROSS CIRCULATORY SYSTEM DISEASES AND NERVOUS SYSTEM DISORDERS Article Open access 14 June 2022 A COMPARISON OF THE GENES AND GENESETS IDENTIFIED BY GWAS AND


EWAS OF FIFTEEN COMPLEX TRAITS Article Open access 19 December 2022 INTRODUCTION The aim of gene-set analysis (GSA) is to uncover functional and biological properties of genes involved in


the genetic aetiology of a phenotype1,2. If a property is relevant to a phenotype, then variants associated with that phenotype will tend to accumulate in genes with that property. For


example, smooth muscle cells (SMC) play a role in blood pressure regulation3,4, and if this has a genetic basis we might expect to find genes involved in the development of SMCs to exhibit


genetic association with blood pressure phenotypes. However, genes typically have numerous different properties, which can be strongly correlated with each other if they involve many of the


same genes. Perhaps SMC development genes are also involved in the development of other types of muscle cell, or they are expressed primarily in muscle tissue. This would result in a


correlation between SMC development and muscle cell development in general, or between SMC development and muscle-specific gene expression. In such scenarios, associated variants will


accumulate in genes with a property that does not itself play a role in the phenotype, but is correlated with another gene property that does. Thus, the SMC development gene set could become


associated simply by muscle-specific gene expression playing a role in the phenotype. Traditional GSA only tests the marginal associations of gene properties5,6, and cannot account for this


kind of confounding. Such GSA is therefore liable to identify gene properties that hold no biological relevance for the phenotype, with potentially very misleading interpretations and


wasted effort in follow-up research as a result. To address this issue we have developed a novel GSA approach, based on and implemented in our existing GSA tool MAGMA5. Central to this


approach is the conditional GSA model, which can evaluate how associations of different gene properties relate to each other. As Fig. 1 illustrates, it can identify confounding where


traditional GSA cannot. The model can also deal with more complex scenarios, in which particular combinations of multiple gene properties are relevant to the phenotype rather than any


individual gene properties on their own. This manifests statistically as an interaction between gene properties, which are hard to detect when testing only marginal associations and which


can result in confounding of the gene properties involved (Fig. 1d). A more complete and accurate insight into the phenotype based on GSA therefore requires that such scenarios are taken


into account as well. Our proposed approach works by selecting gene properties with significant marginal associations using a standard GSA, then using a series of follow-up analyses to


discard those which are likely not biologically relevant for the phenotype. A wide range of gene properties is used as input to improve the probability of relevant gene properties being


included, as this allows for the detection of confounding caused by those relevant gene properties. This also improves the specificity of the conclusions that can be drawn because more gene


properties can be ruled out as having no biological relevance to the phenotype, and an absence of confounding where it might have been expected can also be shown. The analysis workflow for


our approach is shown in Fig. 2, with a more detailed overview of this analysis workflow provided in the Methods section and a guide to performing and interpreting the analysis in the 


Supplementary Methods. The initial GSA in step 1 can include both binary sets and continuous gene-level variables, and is followed by four follow-up analysis steps that refine the initial


results. The results are first corrected for global effects that are likely to act as general confounders in the GSA (e.g., gene expression levels), after which overlap between significant


associations is evaluated. For gene sets (i.e. binary gene properties) this is followed by additional checks for outliers and signs of further confounding. Finally, post hoc interaction


analyses are performed for all significant gene properties, to refine the interpretation of their effects. In an optional sixth step, exploratory interaction analysis is applied to detect


additional associations that were not picked up in the initial GSA. We performed a simulation study to validate the conditional and interaction GSA models used in our workflow, and then to


demonstrate the analysis workflow we applied it to the analysis of blood pressure phenotypes. For this we used the UK Biobank7 data, analysing three blood pressure phenotypes: systolic blood


pressure (SBP), diastolic blood pressure (DBP) and pulse pressure (PP). The gene annotation used in these analyses consisted of gene sets from the three Gene Ontology domains3,8, miRNA


target gene sets9, and continuously valued tissue-specific gene expression levels from the GTEx data10. A replication study was also performed to further validate results from the UK Biobank


analysis. High blood pressure is an important risk factor for cardiovascular disease11 and has an estimated heritability of 30–50%12. Recently, large-scale GWAS studies have identified over


400 loci that regulate blood pressure10,13,14,15,16,17, with many of the identified loci showing associations with different blood pressure phenotypes16. Some GSA was performed as part of


these studies, but only to a limited extent (see Supplementary Methods for a brief overview) and only using traditional GSA approaches. Applying our extend GSA analysis workflow to these


phenotypes may therefore expand our current understanding of the genetic aetiology and biological mechanisms of blood pressure regulation. Our analyses show that confounding and overlap


between associations is widespread, with the majority of initially significant associations found to be due to the effects of general confounders and the associations of other gene


properties. Interactions are also prevalent and often involve gene properties with no detectable marginal association, suggesting that the interaction analysis model can provide additional


insights into the phenotype to complement those of standard GSA. For the blood pressure phenotypes a range of processes involved in heart and blood vessel formation have been identified, as


well as tissue-specific expression in artery, heart and female reproductive organs. Several novel interactions have also been found, among others identifying joint involvement of


cardiovascular development and homoeostatic processes, and involvement of heart-expressed miRNA-145 target genes. RESULTS SIMULATIONS DEMONSTRATE RISK OF CONFOUNDING IN GSA A simulation


study was performed to evaluate the conditional and interaction GSA models, both individually and in relation to the standard marginal GSA (details on the simulation settings are provided in


the Methods and Supplementary Methods). As shown in Supplementary Figure 1, marginal GSA is highly vulnerable to confounding. When a gene set with no biological effect assigned to it is


analysed, it will be statistically significant at a rate far exceeding the significance threshold if it overlaps with another gene set that does have an effect. The conditional analysis


model can effectively account for this however, correcting for the confounding effect of the overlapping set and yielding an error rate at the nominal significance level. This phenomenon is


also clearly illustrated in the blood pressure analyses, for example for the heart development gene set. For PP it is initially significant, with a marginal _p_-value of 1.6 × 10–6. This


association is entirely explained by the much stronger association of the cardiovascular system development set that contains it, with a conditional _p_-value for heart development of only


0.40. A similar situation is shown in Supplementary Figure 2. Here, two overlapping gene sets were simulated, with one or both of them assigned an effect. This was then analysed in two ways:


analysing the two gene sets and their interaction in an interaction GSA, and analysing the interaction set (containing all genes shared by the two gene sets) by itself with a marginal GSA.


In these simulations there are no actual interaction associations, and for the interaction analysis the error rates are indeed at the nominal significance level. When testing the marginal


association of the interaction set however, the error rates are strongly inflated. Although normally an interaction would not be analysed in this way, it can happen that gene sets are


defined in terms of a combination of multiple gene properties. For example, a gene set may be defined as all the genes in a particular pathway that are also differentially expressed in the


heart. Such a gene set is therefore actually an interaction between that pathway and differential heart expression, and will be confounded by any main effects that the pathway or


differential heart expression may have. For these kinds of compound gene set, the interaction GSA is therefore required as well. GENE–PROPERTY ASSOCIATIONS ARE STRONGLY OVERLAPPING Results


of the blood pressure analyses at different steps of the workflow are summarised in Table 1, with the individual associations retained at the end of the workflow shown in Table 2. Initially


significant associations that were later discarded can be found in Supplementary Tables 1 and 2. As shown there is a considerable reduction in the number of associations in the final


results, compared to the standard GSA in step 1. A large portion of this is due to the general confounders that are corrected for in step 2, which reduced the number of associations by 75%.


Conditioning the remaining associations on each other in step 3 led to a further reduction of 30%. Moreover, there were multiple instances of gene properties being selected jointly, with


their associations clearly reflecting a single signal but their overlap too strong to be disentangled. The number of distinct signals captured by these significant and retained gene


properties is therefore even lower. This suggests the presence of a great deal of overlap between the associations in the standard GSA, with many of the tested gene properties tapping into a


much smaller subset of shared signals. Moreover, in practice the overlap in associations among different gene properties in particular is even stronger than the reduction in the number of


hits suggests, as shown in Fig. 3a. Looking at the associations of all the gene sets, the effect of conditioning on general confounders is relatively moderate and primarily affects the


strongest associations. However, conditioning on all the significant gene sets retained at the end of the analysis workflow has a much more profound impact. It is most pronounced for PP, for


which almost no marginal association remains, but it strongly affects the other two phenotypes as well (Supplementary Figure 3). EVIDENCE OF WIDESPREAD GENE-SET INTERACTION The analyses


show that although not as extensive as for the marginal associations, there is considerable evidence for interactions both between pairs of gene sets (Fig. 3b, Supplementary Figure 4) and


between gene sets and tissue-specific gene expression (Fig. 3c, Supplementary Figure 5). This is also reflected in the individual results for the post hoc interaction analyses, with


significant interactions of both kinds (Tables 3 and 4). It seems unlikely that this is unique to blood pressure phenotypes, which suggests that gene properties probably commonly affect the


phenotypes specifically in combination with other gene properties. It follows that finding these interactions is necessary for gaining a proper insight into the genetics of a phenotype. In


the post hoc analyses, by definition one of the gene properties had a marginal association strong enough to be detected. In some cases this may reflect a genuine main effect, but this can


also happen when there is only a strong interaction. An example of this is the cell proliferation gene set, for which the marginal association can be entirely explained by two interactions


(see below). For the majority of interactions found in the post hoc analyses, the second gene property also shows little or no evidence of any marginal association. The involvement of those


gene properties would therefore be very difficult to detect in a normal GSA. The exploratory interaction results point to the same conclusion, with for many of the gene properties involved


in the top interactions again little evidence of marginal associations (Supplementary Table 4). It is also clear that such weak marginal associations can hide very strong effects. For the


interactions between tissue expression and gene sets, the _p_-values of the subset of top 25% expressed genes are often very low. Similarly, for the top interactions found in the exploratory


analysis, three of the four negative interactions hide significant main effects of gene properties that are not marginally significant. Although for these the observed marginal associations


were stronger, they were still not strong enough for the GSA in step 1 to detect them. Since negative interactions are relatively prevalent (Fig. 3c), this again suggests that there may be


a considerable amount of association that a normal GSA cannot easily uncover. VARIABILITY ACROSS GENE-SET DOMAINS In our results there are considerable differences between the Gene Ontology


and miRNA target gene-set domains, in both the number of significant results (Table 1) and the overall levels of association (Supplementary Figure 6). The majority of the significant results


are found in the Gene Ontology biological process domain, with only a handful of additional associations in the cellular component and molecular function domains. For the miRNA target sets,


no associations are found at all. As Supplementary Figure 6 shows, the miRNA results are not entirely devoid of signal, and the general class of miRNA target genes shows a strong


association for both SBP and PP (Supplementary Table 5). No individual miRNA families emerge from the analysis however, with the three initially significant miRNA target set associations


explained away by the gene expression and general miRNA target gene effects. It may be that the miRNA target sets are too broad and are not involved in the phenotypes as a whole, a


possibility supported by the strong interaction found for miRNA-145 with heart expression (see below). Although the cellular component and molecular function domains do yield some


associations they are dominated by the biological process domain, a feature that is found in the interaction analyses as well (see Tables 3 and 4, Supplementary Table 4). This is in part due


to there being significantly more biological process gene sets to analyse, though for the marginal associations this is compensated by the correspondingly more stringent multiple testing


correction. Moreover, the associations that are found for cellular component and molecular function are not entirely convincing, with almost all of them showing irregularities in their


set-specific QQ-plots (Table 2, Supplementary Figures 7 and 8; see also step 4 of the detailed analysis overview in the Supplementary Methods). TISSUE EXPRESSION PREDICTS BLOOD PRESSURE


ASSOCIATION Initial analysis of the GTEx gene expression levels shows that overall gene expression is significant for all three phenotypes (Supplementary Table 5), meaning that genes with a


higher average gene expression tend to have stronger genetic associations with the phenotypes as well. This general effect drives the associations found for many of the individual tissues,


with the majority of the associations for these tissues disappearing once the general effect is corrected for (Table 1). Expression in the remaining tissues is still strongly correlated


however, making it difficult to attribute associations to any individual tissue. Conditioning the tissues on each other suggests that there are likely at most three distinct clusters of


association (see Table 2). The first and strongest is in the arterial expression levels, present in both DBP and PP. This arterial association explains a large proportion of the other tissue


associations, but a second cluster of female reproductive organs remains. It is the only association common to all three phenotypes, and manifests most prominently in the uterus expression.


Unique to PP there is also a third association, however, for heart (atrial appendage) expression. TISSUE-EXPRESSION DEPENDENCY OF GENE-SET ASSOCIATIONS Post-hoc interaction analyses for the


tissue-specific expression shows that there is also considerable positive interaction between tissue expression levels and gene sets, with significant interactions for all of the four


analysed tissues and all three phenotypes (Table 3, Fig. 4). The level of interaction association is found to vary across phenotypes and tissues and also seems to be tissue-specific, as


there is little sign of interaction for the overall gene expression level (Fig. 3c). Positive interaction between tissue-specific expression and a gene set represents a scenario where there


is an association specific to the more strongly expressed genes in the gene set. Many of the gene sets involved are quite different in function from those found in the main GSA, and have


generally weak marginal associations. One finding is a set of interactions between uterus-specific expression and three biological processes relating to sexual development for both SBP and


PP, most strongly found for sex differentiation (interaction _p_-values of 5.2 × 10–6 and 6.3 × 10–7, respectively). Marginal associations for sex differentiation have _p_-values of only


0.0034 and 0.0052, respectively, but when the subset of more strongly uterus-expressed genes is tested, strong associations emerge (conditional _p_-values of 2.2 × 10–6 and 4.3 × 10–7). This


effect is specific to uterus expression, with no sign of interaction for the other analysed tissues. Another novel finding is the interaction between tibial artery expression and miRNA-145


target genes for SBP and PP (interaction _p_-values of 1.6 × 10–5 and 1.0 × 10–5). Marginal association is now absent altogether (_p_-values of 0.442 and 0.638), but again the subset of top


expressed genes is highly significant (conditional _p_-values of 1.7 × 10–7 and 6.2 × 10–7). There are also several interactions between nucleotide, nucleoside and purine processes, and both


arterial and heart expression, found for all three phenotypes. One other surprising result is the interaction between heart expression regulation of blood pressure, highly significant for


both SBP and DBP (interaction _p_-values of 2.6 × 10–8 and 2.8 × 10–9). It is also initially significant for PP, but the association is not as strong and does not survive the outlier


correction (Supplementary Table 3). What makes this result surprising is that, in an analysis of blood pressure phenotypes, it only shows up here. It has no marginal associations, nor do any


of its subsets, and also does not interact with artery expression. Yet in conjunction with heart expression its associations are very strong, with conditional _p_-values for the top


expressed subset (1.6 × 10–9 and 2.0 × 10–11 respectively) lower than for any of the marginal gene-set associations. CARDIOVASCULAR AND MUSCLE CELL INVOLVEMENT For PP, a number of different


biological processes related to the heart were found to be associated (Table 2). The strongest of these was cardiovascular system development (_p_ = 1.8 × 10–9) (or circulatory system


development, which is identical), which by itself accounts for much of the association of the other heart-related processes. The association for cardiovascular system development is in turn


partly explained by its two significant interactions (see Table 4), with the nested sets chemical homoeostasis and homoeostatic process (interaction _p_-values of 1.4 × 10–5 and 2.0 × 10–5).


These interactions explain part of the marginal cardiovascular system development association (main effect _p_-values of 0.00067 and 0.00098 in the interaction model), but enough of it


remains to suggest that its joint effect with the homoeostasis gene sets is important but is not the whole story of its role in blood pressure genetics. There is also evidence for a related


involvement of muscle cell processes, with cardiocyte differentiation significant for both SBP and PP (_p_-values of 6.3 × 10–9 and 9.5 × 10–9) and another association for the nested pair of


sets negative regulation of smooth muscle cell proliferation and regulation of smooth muscle cell proliferation for PP (_p_-values of 6.2 × 10–7 and 1.0 × 10–7). ROLE OF CELL PROLIFERATION


AND INTRACELLULAR REGULATION Another strong association is found for the cell proliferation set for PP (_p_ = 1.6 × 10–8). Although this set overlaps with the two SMC proliferation sets, it


is much larger and represents an independent additional signal. This signal can be traced to a pair of two largely independent interactions of cell proliferation (see Table 4), with the


biological processes regulation of intracellular transport and regulation of intracellular signal transduction (interaction _p_-values of 7.6 × 10–6 and 0.00020). Although similar in their


function, these two interactions do not strongly overlap. Jointly they do account for almost all of the marginal association of cell proliferation, with its main effect _p_-value reduced to


0.015 when conditioning on both interactions simultaneously. DISCUSSION The development of the analysis workflow presented in this paper was motivated by the problem of correlated gene


properties, and the confounding and the multiplicity of redundant overlapping associations that could result from this. The results from the blood pressure analyses show that this can indeed


present a serious problem in practice. General confounding factors, here primarily the involvement of overall and tissue-specific expression, are shown capable of inducing significant


associations in a large number of gene properties. Those gene properties subsequently also overlap with and confound each other, with a subset of the significant gene properties accounting


for the associations of the rest as well as for large amounts of sub-significant associations in the other gene properties. Correcting for these issues drastically reduces the number of


gene-property associations, which implies that traditional GSA lacking such corrections is liable to yield large numbers of associations which are likely not biologically relevant to the


phenotype. Conclusions drawn from such analyses are therefore at considerable risk of being incorrect, and potentially very misleading. These same issues most likely affect other, similar


types of analysis as well, such as network analysis or SNP-set analysis. Our extension to interaction GSA opens up new avenues of analysis. Results for the blood pressure phenotypes suggest


that there may be numerous signals in the annotation that a standard GSA cannot reliably detect, if it can detect them at all. This is perhaps best exemplified by the regulation of blood


pressure gene set. Based on its marginal associations there is little evidence that it is involved in blood pressure genetics, and would not have been found in a traditional GSA. Yet it has


a very strong interaction with heart-specific expression for both SBP and DBP, and the subset of top expressed genes in the set is highly associated. This same pattern is found for many of


the tissue expression by gene-set interactions, with many of those gene sets having entirely unremarkable marginal p-values. The same is suggested by the exploratory interaction analysis,


with negative interactions in particular seen to mask strong associations. Taken together, our results thus show that a traditional GSA is doubly vulnerable. Firstly, due to confounding many


marginal associations are likely to be found that are biologically irrelevant, or the byproduct of more specific interactions. This can lead to potentially very misleading conclusions, and


wasted effort trying to follow them up. Secondly, many gene properties may only affect the phenotype in combination with other gene properties, rather than on their own. Marginal


associations for such gene properties will often be weak or absent altogether, and therefore unlikely to be found in traditional GSA. Our extended GSA approach can address these issues,


pruning away many likely irrelevant associations through conditional analysis and detecting novel additional or more refined signals with the interaction model. Aside from demonstrating the


utility of our proposed analysis workflow, our analyses also provide a variety of insights into the genetics of blood pressure, and many of the individual associations fit well with the


existing blood pressure literature. The tissue expression analyses detected associations for several cardiovascular tissues, which are highly adapted to blood pressure fluctuations. In the


cellular component domain constituents of the (sacromeric) cytoskeleton, including actin and T-tubules, were identified18, and the majority of detected biological processes are involved in


blood vessel and heart formation. These include cardiovascular and circulatory system development, cardiocyte differentiation, SMC regulation and (cardiac) mesenchyme development. The


interaction analyses provided further detail for these associations. Expression in the heart (atrial appendage) interacted strongly with the regulation of blood pressure gene set for both


SBP and DBP, which possibly reflects the role of atrial natriuretic peptide in the homoeostasis of sodium and water retention19. This is supported by the interactions of cardiovascular


system development with homoeostatic processes for SBP and PP. Heart expression also interacted with cellular response to nitrogen compound for PP, which fits the known natriuretic


peptide–nitric oxide pathway and guanylate cyclase signalling systems that are targeted by nitroglyceride20. Artery tissues were found to exhibit interactions with nucleoside phosphate and


purine-containing compound biosynthetic process for SBP, DBP and PP. Nucleoside and purine are not only constituents of RNA and DNA but are also involved in metabolic processes such as


signal transduction and regulation of enzyme activity21. This therefore aligns with the interactions found between cell proliferation and regulation of intracellular transport and signal


transduction for PP, supporting the role of purinergic signalling in the proliferation of vascular smooth muscle and endothelial cells22. Further evidence for a role of signal transduction


was found in the associations of nitric oxide and cGMP for DBP. Nitric oxide is an important signalling molecule that regulates vascular tone by acting as a vasodilator via the cGMP


signalling cascade and intracellular Ca2+ levels23,24. Also found for DBP was reactive oxygen species biosynthesis, which has been implicated with cardiovascular disease including


hypertension25. The miRNA target genes, which regulate various physiological and pathophysiological processes at a post-transcriptional level26, were associated for all three blood pressure


phenotypes. Although none of the individual miRNA target sets was significant, an interaction was found between tibial artery expression and the miRNA-145 target set. This interaction can be


explained by the influence of miRNA-145 on differentiation27 and phenotype switching of vascular SMCs28,29, and the upregulation of miRNA-145 in endothelial cells in response to shear


stress and hypertension30. No associations were found for the kidney cortex or the adrenal gland in the tissue expression analysis, which is surprising considering the regulatory role of the


renin–angiotensin–aldosterone system on blood volume and systemic vascular resistance31 and known associations of renal sodium regulatory genes variants with blood pressure32. One possible


explanation is that the available kidney cortex expression is too general. It has been shown that unique and highly distinctive patterns of gene expression exist for glomeruli, cortex,


medulla, papillary tips and pelvic tissue33, and associations with blood pressure genetics may only exist in such more specific tissues. Regulation of urine volume was also found to be


associated with both SBP and DBP, which supports the hypothesis that kidney involvement may be quite specific. Also notable were the associations of several female reproductive organ


tissues, most prominently the uterus, for all three phenotypes. This may point to the involvement of an underlying hormonal pathway, correlated to ovarian expression. Such a pathway could


reflect the known protective effects of oestrogens on cardiovascular disease and hypertension34. Alternatively, expression in these tissues may serve as a proxy for placental expression,


which is not available in the GTEx data. The placenta has been shown to play a role in blood pressure regulation during pregnancy35, and placental functioning is directly related to fetal


growth which has been linked to the development of hypertension during adult-life of the child36,37. The application of traditional GSA has previously led to novel biological hypotheses on


human physiology and the pathophysiology of disease, and the GSA presented in this paper improves on that promise for blood pressure phenotypes. Our results, filtered and refined using the


extended analysis workflow, suggest a variety of possible avenues by which the role of genetics in blood pressure may be explained. Exploring these avenues could advance our understanding of


blood pressure and the identification of therapeutic targets for cardiovascular disease, and our extended analysis can be used generally to provide the same for other phenotypes as well.


METHODS CORE GSA FRAMEWORK We use GSA implemented in MAGMA (v1.07), a detailed description of which can be found in De Leeuw et al.5. Briefly, the model is based on a linear regression


framework with genes as data points, with the regression equation _Z_=_β_0+_Bβ__B_+_Sβ__S_+_ε_, with \(\varepsilon \sim {\mathrm{MVN}}\left( {0,\sigma _e^2{\hat{\mathrm \Sigma }}} \right)\).


Gene _p_-values _P__g_ are first computed from the SNP data for each gene _g_ . These are transformed to _Z_-scores, \(Z_g = {\mathrm{\Phi }}( {1 - P_g})\) with Φ the probit function, such


that higher _Z__g_ correspond to stronger genetic associations with the phenotype. The gene set is encoded in the variable _S_, with _S__g_=1 if gene _g_ is in the gene set and _S__g_=0


otherwise. Linkage disequilibrium (LD) between genes is quantified in the gene–gene correlation matrix \({\hat{\mathrm \Sigma }}\), which is scaled by the variance \(\sigma _e^2\) to model


the residuals. Several common technical confounders are included as covariates, represented by the matrix _B_ in the regression. These are: the number of variants in each gene, an estimate


of the LD within each gene, the inverse of the mean minor allele count of variants in each gene, and the sample size on which each gene _p_-value is based. For each of these variables, the


log transformation of the variable is also included as a covariate. A one-sided test is performed on the coefficient _β__s_ of the null hypothesis _β__s_=0 against the alternative


_β__s_>0, testing whether the genes in the gene set are more strongly associated with the phenotype than other genes. This constitutes a competitive test (see De Leeuw et al.1 for a


discussion on key differences with self-contained GSA). The model can also analyse non-binary gene properties, such as gene expression. In this case _S_ is a continuous variable, and the


coefficient _β__s_ reflects the degree to which the genetic association of a gene changes as the value for the tested variable increases. By default, a two-sided test is performed on _β__s_


when analysing continuous gene properties since, in contrast to gene sets, negative associations may be informative as well. Throughout the text, we use ‘gene property’ to refer to any type


of gene-level variable, and ‘gene set’ to refer specifically to a binary gene property. CONDITIONAL AND INTERACTION GSA MODEL Conditional and interaction GSA is implemented by generalising


the core regression framework. For conditional analysis a matrix of additional covariates _C_ is included in the model, to obtain _Z_=_β_0+_Bβ__B_+_Cβ__C_+_Sβ__S_+_ε_. The _β__S_ now


reflects the conditional effect of _S_ on the genetic association _Z_, corrected for the effects that the covariates in _C_ have on _Z_. For the interaction GSA an interaction term _S_12 is


defined as the product of two gene properties _S_1 and _S_2, with \(S_{12_g} = S_{1_g} \times S_{2_g}\). Then _S_12 is tested conditional on _S_1 and _S_2 to determine if there is any


interaction between them, in the model _Z_=_β_0+_Bβ__B_+_S_1_β_1+_S_2_β_2+_S_12_β_12+_ε_. The test can be either two-sided or one-sided in either direction. An interaction of this type means


that genes that have high values for both gene properties are more strongly (or weakly, if _β_12 is negative) associated with the phenotype than genes that have high values for only one of


the two. This suggests a specific role for that combination of properties. This role may be limited to that combination, but can also be in addition to significant main effects (_β_1 and


_β_2) of the gene properties. For pairs of gene sets, _S_12 simply corresponds to the set of genes included in both gene sets. The conditional and interaction GSA models are implemented in


MAGMA as part of the GSA framework, and can be used with any of the gene analysis models available in MAGMA. It can therefore be applied to both raw genotype data as well as SNP summary


statistics from any type of single variant analysis. ANALYSIS WORKFLOW The extend GSA workflow consists of six analysis steps (Fig. 2). An initial GSA is first performed to select


significant gene properties, and the subsequent steps are then used to provide further information on their associations. This is then used to aid interpretation of the results, and to


discard likely irrelevant gene properties from consideration. It can also flag some gene properties as requiring further analysis and data before interpreting them, if the evidence for their


biological relevance to the phenotype is ambivalent. The initial GSA results are thus progressively refined, improving the reliability of the conclusions that can be drawn. An overview of


the six steps is provided here. An extensive guideline on performing the analyses and interpreting the results can be found in the Supplementary Methods. The first step of the analysis


workflow is a standard MAGMA GSA (with only the automatic correction for technical confounders). Only gene properties significant in this GSA are directly evaluated in the subsequent steps


(except step 6). In the second step, the significant gene properties are conditioned on likely confounders, and the impact those confounders have on their associations is assessed. Gene


properties that are no longer significant at the significance threshold used in step 1 are then discarded. In step 3, remaining significant gene properties are conditioned on each other.


This helps determine the extent to which their associations overlap, and to identify which of those associations are most likely relevant for the phenotype. Gene properties are selected in a


stepwise fashion on the strength of their associations and the way those associations overlap with each other. In each selection step, gene properties are conditioned on both the gene


properties already selected and the general confounders from the second analysis step. Gene properties for which the association is largely or wholly explained by other gene properties are


discarded; gene properties which are found to share a single underlying association that cannot be disentangled are selected and interpreted jointly. The fourth step applies only to gene


sets, and checks for outliers and signs of confounding effects not detected in the previous steps. For each gene set QQ-plots of the residual _Z_-scores of genes in the set are created,


adding a confidence band to visualise the degree of deviation expected by chance. These are inspected for signs that the association of the gene set may be driven by a smaller subset of


genes in the set, indicating possible confounding. If not uncovered in the post hoc interaction analyses, the source of confounding could then be investigated further using targeted analyses


with additional data or annotation. If the likely associated subset is very small the problem is likely one of outliers instead, and the gene set can be discarded altogether. In the fifth


step, interaction analyses are performed for all the remaining significant gene properties. This can narrow down the significant associations to more specific effects that occur only in


combination with other gene properties. Positive interactions are tested with all other available gene properties; for interactions between gene sets, this is restricted to pairs of gene


sets for which the overlap between the sets is not too large or small, as otherwise the interaction term is not meaningfully defined. In the optional sixth step, an exploratory interaction


analysis is performed in order to detect additional interactions. An initial list of gene properties is generated based on their marginal associations, and interactions with all other gene


properties are tested for this list as in step 5. A liberal selection criterion such as FDR-controlled significance is recommended for creating the initial list. In contrast to step 5,


two-sided tests are performed for the interactions. This allows for the detection of negative interactions, which would point to involvement in the phenotype of a particular gene property


only the absence of another gene property. This step is independent of the previous steps, and therefore requires separate multiple testing correction. GENOTYPE AND PHENOTYPE DATA Primary


quality control and imputation of the UK Biobank (July 2017 release) data was performed by UK Biobank itself7. We applied additional QC and filtering of variants and individuals to obtain a


sample of independent individuals of European ancestry, containing hard-called genotypes with MAF greater than 0.000001 and missingness of at most 5%. Since poorly imputed SNPs can bias the


results, only variants of high imputation quality (info score of at least 0.9, variants imputed on HRC panel only) were included in the analysis. Full details on the data and QC can be found


in the Supplementary Methods. The processed data set used for the blood pressure analyses contained 360,243 individuals and 13,923,638 autosomal variants. In our analyses, three phenotypes


were analysed: SBP, DBP and PP. SBP and DBP were corrected for use of blood pressure-lowering medication, adding 10 and 15 mm Hg respectively to the measured values for individuals known to


use such medication38. PP was computed as PP = SBP−DBP. Thirty principal components were included as covariates to correct for population structure in the data, computed using FlashPCA39.


Other covariates included in the analysis were sex, age, age squared, BMI, Townsend Deprivation index, and genotyping array indicator. To further validate the results from the UK Biobank


analysis, a replication analysis was performed using the 2011 ICPB GWAS data40. Details for this replication analysis can be found in the Supplementary Methods. ANNOTATION Variants were


annotated to genes based on NCBI (37.3) gene definitions41, mapping variants to a gene if they were located in the transcription region of that gene, or within two kilobase upstream or one


kilobase downstream of the transcription region. A total of 18,285 autosomal protein-coding genes had at least one variant mapped to them, and 43.7% of the variants in the data mapped to at


least one gene. Variants not mapped to any gene were not used in the analysis. Gene annotation from five different domains was used in the analysis: tissue-specific gene expression data,


three Gene Ontology domains, and miRNA target sets. Gene Ontology and miRNA target gene sets were obtained from MsigDB (v6.0)8. For the miRNA target sets, an additional gene set of all genes


contained in at least one of the target sets was created, reflecting general miRNA target status. GTEx (v7)9 was used for the gene expression data. Mean RPKM values were computed across


gene and tissue. These were truncated down to 50, incremented by one, then log-transformed to obtain a per-tissue expression score. Average scores across tissues were computed as a measure


of the overall expression level of each gene. Ensembl gene IDs were mapped to Entrez IDs for the genes in the data, resulting in expression scores for 17,064 genes in the data. SIMULATION


STUDY A random subsample of 10,000 individuals was taken from the UK Biobank data, filtering variants with MAF smaller than 1% and variants not mapped to any gene. Continuous phenotypes were


simulated for this data by constructing a genetic component and adding normally distributed noise such that the genetic component explained 10% of the phenotypic variance. The genetic


components were created by designating 1000 genes as causal, then selecting a subset of SNPs from each of these genes as effect SNPs and combining them (see Supplementary Methods for full


details). Simulated phenotypes were analysed in PLINK 1.9 (ref. 42) to obtain SNP _p_-values. Ten genetic components were constructed (designating new causal genes and SNPs), with 100


replicates for each. Multiple phenotypes with new random noise were generated for each replicate, using meta-analysis on the SNP _p_-values to obtain GWAS results representing sample sizes


of 10,000, 50,000, and 100,000. Pairs of overlapping gene sets were then constructed, containing different patterns and proportions of causal genes. In each condition an initial gene set was


created containing a specified proportion of causal genes. Another gene set was then created overlapping with it, as either a subset, a superset, or partially overlapping set. Genes in the


overlap were randomly selected from the initial gene set, with the rest randomly sampled from the remaining genes. For evaluation of the interaction model, only partial overlap conditions


were used. Additional parameters that were varied across conditions were the gene set sizes, the degree of overlap, and the level of association assigned to the initial gene set. For the


interaction model, the level of main effect association assigned to the second gene set was also varied. A full description of the simulation settings and results is given in the 


Supplementary Methods. In each condition, ten gene sets overlapping with the initial set were created. For the conditional model simulations the marginal association and association


conditional on the initial set were tested. For the interaction model, the interaction term was tested either as a gene set by itself or using the interaction model. Results were aggregated


per condition over the ten sets and the 1000 GWAS replicates, computing type 1 error rates at different significance thresholds. PRIMARY GSA Analyses were performed using MAGMA (v1.07)5.


Phenotypes were first regressed on the covariates, using the resulting residuals as input for the MAGMA gene analysis. The SNP-wise (multi) model was used for the gene analysis. This model


combines the SNP-wise (mean) model (more sensitive to many smaller SNP associations in a gene) and the SNP-wise (top) model (more sensitive to a single large SNP association in the gene) to


obtain a good distribution of power over different genetic architectures. This model is recommended when the number of SNPs in a data set is very large, as the SNP-wise (mean) and PC


regression models are less sensitive to detecting gene associations when a single strong SNP effect is present in a gene containing many other SNPs. To deal with rare variants, per gene SNPs


with a minor allele count smaller than 100 were aggregated into a weighted burden score. This was then included in the model in the same way as normal SNPs, replacing the rare variants. At


most 25 SNPs were used per burden score. For genes with more than 25 rare variants, multiple burden scores were created. All GSA was performed using this gene analysis output. Bonferroni


correction was used to correct for multiple testing, separately for each phenotype. It was also applied separately for each domain, corrected for the number of domains, for a significance


threshold of \(\alpha _D = \frac{{0.05}}{{5 \times K_D}} = \frac{{0.01}}{{K_D}}\) per domain _D_ with _K__D_ the number of tests for that domain. In all the analyses one-sided tests were


used, testing for positive associations. CONDITIONAL GSA After the initial GSA, analyses were repeated conditioning on potential general confounders. Overall gene expression was included for


all domains. For the four gene set domains, tissue-specific expression for coronary artery, tibial artery, heart (atrial appendage), and uterus were also conditioned on; the miRNA target


set analyses were additionally conditioned on general miRNA target status. For conditional analyses of the gene sets, missing tissue expression values were set to the median expression value


for that tissue. Only gene sets and tissues still significant at the original threshold were retained. Conditional analyses were then performed to evaluate overlap between associations of


significant associations. The stepwise procedure was used per domain for the significant and retained gene properties until there were no remaining associations with conditional _p_-values


below 0.05 (see Analysis workflow above and Detailed overview of blood pressure analysis in the Supplementary Methods). For gene sets, associations retained after this selection were then


also conditioned on those from the other domains. All these analyses also included the general confounders as covariates. After this set-specific QQ-plots were created for all retained gene


sets to inspect them for signs of outliers and hidden confounding. EXPRESSION BY GENE SET INTERACTION ANALYSIS After the conditional analyses, post hoc interaction analyses were performed


for the top tissue expression levels. Genes with no expression values were removed, and interactions were then tested with all gene sets of at least 100 genes. To make the results more


comparable across phenotypes, the same tissues were used for all three phenotypes, testing interactions for coronary and tibial artery, heart (atrial appendage) and uterus, as well as for


overall gene expression. For each tissue, overall expression and its interaction with the tested gene set were included as covariates. For miRNA target sets, general miRNA target status and


its interactions with overall expression and the tissue expression were additionally included. One-sided tests were performed for the interaction terms, testing for positive interactions.


Bonferroni correction was performed per tissue, correcting for the 1495 interactions tested per tissue. To check for outliers, scatterplots of residual tissue expression (corrected for the


overall expression) by gene _Z_-scores were created for all significant interactions. Each plot only used genes in the set, and both variables were normalised within those genes. Genes were


marked as an outlier if they were more than two standard deviations from the origin and all genes within two standard deviations were either further from the origin or themselves marked as


outlier. The analysis was then repeated with the marked outliers removed from the gene set. A gene set was also constructed of the top 25% residually expressed genes in the gene set


(excluding outliers), in which was then tested conditional on the whole gene set. Interactions for which neither follow-up test was significant were discarded. GENE SET BY GENE SET


INTERACTION ANALYSIS Post hoc interaction analyses were also performed for all significant and retained gene sets, testing interactions with other gene sets. Interactions were only tested


for gene-set pairs if there was meaningful overlap between the gene sets: for each set in the pair, the overlap with the other gene set as well as the part not overlapping with the other


gene set was required to be at least 20 genes, and at least 10% of the genes in the gene set. One-sided tests for positive interactions were performed, conditioning on the general


confounders. Bonferroni correction was applied separately for each of the significant and retained gene sets, correcting for the number of interactions tested for that gene set. An


exploratory interaction analysis was also performed. Gene sets were selected using FDR correction (Benjamini–Hochberg, at _α=_0.05), separately for each of the four gene-set domains. For


each of these gene sets, interactions were tested with all other gene sets for which there was meaningful overlap, using the same criteria as in the post hoc interaction analysis. Two-sided


tests were performed on the interactions, conditioning on the general confounders. Bonferroni correction was applied for the total number of interactions tested. CODE AVAILABILITY The MAGMA


analysis software can be obtained for Linux, Windows and Mac platforms from http://ctg.cncr.nl/software/magma. DATA AVAILABILITY The raw genotype and phenotype data analysed in this study


were used under license from UK Biobank (http://www.ukbiobank.ac.uk), and restrictions apply to its availability. However the data are available from the authors upon reasonable request, if


permission is given by UK Biobank. REFERENCES * De Leeuw, C. A., Neale, B. M., Heskes, T. & Posthuma, D. The statistical properties of gene-set analysis. _Nat. Rev. Genet._ 17, 353–364


(2016). Article  PubMed  CAS  Google Scholar  * Wang, K., Li, M. & Hakonarson, H. Analysing biological pathways in genome-wide association studies. _Nat. Rev. Genet._ 11, 843–854 (2010).


Article  PubMed  CAS  Google Scholar  * The Gene Ontology Consortium. Gene Ontology Consortium: going forward. _Nucl. Acids Res._ 43, D1049–D1056 (2015). Article  CAS  Google Scholar  *


McCurley, A. et al. Direct regulation of blood pressure by smooth muscle cell mineralocorticoid receptors. _Nat. Med._ 18, 1429–1433 (2012). Article  PubMed  PubMed Central  CAS  Google


Scholar  * de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. _PLoS Comput. Biol._ 11, e1004219 (2015). Article  PubMed  PubMed


Central  CAS  Google Scholar  * Lee, P. H., O’Dushlaine, C., Thomas, B. & Purcell, S. M. INRICH: Interval-based enrichment analysis for genome-wide association studies. _Bioinformatics_


28, 1797–1799 (2012). Article  PubMed  PubMed Central  CAS  Google Scholar  * Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex


diseases of middle and old age. _PLoS Med._ 12, e1001779 (2015). Article  PubMed  PubMed Central  Google Scholar  * Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based


approach for interpreting genome-wide expression profiles. _Proc. Natl Acad. Sci. USA_ 102, 15545–15550 (2005). Article  ADS  PubMed  CAS  Google Scholar  * GTEx Consortium. The


Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. _Science_ 348, 648–660 (2015). Article  PubMed Central  CAS  Google Scholar  * Surendran, P. et al.


Trans-ancestry meta-analyses identify rare and common variants associated with blood pressure and hypertension. _Nat. Genet._ 48, 1151–1161 (2016). Article  PubMed  PubMed Central  CAS 


Google Scholar  * Rapsomaniki, E. et al. Blood pressure and incidence of twelve cardiovascular diseases: lifetime risks, healthy life-years lost, and age-specific associations in 1.25


million people. _Lancet_ 383, 1899–1911 (2014). Article  PubMed  PubMed Central  Google Scholar  * Levy, D. et al. Framingham Heart Study 100K Project: genome-wide associations for blood


pressure and arterial stiffness. _BMC Med. Genet._ 8(Suppl. 1), S3 (2007). Article  PubMed  PubMed Central  CAS  Google Scholar  * Ehret, G. B. et al. The genetics of blood pressure


regulation and its target organs from association studies in 342,415 individuals. _Nat. Genet._ 48, 1171–1184 (2016). Article  PubMed  PubMed Central  CAS  Google Scholar  * Liu, C. et al.


Meta-analysis identifies common and rare variants influencing blood pressure and overlapping with metabolic trait loci. _Nat. Genet._ 48, 1162–1170 (2016). Article  PubMed  PubMed Central 


CAS  Google Scholar  * Hoffmann, T. J. et al. Genome-wide association analyses using electronic health records identify new loci influencing blood pressure variation. _Nat. Genet._ 49, 54–64


(2017). Article  PubMed  CAS  Google Scholar  * Warren, H. R. et al. Genome-wide association analysis identifies novel blood pressure loci and offers biological insights into cardiovascular


risk. _Nat. Genet._ 49, 403–415 (2017). Article  PubMed  PubMed Central  CAS  Google Scholar  * Kraja, A. T. et al. New blood pressure-associated loci identified in meta-analyses of 475 000


individuals. _Circ. Cardiovasc. Genet._ 10, e001778 (2017). Article  PubMed  CAS  PubMed Central  Google Scholar  * Gautel, M. & Djinović-Carugo, K. The sarcomeric cytoskeleton: from


molecules to motion. _J. Exp. Biol._ 219, 135–145 (2016). Article  PubMed  Google Scholar  * Atlas, S. A. & Laragh, J. H. Atrial natriuretic peptide: a new factor in hormonal control of


blood pressure and electrolyte homeostasis. _Annu. Rev. Med._ 37, 397–414 (1986). Article  PubMed  CAS  Google Scholar  * Murad, F. Shattuck Lecture: nitric oxide and cyclic GMP in cell


signaling and drug development. _N. Engl. J. Med._ 355, 2003–2011 (2006). Article  PubMed  CAS  Google Scholar  * Yegutkin, G. G. Nucleotide- and nucleoside-converting ectoenzymes: important


modulators of purinergic signalling cascade. _Biochim. Biophys. Acta_ 1783, 673–694 (2008). Article  PubMed  CAS  Google Scholar  * Burnstock, G. Purinergic signaling and vascular cell


proliferation and death. _Arterioscler. Thromb. Vasc. Biol._ 22, 364–373 (2002). Article  PubMed  CAS  Google Scholar  * Moncada, S., Palmer, R. M. & Higgs, E. A. Nitric oxide:


physiology, pathophysiology, and pharmacology. _Pharmacol. Rev._ 43, 109–142 (1991). PubMed  CAS  Google Scholar  * Hughes, A. D. Calcium channels in vascular smooth muscle cells. _J. Vasc.


Res._ 32, 353–370 (1995). Article  PubMed  CAS  Google Scholar  * Touyz, R. M. & Briones, A. M. Reactive oxygen species and vascular biology: implications in human hypertension.


_Hypertens. Res._ 34, 5–14 (2011). Article  PubMed  CAS  Google Scholar  * Kozomara, A. & Griffiths-Jones, S. miRBase: annotating high confidence microRNAs using deep sequencing data.


_Nucl. Acids Res._ 42, D68–D73 (2014). Article  PubMed  CAS  Google Scholar  * Wang, Y. S. et al. Role of miR-145 in cardiac myofibroblast differentiation. _J. Mol. Cell Cardiol._ 66, 94–105


(2014). Article  PubMed  CAS  Google Scholar  * Rangrez, A. Y., Massy, Z. A., Metzinger-Le Meuth, V. & Metzinger, L. miR-143 and miR-145: molecular keys to switch the phenotype of


vascular smooth muscle cells. _Circ. Cardiovasc. Genet._ 4, 197–205 (2011). Article  PubMed  CAS  Google Scholar  * Zhang, Y. N. et al. Phenotypic switching of vascular smooth muscle cells


in the ʻnormal regionʼ of aorta from atherosclerosis patients is regulated by miR-145. _J. Cell. Mol. Med._ 20, 1049–1061 (2016). Article  PubMed  PubMed Central  CAS  Google Scholar  *


Hergenreider, E. et al. Atheroprotective communication between endothelial cells and smooth muscle cells through miRNAs. _Nat. Cell Biol._ 14, 249–256 (2012). Article  PubMed  CAS  Google


Scholar  * Paul, M., Poyan Mehr, A. & Kreutz, R. Physiology of local renin-angiotensin systems. _Physiol. Rev._ 86, 747–803 (2006). Article  PubMed  CAS  Google Scholar  * Tobin, M. D.


et al. Common variants in genes underlying monogenic hypertension and hypotension and blood pressure in the general population. _Hypertension_ 51, 1658–1664 (2008). Article  PubMed  CAS 


Google Scholar  * Higgins, J. P. T. et al. Gene expression in the normal adult human kidney assessed by complementary DNA microarray. _Mol. Biol. Cell_ 15, 649–656 (2004). Article  PubMed 


PubMed Central  CAS  Google Scholar  * Ashraf, M. S. & Vongpatanasin, W. Estrogen and hypertension. _Curr. Hypertens. Rep._ 8, 368–376 (2006). Article  PubMed  CAS  Google Scholar  *


Granger, J. P., Alexander, B. T., Llinas, M. T., Bennett, W. A. & Khalil, R. A. Pathophysiology of hypertension during preeclampsia linking placental ischemia with endothelial


dysfunction. _Hypertension_ 38, 718–722 (2001). Article  PubMed  CAS  Google Scholar  * Alexander, B. T. Placental insufficiency leads to development of hypertension in growth-restricted


offspring. _Hypertension_ 41, 457–462 (2003). Article  PubMed  CAS  Google Scholar  * Alexander, B. T. Fetal programming of hypertension. _Am. J. Physiol. Regul. Integr. Comp. Physiol._ 290,


R1–R10 (2006). Article  PubMed  CAS  Google Scholar  * Tobin, M. D., Sheehan, N. A., Scurrah, K. J. & Burton, P. R. Adjusting for treatment effects in studies of quantitative traits:


antihypertensive therapy and systolic blood pressure. _Stat. Med._ 24, 2911–2935 (2005). Article  MathSciNet  PubMed  Google Scholar  * Abraham, G. & Inouye, M. Fast principal component


analysis of large-scale genome-wide data. _PLoS ONE_ 9, e93766 (2014). Article  ADS  PubMed  PubMed Central  CAS  Google Scholar  * Wain et al. Genome-wide association study identifies six


new loci influence pulse pressure and mean arterial pressure. _Nat. Genet._ 43, 1005–1011 (2011). Article  PubMed  PubMed Central  CAS  Google Scholar  * NCBI Resource Coordinators. Database


resources of the national center for biotechnology information. _Nucleic Acids Res._ 45, D12–D17 (2017). Article  CAS  Google Scholar  * Chang, C. C. et al. Second-generation PLINK: rising


to the challenge of larger and richer datasets. _Gigascience_ 4, 7 (2015). Article  PubMed  PubMed Central  CAS  Google Scholar  Download references ACKNOWLEDGEMENTS This work was funded by


The Netherlands Organization for Scientific Research (NWO VICI 453-14-005, 645-000-003). The analyses were carried out on the Genetic Cluster Computer, which is financed by the Netherlands


Scientific Organization (NWO: 480-05-003), by the VU University, Amsterdam, The Netherlands, and by the Dutch Brain Foundation, and is hosted by the Dutch National Computing and Networking


Services SurfSARA. This research has been conducted using the UK Biobank Resource under project 16406. We thank the participants and researchers who collected and contributed to the data.


AUTHOR INFORMATION AUTHORS AND AFFILIATIONS * Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam,


Amsterdam, 1081 HV, The Netherlands Christiaan A. de Leeuw, Sven Stringer & Danielle Posthuma * Department of Radiology, Leiden University Medical Center, Leiden, 2333 ZA, The


Netherlands Ilona A. Dekkers * Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, 6525 EC, The Netherlands Tom Heskes * Department of Clinical Genetics,


Amsterdam Neuroscience, VU University Medical Center, Amsterdam, 1007 MB, The Netherlands Danielle Posthuma Authors * Christiaan A. de Leeuw View author publications You can also search for


this author inPubMed Google Scholar * Sven Stringer View author publications You can also search for this author inPubMed Google Scholar * Ilona A. Dekkers View author publications You can


also search for this author inPubMed Google Scholar * Tom Heskes View author publications You can also search for this author inPubMed Google Scholar * Danielle Posthuma View author


publications You can also search for this author inPubMed Google Scholar CONTRIBUTIONS C.dL., T.H. and D.P. conceived of the study. C.dL. developed the statistical method and performed the


analyses. S.S. prepared the UK Biobank data for analysis. C.dL., I.A.D. and D.P. wrote the paper. All authors discussed the results and commented on the paper. CORRESPONDING AUTHORS


Correspondence to Christiaan A. de Leeuw or Danielle Posthuma. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL INFORMATION PUBLISHER'S


NOTE: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. ELECTRONIC SUPPLEMENTARY MATERIAL SUPPLEMENTARY INFORMATION PEER


REVIEW FILE DESCRIPTION OF ADDITIONAL SUPPLEMENTARY FILES SUPPLEMENTARY DATA 1 RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0


International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the


source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative


Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by


statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit


http://creativecommons.org/licenses/by/4.0/. Reprints and permissions ABOUT THIS ARTICLE CITE THIS ARTICLE de Leeuw, C.A., Stringer, S., Dekkers, I.A. _et al._ Conditional and interaction


gene-set analysis reveals novel functional pathways for blood pressure. _Nat Commun_ 9, 3768 (2018). https://doi.org/10.1038/s41467-018-06022-6 Download citation * Received: 06 October 2017


* Accepted: 31 July 2018 * Published: 14 September 2018 * DOI: https://doi.org/10.1038/s41467-018-06022-6 SHARE THIS ARTICLE Anyone you share the following link with will be able to read


this content: Get shareable link Sorry, a shareable link is not currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt content-sharing initiative