Introduction

Heart failure (HF) affects >30 million individuals worldwide and its prevalence is rising1. HF-associated morbidity and mortality remain high despite therapeutic advances, with 5-year survival averaging ~50%2. HF is a clinical syndrome defined by fluid congestion and exercise intolerance due to cardiac dysfunction3. HF typically results from myocardial disease with impairment of left ventricular (LV) function, manifesting with either reduced or preserved ejection fraction. Several cardiovascular and systemic disorders are implicated as aetiological factors, most notably coronary artery disease (CAD), obesity and hypertension; multiple risk factors frequently co-occur, and their individual contributions to aetiology have been challenging to establish from observational data alone1,4. Monogenic hypertrophic and dilated cardiomyopathy (DCM) syndromes are known causes of HF, although they account for a small proportion of disease burden5. HF is a complex disorder with an estimated heritability of ~26%6. Previous modest-sized genome-wide association studies (GWAS) of HF reported two loci, while studies of DCM have identified a few replicated loci7,8,9,10,11. We hypothesised that a GWAS of HF with greater power would provide an opportunity for: (i) discovery of genetic variants modifying disease susceptibility in a range of comorbid contexts, both through subtype-specific and shared pathophysiological mechanisms, such as fluid congestion; and (ii) insights into aetiology, by estimating the unconfounded causal contribution of observationally associated risk factors using Mendelian randomisation (MR) analysis12.

Herein, we perform a large meta-analysis of GWAS of HF to identify disease-associated genomic loci. We seek to relate HF-associated loci to putative effector genes through integrated analysis of expression data from disease-relevant tissues, including statistical colocalisation analysis. We evaluate the genetic evidence supporting a causal role for HF risk factors identified through observational studies using Mendelian randomisation and explore mediation of risk through conditional analysis. In summary, our study identifies additional HF risk variants, prioritises putative effector genes and provides a genetic appraisal of the putative causal role of observationally associated risk factors, contributing to our understanding of the pathophysiological basis of HF.

Results

Meta-analysis identifies 11 genomic loci associated with HF

We conducted a GWAS comprising 47,309 cases and 930,014 controls of European ancestry across 26 studies from the Heart Failure Molecular Epidemiology for Therapeutic Targets (HERMES) Consortium. The study sample comprised both population cohorts (17 studies, 38,780 HF cases, 893,657 controls) and case-control samples (9 studies, 8,529 cases, 36,357 controls; see Supplementary Notes 2 and 3 for a detailed description of the included studies). Genotype data were imputed to either the 1000 Genomes Project (60%), Haplotype Reference Consortium (35%) or study-specific reference panels (5%). We performed a fixed-effect inverse variance-weighted (IVW) meta-analysis relating 8,281,262 common and low-frequency variants (minor allele frequency (MAF) > 1%) to HF risk (Fig. 1). We identified 12 independent genetic variants at 11 loci associated with HF at genome-wide significance (P < 5 × 10−8), including 10 loci not previously reported for HF (Fig. 2, Table 1). The quantile–quantile plot, regional association plots and study-specific effects for each independent variant are shown in Supplementary Figs. 1–3. We replicated two previously reported associations for HF and three of four loci for DCM (Bonferroni-corrected P < 0.05; Supplementary Data 1). Using linkage disequilibrium score regression (LDSC)13, we estimated the heritability of HF in UK Biobank \((h_g^2)\) on the liability scale as 0.088 (s.e. = 0.013), based on an estimated disease prevalence of 2.5%14.

Fig. 1: Study design and analysis workflow.
Overview of study design to identify and characterise heart failure-associated risk loci and for secondary cross-trait genome-wide analyses. GWAS, genome-wide association study; QTL, quantitative trait locus; MAGMA, Multi-marker Analysis of GenoMic Annotation; SNP, single-nucleotide polymorphism; mtCOJO, multi-trait-based conditional and joint analysis.

Fig. 2: Manhattan plot of genome-wide heart failure associations.
The x-axis represents the genome in physical order; the y-axis shows −log10 P values for individual variant association with heart failure risk from the meta-analysis (n = 977,323). Suggestive associations at a significance level of P < 1 × 10−5 are indicated by the blue line, while genome-wide significance at P < 5 × 10−8 is indicated by the red line. Meta-analysis was performed using a fixed-effect inverse variance-weighted model. Independent genome-wide significant variants are annotated with the nearest gene(s).

Table 1 Variants associated with heart failure at genome-wide significance.

Phenotypic effects of HF-associated variants

Next, we investigated associations between the identified loci and other traits that may provide insights into aetiology. First, we queried the NHGRI-EBI GWAS Catalog15 and a large database of genetic associations in UK Biobank (http://www.nealelab.is/uk-biobank), and identified several biomarker and disease associations at each locus (Supplementary Data 2 and 3). Second, we tested for associations of identified loci with ten known HF risk factors, including cardiac structure and function measures, using GWAS summary data (Supplementary Data 4)16,17,18,19,20,21,22,23. Six sentinel variants were associated with CAD, including established loci, such as 9p21/CDKN2B-AS1 and LPA18. Four variants were associated with atrial fibrillation (AF), a common antecedent and sequela of HF24. To estimate whether the HF risk effects were mediated wholly or in part by risk factors upstream of HF (e.g., CAD), we conditioned HF GWAS summary statistics on nine HF risk factors using Multi-trait Conditional and Joint Analysis (mtCOJO)25 (Supplementary Data 5). Conditioning on AF attenuated the HF risk effect by >50% for the PITX2/FAM241A locus but not for the other AF-associated loci (KLHL3, SYNPO2L/AGAP5); conditioning on CAD fully attenuated effects for two of the six CAD loci (LPA, 9p21/CDKN2B-AS1); and conditioning on body mass index (BMI) ablated the effect of the FTO locus (Supplementary Fig. 4, Supplementary Data 5). Next, we performed hierarchical agglomerative clustering of loci based on cross-trait associations to identify groups related to HF subtypes (Fig. 3). Among HF loci not associated with CAD, a group of four clustered together, of which two (KLHL3 and SYNPO2L/AGAP5) were associated with AF and two (BAG3 and CDKN1A) with reduced LV systolic function (fractional shortening (FS); Bonferroni-corrected P < 0.05); we highlight the results for these loci in our reporting of subsequent analyses to identify candidate genes. Notably, genetic associations with DCM at the BAG3 locus have been reported previously10,11.

Fig. 3: Associations of HF risk variants with traits relating to disease subtypes and risk factors.
This bubble plot shows associations between the identified HF loci and risk factors and quantitative imaging traits, using summary estimates from UK Biobank (for DCM) and published GWAS summary statistics. Numbers in brackets represent the sample size (for quantitative traits) or the number of cases (for binary traits) used to derive the GWAS summary statistics. The size of the bubble represents the absolute Z-score for each trait, with the direction oriented towards the HF risk allele. Red/blue indicates a positive/negative cross-trait association (i.e., increase/decrease in disease risk or increase/decrease in continuous trait). We controlled the family-wise error rate at 0.05 by Bonferroni correction for the ten traits tested per HF locus (P < 4.5 × 10−4); traits meeting this threshold of significance for association are indicated by dark colour shading. Agglomerative hierarchical clustering of variants was performed using the complete linkage method, based on Euclidean distance. Where a sentinel variant was not available for all traits, a common proxy was selected; bold text indicates the variants whose estimates are plotted and on which clustering was performed. For the LPA locus, associations for the more common of the two variants at this locus are shown. FS, fractional shortening; LVD, left ventricular dimension; DCM, dilated cardiomyopathy; AF, atrial fibrillation; CAD, coronary artery disease; LDL-C, low-density lipoprotein cholesterol; T2D, type 2 diabetes; BMI, body mass index; SBP, systolic blood pressure; DBP, diastolic blood pressure.

Tissue-enrichment analysis

We performed gene-based association analyses using MAGMA26 to identify tissues and aetiological pathways relevant to HF. Thirteen genes were associated with HF at genome-wide significance, of which four were located within 1 Mb of a sentinel HF variant and expressed in heart tissue (Supplementary Data 6). Tissue specificity analysis across 53 tissue types from the Genotype-Tissue Expression (GTEx) project identified the atrial appendage as the highest ranked tissue for gene expression enrichment, excluding reproductive organs (Supplementary Fig. 5). We sought to map candidate genes to the HF loci by assessing the functional consequences of sentinel variants (or their proxies) on gene expression, and protein structure/abundance using quantitative trait locus (QTL) analyses.

Variant effects on protein coding sequence

Since the identified HF variants were located in non-coding regions, we investigated if sentinel variants were in linkage disequilibrium (LD, r2 > 0.8) with non-synonymous variants with predicted deleterious effects. We identified a missense variant in BAG3 (rs2234962; r2 = 0.99 with sentinel variant rs17617337) associated previously with DCM and progression to HF, and three missense variants in SYNPO2L (rs34163229, rs3812629 and rs60632610; all r2 > 0.9 with sentinel variant rs4746140)10,11,27. All four missense variants had Combined Annotation Dependent Depletion scores > 20, suggesting deleterious effects (Supplementary Data 7).

Prioritisation of putative effector genes by expression analysis

We then sought to identify candidate genes for HF risk loci by assessing their effects on gene expression. Given that cardiac dysfunction defines HF and that HF-associated genes by MAGMA analysis were enriched in heart tissues, we first looked for expression quantitative trait loci (eQTL) in heart tissues (LV, left atrium (LA) and right atrium auricular region (RAA)) from the Myocardial Applied Genomics Network (MAGNet) and GTEx projects. Three of 12 variants were significantly associated with the expression of one or more genes located in cis in at least one heart tissue (Bonferroni-corrected P < 0.05; Supplementary Data 8). For several of the identified HF loci, extra-cardiac tissues are likely to be relevant; for example, liver is reported to mediate effects of the LPA locus28. To further explore these effects, we then analysed results from a large whole-blood eQTL dataset (n = 31,684) and found associations with cis-gene expression (P < 5 × 10−8) for 8 of 12 sentinel variants (Supplementary Table 1)29. For most HF variants, heart eQTL associations were consistent with those for blood traits; however, for intronic HF sentinel variants in BAG3, CDKN1A and KLHL3 we detected expression of the corresponding gene transcripts in blood only.

Next, to prioritise among candidate genes identified through eQTL associations, we estimated the posterior probability for a common causal variant underlying associations with gene expression and HF at each locus, by conducting pairwise Bayesian colocalisation analysis30. We found evidence for colocalisation (posterior probability > 0.7) for MYOZ1 and SYNPO2L in heart, PSRC1 and ABO in heart and blood; and CDKN1A in blood (Supplementary Data 8, Supplementary Table 1). PSRC1 and MYOZ1 were also implicated in a transcriptome-wide association analysis performed using predicted gene expression based on GTEx human atrial and ventricular expression reference data (Supplementary Table 2). Using serum pQTL data from the INTERVAL study (N = 3,301), we also identified significant concordant cis associations for BAG3 and ABO (Supplementary Data 9)31.

The evidence linking candidate genes with HF risk loci is summarised in Supplementary Table 3, and candidate genes are described in Supplementary Note 1. At HF risk loci associated with reduced systolic function or AF, but not with CAD, the annotated functions of candidate genes related to myocardial disease processes and to traits that may influence clinical expressivity, such as renal sodium handling. For example, the sentinel variant at the SYNPO2L/AGAP5 locus was associated with expression of MYOZ1 and SYNPO2L, encoding two α-actinin binding Z-disc cardiac proteins. MYOZ1 is a negative regulator of calcineurin signalling, a pathway linked to pathological hypertrophy32,33, and SYNPO2L is implicated in cardiac development and sarcomere maintenance34. The HF sentinel variant at the BAG3 locus was in high LD with a non-synonymous variant associated previously with DCM11, and was associated with decreased cis-gene expression in blood. BAG3 encodes a Z-disc-associated protein that mediates selective macroautophagy and promotes cell survival through interaction with apoptosis regulator BCL235. CDKN1A encodes p21, a potent cell cycle inhibitor that mediates post-natal cardiomyocyte cell cycle arrest36 and is implicated in LMNA-mediated cellular stress responses37. KLHL3 is a negative regulator of the thiazide-sensitive Na+-Cl− cotransporter (SLC12A3) in the distal nephron; loss of function variants cause familial hyperkalaemic hypertension (FHHt) by increasing constitutive sodium and chloride resorption38. The sentinel variant at this locus was associated with decreased gene expression and could predispose to sodium and fluid retention. Notably, thiazide diuretics inhibit SLC12A3 to restore sodium and potassium homoeostasis in FHHt and are effective treatments for preventing hypertensive HF39.

Genetic appraisal of HF risk factors

Although many risk factors are associated with HF, only myocardial infarction and hypertension have an established causal role based on evidence from randomised controlled trials (RCTs)40. Important questions remain about causality for other risk factors. For instance, type 2 diabetes (T2D) is a risk factor for HF, yet it is unclear if the association is mediated via CAD risk or by direct myocardial effects, which may have important preventative implications41. Accordingly, we investigated potential causal roles for modifiable HF risk factors, using GWAS summary data. First, we estimated the genetic correlation (rg) between HF and 11 related traits, using bivariate LDSC. For eight of the eleven traits tested, we found evidence of shared additive genetic effects with estimates of rg ranging from −0.25 to 0.67 (Supplementary Table 4). The estimated CAD-HF rg was 0.67, suggesting 45% \((r_g^2)\) of variation in genetic risk of HF is accounted for by common genetic variation shared with CAD, and that the remaining genetic variation is independent of CAD.

Next, we estimated the causal effects of the 11 HF risk factors using Generalised Summary-data-based Mendelian Randomisation (GSMR), which accounts for pleiotropy by excluding heterogeneous variants based on the heterogeneity in dependent instrument (HEIDI) test (Methods, Supplementary Fig. 6, Supplementary Data 10). Consistent with evidence from RCTs and genetic studies42, we found evidence for causal effects on HF of higher diastolic blood pressure (DBP; OR = 1.30 per 10 mmHg, P = 9.13 × 10−21), higher systolic blood pressure (SBP; OR = 1.18 per 10 mmHg, P = 4.8 × 10−23) and higher risk of CAD (OR = 1.36, P = 1.67 × 10−70). We note that the effect estimates for variant associations with blood pressure, included as instrumental variables, were adjusted for BMI, which may attenuate the estimated causal effect on HF. We found that a 1 s.d. increment of BMI (equivalent to 4.4 kg m−2 in men and 5.4 kg m−2 in women43) accounted for a 74% higher HF risk (P = 2.67 × 10−50), consistent with previous reports44,45. We identified evidence supporting causal effects of genetic liability to AF (OR of HF per 1 log odds higher AF = 1.19, P = 1.40 × 10−75) and T2D (OR of HF per 1 log odds higher T2D = 1.05, P = 6.35 × 10−5) on risk of HF. We did not find supportive evidence for a causal role for higher heart rate (HR) or lower glomerular filtration rate (GFR) despite reported observational associations46,47. We then performed a sensitivity analysis to explore potential bias arising from the inclusion of case-control samples by repeating the Mendelian randomisation analysis, using HF GWAS estimates generated from population-based cohort studies only. The results of this analysis were consistent with those generated from the overall sample (Supplementary Table 5).
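To illustrate how summary-level Mendelian randomisation turns variant associations into an odds ratio, the following minimal sketch combines per-variant ratio estimates by inverse-variance weighting. It is a simplified stand-in and not GSMR: it omits the HEIDI outlier filter, assumes independent instruments and ignores uncertainty in the exposure effects. The function name and the toy inputs are hypothetical.

```python
import math

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate from summary statistics.

    Simplifying assumptions (not the GSMR method used in the paper):
    independent instruments, first-order standard errors, no HEIDI filtering.
    """
    # Per-variant ratio (Wald) estimates of the exposure-outcome effect
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order standard error of each ratio estimate
    ses = [sy / abs(bx) for bx, sy in zip(beta_exposure, se_outcome)]
    weights = [1.0 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se

# Toy instruments (made-up numbers): effects on the exposure and on log odds of HF
bx = [0.10, 0.20, 0.15]
by = [0.02, 0.04, 0.03]
sy = [0.01, 0.01, 0.01]

log_or, log_or_se = ivw_mr(bx, by, sy)
odds_ratio = math.exp(log_or)  # OR of HF per unit higher exposure
```

For a binary exposure, the unit of the exposure effects is the log odds, which is why the estimates above are reported per 1 log odds higher AF or T2D.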

To investigate whether risk factor effects on HF were mediated by CAD and AF, we performed analyses conditioning for CAD and AF using mtCOJO. We observed attenuation of the effect of T2D after conditioning for CAD (OR = 1.02, P = 0.19), suggesting at least partial mediation by CAD risk rather than through direct myocardial effects of hyperglycaemia. Similarly, the effects of low-density lipoprotein cholesterol (LDL-C) were fully explained by effects of CAD on HF risk (OR = 1.00, P = 0.80). Conversely, the effects of blood pressure, BMI and triglycerides (TGs) were only partially attenuated, suggesting causal mechanisms independent of those associated with AF and CAD (Fig. 4, Supplementary Data 10).

Fig. 4: Conditional Mendelian randomisation analyses of HF risk factors.
Forest plot of HF risk factors with a significant causal effect on HF risk, estimated using Mendelian randomisation implemented with GSMR. Diamonds represent the odds ratio and the error bars indicate the 95% confidence interval. The unadjusted estimates represent the risk of HF as estimated from the HF GWAS data, while the adjusted estimates represent the risk of HF conditioned on GWAS summary statistics for atrial fibrillation (adjusted for AF) or coronary artery disease (adjusted for CAD), estimated using the mtCOJO method. For binary traits (coronary artery disease, atrial fibrillation and type 2 diabetes), the MR estimates represent the average causal effect per natural-log odds increase in the trait risk. For continuous traits, the MR estimates represent the average causal effect per standard deviation increase in the reported unit of the trait. LDL, low-density lipoprotein; HDL, high-density lipoprotein; CAD, coronary artery disease; AF, atrial fibrillation.

Discussion

We identify 12 independent variant associations for HF risk at 11 genomic loci by leveraging genome-wide data on 47,309 cases and 930,014 controls, including 10 loci not previously associated with HF. The identified loci were associated with modifiable risk factors and traits related to LV structure and function, and include the strongest association signals from GWAS of CAD (9p21, LPA)18, AF (PITX2)17 and BMI (FTO)20. Conditioning for CAD, AF and blood pressure traits demonstrated that the effects of some loci (e.g., 9p21/CDKN2B-AS1) were mediated wholly via risk factor trait associations (e.g., CAD); however, for 8 of 12 variants the attenuation of effects was <50%, suggesting alternative mechanisms may be important. Those loci associated with reduced LV systolic function or AF mapped to candidate genes implicated in processes of cardiac development, protein homoeostasis and cellular senescence. We use genetic causal inference and conditional analysis to explore the syndromic heterogeneity and causal biology of HF, and to provide insights into aetiology. Mendelian randomisation analysis confirms previously reported causal effects for BMI and provides evidence supporting the causal role of several observationally linked risk factors, including AF, elevated blood pressure (DBP and SBP), LDL-C, CAD, TGs and T2D. Using conditional analysis, we demonstrate CAD-independent effects for AF, BMI and blood pressure, and estimate that the effects of T2D are mostly mediated by an increased risk of CAD.

The heterogeneity of aetiology and clinical manifestation of HF is likely to have reduced statistical power. We identify a modest number of genetic associations for HF compared to other cardiovascular disease GWAS of comparable sample size, such as for AF, suggesting that an important component of HF heritability may be more attributable to specific disease subtypes than components of a final common pathway17. Subsequent studies will explore emerging opportunities to define HF subtypes and longitudinal phenotypes in large biobanks and patient registries at scale using standardised definitions based on diagnostic codes, imaging and electronic health records. We speculate that future analysis of HF subtypes may yield additional insights into the genetic architecture of HF to inform new approaches to prevention and treatment.

Methods

Samples

Participants of European ancestry from 26 cohorts (with a total of 29 distinct datasets) with either a case-control or population-based study design were included in the meta-analysis, as part of the HERMES Consortium. Cases included participants with a clinical diagnosis of HF of any aetiology with no inclusion criteria based on LV ejection fraction; controls were participants without HF. Definitions used to adjudicate HF status within each study are detailed in Supplementary Data 11 and baseline characteristics for each study are provided in Supplementary Data 12. We meta-analysed data from a total of 47,309 cases and 930,014 controls. All included studies were ethically approved by local institutional review boards and all participants provided written informed consent. The meta-analysis of summary-level GWAS estimates from participating studies was performed in accordance with guidelines for study procedures provided by the UCL Research Ethics Committee.

Genotyping and imputation

All studies used high-density genotyping arrays and performed genotype calling and pre-imputation quality control (QC), as reported in Supplementary Data 13. Studies performed imputation using one or more of the following reference panels: 1000 Genomes (Phase 1 or Phase 3)48, HapMap 2 NCBI build 3649, Haplotype Reference Consortium (HRC)50, the Estonian Whole-Genome Sequence reference51 or a reference sample based on 15,220 whole-genome sequences of Icelandic individuals. The following software tools were used by studies for phasing: Eagle52, MaCH53 and SHAPEIT54; and for imputation: minimac255 and IMPUTE256. For imputation to the HRC reference panel, the Sanger Imputation Server (https://www.sanger.ac.uk/science/tools/sanger-imputation-service) was used. The deCODE study was imputed using study-specific procedures57. Methods for phasing, imputation and post-imputation QC for each study are detailed in Supplementary Data 13.

Study-level GWA analysis

GWA analysis for each study was performed locally according to a common analysis plan, and summary-level estimates were provided for meta-analysis. Autosomal single-nucleotide polymorphisms (SNPs) were tested for association with HF using logistic regression, assuming additive genetic effects. For the Cardiovascular Health Study, HF association estimates were generated by analysis of incident cases using a Cox proportional hazards model. All studies included age and sex (except for single-sex studies) as covariates in the regression models. Principal components (PCs) were included as covariates for individual studies as appropriate. The following tools were used for study-level GWA analysis: ProbABEL58, mach2dat (http://www.unc.edu/~yunmli/software.html), QuickTest59, PLINK260, SNPTEST61 or R62 as detailed in Supplementary Data 13.

QC on study summary-level data

QC of summary-level results for each study was performed according to the protocol described in Winkler et al.63. In brief, we used the EasyQC tool to harmonise variant IDs and alleles across studies and to compare reported allele frequencies with allele frequencies in individuals of European ancestry from the 1000 Genomes imputation reference panel64. We inspected PZ plots (reported P value against P value derived from the Z-score), beta and s.e. distributions, and Manhattan plots to check for consistency and to identify spurious associations. For each study, variants were removed if they satisfied any one of the following criteria: imputation quality < 0.5, MAF < 0.01, absolute betas and s.e. > 10. As recommended in Sinnott et al.65 and Johnson et al.66, more stringent QC measures were applied to studies where genotyping of cases and controls was performed on different platforms. This included more stringent thresholds for removing SNPs with low-quality imputation, and where available, individuals genotyped on both platforms were used to remove SNPs with low concordance rates between the two platforms. To check for study-level genomic inflation, we examined quantile–quantile plots and calculated the genomic inflation factor (λGC). For three studies, where some degree of genomic inflation was observed (λGC > 1.1), genomic control correction was applied (Supplementary Data 13)67.
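As an illustration of the inflation check described above, the following sketch computes λGC as the median observed test statistic divided by the median of a 1-df chi-square distribution, then applies genomic control above the 1.1 cut-off quoted in the text. The function names are hypothetical, and scaling standard errors by the square root of λGC is an assumed (though conventional) form of the correction rather than a documented detail of the study pipelines.

```python
import math
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution (about 0.455), from the normal quantile
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_inflation(z_scores):
    """Genomic inflation factor: median observed chi-square over its expected median."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN

def genomic_control(se_values, lam, threshold=1.1):
    """Scale standard errors by sqrt(lambda) when lambda exceeds the threshold.

    Assumption: correction by standard-error inflation; studies at or below
    the threshold are returned unchanged.
    """
    if lam <= threshold:
        return list(se_values)
    return [se * math.sqrt(lam) for se in se_values]
```

A study whose Z-scores follow the null distribution gives a value close to 1, so `genomic_control` leaves its standard errors untouched.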

Meta-analysis

Meta-analysis of summary data was conducted using the fixed-effect IVW approach implemented in METAL (released March 25 2011)68. Variants were included if they were present in at least half of all studies. We tested for inflation of the meta-analysis test statistic due to cryptic population structure by estimating the LDSC intercept, implemented using LDSC v1.0.013. As the LDSC intercept indicated no inflation (LD score intercept of 1.0069), no further correction was applied to the meta-analysis summary estimates. To identify variants independently associated with HF, we analysed the genome-wide results using FUMA v1.3.269, selecting a random sample of 10,000 UK Biobank participants of European ancestry as an LD reference dataset70. Variants were filtered using a threshold of P < 5 × 10−8 and independent genomic loci were LD-pruned based on an r2 < 0.1. We calculated Cochran’s Q and I2 statistics to assess whether the effect estimates for HF sentinel variants were consistent across studies71.
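For a single variant, the fixed-effect IVW estimate and the heterogeneity statistics mentioned above can be sketched as follows. This is a minimal illustration of the standard formulas, not a reimplementation of METAL; the function name and the example inputs are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Fixed-effect inverse variance-weighted meta-analysis for one variant.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns the pooled estimate with Cochran's Q and I-squared.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations of study effects from the pooled effect
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I-squared: share of variation beyond that expected by chance, floored at 0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": beta / se, "Q": q, "I2": i2}

# Two hypothetical studies with equal precision but different effects
result = fixed_effect_meta([0.0, 0.2], [0.1, 0.1])
```

Because the weights are inverse variances, larger studies dominate the pooled estimate, and Q compares each study against that pooled value.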

Heritability estimation

To estimate the proportion of HF risk explained by common variants we estimated heritability \(h_g^2\) on the liability scale, using LDSC on the UK Biobank summary data (6,504 HF cases, 387,652 controls), assuming a population prevalence of 2.5%14. This approach assumes that a binary trait has an underlying continuous liability, and above a certain liability threshold an individual becomes affected. We can then estimate the genetic contribution to the continuous liability. Sample ascertainment can change the distribution of liability in the sampled individuals and needs to be adjusted for, which requires making assumptions about the population prevalence of the trait.
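The ascertainment adjustment described above is commonly expressed as a multiplicative conversion from the observed scale to the liability scale. The sketch below uses the widely used liability-threshold transformation; treating it as the exact computation performed in this analysis is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist

def liability_scale_h2(h2_observed, population_prevalence, sample_prevalence):
    """Convert observed-scale heritability to the liability scale.

    Uses h2_liab = h2_obs * K^2 (1 - K)^2 / (z^2 * P (1 - P)), where K is the
    population prevalence, P the proportion of cases in the sample, and z the
    standard normal density at the liability threshold.
    """
    k, p = population_prevalence, sample_prevalence
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - k)  # liability above this means affected
    z = std_normal.pdf(threshold)
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))

# Conversion factor for the UK Biobank sample quoted in the text
cases, controls = 6504, 387652
factor = liability_scale_h2(1.0, 0.025, cases / (cases + controls))
```

The factor depends on the assumed population prevalence, which is why the estimate is reported together with the 2.5% prevalence assumption.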

LD reference dataset

An LD reference was created, including 10,000 UK Biobank participants of European ancestry, based on HRC-imputed genotypes (referred to henceforth as UKB10K). European individuals were identified by projecting the UK Biobank samples onto the 1000 Genomes Phase 3 samples. A genomic relationship matrix was constructed using HapMap3 variants, filtered for MAF > 0.01, PHWE < 10−6 and missingness < 0.05 in the European subset, and one member of each pair of samples with observed genomic relatedness >0.05 was excluded to obtain a set of unrelated European individuals. Random sampling without replacement was used to extract a subset of 10,000 unrelated individuals of European ancestry. Variants with a minor allele count > 5, a genotype probability > 0.9 and imputation quality > 0.3 were converted to hard calls. This LD reference dataset was used for downstream summary-based analysis and for identifying SNP proxies.

Gene set enrichment analysis

A gene-based and gene set enrichment analysis of variant associations was performed using MAGMA26, implemented by FUMA v1.3.269. This analysis was performed using summary-level meta-analysis results. First, a gene-based association analysis to identify candidate genes associated with HF was conducted. Second, a tissue enrichment analysis of HF-associated genes was performed using gene expression data for 30 tissues from GTEx. Finally, a gene set enrichment analysis was performed based on pathway annotations from the Gene Ontology database72. For all MAGMA analyses, multiple testing was accounted for by Bonferroni correction.

Missense consequences of sentinel variants and proxies

We queried the protein coding consequence of the sentinel variants and proxies (r2 > 0.8) using the Combined Annotation Dependent Depletion (CADD) score73, implemented using FUMA v1.3.269. The CADD score integrates information from 63 distinct functional annotations into a single quantitative score, ranging from 1 to 99, based on variant rank relative to all 8.6 billion possible single nucleotide variants of the human reference genome (GRCh37). Sentinel SNPs or proxies with CADD score > 20 were identified. A CADD score of 20 indicates that the variant is ranked in the top 1% of highest scoring variants, while a CADD score of 30 indicates the variant is ranked in the top 0.1%.
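The correspondence between CADD scores and rank percentiles stated above follows from a PHRED-like scaling, score = −10 log10(rank fraction). The sketch below shows only that scaling; the function names are hypothetical and the underlying CADD model and ranking are not reproduced.

```python
import math

def phred_scaled(rank_fraction):
    """Scaled score for a variant ranked in the top `rank_fraction` of all variants."""
    return -10.0 * math.log10(rank_fraction)

def top_fraction(score):
    """Inverse mapping: fraction of variants scoring at least as high as `score`."""
    return 10.0 ** (-score / 10.0)

# The thresholds quoted in the text: top 1% and top 0.1%
score_top_1_percent = phred_scaled(0.01)
score_top_01_percent = phred_scaled(0.001)
```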

Expression quantitative trait analysis

To determine if HF sentinel variants had cis effects on gene expression, we queried two eQTL datasets based on RNA sequencing of human heart tissue: the GTEx v7 resource74 and the MAGNet repository (http://www.med.upenn.edu/magnet/). The GTEx v7 sample included 272 LV and 264 RAA non-diseased tissue samples from European (83.7%) and African American (15.1%) individuals. The MAGNet repository included 89 LV and 101 LA tissue samples obtained from rejected donor tissue from hearts with no evidence of structural disease; and 89 LV samples from individuals with DCM, obtained at the time of transplantation. eQTL analysis of the MAGNet LV data, including the DCM samples, was performed using the QTLtools package75 with adjustment for age, sex, disease status and the first three genetic PCs. To account for observed batch effects, a surrogate variable analysis was performed using the R package SVAseq76 and 22 additional covariates were identified and included in the model. Existing eQTL summary data in LA tissue from MAGNet and heart tissue from GTEx were queried17,77. We queried HF sentinel variants for eQTL associations with genes located either fully or partly within a 1 megabase (Mb) region upstream or downstream of the sentinel variant (referred to as cis-genes). We accounted for multiple testing by adjusting a significance threshold of P < 0.05 for the total number of SNP-cis-gene tests performed across the four heart tissue eQTL datasets (P < 4.73 × 10−5 for a total of 1,056 SNP–gene associations). Baseline characteristics for the MAGNet study are provided in Supplementary Table 6. We also queried sentinel HF variants for associations with cis gene expression in blood from the eQTLGen consortium (N = 31,684)29. Given the large sample size, we used a stringent genome-wide significance threshold of P < 5 × 10−8 to identify significant blood eQTLs.

Colocalisation analysis

Bayesian colocalisation analysis was performed using the R package coloc to test whether shared associations with gene expression and HF risk were consistent with a single common causal variant hypothesis30. We tested all genes with a significant cis-eQTL association by analysing all variants within a 200 kilobase window around the gene, using eQTL summary data for heart tissues and whole blood, and HF summary data from the present study. We set the prior probabilities of a SNP being associated only with gene expression, only with HF, or with both traits as 10−4, 10−4 and 10−5, respectively. For each gene, we report the posterior probability that the association with gene expression and HF risk is driven by a single causal variant. We consider a posterior probability of ≥0.7 as providing evidence supporting a causal role for the gene as a mediator of HF risk.
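The logic of pairwise colocalisation can be sketched from per-variant approximate Bayes factors, as below. This is an illustrative Python outline of the five-hypothesis calculation under a single-causal-variant assumption, not the coloc package itself; the function names, the prior effect-size standard deviation of 0.2 and the toy summary statistics are assumptions made for the example, while the three prior probabilities are those stated above.

```python
import math

def log_sum_exp(values):
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_diff_exp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))

def log_abf(beta, se, prior_sd):
    """Wakefield log approximate Bayes factor for association at one variant."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities for H0..H4 over the same ordered set of variants.

    H3: both traits associated, distinct causal variants.
    H4: both traits associated, one shared causal variant.
    """
    s1, s2 = log_sum_exp(labf1), log_sum_exp(labf2)
    s12 = log_sum_exp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                         # H0
        math.log(p1) + s1,                                           # H1
        math.log(p2) + s2,                                           # H2
        math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12),    # H3
        math.log(p12) + s12,                                         # H4
    ]
    norm = log_sum_exp(log_h)
    return [math.exp(h - norm) for h in log_h]

# Toy region of five variants, standard error 0.1 throughout (made-up numbers)
def region(betas, se=0.1, prior_sd=0.2):
    return [log_abf(b, se, prior_sd) for b in betas]

shared = coloc_posteriors(region([0, 0, 0.8, 0, 0]), region([0, 0, 0.7, 0, 0]))
distinct = coloc_posteriors(region([0.8, 0, 0, 0, 0]), region([0, 0, 0, 0, 0.7]))
```

In the toy data, `shared[4]` is the posterior probability of a common causal variant when both signals peak at the same variant, and `distinct[3]` is the posterior for separate causal variants when the peaks differ; the ≥0.7 criterion in the text applies to the former quantity.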

Transcriptome-wide association analysis

We employed the S-PrediXcan method78 implemented in the MetaXcan software (https://github.com/hakyimlab/MetaXcan) to identify genes whose predicted expression levels in heart tissue are associated with HF risk. Prediction models trained on GTEx v7 heart tissue datasets were applied to the HERMES meta-analysis results. Only models that significantly predicted gene expression in the GTEx eQTL dataset (false discovery rate < 0.05) were considered. A total of 4859 genes were tested in left ventricle tissue and 4467 genes in right atrial appendage tissue. Genes with an association P < 5.36 × 10−6 [0.05/(4859 + 4467)] were considered to have gene expression profiles significantly associated with HF.
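
The summary-statistic approximation underlying this type of analysis can be sketched in Python as follows. The sketch assumes the commonly described form of the gene-level statistic, in which each variant's GWAS Z-score is weighted by its prediction weight and by the ratio of the variant's standard deviation to that of predicted expression. It is an illustration only and does not use the MetaXcan code; the function name and inputs are hypothetical.

```python
import math


def spredixcan_z(weights, snp_cov, gwas_z):
    """Gene-level Z-score from prediction weights, SNP covariance and GWAS Z-scores.

    weights: prediction weight of each variant in the gene's expression model
    snp_cov: covariance matrix of the variants' genotypes (list of lists),
             estimated in a reference panel
    gwas_z:  GWAS Z-score (beta / s.e.) of each variant
    """
    n = len(weights)
    # Variance of predicted expression: w' * Gamma * w
    var_g = sum(
        weights[i] * snp_cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    if var_g <= 0:
        raise ValueError("predicted expression has non-positive variance")
    sigma_g = math.sqrt(var_g)
    # Sum over variants of w_l * (sigma_l / sigma_g) * z_l
    return sum(
        weights[l] * math.sqrt(snp_cov[l][l]) / sigma_g * gwas_z[l]
        for l in range(n)
    )


# Two uncorrelated variants with equal weights and variances (toy values).
z_gene = spredixcan_z([0.3, 0.3], [[0.4, 0.0], [0.0, 0.4]], [3.0, 4.0])
print(round(z_gene, 3))
```

A gene-level P value derived from this Z-score would then be compared against the Bonferroni threshold given in the text.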

Protein quantitative trait analysis in blood

We queried both cis- and trans- protein QTL (pQTL) associations based on measures for serum proteins mapping to 3000 genes in 3301 healthy individuals from the INTERVAL study31. We accounted for multiple testing by adjusting a significance threshold of P < 0.05 for the total number of tests for all variants and proteins tested (36,000 tests).

Association of HF risk loci with other phenotypes

We queried associations (with P < 1 × 10−5) of sentinel variants and proxies (r2 > 0.6) with any trait in the NHGRI-EBI Catalog of published GWAS (accessed 21 January 2019)15,79. We report associations (where P < 1 × 10−5) for the sentinel variants with traits in the UK Biobank cohort using the MRBase PheWAS database (http://phewas.mrbase.org/, accessed 17 January 2019). The database contains GWAS summary data for 4203 phenotypes measured in 361,194 unrelated individuals of European ancestry from the UK Biobank. We queried GWAS data for ten traits related to HF risk factors, endophenotypes and related disease traits using summary-level data from the largest available GWAS (either publicly available or obtained through agreement with study investigators). The following phenotypes were considered: fractional shortening (FS), LV dimension16, DCM, AF17, CAD18, LDL-C22, T2D23, BMI20, SBP and DBP19. For DCM, a GWAS was performed in the UKB among individuals of European ancestry, with cases defined by the presence of ICD10 code I42.0 as a main/secondary diagnosis or primary/secondary cause of death and non-cases as referents, using PLINK2. Logistic regression was performed with adjustment for age, sex, genotyping array and the first ten PCs.

Hierarchical agglomerative clustering

We performed hierarchical agglomerative clustering on a locus level using the complete linkage method based on the associations with related traits as described above. Where a sentinel variant was not available in the summary results for any of the other traits, a common proxy was used in place of the sentinel variant. For the LPA locus, we used associations for a proxy of the more common variant (rs55730499). The dissimilarity structure was calculated using Euclidean distance based on the Z-scores (beta of continuous traits or log odds of disease risk divided by s.e.) of the cross-trait associations. We accounted for multiple testing at a family-wise error rate of 0.05 by Bonferroni correction for the ten traits tested per HF locus (110 tests), and considered P < 4.5 × 10−4 (0.05/110) as our significance threshold for association.
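
The clustering procedure can be illustrated with a short, dependency-free Python sketch: loci are represented by their vectors of cross-trait Z-scores, dissimilarity is Euclidean distance, and clusters are merged by complete linkage (the distance between two clusters is the largest pairwise distance between their members). The locus names and Z-scores below are invented for illustration, and the function names are hypothetical; this is not the code used in the study.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length Z-score vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering with complete linkage.

    profiles:   dict mapping locus name -> list of cross-trait Z-scores
    n_clusters: number of clusters at which merging stops
    Returns a list of frozensets of locus names.
    """
    clusters = [frozenset([name]) for name in profiles]

    def cluster_distance(c1, c2):
        # Complete linkage: the farthest pair of members defines the distance.
        return max(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(*pair),
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Hypothetical Z-scores for three traits at four loci (not study data).
toy_profiles = {
    "locus_A": [5.0, 0.0, 0.5],
    "locus_B": [4.5, 0.2, 0.0],
    "locus_C": [0.0, 6.0, -1.0],
    "locus_D": [0.3, 5.5, -0.5],
}
toy_clusters = complete_linkage(toy_profiles, n_clusters=2)
print(sorted(sorted(c) for c in toy_clusters))
```

In this toy example, loci A and B group together because both are associated mainly with the first trait, whereas C and D share an association with the second.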

Genetic correlation analysis

We estimated genetic correlation between HF and 11 risk factors using LDSC13 on the GWAS summary statistics for each trait: AF17, CAD18, LDL-C, high-density lipoprotein cholesterol (HDL-C), TGs22, T2D23; BMI20, SBP, DBP19, HR21 and estimated GFR80.

Mendelian randomisation analysis

We performed two-sample Mendelian randomisation analysis using Generalised Summary-data-based Mendelian Randomisation (GSMR)25 implemented in GCTA v1.91.7beta81. To identify independent SNP instruments for each exposure, GWAS-significant SNPs (P < 5 × 10−8) for each risk factor were pruned (r2 < 0.05; LD window of 10,000 kb; using the UKB10K LD reference). We then estimated the causal effect of the risk factor on the disease trait according to the MR paradigm. The HEIDI test implemented in GSMR was used to detect and remove (if HEIDI P < 0.01) variants showing horizontal pleiotropy, i.e., having independent effects on both exposure and outcome, as such variants do not satisfy the underlying assumptions for valid instruments. As sensitivity analyses, we estimated the causal effects of known risk factors on HF risk using other statistical methodology and software: the R package TwoSampleMR82 was used to select independent variant instruments for the exposure using the same parameters as in the GSMR analysis (P < 5 × 10−8; r2 < 0.05; LD window of 10,000 kb), except that TwoSampleMR uses the 1000 Genomes data as the LD reference. Causal estimates based on the IVW83, MR-Egger and median-weighted methods84 were then calculated using the Mendelian Randomisation85 R package. To enable comparison of MR estimates between traits, we present effect estimates corresponding to the risk of HF per 1-s.d. higher level of the risk factor of interest. Where the original GWAS applied rank-based inverse normal transformation (RINT) to a trait, we used the per-allele beta coefficients following RINT to approximate the equivalent values on the standardised scale, as has been done previously.
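
For readers less familiar with summary-data MR, the inverse-variance weighted (IVW) estimator used in the sensitivity analyses can be sketched in Python as below. The sketch assumes the standard fixed-effect IVW form with independent instruments and ignores uncertainty in the variant-exposure estimates; it does not reproduce GSMR, HEIDI filtering, MR-Egger or the weighted median. The function name and the toy effect sizes are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: per-allele effects on the risk factor (s.d. units)
    beta_outcome:  per-allele log odds ratios for HF
    se_outcome:    standard errors of the HF log odds ratios
    Returns (causal log odds ratio per 1-s.d. higher risk factor, its s.e.).
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    if denominator == 0:
        raise ValueError("instruments have no effect on the exposure")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Toy instruments whose outcome effects are exactly half the exposure effects.
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.10, 0.075]
se = [0.01, 0.01, 0.02]
log_or, log_or_se = ivw_estimate(bx, by, se)
odds_ratio = math.exp(log_or)  # OR for HF per 1-s.d. higher risk factor
print(round(log_or, 3), round(odds_ratio, 3))
```

Exponentiating the estimate gives the odds ratio for HF per 1-s.d. higher level of the risk factor, which is the scale on which the MR results are presented.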

To determine if the causal effects of the continuous risk factors on HF were mediated via their effects on CAD or AF risk, we repeated the GSMR analysis after conditioning the HF summary statistics on CAD and AF GWAS summary statistics, as described below.

Conditional analysis

To estimate the effects of HF risk variants after adjusting for risk factors that showed a significant causal effect on HF in the MR analyses, we performed mtCOJO analysis on summary data, as implemented in GCTA v1.91.7beta81. HF summary statistics were adjusted for AF17, CAD18, LDL-C, HDL-C, TGs22, DBP, SBP19 and BMI20 using GWAS summary data. The UKB10K LD reference was used.

Reporting summary

Further information is provided in the Nature Research Reporting Summary.