Arrhythmias And Sudden Death

Development and validation of prediction models for incident atrial fibrillation in heart failure

Abstract

Objectives Accurate prediction of heart failure (HF) patients at high risk of atrial fibrillation (AF) represents a potentially valuable tool to inform shared decision making. No validated prediction model for AF in HF is currently available. The objective was to develop clinical prediction models for 1-year risk of AF.

Methods Using the Danish Heart Failure Registry, we conducted a nationwide registry-based cohort study of all incident HF patients diagnosed from 2008 to 2018 and without history of AF. Administrative data sources provided the predictors. We used a cause-specific Cox regression model framework to predict 1-year risk of AF. Internal validity was examined using temporal validation.

Results The population included 27 947 HF patients (mean age 69 years; 34% female). Clinical experts preselected sex, age at HF, New York Heart Association (NYHA) class, hypertension, diabetes mellitus, chronic kidney disease, obstructive sleep apnoea, chronic obstructive pulmonary disease and myocardial infarction. Among patients aged 70 years at HF, the predicted 1-year risk was 9.3% (95% CI 7.1% to 11.8%) for males and 6.4% (95% CI 4.9% to 8.3%) for females given all risk factors and NYHA III/IV, and 7.5% (95% CI 6.7% to 8.4%) and 5.1% (95% CI 4.5% to 5.8%), respectively, given absence of risk factors and NYHA class I. The area under the curve was 65.7% (95% CI 63.9% to 67.5%) and Brier score 7.0% (95% CI 5.2% to 8.9%).

Conclusion We developed a prediction model for the 1-year risk of AF. Application of the model in routine clinical settings is necessary to determine the possibility of predicting AF risk among patients with HF more accurately and if so, to quantify the clinical effects of implementing the model in practice.

What is already known on this topic

  • Patients with incident heart failure (HF) have a twofold increased rate of incident atrial fibrillation (AF), which is a serious event associated with a poor prognosis.

  • Identification of HF patients at high risk for incident AF may target preventive efforts to reduce modifiable risk factors for AF and guide early monitoring.

  • No clinical scoring system that predicts incident AF in HF is routinely used in clinical practice.

What this study adds

  • In a nationwide cohort study, we developed a prediction model for the 1-year risk of AF in HF.

  • Predictive performance: time-dependent area under the curve of 65.7% and Brier score of 7.0%.

How this study might affect research, practice or policy

  • Further studies are needed to determine whether improvement in prediction is possible, and if so, to quantify the clinical effects of implementing such a model in routine practice.

Introduction

Heart failure (HF) is a common clinical syndrome with a global prevalence of above 64 million patients in 2017.1 HF patients have high mortality, but recent data on temporal trends suggest improvements in survival after incident HF.2 Patients with incident HF have a twofold increased rate of incident atrial fibrillation (AF),3 which is a serious event associated with increased hospital utilisation,4 increased risk of stroke5 and substantial excess mortality.3 6

Prediction models enable stratification of patients based on their absolute risk and constitute an essential element in personalised medicine. Several clinical scoring systems have been developed to predict incident AF in the general population, such as the Framingham Heart Study score.7 Others were developed to predict outcomes other than AF but have shown promising results in predicting incident AF, such as the CHA2DS2-VASc score.8 However, the development of prediction models and the validation of existing models in patients with HF are limited.9 10

Identification of HF patients at high risk for incident AF could have several benefits. First, the prediction model may facilitate the identification of other risk markers such as biomarkers (eg, natriuretic peptides) or imaging markers (eg, left atrial measures) by providing a base clinical model for further risk reclassification. Second, the identification may target preventive efforts to reduce modifiable risk factors for AF and guide early monitoring to detect undiagnosed AF. Third, a prediction model may provide valuable risk estimates for incident AF that inform clinicians and patients in decision making. Fourth, identification of high-risk individuals may lead to initiation of early rhythm-controlling medication and catheter ablation, which seems to have benefits according to recent data.11 12 However, no clinical scoring system that predicts incident AF in HF is routinely used in clinical practice. In this study, we aimed to develop and validate clinical prediction models for the 1-year development of incident AF applicable at the time of incident HF.

Materials and methods

Setting and data sources

The Danish Heart Failure Registry (DHFR) is a nationwide clinical quality database established in 2003. The DHFR monitors and improves the quality of care for inpatients and outpatients with incident HF in Denmark. Danish hospitals must report all eligible patients to DHFR and a certified cardiologist validates all patients before enrolment.13 The registry provided the source population, which included patients diagnosed with incident HF from 2008 to 2018. The inclusion criterion of the registry is a first-time diagnosis of HF that follows diagnostic criteria from the Danish Society for Cardiology and European Society of Cardiology: HF symptoms and objective signs of HF, and if possible, clinical improvement on HF treatment. Exclusion criteria of the DHFR are HF caused by uncorrectable structural heart disease, HF caused by valvular heart disease, HF caused by rapid heart rhythm (including AF), isolated right-sided HF, HF diagnosed concurrently with a primary diagnosis of acute myocardial infarction, or HF diagnosed and treated by a private practitioner of cardiology. The cardiologist identifies the conditions in the patient’s medical record.

The Danish National Patient Registry was established in 1977 and collects prospectively registered data on all inpatients, and after 1995 also all outpatients. Data include individual-level information on dates of admission and discharge, surgical procedures performed, and one primary and several secondary diagnoses per discharge. Coding of diagnoses followed the International Classification of Diseases 8th Revision before 1994 and the 10th revision (ICD-10) from 1994 and onwards. The physician who discharged a patient coded all diagnoses for that patient.

The Danish National Prescription Registry contains individual-level data on all dispensed prescriptions since 1994 and provided information on pharmacological treatments. Coding of medications followed the Anatomical Therapeutic Chemical Classification System.

The Danish Civil Registration System contains individual-level information on sex, date of birth, vital status and migration. All Danish citizens are assigned a unique 10-digit Civil Registration number, which enabled unambiguous linkages of data between registries.

Design and population

We conducted a nationwide registry-based cohort study among Danish patients with incident HF. Baseline was the day of the diagnosis of HF, and the time horizon was 1 year after baseline. From the source population, we used the Danish National Patient Registry to exclude patients diagnosed with AF (or atrial flutter) on or before baseline if the DHFR did not exclude those patients at enrolment (online supplemental table 1). Additionally, we excluded patients who had lived in Denmark for less than 5 years to ensure sufficient time for the registry-based identification of history of diseases.

Candidate predictors

Selection of predictors originated from a combination of expert knowledge and available data (table 1). Information on candidate predictors originated from administrative clinical data and included demography, health behaviours, clinical data, comorbidities and health factors, and medication (online supplemental table 2).

Table 1
Characteristics of patients at diagnosis of heart failure

Demographic factors included age at HF onset, sex and level of education. We categorised level of education into three groups based on the International Standard Classification of Education (ISCED). Group 1 included early childhood, primary and lower secondary education (ISCED 0–2). Group 2 included general upper secondary and vocational upper secondary education (ISCED 3). Group 3 included short-cycle tertiary, medium-length tertiary, bachelor’s-level or equivalent, second-cycle, master’s-level or equivalent, and PhD-level education (ISCED 5–8). ISCED 4 does not exist in Denmark.
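The education grouping can be written as a small lookup. The following Python sketch is illustrative only; the function name `education_group` is hypothetical and is not taken from the study's analysis code.

```python
def education_group(isced_level: int) -> int:
    """Map an ISCED level to the three education groups described above."""
    if isced_level in (0, 1, 2):
        return 1  # early childhood, primary and lower secondary education
    if isced_level == 3:
        return 2  # general or vocational upper secondary education
    if isced_level in (5, 6, 7, 8):
        return 3  # short-cycle tertiary up to PhD level
    # ISCED 4 does not exist in Denmark; any other value is rejected here.
    raise ValueError(f"unsupported ISCED level: {isced_level}")
```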

Lifestyle factors included high alcohol intake and smoking. We applied the definition of high alcohol intake that pertained to the DHFR. The definition was more than 14 drinks per week for women and 21 drinks per week for men until 1 July 2015 according to the official recommendations from the Danish National Board of Health. After that date, the registry applied a lower threshold of high intake, that is, more than 7 drinks per week for women and 14 drinks per week for men. Smoking status was categorised as current smoking, former smoking or never smoking.
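The date-dependent definition of high alcohol intake can be sketched as follows. This is a minimal Python illustration, not the registry's implementation. The function name and the sex labels are hypothetical, and we assume that a registration on 1 July 2015 itself still falls under the older thresholds.

```python
from datetime import date

# Assumption: registrations on this date still use the older thresholds.
THRESHOLD_CHANGE = date(2015, 7, 1)

def high_alcohol_intake(drinks_per_week: float, sex: str, registered: date) -> bool:
    """Return True if weekly intake exceeds the threshold valid at registration."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    if registered <= THRESHOLD_CHANGE:
        limit = 14 if sex == "female" else 21  # thresholds until 1 July 2015
    else:
        limit = 7 if sex == "female" else 14   # lower thresholds after that date
    return drinks_per_week > limit
```

Under these assumptions, a man reporting 15 drinks per week would be classified as having high intake only if registered after the change in thresholds.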

Clinical data included left ventricular ejection fraction (LVEF) categorised as ≥50%, >40%–49%, 25%–40% or <25% and New York Heart Association (NYHA) classification as I, II and III/IV. The patients underwent echocardiography no later than 7 days after baseline or up to 6 months before the diagnosis if the cardiologist considered the examination relevant. Information on NYHA classification was ascertained at the diagnosis of HF or up to 12 weeks after.

Comorbidities and health factors included history of the conditions listed in table 1. We included major acute medical events and surgery to capture triggers of AF, and we ascertained both within 1 month before baseline. Major acute medical events included sepsis, pneumonia, pulmonary embolism and/or acute respiratory distress syndrome. Surgery included procedures associated with a high risk of postoperative AF, including cardiac surgery (coronary artery bypass grafting and valvular surgery) and non-cardiac thoracic surgery (large lung resections and oesophagogastrostomy). We omitted sepsis and acute respiratory distress syndrome from the models because the prevalences were considered too low (table 1).

Medications at baseline are listed in table 1. To reflect ongoing treatment, we defined current use as redeeming at least one prescription within 6 months before baseline (online supplemental table 2).
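The definition of current use amounts to a lookback window on redemption dates. The Python sketch below illustrates the idea under two assumptions that the text does not specify: 6 months is approximated as 183 days, and a prescription redeemed on the baseline day itself does not count as "before baseline". The function name is hypothetical.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 183 days.
LOOKBACK = timedelta(days=183)

def is_current_user(redemption_dates, baseline: date) -> bool:
    """True if at least one prescription was redeemed in the lookback window."""
    window_start = baseline - LOOKBACK
    # The baseline day itself is excluded (assumption).
    return any(window_start <= d < baseline for d in redemption_dates)
```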

Selection of predictors

The selection of variables originated from consensus among three subject matter clinical experts in the author group (GYHL, EJJB and LF). We applied a modified Delphi method by which the experts independently selected and prioritised between 4 and 10 final predictors. The targeted maximum number of variables was prespecified with the aim to achieve a tool that is applicable in clinical practice. First, the facilitator (NV) sent out a list with prespecified variables to the experts. Second, each expert selected relevant predictors. Third, the facilitator collected and shared the anonymised results. Depending on the results, more rounds could be relevant to achieve consensus. Online supplemental figure 1 provides details on the modified Delphi process.
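One round of such a process can be sketched as a vote count. The Python example below is an illustration only: the agreement rule (a variable is retained when at least two experts select it) and the function name are assumptions made for the sketch, and the variable names in any input are placeholders. Online supplemental figure 1 remains the authoritative description of the procedure.

```python
from collections import Counter

def consensus_predictors(selections, min_votes=2, min_size=4, max_size=10):
    """Tally one Delphi round.

    selections: one list of chosen variable names per expert.
    Returns the agreed variables and whether another round seems needed
    because the agreed set falls outside the prespecified size range.
    """
    # Count each variable at most once per expert.
    votes = Counter(v for chosen in selections for v in set(chosen))
    agreed = sorted(v for v, n in votes.items() if n >= min_votes)
    another_round_needed = not (min_size <= len(agreed) <= max_size)
    return agreed, another_round_needed
```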

Outcome

The outcome was a first-time hospital diagnosis of incident AF (or atrial flutter) after the diagnosis of HF. AF (and atrial flutter) could be coded both as a primary or secondary diagnosis (online supplemental table 2). We identified all AF (or atrial flutter) diagnoses irrespective of the type of AF (or atrial flutter). The positive predictive value of AF and atrial flutter in the National Patient Registry is high,14 and atrial flutter accounts for approximately 5% of the ICD-10 I48 diagnoses.14 Patients were followed from baseline until AF, death, heart transplant, emigration, 1 year after HF or 31 December 2018, whichever came first.
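The follow-up rule can be expressed as taking the earliest of the candidate dates. The Python sketch below is a hedged illustration: the function names and status labels are hypothetical, and the handling of ties (AF takes priority over death or heart transplant on the same date) is an assumption that the text does not address.

```python
from datetime import date

STUDY_END = date(2018, 12, 31)

def one_year_after(d: date) -> date:
    """Same calendar day one year later (29 February maps to 28 February)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)

def end_of_follow_up(baseline, af=None, death=None, transplant=None, emigration=None):
    """Return (end date, status) for one patient; missing dates are None."""
    # Each candidate: (date, tie-break priority, status). Lower priority wins ties.
    candidates = [
        (af, 0, "af"),
        (death, 1, "death"),
        (transplant, 1, "heart_transplant"),
        (emigration, 2, "emigration"),
        (one_year_after(baseline), 3, "end_of_horizon"),
        (STUDY_END, 3, "end_of_study"),
    ]
    observed = [c for c in candidates if c[0] is not None]
    end_date, _, status = min(observed, key=lambda c: (c[0], c[1]))
    return end_date, status
```

In a competing risk analysis, the status "af" would be the event of interest, "death" and "heart_transplant" the competing events, and the remaining statuses censoring.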

Statistics

Patient characteristics were summarised at the date of HF diagnosis using counts and percentages for categorical variables and medians and IQRs for continuous variables. We predicted the 1-year risks of AF in the presence of competing risks (death or heart transplant) using cause-specific Cox regression and random survival forest.15 In the main analysis, we included the variables selected by the modified Delphi method and used only the complete cases (no missing values). To fit the random survival forest, the number of trees was set to 1000, and we tuned the minimal node size using 10 repetitions of 10-fold cross-validation with all other hyperparameters fixed.
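To illustrate how two cause-specific Cox models combine into an absolute risk, the following Python sketch sums, over event times up to the horizon, the probability of being event-free just before each time multiplied by the AF hazard increment at that time. It is a simplified illustration and not the study's R code. All inputs are hypothetical: the baseline cumulative hazard increments would come from fitted models, and the linear predictors are the sums of coefficients times covariate values. We also assume the exponential form for the event-free probability.

```python
import math

def absolute_risk(times, dH_af, dH_comp, lp_af, lp_comp, horizon):
    """Absolute risk of AF by `horizon` from two cause-specific hazards.

    times:   sorted event times
    dH_af:   baseline cumulative hazard increments for AF at those times
    dH_comp: baseline cumulative hazard increments for the competing event
    lp_af, lp_comp: linear predictors of the two cause-specific Cox models
    """
    r_af, r_comp = math.exp(lp_af), math.exp(lp_comp)
    event_free = 1.0  # probability of no event just before the current time
    risk = 0.0
    for t, h_af, h_comp in zip(times, dH_af, dH_comp):
        if t > horizon:
            break
        risk += event_free * h_af * r_af
        event_free *= math.exp(-(h_af * r_af + h_comp * r_comp))
    return risk
```

Because the event-free probability depends on both hazards, a higher hazard of the competing event lowers the absolute risk of AF even when the AF hazard is unchanged.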

We performed two additional analyses. First, we included all variables and compared the cause-specific Cox regression approach with the random survival forest approach. Second, we repeated the analyses using multiple imputation of missing values in the variables selected by the Delphi method, in the development dataset but not in the validation dataset. We used 1000 imputations using the Substantive Model Compatible Fully Conditional Specification (SMCFCS) method.16

We fitted the models with data on patients diagnosed with HF from 2008 to 2013 (development dataset) and validated them with data on patients diagnosed with HF from 2014 to 2018 (validation dataset). We calculated the prediction performance of the 1-year predicted risks of AF as time-dependent area under the curve (AUC), Brier score and Index of Prediction Accuracy (IPA).15 The Brier score reflects both discrimination and calibration, whereas the AUC reflects only discrimination. The models were compared with each other and against a benchmark null model that ignores all predictor variables.15 The calendar split simulates the natural situation in which data from patients diagnosed with HF in the older dataset are used to build the models, which are then used to predict AF among patients diagnosed with HF in the more recent dataset. We also examined performance using a single random split of the data to account for potential temporal trends.
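The three performance measures can be illustrated with a simplified Python sketch. It assumes that the 1-year AF status of every patient is known, so it omits the handling of censoring and competing risks that the time-dependent versions used in the study require. The function names are hypothetical, and the null model here simply predicts the observed proportion of events in the evaluated data.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted risk and observed status (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)

def ipa(predicted, observed):
    """Index of Prediction Accuracy: 1 - Brier(model) / Brier(null model)."""
    prevalence = sum(observed) / len(observed)
    null_brier = brier_score([prevalence] * len(observed), observed)
    return 1.0 - brier_score(predicted, observed) / null_brier

def auc(predicted, observed):
    """Proportion of case-control pairs in which the case has the higher risk."""
    cases = [p for p, y in zip(predicted, observed) if y == 1]
    controls = [p for p, y in zip(predicted, observed) if y == 0]
    concordant = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases for n in controls
    )
    return concordant / (len(cases) * len(controls))
```

An IPA of 0 means that the model is no better than the null model, which is why a small IPA can accompany an AUC clearly above 50%.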

To fit our final model, we used data on incident HF patients from all years 2008–2017. The predictions of the final model were exemplified using low-risk and high-risk individuals. Low risk was defined as absence of all risk factors and an NYHA class of I, while high risk was presence of all risk factors and an NYHA class of III/IV. Furthermore, we illustrated the predicted risks for the low-risk and high-risk individuals by age and sex. The results of the final prediction model were presented as an Excel file (see online supplemental file 2).

The cause-specific Cox regression and random forest models provided 95% CIs. All analyses were performed with R V.4.0.5.

Results

Baseline characteristics

The study population consisted of 27 947 patients with incident HF, with a mean age of 68.5 years; 33.6% were female (table 1). Online supplemental figure 2 shows the flow chart. When applying the calendar split, the development dataset included 15 020 patients and the validation dataset included 12 927 patients (table 1). At 1 year after HF, the cumulative risk of AF was 7.5% for the cohort when accounting for the competing risk of death or heart transplantation, and the 1-year mortality was 8.8% (online supplemental table 3).

Delphi process

At least two of the three clinical experts selected the same nine variables in the first Delphi round, and there was no need for an additional round (online supplemental figure 3) as the number of variables was within the prespecified range. The selected predictor variables were sex, age at baseline, NYHA class, hypertension, diabetes mellitus, chronic kidney disease, obstructive sleep apnoea, chronic obstructive pulmonary disease and myocardial infarction. Of 27 947 patients, 25 289 (90.5%) had complete information on the selected variables (online supplemental table 4).

Prediction models with predictors selected by experts

Based on the cause-specific Cox regression model, we calculated the sex-specific 1-year risk in low-risk and high-risk individuals, respectively (figure 1). In the high-risk individuals, the predicted risks varied between 0.7% and 8.5% among women and 1.0% and 11.7% among men. In the low-risk individuals, the predicted risk varied between 0.5% and 10.0% among women and 0.8% and 14.2% among men. Among patients aged 70 years at HF, the 1-year risk of AF was 9.3% (95% CI 7.1% to 11.8%) and 6.4% (95% CI 4.9% to 8.3%) among high-risk males and females, respectively. In low-risk individuals, the 1-year risk was 7.5% (95% CI 6.7% to 8.4%) among males and 5.1% (95% CI 4.5% to 5.8%) among females. We noted an increasing predicted risk with increasing age in both scenarios, but the risk declined at the higher ages in the high-risk individuals (figure 1). Online supplemental figure 4 shows the predicted risk of death without AF, and we noted that the risk increased strongly with age from the age of 80 years for the high-risk individuals. Online supplemental file 1 shows all predictions for the model based on logistic regression and population with complete follow-up (online supplemental table 5).

Figure 1

Age at HF and 1-year predicted risk of AF for low-risk and high-risk subjects. High-risk individuals: Patient has NYHA class III/IV, hypertension, diabetes mellitus, chronic kidney disease, obstructive sleep apnoea, chronic obstructive pulmonary disease and myocardial infarction. Low-risk individuals: Patient has NYHA class I, no hypertension, no diabetes mellitus, no chronic kidney disease, no obstructive sleep apnoea, no chronic obstructive pulmonary disease and no history of myocardial infarction. AF, atrial fibrillation; HF, heart failure; NYHA, New York Heart Association.

The time-dependent AUC was similar for the random forest model (64.2%, 95% CI 62.3% to 66.1%) and for the cause-specific Cox model (65.7%, 95% CI 63.9% to 67.5%, table 2). Both models discriminated better than the benchmark null model, but the Brier score did not differ substantially across the models (table 2). After multiple imputation, the AUC was 65.1% (95% CI 63.2% to 67.0%), the Brier score 7.2% (95% CI 5.3% to 9.0%) and the IPA 1.8%.

Table 2
Summary of average prediction performance with 95% CIs for subject matter selected models

Prediction models with all predictor variables

We repeated the analysis after inclusion of all predictors listed in table 1 and the expert selected predictors in a complete case population. The predictive performance was similar across all four models (table 3).

Table 3
Summary of average prediction performance with 95% CIs for models with expert-selected and all predictors

Random split

We examined the models with predictors selected by experts using a single random data split instead of the calendar split. The Brier score of the null model was 7.1% (95% CI 6.2% to 7.9%). The AUC was 63.0% (95% CI 60.9% to 65.0%) for the random forest model and 63.7% (95% CI 61.7% to 65.8%) for the cause-specific Cox model. The corresponding Brier scores were 7.0% (95% CI 6.1% to 7.8%) for the random forest model and 6.9% (95% CI 6.1% to 7.8%) for the cause-specific Cox model, while the IPA was 1.4% and 1.5%, respectively.

Discussion

In this nationwide cohort study, we developed prediction models for incident AF among patients with HF. Clinical experts independently preselected the predictor variables for inclusion in the models based on candidate predictors available in administrative data. The predicted risk was higher with increasing age and was substantially higher among patients with all predictors.

The reason for a declining risk in high-risk individuals with highest age is most likely a substantial increase in risk of death from approximately the age of 80 years. The respective predictive performance of the random forest and cause-specific Cox regression was modest and similar, with Brier scores of 7.0% and time-dependent AUCs of 64% and 66%, respectively. Furthermore, use of a single random split did not change the results substantially compared with the calendar split. Our models that included all predictor variables did not demonstrate substantially different predictive performance, and use of all clinical information did not seem to outperform the reduced models selected by clinical experts.

The clinical experts did not prioritise certain variables associated with incident AF, for example, elevated alcohol consumption (not selected) and valvular disease (selected by one expert). A reason for not selecting such variables may be that analyses of the Framingham Heart Study have shown that alcohol consumption has not contributed to the risk of AF in any of the epochs during the last 50 years.17 Furthermore, the population-attributable risk associated with significant murmur has decreased over time.17

We applied a calendar split approach and a single random split approach to examine the predictive performance of the models. The first approach is advantageous because it simulates a natural situation in which the model is built on data from past patients with incident HF, and the model is then applied to future patients with incident HF. However, this approach does not account for potential temporal trends, for instance those driven by HF guideline updates or newly approved therapies. We found that the crude 1-year risk of AF increased over the years, but we did not examine calendar trends. A single random split approach may account for temporal changes, but the approach also comes with disadvantages because splitting randomly makes the results depend on the random seed. As a random seed determines the split, the predictions depend on a potential lucky number of the analyst.15 However, we noted no substantial difference in the predictive performance between the approaches.
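The difference between the two splitting approaches can be sketched in a few lines of Python. The record layout (a dictionary with an `hf_year` field), the function names and the development fraction of the random split are hypothetical choices made for the illustration; the calendar cut-off follows the development period 2008 to 2013 described in the methods.

```python
import random

def calendar_split(patients, last_development_year=2013):
    """Development set: diagnosed up to the cut-off year; validation set: later."""
    development = [p for p in patients if p["hf_year"] <= last_development_year]
    validation = [p for p in patients if p["hf_year"] > last_development_year]
    return development, validation

def random_split(patients, development_fraction=0.5, seed=1):
    """Single random split; the result is fully determined by the seed."""
    shuffled = list(patients)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * development_fraction)
    return shuffled[:cut], shuffled[cut:]
```

The calendar split is deterministic, whereas repeating the random split with a different seed generally yields different development and validation sets and therefore somewhat different performance estimates.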

The number of prediction models for AF developed and/or validated among patients with HF is limited. To our knowledge, only one study has developed a prediction model for AF, namely among 623 AF-free HF patients with reduced LVEF<45%.9 The outcome was persistent AF and the patients were followed for at least 1 year, but no specific time horizon was chosen.9 In contrast to our study, the authors applied backward selection to identify significant predictors that were included to establish a risk score. Furthermore, 76 patients died with no AF during follow-up but the analysis did not include death as a competing risk, and the study reported no predictive performance of the model.9 The C2HEST score was originally developed in a general Chinese population and was based on a medical insurance database.18 Recently, Liang et al tested the C2HEST score in a HFpEF population (LVEF ≥45%) with 2202 AF-free patients from the TOPCAT trial.10 The fact that one of the inclusion criteria of the TOPCAT trial was history of HF hospitalisation within the previous 12 months or elevated brain natriuretic peptide within 60 days before randomisation questions the time origin and applicability of the C2HEST model.19 In comparison, the time origin of our models was on the day of the HF diagnosis, which may be simpler to implement for clinicians and patients. Liang et al reported a time-dependent AUC at 5 years of 0.69 (95% CI 0.64 to 0.74) but a measure of calibration was not reported.10 A direct comparison of predictive performance between our models and the C2HEST score would require the models to be applied in the same dataset, but we were unable to use the C2HEST in our data because the estimates of the competing risk of death were not reported.18

The COMMANDER HF trial did not demonstrate benefits of using anticoagulation for patients with HF and no AF.20 As far as we know, no study has identified patients with HF at high risk for AF and used a randomised controlled trial to examine efficacy and safety. Our model may have the potential to identify a high-risk group of patients with HF who may be randomised to determine if they would benefit from anticoagulation or initiation of more aggressive control of AF risk factors to prevent or postpone the onset of AF.

Identifying HF patients at high risk for incident AF appears to be challenging with administrative data. Prospective studies are needed to quantify the clinical effect of implementing the prediction model in routine practice and to assess possibilities of improvement given the collection of more candidate predictor variables. Identification of a more robust model may form the basis of a clinical trial that aims to quantify the clinical effects of the prediction model. In the interim, patients with HF are expected to have frequent healthcare contact, and opportunistic screening of all HF patients should be considered at routine follow-ups or other contacts in primary and secondary care. The 2020 European Society of Cardiology (ESC) guideline for AF lists several screening methods, such as pulse palpation and Holter monitoring, but no specific recommendation is given for patients with HF.21 Confirmation of the AF diagnosis and appropriate characterisation of the arrhythmia is part of the holistic or integrated care pathway approach to AF management.21 In addition to early detection of AF, clinical staff should therefore prioritise optimisation of HF-related care, such as patient education and medical therapy, and the management of comorbidities. Importantly, use of ACE-inhibitors, angiotensin receptor blockers, mineralocorticoid receptor antagonists, beta-blockers and SGLT-2 inhibitors may reduce the incidence of AF.22–24

Limitations

We may have missed patients with prior AF whose diagnosis was not recorded or not recognised in the registries. We were unable to clinically evaluate the patients for undiagnosed AF, and to subclassify the type of AF. Studies that validated the AF diagnosis coded in the registry have shown positive predictive values of 92% and 95%.14 However, non-differential misclassification of AF registration is possible. The time of registered AF may be wrong, as we only have information on the time at the diagnosis and not the time at the development. Data on death from the Civil Registration System are considered highly accurate.

We had no data on body mass index, natriuretic peptides, left atrial volume, left atrial fibrosis or atrial ectopic activity. However, information on predictors such as left atrial fibrosis and atrial ectopic activity may be costly and time-consuming to obtain clinically and therefore may not be resource effective to include in a prediction model applicable in routine practice.

The generalisability of our prediction models may be reduced by the inclusion and exclusion criteria of the DHFR and the fact that the population consisted mainly of European ancestry individuals. Data from the Framingham Heart Study have shown a higher burden of prevalent AF among HF patients with preserved LVEF,3 which may account for the lower proportion of patients with preserved LVEF observed in our study compared with most population-based studies. Hence, the generalisability of our findings to HF with preserved LVEF, which disproportionately affects women, is uncertain. Furthermore, the clinical performance of applying the model in routine clinical settings or external validation has not been determined.

Conclusions

We developed prediction models for the 1-year risk of AF in HF based on predictors obtained from administrative clinical data. Further studies are needed to determine whether it is possible to predict AF risk among patients with HF more accurately and if so, to quantify the clinical effects of implementing such a model in routine practice. In the interim, the challenge of identifying HF patients at high risk of AF supports opportunistic screening of HF patients for AF onset and holistic optimisation of HF care to prevent AF.