Methods
We reported this study in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis statement.14
Netherlands Institute for Health Services Research Primary Care Database (Nivel-PCD)
The Nivel-PCD consists of routine primary care EHR data from over 1.8 million patients from over 500 general practices across the Netherlands in 2019. The database includes information on diagnoses, consultations, prescribed medication and (laboratory) measurements.
In the Netherlands, all non-institutionalised inhabitants are obligatorily registered with one general practitioner (GP) as their primary care provider. In general practices, all encounters are linked to International Classification of Primary Care version 1 (ICPC-1) diagnostic codes in the EHR.15 Since GPs have a central role in Dutch primary care as the gatekeepers of referrals to specialised care, all specialists report their findings back to the GP. The GP then links this correspondence to either an existing or a new ICPC-1 code. Therefore, GPs have a complete overview of morbidity of their patients. Nivel-PCD constructs episodes of illness with associated start and end date using multiple markers of diagnostic information in the EHRs (see online supplemental methods for details). This process has been described previously and has been shown to provide an accurate assessment of morbidity rates.16
Prescriptions are recorded according to the Anatomical Therapeutic Chemical classification system. Since GPs in the Netherlands are often tasked with providing repeat prescriptions for medication initiated by specialists, Nivel-PCD widely covers prescriptions for chronic morbidities initiated by both GPs and specialists. Other data including but not limited to sex, age, smoking status and body measurements are stored as separate parameters. Due to prohibitions by Dutch law, information on ethnic background is not systematically recorded in EHRs.17
Data extraction
We used data from 1 January 2013 to 31 December 2018. Baseline was 1 January 2014, with the EHR data recorded during the calendar year 2013 serving as baseline data in order to include only recent measurement and medication data. When multiple entries for one variable were available in 2013, we used the recorded entry closest to baseline, 1 January 2014. Detailed operational definitions for the CHARGE-AF variables are shown in the online supplemental methods.
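The selection rule for baseline values can be sketched as follows. This is a minimal Python illustration, not the extraction code used for the study; the function name `closest_to_baseline` and the `(date, value)` tuple layout are assumptions made for the example.

```python
from datetime import date

BASELINE = date(2014, 1, 1)

def closest_to_baseline(entries, baseline=BASELINE):
    """Return the value recorded closest to baseline within the preceding calendar year.

    entries: list of (record_date, value) tuples (hypothetical layout).
    Returns None when nothing was recorded in that year, i.e. the value is missing.
    """
    window_start = date(baseline.year - 1, 1, 1)
    in_window = [(d, v) for d, v in entries if window_start <= d < baseline]
    if not in_window:
        return None
    # The latest record before baseline is the one closest to baseline.
    return max(in_window, key=lambda entry: entry[0])[1]

sbp_entries = [
    (date(2012, 11, 3), 150),   # before the 2013 window, ignored
    (date(2013, 2, 1), 142),
    (date(2013, 10, 15), 138),  # closest to 1 January 2014
    (date(2014, 1, 5), 130),    # after baseline, ignored
]
baseline_sbp = closest_to_baseline(sbp_entries)
```

Records outside calendar year 2013 are ignored, so a patient with only older measurements is treated as having a missing value.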
We assumed that a disease was absent at baseline when no episode of illness for that disease was recorded prior to baseline, and that a patient was a non-smoker when no status as active smoker was recorded.18 Age and sex were available for all patients. When a patient had no recorded height, weight, SBP or DBP during calendar year 2013, we considered these measurements as missing. We applied no imputation techniques for missing CHARGE-AF measurement variables since we expected these data not to be missing at random.
Study population
We included patients aged 40 years or older and free of AF at baseline who were registered at one of the Nivel-PCD associated practices during the full calendar year 2013. We excluded patients from practices without follow-up data beyond 2013, since including these practices would add patients who by definition have no follow-up data. Among included patients, we distinguished those with missing data for one or more of the four body measurements included in the CHARGE-AF model (height, weight, SBP and DBP), referred to as ‘incomplete cases’, from those with baseline data available for all these measurements, referred to as ‘complete cases’.
Outcomes
The primary outcome was newly diagnosed AF. We defined AF as the recording of the ICPC-1 code K78 ‘AF or atrial flutter’ or any recording of a treating physician for AF or of participation in an AF care programme. We defined the date of AF diagnosis as the first date associated with any of these AF entries. We were unable to ascertain death as the reason for loss to follow-up, since date and cause of death are not validly recorded in primary care EHRs.
Follow-up
Patient registration at a Nivel-PCD associated practice is assessed quarterly. Reasons for loss of follow-up in Nivel-PCD are death, exclusion of practice due to low quality data, technical failure of data extraction or a patient moving away from their Nivel-PCD associated practice. We defined loss to follow-up as the first day of a period of four or more consecutive quarters of absent data, or the first day of a period of consecutive quarters of absent data that included the last quarter of calendar year 2018. We censored follow-up in our analyses at time of AF diagnosis, loss to follow-up or end of the 5-year observation window (31 December 2018), whichever occurred first.
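The loss-to-follow-up definition above can be sketched as a small run-detection routine. This is an illustrative Python sketch under stated assumptions, not the study code: registration is represented as a hypothetical list of 20 booleans (one per quarter from Q1 2014 to Q4 2018), and the helper names are invented for the example.

```python
from datetime import date

def loss_to_follow_up_quarter(present, min_gap=4):
    """Return the index of the first quarter of the qualifying absent run, or None.

    present: list of booleans, one per quarter of follow-up (hypothetical layout).
    A run of absent quarters qualifies when it spans at least `min_gap` quarters
    or when it extends to the final quarter of the observation window.
    """
    n = len(present)
    i = 0
    while i < n:
        if present[i]:
            i += 1
            continue
        j = i
        while j < n and not present[j]:
            j += 1
        # Quarters i .. j-1 form one consecutive run of absent data.
        if j - i >= min_gap or j == n:
            return i
        i = j
    return None

def quarter_start(index, first_year=2014):
    """First day of the quarter with the given zero-based index."""
    return date(first_year + index // 4, 3 * (index % 4) + 1, 1)

# Two quarters missing mid-follow-up (not a loss), then absent from Q3 2018 onwards.
registration = [True] * 6 + [False] * 2 + [True] * 10 + [False] * 2
lost_index = loss_to_follow_up_quarter(registration)
lost_date = quarter_start(lost_index)
```

Short gaps of fewer than four quarters that are followed by renewed registration do not count as loss to follow-up, which matches the definition in the text.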
The CHARGE-AF model
We calculated each individual’s CHARGE-AF predicted 5-year AF risk using the formula from the original derivation article5: 1 − 0.9718412736^exp(ΣbX − 12.5815600). Here, ΣbX is calculated as: (age in years/5) * 0.5083 + ethnicity (Caucasian/white) * 0.46491 + (height in centimetres/10) * 0.2478 + (weight in kg/15) * 0.1155 + (SBP in mm Hg/20) * 0.1972 − (DBP in mm Hg/10) * 0.1013 + current smoking * 0.35931 + antihypertensive medication use * 0.34889 + DM * 0.23666 + heart failure * 0.70127 + MI * 0.49659.
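As an illustration, the formula can be written as a short Python function. The coefficients and constants are those given above; the function name, argument names and the example patient are assumptions made for this sketch and do not come from the study code.

```python
import math

def charge_af_risk(age, height_cm, weight_kg, sbp, dbp, white=True,
                   smoker=False, antihypertensive=False, diabetes=False,
                   heart_failure=False, mi=False):
    """Predicted 5-year AF risk (as a proportion) from the CHARGE-AF formula."""
    linear_predictor = (
        (age / 5) * 0.5083
        + white * 0.46491
        + (height_cm / 10) * 0.2478
        + (weight_kg / 15) * 0.1155
        + (sbp / 20) * 0.1972
        - (dbp / 10) * 0.1013
        + smoker * 0.35931
        + antihypertensive * 0.34889
        + diabetes * 0.23666
        + heart_failure * 0.70127
        + mi * 0.49659
    )
    # Baseline 5-year survival raised to the exponentiated, centred linear predictor.
    return 1 - 0.9718412736 ** math.exp(linear_predictor - 12.5815600)

# Hypothetical patient: 65 years, 175 cm, 80 kg, blood pressure 140/85 mm Hg.
example_risk = charge_af_risk(age=65, height_cm=175, weight_kg=80, sbp=140, dbp=85)
```

Binary risk factors enter as 0 or 1, so Python booleans can be multiplied by their coefficients directly.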
The Dutch population is ~95% Caucasian/white,19 and Nivel-PCD contains a representative sample of Dutch inhabitants.20 In absence of ethnicity data in Nivel-PCD, we therefore assumed ethnicity as Caucasian/white for all Nivel-PCD subjects. We chose this approach in accordance with previous work and because the CHARGE-AF formula results in a prediction of an individual’s absolute 5-year AF risk. Leaving ethnicity out of the formula would lead to a systematic underestimation of absolute risk by the model.21
We assessed the relative contribution of each CHARGE-AF variable to an increase in baseline CHARGE-AF score by multiplying the mean value of each risk factor by its CHARGE-AF coefficient within successive strata of baseline CHARGE-AF risk.
Statistical analysis
We reported continuous variables as means±SD, ordinal variables as median and IQR, and dichotomous variables as number and percentages. We assessed differences in baseline parameters using the unpaired t-test with Welch’s approximation, the Wilcoxon rank-sum test and the χ2 test where appropriate. We assessed significance in all analyses at the 0.05 level.
We estimated the cumulative 5-year AF incidence using survival-time analysis and presented it as number and percentage as well as incidence per 1000 person-years. We plotted the cumulative AF incidence using a Kaplan-Meier failure plot.
In validation of the CHARGE-AF model for 5-year AF risk, we assessed discrimination by the C-statistic and 95% CI. We assessed calibration by the calibration plot according to deciles of baseline CHARGE-AF risk,22 by the calibration slope of the linear predictor and its 95% CI22 and by the Hosmer-Lemeshow goodness-of-fit test modified for survival analyses by D’Agostino and Nam.23 A Nam-D’Agostino χ2 with p value <0.05 indicated insufficient calibration.24 A calibration slope significantly smaller than 1 indicated overfitting of the CHARGE-AF model when applied to our cohort.22 Finally, we assessed calibration by the Kaplan-Meier failure function stratified according to baseline CHARGE-AF risk. For this, we used categories <2.5%, 2.5%–5% and >5% predicted risk in accordance with the original CHARGE-AF publication.5
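To make the discrimination measure concrete, the following Python sketch computes a simplified Harrell-type C-statistic for right-censored data. It is an assumption-laden illustration, not the estimator used in the study (which relied on Stata and R); pairs with tied follow-up times are skipped and no confidence interval is computed.

```python
def harrell_c(times, events, risks):
    """Simplified Harrell's C for right-censored follow-up.

    times:  follow-up time per patient
    events: 1 if AF was diagnosed at that time, 0 if censored
    risks:  predicted risk per patient (higher means AF is expected sooner)
    A pair is usable when the patient with the shorter follow-up had an event.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / usable if usable else float("nan")

times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
perfect = harrell_c(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = harrell_c(times, events, [0.1, 0.5, 0.7, 0.9])
```

A value of 0.5 corresponds to predictions that carry no ranking information, and 1.0 to perfect ranking of event times.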
We compared CHARGE-AF’s discriminatory ability for risk of newly diagnosed AF with that of two other easily obtainable predictors that have previously been shown to be predictive of new AF: age alone as a continuous linear variable and the CHA2DS2-VASc score25 as a categorical variable.4 6 26–29 We assessed net reclassification improvement (NRI) by the NRI index and 95% CI for 5-year AF of CHARGE-AF versus age alone as well as CHARGE-AF versus CHA2DS2-VASc using 200 bootstrap samples in low, intermediate and high AF risk categories with cut-offs at 2.5% and 5% predicted AF risk.22 Data for age and CHA2DS2-VASc score were complete in all participants.
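The idea behind the categorical NRI can be sketched in Python as follows. This is a simplified illustration that treats 5-year AF as a binary outcome and ignores censoring; it is not the survival-based estimator from the nricens package, it omits the bootstrap CI, and the handling of risks exactly at a cut-off is an assumption.

```python
def risk_category(risk, low_cut=0.025, high_cut=0.05):
    """0 = low (<2.5%), 1 = intermediate (2.5%-5%), 2 = high (>5%)."""
    if risk < low_cut:
        return 0
    if risk > high_cut:
        return 2
    return 1

def categorical_nri(reference_risks, new_risks, events):
    """Categorical NRI of the new model versus the reference model (binary outcome)."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    for ref, new, event in zip(reference_risks, new_risks, events):
        move = risk_category(new) - risk_category(ref)
        if event:
            up_event += move > 0
            down_event += move < 0
        else:
            up_nonevent += move > 0
            down_nonevent += move < 0
    n_events = sum(1 for e in events if e)
    n_nonevents = len(events) - n_events
    # Upward moves are correct for events, downward moves for non-events.
    return ((up_event - down_event) / n_events
            + (down_nonevent - up_nonevent) / n_nonevents)

reference = [0.03, 0.03, 0.03, 0.03]   # e.g. predictions from the comparator
new_model = [0.06, 0.03, 0.01, 0.06]   # e.g. predictions from CHARGE-AF
outcome = [1, 1, 0, 0]
nri = categorical_nri(reference, new_model, outcome)
```

In this toy example one of two events moves up a category, while the two non-events move in opposite directions and cancel out.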
We performed stratified analyses according to age, sex and CHA2DS2-VASc score in all validation analyses in order to assess whether CHARGE-AF, CHA2DS2-VASc score and age would perform better among clinically relevant subgroups, and whether different predictors for newly diagnosed AF outperformed others in any of these subgroups.
Finally, we assessed the clinical implications of applying different cut-offs for dichotomisation of baseline CHARGE-AF risk into high-risk and low-risk groups. We applied cut-offs 2.5%, 5% and 10% baseline CHARGE-AF risk and assessed for each cut-off: the proportion of patients that would be counted as high risk; the proportion of total 5-year AF cases that would be among high-risk patients; 5-year AF incidence among those counted as high-risk patients; the proportion of high-risk patients with a CHA2DS2-VASc score ≥2 (corresponding with the need for oral anticoagulation therapy2); and the proportion of high-risk 5-year AF cases with a CHA2DS2-VASc score ≥2. In order to formally test whether the applied cut-offs were able to discriminate between high and low risk of 5-year AF incidence, we provided the unadjusted HR for 5-year AF incidence of high-risk patients with low-risk patients as reference using a Cox proportional hazards model.
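The first three cut-off measures can be illustrated with a short Python sketch. It uses crude proportions that ignore censoring, so it only approximates a survival-based 5-year incidence; the function name, the returned keys and the choice to count a risk equal to the cut-off as high risk are assumptions made for the example.

```python
def cutoff_summary(risks, events, cutoff):
    """Crude summary of dichotomising predicted risk at `cutoff` (censoring ignored)."""
    high = [r >= cutoff for r in risks]  # assumption: risk equal to cut-off is high risk
    n_high = sum(high)
    n_cases = sum(1 for e in events if e)
    cases_high = sum(1 for h, e in zip(high, events) if h and e)
    nan = float("nan")
    return {
        "prop_high_risk": n_high / len(risks),
        "prop_cases_in_high_risk": cases_high / n_cases if n_cases else nan,
        "crude_incidence_high_risk": cases_high / n_high if n_high else nan,
    }

risks = [0.01, 0.03, 0.06, 0.12]
events = [0, 0, 1, 1]
summaries = {cut: cutoff_summary(risks, events, cut) for cut in (0.025, 0.05, 0.10)}
```

Raising the cut-off shrinks the high-risk group and typically raises the incidence within it, at the cost of capturing a smaller share of all AF cases.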
We used Stata V.15.030 and R V.1.1.46331 using the haven, nricens, polspline, rms, survival and survminer packages for our analyses.
Ethics and study approval
Dutch law allows the use of EHRs for research purposes under certain conditions. According to this legislation, neither obtaining informed consent from patients nor approval by a medical ethics committee is obligatory for this type of observational study containing no directly identifiable data (Dutch Civil Law, Article 7:458).17