Discussion
We used a novel approach to assess the potential influence of trial inclusion criteria on reported safety and efficacy by emulating two landmark NOAC versus VKA trials in real-world AF patients. Patient baseline characteristics and 2-year follow-up matched the conditions of the landmark ARISTOTLE and ROCKET AF trials.5 6 The main finding is that the relative effectiveness of all three OACs was similar when the ROCKET AF criteria were applied, thereby selecting patients at higher risk of stroke. By contrast, when the more inclusive ARISTOTLE criteria were applied, apixaban and, to a lesser extent, rivaroxaban were associated with a significantly lower risk of stroke/SE than VKA. These findings are not intended to guide the choice of a specific OAC for stroke prophylaxis; rather, they highlight the importance of selection criteria when designing and interpreting clinical trials.
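To make the contrast between the two selection strategies concrete, the following sketch applies simplified versions of the two trials' stroke-risk inclusion rules to individual patients. It assumes, as a simplification and not as the emulation protocol used in this analysis, that ROCKET AF eligibility reduces to a CHADS2 score of at least 2 and ARISTOTLE eligibility to at least one CHADS2 risk factor; all exclusion criteria are omitted, and the `Patient` fields and function names are hypothetical rather than GARFIELD-AF variable names.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields for illustration; not GARFIELD-AF variable names.
    age: int
    congestive_heart_failure: bool
    hypertension: bool
    diabetes: bool
    prior_stroke_or_tia: bool  # ROCKET AF also counted systemic embolism here


def chads2(p: Patient) -> int:
    """CHADS2: 1 point each for CHF, hypertension, age >= 75, diabetes; 2 for prior stroke/TIA."""
    return (int(p.congestive_heart_failure) + int(p.hypertension)
            + int(p.age >= 75) + int(p.diabetes)
            + 2 * int(p.prior_stroke_or_tia))


def rocket_af_eligible(p: Patient) -> bool:
    # Simplified inclusion rule only (assumption): CHADS2 >= 2.
    return chads2(p) >= 2


def aristotle_eligible(p: Patient) -> bool:
    # Simplified inclusion rule only (assumption): at least one risk factor.
    # ARISTOTLE's own definitions (e.g. of heart failure) differ slightly from CHADS2.
    return chads2(p) >= 1


if __name__ == "__main__":
    cohort = [
        Patient(70, False, True, False, False),   # hypertension only: CHADS2 = 1
        Patient(80, False, True, False, False),   # age >= 75 and hypertension: CHADS2 = 2
        Patient(60, False, False, False, False),  # no risk factors: CHADS2 = 0
    ]
    print(sum(aristotle_eligible(p) for p in cohort),
          sum(rocket_af_eligible(p) for p in cohort))  # prints "2 1"
```

Under this simplification every ROCKET AF-eligible patient is also ARISTOTLE-eligible, while the reverse does not hold, which is why the ARISTOTLE emulation retains more lower-risk patients.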
The adjusted HRs from the observational data overlapped considerably with the results of the original trials for all selected outcomes. Differences in outcome risks between ROCKET AF and our emulation of the trial may reflect the trial design, which limited the number of low-risk patients recruited. We did not apply the same restriction to the GARFIELD-AF dataset because doing so would have further reduced the number of eligible rivaroxaban users and thereby the statistical power of our analysis.
Interestingly, applying the ARISTOTLE selection criteria to the GARFIELD-AF registry did not replicate the original trial’s finding that apixaban reduces major bleeding compared with VKA.5 This discrepancy might be due to differences in the proportion of patients with moderate-to-severe renal impairment: 8.8% of apixaban-treated and 6.3% of VKA-treated patients in our cohort, compared with approximately 16.5% in both arms of ARISTOTLE. A subgroup analysis of ARISTOTLE showed that the reduction in major bleeding with apixaban varied with the degree of renal impairment, with a benefit in patients with moderate-to-severe impairment but not in those without impairment. The lower proportion of patients with significant renal impairment in our study could therefore partially explain the lack of association between apixaban and major bleeding.
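The dilution argument can be written as a simple weighted average. Assuming, purely for illustration, that the absolute reduction in major bleeding with apixaban is confined to patients with moderate-to-severe renal impairment and carries over unchanged between populations, the population-level risk difference Δ scales with the proportion p of such patients. The risk-difference scale is used here because, unlike hazard ratios, risk differences average directly over subgroups.

$$\Delta = p\,\Delta_{\mathrm{impaired}} + (1-p)\,\Delta_{\mathrm{unimpaired}} \approx p\,\Delta_{\mathrm{impaired}} \qquad \text{if } \Delta_{\mathrm{unimpaired}} \approx 0$$

$$\frac{\Delta_{\mathrm{GARFIELD\text{-}AF}}}{\Delta_{\mathrm{ARISTOTLE}}} \approx \frac{p_{\mathrm{GARFIELD\text{-}AF}}}{p_{\mathrm{ARISTOTLE}}} \in \left[\frac{0.063}{0.165},\ \frac{0.088}{0.165}\right] \approx [0.38,\ 0.53]$$

Under these assumptions, the expected absolute bleeding benefit in our cohort would be only about 40% to 50% of that in ARISTOTLE, which would also reduce the power to detect it.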
Randomised clinical trials remain the gold standard of medical research. However, the results of NOAC versus VKA trials are difficult to compare because their participants had markedly different baseline characteristics, as outlined in numerous reports.4 17–20 When direct comparisons through randomised trials are unavailable, high-quality observational data are needed to assess relative drug performance.21 22 Like the original ARISTOTLE and ROCKET AF trials, the GARFIELD-AF registry contains extensive baseline data and outcomes over a 2-year follow-up period. This made it possible to emulate the inclusion and main exclusion criteria of both trials.
Several studies suggest that NOACs might have different risk/benefit profiles in AF patients, especially in those not at high risk of stroke. The largest and most comprehensive observational study comparing NOACs in patients with AF found that apixaban was associated with the lowest risk of gastrointestinal bleeding among all NOACs, but with similar rates of ischaemic stroke or SE and intracranial haemorrhage. Its estimates for apixaban versus rivaroxaban users for gastrointestinal bleeding (HR 0.72, 95% CI 0.66 to 0.79) and ischaemic stroke/SE (HR 0.89, 95% CI 0.78 to 1.02) were in line with earlier large observational studies and a meta-analysis.23 Ray et al, studying a larger number of patients on apixaban and rivaroxaban, provided more precise estimates for intracranial haemorrhage (HR 0.68, 95% CI 0.59 to 0.77) and all-cause mortality (HR 0.94, 95% CI 0.92 to 0.98).21 Another way to compare NOACs indirectly is network meta-analysis of randomised controlled trials. A systematic review of 22 such studies concluded that apixaban and rivaroxaban did not differ significantly in the risk of stroke/SE, but that apixaban was associated with a lower risk of major bleeding.24
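Whether a reported HR is compatible with no difference can be read from its 95% CI, but it can also help to back-calculate the implied standard error and p-value. The sketch below assumes that each interval was computed symmetrically on the log scale as log(HR) ± 1.96 × SE, a common but not universal convention; the function name is hypothetical. Applied to the apixaban versus rivaroxaban estimates quoted above, it gives a p-value far below 0.001 for gastrointestinal bleeding but roughly 0.09 for ischaemic stroke/SE, consistent with the latter interval including 1.

```python
import math


def p_value_from_hr_ci(hr: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate two-sided p-value for a hazard ratio reported with a 95% CI.

    Assumes the CI was built on the log scale as log(HR) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # implied SE of log(HR)
    z = math.log(hr) / se                                    # Wald statistic
    return math.erfc(abs(z) / math.sqrt(2))                  # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    # Apixaban vs rivaroxaban estimates quoted in the text
    print(p_value_from_hr_ci(0.72, 0.66, 0.79))  # gastrointestinal bleeding
    print(p_value_from_hr_ci(0.89, 0.78, 1.02))  # ischaemic stroke/SE
```

Reported CIs are usually rounded, so the back-calculated p-values are approximate.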
As noted above, such comparisons must be interpreted cautiously because the designs of the original trials differed.4 Interestingly, one meta-analysis observed that among very high-risk patients in ARISTOTLE and ROCKET AF (CHADS2 score ≥3), the risks of stroke or death were similar irrespective of the NOAC used. However, unlike in our ROCKET AF-selected cohort, high-risk apixaban users had significantly fewer major haemorrhages than rivaroxaban users.25
The active enrolment period of the GARFIELD-AF registry coincided with the emergence of NOACs for use in non-valvular AF. Previous analyses of the registry confirmed the safety and efficacy of NOACs versus VKA overall and in subgroups of patients with newly diagnosed AF.26 27
Strengths and limitations
The main strengths of this analysis derive from the GARFIELD-AF registry, the largest prospective observational cohort of patients with newly diagnosed AF worldwide. Detailed and highly complete follow-up allowed the assessment of 2-year outcomes and the emulation of two randomised trials in the same dataset. Studies with smaller samples or shorter follow-up may be underpowered to detect differences between treatment groups in the rates of rare events.
The extensive baseline assessment enabled evaluation of all inclusion criteria and the main exclusion criteria of the ARISTOTLE and ROCKET AF trials. It also allowed a detailed presentation of baseline characteristics, showing how the different eligibility criteria affected the comorbidity and risk profiles of eligible participants. Further strengths are the use of propensity score overlap weights to compare NOAC and VKA users with similar baseline characteristics, and the use of the same outcome definitions across both emulations, in contrast to the differing definitions used in the original trials.4–6
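Overlap weighting assigns each NOAC user a weight of 1 − e(x) and each VKA user a weight of e(x), where e(x) is the estimated propensity score, that is, the probability of receiving a NOAC given baseline covariates x. Patients whose treatment was almost predetermined by their characteristics receive little weight, so the weighted comparison focuses on patients for whom either treatment was plausible. The sketch below assumes the propensity scores have already been estimated (for example by logistic regression, not shown); the function names are hypothetical and this is not the analysis code used in this study.

```python
from typing import Sequence


def overlap_weights(ps: Sequence[float], treated: Sequence[int]) -> list[float]:
    """Overlap weights: 1 - e(x) for treated (NOAC) patients, e(x) for controls (VKA)."""
    if len(ps) != len(treated):
        raise ValueError("ps and treated must have the same length")
    weights = []
    for e, z in zip(ps, treated):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 - e if z == 1 else e)
    return weights


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean, e.g. of a baseline covariate within one treatment group."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    ps = [0.2, 0.9, 0.5, 0.6]  # hypothetical propensity scores
    treated = [1, 1, 0, 0]     # 1 = NOAC, 0 = VKA
    print(overlap_weights(ps, treated))
```

When the propensity model is a logistic regression fitted by maximum likelihood, overlap weights give exact balance of the weighted means of every covariate included in that model between the two groups, which is one reason they are attractive for comparisons of this kind.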
This work had several limitations. First, treatment was defined as the first OAC received, analogous to an intention-to-treat analysis, without accounting for non-recommended dosing, treatment switches or cessation. In addition, not all exclusion criteria defined in the original ARISTOTLE and ROCKET AF protocols could be operationalised in the GARFIELD-AF dataset. However, most of the criteria that could not be operationalised were clinically established contraindications to any OAC, such as planned major surgery, recent stroke or major bleeding. Such patients were therefore unlikely to have been enrolled in GARFIELD-AF and thus erroneously included in the current analysis.
Second, the geographical catchment of the trials differed from that of GARFIELD-AF, which covered more than 30 countries. We previously observed geographical variation in the outcomes of OAC treatment in AF patients that was not explained by baseline risk factors and was likely due to regional differences in clinical practice, including the management of comorbidities.28 Patients of black African ancestry were likely underrepresented in all of these studies.
Third, because of the more stringent selection criteria of ROCKET AF, the emulation of this trial contained fewer patients, resulting in wider CIs and less certainty in the observed trends. Unlike ROCKET AF, we did not limit patients with a CHADS2 score ≤2 to a maximum of 10% of the sample. Consequently, the overall cardiovascular risk of the emulated trial population was lower than that of the original trial.
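A standard large-sample approximation shows why a smaller emulated cohort widens the CIs. Treating the HR approximately like a ratio of event rates, and writing D₁ and D₀ for the numbers of outcome events in the NOAC and VKA groups, the standard error of the log HR is roughly as follows (an illustrative approximation, not the variance estimator used in our weighted models):

$$\mathrm{SE}\!\left[\log \widehat{\mathrm{HR}}\right] \approx \sqrt{\frac{1}{D_1} + \frac{1}{D_0}}$$

If both event counts fall by a factor k, the standard error, and hence the width of the 95% CI on the log scale, grows by a factor of √k; a cohort yielding a quarter as many events produces intervals about twice as wide.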
Fourth, we did not include GARFIELD-AF participants receiving dabigatran or edoxaban at baseline. The registry contained few edoxaban users because this NOAC had not yet been widely introduced during GARFIELD-AF enrolment. For dabigatran, the landmark trial reported separate results for the 150 mg and 110 mg doses.29 However, the GARFIELD-AF registry contained too few dabigatran users to allow stratification into separate dosage arms in our analyses.
Finally, although we adjusted for an extensive list of confounders, unmeasured confounding cannot be ruled out in an observational study. Definitive conclusions regarding the superiority of any NOAC will therefore require direct comparison in carefully designed randomised trials.