Discussion
In this study, we have systematically identified all RCTs investigating the efficacy of pharmacological interventions in HFpEF and have, for the first time, comprehensively assessed the reporting quality of these publications. We show a trend towards improved reporting quality of HFpEF RCTs over time and following updates to CONSORT guidelines, though there remains considerable variation in reporting quality, with many important aspects relating to trial methodology and results consistently under-reported. The mean CONSORT 2010 score for HFpEF RCTs is 55.4%, comparable to the findings from similar contemporary studies in other fields of medicine and surgery.13–17 We also identified a strong positive correlation between the CONSORT score and both metrics of journal impact and the number of authors.
Critical appraisal of the validity and generalisability of study findings requires comprehensive reporting of clinical trials; reporting discrepancies are associated with variations in effect estimates that may affect management decisions by doctors and policymakers. Although several criteria were well reported, we identified significant deficiencies in the reporting of trial methodology and results. These have an impact on readers' assessments of study quality and risk of bias, and will reduce the quality and accuracy of meta-analyses. Similar reviews have echoed these findings, highlighting particularly poor reporting of details surrounding randomisation.2,14–18 Chen and Liu2 showed that the reporting of methodology in RCTs in a high impact cardiology journal was inadequate, with 70% of studies sufficiently reporting fewer than half of the methodological items, and with randomisation and blinding frequently affected.
The CONSORT 2010 score correlates positively with the year of publication, with articles published after 2010 scoring more favourably than those from earlier periods. The CONSORT update in 2010 is likely to have generated increased awareness of the importance of high reporting standards. A Cochrane review in 2012 demonstrated superior reporting quality in articles published by journals that endorse CONSORT guidelines compared with those that do not, and improvements after a journal's endorsement of CONSORT.19 Accordingly, almost 600 biomedical journals, including Open Heart, endorse CONSORT and advise adherence from submitting authors.20
We demonstrate a positive correlation between measures of journal impact and CONSORT score. Authors are likely to submit their most thoroughly reported studies to the most influential journals; in addition, higher impact factor journals have higher rejection rates and are likely to impose more rigorous presubmission checks and review processes. Furthermore, better-reported studies may be more extensively cited, with a corresponding positive influence on journal impact factor.
Significance of reporting quality in HFpEF clinical trials
As in all conditions with uncertain therapies, the meta-analysis of pooled trial and registry data is important in increasing our understanding of possible treatments for HFpEF. High-quality meta-analysis depends on comprehensive and accurate trial information, with particular emphasis on the population, intervention and outcome measures, and trial design. Although there have been many recent meta-analyses of specific drug classes in HFpEF,21–23 the last comprehensive review of all drug therapies in HFpEF was published in 2011 and identified no reduction in all-cause mortality for drug classes, individually or combined.24 Since then, there have been a large number of new trials (including at least 14 RCTs identified in this study), some of which have evaluated novel treatments, and there is value in including these in an updated review.25
Detailed reporting of study inclusion criteria and participant demographics is particularly important in HFpEF clinical trials. Trial inclusion criteria are heterogeneous and have changed as the understanding of HFpEF as a disease syndrome has evolved.26 Combinations of LVEF cut-offs, prior heart failure hospitalisation, clinical features, the presence or absence of comorbidities, echocardiographic and haemodynamic parameters, and natriuretic peptide levels are being used as inclusion criteria. In a recent analysis comparing three major HFpEF trials (the Digitalis Investigation Group-Preserved Ejection Fraction (DIG-PEF), Candesartan in Heart Failure Assessment of Reduction in Mortality and morbidity (CHARM-Preserved) and Irbesartan in Heart Failure with Preserved systolic function (I-PRESERVE)), the authors found that the I-PRESERVE study population was most representative of HFpEF patients in the community, possibly attributable to its comparatively stringent inclusion criteria.27 Although strict inclusion criteria are likely to reduce the recruitment of patients with LV systolic dysfunction, exclusion of significant comorbidities may result in a non-representative study population and reduce the applicability of results to real-life settings. Analysing the effects that patient selection criteria have on published trial outcomes will be important in optimising future trial design.
The pathophysiological role of non-cardiac comorbidities in patients with HFpEF is becoming better characterised and understood.28,29 HFpEF populations demonstrate a high prevalence of pulmonary disease, diabetes mellitus and cardiometabolic disorders, anaemia, chronic kidney disease (CKD) and obesity,30–32 and these comorbidities are independently associated with poor outcomes.33–36 The distribution of such comorbidities in clinical trials is likely to influence results, and indeed it has been argued that the absence of positive outcomes may be related to inclusion of non-HFpEF patients.37 Although patient demographics were well reported (91% of all trials), detailed description of important comorbidities was much poorer: diabetes mellitus was reported in 70% of trials, atrial fibrillation in 52%, COPD in 18%, anaemia in 9%, CKD in 9% and obesity in 6%. This shows that, while adherence to reporting standards is to be encouraged, it is also important that salient information of particular interest in HFpEF is provided.
It is increasingly accepted that HFpEF is a heterogeneous condition with a range of disease phenotypes. Using the novel approach of latent class analysis (LCA), Kao et al38 analysed patients enrolled in the I-PRESERVE study and identified a significant positive response to irbesartan compared with placebo in a group characterised by a high prevalence of obesity, diabetes mellitus and hyperlipidaemia. Although LCA is one approach that can identify subgroups with differing prognoses and responses to treatments, it requires patient-level data that can be challenging to access.39 Consequently, there is strength in combining individual trial subgroup analyses using meta-analysis. This approach of investigating treatment effects in different patient groups stratified by variables will likely yield insight into which groups are likely to respond to therapy. The feasibility and success of this approach depend on the clear reporting of all prespecified analyses and primary and secondary outcomes, including subgroup analyses and exploratory outcomes. A review of heart failure disease-management programmes found that significant and clinically important differences within subgroups are not meta-analysed due to a dearth of available reported data,40 a finding that is likely to be true for HFpEF. Similarly, another group attempting a meta-analysis of the effectiveness of pharmacological treatments in patients with NYHA class I or II symptoms was unable to complete it due to poor reporting and non-disclosure of data.39
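As an illustration of how reported subgroup results might be combined, the following is a minimal sketch of fixed-effect, inverse-variance pooling of hazard ratios on the log scale. It is not the method of any study cited above: the function name `pool_fixed_effect`, the input format and the choice of a fixed-effect model are assumptions made for illustration, and the standard error of each estimate is back-calculated from its reported 95% CI. The sketch also shows why clear reporting matters, since a subgroup estimate published without a confidence interval cannot be entered at all.

```python
import math

def pool_fixed_effect(estimates, z=1.96):
    """Pool hazard ratios by inverse-variance weighting on the log scale.

    `estimates` is a list of (hr, ci_low, ci_high) tuples, each giving a
    subgroup hazard ratio and its 95% confidence interval as reported in
    a trial publication (hypothetical input format).
    Returns (pooled_hr, pooled_ci_low, pooled_ci_high).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")

    total_weight = 0.0
    weighted_sum = 0.0
    for hr, ci_low, ci_high in estimates:
        log_hr = math.log(hr)
        # Back-calculate the standard error from the width of the 95% CI.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * log_hr

    pooled_log_hr = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_hr),
        math.exp(pooled_log_hr - z * pooled_se),
        math.exp(pooled_log_hr + z * pooled_se),
    )
```

A fixed-effect model assumes a common underlying treatment effect across trials; given the heterogeneity of HFpEF populations discussed above, a random-effects model may be more appropriate in practice.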
Important aspects of HFpEF trial design can influence outcomes and the investigators' ability to detect differences.41 In the Perindopril in Elderly People with Chronic Heart Failure (PEP-CHF) study, for example, the primary end point of all-cause mortality and unplanned heart failure hospitalisation trended towards statistical significance at 12 months (HR 0.69, p=0.055). However, these beneficial effects were lost by the end of the trial (HR 0.92, p=0.545). This was due to a significant proportion of patients in the placebo arm going on to open-label ACE inhibitor therapy, resulting in an eventual study power of just 35%. The CHARM-Preserved trial generated neutral results, though study-drug discontinuation for adverse events or laboratory abnormalities was significantly higher in the treatment arm than in the placebo arm (18% vs 14%, p=0.001). The use of a run-in period to establish drug tolerance may reduce the differential effects of study-drug discontinuation. Although trial results have been largely neutral, it cannot be conclusively argued that perindopril, candesartan or other HFpEF trial treatments are of no clinical benefit, as trial design and limitations can clearly affect the ability of studies to detect meaningful differences and must therefore be clearly reported.
Geographic variation in the rates of mortality and hospitalisation in HFpEF clinical trials has been well described.42 In the Treatment of Preserved Cardiac Function with Aldosterone Antagonist Trial (TOPCAT), the primary end point rate was far lower in Russia and Georgia (unadjusted rate of 2.3 per 100 patient-years in the placebo group) than in the Americas (12.6 per 100 patient-years).43 Similar results were found for the CHARM-Preserved and I-PRESERVE trials, with unadjusted rates of mortality and adjusted rates of hospitalisation for heart failure greater in North America than in Eastern Europe and Russia.42 This variation appears more specific to trials of HFpEF than to trials of HFrEF and may reflect the logistical challenges and disparate criteria for diagnosing HFpEF, or differences in the healthcare services available.42 Regional variations in outcome rates are an important consideration in international, multicentre RCTs. Clear reporting of trial centre locations and the number of patients enrolled at each, event rates by region and subgroup analyses by region are all important in understanding the influence that geographical variation has on treatment outcomes.
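To make the arithmetic behind such regional comparisons explicit, the sketch below computes an event rate per 100 patient-years and the ratio of two such rates. The function names are hypothetical and chosen for illustration; only the two placebo-group rates quoted above (2.3 and 12.6 per 100 patient-years) are taken from the text. Dividing one by the other indicates a placebo-group event rate roughly five times higher in the Americas than in Russia and Georgia.

```python
def rate_per_100_patient_years(events: int, patient_years: float) -> float:
    """Unadjusted event rate per 100 patient-years of follow-up."""
    if patient_years <= 0:
        raise ValueError("patient_years must be positive")
    return 100.0 * events / patient_years

def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of two event rates expressed in the same units."""
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    return rate_a / rate_b

# Placebo-group primary end point rates quoted in the text for TOPCAT.
americas = 12.6           # per 100 patient-years
russia_georgia = 2.3      # per 100 patient-years
ratio = rate_ratio(americas, russia_georgia)
```

Computing these quantities requires event counts and follow-up time to be reported by region, which is the reporting detail the paragraph above argues for.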
Study limitations
One limitation of our study methods is that the outcome measure, CONSORT 2010 score, requires subjective assessment. Publications from CONSORT provide good guidance on this,1 and scoring was carried out blindly by two assessors in this study. We showed a high degree of interobserver agreement, comparable with other similar published studies, with all discrepancies resolved by discussion. The use of ‘N/A’ as an additional qualifier protects studies against falsely low scores, by only scoring articles out of a relevant total. It must be emphasised that a study's CONSORT score does not reflect the quality of the study or its risk of bias. Rather, high-quality reporting is necessary for accurate meta-analysis, supports trial evaluation to aid interpretation and implementation, and allows contributions to the wider body of work. One argument against the rigid use of CONSORT or its surrogate as a marker of article reporting quality is that certain aspects may be deemed unnecessary (eg, absolute risk difference may be easily calculated from the individual event rates), and indeed reviewers may ask for superfluous information to be removed. In our study, we are unable to discern whether failure to report a CONSORT item is attributable to authors, editors or reviewers.
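The effect of the ‘N/A’ qualifier on scoring, and the ease with which an absolute risk difference can be derived from event rates, can be sketched as follows. This is a minimal illustration rather than the scoring tool used in this study: the function names and the encoding of items (`True` for reported, `False` for not reported, `None` for ‘N/A’) are assumptions.

```python
from typing import Iterable, Optional

def consort_score(items: Iterable[Optional[bool]]) -> float:
    """Percentage of applicable checklist items that were reported.

    Each item is True (reported), False (not reported) or None (N/A).
    Items marked N/A are excluded from the denominator, so an article
    is only scored out of the items relevant to it.
    """
    applicable = [item for item in items if item is not None]
    if not applicable:
        raise ValueError("no applicable items to score")
    return 100.0 * sum(applicable) / len(applicable)

def absolute_risk_difference(
    events_treatment: int, n_treatment: int,
    events_control: int, n_control: int,
) -> float:
    """Risk in the treatment arm minus risk in the control arm."""
    if n_treatment <= 0 or n_control <= 0:
        raise ValueError("arm sizes must be positive")
    return events_treatment / n_treatment - events_control / n_control
```

Under this encoding, an article with two items reported, one not reported and one marked N/A is scored out of three applicable items rather than four.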