Original research article

Reporting trends of randomised controlled trials in heart failure with preserved ejection fraction: a systematic review

Abstract

Background Heart failure with preserved ejection fraction (HFpEF) causes significant cardiovascular morbidity and mortality. Current consensus guidelines reflect the neutral results from randomised controlled trials (RCTs). Adequate trial reporting is a fundamental requirement before concluding on RCT intervention efficacy and is necessary for accurate meta-analysis and to provide insight into future trial design. The Consolidated Standards of Reporting Trials (CONSORT) 2010 statement provides a framework for complete trial reporting. Reporting quality of HFpEF RCTs has not been previously assessed, and this represents an important validation of reporting qualities to date.

Objectives The aim was to systematically identify RCTs investigating the efficacy of pharmacological therapies in HFpEF and to assess the quality of reporting using the CONSORT 2010 statement.

Methods MEDLINE, EMBASE and CENTRAL databases were searched from January 1996 to November 2015, with RCTs assessing pharmacological therapies on clinical outcomes in HFpEF patients included. The quality of reporting was assessed against the CONSORT 2010 checklist.

Results A total of 33 RCTs were included. The mean CONSORT score was 55.4% (SD 17.2%). The CONSORT score was strongly correlated with journal impact factor (r=0.53, p=0.003) and publication year (r=0.50, p=0.003). Articles published after the introduction of CONSORT 2010 statement had a significantly higher mean score compared with those published before (64% vs 50%, p=0.02).

Conclusions Although the CONSORT score has increased with time, a significant proportion of HFpEF RCTs showed inadequate reporting standards. The level of adherence to CONSORT criteria could have an impact on the validity of trials and hence the interpretation of intervention efficacy. We recommend improving compliance with the CONSORT statement for future RCTs.

Key questions

What is already known about this subject?

  • Several studies have shown that a significant proportion of randomised controlled trials (RCTs) demonstrate poor reporting standards despite the availability of the Consolidated Standards of Reporting Trials (CONSORT) statement. Heart failure with preserved ejection fraction (HFpEF) is a considerable source of morbidity and mortality, with no known disease-modifying treatments. The role of reporting of HFpEF trial findings has not been assessed, and the size of the problem is not known.

What does this study add?

  • We present the first systematic assessment of reporting standards for RCTs investigating therapies for HFpEF using CONSORT, and identify trends and areas which authors, reviewers and journal editorial boards can target for improvement.

How might this impact on clinical practice?

  • Improvements in trial reporting and provision of relevant information for HFpEF will allow important post hoc analysis of trial findings and guide future trial design. This will provide a greater understanding of HFpEF heterogeneity and help to identify phenotypes with tailored therapies.

Introduction

Randomised controlled trials (RCTs) along with meta-analysis provide the highest level of evidence on the efficacy of healthcare interventions. Accurate interpretation of results and critical appraisal of RCTs depend on adequate reporting and a study design that is free from bias. Studies have shown poor reporting standards in RCTs,1 particularly in areas concerning trial methodology.2,3 The Consolidated Standards of Reporting Trials (CONSORT) statement,4 updated in 2010, aims to improve the quality of reporting of clinical trials, allowing results to be better interpreted and critically appraised.

Heart failure with preserved left ventricular ejection fraction (HFpEF) is a major cause of morbidity and mortality, comparable to heart failure with reduced left ventricular ejection fraction (HFrEF). HFpEF is the cause of symptomatic heart failure in over half of cases, with increasing prevalence in an increasingly ageing population.5 The recently published European Society of Cardiology heart failure guidelines reflect the absence of disease-modifying effects demonstrated in HFpEF RCTs and meta-analyses.6–10

The absence of evidence for HFpEF treatment efficacy may be due to pathophysiological processes that differ from those of HFrEF, difficulty in clinical diagnosis and heterogeneity of included study populations with subgroup phenotypes. In addition to these well-recognised issues, clear reporting of HFpEF trials is a fundamental requirement for appraising methodological approaches and the validity of results, as well as for the accuracy of meta-analysis and subgroup analysis. Adequate reporting of information specifically relevant to issues in HFpEF trial design will also help direct future clinical trial design to optimise effectiveness. Trends in the quality of HFpEF trial reporting and areas for improvement that will be of clinical and research benefit have not previously been reported.

The aim of this study was to systematically identify RCTs investigating the efficacy of pharmacological therapies in HFpEF published between 1996 and 2015, to assess quality of reporting using the CONSORT 2010 statement and also to identify temporal trends.

Methods

This article has been reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses.11

Search strategy

We systematically searched MEDLINE, EMBASE and CENTRAL databases for all clinical trials using the keywords: heart failure and normal ejection fraction, heart failure and preserved cardiac function, heart failure and preserved ejection fraction, diastolic heart failure, diastolic dysfunction, HFpEF and HFnEF (see online supplementary material table S1 for full search protocol). Results were filtered for RCTs using predesigned and validated filters. The search was run on 20 November 2015, with results included from 1 January 1996 to 20 November 2015. The reference lists of included studies and relevant systematic reviews were searched for additional studies. No published study protocol exists for this systematic review.

Inclusion and exclusion criteria

Inclusion criteria were: (1) RCT, (2) trial inclusion criteria specifying heart failure signs and symptoms with left ventricular ejection fraction (LVEF) ≥40%, (3) pharmacological intervention with placebo or pharmacological comparison and (4) outcomes including all-cause and cardiovascular mortality, hospitalisation and changes in New York Heart Association functional class (NYHA), exercise capacity (6-min walking distance, VO2 max) or quality of life (measured using the Minnesota Living with Heart Failure Questionnaire).

Exclusion criteria were: (1) treatment of acute heart failure, (2) treatment duration <7 days, (3) studies using healthy controls, (4) non-English language publications, (5) abstracts and conference publications and (6) unpublished studies.

Study selection

After the removal of duplicates, the titles and abstracts of initial search results were screened for relevance. The full texts of remaining results were independently assessed by three authors (SLZ, FTC, EM) for inclusion based on predetermined inclusion and exclusion criteria. The final list of included studies was decided by discussion between authors and required full agreement. Remaining disagreements were resolved by AAN.

Assessment of quality

Reporting quality was assessed using the CONSORT 2010 score (table 1). Each item on the checklist was answered ‘Yes’, ‘No’ or ‘Not Applicable (NA)’, with each ‘Yes’ scoring 1 point. Each item was weighted equally. An overall reporting quality score percentage was calculated for each study by dividing the number of points by the total available (excluding NA). Two independent physician reviewers (SLZ and FTC) assessed the quality of reporting. Discrepancies in scoring were resolved with discussion between the two reviewers. Any unresolved discrepancies were decided by a third reviewer (EM).
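The scoring rule described above can be illustrated with a short Python sketch. This is an illustration only and not the tool used in this review: the function name `consort_score` and the example answers are hypothetical.

```python
from typing import Iterable


def consort_score(answers: Iterable[str]) -> float:
    """Return the percentage of applicable checklist items answered 'Yes'.

    Each answer is 'Yes', 'No' or 'NA'. Items marked 'NA' are removed from
    the denominator, and every remaining item carries equal weight.
    """
    applicable = [a for a in answers if a != "NA"]
    if not applicable:
        raise ValueError("no applicable checklist items")
    points = sum(1 for a in applicable if a == "Yes")
    return 100.0 * points / len(applicable)


# Hypothetical article: 20 items adequately reported, 12 not reported
# and 5 not applicable, so the score is taken out of 32 items.
example = ["Yes"] * 20 + ["No"] * 12 + ["NA"] * 5
print(f"{consort_score(example):.1f}%")  # 20 of 32 applicable items: 62.5%
```

Excluding 'NA' items from the denominator means that an article is never penalised for omitting an item that does not apply to its design.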

Table 1
CONSORT 2010 Checklist and percentage of articles that adequately report each CONSORT 2010 checklist item

Statistical analysis

Interobserver agreement in CONSORT checklist scoring was assessed using Cohen's κ. Pearson's product–moment correlation was used to assess the correlation between CONSORT score and prespecified variables (journal impact factor, journal 5-year impact factor, article influence, eigenfactor, year of publication, author number, participant number and trial length). The Mann-Whitney U test was used for non-parametric comparisons between two groups. Journal metrics were obtained from InCites Journal Citation Reports.12 Risk of bias for each study was not analysed. Statistical analysis was carried out using IBM SPSS Statistics V.22 and Microsoft Excel 2011.
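For readers less familiar with these statistics, the following Python sketch shows how Cohen's κ and Pearson's correlation coefficient are calculated from first principles. It is not the SPSS procedure used in this review: the function names `cohen_kappa` and `pearson_r` and all input values are hypothetical, and p values are not computed.

```python
import math
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters choose the same
    # category if each rated independently with their own frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical checklist answers from two reviewers for six items.
reviewer_1 = ["Yes", "Yes", "No", "NA", "No", "Yes"]
reviewer_2 = ["Yes", "No", "No", "NA", "No", "Yes"]
print(round(cohen_kappa(reviewer_1, reviewer_2), 3))

# Hypothetical publication years and CONSORT scores for four articles.
years = [2002, 2006, 2010, 2014]
scores = [40.0, 48.0, 61.0, 70.0]
print(round(pearson_r(years, scores), 3))
```

κ discounts the agreement that two reviewers would reach by chance alone, which is why it is reported alongside the raw percentage agreement.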

Results

Study search results and interobserver variability

Initial search identified 3561 potential studies (MEDLINE n=1128, EMBASE n=1442, CENTRAL n=991). After the removal of duplicates and non-relevant articles, the full texts of 172 articles were reviewed (figure 1). The reference list of included articles and systematic review articles was searched for additional trials, identifying one other study. In total, 33 studies were included for analysis (see online supplementary material table S2 for study summary). Cohen's κ score for interobserver variability of CONSORT checklist scoring was 0.86 (91.3% observed agreement, 1115 of 1221 items).

Figure 1

Flow chart of study search, selection, inclusion and exclusion of articles.

Reporting quality

The mean article CONSORT score was 55.4% (range 23.3–93.8%, SD 17.2%; figure 2 and table 1). No article scored 100%. The best reported criteria were: protocol (item 24, 100%, 13/13), changes to methodology (item 3b, 100%, 6/6), interim analysis (item 7b, 100%, 4/4), interpretation (item 22, 97.0%, 32/33), eligibility criteria (item 4a, 97.0%, 32/33), statistical methods (item 12a, 97.0%, 32/33), background (item 21, 93.9%, 31/33) and baseline data (item 15, 90.9%, 30/33; figure 2 and table 1). The worst reported criteria were presentation of binary outcomes (item 17b, 0%, 0/7), abstract (item 1b, 9.1%, 3/33), trial termination (item 14b, 12.1%, 4/33) and allocation concealment (item 9, 12.1%, 4/33).

Figure 2

Percentage of studies adequately reporting each CONSORT 2010 checklist item where applicable.

Time trend and CONSORT statement updates

The CONSORT 2010 score increased with time and showed strong correlation with the year of publication (r=0.50, p=0.003; figure 3). Articles published after the CONSORT 1996 statement but before the CONSORT 2001 update had a mean score of 41.0% (SD 3.3%; figure 4). Articles published between CONSORT 2001 and CONSORT 2010 had a mean score of 50.2% (SD 14.5%). Articles published after CONSORT 2010 had a mean score of 63.8% (SD 18.1%). There was a significant increase in the mean score of articles published after CONSORT 2010 compared with those published between CONSORT 2001 and 2010 (difference between means 13.6%, p=0.02), whereas the difference between CONSORT 1996 and 2010 scores approached significance (difference between means 22.8%, p=0.07). There was no significant difference between CONSORT 1996 and 2001 scores (difference between means 9.2%, p=0.53).

Figure 3

The year of publication and CONSORT 2010 score.

Figure 4

Individual study CONSORT 2010 score grouped by latest CONSORT statement at the time of publication. Individual study CONSORT 2010 score (open circles) grouped by available CONSORT statement (1996, 2001 and 2010) at the time of study publication, with mean scores during each period (filled circles).

Correlation with journal impact factor

The CONSORT score was strongly correlated with journal impact factor (r=0.53, p=0.003) and journal 5-year impact factor (r=0.49, p=0.006). The score was also correlated with article influence score (r=0.50, p=0.005) and, more weakly, with eigenfactor score (r=0.36, p=0.05).

Correlation with other variables

The CONSORT score was strongly correlated with author number (r=0.52, p=0.002) but not with participant number (r=0.30, p=0.09) or treatment duration (r=0.17, p=0.34). Trials that included ≥100 patients had a higher CONSORT score (63.7% vs 46.5%, p=0.011). There was no significant difference for treatment duration (≥12 months or <12 months), trial assignment (parallel or crossover) or funding sponsor (national/charity/academic or pharmaceutical company/industry; table 2).
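The two-group comparisons above can be illustrated with a sketch of the Mann-Whitney U statistic. This assumes that the Mann-Whitney U test named in the Statistical analysis section was the test applied to these comparisons, which is not stated explicitly for each result. The function name `mann_whitney_u` and the scores are hypothetical, and the sketch returns only the U statistics; a statistics package would normally supply the p value.

```python
from typing import Sequence, Tuple


def mann_whitney_u(
    group_a: Sequence[float], group_b: Sequence[float]
) -> Tuple[float, float]:
    """Return (U_a, U_b), using midranks for tied values."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one value")
    # Pool the values, remembering which group each one came from.
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at 1-based ranks i+1..j all receive the average rank.
        midrank = (i + 1 + j) / 2.0
        in_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += midrank * in_a
        i = j
    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical CONSORT scores (%) for larger and smaller trials.
larger_trials = [60.0, 70.0, 80.0]
smaller_trials = [40.0, 50.0, 65.0]
print(mann_whitney_u(larger_trials, smaller_trials))
```

U_a counts the pairs in which a value from the first group exceeds a value from the second (ties count as one half), so the test depends only on the ordering of scores and makes no assumption of normality.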

Table 2
Effects of trial characteristics on CONSORT score

Reporting quality of Methods and Results

The mean reporting score for the Methods section (checklist items 3a–12b) was 51.9% (range 15.4–92.9%, SD 19.7%) and that for the Results section (checklist items 13a–19) was 51.9% (range 12.5–100%, SD 24.1%). Section scores remained correlated with the year of publication and journal impact factor (Methods: r=0.32 and r=0.36; Results: r=0.39 and r=0.51; combined Methods and Results: r=0.38 and r=0.47), and none of these coefficients was significantly different from the correlation coefficient of the total score with the year of publication (p>0.20 for all). Mean scores for Methods and Results increased after CONSORT 2010 publication from 42.5% to 61.9% and 44.1% to 59.3%, respectively.

Discussion

In this study, we have systematically identified all RCTs investigating the efficacy of pharmacological interventions in HFpEF and have, for the first time, comprehensively assessed the reporting quality of these publications. We show a trend in improving reporting quality of HFpEF RCTs over time and following updates to CONSORT guidelines, though there remains a considerable variation in reporting quality, with many important aspects relating to trial methodology and results consistently under-reported. The mean CONSORT 2010 score for HFpEF RCTs is 55.4%, comparable to the findings from similar contemporary studies in other fields of medicine and surgery.13–17 We also identified a strong positive correlation between the CONSORT score and metrics of journal impact and author number.

Critical appraisal of the validity and generalisability of study findings requires comprehensive reporting of clinical trials, with discrepancies associated with variations in effect estimate that may affect management decisions by doctors and policymakers. Although several criteria were well reported, we identified significant deficiencies in trial methodology and reporting of results. These deficiencies affect readers' assessments of study quality and risk of bias, and will reduce the quality and accuracy of meta-analyses. Similar reviews have echoed these findings, highlighting particularly poor reporting of details surrounding randomisation.2,14–18 Chen and Liu2 showed that the reporting of methodology in RCTs in a high impact cardiology journal was inadequate, with 70% of studies reporting less than half of methodological items sufficiently, with randomisation and blinding frequently affected.

The CONSORT 2010 score demonstrates positive correlation with the year of publication, with articles published after 2010 scoring more favourably than those from earlier periods. The CONSORT update in 2010 is likely to have generated increased awareness of the importance of high reporting standards. A Cochrane review in 2012 demonstrated superior reporting quality of articles published by journals that endorse CONSORT guidelines compared with those that do not, and improvements after a journal's endorsement of CONSORT.19 Accordingly, almost 600 biomedical journals, including Open Heart, endorse CONSORT and advise adherence from submitting authors.20

We demonstrate positive correlation between measures of journal impact and CONSORT score. Although authors will aim to submit the most thoroughly reported studies to the most influential publishers, higher impact factor journals have higher rejection rates and will therefore impose more rigorous presubmission checks and review processes. Furthermore, better-reported studies may be more extensively cited, with a corresponding positive influence on journal impact factor.

Significance of reporting quality in HFpEF clinical trials

As in all conditions with uncertain therapies, the meta-analysis of pooled trial and registry data is important in increasing our understanding of possible treatments for HFpEF. High-quality meta-analysis depends on comprehensive and accurate trial information, with particular emphasis on the population, intervention and outcome measures, and trial design. Although there have been many recent meta-analyses of specific drug classes in HFpEF,21–23 the last comprehensive review for all drug therapies in HFpEF was published in 2011 and identified no reduction in all-cause mortality for drug classes, individually and combined.24 Since then, there have been a large number of new trials (including at least 14 RCTs identified in this study), some of which have evaluated novel treatments, and there is value in including these in an updated review.25

Detailed reporting of study inclusion criteria and participant demographics is particularly important in HFpEF clinical trials. Trial inclusion criteria are heterogeneous and have changed as the understanding of HFpEF as a disease syndrome has evolved.26 Combinations of LVEF cut-offs, prior heart failure hospitalisation, clinical features, the presence or absence of comorbidities, echocardiographic and haemodynamic parameters, and natriuretic peptide levels are being used as inclusion criteria. In a recent analysis comparing three major HFpEF trials (the Digitalis Investigation Group-Preserved Ejection Fraction (DIG-PEF), Candesartan in Heart Failure Assessment of Reduction in Mortality and morbidity (CHARM-Preserved) and Irbesartan in Heart Failure with Preserved systolic function (I-PRESERVE)), the authors found that the I-PRESERVE study population was most representative of HFpEF patients in the community, possibly attributable to its comparatively stringent inclusion criteria.27 Although strict inclusion criteria are likely to reduce the recruitment of patients with LV systolic dysfunction, exclusion of significant comorbidities may result in a non-representative study population and reduce the applicability of results to real-life settings. Analysing the effects that patient selection criteria have on published trial outcomes will be important in optimising future trial design.

The pathophysiological role of non-cardiac comorbidities in patients with HFpEF is becoming well characterised and better understood.28,29 HFpEF populations demonstrate a high prevalence of pulmonary disease, diabetes mellitus and cardiometabolic disorders, anaemia, chronic kidney disease (CKD) and obesity,30–32 which are independently associated with poor outcomes.33–36 The distribution of such comorbidities in clinical trials is likely to influence results, and indeed it has been argued that the absence of positive outcomes may be related to inclusion of non-HFpEF patients.37 Although patient demographics were well reported (91% of all trials), detailed description of important comorbidities was much poorer: diabetes mellitus was reported in 70% of trials, atrial fibrillation in 52%, chronic obstructive pulmonary disease (COPD) in 18%, anaemia in 9%, CKD in 9% and obesity in 6%. This shows that while adherence to reporting standards is to be encouraged, it is important that salient information of particular interest in HFpEF should be provided.

It is increasingly accepted that HFpEF is a heterogeneous condition with a range of disease phenotypes. Using the novel approach of latent class analysis (LCA), Kao et al38 used patients enrolled in the I-PRESERVE study and identified a significant positive response to irbesartan compared with placebo in a group characterised by high prevalence of obesity, diabetes mellitus and hyperlipidaemia. Although LCA is one approach that can identify subgroups with differing prognoses and responses to treatments, this requires patient-level data that can be challenging to access.39 Consequently, there is strength in combining individual trial subgroup analyses using meta-analysis. This approach of investigating treatment effects on different patient groups stratified by variables will likely yield insight into which groups are likely to respond to therapy. The ability and success of this approach depends on the clear reporting of all prespecified analyses and primary and secondary outcomes, including subgroup analyses and exploratory outcomes. A review of heart failure disease-management programmes found that significant and clinically important differences within subgroups are not meta-analysed due to a dearth of available reported data,40 a finding that is likely to be true for HFpEF. Similarly, another group undertaking meta-analysis of the effectiveness of pharmacological treatments in patients with NYHA class I or II symptoms were unable to do so due to poor reporting and non-disclosure of data.39

Important aspects of HFpEF trial design can influence outcomes and the investigators' ability to detect differences.41 As demonstrated in the Perindopril in Elderly People with Chronic Heart Failure (PEP-CHF) study, the primary end point of all-cause mortality and unplanned heart failure hospitalisation trended towards statistical significance at 12 months (HR 0.69, p=0.055). However, these beneficial effects were lost by the end of the trial (HR 0.92, p=0.545). This was due to a significant proportion of patients in the placebo arm starting open-label angiotensin-converting enzyme inhibitor (ACEi) treatment, resulting in an eventual study power of just 35%. The CHARM-Preserved trial generated neutral results, though study-drug discontinuation for adverse events or laboratory abnormalities was significantly higher in the treatment arm than in the placebo arm (18% vs 14%, p=0.001). The use of a run-in period to establish drug tolerance may reduce the differential effects of study drug discontinuation. Although trial results have been largely neutral, it cannot be conclusively argued that perindopril, candesartan or other HFpEF trial treatments are of no clinical benefit, as trial design and limitations can clearly affect the ability of studies to detect meaningful differences and must therefore be clearly reported.

Geographic variation in the rates of mortality and hospitalisation in HFpEF clinical trials has been well described.42 In the Treatment of Preserved Cardiac Function with Aldosterone Antagonist Trial (TOPCAT), the primary end point rate was far lower in Russia and Georgia (unadjusted rate of 2.3 per 100 patient-years in placebo group) than in the Americas (12.6 per 100 patient-years).43 Similar results were found for the CHARM-Preserved and I-PRESERVE trials, with unadjusted rates of mortality, and adjusted rates for hospitalisation for heart failure greater in North America compared with Eastern Europe and Russia.42 This variation is more specific for trials of HFpEF rather than trials of HFrEF and may reflect the logistical challenges and disparate criteria for diagnosing HFpEF, or differing provision of healthcare services available.42 Regional variations in outcome rates are an important consideration in international, multicentre RCTs, with clear reporting of trial centre locations and number of patients enrolled, event rates by regional area and subgroup analysis by region all important in understanding the influence that geographical variation has on treatment outcomes.

Study limitations

One limitation of our study methods is that the outcome measure, CONSORT 2010 score, requires subjective assessment. Publications from CONSORT provide good guidance on this,1 and scoring was carried out blindly by two assessors in this study. We showed a high degree of interobserver agreement, comparable with other similar published studies, with all discrepancies resolved by discussion. The use of ‘NA’ as an additional qualifier protects studies against falsely low scores, by only scoring articles out of a relevant total. It must be emphasised that a study's CONSORT score does not reflect the quality of the study or its risk of bias. Rather, high-quality reporting is necessary for accurate meta-analysis and for trial evaluation to aid interpretation and implementation, and allows contributions to the wider body of work. One argument against the rigid use of CONSORT or its surrogate as a marker of article reporting quality is that certain aspects may be deemed to be unnecessary (eg, absolute risk difference may be easily calculated from the individual event rates), and indeed reviewers may ask for superfluous information to be removed. In our study, we are unable to discern whether failure to report a CONSORT item is attributable to authors, editors or reviewers.

Conclusions

The reporting quality of RCTs investigating the impact of pharmacological interventions in HFpEF has improved over time but remains suboptimal. We identified a positive trend in the quality of reporting following each revision of the CONSORT statement and demonstrated correlation between the quality of study reporting and journal impact factor. Encouragingly, reporting of methodology and results increased significantly after the update of CONSORT in 2010. Although improvements in adherence to CONSORT 2010 reporting criteria are necessary, specific details related to inclusion criteria, patient demographics, trial design and subgroup analysis by important variables (participant demographics, comorbidities and geographical variation) will provide greater insight into treatment effects for HFpEF and lend a basis on which future clinical trials can be designed. The treatment of HFpEF remains a substantial clinical challenge and, given the relatively small number of dedicated RCTs, transparent and complete reporting will likely improve our understanding of the disease and its treatments.