Strengths and weaknesses of ‘real-world’ studies involving non-vitamin K antagonist oral anticoagulants
A John Camm1,2 and Keith A A Fox3
  1. Molecular and Clinical Sciences Institute, St George’s University of London, London, UK
  2. Molecular and Clinical Sciences Institute, Imperial College, London, UK
  3. Centre for Cardiovascular Science, University of Edinburgh and Royal Infirmary of Edinburgh, Edinburgh, UK
Correspondence to Professor A John Camm; submissions2017{at}


Randomised controlled trials (RCTs) provide the reference standard for comparing the efficacy of one therapy or intervention with another. However, RCTs have restrictive inclusion and exclusion criteria; thus, they are not fully representative of an unselected real-world population. Real-world evidence (RWE) studies encompass a wide range of research methodologies and data sources and can be broadly categorised as non-interventional studies, patient registries, claims database studies, patient surveys and electronic health record studies. If appropriately designed, RWE studies include a patient population that is far more representative of unselected patient populations than those of RCTs, but they do not provide a robust basis for comparing treatment strategies. RWE studies can have very large sample sizes, can provide information on treatments in patient groups that are usually excluded from RCTs, are generally less expensive and quicker than RCTs, and can assess a broad range of outcomes. Limitations of RWE studies can include low internal validity, lack of quality control surrounding data collection and susceptibility to multiple sources of bias for comparing outcomes. RWE studies can complement the findings from RCTs by providing valuable information on treatment practices and patient characteristics among unselected patients. This information is necessary to guide treatment decisions and for reimbursement and payment decisions. RWE studies have been extensively applied in the postmarketing approval assessment of non-vitamin K antagonist oral anticoagulants since 2010. However, the benefits, costs, limitations and methodological challenges associated with the different types of RWE must be considered carefully when interpreting the findings.

  • anticoagulant
  • limitations
  • real-world evidence

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:



Randomised controlled trials (RCTs) are the gold standard for demonstrating the efficacy of a particular therapy or intervention.1–3 However, RCTs are, by design, limited to a subset of patients who are not fully representative of the unselected real-world population.1 2 Patients enrolled in RCTs tend to show higher treatment adherence than those in clinical practice, they are positively disposed to some aspect of the treatment (as shown by their agreement to participate in the study) and they have fewer comorbidities in order to limit the impact of competing risks.4 RCTs tend to exclude patients who are very young, very old or who have major comorbidities, thereby limiting the relevance of their findings to wider clinical practice.5 In addition, large-scale RCTs are expensive to conduct (~$50 million for a pivotal study) and have a long lead time from inception to completion.4

Real-world evidence (RWE) is an umbrella term that has been broadly defined as information derived from the analysis of data on health interventions used for healthcare decision-making, which has not been collected as part of an RCT but is typically from clinical practice.2 3 6 RWE is considered to better represent routine practice compared with the idealised conditions of an RCT.2 3 6 Data from RWE studies can complement findings from RCTs and, if appropriately designed, can provide valuable information about practice patterns and patient characteristics in a real-world setting.2 RWE studies are subject to some patient selection because such studies are unable to capture all types of patients, for example, patients who have not presented to a physician or are yet to be diagnosed. Nevertheless, RWE studies provide a valuable reflection of the range and distribution of patients observed in the clinical practice setting.2

This review article discusses the utility, strengths and limitations of RWE studies, as well as potential developments to further bridge the gap between RCTs and daily clinical practice.

Strengths and limitations of RWE studies

RWE studies include a wide range of research methodologies and data sources, although they can be broadly categorised into non-interventional studies, patient registries, claims database studies, patient surveys and electronic health record studies (table 1).2 5

Table 1

Summary of the main types of RWE studies and their key characteristics2 5 10 11

They can also be categorised into prospective studies, which generally require primary data collection, or retrospective studies, which use secondary data to look back in time (ie, data initially collected for other purposes; figure 1).

Figure 1

Illustration to highlight the design and analysis time frame (relative to the study start or index date) of different types of real-world evidence studies. Arrows depict prospective or retrospective studies of various durations.

Potential strengths of RWE studies

RWE studies aim to include patient populations that are far more representative of the unselected population than those of RCTs; they can have very large sample sizes and can provide information on treatment practices in specific populations that are usually excluded from RCTs (eg, elderly patients or those with renal impairment).5 This enhances the external validity (ie, the generalisability) of their findings compared with RCTs, which have restricted inclusion characteristics.5 RWE studies are less expensive and faster to complete than RCTs, and they can assess treatment patterns across a much broader range of outcomes. As well as broadly assessing safety and clinical treatment patterns, and effectiveness outcomes (in studies comparing new treatments with standard of care), RWE studies can include assessments of treatment adherence, treatment persistence, epidemiology, burden of disease, prescribing patterns, health resource utilisation and cost-effectiveness.5 They also have the capacity to assess overall outcomes in various patient and risk groups and, because they may be conducted in an unselected population over a long time frame, can provide insights into toxicity, long-term safety and rare adverse events.2 5

Potential limitations of RWE studies

Traditionally, methodological and study design issues, such as the risk of confounding and bias, have prevented non-randomised observational studies from being considered useful for the assessment of new medical treatments or interventions. Even a well-designed, robust, non-randomised study would be considered to be lower grade evidence compared with a poorly designed RCT.7 Currently, the value of RWE is much better understood, with RWE studies now forming a key part of the postauthorisation evaluation of an approved drug, at the request of regulatory authorities.7 8 However, possible limitations should still be considered.7

The key limitation of RWE studies is the lack of randomisation. Although this contributes to the high external validity of the data, it reduces the internal validity of the data (ie, the extent to which any differences between the intervention and control groups can be attributed to the intervention itself, as opposed to other factors).5 9 In addition, without a controlled clinical study environment, there may be little or no control over the quality of the data collection in RWE studies, although measures can be taken to optimise this.5 For example, prospective patient registries may have quality-control measures and auditing procedures to improve data integrity, including the use of real-time electronic data capture, personnel training, independent adjudication of outcomes and the use of data or coding dictionaries to ensure standardised outcome definitions.2 10

Retrospective comparisons are prone to multiple biases, including sampling bias, recall bias, confounding by indication and changes in practice and/or disease biology.5 11 Statistical approaches have been developed that aim to adjust for multiple types of bias (eg, propensity score adjustment, covariate adjustment and others).2 However, serious confounding may remain due to the impact of unmeasured confounders and unmeasured risks on treatment decisions. Therefore, retrospective study data do not meet the reliability and accuracy afforded by the methodological rigour of RCTs.2 The methods employed in developing treatment guidelines do not rank observational data as a reliable basis for comparing treatments or strategies.12 13
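To make confounding by indication concrete, the following is a minimal sketch of covariate adjustment by stratification. All counts are hypothetical and are not taken from any study cited in this review; the Mantel-Haenszel pooled risk ratio is used here as one standard stratification method and is an illustrative choice, not a method attributed to the cited studies. The function names are also illustrative.

```python
# Hypothetical counts only: two risk strata, with treatment given
# preferentially to high-risk patients (confounding by indication).
# Each tuple: (events_treated, n_treated, events_control, n_control)
strata = {
    "low_risk": (5, 100, 20, 400),
    "high_risk": (80, 400, 20, 100),
}


def crude_risk_ratio(strata):
    """Risk ratio that ignores the strata and pools all patients."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel risk ratio pooled across strata."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator


crude_rr = crude_risk_ratio(strata)
adjusted_rr = mantel_haenszel_risk_ratio(strata)
print(f"crude RR = {crude_rr:.3f}, stratum-adjusted RR = {adjusted_rr:.3f}")
```

In this constructed example the event risk is identical for treated and untreated patients within each stratum, yet the crude risk ratio is 2.125 because treated patients are concentrated in the high-risk stratum; stratification returns a risk ratio of 1.000. The adjustment succeeds only because the risk stratum was recorded. If it were an unmeasured confounder, the distorted crude estimate is all that could be calculated.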

Applications of RWE studies

Given their ability to evaluate clinical practice patterns and monitor long-term and infrequent safety events, RWE studies have been evaluated in postmarketing approval assessments of the non-vitamin K antagonist oral anticoagulants (NOACs). Four NOACs (apixaban, dabigatran, edoxaban and rivaroxaban) are currently available for the prevention of stroke in patients with atrial fibrillation (AF) and the treatment of venous thromboembolism (VTE), but it is recommended that they are used with caution in patients with conditions where there is an increased risk of bleeding.14–17 Therefore, it is essential that factors such as dose selection and treatment adherence are monitored outside of a controlled clinical environment to see if they differ from pivotal RCTs18–21 and to determine their impact on clinical outcomes such as major bleeding, stroke or recurrent VTE. For this reason, many RWE studies have been, and are being, conducted to evaluate NOAC treatments; these will be used as examples to illustrate the value and limitations of the information provided by the different RWE study types.

Non-interventional studies

Non-interventional studies are defined as those where treatment is prescribed in accordance with the terms of the marketing authorisation, and where the assignment of the patient to a particular therapy is entirely at the physician’s discretion, rather than decided in advance by the study protocol.22 23 As such, treatment selection and management is separate from the decision to include the patient in the study, with no diagnostic or monitoring procedures performed above those deemed necessary by the attending physician for the conduct of the study.22 23 In some instances, prospective, multicentre, non-interventional studies conducted in routine clinical practice may be referred to as pragmatic clinical trials.24 25

With the exception of registries (detailed in the next section), non-interventional studies generally fall into three categories: cohort, cross-sectional and case–control studies.11

Cohort studies

Cohort studies can be either prospective or retrospective, and typically assess the incidence, cause and prognosis of certain conditions/outcomes.11 In prospective cohort studies, patients are monitored until a relevant outcome occurs (eg, a major bleeding event, stroke or recurrent VTE for studies of NOACs),26 27 with data routinely collected on potential risk factors for the outcome.11 In retrospective cohort studies, the methodology is the same but the data have already been collected for a separate purpose and a post-hoc analysis is carried out.11

Advantages of cohort studies include the ability to assess a broad range of risk factors,28 including those where an RCT may be unethical (such as the impact of smoking or exposure to toxins), and the fact that several outcomes can be monitored simultaneously.11 The chronology of the study also enables a clear distinction between cause and effect (unlike cross-sectional studies), although this also means that loss to follow-up can significantly affect the outcomes, and studying rare outcomes can be inefficient.11 29 Retrospective cohort studies can be conducted quickly and inexpensively compared with RCTs because the data have already been collected. However, retrospective cohort studies can suffer from limited or missing data, less rigour in data collection (no prestudy analysis plan or protocol) and recall bias, all of which are distinct disadvantages.11

The first international, prospective, multicentre, non-interventional study of a NOAC for stroke prevention in patients with AF was the single-arm XANTUS (Xarelto for Prevention of Stroke in Patients with Atrial Fibrillation) study of rivaroxaban. This was then followed by the XALIA (XArelto for Long-term and Initial Anticoagulation in venous thromboembolism) study of rivaroxaban versus standard anticoagulation in patients with VTE.26 27 These studies, both examples of pragmatic trials, supported the results of RCTs, showing low incidences of major bleeding (in both studies) and low rates of stroke (XANTUS) and recurrent VTE (XALIA) in patients receiving rivaroxaban in a real-world setting.26 27 The problem of selection bias (eg, the idea that agreement to participate in the study might select a group that responds in a specific way) was noted in both studies, and efforts to minimise this using a single cohort design (XANTUS) or propensity score matching (XALIA) were employed.28 In both XANTUS and XALIA, outcomes were centrally adjudicated by investigators blinded to treatment in an attempt to reduce reporting bias.26 27 However, missing data and the effects of unmeasured confounding factors could not be addressed.26 27

Cross-sectional studies

Cross-sectional studies involve the assessment of a single group of patients at a single point in time, at which treatment and outcomes are determined simultaneously.11 29 They are typically used to assess prevalence and infer the cause of conditions/outcomes.11 29 Cross-sectional studies can be conducted relatively quickly and inexpensively compared with RCTs, and can assess multiple outcomes simultaneously. They are, therefore, the most efficient way to determine the prevalence of a condition.11 29 However, because the data are collected at a single time point, it is difficult to clearly distinguish cause and effect. Patients who develop an outcome but die before the end of the study are not captured, and the studies are susceptible to selection bias. This method is also inefficient for studying rare conditions because, even in large sample sizes, there may be few or no patients with the disease.11 29–31 Additionally, cross-sectional studies are often completed using questionnaires, which have inherent problems, including low response rates and susceptibility to various sources of bias (see later section on patient surveys).2 11 32

In the case of the NOACs, cross-sectional studies have provided insights into the prevalence of underdosing, how to achieve more appropriate dose selection,30 how closely the pivotal RCTs resemble a real-world population33 and how successfully NOACs have been adopted into prescribing practices over time.31

Case–control studies

Unlike cohort and cross-sectional studies, case–control studies are usually conducted retrospectively.11 Patients who have experienced the outcome of interest are matched with a control group who have not experienced this outcome, and exposure to treatment or other factors is assessed from medical history to determine causality.11 29 Because the patient population is selected based on the outcome, case–control studies are especially useful in studying rare conditions or those with a long latency between exposure and disease, although the study is limited to investigating a single outcome.11 They can also consider many variables simultaneously, providing a case-efficient way of identifying potential predictors of specific outcomes.11 However, case–control studies are often conducted by an interview, which makes them susceptible to sampling bias (eg, recruitment of cases or controls from single sources), observational bias (eg, interviewers assessing cases differently from controls) and recall bias (eg, cases being more likely to recall past exposure than controls because they have considered the cause of the condition), as well as unmeasured confounders.11 29 34 Because of these limitations, the results of case–control studies are most appropriately used for hypothesis generation,11 34 35 and have primarily been used to assess whether NOAC treatment is a risk factor for rare outcomes or safety outcomes in specific patient populations.34–36
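Because a case–control study samples patients by outcome rather than by exposure, the association is usually summarised as an odds ratio. The sketch below shows this calculation for a single 2×2 table, with a Woolf (log-based) confidence interval. The counts are hypothetical and the function name is illustrative; neither comes from the studies cited in this review.

```python
import math


def odds_ratio(cases_exposed, cases_unexposed,
               controls_exposed, controls_unexposed, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval."""
    cells = (cases_exposed, cases_unexposed,
             controls_exposed, controls_unexposed)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    estimate = (cases_exposed * controls_unexposed) / (
        cases_unexposed * controls_exposed
    )
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical table: 100 cases (30 exposed), 100 controls (20 exposed).
or_estimate, or_lower, or_upper = odds_ratio(30, 70, 20, 80)
print(f"OR = {or_estimate:.2f} (95% CI {or_lower:.2f} to {or_upper:.2f})")
```

The interval reflects sampling variability only. It does not account for the sampling, observational and recall biases described above, which is one reason such estimates are best treated as hypothesis-generating.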

Patient registries

Patient registries are non-interventional, observational cohort studies.2 They are typically prospective studies that involve standardised, ongoing data collection in a real-world setting to fulfil a specific predefined purpose, where management of treatment and care is determined by the patient and caregiver rather than the registry protocol.10 Ideally, the collection of registry data is uniform for all patients, including the type, method and frequency of collection. Information is then collated in a central registry database for analysis.

Advantages of registries over RCTs include the capacity to enrol a much larger and more diverse patient population with the potential for a longer follow-up period. This provides data that are more reflective of a real-world population and enables the study of longer term outcomes, including the identification of more infrequent safety outcomes.2 Registry studies also involve few or no required visits, evaluations or procedures at specialist centres because the data are collected by the attending physician as part of daily practice. Not only does this make registries potentially less expensive than RCTs, but it also means that treatment patterns reflect the daily clinical decision-making that is most relevant to healthcare providers and payers, and can help identify the most cost-effective treatment approaches.2 In addition, registries are sometimes linked with other databases, which can enable the assessment of separate outcomes such as healthcare utilisation and mortality.10

These factors make registries particularly useful for assessing (1) the natural history of a disease; (2) real-world safety and effectiveness; (3) prognosis and quality of life; (4) quality of care; and (5) cost-effectiveness of treatment strategies.10

As with all observational studies, a disadvantage inherent to patient registries is that the lack of randomisation means there is no guarantee that the patient groups are similar (because of factors such as dose adjustments), which can reduce the internal validity of the findings. Importantly, retrospective registries are also subject to survivor bias because they exclude deaths prior to the registry start date, and time from diagnosis to enrolment is longer for the patients included compared with prospective studies, meaning these registries will not encompass high-risk periods after disease onset.2 Because the data in registries are collected from a wide variety of different centres, sometimes across different countries, there can also be considerable variation in the auditing and control measures employed and the quality of the data collected.2 In addition, variations between countries and centres regarding the requirement and processes for obtaining informed patient consent can also lead to authorisation or selection bias.37 38 Furthermore, there are limits to the amount of data that can be obtained in a routine clinical practice visit, and the visits may not be scheduled at regularly timed intervals, hindering comparisons between patient groups. Despite this, the advancement of modern technologies such as mobile phone apps and cloud computing may make it far easier to collect data in a more uniform fashion and in real time, thus enabling more rigorous monitoring of data collection.2

Data on NOAC use in patients with AF or VTE are provided by several large registries, including the regional Dresden NOAC Registry, the national ORBIT AF I/II (Outcomes Registry for Better Informed Treatment of Atrial Fibrillation) registries and the multinational GARFIELD-AF (Global Anticoagulant Registry in the FIELD-Atrial Fibrillation), GARFIELD-VTE (Global Anticoagulant Registry in the FIELD-Venous Thromboembolism) and GLORIA-AF (Global Registry on Long-Term Oral Antithrombotic Treatment in Patients with Atrial Fibrillation) registries (table 2).

Table 2

Overview of key prospective registries for stroke prevention in patients with AF and treatment of VTE

Results from these registries have provided insights into the management of NOAC treatment in clinical practice, including switching from warfarin to NOACs, the identification of high-risk patient populations and the high levels of inappropriate NOAC dosing that occur in clinical practice.39–42 However, reported limitations to the interpretation of these findings include missing or incomplete data, selection bias and residual/unmeasured bias due to a lack of randomisation.39–43

Long-term clinical outcomes in the majority of the above registries are currently being acquired because the studies are still ongoing, but initial results support the use of NOACs in patients with AF and patients treated for VTE. For example, in SWIVTER (SWIss Venous ThromboEmbolism Registry), a study of patients treated for VTE, the risk of recurrent VTE was similar in the rivaroxaban treatment arm compared with the conventional anticoagulation arm (1.2% vs 2.1%, P=0.29) for the propensity score-adjusted population; the risk of major bleeding was also similar (0.5% vs 0.5%, respectively, P=1.00).44 Patients with AF receiving rivaroxaban for stroke prevention in the Dresden NOAC Registry were also analysed (n=1204). The combined endpoint including stroke, transient ischaemic attack and systemic embolism occurred at a rate of 2.03 per 100 patient-years in the intention-to-treat analysis (95% CI 1.5 to 2.7). Major bleeding in the on-treatment group occurred at a rate of 3.0 per 100 patient-years. Event outcomes in the on-treatment group were higher in patients receiving rivaroxaban 15 mg once daily compared with rivaroxaban 20 mg once daily (stroke/transient ischaemic attack/systemic embolism 2.7 vs 1.25 per 100 patient-years and major bleeding 4.5 vs 2.4 per 100 patient-years, respectively).45 Similar results were observed in analyses performed in patients with AF receiving apixaban and dabigatran for stroke prevention in the Dresden NOAC Registry.46 47
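Event rates in registries are commonly reported per 100 patient-years, as in the figures above. The following sketch shows how such a rate and an approximate 95% CI can be derived from an event count and total follow-up time. The inputs are hypothetical and are not the Dresden NOAC Registry data; the interval uses a normal approximation on the log rate, which is one of several possible methods and may differ from the one used in the cited analyses.

```python
import math


def incidence_rate_per_100py(events, patient_years, z=1.96):
    """Return (rate, lower, upper) per 100 patient-years.

    The interval uses a normal approximation on the log rate,
    with standard error 1/sqrt(events); it is unreliable for
    very small event counts.
    """
    if events <= 0 or patient_years <= 0:
        raise ValueError("events and patient_years must be positive")
    rate = 100.0 * events / patient_years
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


# Hypothetical inputs: 30 events over 1500 patient-years of follow-up.
rate, lower, upper = incidence_rate_per_100py(events=30, patient_years=1500.0)
print(f"{rate:.2f} per 100 patient-years (95% CI {lower:.2f} to {upper:.2f})")
```

Expressing events per unit of follow-up time allows cohorts with different observation periods to be described on the same scale, although it does not by itself make rates from different populations comparable.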

Administrative and healthcare claims database studies

These studies involve retrospective (or sometimes real-time) analysis of data from administrative and healthcare claims databases containing treatment information and clinical information, such as diagnosis codes and hospital admissions/discharge dates.2 48 As such, these databases are particularly suited to longitudinal and cross-sectional analyses of healthcare utilisation and costs at the patient, group or population level,2 although insights can also be gained into associations between interventions and outcomes using the clinical information, which is sometimes limited.

The key advantages of administrative and healthcare claims database studies are that they can be performed relatively quickly and inexpensively compared with RCTs, involve a very large established patient cohort and can have a long follow-up period. This enables the identification of rare events, the determination of longer term outcomes and insight into the economic impact of interventions.2 In some instances, database information can also be linked with clinical data, such as patient-reported outcomes, laboratory assessments, medical records and physician surveys. This practice is most common in countries such as Denmark and Sweden, where every member of the populace has a unique personal identification number, enabling lifelong follow-up and linkage between databases at the individual level. The information contained within these Scandinavian administrative databases has been used to assess the comparative effectiveness and safety of the NOACs for stroke prevention in patients with AF; regression or propensity-based adjustment was used to try to reduce bias from confounding variables inherent in a non-interventional observational analysis.49–51 Similar approaches have been used in numerous studies using data from healthcare claims databases, particularly to assess the risk of bleeding events in NOAC-treated patients and adherence/persistence to treatment (table 3).52–55

Table 3

Examples of healthcare administrative and claims databases and associated NOAC studies

Administrative and claims databases, however, lack key information about the choice of therapy. This choice can be influenced by clinical risk factors and physician and patient preferences which are not recorded in such databases. In addition, databases lack clarity in terms of insights into dosing. All these factors make inferences from administrative and claims databases, particularly direct drug comparisons, unreliable in relation to the clinical effectiveness of specific treatments. Data quality is also one of the major disadvantages of these database studies. Clinical data may not only be limited or missing (eg, health outcomes, health status, symptoms and patient characteristics, such as creatinine clearance levels, international normalised ratio values or time in therapeutic range),2 52 55 56 but the accuracy of reporting is variable (both between different countries and between centres within the same country) and coding errors are common.49 52 55 With respect to NOAC studies, there are many different codes and categories for bleeding events, further increasing the chances of coding errors (eg, gastrointestinal bleeding is often listed as ‘bleeding of unknown source’), and diagnosis codes (eg, International Classification of Diseases, Ninth or Tenth Revision) may not always differentiate between indications.52 Therefore, caution should always be exercised when interpreting treatment effectiveness and safety based on claims database analyses.

Even if the integrity of the data can be confirmed, there are additional issues inherent to claims database analyses, including limited validation, lack of a population denominator and lack of distinction between costs and charges2; follow-up duration may be limited by patients switching between insurance plans. However, selection bias is perhaps the most common and challenging methodological issue affecting both administrative and claims database analyses because treatment selection, clinical outcomes and economic outcomes could be influenced by factors that are not recorded in the database (eg, baseline health status, symptomatology and comorbidities).2 49 54 55 57

Electronic health record studies

Electronic health record studies are retrospective (or sometimes real-time), observational analyses of data sourced directly from medical records or charts.2 58 They are typically used to assess clinical treatments, procedures and outcomes.

Like claims database studies, electronic health record studies can be performed relatively quickly and inexpensively compared with RCTs, can involve a relatively large patient cohort and can have a longer follow-up. The study of rare conditions or those with a long latency between exposure and disease is also possible,58 as is the study of the real-world use of specific techniques, treatments and procedures.2 Although health record studies are usually limited to a small number of study centres, central databases such as the UK Clinical Practice Research Datalink are becoming more widely available with the proliferation of mobile technologies.2 59 These databases enable longitudinal, patient-level data collection from multiple sources and may contribute to more consistent recording and coding of information.

Although real-time clinical treatment and outcomes data are captured in electronic health record studies, there are similar issues to those of claims database studies regarding data integrity. Findings are heavily reliant on the accuracy of recorded information or the recall of individuals. Therefore, there is the potential for recording/coding and interpretive errors or bias.2 58 60 Transforming the data for research purposes requires sophisticated statistical tools, and making robust inferences on clinical effectiveness and safety based on chart review data remains a challenge.2

Nevertheless, recent electronic health record studies of NOACs have supported the findings of the pivotal RCTs and provided insights into the safety of NOAC use and treatment persistence.60–63

Patient surveys

Patient and health surveys are designed to collect health status, well-being, healthcare resource utilisation, treatment patterns, education and treatment preference information from patients, healthcare providers or the general population.2 32 They can be conducted in a variety of ways, but typically involve either online, interview or paper-based questionnaires.32 64–66 Patient surveys usually include a rigorous methodology for the collection of data and can provide unique information on the generalisability of treatments, their impacts at a patient level and adherence in the real world.2 With regard to NOACs, patient surveys have proved particularly useful in identifying factors that influence patient–physician decision-making, the need for improved patient education and adherence (including physician adherence to appropriate monitoring), and both patient and physician treatment preferences.32 64–66

Patient surveys, however, lack relevant data on specific treatments or products.2 Therefore, they cannot be used to evaluate the effectiveness or safety of treatments, other than by inference from patient or physician perceptions, and are susceptible to various forms of bias. These include recall bias, selection bias and subjectivity bias (eg, positive response bias, where a patient may respond in a way they believe to be appropriate rather than reporting their actual behaviour).2 32 Although efforts can be made to minimise these issues, such as validation of the questionnaire prior to the survey, question/response randomisation or validation of responses within the questionnaire itself,32 64 these biases should always be considered when interpreting the results of any survey.

Study overview and comparison with RCTs

The completed RWE studies listed in this review show consistent results in terms of effectiveness and safety with other studies investigating NOACs, including the findings from phase III RCTs (table 4). However, it is important to note that no direct comparisons between RWE studies and RCTs or other types of studies can be made.26 27

Table 4

RWE study overview for use of NOACs in routine clinical practice (excluding registries and claims databases)

Practical and procedural considerations

Overall, there are several key aspects to consider when evaluating and comparing data from RWE studies. Perhaps most important is whether the study design is appropriate for the study outcomes. As detailed in previous sections, different study designs lend themselves to different outcomes: RCTs are the gold standard for the evaluation of efficacy and for comparing treatments; claims database studies may be the most appropriate for healthcare utilisation; cross-sectional studies can be used to assess prevalence; case–control and electronic health record studies are useful for rare conditions; and patient registries can investigate the natural history of a condition. Real-world clinical safety and effectiveness can be assessed using various RWE study types (and there is an argument that these should be obtained to provide the most balanced overview for an intervention), but it appears that prospective, non-interventional studies and patient registries provide the most robust data on these outcomes.

It is also essential that the study population reflects the target population. For example, the interpretation of health claims database findings should consider the specificity of diagnosis codes and the breadth of insurance coverage.

It is vital to consider the sources of bias inherent in each study design when drawing conclusions from RWE studies and comparing the findings with previous studies. These are far more numerous and varied than those of RCTs, from the positive response bias in patient surveys to the selection bias and potential for data-mining in administrative and claims database studies. There are sophisticated statistical approaches that can adjust for bias, including logistic regression adjustment, inverse probability weighting, propensity (or other) matching, instrumental variable methods and panel data models.67 However, despite these approaches, RWE study data still do not meet the reliability and accuracy afforded by the methodological rigour of RCTs and do not provide a reliable way of comparing treatment strategies.2
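To illustrate one of the adjustment methods named above, the following is a minimal sketch of inverse probability weighting. It is not taken from any of the studies reviewed here: the cohort is entirely synthetic, the counts are invented, and the function names and the single binary confounder (`high_risk`) are assumptions made for illustration. The propensity score is estimated simply as the proportion of treated patients within each risk stratum, whereas a real analysis would typically use a regression model with many covariates.

```python
from collections import defaultdict


def make_cohort():
    """Build a synthetic cohort of (high_risk, treated, event) tuples.

    All counts are invented. Treatment has no effect on the event rate
    within either stratum, but low-risk patients are treated more often.
    """
    # (high_risk, treated, n_patients, n_events)
    cells = [
        (0, 1, 80, 8),   # low risk, treated: 10% events
        (0, 0, 20, 2),   # low risk, untreated: 10% events
        (1, 1, 20, 8),   # high risk, treated: 40% events
        (1, 0, 80, 32),  # high risk, untreated: 40% events
    ]
    cohort = []
    for high_risk, treated, n, events in cells:
        cohort += [(high_risk, treated, 1)] * events
        cohort += [(high_risk, treated, 0)] * (n - events)
    return cohort


def naive_risk_difference(cohort):
    """Unadjusted event risk in treated minus untreated patients."""
    def risk(arm):
        outcomes = [e for _, t, e in cohort if t == arm]
        return sum(outcomes) / len(outcomes)
    return risk(1) - risk(0)


def propensity_by_stratum(cohort):
    """Estimate P(treated | stratum) as the observed treated proportion."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for x, t, _ in cohort:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: n_t / n for x, (n_t, n) in counts.items()}


def ipw_risk_difference(cohort):
    """Risk difference after weighting each patient by the inverse of the
    estimated probability of receiving the treatment actually received."""
    ps = propensity_by_stratum(cohort)
    weighted_events = {0: 0.0, 1: 0.0}
    weight_totals = {0: 0.0, 1: 0.0}
    for x, t, e in cohort:
        p_received = ps[x] if t == 1 else 1 - ps[x]
        w = 1 / p_received
        weighted_events[t] += w * e
        weight_totals[t] += w
    return (weighted_events[1] / weight_totals[1]
            - weighted_events[0] / weight_totals[0])


cohort = make_cohort()
naive_rd = naive_risk_difference(cohort)
ipw_rd = ipw_risk_difference(cohort)
print(f"naive risk difference: {naive_rd:+.3f}")
print(f"IPW risk difference:   {ipw_rd:+.3f}")
```

In this constructed example the unadjusted comparison suggests that treatment lowers event risk by 18 percentage points, purely because treated patients are mostly low risk. After weighting, the difference is approximately zero. The correction works here only because the single confounder is fully measured, and weighting cannot remove bias from confounders that are not recorded.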

Lastly, one of the most common issues with RWE studies is the quality and consistency of data collection. When comparing data between studies, the type of data (eg, prospective, retrospective, adjudicated, non-interventional, database, survey), outcome definitions, choice of database codes, auditing and control measures, and methodologies to adjust for missing data, under-reporting, coding errors and other data quality problems should be aligned to enable robust comparisons. Overall, inferences can be drawn from observational and retrospective RWE studies, but this should be done with caution, and interpretations need to take account of the design, robustness and quality assurance of each study.


Conclusions

RWE studies can complement the findings from RCTs, provide valuable information on treatment practices and patient characteristics in a real-world setting, and are essential to the evidence base required for sound coverage and payment decisions.2 However, it is critical that the benefits, costs, limitations and methodological challenges associated with the different forms of RWE are carefully considered when interpreting the findings,2 as highlighted by guidance from regulatory and industry bodies.6 67 68 Moreover, despite the use of sophisticated statistical approaches to adjust for bias, there will always be residual confounding in comparative studies because of the absence of randomisation; therefore, results should be considered as hypothesis-generating and should not be viewed as a substitute for RCTs.

The registration of retrospective studies, in the same fashion as prospective trials, may help standardise data collection, interpretation and reporting, improve the consistency of methodological rigour across RWE studies, and help to bridge the gap between RCTs and clinical practice.


The authors would like to thank Stuart Wakelin for editorial assistance in the preparation of the manuscript.




  • Contributors AJC and KAAF both contributed to the drafting of the manuscript.

  • Funding Funding for editorial support was provided by Bayer AG.

  • Competing interests AJC has received research grants and speaker’s honoraria, and participated in scientific advisory boards for Bayer, Daiichi Sankyo, Bristol-Myers Squibb-Pfizer and Boehringer Ingelheim. KAAF has received grants and honoraria from Bayer and Janssen.

  • Patient consent Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.