Introduction
Hypertension (high blood pressure) is a common presentation and a leading risk factor for both stroke and coronary heart disease. It is also the largest contributor to both morbidity and mortality worldwide.1–3 In 2017, Public Health England estimated that 26.2% of the English population over the age of 16 were hypertensive.4 Its high prevalence and its associations with other long-term conditions make hypertension an important condition to consider when conducting epidemiological analyses.
Patient health information is now recorded digitally in electronic health records (EHRs), which are routinely updated throughout a person’s interaction with healthcare. EHRs contain a wealth of information recorded using both clinical codes and written notes (free text). One secondary use of EHRs is epidemiological research, owing to the large quantity of clinical information they hold, including diagnoses, tests, symptoms and prescriptions. However, the specific codes used to define populations in EHR data can differ between studies, potentially altering study outcomes.
Clinical codes are alphanumeric sequences that can be used to efficiently record clinical presentations and events. There are many clinical coding languages, which have slightly different structures and use cases. The International Statistical Classification of Diseases and Related Health Problems, first proposed by the WHO in 1948 and subsequently implemented in the healthcare systems of many countries, has been adopted across the world to record hospitalisation and cause of death.5 Read codes have been used in primary care by the UK National Health Service since 1985, although their use has been phased out since April 2020.6 7 The replacement, Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), is a highly comprehensive clinical terminology containing over 2.5 million unique terms. These terms describe not only diagnoses, symptoms, procedures, medications and patient characteristics but also the relationships between terms, such as whether two terms relate to the same organ system. SNOMED CT is used not only in primary care settings in the UK but also internationally.8 9
When conducting epidemiological research using EHRs, the population, exposures, outcomes and covariates must all be defined using lists of clinical codes relevant to the EHR database being used.10 These lists are termed ‘codelists’ and, when applied to the data, extract the exposures, covariates and outcomes of interest. Multiple different codes can be used to record the same event (especially in a terminology as comprehensive as SNOMED CT), and it is therefore common to use multiple codes and codelists to comprehensively identify factors of interest. Clinical knowledge of both the disease area and its clinical coding is essential when creating codelists for epidemiological research.
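As a minimal sketch of how a codelist is applied to coded EHR data, the Python example below flags patients who have at least one record whose code appears in one or more codelists. The record structure, the function name and every code shown are hypothetical placeholders chosen for illustration; they are not real SNOMED CT or Read codes and do not represent the codelists or methods of this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalEvent:
    """One coded entry in a patient's record (hypothetical structure)."""
    patient_id: str
    code: str
    event_date: str  # ISO 8601 date string


# Placeholder codelists: these are NOT real clinical codes.
HYPERTENSION_DIAGNOSIS = {"DX-0001", "DX-0002"}
HYPERTENSION_MONITORING = {"MON-0001"}


def patients_matching(events, *codelists):
    """Return the IDs of patients with at least one event in any codelist."""
    # Several codes, and several codelists, can describe the same condition,
    # so the codelists are combined before matching.
    codes = set().union(*codelists)
    return {event.patient_id for event in events if event.code in codes}


events = [
    ClinicalEvent("p1", "DX-0001", "2019-03-01"),
    ClinicalEvent("p2", "MON-0001", "2020-06-15"),
    ClinicalEvent("p3", "XX-9999", "2021-01-10"),
    ClinicalEvent("p1", "DX-0002", "2021-02-02"),
]

narrow = patients_matching(events, HYPERTENSION_DIAGNOSIS)
broad = patients_matching(events, HYPERTENSION_DIAGNOSIS, HYPERTENSION_MONITORING)
```

In this toy data the narrow definition selects only patient p1, whereas adding the monitoring codelist also selects p2, which illustrates how the choice of codes can change the identified population.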
It has long been suggested that observational research should report its clinical coding and phenotyping details transparently, and this recommendation has been incorporated into guidelines. However, how often this information is reported for individual risk factors is itself rarely reported, even though its importance has been repeatedly highlighted.11 The REporting of studies Conducted using Observational Routinely collected Data (RECORD) checklist, created to complement the established Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines, describes the information that should be included when using EHRs to support reproducibility and interpretability. In particular, RECORD item 6.1 states that “The methods of study population selection (such as codes or algorithms used to identify subjects) should be listed in detail”, while RECORD item 7.1 states that “A complete list of codes and algorithms used to classify exposures, outcomes, confounders, and effect modifiers should be provided. If they cannot be reported, an explanation should be provided”.12 13
The objective of this study was to systematically identify codelists used to define hypertension in observational studies that use EHR data and generate recommended hypertension codelists to support reproducibility and consistency of epidemiological research in hypertension.