Original research

Machine learning facilitates the prediction of long-term mortality in patients with tricuspid regurgitation

Abstract

Objective Tricuspid regurgitation (TR) is a prevalent valve disease associated with significant morbidity and mortality. We aimed to apply machine learning (ML) to assess risk stratification in patients with ≥moderate TR.

Methods Patients with ≥moderate TR on echocardiogram between January 2005 and December 2016 were retrospectively included. We used 70% of data to train ML-based survival models including 27 clinical and echocardiographic features to predict mortality over a 3-year period on an independent test set (30%). To account for differences in baseline comorbidities, prediction was performed in groups stratified by increasing Charlson Comorbidity Index (CCI). Permutation feature importance was calculated using the best-performing model separately in these groups.

Results Of 13 312 patients (mean age 72 ± 13 years; 7406 (56%) women), 7409 (56%) had moderate, 2646 (20%) had moderate–severe and 3257 (24%) had severe TR. The overall performance of the three ML models for 1-year mortality was good (c-statistic 0.74–0.75). Performance varied between CCI groups (c-statistic 0.774 in the lowest CCI group and 0.661 in the highest CCI group). The performance decreased over 3-year follow-up (average c-index 0.78). The top 10 features contributing to these predictions varied slightly with CCI group; the top features included heart rate, right ventricular systolic pressure, blood pressure, diuretic use and age.

Conclusions Machine learning of common clinical and echocardiographic features can evaluate mortality risk in patients with TR. Further refinement of models and validation in prospective studies are needed before incorporation into clinical practice.

What is already known on this topic

  • In the context of high morbidity and mortality associated with tricuspid regurgitation (TR) and the growing practice of early intervention, novel scores and classifications have been developed to risk stratify patients.

What this study adds

  • In this large study of patients with ≥moderate TR, machine learning enabled assessment of all-cause mortality with good performance.

How this study might affect research, practice or policy

  • Machine learning-based prediction tools may help in risk stratification of patients with TR and identify candidates for early intervention, after further validation in prospective studies.

Introduction

Tricuspid regurgitation (TR) of mild or greater severity is estimated to be prevalent in 15%–18% of the general population.1 While some studies have shown that severe TR is associated with adverse outcomes (heart failure hospitalisation and cardiovascular mortality) in heterogeneous groups of patients,2 others have shown that the presence of TR of any severity is associated with adverse clinical outcomes, and that ≥moderate TR is an independent predictor of mortality.3 4 With many recent studies showing high morbidity and mortality associated with TR, novel scores and classifications have been developed to risk stratify patients.5–8 There continues to be a growing need to better risk stratify patients with chronic severe TR and identify features associated with mortality,9 especially with the recent emergence of data that may support a percutaneous transcatheter repair option for selected patients.10 We therefore sought to evaluate the role of machine learning (ML) including various clinical and echocardiographic variables to predict mortality in a large cohort of patients with ≥moderate (haemodynamically significant) TR.

Methods

A total of 13 312 adult patients with chronic ≥moderate TR who underwent echocardiography at Mayo Clinic between January 2005 and December 2016 were included. Additionally, data from 7138 patients from Mayo Clinic in Florida and Arizona were used for external validation of our ML model. This patient cohort was previously described.5 6 Patients with congenital heart disease, prior tricuspid valve intervention or no known follow-up were excluded.

Patient characteristics in the dataset included demographics, vitals, comorbidities including Charlson Comorbidity Index (CCI), echocardiographic variables and laboratory parameters. The relevant echocardiographic variables were extracted from the echocardiography report and included the following: measures of left ventricular (LV) end-diastolic dimension, LV ejection fraction, and variables included in diastolic function assessment (ratio of mitral early diastolic inflow (E) to mitral annulus early diastolic tissue Doppler velocity (e’) (E/e’)), information on right ventricular (RV) size and function, inferior vena cava size and severity of TR (qualitative and, when available, quantitative assessment)11 (online supplemental table 1). The demographics, comorbidities and laboratory reports were extracted from the electronic medical records.

Outcomes and analysis

The primary outcome was all-cause mortality. The vital status was retrieved from Mayo and Minnesota death records. For the primary analysis, patients not known to be deceased were censored at the last date of follow-up.

Data preprocessing

The dataset was randomly split into 70% training and validation (including hyperparameter tuning) and 30% as the test dataset for reporting results. The two datasets were independent of one another. Variables with >30% missingness were excluded. Remaining variables were imputed using 20 iterations of Multiple Imputation by Chained Equations12 implemented by the Python library statsmodels. Imputation was performed separately on the training and testing datasets to preserve the independence of the datasets. Categorical variables with more than two levels were recoded so that each level was encoded as a separate binary variable used in modelling.
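The non-imputation steps above can be illustrated with a minimal, standard-library-only sketch. It is not the study's code: the function names, the toy variables (`age`, `tr_severity`, `e_over_e_prime`) and the order of filtering relative to the split are assumptions made for illustration, and the chained-equations imputation step is deliberately omitted.

```python
import random

def split_train_test(rows, test_fraction=0.30, seed=0):
    """Randomly split rows into independent training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def drop_sparse_variables(rows, max_missing=0.30):
    """Remove variables whose fraction of missing (None) values exceeds max_missing."""
    keep = [
        name for name in rows[0].keys()
        if sum(row[name] is None for row in rows) / len(rows) <= max_missing
    ]
    return [{name: row[name] for name in keep} for row in rows]

def one_hot(rows, variable):
    """Encode each level of a categorical variable as its own 0/1 variable."""
    levels = sorted({row[variable] for row in rows if row[variable] is not None})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != variable}
        for level in levels:
            new_row[f"{variable}_{level}"] = int(row[variable] == level)
        encoded.append(new_row)
    return encoded

# Toy data (hypothetical): E/e' is missing in 50% of rows, age in 10%.
severities = ["moderate", "moderate-severe", "severe"]
patients = [
    {"age": 60 + i,
     "tr_severity": severities[i % 3],
     "e_over_e_prime": 10.0 if i % 2 == 0 else None}
    for i in range(10)
]
patients[0]["age"] = None

filtered = drop_sparse_variables(patients)   # drops e_over_e_prime, keeps age
encoded = one_hot(filtered, "tr_severity")   # three binary severity columns
train, test = split_train_test(encoded)      # 7 training rows, 3 test rows
```

In this sketch the remaining missing `age` value would still need to be imputed, separately within `train` and `test`, before modelling.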

Modelling

A fivefold cross-validation training scheme was used in which the training dataset was further divided into 70% for training and 30% for validation, repeated five times. Three survival model architectures were evaluated: penalised Cox proportional hazards regression, random survival forest (RSF) and extreme gradient boosting. Modelling was performed with the open-source Python frameworks lifelines v0.27.4 and scikit-survival 0.19.0.13 Hyperparameter tuning was used to optimise the top-performing example of each ML model in each family. Cox proportional hazards models were optimised for step size and penalty term, while survival forests were optimised for maximum features at each node and minimum samples per leaf node. Tree-based gradient boosting was optimised for the learning rate, sample rate and presence of dropout. All models were optimised using appropriate loss functions.
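As described, the resampling scheme (a 70%/30% division repeated five times) resembles repeated random sub-sampling. The sketch below illustrates that reading together with a simple grid search. The grid values are hypothetical, and `placeholder_score` is a stand-in for fitting a survival model on the training indices and returning its validation C-index; neither reflects the settings actually used in the study.

```python
import itertools
import random

def repeated_splits(n_samples, n_repeats=5, validation_fraction=0.30, seed=0):
    """Yield (train_indices, validation_indices) for repeated random 70/30 splits."""
    rng = random.Random(seed)
    n_val = round(n_samples * validation_fraction)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield sorted(indices[n_val:]), sorted(indices[:n_val])

# Hypothetical search grid for a survival forest (illustrative values only).
grid = {"max_features": [3, 5, 8], "min_samples_leaf": [5, 15, 50]}
candidates = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

def placeholder_score(params, train_idx, val_idx):
    """Stand-in for 'fit on train_idx, return the C-index on val_idx'."""
    return (-abs(params["max_features"] - 5)
            - abs(params["min_samples_leaf"] - 15) / 10)

def mean_validation_score(params, splits):
    """Average the validation score of one candidate over all splits."""
    scores = [placeholder_score(params, tr, va) for tr, va in splits]
    return sum(scores) / len(scores)

splits = list(repeated_splits(n_samples=100))
best = max(candidates, key=lambda p: mean_validation_score(p, splits))
```

Replacing `placeholder_score` with a real fit-and-evaluate function would turn this into a working tuning loop; the selection logic itself would not change.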

Statistical analysis

Data are presented as frequencies and percentages for categorical variables and either as mean (SD, SE of mean (SEM)) or median with IQR (Q1–Q3) for continuous variables. Model performance was evaluated on the test dataset using Harrell’s concordance index (C-index). The C-index between model predictions and actual survival was computed at 1 year, at 3 years and for overall survival. In the survival forest model, feature importance was evaluated in a permutation-based fashion as the mean decrease in accuracy on replacing a feature with random data sampled from a distribution similar to that of the original feature. This was achieved by shuffling the data14 using the eli5 0.11.0 package. The statistical significance of differences between the models was assessed using the DeLong test.15 Calibration plots and the Brier score were used to assess model calibration. Models were calibrated using regression-spline interpolation estimates that allow for non-proportional hazards and nonlinearity while taking censoring into account.16 This adaptive modelling of the observed data allows for a continuous calibration plot for a specific survival time. The ‘rms’ package was used to calibrate the performance of the predictive model by comparing predicted probabilities to observed outcomes.
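For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of Harrell's C-index for right-censored data. The `horizon` argument is an assumption about how a time-specific C-index (for example at 1 or 3 years) might be obtained, namely by counting only deaths that occur up to that time; the exact procedure used in the study is not stated.

```python
def concordance_index(times, events, risks, horizon=None):
    """Harrell's C-index for right-censored survival data.

    times   : observed follow-up time for each patient
    events  : 1 if death was observed, 0 if the patient was censored
    risks   : predicted risk score (higher means earlier expected death)
    horizon : if given, only deaths at or before this time anchor a pair
    """
    concordant = tied = comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is comparable only if the shorter time is an observed death.
        if not events[i] or (horizon is not None and times[i] > horizon):
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    tied += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable

# Toy example: five patients, follow-up in years.
times = [0.5, 2.0, 4.0, 6.0, 1.0]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.5, 0.1, 0.6]

overall_c = concordance_index(times, events, risks)              # 5 of 6 pairs
one_year_c = concordance_index(times, events, risks, horizon=1)  # 4 of 4 pairs
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.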

Results

Of 13 312 patients included in the final analyses, the mean age was 72 ± 13 years and 7406 (55.6%) were women. A total of 5359 (40.3%) patients had a diagnosis of atrial fibrillation, 4291 (32.2%) had ≥moderate left-sided valve disease, 6530 (49.0%) had pulmonary hypertension (RV systolic pressure >50 mm Hg on transthoracic echocardiogram), 6778 (50.9%) had a diagnosis of congestive heart failure and 7118 (53.5%) were on diuretics. The baseline characteristics of the overall cohort and of the groups stratified by CCI are presented in table 1. During a median follow-up of 3 (IQR: 2.0–9.9) years, 7773 patients (58.3%) died.

Table 1. Baseline characteristics

All-cause mortality

We used 28 demographic, clinical and echocardiographic variables in our modelling. The mean C-index of the different models for the primary outcome of all-cause mortality on the test dataset is presented in table 2. Conditional RSF ranked the highest among the models (mean (SEM) C-index: 0.755 (2.6×10⁻³)), followed by XGBoosted survival (mean (SEM) C-index: 0.747 (2.7×10⁻³)) and the Cox survival model (mean (SEM) C-index: 0.736 (2.9×10⁻³)). Overall, there was no significant difference in performance between the models (DeLong test p=0.25). Figure 1 shows a decrease in performance of the random forest model with increasing follow-up period. A similar trend was observed with the other models. The feature importance evaluated on the test dataset by the top-performing all-cause mortality model, that is, the conditional RSF model, is presented in figure 2. The top features from the other models are presented in online supplemental figure 1 (1A, Cox proportional hazards model; 1B, XGBoost model). In terms of calibration, the model performed well. The Brier score was 0.14±0.02, indicating a good fit. Additionally, the calibration slope was 0.94±0.03, suggesting the model was well calibrated. The calibration plot (online supplemental figure 2) visually demonstrates the model’s performance. This model was further validated on an external dataset, which showed similar performance measures (online supplemental table 1).
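To make the Brier score concrete, the sketch below computes a deliberately simplified version at a fixed time horizon. It assumes that patients censored before the horizon are simply excluded, which is a naive shortcut; censoring-aware estimators typically reweight patients instead, and the study's own calculation may differ. The function name and the toy data are hypothetical.

```python
def brier_score_at(horizon, times, events, predicted_death_prob):
    """Naive Brier score at a fixed horizon (lower is better).

    Patients censored before the horizon have unknown status and are
    skipped; this is a simplification, not a censoring-weighted estimator.
    """
    squared_errors = []
    for t, e, p in zip(times, events, predicted_death_prob):
        if t <= horizon and e:
            outcome = 1   # died by the horizon
        elif t > horizon:
            outcome = 0   # known to be alive at the horizon
        else:
            continue      # censored before the horizon
        squared_errors.append((p - outcome) ** 2)
    if not squared_errors:
        raise ValueError("no patients with known status at the horizon")
    return sum(squared_errors) / len(squared_errors)

# Toy example: predicted probability of death within 1 year.
times = [0.5, 2.0, 4.0, 0.8]
events = [1, 1, 0, 0]
probs = [0.8, 0.3, 0.1, 0.5]

score = brier_score_at(1.0, times, events, probs)  # mean of 0.04, 0.09, 0.01
```

A score of 0 indicates perfect predictions, whereas a constant prediction of 0.5 yields 0.25.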

Table 2. Performance of the machine learning algorithms to predict mortality
Figure 1

Mortality C-index of the random survival forest model as a function of follow-up time.

Figure 2

The permutation feature importance plot of the top features contributing to the machine learning model. RVSP, right ventricular systolic pressure; AST, aspartate aminotransferase; SBP, systolic blood pressure; CKD, chronic kidney disease; BMI, body mass index; LVEDD, left ventricular end-diastolic dimension; DBP, diastolic blood pressure; LVEF, left ventricle ejection fraction.

The top features associated with all-cause mortality (conditional RSF model) were age, body mass index, heart rate and blood pressure; comorbidities such as chronic kidney disease (CKD) and prior cardiac surgery; signs of congestion and hypoperfusion (diuretic use, hyponatraemia, aspartate transaminase (AST) and creatinine); and echocardiographic features such as RV systolic pressure, LV ejection fraction and LV end-diastolic dimension.
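The permutation approach behind such rankings can be sketched generically: shuffle one feature across patients, re-score the fixed model and record the mean drop in performance. The example below is an illustration only. It uses a toy rule and plain accuracy in place of a survival forest and C-index, it is not the eli5 implementation, and all names and data are hypothetical.

```python
import random

def permutation_importance(score_fn, rows, feature_names, n_repeats=10, seed=0):
    """Mean drop in score when each feature is shuffled across rows."""
    rng = random.Random(seed)
    baseline = score_fn(rows)
    importance = {}
    for name in feature_names:
        drops = []
        for _ in range(n_repeats):
            shuffled_values = [row[name] for row in rows]
            rng.shuffle(shuffled_values)
            # Copy each row with only the chosen feature replaced.
            permuted = [dict(row, **{name: v})
                        for row, v in zip(rows, shuffled_values)]
            drops.append(baseline - score_fn(permuted))
        importance[name] = sum(drops) / n_repeats
    return importance

def toy_accuracy(rows):
    """Score a fixed toy rule: predict death when age exceeds 70."""
    correct = sum(int(row["age"] > 70) == row["died"] for row in rows)
    return correct / len(rows)

# Toy cohort: outcome depends on age only; 'noise' is irrelevant.
cohort = [
    {"age": 55 + 2 * i, "noise": i % 3, "died": int(55 + 2 * i > 70)}
    for i in range(20)
]

importance = permutation_importance(toy_accuracy, cohort, ["age", "noise"])
```

Because the toy rule ignores `noise`, shuffling that feature leaves the score unchanged, whereas shuffling `age` lowers it.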

All-cause mortality divided by comorbid groups

Table 3 shows the performance of the models estimating 1-year survival in patient groups divided by CCI. The model performed best in group 1 (lowest comorbidity index), followed by groups 2 and 3. Table 4 shows the top 10 features in each of these groups. Features common to all groups included age, lung disease, vitals (heart rate, blood pressure), laboratory parameters (creatinine and sodium), diuretic usage and echocardiographic parameters (RV systolic pressure, LV ejection fraction, LV end-diastolic volume and stroke volume index).

Table 3. Model performance stratified by Charlson Comorbidity Index groups
Table 4. Feature performance plot stratified by Charlson Comorbidity Index groups

Discussion

Our study of 13 312 patients with ≥moderate TR has several important novel findings: (1) an ML-based algorithm had good performance in estimating mortality in patients with ≥moderate TR; (2) the top variables in the ML model associated with mortality were age, body mass index, vitals (heart rate and blood pressure), comorbidities such as CKD and prior cardiac surgery, signs of congestion (diuretic use, AST, creatinine and hyponatraemia) and the echocardiographic features RV systolic pressure, LV ejection fraction and LV end-diastolic dimension; (3) the accuracy of these models was moderately high, with a C-statistic of 0.75 for the best model.

TR is a prevalent valve disease associated with significant morbidity and mortality.2 11 17 Growing evidence suggests that referral for tricuspid valve surgery continues to be low and delayed.17–19 The current guidelines recommend isolated tricuspid valve surgery for severe TR with signs and symptoms of right-sided heart failure (class 2a recommendation).20 Isolated tricuspid valve surgery for severe TR did not change mortality, suggesting delayed referral when guidelines are followed.21 22 Similarly, there are wide practice variations in surgically treating less than severe TR at the time of mitral valve surgery, which can result in poor functional outcomes and increased mortality in a significant proportion of patients.23 24 Additionally, recent data suggest that transcatheter edge-to-edge repair of isolated severe TR is safe and leads to significant improvement in quality of life.10 To better understand the pathophysiology and to identify high-risk patients and the appropriate timing of intervention, novel classifications and risk prediction models have been developed recently.5–7 25

Our study of a large database of patients with ≥moderate TR suggests a role for ML in predicting outcomes in these patients. Prior studies have evaluated the role of ML-based risk stratification models to predict outcomes in patients with other valvular heart disease.5 26 Our models had slightly lower performance than some other studies,27 which reflects the heterogeneous nature of TR with many associated comorbidities and different pathophysiological subtypes,5 all of which were included in the current study to increase the applicability of the results. The performance was best early on and worsened with increasing follow-up duration, which suggests that associated comorbidities and age play a significant role in predicting all-cause mortality. The prediction was strongest in the low comorbidity index group, again suggesting that TR itself may be more important in predicting survival in those with fewer comorbidities. To study the complex interplay of comorbidities and echocardiographic features in patients with TR, novel risk scores (TRIO and TRI score) and phenotypes (cluster analysis) have been proposed.5–7 The current study further highlights the importance of investigating these relationships to identify the optimal candidates and timing for intervention in these patients after validation in prospective studies.

The key features associated with mortality in our study included age, body mass index, vitals (heart rate and blood pressure), comorbidities such as CKD and prior cardiac surgery, signs of congestion and hypoperfusion (diuretic use, AST, creatinine and hyponatraemia) and echocardiographic features such as RV systolic pressure, LV ejection fraction and LV end-diastolic dimension. These factors are similar to those identified in previous studies that evaluated factors associated with mortality using multivariate analyses.6 8 26 28–30 Previous studies have shown that age and other comorbidities such as coronary artery disease, lung disease, severe renal failure, haematological abnormalities like anaemia and thrombocytopenia, liver dysfunction with synthetic impairment, diuretic use, echocardiographic parameters such as LV systolic dysfunction (heart failure with reduced ejection fraction (HFrEF)), RV systolic function and RV systolic pressure, and vena cava width predict mortality in patients with TR.6 8 28–30 In the current study, we were able to evaluate a large number of both clinical and echocardiographic features together to predict mortality using ML.

Our study has some limitations, including its retrospective design and single-centre data with limited racial diversity, which may reduce the applicability of the results to the general population. Because this was a referral-centre study, data on heart failure hospitalisations and cause of mortality were not available. Additionally, owing to the retrospective nature of the study and the inclusion of patients over many years (starting in 2005), quantitative data on TR (regurgitant volume and effective regurgitant orifice area), different phenotypes of TR and quantitative markers of RV function such as tissue Doppler systolic velocity (s’), strain and 3-D RV ejection fraction were available only in a minority of patients and could not be included in the models. While some of the variables included in the model were qualitative, this situation mirrors the real-world scenario where qualitative assessments are often the primary means of evaluating TR, RV size and function. The balance between qualitative and quantitative data enhances the robustness of our model and broadens its real-life applicability, while also laying the foundation for capturing more quantitative data in future studies.

In conclusion, our simple machine learning-based model using common clinical and echocardiographic features can predict mortality in patients with ≥moderate TR with reasonable precision. This study highlights the role of ML models in predicting outcomes in ≥moderate TR; these models need to be refined by including novel markers of RV function and validated in larger prospective studies before incorporation into clinical practice. The current study also lays the foundation for future studies using deep learning of radiomic features from echocardiogram images and video clips in combination with the clinical features studied in this report.