Original research
Enhanced detection of severe aortic stenosis via artificial intelligence: a clinical cohort study
  1. Geoff Strange1,2,
  2. Simon Stewart3,4,
  3. Andrew Watts5 and
  4. David Playford6
  1. 1 Cardiology, Heart Research Institute Ltd, Newtown, New South Wales, Australia
  2. 2 The University of Notre Dame Australia, School of Medicine, Fremantle, Western Australia, Australia
  3. 3 Institute for Health Research, The University of Notre Dame Australia, Fremantle, Western Australia, Australia
  4. 4 School of Medicine, Dentistry & Nursing, University of Glasgow, Glasgow, UK
  5. 5 Echo IQ Pty Ltd, Sydney, New South Wales, Australia
  6. 6 School of Medicine, The University of Notre Dame Australia, Fremantle, Western Australia, Australia
  1. Correspondence to Dr Geoff Strange; gstrange{at}neda.net.au

Abstract

Objective We developed an artificial intelligence decision support algorithm (AI-DSA) that uses routine echocardiographic measurements to identify severe aortic stenosis (AS) phenotypes associated with high mortality.

Methods 631 824 individuals with 1.08 million echocardiograms were randomly split into two groups. Data from 442 276 individuals (70%) entered a Mixture Density Network (MDN) model to train an AI-DSA to predict an aortic valve area <1 cm2, excluding all left ventricular outflow tract velocity or dimension measurements and then using the remainder of echocardiographic measurement data. The optimal probability threshold for severe AS detection was identified at the f1 score probability of 0.235. An automated feature also ensured detection of guideline-defined severe AS. The AI-DSA’s performance was independently evaluated in 184 301 (30%) individuals.

Results The area under receiver operating characteristic curve for the AI-DSA to detect severe AS was 0.986 (95% CI 0.985 to 0.987) with 4622/88 199 (5.2%) individuals (79.0±11.9 years, 52.4% women) categorised as ‘high-probability’ severe AS. Of these, 3566 (77.2%) met guideline-defined severe AS. Compared with the AI-derived low-probability AS group (19.2% mortality), the age-adjusted and sex-adjusted OR for actual 5-year mortality was 2.41 (95% CI 2.13 to 2.73) in the high probability AS group (67.9% mortality)—5-year mortality being slightly higher in those with guideline-defined severe AS (69.1% vs 64.4%; age-adjusted and sex-adjusted OR 1.26 (95% CI 1.04 to 1.53), p=0.021).

Conclusions An AI-DSA can identify the echocardiographic measurement characteristics of AS associated with poor survival (with not all cases guideline defined). Deployment of this tool in routine clinical practice could improve expedited identification of severe AS cases and more timely referral for therapy.

  • Echocardiography
  • Aortic Valve Stenosis
  • Translational Medical Research

Data availability statement

Data are available upon reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Many people with moderate-to-severe aortic stenosis (AS) are at risk of dying without timely evidence-based care.

WHAT THIS STUDY ADDS

  • We developed and tested an artificial intelligence decision support algorithm to detect the phenotype associated with severe AS, in addition to clinical guideline-defined cases of severe AS from their routine echocardiographic report. The algorithm rapidly identified these high-risk cases with excellent performance and identified patients with a high risk of death.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • This algorithm has the capacity to be uniformly applied as an automated alert system for high-risk patients with AS in routine clinical practice.

Introduction

Affecting millions worldwide, aortic stenosis (AS) is the most common, acquired form of valvular heart disease managed in clinical practice.1 When left untreated, there are substantial societal costs attributable to high rates of premature mortality and quality-adjusted life years lost across the entire spectrum of disease.2 This is important when considering the long-held concept that only severe symptomatic cases of AS should be referred to a specialist heart care team for aortic valve replacement.3–5 Consistent with a broader phenotype of ‘high-risk’ AS (encompassing individuals undergoing mal-adaptive changes to their cardiac structure/function in response to their failing aortic valve),6 there is increasing evidence that both asymptomatic severe and moderate AS are also associated with high rates of mortality.3 4 7 Accordingly, the capacity to improve survival rates in such individuals via interventional strategies is being tested in prospective randomised trials.8

Beyond recognising the prognostic implications of all forms of AS, there is increasing clinical tension to apply optimal and timely strategies to rapidly detect and definitively treat high-risk individuals.9 However, even within the most well-resourced healthcare settings, this is problematic.10 Consequently, the potential for machine learning/artificial intelligence (AI) systems to systematise the detection, treatment and prognostication of AS is being explored via the synthesis of directly acquired and reported clinical data (including from cardiac auscultation,11 mechanical sensors,12 echocardiography,13 14 computed axial tomography/MRI15 and electronic medical records16).

We evaluated the performance of a newly refined artificial intelligence decision support algorithm (AI-DSA)13 to identify adults with more severe forms of AS associated with high rates of mortality. The reference standard for the performance of the AI-DSA was severe AS as defined by current guidelines.17 18 We further sought to explore the capacity of the AI-DSA to identify those with characteristic moderate-to-severe AS and elevated risk of mortality.7 We also sought to determine the ‘minimum’ echocardiographic parameters routinely reported in clinical practice required by the AI-DSA to operate efficiently.

Methods

The data that support the findings of this study are available from the corresponding author on reasonable request.

Study design

In this retrospective, patient cohort and outcome analysis, we evaluated the performance of a stand-alone AI-DSA to detect individuals with high probability of the severe AS phenotype.17 18 The AI-DSA scrutinised the same echocardiographic information used to evaluate potential heart disease as part of routine clinical practice. The subsequent pattern of (actual) 5-year all-cause mortality according to the AI-DSA outputs (including a feature to automatically identify guideline-defined severe AS within the AI-DSA-identified population) was then compared independently. Where appropriate, this study conforms to the ‘Standards for Reporting Diagnostic accuracy studies’ guidelines.19

Data sources

As described in greater detail previously,2 7 9 the National Echo Database of Australia (NEDA) is a vendor-agnostic, source-neutral database containing measurement and text outputs from multiple echocardiographic laboratories Australia-wide. Source data are derived from individuals being routinely investigated/managed with heart disease within Australia’s multicultural population via its well-resourced, universal healthcare system. Only those records with insufficient demographic details to enable highly secure/anonymised individual data linkage to the well-validated National Death Index20 were excluded.

For these analyses, we used the latest version of the mortality-linked database containing 1 077 145 studies from 631 824 individuals aged ≥18 years during the period from 29 May 1985 to 26 June 2019. Individual all-cause mortality was established during a median (IQR) of 4.3 (2.3–7.3) years follow-up (from last echocardiogram to a census date of 21 May 2019). A total of 280 individual echocardiographic variables (online supplemental material pp.1–11), representing base measurements and calculations as part of a standard echocardiography examination, were provided for the AI-DSA training and validation. Mortality status was not provided and did not form part of the AI-DSA training. Individuals with prior aortic valve replacement were excluded.

Development of the AI model

The data set was split (ratio 70:30) into two separate groups, one for model training and the other for test/validation of the trained model (online supplemental material pp.12–18). As part of a six-step process, a modified Mixture Density Network (MDN) was used to train on the 442 276 individuals (70%) and their 754 503 echocardiograms randomly assigned to the training set. As part of this development process, we were able to directly address the sparsely filled data sets typical in clinical echocardiography. Critically, the left ventricular outflow tract (LVOT) data relevant to the continuity equation (specifically, all velocity, gradient, velocity time integral, dimension and area) were withheld from the AI-DSA test set model. The MDN model was then used to predict the probability of severe AS (after being trained on the entire echo) defined by an aortic valve area (AVA) <1 cm2. The trained model was designed to be general purpose and can perform inference using arbitrary sets of available measurements.
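The individual-level 70:30 split described above can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function name, the seed and the synthetic study records are hypothetical. It assumes the random assignment is made per individual, so that all echocardiograms from one person land in the same partition.

```python
import random

def split_individuals(person_ids, train_fraction=0.7, seed=0):
    """Randomly assign each individual (not each echocardiogram) to train or test."""
    unique_ids = sorted(set(person_ids))
    rng = random.Random(seed)          # fixed seed (hypothetical) for reproducibility
    rng.shuffle(unique_ids)
    n_train = round(train_fraction * len(unique_ids))
    return set(unique_ids[:n_train]), set(unique_ids[n_train:])

# Hypothetical records: (person_id, study_id); some people have several studies.
studies = [(pid, f"echo-{pid}-{k}") for pid in range(1000) for k in range(1 + pid % 3)]

train_ids, test_ids = split_individuals([pid for pid, _ in studies])

# Every study follows its owner, so no individual appears in both partitions.
train_studies = [s for s in studies if s[0] in train_ids]
test_studies = [s for s in studies if s[0] in test_ids]
```

Splitting by person rather than by study avoids leaking repeat echocardiograms from the same individual into the test set.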

Development of the guideline-quarantined patients

Subsequently, a guideline analysis was developed using the current AS diagnostic guidelines to quarantine patients with severe AS from the AI-DSA-derived phenotype if the peak velocity was ≥4.0 m/s, the mean gradient was ≥40 mm Hg and/or the AVA was ≤1.0 cm2.
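As a rough sketch of how such a guideline rule could be expressed, the function below flags a study when any one of the three stated criteria is met. Reading "and/or" as "any criterion suffices", and treating missing measurements as not meeting a criterion, are assumptions made for illustration; the function and parameter names are hypothetical.

```python
from typing import Optional

def meets_guideline_severe_as(peak_velocity_m_s: Optional[float],
                              mean_gradient_mmhg: Optional[float],
                              ava_cm2: Optional[float]) -> bool:
    """Return True if any available measurement meets a severe AS criterion."""
    checks = [
        peak_velocity_m_s is not None and peak_velocity_m_s >= 4.0,    # peak velocity >= 4.0 m/s
        mean_gradient_mmhg is not None and mean_gradient_mmhg >= 40.0, # mean gradient >= 40 mm Hg
        ava_cm2 is not None and ava_cm2 <= 1.0,                        # AVA <= 1.0 cm2
    ]
    return any(checks)

# Hypothetical examples
high_velocity = meets_guideline_severe_as(4.2, None, None)
unremarkable = meets_guideline_severe_as(3.0, 25.0, 1.4)
```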

Evaluating the model

Once developed, the model’s performance was independently evaluated using data from the remaining 189 548 (30%) individuals and their 322 642 echocardiograms (online supplemental material pp.19–21) that had never been seen by the AI-DSA during training and were linked to mortality outcomes. As an initial diagnostic, selected groups of related measurements were withheld, and the AI predictions were evaluated against known values. These results indicated that the predicted measurements had minimal bias and surprisingly low error bounds considering the heterogeneous nature of the data and that key information (ie, LVOT data) had been intentionally removed.

Testing the AI-DSA

Using the Mixture Density Network (MDN)-predicted output for AVA:

$$\mathrm{AVA} = \frac{\pi\,(\mathrm{LVOT}_{d})^{2}}{4} \times \frac{\mathrm{LVOT}_{\mathrm{VTI}}}{\mathrm{AV}_{\mathrm{VTI}}}$$

where LVOT=left ventricular outflow tract, AV=aortic valve, d=dimension and VTI=velocity time integral, and using the cumulative distribution function to determine what percentage of the distribution falls below the severe AS threshold, we calculated the probability of an AVA <1 cm2 (online supplemental material pp.22–27). Curves for receiver operating characteristic (ROC) and areas under the ROC (AUROC) were then generated to evaluate the performance of the severe AS classifier, and the probability threshold value was adjusted for a maximum f1-score calculation. Analyses were conducted on the entire test group and then at left ventricular ejection fraction (EF) thresholds of <30% (4203 cases) and <50% (18 799 cases). For testing the performance of the AI-DSA, mortality outcome data were added to the analysis database.
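To make the two calculations concrete, the sketch below evaluates the continuity equation and then the probability that the AVA lies below 1 cm2 under a predicted mixture distribution. It assumes, purely for illustration, that the MDN output is a mixture of Gaussian components (the source does not state the component family); all parameter values and function names are hypothetical.

```python
import math

def continuity_ava(lvot_d_cm, lvot_vti_cm, av_vti_cm):
    """Continuity equation: AVA = (pi * LVOTd^2 / 4) * (LVOT VTI / AV VTI), in cm2."""
    return (math.pi * lvot_d_cm ** 2 / 4.0) * (lvot_vti_cm / av_vti_cm)

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prob_ava_below(threshold_cm2, weights, means, sds):
    """P(AVA < threshold) under a Gaussian mixture (assumed form of the MDN output)."""
    return sum(w * normal_cdf(threshold_cm2, m, s)
               for w, m, s in zip(weights, means, sds))

# Hypothetical measurements: 2.0 cm LVOT diameter, VTIs of 20 cm and 80 cm.
ava = continuity_ava(2.0, 20.0, 80.0)

# Hypothetical two-component mixture predicted for one echocardiogram.
p_severe = prob_ava_below(1.0, weights=[0.6, 0.4], means=[0.9, 1.4], sds=[0.1, 0.2])
```

In this toy case `p_severe` would be compared against the f1-derived probability threshold to decide whether the study is flagged.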

Individual classification according to AI-DSA outputs

As shown in figure 1, after excluding those with an AV replacement, the overall performance of the AI-DSA, both in terms of determining the severity of AS17 18 at their last echocardiogram and subsequent survival, was assessed in 184 301 individuals aged >18 years. Three main groups (incorporating five subgroups) were identified from the AI-DSA outputs. The first subgroup comprised those with insufficient data (age, body surface area, aortic valve peak velocity and EF) to enable the AI-DSA to produce a probability output. To reflect real-world clinical practice, these were automatically defaulted into the main ‘low-probability’ group. Those individuals with sufficient data, according to the f1-derived threshold of 0.235 (below and above), were initially categorised as ‘low’ or ‘high’ probability of the severe AS phenotype identified by the AI. Within the ‘low’ probability group, a third main group was prospectively derived from those with sufficient data and a probability score below the f1-derived threshold. Specifically, applying the 98.25th to 98.50th percentile of the probability distribution below the f1-derived threshold, a probability range of patients with ‘moderate-to-severe’ AS was identified (probability >0.0625 to <0.235). The final safety function then automatically identified guideline-based severe AS.17 18 The AI-DSA was then tested against the 58 170 individuals with a native AV, with guideline-applicable AS data reported.
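The grouping logic described above might be sketched as below. The two thresholds come from the text; the label strings, the handling of a probability exactly equal to 0.235 (treated here as high probability) and the choice to apply the guideline check first are assumptions made for illustration.

```python
from typing import Optional

F1_THRESHOLD = 0.235                   # f1-derived threshold reported in the text
MODERATE_TO_SEVERE_THRESHOLD = 0.0625  # lower bound of the moderate-to-severe range

def classify(probability: Optional[float], guideline_severe: bool = False) -> str:
    """Map an AI-DSA probability (or its absence) to an output group."""
    if guideline_severe:
        # Assumed ordering: the guideline safety function takes precedence.
        return "severe (guideline-defined)"
    if probability is None:
        # Insufficient data defaults to the low-probability group.
        return "low probability (insufficient data)"
    if probability >= F1_THRESHOLD:
        return "high probability severe"
    if probability > MODERATE_TO_SEVERE_THRESHOLD:
        return "moderate-to-severe"
    return "low probability"

# Hypothetical probabilities
labels = [classify(p) for p in (None, 0.01, 0.10, 0.50)]
```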

Figure 1

Flow chart of the training, evaluation, and testing of the AI-DSA to detect severe forms of AS. This schema shows the distribution of cases/investigations from the NEDA cohort used to train and evaluate the AI-DSA (investigation-based, no mortality data) and then test/assess its performance in accurately detecting the severe form of aortic stenosis associated with high 5-year mortality (individual-based). The highest F1 score was chosen as the probability at the peak of the precision/recall relationship, corresponding to a probability of 0.235. The moderate-to-severe aortic stenosis group corresponds to a probability output of >0.0625 (98.25 to 98.5 percentile of probability spectrum below the f1-derived threshold). AI-DSA, Artificial Intelligence Decision Support Algorithm; AS, aortic stenosis; AV, aortic valve; BSA, body surface area; F/U, follow-up; LVEF, left ventricular ejection fraction; LVOT, left ventricular outflow tract; NEDA, National Echo Database of Australia; pct, percentile.

Statistical analyses

Given the size and scope of the National Echo Database of Australia (NEDA) cohort (>500 000 individuals with >1 million echocardiograms) collated on a consecutive basis, no formal power calculations were performed. Beyond those analyses described above, standard methods for describing grouped data, including means (±SD), median (IQR) and proportions (with 95% CIs), were performed. Between-group comparisons included ANOVA, Student’s t-test and χ2 analyses where appropriate. Actual 5-year mortality was calculable in 104 204 cases with complete 5-year follow-up. For each predetermined group identified, multiple logistic regression (entry model) was used to generate age-adjusted and sex-adjusted OR with 95% CI for 5-year mortality with the lowest probability group (as determined by the AI-DSA) set as the reference group. The same method was repeated for age-specific OR for men and women separately. All descriptive and survival analyses were performed with SPSS V.28.0 (IBM Corporation, Chicago, Illinois) and statistical significance accepted at a two-sided alpha of <0.05.

Results

The performance of our AI-DSA to detect severe AS is shown in figure 2. The AUROC (95% CI) was close to one overall (0.986 (0.985 to 0.987)) and among those with impaired (LVEF <50%: 0.986 (0.984 to 0.988)) to severely impaired (LVEF <30%: 0.981 (0.975 to 0.986)) left ventricular systolic function. The subsequent precision-recall of the AI-DSA to detect severe AS in those with more complete echocardiographic reports is shown in figure 3. The AUPR (95% CI) was close to 0.9 overall (0.876 (0.869 to 0.883)) and among those with impaired (LVEF <50%: 0.904 (0.892 to 0.915)) to severely impaired (LVEF <30%: 0.897 (0.871 to 0.920)) left ventricular systolic function. An f1-derived threshold (the harmonic mean of precision/recall) based on the probability output of the AI-DSA was identified at 0.235 (figure 4). Among the 184 301 individuals comprising the test cohort, 80 971 (52.3%) did not have the minimum data to produce a probability output. Minimum data to allow for AS guideline application were also not available in 128 228 individuals (66.9%), noting that this predominantly comprised echocardiograms without AS: most echos with insufficient data to calculate the AVA had AV peak velocity ≤2 m/s (in 62 537 individuals, 55.9%). Notably, however, a small but important proportion of individuals with elevated AV peak velocities (≥3 m/s, 1356 individuals, 28.1%) had insufficient data to calculate the AVA using the continuity equation (n=1264 (26.2%) without LVOT peak velocity and/or LVOT diameter and n=1984 (41.1%) without AV VTI and/or LVOT VTI and/or LVOT diameter). All of these individuals had an AS probability determined by the AI.

Figure 2

Performance of the model to detect severe AS. This graph shows the performance of the model underpinning the AI-DSA to identify an aortic valve area of <1.0 cm2. AI-DSA, Artificial Intelligence Decision Support Algorithm; AS, aortic stenosis; FPR, false positive rate; LVEF, left ventricular ejection fraction; NEDA, National Echo Database of Australia; TPR, true positive rate.
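For readers less familiar with the AUROC, one common way to compute it is as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The sketch below uses that pairwise definition on hypothetical labels and scores; it is not the authors' evaluation code and does not reproduce the reported confidence intervals.

```python
def auroc(y_true, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = AVA < 1 cm2, 0 = otherwise.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.60, 0.90]
area = auroc(y_true, scores)
```

The nested loop is quadratic in the number of cases, so a rank-based implementation would be preferable at the scale of the cohort described here.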

Figure 3

Precision-recall performance of the model to detect severe AS. This graph shows the precision values (true positives/(true positives+false positives)) on the y-axis and recall values (true positives/(true positives+false negatives)) on the x-axis derived from the AI-DSA output. AI-DSA, Artificial Intelligence Decision Support Algorithm; AS, aortic stenosis; LVEF, left ventricular ejection fraction; NEDA, National Echo Database of Australia.

Figure 4

Probability threshold of the model to detect severe AS. This graph shows the plots used to determine the F1-derived threshold based on the harmonic mean of precision and recall of the AI-DSA to detect severe aortic stenosis (main red dotted line) overall (F1-derived probability threshold 0.235) and in those with a left ventricular ejection fraction <50% and <30%. It also shows the 0.0625 probability threshold (short black dotted line) for identifying the ‘moderate aortic stenosis’ group. AI-DSA, Artificial Intelligence Decision Support Algorithm; AS, aortic stenosis; LVEF, left ventricular ejection fraction; NEDA, National Echo Database of Australia.
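Choosing a probability threshold by maximising the F1 score (the harmonic mean of precision and recall) can be sketched as a simple sweep over candidate thresholds. The labels and scores below are hypothetical, as are the function names; the sweep over observed score values is one simple choice and not necessarily the procedure used to arrive at 0.235.

```python
def precision_recall_f1(y_true, scores, threshold):
    """Precision, recall and F1 when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def best_f1_threshold(y_true, scores):
    """Return the observed score that maximises F1 when used as the threshold."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: precision_recall_f1(y_true, scores, t)[2])

# Hypothetical data: 1 = AVA < 1 cm2, 0 = otherwise.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.60, 0.90]

threshold = best_f1_threshold(y_true, scores)
precision, recall, f1 = precision_recall_f1(y_true, scores, threshold)
```

In this toy data the sweep settles on 0.20, where recall is 1.0 and precision is 0.75; a lower threshold adds false positives and a higher one starts missing true cases.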

Table 1 summarises the demographic and echocardiographic characteristics of the three main groups identified within the test cohort, comprising 177 073 individuals (96.1%) designated as low probability of severe AS (including 80 971 with a definitive AI-DSA probability score), 2606 (1.4% overall) identified as ‘increased risk’ of the moderate-to-severe AS phenotype and 4622 (2.5% overall) identified as severe AS. Of the latter, 3566 (77.2%) had severe AS according to clinical guidelines, with the AI-DSA identifying all such cases when possible.

Table 1

Distribution of the test cohort according to the AI-DSA outputs

Overall, there were statistically significant differences (p<0.001 all comparisons) between the three main groups, with the exception of heart rate and left ventricular diastolic/systolic diameter. Low probability AS cases identified by the AI-DSA were significantly younger and comprised more women than those with a high probability of moderate-to-severe and severe AS, while demonstrating more normal cardiac parameters. Conversely, the individuals the AI-DSA identified as high probability severe AS (the oldest, male predominant group) had appropriately high levels of AV, left ventricular and right ventricular dysfunction. Those identified with high probability moderate-to-severe AS also had high levels of cardiac dysfunction, but to a lesser extent. The specific comparison between those identified by the AI-DSA as a high probability of severe AS, versus those who met guideline criteria for severe AS, revealed minor, but statistically significant differences between the two in respect to AV function, indices of diastolic dysfunction and evidence of left ventricular remodelling (all worse in the guideline group). Overall, those identified as high probability of the severe AS phenotype by AI, but outside guidelines for severe AS, had findings typically found at the higher spectrum of currently reported moderate AS.

Actual 5-year mortality was 67.9% and 56.2% among the 1896 and 903 individuals identified as high probability severe and moderate-to-severe AS, respectively. This compared with a mortality rate of 22.9% in the low probability group; the age-adjusted and sex-adjusted OR (95% CI) for all-cause mortality was 1.82 (1.63 to 2.02) and 2.80 (2.57 to 3.06) for the moderate-to-severe and severe AS groups, respectively (figure 5). Within the low probability group, 5-year mortality was significantly lower in those with a specific AI-DSA probability output (9068/47 345 (19.2%)) than in those with insufficient data (13 452/51 578 (26.3%)); age-adjusted and sex-adjusted OR 1.60 (95% CI 1.55 to 1.65, p<0.001). Within the high probability group, 5-year mortality was significantly higher in those who met guideline criteria for severe AS (1438/2081 (69.1%)) compared with those identified by the AI-DSA alone (458/711 (64.4%)); age-adjusted and sex-adjusted OR 1.26 (95% CI 1.04 to 1.53, p=0.021) (figure 6). On a sex-specific basis, relative to the low probability group, the age-adjusted OR (95% CI) for 5-year all-cause mortality was consistently higher for the 49 120 women versus 54 160 men categorised as moderate-to-severe AS (2.03 (1.74 to 2.36) vs 1.63 (1.40 to 1.91)) and severe AS (3.00 (95% CI 2.68 to 3.40) vs 2.56 (95% CI 2.32 to 2.91)).
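As a simple cross-check on the direction and rough size of such comparisons, a crude (unadjusted) odds ratio with a Wald 95% CI can be computed directly from the reported counts. The sketch below does this for the guideline-defined versus AI-DSA-identified severe AS groups. It is not the multiple logistic regression used in the study, so the result is expected to differ somewhat from the age-adjusted and sex-adjusted OR of 1.26; the function name is hypothetical.

```python
import math

def crude_odds_ratio(deaths_a, total_a, deaths_b, total_b, z=1.96):
    """Unadjusted OR of death (group A vs group B) with a Wald confidence interval."""
    a, b = deaths_a, total_a - deaths_a   # group A: died, survived
    c, d = deaths_b, total_b - deaths_b   # group B: died, survived
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts reported in the text: guideline-defined 1438/2081 vs AI-DSA-identified 458/711.
or_crude, ci_low, ci_high = crude_odds_ratio(1438, 2081, 458, 711)
```

With these counts the crude OR comes out at roughly 1.24, close to but not identical with the adjusted estimate.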

Figure 5

Actual 5-year all-cause mortality according to three main outputs from the AI-DSA. This graph shows the 5-year actual mortality curves (all-cause and with no censoring of cases) for the three main output groups from the AI-DSA, with low probability individuals being the reference group. The ORs for age and sex were 1.08 (95% CI 1.08 to 1.08 per annum) and 1.50 (95% CI 1.46 to 1.55 for men vs women). AI-DSA, Artificial Intelligence Decision Support Algorithm.

Figure 6

Actual 5-year all-cause mortality in the two severe AS groups (AI-DSA identified vs guidelines). This graph shows the 5-year actual mortality curves (all-cause and with no censoring of cases) for the two output groups identified by the AI-DSA as severe aortic stenosis—according to clinical guideline criteria (black line) or otherwise (red line—reference group). The ORs for age and sex were 1.09 (95% CI 1.08 to 1.10 per annum) and 1.28 (95% CI 1.08 to 1.53 for men vs women). AI-DSA, Artificial Intelligence Decision Support Algorithm; AS, aortic stenosis.

We examined the potential importance of a simple measure of the AVA compared with the probability assessment in a series of sensitivity analyses. Of the 29 102 individuals with actual 5-year mortality follow-up data available, 1566 individuals had a calculated AVA <1 cm2, 898 had an AVA 1.0 cm2 to <1.2 cm2, 1738 had AVA 1.2 cm2 to <1.5 cm2, 4348 had an AVA 1.5 cm2 to <2 cm2 and 20 552 had an AVA >2 cm2. As expected, each AVA group had an elevated 5-year mortality, with the highest risk in the lowest AVA group. However, the probability score transcended the AVA by continuing to predict mortality beyond each AVA group, providing an additional age-adjusted and sex-adjusted hazard of 1.40 (1.06 to 1.86), p=0.019.

In a separate sensitivity analysis, we examined whether the AI probability could predict mortality risk in low-flow AS beyond the AVA alone. Of 341 patients with a recorded Stroke Volume Index below 35 mL/m2 and AVA <1 cm2, 259 individuals (76.0%) had died within 5 years. After adjustment for age and sex, the AI probability continued to be independently highly associated with mortality (HR 1.98, 95% CI 1.13 to 3.46, p=0.016), whereas low-flow low-gradient severe AS was not independently associated with mortality (HR 1.29, 95% CI 0.89 to 1.87, p=0.17). These data confirm that the probability score is not confounded by the severity or type of AS.

As a final analysis to determine whether the AI-DSA is simply mimicking expanded thresholds of risk based on AVA and Vmax (when available), we generated five new moderate-to-severe AS subgroups. These groups demonstrated the expected small increments in 5-year mortality, with the severe AS subgroup of AVA <1.0 cm2 (and Vmax >3.2 m/s) demonstrating the worst survival (see online supplemental table S1, figure S1–S3). The moderate (55.4%) and severe AS (65.4%) subgroups demonstrated the expected adverse 5-year mortality, with the comparator group experiencing 23.4% 5-year mortality (online supplemental figure S2). We also confirmed the high 5-year mortality for the two guideline-derived severe AS subgroups in addition to those without an AVA available (AVA <1.2 cm2 and Vmax >4.0 m/s, AVA <1.0 cm2 and Vmax >3.2 m/s, and no AVA and Vmax>3.2 m/s, see online supplemental figure S3).

Discussion

To our knowledge, this is the largest study to train and test the performance of an AI-DSA to interpret routinely acquired echocardiographic reporting data to transcend the more traditional AVA calculation and identify the phenotype of more severe AS associated with high mortality. In almost 90 000 individuals with age, BSA, peak aortic velocity and LVEF recorded, the AI-DSA identified 5.2% as ‘high-probability’ for severe AS. Overall, 22.8% of these cases were not guideline-defined severe AS. AUROC and precision-recall analyses consistently demonstrated excellent performance values (close to one) for all individuals, including those with impaired left ventricular systolic function. Compared with the low-probability group, the age-adjusted and sex-adjusted odds of 5-year mortality were 2.41-fold higher among those categorised as high-probability severe AS (67.9% mortality). Our methodology is novel and meets the growing calls for automated echocardiographic reporting, to identify patients reliably and consistently with AS phenotypes that may benefit from early clinical review and consideration for AV intervention.21 22

Recently, a range of AI techniques have been applied to expedite the diagnosis of (predominantly severe) AS and improve its prognostication following AV replacement. In routine clinical practice, simple cardiac auscultation augmented by a digital AI-assisted stethoscope has potential to identify those requiring cardiac imaging.11 A similar approach has been explored in relation to AI detection of characteristic T wave changes on surface electrocardiograms.23 Novel technology involving non-invasive inertial sensors12 has also shown distinct phenotypes can be identified using invasive haemodynamics and cardiac imaging, which strongly predict future clinical outcomes.6 22 Previously, we have demonstrated that the relationship between echocardiographic measurement variables could be automatically associated together to predict severe AS.13 This approach has been subsequently refined by applying newer machine learning techniques and clinical practice guideline criteria.17 18 Collectively, these studies have revealed the progressive nature of AV disease (involving valve calcification and myocardial fibrosis) as well as the normalisation of AV function following surgical intervention.24 After trans-aortic valve intervention, a machine learning system involving multiple clinical, CT, electrocardiographic and echocardiographic inputs was able to outperform standard clinical risk scores in predicting 1-year outcomes.25 Thus, there is increasing potential to apply machine learning techniques to streamline the identification of high-risk cases with AS in routine clinical care.26

Why should an AI system like ours add value to clinical practice, when conventionally, there is only a dichotomy of treatable cases in AS (symptomatic severe vs the rest)? Our primary goal is to highlight at-risk patients for timely clinical review, closer monitoring and guideline-directed therapy where appropriate. Typical changes associated with progressive AS include AV calcification, left ventricular hypertrophy, diastolic dysfunction, increased left atrial pressure and pulmonary hypertension.15 These pathophysiological changes have been used to improve timely identification of individuals in need of AV replacement, with the goal of improving clinical outcomes and minimising mortality risk.26 Although direct visualisation of the multiple interconnected layers inherent to the neural network is not possible, our AI-DSA automatically identifies the typical phenotypic AS features associated with progressive disease, including a gradient of mortality with increasing AS probability that remains independently predictive after adjustment for the traditionally measured AVA. Using a progression of probability of severe AS from the AI-DSA, it is possible to discriminate differing risk thresholds, as we have shown with identification of the severe phenotype (inside and outside of traditional guidelines) as well as the moderate-to-severe group of individuals who are also at increased mortality risk, thereby highlighting at-risk individuals in need of clinical review.

Timely recognition of those at risk of dying from progressively worse AS remains a high-priority clinical issue.2 Studies from a range of health systems have consistently demonstrated the under-treatment of severe AS with consequently high mortality rates.27 Low-gradient AS, which can be technically challenging to diagnose, is particularly subjected to variation in clinical interpretation of severity and under-represented in AV intervention.28 Our AI-DSA performs equally well in high-gradient AS and in low-gradient AS, with AUROC and AUPR almost identical in normal EF, mildly impaired and severely impaired systolic function. Recent observations of a significant detection and treatment gap in AS have prompted ‘urgent calls to action’ by major cardiology and echocardiography societies.21 29 Our AI-DSA can be directly incorporated into echocardiographic reporting systems, or applied to echo labs directly, thereby triggering an automated alert to the presence of conventional severe AS, in addition to those who require further consideration for action. When combined, these represented ~7% (ratio of 1:1) of all test cases with sufficiently detailed reporting data. Critically, the AI-DSA output requires no modification of standard echocardiographic imaging or acquisition workflow. As recently reported by rECHOmend Investigators, it is possible that similar machine learning technology with a broader relevance to all forms of structural valvular disease has the potential to facilitate meaningful recommendations for echocardiography in clinical practice.30

Limitations

The AI-DSA is not intended to replace human clinical decision-making, given some patients will have conditions (eg, dementia, frailty or systemic disease) precluding surgical management. Furthermore, it is currently incapable of identifying other cardiac diseases that may adversely affect prognosis (eg, amyloid cardiomyopathy). Moreover, the AI-DSA is dependent on minimum echocardiographic measurements being performed and could not deliver a definitive output in just over 50% of echocardiograms, although most missing aortic valve data were in patients without AS consistent with common echocardiography practice. While affected individuals (who also had insufficient data to apply AS guideline criteria) had lower mortality than positively identified AS cases, it was higher than those more definitively identified as ‘low probability AS’. Although the AI-DSA reliably identifies low-gradient severe AS, this form of AS is typically associated with other cardiac diseases. Given that data from a large multicultural Australian cohort were used to train and test the AI-DSA, it has yet to be tested in other geographic regions/health systems or specific ethnic groups. Critically, we were unable to discriminate between symptomatic and non-symptomatic severe AS and NEDA does not (currently) capture potentially confounding data on comorbidities, hospital episodes and/or pharmacotherapies. The AI-DSA has yet to be compared with the routine clinical detection and management of AS in respect to the cost-effective management of severe forms of AS.

Conclusions

In summary, we have demonstrated that a readily deployable AI-DSA has the potential to identify individuals with the phenotype of more severe AS associated with poor survival (if left untreated). Consistent with calls from major cardiac societies, the AI-DSA raises an automatic alert when the results of routine echocardiography are being reported. It can do so without increasing the clinical workload for the sonographer, echocardiography laboratory or the referring clinician. With further integration into current workflow, this AI-DSA could improve expedited identification of severe AS cases and more timely referral for therapy.

Data availability statement

Data are available upon reasonable request.

Ethics statements

Patient consent for publication

Ethics approval

All data were derived from the National Echo Database of Australia (NEDA). NEDA has obtained ethical approvals across Australia from all relevant Human Research Ethics Committees. A patient consent waiver was authorised for all retrospectively acquired data used in these analyses.

Acknowledgments

Dr Stewart is supported by the NHMRC of Australia (GNT1135894).

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Twitter @PlayfordDavid

  • Contributors The study design was conceived by SS and DP. ECHO IQ Pty Ltd developed the AI-DSA (EchoSolv®) and provided the optimal f1 probability (Dr Watts) but played no role in clinical data collection, data analysis, data outcome interpretation, or writing of the report. The data were independently analysed in a blinded fashion by Profs Stewart, Strange and Playford. SS wrote the first draft of the manuscript and did the sensitivity analyses. All authors contributed to the final version of the submitted manuscript. The corresponding author (Prof Strange) had full access to all the data (derived from the training and test groups) with the addition of mortality data, and the final responsibility to submit for publication.

  • Funding National Health & Medical Research Council of Australia and ECHO IQ Pty Ltd. Drs Strange and Playford are the Co-Principal Investigators and Directors of NEDA Ltd (a not-for-profit research entity). NEDA has received investigator-initiated funding support from Novartis Pharmaceuticals, Pfizer Pharmaceuticals, ECHO IQ and Edwards Lifesciences in the past 3 years. Dr Stewart has received consultancy fees from NEDA and ECHO IQ. The funders played no role in the overall study design.

  • Competing interests Profs Stewart, Playford and Strange have previously received consultancy/speaking fees from Edwards Lifesciences. Profs Playford and Strange have received consultancy fees from Medtronic, Edwards Lifesciences, Abbott Laboratories and ECHO IQ Pty Ltd. Dr Watts is employed by ECHO IQ Pty Ltd.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.