
Original research
Interoperator reliability of an on-site machine learning-based prototype to estimate CT angiography-derived fractional flow reserve
  1. Yushui Han1,
  2. Ahmed Ibrahim Ahmed1,
  3. Chris Schwemmer2,
  4. Myra Cocker3,
  5. Talal S Alnabelsi1,
  6. Jean Michel Saad1,
  7. Juan C Ramirez Giraldo3 and
  8. Mouaz H Al-Mallah1
  1. 1Debakey Heart & Vascular Center, Houston Methodist Hospital, Houston, Texas, USA
  2. 2Computed Tomography-Research & Development, Siemens Healthcare GmbH, Erlangen, Bayern, Germany
  3. 3Computed Tomography-Research Collaborations, Siemens Healthcare USA, Malvern, Pennsylvania, USA
  1. Correspondence to Dr Mouaz H Al-Mallah; mal-mallah{at}


Background Advances in CT and machine learning have enabled on-site non-invasive assessment of fractional flow reserve (FFRCT).

Purpose To assess the interoperator and intraoperator variability of coronary CT angiography-derived FFRCT using a machine learning-based postprocessing prototype.

Materials and methods We included 60 symptomatic patients who underwent coronary CT angiography. After training, two independent operators calculated FFRCT using a machine learning-based on-site prototype. FFRCT was measured 1 cm distal to the coronary plaque or in the middle of the segment if no coronary lesions were present. Intraclass correlation coefficient (ICC) and Bland-Altman analysis were used to evaluate interoperator variability in FFRCT estimates. Sensitivity analysis was done by cardiac risk factors, degree of stenosis and image quality.

Results A total of 535 coronary segments in 60 patients were assessed. The overall ICC was 0.986 per patient (95% CI 0.977 to 0.992) and 0.972 per segment (95% CI 0.967 to 0.977). The absolute mean difference in FFRCT estimates was 0.012 per patient (95% CI for limits of agreement: −0.035 to 0.039) and 0.02 per segment (95% CI for limits of agreement: −0.077 to 0.080). Tight limits of agreement were seen on Bland-Altman analysis. Distal segments had greater variability than proximal/mid segments (absolute mean difference 0.025 vs 0.011, p<0.001). Results were similar on sensitivity analysis.

Conclusion A high degree of interoperator and intraoperator reproducibility can be achieved by on-site machine learning-based FFRCT assessment. Future research is required to evaluate the physiological relevance and prognostic value of FFRCT.

  • Computed Tomography Angiography
  • Biostatistics

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Key questions

What is already known about this subject?

  • Studies have shown similar sensitivity and specificity between machine learning (ML)-based fractional flow reserve (FFRCT) determination and computational fluid dynamics-based determination.

  • Reproducibility of measurements across operators is not well demonstrated in ML-based FFRCT determination.

What does this study add?

  • We have shown a high degree of interoperator and intraoperator reliability for ML-based FFRCT in a representative patient population.

  • Our study contributes to the body of literature supporting the role of ML-based FFRCT determination in providing timely data for guiding revascularisation strategies among patients being evaluated for coronary artery disease.

How might this impact on clinical practice?

  • This study will help cardiovascular clinicians evaluate how an ML-based FFRCT prototype performs when FFRCT processing is moved from an outside centre to the point of care.


The role of fractional flow reserve (FFR) assessment in the evaluation and management of patients with coronary artery disease was firmly established by the FAME (Fractional Flow Reserve vs Angiography for Guiding Percutaneous Coronary Intervention) trial, which demonstrated that an FFR-guided percutaneous coronary intervention (PCI) strategy was superior at reducing the rates of death, myocardial ischemia and repeat revascularisation at 1 year.1 However, the need for invasive angiography and its attendant risks has limited routine use in clinical practice.

Advances in computational fluid dynamics (CFD), a non-invasive image postprocessing technique, enabled the physiological significance of coronary artery stenosis to be determined from data acquired in standard, routine diagnostic coronary computed tomography angiography (CCTA) studies. Machine learning (ML)-based flow assessment is the latest development, using an artificial intelligence algorithm to compute the functional severity of a lesion.2–4 ML-based FFRCT enables rapid on-site determination by the reading physician, providing timely point-of-care information without the potential risks to patient privacy arising from off-site data transfer.

ML-based FFRCT requires semiautomatic determination of centreline, lumen contour and stenosis area, all of which potentially contribute to variability. Although several studies have shown similar sensitivity and specificity to CFD-based determination,5 research is lacking on the reproducibility of this operator-dependent technology. The purpose of this study is to measure the interoperator and intraoperator reliability and determine the reproducibility of coronary CT angiography-derived fractional flow reserve (FFRCT) values using a postprocessing prototype based on an ML algorithm.


Patient population

The population from which the current subgroup analysis was drawn has been described previously.6 Briefly, the study population was defined as patients who underwent both clinically indicated CCTA and single photon emission computed tomography (SPECT) myocardial perfusion imaging for suspected coronary artery disease between 1 January 2016 and 22 June 2020 (n=965). Next, patients with prior PCI, coronary artery bypass grafts and left ventricular assist devices were excluded (n=258, 93 and 2, respectively). Moreover, patients with congenital abnormalities of the coronary tree (n=12) and those with severe valvular abnormalities (n=30) were also excluded, as were those with revascularisation or myocardial infarction between the two studies (n=25). Last, patients with excessive calcification or poor image quality who could not be processed by the FFRCT prototype (due to failure in tracing the centreline or vessel lumen) were excluded (n=74). A cohort of 471 patients remained after these exclusions.
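The exclusion counts reported above can be checked with simple arithmetic. The following is only an illustrative sketch: the labels are shorthand for the criteria in the text, the function name is hypothetical, and the steps are assumed to be applied sequentially to non-overlapping groups.

```python
# Counts as reported in the text; labels are shorthand, not study variable names.
initial_cohort = 965
exclusions = [
    ("prior PCI", 258),
    ("coronary artery bypass grafts", 93),
    ("left ventricular assist devices", 2),
    ("congenital coronary abnormalities", 12),
    ("severe valvular abnormalities", 30),
    ("revascularisation or MI between the two studies", 25),
    ("could not be processed by the prototype", 74),
]


def apply_exclusions(start, steps):
    """Return (reason, remaining) after each sequential exclusion step."""
    remaining = start
    trail = []
    for reason, n in steps:
        remaining -= n
        trail.append((reason, remaining))
    return trail


trail = apply_exclusions(initial_cohort, exclusions)
final_cohort = trail[-1][1]
print(final_cohort)  # 471
```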

Sixty (60) patients were randomly selected using simple random sampling without replacement. We aimed to sample 10% of the larger population, and sampling was stratified by categories of stenosis using the Society of Cardiovascular Computed Tomography Coronary Artery Disease Reporting & Data System (CAD-RADS) to be representative of the population from which sampling was done. Approval from the Institutional Review Board was obtained prior to the start of the study and informed consent was waived due to the retrospective nature of the study. Patients and the public were not involved in the design, conduct, reporting or dissemination plans of our research.
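One way to draw a sample that is stratified by stenosis category is sketched below. The text does not describe the exact procedure, so this is an assumption: proportional allocation with simple random sampling without replacement inside each stratum. The function name, the seed and the synthetic cohort are hypothetical and are not study data.

```python
import random
from collections import defaultdict


def stratified_sample(patients, stratum_of, fraction, seed=0):
    """Sample roughly `fraction` of each stratum without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for patient in patients:
        strata[stratum_of(patient)].append(patient)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Synthetic cohort: (patient id, CAD-RADS category); counts are made up.
cohort = (
    [(i, 0) for i in range(50)]
    + [(i, 2) for i in range(50, 80)]
    + [(i, 4) for i in range(80, 100)]
)
sample = stratified_sample(cohort, stratum_of=lambda p: p[1], fraction=0.10)
per_stratum = {c: sum(1 for p in sample if p[1] == c) for c in (0, 2, 4)}
print(len(sample), per_stratum)
```

Rounding within each stratum means the total can differ slightly from the overall target fraction when strata are small.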

Assessment of covariates

Information on sociodemographic variables (age and gender), medical history, comorbidities (hypertension, diabetes, dyslipidaemia), smoking history and medication use was obtained from chart review of all patient profiles in electronic medical records within 30 days of imaging.


CCTA scans were obtained using a third-generation SOMATOM Force scanner (Siemens, Forchheim, Germany). Image acquisition was performed in accordance with the Society of Cardiovascular Computed Tomography (SCCT) guidelines.7 Intravenous metoprolol was administered to patients with a heart rate ≥65 beats/min and sublingual nitroglycerin 0.4 mg was administered immediately before image acquisition. During image acquisition, 60–100 cc of contrast was injected, followed by a saline flush. Axial scans were obtained with prospective electrocardiographic gating. Image acquisition was prescribed to include the coronary arteries, left ventricle and proximal ascending aorta.

Images were assessed with a three-dimensional workstation using one of several postprocessing methods, including axial, multiplanar reformat, maximum intensity projection and cross-sectional analysis. The quality of scans was determined by expert opinion and qualitatively graded as fair, good or excellent. Type and location of lesion were visually evaluated using an 18-segment model according to SCCT guidelines.7 In each segment, atherosclerosis was defined as tissue structures >1 mm² within the coronary artery lumen or adjacent to the lumen that could be discriminated from pericardial tissue, epicardial fat or the vessel lumen itself.

Per cent coronary stenosis was quantified based on a comparison of the luminal diameter of the segment exhibiting obstruction to the luminal diameter of the most normal-appearing site and classified as none (0%), mild (1%–49%), moderate (50%–69%) or severe (≥70%) based on degree of narrowing of the luminal diameter. Anatomically obstructive CAD by CCTA was defined as ≥50% in the left main (LM) artery and ≥70% stenosis severity in proximal, mid and distal branches of the left anterior descending (LAD), left circumflex (LCX) and right coronary artery (RCA) without including side branches. Findings were reported using CAD-RADS.8
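The grading rules above can be written as a small lookup. This is a minimal sketch with hypothetical function names; it assumes that any narrowing above 0% and below 50% counts as mild, and that every vessel other than the LM uses the 70% threshold for obstructive disease.

```python
def stenosis_category(percent):
    """Map per cent diameter stenosis to the categories used in the text."""
    if not 0 <= percent <= 100:
        raise ValueError("per cent stenosis must be between 0 and 100")
    if percent == 0:
        return "none"
    if percent < 50:
        return "mild"
    if percent < 70:
        return "moderate"
    return "severe"


def is_obstructive(vessel, percent):
    """Anatomically obstructive CAD: >=50% in the LM, >=70% elsewhere."""
    threshold = 50 if vessel == "LM" else 70
    return percent >= threshold


print(stenosis_category(55), is_obstructive("LM", 55), is_obstructive("LAD", 55))
# moderate True False
```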

Segment involvement score (SIS) was used to quantify burden of disease using CCTA. Using an 18-segment coronary artery model, each segment was individually scored as 0 or 1 based on the presence of plaque irrespective of the degree of stenosis. The sum of all involved segments was calculated for each patient. A Hounsfield unit threshold of ≥130 was used to classify plaque composition as calcified (C SIS), mixed (M SIS), calcified or mixed (C/M SIS) and non-calcified plaque (NC SIS).
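As an illustration of the scoring, the sketch below counts involved segments for one patient. The segment labels, the plaque-type strings and the function name are hypothetical; the plaque type is assumed to have been assigned already (for example, from the Hounsfield unit threshold), and the example patient is made up.

```python
def involvement_scores(plaque_by_segment):
    """Count involved segments overall and by plaque composition.

    `plaque_by_segment` maps a segment label to None (no plaque) or to one
    of "calcified", "mixed" or "non-calcified".
    """
    scores = {"SIS": 0, "C SIS": 0, "M SIS": 0, "C/M SIS": 0, "NC SIS": 0}
    key_for = {"calcified": "C SIS", "mixed": "M SIS", "non-calcified": "NC SIS"}
    for plaque in plaque_by_segment.values():
        if plaque is None:
            continue  # segment scores 0
        if plaque not in key_for:
            raise ValueError(f"unknown plaque type: {plaque!r}")
        scores["SIS"] += 1  # segment scores 1 regardless of stenosis degree
        scores[key_for[plaque]] += 1
    scores["C/M SIS"] = scores["C SIS"] + scores["M SIS"]
    return scores


# Made-up patient with three involved segments.
patient = {
    "LM": None,
    "pLAD": "calcified",
    "mLAD": "mixed",
    "dLAD": None,
    "pRCA": "non-calcified",
}
scores = involvement_scores(patient)
print(scores)
```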


FFRCT was determined using an ML-based prototype for computation of fractional flow reserve (cFFR 3.2, Siemens Healthcare GmbH, Forchheim, Germany).

The CCTA acquisition phase was chosen based on heart rate and absence of motion. The best diastolic phase was selected for heart rates <65 bpm, while the best systolic phase was used when the heart rate was ≥65 bpm. The coronary tree was isolated semiautomatically to generate a three-dimensional model. The extent of manual adjustment was proportional to the severity and extent of calcification, with most cases requiring limited adjustment of centreline and contour.

The algorithm generated a value for every point of the coronary artery tree as the ratio of the average local pressure to the average aortic pressure over a cardiac cycle. A three-dimensional color-coded mesh of the coronary artery tree was created in combination with functional information at each segment of interest. FFRCT was determined at the mid-point of a vessel segment for normal vessels and 1 cm distal to a stenosis when one was present, based on prior work showing a higher prognostic role of measurements distal to stenosis.9 Determination was made for the LM and the proximal, mid and distal segments of the LAD, LCX and RCA without including side branches. Vessel segments that could not be isolated by the prototype were coded as missing. An FFRCT of <0.8 in the LM or any proximal, mid or distal segment was considered the threshold for significant ischaemia based on prior literature.10–13

Image processing was done by two investigators blinded to results from other tests. Both had backgrounds in the health sciences (one with a medical degree and the other with a master’s in biomedical engineering) but no prior experience with CCTA or FFRCT processing. The vendor organised training on the steps of image processing. Investigators subsequently processed the first 20 patients and received feedback on them. Experts from the vendor were consulted on difficult cases throughout the data collection phase. Investigators independently completed all steps involved in processing images (editing the centreline and vessel contour, and localising areas of stenosis). Each investigator did two rounds of processing for each patient. During the second round, processing was started from the beginning without using persistent data from the prior round.

Online supplemental figure 1 shows the prototype interface and the FFRCT results of the same patient case processed by the two operators.

Statistical analysis

Analysis was done on a per-patient and per-segment level. For per-patient analysis, comparisons were made between the mean of all isolated segmental FFRCT values of the coronary artery. Per-segment analysis was stratified by segment to assess for differences in reproducibility between proximal (LM and proximal branches of LAD, LCX and RCA) and distal segments, in light of prior studies that have shown a decrease in FFR/FFRCT values from proximal to distal segments even in vessels with no obstruction, and the prognostic value of change in FFR/FFRCT.14–16 Both interinvestigator and intrainvestigator agreement were assessed. The average of two rounds of processing was used for interinvestigator analysis, and per cent reclassification was determined using an FFRCT threshold of <0.8.

The mean difference with 95% limits of agreement and the intraclass correlation coefficient (ICC) using a two-way mixed effects model were used to assess agreement. Thresholds for agreement were classified based on prior literature (<0.2, poor; 0.2–0.4, fair; 0.4–0.6, moderate; 0.6–0.8, good; 0.8–1.0, very good).17 Furthermore, Bland-Altman analysis was used to evaluate variability. Sensitivity analysis was done by cardiac risk factors, degree of stenosis and image quality. All analyses were done using Stata V.16.0 (Stata, College Station, Texas, USA).
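For readers who want to see the mechanics, the sketch below computes Bland-Altman limits of agreement and an ICC in plain Python. It is not the Stata analysis used in the study. The ICC shown is the single-measurement consistency form of the two-way mixed effects model, which is an assumption because the text does not state which form was used. Function names and example values are hypothetical, and lower bounds of the agreement bands are assumed to be inclusive.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_consistency(ratings):
    """Two-way mixed, single-measurement, consistency ICC.

    `ratings` is a list of rows (subjects), each with one value per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def agreement_band(icc):
    """Classify an ICC using the thresholds quoted in the text."""
    for upper, label in [(0.2, "poor"), (0.4, "fair"), (0.6, "moderate"), (0.8, "good")]:
        if icc < upper:
            return label
    return "very good"


# Made-up FFRCT values from two raters on four segments.
rater_a = [0.70, 0.80, 0.90, 0.95]
rater_b = [0.71, 0.79, 0.92, 0.94]
bias, lower, upper = bland_altman(rater_a, rater_b)
icc = icc_consistency([list(pair) for pair in zip(rater_a, rater_b)])
print(round(bias, 4), round(lower, 4), round(upper, 4), agreement_band(icc))
```

A consistency ICC ignores a constant offset between raters, whereas the Bland-Altman bias exposes it, which is one reason to report both.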



Sixty (60) patients were included in this study. Baseline characteristics are listed in table 1. The mean age was 63.5±11.7 years and 45% were women. The majority had cardiovascular comorbidities: 78% hypertension, 68% diabetes and 87% dyslipidaemia. Most patients (52%) were symptomatic with chest pain or shortness of breath and a majority were on some form of medication (83% aspirin/clopidogrel, 83% statin, 65% ACE inhibitor/angiotensin receptor blocker).

Table 1

Baseline characteristics of patients


Image quality was graded as excellent or good by reading physicians in 83% of CCTA scans, and none were considered non-diagnostic. Most patients had CAD-RADS scores ≤2 (62%) and the mean (SD) SIS was 4.6 (4.09). Obstructive stenosis was present in 12 (20%) patients and 5 (8%) had multivessel disease. Nearly half (47%) of patients had functional stenosis (FFRCT <0.8), and the LAD was the most affected vessel. No patient in our cohort had an identifiable ramus intermedius branch. A total of five vessel segments (one distal LAD, one distal LCX and three distal RCA) could not be isolated by either investigator and were coded as missing.

Intrainvestigator agreement

Online supplemental table 1 summarises measures of intrainvestigator agreement. There was a high degree of agreement between the two measurements taken by the same investigator. Per-patient and per-segment ICCs were >0.95 for both investigators, with higher ICCs in proximal versus distal segments. Absolute differences showed similar trends, with tight limits of agreement.

Interinvestigator agreement

Table 2 summarises measures of interinvestigator agreement. The ICC was 0.986 per patient (95% CI 0.977 to 0.992) and 0.972 per segment (95% CI 0.967 to 0.977). The absolute mean difference in estimates was 0.012 per patient (95% CI for limits of agreement: −0.035 to 0.039) and 0.02 per segment (95% CI for limits of agreement: −0.077 to 0.080). Distal segments had greater variability than proximal/mid segments (ICC 0.97 vs 0.962 and absolute mean difference 0.011 vs 0.025 for proximal vs distal segments). Using a threshold of FFRCT <0.8, per-patient discordance was seen in 3.3% (n=2) of patients.

Table 2

Interinvestigator agreement

Figures 1 and 2 show Bland-Altman graphs per patient and per segment. Tight limits of agreement were seen in all analyses, with relatively wider margins in distal versus proximal segments.

Figure 1

Bland-Altman graphs per-patient and per-segment. Tight limits of agreement are seen on both per-patient and per-segment analysis.

Figure 2

Bland-Altman graphs comparing proximal vs distal segments. Tight limits of agreement are seen on both, with slightly wider margins in distal segments.

Tables 3 and 4 summarise measures of agreement by CAD-RADS and quality of CCTA scans. Absolute mean difference increased with higher CAD-RADS scores and lower image quality. Results for ICC were more variable. Table 5 summarises measures of agreement comparing patients with and without cardiovascular risk factors. Absolute mean differences were lower and ICC was higher among those without versus with risk factors.

Table 3

Interinvestigator agreement by CAD-RADS

Table 4

Interinvestigator agreement by image quality

Table 5

Interinvestigator agreement by cardiovascular risk factors


Using a randomly selected sample from a real-world single-centre cohort of patients, we demonstrated that ML-based FFRCT determination has good reproducibility and reliability.

Non-invasive determination of FFR has the potential to further enhance the gate-keeper role of CT angiography in patients evaluated for coronary artery disease by providing a functional complement to anatomic assessment.18 ML-based FFR determination takes this one step further by offering several distinct advantages while maintaining test characteristics comparable to the current CFD-based approach.4 19 Specifically, a switch from off-site to on-site ML-based FFR determination may reduce test turn-around time (currently as high as 24 hours), rejection rate (~15%)10 20 21 and cost, and may increase patient data protection by eliminating the need for data export and the related infrastructural and logistical considerations.

Of the few studies that have looked at the reproducibility of non-invasive FFR measurement, most have been on CFD-based methods. For example, a study with repeated off-site non-invasive FFRCT measurement (CFD-based method) on 25 patients showed good reproducibility. The study also went on to report no significant difference when comparing FFRCT with FFR obtained from an invasive gold standard.22 However, few studies have addressed potential operator dependence, which was the aim of our study. Those that have, featuring both on-site and off-site approaches, reported a high degree of interoperator correlation that was consistent among operators of different expertise and training.23 24

Our results confirm, at least for our prototype, that the observed differences in reproducibility are due primarily to variability in the ML algorithm and to changes made by the investigators. It is difficult to draw conclusions about the appropriateness of these adjustments, as we did not compare our findings with a reference gold standard. However, prior studies have emphasised decreased variability among operators receiving face-to-face training.23 As such, in-person training may counter a potential source of variability, namely incorrect determination of the centreline.23 25

A case can be made for the generalisability of our findings to patients presenting to a tertiary care cardiology practice, as our study used a representative sample from a real-world cohort of patients and showed consistent results across the spectrum of image quality and calcification.

However, our study is not without its limitations. First, this is an observational single-centre study including patients who had undergone both CCTA and SPECT, with a relatively small sample size. Second, no comparison of FFRCT measurements was made with a gold standard. However, two prior studies using invasive FFR as a gold standard have shown a high degree of accuracy with no significant change in variability between operators of varying levels of expertise.23 26 Third, the studied ML prototype is not yet approved for clinical use. However, a meta-analysis has shown high concordance of FFRCT determined by ML prototypes with invasive FFR and with computational fluid dynamics-based determination.5 Although the two investigators who processed images had no background in CCTA interpretation, it can be argued that future application of these approaches will be carried out by non-physicians, and previous studies have confirmed consistent correlation in FFRCT across a broad range of expertise.23

In conclusion, we have shown a high degree of interoperator and intraoperator reliability for ML-based FFRCT in a representative patient population. Our study contributes to the body of literature supporting the role of ML-based FFRCT determination in providing timely data for guiding revascularisation strategies among patients being evaluated for coronary artery disease.

Ethics statements

Patient consent for publication

Ethics approval

Approval from the Institutional Review Board was obtained prior to the start of the study and informed consent was waived due to the retrospective nature of the study.



  • Presented at Data from this project have been presented as an abstract at ACC 2021.

  • Contributors YH, AIA: data collection and interpretation, manuscript drafting. CS, MC, TSA, JMS, JCRG: manuscript revision. MHA-M, conception and design, responsible for the overall content as guarantor.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests MHA-M receives research support from Siemens. CS, MC and JCRG are employed by Siemens.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.