Abstract
BACKGROUND AND PURPOSE: Intracranial steno-occlusive lesions are responsible for acute ischemic stroke. However, the clinical benefits of artificial intelligence (AI)-based methods for detecting pathologic lesions in intracranial arteries have not been evaluated. We aimed to validate the clinical utility of an AI model for detecting steno-occlusive lesions in the intracranial arteries.
MATERIALS AND METHODS: Overall, 138 TOF-MRA images were collected from 2 institutions, which served as internal (n = 62) and external (n = 76) test sets, respectively. Each study was reviewed by 5 radiologists (2 neuroradiologists and 3 radiology residents), both with and without our proposed AI model, to compare its effect on TOF-MRA interpretation. They identified the steno-occlusive lesions and recorded their reading time. Observer performance was compared by using the area under the jackknife free-response receiver operating characteristic curve (AUFROC) and reading time.
RESULTS: The average AUFROC for the 5 radiologists demonstrated an improvement from 0.70 without AI to 0.76 with AI (P = .027). Notably, this improvement was most pronounced among the 3 radiology residents, whose performance metrics increased from 0.68 to 0.76 (P = .002). Although the reading time increased with AI overall, the change was not significant among radiology residents. Moreover, the use of AI resulted in improved interobserver agreement among the reviewers (the intraclass correlation coefficient increased from 0.734 to 0.752).
CONCLUSIONS: Our proposed AI model offers a supportive tool for radiologists, potentially enhancing the accuracy of detecting intracranial steno-occlusive lesions on TOF-MRA. Less experienced readers may benefit the most from this model.
ABBREVIATIONS:
- AI = artificial intelligence
- AUC = area under the receiver operating characteristic curve
- AUFROC = area under the jackknife free-response receiver operating characteristic curve
- ICC = intraclass correlation coefficient
- JAFROC = jackknife free-response receiver operating characteristic
SUMMARY
PREVIOUS LITERATURE:
Previous studies have utilized deep-learning algorithms to detect intracranial steno-occlusive lesions, leveraging semi- or fully automated techniques and image reconstruction methods. Despite advancements, accurate detection and localization remain challenging due to the complex nature of intracranial arteries and limitations in existing methods, such as the time-consuming extraction of multiple arteries and the inability to accurately measure the width of occluded lesions. Our recently proposed approach integrates classification and localization within established medical imaging networks, aiming to overcome these challenges by simultaneously segmenting blood vessels and detecting lesions without extensive image reconstruction or patch-based analysis.
KEY FINDINGS:
The use of our AI model improved the detection accuracy of intracranial steno-occlusive lesions on TOF-MRA, with an improvement from 0.70 to 0.76 in the AUFROC for radiologists. Radiology residents, in particular, benefited considerably from AI assistance, highlighting its potential to enhance diagnostic accuracy.
KNOWLEDGE ADVANCEMENT:
Our study advances knowledge by demonstrating the clinical utility of an AI model in improving radiologists’ accuracy in detecting intracranial steno-occlusive lesions. This suggests that AI can be a valuable support tool, especially for less experienced readers, potentially increasing diagnostic performance and contributing to better patient outcomes.
Acute ischemic stroke is the second leading cause of death and a major cause of disability worldwide.1,2 One of the primary underlying factors responsible for ischemic stroke is intracranial steno-occlusive lesions.3–5 Thus, the prompt and precise identification of steno-occlusive lesions is of paramount importance in diagnosing patients with ischemic stroke and in selecting appropriate therapeutic strategies.6,7
TOF-MRA is a commonly used noninvasive imaging technique for evaluating intracranial arteries.6 However, accurate detection and the precise localization of steno-occlusive lesions present a challenge because of the intricate shapes of intracranial arteries. Meticulous evaluation requires considerable time and effort, which leads to an increased workload and the subsequent risk of detection failure.
Deep-learning methods have emerged to automatically detect steno-occlusive lesions in intracranial arteries.6,8–10 Previous approaches have utilized semi- or fully automated labeling and techniques such as straightened MPR images, along with extracting blood vessels in advance of measuring the width, aiming to detect stenoses. Despite advancements, the detection of steno-occlusive lesions in intracranial arteries remains challenging due to the time-consuming nature of extracting multiple arteries with MPR, loss of vascular bifurcation, and inability to measure the width of occluded lesions even with pre-extracted blood vessels. Our novel approach leverages multitask learning to segment blood vessels and detect lesions simultaneously, without the need for extensive image reconstruction or patch-based analysis.10 This method aids in enhancing lesion detection efficiency by integrating classification and localization modules, thus addressing the limitations identified in prior studies. However, the clinical benefits of such artificial intelligence (AI) methods have not yet been evaluated sufficiently.
Therefore, we aimed to investigate the potential benefits and limitations of an AI-based model to aid radiologists in detecting steno-occlusive lesions. Specifically, we assessed the model’s lesion detection accuracy and its impact on interpretation time compared with those of conventional methods.
MATERIALS AND METHODS
This multicenter retrospective study was approved by the institutional review boards of the Seoul National University Bundang Hospital (SNUBH) and Seoul St. Mary’s Hospital (SSMH), both of which waived the requirement for informed consent (No.: B-2204-753-106 and KC20RIDI0197, respectively).
Study Cohort
Sixty-two individuals were selected from the SNUBH database between October 2014 and August 2019 as an internal test set, including 30 with intracranial stenosis or occlusion (stenosis group) and 32 without steno-occlusive lesions (healthy group). The inclusion criteria for the stenosis group were as follows: 1) age >18 years, 2) TOF-MRA and DSA performance within a 1-month interval, and 3) moderate or severe degree stenosis according to the DSA report (>50% stenosis by using the Warfarin Versus Aspirin for Symptomatic Intracranial Disease method as follows: % stenosis = (1 − [D_stenosis/D_normal]) × 100, where D_stenosis is the diameter at the site of the stenosis and D_normal is the diameter of the normal reference artery). The inclusion criteria for the healthy group were as follows: 1) age >18 years, 2) TOF-MRA performance, and 3) normal MRA findings according to the radiologic report.
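The stenosis criterion above can be illustrated with a minimal sketch. The function names and the millimeter units are assumptions made for illustration only; the study took the stenosis degree from the DSA report rather than from code like this.

```python
def wasid_percent_stenosis(d_stenosis_mm: float, d_normal_mm: float) -> float:
    """Percent stenosis by the WASID formula: (1 - D_stenosis / D_normal) * 100."""
    if d_normal_mm <= 0:
        raise ValueError("d_normal_mm must be positive")
    if d_stenosis_mm < 0:
        raise ValueError("d_stenosis_mm must be non-negative")
    return (1.0 - d_stenosis_mm / d_normal_mm) * 100.0


def meets_inclusion_threshold(percent_stenosis: float) -> bool:
    """True for moderate or severe stenosis, i.e. strictly more than 50%."""
    return percent_stenosis > 50.0


# Hypothetical diameters: a 1.0 mm residual lumen in a 4.0 mm reference vessel.
example = wasid_percent_stenosis(1.0, 4.0)
print(example, meets_inclusion_threshold(example))
```

An occlusion corresponds to a residual diameter of 0 and therefore to 100% stenosis under this formula.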
In addition, 76 individuals were selected as an external test set from the SSMH database from January 2016 to December 2019, comprising 30 with intracranial stenosis or occlusion (stenosis group) and 46 without steno-occlusive lesions (healthy group). The inclusion criteria for the SSMH cohort were as follows: 1) age >18 years and 2) TOF-MRA results.
Patients in the stenosis group were randomly selected. For the healthy group, the patients were randomly chosen among individuals who underwent brain MR imaging as part of health screenings and received normal MRA reports. In total, we included 138 individuals from 2 institutions, with 60 in the stenosis group and 78 in the healthy group.
TOF-MRA Acquisition
For SNUBH, 3D TOF-MRA examinations were performed using a 1.5T (Amira, Siemens Healthineers, or Intera, Philips Healthcare) or 3T scanner (Achieva or Ingenia, Philips Healthcare). The scan parameters were as follows: TR, 20–27 ms; TE, 3.45–7.15 ms; flip angle, 18°–25°; field of view, 132–230 mm; section thickness, 0.5–1.6 mm; matrix, 256 to 704 × 163 to 360.
For SSMH, 3D TOF-MRA examinations were performed using a 1.5T (Avanto, Siemens Healthineers; Achieva) or 3T (Verio or Vida, Siemens Healthineers; Ingenia) scanner. The scan parameters were as follows: TR, 17.9–25 ms; TE, 3.5–7 ms; flip angle, 18°–23°; field of view, 170–240 mm; section thickness, 0.4–1.2 mm; matrix, 384 to 512 × 214 to 331.
AI Model
We used a deep-learning algorithm to detect steno-occlusive lesions by using traces of intracranial arteries (Fig 1). Our model utilized an image segmentation model such as U-Net as a backbone,11 augmented by additional modules to detect steno-occlusive lesions. Specifically, we designed a backbone model termed Spider U-Net, which is a modified version of U-Net that adds a long short-term memory network. Spider U-Net outperformed U-Net in vessel segmentation.12 With a multitask learning method based on Spider U-Net, detecting steno-occlusive lesions while simultaneously extracting blood vessels yielded higher detection performance than detecting lesions without extracting blood vessels.10 The model used a training set from SNUBH similar to that used previously and demonstrated superior overall detection performance compared with other models.10 Details of the model have been published elsewhere.10,12 The code is available at https://github.com/djchoi1742/MRA_ICAD.
FIG 1. Representative image of our proposed model. A, Initial TOF-MRA shows multiple steno-occlusive lesions, including left middle cerebral artery occlusion. B, Our proposed AI model identified steno-occlusive lesions, shown as red and yellow markings on the heat map.
Image Interpretation and Observer Performance Study
A neuroradiologist (L.S., with 13 years of clinical experience), a board-certified radiologist (H.L., with 5 years of clinical experience), and a neurosurgeon (T.K., with 13 years of clinical experience) thoroughly reviewed all TOF-MRA examinations and accessible DSA studies, establishing a reference standard by consensus for the number and location of steno-occlusive lesions. The reference standards for steno-occlusive lesion detection were confined to the distal ICA, the A1-2 segments of the anterior cerebral artery, the M1-2 segments of the MCA, the P1-2 segments of the posterior cerebral artery, the V4 segments of the vertebral artery, and the basilar artery.
Five radiologists, including 2 neuroradiologists (S.H.B. and S.J.C., with 10 and 9 years of experience in neuroradiology, respectively) and 3 radiology residents (J.H.J., H.S.C., and H.U.C., with 4, 3, and 3 years of clinical experience, respectively), participated as observers. The radiologists who determined the reference standard did not participate. Each reviewer conducted 2 separate assessments of all TOF-MRA studies (n = 138) across 2 sessions. The studies were randomly divided into 2 blocks (Block A and Block B, each containing 69 studies). During the first session, studies in Block A were reviewed with AI assistance, and those in Block B were reviewed without AI. To mitigate bias and memory recall effects, a compulsory 4-week washout period was placed between the sessions. After this interval, the review conditions were swapped: During the second session, studies initially reviewed with AI (Block A) were assessed without AI, and vice versa for Block B (Fig 2). The reviewers were blinded to the patient information and reference standards. They were instructed to assess whether stenosis or occlusion was present in the intracranial arteries by using a commercial PACS. During AI-assisted review, the heat map generated by the AI model was superimposed onto the MIP TOF-MRA images. The heat map images were reviewed using an additional viewer (Windows Photo Viewer, Microsoft). The participants were instructed to mark a stenotic or occlusive lesion by recording the coordinates of the central portion of the lesion. The confidence rating for each lesion ranged from 1 (very uncertain) to 5 (absolutely certain). The overall reading time was recorded for each case. During the review, the participants were encouraged to evaluate both the source and MIP TOF-MRA images.
FIG 2. A diagram illustrating the observer performance study. Each observer conducted 2 separate TOF-MRA reviews across 2 reading sessions: 1 without AI and another with AI. There was a washout period of 4 weeks or longer between these sessions.
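The crossover reading design can be sketched as follows. This is only an illustration under stated assumptions: the study identifiers, the fixed seed, and the function names are hypothetical, and the source does not describe how the random split was actually generated.

```python
import random


def assign_blocks(study_ids, seed=0):
    """Randomly split studies into two equally sized blocks, A and B.

    The seed is a hypothetical choice that makes the sketch reproducible.
    """
    ids = list(study_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"A": sorted(ids[:half]), "B": sorted(ids[half:])}


def reading_schedule(blocks):
    """Session 1: Block A with AI, Block B without. Session 2: conditions swapped."""
    return {
        1: {"with_ai": blocks["A"], "without_ai": blocks["B"]},
        2: {"with_ai": blocks["B"], "without_ai": blocks["A"]},
    }


blocks = assign_blocks(range(138))   # 138 studies, as in the study cohort
schedule = reading_schedule(blocks)  # sessions are separated by the washout period
print(len(blocks["A"]), len(blocks["B"]))
```

With this layout every study is read once under each condition by every reviewer, and no study is seen under both conditions within the same session.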
The observers’ performances were analyzed separately for the neuroradiologist and radiology resident groups and then pooled across all reviewers. An overlap between a lesion marked by the observer (indicated by the coordinates) and the reference standard (annotated across the entire length of the lesion) was classified as a true-positive finding; otherwise, the mark was classified as a false-positive finding.
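The true-positive and false-positive rule can be sketched as a simple membership test. The representation is an assumption: each reference lesion is modeled as a set of voxel coordinates covering its full length, and a mark counts as an overlap only when its coordinate falls exactly inside one of those sets. The lesion labels and coordinates below are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Voxel = Tuple[int, int, int]


def classify_marks(
    marks: List[Voxel], reference: Dict[str, Set[Voxel]]
) -> List[Tuple[Voxel, str, Optional[str]]]:
    """Label each observer mark as TP (inside a reference lesion) or FP."""
    results = []
    for mark in marks:
        # Find the first reference lesion whose annotated voxels contain the mark.
        hit = next(
            (lesion_id for lesion_id, voxels in reference.items() if mark in voxels),
            None,
        )
        results.append((mark, "TP" if hit is not None else "FP", hit))
    return results


# Hypothetical reference annotation and observer marks.
reference = {
    "left_M1": {(10, 20, 5), (11, 20, 5), (12, 21, 5)},
    "basilar": {(30, 30, 8)},
}
marks = [(11, 20, 5), (40, 40, 9)]
for row in classify_marks(marks, reference):
    print(row)
```

A real implementation might allow a distance tolerance around the annotation; the source does not specify one, so none is assumed here.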
Statistical Analysis
We performed a jackknife free-response receiver operating characteristic (JAFROC) analysis to evaluate the reviewers’ localization performance on a per-lesion basis.13 The area under the JAFROC curve (AUFROC) indicates the probability that the lesion rating marked in the diseased case is greater than the highest rating in the healthy case. We calculated the AUFROC according to AI use and performed a comparison test by using the Dorfman-Berbaum-Metz method.14 Comparisons based on AI use were calculated as fixed-reader random cases; for pooled reviewers, they were calculated as random-reader random cases. We computed the area under the receiver operating characteristic curve (AUC) to measure the performance on a per-patient basis. We calculated the average of the reviewers’ ratings per patient. Similar to JAFROC, we performed a comparison test of the AUC by using an identical method for each reviewer and pooled reviewers.
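The probabilistic definition of the AUFROC given above can be sketched as a nonparametric estimate. This is a minimal illustration, not the analysis pipeline used in the study: it assumes that unmarked lesions and healthy cases without marks receive a rating of 0 (below the 1 to 5 confidence scale), that ties count as one half, and it does not implement the Dorfman-Berbaum-Metz significance test. All ratings in the example are hypothetical.

```python
def jafroc_fom(healthy_case_marks, lesion_ratings, unmarked=0):
    """Estimate P(lesion rating > highest rating in a healthy case).

    healthy_case_marks: one list of mark ratings per healthy case (may be empty).
    lesion_ratings: one rating per reference lesion; use `unmarked` for misses.
    """
    highest = [max(marks) if marks else unmarked for marks in healthy_case_marks]
    if not highest or not lesion_ratings:
        raise ValueError("need at least one healthy case and one lesion")
    total = 0.0
    for h in highest:
        for rating in lesion_ratings:
            if rating > h:
                total += 1.0
            elif rating == h:
                total += 0.5  # ties contribute one half
    return total / (len(highest) * len(lesion_ratings))


# Three healthy cases (one unmarked) and four lesions (one missed).
healthy = [[], [2], [4, 1]]
lesions = [5, 3, 0, 4]
print(round(jafroc_fom(healthy, lesions), 4))
```

Because every lesion is compared with every healthy case, false-positive marks in healthy cases and missed lesions both lower the estimate.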
We calculated the sensitivity and specificity to measure the diagnostic accuracy of the reviewers. The sensitivity was calculated per lesion and per patient. Sensitivity per lesion was calculated at the cutoff that maximized the sum of (1 − false-positive fraction) and the lesion localization fraction. The sensitivity per patient was calculated by using an optimal cutoff to maximize the Youden J statistic.15 Specificity was calculated per patient.
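The per-patient cutoff selection can be sketched as a search over candidate thresholds. The scores, labels, and function names are hypothetical, and a rating at or above the cutoff is assumed to count as a positive call; the per-lesion cutoff would follow the same pattern with the lesion localization fraction in place of sensitivity.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one diseased and one healthy patient")
    sens = sum(s >= cutoff for s in positives) / len(positives)
    spec = sum(s < cutoff for s in negatives) / len(negatives)
    return sens, spec


def youden_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing J = sens + spec - 1."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[1]:  # the lowest cutoff wins ties
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical per-patient ratings (0 = no mark) and disease labels.
scores = [0, 1, 1, 2, 3, 4, 5, 2]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(youden_cutoff(scores, labels))
```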
We used the intraclass correlation coefficient (ICC)16 to measure interobserver agreement among the reviewers. The ICC values <0.5, 0.5–0.75, 0.75–0.9, or ≥0.9 indicated poor, moderate, good, or excellent agreement, respectively.17 We calculated the ICC based on AI use. All statistical analyses were performed by using the R statistical software version 3.6.3 (R Foundation for Statistical Computing). Statistical significance was set at P < .05.
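The agreement categories above map directly onto a small lookup. The function name is hypothetical, and assigning boundary values (exactly 0.5, 0.75, or 0.9) to the higher category is an assumption, since the stated ranges share their endpoints.

```python
def interpret_icc(icc: float) -> str:
    """Map an ICC value to the agreement category used in this study."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


# ICC values reported without and with AI.
print(interpret_icc(0.734), interpret_icc(0.752))
```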
RESULTS
Patients
Table 1 summarizes the demographic and lesion characteristics of the study cohort. The median age was 58 years (range, 28–84 years), and the male-to-female ratio was 1:1. Sixty patients in the stenosis group had 115 steno-occlusive lesions (1.92 lesions per patient). In the internal test set, steno-occlusive lesions were caused by atherosclerosis (n = 26), Moyamoya disease (n = 3), and dissection (n = 1), and in the external test set, by atherosclerosis (n = 27) and Moyamoya disease (n = 3). The patients predominantly had a single lesion, accounting for 55% of the cases (33/60). The lesion distribution did not differ between the right and left sides (51 versus 61, P = .345). We recorded 91 (79.1%) and 24 (20.9%) lesions in the anterior and posterior circulations, respectively. In the anterior circulation, they were predominantly located in the middle cerebral artery, accounting for 53.8% (49/91) of the anterior circulation lesions. No significant difference was observed between the 2 test sets in the proportions of patients scanned at 1.5T (n = 12) versus 3T (n = 48) (P = .748).
Table 1: Patient demographics and lesion characteristics
Observer Performance Assessments
Table 2 summarizes the results of the per-lesion and per-patient analyses. In the per-lesion analysis, AI-assisted review exhibited a higher pooled AUFROC (0.76; 95% CI: 0.67–0.85) than non-AI-assisted review (0.70; 95% CI: 0.56–0.81; P = .027). The pooled AUFROC for all residents increased from 0.68 (95% CI: 0.56–0.77) to 0.76 (95% CI: 0.67–0.85; P = .002), whereas that for all neuroradiologists did not demonstrate a statistically significant increase. In terms of per-patient analysis, the AUC for all reviewers improved marginally but did not reach statistical significance. Figure 3 illustrates the pooled JAFROC curves for all reviewers, neuroradiologists, and radiology residents.
FIG 3. Pooled JAFROC curves of all reviewers (A), neuroradiologists (B), and radiology residents (C). With AI, the AUFROC for all reviewers improved significantly from 0.70 to 0.76 (P = .027). Similarly, the use of AI improved the AUFROC for radiology residents from 0.68 to 0.76 (P = .002). The AUFROC for neuroradiologists did not differ significantly between the results without and with AI.
Table 2: Diagnostic performance of reviewers
Table 3 presents the results in the internal (SNUBH) and external (SSMH) test sets. Notably, the resident groups in both test sets exhibited significant differences between AUFROCs with and without AI. From a lesion-based perspective, the sensitivity of 4 reviewers (Reviewers 2, 3, 4, and 5) improved with AI (Online Supplemental Data). However, we observed a marginal reduction in the sensitivity reported by Reviewer 1 (with AI, 80.9% versus without AI, 80.0%). The Online Supplemental Data present a comparison of the lesion detection sensitivity with and without AI, based on the lesion number and circulation type. Overall, AI use led to a marginal improvement in sensitivity across the subgroups. Notably, with AI assistance, an increased number of lesions correlated with a greater increase in sensitivity. For example, in cases with more than 2 lesions, the sensitivity improved from 62.8% to 73.7% with AI. Similar trends were observed in both the anterior and posterior circulation groups. The interobserver agreement among the reviewers increased from 0.734 (95% CI: 0.670–0.792) without AI to 0.752 (95% CI: 0.693–0.805) with AI (Online Supplemental Data).
Table 3: Diagnostic performance in the internal (SNUBH) and external (SSMH) test sets
The average reading time for all 5 reviewers was longer in AI readings than in non-AI readings (71.8 ± 37.0 seconds with AI versus 63.5 ± 31.7 seconds without AI, P = .044). Specifically, AI use increased the reading time for neuroradiologists (45.4 ± 19.5 seconds with AI versus 37.6 ± 16.5 seconds without AI, P < .001). However, the reading time for radiology residents did not demonstrate a statistically significant difference (89.5 ± 49.9 seconds with AI versus 80.7 ± 43.0 seconds without AI, P = .118; Table 4).
Table 4: Results of reviewer performance test (reading time)
DISCUSSION
In this study, we assessed the impact of AI on observer performance in detecting steno-occlusive lesions in the intracranial arteries by using data from 2 separate institutions. We found that the pooled AUFROC for the 5 radiologists demonstrated improvement, increasing from 0.70 without AI to 0.76 with AI (P = .027). This improvement was particularly pronounced among radiology residents (AUFROC improved from 0.675 to 0.763, P = .002). For the neuroradiologists, there was a trend toward improvement with the AUFROC increasing from 0.726 to 0.750; however, this change was not statistically significant (P > .05). The average reading time of the 5 reviewers was slightly longer when using AI assistance than that without AI. However, this difference was not statistically significant when the analysis was limited to radiology residents.
The Stroke Outcomes and Neuroimaging of Intracranial Atherosclerosis trial suggested that TOF-MRA for evaluating intracranial artery stenosis has a relatively lower positive predictive value than DSA.18 In line with that study, our findings demonstrated low sensitivity of observers without AI assistance in detecting intracranial stenosis. DSA is the criterion standard for evaluating intracranial artery stenosis; however, it poses risks of radiation, nephrotoxicity caused by iodinated contrast agents, and thromboembolic complications. Therefore, the accurate and reliable assessment of intracranial steno-occlusive lesions by using noninvasive angiography is essential. Our results suggest that AI use can improve the accuracy with which radiologists detect intracranial steno-occlusive lesions when evaluating TOF-MRA.
The performance of the radiology residents considerably improved with the use of AI. The AUFROC achieved by radiology residents with AI assistance was comparable to that of neuroradiologists. Moreover, unlike that of the neuroradiologists, the reading time for radiology residents did not increase significantly between the reading sessions with and without the use of AI. Specifically, their average reading time for the healthy group remained unchanged, while for the stenosis group, they allocated more time when using AI. In contrast, neuroradiologists experienced increased reading times with AI for both stenosis and healthy groups. Thus, AI assistance has greater utility for relatively less experienced observers, such as radiology residents or specialists from other fields, supporting diagnostic accuracy without compromising efficiency.
Radiologists encounter diagnostic errors that include visual perception and cognitive errors.17,19 The “satisfaction of search” is a prevalent cognitive error, which signifies halting visual exploration upon identifying an initial abnormality during image interpretation.19 Without AI, the sensitivity for detecting steno-occlusive lesions decreased as the number of lesions increased. However, AI use helped maintain the sensitivity, although the difference was not statistically significant (Online Supplemental Data). Integrating AI can enhance the radiologists’ performance by sustaining vigilance.
Furthermore, we observed an improvement in the interobserver agreement among the reviewers, increasing from a moderate to a good level (ICC increased from 0.734 to 0.752). This improvement could potentially elevate the level of consensus between radiologists, thereby alleviating the interobserver variability associated with MRA in determining intracranial stenosis. Similarly, Lin et al20 demonstrated that AI use not only enhances accuracy but also reduces the interobserver variability in delineating nasopharyngeal carcinoma for radiation therapy.
In our previous study, we validated the stand-alone performance of our algorithm exclusively in patients with steno-occlusive lesions, achieving an AUC of up to 0.874 and an AUFROC of up to 0.855.10 To assess the clinical utility of our model, we conducted this observer performance study by using a study cohort including both healthy individuals and patients with steno-occlusive lesions from 2 different institutions. Despite a slightly lower performance on the external test set compared with the internal test set, the substantial improvement in residents’ performance on both test sets underscores the generalizability of our AI model. In accordance with the recently published guidelines, the evidence of this study can be classified as Level 5A, signifying a retrospective study that integrates internal and external data for the purpose of concluding performance assessment.21,22
Our study has several limitations. First, only patients in the SNUBH stenosis group underwent confirmatory DSA. The reference standard assessment for participants in the healthy group from SNUBH and all participants from SSMH was based on expert consensus. The small caliber of the intracranial arteries and the limited spatial resolution of TOF-MRA may have led to under- or overestimation of the degree of stenosis. However, the observers were instructed to grade each stenotic lesion consistently by using a 5-point Likert scale. Thus, JAFROC analyses, in which the optimal threshold was applied individually, could mitigate potential calibration issues. Second, we measured the reading time to simulate a clinical reading session; however, the need for an additional viewer to inspect the AI suggestions may have biased the time measurement. The observed increase in reading time could be a natural consequence of the additional steps required. Future studies should integrate the model into the in-hospital PACS to eliminate the need for a separate viewer and more accurately reflect the impact on reading time. Third, the scope of our model was confined to the distal ICA, the A1-2 segments of the anterior cerebral artery, the M1-2 segments of the MCA, the P1-2 segments of the posterior cerebral artery, the V4 segments of the vertebral artery, and the basilar artery. This focus was necessitated by the inherent spatial resolution limitations of TOF-MRA. Consequently, this constraint may limit the utility of our model in detecting steno-occlusive lesions in the distal branches of intracranial arteries. In addition, the relatively small sample size limits the depth of subgroup analyses, such as diagnostic performance according to vascular subsegments. A future study with a larger patient cohort may be needed to enable detailed subgroup analyses.
CONCLUSIONS
Our study suggests that the proposed AI model offers a supportive tool for radiologists, potentially enhancing the accuracy of detecting intracranial steno-occlusive lesions on TOF-MRA. Although the value for neuroradiologists may be limited, less experienced readers may benefit from this model.
Footnotes
Hunjong Lim and Dongjun Choi contributed equally to this work as co-first authors.
This research was funded by the SNUBH Research Fund (Grant No. 14-2023-0014).
Disclosure forms provided by the authors are available with the full text and PDF of this article at www.ajnr.org.
- Received January 23, 2024.
- Accepted after revision April 21, 2024.
- © 2024 by American Journal of Neuroradiology