Abstract
BACKGROUND AND PURPOSE: ASPECTS is a long-standing and well-documented selection criterion for acute ischemic stroke treatment; however, the interpretation of ASPECTS is a challenging and time-consuming task for physicians, with notable interobserver variability. We conducted a multireader, multicase study in which readers assessed ASPECTS without and with the support of a deep learning (DL)-based algorithm to analyze the impact of the software on clinicians’ performance and interpretation time.
MATERIALS AND METHODS: A total of 200 NCCT scans from 5 clinical sites (27 scanner models, 4 different vendors) were retrospectively collected. The reference standard was established through the consensus of 3 expert neuroradiologists who had access to baseline CTA and CTP data. Subsequently, 8 additional clinicians (4 typical ASPECTS readers and 4 senior neuroradiologists) analyzed the NCCT scans without and with the assistance of CINA-ASPECTS (Avicenna.AI), a DL-based, FDA-cleared, and CE-marked algorithm designed to compute ASPECTS automatically. Differences were evaluated in both performance and interpretation time between the assisted and unassisted assessments.
RESULTS: With software aid, readers demonstrated increased region-based accuracy from 72.4% to 76.5% (P < .05) and increased receiver operating characteristic area under the curve (ROC AUC) from 0.749 to 0.788 (P < .05). Notably, all readers exhibited an improved ROC AUC when utilizing the software. Moreover, the use of the algorithm improved the score-based interobserver reliability and correlation coefficient of ASPECTS evaluation by 0.222 and 0.087 (P < .0001), respectively. Additionally, the readers’ mean time spent analyzing a case was significantly reduced by 6% (P < .05) when aided by the algorithm.
CONCLUSIONS: With the assistance of the algorithm, readers’ analyses were not only more accurate but also faster. Additionally, the overall ASPECTS evaluation exhibited greater consistency, fewer variabilities, and higher precision compared with the reference standard. This novel tool has the potential to enhance patient selection for appropriate treatment by enabling physicians to deliver accurate and timely diagnoses of acute ischemic stroke.
ABBREVIATIONS:
- AI = artificial intelligence
- DL = deep learning
- EIC = early ischemic changes
- ICC = intraclass correlation coefficient
- IS = ischemic stroke
- ROC AUC = receiver operating characteristic area under the curve
- SD = standard deviation
SUMMARY
PREVIOUS LITERATURE:
Stroke remains the second leading cause of death globally, with IS being the most common type. ASPECTS helps quantify early ischemic changes on noncontrast CT scans and guides treatment decisions, especially for endovascular therapy. However, ASPECTS interpretation is challenging, with variability depending on experience and other factors. Recent machine learning algorithms aim to improve ASPECTS accuracy and speed, but their clinical impact remains understudied. This multireader, multicase study evaluates the effect of a deep learning–based ASPECTS tool on clinicians’ accuracy and interpretation time in near-real-world settings.
KEY FINDINGS:
A total of 200 stroke cases were analyzed. AI-assisted interpretation of ASPECTS improved readers’ accuracy by 4.1%, interrater reliability (intraclass correlation coefficient of 0.689 versus 0.467), and correlation with the reference standard. AI also reduced interpretation time by 6% (6.7 seconds) and helped accurately guide treatment decisions in 13% of cases.
KNOWLEDGE ADVANCEMENT:
Unlike previous studies, the current results are consistent across various readers, ASPECTS regions, and data subcategories, highlighting the tool’s potential to standardize stroke assessments and accelerate clinical decision-making. When taking into account all the findings together, software assistance has the potential to provide better diagnoses and improve patient outcomes.
Despite significant improvements in primary prevention and treatment in recent decades, stroke remains the second leading cause of death worldwide.1 Ischemic stroke (IS) is the most frequent type of stroke, accounting for 62.4% of all stroke cases worldwide in 2019.2 Ischemic infarcts most commonly arise from occlusion of the proximal large arterial vasculature including the MCA and/or ICA, which account for 10%–46% of all acute IS.3 Small vessel occlusions or lacunar strokes account for 20%–25% of IS cases.3,4 Recent projections suggest that the incidence of IS will continue to increase between 2020 and 2030.5
To improve stroke imaging triage and guide treatment, the ASPECTS was created as a semiquantitative visual grading system to estimate the extent and distribution of early ischemic changes (EIC) on NCCT.6,7 To calculate ASPECTS, the MCA vascular territory is divided into 10 regions and 1 point is subtracted for each region where parenchymal hypoattenuation reflecting EIC is present.7,8 ASPECTS is commonly used for patient selection for endovascular treatment. Patients with ASPECTS ≥6 are prioritized for thrombectomy treatment because it is associated with better patient outcomes and reduced risk of hemorrhagic conversion.9,10
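The scoring rule described above can be made concrete with a short sketch. This is purely illustrative (not the commercial algorithm); the region labels follow the standard 10-region MCA-territory scheme, and the function name is hypothetical:

```python
# Illustrative sketch of ASPECTS scoring: start from 10 and subtract
# 1 point for each of the 10 MCA-territory regions showing early
# ischemic changes (EIC).

ASPECTS_REGIONS = ["C", "L", "IC", "I", "M1", "M2", "M3", "M4", "M5", "M6"]

def compute_aspects(eic_positive_regions):
    """Return the ASPECT score given the set of regions positive for EIC."""
    unknown = set(eic_positive_regions) - set(ASPECTS_REGIONS)
    if unknown:
        raise ValueError(f"Unknown ASPECTS region(s): {sorted(unknown)}")
    return 10 - len(set(eic_positive_regions))

# Six regions with EIC (insula plus M2-M6) yields a score of 4.
score = compute_aspects(["I", "M2", "M3", "M4", "M5", "M6"])
eligible_for_thrombectomy = score >= 6  # common endovascular cutoff
```

With this cutoff convention, a score of 4 falls below the threshold commonly used to prioritize thrombectomy.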
ASPECTS is now a well-documented and widely accepted patient selection criterion for mechanical thrombectomy and an accurate predictor of long-term functional outcomes.11 However, the interpretation of ASPECTS remains challenging and time-consuming, even for stroke experts.12 Intra- and interobserver variability varies greatly with experience, level of training, knowledge of stroke symptoms, and time between onset and imaging.13-16 Poor image quality, motion artifacts, or head tilt may also cause errors.17 Previous studies have shown that clinical experts tend to evaluate ASPECTS with high specificity (80.9%–99.0%) but low or moderate sensitivity (10.2%–75.0%), leading to overall mixed accuracy performance.14,18-21
Recently, machine learning algorithms have been developed to assist clinicians in the analysis of ASPECTS to provide a more accurate and faster diagnosis. Because of their recent commercialization, there is little literature evaluating the impact of automated ASPECTS algorithms on patients’ clinical outcomes.22 Nevertheless, diverse studies evaluating other machine learning algorithms applied to IS, such as large vessel occlusion detection, demonstrated their positive impact on patient outcomes.23-25 Separately, several diagnostic studies evaluated the stand-alone performances of ASPECTS algorithms, with accuracies ranging from 66.0% to 96.0%.18-21,26-29 However, few multireader studies assessed the effect of algorithm usage on clinical interpretation accuracy and interobserver agreement.30-33 Furthermore, the effect of machine learning algorithms on interpretation time remains poorly understood. This evaluation is critical because imaging interpretation time and speed of triage in the context of stroke are of utmost importance.8 Given these limitations, we conducted a multireader, multicase study in which readers graded ASPECTS without and with the assistance of a CE-marked and FDA-cleared automated algorithm, with the overall primary objective of evaluating the effect of algorithm usage on clinicians’ accuracy, interobserver variability, and mean interpretation time per NCCT scan. We aimed to reproduce realistic clinical routine settings and compare clinicians’ ASPECTS assessments with and without the assistance of deep learning (DL)-based software on real-world clinical data.
MATERIALS AND METHODS
Data Collection
We retrospectively collected images from acute IS code patients with suspected MCA and/or ICA occlusion from 5 different external clinical sources (4 from the United States acquired between June 2018 and June 2022, and 1 from France acquired between January 2020 and December 2022). A waiver of consent was obtained from the Western Institutional Review Board for all cases. Informed consent for participation was not required for this study in accordance with the national legislation and institutional requirements. Inclusion criteria were patients more than 21 years old who underwent baseline NCCT, CTA, and CTP for acute stroke diagnosis. Time between baseline NCCT and CTP was required to be less than 1 hour, and time from stroke onset/last known well to baseline CT scan (NCCT and CTA) was less than 12 hours. An ICA and/or MCA occlusion was visually confirmed on source CTA images for all included cases by the US board-certified expert neuroradiologists who established the reference standard. Although ASPECTS is calculated from the MCA regions, several authors now also use it to quantify the extent of infarction from ICA occlusions.34-36 Moreover, patients with diffuse parenchymal abnormalities precluding evaluation of ASPECTS were excluded from analysis by the experts (eg, intracranial hemorrhage and/or bilateral IS, large craniotomy with brain herniation, severe image artifacts impeding the CT interpretation). Images were acquired on 27 different scanner models (11 from GE Healthcare, 3 from Philips Healthcare, 10 from Siemens Healthineers, and 3 from Canon Medical Systems) with a slice thickness ≤2.5 mm.
Reference Standard
Two US board-certified expert neuroradiologists with 7 and 28 years of experience, respectively, performed the visual assessment to determine the infarcted ASPECTS regions on the NCCT series to establish the reference standard. Baseline CTA and CTP were additionally provided to assist the experts in the ASPECTS analysis. In cases of discrepancy between the 2 primary expert reviewers, a third expert US board-certified neuroradiologist with 10 years of experience was recruited to establish the reference standard by majority agreement.
First, the experts confirmed the presence of a unilateral occlusion within the MCA and/or ICA based on the CTA. Second, the laterality of the infarct was determined (left or right brain hemisphere). Finally, the presence or absence of EIC in each of the 10 ASPECTS regions was defined within the infarcted hemisphere for ASPECTS characterization. Baseline NCCT series, in conjunction with the CTP hemodynamic maps, were used by the experts to decide on the presence or absence of EIC in each region. CTP can be more sensitive than NCCT for identifying lesions; however, discrepancies between the 2 examinations can arise from evolution and growth of the infarct core between the NCCT and CTP acquisitions. Hence, for each case, the time between baseline NCCT and CTP acquisitions was provided to the experts to guide their decisions.
DL-Based Tool
The impact of CINA-ASPECTS (Avicenna.AI) on physicians’ interpretations was evaluated in this study. The algorithm is implemented as a series of convolutional neural network DL models. The application is composed of a hybrid 3D/2D UNet network with a regression loss function and a 4-stage 2D UNet network for anatomic localization.31,38 First, 3D reorientation and tilt correction are applied to create a uniform standardized field-of-view. Next, a landmark-based DL nonlinear registration algorithm is used to identify the ASPECTS regions within the brain. Finally, a separate DL model estimates the degree of EIC as a probability map throughout the brain regions. Based on the outputs of the previous models, a final algorithm calculates a composite ASPECT score.
Visually, the trained algorithm is designed to produce a heat map that may be overlaid on the NCCT series with the 20 ASPECTS regions outlined in either green (negative for EIC) or red (positive for EIC). The algorithm also displays a table summarizing the average Hounsfield units in each ASPECTS region within the areas of infarct and the final ASPECT score (Fig 1). The algorithm interface allows the user to modify results and requires expert confirmation before archiving in PACS.
Example of the DL-based algorithm outputs. In this case, the user confirmed an occlusion on the right side, and the algorithm detected IC, L, I, M2–M6 regions with EIC, leading to an ASPECT score of 2.
For the training phase, 1575 patient examinations were used to develop the 3D reorientation and tilt correction algorithm. Next, 522 patient examinations were used to develop both the landmark-based DL algorithm to identify ASPECTS regions within the brain and the algorithm for estimating ischemic change probability. Training data were aggregated from several US clinical centers with diversity across scanner vendor (Canon Medical Systems, GE Healthcare, Philips Healthcare, Siemens Healthineers), patient age, slice thickness, and kVp. The testing phase was performed on 139 patient examinations, yielding a region-based sensitivity of 76.6% (95% CI: 72.4%–81.1%), specificity of 88.7% (95% CI: 87.4%–89.9%), and receiver operating characteristic area under the curve (ROC AUC) of 0.826 (P < .0001).39 Further detailed technical information about the design, training, and testing of this commercially available algorithm is not disclosed publicly.
Multireader, Multicase Study
A retrospective, concurrent, crossover, fully-crossed, multireader, multicase study (level 4 of DL evidence) was conducted to evaluate the impact of the DL-based tool on readers’ assessments with respect to 3 objectives:
The readers’ region-based accuracy and ROC AUC against the reference standard,
The score-based interobserver variability and linear correlation with the reference standard,
The interpretation time per NCCT scan.
Eight additional readers, different from the ones who established the reference standard, were involved in the multireader, multicase study. Four readers are typical readers who see stroke patients regularly in their practice (reader 1 is a neurointensivist with 12 years of experience, reader 2 is a vascular neurologist with 8 years of experience, reader 3 is a stroke neurologist with 5 years of experience, and reader 4 is a general radiologist with 13 years of experience) and 4 readers are expert senior neuroradiologists with 6, 8, 9, and 12 years of experience, respectively (readers 5–8).
The readers analyzed each NCCT scan twice, once without CINA-ASPECTS (unaided arm) and once with the aid of CINA-ASPECTS (aided arm). In the first reading session, one-half of the NCCTs were randomly selected for analysis without the software, while the remaining one-half were selected for analysis with the software. In the second reading session, which occurred after a 4-week washout designed to limit potential recall bias, each reader reviewed the same images but with software usage (unaided versus aided) reversed. All images were presented in random order during both sessions for each reader. Thus, across the 2 sessions, every reader analyzed each case twice, once with and once without software assistance.
For each case, the infarcted side previously defined by the reference standard was communicated to the readers. Thus, during each session, readers were asked to grade the presence or absence of EIC in each of the 10 ASPECTS regions only within the infarcted hemisphere by using the following 6-point ordinal scale: definitely not infarcted, 1; probably not infarcted, 2; possibly not infarcted, 3; possibly infarcted, 4; probably infarcted, 5; and definitely infarcted, 6. In addition, the time from initial scan review to final ASPECTS diagnosis (interpretation time) was recorded for each case across all readers during both sessions.
Statistical Analysis
An initial sample size calculation was carried out with nQuery (v8.7.2.0, Dotmatics) by using a 1-way repeated measures ANOVA. To obtain a statistically significant difference between aided and unaided accuracy, at least 200 matched pairs were determined to be needed, assuming a statistical significance of α = .05, power of 1 – β = 0.80, standard deviation (SD) of 25%, and r = 0.50.
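As a rough cross-check of this calculation, a paired z-approximation under the same assumptions (per-arm SD of 25%, aided/unaided correlation r = 0.50) gives a similar matched-pair count. This is only a sketch of the closely related paired-comparison formula, not the repeated-measures ANOVA procedure used in nQuery:

```python
from math import ceil, sqrt
from statistics import NormalDist

# Assumptions taken from the study's sample size calculation.
alpha, power = 0.05, 0.80
sd, r = 0.25, 0.50                  # per-arm SD and aided/unaided correlation
sd_diff = sd * sqrt(2 * (1 - r))    # SD of the paired accuracy differences

z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
z_b = NormalDist().inv_cdf(power)          # power quantile

def pairs_needed(delta):
    """Matched pairs needed to detect a mean accuracy difference `delta`."""
    return ceil(((z_a + z_b) * sd_diff / delta) ** 2)

print(pairs_needed(0.05))  # 197: a 5-point difference needs ~200 pairs
```

Under these assumptions, roughly 200 pairs suffice to detect an accuracy difference of about 5 percentage points, consistent with the planned sample size.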
Interrater agreement for the first 2 experts who established the reference standard was calculated by using Cohen κ.40 For the reader study evaluation, first, we calculated the impact of the algorithm on readers’ region-based accuracy (aided versus unaided) based on the percent of ASPECTS regions matching the reference standard. For this analysis, a threshold of >3 was used to binarize reader scores as positive for EIC (possibly infarcted, 4; probably infarcted, 5; and definitely infarcted, 6 were considered as positive assessments). Furthermore, to account for the potential dependence between ASPECTS regions within the same case (10 regions within a brain hemisphere are potentially correlated), bootstrap methodology was used to estimate 95% CI.41 Statistically significant differences between aided and unaided accuracy were evaluated with the McNemar test for difference of paired proportions.
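The region-based accuracy, case-level bootstrap CI, and McNemar statistic described above could be computed roughly as follows. The data here are simulated and the variable names are hypothetical; the study's actual pipeline is not public:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-region data: 1 = EIC present. A reference standard and
# one reader's 6-point ratings for 2000 regions (200 cases x 10 regions).
reference = rng.integers(0, 2, size=2000)
ratings = rng.integers(1, 7, size=2000)

# Binarize the ordinal scale: ratings > 3 (possibly/probably/definitely
# infarcted) count as positive for EIC, as in the study.
reader_pos = ratings > 3
accuracy = np.mean(reader_pos == reference)

# Case-level bootstrap 95% CI: resample whole cases (blocks of 10 regions)
# to respect the correlation among regions within a hemisphere.
cases_ref = reference.reshape(200, 10)
cases_read = reader_pos.reshape(200, 10)
boot = []
for _ in range(2000):
    idx = rng.integers(0, 200, size=200)
    boot.append(np.mean(cases_read[idx] == cases_ref[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

# The McNemar statistic for paired aided-vs-unaided proportions depends
# only on the discordant counts b (unaided correct, aided wrong) and c
# (the reverse); continuity-corrected chi-square form shown here.
def mcnemar_chi2(b, c):
    return (abs(b - c) - 1) ** 2 / (b + c)
```

Resampling cases rather than individual regions is the key detail: it keeps each hemisphere's 10 correlated regions together in every bootstrap replicate.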
Second, a region-based ROC AUC analysis was performed by using the readers’ 6-point ordinal rating scale following the Obuchowski-Rockette ANOVA method for factorial study design combined with bootstrapping methodology for covariance estimation.42,43 The analysis was conducted by using the MRMCaov R package.44
Next, for score-based analysis, the intraclass correlation coefficient (ICC) between readers was computed for both arms. The absolute agreement was assessed by using a 2-way mixed-effects model to test interrater reliability.45 Furthermore, a regression analysis comparing the readers’ ASPECT score for both arms versus the reference standard was performed, and the Pearson correlation coefficient was calculated. In addition, a dichotomized analysis using the endovascular selection cutoff point of ASPECTS ≥6 was performed. The ASPECTS values attributed by the readers and the reference standard were dichotomized in scores ≥6, and Cohen κ was calculated for both arms. We also evaluated the percentage of cases in which the cutoff point classification improved with software assistance. A paired test for comparison of correlation coefficients was used to calculate statistical significance.
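Two of the score-based measures above, the Pearson correlation and the dichotomized Cohen κ, can be sketched compactly. The scores below are simulated for illustration; the ICC's 2-way mixed-effects model is omitted here as it requires a dedicated variance-components fit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical case-level ASPECT scores (0-10) for one reader and the
# reference standard; the reader's scores deviate by a few points.
reference = rng.integers(0, 11, size=200)
reader = np.clip(reference + rng.integers(-2, 3, size=200), 0, 10)

# Pearson correlation between reader scores and the reference standard.
r = np.corrcoef(reader, reference)[0, 1]

# Cohen kappa on scores dichotomized at the endovascular cutoff (>= 6).
def cohen_kappa(a, b):
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)                 # observed agreement
    p_pos = np.mean(a) * np.mean(b)       # chance agreement, both positive
    p_neg = np.mean(~a) * np.mean(~b)     # chance agreement, both negative
    p_e = p_pos + p_neg
    return (p_o - p_e) / (1 - p_e)

kappa = cohen_kappa(reader >= 6, reference >= 6)
```

Dichotomizing at ASPECTS ≥6 before computing κ mirrors the treatment-selection question: agreement on thrombectomy eligibility rather than on the exact score.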
Finally, differences in average interpretation time per NCCT scan (aided versus unaided) were evaluated by using a mixed-effects repeated measures model. These analyses were conducted with MedCalc (v20.015, MedCalc Software).
RESULTS
A total of 226 cases met the inclusion criteria; 149 cases were provided by 4 different US clinical sources and 77 by 1 French clinical source. After initial review by the US board-certified expert neuroradiologists who established the reference standard, 26 cases were excluded because of presence of intracranial hemorrhage (n = 1), absence of ICA or MCA occlusion (n = 24), and image artifact degradation (n = 1). Finally, 200 cases (133 from the US and 67 from France) were included for analysis, spanning 2000 ASPECTS regions. The final cohort demonstrated a mean age of 70.2 ± 14.6 [SD] years, 44.5% women, and a mean time from stroke onset to NCCT of 3.9 ± 2.9 [SD] hours. There were no missing data.
The first US board-certified neuroradiologist assessed 791/2000 regions as being positive, whereas the second US board-certified neuroradiologist defined 725/2000 regions as being positive. Disagreements were observed between both operators for 328/2000 (16.4%) regions, yielding a moderate interrater agreement of 0.65 [95% CI: 0.62–0.69] according to Cohen κ.40 After consensus, median ASPECTS was 6 and 39.6% of regions were defined with EIC.
Region-Based Analysis
The overall readers’ region-based accuracy in the aided arm was higher than in the unaided arm: 76.5% (95% CI: 75.8%–77.1%) versus 72.4% (95% CI: 71.6%–73.0%). The difference between the aided and unaided arm was 4.1% (95% CI: 3.3%–4.9%) and statistically significant (P < .05). Stand-alone software accuracy was 76.1% (95% CI: 74.3%–78.0%). Additional subgroup analyses based on ASPECTS grouped regions, readers’ expertise, and scanner manufacturers are shown in Table 1. One example of a typical reader’s aided and unaided assessment is shown in Fig 2.
Example of a typical reader’s aided and unaided assessment. A and B, Raw NCCT images. C and D, AI-based outputs. Expert consensus defined an ASPECTS of 4 with EIC present in the insula and M2–M6. The software detected an ASPECT score of 5 with EIC suspected in M2–M6. Initially, without software assistance, the reader identified EIC within the M4 and M5 regions (ASPECTS of 8), leading to an accuracy of 60%. When assisted by the software, the reader identified EIC within the insula, M2, M4, M5, and M6 (ASPECTS of 5), leading to an accuracy of 90%.
Mean readers’ aided and unaided accuracy (95% CI) for each subgroup. Sample size (n) is specified for each category
Comparison of the overall ROC AUC with and without the support of the artificial intelligence (AI) tool yielded a statistically significant improvement of 0.039 (95% CI: 0.019–0.059, P < .05), from 0.749 (95% CI: 0.712–0.785) in the unaided arm to 0.788 (95% CI: 0.762–0.814) in the aided arm. Stand-alone software ROC AUC was 0.751 (95% CI: 0.728–0.773). Analysis stratified by individual reader showed an increase in ROC AUC for all readers when assisted by the software, as shown in Fig 3.
Per reader ROC AUC for the aided (blue line) and unaided (red line) arm. Readers 1–4 are typical readers and readers 5–8 are senior neuroradiologists.
Score-Based Analysis
Score-based analyses focus only on global ASPECT score assessments without considering the specific ASPECTS regions with EIC. ICC was used to measure interreader agreement across individual ASPECT scores. A poor ICC (0.467) was observed in the unaided arm, whereas a moderate ICC (0.689) was observed in the aided arm (P value < .0001), suggesting that interrater reliability was significantly improved when using the AI tool.
In addition, the Pearson correlation coefficient was computed to evaluate the linear relationship between the readers’ ASPECT score and the reference standard. The coefficient in the aided arm (r = 0.674) was statistically higher than in the unaided arm (r = 0.587, P value < .0001), indicating that scores were significantly more correlated with the reference standard when assisted by the device. All the results are shown in Table 2.
Score-based analyses (95% CI) for unaided and aided arms
Regarding the dichotomized analysis, Cohen κ was computed for each arm by dichotomizing scores according to ASPECTS ≥6. For the unaided arm, the value was 0.476, whereas for the aided arm the value was 0.523 (P value = .0766). On average, for 13% (26/200) of cases, the software helped readers to distinguish accurately between 0–5 scores and 6–10 scores. In other words, with software assistance, the readers were able to make an accurate decision on whether an endovascular treatment should be initiated or not for 26 patients, whereas without software assistance, the indicated treatment decision was incorrect. Among the 26 cases, 10 had an ASPECTS between 0 and 5, and 16 cases had an ASPECTS ≥6.
Interpretation Time Analysis
The mean interpretation time per case among all readers was significantly reduced when assisted by the software. In the aided arm the mean was 108.5 seconds (95% CI: 103.3–114.8), whereas in the unaided arm, the mean was 115.2 seconds (95% CI: 110.7–118.7). This led to a statistically significant difference of −6.7 seconds (95% CI: −13.2 to −0.1, P < .05), representing a 6% reduction in interpretation time.
DISCUSSION
This study offers a thorough characterization of the effect of a DL-based tool in ASPECTS assessments conducted by stroke clinicians on external data. The results demonstrate a significant enhancement in both region-based and score-based performance, alongside reductions in per-case interpretation time. Conducted across multiple centers, scanners, and countries, this reader study benefits from a diverse data set encompassing various imaging parameters and patient profiles. Moreover, it engaged a panel of readers representing many of the subspecialties and expertise levels encountered in real-world stroke practice. Therefore, these results suggest robust generalization across a wide spectrum of real-world scenarios.
When assisted by the software, the region-based performance of all readers was significantly increased, by 4.1% in accuracy and 0.039 in ROC AUC. Notably, the improvement in accuracy was statistically significant among all subgroups, indicating that the algorithm provides meaningful assistance regardless of the anatomic location of EIC or the scanner used for the CT acquisition. Furthermore, despite variation in expertise and subspecialty, all 8 readers exhibited an increase in ROC AUC, demonstrating that the DL-based tool yields improved performance across various clinical experts. Importantly, the software’s user interface did not distract the readers, who affirmed that their user experience was seamless and intuitive.
A previous reader study evaluating a different automated tool for ASPECTS reported an overall improvement in accuracy of 4.2% for a panel of 8 readers and 50 CT scans; however, the performance of 2 expert readers was not improved when assisted by the software.31 Another study with 16 readers evaluating a different tool reported an overall improvement of 5.1% on 202 CT scans, but demonstrated no global improvement in 2 of the cortical ASPECTS regions.30 Similarly, other authors observed a statistically significant difference in ROC AUC of 0.02 but a very small and not significant difference in accuracy (1%) for a cohort of 54 cases assessed by 10 readers.32 By contrast, our results demonstrate a consistent improvement in performance among all types of readers, ASPECTS regions, and several data subcategories. Moreover, a 4.1% increase in performance corresponds to almost one-half an ASPECTS region per scan, which might potentially shift endovascular treatment choice toward the correct decision.
From a clinical decision-making perspective, the final global ASPECT score may be more relevant than the exact anatomic distribution of EIC, as the overall composite score is used for patient selection and therapeutic triage. Indeed, high intra- and interobserver variation has been reported for conventional ASPECTS assessments, indicating that reproducibility and repeatability remain challenging for clinicians.13-16 In this study, both the score-based ICC and Pearson correlation coefficient were statistically higher in the aided arm, indicating that readers agreed more with each other (better interrater reliability) and that ASPECTS evaluation more closely aligned with expert consensus when assisted by the software. A similar result for increased ICC from software assistance was presented by Brinjikji et al,30 who reported an increase from fair (0.48) to good agreement (0.68, P value < .01) with aided interpretation. Regarding the correlation of readers’ ASPECT scores with the reference standard, a similar study observed a significant increase in the weighted κ for 3 out of 5 readers.33 Indeed, since ASPECTS presents several interrater discrepancies, all these results suggest that an improvement in both readers’ accuracy and interrater agreement may enhance clinicians’ confidence, leading to more consistent and reliable decisions.
On the other hand, clinical decision-making by using the endovascular cutoff point was also improved in the aided arm. Even if the difference in Cohen κ was not statistically significant, there were still 13% of patients who could have benefited from a better treatment decision with software assistance. Notably, more than one-half of these patients (62%) had ASPECTS ≥6, indicating that without the help of the software, they might have been inappropriately excluded from thrombectomy treatment. Similar results were observed by Lambert et al,33 who did identify an improvement in Cohen κ values for readers’ dichotomized analyses in the aided arm, but the difference was not statistically significant. Hence, our results suggest that automated tools have the ability to reduce discrepancies, improve reliability, and yield more objective criteria for patient selection.
In the context of IS, the notion that “time is brain” arises from the fact that any delay in intervention yields serious clinical consequences.46 For every second of untreated acute stroke, a patient loses nearly 32,000 neurons, 230 million synapses, 200 m of myelinated fibers, and the equivalent of 1.7 hours of healthy life.46-48 Prompt treatment is, therefore, essential for good patient outcomes. This study demonstrates that the adjunctive use of the software leads to a statistically significant reduction in interpretation time. Even though 6.7 seconds is relatively short compared with the total door-in-door-out cycle, based on “time is brain” quantification, the current estimated gain of time may provide an additional one-half of a day (11.4 hours) of healthy life. To the best of our knowledge, this is the first study analyzing the direct impact of software assistance on the speed of ASPECTS evaluation. Since acute IS treatment is highly time-sensitive, even this small improvement in interpretation time can enhance the overall efficiency of the clinical workflow, allowing clinicians to manage more patients effectively, particularly in busy stroke centers.
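The "healthy life" figure quoted above follows directly from the cited per-second loss rates; a back-of-envelope check:

```python
# Arithmetic check of the "time is brain" estimate: multiply the mean
# interpretation-time reduction by the cited healthy-life loss per second
# of untreated stroke (per the references quoted in the text).
healthy_life_hours_per_s = 1.7   # hours of healthy life lost per second
time_saved_s = 6.7               # mean reduction in interpretation time

healthy_life_gained_h = time_saved_s * healthy_life_hours_per_s
print(round(healthy_life_gained_h, 1))  # 11.4 hours, about half a day
```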
Our study presents several limitations. First, all cases were collected retrospectively, which introduces potential selection bias. Second, although this study utilized CTP and CTA to increase the objectivity of the expert consensus, a more accurate reference standard could have been obtained with DWI acquired immediately after NCCT; however, the infrequency of its utilization would have severely limited the inclusion criteria for data selection. Third, although a total of 8 readers participated in this study, inclusion of a larger reader cohort would help to improve the generalizability of the findings. Finally, the study tried to reproduce the clinical conditions of ASPECTS assessments as closely as possible; however, future prospective studies are needed to confirm the downstream impact on patient outcomes.
CONCLUSIONS
This study demonstrated that readers’ analyses were not only more accurate but faster with the help of the algorithm. Furthermore, software score-based assisted interpretation yielded overall increased interreader consistency with less individual variability and improved correlation with expert consensus. When taking into account all these findings together, software assistance has the potential to provide better diagnoses and improve patient outcomes. Importantly, the value of this technology lies not only in its ability to compute ASPECTS accurately but also to empower users to interpret the ASPECTS regions heat map, analyze results, and determine the final ASPECT score based on their own clinical judgment and expertise. Indeed, multidisciplinary neuroradiologic and neurologic expertise will always be required for IS diagnosis; however, DL-based algorithms may facilitate decision-making, early treatment, and ultimately improved patient outcomes.
Acknowledgments
The authors thank all 8 readers who participated in the study and the clinical centers that provided the data. They also thank Laurent Turek for his assistance with data processing.
Footnotes
Disclosure forms provided by the authors are available with the full text and PDF of this article at www.ajnr.org.
- Received May 30, 2024.
- Accepted after revision September 4, 2024.
- © 2025 by American Journal of Neuroradiology