Abstract
BACKGROUND AND PURPOSE: Stereotactic radiosurgery is a key treatment modality for cerebral AVMs, particularly for small lesions and those located in eloquent brain regions. Predicting obliteration remains challenging due to evolving treatment paradigms and complex AVM presentations. With digital subtraction angiography (DSA) being the gold standard for outcome evaluation, radiomic approaches offer potential for more objective and detailed analysis. We aimed to develop machine learning modeling using DSA quantitative features for post-SRS obliteration prediction.
MATERIALS AND METHODS: A prospective registry of patients with cerebral AVMs was screened to include patients with digital prestereotactic radiosurgery DSA. Anterior-posterior and lateral views were retrieved and manually segmented. Quantitative features were computed from the lesion ROI. Following feature selection, machine learning models were developed to predict unsuccessful 2-year total obliteration using processed radiomics features in comparison with clinical and radiosurgical features. Models were evaluated through area under the receiver operating characteristic curve (AUROC), accuracy, area under the precision-recall curve, F1, recall, and precision, and the best-performing model predictions on the test set were interpreted using the Shapley additive explanations approach.
RESULTS: DSA images of 100 included patients were retrieved and analyzed. The best-performing clinical radiosurgical model was a gradient boosting classifier with an AUROC of 68% and a recall of 67%. When we used radiomics variables as input, the AdaBoost classifier had the best evaluation metrics with an AUROC of 79% and a recall of 75%. The most important clinico-radiosurgical features, ranked by model contribution, were lesion volume, patient age, treatment dose rate, the presence of seizure at presentation, and prior resection. The most important ranked radiomics features were the following: gray-level size zone matrix gray-level nonuniformity, kurtosis, sphericity, skewness, and gray-level dependence matrix dependence nonuniformity.
CONCLUSIONS: The combination of radiomics with machine learning is a promising approach for predicting cerebral AVM obliteration status following stereotactic radiosurgery. DSA could enhance prognostication of stereotactic radiosurgery–treated AVMs due to its high spatial resolution. Model interpretation is essential for building transparent models and establishing clinically valid radiomic signatures.
ABBREVIATIONS:
- AUROC: area under the receiver operating characteristic curve
- BED: biologic effective dose
- BOT: beam-on time
- GLDM: gray-level dependence matrix
- GLSZM: gray-level size zone matrix
- ML: machine learning
- SHAP: Shapley additive explanations
- SRS: stereotactic radiosurgery
- TDR: treatment dose rate
Stereotactic radiosurgery (SRS) is essential in the management of AVMs, the most prevalent vascular malformations of the brain, and it is particularly valuable in small AVMs and the ones juxtaposed to eloquent cortical regions. With a prolonged therapeutic course and expanding SRS applications covering complex AVMs, obliteration rates have shifted recently,1,2 making their prediction increasingly challenging. In addition to the important influence dosimetry has on the prognosis of AVMs following SRS,3-5 the duration of intermittent treatment is also tightly connected with ensuring conformal therapy.6-8 The biologic effective dose (BED) measure has shown a notable association with tissue survival,7,8 incorporating both radiation dosage and beam-on time (BOT). BED was found to be predictive of AVM obliteration following single-session SRS.9
Currently, a definitive diagnosis of a brain AVM is typically established using DSA, which is considered the reference standard for evaluating cerebral AVMs in SRS planning and obliteration follow-up, owing to its high spatiotemporal resolution and accurate reflection of the detailed lesion angioarchitecture.10-12
Radiomics-based applications, which use image processing and quantification of radiologic phenotypic lesion traits, are increasingly being used in the realm of precision medicine for diagnostic and prognostic tool development,13-16 with a potential in biomarker analysis and clinical decision assistance,17-19 overcoming some constraints inherent in subjective visual evaluation.20 These techniques, capable of extracting molecular and pathophysiologic process data often imperceptible to the human eye, offer advantages over subjective visual evaluation.21,22 The approach computes shape and textural information using spatial distribution of signal intensities and pixel interrelations, determined through mathematical formulas, thereby reducing subjective interreader variability and providing a good foundation for interpretable machine learning (ML) applications.23,24
Providing individualized predictions of patient prognosis following SRS has valuable potential that is integral to the future of management of neurologic diseases. Numerous classical scoring systems have been developed to help clinicians better anticipate patient outcomes following radiosurgical management of brain AVMs.25-32 In the present study, we aimed to develop an ML predictive approach to model extracted DSA radiomics features for brain AVM obliteration prediction following radiosurgery in comparison with clinical and radiosurgical predictors found in classical established scoring systems.
MATERIALS AND METHODS
Patients
This study was conducted through retrospective examination of an SRS cohort of 527 patients with cerebral AVMs between 1990 and 2014, registered prospectively. The scope was limited to patients who underwent single-session SRS using the Leksell Gamma Knife (Elekta) for sporadic AVMs, having baseline imaging and at least 2 years of angiography or MR imaging follow-up. A summary of the inclusion process is shown in Fig 1.9 In total, of the eligible 352 patients, 100 patients had digitally retrievable angiography images and were included in the final analysis. The patients’ clinical and radiosurgical characteristics were included, with the primary outcome being total AVM obliteration, defined as the lack of flow voids on MR imaging or the absence of aberrant arteriovenous shunting on angiography. A minimum of 2 years for imaging follow-up with either DSA or MR imaging was chosen to reflect the minimum expected time for post-SRS AVM obliteration. The study was approved by the Mayo Clinic institutional review board.
Fig 1. Flow diagram of patient inclusion and exclusion.
Clinical and Radiosurgical Features
Clinical and lesion characteristics included age, female sex, bleeding and/or seizure at presentation, lesion diameter, lesion volume, location, rupture status, prior resection, prior embolization, deep location, size, eloquent location, deep vein drainage, and the Spetzler-Martin grading scale. Radiosurgical features included the following: BED, maximum dose, margin dose, treatment time, treatment dose rate (TDR), isodose, and modified radiosurgery-based AVM score. The feature values were normalized to a range between 0 and 1.
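The 0-to-1 normalization mentioned above can be sketched as a min-max scaling. This is a minimal illustration, assuming the scaling bounds are learned on the training split and reused on unseen data (the text does not state how the bounds were obtained); the function names and the toy lesion volumes are hypothetical.

```python
def fit_min_max(train_column):
    """Learn the scaling bounds from training values only."""
    return min(train_column), max(train_column)


def apply_min_max(values, lo, hi):
    """Map values onto [0, 1] using previously learned bounds.

    A constant column (hi == lo) is mapped to 0.0 to avoid division by zero.
    Values outside the training range fall outside [0, 1].
    """
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical lesion volumes (cm3), for illustration only
train_volumes = [1.0, 2.9, 3.8, 6.1, 8.0]
lo, hi = fit_min_max(train_volumes)
scaled_train = apply_min_max(train_volumes, lo, hi)
scaled_test = apply_min_max([4.5], lo, hi)
```

Reusing the training bounds on the test value keeps the held-out data from influencing preprocessing, in the same spirit as the leakage precaution described for feature selection.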
Image Segmentation and Radiomics Feature Extraction
Patients’ baseline DSA series were screened. 2D anterior-posterior and lateral view DSA images corresponding to SRS planning and with a peak arterial phase were selected. Lesion segmentation was performed using 3D Slicer software (http://www.slicer.org)33,34 by an experienced radiologist with the guidance and supervision of an experienced interventional neuroradiologist. For each patient, 2 ROIs of the AVM lesion were delineated excluding the draining vein, from both the anterior-posterior and lateral DSA.
Following the segmentation, AVM radiomics features were computed using pyradiomics, Version 3.0.1,35 with Python, Version 3.8. A total of 200 radiomics features (first-order, shape-based, and higher-order features) were extracted, 100 from each of the anterior-posterior and lateral views of every patient’s angiogram. The features were also scaled between 0 and 1 to facilitate the algorithm learning process. For dimensionality reduction of the radiomics variables, the maximum relevance minimum redundancy36,37 method was implemented to reduce the number of variables to 20% of the feature space while retaining the most important and least collinear information. To avoid data leakage, maximum relevance minimum redundancy was fitted on the training set only, and the fitted selection was then applied to transform both the training and test sets. An illustrative figure of the radiomics and prediction modeling workflow is shown in Fig 2.
Fig 2. Workflow for DSA radiomics predictive modeling of cerebral AVM obliteration following SRS.
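The fit-on-training, transform-on-test logic of the feature selection step can be sketched as follows. This is a simplified greedy variant, assuming absolute Pearson correlation as both the relevance and the redundancy measure; the study's actual maximum relevance minimum redundancy implementation and scoring functions are not specified in the text, and all names and toy data below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mrmr_fit(X_train, y_train, k):
    """Greedily pick k column indices by relevance minus mean redundancy."""
    n_features = len(X_train[0])
    cols = [[row[j] for row in X_train] for j in range(n_features)]
    relevance = [abs(pearson(c, y_train)) for c in cols]
    selected, remaining = [], list(range(n_features))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(abs(pearson(cols[j], cols[s])) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def mrmr_transform(X, selected):
    """Keep only the columns chosen during fitting."""
    return [[row[j] for j in selected] for row in X]


# Toy training data: column 1 duplicates column 0, column 2 is weaker but less redundant
y_train = [0, 0, 0, 1, 1, 1]
f0 = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
f2 = [0.2, 0.6, 0.4, 0.5, 0.9, 0.4]
X_train = [[a, a, c] for a, c in zip(f0, f2)]
X_test = [[0.5, 0.5, 0.3]]

selected = mrmr_fit(X_train, y_train, k=2)   # fitted on the training set only
X_train_sel = mrmr_transform(X_train, selected)
X_test_sel = mrmr_transform(X_test, selected)  # test set is only transformed
```

In the toy data the duplicated column is skipped in favor of the less redundant one, which is the behavior the method is meant to provide. With the 200 extracted features, selecting 20% of the feature space would correspond to `k=40`.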
Statistical Analysis
Demographic and clinico-radiosurgical variables were statistically analyzed between the 2 patient outcome groups using SciPy (Version 1.6.2; https://scipy.org/) and Python. A univariate statistical comparison between the patient groups was performed regarding the obliteration outcome. Continuous quantitative variables were compared using the Student t test or the Wilcoxon rank-sum test, depending on the normality of their distribution. The χ2 test was used to compare the categoric variables. P values < .05 were considered statistically significant.
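As a worked check of the categoric comparisons, the χ2 test for a 2×2 table can be written with the standard library alone. The sketch below assumes the Yates continuity correction, which the text does not state; it is used here because SciPy's `chi2_contingency` applies it by default to 2×2 tables and because it yields P values close to those reported in the Table. The counts are taken from the Table; the function name is illustrative.

```python
from math import erfc, sqrt


def chi2_yates_2x2(table):
    """Yates-corrected chi-square test for a 2x2 table; returns (statistic, p)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink each deviation by 0.5, not below 0
            diff = max(abs(observed - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Rows: category levels; columns: (total obliteration, no total obliteration)
tables = {
    "Sex": [[28, 13], [38, 21]],
    "Ruptured": [[53, 29], [13, 5]],
    "Deep vein drainage": [[41, 16], [25, 18]],
}
p_values = {name: chi2_yates_2x2(t)[1] for name, t in tables.items()}
for name, p in p_values.items():
    print(f"{name}: P = {p:.3f}")
```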
ML Modeling
ML models were constructed to predict unsuccessful AVM total obliteration. The CURE-SMOTE algorithm (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1578-z#abbreviations),38 a derivative of the synthetic minority oversampling technique (SMOTE),39 was deployed to artificially generate minority class data and correct the training set class imbalance. A stratified split of the data set into a training set of 80% and a test set of 20% was applied, after which 10-fold cross-validation was used on the training set. A grid search was used for hyperparameter tuning. The benchmarked models were the following: decision tree, Gaussian Naïve Bayes, multilayer perceptron, K-nearest neighbors, random forest, BaggingClassifier, gradient boosting classifier, and eXtreme Gradient Boosting (XGBoost; https://xgboost.readthedocs.io/en/stable/). Model performance was measured and compared across the ML algorithms.
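The split-then-oversample order described above can be sketched as follows. This is a minimal illustration that substitutes plain SMOTE for CURE-SMOTE (the clustering and outlier-removal steps of CURE-SMOTE are not reproduced), and the feature vectors, seeds, and function names are hypothetical. Only the class counts (34 failures among 100 patients) come from the study.

```python
import random


def stratified_split(y, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) preserving class proportions."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


def smote(X_minority, n_new, k=3, seed=0):
    """Plain SMOTE: interpolate between a minority sample and a near neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_minority)
        neighbors = sorted(
            (p for p in X_minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


# Labels mirroring the cohort: 1 = unsuccessful obliteration, 0 = obliteration
y = [1] * 34 + [0] * 66
# Hypothetical 2-feature vectors in [0, 1], for illustration only
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in y]

train_idx, test_idx = stratified_split(y)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Oversample the minority class in the training set only
minority = [x for x, label in zip(X_train, y_train) if label == 1]
n_needed = y_train.count(0) - y_train.count(1)
X_balanced = X_train + smote(minority, n_needed)
y_balanced = y_train + [1] * n_needed
```

Keeping the test indices out of the oversampling step means that the held-out evaluation is performed on real patients only.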
RESULTS
Patient Population and Data Set
The overall included cohort comprised 352 patients. Follow-up imaging had a median duration of 5.9 years following SRS and revealed obliteration in 259 patients (71.9%), verified by angiography in 176 (70%) patients and MR imaging in 83 (30%) patients, occurring at a median interval of 36 months subsequent to SRS (interquartile range, 26–44 months). A total of 100 patients with cerebral AVMs and 200 images were included. The mean patient age was 40.6 (SD, 16) years, with women representing 59% and patients with unsuccessful total obliteration representing 34% of the studied final data set. The selection process of the included cohort is summarized in Fig 1.
Statistical Analysis
Comparison of clinico-radiosurgical variables regarding cerebral AVM obliteration status following SRS is detailed in the Table. Statistical analysis indicated that younger age (P = .02), higher AVM volume (P = .01) and diameter (P = .02), and an increased number of isocenters used (P = .004) were significantly associated with failure of total AVM obliteration, whereas higher BED (P = .002), elevated maximal dose (P = .002) and margin dose (P = .002), as well as a lower Spetzler-Martin grade (P = .04), were significantly associated with successful total AVM obliteration following SRS in the cohort.
Variable | All | Total Obliteration | No Total Obliteration | P Value |
---|---|---|---|---|
Age | 43 | 45 (32–56) | 38 (20–47) | .023 |
Sex | | | | .850 |
Male | 41 | 28 (42%) | 13 (38%) | |
Female | 59 | 38 (58%) | 21 (62%) | |
Ruptured | | | | .733 |
No | 82 | 53 (80%) | 29 (85%) | |
Yes | 18 | 13 (20%) | 5 (15%) | |
Prior resection | | | | .571 |
No | 93 | 61 (92%) | 32 (94%) | |
Resection | 4 | 2 (3%) | 2 (6%) | |
Location | | | | .288 |
Hemispheric | 91 | 62 (94%) | 29 (85%) | |
Deep | 9 | 4 (6%) | 5 (15%) | |
Eloquent | | | | .363 |
No | 37 | 27 (41%) | 10 (29%) | |
Yes | 63 | 39 (59%) | 24 (71%) | |
Deep vein drainage | | | | .219 |
No | 57 | 41 (62%) | 16 (47%) | |
Yes | 43 | 25 (38%) | 18 (53%) | |
Isocenters | 6 | 6.0 (4–7) | 7.0 (5–9) | .005 |
Volume (cm3) | 3.8 | 2.9 (1–6) | 6.1 (2–8) | .016 |
Treatment time | 48.24 | 46.805 (32–63) | 50.665 (34–68) | .330 |
TDR | 2.77 | 2.80 (2–3) | 2.74 (2–3) | .346 |
BED | 137 | 148 (115–173) | 119 (104–147) | .003 |
Maximum dose | 40 | 40.0 (36–44) | 36.0 (36–40) | .002 |
Margin dose | 20 | 20.0 (18–22) | 18.0 (18–20) | .002 |
Univariate statistical comparison of the 2 patient groups relating to total obliteration status
ML Prediction and Interpretation
Following feature selection and cross-validation, evaluation of the developed ML models was performed on the test set. By means of the clinico-radiosurgical variables, the best performing model was a gradient boosting classifier with an area under the receiver operating characteristic curve (AUROC) of 68%, recall of 67%, and precision of 71%. Using radiomics variables, Adaptive Boosting (AdaBoost; https://www.machinelearningplus.com/machine-learning/introduction-to-adaboost/#google_vignette) had the best evaluation with an AUROC of 79%, recall of 75% and precision of 71%.
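For reference, the evaluation metrics reported above can be computed from test-set predictions as in the following sketch. The labels and scores are hypothetical and are not the study's predictions; the positive class (1) stands for unsuccessful obliteration, and the 0.5 decision threshold is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and predicted failure probabilities
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
area = auroc(y_true, scores)
```

Recall is the metric that reflects how many obliteration failures the model catches, which is why it is reported alongside AUROC for both models.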
Figure 3 demonstrates the performance and evaluation metrics matrix comparison of predictive models for cerebral AVM obliteration following SRS using only clinical radiosurgical features (Fig 3A) in comparison with radiomics features (Fig 3B). SHAP summary plots (https://shap-lrjball.readthedocs.io/en/latest/generated/shap.summary_plot.html) in Fig 4A, -D depict the importance and impact direction of clinico-radiosurgical and radiomics features, respectively, on model predictions for obliteration failure. The color intensity represents a feature value, with SHAP values indicating the influence on the predictive outcome both in positive and negative correlations with the outcome. Heatmaps in Fig 4B, -E of the Shapley values for each feature across the instances in the test set demonstrate the individual contribution of clinico-radiosurgical and radiomics features to the predictions, with color gradients representing the magnitude of the impact. Bar charts in Fig 4C, -F illustrate the maximal impact of each clinico-radiosurgical and radiomics feature on the outcome prediction of the model, ranked by their importance.
Fig 3. Performance evaluation metrics matrix of predictive models for cerebral AVM obliteration following SRS using only clinical radiosurgical features (A) in comparison with radiomics features (B).
Fig 4. Interpretation of the best-performing clinico-radiosurgical (A–C) and radiomics (D–F) models including SHAP summary plots (A and D) with the colored test instances signifying feature value, SHAP heatmaps (B and E) where the red and blue refer to positive and negative SHAP values, and maximal feature SHAP bar plots (C and F) for the clinico-radiosurgical and radiomics models, respectively.
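The quantity displayed in the SHAP plots can be illustrated with an exact Shapley computation on a toy model. This sketch enumerates all feature subsets, which is feasible only for a handful of features; the SHAP library relies on faster model-specific or approximate algorithms. The toy risk function, its coefficients, the feature values, and the all-zero baseline are hypothetical and unrelated to the fitted models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(instance)

    def value(subset):
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_model(x):
    """Hypothetical failure-risk score with one interaction term."""
    volume, age, dose_rate = x
    return 0.5 * volume - 0.2 * age + 0.1 * volume * dose_rate


instance = [1.0, 0.5, 1.0]   # scaled volume, age, treatment dose rate
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that makes per-patient explanations and global rankings comparable.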
Global interpretation plots are represented in the Online Supplemental Data for the best-performing radiomics and clinico-radiosurgical models, demonstrating the overall prediction patterns across the test set according to obliteration outcomes. Local explanations for predictions of individual patients’ successful and unsuccessful obliteration outcomes are illustrated in the Online Supplemental Data, showcasing the interplay of features for specific cases.
The ranked most important clinico-radiosurgical features were lesional volume, patient age, TDR, seizure at presentation, and prior resection. The ranked most important radiomics features were gray-level size zone matrix (GLSZM) gray-level nonuniformity, kurtosis, sphericity, skewness, and gray-level dependence matrix (GLDM) dependence nonuniformity.
DISCUSSION
The study findings highlight the value of combining quantitative morphologic imaging features with ML for predicting post-2-year total obliteration of cerebral AVMs following SRS. Models built using radiomics features achieved better overall performance and higher sensitivity compared with those constructed with classic clinical and radiosurgical variables. Model interpretation identified key variables such as lesion volume, patient age, TDR, and prior resection as top contributors, validating their significance.
In the constructed radiomics model, GLSZM gray-level nonuniformity was the most important radiomics feature driving prediction, with greater values associated with a higher probability of unfavorable AVM obliteration. GLSZM gray-level nonuniformity reflects the connectedness and variability of gray-level intensity values, with a lower value indicating homogeneous intensity. This feature underscores how highly compact nidi have a more favorable chance of total obliteration in post-2-year follow-up. Kurtosis quantifies the shape of the ROI intensity distribution, with high values implying that the distribution mass is concentrated at the tails rather than the center, indicating a prevalence of extreme intensity values, which could also be characteristic of diffuse AVMs. This finding is in line with previous studies on the topic.43
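To make the GLSZM feature concrete, the sketch below computes gray-level nonuniformity on a tiny 2D image, following the definition used in the pyradiomics documentation: the squared number of zones per gray level, summed over gray levels and divided by the total number of zones. Zones are taken as 8-connected regions of equal gray level. The toy images are hypothetical, and the discretization and ROI masking that pyradiomics performs are omitted.

```python
from collections import Counter


def zone_counts(image):
    """Count connected zones per gray level using 8-connectivity."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            level = image[r][c]
            counts[level] += 1          # a new zone starts here
            seen[r][c] = True
            stack = [(r, c)]
            while stack:                # flood fill the whole zone
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and image[ny][nx] == level):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return counts


def glszm_gray_level_nonuniformity(image):
    """Sum over gray levels of (zones at that level)^2, divided by total zones."""
    counts = zone_counts(image)
    n_zones = sum(counts.values())
    return sum(n ** 2 for n in counts.values()) / n_zones


uniform = [[1, 1, 1],
           [1, 1, 1],
           [1, 1, 1]]
fragmented = [[1, 2, 1],
              [2, 2, 2],
              [1, 2, 1]]

gln_uniform = glszm_gray_level_nonuniformity(uniform)        # one zone
gln_fragmented = glszm_gray_level_nonuniformity(fragmented)  # five zones
```

In the second image, gray level 1 is split into four isolated zones while gray level 2 forms a single zone, so the zones are unevenly distributed across gray levels and the feature value rises.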
In a recent study by Gao et al,44 radiomics models were developed to predict the outcomes of gamma knife radiosurgery for unruptured AVMs. However, unlike our study, which used lateral DSA views to capture radiomics features, Gao et al relied on cross-sectional MR imaging. The use of DSA in our study, recognized as the criterion standard due to its superior spatial resolution, allows a more precise analysis of AVM nidus architecture and could potentially provide more accurate predictive insight than MR-based imaging. This distinction is crucial because DSA provides dynamic vascular information that MRIs typically do not capture, possibly leading to better-tailored treatment plans based on more detailed vascular data.
GLDM quantifies gray-level dependencies, defined as the number of connected pixels within a certain distance that are dependent on the center pixel. Dependence nonuniformity measures how unequal the dependence is throughout the ROI, with higher values indicating heterogeneous dependencies, which were associated with unfavorable AVM obliteration outcome.
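A simplified sketch of dependence nonuniformity on a toy 2D image is given below. It assumes a neighborhood distance of 1 pixel (the 8 surrounding pixels) and treats a neighbor as dependent when its gray level differs from the center by at most `alpha`; these settings, the handling of image borders, and the toy images are assumptions for illustration, and pyradiomics additionally restricts the computation to the segmented ROI.

```python
from collections import Counter


def dependence_nonuniformity(image, alpha=0):
    """Squared pixel counts per dependence size, summed, divided by pixel count."""
    rows, cols = len(image), len(image[0])
    sizes = Counter()
    for r in range(rows):
        for c in range(cols):
            dependent = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the center pixel itself
                    ny, nx = r + dy, c + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and abs(image[ny][nx] - image[r][c]) <= alpha):
                        dependent += 1
            sizes[dependent] += 1
    n_pixels = rows * cols
    return sum(n ** 2 for n in sizes.values()) / n_pixels


# Every pixel has the same number of dependent neighbors (3)
dn_equal = dependence_nonuniformity([[1, 1],
                                     [1, 1]])
# Three pixels have 2 dependent neighbors, one pixel has none
dn_mixed = dependence_nonuniformity([[1, 1],
                                     [1, 2]])
```

The feature depends only on how many pixels share each dependence size, so shifting the dependence count by a constant (for example, counting the center pixel) leaves the value unchanged.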
Skewness corresponds to the asymmetry of intensity value distribution around the mean, and sphericity quantifies the roundness of the ROI relative to a circle. Of note, all Maximum Relevance Minimum Redundancy–selected radiomics features originated from the lateral DSA view.
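The first-order and shape features named above follow standard formulas, sketched here under stated assumptions: skewness and kurtosis are the third and fourth standardized central moments (kurtosis is not reduced by 3, as in the pyradiomics convention), and the 2D sphericity is taken as 2√(πA)/P for an ROI of area A and perimeter P. The intensity values and shapes are hypothetical.

```python
from math import pi, sqrt


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Asymmetry of the distribution around the mean (0 if symmetric)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Tail weight of the distribution (3 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def sphericity_2d(area, perimeter):
    """Roundness relative to a circle: 1 for a circle, lower otherwise."""
    return 2 * sqrt(pi * area) / perimeter


intensities = [1, 2, 3, 4, 5]          # hypothetical ROI pixel intensities
skew = skewness(intensities)
kurt = kurtosis(intensities)

radius = 2.0
circle = sphericity_2d(pi * radius ** 2, 2 * pi * radius)
square = sphericity_2d(1.0, 4.0)       # unit square
```

A unit square yields a sphericity of about 0.89, showing how any departure from a circular outline lowers the value.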
With results closely consistent with our clinico-radiosurgical model findings, Oermann et al45 developed an ML approach using only clinico-radiosurgical features from a large cohort with a post-2-year obliteration prediction performance of 0.70. Meng et al46 built a radiomics-based ML model to forecast cerebral AVM outcomes post-SRS following partial embolization using MR T2 images of 100 patients. Despite a K-nearest neighbors model AUROC of 0.66, its specificity of 0.44 was lower compared with their leading dosimetry model (AUROC = 0.66, specificity = 0.56). They suggested that the cohort’s prior embolization, possibly causing lesion homogeneity, weakened the prognostic strength of the radiomics models. The model comprised 4 radiomics features: minor-axis-length, total energy, and 2 types of gray-level nonuniformity. Two studies used AVM radiomics for diagnosis with no prediction of outcomes: Jiao et al47 used segmented 3D TOF-MRA images, and Shi et al48 trained a neural network model for temporospatial diagnosis of AVMs from DSA sequences for dichotomized AVM grade classification.
The choice of DSA, the criterion standard with high spatial resolution, for building prognostic models of post-SRS AVM obliteration, together with the exclusion of patients with prior AVM embolization, allowed predictive radiomics markers to be highlighted. We believe such models may provide a significant step toward enhanced prediction of AVM obliteration, and with further validation and refinement, they could support clinical decision-making processes. The current study shows the potential of ML and radiomics in automating the assessment of AVM features in a precise quantitative manner with the end goal of validating radiomics signatures for SRS outcome prediction. It also underlines the promise of future prognostic tools in personalized data-centered cerebrovascular care. Future studies should further explore the radiomics association with patient presentation characteristics, such as seizures.
Limitations
Although the sample size of this study is considered within the normal range for radiomics research,49-51 size remains a limiting factor for reliable generalization of the findings, and future larger multicenter projects and prospective model implementation are recommended for further validation of features predictive of cerebral AVM obliteration. Similarly, the age of our cohort may be slightly greater than that in other published studies. We recommend implementing a nested cross-validation approach incorporating the feature selection in future studies with larger sample-size populations. Another limitation may relate to not examining the different DSA machine types and intra-arterial contrast injection approaches used and the existence of potential variability between them during the span of the study and how that might influence the extracted radiomic features.
CONCLUSIONS
The combination of ML methods and quantifiable image-based markers is a valuable approach to model cerebral AVM outcome managed with SRS and could complement classic prognostic tools. In this study, a radiomics-based ML model was built to predict AVM obliteration following radiosurgical treatment. In line with the prior knowledge in the field and bringing added precision to its assessment, the predictive findings might hypothetically be based on DSA features related to the diffuseness and angioarchitecture of AVMs, which need to be verified in future studies. Model interpretation has become an essential step of ML pipelines in health care to ensure the clinical soundness and validity of prognostic radiomic biomarkers.
Footnotes
Disclosure forms provided by the authors are available with the full text and PDF of this article at www.ajnr.org.
References
- Received June 1, 2023.
- Accepted after revision May 9, 2024.
- © 2024 by American Journal of Neuroradiology