American Journal of Neuroradiology
Research Article | ARTIFICIAL INTELLIGENCE

Using an Ensemble of Segmentation Methods to Detect Vertebral Bodies on Radiographs

Brian C. Chang, Jonathan Renslo, Qifei Dong, Sandra K. Johnston, Jessica Perry, David R. Haynor, Gang Luo, Nancy E. Lane, Jeffrey G. Jarvik and Nathan M. Cross
American Journal of Neuroradiology August 2024, DOI: https://doi.org/10.3174/ajnr.A8343
From the Departments of Biomedical Informatics and Medical Education (B.C.C., Q.D., G.L.), Radiology (S.K.J., D.R.H., J.G.J., N.M.C.), Biostatistics (J.P.), and Neurological Surgery (J.G.J.), University of Washington, Seattle, Washington; the Keck School of Medicine (J.R.), University of Southern California, Los Angeles, California; and the Department of Medicine, Rheumatology (N.E.L.), University of California Davis, Davis, California.

Abstract

BACKGROUND AND PURPOSE: Vertebral compression fractures may indicate osteoporosis but are underdiagnosed and underreported by radiologists. We have developed an ensemble of vertebral body (VB) segmentation models for lateral radiographs as a critical component of an automated, opportunistic screening tool. Our goal is to detect the approximate location of thoracic and lumbar VBs, including fractured vertebra, on lateral radiographs.

MATERIALS AND METHODS: The Osteoporotic Fractures in Men Study (MrOS) data set includes spine radiographs of 5994 men aged ≥65 years from 6 clinical centers. Two segmentation models, U-Net and Mask-RCNN (Region-based Convolutional Neural Network), were independently trained on the MrOS data set retrospectively, and an ensemble was created by combining them. Primary performance metrics for VB detection success included precision, recall, and F1 score for object detection on a held-out test set. Intersection over union (IoU) and Dice coefficient were also calculated as secondary metrics of performance for the test set. A separate external data set from a quaternary health care enterprise was acquired to test generalizability, comprising diagnostic clinical radiographs from men and women aged ≥65 years.

RESULTS: The trained models achieved F1 scores of U-Net = 83.42%, Mask-RCNN = 86.30%, and ensemble = 88.34% in detecting all VBs, and F1 scores of U-Net = 87.88%, Mask-RCNN = 92.31%, and ensemble = 97.14% in detecting severely fractured vertebrae. The trained models achieved an average IoU per VB of 0.759 for U-Net and 0.709 for Mask-RCNN. On the external data set, the trained models achieved F1 scores of U-Net = 81.11%, Mask-RCNN = 79.24%, and ensemble = 87.72% in detecting all VBs.

CONCLUSIONS: An ensemble model combining predictions from U-Net and Mask-RCNN resulted in the best performance in detecting VBs on lateral radiographs and generalized well to an external data set. This model could be a key component of a pipeline to detect fractures on all vertebrae in a radiograph in an automated, opportunistic screening tool under development.

ABBREVIATIONS:

IoU = intersection over union; m2ABQ = modified-2 algorithm-based qualitative fracture classification system; MrOS = Osteoporotic Fractures in Men Study; OCF = osteoporotic compression fracture; PPV = positive predictive value; VB = vertebral body

SUMMARY

PREVIOUS LITERATURE:

Osteoporosis is a chronic disease of low bone mineral attenuation that affects millions of Americans and predisposes to osteoporotic compression fractures (OCFs), which sometimes escape clinical detection. Because OCFs are underreported and underdiagnosed, an automated tool could facilitate opportunistic screening, for which vertebral body (VB) segmentation is essential to many downstream analyses, including fracture classification. Several groups have developed segmentation models for automated VB detection on radiographs, but those studies were limited by small, single-institutional data sets, risking overfitting. They also did not report the models’ performance in detecting nonfractured versus fractured vertebrae.

KEY FINDINGS:

U-Net, Mask-RCNN, and the ensemble model accurately detected the locations of VB centroids on lateral radiographs. Using the centroids of predicted segmentation masks in a rule-based ensemble of U-Net and Mask-RCNN increased performance in detecting true-positive VBs. On an external data set of diagnostic clinical radiographs acquired from a quaternary health care enterprise, the models achieved F1 scores of U-Net = 81.11%, Mask-RCNN = 79.24%, and ensemble = 87.72% in detecting all VBs, demonstrating generalizability.

KNOWLEDGE ADVANCEMENT:

Using centroids as an alternate approach to analyzing segmentation masks, a rule-based ensemble of segmentation models demonstrated increased performance in detecting the approximate location of VBs on radiographs. This methodology will be a critical step in an automated, opportunistic screening tool for osteoporosis under development.

Osteoporosis is a chronic disease of low bone mineral attenuation affecting approximately 10 million Americans over age 50.1 It increases the risk of fractures, including osteoporotic compression fractures (OCFs), and the risk of subsequent OCFs.2 OCFs are associated with increased morbidity and mortality and lower quality of life.3 Although osteoporosis screening is evidence-based and recommended by the US Preventive Services Task Force, screening is grossly underutilized.4,5 While vertebral compression fractures may have other causes, osteoporosis is often the culprit in older individuals. These fractures sometimes escape clinical detection and are associated with frailty. An automated tool to detect OCFs on radiographs could facilitate opportunistic screening, because OCFs are underdiagnosed and underreported by radiologists.6-8

Radiography is a high-volume technique utilized early in diagnostic work-ups, with approximately 275 million examinations performed in 2016 in the United States, making it an ideal candidate for study.9 A critical component of an automated diagnostic tool for lateral spine radiography is sufficiently accurate localization of vertebral bodies. It is well known that image classification performs better with a cropped region of interest around the object to be classified.10,11 For this reason, many classification models use upstream semiautomated or manual segmentation of VBs.12

Unlike CT and other modalities, radiographic systems are less standardized, and radiographs pose unique challenges because they are projection images with superimposed objects. However, radiographs’ ubiquity in clinical practice, low radiation dose, and relatively low cost present an excellent opportunity for screening. Several prior studies of segmentation models compared performance in detecting VBs on thoracic and lumbar lateral radiographs13,14 but had several limitations, including small data sets from single institutions, predisposing to overfitting. For this study, we applied 2 convolutional neural network architectures to a large multi-institutional data set. U-Net, developed in 2015, is frequently used for semantic segmentation.15 Mask-RCNN (Region-based Convolutional Neural Network), developed in 2017, has a more complex architecture and better performance on object detection tasks.16 One study reported successful training of U-Net on 124 lateral radiographs (40 images in the test set).17 To date, no studies have explored combining results from these 2 models.

In this work, we focused on detecting the approximate location of all VBs in a radiograph. We hypothesized that a rule-based ensemble of segmentation models could outperform either alone, while providing an alternate approach to analyzing segmentation masks.

This article follows the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis Checklist.18

MATERIALS AND METHODS

A pre-existing multicenter deidentified annotated radiograph data set was obtained from the Osteoporotic Fractures in Men Study (MrOS) Coordinating Center under a data use agreement. A total of 5994 consenting men aged ≥65 years19 enrolled between 2000 and 2002 with 2 lateral thoracic and lumbar spine radiographs separated by 4.5 years.20,21

Ground truth was previously annotated by the MrOS team using a semiautomated tool (SpineAnalyzer; Optasia Medical) to mark VB outlines.21 VBs visible on both the thoracic and lumbar radiographs were annotated on only 1 of the 2; VBs were annotated from T5 to T12 or L1 on thoracic radiographs and from T12 or L1 to L3 or L4 on lumbar radiographs.21

Each annotated VB was labeled with the Genant semiquantitative criteria.22 We pooled moderate or severe fractures as “fractured” and normal or mild VBs as “nonfractured” to simplify predictor development and assessment while preserving the most clinically impactful findings. Models were trained, validated, and tested on 6695, 1310, and 2346 images, respectively, which were split randomly by patient (Table 1).
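A patient-level random split like the one described above can be sketched in a few lines; the function below and its fraction arguments are illustrative assumptions, not the authors' code.

```python
import numpy as np

def split_by_patient(image_ids, patient_ids, fracs=(0.65, 0.12, 0.23), seed=0):
    """Randomly partition images into train/validation/test so that all
    images from one patient land in the same partition. Fractions are
    illustrative; the paper's split was 6695/1310/2346 images."""
    rng = np.random.default_rng(seed)
    patients = np.unique(patient_ids)
    rng.shuffle(patients)
    n = len(patients)
    cut1 = int(fracs[0] * n)
    cut2 = cut1 + int(fracs[1] * n)
    # Assign each patient (not each image) to one partition.
    group = {p: 0 for p in patients[:cut1]}
    group.update({p: 1 for p in patients[cut1:cut2]})
    group.update({p: 2 for p in patients[cut2:]})
    parts = ([], [], [])
    for img, pat in zip(image_ids, patient_ids):
        parts[group[pat]].append(img)
    return parts
```

Splitting by patient rather than by image prevents leakage of a patient's paired radiographs across partitions.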

Table 1:

Demographics of MrOS data

DICOM radiograph images were converted to 16-bit TIFF files; each was downsampled to 512 × 512 pixels with zero-padding to preserve the original aspect ratio. For Mask-RCNN, files were converted to 8-bit. For U-Net, images were preprocessed by using contrast-limited adaptive histogram equalization (CLAHE).23,24 Because CLAHE-processed images led to technical errors during Mask-RCNN training, unprocessed images were used for that model. No other normalization or standardization techniques were performed for either U-Net or Mask-RCNN.

U-Net was trained de novo on the semantic segmentation task.15 Further details are discussed in the Online Supplemental Data. Mask-RCNN was trained on the instance segmentation task.16 Transfer learning was applied by importing weights pretrained on the COCO data set and then fine-tuning on the MrOS data set. See further details in the Online Supplemental Data.

For U-Net, the output was binarized, and individual VBs were distinguished by considering separate contiguous patches filtered for size, producing centroid coordinates. For Mask-RCNN, centroid coordinates were calculated from each individual VB segmentation mask. See further details in the Online Supplemental Data.
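The patch-to-centroid step for the U-Net branch can be sketched with SciPy's connected-component labeling; the probability threshold and size filter below are illustrative values, not those in the Online Supplemental Data.

```python
import numpy as np
from scipy import ndimage

def centroids_from_mask(prob: np.ndarray, thresh: float = 0.5,
                        min_area: int = 50) -> list[tuple[float, float]]:
    """Binarize a U-Net probability map, label contiguous patches, drop
    patches below a size filter, and return (row, col) centroids."""
    binary = prob >= thresh
    labels, n = ndimage.label(binary)
    cents = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if ys.size >= min_area:  # size filter removes speckle
            cents.append((float(ys.mean()), float(xs.mean())))
    return cents
```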

VB centroids were filtered based on their horizontal position. Those far enough from the mean horizontal coordinate, standardized by the average endplate width, were considered “off-column” and excluded from subsequent analysis. Further details are in the Online Supplemental Data.
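A minimal sketch of the off-column filter; the cutoff k is a hypothetical value, since the exact threshold appears only in the Online Supplemental Data.

```python
import numpy as np

def filter_off_column(centroids, endplate_widths, k=1.0):
    """Drop centroids whose horizontal distance from the mean horizontal
    coordinate, standardized by the average endplate width, exceeds k.
    centroids are (row, col) tuples; k is an illustrative cutoff."""
    xs = np.array([c[1] for c in centroids], dtype=float)
    mean_x = xs.mean()
    avg_w = float(np.mean(endplate_widths))
    return [c for c, x in zip(centroids, xs)
            if abs(x - mean_x) / avg_w <= k]
```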

An ensemble model was created by pooling predictions from both models and eliminating duplicates. Duplicates were identified by a threshold on the centroid distance of one-half the average endplate width per image. Nonduplicate U-Net centroids were appended to Mask-RCNN centroids. Further details are in the Online Supplemental Data.
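The pooling rule can be sketched as follows, assuming centroids as (row, col) tuples.

```python
import numpy as np

def ensemble(maskrcnn_cents, unet_cents, avg_endplate_width):
    """Pool centroids from both models: keep all Mask-RCNN centroids, then
    append each U-Net centroid whose distance to every Mask-RCNN centroid
    exceeds half the average endplate width (i.e., not a duplicate)."""
    thresh = 0.5 * avg_endplate_width
    pooled = list(maskrcnn_cents)
    for u in unet_cents:
        dists = [np.hypot(u[0] - m[0], u[1] - m[1]) for m in maskrcnn_cents]
        if not dists or min(dists) > thresh:
            pooled.append(u)  # nonduplicate U-Net detection
    return pooled
```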

As not all visible VBs were labeled by the MrOS team (eg, L1 or L2 on a thoracic radiograph), a random subset of radiographs (n = 199) from the held-out test set (n = 2346) was annotated completely by a board-certified neuroradiologist. Thirty radiographs included at least 1 fractured vertebra, reflecting the prevalence in the full held-out test set. DicomAnnotator25 was used to identify nonannotated VBs with 4 corner points within the image. These were marked with a centroid and labeled by using the modified-2 algorithm-based qualitative fracture classification system (m2ABQ).26 A total of 471 VBs were added to the original 1268.

Models were evaluated primarily on detection success and secondarily on detection quality. The distance between each predicted centroid and the centroid of the nearest ground truth was calculated as the “centroid distance” (Fig 1). Centroids were used as the measure of approximate location and are robust to small errors in segmentation. Because magnification of anatomy varies among radiographs, centroid distance was normalized by the average endplate width of ground truth annotations per radiograph.
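The normalized centroid distance can be expressed compactly; this sketch assumes centroids as (row, col) pixel coordinates.

```python
import numpy as np

def normalized_centroid_distance(pred, truths, avg_endplate_width):
    """Distance from a predicted centroid to the nearest ground-truth
    centroid, normalized by the radiograph's average endplate width and
    expressed as a percentage."""
    d = min(np.hypot(pred[0] - t[0], pred[1] - t[1]) for t in truths)
    return 100.0 * d / avg_endplate_width
```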

FIG 1.

Right, Example of calculating the centroid distance and the endplate width for a given vertebral body. The dotted line represents the Euclidean distance (pixels) between the ground truth (black) centroid and the predicted (white) centroid, the center of the box. The horizontal solid lines represent the superior and inferior endplate width. Normalized centroid distance % = Length(Dotted)/Avg(Length(Solid)). Left, Original image.

A threshold was set on the centroid distance to measure object detection performance, and precision (positive predictive value [PPV]), recall (sensitivity), and F1 score were calculated (performance metrics in the Online Supplemental Data). A flowchart of our workflow is shown in Fig 2. Segmentations with high intersection over union (IoU) scores are not necessary for our intended use case; however, IoU and Dice coefficient were used as additional measures of segmentation quality. Details are discussed in the Online Supplemental Data.
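The detection metrics reduce to simple count ratios once predictions are matched to ground truth within the distance threshold; this sketch assumes the one-to-one matching is done upstream.

```python
def detection_metrics(n_matched, n_pred, n_truth):
    """Precision (PPV), recall (sensitivity), and F1 from counts of
    predicted centroids matched to ground truth within the centroid
    distance threshold."""
    precision = n_matched / n_pred if n_pred else 0.0
    recall = n_matched / n_truth if n_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```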

FIG 2.

Flowchart of workflow. Each radiograph image was cropped to show a focused view of each step. 1) DICOM radiographs are converted to TIF files, preprocessed with CLAHE, and downsampled. 2) Each TIF is input into the segmentation models for VB detection. 3) Centroids are extracted from each predicted segmentation mask, represented as yellow points. 4) Centroids are matched based on a distance threshold and duplicates are eliminated, producing a final pool of ensemble centroids. Orange centroids represent “matches” between U-Net and Mask-RCNN while blue centroids represent “nonmatches.”

Two neuroradiologists manually reviewed 27 radiographs from the held-out test set with “off-column” predictions and 52 radiographs with false-negative (missed) detections to discern whether specific radiologic features were associated with the observed failure patterns for both models.

Another set of diagnostic clinical radiographs was obtained from a quaternary health care enterprise to test generalizability. VBs on spine radiographs, originally generated between 2000 and 2017, were annotated by 4 radiologists; each VB’s 4 corner points and OCF severity were annotated with DicomAnnotator.25 Lateral radiographs of the spine were sampled randomly (n = 188); images with hardware, swimmer’s views, and views other than lateral thoracic or lumbar spine were excluded. Images were harmonized to be white-bone and right-facing, like the MrOS data set. For U-Net, they were converted to single-channel grayscale images. Metadata are described in Table 2. The quantity of radiographs was selected to be several times greater than the size of the test partitions in previous work.

Table 2:

Demographics and metadata of external validation data set

All models were evaluated on the external data set. Manual tuning of preprocessing and postprocessing hyperparameters was performed: the contrast limited adaptive histogram equalization clipLimit parameter was set to 5, and the binarization threshold for U-Net was set to 0.61. Performance metrics were calculated in the same fashion except for manual review and the Dice coefficient/IoU, which were not done for these data. Performance metrics were also reported for fractured versus nonfractured versus all vertebrae.

RESULTS

Detection success was quantified on a per-VB basis on the original MrOS annotations. Examples of a successful detection of a fractured vertebra (Fig 3A) and a successful ensemble of centroids (Fig 3B) are shown. Performance metrics with 95% CIs are plotted in Fig 4. Performance metrics were high in the test partition, with recall of 91.28%, 94.01%, and 95.73% for U-Net, Mask-RCNN, and the ensemble model, respectively.

FIG 3.

A, Example of successful detection by Mask-RCNN of a fractured vertebra, marked by the red arrow. Left, Ground truth (green) segmentation masks overlaid on prediction (red) masks. Overlaps are shown in yellow. Middle, Predicted (red) segmentation masks from Mask-RCNN. Right, Centroids. B, Example of ensemble centroids from U-Net and Mask-RCNN. Orange centroids represent “matches” between U-Net and Mask-RCNN, while blue centroids represent “nonmatches.” The blue centroid marked by the red arrow denotes a fractured vertebra that was detected by only 1 model. The blue centroid marked by the purple arrow denotes an “off-column” detection.

FIG 4.

A, Object detection scores on the MrOS test partition. Error bars are the 95% CIs calculated with the bootstrap. N is the number of vertebral bodies. B, Segmentation scores on the MrOS test partition. IoU, Sørensen-Dice coefficient, and centroid distance are reported as percentages.

Performance on the random subset from the MrOS test partition (n = 199 radiographs) with and without the annotations of additional VBs is shown in Fig 5A and Fig 5B. Metrics are reported for nonfractured versus fractured versus all vertebrae for U-Net, Mask-RCNN, and the ensemble model. Performance with the additional annotations had higher PPV (approaching 100%), but lower sensitivity by around 10 points.

FIG 5.

A-C, Object detection scores by fracture status for (A) the subset of the MrOS data set test partition, (B) the same subset with additional annotations of missing VBs, and (C) the external validation (non-MrOS) set. Error bars are the 95% CIs calculated with the bootstrap. N is the number of vertebral bodies.

Performance on the external data set is plotted in Fig 5C. Compared with the MrOS subset with additional annotations, overall sensitivity decreased approximately 15 points while PPV remained high. Within the external data set, sensitivity of fractured vertebrae was lower by approximately 11 points compared with that of nonfractured vertebrae.

Fig 4B describes the detection quality metrics for each model on the MrOS test data. All models had comparable average centroid displacement. U-Net had a higher average IoU compared with Mask-RCNN. Dice coefficients were higher than IoU but trended similarly among models. The ensemble model pools centroids extracted from U-Net and Mask-RCNN segmentations, so IoU and Dice scores were not applicable.
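The gap between Dice and IoU noted here is systematic: for the same pair of masks, Dice = 2·IoU/(1 + IoU), so Dice is always at least as large as IoU and ranks models in the same order. A minimal sketch:

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """IoU and Dice coefficient for a pair of binary masks."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)
```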

Several failure modes were observed across U-Net and Mask-RCNN predictions. In 52 of 2346 radiographs, the models had at least 1 false-negative (missed VB detection) compared with the MrOS ground truth annotations in the middle of the inferred spinal column (Online Supplemental Data). Both models had difficulty detecting VBs in the presence of hardware (Online Supplemental Data). In the subset of fully annotated radiographs, VBs at the superior and inferior ends that were originally not annotated were often not detected (Online Supplemental Data). Of the 356 false-negatives from the ensemble model on the fully annotated subset, 347 were at the superior and inferior ends (Online Supplemental Data).

A variety of anatomic structures and boundaries presented challenges to the models in detecting VBs on a small subset of radiographs. Neuroradiologists suggested the following features were associated with false-positives: overlap of VBs with ribs, bowel gas, spinous processes, and the diaphragm. For false-negative detections, radiologists commented that reversed contrast and underexposure, marked transition in attenuation, parallax, and diffuse idiopathic skeletal hyperostosis were repeatedly observed. Example cases are shown in Online Supplemental Data.

DISCUSSION

Automated imaging analysis pipelines have become achievable. VB segmentation is essential for many downstream analyses, including fracture classification. While extensive work has been done to segment vertebrae from CT images,27-29 image processing and machine learning methods differ considerably from segmentation on radiographs. Several groups have trained and developed segmentation models for automated VB detection from radiographs. Kim et al14 trained segmentation models to measure vertebral compression ratios. Kónya et al13 compared several segmentation models in detecting VBs on lumbar lateral radiographs. These studies were limited by small, single-institutional data sets, risking overfitting. Our data set included more than 10,000 radiographs of the thoracic and lumbar spine from a multi-institutional cohort across the United States, with validation on an external clinical cohort. Prior studies commented on models’ abilities to detect fractured vertebrae but did not compare performance in detecting nonfractured versus fractured vertebrae, while this work does.

Both U-Net and Mask-RCNN achieved high PPV in detecting VBs on the original MrOS annotations and the fully annotated subset. The ensemble model had slightly lower PPV, suggesting both U-Net and Mask-RCNN uniquely produced false-positives but importantly had higher sensitivity and F1 score, indicating VBs that were not detected by U-Net were detected by Mask-RCNN and vice versa, ie, the models were complementary. Overall, the ensemble of the 2 models increased sensitivity considerably with minimal loss in PPV.

As expected, the additional annotations increased PPV but lowered sensitivity, as some false-positives became true-positives, but more false-negatives were identified. False-negatives were clustered at the edges of the image (Online Supplemental Data), matching the limited annotations of the MrOS training data.

All 3 models achieved high performance in detecting fractured vertebrae, despite the class imbalance in the training set. This demonstrated that both U-Net and Mask-RCNN can detect VBs with different contours and geometry. Interestingly, model performance was slightly better on fractured vertebrae in both the original and additional annotations. This may be due to the small sample of fractured vertebrae, but also because fractures tend to cluster at T12-L2 in the center of the image, where the models performed best and had the most training data. Recent studies indicate that 60%-75% of fractures occur between T12 and L2, and another 30% occur between L2 and L5.30

The performance on the external data set was lower compared with the fully annotated internal (MrOS) subset, which is expected when applying a model on data different from the training set. The external data set differed considerably from the training and internal test sets in that it was acquired entirely digitally, the demographics were broader, and the radiographs were captured in clinical settings with a variety of protocols and variation in quality. This may have contributed to the lower sensitivity of fractured vertebrae. Using only parameter tuning of the pre- and postprocessing steps (without fine-tuning of the neural-net model), the ensemble model maintained an excellent PPV and fairly high F1 score overall, with shortcomings mainly in sensitivity.

The MrOS annotations were created for objectives different from those of this work and were not optimal for evaluating model performance in detecting all relevant VBs on a radiograph. There were often several visible VBs in radiographs that were not annotated (Online Supplemental Data). We accounted for this by fully annotating a subset of the test partition to evaluate performance in detecting all VBs present on a radiograph. The MrOS data set was also collected for research, optimized to show bone, and is more than 20 years old, lacking data about scanner vendor. These data are also homogeneous in demographics, containing only men who were predominantly white, which may have introduced bias given sex and ethnicity differences in bone mineral attenuation and morphology.31-33 These limitations were partially addressed by evaluation on the external data set. The lower sensitivity on the external data set would be: 1) less problematic at the patient level, because only 1 detected fracture is required to refer a patient to a provider; and 2) less impactful to osteoporotic fracture detection, because many of the missed vertebrae are in regions of the spine where osteoporotic fractures infrequently occur. In addition, the method of filtering out “off-column” detections is somewhat crude and may be overly conservative in cases with severe kyphosis or lordosis. Further evaluation of generalizability, performance, and bias will be necessary before integration into clinical workflows.

The largest opportunity for improvement lies in increasing sensitivity on the external data set. False-negatives could be recovered (and sensitivity increased) by exploiting domain knowledge: VBs are expected to be found at consistent vertical intervals along a curve. An improved method might feature spine centerline estimation to infer missing centroids and improve the centroid-spine-centerline distance calculation. Incorporating additional models into the ensemble may also improve overall performance, especially sensitivity. In addition, training on more varied data sets of radiographs obtained in clinical settings would likely increase overall performance and produce a model generalizable across institutions.

Applications of this work could reference predicted centroids to crop a region and perform fracture detection, a problem for which models have been described previously.12 An automatic tool with high PPV could be valuable in triaging studies in high volume clinical settings. This could facilitate broader screening for OCFs, potentially catching cases where intervention could prevent additional complications. On top of providing a tool to the diagnostic radiologist, an automated diagnostic model could be applied retrospectively to flag patients for additional clinical scrutiny or to generate research cohorts. It could also be applied judiciously in low resource settings to help nonradiologists and trainees assess scans before review by a radiologist.

This work is also agnostic of the diagnostic question; it could be chained with classification models for any disease where a plain film of a VB is relevant. However, considerable evaluation and testing, including evaluating bias, will be necessary before clinical application.

CONCLUSIONS

An ensemble model combining the predictions from 2 independent segmentation models increased performance by 2%-5% in detecting VBs on lateral radiographs of the thoracic and lumbar spine compared with either model individually. The ensemble model also demonstrated high performance (F1 score of 97%) in detecting fractured vertebrae and some generalizability on an external data set of radiographs captured in clinical settings, showing that this approach, with additional work to improve sensitivity, could serve as a precursor component in diagnostic pipelines.

Footnotes

  • This work was supported by the University of Washington Clinical Learning, Evidence, And Research (CLEAR) Center for Musculoskeletal Disorders (Administrative and Methodologic Cores) and by National Institute of Arthritis and Musculoskeletal and Skin Diseases/National Institutes of Health (NIAMS/NIH) grant P30AR072572, and in part by the General Electric-Association of University Radiologists Radiology Research Academic Fellowship (GERRAF; Dr. Cross), a career development award cosponsored by General Electric Healthcare and the Association of University Radiologists. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

  • The Osteoporotic Fractures in Men Study (MrOS) is supported by NIH funding. The following institutes provide support: the National Institute on Aging (NIA), NIAMS, the National Center for Advancing Translational Sciences (NCATS), and NIH Roadmap for Medical Research under the following grant numbers: U01 AG027810, U01 AG042124, U01 AG042139, U01 AG042140, U01 AG042143, U01 AG042145, U01 AG042168, U01 AR066160, R01 AG066671, and UL1 TR000128.

  • Gang Luo was partially supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under Award R01HL142503.

  • Brian Chang was supported by the National Library of Medicine training grant T15LM007442.

  • Disclosure forms provided by the authors are available with the full text and PDF of this article at www.ajnr.org.

References

  1. Clynes MA, Harvey NC, Curtis EM, et al. The epidemiology of osteoporosis. Br Med Bull 2020;133:105–17 doi:10.1093/bmb/ldaa005 pmid:32282039
  2. Burge R, Dawson-Hughes B, Solomon DH, et al. Incidence and economic burden of osteoporosis-related fractures in the United States, 2005-2025. J Bone Miner Res 2007;22:465–75 doi:10.1359/jbmr.061113 pmid:17144789
  3. Fink HA, Milavetz DL, Palermo L, et al; Fracture Intervention Trial Research Group. What proportion of incident radiographic vertebral deformities is clinically diagnosed and vice versa? J Bone Miner Res 2005;20:1216–22 doi:10.1359/JBMR.050314 pmid:15940375
  4. Curry SJ, Krist AH, Owens DK, et al; US Preventive Services Task Force. Screening for osteoporosis to prevent fractures: US Preventive Services Task Force recommendation statement. JAMA 2018;319:2521–31 doi:10.1001/jama.2018.7498 pmid:29946735
  5. Meadows ES, Whangbo A, McQuarrie N, et al. Compliance with mammography and bone mineral density screening in women at least 50 years old. Menopause 2011;18:794–801 doi:10.1097/gme.0b013e3182083f28 pmid:21505373
  6. Carberry GA, Pooler BD, Binkley N, et al. Unreported vertebral body compression fractures at abdominal multidetector CT. Radiology 2013;268:120–26 doi:10.1148/radiol.13121632 pmid:23449956
  7. Kim N, Rowe BH, Raymond G, et al. Underreporting of vertebral fractures on routine chest radiography. Am J Roentgenol 2004;182:297–300 doi:10.2214/ajr.182.2.1820297
  8. Diacinti D, Vitali C, Gussoni G, et al; Research Department of FADOI. Misdiagnosis of vertebral fractures on local radiographic readings of the multicentre POINT (Prevalence of Osteoporosis in INTernal medicine) study. Bone 2017;101:230–35 doi:10.1016/j.bone.2017.05.008 pmid:28511873
  9. Mahesh M, Ansari AJ, Mettler FA Jr. Patient exposure from radiologic and nuclear medicine procedures in the United States and worldwide: 2009–2018. Radiology 2023;307:e221263 doi:10.1148/radiol.221263 pmid:36511806
  10. Tang X. The role of artificial intelligence in medical imaging research. BJR Open 2020;2:20190031 doi:10.1259/bjro.20190031 pmid:33178962
  11. Mohammad N, Muad AM, Ahmad R, et al. Accuracy of advanced deep learning with TensorFlow and Keras for classifying teeth developmental stages in digital panoramic imaging. BMC Med Imaging 2022;22:66 doi:10.1186/s12880-022-00794-6 pmid:35395737
  12. Dong Q, Luo G, Lane NE, et al. Deep learning classification of spinal osteoporotic compression fractures on radiographs using an adaptation of the Genant semiquantitative criteria. Acad Radiol 2022;29:1819–32 doi:10.1016/j.acra.2022.02.020 pmid:35351363
  13. Kónya S, Natarajan TS, Allouch H, et al. Convolutional neural network-based automated segmentation and labeling of the lumbar spine x-ray. J Craniovertebral Junction Spine 2021;12:136–43 doi:10.4103/jcvjs.jcvjs_186_20
  14. Kim DH, Jeong JG, Kim YJ, et al. Automated vertebral segmentation and measurement of vertebral compression ratio based on deep learning in x-ray images. J Digit Imaging 2021;34:853–61 doi:10.1007/s10278-021-00471-0 pmid:34236562
  15. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. https://doi.org/10.48550/arXiv.1505.04597. Accessed February 2, 2023
  16. He K, Gkioxari G, Dollár P, et al. Mask R-CNN. https://doi.org/10.48550/arXiv.1703.06870. Accessed March 13, 2023
  17. Badhe S, Singh V, Li J, et al. Automated segmentation of vertebrae on lateral chest radiography using deep learning. https://doi.org/10.48550/arXiv.2001.01277. Accessed May 19, 2023
  18. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55–63 doi:10.7326/M14-0697
  19. Blank JB, Cawthon PM, Carrion-Petersen ML, et al. Overview of recruitment for the Osteoporotic Fractures in Men Study (MrOS). Contemp Clin Trials 2005;26:557–68 doi:10.1016/j.cct.2005.05.005
  20. Orwoll E, Blank JB, Barrett-Connor E, et al. Design and baseline characteristics of the Osteoporotic Fractures in Men (MrOS) Study: a large observational study of the determinants of fracture in older men. Contemp Clin Trials 2005;26:569–85 doi:10.1016/j.cct.2005.05.006
  21. Cawthon PM, Haslam J, Fullman R, et al; Osteoporotic Fractures in Men (MrOS) Research Group. Methods and reliability of radiographic vertebral fracture detection in older men: the Osteoporotic Fractures in Men Study. Bone 2014;67:152–55 doi:10.1016/j.bone.2014.06.039 pmid:25003811
  22. Genant HK, Wu CY, van Kuijk C, et al. Vertebral fracture assessment using a semiquantitative technique. J Bone Miner Res 1993;8:1137–48 doi:10.1002/jbmr.5650080915 pmid:8237484
  23. van der Walt S, Schönberger JL, Nunez-Iglesias J, et al; the scikit-image contributors. scikit-image: image processing in Python. PeerJ 2014;2:e453 doi:10.7717/peerj.453
  24. Pizer SM, Amburn EP, Austin JD, et al. Adaptive histogram equalization and its variations. Comput Vis Graph Image Process 1987;39:355–68
  25. Dong Q, Luo G, Haynor D, et al. DicomAnnotator: a configurable open-source software program for efficient DICOM image annotation. J Digit Imaging 2020;33:1514–26 doi:10.1007/s10278-020-00370-w pmid:32666365
  26. Aaltonen HL, O'Reilly MK, Linnau KF, et al. m2ABQ: a proposed refinement of the modified algorithm-based qualitative classification of osteoporotic vertebral fractures. Osteoporos Int 2023;34:137–45 doi:10.1007/s00198-022-06546-0
  27. Qadri SF, Shen L, Ahmad M, et al. OP-convNet: a patch classification-based framework for CT vertebrae segmentation. IEEE Access 2021;9:158227–40 doi:10.1109/ACCESS.2021.3131216
  28. Qadri SF, Shen L, Ahmad M, et al. SVseg: stacked sparse autoencoder-based patch classification modeling for vertebrae segmentation. Mathematics 2022;10:796 doi:10.3390/math10050796
  29. Qadri SF, Lin H, Shen L, et al. CT-based automatic spine segmentation using patch-based deep learning. Int J Intell Syst 2023;2023:e2345835
  30. Donnally CJ III, DiPompeo CM, Varacallo M. Vertebral compression fractures. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2023
  31. Looker AC, Melton LJ, Borrud LG, et al. Lumbar spine bone mineral density in US adults: demographic patterns and relationship with femur neck skeletal status. Osteoporos Int 2012;23:1351–60 doi:10.1007/s00198-011-1693-z pmid:21720893
  32. Gilsanz V, Boechat MI, Gilsanz R, et al. Gender differences in vertebral sizes in adults: biomechanical implications. Radiology 1994;190:678–82 doi:10.1148/radiology.190.3.8115610 pmid:8115610
  33. Ebbesen EN, Thomsen JS, Beck-Nielsen H, et al. Age- and gender-related differences in vertebral bone mass, density, and strength. J Bone Miner Res 1999;14:1394–403 doi:10.1359/jbmr.1999.14.8.1394 pmid:10457272
  • Received February 9, 2024.
  • Accepted after revision May 3, 2024.
  • © 2024 by American Journal of Neuroradiology