Table 2:

Performance measures for each machine learning model applied to the external testing data set^a

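| Model | AUC | TPR | FPR | PPV | NPV | F1 Score | Balanced Accuracy | Misclassification Error |
|-------|------|------|------|------|------|----------|-------------------|-------------------------|
| BG | 0.74 | 0.71 | 0.35 | 0.24 | 0.93 | 0.36 | 0.68 | 0.34 |
| RF | 0.76 | 0.71 | 0.32 | 0.26 | 0.94 | 0.38 | 0.70 | 0.32 |
| SVM | 0.84 | 0.93 | 0.35 | 0.30 | 0.98 | 0.45 | 0.79 | 0.31 |
| KNN | 0.76 | 0.79 | 0.36 | 0.26 | 0.95 | 0.39 | 0.71 | 0.34 |
| LR | 0.77 | 0.86 | 0.37 | 0.27 | 0.96 | 0.41 | 0.74 | 0.34 |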
  • Note:—NPV indicates negative predictive value, the number of true-negatives divided by the number of true- and false-negatives; AUC, area under curve; FPR, false-positive rate (1-specificity = number of false-positives divided by all negatives); PPV, positive predictive value (precision = number of true-positives divided by number of true- and false-positives); TPR, true-positive rate (sensitivity or recall = number of true-positives divided by all positives).

  • ^a F1 = 2 × PPV × TPR / (PPV + TPR) is the harmonic mean of precision and recall. Balanced accuracy is accuracy accounting for class imbalance [(sensitivity + specificity) / 2]. Misclassification error is the number of incorrect classifications divided by sample size.
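
The definitions in the notes can be written out as a short Python sketch. It is illustrative only and is not the authors' code: the function names (`rates_from_counts`, `f1_score`, `balanced_accuracy`, `max_rounding_gap`) and the dictionary `TABLE_2` are hypothetical, and the dictionary simply re-enters the TPR, FPR, PPV, F1, and balanced accuracy columns of the table. The last function recomputes F1 and balanced accuracy from those columns to check that they agree with the reported values up to two-decimal rounding.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Compute the table's measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "tpr": tp / (tp + fn),        # sensitivity / recall
        "fpr": fp / (fp + tn),        # 1 - specificity
        "ppv": tp / (tp + fp),        # precision
        "npv": tn / (tn + fn),
        "error": (fp + fn) / total,   # misclassification error
    }


def f1_score(ppv, tpr):
    """Harmonic mean of precision (PPV) and recall (TPR)."""
    return 2 * ppv * tpr / (ppv + tpr)


def balanced_accuracy(tpr, fpr):
    """(sensitivity + specificity) / 2, with specificity = 1 - FPR."""
    return (tpr + (1 - fpr)) / 2


# Values copied from the table above (rounded to two decimals there).
TABLE_2 = {
    "BG":  {"tpr": 0.71, "fpr": 0.35, "ppv": 0.24, "f1": 0.36, "bal_acc": 0.68},
    "RF":  {"tpr": 0.71, "fpr": 0.32, "ppv": 0.26, "f1": 0.38, "bal_acc": 0.70},
    "SVM": {"tpr": 0.93, "fpr": 0.35, "ppv": 0.30, "f1": 0.45, "bal_acc": 0.79},
    "KNN": {"tpr": 0.79, "fpr": 0.36, "ppv": 0.26, "f1": 0.39, "bal_acc": 0.71},
    "LR":  {"tpr": 0.86, "fpr": 0.37, "ppv": 0.27, "f1": 0.41, "bal_acc": 0.74},
}


def max_rounding_gap(table):
    """Largest absolute gap between reported and recomputed F1 / balanced accuracy."""
    gaps = []
    for row in table.values():
        gaps.append(abs(f1_score(row["ppv"], row["tpr"]) - row["f1"]))
        gaps.append(abs(balanced_accuracy(row["tpr"], row["fpr"]) - row["bal_acc"]))
    return max(gaps)


if __name__ == "__main__":
    # Hypothetical counts, used only to exercise rates_from_counts.
    print(rates_from_counts(tp=8, fp=10, tn=80, fn=2))
    print("largest gap vs. table:", max_rounding_gap(TABLE_2))
```

Because the table reports inputs already rounded to two decimals, the recomputed values can differ from the reported ones in the second decimal place; a gap below 0.01 is what rounding alone would be expected to produce.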