Anomaly Detection Benchmarks: Navigating Imbalanced Industrial Data
Analysis
Key Takeaways
- Anomaly detection performance is highly sensitive to the number of faulty examples in the training data.
- Unsupervised methods (kNN/LOF) perform well with very few faulty examples (<20).
- Semi-supervised (XGBOD) and supervised (SVM/CatBoost) methods show significant performance gains with 30-50 faulty examples, especially with higher dimensionality (see the sketch below).
“Our findings reveal that the best detector is highly dependent on the total number of faulty examples in the training dataset, with additional healthy examples offering insignificant benefits in most cases.”
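To make the trade-off concrete, the following is a minimal sketch (not the benchmark's actual code) that sweeps the number of faulty training examples and compares an unsupervised detector (LOF) against a supervised classifier (SVM), both via scikit-learn. The synthetic data generator, feature count, and sample sizes are illustrative assumptions, not the study's dataset.

```python
# Sketch: how sensitivity to the number of faulty training examples can be probed.
# Assumptions: synthetic stand-in data, scikit-learn's LOF and SVC as example detectors.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_data(n_healthy, n_faulty, n_features=10):
    """Synthetic stand-in for industrial sensor data: healthy samples near 0, faults shifted."""
    X_h = rng.normal(0.0, 1.0, size=(n_healthy, n_features))
    X_f = rng.normal(2.0, 1.0, size=(n_faulty, n_features))
    X = np.vstack([X_h, X_f])
    y = np.concatenate([np.zeros(n_healthy), np.ones(n_faulty)])
    return X, y

X_test, y_test = make_data(500, 50)

for n_faulty in (5, 20, 50):  # sweep the number of faulty examples in the training set
    X_train, y_train = make_data(1000, n_faulty)

    # Unsupervised / novelty-style detector: LOF fitted on healthy data only,
    # never seeing the fault labels during training.
    lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train[y_train == 0])
    lof_scores = -lof.score_samples(X_test)  # higher = more anomalous

    # Supervised detector: SVM trained directly on the (few) labelled faults.
    svm = SVC(probability=True).fit(X_train, y_train)
    svm_scores = svm.predict_proba(X_test)[:, 1]

    print(f"faulty={n_faulty:3d}  "
          f"LOF AUC={roc_auc_score(y_test, lof_scores):.3f}  "
          f"SVM AUC={roc_auc_score(y_test, svm_scores):.3f}")
```

On data with the characteristics the study describes, the expected pattern is that the unsupervised detector stays competitive when only a handful of faults are available, while the supervised model pulls ahead once a few dozen labelled faults exist.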