Self-Supervised Pretraining for Railway Sound Classification
Keywords:
Contrastive Triplet Embedding, Railway Sound Classification, Self-Supervised Learning

Abstract
This study addresses the scarcity of labeled data in railway sound classification by investigating self-supervised pretraining for representation learning. It proposes a two-phase approach: self-supervised learning (SSL) on a large unlabeled dataset, followed by supervised fine-tuning. Two SSL methods are compared, both using a ResNet-50 encoder: masked autoencoder (MAE) reconstruction and contrastive triplet embedding. The MAE approach learns by reconstructing masked segments of the sound data, while the contrastive triplet method learns by pulling an anchor sample closer to a positive example than to a negative one in the embedding space. In experiments on proprietary railway data, the MAE approach did not outperform baseline models; contrastive triplet embedding, however, significantly improved the macro F1 score, especially for minority classes, yielding more balanced classification performance. These results highlight the effectiveness of SSL in leveraging unlabeled data to address class imbalance, contributing to more robust and adaptive machine learning systems for real-world railway applications.
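This page does not include implementation details, but a minimal sketch of the contrastive triplet pretraining phase might look as follows, assuming a PyTorch/torchvision setup. The single-channel spectrogram input, embedding dimension, margin, learning rate, and the noise-based stand-in for augmentation are all illustrative placeholders, not the authors' settings.

```python
# Hypothetical sketch of triplet pretraining with a ResNet-50 encoder.
# All hyperparameters below are placeholders, not the paper's values.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class TripletEncoder(nn.Module):
    """ResNet-50 backbone mapping a spectrogram to an embedding vector."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        backbone = resnet50(weights=None)
        # Assume 1-channel spectrogram input rather than 3-channel images.
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        # Replace the classification head with an embedding projection.
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so embedding distances are bounded.
        return nn.functional.normalize(self.backbone(x), dim=-1)

encoder = TripletEncoder()
loss_fn = nn.TripletMarginLoss(margin=0.5)  # margin is a placeholder
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

# Anchor/positive would be two augmented views of the same clip and the
# negative a different clip; random tensors stand in for real data here.
# Shapes: (batch, 1, n_mels, time).
anchor = torch.randn(8, 1, 128, 256)
positive = anchor + 0.05 * torch.randn_like(anchor)  # stand-in augmentation
negative = torch.randn(8, 1, 128, 256)

opt.zero_grad()
loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
opt.step()
```

L2-normalizing the embeddings keeps pairwise distances bounded, which tends to stabilize triplet training; after pretraining, the backbone would be fine-tuned on the labeled classification task.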
License
Copyright (c) 2025 The Author(s)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.