Self-Supervised Pretraining for Railway Sound Classification

Authors

  • Gabriel Schachinger
  • Matthias Blaickner
  • Georg Brandmayr, University of Applied Sciences Technikum Wien

Keywords

Contrastive Triplet Embedding, Railway Sound Classification, Self-Supervised Learning

Abstract

This study addresses the scarcity of labeled data in railway sound classification by investigating self-supervised pretraining for representation learning. It proposes a two-phase approach: self-supervised learning (SSL) on a large unlabeled dataset, followed by supervised fine-tuning. Two SSL methods are compared on a ResNet-50 encoder: masked autoencoder (MAE) reconstruction and contrastive triplet embedding. The MAE approach learns representations by reconstructing masked segments of the sound data, while the contrastive approach learns by pulling embeddings of related samples together and pushing embeddings of unrelated samples apart. In experiments on proprietary railway data, MAE pretraining did not outperform the baseline models, whereas contrastive triplet embedding significantly improved the macro F1 score, particularly for minority classes, yielding more balanced classification performance. These results highlight the effectiveness of SSL in exploiting unlabeled data to mitigate class imbalance, contributing to more robust and adaptive machine learning systems for real-world railway applications.
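The abstract does not include code, but the contrastive phase it describes can be illustrated with a minimal PyTorch sketch: a ResNet-50 backbone whose classification head is replaced by an embedding projection, trained with a triplet margin loss on anchor/positive/negative examples. Everything here is an assumption for illustration, not the authors' implementation: the embedding dimension, margin, input shape, and the use of randomly generated tensors in place of the paper's (undescribed) spectrogram pipeline and sampling strategy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class TripletEncoder(nn.Module):
    """ResNet-50 backbone projecting inputs to a normalized embedding.

    embed_dim is an assumption; the paper does not state the embedding size.
    """
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        backbone = resnet50(weights=None)
        # Replace the ImageNet classification head with an embedding projection.
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so distances in the triplet loss are comparable.
        return F.normalize(self.backbone(x), dim=-1)

encoder = TripletEncoder()
# Margin of 1.0 is a placeholder, not the paper's setting.
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Stand-ins for real data: in practice these would be spectrograms of railway
# sounds (ResNet-50 expects 3 channels, e.g. a replicated log-mel spectrogram).
# Anchor and positive come from the same clip; the negative from a different one.
anchor = torch.randn(8, 3, 224, 224)
positive = torch.randn(8, 3, 224, 224)
negative = torch.randn(8, 3, 224, 224)

loss = triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()  # gradients flow into the encoder for pretraining
```

After this pretraining phase, the encoder weights would be reused and fine-tuned with labels, mirroring the two-phase scheme described above.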

DOI: https://doi.org/10.24135/ICONIP3

Published

2025-03-17