Tunnel Boring Machine (TBM) construction, particularly with closed-face TBMs, faces uncertainties because the operator cannot directly observe the ground ahead. These uncertainties can lead to time delays, cost overruns, and accidents. While supervised machine learning techniques have been used to predict geology from TBM sensor data, their performance drops significantly when applied to other projects, indicating poor generalization. To produce accurate results and generalize well to future data, supervised learning models require high-quality, well-labeled data, which TBM datasets rarely provide. This paper addresses the issue of “noisy” labels in TBM datasets, which arise because human operators and engineers label the data with varying interpretations. A data-centric framework was adapted and applied to an Earth Pressure Balance Machine (EPBM) tunnel dataset to detect these mislabeled datapoints. The framework's outputs were validated using two techniques, and several methods were then applied to clean the dataset. The best-performing cleaning method was selected for the test set. The paper concludes by discussing the limitations of the proposed method, the challenges encountered, and future research directions in this area.