Virtual metrology (VM) has been applied to semiconductor manufacturing processes for wafer quality management. However, noise in the training datasets degrades VM performance, which is a key obstacle to applying VM in real-world semiconductor manufacturing. In this paper, we develop a method for constructing VM datasets by identifying and removing noise. We define noises by considering both input and output variables and classify them into two types: fault detection and classification (FDC) noises, which have abnormal FDC variables and normal metrology variables, and metrology noises, which have normal FDC variables and abnormal metrology variables. We propose constructing the VM training dataset so that it includes FDC noises and excludes metrology noises. The normal and abnormal regions of the FDC variables are identified with novelty detection methods. In experiments on real-world photolithography (photo) data, VM models trained on the dataset constructed by the proposed method achieved the best accuracy and were the most robust.
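
The noise taxonomy and the dataset construction rule described above can be illustrated with a minimal sketch. It is not the implementation used in this paper. It makes several assumptions: a k-nearest-neighbor distance score stands in for the novelty detection method, the metrology variable is judged abnormal by hypothetical lower and upper limits, and the function names, thresholds, and toy values are all invented for illustration. Wafers that are abnormal in both FDC and metrology variables are not covered by the two noise types; the sketch keeps them, which is also an assumption.

```python
import math

def knn_novelty_score(x, reference, k=2):
    """Mean Euclidean distance from x to its k nearest reference points."""
    nearest = sorted(math.dist(x, r) for r in reference)[:k]
    return sum(nearest) / len(nearest)

def label_wafer(fdc, metrology, reference, fdc_threshold, met_limits, k=2):
    """Label one wafer from its FDC status and its metrology status."""
    # FDC variables are abnormal when the novelty score exceeds the threshold.
    fdc_abnormal = knn_novelty_score(fdc, reference, k) > fdc_threshold
    # Assumption: metrology is abnormal when it falls outside fixed limits.
    low, high = met_limits
    met_abnormal = not (low <= metrology <= high)
    if fdc_abnormal and met_abnormal:
        return "both_abnormal"
    if fdc_abnormal:
        return "fdc_noise"        # abnormal FDC, normal metrology
    if met_abnormal:
        return "metrology_noise"  # normal FDC, abnormal metrology
    return "normal"

def build_training_indices(labels):
    """Keep every wafer except metrology noises (FDC noises are retained)."""
    return [i for i, lab in enumerate(labels) if lab != "metrology_noise"]

# Hypothetical toy data: two FDC variables and one metrology value per wafer.
reference_fdc = [(0.0, 0.0), (0.1, 0.1), (-0.1, 0.0), (0.0, -0.1), (0.1, -0.1)]
wafers = [
    ((0.1, 0.0), 10.0),   # normal FDC, normal metrology
    ((5.0, 5.0), 10.1),   # abnormal FDC, normal metrology
    ((0.0, 0.1), 20.0),   # normal FDC, abnormal metrology
    ((6.0, 6.0), 25.0),   # abnormal in both
]

labels = [
    label_wafer(fdc, met, reference_fdc, fdc_threshold=1.0, met_limits=(9.0, 11.0))
    for fdc, met in wafers
]
kept = build_training_indices(labels)
```

With these toy values, only the third wafer is dropped from the training set. In practice the novelty detector would be fitted on FDC data known to be normal, and the threshold would be tuned rather than fixed by hand.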