Data augmentation using MG-GAN for improved cancer classification on gene expression data

P Chaudhari, H Agrawal, K Kotecha - Soft Computing, 2020 - Springer
Abstract
Molecular biology studies of cancer that rely on gene expression datasets have revealed that these datasets contain very few samples. Obtaining medical data is difficult and expensive because of privacy constraints, yet classifier accuracy depends greatly on the quality and quantity of the input data. The small-sample-size problem has commonly been addressed by data augmentation. Because synthetic samples are a sensitive matter in cancer classification on gene expression data, this paper investigates data augmentation using a generative adversarial network (GAN). A GAN is built from two blocks, a generator and a discriminator, that work in a collaborative yet adversarial way. This paper proposes a modified generator GAN (MG-GAN), in which the generator is fed with original data and multivariate noise to generate data with a Gaussian distribution. As the generated data lie within the latent space, the saddle point is reached faster. GANs have been widely used for data augmentation on image datasets. To the best of our knowledge, this is the first attempt to use a GAN for augmentation on a gene expression dataset. The performance of the proposed MG-GAN was compared with KNN and a basic GAN. Compared with KNN and the basic GAN, MG-GAN improves classification accuracy by 18.8% and 11.9%, respectively. The loss value of the error function for MG-GAN drops sharply, from 0.6978 to 0.0082, ensuring sensitivity of the generated data. The improved classification accuracy and the reduced loss value make MG-GAN better suited for critical applications with sensitive data.
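The abstract states that the MG-GAN generator is fed with original data and multivariate noise, but it does not say how the two are combined. The sketch below is one possible reading, not the authors' implementation: it assumes zero-mean Gaussian noise with a diagonal covariance taken from the per-gene variance, added to each real expression profile. The helper name `make_generator_input`, the `noise_scale` parameter, and the toy data are all hypothetical.

```python
import random
import statistics


def make_generator_input(real_batch, rng, noise_scale=0.1):
    """Hypothetical helper: perturb real expression profiles with Gaussian noise.

    Assumption (not from the paper): the noise is zero-mean with a diagonal
    covariance, i.e. each gene gets independent noise scaled by its own
    standard deviation across the batch.
    """
    n_genes = len(real_batch[0])
    # Per-gene population standard deviation across the samples in the batch.
    gene_std = [
        statistics.pstdev([row[g] for row in real_batch]) for g in range(n_genes)
    ]
    # Add scaled noise to every value; the output has the same shape as the input.
    return [
        [value + noise_scale * rng.gauss(0.0, gene_std[g]) for g, value in enumerate(row)]
        for row in real_batch
    ]


# Toy stand-in for a gene expression matrix: 8 samples x 4 genes.
rng = random.Random(0)
real = [[rng.gauss(5.0, 2.0) for _ in range(4)] for _ in range(8)]
gen_in = make_generator_input(real, rng)
```

In a full GAN, `gen_in` would be passed to a generator network trained against a discriminator. That part is omitted here because the abstract gives no architectural details.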