Authors
Ivan Izonin, Roman Tkachenko, Roman Pidkostelnyi, Olena Pavliuk, Viktor Khavalko, Anatoliy Batyuk
Publication date
2021
Conference
IDDM
Pages
223-232
Description
Improving the accuracy of diagnostic tasks is essential in various medical fields. When little training data is available, there is a high risk of overfitting or underfitting the machine learning model, which makes it impractical to apply. Various data augmentation methods can be used to address this problem. This paper focuses on neural network methods of data augmentation. The authors investigated a variational autoencoder and a GAN-based approach for generating artificial numerical data that is then used by machine-learning-based classifiers. The authors examined the proposed method on the task of diagnosing the development of diabetes mellitus. Experiments confirmed that the autoencoder generated a dataset similar to the initial one, with a similarity score of 0.93. The authors established a significant accuracy improvement for Random Forest, AdaBoost, and Logistic Regression classifiers when processing the extended dataset. The new dataset obtained using the GAN did not ensure satisfactory accuracy. This issue may be due to a lack of samples for training this class of neural networks. Further research is likely to address ensembles based on a single machine learning method that process decorrelated samples obtained by the methods investigated in this paper.