Stratification of Chronic Myeloid Leukemia Cancer Dataset into Risk Groups using Four Machine Learning Algorithms with Minimal Loss Functi

OO Taiwo, FA Kasali, IO Akinyemi, SO Kuyoro, DD Awodele, DD Ogbaro, TS Olaniyan
2019 - afrjmis.net
ABSTRACT Chronic Myeloid Leukemia (CML) has been stratified into risk groups using scoring systems, but these systems are limited by overfitting. Machine Learning (ML) algorithms have been used to extract meaningful information from the datasets, but the loss function (empirical risk) of the algorithms was not considered in determining the risk incurred by adopting them for stratification. In this paper, a secondary dataset of 1640 CML patients, between 2003 and 2017, was collected from Obafemi Awolowo University Teaching Hospitals Complex, Ile-Ife, Osun State, Nigeria. An experimental analysis was performed in Waikato Environment for Knowledge Analysis (WEKA) 3.8.0 using basophil count and spleen size values with four ML algorithms (BayesNet, Multilayer Perceptron, Projective Adaptive Resonance Theory (PART) and Logistic Regression) to identify low- and high-risk patients. Holdout and 10-fold cross-validation techniques were used to evaluate the algorithms on correctly classified instances, time to learn, kappa statistic, sensitivity and specificity. On these metrics, Logistic Regression and PART performed better than the other algorithms in stratifying patients’ risk groups. The loss functions of these two algorithms were then determined by finding the difference between the true output and the predicted output. The loss function of the Logistic Regression algorithm for low and high risk in holdout and 10-fold cross-validation was 0.22%, 1.40% and -0.22%, -0.02%, respectively. Similarly, the PART algorithm yielded -1.58%, 1.40% and -0.22%, -0.26%. From these findings, the Logistic Regression algorithm had the minimum non-negative loss function in the holdout technique and was used in the developed model to stratify CML patients into their risk groups.
Therefore, determining the loss function of the algorithms minimizes the empirical risk and, as such, plays a significant role in producing optimum and faster results for accurate stratification.
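The evaluation quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' WEKA workflow: the function names, the toy labels, and in particular the reading of "difference between the true output and the predicted output" as a signed per-class difference in percentage share are all assumptions made for illustration, since the abstract does not give the exact formula.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="high"):
    """Sensitivity, specificity and Cohen's kappa for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    n = len(pairs)
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of the confusion matrix.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


def signed_class_difference(y_true, y_pred):
    """ASSUMED reading of the loss: per class, true share minus predicted share, in percent."""
    n = len(y_true)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    classes = sorted(set(y_true) | set(y_pred))
    return {c: 100.0 * (true_counts[c] - pred_counts[c]) / n for c in classes}


# Toy labels invented for illustration only; they are not taken from the CML dataset.
y_true = ["low"] * 6 + ["high"] * 4
y_pred = ["low"] * 5 + ["high"] * 1 + ["high"] * 2 + ["low"] * 2

metrics = binary_metrics(y_true, y_pred)
diff = signed_class_difference(y_true, y_pred)
print(metrics)
print(diff)
```

A signed difference can be negative when a class is over-predicted, which is one way negative percentages such as those quoted in the abstract could arise; whether the paper computes it this way is not stated.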