Authors
Stefan Hegselmann, Leonard Gruelich, Julian Varghese, Martin Dugas
Publication date
2018/11/29
Conference
Machine Learning for Healthcare Conference
Pages
49-66
Publisher
PMLR
Description
Survival prediction for cancer patients can increase the prognostic accuracy and might ultimately lead to better informed decision making. To this end, many studies apply machine learning to cancer data of the Surveillance, Epidemiology, and End Results (SEER) program. The first part of this report contains a literature review to obtain a systematic overview of these studies. We identify 34 publications and extract information about experimental setups and efforts to ensure reproducibility. The review shows that only one of the identified studies mentions reproducibility and no study contains straightforward reproducible results. This motivates the second part of this work. We demonstrate the feasibility of reproducible cohort selection and survival prediction with SEER cancer data. Experiments are performed for 1- and 5-year survival of breast and lung cancer with cases diagnosed between 2004 and 2009. We compare minimal data preprocessing with 1-n encoding of categorical inputs and apply logistic regression and multilayer perceptron (MLP) models. Encoding with 1-n vectors proves beneficial throughout all experiments. For lung cancer, MLP models show a slightly superior performance. Moreover, importance of input attributes is analyzed with logistic regression weights and ablation analysis for MLPs.
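The "1-n encoding" mentioned in the description can be illustrated with a minimal sketch. It assumes the term means what is commonly called 1-of-n or one-hot encoding, where each categorical value becomes a vector with a single 1. The column values and the helper names `fit_categories` and `one_of_n` are hypothetical and are not taken from the paper or from SEER.

```python
def fit_categories(values):
    """Return the sorted distinct categories seen in a column."""
    return sorted(set(values))


def one_of_n(value, categories):
    """Encode one categorical value as a 1-of-n (one-hot) vector.

    A value that is not in `categories` yields an all-zero vector.
    """
    return [1 if value == c else 0 for c in categories]


# Hypothetical toy column; these are not actual SEER codes.
stage = ["localized", "regional", "distant", "regional"]

categories = fit_categories(stage)
encoded = [one_of_n(v, categories) for v in stage]

for value, vector in zip(stage, encoded):
    print(value, vector)
```

Each row then has one position per category, so a linear model such as logistic regression can learn a separate weight for every category instead of treating arbitrary category codes as ordered numbers.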
Total citations
Citations per year (chart): 2019, 2020, 2021, 2022, 2023, 2024; counts as extracted: 114544
Scholar articles
S Hegselmann, L Gruelich, J Varghese, M Dugas - Machine Learning for Healthcare Conference, 2018