A comparison of random forest variable selection methods for classification prediction modeling

JL Speiser, ME Miller, J Tooze, E Ip - Expert Systems with Applications, 2019 - Elsevier
Random forest classification is a popular machine learning method for developing prediction
models in many research settings. Often in prediction modeling, a goal is to reduce the …

Hyperparameters and tuning strategies for random forest

P Probst, MN Wright… - … Reviews: data mining and …, 2019 - Wiley Online Library
The random forest (RF) algorithm has several hyperparameters that have to be set by the
user, for example, the number of observations drawn randomly for each tree and whether …
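The snippet above mentions hyperparameters such as the number of observations drawn for each tree and whether they are drawn with replacement. A minimal sketch of that per-tree sampling step, with hypothetical parameter names `sample_fraction` and `replace` chosen for illustration (they are not taken from the paper):

```python
import random

def draw_tree_sample(n_obs, sample_fraction=0.632, replace=True, rng=None):
    """Draw the observation indices used to grow one tree.

    sample_fraction: share of the n_obs observations drawn per tree.
    replace: True -> bootstrap sampling, False -> subsampling.
    """
    rng = rng or random.Random(0)
    k = max(1, round(sample_fraction * n_obs))
    if replace:
        return [rng.randrange(n_obs) for _ in range(k)]
    return rng.sample(range(n_obs), k)

# Each tree sees a different random subset of the training data.
bootstrap = draw_tree_sample(100, sample_fraction=1.0, replace=True)
subsample = draw_tree_sample(100, sample_fraction=0.5, replace=False)
```

With replacement, some observations repeat and others are left out (the out-of-bag cases); without replacement, the 50 drawn indices are all distinct.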

Benchmark for filter methods for feature selection in high-dimensional classification data

A Bommert, X Sun, B Bischl, J Rahnenführer… - … Statistics & Data Analysis, 2020 - Elsevier
Feature selection is one of the most fundamental problems in machine learning and has
drawn increasing attention due to high-dimensional data sets emerging from different fields …
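Filter methods of the kind benchmarked above score each feature independently of any learner and keep the top-ranked ones. A minimal sketch using absolute Pearson correlation with the class label as the score, one common filter criterion among the many the paper compares:

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def filter_rank(X, y):
    """Rank feature indices by |correlation with the label|, best first."""
    scores = [abs(pearson(col, y)) for col in zip(*X)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

# Feature 0 tracks the label exactly; feature 1 is only weakly related.
X = [[0, 1], [1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
y = [0, 1, 0, 1, 0, 1]
ranking = filter_rank(X, y)  # feature 0 ranked first
```

Because the score never consults a model, such filters stay cheap even for high-dimensional data, which is the setting the benchmark targets.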

Tunability: Importance of hyperparameters of machine learning algorithms

P Probst, AL Boulesteix, B Bischl - Journal of Machine Learning Research, 2019 - jmlr.org
Modern supervised machine learning algorithms involve hyperparameters that have to be
set before running them. Options for setting hyperparameters are default values from the …

To tune or not to tune the number of trees in random forest

P Probst, AL Boulesteix - Journal of Machine Learning Research, 2018 - jmlr.org
The number of trees T in the random forest (RF) algorithm for supervised learning has to be
set by the user. It is unclear whether T should simply be set to the largest computationally …
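Why more trees tend to help can be seen in a simplified analytic model (an illustration under an independence assumption, not the paper's actual analysis): if each of T trees votes correctly with probability p > 0.5 independently of the others, the majority vote errs with probability P(Binomial(T, 1-p) >= T/2), which shrinks as T grows.

```python
from math import comb

def majority_vote_error(T, p):
    """Error probability of a majority vote of T independent trees,
    each correct with probability p (odd T, so no ties)."""
    q = 1 - p
    # The vote is wrong when at least (T+1)//2 trees are wrong.
    return sum(comb(T, k) * q**k * p**(T - k)
               for k in range((T + 1) // 2, T + 1))

# Error for a single tree vs. small and large ensembles at p = 0.7.
errors = {T: majority_vote_error(T, 0.7) for T in (1, 11, 101)}
```

In this idealized setting the error only decreases with T, which is one side of the trade-off the paper examines against the computational cost of large forests.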

Visualizing the feature importance for black box models

G Casalicchio, C Molnar, B Bischl - … 10–14, 2018, Proceedings, Part I 18, 2019 - Springer
In recent years, a large number of model-agnostic methods to improve the transparency,
trustability, and interpretability of machine learning models have been developed. Based on …

Model-agnostic feature importance and effects with dependent features: a conditional subgroup approach

C Molnar, G König, B Bischl, G Casalicchio - Data Mining and Knowledge …, 2024 - Springer
The interpretation of feature importance in machine learning models is challenging when
features are dependent. Permutation feature importance (PFI) ignores such dependencies …
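Permutation feature importance (PFI), mentioned above, scores a feature by how much the model's error grows when that feature's column is shuffled, severing its link to the target. A minimal sketch of plain PFI with a toy model; the conditional-subgroup refinement proposed in the paper for dependent features is not implemented here:

```python
import random

def permutation_importance(model, X, y, feature, rng):
    """Increase in mean squared error after shuffling one feature column."""
    def mse(rows):
        return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    base = mse(X)
    shuffled_col = [row[feature] for row in X]
    rng.shuffle(shuffled_col)
    X_perm = [row[:feature] + [v] + row[feature + 1:]
              for row, v in zip(X, shuffled_col)]
    return mse(X_perm) - base

# Toy model that uses only feature 0, so feature 1 gets zero importance.
model = lambda row: 2.0 * row[0]
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * row[0] for row in X]
rng = random.Random(42)
imp0 = permutation_importance(model, X, y, 0, rng)  # positive
imp1 = permutation_importance(model, X, y, 1, rng)  # exactly zero
```

When features are dependent, shuffling one column produces unrealistic combinations of values, which is precisely the failure mode the paper's conditional approach addresses.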

AutoML in the age of large language models: Current challenges, future opportunities and risks

A Tornede, D Deng, T Eimer, J Giovanelli… - arXiv preprint arXiv …, 2023 - arxiv.org
The fields of both Natural Language Processing (NLP) and Automated Machine Learning
(AutoML) have achieved remarkable results over the past years. In NLP, especially Large …

OpenML benchmarking suites

B Bischl, G Casalicchio, M Feurer, P Gijsbers… - arXiv preprint arXiv …, 2017 - arxiv.org
Machine learning research depends on objectively interpretable, comparable, and
reproducible algorithm benchmarks. We advocate the use of curated, comprehensive suites …

Large-scale benchmark study of survival prediction methods using multi-omics data

M Herrmann, P Probst, R Hornung… - Briefings in …, 2021 - academic.oup.com
Multi-omics data, that is, datasets containing different types of high-dimensional molecular
variables, are increasingly often generated for the investigation of various diseases …