Generating and imputing tabular data via diffusion and flow-based gradient-boosted trees

A Jolicoeur-Martineau, K Fatras… - International …, 2024 - proceedings.mlr.press
Tabular data is hard to acquire and is subject to missing values. This paper introduces a
novel approach for generating and imputing mixed-type (continuous and categorical) tabular …

Fractional norms and quasinorms do not help to overcome the curse of dimensionality

EM Mirkes, J Allohibi, A Gorban - Entropy, 2020 - mdpi.com
The curse of dimensionality causes well-known and widely discussed problems for
machine learning methods. There is a hypothesis that using the Manhattan distance and …

Tuning kernel parameters for SVM based on expected square distance ratio

S Yin, J Yin - Information Sciences, 2016 - Elsevier
The performance of a support vector machine (SVM) depends highly on the selection of the
kernel function type and relevant parameters. To choose the kernel parameters properly …

Nonlinear dimensionality reduction with missing data using parametric multiple imputations

C De Bodt, D Mulders, M Verleysen… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Dimensionality reduction (DR) aims at faithfully and meaningfully representing high-dimensional
(HD) data in a low-dimensional (LD) space. Recently developed neighbor …

Decision trees as partitioning machines to characterize their generalization properties

JS Leboeuf, F LeBlanc… - Advances in neural …, 2020 - proceedings.neurips.cc
Decision trees are popular machine learning models that are simple to build and easy to
interpret. Even though algorithms to learn decision trees date back almost 50 years, key …

Scaling Up Diffusion and Flow-based XGBoost Models

JC Cresswell, T Kim - arXiv preprint arXiv:2408.16046, 2024 - arxiv.org
Novel machine learning methods for tabular data generation are often developed on small
datasets which do not match the scale required for scientific applications. We investigate a …

The use of Lorentzian distance metric in classification problems

Y Kerimbekov, HŞ Bilge, HH Uğurlu - Pattern Recognition Letters, 2016 - Elsevier
In this paper, we introduce the Lorentzian distance metric into classification problems. Here we
benefit from the interesting properties of the distance metric of the Lorentzian space. A …

Generalization properties of decision trees on real-valued and categorical features

JS Leboeuf, F LeBlanc, M Marchand - arXiv preprint arXiv:2210.10781, 2022 - arxiv.org
We revisit binary decision trees from the perspective of partitions of the data. We introduce
the notion of partitioning function, and we relate it to the growth function and to the VC …

Extensive assessment of Barnes-Hut t-SNE.

C De Bodt, D Mulders, M Verleysen, JA Lee - ESANN, 2018 - researchgate.net
Stochastic Neighbor Embedding (SNE) and variants are dimensionality reduction (DR)
methods able to foil the curse of dimensionality to deliver outstanding experimental results …

A Game Theoretic Based K-Nearest Neighbor Approach for Binary Classification

RI Lung, MA Suciu - 2023 IEEE Symposium Series on …, 2023 - ieeexplore.ieee.org
K-nearest neighbor is one of the simplest and most intuitive binary classification methods
providing robust results on a wide range of data. However, classification results can be …