Robust semi-supervised learning in open environments

LZ Guo, LH Jia, JJ Shao, YF Li - Frontiers of Computer Science, 2025 - Springer
Semi-supervised learning (SSL) aims to improve performance by exploiting unlabeled data
when labels are scarce. Conventional SSL studies typically assume close environments …

Better by default: Strong pre-tuned MLPs and boosted trees on tabular data

D Holzmüller, L Grinsztajn, I Steinwart - arXiv preprint arXiv:2407.04491, 2024 - arxiv.org
For classification and regression on tabular data, the dominance of gradient-boosted
decision trees (GBDTs) has recently been challenged by often much slower deep learning …

Still More Shades of Null: An Evaluation Suite for Responsible Missing Value Imputation

FA Khan, D Herasymuk, N Protsiv… - arXiv preprint arXiv …, 2024 - arxiv.org
Data missingness is a practical challenge of sustained interest to the scientific community. In
this paper, we present Shades-of-Null, an evaluation suite for responsible missing value …

Optimizing Semantic Joinability in Heterogeneous Data: A Triplet-Based Approach with Pre-trained Deep Learning Models

MG Pedersen, BK Fazal, KS Kim - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
This paper presents a novel approach to optimizing semantic joinability in heterogeneous
data, leveraging embedding techniques and deep learning in the context of big data …

Leveraging Meteorological Predictions for Data-Driven Forecasting of Photovoltaic Power

LE Barreno Reyes - 2024 - diva-portal.org
Accurate forecasting of PV power generation is crucial for energy providers to effectively
participate in day-ahead (D-1) and two-day-ahead (D-2) markets, including FCR-D, day …

Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later

openreview.net
The widespread enthusiasm for deep learning has recently expanded into the domain of
tabular data. Recognizing that the advancement in deep tabular methods is often inspired by …