SuperTML: Two-dimensional word embedding for the precognition on structured tabular data

B Sun, L Yang, W Zhang, M Lin… - Proceedings of the …, 2019 - openaccess.thecvf.com
Tabular data is the most commonly used form of data in industry according to a Kaggle ML
and DS Survey. Gradient Boosting Trees, Support Vector Machine, Random Forest, and …

Relational data embeddings for feature enrichment with background information

A Cvetkov-Iliev, A Allauzen, G Varoquaux - Machine Learning, 2023 - Springer
For many machine-learning tasks, augmenting the data table at hand with features built from
external sources is key to improving performance. For instance, estimating housing prices …

Catch: Collaborative feature set search for automated feature engineering

G Lu, H Wang, S Yang, J Yuan, G Yang… - Proceedings of the …, 2023 - dl.acm.org
Feature engineering often plays a crucial role in building mining systems for tabular data,
which traditionally requires experienced human experts to perform. Thanks to the rapid …

Feature construction using explanations of individual predictions

B Vouk, M Guid, M Robnik-Šikonja - Engineering Applications of Artificial …, 2023 - Elsevier
Feature construction can contribute to comprehensibility and performance of machine
learning models. Unfortunately, it usually requires exhaustive search in the attribute space …

Supervised learning on relational databases with graph neural networks

M Cvitkovic - arXiv preprint arXiv:2002.02046, 2020 - arxiv.org
The majority of data scientists and machine learning practitioners use relational data in their
work [State of ML and Data Science 2017, Kaggle, Inc.]. But training machine learning …

Automated data science for relational data

HT Lam, B Buesser, H Min, TN Minh… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Feature engineering is a crucial but tedious task that requires up to 80% of the total time in
data science projects. A significant challenge is when data consists of tables from different …

GFS: Graph-based Feature Synthesis for Prediction over Relational Databases

H Zhang, Q Gan, D Wipf, W Zhang - arXiv preprint arXiv:2312.02037, 2023 - arxiv.org
Relational databases are extensively utilized in a variety of modern information system
applications, and they always carry valuable data patterns. There are a huge number of data …

A roadmap for semi-automatically extracting predictive and clinically meaningful temporal features from medical data for predictive modeling

G Luo - Global Transitions, 2019 - Elsevier
Predictive modeling based on machine learning with medical data has great potential to
improve healthcare and reduce costs. However, two hurdles, among others, impede its …

Empowering Machine Learning with Scalable Feature Engineering and Interpretable AutoML

H Eldeeb, R Elshawi - IEEE Transactions on Artificial …, 2024 - ieeexplore.ieee.org
Automated feature engineering has gained considerable attention in academia and industry.
Nevertheless, existing systems often lack practical scalability and efficiency. This paper …

Regularizing conjunctive features for classification

P Barceló, A Baumgartner, V Dalmau… - Proceedings of the 38th …, 2019 - dl.acm.org
We consider the feature-generation task wherein we are given a database with entities
labeled as positive and negative examples, and the goal is to find feature queries that allow …