The art and practice of data science pipelines: A comprehensive study of data science pipelines in theory, in-the-small, and in-the-large

S Biswas, M Wardat, H Rajan - … of the 44th International Conference on …, 2022 - dl.acm.org
An increasingly large number of software systems today include data science
components for descriptive, predictive, and prescriptive analytics. The collection of data …

Min-max optimization without gradients: Convergence and applications to black-box evasion and poisoning attacks

S Liu, S Lu, X Chen, Y Feng, K Xu… - International …, 2020 - proceedings.mlr.press
In this paper, we study the problem of constrained min-max optimization in a black-box
setting, where the desired optimizer cannot access the gradients of the objective function but …

How much automation does a data scientist want?

D Wang, QV Liao, Y Zhang, U Khurana… - arXiv preprint arXiv …, 2021 - arxiv.org
Data science and machine learning (DS/ML) are at the heart of the recent advancements of
many Artificial Intelligence (AI) applications. There is an active research thread in AI, AutoAI, …

Workflow analysis of data science code in public GitHub repositories

D Ramasamy, C Sarasua, A Bacchelli… - Empirical Software …, 2023 - Springer
Despite the ubiquity of data science, we are far from rigorously understanding how coding in
data science is performed. Even though the scientific literature has hinted at the iterative and …

Teaching data science through storytelling: Improving undergraduate data literacy

Y Li, Y Wang, Y Lee, H Chen, AN Petri, T Cha - Thinking Skills and …, 2023 - Elsevier
This study proposes and evaluates the OCEL.AI (Open Collaborative Experiential Learning.AI)
paradigm that aims at broadening participation in data science education and enhancing …

Trust Your Gut: Comparing Human and Machine Inference from Noisy Visualizations

R Koonchanok, ME Papka… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
People commonly utilize visualizations not only to examine a given dataset, but also to draw
generalizable conclusions about the underlying models or phenomena. Prior research has …

Instance-level metalearning for outlier detection

L Vu, P Kirchner, C Aggarwal, H Samulowitz - … Joint Conference on Artificial …, 2024 - ijcai.org
A machine learning task can be viewed as a sequential pipeline of different algorithmic
choices, including data preprocessing, model selection, and hyper-parameter tuning …

Automated data science for relational data

HT Lam, B Buesser, H Min, TN Minh… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Feature engineering is a crucial but tedious task that requires up to 80% of the total time in
data science projects. A significant challenge is when data consists of tables from different …

Towards Feature Engineering with Human and AI's Knowledge: Understanding Data Science Practitioners' Perceptions in Human&AI-Assisted Feature Engineering …

Q Zhu, D Wang, S Ma, AY Wang, Z Chen… - Proceedings of the …, 2024 - dl.acm.org
As AI technology continues to advance, the importance of human-AI collaboration becomes
increasingly evident, with numerous studies exploring its potential in various fields. One vital …

Toward building edge learning pipelines

A Gounaris, AV Michailidou… - IEEE Internet …, 2023 - ieeexplore.ieee.org
From a bird's eye point of view, large-scale data analytics workflows, e.g., those executed in
popular tools, such as Apache Spark and Flink, are typically represented by directed acyclic …