KGTorrent: A dataset of Python Jupyter notebooks from Kaggle

L Quaranta, F Calefato… - 2021 IEEE/ACM 18th …, 2021 - ieeexplore.ieee.org
Computational notebooks have become the tool of choice for many data scientists and
practitioners for performing analyses and disseminating results. Despite their increasing …

Exploring how deprecated Python library APIs are (not) handled

J Wang, L Li, K Liu, H Cai - Proceedings of the 28th ACM joint meeting on …, 2020 - dl.acm.org
In this paper, we present the first exploratory study of deprecated Python library APIs to
understand the status quo of API deprecation in the realm of Python libraries. Specifically …

Computational reproducibility of Jupyter notebooks from biomedical publications

S Samuel, D Mietchen - GigaScience, 2024 - academic.oup.com
Background Jupyter notebooks facilitate the bundling of executable code with its
documentation and output in one interactive environment, and they represent a popular …

Eliciting best practices for collaboration with computational notebooks

L Quaranta, F Calefato, F Lanubile - Proceedings of the ACM on Human …, 2022 - dl.acm.org
Despite the widespread adoption of computational notebooks, little is known about best
practices for their usage in collaborative contexts. In this paper, we fill this gap by eliciting a …

Corpus-based discourse analysis: from meta-reflection to accountability

M Bednarek, M Schweinberger… - Corpus Linguistics and …, 2024 - degruyter.com
Recent years have seen an increase in data and method reflection in corpus-based
discourse analysis. In this article, we first take stock of some of the issues arising from such …

A static analysis framework for data science notebooks

P Subotić, L Milikić, M Stojić - … of the 44th International Conference on …, 2022 - dl.acm.org
Notebooks provide an interactive environment for programmers to develop code, analyse
data and inject interleaved visualisations in a single environment. Despite their flexibility, a …

Repeatability, Reproducibility, Replicability, Reusability (4R) in Journals' Policies and Software/Data Management in Scientific Publications: A Survey, Discussion, and …

JA Hernández, M Colom - arXiv preprint arXiv:2312.11028, 2023 - arxiv.org
With the recognized crisis of credibility in scientific research, there is a growth of
reproducibility studies in computer science, and although existing surveys have reviewed …

Error identification strategies for Python Jupyter notebooks

D Robinson, NA Ernst, EL Vargas… - Proceedings of the 30th …, 2022 - dl.acm.org
Computational notebooks, such as Jupyter or Colab, combine text and data analysis code.
They have become ubiquitous in the world of data science and exploratory data analysis …

Detecting and explaining Python name errors

J Wang, L Li, K Liu, X Du - Information and Software Technology, 2024 - Elsevier
Python has become one of the most popular programming languages nowadays but has not
received enough attention from the software engineering community. Many errors, either …

Shifting left for early detection of machine-learning bugs

B Liblit, L Luo, A Molina, R Mukherjee… - … Symposium on Formal …, 2023 - Springer
Computational notebooks are widely used for machine learning (ML). However, notebooks
raise new correctness concerns beyond those found in traditional programming …