PTMTorrent: a dataset for mining open-source pre-trained model packages

W Jiang, N Synovic, P Jajal… - 2023 IEEE/ACM 20th …, 2023 - ieeexplore.ieee.org
Due to the cost of developing and training deep learning models from scratch, machine
learning engineers have begun to reuse pre-trained models (PTMs) and fine-tune them for …

PeaTMOSS: A dataset and initial analysis of pre-trained models in open-source software

W Jiang, J Yasmin, J Jones, N Synovic… - 2024 IEEE/ACM 21st …, 2024 - ieeexplore.ieee.org
The development and training of deep learning models have become increasingly costly
and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for …

On the differences between unit and integration testing in the TravisTorrent dataset

G Orellana, G Laghari, A Murgia… - 2017 IEEE/ACM 14th …, 2017 - ieeexplore.ieee.org
Already from the early days of testing, practitioners distinguish between unit tests and
integration tests as a strategy to locate defects. Unfortunately, the mining software …

Lean GHTorrent: GitHub data on demand

G Gousios, B Vasilescu, A Serebrenik… - Proceedings of the 11th …, 2014 - dl.acm.org
In recent years, GitHub has become the largest code host in the world, with more than 5M
developers collaborating across 10M repositories. Numerous popular open source projects …

Characterizing deep learning package supply chains in PyPI: Domains, clusters, and disengagement

K Gao, R He, B Xie, M Zhou - ACM Transactions on Software …, 2024 - dl.acm.org
Deep learning (DL) frameworks have become the cornerstone of the rapidly developing DL
field. Through installation dependencies specified in the distribution metadata, numerous …

ModelMine: a tool to facilitate mining models from open source repositories

SM Reza, O Badreddin, K Rahad - Proceedings of the 23rd ACM/IEEE …, 2020 - dl.acm.org
Mining Software Repositories (MSR) has opened up new pathways and rich sources of data
for research and practical purposes. This research discipline facilitates mining data from …

ConPan: a tool to analyze packages in software containers

A Zerouali, V Cosentino, G Robles… - 2019 IEEE/ACM 16th …, 2019 - ieeexplore.ieee.org
Deploying software packages and services into containers is a popular software
engineering practice that increases portability and reusability. Docker, the most popular …

MSR4ML: Reconstructing artifact traceability in machine learning repositories

AT Njomou, AJB Africa, B Adams… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
The increasing popularity of Machine Learning (ML) is generating challenges also for
developers. The multitude of programming languages, libraries and available resources …

What do developers know about machine learning: a study of ML discussions on StackOverflow

AA Bangash, H Sahar, S Chowdhury… - 2019 IEEE/ACM 16th …, 2019 - ieeexplore.ieee.org
Machine learning, a branch of Artificial Intelligence, is now popular in software engineering
community and is successfully used for problems like bug prediction, and software …

Crossflow: a framework for distributed mining of software repositories

D Kolovos, P Neubauer, K Barmpis… - 2019 IEEE/ACM 16th …, 2019 - ieeexplore.ieee.org
Large-scale software repository mining typically requires substantial storage and
computational resources, and often involves a large number of calls to (rate-limited) APIs …