FRUGAL: Unlocking semi-supervised learning for software analytics

H Tu, T Menzies - … 36th IEEE/ACM International Conference on …, 2021 - ieeexplore.ieee.org
Standard software analytics often involves having a large amount of data with labels in order
to commission models with acceptable performance. However, prior work has shown that …

How to find actionable static analysis warnings: A case study with FindBugs

R Yedida, HJ Kang, H Tu, X Yang, D Lo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Automatically generated static code warnings suffer from a large number of false alarms.
Hence, developers only take action on a small percent of those warnings. To better predict …

When less is more: on the value of “co-training” for semi-supervised software defect predictors

S Majumder, J Chakraborty, T Menzies - Empirical Software Engineering, 2024 - Springer
Labeling a module defective or non-defective is an expensive task. Hence, there are often
limits on how much labeled data is available for training. Semi-supervised classifiers use far …

Learning from Very Little Data: On the Value of Landscape Analysis for Predicting Software Project Health

A Lustosa, T Menzies - ACM Transactions on Software Engineering and …, 2024 - dl.acm.org
When data is scarce, software analytics can make many mistakes. For example, consider
learning predictors for open source project health (e.g., the number of closed pull requests in …

What not to test (for cyber-physical systems)

X Ling, T Menzies - IEEE Transactions on Software …, 2023 - ieeexplore.ieee.org
For simulation-based systems, finding a set of test cases with the least cost by exploring
multiple goals is a complex task. Domain-specific optimization goals (e.g., maximize output …

Trading Off Scalability, Privacy, and Performance in Data Synthesis

X Ling, T Menzies, C Hazard, J Shu, J Beel - IEEE Access, 2024 - ieeexplore.ieee.org
Synthetic data has been widely applied in the real world recently. One typical example is the
creation of synthetic data for privacy concerned datasets. In this scenario, synthetic data …

Predicting health indicators for open source projects (using hyperparameter optimization)

T Xia, W Fu, R Shu, R Agrawal, T Menzies - Empirical Software …, 2022 - Springer
Software developed on public platforms is a source of data that can be used to make
predictions about those projects. While the individual developing activity may be random …

Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples

T Menzies, A Lustosa - arXiv preprint arXiv:2405.12920, 2024 - arxiv.org
This paper proposes a new challenge problem for software analytics. In the process we shall
call "software review", a panel of SMEs (subject matter experts) review examples of software …

Learning transfers via transfer learning

M Arifuzzaman, E Arslan - … on Innovating the Network for Data …, 2021 - ieeexplore.ieee.org
Detecting performance anomalies is key to efficiently utilizing network resources and improving
the quality of service. Researchers proposed various approaches to identify the presence of …

On the Benefits of Semi-Supervised Test Case Generation for Simulation Models

X Ling, T Menzies - arXiv preprint arXiv:2305.03714, 2023 - arxiv.org
Testing complex simulation models can be expensive and time consuming. Current
state-of-the-art methods that explore this problem are fully-supervised; i.e., they require that all …