Towards language-independent brown build detection

D Olewicki, M Nayrolles, B Adams - Proceedings of the 44th International Conference on Software Engineering, 2022 - dl.acm.org
In principle, continuous integration (CI) practices allow modern software organizations to build and test their products after each code change to detect quality issues as soon as possible. In reality, issues with the build scripts (e.g., missing dependencies) and/or the presence of "flaky tests" lead to build failures that essentially are false positives, not indicative of actual quality problems of the source code. For our industrial partner, which is active in the video game industry, such "brown builds" not only require multidisciplinary teams to spend more effort interpreting or even re-running the build, leading to substantial redundant build activity, but also slow down the integration pipeline. Hence, this paper aims to prototype and evaluate approaches for early detection of brown build results based on textual similarity to build logs of prior brown builds. The approach is tested on 7 projects (6 closed-source from our industrial collaborators and 1 open-source, Graphviz). We find that our model manages to detect brown builds with a mean F1-score of 53% on the studied projects, which is three times higher than the best baseline considered, and at least as good as human experts (but with less effort). Furthermore, we found that cross-project prediction can be used for a project's onboarding phase, that a training set of 30 weeks works best, and that our retraining heuristics keep the F1-score higher than the baseline while retraining only every 4-5 weeks.
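The core idea described in the abstract, flagging a new build as likely "brown" when its log text resembles logs of previously labelled brown builds, can be illustrated with a minimal sketch. The snippet below uses TF-IDF vectors over word n-grams and cosine similarity; the example logs, the `looks_brown` helper, and the threshold are illustrative assumptions, not the authors' actual feature set or classifier.

```python
# Minimal sketch of textual-similarity-based brown build detection.
# NOTE: this is NOT the paper's pipeline; logs, helper, and threshold are
# hypothetical and only illustrate the idea.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical historical build logs with labels (True = brown build).
history = [
    ("Timeout waiting for test runner ... retrying connection", True),
    ("fatal error: missing dependency libfoo.so while linking", True),
    ("assertion failed: expected 42 but got 41 in test_math", False),
    ("compile error: undeclared identifier 'frame_count'", False),
]

logs = [text for text, _ in history]
labels = [label for _, label in history]

# Word n-grams over raw log text keep the approach language-independent:
# no parsing of any specific build tool's output format is required.
vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(1, 2))
matrix = vectorizer.fit_transform(logs)

def looks_brown(new_log: str, threshold: float = 0.1) -> bool:
    """Return True if the new log is most similar to a known brown build."""
    sims = cosine_similarity(vectorizer.transform([new_log]), matrix)[0]
    best = sims.argmax()
    return bool(labels[best]) and sims[best] >= threshold

print(looks_brown("Timeout waiting for test runner, connection retried twice"))
```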