A large-scale comparative analysis of coding standard conformance in open-source data science projects

AJ Simmons, S Barnett, J Rivera-Villicana… - Proceedings of the 14th …, 2020 - dl.acm.org
Proceedings of the 14th ACM/IEEE International Symposium on Empirical …, 2020 - dl.acm.org
Background
Meeting the growing industry demand for Data Science requires cross-disciplinary teams that can translate machine learning research into production-ready code. Software engineering teams value adherence to coding standards as an indication of code readability, maintainability, and developer expertise. However, there are no large-scale empirical studies of coding standards focused specifically on Data Science projects.
Aims
This study investigates the extent to which Data Science projects follow coding standards. In particular, it asks which standards are followed, which are ignored, and how this differs from traditional software projects.
Method
We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.
Results
Data Science projects suffer from a significantly higher rate of functions that use an excessive number of parameters and local variables. Data Science projects also follow different variable naming conventions to non-Data Science projects.
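To make the notion of "excessive" parameters and local variables concrete, the following is a minimal sketch of how such functions might be flagged in Python source using the standard-library ast module. It is not the tooling used in the study. The function name function_stats and the thresholds MAX_PARAMS and MAX_LOCALS are hypothetical, chosen only for illustration, and the local-variable count is a rough approximation that also picks up names assigned in nested functions.

```python
import ast

# Illustrative thresholds (assumptions, not taken from the study).
MAX_PARAMS = 5
MAX_LOCALS = 15


def function_stats(source):
    """Return {function_name: (n_params, n_locals)} for each function in source."""
    stats = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            params = [a.arg for a in args.posonlyargs + args.args + args.kwonlyargs]
            if args.vararg:
                params.append(args.vararg.arg)
            if args.kwarg:
                params.append(args.kwarg.arg)
            # Names assigned anywhere in the function, excluding parameters.
            # Rough approximation: names bound in nested functions are included.
            assigned = {
                n.id
                for n in ast.walk(node)
                if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
            }
            stats[node.name] = (len(params), len(assigned - set(params)))
    return stats


# A small, made-up training function with six parameters and two locals.
example = '''
def fit(X, y, lr, epochs, batch_size, seed):
    w = 0
    b = 0
    return w, b
'''

for name, (n_params, n_locals) in function_stats(example).items():
    print(name, "too many params:", n_params > MAX_PARAMS,
          "too many locals:", n_locals > MAX_LOCALS)
```

Under these assumed thresholds, the made-up fit function exceeds the parameter limit but not the local-variable limit, which is the kind of pattern the Results describe as more frequent in Data Science code.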
Conclusions
The differences indicate that Data Science codebases are distinct from traditional software codebases and do not follow traditional software engineering conventions. Our conjecture is that this may be because traditional software engineering conventions are inappropriate in the context of Data Science projects.