Authors
Mehdi Golzadeh, Alexandre Decan, Eleni Constantinou, Tom Mens
Publication date
2021/6/4
Conference
2021 IEEE/ACM Third International Workshop on Bots in Software Engineering (BotSE)
Pages
21-25
Publisher
IEEE
Description
Development bots are used on GitHub to automate repetitive activities. Such bots communicate with human actors via issue comments and pull request (PR) comments. Identifying bot comments helps prevent bias in socio-technical studies of software development. To automate their identification, we propose a classification model based on natural language processing. Starting from a balanced ground-truth dataset of 19,282 PR and issue comments, we encode the comments as vectors using a combination of the bag-of-words and TF-IDF techniques. We train a range of binary classifiers to predict the type of comment (human or bot) from this vector representation. A multinomial Naive Bayes classifier provides the best results. On a test set containing 50% of the data, it achieves an average precision, recall, and F1 score of 0.88. Although the model shows a promising result on the pull …
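The pipeline described in the abstract (bag-of-words counts, TF-IDF weighting, multinomial Naive Bayes) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the class name `TfidfMultinomialNB`, the tokenizer, the smoothed IDF formula, the Laplace smoothing constant, and the six toy comments are all assumptions made for illustration, and the toy data says nothing about the reported 0.88 scores.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the comment and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TfidfMultinomialNB:
    """Hypothetical sketch: multinomial Naive Bayes over TF-IDF weighted word counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def _weigh(self, doc):
        # Bag-of-words count times IDF; terms unseen during training are dropped.
        return {t: n * self.idf[t] for t, n in doc.items() if t in self.idf}

    def fit(self, comments, labels):
        docs = [Counter(tokenize(c)) for c in comments]
        n_docs = len(docs)
        doc_freq = Counter()
        for doc in docs:
            doc_freq.update(doc.keys())
        # Smoothed inverse document frequency (one common variant).
        self.idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0
                    for t, d in doc_freq.items()}
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / n_docs)
                          for c in self.classes}
        # Accumulate TF-IDF mass per term and per class.
        mass = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for term, weight in self._weigh(doc).items():
                mass[label][term] += weight
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(mass[c].values()) + self.alpha * len(self.idf)
            self.log_likelihood[c] = {
                t: math.log((mass[c].get(t, 0.0) + self.alpha) / total)
                for t in self.idf
            }
        return self

    def predict(self, comment):
        features = self._weigh(Counter(tokenize(comment)))
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][t] for t, w in features.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy, made-up comments; the real study uses 19,282 labelled comments.
train_comments = [
    "Thanks for the patch, I will review it tomorrow.",
    "I think this breaks the login flow, can you add a test?",
    "Looks good to me, merging after lunch.",
    "Coverage decreased by 0.2% when pulling this branch into master.",
    "Build failed. See the CI log for details.",
    "This pull request has been automatically marked as stale.",
]
train_labels = ["human", "human", "human", "bot", "bot", "bot"]

model = TfidfMultinomialNB().fit(train_comments, train_labels)
print(model.predict("Build failed, see the log."))
print(model.predict("Thanks, I will add a test."))
```

In practice, a library such as scikit-learn would likely replace the hand-written vectorizer and classifier; the sketch only shows how the three steps named in the abstract fit together.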