Development of an efficient method to detect mixed social media data with Tamil-English code using machine learning techniques

S Fha, U Sharma, HMM Naleer - ACM Transactions on Asian and Low …, 2023 - dl.acm.org
ACM Transactions on Asian and Low-Resource Language Information Processing, 2023 (dl.acm.org)
Online hate speech has become more prevalent on social networking sites with the rapid expansion of mobile computing and Web technology. Previous research has found that exposure to online hate speech has substantial offline consequences for historically disadvantaged communities, so there is considerable interest in the automated detection of hateful comments and posts. Hate speech can affect any population group, but some groups are more vulnerable than others. Against this background, detecting and reporting hate-related comments and posts can help to avoid the harmful effects of hate speech. Existing studies in this area have found machine learning algorithms to be more efficient at detecting abusive text in social media. In this research, we applied seven machine learning algorithms, namely Support Vector Machine (SVM), Naïve Bayes (NB), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB) and K-Nearest Neighbors (KNN), to detect hate speech, and compared their performance in order to develop an ensemble model. The researchers collected and combined the Tamil-English code-mixed hate speech tweet dataset that was created in HASOC. The dataset contains 35,442 tweets, divided into two groups: not offensive and offensive. NB obtained the highest F1 scores for both offensive and not offensive tweets, along with the highest weighted average, whereas SVM obtained the highest accuracy in detecting Tamil-English hate speech, 80% under 10-fold cross-validation. Based on these stand-alone results, the researchers developed two ensemble classifiers: a max-voting ensemble and an averaging ensemble. The averaging ensemble obtained 90.67% accuracy.
The study's findings are significant because the results can serve as a reference model for Tamil-English code-mixed hate speech detection, against which future work that uses other algorithms to identify hateful content more accurately can be evaluated.
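
The abstract names two ensemble schemes, max-voting and averaging, without giving implementation details. The following is a minimal sketch of the general idea in plain Python, not the authors' implementation. The function names, the label strings and the example probabilities are all hypothetical, and the three base classifiers stand in for the seven used in the study.

```python
from collections import Counter
from typing import Dict, List, Sequence

# The two classes named in the abstract; the label strings are assumptions.
LABELS = ("not_offensive", "offensive")


def hard_label(probs: Dict[str, float]) -> str:
    """Turn one classifier's class probabilities into a single predicted label."""
    return max(LABELS, key=lambda label: probs[label])


def max_voting(predictions: Sequence[str]) -> str:
    """Return the label predicted by the largest number of base classifiers."""
    # On a tie, Counter.most_common keeps the label that was seen first.
    return Counter(predictions).most_common(1)[0][0]


def averaging(probabilities: Sequence[Dict[str, float]]) -> str:
    """Average the class probabilities over all classifiers, then pick the top class."""
    n = len(probabilities)
    mean = {label: sum(p[label] for p in probabilities) / n for label in LABELS}
    return max(LABELS, key=lambda label: mean[label])


# Hypothetical outputs of three base classifiers for one tweet (made-up numbers).
example_probs: List[Dict[str, float]] = [
    {"not_offensive": 0.45, "offensive": 0.55},  # e.g. an SVM-like model
    {"not_offensive": 0.48, "offensive": 0.52},  # e.g. an NB-like model
    {"not_offensive": 0.90, "offensive": 0.10},  # e.g. an LR-like model
]

example_votes = [hard_label(p) for p in example_probs]
vote_result = max_voting(example_votes)
average_result = averaging(example_probs)

print("max-voting:", vote_result)
print("averaging: ", average_result)
```

In this made-up example the two schemes disagree: two weakly confident classifiers outvote one strongly confident classifier under max-voting, while averaging lets the confident classifier dominate. This only illustrates how the two schemes can differ and says nothing about why the averaging ensemble scored higher in the study.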