The impact of combining Arabic sarcasm detection datasets on the performance of BERT-based model

R Obeidat, A Bashayreh… - 2022 13th International …, 2022 - ieeexplore.ieee.org
Sarcasm Detection (SD) is the task of predicting sarcasm in text. It is crucial to the success of sentiment analysis, since recognizing both the literal and the figurative meaning of a text is essential to understanding users' opinions on various topics on social media. To date, however, the limited availability of annotated data for Arabic Sarcasm Detection has obstructed the building of robust models. One way to overcome this issue is to collect more labeled data, but data annotation is time-consuming and labor-intensive. An alternative is to combine several available training sets into one larger training set without additional annotation effort. This paper investigates the impact of combining the training data from different datasets on the performance of BERT-based SD models. We have chosen the ArSarcasm and IDAT datasets because they are relatively recent and differ in size, class distribution, and the annotation approach employed during data collection. ArSarcasm is a comparatively large, manually labeled corpus with a skewed class distribution, while IDAT is a smaller dataset annotated by distant supervision with a nearly even class distribution. We define different merging scenarios to examine the impact of combining various training data subsets of ArSarcasm and IDAT, and we measure the effect of each merge on a BERT-based SD model evaluated on the test sets of ArSarcasm and IDAT individually. The results show that merging datasets could enhance the sarcasm detection model's performance if the merging process improves the class balance and reduces the domain discrepancy caused by divergent vocabulary and inconsistent labeling.
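The idea of a merging scenario can be illustrated with a minimal sketch. It is not the authors' code: the function names `class_balance` and `merge_training_sets`, the `secondary_fraction` parameter, and the toy examples are all assumptions made for illustration, and the real ArSarcasm and IDAT data are not used. The sketch only shows how adding a small, balanced set to a larger, skewed one shifts the share of sarcastic examples in the combined training set.

```python
from collections import Counter
import random


def class_balance(examples):
    """Return the fraction of examples carrying each label."""
    counts = Counter(label for _, label in examples)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def merge_training_sets(primary, secondary, secondary_fraction=1.0, seed=0):
    """Combine all of `primary` with a random subset of `secondary`.

    `secondary_fraction` is a hypothetical knob for defining merging
    scenarios (for example 0.5 adds half of the secondary set).
    """
    rng = random.Random(seed)
    k = round(len(secondary) * secondary_fraction)
    sampled = rng.sample(secondary, k)
    merged = primary + sampled
    rng.shuffle(merged)  # avoid ordering the data by source dataset
    return merged


# Toy stand-ins (hypothetical): label 1 = sarcastic, 0 = not sarcastic.
skewed_set = [(f"text_a{i}", 0) for i in range(8)] + [(f"text_b{i}", 1) for i in range(2)]
balanced_set = [(f"text_c{i}", 0) for i in range(2)] + [(f"text_d{i}", 1) for i in range(2)]

merged = merge_training_sets(skewed_set, balanced_set)
print(class_balance(skewed_set))
print(class_balance(merged))
```

In this toy case the sarcastic share rises from 2 of 10 examples to 4 of 14 after merging. Whether such a shift helps a real model would also depend on the vocabulary and labeling consistency of the two sources, as the abstract notes.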