A Mixed Malay–English Language COVID-19 Twitter Dataset: A Sentiment Analysis

Authors

Jeffery TH Kong, Filbert H Juwono, Ik Ying Ngu, I Gde Dharma Nugraha, Yan Maraden, WK Wong

Publication date

2023/3/27

Journal

Big Data and Cognitive Computing

Volume

Issue

Pages

Publisher

MDPI

Description

Social media has evolved into a platform for the dissemination of information, including fake news. A great deal of false information circulates about the Coronavirus Disease 2019 (COVID-19) pandemic, for example regarding vaccination. In this paper, we focus on sentiment analysis of Malaysian COVID-19-related news on social media such as Twitter. Tweets in Malaysia often combine Malay, English, and Chinese, with plenty of short forms, symbols, emojis, and emoticons packed into the maximum length of a tweet. The contributions of this paper are twofold. Firstly, we built a multilingual COVID-19 Twitter dataset comprising tweets written from 1 September 2021 to 12 December 2021. In particular, we collected 108,246 tweets, with over 67% in Malay, 27% in English, 2% in Chinese, and 4% in other languages. We then manually annotated 11,568 tweets with one of three sentiment classes (positive, negative, or neutral) to develop a Malay-language sentiment analysis tool. For this purpose, we applied Byte-Pair Encoding (BPE), a data compression method, to the texts and used two deep learning approaches: Multilingual Bidirectional Encoder Representations from Transformers (M-BERT) and a convolutional neural network (CNN). BPE tokenization encodes rare and unknown words as smaller meaningful subwords. For the CNN, we converted the labeled tweets into image files. Our experiments explored different BPE vocabulary sizes with our BPE-Text-to-Image-CNN and BPE-M-BERT models. The results show that the optimal vocabulary size for BPE is …
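The BPE step mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `learn_bpe`, `merge_pair`, and `encode`, the `</w>` end-of-word marker, and the toy word list are all assumptions chosen for illustration. The sketch learns merge rules by repeatedly fusing the most frequent adjacent symbol pair, then applies those rules to split a word into subwords; the number of merges plays the role of the vocabulary-size setting.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Fuse every adjacent occurrence of `pair` in a tuple of symbols."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Start from single characters plus an end-of-word marker (an assumption).
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair fused.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus, not taken from the dataset.
toy_words = ["vaksin", "vaksin", "vaksinasi", "vaccine"]
toy_merges = learn_bpe(toy_words, num_merges=8)
toy_tokens = encode("vaksinasi", toy_merges)
```

A word that never appeared in the training list still gets encoded, falling back to shorter pieces or single characters, which is how BPE handles rare and unknown words.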

Total citations

Cited by 4

2023: 2, 2024: 2
