A systematic review on language identification of code-mixed text: techniques, data availability, challenges, and framework development

AF Hidayatullah, A Qazi, DTC Lai, RA Apong - IEEE Access, 2022 - ieeexplore.ieee.org
The mix of native language with other languages (code-mixing) in social media has posed a
severe challenge for language identification (LID) systems. It has encouraged research on …

Pre-processing tasks in Indonesian Twitter messages

AF Hidayatullah, MR Ma'Arif - Journal of Physics: Conference …, 2017 - iopscience.iop.org
Twitter text messages are very noisy. Moreover, tweet data are unstructured and
complicated enough. The focus of this work is to investigate pre-processing technique for …

Improving the accuracy of text classification using stemming method, a case of non-formal Indonesian conversation

Rianto, AB Mutiara, EP Wibowo, PI Santosa - Journal of Big Data, 2021 - Springer
Background Stemming has long been used in data pre-processing to retrieve information by
tracking affixed words back into their root. In an Indonesian setting, existing stemming …

Corpus creation and language identification for code-mixed Indonesian-Javanese-English Tweets

AF Hidayatullah, RA Apong, DTC Lai, A Qazi - PeerJ Computer Science, 2023 - peerj.com
With the massive use of social media today, mixing between languages in social media text
is prevalent. In linguistics, the phenomenon of mixing languages is known as code-mixing …

A model of preprocessing for social media data extraction

DZ Abidin, S Nurmaini, RF Malik… - 2019 International …, 2019 - ieeexplore.ieee.org
Tropical disease grows fast and requires detection. One source of data for detections is
social media Twitter. However, social media data has data with diverse data structures …

Improving the accuracy of text classification using stemming method, a case of non-formal Indonesian conversation

AB Mutiara, EP Wibowo… - Journal of Big …, 2021 - journalofbigdata.springeropen.com
Stemming has long been used in data pre-processing to retrieve information by tracking
affixed words back into their root. In an Indonesian setting, existing stemming methods have …

SEMAR: An interface for Indonesian hate speech detection using machine learning

UAN Rohmawati, SW Sihwi… - … International Seminar on …, 2018 - ieeexplore.ieee.org
Hate Speech has become government and public's concern because of the high number of
hate speech cases on social media that occur in Indonesia, which are getting increased in …

Sentiment analysis on Bahasa Indonesia tweets using Unibigram models and machine learning techniques

BH Iswanto, V Poerwoto - IOP Conference Series: Materials …, 2018 - iopscience.iop.org
Sentiment analysis on English tweets has its challenges. In addition to frequent use of the
informal language, the words used are usually less consistent, contain abbreviations, and …

A combination of query expansion ranking and GA-SVM for improving Indonesian sentiment classification performance

PH Prastyo, I Ardiyanto, R Hidayat - Procedia Computer Science, 2021 - Elsevier
The sentiment classification method is a research field that is proliferating in Indonesia since
it is fast in extracting public opinion and provides essential and valuable information for …

Improving the Accuracy of Text Classification using Stemming Method, A Case of Informal Indonesian Conversation

R Rianto, AB Mutiara, EP Wibowo… - Journal of Big …, 2020 - scholar.archive.org
As social beings, humans always interact with one another using either verbal or non-verbal
language. Language is an arbitrary sound-symbol system, which is used by members of a …