Vocabulary learning via optimal transport for neural machine translation

J Xu, H Zhou, C Gan, Z Zheng, L Li - arXiv preprint arXiv:2012.15671, 2020 - arxiv.org
The choice of token vocabulary affects the performance of machine translation. This paper
aims to figure out what is a good vocabulary and whether one can find the optimal …

Online multilingual hate speech detection: experimenting with Hindi and English social media

N Vashistha, A Zubiaga - Information, 2020 - mdpi.com
The last two decades have seen an exponential increase in the use of the Internet and
social media, which has changed basic human interaction. This has led to many positive …

Text analysis for psychology: Methods, principles, and practices

B Kennedy, A Ashokkumar, RL Boyd, M Dehghani - 2021 - psyarxiv.com
Due to the explosion of new sources of human language data and the rapid progression of
computational methods for extracting meaning from natural language, language analysis is …

Positionless aspect based sentiment analysis using attention mechanism

RK Yadav, L Jiao, M Goodwin, OC Granmo - Knowledge-Based Systems, 2021 - Elsevier
Aspect-based sentiment analysis (ABSA) aims at identifying fine-grained polarity of opinion
associated with a given aspect word. Several existing articles demonstrated promising …

Improving classifier training efficiency for automatic cyberbullying detection with feature density

J Eronen, M Ptaszynski, F Masui… - Information Processing …, 2021 - Elsevier
We study the effectiveness of Feature Density (FD) using different linguistically-backed
feature preprocessing methods in order to estimate dataset complexity, which in turn is used …

Fall of Giants: How popular text-based MLaaS fall against a simple evasion attack

L Pajola, M Conti - … IEEE European Symposium on Security and …, 2021 - ieeexplore.ieee.org
The increased demand for machine learning applications made companies offer Machine-
Learning-as-a-Service (MLaaS). In MLaaS (a market estimated 8000M USD by 2025), users …

Bridging the generalization gap in text-to-SQL parsing with schema expansion

C Zhao, Y Su, A Pauls, EA Platanios - Proceedings of the 60th …, 2022 - aclanthology.org
Text-to-SQL parsers map natural language questions to programs that are executable over
tables to generate answers, and are typically evaluated on large-scale datasets like Spider …

Benchmarking scalable predictive uncertainty in text classification

J Van Landeghem, M Blaschko, B Anckaert… - IEEE …, 2022 - ieeexplore.ieee.org
This paper explores the question of how predictive uncertainty methods perform in practice
in Natural Language Processing, specifically multi-class and multi-label text classification …

CoDesc: A large code-description parallel dataset

M Hasan, T Muttaqueen, AA Ishtiaq, KS Mehrab… - arXiv preprint arXiv …, 2021 - arxiv.org
Translation between natural language and source code can help software development by
enabling developers to comprehend, ideate, search, and write computer programs in natural …

Game-theoretic vocabulary selection via the Shapley value and Banzhaf index

R Patel, M Garnelo, I Gemp, C Dyer… - Proceedings of the …, 2021 - aclanthology.org
The input vocabulary and the representations learned are crucial to the performance of
neural NLP models. Using the full vocabulary results in less explainable and more memory …