Datamafia at WNUT-2020 Task 2: A Study of Pre-trained Language Models along with Regularization Techniques for Downstream Tasks

A. Sengupta - Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT), 2020 - aclanthology.org
Abstract
This paper describes the system developed by team datamafia for WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. It presents a thorough study of pre-trained language models on a downstream binary classification task over noisy, user-generated Twitter data. The solution submitted to the final test leaderboard is a fine-tuned RoBERTa model, which achieves F1 scores of 90.8% and 89.4% on the dev and test data, respectively. In the latter part, we explore several techniques for injecting regularization explicitly into language models to generalize predictions over noisy data. Our experiments show that adding regularization to the pre-trained RoBERTa model makes it robust to data and annotation noise and can improve overall performance by more than 1.2%.