Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

A primer in BERTology: What we know about how BERT works

A Rogers, O Kovaleva, A Rumshisky - Transactions of the Association …, 2021 - direct.mit.edu
Transformer-based models have pushed the state of the art in many areas of NLP, but our
understanding of what is behind their success is still limited. This paper is the first survey of …

Bidirectional language modeling: a systematic literature review

M Shah Jahan, HU Khan, S Akbar… - Scientific …, 2021 - Wiley Online Library
In transfer learning, two major activities, i.e., pretraining and fine-tuning, are carried out to
perform downstream tasks. The advent of transformer architecture and bidirectional …

A comparison of SVM against pre-trained language models (PLMs) for text classification tasks

Y Wahba, N Madhavji, J Steinbacher - International Conference on …, 2022 - Springer
The emergence of pre-trained language models (PLMs) has shown great success in many
Natural Language Processing (NLP) tasks including text classification. Due to the minimal to …

Investigating the difference of fake news source credibility recognition between ANN and BERT algorithms in artificial intelligence

THC Chiang, CS Liao, WC Wang - Applied Sciences, 2022 - mdpi.com
Fake news spreading through various channels misleads people into disinformation. To reduce
the harm of fake news and provide multiple and effective news credibility channels, the …

Less is more: Pruning BERTweet architecture in Twitter sentiment analysis

R Moura, J Carvalho, A Plastino, A Paes - Information Processing & …, 2024 - Elsevier
Transformer-based models have been scaled up to absorb more information
and improve their performance. However, several studies have called attention to their …

Tuning Language Models by Mixture-of-Depths Ensemble

H Luo, L Specia - arXiv preprint arXiv:2410.13077, 2024 - arxiv.org
Transformer-based Large Language Models (LLMs) traditionally rely on final-layer loss for
training and final-layer representations for predictions, potentially overlooking the predictive …

Mitigating Hallucination Issues in Small-Parameter LLMs through Inter-Layer Contrastive Decoding

F Li, P Zhang - … Joint Conference on Neural Networks (IJCNN …, 2024 - ieeexplore.ieee.org
In this paper, we introduce a new decoding method to mitigate the issue of hallucinations in
Large Language Models (LLMs). Specifically, our method dynamically selects appropriate …

GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method

N Peinelt, M Rei, M Liakata - arXiv preprint arXiv:2010.12532, 2020 - arxiv.org
Large pre-trained language models such as BERT have been the driving force behind
recent improvements across many NLP tasks. However, BERT is only trained to predict …
