Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

A primer in BERTology: What we know about how BERT works

A Rogers, O Kovaleva, A Rumshisky - Transactions of the Association …, 2021 - direct.mit.edu
Transformer-based models have pushed the state of the art in many areas of NLP, but our
understanding of what is behind their success is still limited. This paper is the first survey of …

Bidirectional language modeling: a systematic literature review

M Shah Jahan, HU Khan, S Akbar… - Scientific …, 2021 - Wiley Online Library
In transfer learning, two major activities, i.e., pretraining and fine-tuning, are carried out to
perform downstream tasks. The advent of transformer architecture and bidirectional …

A comparison of SVM against pre-trained language models (PLMs) for text classification tasks

Y Wahba, N Madhavji, J Steinbacher - International Conference on …, 2022 - Springer
The emergence of pre-trained language models (PLMs) has shown great success in many
Natural Language Processing (NLP) tasks including text classification. Due to the minimal to …

Investigating the difference of fake news source credibility recognition between ANN and BERT algorithms in artificial intelligence

THC Chiang, CS Liao, WC Wang - Applied Sciences, 2022 - mdpi.com
Fake news spreading through various channels misleads people into disinformation. To reduce
the harm of fake news and provide multiple and effective news credibility channels, the …

Less is more: Pruning BERTweet architecture in Twitter sentiment analysis

R Moura, J Carvalho, A Plastino, A Paes - Information Processing & …, 2024 - Elsevier
Transformer-based models have been scaled up to absorb more information
and improve their performance. However, several studies have called attention to their …

Tuning Language Models by Mixture-of-Depths Ensemble

H Luo, L Specia - arXiv preprint arXiv:2410.13077, 2024 - arxiv.org
Transformer-based Large Language Models (LLMs) traditionally rely on final-layer loss for
training and final-layer representations for predictions, potentially overlooking the predictive …

Mitigating Hallucination Issues in Small-Parameter LLMs through Inter-Layer Contrastive Decoding

F Li, P Zhang - … Joint Conference on Neural Networks (IJCNN …, 2024 - ieeexplore.ieee.org
In this paper, we introduce a new decoding method to mitigate the issue of hallucinations in
Large Language Models (LLMs). Specifically, our method dynamically selects appropriate …

GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method

N Peinelt, M Rei, M Liakata - arXiv preprint arXiv:2010.12532, 2020 - arxiv.org
Large pre-trained language models such as BERT have been the driving force behind
recent improvements across many NLP tasks. However, BERT is only trained to predict …
