Active learning (AL) attempts to maximize a model's performance gain while annotating the fewest samples possible. Deep learning (DL) is data-hungry and requires a large amount …
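The snippet names the core AL objective: query labels only for the samples that help most. Below is a minimal sketch of pool-based active learning with least-confidence uncertainty sampling; the synthetic dataset, seed-set size, and annotation budget are illustrative assumptions, not taken from the surveyed papers.

```python
# A minimal sketch of pool-based active learning with uncertainty sampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
rng = np.random.default_rng(0)

labeled = list(rng.choice(len(X), size=20, replace=False))  # small seed set (assumption)
pool = [i for i in range(len(X)) if i not in set(labeled)]

model = LogisticRegression(max_iter=1000)
for _ in range(10):  # annotation rounds; budget is an assumption
    model.fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[pool])
    # Least-confidence criterion: query the pool sample the model is least sure about.
    query = pool[int(np.argmin(probs.max(axis=1)))]
    labeled.append(query)
    pool.remove(query)

print("accuracy after AL rounds:", model.score(X, y))  # evaluated on the full set for brevity
```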
Language model (LM) pretraining can capture various kinds of knowledge from text corpora, helping downstream tasks. However, existing methods such as BERT model a single document, and …
Y Gu, R Tinn, H Cheng, M Lucas, N Usuyama… - ACM Transactions on …, 2021 - dl.acm.org
Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on …
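Since this snippet concerns domain-specific pretraining, a hedged sketch of loading such a checkpoint and extracting contextual embeddings with Hugging Face transformers may help; the PubMedBERT hub id and the example sentence are assumptions for illustration.

```python
# A hedged sketch: load a domain-specifically pretrained BERT-family checkpoint
# and extract contextual token embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

name = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("EGFR mutations confer sensitivity to gefitinib.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
print(hidden.shape)  # (1, num_tokens, hidden_size)
```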
Recent advancements in large language models (LLMs) have led to the development of highly capable models such as OpenAI's ChatGPT. These models have exhibited exceptional …
L Hu, Z Liu, Z Zhao, L Hou, L Nie… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Pre-trained Language Models (PLMs), which are trained on large text corpora via self-supervised learning, have yielded promising performance on various tasks in …
Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin …
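SciBERT is distributed as a standard BERT checkpoint, so a short usage sketch fits here; the allenai/scibert_scivocab_uncased hub id is the published checkpoint, while the example sentence is an assumption.

```python
# A minimal usage sketch for the released SciBERT checkpoint via the
# Hugging Face fill-mask pipeline.
from transformers import pipeline

fill = pipeline("fill-mask", model="allenai/scibert_scivocab_uncased")
for pred in fill("The model was trained on a large [MASK] of scientific text."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```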
Despite the widespread success of self-supervised learning via masked language models (MLM), accurately capturing fine-grained semantic relationships in the biomedical domain …
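Since the snippet hinges on the masked language modeling (MLM) objective, a worked example of that objective follows: mask one token and ask a pretrained model to recover it. bert-base-uncased is used only as a stand-in checkpoint, and the biomedical example sentence is an assumption.

```python
# A minimal sketch of the MLM objective: mask a token, predict it back.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # stand-in checkpoint
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = f"Aspirin inhibits {tok.mask_token} aggregation."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    logits = mlm(**inputs).logits

# Locate the masked position and read off the model's top guesses.
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero(as_tuple=True)[1]
top5 = logits[0, mask_pos].topk(5).indices[0].tolist()
print(tok.convert_ids_to_tokens(top5))
```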
Motivation: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing …
We introduce S2ORC, a large corpus of 81.1M English-language academic papers spanning many academic disciplines. The corpus consists of rich metadata, paper abstracts …
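Because S2ORC is distributed as gzipped JSONL shards, a hedged sketch of streaming its records may be useful; the shard filename and the field names ("paper_id", "title", "abstract") below are assumptions for illustration, not the corpus's documented schema.

```python
# A hedged sketch of streaming S2ORC metadata records from a gzipped JSONL shard.
import gzip
import json

def iter_papers(path):
    """Yield one paper record per JSONL line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

for paper in iter_papers("s2orc/metadata_0.jsonl.gz"):  # assumed shard name
    if paper.get("abstract"):  # skip records without abstracts
        print(paper["paper_id"], paper["title"][:80])
        break
```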