Pre-trained models for natural language processing: A survey

X Qiu, T Sun, Y Xu, Y Shao, N Dai, X Huang - Science China …, 2020 - Springer
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …

A survey of knowledge distillation research

黄震华, 杨顺志, 林威, 倪娟, 孙圣力, 陈运文, 汤庸 - 计算机学报 (Chinese Journal of Computers), 2022 - 159.226.43.17
High-performance deep learning networks are typically compute- and parameter-intensive, making them
difficult to deploy on resource-constrained edge devices. To run deep learning models on low-resource devices, efficient small-scale networks need to be developed …
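
The distillation methods covered by such surveys generally build on the classic soft-target objective of Hinton et al. (2015). Below is a minimal sketch of that loss in PyTorch; the temperature and weighting values are illustrative only, not taken from the survey.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=4.0, alpha=0.5):
        # Soft-target term: KL between the temperature-softened teacher and
        # student distributions, scaled by T^2 (Hinton et al., 2015).
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Hard-target term: ordinary cross-entropy on the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard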

MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers

W Wang, F Wei, L Dong, H Bao… - Advances in Neural …, 2020 - proceedings.neurips.cc
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved
remarkable success in a variety of NLP tasks. However, these models usually consist of …
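
MiniLM transfers the last-layer self-attention distributions of the teacher to the student; the sketch below shows only that attention-distribution term (the paper's full objective also matches value relations), assuming attention maps of shape [batch, heads, seq, seq] that are already softmax-normalized.

    import torch
    import torch.nn.functional as F

    def attention_transfer_loss(student_attn, teacher_attn, eps=1e-12):
        # student_attn, teacher_attn: [batch, heads, seq_len, seq_len]
        # attention probabilities from the last self-attention layer.
        # KL(teacher || student), averaged over all attending positions.
        kl = F.kl_div(
            (student_attn + eps).log(),
            teacher_attn,
            reduction="none",
        ).sum(-1)          # sum over the attended-to dimension
        return kl.mean()   # average over batch, heads, and query positions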

MiniLMv2: Multi-head self-attention relation distillation for compressing pretrained transformers

W Wang, H Bao, S Huang, L Dong, F Wei - arXiv preprint arXiv …, 2020 - arxiv.org
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-
attention relation distillation for task-agnostic compression of pretrained Transformers. In …
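
MiniLMv2 replaces attention-map transfer with self-attention relations: scaled dot-product similarities computed separately among queries, among keys, and among values, then matched between teacher and student via KL divergence. A hedged sketch of that idea follows; the choice of layer and number of relation heads is omitted.

    import math
    import torch
    import torch.nn.functional as F

    def self_attention_relation(x):
        # x: [batch, heads, seq_len, head_dim] (queries, keys, or values).
        # Pairwise scaled dot-product relation, normalized with softmax.
        d = x.size(-1)
        return F.softmax(x @ x.transpose(-1, -2) / math.sqrt(d), dim=-1)

    def relation_distillation_loss(student_qkv, teacher_qkv, eps=1e-12):
        # student_qkv / teacher_qkv: tuples of (Q, K, V) tensors.
        loss = 0.0
        for s, t in zip(student_qkv, teacher_qkv):
            rs, rt = self_attention_relation(s), self_attention_relation(t)
            loss = loss + F.kl_div((rs + eps).log(), rt, reduction="batchmean")
        return loss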

DynaBERT: Dynamic BERT with adaptive width and depth

L Hou, Z Huang, L Shang, X Jiang… - Advances in Neural …, 2020 - proceedings.neurips.cc
Pre-trained language models like BERT, though powerful in many natural language
processing tasks, are expensive in both computation and memory. To alleviate this problem …
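
DynaBERT trains a single model whose width (number of attention heads and FFN neurons) and depth can be adjusted at inference time, after first rewiring heads so the most important ones come first. The snippet below only illustrates a width mask for attention heads; it is not the paper's training procedure.

    import torch

    def head_mask(num_heads, width_mult):
        # Keep the first round(num_heads * width_mult) heads, assuming heads
        # have already been sorted by importance (DynaBERT's "rewiring" step).
        kept = max(1, int(round(num_heads * width_mult)))
        mask = torch.zeros(num_heads)
        mask[:kept] = 1.0
        return mask

    # e.g. a 0.5x-width sub-network of a 12-head model keeps 6 heads:
    # head_mask(12, 0.5) -> tensor([1., 1., 1., 1., 1., 1., 0., ..., 0.])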

DNNFusion: Accelerating deep neural networks execution with advanced operator fusion

W Niu, J Guan, Y Wang, G Agrawal, B Ren - Proceedings of the 42nd …, 2021 - dl.acm.org
Deep Neural Networks (DNNs) have emerged as the core enabler of many major
applications on mobile devices. To achieve high accuracy, DNN models have become …

SqueezeBERT: What can computer vision teach NLP about efficient neural networks?

FN Iandola, AE Shaw, R Krishna… - arXiv preprint arXiv …, 2020 - arxiv.org
Humans read and write hundreds of billions of messages every day. Further, due to the
availability of large datasets, large computing systems, and better neural network models …

On the effect of dropping layers of pre-trained transformer models

H Sajjad, F Dalvi, N Durrani, P Nakov - Computer Speech & Language, 2023 - Elsevier
Transformer-based NLP models are trained using hundreds of millions or even billions of
parameters, limiting their applicability in computationally constrained environments. While …
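
Sajjad et al. study several strategies for removing layers from a pretrained transformer before fine-tuning, of which top-layer dropping is the simplest. A minimal sketch using the Hugging Face transformers API is given below; the model name and number of kept layers are illustrative.

    import torch.nn as nn
    from transformers import AutoModel

    def keep_bottom_layers(model_name="bert-base-uncased", keep=6):
        # Load the pretrained encoder and discard its top layers, one of the
        # layer-dropping strategies examined in the paper.
        model = AutoModel.from_pretrained(model_name)
        model.encoder.layer = nn.ModuleList(model.encoder.layer[:keep])
        model.config.num_hidden_layers = keep
        return model  # fine-tune this smaller model as usual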

Pre-trained embeddings for entity resolution: an experimental analysis

A Zeakis, G Papadakis, D Skoutas… - Proceedings of the VLDB …, 2023 - dl.acm.org
Many recent works on Entity Resolution (ER) leverage Deep Learning techniques involving
language models to improve effectiveness. This is applied to both main steps of ER, i.e., …

A survey of knowledge distillation in deep learning

邵仁荣, 刘宇昂, 张伟, 王骏 - 计算机学报 (Chinese Journal of Computers), 2022 - 159.226.43.17
With the rapid development of artificial intelligence today, deep neural networks are widely applied across research fields and have achieved great success,
but they also face many challenges. First, to solve complex problems and improve model training performance …