A comprehensive survey of neural architecture search: Challenges and solutions

P Ren, Y Xiao, X Chang, PY Huang, Z Li… - ACM Computing Surveys, 2021 - dl.acm.org
Deep learning has made substantial breakthroughs in many fields due to its powerful
automatic representation capabilities. It has been proven that neural architecture design is …

A survey of the usages of deep learning for natural language processing

DW Otter, JR Medina, JK Kalita - IEEE Transactions on Neural Networks and Learning Systems, 2020 - ieeexplore.ieee.org
Over the last several years, the field of natural language processing has been propelled
forward by an explosion in the use of deep learning models. This article provides a brief …

Is ChatGPT a good recommender? A preliminary study

J Liu, C Liu, P Zhou, R Lv, K Zhou, Y Zhang - arXiv preprint, 2023 - arxiv.org
Recommendation systems have witnessed significant advancements and have been widely
used over the past decades. However, most traditional recommendation methods are task …

Emerging properties in self-supervised vision transformers

M Caron, H Touvron, I Misra, H Jégou… - Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021 - openaccess.thecvf.com
In this paper, we question if self-supervised learning provides new properties to Vision
Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the …

ResMLP: Feedforward networks for image classification with data-efficient training

H Touvron, P Bojanowski, M Caron… - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022 - ieeexplore.ieee.org
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image
classification. It is a simple residual network that alternates (i) a linear layer in which image …
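The alternation the snippet describes, a linear layer mixing information across patches followed by a per-patch feedforward block, each wrapped in a residual connection, is compact enough to sketch. Below is a minimal PyTorch rendering under those assumptions; the Affine module and all layer sizes are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Per-channel scale and shift (ResMLP uses this in place of LayerNorm)."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    """One residual block: (i) linear mixing across patches, (ii) per-patch MLP."""
    def __init__(self, num_patches, dim, hidden_dim):
        super().__init__()
        self.norm1 = Affine(dim)
        self.patch_mix = nn.Linear(num_patches, num_patches)  # acts on the patch axis
        self.norm2 = Affine(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, x):                       # x: (batch, num_patches, dim)
        x = x + self.patch_mix(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x

x = torch.randn(2, 196, 384)                    # e.g. 14x14 patches, width 384
print(ResMLPBlock(196, 384, 4 * 384)(x).shape)  # torch.Size([2, 196, 384])
```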

Is space-time attention all you need for video understanding?

G Bertasius, H Wang, L Torresani - ICML, 2021 - proceedings.mlr.press
Training. We train our model for 15 epochs with an initial learning rate of 0.005, which is
divided by 10 at epochs 11 and 14. During training, we first resize the shorter side of the …
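The quoted schedule (initial learning rate 0.005, divided by 10 at epochs 11 and 14, over 15 epochs) maps directly onto a step decay. A minimal PyTorch sketch, assuming SGD and placeholder parameters since the snippet specifies only the schedule:

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# A toy parameter stands in for the video model (an assumption; the snippet
# specifies only the schedule itself).
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9)
# Milestones count scheduler steps; exact epoch-indexing conventions vary.
scheduler = MultiStepLR(optimizer, milestones=[11, 14], gamma=0.1)

for epoch in range(15):
    # ... one training epoch over the video clips ...
    scheduler.step()
    print(epoch + 1, optimizer.param_groups[0]["lr"])
```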

Branchformer: Parallel MLP-attention architectures to capture local and global context for speech recognition and understanding

Y Peng, S Dalmia, I Lane… - International Conference on Machine Learning, 2022 - proceedings.mlr.press
Conformer has proven to be effective in many speech processing tasks. It combines the
benefits of extracting local dependencies using convolutions and global dependencies …
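The layout named in the title, a self-attention branch for global context running in parallel with an MLP branch for local context, can be sketched as follows. This is a simplified rendering, not the paper's cgMLP branch or merge module, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ParallelBranchBlock(nn.Module):
    """Two parallel branches in the spirit of Branchformer: self-attention for
    global context, a convolution-augmented MLP for local context, merged by
    concatenation and a linear projection."""
    def __init__(self, dim, num_heads=4, kernel_size=31):
        super().__init__()
        self.norm_attn = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.local = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
        self.dwconv = nn.Conv1d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, x):                            # x: (batch, time, dim)
        h = self.norm_attn(x)
        g = self.attn(h, h, h, need_weights=False)[0]  # global branch
        l = self.local(self.norm_mlp(x))               # local branch
        l = self.dwconv(l.transpose(1, 2)).transpose(1, 2)
        return x + self.merge(torch.cat([g, l], dim=-1))

x = torch.randn(2, 100, 256)                         # 100 frames, width 256
print(ParallelBranchBlock(256)(x).shape)             # torch.Size([2, 100, 256])
```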

Rethinking attention with performers

K Choromanski, V Likhosherstov, D Dohan… - arXiv preprint, 2020 - arxiv.org
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
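The claim in the snippet, approximating softmax attention at linear rather than quadratic cost, rests on writing the softmax kernel as an expectation over positive random features, so keys and values can be aggregated once and reused for every query. A stripped-down sketch of that idea follows; the real FAVOR+ mechanism additionally uses orthogonal random projections and periodic redraws.

```python
import torch

def positive_random_features(x, w):
    """Positive random features for the softmax kernel:
    phi(x) = exp(x @ w.T - ||x||^2 / 2) / sqrt(m)."""
    m = w.shape[0]
    return torch.exp(x @ w.T - (x ** 2).sum(-1, keepdim=True) / 2) / m ** 0.5

def linear_attention(q, k, v, num_features=256):
    """Approximate softmax attention in O(seq * m * d) rather than O(seq^2 * d)."""
    d = q.shape[-1]
    w = torch.randn(num_features, d)        # shared random projection (redrawn per call)
    q, k = q / d ** 0.25, k / d ** 0.25     # fold in the usual 1/sqrt(d) scaling
    q_f, k_f = positive_random_features(q, w), positive_random_features(k, w)
    kv = k_f.transpose(-2, -1) @ v          # (m, d_v): cost linear in sequence length
    normalizer = q_f @ k_f.sum(-2, keepdim=True).transpose(-2, -1)
    return (q_f @ kv) / normalizer

q = k = v = torch.randn(1, 512, 64)
print(linear_attention(q, k, v).shape)      # torch.Size([1, 512, 64])
```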

Adapted large language models can outperform medical experts in clinical text summarization

D Van Veen, C Van Uden, L Blankemeier… - Nature Medicine, 2024 - nature.com
Analyzing vast textual data and summarizing key information from electronic health records
imposes a substantial burden on how clinicians allocate their time. Although large language …

Action transformer: A self-attention model for short-time pose-based human action recognition

V Mazzia, S Angarano, F Salvetti, F Angelini… - Pattern Recognition, 2022 - Elsevier
Deep neural networks based purely on attention have been successful across several
domains, relying on minimal architectural priors from the designer. In Human Action …