A comprehensive survey of neural architecture search: Challenges and solutions

P Ren, Y Xiao, X Chang, PY Huang, Z Li… - ACM Computing Surveys, 2021 - dl.acm.org
Deep learning has made substantial breakthroughs in many fields due to its powerful
automatic representation capabilities. It has been proven that neural architecture design is …

A survey of the usages of deep learning for natural language processing

DW Otter, JR Medina, JK Kalita - IEEE Transactions on Neural Networks and Learning Systems, 2020 - ieeexplore.ieee.org
Over the last several years, the field of natural language processing has been propelled
forward by an explosion in the use of deep learning models. This article provides a brief …

Is ChatGPT a good recommender? A preliminary study

J Liu, C Liu, P Zhou, R Lv, K Zhou, Y Zhang - arXiv preprint, 2023 - arxiv.org
Recommendation systems have witnessed significant advancements and have been widely
used over the past decades. However, most traditional recommendation methods are task …

Emerging properties in self-supervised vision transformers

M Caron, H Touvron, I Misra, H Jégou… - Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021 - openaccess.thecvf.com
In this paper, we question if self-supervised learning provides new properties to Vision
Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the …

ResMLP: Feedforward networks for image classification with data-efficient training

H Touvron, P Bojanowski, M Caron… - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022 - ieeexplore.ieee.org
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image
classification. It is a simple residual network that alternates (i) a linear layer in which image …
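The alternation the snippet describes, a linear layer mixing information across patches followed by a per-patch feedforward block, each wrapped in a residual connection, is compact enough to sketch. Below is a minimal PyTorch rendering under those assumptions; the Affine module and all layer sizes are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Per-channel scale and shift (ResMLP uses this in place of LayerNorm)."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    """One residual block: (i) linear mixing across patches, (ii) per-patch MLP."""
    def __init__(self, num_patches, dim, hidden_dim):
        super().__init__()
        self.norm1 = Affine(dim)
        self.patch_mix = nn.Linear(num_patches, num_patches)  # acts on the patch axis
        self.norm2 = Affine(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, x):                       # x: (batch, num_patches, dim)
        x = x + self.patch_mix(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x

x = torch.randn(2, 196, 384)                    # e.g. 14x14 patches, width 384
print(ResMLPBlock(196, 384, 4 * 384)(x).shape)  # torch.Size([2, 196, 384])
```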

Is space-time attention all you need for video understanding?

G Bertasius, H Wang, L Torresani - ICML, 2021 - proceedings.mlr.press
Training. We train our model for 15 epochs with an initial learning rate of 0.005, which is
divided by 10 at epochs 11 and 14. During training, we first resize the shorter side of the …
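The quoted schedule (initial learning rate 0.005, divided by 10 at epochs 11 and 14, over 15 epochs) maps directly onto a step decay. A minimal PyTorch sketch, assuming SGD and placeholder parameters since the snippet specifies only the schedule:

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# A toy parameter stands in for the video model (an assumption; the snippet
# specifies only the schedule itself).
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9)
# Milestones count scheduler steps; exact epoch-indexing conventions vary.
scheduler = MultiStepLR(optimizer, milestones=[11, 14], gamma=0.1)

for epoch in range(15):
    # ... one training epoch over the video clips ...
    scheduler.step()
    print(epoch + 1, optimizer.param_groups[0]["lr"])
```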

Branchformer: Parallel MLP-attention architectures to capture local and global context for speech recognition and understanding

Y Peng, S Dalmia, I Lane… - International Conference on Machine Learning, 2022 - proceedings.mlr.press
Conformer has proven to be effective in many speech processing tasks. It combines the
benefits of extracting local dependencies using convolutions and global dependencies …
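The layout named in the title, a self-attention branch for global context running in parallel with an MLP branch for local context, can be sketched as follows. This is a simplified rendering, not the paper's cgMLP branch or merge module, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ParallelBranchBlock(nn.Module):
    """Two parallel branches in the spirit of Branchformer: self-attention for
    global context, a convolution-augmented MLP for local context, merged by
    concatenation and a linear projection."""
    def __init__(self, dim, num_heads=4, kernel_size=31):
        super().__init__()
        self.norm_attn = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.local = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
        self.dwconv = nn.Conv1d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, x):                            # x: (batch, time, dim)
        h = self.norm_attn(x)
        g = self.attn(h, h, h, need_weights=False)[0]  # global branch
        l = self.local(self.norm_mlp(x))               # local branch
        l = self.dwconv(l.transpose(1, 2)).transpose(1, 2)
        return x + self.merge(torch.cat([g, l], dim=-1))

x = torch.randn(2, 100, 256)                         # 100 frames, width 256
print(ParallelBranchBlock(256)(x).shape)             # torch.Size([2, 100, 256])
```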

Rethinking attention with performers

K Choromanski, V Likhosherstov, D Dohan… - arXiv preprint, 2020 - arxiv.org
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
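The claim in the snippet, approximating softmax attention at linear rather than quadratic cost, rests on writing the softmax kernel as an expectation over positive random features, so keys and values can be aggregated once and reused for every query. A stripped-down sketch of that idea follows; the real FAVOR+ mechanism additionally uses orthogonal random projections and periodic redraws.

```python
import torch

def positive_random_features(x, w):
    """Positive random features for the softmax kernel:
    phi(x) = exp(x @ w.T - ||x||^2 / 2) / sqrt(m)."""
    m = w.shape[0]
    return torch.exp(x @ w.T - (x ** 2).sum(-1, keepdim=True) / 2) / m ** 0.5

def linear_attention(q, k, v, num_features=256):
    """Approximate softmax attention in O(seq * m * d) rather than O(seq^2 * d)."""
    d = q.shape[-1]
    w = torch.randn(num_features, d)        # shared random projection (redrawn per call)
    q, k = q / d ** 0.25, k / d ** 0.25     # fold in the usual 1/sqrt(d) scaling
    q_f, k_f = positive_random_features(q, w), positive_random_features(k, w)
    kv = k_f.transpose(-2, -1) @ v          # (m, d_v): cost linear in sequence length
    normalizer = q_f @ k_f.sum(-2, keepdim=True).transpose(-2, -1)
    return (q_f @ kv) / normalizer

q = k = v = torch.randn(1, 512, 64)
print(linear_attention(q, k, v).shape)      # torch.Size([1, 512, 64])
```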

Adapted large language models can outperform medical experts in clinical text summarization

D Van Veen, C Van Uden, L Blankemeier… - Nature Medicine, 2024 - nature.com
Analyzing vast textual data and summarizing key information from electronic health records
imposes a substantial burden on how clinicians allocate their time. Although large language …

Action transformer: A self-attention model for short-time pose-based human action recognition

V Mazzia, S Angarano, F Salvetti, F Angelini… - Pattern Recognition, 2022 - Elsevier
Deep neural networks based purely on attention have been successful across several
domains, relying on minimal architectural priors from the designer. In Human Action …