Conversational agents in therapeutic interventions for neurodevelopmental disorders: a survey

F Catania, M Spitale, F Garzotto - ACM Computing Surveys, 2023 - dl.acm.org
Neurodevelopmental Disorders (NDD) are a group of conditions with onset in the
developmental period characterized by deficits in the cognitive and social areas …

Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
With the bomb ignited by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …

Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting

T Zhou, Z Ma, Q Wen, X Wang… - … on machine learning, 2022 - proceedings.mlr.press
Long-term time series forecasting is challenging since prediction accuracy tends to
decrease dramatically with the increasing horizon. Although Transformer-based methods …

Separable self-attention for mobile vision transformers

S Mehta, M Rastegari - arXiv preprint arXiv:2206.02680, 2022 - arxiv.org
Mobile vision transformers (MobileViT) can achieve state-of-the-art performance across
several mobile vision tasks, including classification and detection. Though these models …

Fnet: Mixing tokens with fourier transforms

J Lee-Thorp, J Ainslie, I Eckstein, S Ontanon - arXiv preprint arXiv …, 2021 - arxiv.org
We show that Transformer encoder architectures can be sped up, with limited accuracy
costs, by replacing the self-attention sublayers with simple linear transformations that "mix" …

Diagonal state spaces are as effective as structured state spaces

A Gupta, A Gu, J Berant - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Modeling long range dependencies in sequential data is a fundamental step towards
attaining human-level performance in many modalities such as text, vision, audio and video …

Informer: Beyond efficient transformer for long sequence time-series forecasting

H Zhou, S Zhang, J Peng, S Zhang, J Li… - Proceedings of the …, 2021 - ojs.aaai.org
Many real-world applications require the prediction of long sequence time-series, such as
electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a …

Deformable detr: Deformable transformers for end-to-end object detection

X Zhu, W Su, L Lu, B Li, X Wang, J Dai - arXiv preprint arXiv:2010.04159, 2020 - arxiv.org
DETR has been recently proposed to eliminate the need for many hand-designed
components in object detection while demonstrating good performance. However, it suffers …

Multi-scale vision longformer: A new vision transformer for high-resolution image encoding

P Zhang, X Dai, J Yang, B Xiao… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents a new Vision Transformer (ViT) architecture Multi-Scale Vision
Longformer, which significantly enhances the ViT of [??] for encoding high-resolution …

Big bird: Transformers for longer sequences

M Zaheer, G Guruganesh, KA Dubey… - Advances in neural …, 2020 - proceedings.neurips.cc
Transformers-based models, such as BERT, have been one of the most successful deep
learning models for NLP. Unfortunately, one of their core limitations is the quadratic …