Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …

Flowformer: Linearizing transformers with conservation flows

H Wu, J Wu, J Xu, J Wang, M Long - arXiv preprint arXiv:2202.06258, 2022 - arxiv.org
Transformers based on the attention mechanism have achieved impressive success in
various areas. However, the attention mechanism has a quadratic complexity, significantly …
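
For context, the quadratic cost this snippet refers to comes from standard softmax attention materializing an n-by-n score matrix. Below is a minimal NumPy sketch of vanilla scaled dot-product attention (illustrative only; it is not Flowformer's conservation-flow mechanism, and the function name is made up here):

    import numpy as np

    def softmax_attention(Q, K, V):
        # Q, K, V: (n, d) arrays. The score matrix S is (n, n), so time and
        # memory grow quadratically with the sequence length n.
        d = Q.shape[-1]
        S = Q @ K.T / np.sqrt(d)                 # (n, n) pairwise scores
        S -= S.max(axis=-1, keepdims=True)       # numerical stability
        W = np.exp(S)
        W /= W.sum(axis=-1, keepdims=True)       # row-wise softmax
        return W @ V                             # (n, d) outputs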

Primal-attention: Self-attention through asymmetric kernel svd in primal representation

Y Chen, Q Tao, F Tonin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, a new line of works has emerged to understand and improve self-attention in
Transformers by treating it as a kernel machine. However, existing works apply the methods …
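
As background for the kernel-machine reading mentioned in this snippet (the generic view from that line of work, not the paper's primal asymmetric-SVD formulation), the softmax attention output for query q_i can be written as a kernel smoother:

    \mathrm{Attn}(q_i) \;=\; \sum_{j=1}^{n} \frac{\kappa(q_i, k_j)}{\sum_{j'=1}^{n} \kappa(q_i, k_{j'})}\, v_j,
    \qquad \kappa(q, k) \;=\; \exp\!\left(\frac{q^{\top} k}{\sqrt{d}}\right).

Although exp(q^T k) is symmetric in its two arguments, the induced n-by-n matrix A_ij = κ(q_i, k_j) is generally asymmetric because queries and keys are different learned projections of the same tokens, which is why asymmetric factorizations such as an SVD, rather than symmetric eigendecompositions, are the natural tool in this view.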

Deep unlearning via randomized conditionally independent hessians

R Mehta, S Pal, V Singh… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Recent legislation has led to interest in machine unlearning, i.e., removing specific training
samples from a predictive model as if they never existed in the training dataset. Unlearning …
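
To make the task concrete, a common baseline (generic influence-function/Newton-step unlearning, not the randomized conditionally independent Hessian method this paper proposes) removes a sample by taking one Newton step of the retained loss from the trained weights. A sketch for ridge regression, where the step is exact; the function name and lam value are illustrative:

    import numpy as np

    def newton_unlearn(w, X_keep, x_del, y_del, lam=1e-2):
        # w was trained on the full data; remove (x_del, y_del) with one Newton
        # step of the retained ridge objective. Because the full-data gradient
        # at w is zero, the retained gradient equals minus the deleted sample's
        # gradient, so the labels of the kept points are not needed here.
        d = X_keep.shape[1]
        H = X_keep.T @ X_keep + lam * np.eye(d)   # Hessian of retained loss
        g = x_del * (x_del @ w - y_del)           # deleted sample's gradient at w
        return w + np.linalg.solve(H, g)          # w + H^{-1} g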

Sampling is all you need on modeling long-term user behaviors for CTR prediction

Y Cao, X Zhou, J Feng, P Huang, Y Xiao… - Proceedings of the 31st …, 2022 - dl.acm.org
Rich user behavior data has been proven to be of great value for Click-Through Rate (CTR)
prediction applications, especially in industrial recommender, search, or advertising …

Computation and parameter efficient multi-modal fusion transformer for cued speech recognition

L Liu, L Liu, H Li - IEEE/ACM Transactions on Audio, Speech …, 2024 - ieeexplore.ieee.org
Cued Speech (CS) is a pure visual coding method used by hearing-impaired people that
combines lip reading with several specific hand shapes to make the spoken language …

Multi resolution analysis (MRA) for approximate self-attention

Z Zeng, S Pal, J Kline, GM Fung… - … on Machine Learning, 2022 - proceedings.mlr.press
Transformers have emerged as a preferred model for many tasks in natural language
processing and vision. Recent efforts on training and deploying Transformers more …

When linear attention meets autoregressive decoding: Towards more effective and efficient linearized large language models

H You, Y Fu, Z Wang, A Yazdanbakhsh… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive Large Language Models (LLMs) have achieved impressive performance in
language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention …
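
As a hedged sketch of the generic linear-attention idea named in the title (kernel feature maps in place of softmax, in the spirit of earlier linear-attention work; the feature map and names below are assumptions, not this paper's construction), the n-by-n score matrix is avoided by re-associating the matrix product:

    import numpy as np

    def linear_attention(Q, K, V):
        # Generic kernelized linear attention with a simple positive feature
        # map phi(x) = relu(x) + 1 (an assumption for illustration).
        # Computing phi(K).T @ V first costs O(n * d^2) rather than the
        # O(n^2 * d) of materializing softmax scores.
        phi = lambda x: np.maximum(x, 0.0) + 1.0
        Qf, Kf = phi(Q), phi(K)                  # (n, d) mapped queries/keys
        KV = Kf.T @ V                            # (d, d_v) summary, no n x n term
        Z = Qf @ Kf.sum(axis=0)                  # (n,) per-query normalizer
        return (Qf @ KV) / Z[:, None]            # (n, d_v) outputs

In autoregressive decoding, the KV and normalizer summaries can be updated token by token, which is what makes the linear form attractive for LLM generation.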

LookupFFN: making transformers compute-lite for CPU inference

Z Zeng, M Davies, P Pulijala… - International …, 2023 - proceedings.mlr.press
While GPU clusters are the de facto choice for training large deep neural network (DNN)
models today, several reasons including ease of workflow, security and cost have led to …

VCC: scaling transformers to 128K tokens or more by prioritizing important tokens

Z Zeng, C Hawkins, M Hong, A Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Transformers are central in modern natural language processing and computer vision
applications. Despite recent works devoted to reducing the quadratic cost of such models …