Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …

Flowformer: Linearizing transformers with conservation flows

H Wu, J Wu, J Xu, J Wang, M Long - arXiv preprint arXiv:2202.06258, 2022 - arxiv.org
Transformers based on the attention mechanism have achieved impressive success in
various areas. However, the attention mechanism has a quadratic complexity, significantly …
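
For context, the quadratic cost this snippet refers to comes from standard softmax attention materializing an n-by-n score matrix. Below is a minimal NumPy sketch of vanilla scaled dot-product attention (illustrative only; it is not Flowformer's conservation-flow mechanism, and the function name is made up here):

    import numpy as np

    def softmax_attention(Q, K, V):
        # Q, K, V: (n, d) arrays. The score matrix S is (n, n), so time and
        # memory grow quadratically with the sequence length n.
        d = Q.shape[-1]
        S = Q @ K.T / np.sqrt(d)                 # (n, n) pairwise scores
        S -= S.max(axis=-1, keepdims=True)       # numerical stability
        W = np.exp(S)
        W /= W.sum(axis=-1, keepdims=True)       # row-wise softmax
        return W @ V                             # (n, d) outputs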

Primal-attention: Self-attention through asymmetric kernel svd in primal representation

Y Chen, Q Tao, F Tonin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, a new line of works has emerged to understand and improve self-attention in
Transformers by treating it as a kernel machine. However, existing works apply the methods …
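
As background for the kernel-machine reading mentioned in this snippet (the generic view from that line of work, not the paper's primal asymmetric-SVD formulation), the softmax attention output for query q_i can be written as a kernel smoother:

    \mathrm{Attn}(q_i) \;=\; \sum_{j=1}^{n} \frac{\kappa(q_i, k_j)}{\sum_{j'=1}^{n} \kappa(q_i, k_{j'})}\, v_j,
    \qquad \kappa(q, k) \;=\; \exp\!\left(\frac{q^{\top} k}{\sqrt{d}}\right).

Although exp(q^T k) is symmetric in its two arguments, the induced n-by-n matrix A_ij = κ(q_i, k_j) is generally asymmetric because queries and keys are different learned projections of the same tokens, which is why asymmetric factorizations such as an SVD, rather than symmetric eigendecompositions, are the natural tool in this view.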

Deep unlearning via randomized conditionally independent hessians

R Mehta, S Pal, V Singh… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Recent legislation has led to interest in machine unlearning, i.e., removing specific training
samples from a predictive model as if they never existed in the training dataset. Unlearning …
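
To make the task concrete, a common baseline (generic influence-function/Newton-step unlearning, not the randomized conditionally independent Hessian method this paper proposes) removes a sample by taking one Newton step of the retained loss from the trained weights. A sketch for ridge regression, where the step is exact; the function name and lam value are illustrative:

    import numpy as np

    def newton_unlearn(w, X_keep, x_del, y_del, lam=1e-2):
        # w was trained on the full data; remove (x_del, y_del) with one Newton
        # step of the retained ridge objective. Because the full-data gradient
        # at w is zero, the retained gradient equals minus the deleted sample's
        # gradient, so the labels of the kept points are not needed here.
        d = X_keep.shape[1]
        H = X_keep.T @ X_keep + lam * np.eye(d)   # Hessian of retained loss
        g = x_del * (x_del @ w - y_del)           # deleted sample's gradient at w
        return w + np.linalg.solve(H, g)          # w + H^{-1} g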

Sampling is all you need on modeling long-term user behaviors for CTR prediction

Y Cao, X Zhou, J Feng, P Huang, Y Xiao… - Proceedings of the 31st …, 2022 - dl.acm.org
Rich user behavior data has been proven to be of great value for Click-Through Rate (CTR)
prediction applications, especially in industrial recommender, search, or advertising …

Computation and parameter efficient multi-modal fusion transformer for cued speech recognition

L Liu, L Liu, H Li - IEEE/ACM Transactions on Audio, Speech …, 2024 - ieeexplore.ieee.org
Cued Speech (CS) is a pure visual coding method used by hearing-impaired people that
combines lip reading with several specific hand shapes to make the spoken language …

Multi resolution analysis (MRA) for approximate self-attention

Z Zeng, S Pal, J Kline, GM Fung… - … on Machine Learning, 2022 - proceedings.mlr.press
Transformers have emerged as a preferred model for many tasks in natural language
processing and vision. Recent efforts on training and deploying Transformers more …

When linear attention meets autoregressive decoding: Towards more effective and efficient linearized large language models

H You, Y Fu, Z Wang, A Yazdanbakhsh… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive Large Language Models (LLMs) have achieved impressive performance in
language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention …
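
As a hedged sketch of the generic linear-attention idea named in the title (kernel feature maps in place of softmax, in the spirit of earlier linear-attention work; the feature map and names below are assumptions, not this paper's construction), the n-by-n score matrix is avoided by re-associating the matrix product:

    import numpy as np

    def linear_attention(Q, K, V):
        # Generic kernelized linear attention with a simple positive feature
        # map phi(x) = relu(x) + 1 (an assumption for illustration).
        # Computing phi(K).T @ V first costs O(n * d^2) rather than the
        # O(n^2 * d) of materializing softmax scores.
        phi = lambda x: np.maximum(x, 0.0) + 1.0
        Qf, Kf = phi(Q), phi(K)                  # (n, d) mapped queries/keys
        KV = Kf.T @ V                            # (d, d_v) summary, no n x n term
        Z = Qf @ Kf.sum(axis=0)                  # (n,) per-query normalizer
        return (Qf @ KV) / Z[:, None]            # (n, d_v) outputs

In autoregressive decoding, the KV and normalizer summaries can be updated token by token, which is what makes the linear form attractive for LLM generation.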

LookupFFN: making transformers compute-lite for CPU inference

Z Zeng, M Davies, P Pulijala… - International …, 2023 - proceedings.mlr.press
While GPU clusters are the de facto choice for training large deep neural network (DNN)
models today, several reasons including ease of workflow, security and cost have led to …

VCC: scaling transformers to 128K tokens or more by prioritizing important tokens

Z Zeng, C Hawkins, M Hong, A Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Transformers are central in modern natural language processing and computer vision
applications. Despite recent works devoted to reducing the quadratic cost of such models …