We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
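The mechanism behind this entry is replacing the explicit n×n softmax attention matrix with a kernel feature map, so attention can be computed associatively in linear time. Below is a minimal NumPy sketch of that random-feature idea (FAVOR+ in the paper, simplified here); the function names and the feature count m are illustrative assumptions, not the authors' API.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact softmax attention: O(n^2) time and memory in sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def performer_attention(Q, K, V, m=256, seed=0):
    """Linear-time approximation via positive random features.

    With w ~ N(0, I), phi(x) = exp(w.x - |x|^2 / 2) / sqrt(m) satisfies
    E[phi(q).phi(k)] = exp(q.k), i.e. an unbiased softmax-kernel estimate.
    """
    d = Q.shape[-1]
    Q, K = Q / d**0.25, K / d**0.25          # fold in the 1/sqrt(d) scaling
    W = np.random.default_rng(seed).standard_normal((d, m))
    phi_q = np.exp(Q @ W - (Q**2).sum(-1, keepdims=True) / 2) / np.sqrt(m)
    phi_k = np.exp(K @ W - (K**2).sum(-1, keepdims=True) / 2) / np.sqrt(m)
    # Associativity: (phi_q @ phi_k.T) @ V == phi_q @ (phi_k.T @ V), but the
    # right-hand grouping costs O(n*m*d) instead of O(n^2*d).
    num = phi_q @ (phi_k.T @ V)
    den = phi_q @ phi_k.sum(axis=0, keepdims=True).T
    return num / den
```

As m grows relative to d, the output approaches exact softmax attention in expectation, while the cost drops from quadratic to linear in sequence length.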
Objective. Low-dose computed tomography (LDCT) denoising is an important problem in CT research. Compared with normal-dose CT, LDCT images suffer from severe noise …
In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …
The remarkable success of transformers in the field of natural language processing has sparked the interest of the speech-processing community, leading to an exploration of their …
Self-supervised speech models are powerful speech representation extractors for downstream applications. Recently, larger models have been utilized in acoustic model …
X Yang, B Yan, H Li, Y Chen - … of the 39th International Conference on …, 2020 - dl.acm.org
Transformer has emerged as a popular deep neural network (DNN) model for Natural Language Processing (NLP) applications and demonstrated excellent performance in …
S Takase, S Kiyono - arXiv preprint arXiv:2104.06022, 2021 - arxiv.org
We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique that shares the parameters of one layer …
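The relaxation being described sits between two extremes: one parameter set shared by every layer (Universal Transformer / ALBERT style) and one set per layer (vanilla Transformer). A hedged PyTorch sketch of the in-between regime follows; the cyclic assignment rule, layer counts, and class name are illustrative choices, since the paper evaluates several assignment strategies.

```python
import torch.nn as nn

class CycledSharedEncoder(nn.Module):
    """N-layer encoder built from M < N distinct parameter sets.

    n_param_sets == 1 recovers full all-layer sharing; n_param_sets ==
    n_layers recovers an unshared Transformer. Layer i reuses parameter
    set i % M, one possible relaxation in the spirit of Takase & Kiyono
    (2021); their exact assignment rules differ.
    """
    def __init__(self, n_layers=6, n_param_sets=3, d_model=512, n_heads=8):
        super().__init__()
        self.shared = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_param_sets)
        )
        self.n_layers = n_layers

    def forward(self, x):
        for i in range(self.n_layers):
            x = self.shared[i % len(self.shared)](x)  # reuse sets cyclically
        return x
```

The appeal is a depth-N computation graph with roughly M/N of the parameter memory, trading a small accuracy gap for the reduction.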
Transformer models have achieved state-of-the-art results across a diverse range of domains. However, concern over the cost of training the attention mechanism to learn …
M Behnke, K Heafield - The 2020 Conference on Empirical …, 2020 - research.ed.ac.uk
The attention mechanism is the crucial component of the transformer architecture. Recent research shows that most attention heads are not confident in their decisions and can be …
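To make "not confident" concrete: a head whose attention rows are close to uniform contributes little signal and is a natural pruning candidate. The sketch below scores heads by their average peak attention weight and keeps only the most confident fraction; this proxy criterion and the keep_ratio parameter are assumptions for illustration, not the pruning procedure from the paper.

```python
import numpy as np

def head_confidence(attn):
    """Mean of the largest attention weight per query row, per head.

    attn: array of shape (heads, queries, keys), rows already softmaxed.
    Near-uniform rows give a low score, matching the snippet's notion of
    an 'unconfident' head.
    """
    return attn.max(axis=-1).mean(axis=-1)       # shape: (heads,)

def prune_mask(attn, keep_ratio=0.5):
    """Boolean mask keeping the most confident fraction of heads."""
    conf = head_confidence(attn)
    k = max(1, int(len(conf) * keep_ratio))
    mask = np.zeros(len(conf), dtype=bool)
    mask[np.argsort(conf)[-k:]] = True
    return mask
```

Masked heads can then be zeroed or removed outright, shrinking the model with little quality loss when the score correlates with actual head utility.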