MoEUT: Mixture-of-Experts Universal Transformers

R Csordás, K Irie, J Schmidhuber, C Potts… - arXiv preprint arXiv …, 2024 - arxiv.org
Previous work on Universal Transformers (UTs) has demonstrated the importance of
parameter sharing across layers. By allowing recurrence in depth, UTs have advantages …
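
As a rough illustration of the parameter-sharing idea this abstract refers to, the sketch below reuses a single transformer block at every depth step; the class name, dimensions, and block internals are placeholders, not the authors' code.

```python
import torch
import torch.nn as nn

class SharedDepthTransformer(nn.Module):
    """Universal-Transformer-style recurrence: one block reused across depth."""

    def __init__(self, d_model=256, n_heads=4, n_steps=6):
        super().__init__()
        # A single encoder layer whose weights are shared by every depth step.
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.n_steps = n_steps

    def forward(self, x):
        for _ in range(self.n_steps):  # recurrence in depth with shared parameters
            x = self.block(x)
        return x

x = torch.randn(2, 10, 256)               # (batch, sequence, d_model)
print(SharedDepthTransformer()(x).shape)  # torch.Size([2, 10, 256])
```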

Determining 3D structure from molecular formula and isotopologue rotational spectra in natural abundance with reflection-equivariant diffusion

AH Cheng, A Lo, S Miret, BH Pate… - The Journal of Chemical …, 2024 - pubs.aip.org
Structure determination is necessary to identify unknown organic molecules, such as those
in natural products, forensic samples, the interstellar medium, and laboratory syntheses …

Seeing the world from its words: All-embracing Transformers for fingerprint-based indoor localization

SM Nguyen, DV Le, PJM Havinga - Pervasive and Mobile Computing, 2024 - Elsevier
In this paper, we present all-embracing Transformers (AaTs) that are capable of deftly
manipulating the attention mechanism for Received Signal Strength (RSS) fingerprints in order …

52B to 1T: Lessons Learned via Tele-FLM Series

X Li, Y Yao, X Jiang, X Fang, C Wang, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) represent a significant stride toward Artificial General
Intelligence. As scaling laws underscore the potential of increasing model sizes, the …

Thangka Image–Text Matching Based on Adaptive Pooling Layer and Improved Transformer

K Wang, T Wang, X Guo, K Xu, J Wu - Applied Sciences, 2024 - mdpi.com
Image–text matching is a research hotspot in the multimodal task of integrating image and
text processing. In order to solve the difficult problem of associating image and text data in …

The Transformer Blueprint: A Holistic Guide to the Transformer Neural Network Architecture

J Nyandwi - Deep Learning Revision, 2023 - deeprevision.github.io

Recognition of professions in medical documentation

A Madrid García - 2023 - e-spacio.uned.es
Named Entity Recognition (NER) in Electronic Health Records (EHRs) is the area of
Natural Language Processing (NLP) that seeks to identify and extract unstructured …

MABViT: Modified Attention Block Enhances Vision Transformers

M Ramesh, A Ramkumar - arXiv preprint arXiv:2312.01324, 2023 - arxiv.org
Recent studies have demonstrated the effectiveness of Gated Linear Units (GLU) in
enhancing transformer models, particularly in Large Language Models (LLMs). Additionally …
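
For reference, a GLU-style feed-forward block of the general kind the abstract mentions might look like the sketch below; the exact variant used in MABViT is not specified here, and the SiLU gate and dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLUFeedForward(nn.Module):
    """Gated Linear Unit feed-forward: a gate branch modulates a value branch."""

    def __init__(self, d_model=256, d_hidden=1024):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden)   # gating branch
        self.value = nn.Linear(d_model, d_hidden)  # value branch
        self.out = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        # Elementwise product of the activated gate with the linear value path.
        return self.out(F.silu(self.gate(x)) * self.value(x))

print(GLUFeedForward()(torch.randn(2, 10, 256)).shape)  # torch.Size([2, 10, 256])
```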

Energy Saving Based on Transformer Models with LeakyReLU Activation Function

J Wang, X Li, J Wang - 2023 13th International Conference on …, 2023 - ieeexplore.ieee.org
In this paper, energy saving in transformer models that use LeakyReLU-based attention mechanisms
is discussed. The softmax functions in the attention mechanisms of transformers are replaced by …
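
A minimal sketch of the substitution the abstract describes, with LeakyReLU applied to the attention scores in place of softmax; the score scaling and the absence of any renormalization are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def leakyrelu_attention(q, k, v, negative_slope=0.01):
    # Scaled dot-product scores, as in standard attention.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    # LeakyReLU in place of softmax: cheaper, but weights no longer sum to 1.
    weights = F.leaky_relu(scores, negative_slope)
    return weights @ v

q = k = v = torch.randn(2, 10, 64)
print(leakyrelu_attention(q, k, v).shape)  # torch.Size([2, 10, 64])
```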

Classification of Power Quality Disturbances in Microgrids Using a Multi-Level Global Convolutional Neural Network and Transformer Approach

J Jiang, H Wu, CH Zhong, CW Zheng… - Available at SSRN … - papers.ssrn.com
As the adoption of new energy sources like photovoltaic and wind power increases
alongside the influx of advanced power electronic devices, there has been a significant rise …