PatentGPT: A Large Language Model for Intellectual Property

Z Bai, R Zhang, L Chen, Q Cai, Y Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, large language models (LLMs) have attracted significant attention due to
their exceptional performance across a multitude of natural language processing tasks, and …

Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition

A Basharin, A Chertkov, I Oseledets - arXiv preprint arXiv:2410.17765, 2024 - arxiv.org
We propose a new model for multi-token prediction in transformers, aiming to enhance
sampling efficiency without compromising accuracy. Motivated by recent work that predicts …

Monet: Mixture of Monosemantic Experts for Transformers

J Park, YJ Ahn, KE Kim, J Kang - arXiv preprint arXiv:2412.04139, 2024 - arxiv.org
Understanding the internal computations of large language models (LLMs) is crucial for
aligning them with human values and preventing undesirable behaviors like toxic content …

Architecture Design: From Neural Networks to Foundation Models

G Chrysos - 2024 IEEE 11th International Conference on Data …, 2024 - ieeexplore.ieee.org
Historically, we have been taught to use task-dependent architecture designs and objectives to
tackle data science tasks. Counterintuitively, this dogma has been proven (partly) wrong by …