Lessons on parameter sharing across layers in transformers

S Takase, S Kiyono - arXiv preprint arXiv:2104.06022, 2021 - arxiv.org
We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The
proposed approach relaxes a widely used technique that shares the parameters of one layer …
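
As a rough illustration of the idea, here is a minimal PyTorch sketch of cyclic cross-layer sharing, where a few distinct layers are reused across many layer positions; the class name and sharing strategy are illustrative assumptions, not the authors' code.

import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=6, num_unique=3):
        super().__init__()
        # Instantiate only num_unique distinct Transformer layers ...
        self.unique_layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_unique)
        )
        # ... and reuse them cyclically across num_layers positions.
        # Setting num_unique=1 recovers all-layer sharing (ALBERT-style).
        self.assignment = [i % num_unique for i in range(num_layers)]

    def forward(self, x):  # x: (batch, seq, d_model)
        for i in self.assignment:
            x = self.unique_layers[i](x)
        return x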

Rethinking perturbations in encoder-decoders for fast training

S Takase, S Kiyono - arXiv preprint arXiv:2104.01853, 2021 - arxiv.org
We often use perturbations to regularize neural models. For neural encoder-decoders,
previous studies applied scheduled sampling (Bengio et al., 2015) and adversarial …
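
For context, scheduled sampling mixes gold tokens with the model's own predictions during training. A minimal PyTorch sketch, assuming a two-pass decoder that first produces logits over the sequence (the interface is hypothetical):

import torch

def mix_decoder_inputs(gold, logits, p):
    # gold: (batch, len) gold input tokens; logits: (batch, len, vocab)
    # from a first decoding pass. With probability p per position, feed
    # the model's own prediction instead of the gold token.
    preds = logits.argmax(dim=-1)
    use_pred = torch.rand(gold.shape, device=gold.device) < p
    return torch.where(use_pred, preds, gold)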

CipherDAug: Ciphertext based data augmentation for neural machine translation

N Kambhatla, L Born, A Sarkar - arXiv preprint arXiv:2204.00665, 2022 - arxiv.org
We propose a novel data-augmentation technique for neural machine translation based on
ROT-$k$ ciphertexts. ROT-$k$ is a simple letter substitution cipher that replaces a letter in …
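
Since ROT-$k$ itself is easy to state precisely, a minimal Python implementation of the cipher (independent of the paper's code) looks like this: each letter is replaced by the letter k positions later in the alphabet, wrapping around.

def rot_k(text: str, k: int) -> str:
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)  # leave digits, punctuation, spaces untouched
    return "".join(out)

assert rot_k("hello", 13) == "uryyb"  # ROT-13: applying it twice round-trips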

Embedding compression with hashing for efficient representation learning in large-scale graph

CCM Yeh, M Gu, Y Zheng, H Chen, J Ebrahimi… - Proceedings of the 28th …, 2022 - dl.acm.org
Graph neural networks (GNNs) are deep learning models designed specifically for graph
data, and they typically rely on node features as the input to the first layer. When applying …
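
The general hashing trick for compressing a large embedding table can be sketched in PyTorch as follows; the bucket count, number of hash functions, and multiplicative hashing scheme are generic placeholders, not the paper's exact method.

import torch
import torch.nn as nn

class HashEmbedding(nn.Module):
    def __init__(self, num_buckets=10000, dim=64, num_hashes=2):
        super().__init__()
        # One small shared table replaces a huge per-ID embedding matrix.
        self.table = nn.Embedding(num_buckets, dim)
        self.num_buckets = num_buckets
        # Random odd multipliers act as cheap hash functions.
        self.register_buffer("mults", torch.randint(1, 2**31 - 1, (num_hashes,)) | 1)

    def forward(self, ids):  # ids: LongTensor of arbitrary node IDs
        hashed = (ids.unsqueeze(-1) * self.mults) % self.num_buckets
        # Look up one row per hash function and sum them.
        return self.table(hashed).sum(dim=-2)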

Efficient estimation of influence of a training instance

S Kobayashi, S Yokoi, J Suzuki, K Inui - arXiv preprint arXiv:2012.04207, 2020 - arxiv.org
Understanding the influence of a training instance on a neural network model helps
improve interpretability. However, it is difficult and inefficient to evaluate the influence …

Large Vocabulary Size Improves Large Language Models

S Takase, R Ri, S Kiyono, T Kato - arXiv preprint arXiv:2406.16508, 2024 - arxiv.org
This paper empirically investigates the relationship between subword vocabulary size and
the performance of large language models (LLMs) to provide insights on how to define the …
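
A simple way to run such a comparison is to train subword tokenizers at several vocabulary sizes and hold everything else fixed; a sketch with the HuggingFace tokenizers library, where the corpus path and the size grid are placeholder assumptions:

from tokenizers import Tokenizer, models, trainers, pre_tokenizers

for vocab_size in (8_000, 32_000, 128_000):
    tok = Tokenizer(models.BPE(unk_token="[UNK]"))
    tok.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(vocab_size=vocab_size, special_tokens=["[UNK]"])
    tok.train(files=["corpus.txt"], trainer=trainer)  # corpus.txt is hypothetical
    tok.save(f"bpe_{vocab_size}.json")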

TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendation

H Qu, W Fan, Z Zhao, Q Li - arXiv preprint arXiv:2406.10450, 2024 - arxiv.org
There is growing interest in utilizing large language models (LLMs) to advance next-
generation recommender systems (RecSys), driven by their outstanding language …

Extract, Select and Rewrite: A Modular Sentence Summarization Method

S Guan, V Padmakumar - Proceedings of the 4th New Frontiers in …, 2023 - aclanthology.org
A modular approach has the advantage of being compositional and controllable compared
to most end-to-end models. In this paper we propose Extract-Select-Rewrite (ESR), a three …
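
A toy, runnable skeleton of a three-stage extract-select-rewrite pipeline is sketched below; the stage implementations are trivial placeholders standing in for the paper's trained components.

def extract(sentence: str) -> list[str]:
    # Placeholder "extract" stage: split into clause-like chunks on commas.
    return [c.strip() for c in sentence.split(",") if c.strip()]

def select(chunks: list[str], k: int = 1) -> list[str]:
    # Placeholder "select" stage: keep the k longest chunks as most contentful.
    return sorted(chunks, key=len, reverse=True)[:k]

def rewrite(chunks: list[str]) -> str:
    # Placeholder "rewrite" stage: verbalize the selected content.
    return " ".join(chunks).rstrip(".") + "."

def summarize(sentence: str) -> str:
    return rewrite(select(extract(sentence)))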

Automatic text summarization using transformer-based language models

R Rao, S Sharma, N Malik - International Journal of System Assurance …, 2024 - Springer
Automatic text summarization is a thriving field in natural language processing (NLP). The
amount of data in circulation has multiplied with the switch to digital media. These massive datasets hold a …
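
A minimal example of transformer-based abstractive summarization with the HuggingFace transformers pipeline; the model choice here is illustrative, not the one the paper evaluates.

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
result = summarizer("Long article text goes here ...", max_length=60, min_length=10)
print(result[0]["summary_text"])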

Faculty of Mechanical Engineering and Informatics Extended Sentence Parsing Method for Text-to-Semantic Application

WT Sewunetie - 2024 - geik.uni-miskolc.hu
The field of Natural Language Processing (NLP) has advanced significantly in recent years,
resulting in the creation of sophisticated systems that demonstrate a deep comprehension of human …