V Zouhar, C Meister, JL Gastaldi, L Du… - The 61st Annual …, 2023 - virtual2023.aclweb.org
Subword tokenization is a key part of most NLP pipelines. However, little is known about
why some tokenizer and hyperparameter combinations lead to improved downstream model …