Efficient deep learning: A survey on making deep learning models smaller, faster, and better

G Menghani - ACM Computing Surveys, 2023 - dl.acm.org
Deep learning has revolutionized the fields of computer vision, natural language
understanding, speech recognition, information retrieval, and more. However, with the …

Cramming: Training a language model on a single GPU in one day

J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

Universal-KD: Attention-based output-grounded intermediate layer knowledge distillation

Y Wu, M Rezagholizadeh, A Ghaddar… - Proceedings of the …, 2021 - aclanthology.org
Intermediate layer matching has been shown to be an effective approach for improving
knowledge distillation (KD). However, this technique applies matching in the hidden spaces of two …

Translate & Fill: Improving zero-shot multilingual semantic parsing with synthetic data

M Nicosia, Z Qu, Y Altun - arXiv preprint arXiv:2109.04319, 2021 - arxiv.org
While multilingual pretrained language models (LMs) fine-tuned on a single language have
shown substantial cross-lingual task transfer capabilities, there is still a wide performance …

MergeDistill: Merging pre-trained language models using distillation

S Khanuja, M Johnson, P Talukdar - arXiv preprint arXiv:2106.02834, 2021 - arxiv.org
Pre-trained multilingual language models (LMs) have achieved state-of-the-art results in
cross-lingual transfer, but they often lead to an inequitable representation of languages due …

pNLP-Mixer: An efficient all-MLP architecture for language

F Fusco, D Pascual, P Staar, D Antognini - arXiv preprint arXiv:2202.04350, 2022 - arxiv.org
Large pre-trained language models based on transformer architecture have drastically
changed the natural language processing (NLP) landscape. However, deploying those …

Too brittle to touch: comparing the stability of quantization and distillation towards developing low-resource MT models

H Diddee, S Dandapat, M Choudhury… - Proceedings of the …, 2022 - aclanthology.org
Leveraging shared learning through massively multilingual models, state-of-the-art machine
translation (MT) models are often able to adapt to the paucity of data for low-resource …

Unsupervised term extraction for highly technical domains

F Fusco, P Staar, D Antognini - arXiv preprint arXiv:2210.13118, 2022 - arxiv.org
Term extraction is an information extraction task at the root of knowledge discovery
platforms. Developing term extractors that are able to generalize across very diverse and …

Data augmentation and learned layer aggregation for improved multilingual language understanding in dialogue

E Razumovskaia, I Vulić… - Findings of the Association …, 2022 - aclanthology.org
Scaling dialogue systems to a multitude of domains, tasks and languages relies on costly
and time-consuming data annotation for different domain-task-language configurations. The …

Using large text-to-image models with structured prompts for skin disease identification: A case study

S Rajapaksa, JMU Vianney, R Castro… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper investigates the potential use of large text-to-image (LTI) models for the
automated diagnosis of a few skin conditions that are rare or have a serious lack of annotated …