Optimizing deeper transformers on small datasets

S Ahmed, IE Nielsen, A Tripathi, S Siddiqui… - Circuits, Systems, and …, 2023 - Springer

Transformer architectures have widespread applications, particularly in Natural Language
Processing and Computer Vision. Recently, Transformers have been employed in various …

Cited by 153 · Related articles · All 6 versions

[PDF] arxiv.org

A survey on text-to-SQL parsing: Concepts, methods, and future directions

B Qin, B Hui, L Wang, M Yang, J Li, B Li… - arXiv preprint arXiv …, 2022 - arxiv.org

Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is
to convert a natural language (NL) question to its corresponding structured query language …

Cited by 74 · Related articles · All 2 versions

[PDF] neurips.cc

H2O: Heavy-hitter oracle for efficient generative inference of large language models

Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2023 - proceedings.neurips.cc

Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …

Cited by 235 · Related articles · All 7 versions

[PDF] aaai.org

RESDSQL: Decoupling schema linking and skeleton parsing for text-to-SQL

H Li, J Zhang, C Li, H Chen - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org

One of the recent best attempts at Text-to-SQL is the pre-trained language model. Due to the
structural property of the SQL queries, the seq2seq model takes the responsibility of parsing …

Cited by 134 · Related articles · All 4 versions

[PDF] arxiv.org

PICARD: Parsing incrementally for constrained auto-regressive decoding from language models

T Scholak, N Schucher, D Bahdanau - arXiv preprint arXiv:2109.05093, 2021 - arxiv.org

Large pre-trained language models for textual data have an unconstrained output space; at
each decoding step, they can produce any of 10,000s of sub-word tokens. When fine-tuned …

Cited by 350 · Related articles · All 4 versions

[PDF] ieee.org

DeepNet: Scaling transformers to 1,000 layers

H Wang, S Ma, L Dong, S Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

In this paper, we propose a simple yet effective method to stabilize extremely deep
Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify …

Cited by 157 · Related articles · All 4 versions

[PDF] arxiv.org

LGESQL: line graph enhanced text-to-SQL model with mixed local and non-local relations

R Cao, L Chen, Z Chen, Y Zhao, S Zhu, K Yu - arXiv preprint arXiv …, 2021 - arxiv.org

This work aims to tackle the challenging heterogeneous graph encoding problem in the
text-to-SQL task. Previous methods are typically node-centric and merely utilize different weight …

Cited by 147 · Related articles · All 4 versions

[PDF] arxiv.org

State-of-the-art generalisation research in NLP: a taxonomy and review

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - arXiv preprint arXiv …, 2022 - arxiv.org

The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …

Cited by 60 · Related articles · All 7 versions

[PDF] neurips.cc

SADGA: Structure-aware dual graph aggregation network for text-to-SQL

R Cai, J Yuan, B Xu, Z Hao - Advances in Neural …, 2021 - proceedings.neurips.cc

The Text-to-SQL task, aiming to translate the natural language of the questions into SQL
queries, has drawn much attention recently. One of the most challenging problems of Text-to …

Cited by 60 · Related articles · All 9 versions

[PDF] mit.edu

Few-shot text-to-SQL translation using structure and content prompt learning

Z Gu, J Fan, N Tang, L Cao, B Jia, S Madden… - Proceedings of the ACM …, 2023 - dl.acm.org

A common problem with adopting Text-to-SQL translation in database systems is poor
generalization. Specifically, when there is limited training data on new datasets, existing few …

Cited by 41 · Related articles · All 4 versions
