Pre-training LLMs using human-like development data corpus

K Bhardwaj, RS Shah, S Varma - arXiv preprint arXiv:2311.04666, 2023 - arxiv.org
Pre-trained Large Language Models (LLMs) have shown success in a diverse set of
language inference and understanding tasks. The pre-training stage of LLMs looks at a …

Generating Faithful and Salient Text from Multimodal Data

T Hashem, W Wang, DT Wijaya, ME Ali… - arXiv preprint arXiv …, 2024 - arxiv.org
While large multimodal models (LMMs) have obtained strong performance on many
multimodal tasks, they may still hallucinate while generating text. Their performance on …

TourismNLG: A Multi-lingual Generative Benchmark for the Tourism Domain

SM Bhatt, S Agarwal, O Gurjar, M Gupta… - … on Information Retrieval, 2023 - Springer
The tourism industry is important both for the benefits it brings and for its role as a commercial
activity that creates demand and growth for many other industries. Yet there is not much …

Inference and Reasoning for Semi-Structured Tables

V Gupta - 2023 - search.proquest.com
Semi-structured tabular data, such as that found in e-commerce product descriptions, annual
financial reports, sports score statistics, scientific articles, etc., are ubiquitous in real-world …

Domain-Specific Pretrained Models For Natural Language Generation

SM Bhatt - 2023 - cdn.iiit.ac.in
Natural Language Generation (NLG) focuses on the automatic generation of natural
language text, which should ideally be coherent, fluent, and stylistically appropriate for a …

Opsa: Order Preserving Token Shuffling Augmentation with Wargame Simulation Dataset

J Heo, S Park, T Kim, J Cho, SW Han - Available at SSRN 4903910 - papers.ssrn.com
Table-to-text generation involves converting structured data in table format into natural
language descriptions or summaries. However, existing table-to-text generation tasks are …