Large language models for robotics: A survey

F Zeng, W Gan, Y Wang, N Liu, PS Yu - arXiv preprint arXiv:2311.07226, 2023 - arxiv.org
The human ability to learn, generalize, and control complex manipulation tasks through
multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence …

PaLM-E: An embodied multimodal language model

D Driess, F Xia, MSM Sajjadi, C Lynch… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models excel at a wide range of complex tasks. However, enabling general
inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We …

Large language models and political science

M Linegar, R Kocielnik, RM Alvarez - Frontiers in Political Science, 2023 - frontiersin.org
Large Language Models (LLMs) are a type of artificial intelligence that uses information from
very large datasets to model the use of language and generate content. While LLMs like …

TextDiffuser: Diffusion models as text painters

J Chen, Y Huang, T Lv, L Cui… - Advances in Neural …, 2024 - proceedings.neurips.cc
Diffusion models have gained increasing attention for their impressive generation abilities
but currently struggle with rendering accurate and coherent text. To address this issue, we …

DAN: a segmentation-free document attention network for handwritten document recognition

D Coquenet, C Chatelain… - IEEE transactions on …, 2023 - ieeexplore.ieee.org
Unconstrained handwritten text recognition is a challenging computer vision task. It is
traditionally handled by a two-step approach, combining line segmentation followed by text …

Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning

Y Hu, F Lin, T Zhang, L Yi, Y Gao - arXiv preprint arXiv:2311.17842, 2023 - arxiv.org
In this study, we are interested in imbuing robots with the capability of physically-grounded
task planning. Recent advancements have shown that large language models (LLMs) …

Exploring OCR capabilities of GPT-4V(ision): A quantitative and in-depth evaluation

Y Shi, D Peng, W Liao, Z Lin, X Chen, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents a comprehensive evaluation of the Optical Character Recognition
(OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model …

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

S Zhong, Z Huang, S Gao, W Wen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Chain-of-Thought (CoT) guides large language models (LLMs) to reason step-by-step
and can motivate their logical reasoning ability. While effective for logical tasks, CoT is …

American Stories: A large-scale structured text dataset of historical US newspapers

M Dell, J Carlson, T Bryan, E Silcock… - Advances in …, 2024 - proceedings.neurips.cc
Existing full text datasets of US public domain newspapers do not recognize the often
complex layouts of newspaper scans, and as a result the digitized content scrambles texts …

OWL: A large language model for IT operations

H Guo, J Yang, J Liu, L Yang, L Chai, J Bai… - arXiv preprint arXiv …, 2023 - arxiv.org
With the rapid development of IT operations, it has become increasingly crucial to efficiently
manage and analyze large volumes of data for practical applications. The techniques of …