TrOCR: Transformer-based optical character recognition with pre-trained models

F Zeng, W Gan, Y Wang, N Liu, PS Yu - arXiv preprint arXiv:2311.07226, 2023 - arxiv.org

The human ability to learn, generalize, and control complex manipulation tasks through
multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence …

Cited by 44 · Related articles · All 3 versions

[PDF] arxiv.org

PaLM-E: An embodied multimodal language model

D Driess, F Xia, MSM Sajjadi, C Lynch… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models excel at a wide range of complex tasks. However, enabling general
inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We …

Cited by 1003 · Related articles · All 6 versions

[PDF] frontiersin.org

Large language models and political science

M Linegar, R Kocielnik, RM Alvarez - Frontiers in Political Science, 2023 - frontiersin.org

Large Language Models (LLMs) are a type of artificial intelligence that uses information from
very large datasets to model the use of language and generate content. While LLMs like …

Cited by 12 · Related articles

[PDF] neurips.cc

TextDiffuser: Diffusion models as text painters

J Chen, Y Huang, T Lv, L Cui… - Advances in Neural …, 2024 - proceedings.neurips.cc

Diffusion models have gained increasing attention for their impressive generation abilities
but currently struggle with rendering accurate and coherent text. To address this issue, we …

Cited by 29 · Related articles · All 5 versions

[PDF] arxiv.org

DAN: A segmentation-free document attention network for handwritten document recognition

D Coquenet, C Chatelain… - IEEE transactions on …, 2023 - ieeexplore.ieee.org

Unconstrained handwritten text recognition is a challenging computer vision task. It is
traditionally handled by a two-step approach, combining line segmentation followed by text …

Cited by 61 · Related articles · All 10 versions

[PDF] arxiv.org

Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning

Y Hu, F Lin, T Zhang, L Yi, Y Gao - arXiv preprint arXiv:2311.17842, 2023 - arxiv.org

In this study, we are interested in imbuing robots with the capability of physically-grounded
task planning. Recent advancements have shown that large language models (LLMs) …

Cited by 29 · Related articles · All 3 versions

[PDF] arxiv.org

Exploring OCR capabilities of GPT-4V(ision): A quantitative and in-depth evaluation

Y Shi, D Peng, W Liao, Z Lin, X Chen, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper presents a comprehensive evaluation of the Optical Character Recognition
(OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model …

Cited by 25 · Related articles · All 3 versions

[PDF] thecvf.com

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

S Zhong, Z Huang, S Gao, W Wen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Chain-of-Thought (CoT) guides large language models (LLMs) to reason step-by-step
and can motivate their logical reasoning ability. While effective for logical tasks, CoT is …

Cited by 6 · Related articles · All 3 versions

[PDF] neurips.cc

American Stories: A large-scale structured text dataset of historical US newspapers

M Dell, J Carlson, T Bryan, E Silcock… - Advances in …, 2024 - proceedings.neurips.cc

Existing full text datasets of US public domain newspapers do not recognize the often
complex layouts of newspaper scans, and as a result the digitized content scrambles texts …

Cited by 10 · Related articles · All 9 versions

[PDF] arxiv.org

Owl: A large language model for IT operations

H Guo, J Yang, J Liu, L Yang, L Chai, J Bai… - arXiv preprint arXiv …, 2023 - arxiv.org

With the rapid development of IT operations, it has become increasingly crucial to efficiently
manage and analyze large volumes of data for practical applications. The techniques of …

Cited by 20 · Related articles · All 3 versions
