ChatGPT-like large-scale foundation models for prognostics and health management: A survey and roadmaps

YF Li, H Wang, M Sun - Reliability Engineering & System Safety, 2024 - Elsevier
PHM technology is vital in industrial production and maintenance, identifying and predicting
potential equipment failures and damages. This enables proactive maintenance measures …

MERLOT Reserve: Neural script knowledge through vision and language and sound

R Zellers, J Lu, X Lu, Y Yu, Y Zhao… - Proceedings of the …, 2022 - openaccess.thecvf.com
As humans, we navigate a multimodal world, building a holistic understanding from all our
senses. We introduce MERLOT Reserve, a model that represents videos jointly over time …

Learning to exploit temporal structure for biomedical vision-language processing

S Bannur, S Hyland, Q Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Self-supervised learning in vision-language processing (VLP) exploits semantic alignment
between imaging and text modalities. Prior work in biomedical VLP has mostly relied on the …

Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

Breaking common sense: Whoops! a vision-and-language benchmark of synthetic and compositional images

N Bitton-Guetta, Y Bitton, J Hessel… - Proceedings of the …, 2023 - openaccess.thecvf.com
Weird, unusual, and uncanny images pique the curiosity of observers because they
challenge common sense. For example, an image released during the 2022 World Cup …

A survey on masked autoencoder for self-supervised learning in vision and beyond

C Zhang, C Zhang, J Song, JSK Yi, K Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked autoencoders are scalable vision learners, as the title of MAE (He et al., 2022) states,
which suggests that self-supervised learning (SSL) in vision might undertake a similar …

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

S Zhong, Z Huang, S Gao, W Wen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Chain-of-Thought (CoT) prompting guides large language models (LLMs) to reason
step by step and can strengthen their logical reasoning ability. While effective for logical tasks, CoT is …

Accelerating vision-language pretraining with free language modeling

T Wang, Y Ge, F Zheng, R Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
The state of the art in vision-language pretraining (VLP) achieves exemplary performance
but suffers from high training costs resulting from slow convergence and long training time …

Is BERT blind? Exploring the effect of vision-and-language pretraining on visual language understanding

M Alper, M Fiman… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Most humans use visual imagination to understand and reason about language, but models
such as BERT reason about language using knowledge acquired during text-only …

A fistful of words: Learning transferable visual models from bag-of-words supervision

A Tejankar, M Sanjabi, B Wu, S Xie, M Khabsa… - arXiv preprint arXiv …, 2021 - arxiv.org
Using natural language as a supervision for training visual recognition models holds great
promise. Recent works have shown that if such supervision is used in the form of alignment …