Is Sora a world simulator? A comprehensive survey on general world models and beyond

Z Zhu, X Wang, W Zhao, C Min, N Deng, M Dou… - arXiv preprint arXiv …, 2024 - arxiv.org
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …

Adapting visual-language models for generalizable anomaly detection in medical images

C Huang, A Jiang, J Feng, Y Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in large-scale visual-language pre-trained models have led to
significant progress in zero-/few-shot anomaly detection within natural image domains …

MindBridge: A cross-subject brain decoding framework

S Wang, S Liu, Z Tan, X Wang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Brain decoding, a pivotal field in neuroscience, aims to reconstruct stimuli from acquired brain
signals, primarily utilizing functional magnetic resonance imaging (fMRI). Currently, brain …

Mutual-modality adversarial attack with semantic perturbation

J Ye, R Yu, S Liu, X Wang - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Adversarial attacks constitute a notable threat to machine learning systems, given their
potential to induce erroneous predictions and classifications. However, within real-world …

TC-LIF: A two-compartment spiking neuron model for long-term sequential modelling

S Zhang, Q Yang, C Ma, J Wu, H Li… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The identification of sensory cues associated with potential opportunities and dangers is
frequently complicated by unrelated events that separate useful cues by long delays. As a …

Cross-attention makes inference cumbersome in text-to-image diffusion models

W Zhang, H Liu, J Xie, F Faccio, MZ Shou… - arXiv preprint arXiv …, 2024 - arxiv.org
This study explores the role of cross-attention during inference in text-conditional diffusion
models. We find that cross-attention outputs converge to a fixed point after few inference …

LAPTOP-Diff: Layer pruning and normalized distillation for compressing diffusion models

D Zhang, S Li, C Chen, Q Xie, H Lu - arXiv preprint arXiv:2404.11098, 2024 - arxiv.org
In the era of AIGC, the demand for low-budget or even on-device applications of diffusion
models emerged. In terms of compressing the Stable Diffusion models (SDMs), several …

NightRain: Nighttime Video Deraining via Adaptive-Rain-Removal and Adaptive-Correction

B Lin, Y Jin, W Yan, W Ye, Y Yuan, S Zhang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Existing deep-learning-based methods for nighttime video deraining rely on synthetic data
due to the absence of real-world paired data. However, the intricacies of the real world …

Prompt-driven target speech diarization

Y Jiang, Z Chen, R Tao, L Deng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We introduce a novel task named 'target speech diarization', which seeks to determine
'when target event occurred' within an audio signal. We devise a neural architecture called …

Life regression based patch slimming for vision transformers

J Chen, L Chen, J Yang, T Shi, L Cheng, Z Feng… - Neural Networks, 2024 - Elsevier
Vision transformers have achieved remarkable success in computer vision tasks by using
multi-head self-attention modules to capture long-range dependencies within images …