Temporally grounding natural sentence in video

J Chen, X Chen, L Ma, Z Jie… - Proceedings of the 2018 …, 2018 - aclanthology.org
We introduce an effective and efficient method that grounds (i.e., localizes) natural sentences
in long, untrimmed video sequences. Specifically, a novel Temporal GroundNet (TGN) is …

Computational understanding of visual interestingness beyond semantics: literature survey and analysis of covariates

MG Constantin, M Redi, G Zen, B Ionescu - ACM Computing Surveys …, 2019 - dl.acm.org
Understanding visual interestingness is a challenging task addressed by researchers in
various disciplines ranging from humanities and psychology to, more recently, computer …

CBR: context bias aware recommendation for debiasing user modeling and click prediction

Z Zheng, Z Qiu, T Xu, X Wu, X Zhao, E Chen… - Proceedings of the ACM …, 2022 - dl.acm.org
With the prosperity of recommender systems, the biases existing in user behaviors, which
may lead to inconsistency between user preference and behavior records, have attracted …

Sending or not? A multimodal framework for Danmaku comment prediction

D Xi, W Xu, R Chen, Y Zhou, Z Yang - Information Processing & …, 2021 - Elsevier
Danmaku is an emerging comment design for videos that allows real-time, interactive
comments from viewers. Danmaku increases viewers' interaction with other viewers and …

On the consensus of synchronous temporal and spatial views: A novel multimodal deep learning method for social video prediction

S Xiao, J Wang, J Wang, R Chen, G Chen - Information Processing & …, 2024 - Elsevier
The blowout development of video social platforms has spawned a wide range of social
video prediction (SVP) tasks, such as video attractiveness prediction and video sentiment …

Predicting video engagement using heterogeneous DeepWalk

I Chaturvedi, K Thapa, S Cavallari, E Cambria… - Neurocomputing, 2021 - Elsevier
Video engagement is important in online advertisements where there is no physical
interaction with the consumer. Engagement can be directly measured as the number of …

Non-local NetVLAD encoding for video classification

Y Tang, X Zhang, L Ma, J Wang… - Proceedings of the …, 2018 - openaccess.thecvf.com
This paper describes our solution for the 2nd YouTube-8M video understanding challenge
organized by Google AI. Unlike the video recognition benchmarks, such as Kinetics and …

Bidirectional image-sentence retrieval by local and global deep matching

L Ma, W Jiang, Z Jie, X Wang - Neurocomputing, 2019 - Elsevier
In this paper, we propose a novel local and global deep matching model to tackle
bidirectional image-sentence retrieval. Our proposed matching model can simultaneously …

Parsimonious quantile regression of financial asset tail dynamics via sequential learning

X Yan, W Zhang, L Ma, W Liu… - Advances in neural …, 2018 - proceedings.neurips.cc
We propose a parsimonious quantile regression framework to learn the dynamic tail
behaviors of financial asset returns. Our model captures well both the time-varying …

Multi-branch LSTM encoded latent features with CNN-LSTM for YouTube popularity prediction

N Sangwan, V Bhatnagar - Scientific Reports, 2025 - nature.com
As digital media grows, there is an increasing demand for engaging content that can
captivate audiences. Along with that, the monetary conversion of those engaging videos is …