To create what you tell: Generating videos from captions

Y Pan, Z Qiu, T Yao, H Li, T Mei - Proceedings of the 25th ACM …, 2017 - dl.acm.org
We are creating multimedia content every day and everywhere. While automatic content
generation has posed a fundamental challenge to the multimedia community for decades …

Video captioning by adversarial LSTM

Y Yang, J Zhou, J Ai, Y Bin, A Hanjalic… - … on Image Processing, 2018 - ieeexplore.ieee.org
In this paper, we propose a novel approach to video captioning based on adversarial
learning and long short-term memory (LSTM). With this solution, we aim to …
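
A minimal sketch can make the adversarial setup concrete: an LSTM generator decodes a caption from a video feature while an LSTM discriminator judges whether a caption fits that video. Module names and sizes below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

VOCAB, EMBED, HIDDEN, FEAT = 1000, 256, 512, 2048  # assumed sizes

class CaptionGenerator(nn.Module):
    """LSTM decoder that emits a caption conditioned on a video feature."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.init_h = nn.Linear(FEAT, HIDDEN)  # video feature -> initial state
        self.lstm = nn.LSTM(EMBED, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, video_feat, tokens):
        h0 = torch.tanh(self.init_h(video_feat)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        hs, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(hs)  # per-step vocabulary logits

class CaptionDiscriminator(nn.Module):
    """LSTM critic that scores whether a caption looks real for the video."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.lstm = nn.LSTM(EMBED, HIDDEN, batch_first=True)
        self.score = nn.Linear(HIDDEN + FEAT, 1)

    def forward(self, video_feat, tokens):
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.score(torch.cat([h[-1], video_feat], dim=-1))  # real/fake logit

G, D = CaptionGenerator(), CaptionDiscriminator()
video = torch.randn(4, FEAT)
real = torch.randint(0, VOCAB, (4, 12))
fake = G(video, real).argmax(-1)  # greedy decode as the "fake" caption
bce = nn.BCEWithLogitsLoss()
d_loss = bce(D(video, real), torch.ones(4, 1)) + bce(D(video, fake), torch.zeros(4, 1))
```

Note that the greedy argmax blocks gradients to the generator; training through discrete captions is the hard part in practice, and this sketch only shows the discriminator's side of the objective.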

StyleVideoGAN: A temporal generative model using a pretrained StyleGAN

G Fox, A Tewari, M Elgharib, C Theobalt - arXiv preprint arXiv:2107.07224, 2021 - arxiv.org
Generative adversarial networks (GANs) continue to produce advances in terms of the visual
quality of still images, as well as the learning of temporal correlations. However, few works …
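
The core idea, roughly: keep a pretrained image generator frozen and learn only a temporal model over its latent space. In the sketch below a single linear layer stands in for the pretrained StyleGAN; the LSTM design and sizes are assumptions.

```python
import torch
import torch.nn as nn

W_DIM, T = 512, 16  # assumed latent width and clip length

frozen_G = nn.Sequential(nn.Linear(W_DIM, 3 * 64 * 64))  # stand-in for StyleGAN
for p in frozen_G.parameters():
    p.requires_grad_(False)  # the image generator stays fixed

class LatentTrajectory(nn.Module):
    """Unrolls an LSTM from a noise seed into a sequence of latent codes."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(W_DIM, W_DIM, batch_first=True)
        self.head = nn.Linear(W_DIM, W_DIM)

    def forward(self, z):
        steps = z.unsqueeze(1).expand(-1, T, -1).contiguous()  # same seed each step
        hs, _ = self.lstm(steps)
        return self.head(hs)  # (batch, T, W_DIM) latent trajectory

traj = LatentTrajectory()
z = torch.randn(2, W_DIM)
ws = traj(z)
frames = frozen_G(ws).view(2, T, 3, 64, 64)  # one rendered frame per code
```

Only the trajectory model needs training, which is what makes this family of approaches cheap relative to training a video GAN from scratch.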

Adversarial inference for multi-sentence video description

JS Park, M Rohrbach, T Darrell… - Proceedings of the …, 2019 - openaccess.thecvf.com
While significant progress has been made in the image captioning task, video description is
still in its infancy due to the complex nature of video data. Generating multi-sentence …
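
In spirit, the adversarial component can also be used at inference time: sample several candidate descriptions and let a learned critic keep the most human-like one. Both helpers below are hypothetical placeholders for the paper's generator and hybrid discriminator.

```python
import random

def sample_description(video_id: str) -> str:
    # stand-in: a real system would sample from a captioning model
    return random.choice([
        "A man chops onions. He adds them to a pan.",
        "Someone cooks. The food is served.",
    ])

def critic_score(video_id: str, text: str) -> float:
    # stand-in: a real critic scores fluency, relevance, and coherence
    return float(len(text))

def adversarial_inference(video_id: str, n_samples: int = 5) -> str:
    """Sample candidates, keep the one the critic scores highest."""
    candidates = [sample_description(video_id) for _ in range(n_samples)]
    return max(candidates, key=lambda c: critic_score(video_id, c))

print(adversarial_inference("vid_001"))
```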

Temporal generative adversarial nets with singular value clipping

M Saito, E Matsumoto, S Saito - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
In this paper, we propose a generative model, Temporal Generative Adversarial Nets
(TGAN), which can learn a semantic representation of unlabeled videos, and is capable of …
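
Singular value clipping itself is easy to state: after each update, project every weight matrix back onto the set of matrices whose spectral norm is at most 1, which keeps the WGAN-style critic roughly 1-Lipschitz. A minimal PyTorch sketch, not the authors' code:

```python
import torch

@torch.no_grad()
def clip_singular_values(module: torch.nn.Module, max_sv: float = 1.0):
    """Clamp the singular values of every 2D weight to at most max_sv."""
    for p in module.parameters():
        if p.dim() == 2:
            U, S, Vh = torch.linalg.svd(p, full_matrices=False)
            p.copy_(U @ torch.diag(S.clamp(max=max_sv)) @ Vh)

disc = torch.nn.Linear(128, 1)
clip_singular_values(disc)  # would be called after each optimizer step
print(torch.linalg.matrix_norm(disc.weight, ord=2) <= 1.0 + 1e-6)
```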

Panda-70M: Captioning 70M videos with multiple cross-modality teachers

TS Chen, A Siarohin, W Menapace… - Proceedings of the …, 2024 - openaccess.thecvf.com
The quality of the data and annotations upper-bounds the quality of a downstream model.
While large text corpora and image-text pairs exist, high-quality video-text data is much …
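
The selection step can be pictured as follows: several teacher captioners propose candidates for a clip, and a cross-modal similarity model keeps the best match. The embedding functions below are random stand-ins for a fine-tuned retrieval model.

```python
import numpy as np

def embed_video(clip_path: str) -> np.ndarray:
    # stand-in: a real system would use a video encoder
    return np.random.default_rng(hash(clip_path) % 2**32).standard_normal(512)

def embed_text(caption: str) -> np.ndarray:
    # stand-in: a real system would use the matching text encoder
    return np.random.default_rng(hash(caption) % 2**32).standard_normal(512)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_caption(clip_path: str, teacher_captions: list[str]) -> str:
    """Keep the teacher caption most similar to the clip embedding."""
    v = embed_video(clip_path)
    return max(teacher_captions, key=lambda c: cosine(v, embed_text(c)))

print(best_caption("clip.mp4", ["a dog runs", "two people talk", "a car drives"]))
```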

SBAT: Video captioning with sparse boundary-aware transformer

T Jin, S Huang, M Chen, Y Li, Z Zhang - arXiv preprint arXiv:2007.11888, 2020 - arxiv.org
In this paper, we focus on effectively applying the transformer architecture to video
captioning. The vanilla transformer was proposed for uni-modal language …
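
One plausible reading of "sparse boundary-aware" attention, sketched under the assumption that attention keys are restricted to frames whose features change sharply (likely scene boundaries); this is purely illustrative, not the paper's exact mechanism.

```python
import torch

def boundary_sparse_attention(q, frames, k_keep=4):
    """q: (d,) query; frames: (T, d) per-frame features."""
    diffs = (frames[1:] - frames[:-1]).norm(dim=-1)        # change between frames
    idx = diffs.topk(min(k_keep, diffs.numel())).indices + 1
    keys = frames[idx]                                     # attend only to boundary frames
    attn = torch.softmax(keys @ q / keys.shape[-1] ** 0.5, dim=0)
    return attn @ keys                                     # sparse attention output

out = boundary_sparse_attention(torch.randn(64), torch.randn(20, 64))
```

Restricting keys this way cuts the redundancy of near-duplicate frames, which is the usual motivation for sparsifying attention over video.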

End-to-end generative pretraining for multimodal video captioning

PH Seo, A Nagrani, A Arnab… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recent video and language pretraining frameworks lack the ability to generate sentences.
We present Multimodal Video Generative Pretraining (MV-GPT), a new pretraining …
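
The bi-directional generative objective can be sketched as two cross-entropy terms: generate a future utterance from the video plus the present utterance, then swap the roles. The tiny GRU seq2seq below is a hypothetical stand-in for the paper's transformer.

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Decodes a target utterance from a video feature plus a context utterance."""
    def __init__(self, vocab=1000, d=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.video_proj = nn.Linear(512, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, video, context, target):
        # prepend the projected video and the context utterance
        prefix = torch.cat([self.video_proj(video).unsqueeze(1),
                            self.embed(context)], dim=1)
        _, h = self.rnn(prefix)
        dec, _ = self.rnn(self.embed(target), h)
        return self.out(dec)

model, ce = TinySeq2Seq(), nn.CrossEntropyLoss()
video = torch.randn(2, 512)
present = torch.randint(0, 1000, (2, 8))
future = torch.randint(0, 1000, (2, 8))
fwd = ce(model(video, present, future).flatten(0, 1), future.flatten())
bwd = ce(model(video, future, present).flatten(0, 1), present.flatten())
loss = fwd + bwd  # forward and backward generation objectives
```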

EMScore: Evaluating video captioning via coarse-grained and fine-grained embedding matching

Y Shi, X Yang, H Xu, C Yuan, B Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Current metrics for video captioning are mostly based on text-level comparison between
reference and candidate captions. However, they have some insuperable drawbacks, e.g., …

End-to-end dense video captioning as sequence generation

W Zhu, B Pang, AV Thapliyal, WY Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Dense video captioning aims to identify the events of interest in an input video, and generate
descriptive captions for each event. Previous approaches usually follow a two-stage …
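
Casting the task as sequence generation amounts to serializing every event's timestamps and caption into one token stream, e.g., with discretized time tokens, so a single decoder can emit all events in order. The token format below is an assumption, not the paper's exact scheme.

```python
N_TIME_BINS = 100  # assumed temporal quantization

def time_token(t: float, duration: float) -> str:
    """Discretize a timestamp into one of N_TIME_BINS special tokens."""
    return f"<t{int(t / duration * (N_TIME_BINS - 1))}>"

def serialize_events(events, duration):
    """events: list of (start_sec, end_sec, caption) tuples."""
    parts = []
    for start, end, caption in sorted(events):
        parts += [time_token(start, duration), time_token(end, duration), caption]
    return " ".join(parts)

print(serialize_events([(2.0, 8.5, "a man mixes batter"),
                        (9.0, 15.0, "he pours it into a pan")], duration=20.0))
```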