The AMI meeting corpus: A pre-announcement

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 109 · Related articles · All 6 versions

Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier

Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

Cited by 352 · Related articles · All 9 versions

Google USM: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

Cited by 182 · Related articles · All 3 versions

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Cited by 703 · Related articles · All 13 versions

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Cited by 592 · Related articles · All 5 versions

HOTA: A higher order metric for evaluating multi-object tracking

J Luiten, A Osep, P Dendorfer, P Torr, A Geiger… - International journal of …, 2021 - Springer

Multi-object tracking (MOT) has been notoriously difficult to evaluate. Previous metrics
overemphasize the importance of either detection or association. To address this, we …

Cited by 645 · Related articles · All 22 versions

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier

Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Cited by 335 · Related articles · All 7 versions

An empirical survey on long document summarization: Datasets, models, and metrics

HY Koh, J Ju, M Liu, S Pan - ACM computing surveys, 2022 - dl.acm.org

Long documents such as academic articles and business reports have been the standard
format to detail out important issues and complicated subjects that require extra attention. An …

Cited by 83 · Related articles · All 7 versions

BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition

Y Zhang, DS Park, W Han, J Qin… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …

Cited by 163 · Related articles · All 4 versions

QMSum: A new benchmark for query-based multi-domain meeting summarization

M Zhong, D Yin, T Yu, A Zaidi, M Mutuma, R Jha… - arXiv preprint arXiv …, 2021 - arxiv.org

Meetings are a key component of human collaboration. As increasing numbers of meetings
are recorded and transcribed, meeting summaries have become essential to remind those …

Cited by 226 · Related articles · All 6 versions
