The Spotify podcast dataset

A Clifton, A Pappu, S Reddy, Y Yu, J Karlgren… - arXiv preprint arXiv …, 2020 - arxiv.org
… avenues for the Information Retrieval and Natural Language Processing communities. In
this work, we present the Spotify Podcasts Dataset, the first large scale corpus of podcast audio …

100,000 podcasts: A spoken English document corpus

A Clifton, S Reddy, Y Yu, A Pappu… - Proceedings of the …, 2020 - aclanthology.org
… We have compiled the Spotify Podcast Dataset, the first large scale corpus of podcast audio
data with … value, to amateurs recording podcasts using an application on their mobile phone. …

A baseline analysis for podcast abstractive summarization

C Zheng, HJ Wang, K Zhang, L Fan - arXiv preprint arXiv:2008.10648, 2020 - arxiv.org
… paper, the dataset we study is the recently released TREC 2020 Spotify Podcasts Dataset
[3], … In this paper, we present the performance of podcast summarization using two baselines …

Spotify at TREC 2020: Genre-Aware abstractive podcast summarization

R Rezapour, S Reddy, A Clifton, R Jones - arXiv preprint arXiv …, 2021 - arxiv.org
… on the CNN/Daily Mail news summarization dataset and then fine-tuned it on our podcast
transcript dataset with respect to the two proposed models (described below in §4). We used …

PodcastMix: A dataset for separating music and speech in podcasts

N Schmidt, J Pons, M Miron - arXiv preprint arXiv:2207.07403, 2022 - arxiv.org
… a synthetic speech dataset, to generalize to unseen datasets [18]… datasets to assess the
generalization capabilities of the … of another podcast-related dataset, the Spotify Podcast Dataset […

Topic modeling on podcast short-text metadata

FB Valero, M Baranes, EV Epure - European Conference on Information …, 2022 - Springer
… We start with describing the existing podcast datasets from iTunes [23] and Spotify [8]. Then,
we introduce our newly collected dataset, Deezer, which is the largest one among the three …

Neural instant search for music and podcast

H Hashemi, A Pappu, M Tian, P Chandar… - … the 27th ACM SIGKDD …, 2021 - dl.acm.org
… Specifically, we identify the need to improve podcast search performance. … To evaluate
the model, we use a large-scale dataset containing real search queries sampled from Spotify

Machine learning system implementation of education podcast recommendations on Spotify applications using content-based filtering and TF-IDF

MM Raharjo, F Arifin - Elinvo (Electronics, Informatics, and …, 2023 - journal.uny.ac.id
the podcast listening experience on Spotify, this research addresses the challenge of locating
suitable podcasts… change in a dataset from data collection obtained from the Spotify API to …

Spotify at the TREC 2020 Podcasts Track: Segment Retrieval.

Y Yu, J Karlgren, A Clifton, MI Tanveer, R Jones… - TREC, 2020 - trec.nist.gov
the details of our submissions to the TREC Podcasts Track 2020. Based on the task guidelines,
a segment, for the purposes of the … Unlike other corpora, the podcast dataset contains …

Leveraging multimodal content for podcast summarization

L Vaiani, M La Quatra, L Cagliero, P Garza - … of the 37th ACM/SIGAPP …, 2022 - dl.acm.org
… A more detailed description of the real-world dataset considered in our study is reported
in … Table 1: Statistics about the textual content extracted from the Spotify podcast dataset [5]. …