Audio Set: An ontology and human-labeled dataset for audio events JF Gemmeke, DPW Ellis, D Freedman, A Jansen, W Lawrence, RC Moore, ... 2017 IEEE international conference on acoustics, speech and signal …, 2017 | 3304 | 2017 |
CNN architectures for large-scale audio classification S Hershey, S Chaudhuri, DPW Ellis, JF Gemmeke, A Jansen, RC Moore, ... 2017 IEEE international conference on acoustics, speech and signal …, 2017 | 2886 | 2017 |
Attention bottlenecks for multimodal fusion A Nagrani, S Yang, A Arnab, A Jansen, C Schmid, C Sun Advances in neural information processing systems 34, 14200-14213, 2021 | 512 | 2021 |
MusicLM: Generating music from text A Agostinelli, TI Denk, Z Borsos, J Engel, M Verzetti, A Caillon, Q Huang, ... arXiv preprint arXiv:2301.11325, 2023 | 374 | 2023 |
Shared computational principles for language processing in humans and deep language models A Goldstein, Z Zada, E Buchnik, M Schain, A Price, B Aubrey, SA Nastase, ... Nature neuroscience 25 (3), 369-380, 2022 | 265 | 2022 |
The zero resource speech challenge 2015. M Versteegh, R Thiolliere, T Schatz, XN Cao, X Anguera, A Jansen, ... Interspeech 15, 3169-3173, 2015 | 219 | 2015 |
Efficient spoken term discovery using randomized algorithms A Jansen, B Van Durme 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 401-406, 2011 | 191 | 2011 |
Towards spoken term discovery at scale with zero resources. A Jansen, K Church, H Hermansky Interspeech, 1676-1679, 2010 | 183 | 2010 |
Evaluating speech features with the minimal-pair ABX task: Analysis of the classical MFC/PLP pipeline T Schatz, V Peddinti, F Bach, A Jansen, H Hermansky, E Dupoux INTERSPEECH 2013: 14th Annual Conference of the International Speech …, 2013 | 179 | 2013 |
Unsupervised learning of semantic audio representations A Jansen, M Plakal, R Pandya, DPW Ellis, S Hershey, J Liu, RC Moore, ... 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 177 | 2018 |
BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition Y Zhang, DS Park, W Han, J Qin, A Gulati, J Shor, A Jansen, Y Xu, ... IEEE Journal of Selected Topics in Signal Processing 16 (6), 1519-1532, 2022 | 159 | 2022 |
Towards learning a universal non-semantic representation of speech J Shor, A Jansen, R Maor, O Lang, O Tuval, FC Quitry, M Tagliasacchi, ... arXiv preprint arXiv:2002.12764, 2020 | 154 | 2020 |
Unsupervised neural network based feature extraction using weak top-down constraints H Kamper, M Elsner, A Jansen, S Goldwater 2015 IEEE International Conference on Acoustics, Speech and Signal …, 2015 | 138 | 2015 |
Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings K Levin, K Henry, A Jansen, K Livescu 2013 IEEE workshop on automatic speech recognition and understanding, 410-415, 2013 | 137 | 2013 |
A segmental framework for fully-unsupervised large-vocabulary speech recognition H Kamper, A Jansen, S Goldwater Computer Speech & Language 46, 154-174, 2017 | 126 | 2017 |
A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition A Jansen, E Dupoux, S Goldwater, M Johnson, S Khudanpur, K Church, ... 2013 IEEE International Conference on Acoustics, Speech and Signal …, 2013 | 120 | 2013 |
Rapid evaluation of speech representations for spoken term discovery MA Carlin, S Thomas, A Jansen, H Hermansky Twelfth Annual Conference of the International Speech Communication Association, 2011 | 114 | 2011 |
A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge D Renshaw, H Kamper, A Jansen, S Goldwater Sixteenth Annual Conference of the International Speech Communication …, 2015 | 112 | 2015 |
MuLan: A joint embedding of music audio and natural language Q Huang, A Jansen, J Lee, R Ganti, JY Li, DPW Ellis arXiv preprint arXiv:2208.12415, 2022 | 108 | 2022 |
Unsupervised word segmentation and lexicon discovery using acoustic word embeddings H Kamper, A Jansen, S Goldwater IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 (4), 669-679, 2016 | 102 | 2016 |