Why gradient clipping accelerates training: A theoretical justification for adaptivity J Zhang, T He, S Sra, A Jadbabaie International Conference on Learning Representations 2020, 2019 | 420 | 2019 |
Reshaping deep neural network for fast decoding by node-pruning T He, Y Fan, Y Qian, T Tan, K Yu 2014 IEEE International Conference on Acoustics, Speech and Signal …, 2014 | 164 | 2014 |
Can language models solve graph problems in natural language? H Wang, S Feng, T He, Z Tan, X Han, Y Tsvetkov Advances in Neural Information Processing Systems 36, 2024 | 77 | 2024 |
Analyzing the forgetting problem in pretrain-finetuning of open-domain dialogue response models T He, J Liu, K Cho, M Ott, B Liu, J Glass, F Peng Proceedings of the 16th Conference of the European Chapter of the …, 2021 | 49 | 2021 |
Exploiting LSTM structure in deep neural networks for speech recognition T He, J Droppo 2016 IEEE international conference on acoustics, speech and signal …, 2016 | 49 | 2016 |
Negative training for neural dialogue response generation T He, J Glass ACL 2020, 2019 | 48 | 2019 |
An empirical study of transformer-based neural language model adaptation K Li, Z Liu, T He, H Huang, F Peng, D Povey, S Khudanpur ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 44 | 2020 |
Quantifying exposure bias for neural language generation T He, J Zhang, Z Zhou, J Glass | 33 | 2019 |
Neural network weights do not converge to stationary points: An invariant … J Zhang, T He, S Sra, A Jadbabaie 2020 | 28 | 2020
On the blind spots of model-based evaluation metrics for text generation T He, J Zhang, T Wang, S Kumar, K Cho, J Glass, Y Tsvetkov arXiv preprint arXiv:2212.10020, 2022 | 26 | 2022 |
A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation M Nadeem, T He, K Cho, J Glass AACL 2020, 2020 | 26 | 2020 |
Exposure bias versus self-recovery: Are distortions really incremental for autoregressive text generation? T He, J Zhang, Z Zhou, J Glass arXiv preprint arXiv:1905.10617, 2019 | 26 | 2019 |
Detecting egregious responses in neural sequence-to-sequence models T He, J Glass International Conference on Learning Representations 2019, 2018 | 23 | 2018 |
On training bi-directional neural network language model with noise contrastive estimation T He, Y Zhang, J Droppo, K Yu 2016 10th International Symposium on Chinese Spoken Language Processing …, 2016 | 22 | 2016 |
Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models T He, B McCann, C Xiong, E Hosseini-Asl EACL 2021, 2021 | 17 | 2021 |
An investigation on DNN-derived bottleneck features for GMM-HMM based robust speech recognition Y You, Y Qian, T He, K Yu 2015 IEEE China Summit and International Conference on Signal and …, 2015 | 15 | 2015 |
On the zero-shot generalization of machine-generated text detectors X Pu, J Zhang, X Han, Y Tsvetkov, T He arXiv preprint arXiv:2310.05165, 2023 | 12 | 2023 |
Semstamp: A semantic watermark with paraphrastic robustness for text generation AB Hou, J Zhang, T He, Y Wang, YS Chuang, H Wang, L Shen, ... arXiv preprint arXiv:2310.03991, 2023 | 11 | 2023 |
Resolving knowledge conflicts in large language models Y Wang, S Feng, H Wang, W Shi, V Balachandran, T He, Y Tsvetkov arXiv preprint arXiv:2310.00935, 2023 | 9 | 2023 |
Recurrent neural network language model with structured word embeddings for speech recognition T He, X Xiang, Y Qian, K Yu 2015 IEEE International Conference on Acoustics, Speech and Signal …, 2015 | 9 | 2015 |