Flamingo: a visual language model for few-shot learning JB Alayrac, J Donahue, P Luc, A Miech, I Barr, Y Hasson, K Lenc, ... Advances in neural information processing systems 35, 23716-23736, 2022 | 2283 | 2022 |
Video representation learning by dense predictive coding T Han, W Xie, A Zisserman Workshop on Large-scale Holistic Video Understanding, ICCV, 2019 | 413 | 2019 |
Self-supervised Co-training for Video Representation Learning T Han, W Xie, A Zisserman Conference on Neural Information Processing Systems (NeurIPS), 2020 | 409 | 2020 |
Prompting visual-language models for efficient video understanding C Ju, T Han, K Zheng, Y Zhang, W Xie European Conference on Computer Vision, 105-124, 2022 | 274 | 2022 |
Memory-augmented Dense Predictive Coding for Video Representation Learning T Han, W Xie, A Zisserman European Conference on Computer Vision (ECCV), 2020 | 256 | 2020 |
WhisperX: Time-accurate speech transcription of long-form audio M Bain, J Huh, T Han, A Zisserman arXiv preprint arXiv:2303.00747, 2023 | 94 | 2023 |
Temporal Alignment Networks for Long-term Video T Han, W Xie, A Zisserman IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 | 67 | 2022 |
Human pose forecasting via deep markov models S Toyer, A Cherian, T Han, S Gould 2017 International Conference on Digital Image Computing: Techniques and …, 2017 | 56 | 2017 |
AutoAD: Movie description in context T Han, M Bain, A Nagrani, G Varol, W Xie, A Zisserman Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 23 | 2023 |
Human action forecasting by learning task grammars T Han, J Wang, A Cherian, S Gould arXiv preprint arXiv:1709.06391, 2017 | 17 | 2017 |
AutoAD II: The Sequel - Who, When, and What in Movie Audio Description T Han, M Bain, A Nagrani, G Varol, W Xie, A Zisserman Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 16 | 2023 |
Prompt Generation Networks for Input-based Adaptation of Frozen Vision Transformers J Loedeman, MC Stol, T Han, YM Asano arXiv preprint arXiv:2210.06466, 2022 | 15 | 2022 |
Open-world text-specified object counting N Amini-Naieni, K Amini-Naieni, T Han, A Zisserman British Machine Vision Association, 2023 | 7 | 2023 |
Turbo training with token dropout T Han, W Xie, A Zisserman arXiv preprint arXiv:2210.04889, 2022 | 7 | 2022 |
Stale Diffusion: Hyper-realistic 5D Movie Generation Using Old-school Methods JF Henriques, D Campbell, T Han arXiv preprint arXiv:2404.01079, 2024 | | 2024 |
AutoAD III: The Prequel - Back to the Pixels T Han, M Bain, A Nagrani, G Varol, W Xie, A Zisserman Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | | 2024 |
A Strong Baseline for Temporal Video-Text Alignment Z Li, Q Chen, T Han, Y Zhang, Y Wang, W Xie arXiv preprint arXiv:2312.14055, 2023 | | 2023 |
Semantic Counting from Self-Collages L Knobel, T Han, YM Asano arXiv preprint arXiv:2307.08727, 2023 | | 2023 |