Twins: Revisiting the design of spatial attention in vision transformers | X Chu, Z Tian, Y Wang, B Zhang, H Ren, X Wei, H Xia, C Shen | Advances in Neural Information Processing Systems 34, 2021 | 1044 | 2021 |
End-to-End Video Instance Segmentation with Transformers | Y Wang, Z Xu, X Wang, C Shen, B Cheng, H Shen, H Xia | IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2021 | 803 | 2021 |
Centermask: single shot instance segmentation with point representation | Y Wang, Z Xu, H Shen, B Cheng, L Yang | Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 103 | 2020 |
Loong: Generating minute-level long videos with autoregressive language models | Y Wang, T Xiong, D Zhou, Z Lin, Y Zhao, B Kang, J Feng, X Liu | arXiv preprint arXiv:2410.02757, 2024 | 4 | 2024 |
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | T Xiong, Y Wang, D Zhou, Z Lin, J Feng, X Liu | arXiv preprint arXiv:2410.10816, 2024 | | 2024 |