Curriculum learning for vision-and-language navigation J Zhang, Z Wei, J Fan, J Peng Advances in Neural Information Processing Systems 34, 13328-13339, 2021 | 19 | 2021 |
Android in the zoo: Chain-of-action-thought for GUI agents J Zhang, J Wu, Y Teng, M Liao, N Xu, X Xiao, Z Wei, D Tang arXiv preprint arXiv:2403.02713, 2024 | 10 | 2024 |
ReForm-Eval: Evaluating large vision language models via unified re-formulation of task-oriented benchmarks Z Li, Y Wang, M Du, Q Liu, B Wu, J Zhang, C Zhou, Z Fan, J Fu, J Chen, ... arXiv preprint arXiv:2310.02569, 2023 | 3 | 2023 |
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models Z Li, R Luo, J Zhang, M Qiu, Z Wei arXiv preprint arXiv:2405.16919, 2024 | 1 | 2024 |
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning M Du, B Wu, J Zhang, Z Fan, Z Li, R Luo, X Huang, Z Wei arXiv preprint arXiv:2404.01994, 2024 | 1 | 2024 |
UI-Hawk: Unleashing the Screen Stream Understanding for GUI Agents J Zhang, Y Yu, M Liao, W Li, J Wu, Z Wei Preprints, 2024 | | 2024 |
Breaking Down the Task: A Unit-Grained Hybrid Training Framework for Vision and Language Decision Making R Luo, J Zhang, Z Wei arXiv preprint arXiv:2307.08016, 2023 | | 2023 |