MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models D Zhu*, J Chen*, X Shen, X Li, M Elhoseiny ICLR 2024, 2023 | 1324 | 2023 |
MiniGPT-v2: Large Language Model As a Unified Interface For Vision-Language Multi-task Learning J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang, R Krishnamoorthi, ... arXiv preprint arXiv:2310.09478, 2023 | 246 | 2023 |
VisualGPT: Data-efficient adaptation of pretrained language models for image captioning J Chen, H Guo, K Yi, B Li, M Elhoseiny CVPR 2022, 2021 | 174 | 2021 |
ChatGPT asks, BLIP-2 answers: Automatic questioning towards enriched visual descriptions D Zhu, J Chen, K Haydarov, X Shen, W Zhang, M Elhoseiny arXiv preprint arXiv:2303.06594, 2023 | 71 | 2023 |
Predicting candidate genes from phenotypes, functions, and anatomical site of expression J Chen, AT Althagafi, R Hoehndorf Bioinformatics 2020, 2020 | 49 | 2020 |
DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes W Liu-Wei, S Kafkas, J Chen, NJ Dimonaco, J Tegnér, R Hoehndorf Bioinformatics 2021, 2021 | 44 | 2021 |
Exploring long tail visual relationship recognition with large vocabulary S Abdelkarim, A Agarwal, P Achlioptas, J Chen, J Huang, B Li, K Church, ... ICCV 2021, 15921-15930, 2021 | 35* | 2021 |
Efficient self-supervised vision pretraining with local masked reconstruction J Chen, M Hu, B Li, M Elhoseiny arXiv preprint arXiv:2206.00790, 2022 | 33 | 2022 |
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions J Chen, D Zhu, K Haydarov, X Li, M Elhoseiny arXiv preprint arXiv:2304.04227, 2023 | 26 | 2023 |
LLM as a robotic brain: Unifying egocentric memory and control J Mai, J Chen, G Qian, M Elhoseiny, B Ghanem arXiv, 2023 | 24 | 2023 |
Exploring open-vocabulary semantic segmentation from CLIP vision encoder distillation only J Chen, D Zhu, G Qian, B Ghanem, Z Yan, C Zhu, F Xiao, SC Culatana, ... Proceedings of the IEEE/CVF International Conference on Computer Vision, 699-710, 2023 | 21* | 2023 |
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition J Chen, A Agarwal, S Abdelkarim, D Zhu, M Elhoseiny CVPR 2022, 19507-19517, 2022 | 14* | 2022 |
Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation U Akujuobi, J Chen, M Elhoseiny, M Spranger, X Zhang NeurIPS 2020, 2020 | 13 | 2020 |
Efficient long-distance relation extraction with DG-SpanBERT J Chen, R Hoehndorf, M Elhoseiny, X Zhang Technical Report, 2020 | 10 | 2020 |
MammalNet: A Large-scale Video Benchmark for Mammal Recognition and Behavior Understanding J Chen, M Hu, DJ Coker, ML Berumen, B Costelloe, S Beery, A Rohrbach, ... CVPR 2023, 13052-13061, 2023 | 9 | 2023 |
An introduction to vision-language modeling F Bordes, RY Pang, A Ajay, AC Li, A Bardes, S Petryk, O Mañas, Z Lin, ... arXiv preprint arXiv:2405.17247, 2024 | 2 | 2024 |
MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis A Alkhaldi, R Alnajim, L Alabdullatef, R Alyahya, J Chen, D Zhu, A Alsinan, ... arXiv preprint arXiv:2407.04106, 2024 | | 2024 |
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time S Chowdhury, S Nag, S Dasgupta, J Chen, M Elhoseiny, R Gao, ... arXiv preprint arXiv:2407.01851, 2024 | | 2024 |