About me

I am a third-year Ph.D. student supervised by Prof. Tat-Seng Chua in NExT++, School of Computing, National University of Singapore. Before that, I spent one year at THUNLP, Tsinghua University, as a research assistant supervised by Prof. Zhiyuan Liu. During my undergraduate studies at Nanjing University, I was a member of MAGUS, supervised by Prof. Tongwei Ren. My research interests include scene graph generation and pre-trained models for vision and language understanding.

Chat with the [VL-Vicuna] built with our VPGTrans!

Preprints

* indicates equal contribution.

 • CPT: Colorful prompt tuning for pre-trained vision-language models. [arxiv] [code]
  Yuan Yao*, Ao Zhang*, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun.

 • Pre-Trained Models: Past, Present and Future. [arxiv]
  Xu Han*, Zhengyan Zhang*, Ning Ding*, Yuxian Gu*, Xiao Liu*, Yuqi Huo*, Jiezhong Qiu, Yuan Yao, Ao Zhang, Liang Zhang, Wentao Han, Minlie Huang, Qin Jin, Yanyan Lan, Yang Liu, Zhiyuan Liu, Zhiwu Lu, Xipeng Qiu, Ruihua Song, Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu.

Publications

2024

 • MiniCPM-V: A GPT-4V Level MLLM on Your Phone. [huggingface] [code]
  Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Ronghua Zhou, Zhihui He, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, and Maosong Sun. (report is on the way)

 • NExT-Chat: An LMM for Chat, Detection and Segmentation. [website] [arxiv] [code]
  Ao Zhang, Yuan Yao#, Wei Ji, Zhiyuan Liu, and Tat-Seng Chua. (#Correspondence)
  International Conference on Machine Learning (ICML 2024)

2023

 • VPGTrans: Transfer Visual Prompt Generator across LLMs. [demo] [arxiv] [code] [zhihu]
  Ao Zhang, Hao Fei#, Yuan Yao#, Wei Ji, Li Li, Zhiyuan Liu, and Tat-Seng Chua. (#Correspondence)
  Conference on Neural Information Processing Systems (NeurIPS 2023)

2022

 • Fine-Grained Scene Graph Generation with Data Transfer. [arxiv] [code]
  Ao Zhang*, Yuan Yao*, Qianyu Chen, Wei Ji, Zhiyuan Liu, Maosong Sun, Tat-Seng Chua.
  European Conference on Computer Vision (ECCV 2022)
  (Oral Presentation, 2.7%)

 • PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models. [arxiv] [code]
  Yuan Yao, Qianyu Chen, Ao Zhang, Wei Ji, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun.
  Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)

2021

 • Visual Distant Supervision for Scene Graph Generation. [arxiv] [code]
  Yuan Yao*, Ao Zhang*, Xu Han, Mengdi Li, Cornelius Weber, Zhiyuan Liu, Stefan Wermter, Maosong Sun.
  International Conference on Computer Vision (ICCV 2021)

Previous

 • Monocular image based 3D model retrieval. [paper]
  Wenhui Li, Anan Liu, Weizhi Nie, Dan Song, Yuqian Li, Weijie Wang, Shu Xiang, Heyu Zhou, Ngoc-Minh Bui, Yunchi Cen, Zenian Chen, Huy-Hoang Chung-Nguyen, Gia-Han Diep, Trong-Le Do, Eugeni L. Doubrovski, Anh-Duc Duong, Jo M. P. Geraedts, Haobin Guo, Trung-Hieu Hoang, Yichen Li, Xing Liu, Zishun Liu, Duc-Tuan Luu, Yunsheng Ma, Vinh-Tiep Nguyen, Jie Nie, Tongwei Ren, Mai-Khiem Tran, Son-Thanh Tran-Nguyen, Minh-Triet Tran, The-Anh Vu-Le, Charlie C. L. Wang, Shijie Wang, Gangshan Wu, Caifei Yang, Meng Yuan, Hao Zhai, Ao Zhang, Fan Zhang, and Sicheng Zhao.
  Eurographics Workshop on 3D Object Retrieval (EGW’19-3DOR), Genoa, Italy, 2019.

Book Chapter

 • RGB-D salient object detection: a review. Chapter of book “RGB-D image analysis and processing”, edited by Paul Rosin, Yu-Kun Lai, Ling Shao, and Yonghuai Liu, 2019. [link]
  Tongwei Ren, and Ao Zhang.