Yifan Du (都一凡)

I am a third-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China, where I am fortunate to be advised by Prof. Wayne Xin Zhao. My research centers on Multimodal Large Language Models (MLLMs), with a focus on visual instruction tuning, long video understanding, and complex visual reasoning. I welcome discussion, so please feel free to drop me an email. :)

Education

  • Ph.D. student in Artificial Intelligence, Renmin University of China, 2022-2027 (Expected)

  • B.Sc. in Statistics, Shandong University, 2018-2022

Experience

  • 2024/04 - present: Baichuan, MLLM Research Intern

  • 2023/04 - 2024/03: Meituan Group, Research Intern

Publications

  • Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
    Yifan Du†, Zikang Liu†, Yifan Li†, Wayne Xin Zhao, Yuqi Huo, Bingning Wang, Weipeng Chen, Zheng Liu, Zhongyuan Wang, Ji-Rong Wen
    [pdf] [code]

  • Exploring the Design Space of Visual Context Representation in Video MLLMs (arXiv)
    Yifan Du†, Yuqi Huo†, Kun Zhou†, Zijia Zhao, Haoyu Lu, Han Huang, Wayne Xin Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen
    [pdf] [code]

  • Towards Event-oriented Long Video Understanding (arXiv)
    Yifan Du†, Kun Zhou†, Yuqi Huo, Yifan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen
    [pdf] [code]

  • What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning (COLING 2025)
    Yifan Du†, Hangyu Guo†, Kun Zhou†, Wayne Xin Zhao, Jinpeng Wang, Chuyuan Wang, Mingchen Cai, Ruihua Song, Ji-Rong Wen
    [pdf] [code]

  • Evaluating Object Hallucination in Large Vision-Language Models (EMNLP 2023)
    Yifan Li†, Yifan Du†, Kun Zhou†, Jinpeng Wang, Wayne Xin Zhao, Ji-Rong Wen
    [pdf] [code]

  • A Survey of Large Language Models
    Wayne Xin Zhao, Kun Zhou†, Junyi Li†, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen
    [pdf] [code]

  • Zero-shot Visual Question Answering with Language Model Feedback (ACL 2023 Findings)
    Yifan Du, Junyi Li, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen
    [pdf] [code]

  • Learning to Imagine: Visually-Augmented Natural Language Generation (ACL 2023)
    Tianyi Tang, Yushuo Chen, Yifan Du, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen
    [pdf] [code]

  • A Survey of Vision-Language Pre-Trained Models (IJCAI 2022)
    Yifan Du†, Zikang Liu†, Junyi Li, Wayne Xin Zhao
    [pdf] [code]

Open-Source Projects

  • Model: Virgo, an MLLM with slow-thinking reasoning ability
  • Benchmark: POPE, a benchmark for evaluating object hallucination in MLLMs