Yifan Du (都一凡)

I am a fourth-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China, supervised by Prof. Wayne Xin Zhao. I have a broad interest in Multimodal Large Language Models (MLLMs), especially vision-language (VL) post-training and long video understanding. Recently, I have also become interested in multimodal agents (e.g., Omni Search Agent).

Email  /  Github  /  Google Scholar  /  Twitter

I will graduate in Fall 2027. Feel free to contact me via email if you are recruiting!

Research

My current research focuses on Multimodal Large Language Models (MLLMs). I am particularly interested in visual instruction tuning, long video understanding, and complex visual reasoning. Representative works include Virgo (reproducing o1-like MLLM) and POPE (evaluating object hallucination in LVLMs).

Publications

(* denotes equal contribution.)

V-CoT Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization
Yifan Du*, Kun Zhou*, Yingqian Min, Yue Ling, Wayne Xin Zhao, Youbin Wu
CVPR 2026
Paper / Code

Seed1.5-VL Seed1.5-VL Technical Report
Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, ...Yifan Du, ...
arXiv 2025
Paper / Code

Virgo Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Yifan Du*, Zikang Liu*, Yifan Li*, Wayne Xin Zhao, Yuqi Huo, Bingning Wang, Weipeng Chen, Zheng Liu, Zhongyuan Wang, Ji-Rong Wen
arXiv 2025
Paper / Code

Opt-Visor Exploring the Design Space of Visual Context Representation in Video MLLMs
Yifan Du*, Yuqi Huo*, Kun Zhou*, Zijia Zhao, Haoyu Lu, Han Huang, Wayne Xin Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen
ICLR 2025
Paper / Code

Event-Bench Towards Event-oriented Long Video Understanding
Yifan Du*, Kun Zhou*, Yuqi Huo, Yifan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen
arXiv 2024
Paper / Code

ComVint What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du*, Hangyu Guo*, Kun Zhou*, Wayne Xin Zhao, Jinpeng Wang, Chuyuan Wang, Mingchen Cai, Ruihua Song, Ji-Rong Wen
COLING 2025
Paper / Code

POPE Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li*, Yifan Du*, Kun Zhou*, Jinpeng Wang, Wayne Xin Zhao, Ji-Rong Wen
EMNLP 2023
Paper / Code

LLM Survey A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou*, Junyi Li*, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen
arXiv 2023
Paper / Code

LAMOC Zero-shot Visual Question Answering with Language Model Feedback
Yifan Du, Junyi Li, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen
ACL 2023
Paper / Code

LIVE Learning to Imagine: Visually-Augmented Natural Language Generation
Tianyi Tang, Yushuo Chen, Yifan Du, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen
ACL 2023
Paper / Code

VLP Survey A Survey of Vision-Language Pre-Trained Models
Yifan Du*, Zikang Liu*, Junyi Li, Wayne Xin Zhao
IJCAI 2022
Paper

Education
Renmin University of China
Ph.D. student in Artificial Intelligence (2022 - 2027 Expected)
Advisor: Prof. Wayne Xin Zhao
Shandong University
B.Sc. in Statistics (2018 - 2022)
Experience
ByteDance Seed
VLM Post-training Intern (Feb. 2025 - Present)
Baichuan
MLLM Research Intern (Apr. 2024 - Jan. 2025)
Meituan Group
Research Intern (Apr. 2023 - Mar. 2024)
Open-Source Projects
  • Virgo: An MLLM with slow-thinking reasoning ability
  • POPE: A benchmark for evaluating object hallucination in MLLMs



Thanks to Jon Barron for this amazing template.