Research
My current research focuses on Multimodal Large Language Models (MLLMs), with particular interests in visual instruction tuning, long video understanding, and complex visual reasoning. Representative works include Virgo (a preliminary exploration of reproducing an o1-like MLLM) and POPE (a benchmark for evaluating object hallucination in LVLMs).
Publications
(* denotes equal contribution.)
Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization
Yifan Du*, Kun Zhou*, Yingqian Min, Yue Ling, Wayne Xin Zhao, Youbin Wu
CVPR 2026
Paper / Code
Seed1.5-VL Technical Report
Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, ...Yifan Du, ...
arXiv 2025
Paper / Code
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Yifan Du*, Zikang Liu*, Yifan Li*, Wayne Xin Zhao, Yuqi Huo, Bingning Wang, Weipeng Chen, Zheng Liu, Zhongyuan Wang, Ji-Rong Wen
arXiv 2025
Paper / Code
Exploring the Design Space of Visual Context Representation in Video MLLMs
Yifan Du*, Yuqi Huo*, Kun Zhou*, Zijia Zhao, Haoyu Lu, Han Huang, Wayne Xin Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen
ICLR 2025
Paper / Code
Towards Event-oriented Long Video Understanding
Yifan Du*, Kun Zhou*, Yuqi Huo, Yifan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen
arXiv 2024
Paper / Code
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du*, Hangyu Guo*, Kun Zhou*, Wayne Xin Zhao, Jinpeng Wang, Chuyuan Wang, Mingchen Cai, Ruihua Song, Ji-Rong Wen
COLING 2025
Paper / Code
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li*, Yifan Du*, Kun Zhou*, Jinpeng Wang, Wayne Xin Zhao, Ji-Rong Wen
EMNLP 2023
Paper / Code
A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou*, Junyi Li*, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen
arXiv 2023
Paper / Code
Zero-shot Visual Question Answering with Language Model Feedback
Yifan Du, Junyi Li, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen
ACL 2023
Paper / Code
Learning to Imagine: Visually-Augmented Natural Language Generation
Tianyi Tang, Yushuo Chen, Yifan Du, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen
ACL 2023
Paper / Code
A Survey of Vision-Language Pre-Trained Models
Yifan Du*, Zikang Liu*, Junyi Li, Wayne Xin Zhao
IJCAI 2022
Paper
Experience
ByteDance Seed
VLM Post-training Intern (Feb. 2025 - Present)
Baichuan
MLLM Research Intern (Apr. 2024 - Jan. 2025)
Meituan Group
Research Intern (Apr. 2023 - Mar. 2024)
Open-Source Projects
- Virgo: an MLLM with slow-thinking reasoning ability
- POPE: a benchmark for evaluating object hallucination in MLLMs