I’m a Ph.D. student at Tsinghua University, supervised by Prof. Yu Wang and Prof. Yi Wu. My research interests include language agents, reinforcement learning, and multi-agent systems.
🎓 Education
- 2022.09 - 2026.06: Tsinghua University, Ph.D. Student
- 2016.09 - 2020.06: Tsinghua University, B.E. with honors
💻 Internships
- 2025.11 - 2026.04: Moonshot AI, RL Team
- 2025.03 - 2025.10: Ant Research, RL Lab
📄 Publications
Technical Report
- Kimi K2.5: Visual Agentic Intelligence
Zelai Xu contributed to Agent Swarm
Preprint [paper] [blog] [agent swarm]
Highlights
- WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
Zelai Xu*, Zhexuan Xu*, Ruize Zhang*, Chunyang Zhu, Shi Yu, Weilin Liu, Quanlu Zhang, Wenbo Ding, Chao Yu, Yu Wang
Preprint [paper] [code] [dataset] [model] [project page]
- MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
Huining Yuan*, Zelai Xu*, Zheyue Tan, Xiangmin Yi, Mo Guang, Kaiwen Long, Haojia Hui, Boxun Li, Xinlei Chen, Bo Zhao, Xiao-Ping Zhang, Chao Yu, Yu Wang
ICLR 2026 [paper] [code] [model] [project page]
- VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
Zelai Xu*, Zhexuan Xu*, Xiangmin Yi, Huining Yuan, Mo Guang, Kaiwen Long, Xinlei Chen, Yi Wu, Chao Yu, Yu Wang
CVPR 2026 Oral [paper] [code] [dataset] [project page]
- Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
Zelai Xu, Chao Yu, Fei Fang, Yu Wang, Yi Wu
ICML 2024 [paper] [code] [project page]
(Co-)First Author
- RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment
Xiaoyang Cao*, Zelai Xu*, Mo Guang, Kaiwen Long, Michiel A Bakker, Yu Wang, Chao Yu
ICLR 2026 [paper] [code] [project page]
- Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization
Zelai Xu, Wanjun Gu, Chao Yu, Yi Wu, Yu Wang
ICML 2025 [paper]
- Learning Global Nash Equilibrium in Team Competitive Games with Generalized Fictitious Cross-Play
Zelai Xu, Chao Yu, Yancheng Liang, Yi Wu, Yu Wang
JMLR 2025 [paper] [code]
- VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
Zelai Xu*, Ruize Zhang*, Chao Yu, Huining Yuan, Xiangmin Yi, Shilong Ji, Chuqi Wang, Wenhao Tang, Feng Gao, Wenbo Ding, Xinlei Chen, Yu Wang
NeurIPS 2025 [paper] [code] [project page]
- Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning
Jiayu Chen*, Zelai Xu*, Yunfei Li, Chao Yu, Jiaming Song, Huazhong Yang, Fei Fang, Yu Wang, Yi Wu
AAAI 2024 [paper] [code] [project page]
- Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games
Zelai Xu*, Yancheng Liang, Chao Yu, Yu Wang, Yi Wu
AAMAS 2023 [paper] [code]
- Texture BERT for Cross-Modal Texture Image Retrieval
Zelai Xu, Yu Tan, Ping Li
CIKM 2022 [paper]
- Towards Efficient Evaluation of Risk via Herding
Zelai Xu*, Tiancheng Yu*, Suvrit Sra
ICML 2019 Workshop [paper]
- MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation
Lu Yang*, Zelai Xu*, Minyang Xie, Jiaxuan Gao, Zhao Shok, Yu Wang, Yi Wu
Preprint [paper] [code]
- AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models
Le Qiu*, Zelai Xu*, Qixin Tan*, Wenhao Tang, Chao Yu, Yu Wang
Preprint [paper] [code]
Collaborations
- EARL: Efficient Agentic RL Post-Training for LLMs under Dynamic Context Lengths
Zheyue Tan, Tuo Shi, Huining Yuan, Zelai Xu, Chao Yu, Boxun Li, Yu Wang, Bo Zhao
EuroMLSys 2026 Workshop [paper]
- Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
Ruize Zhang, Sirui Xiang, Zelai Xu, Feng Gao, Shilong Ji, Wenhao Tang, Wenbo Ding, Chao Yu, Yu Wang
CoRL 2025 [paper] [code] [project page]
- Multi-Agent Vulnerability Discovery for Autonomous Driving Policy by Finding AV-Responsible Scenarios
Ye Mu*, Weilin Liu*, Chao Yu, Xuefei Ning, Zhong Cao, Zelai Xu, Shuang Liang, Huazhong Yang, Yu Wang
CASE 2024 [paper]
- Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning
Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu
ICML 2022 [paper] [code] [project page]
- A Survey on Self-Play Methods in Reinforcement Learning
Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Wenhao Tang, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang
Preprint [paper]
🎖 Honors and Awards
- 2025.10: NeurIPS 2025 Top Reviewer
- 2024.10: National Scholarship for Graduate Students
- 2019.10: National Scholarship for Undergraduate Students