I’m a Ph.D. student at Tsinghua University, supervised by Prof. Yu Wang and Prof. Yi Wu. My research interests include language agents, reinforcement learning, and multi-agent systems.

🎓 Education

  • 2022.09 - 2026.06: Tsinghua University, Ph.D. Student
  • 2016.09 - 2020.06: Tsinghua University, B.E. with honors

💻 Internships

  • 2025.11 - 2026.04: Moonshot AI, RL Team
  • 2025.03 - 2025.10: Ant Research, RL Lab

📄 Publications

Technical Report

Highlights

  • WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
    Zelai Xu*, Zhexuan Xu*, Ruize Zhang*, Chunyang Zhu, Shi Yu, Weilin Liu, Quanlu Zhang, Wenbo Ding, Chao Yu, Yu Wang
    Preprint [paper] [code] [dataset] [model] [project page]

  • MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
    Huining Yuan*, Zelai Xu*, Zheyue Tan, Xiangmin Yi, Mo Guang, Kaiwen Long, Haojia Hui, Boxun Li, Xinlei Chen, Bo Zhao, Xiao-Ping Zhang, Chao Yu, Yu Wang
    ICLR 2026 [paper] [code] [model] [project page]

  • VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
    Zelai Xu*, Zhexuan Xu*, Xiangmin Yi, Huining Yuan, Mo Guang, Kaiwen Long, Xinlei Chen, Yi Wu, Chao Yu, Yu Wang
    CVPR 2026 Oral [paper] [code] [dataset] [project page]

  • Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
    Zelai Xu, Chao Yu, Fei Fang, Yu Wang, Yi Wu
    ICML 2024 [paper] [code] [project page]

(Co-)First Author

  • RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment
    Xiaoyang Cao*, Zelai Xu*, Mo Guang, Kaiwen Long, Michiel A Bakker, Yu Wang, Chao Yu
    ICLR 2026 [paper] [code] [project page]

  • Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization
    Zelai Xu, Wanjun Gu, Chao Yu, Yi Wu, Yu Wang
    ICML 2025 [paper]

  • Learning Global Nash Equilibrium in Team Competitive Games with Generalized Fictitious Cross-Play
    Zelai Xu, Chao Yu, Yancheng Liang, Yi Wu, Yu Wang
    JMLR 2025 [paper] [code]

  • VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
    Zelai Xu*, Ruize Zhang*, Chao Yu, Huining Yuan, Xiangmin Yi, Shilong Ji, Chuqi Wang, Wenhao Tang, Feng Gao, Wenbo Ding, Xinlei Chen, Yu Wang
    NeurIPS 2025 [paper] [code] [project page]

  • Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning
    Jiayu Chen*, Zelai Xu*, Yunfei Li, Chao Yu, Jiaming Song, Huazhong Yang, Fei Fang, Yu Wang, Yi Wu
    AAAI 2024 [paper] [code] [project page]

  • Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games
    Zelai Xu*, Yancheng Liang, Chao Yu, Yu Wang, Yi Wu
    AAMAS 2023 [paper] [code]

  • Texture BERT for Cross-Modal Texture Image Retrieval
    Zelai Xu, Yu Tan, Ping Li
    CIKM 2022 [paper]

  • Towards Efficient Evaluation of Risk via Herding
    Zelai Xu*, Tiancheng Yu*, Suvrit Sra
    ICML 2019 Workshop [paper]

  • MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation
    Lu Yang*, Zelai Xu*, Minyang Xie, Jiaxuan Gao, Zhao Shok, Yu Wang, Yi Wu
    Preprint [paper] [code]

  • AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models
    Le Qiu*, Zelai Xu*, Qixin Tan*, Wenhao Tang, Chao Yu, Yu Wang
    Preprint [paper] [code]

Collaborations

  • EARL: Efficient Agentic RL Post-Training for LLMs under Dynamic Context Lengths
    Zheyue Tan, Tuo Shi, Huining Yuan, Zelai Xu, Chao Yu, Boxun Li, Yu Wang, Bo Zhao
    EuroMLSys 2026 Workshop [paper]

  • Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
    Ruize Zhang, Sirui Xiang, Zelai Xu, Feng Gao, Shilong Ji, Wenhao Tang, Wenbo Ding, Chao Yu, Yu Wang
    CoRL 2025 [paper] [code] [project page]

  • Multi-Agent Vulnerability Discovery for Autonomous Driving Policy by Finding AV-Responsible Scenarios
    Ye Mu*, Weilin Liu*, Chao Yu, Xuefei Ning, Zhong Cao, Zelai Xu, Shuang Liang, Huazhong Yang, Yu Wang
    CASE 2024 [paper]

  • Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning
    Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu
    ICML 2022 [paper] [code] [project page]

  • A Survey on Self-Play Methods in Reinforcement Learning
    Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Wenhao Tang, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang
    Preprint [paper]

🎖 Honors and Awards

  • 2025.10: NeurIPS 2025 Top Reviewer
  • 2024.10: National Scholarship for Graduate Students
  • 2019.10: National Scholarship for Undergraduate Students