Hello! I am Zijun Gao. I received my B.S. in Mathematics and Computer Science from the University of Illinois Urbana-Champaign in December 2025. My research sits at the intersection of LLM reasoning, Reinforcement Learning (RL), and AI agents, with a particular emphasis on large-scale post-training and reasoning enhancement.

Currently, I am a Research Intern at Northwestern University's MLL Lab, advised by Prof. Manling Li, where I work on agent RL. I have also conducted research at Arizona State University's ARC Lab, under the supervision of Prof. Ben Zhou, on improving mathematical reasoning in large language models.

I am actively seeking industry or research roles focused on LLM Post-training (RL) and RL Infra development. This includes:

  • Training stronger reasoning models (e.g., DeepSeek-R1, Qwen-style reasoning models)
  • Improving agent training frameworks in reinforcement learning settings (e.g., veRL, RAGEN)

Please feel free to contact me at zijung3@illinois.edu. I would be happy to discuss collaborations or other opportunities.

News

  • [Jan 2026] CORE paper accepted at ICLR 2026!
  • [Mar 2025] SimWorld accepted at CVPR 2025 Demo Track!

Research Interests

  • Reinforcement Learning: On/off-policy algorithms and RL Infra optimization.
  • LLM Post-training: RLHF alignment, CoT reasoning, and instruction tuning.
  • Multimodal Intelligence: Multi-sensory reasoning across Vision, Speech, and Language.
  • Collaborative Agents: Multi-agent systems featuring autonomous planning, memory, and interaction.

Academic Research

Northwestern University – MLL Lab May 2025 – Present

Multi-Agent Collaborative Training (MAGEN): Developed the MAGEN framework, a multi-turn multi-agent reinforcement learning pipeline for collaborative training of LLMs and VLMs.
Outcome: Co-First Author, Project in Progress. Supervised by Prof. Manling Li.

Arizona State University – ARC Lab Feb 2025 – Dec 2025

Concept-Oriented Reinforcement Learning (CORE): Proposed the CORE framework to bridge concept definitions and mathematical reasoning through reinforcement learning, and achieved consistent improvements on both in-domain and out-of-domain mathematical benchmarks.
Outcome: First Author, ICLR 2026. Supervised by Prof. Ben Zhou. [Paper] [Code]

University of California, San Diego Jul 2024 – Jan 2025

Photorealistic World Simulator (SimWorld): Built photorealistic 3D environments for multi-agent interaction. Developed automated asset pipelines using UnrealCV and Blender, and contributed to large-scale dataset generation.
Outcome: CVPR 2025 Demo Track. Supervised by Prof. Zhiting Hu and Prof. Lianhui Qin.

Education

University of Illinois Urbana-Champaign (UIUC) Jan 2024 – Dec 2025
B.S. in Mathematics and Computer Science, Highest Distinction

Beijing Jiaotong University (BJTU) Aug 2021 – Dec 2023
B.E. in Computer Science and Technology, Top 5% (Transferred)