Hello! I am Zijun Gao. I received my B.S. in Mathematics and Computer Science from the University of Illinois Urbana-Champaign in December 2025. My research focuses on the intersection of LLM reasoning, reinforcement learning (RL), and AI agents, with a particular emphasis on large-scale post-training and reasoning enhancement.
Currently, I am a Research Intern at Northwestern University's MLL Lab, advised by Prof. Manling Li, where I work on agent RL. I have also conducted research at Arizona State University's ARC Lab under the supervision of Prof. Ben Zhou, on improving mathematical reasoning in large language models.
I am actively seeking industry or research roles focused on LLM post-training (RL) and RL infrastructure development. This includes:
- Training stronger reasoning models (e.g., DeepSeek-R1, Qwen-style reasoning models)
- Improving agent training frameworks in reinforcement learning settings (e.g., veRL, RAGEN)
Please feel free to contact me at zijung3@illinois.edu. I would be happy to discuss collaboration or opportunities.
Research Interests
- Reinforcement Learning: On/off-policy algorithms and RL Infra optimization.
- LLM Post-training: RLHF alignment, CoT reasoning, and instruction tuning.
- Multimodal Intelligence: Multi-sensory reasoning across Vision, Speech, and Language.
- Collaborative Agents: Multi-agent systems featuring autonomous planning, memory, and interaction.
Academic Research
Northwestern University – MLL Lab May 2025 – Present
Multi-Agent Collaborative Training (MAGEN): Developed the MAGEN framework, a multi-turn multi-agent reinforcement learning pipeline for collaborative training of LLMs and VLMs.
Outcome: Co-First Author, Project in Progress. Supervised by Prof. Manling Li.
Arizona State University – ARC Lab Feb 2025 – Dec 2025
Concept-Oriented Reinforcement Learning (CORE): Proposed the CORE framework, which uses reinforcement learning to bridge concept definitions and mathematical reasoning. Achieved consistent improvements on both in-domain and out-of-domain mathematical benchmarks.
Outcome: First Author, ICLR 2026. Supervised by Prof. Ben Zhou. [Paper] [Code]
University of California, San Diego Jul 2024 – Jan 2025
Photorealistic World Simulator (SimWorld): Built photorealistic 3D environments for multi-agent interaction. Developed automated asset pipelines using UnrealCV and Blender, and contributed to large-scale dataset generation.
Outcome: CVPR 2025 Demo Track. Supervised by Prof. Zhiting Hu and Prof. Lianhui Qin.
Education
University of Illinois Urbana-Champaign (UIUC)
Jan 2024 – Dec 2025
B.S. in Mathematics and Computer Science, Highest Distinction
Beijing Jiaotong University (BJTU)
Aug 2021 – Dec 2023
B.E. in Computer Science and Technology, Top 5% (Transferred)