Research
I am a PhD student in Computer Science at the University of Wisconsin-Madison, advised by Prof. Tengyang Xie. My research focuses on reinforcement learning theory, AI agents, and LLM post-training. I am particularly interested in developing theoretical foundations and practical algorithms for intelligent systems.
My current work combines theoretical analysis with practical applications across these areas. I am always open to collaboration and discussion on these topics.
Research Interests
- LLM Post-training
- AI Agents
- Reinforcement Learning
- Artificial Intelligence
Research Experience
Heuristic Exploration for Self-Improving Agentic Systems
Feb. 2025 – Present
University of Wisconsin-Madison
Working with Dr. Ching-An Cheng (Microsoft Research) and Prof. Tengyang Xie
- Established a handcrafted baseline agent for the benchmark and defined reproducible evaluation metrics.
- Integrated Microsoft’s Trace framework to automate structured prompt tuning and decision-logic optimization.
- Designed and implemented novel meta-search algorithms that use Trace to improve the optimization process.
Offline Alignment for Language Models
Jan. 2024 – Present
University of Wisconsin-Madison
Advisor: Prof. Tengyang Xie
- Performed an in-depth literature review on offline reinforcement learning (RL) algorithms.
- Applied offline RL algorithms to enhance alignment in large language models (LLMs).
Offline Reinforcement Learning and Policy Evaluation Theory
Sept. 2024 – Present
University of Wisconsin-Madison
Advisor: Prof. Tengyang Xie
- Developed theoretical insights into value-based reinforcement learning (RL) algorithms.
- Theoretically analyzed estimation and approximation errors across various policy evaluation methods.
- Designed and conducted experiments to validate theoretical predictions.
Optimal Batched Linear Bandits
Aug. 2023 – Jan. 2024
Duke University
Advisor: Prof. Pan Xu (Department of Computer Science, Duke University)
- Devised an algorithm targeting both asymptotic and non-asymptotic (finite-time) optimality in the linear bandit setting, a combination not previously attained.
- Adapted the algorithm into a batched version with provably minimal batch complexity, extending its applicability to common real-world problems.
- Showed experimentally that the algorithm outperforms existing baseline methods on linear bandit problems.
Publications
Xuanfei Ren, Tianyuan Jin, Pan Xu. Optimal Batched Linear Bandits. In Proc. of the 41st International Conference on Machine Learning (ICML 2024).
