Research

I am currently a PhD student in Computer Science at the University of Wisconsin-Madison, advised by Prof. Tengyang Xie. My research focuses on reinforcement learning theory, AI agents, and LLM post-training. I am particularly interested in developing theoretical foundations and practical algorithms for intelligent systems.

My current work pairs theoretical understanding with practical applications in AI and machine learning. I am always open to collaborations and discussions on these topics.

Research Interests

LLM Post-training
AI Agents
Reinforcement Learning
Artificial Intelligence

Research Experience

Heuristic Exploration for Self-Improving Agentic Systems

Feb. 2025 – Present
University of Wisconsin-Madison
Working with Dr. Ching-An Cheng (Microsoft Research) and Prof. Tengyang Xie

Established a handcrafted baseline agent for the benchmark and defined reproducible evaluation metrics.
Integrated Microsoft’s Trace framework to automate structured prompt tuning and decision-logic optimization.
Designed and implemented novel meta-search algorithms that exploit Trace to enhance the optimization process.

Offline Alignment for Language Models

Jan. 2024 – Present
University of Wisconsin-Madison
Advisor: Prof. Tengyang Xie

Performed an in-depth literature review on offline reinforcement learning (RL) algorithms.
Applied offline RL algorithms to enhance alignment in large language models (LLMs).

Offline Reinforcement Learning and Policy Evaluation Theory

Sept. 2024 – Present
University of Wisconsin-Madison
Advisor: Prof. Tengyang Xie

Developed theoretical insights into value-based reinforcement learning (RL) algorithms.
Theoretically analyzed estimation and approximation errors across various policy evaluation methods.
Designed and conducted experiments to validate theoretical predictions.

Optimal Batched Linear Bandits

Aug. 2023 – Jan. 2024
Duke University
Advisor: Prof. Pan Xu (Department of Computer Science, Duke University)

Devised an algorithm that attains both asymptotic and non-asymptotic optimality in the linear bandit setting, a combination not previously achieved.
Adapted the algorithm into a batched version with provably minimal batch complexity, extending its applicability to common real-world problems.
Showed through experiments that the algorithm outperforms existing baseline methods on linear bandit problems.

Publications

Xuanfei Ren, Tianyuan Jin, Pan Xu. Optimal Batched Linear Bandits. In Proc. of the 41st International Conference on Machine Learning (ICML 2024).