Yi Hu (胡逸)

Email: huyi2002 (at) stu (dot) pku (dot) edu (dot) cn

I am now a first-year Ph.D. student at the Institute for Artificial Intelligence, Peking University, advised by Prof. Muhan Zhang. Welcome to visit our group page: Graphpku!

I received my B.S. in Physics from School of Physics, Peking University, where I was fortunate to work with Prof. Huichao Song on ML applications in heavy-ion collision.

I am dedicated to exploring the reasoning mechanisms of large language models (LLMs) and researching how to enhance the models’ reasoning capabilities to the level of human experts. Additionally, I have a keen interest in various topics related to LLMs, including efficiency, alignment, and applications across different downstream domains etc. If you share these research interests, please feel free to get in touch!

(I have recently moved to this site, the website is still under construction.)

News

Sep 23, 2025	🥳🥳 Two papers accepted at NeurIPS 2025, check out here: Meta-RFFT and PHYBench. One paper accepted at NeurIPS Mech Interp Workshop: What Affects the Effective Depth of Large Language Models? (Waiting for the openreview link come out.)
Apr 23, 2025	🔥🔥 We collaborate with School of Physics, Peking University and release PHYBench, a physical reasoning benchmark for modern LLMs. Covering mechanics, electromagnetism, thermodynamics, optics, modern physics, and advanced physics, the benchmark spans difficulty levels from high school exercises to undergraduate problems and Physics Olympiad challenges. Check out our paper: PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models.
Feb 17, 2025	How to enhance the rule-following ability of LLMs? 🤔 We propose Meta Rule-Following Fine-Tuning (Meta-RFFT) to improve the cross-task transferability of rule-following abilities. We construct a dataset of 88 length generalization tasks and show Meta-RFFT helps models to outperform baselines in both downstream fine-tuning and few-shot prompting scenarios. 👋 Check out our new paper: Training Large Language Models to be Better Rule Followers!
Jan 23, 2025	NUPA got accepted by ICLR 2025🥳🥳!! Check out our paper: Number Cookbook: Number Understanding of Language Models and How to Improve It!
Nov 07, 2024	Why `9.9 > 9.11` is so hard for LLMs? 😫 Check out our new paper: Number Cookbook: Number Understanding of Language Models and How to Improve It, where we introduce a comprehensive benchmark covering four common numerical representations and 17 distinct numerical tasks and investigate the numerical understanding and processing ability (NUPA) of LLMs.
Sep 10, 2024	I begin my Ph.D. in the Institute for Artificial Intelligence, Peking University! ✨🥳
Jul 20, 2024	✋ I will be at ICML 2024 @ Vienna, Austria, presenting our paper: Case-Based or Rule-Based: How Do Transformers Do the Math?. 🤩 Looking forward to meeting researchers and having discussions!
Jul 02, 2024	I graduate from School of Physics, Peking University and accomplish my Bachelor degree today 🎓🥳!
May 02, 2024	Our paper: Case-Based or Rule-Based: How Do Transformers Do the Math? is accepted by ICML 2024 !!! In our paper, we demonstrate that current large language models are performing case-based reasoning rather than rule-based reasoning like humans in math reasoning tasks, revealing the intrinsic reasons for the models’ limitations in length generalization. To bridge this gap and shift the models’ reasoning paradigm closer to rule-based reasoning, we propose Rule-Following Fine-Tuning. This method enhances the models’ ability to follow rules, thereby improving their performance in length generalization.

Selected publications

ICML2024

Case-Based or Rule-Based: How Do Transformers Do the Math?

Yi Hu, Xiaojuan Tang, Haotong Yang, and Muhan Zhang

2024

Paper Code Poster
ICLR2025

Number Cookbook: Number Understanding of Language Models and How to Improve It

Haotong Yang, Yi Hu, Shijia Kang, Zhouchen Lin, and Muhan Zhang

2024

Paper Code
preprint

RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?

Haotian Xu, Xing Wu, Weinong Wang, Zhongzhi Li, Da Zheng, Boyuan Chen, Yi Hu, Shijia Kang, Jiaming Ji, Yingying Zhang, Zhijiang Guo, Yaodong Yang, Muhan Zhang, and Debing Zhang

2025

Paper Code
NIPS2025

Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs

Yi Hu, Shijia Kang, Haotong Yang, Haotian Xu, and Muhan Zhang

2025

Paper
NIPS2025

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Weike Wang, Xudong Tian, Anqi Lv, Laifu Man, Jianxiang Li, Feiyu Tao, Qihua Sun, Zhou Liang, Yushu Mu, Zhongxuan Li, Jing-Jun Zhang, Shutao Zhang, Xiaotian Li, Xingqi Xia, Jiawei Lin, Zheyu Shen, Jiahang Chen, Qiuhao Xiong, Binran Wang, Fengyuan Wang, Ziyang Ni, Bohan Zhang, Fan Cui, Changkun Shao, Qing-Hong Cao, Ming-xing Luo, Muhan Zhang, and Hua Xing Zhu

2025

Paper
NIPS25 MI Workshop

What Affects the Effective Depth of Large Language Models?

Yi Hu, Cai Zhou, and Muhan Zhang

2025

Paper