📝 Publications
🧠 LLM Reasoning

Language Models Can Learn from Verbal Feedback without Scalar Rewards
Renjie Luo*, Zichen Liu, Xiangyan Liu, Chao Du, Min Lin, Wenhu Chen, Wei Lu, Tianyu Pang*
- TLDR: 🚀 We show that LLMs can directly learn from verbal feedback — no scalar rewards required.
- Method: We propose the Feedback-Conditional Policy (FCP), which treats feedback as a conditioning signal (a minimal code sketch follows this entry).
- Offline stage: Learn from response–feedback pairs via simple MLE.
- Online stage: Bootstrap with fresh critiques, refining the policy iteratively.
arXiv preprint
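
For readers curious what the offline stage might look like in practice, below is a minimal sketch of feedback-conditional MLE fine-tuning with a HuggingFace causal LM. The base model, prompt template, loss masking, and toy data are illustrative assumptions rather than the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # hypothetical base model, not the paper's choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy (prompt, verbal feedback, response) triples standing in for a real dataset.
data = [
    {
        "prompt": "Solve: 12 * 7 = ?",
        "feedback": "The reasoning is correct and clearly explained.",
        "response": "12 * 7 = 84.",
    },
]

for example in data:
    # Condition on the task prompt AND the verbal feedback, then maximize the
    # likelihood of the observed response (plain MLE, no scalar reward anywhere).
    conditioning = (
        f"Prompt: {example['prompt']}\n"
        f"Feedback: {example['feedback']}\n"
        "Response: "
    )
    full_text = conditioning + example["response"]
    inputs = tokenizer(full_text, return_tensors="pt")
    labels = inputs["input_ids"].clone()
    # Mask the conditioning tokens so the loss is computed only on the response.
    prefix_len = len(tokenizer(conditioning)["input_ids"])
    labels[:, :prefix_len] = -100

    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In the online stage, fresh critiques of the model's own outputs would take the place of the offline feedback in the same conditioning slot, and training would iterate.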

Through the Valley: Path to Effective Long CoT Training for Small Language Models
Renjie Luo, Jiaxi Li, Chen Huang, Wei Lu
- TLDR: We reveal the “Long CoT Degradation” phenomenon, in which small language models (≤3B) suffer performance drops when trained on limited long chain-of-thought data, and propose effective training strategies (via RLVR) to overcome it (a toy verifiable-reward sketch follows this entry).
EMNLP 2025
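
Since the entry mentions RLVR (reinforcement learning with verifiable rewards), here is a purely illustrative toy verifiable reward: an exact-match check on a boxed final answer. The function name and answer format are assumptions, not this paper's implementation.

```python
# Toy verifiable reward of the kind RLVR-style training relies on:
# reward 1.0 if the model's last boxed answer matches the reference, else 0.0.
import re

def verifiable_reward(model_output: str, reference_answer: str) -> float:
    """Extract the last \\boxed{...} answer and compare it to the reference."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", model_output)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == reference_answer.strip() else 0.0

# A correct chain of thought ending in the right boxed answer earns reward 1.0.
print(verifiable_reward(r"... therefore the answer is \boxed{84}.", "84"))
```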

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Xu Han, et al.
- This benchmark is widely used by the RL & reasoning community, e.g., in SimpleRL-Zoo, Dr. GRPO, and Seed 1.5-VL.
ACL 2024

⚙️ LLM Evaluation Framework

UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs
Chaoqun He, Renjie Luo, Shengding Hu, Yuanqian Zhao, Jie Zhou, et al.
- Open-Source Impact: An automated, multi-dimensional evaluation framework for large language models with user-friendly and highly customizable evaluation strategies.
- Community Adoption: 200+ stars on GitHub; widely used by researchers for LLM evaluation.
- Comprehensive Features: Supports flexible evaluation strategies with a highly customizable pipeline design.
ACL 2024