Research Stack
Current Focus
Reward Modeling
Reinforcement Learning / Imitation Learning
LLM Reasoning
Weak Supervision
RLHF / RLVR
Offline RL
Papers
Selected Publications & Preprints
- VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
- Learning View-Invariant World Models for Visual Robotic Manipulation (ICLR 2025)
- Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains (RLC 2025)
- Imitation Learning from Vague Feedback (NeurIPS 2023)
- Distributional Pareto-Optimal Multi-Objective RL (NeurIPS 2023)
- Heterogeneously Observable Imitation Learning (ICLR 2023)
Timeline
Recent Updates
- 🎉 My research proposal, "Reinforcement Learning with Unreliable Verification," has been selected for funding by the JSPS KAKENHI Grant-in-Aid for Early-Career Scientists (Wakate Kenkyū).
- 📝 VI-CuRL released on arXiv.
- 📝 PU Reinforcement Learning Distillation released on arXiv.
- 📝 Multi-Label Toxicity Evaluation released on arXiv.
- 🎉 Paper on cross-domain offline RL accepted by RLC 2025.
Contact