Research Stack
Current Focus
- Reward Modeling
- Reinforcement Learning
- Imitation Learning
- Weak Supervision
- RLHF Reliability
- Offline RL
Papers
Selected Publications & Preprints
- VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
- Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models
- Rethinking Toxicity Evaluation in Large Language Models: A Multi-Label Perspective
- Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
- PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models
- UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality
- Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains (RLC 2025)
- Imitation Learning from Vague Feedback (NeurIPS 2023)
- Distributional Pareto-Optimal Multi-Objective RL (NeurIPS 2023)
- Heterogeneously Observable Imitation Learning (ICLR 2023)
Courses
Teaching
- 2026 S1S2 (Graduate): Special Topics in Mechano-Informatics II at The University of Tokyo
- 2026 Spring (Undergraduate/Graduate): Special Lecture on Information Science IV at Ochanomizu University
Timeline
Recent Updates
- 📖 Lecture on "An Introduction to Reinforcement Learning" for the course "Special Lecture on Information Science IV" at Ochanomizu University.
- 📖 Lecture on "An Introduction to Reinforcement Learning" for the course "Special Topics in Mechano-Informatics II" at The University of Tokyo.
- 🎉 My research proposal titled "Reinforcement Learning with Unreliable Verification" has been selected for funding by the JSPS Grant-in-Aid for Early-Career Scientists (KAKENHI 若手研究).
- VI-CuRL released on arXiv.
- Positive-Unlabeled (PU) Reinforcement Learning Distillation released on arXiv.
- Multi-Label Toxicity Evaluation released on arXiv.
- Paper on cross-domain offline RL accepted at RLC 2025.
Contact