Xin-Qiang Cai / Research Log

Xin-Qiang Cai

Reinforcement Learning, Reward Modeling, and Weak Supervision. Postdoctoral Researcher, Imperfect Information Learning Team, RIKEN AIP (Tokyo).

$ role --current
postdoc / riken-aip
$ focus --areas
reward modeling · rl · weak supervision
$ location
tokyo, japan
Research Stack

Current Focus

Reward Modeling · Reinforcement Learning · Imitation Learning · Weak Supervision · RLHF · Reliability · Offline RL
Papers

Selected Publications & Preprints

  1. VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
  2. Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models
  3. Rethinking Toxicity Evaluation in Large Language Models: A Multi-Label Perspective
  4. Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
  5. PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models
  6. UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality
  7. Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains (RLC 2025)
  8. Imitation Learning from Vague Feedback (NeurIPS 2023)
  9. Distributional Pareto-Optimal Multi-Objective RL (NeurIPS 2023)
  10. Heterogeneously Observable Imitation Learning (ICLR 2023)

For the complete list, visit Publications.

Courses

Teaching

  • 2026 S1S2 (Graduate): Special Topics in Mechano-Informatics II at The University of Tokyo
  • 2026 Spring (Undergraduate/Graduate): Special Lecture on Information Science IV at Ochanomizu University

For the complete list, visit Teaching.

Timeline

Recent Updates

  • 📖 Lecture on "An Introduction to Reinforcement Learning" for the course "Special Lecture on Information Science IV" at Ochanomizu University.
  • 📖 Lecture on "An Introduction to Reinforcement Learning" for the course "Special Topics in Mechano-Informatics II" at The University of Tokyo.
  • 🎉 My research proposal titled "Reinforcement Learning with Unreliable Verification" has been selected for funding by the JSPS Grant-in-Aid for Early-Career Scientists (KAKENHI 若手研究).
  • VI-CuRL released on arXiv.
  • PU Reinforcement Learning Distillation released on arXiv.
  • Multi-Label Toxicity Evaluation released on arXiv.
  • Paper on cross-domain offline RL accepted by RLC 2025.

Contact

Correspondence

For research collaboration, please reach out via email. The full publication list and paper PDFs are available through the linked files and Google Scholar.