Xin-Qiang Cai / Research Log

Xin-Qiang Cai

Reinforcement Learning, Reward Modeling, and Weak Supervision. Postdoctoral Researcher, Imperfect Information Learning Team, RIKEN AIP (Tokyo).

$ role --current
postdoc / riken-aip
$ focus --areas
reward modeling · rl · weak supervision
$ location
tokyo, japan
Research Stack

Current Focus

Reward Modeling · Reinforcement Learning / Imitation Learning · LLM Reasoning · Weak Supervision · RLHF / RLVR · Offline RL
Papers

Selected Publications & Preprints

  1. VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
  2. Learning View-Invariant World Models for Visual Robotic Manipulation (ICLR 2025)
  3. Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains (RLC 2025)
  4. Imitation Learning from Vague Feedback (NeurIPS 2023)
  5. Distributional Pareto-Optimal Multi-Objective RL (NeurIPS 2023)
  6. Heterogeneously Observable Imitation Learning (ICLR 2023)

For the complete list, visit Publications.

Timeline

Recent Updates

  • 🎉 My research proposal, "Reinforcement Learning with Unreliable Verification", has been selected for funding by the JSPS KAKENHI Grant-in-Aid for Early-Career Scientists.
  • 📝 VI-CuRL released on arXiv.
  • 📝 PU Reinforcement Learning Distillation released on arXiv.
  • 📝 Multi-Label Toxicity Evaluation released on arXiv.
  • 🎉 Paper on cross-domain offline RL accepted by RLC 2025.

Contact

Correspondence

Please feel free to contact me if you would like to collaborate or discuss research (xinqiang[dot]cai[at]riken[dot]jp / jkrsndivide[at]gmail[dot]com).