Xin-Qiang Cai / Research Log


Reinforcement Learning, Reward Modeling, and Weak Supervision. Postdoctoral Researcher, Imperfect Information Learning Team, RIKEN AIP (Tokyo).

$ role --current
postdoc / riken-aip
$ focus --areas
reward modeling · rl · weak supervision
$ location
tokyo, japan
Research Stack

Current Focus

Reward Modeling · Reinforcement Learning / Imitation Learning · LLM Reasoning · Weak Supervision · RLHF / RLVR · Offline RL
Papers

Selected Publications & Preprints

  1. VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
  2. Learning View-Invariant World Models for Visual Robotic Manipulation (ICLR 2025)
  3. Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains (RLC 2025)
  4. Imitation Learning from Vague Feedback (NeurIPS 2023)
  5. Distributional Pareto-Optimal Multi-Objective RL (NeurIPS 2023)
  6. Heterogeneously Observable Imitation Learning (ICLR 2023)

For the complete list, see the Publications page.


Contact

Correspondence

Feel free to contact me if you are interested in collaborating or discussing research: xinqiang[dot]cai[at]riken[dot]jp or jkrsndivide[at]gmail[dot]com.