Publications

Complete list of preprints and publications.

Preprints (* denotes equal contributions)

Xin-Qiang Cai, Masashi Sugiyama. VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction. In: arXiv. [arXiv]

Zhiqiang Kou, Junyang Chen, Xin-Qiang Cai, Ming-Kun Xie, Biao Liu, Changwei Wang, Lei Feng, Yuheng Jia, Gang Niu, Masashi Sugiyama, Xin Geng. Rethinking Toxicity Evaluation in Large Language Models: A Multi-Label Perspective. In: arXiv. [arXiv]

Xin-Qiang Cai, Wei Wang, Feng Liu, Tongliang Liu, Gang Niu, Masashi Sugiyama. Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers. In: arXiv. [arXiv]

Jiansong Wan, Chengming Zhou, Jinkua Liu, Xiangge Huang, Xiaoyu Chen, Xiaohan Yi, Qisen Yang, Baiting Zhu, Xin-Qiang Cai, Lixing Liu, Rushuai Yang, Chuheng Zhang, Sherif Abdelfattah, Hayong Shin, Pushi Zhang, Li Zhao, Jiang Bian. PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models. In: arXiv. [arXiv]

Zelei Cheng*, Xin-Qiang Cai*, Yuting Tang, Pushi Zhang, Boming Yang, Masashi Sugiyama, Xinyu Xing. UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality. In: arXiv. [arXiv]

Publications (* denotes equal contributions)

Zhiqiang Kou*, Junyang Chen*, Xin-Qiang Cai, Xiaobo Xia, Ming-Kun Xie, Dong-Dong Wu, Biao Liu, Yuheng Jia, Xin Geng, Masashi Sugiyama, Tat-Seng Chua. Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models. In: Proceedings of the 43rd International Conference on Machine Learning (ICML'26), Seoul, South Korea, Jul. 6–11, 2026. [arXiv]

Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, Masashi Sugiyama. Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains. In: Proceedings of the Second Reinforcement Learning Conference (RLC'25), Montréal, QC, Canada, Aug. 16–19, 2025. [arXiv]

Jing-Cheng Pang, Nan Tang, Kaiyuan Li, Yuting Tang, Xin-Qiang Cai, Zhen-Yu Zhang, Gang Niu, Masashi Sugiyama, Yang Yu. Learning View-invariant World Models for Visual Robotic Manipulation. In: Proceedings of the Thirteenth International Conference on Learning Representations (ICLR'25), Singapore, Apr. 24–28, 2025.

Zelei Cheng, Xian Wu, Jiahao Yu, Shuo Han, Xin-Qiang Cai, Xinyu Xing. Soft-Label Integration for Robust Toxicity Classification. In: Proceedings of the Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS'24), Vancouver, Canada, Dec. 10–15, 2024. [arXiv] [code] [paper]

Yuting Tang*, Xin-Qiang Cai*, Yao-Xiang Ding, Qiyu Wu, Guoqing Liu, Masashi Sugiyama. Reinforcement Learning from Bagged Reward. In: Proceedings of the 41st International Conference on Machine Learning (ICML'24), ARLET workshop, Vienna, Austria, Jul. 21–27, 2024. [arXiv]

Xingyu Song, Zhan Li, Shi Chen, Xin-Qiang Cai, Kazuyuki Demachi. An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video. In: Proceedings of the 27th European Conference on Artificial Intelligence (ECAI'24), Santiago de Compostela, Spain, Oct. 19–24, 2024. [arXiv]

Kaiyan Zhao, Qiyu Wu, Xin-Qiang Cai, Yoshimasa Tsuruoka. Leveraging Multi-lingual Positive Instances in Contrastive Learning to Improve Sentence Embedding. In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL'24), Malta, Mar. 17–22, 2024. [arXiv]

Pushi Zhang*, Baiting Zhu*, Xin-Qiang Cai*, Li Zhao, Masashi Sugiyama, Jiang Bian. IG-Net: Image-Goal Network for Offline Visual Navigation on A Large-Scale Game Map. In: Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS'23), 6th Robot Learning Workshop, New Orleans, US, Dec. 10–16, 2023. [paper] [openreview]

Xin-Qiang Cai, Yu-Jie Zhang, Chao-Kai Chiang, Masashi Sugiyama. Imitation Learning from Vague Feedback. In: Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS'23), New Orleans, US, Dec. 10–16, 2023. [paper] [bibtex]

Xin-Qiang Cai, Pushi Zhang, Li Zhao, Jiang Bian, Masashi Sugiyama, Ashley Juan Llorens. Distributional Pareto-Optimal Multi-Objective Reinforcement Learning. In: Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS'23), New Orleans, US, Dec. 10–16, 2023. [paper] [bibtex]

Xin-Qiang Cai, Yao-Xiang Ding, Zi-Xuan Chen, Yuan Jiang, Masashi Sugiyama, Zhi-Hua Zhou. Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning. In: Proceedings of the Eleventh International Conference on Learning Representations (ICLR'23) (spotlight), Kigali, Rwanda, May 1–5, 2023. [openreview] [paper] [bibtex]

Zi-Xuan Chen*, Xin-Qiang Cai*, Yuan Jiang, Zhi-Hua Zhou. Anomaly Guided Policy Learning from Imperfect Demonstrations. In: Proceedings of the 21st International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'22) (oral), Auckland, New Zealand, May 9–13, 2022. Page: 244–252. [paper] [bibtex]

Xin-Qiang Cai, Yao-Xiang Ding, Yuan Jiang, Zhi-Hua Zhou. Imitation Learning from Pixel-Level Demonstrations by HashReward. In: Proceedings of the 20th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'21) (oral), online, May 3–7, 2021. Page: 279–287. [code] [paper] [bibtex]

Xin-Qiang Cai, Peng Zhao, Kai Ming Ting, Xin Mu, Yuan Jiang. Nearest Neighbor Ensembles: An Effective Method for Difficult Problems in Streaming Classification with Emerging New Classes. In: Proceedings of the 19th IEEE International Conference on Data Mining (ICDM'19), Beijing, China, Nov. 8–11, 2019. Page: 970–975. [code] [paper] [bibtex]

Patent

A High-Dimensional Imitation Learning Method for Video Image Data Recorded by Camera Equipment (一种摄像器材记录的视频图像数据的高维模仿学习方法). Patent No. 202011450396.1, 2020.