Research Stack
Current Focus
- Reward Modeling
- Reinforcement Learning / Imitation Learning
- LLM Reasoning
- Weak Supervision
- RLHF / RLVR
- Offline RL
Papers
Selected Publications & Preprints
- VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction (arXiv preprint)
- Learning View-Invariant World Models for Visual Robotic Manipulation (ICLR 2025)
- Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains (RLC 2025)
- Imitation Learning from Vague Feedback (NeurIPS 2023)
- Distributional Pareto-Optimal Multi-Objective Reinforcement Learning (NeurIPS 2023)
- Heterogeneously Observable Imitation Learning (ICLR 2023)
Timeline
Recent Updates
- VI-CuRL released on arXiv.
- PU (positive-unlabeled) Reinforcement Learning Distillation released on arXiv.
- Multi-Label Toxicity Evaluation released on arXiv.
- Paper on cross-domain offline RL accepted at RLC 2025.
Contact