Peer-reviewed & pre-prints

Publications.

Distributed systems for modern AI workloads: serving, scheduling, caching, and post-training. Pre-prints are marked in a muted tone; * denotes co-first author.

Google Scholar · 130+ citations ↗ · 7 papers · 4 first / co-first author · 3 second author
  1. MLSys 2026
    first author

    Beat the Long Tail: Distribution-Aware Speculative Decoding for RL Training ↗

    Zelei Shao*, Vikranth Srivatsa*, Sanjana Srivastava, Qingyang Wu, Alpay Ariyak, Xiaoxia Wu, Ameen Patel, Jue Wang, Percy Liang, Tri Dao, Ce Zhang, Yiying Zhang, Ben Athiwaratkun, Chenfeng Xu, Junxiong Wang

    Result: 50% rollout latency reduction with no accuracy loss
    RL post-training · speculative decoding · scheduling
    ↗ arXiv