Long-form · cross-posted from mlsys.wuklab.io

Blog.

Long-form posts I have co-authored on the WukLab blog, covering system measurements and design decisions behind production LLM serving and post-training infrastructure. The full archive (including lab-mates' work) lives at mlsys.wuklab.io.

  1. May 17, 2026
    MLSys 2026

    Beat the Long Tail: Distribution-Aware Speculative Decoding for RL Training

    Vikranth Srivatsa, Yiying Zhang

    Accelerates reinforcement-learning rollouts by targeting long generations with an adaptive speculative-decoding framework.

  2. May 17, 2026
    Pre-print

    TClone: Decoupling Fast Branch Creation from Durable Checkpointing for Computer-Use Agents

    Yutong Huang, Vikranth Srivatsa, Alex Asch, Hansin Tushar Patwa, Yiying Zhang

    A workspace-versioning system enabling fast branching for agents through copy-on-write memory and filesystem sharing.

  3. May 16, 2026
    Multi-SLO serving

    Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism

    Vikranth Srivatsa, Zijian He, Pu Guo, Dongming Li, Yiying Zhang

    A runtime system that treats tensor parallelism as a control surface for dynamically optimizing mixed-priority workloads.

  4. November 25, 2024
    KDD 2026

    Cognify: A Comprehensive, Multi-Faceted Gen-AI Workflow Optimizer

    Yiying Zhang, Reyna Abhyankar, Zijian He (paper co-authored with Vikranth Srivatsa)

    An autotuning framework for generative-AI workflows. Its search algorithm, AdaSeek, performs hierarchical search across workflow structure, operators, and prompts, improving quality by up to 2.8× while reducing cost by up to 10× and latency by up to 2.7×. Accepted to KDD 2026.

  5. September 10, 2024
    Systems analysis

    Can Scheduling Overhead Dominate LLM Inference Performance?

    Vikranth Srivatsa, Dongming Li, Yiying Zhang, Reyna Abhyankar

    An analysis of CPU scheduling overhead in modern LLM serving systems. The findings informed a vLLM scheduler redesign that improved performance by approximately 30%.