Publications

Publications by category, in reverse chronological order. Generated by jekyll-scholar.

2025

  1. TrainVerify
    TrainVerify: Equivalence-Based Verification for Distributed LLM Training
    In Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, Seoul, Republic of Korea, Oct 2025
  2. Slow-Fault
    One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems
    In Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, Philadelphia, PA, USA, Apr 2025