The Infrastructure Cost of MoE Routing Replay
Published:
Routing replay (R3) stabilizes MoE RL training, but the routing data makes up 97% of the generation payload. This post traces the resulting bottleneck to the single-threaded manager pipeline rather than to bandwidth, and walks through the failed ‘obvious’ fix that revealed a fundamental constraint of mixing NCCL with inference.
