Tinkerings
My notes for building mental models of how things work.
Context Parallelism
Context Parallelism (CP) is a distributed training technique that enables training of LLMs with extremely long sequences (up to 1M tokens) by sharding the sequence dimension… (a small sketch follows below)
Sep 7, 2025
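As a quick companion to the note above, here is a minimal sketch of the core idea, sharding the sequence dimension across ranks. The process-group setup is omitted; the rank count and tensor sizes are toy assumptions, not details from the post.

```python
# Minimal sketch of sequence-dimension sharding for context parallelism.
# No real process group is created; we just slice a batch of token
# embeddings across hypothetical CP ranks (sizes are assumptions).
import torch

cp_world_size = 4                      # number of context-parallel ranks (assumed)
batch, seq_len, hidden = 2, 16, 8      # toy dimensions
x = torch.randn(batch, seq_len, hidden)

# Each rank keeps one contiguous slice of the sequence dimension.
shards = torch.chunk(x, cp_world_size, dim=1)
for rank, shard in enumerate(shards):
    print(f"rank {rank}: tokens {rank * seq_len // cp_world_size}"
          f"..{(rank + 1) * seq_len // cp_world_size - 1}, shape {tuple(shard.shape)}")
```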
Expert Parallelism
Let’s distill how to run a Mixture-of-Experts (MoE) model with expert parallelism, using a worked example (a brief routing sketch follows below).
Sep 1, 2025
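Alongside the note above, here is a minimal sketch of the routing step in expert parallelism: a gate picks an expert per token, and tokens are grouped by the rank that would host that expert. The expert-to-rank layout, token counts, and top-1 routing are assumptions for illustration, not the post's exact setup.

```python
# Minimal sketch of expert-parallel routing: a gate picks one expert per
# token, and tokens are bucketed by the (hypothetical) EP rank hosting it.
import torch

num_experts, experts_per_rank = 4, 2   # assumed layout: 2 experts per EP rank
tokens = torch.randn(6, 8)             # 6 tokens, hidden size 8
gate = torch.nn.Linear(8, num_experts)

expert_choice = gate(tokens).argmax(dim=-1)    # top-1 expert per token
host_rank = expert_choice // experts_per_rank  # which EP rank owns that expert
for rank in range(num_experts // experts_per_rank):
    idx = (host_rank == rank).nonzero(as_tuple=True)[0]
    print(f"EP rank {rank} receives tokens {idx.tolist()} "
          f"for experts {sorted(set(expert_choice[idx].tolist()))}")
```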