Course
Transformer Complete Learning Path
A bilingual 30-chapter course from tensor basics to Transformer internals, diffusion, source reading, and model hacking.
30 lectures, 30 Python examples, Chinese + English
Stage 1 / 1-2 months
Tensor and Attention Fundamentals
Chapter 1: NumPy Tensor Fundamentals (Python example included)
Chapter 2: Vectors - The Soul of Attention (Python example included)
Chapter 3: Matrices and Projection (Python example included)
Chapter 4: Softmax - The Art of Probability (Python example included)
Chapter 5: Writing Attention from Scratch (Python example included)
Stage 2 / 1-2 months
PyTorch and the Training System
Chapter 6: PyTorch Tensor - From NumPy to GPU (Python example included)
Chapter 7: Autograd - The Soul of PyTorch (Python example included)
Chapter 8: Gradient and Optimizer (Python example included)
Chapter 9: Writing a Training Loop (Python example included)
Stage 3 / 2-3 months
Transformer Core
Chapter 10: Multi-Head Attention (Python example included)
Chapter 11: Position Encoding - Giving the Model a Sense of Order (Python example included)
Chapter 12: Residual + LayerNorm (Python example included)
Chapter 13: FFN - Feed Forward Network (Python example included)
Chapter 14: Transformer Block - Putting It All Together (Python example included)
Chapter 15: Decoder-only GPT (Python example included)
Chapter 16: KV Cache (Python example included)
Stage 4 / 1-2 months
Diffusion
Chapter 17: Diffusion Basics (Python example included)
Chapter 18: VAE and Latent Space (Python example included)
Chapter 19: UNet (Python example included)
Chapter 20: Cross Attention (Python example included)
Chapter 21: CFG - Classifier-Free Guidance (Python example included)
Chapter 22: Sampler (Python example included)
Stage 5 / Long-term
Source Code and Hacking
Chapter 23: nanoGPT - Reading a Small GPT (Python example included)
Chapter 24: minGPT - A Cleaner Implementation (Python example included)
Chapter 25: HuggingFace Transformers - Industry Standard (Python example included)
Chapter 26: Modifying Attention (Python example included)
Chapter 27: LoRA - Low-Rank Adaptation (Python example included)
Chapter 28: FlashAttention - IO-Aware Optimization (Python example included)
Chapter 29: Inference Systems (Python example included)
Chapter 30: RLHF - Aligning with Human Preferences (Python example included)