2026 (8 posts)
05-10 [CUDA in Practice] HGEMM SM120 — Micro-Sculpture Warfare in 100KB SMEM: Tensor Core, TMA, ldmatrix, mma #vitamin-cuda #cuda #c++ #GPU
05-10 [CUDA in Practice] HGEMM — Beating cuBLAS: Tensor Core, cp.async, ldmatrix, mma #vitamin-cuda #cuda #c++ #GPU
05-09 [CUDA in Practice] SGEMM TF32 — Beating cuBLAS with Tensor Cores, cp.async, ldmatrix & mma #vitamin-cuda #cuda #c++ #GPU
03-31 [CUDA Optimization in Practice] safe online softmax - An Interview Must-Ask: arbitrary hidden_size, one pass, two pass, trade-off, split-k #vitamin-cuda #cuda #c++ #GPU
03-05 [CUDA Optimization in Practice] sgemm - Beating cuBLAS: Learn a Heavily Optimized Matrix Multiplication Implementation in CUDA C++ #vitamin-cuda #cuda #c++ #GPU
02-13 [CUDA in Practice] Matrix Transpose — From Padding to XOR Swizzle: The Art of Shared Memory Optimization #vitamin-cuda #cuda #c++ #GPU
02-09 Numbers Every CUDA Developer Should Know #vitamin-cuda #cuda #c++ #GPU
02-06 A Deep Dive into DeviceQuery: Understanding Your GPU Hardware #vitamin-cuda #cuda #c++ #GPU