OpenSourceWeek
DeepSeek(@deepseek_ai)
🚀 Day 1 of #OpenSourceWeek: FlashMLA
Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.
✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800
🔗 Explore on GitHub: https://github.com/deepseek-ai/FlashMLA
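For readers unfamiliar with the "paged KV cache" mentioned above, here is a minimal pure-Python sketch of the idea: KV entries are stored in fixed-size physical blocks (block size 64, as in the announcement), and a per-sequence block table maps logical token positions to physical blocks, so variable-length sequences can grow without large contiguous allocations. All names here are illustrative; this is not FlashMLA's API.

```python
# Illustrative sketch of paged KV-cache indexing (not FlashMLA's actual API).
BLOCK_SIZE = 64  # matches the block size noted in the announcement


class PagedKVCache:
    def __init__(self):
        self.blocks = []        # physical block storage: list of token lists
        self.block_table = {}   # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of cached tokens

    def append(self, seq_id, kv):
        """Append one token's KV entry, allocating a new block when needed."""
        table = self.block_table.setdefault(seq_id, [])
        n = self.seq_lens.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:          # current block full (or first token)
            table.append(len(self.blocks))
            self.blocks.append([])
        self.blocks[table[-1]].append(kv)
        self.seq_lens[seq_id] = n + 1

    def get(self, seq_id, pos):
        """Look up the KV entry at logical position `pos` via the block table."""
        block_id = self.block_table[seq_id][pos // BLOCK_SIZE]
        return self.blocks[block_id][pos % BLOCK_SIZE]


cache = PagedKVCache()
for t in range(130):                     # 130 tokens -> spans 3 blocks of 64
    cache.append("seq0", f"kv{t}")
assert cache.get("seq0", 0) == "kv0"     # first block
assert cache.get("seq0", 129) == "kv129" # third block
assert len(cache.block_table["seq0"]) == 3
```

In a real decoding kernel the blocks hold key/value tensors on the GPU and the block-table lookup happens inside the attention kernel; this sketch only shows the indexing scheme.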