flash-attn
https://pypi.org/project/flash-attn/
https://huggingface.co/docs/transformers/perf_infer_gpu_one?install=NVIDIA#flashattention-2
https://github.com/Dao-AILab/flash-attention
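
A minimal sketch of how the links above fit together, assuming the package was installed with `pip install flash-attn --no-build-isolation` (the command given in the flash-attention README) and that the model is loaded through Hugging Face Transformers. The helper names `pick_attn_implementation` and `load_model` are hypothetical, not part of either library, and falling back to `"sdpa"` is one illustrative choice. FlashAttention-2 needs a supported GPU and a half-precision dtype (fp16 or bf16).

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if the flash_attn package is importable,
    otherwise fall back to PyTorch's built-in "sdpa" attention.

    Hypothetical helper: it only checks that the package is installed,
    not that the GPU or the model architecture supports FlashAttention-2.
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


def load_model(model_id: str):
    """Load a causal LM with the chosen attention backend.

    Hypothetical helper; model_id is whatever checkpoint you use.
    Imports are done lazily so this file can be imported without
    torch or transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        # FlashAttention-2 only supports fp16 / bf16.
        # Assumption: newer Transformers releases may prefer `dtype=`
        # over `torch_dtype=`; adjust for your installed version.
        torch_dtype=torch.bfloat16,
        attn_implementation=pick_attn_implementation(),
    )
```

Calling `load_model("<model id>")` on a machine without flash-attn installed would then use the `"sdpa"` backend instead of raising an import error.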