Medusa
https://huggingface.co/papers/2401.10774
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Speeding up LLM inference