Efficient Memory Management for Large Language Model Serving with PagedAttention
[2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention
#vLLM