Efficient Memory Management for Large Language Model Serving with PagedAttention
[2309.06180 Efficient Memory Management for Large Language Model Serving with PagedAttention]
#vLLM