Repository Details
Stars: 5.1k
Chinese: No
Language: Python
Active: Yes
Contributors: 7
Issues: 21
Organization: No
Latest: None
Forks: 600
License: MIT
This project is a lightweight vLLM (large language model inference engine) implemented in Python. The core code is just over 1,000 lines, with a clear structure that is easy to read. Its inference speed is comparable to the original vLLM, and it integrates inference optimizations such as prefix caching, tensor parallelism, and Torch compilation.
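To give a sense of one of the listed optimizations, below is a minimal, self-contained Python sketch of the prefix-caching idea: prompts are split into fixed-size blocks of token IDs, each block is hashed together with the hash of the block before it, and the hash is used to look up already-computed KV-cache blocks. The class and function names (PrefixCache, block_hash) and the block size are illustrative assumptions, not the project's actual code.

```python
import hashlib
from typing import Optional


def block_hash(token_ids: list[int], prev_hash: Optional[str]) -> str:
    """Hash a block of token IDs, chained to the previous block's hash,
    so identical prompt prefixes produce identical hash chains."""
    h = hashlib.sha256()
    if prev_hash is not None:
        h.update(prev_hash.encode())
    h.update(",".join(map(str, token_ids)).encode())
    return h.hexdigest()


class PrefixCache:
    """Maps block hashes to (hypothetical) KV-cache block IDs so a request
    whose prompt shares a prefix with an earlier one can skip recomputation."""

    def __init__(self, block_size: int = 16):
        self.block_size = block_size
        self.table: dict[str, int] = {}  # block hash -> cached KV block ID

    def lookup_prefix(self, prompt_ids: list[int]) -> list[int]:
        """Return cached block IDs covering the longest already-seen prefix."""
        cached, prev = [], None
        for start in range(0, len(prompt_ids), self.block_size):
            block = prompt_ids[start:start + self.block_size]
            if len(block) < self.block_size:
                break  # only full blocks are cacheable
            prev = block_hash(block, prev)
            if prev not in self.table:
                break
            cached.append(self.table[prev])
        return cached

    def insert_prefix(self, prompt_ids: list[int], block_ids: list[int]) -> None:
        """Record the hash chain of a prompt's full blocks after computing them."""
        prev = None
        for i, start in enumerate(range(0, len(prompt_ids), self.block_size)):
            block = prompt_ids[start:start + self.block_size]
            if len(block) < self.block_size or i >= len(block_ids):
                break
            prev = block_hash(block, prev)
            self.table[prev] = block_ids[i]
```

In a real engine the cached block IDs would point at GPU KV-cache memory; here they are plain integers just to show the hash-chained lookup that makes shared prompt prefixes reusable.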