Repository Details
MLA Kernel Optimization Based on Hopper GPU
Free · MIT
Stars: 1.9k
Chinese: No
Language: C++
Active: Yes
Contributors: 1
Issues: 6
Organization: Yes
Latest: None
Forks: 76
License: MIT
This is an efficient MLA decoding kernel designed specifically for Hopper-architecture GPUs, aimed at improving the inference efficiency of large language models (LLMs). Developed in C++ and CUDA, it addresses the performance bottlenecks of traditional methods when handling variable-length sequences by leveraging NVIDIA's CUTLASS library and a paged KV cache, significantly improving memory-bandwidth utilization and computational efficiency.
