Repository Details
Shared by
HelloGitHub Rating: 9.2 (9 ratings)
Free • MIT
Stars: 102k
Chinese: Yes
Language: Python
Active: No
Contributors: 1
Issues: 108
Organization: Yes
Latest: 1.0.0
Forks: 20k
License: MIT

This is an open-source large language model built on a Mixture of Experts (MoE) architecture with Multi-Head Latent Attention (MLA), and it performs exceptionally well on complex tasks such as mathematical reasoning and code generation. The model has 671B parameters in total, but only 37B are activated per token: rather than having every "expert" participate in the computation, a router selects a small subset of experts for each input token. Because only this fraction of the parameters (37B) is used per forward pass, training and inference costs are significantly reduced.