Repository Details
HelloGitHub Rating: 10.0 (2 ratings)
Free • MIT
Stars: 22.7k
Chinese: Yes
Language: Python
Active: Yes
Contributors: 13
Issues: 74
Organization: Yes
Latest: None
Forks: 2k
License: MIT
This is an open-source large language model built on the Mixture of Experts (MoE) and Multi-Head Latent Attention (MLA) architectures, and it performs exceptionally well on complex tasks such as mathematical reasoning and code generation. The model has 671B parameters in total, but only 37B are activated per token: when processing an input, not all "experts" take part in the computation; instead, a subset of experts is selected to handle it. By activating only this fraction of the parameters (37B), the model completes its computation while significantly reducing the cost of training and inference.
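A minimal sketch of this idea is shown below: a router scores each token against every expert and only the top-k experts run, so most of the layer's parameters stay idle for any given token. This is a generic illustration of MoE top-k routing, not DeepSeek-V3's actual implementation; the expert count, top-k value, and layer sizes are hypothetical.

```python
# Illustrative MoE layer with top-k routing (hypothetical sizes, not DeepSeek-V3's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoELayer(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = self.router(x)                            # (n_tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the chosen experts run for each token; the rest stay idle,
        # which is why the activated parameter count is far below the total.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


tokens = torch.randn(16, 64)   # 16 tokens with 64-dim hidden states
layer = SimpleMoELayer()
print(layer(tokens).shape)     # torch.Size([16, 64])
```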