Repository Details
HelloGitHub Rating: 10.0 (5 ratings)
Free • MIT
Stars: 89.3k
Chinese: Yes
Language: Python
Active: Yes
Contributors: 18
Issues: 126
Organization: Yes
Latest: None
Forks: 10k
License: MIT
This is an open-source large language model built on a Mixture of Experts (MoE) architecture with Multi-Head Latent Attention (MLA), and it performs exceptionally well on complex tasks such as mathematical reasoning and code generation. The model has 671B parameters in total, but only 37B are activated per token: rather than every 'expert' participating in the computation, a router selects a small subset of experts for each input. Activating only this fraction of the parameters significantly reduces the cost of both training and inference.
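The sketch below illustrates the general idea of sparse expert activation with a toy top-k routed MoE layer in PyTorch. The class name, layer sizes, and top_k value are made up for illustration and this is not DeepSeek's actual implementation; it only shows how a router can send each token to a few experts so that most parameters stay idle.

```python
# Minimal sketch of sparse Mixture-of-Experts routing (illustrative only, not DeepSeek's code).
# ToyMoELayer, d_model, num_experts, and top_k are hypothetical names chosen for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward "expert" per slot; only top_k of them run for each token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # scores every expert for every token

    def forward(self, x):                        # x: (num_tokens, d_model)
        scores = self.router(x)                  # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize only over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(ToyMoELayer()(tokens).shape)  # torch.Size([16, 64]); only 2 of 8 experts ran per token
```

The per-token compute cost scales with top_k rather than with the total number of experts, which is the same principle that lets a 671B-parameter model activate only 37B parameters per token.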