Repository Details
HelloGitHub Rating: 10.0 (2 ratings)
DeepSeek's Open-Source MoE Model
Free / MIT
Stars: 22.7k
Chinese: Yes
Language: Python
Active: Yes
Contributors: 13
Issues: 74
Organization: Yes
Latest: None
Forks: 2k
License: MIT
[Image: DeepSeek-V3]
This is an open-source large language model built on the Mixture of Experts (MoE) and Multi-Head Latent Attention (MLA) architectures, and it performs exceptionally well on complex tasks such as mathematical reasoning and code generation. The model has 671B parameters in total, but only 37B are activated per token: rather than having every "expert" participate in the computation for each input, a router selects a subset of experts to process it. Activating only a fraction of the parameters (37B) lets the model complete its computation while significantly reducing the cost of training and inference.
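The sparse-activation idea can be illustrated with a short sketch. The code below is not DeepSeek-V3's implementation; it is a minimal PyTorch example with hypothetical sizes (8 experts, top-2 routing, hidden size 64) showing how a router scores all experts but runs only the top-k of them for each token, so most parameters stay idle on any given input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparse MoE layer: the router scores every expert,
    but each token is processed by only the top-k of them."""

    def __init__(self, d_model=64, d_hidden=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (num_tokens, d_model)
        scores = self.router(x)                          # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize selected weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 64)    # 4 tokens with a hypothetical hidden size of 64
layer = TopKMoE()
print(layer(tokens).shape)     # torch.Size([4, 64]); only 2 of 8 experts ran per token
```

In the same spirit, DeepSeek-V3 keeps 671B parameters available but routes each token through only enough experts to activate about 37B of them.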
