Project details

HelloGitHub rating: 10.0 (1 rating)
Open source, Apache-2.0
Stars: 142.3k
Chinese: yes
Primary language: Python
Active: yes
Contributors: 3k
Issues: 2k
Organization: yes
Latest version: 4.50.3
Forks: 30k
License: Apache-2.0
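PyTorch pre-trained models for BERT, Google's language representation model. Pairing the pre-trained weights with the PyTorch framework makes the model easier to pick up, and the PyTorch version is convenient for beginners who want to run their own experiments. Example code: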
```python
import torch
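# `pytorch_pretrained_bert` is the project's original package name (later renamed to `transformers`)
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM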
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tokenized input
text = "Who was Jim Henson ? Jim Henson was a puppeteer"
tokenized_text = tokenizer.tokenize(text)
# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 6
tokenized_text[masked_index] = '[MASK]'
assert tokenized_text == ['who', 'was', 'jim', 'henson', '?', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer']
# Convert token to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
```
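The segment ids in the example above are written out by hand. As a minimal sketch of how they could be derived instead, the snippet below builds them from the token list, assuming sentence A ends at the first `?` token. `build_segment_ids` is a hypothetical helper introduced here for illustration, not part of the library, and the snippet needs no model download.

```python
from typing import List


def build_segment_ids(tokens_a: List[str], tokens_b: List[str]) -> List[int]:
    """Hypothetical helper (not a library function).

    Returns 0 for every token of sentence A and 1 for every token of sentence B.
    """
    return [0] * len(tokens_a) + [1] * len(tokens_b)


# Token list copied from the example above (after masking).
tokenized_text = ['who', 'was', 'jim', 'henson', '?',
                  'jim', '[MASK]', 'was', 'a', 'puppet', '##eer']

# Assumption: sentence A ends at the first '?' token.
split = tokenized_text.index('?') + 1
segments_ids = build_segment_ids(tokenized_text[:split], tokenized_text[split:])
print(segments_ids)
```

Deriving the ids from the token lists keeps them the same length as the input, which matters because both tensors are passed to the model together.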
Featured in: Issue 34