Transformers - transformers 3.0.0 documentation

pip install transformers

Model inputs

https://huggingface.co/transformers/glossary.html#model-inputs

Tokenizer

https://huggingface.co/transformers/model_doc/bert.html#berttokenizer

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

# Encode a sentence pair -> input ids, token type ids, attention mask
encoded_dict = tokenizer.encode_plus(
                        "the dog meow",    # First sentence to encode.
                        "the cat meow",    # Second sentence.
                        add_special_tokens = True,      # Add '[CLS]' and '[SEP]'.
                        max_length = 64,                # Target length for all sentences.
                        truncation = True,              # Truncate to max_length.
                        pad_to_max_length = True,       # Pad to max_length.
                        return_attention_mask = True,   # Construct attention masks.
                        return_tensors = 'pt',          # Return PyTorch tensors.
                )
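For intuition about what `encode_plus` returns, here is a minimal pure-Python sketch (no transformers dependency; the vocabulary ids below are made up for illustration and are not the real WordPiece ids) of how a sentence pair becomes `input_ids`, `token_type_ids`, and an `attention_mask`:

```python
# Hypothetical vocabulary; real BERT ids come from the WordPiece vocab file.
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102,
         "the": 1996, "dog": 3899, "cat": 4937, "meow": 9004}

def encode_pair_sketch(sent_a, sent_b, max_length=12):
    tokens_a = sent_a.split()
    tokens_b = sent_b.split()
    # [CLS] A [SEP] B [SEP], as add_special_tokens=True would do
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment (token type) ids: 0 for the first sentence, 1 for the second
    type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_ids = [vocab[t] for t in tokens]
    # Attention mask: 1 for real tokens, 0 for padding
    attention_mask = [1] * len(input_ids)
    pad = max_length - len(input_ids)
    input_ids += [vocab["[PAD]"]] * pad
    type_ids += [0] * pad
    attention_mask += [0] * pad
    return {"input_ids": input_ids,
            "token_type_ids": type_ids,
            "attention_mask": attention_mask}

enc = encode_pair_sketch("the dog meow", "the cat meow")
print(enc["input_ids"])       # 9 real ids followed by 3 [PAD] ids
print(enc["attention_mask"])  # 1s for real tokens, 0s for padding
```

The real tokenizer additionally handles subword splitting and truncation; this sketch only shows the special-token, segment-id, and padding layout.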

BERT

Configuring BERT (BertConfig): https://huggingface.co/transformers/model_doc/bert.html#bertconfig

BertModel: https://huggingface.co/transformers/model_doc/bert.html#bertmodel

from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
last_hidden_state, pooler_output = bert(input_ids,
                                        token_type_ids=token_type_ids,
                                        attention_mask=attention_mask)

# last_hidden_state has shape (batch, sequence_len, hidden_dim);
# last_hidden_state[:, 0, :] is the [CLS] token's hidden vector.
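A minimal sketch of that indexing, using nested lists in place of a real tensor: `last_hidden_state` has shape (batch, sequence_len, hidden_dim), and `[:, 0, :]` selects the hidden vector of the first token (the [CLS] position) for every sentence in the batch.

```python
batch, seq_len, hidden = 2, 4, 3
# Fake "last_hidden_state" with shape (batch, seq_len, hidden)
last_hidden_state = [[[float(b * 100 + t * 10 + h) for h in range(hidden)]
                      for t in range(seq_len)]
                     for b in range(batch)]

# Pure-Python equivalent of last_hidden_state[:, 0, :]:
# one [CLS] vector per sentence in the batch
cls_vectors = [sentence[0] for sentence in last_hidden_state]
print(cls_vectors)  # shape (batch, hidden)
```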

Example

>>> import torch
>>> from transformers import BertModel, BertTokenizer

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertModel.from_pretrained('bert-base-uncased')

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple
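The [CLS] vector is one way to get a sentence representation from `last_hidden_states`; another common choice is a mean of the token vectors weighted by the attention mask, so padding positions are ignored. A pure-Python sketch of the idea (a real implementation would use torch tensor operations):

```python
def masked_mean(last_hidden_state, attention_mask):
    """Average token vectors per sentence, skipping padding (mask == 0)."""
    pooled = []
    for vectors, mask in zip(last_hidden_state, attention_mask):
        kept = [v for v, m in zip(vectors, mask) if m == 1]
        n = len(kept)
        # Component-wise mean over the kept token vectors
        pooled.append([sum(vals) / n for vals in zip(*kept)])
    return pooled

hidden = [[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]]  # (batch=1, seq=3, dim=2)
mask = [[1, 1, 0]]                               # last position is padding
print(masked_mean(hidden, mask))  # [[2.0, 3.0]]
```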