BigBird
This model was released on 2020-07-28 and added to Hugging Face Transformers on 2021-03-30.
BigBird is a transformer model designed to handle sequence lengths of up to 4096 tokens, compared to 512 for BERT. Traditional transformers struggle with long inputs because the cost of full self-attention grows quadratically with sequence length. BigBird addresses this with a sparse attention mechanism: instead of attending to every pair of tokens, it combines local (sliding window) attention, random attention, and a small number of global tokens. This keeps computation efficient while still capturing enough of the sequence to model long-range dependencies. As a result, BigBird works well on tasks involving long documents, such as question answering, summarization, and genomic applications.
You can find all the original BigBird checkpoints under the Google organization.
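Because BigBird's main advantage is handling long inputs, here is a minimal sketch of encoding a sequence at the 4096-token limit with the base checkpoint. The filler text and the padding to the full 4096 tokens (a multiple of the default block size of 64) are illustrative choices, not requirements from the original documentation.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
model = AutoModel.from_pretrained("google/bigbird-roberta-base")

# A toy "long document" well past BERT's 512-token limit.
long_text = "Photosynthesis converts light energy into chemical energy. " * 400

# Pad/truncate to 4096 tokens so the length is a multiple of the block size (64).
inputs = tokenizer(
    long_text,
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=4096,
)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, 4096, hidden_size)
```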
The examples below demonstrate how to predict the [MASK] token with the Pipeline API, with AutoModel, and from the command line.
Pipeline:

```python
import torch
from transformers import pipeline

pipeline = pipeline(
    task="fill-mask",
    model="google/bigbird-roberta-base",
    dtype=torch.float16,
    device=0,
)
pipeline("Plants create [MASK] through a process known as photosynthesis.")
```

AutoModel:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
model = AutoModelForMaskedLM.from_pretrained(
    "google/bigbird-roberta-base",
    dtype=torch.float16,
    device_map="auto",
)
inputs = tokenizer(
    "Plants create [MASK] through a process known as photosynthesis.",
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print(f"The predicted token is: {predicted_token}")
```

transformers CLI:

```bash
echo -e "Plants create [MASK] through a process known as photosynthesis." | transformers run --task fill-mask --model google/bigbird-roberta-base --device 0
```

Usage notes:

- Inputs should be padded on the right because BigBird uses absolute position embeddings.
- BigBird supports `original_full` and `block_sparse` attention. If the input sequence length is less than 1024, it is recommended to use `original_full`, since sparse patterns don't offer much benefit for smaller inputs (a configuration sketch follows these notes).
- The current implementation uses a window size of 3 blocks and 2 global blocks, only supports the ITC implementation, and doesn't support `num_random_blocks=0`.
- The sequence length must be divisible by the block size.
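As a rough sketch of how these options map to code, the snippet below loads the base checkpoint with block sparse attention configured explicitly. The parameter names (`attention_type`, `block_size`, `num_random_blocks`) come from `BigBirdConfig` (documented below); the values shown are the defaults and are only meant as illustration.

```python
from transformers import BigBirdConfig, BigBirdModel

# Configure block sparse attention explicitly; use attention_type="original_full"
# instead for inputs shorter than ~1024 tokens.
config = BigBirdConfig.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="block_sparse",
    block_size=64,         # the sequence length must be divisible by this
    num_random_blocks=3,   # must be greater than 0 in the current implementation
)
model = BigBirdModel.from_pretrained("google/bigbird-roberta-base", config=config)
```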
Resources
- Read the BigBird blog post for more details about how its attention works.
BigBirdConfig
[[autodoc]] BigBirdConfig
BigBirdTokenizer
[[autodoc]] BigBirdTokenizer
    - get_special_tokens_mask
    - save_vocabulary
BigBirdTokenizerFast
[[autodoc]] BigBirdTokenizerFast
BigBird specific outputs
[[autodoc]] models.big_bird.modeling_big_bird.BigBirdForPreTrainingOutput
BigBirdModel
[[autodoc]] BigBirdModel
    - forward
BigBirdForPreTraining
[[autodoc]] BigBirdForPreTraining
    - forward
BigBirdForCausalLM
[[autodoc]] BigBirdForCausalLM
    - forward
BigBirdForMaskedLM
[[autodoc]] BigBirdForMaskedLM
    - forward
BigBirdForSequenceClassification
[[autodoc]] BigBirdForSequenceClassification
    - forward
BigBirdForMultipleChoice
[[autodoc]] BigBirdForMultipleChoice
    - forward
BigBirdForTokenClassification
[[autodoc]] BigBirdForTokenClassification
    - forward
BigBirdForQuestionAnswering
[[autodoc]] BigBirdForQuestionAnswering
    - forward