
GPT-Neo

This model was released on 2021-03-21 and added to Hugging Face Transformers on 2021-03-30.


GPT-Neo is an open-source alternative to the GPT-2 and GPT-3 models, built with Mesh TensorFlow for TPUs. For efficiency, GPT-Neo uses local attention in every other layer. It is trained on the Pile, a diverse dataset consisting of 22 smaller high-quality datasets. The original GitHub repository can be found at https://github.com/EleutherAI/gpt-neo.
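The alternating global/local pattern is recorded on the model configuration, so you can inspect it directly. A minimal sketch using the attention_types and attention_layers fields of GPTNeoConfig:

from transformers import GPTNeoConfig

config = GPTNeoConfig.from_pretrained("EleutherAI/gpt-neo-1.3B")
# Compact encoding of the layer pattern, e.g. [[["global", "local"], 12]]
print(config.attention_types)
# Expanded to one entry per layer: ["global", "local", "global", "local", ...]
print(config.attention_layers)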

You can find all the original GPT-Neo checkpoints under the EleutherAI organization.

The examples below demonstrate how to generate text with Pipeline, with AutoModel, and from the command line.

import torch
from transformers import pipeline

# Load the 1.3B checkpoint in half precision on the first GPU
generator = pipeline(task="text-generation", model="EleutherAI/gpt-neo-1.3B", dtype=torch.float16, device=0)
generator("Hello, I'm a language model")
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in half precision with FlashAttention-2 kernels
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-1.3B",
    dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

inputs = tokenizer("Hello, I'm a language model", return_tensors="pt").to(model.device)
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
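FlashAttention-2 requires the separate flash-attn package and half-precision weights. If it isn't installed, a reasonable fallback (assuming your Transformers version supports SDPA for GPT-Neo) is PyTorch's built-in scaled dot-product attention:

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-1.3B",
    dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa",  # built-in PyTorch kernel, no extra dependency
)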
echo -e "Hello, I'm a language model" | transformers run --task text-generation --model EleutherAI/gpt-neo-1.3B --device 0

Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the Quantization overview for more available quantization backends.

The example below uses bitsandbytes to quantize only the weights to 4 bits.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and fp16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-2.7B",
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
inputs = tokenizer("Hello, I'm a language model", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
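To check how much memory the 4-bit weights actually take, you can query the model's footprint; get_memory_footprint is a standard PreTrainedModel helper:

# Size of parameters and buffers in bytes; the 4-bit model should be
# roughly a quarter of the fp16 footprint.
print(f"{model.get_memory_footprint() / 1024**3:.2f} GB")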
  • Pad inputs on the right because GPT-Neo uses absolute position embeddings (see the sketch below).
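A minimal sketch of right-padded batching; GPT-Neo's tokenizer ships without a pad token, so one is assigned from the EOS token here:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer.pad_token = tokenizer.eos_token  # no pad token by default
tokenizer.padding_side = "right"  # absolute position embeddings expect right padding
batch = tokenizer(
    ["Hello, I'm a language model", "GPT-Neo uses local attention"],
    padding=True,
    return_tensors="pt",
)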

[[autodoc]] GPTNeoConfig

[[autodoc]] GPTNeoModel - forward

[[autodoc]] GPTNeoForCausalLM - forward

[[autodoc]] GPTNeoForQuestionAnswering - forward

[[autodoc]] GPTNeoForSequenceClassification - forward

[[autodoc]] GPTNeoForTokenClassification - forward