
GPT-Neo

This model was released on 2021-03-21 and added to Hugging Face Transformers on 2021-03-30.


GPT-Neo is an open-source alternative to the GPT-2 and GPT-3 models, built with Mesh TensorFlow for TPUs. For efficiency, GPT-Neo uses local attention in every other layer. It is trained on the Pile, a diverse dataset consisting of 22 smaller high-quality datasets. The original GitHub repository can be found at https://github.com/EleutherAI/gpt-neo.
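The alternating global/local pattern is recorded on the model configuration, so you can inspect it directly. A minimal sketch using the attention_types and attention_layers fields of GPTNeoConfig:

from transformers import GPTNeoConfig

config = GPTNeoConfig.from_pretrained("EleutherAI/gpt-neo-1.3B")
# Compact encoding of the layer pattern, e.g. [[["global", "local"], 12]]
print(config.attention_types)
# Expanded to one entry per layer: ["global", "local", "global", "local", ...]
print(config.attention_layers)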

You can find all the original GPT-Neo checkpoints under the EleutherAI organization.

The examples below demonstrate how to generate text with Pipeline, with AutoModel, and from the command line.

import torch
from transformers import pipeline

# Load the 1.3B checkpoint in half precision on the first GPU
generator = pipeline(task="text-generation", model="EleutherAI/gpt-neo-1.3B", dtype=torch.float16, device=0)
generator("Hello, I'm a language model")
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in half precision with FlashAttention-2 kernels
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-1.3B",
    dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

inputs = tokenizer("Hello, I'm a language model", return_tensors="pt").to(model.device)
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
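FlashAttention-2 requires the separate flash-attn package and half-precision weights. If it isn't installed, a reasonable fallback (assuming your Transformers version supports SDPA for GPT-Neo) is PyTorch's built-in scaled dot-product attention:

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-1.3B",
    dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa",  # built-in PyTorch kernel, no extra dependency
)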
echo -e "Hello, I'm a language model" | transformers run --task text-generation --model EleutherAI/gpt-neo-1.3B --device 0

Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the Quantization overview for more available quantization backends.

The example below uses bitsandbytes to quantize only the weights to 4 bits.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and fp16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-2.7B",
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
inputs = tokenizer("Hello, I'm a language model", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
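To check how much memory the 4-bit weights actually take, you can query the model's footprint; get_memory_footprint is a standard PreTrainedModel helper:

# Size of parameters and buffers in bytes; the 4-bit model should be
# roughly a quarter of the fp16 footprint.
print(f"{model.get_memory_footprint() / 1024**3:.2f} GB")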
  • Pad inputs on the right because GPT-Neo uses absolute position embeddings (see the sketch below).
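A minimal sketch of right-padded batching; GPT-Neo's tokenizer ships without a pad token, so one is assigned from the EOS token here:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer.pad_token = tokenizer.eos_token  # no pad token by default
tokenizer.padding_side = "right"  # absolute position embeddings expect right padding
batch = tokenizer(
    ["Hello, I'm a language model", "GPT-Neo uses local attention"],
    padding=True,
    return_tensors="pt",
)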

[[autodoc]] GPTNeoConfig

[[autodoc]] GPTNeoModel - forward

[[autodoc]] GPTNeoForCausalLM - forward

[[autodoc]] GPTNeoForQuestionAnswering - forward

[[autodoc]] GPTNeoForSequenceClassification - forward

[[autodoc]] GPTNeoForTokenClassification - forward