# SegGPT
This model was released on 2023-04-06 and added to Hugging Face Transformers on 2024-02-26.
## Overview

The SegGPT model was proposed in [SegGPT: Segmenting Everything In Context](https://arxiv.org/abs/2304.03284) by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, and Tiejun Huang. SegGPT employs a decoder-only Transformer that generates a segmentation mask given an input image, a prompt image, and its corresponding prompt mask. The model achieves remarkable one-shot results: 56.1 mIoU on COCO-20 and 85.6 mIoU on FSS-1000.
The abstract from the paper is the following:
*We present SegGPT, a generalist model for segmenting everything in context. We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images. The training of SegGPT is formulated as an in-context coloring problem with random color mapping for each data sample. The objective is to accomplish diverse tasks according to the context, rather than relying on specific colors. After training, SegGPT can perform arbitrary segmentation tasks in images or videos via in-context inference, such as object instance, stuff, part, contour, and text. SegGPT is evaluated on a broad range of tasks, including few-shot semantic segmentation, video object segmentation, semantic segmentation, and panoptic segmentation. Our results show strong capabilities in segmenting in-domain and out-of-domain targets, either qualitatively or quantitatively.*
Tips:

- One can use `SegGptImageProcessor` to prepare image input, prompt and mask for the model.
- One can either use segmentation maps or RGB images as prompt masks. If using the latter, make sure to set `do_convert_rgb=False` in the `preprocess` method.
- It's highly advisable to pass `num_labels` when using `segmentation_maps` (not considering background) during preprocessing and postprocessing with `SegGptImageProcessor` for your use case.
- When doing inference with `SegGptForImageSegmentation`, if your `batch_size` is greater than 1, you can use feature ensemble across your images by passing `feature_ensemble=True` in the forward method; see the sketch after this list.
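To illustrate the last tip, here is a minimal sketch of batched inference with feature ensemble. It reuses the FoodSeg103 dataset from the example below and assumes the processor accepts lists of images the same way it accepts single images; the chosen dataset indices are arbitrary.

```python
import torch
from datasets import load_dataset
from transformers import SegGptImageProcessor, SegGptForImageSegmentation

checkpoint = "BAAI/seggpt-vit-large"
image_processor = SegGptImageProcessor.from_pretrained(checkpoint)
model = SegGptForImageSegmentation.from_pretrained(checkpoint)

ds = load_dataset("EduardoPacheco/FoodSeg103", split="train")
num_labels = 103

# Two query images segmented with the same in-context example (indices arbitrary)
images = [ds[4]["image"], ds[5]["image"]]
prompt_images = [ds[29]["image"]] * len(images)
prompt_masks = [ds[29]["label"]] * len(images)

inputs = image_processor(
    images=images,
    prompt_images=prompt_images,
    segmentation_maps=prompt_masks,
    num_labels=num_labels,
    return_tensors="pt",
)

with torch.no_grad():
    # batch_size is 2 here, so feature ensemble can be enabled in the forward pass
    outputs = model(**inputs, feature_ensemble=True)
```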
Here’s how to use the model for one-shot semantic segmentation:
```python
import torch
from datasets import load_dataset
from transformers import SegGptImageProcessor, SegGptForImageSegmentation

checkpoint = "BAAI/seggpt-vit-large"
image_processor = SegGptImageProcessor.from_pretrained(checkpoint)
model = SegGptForImageSegmentation.from_pretrained(checkpoint)

dataset_id = "EduardoPacheco/FoodSeg103"
ds = load_dataset(dataset_id, split="train")
# Number of labels in FoodSeg103 (not including background)
num_labels = 103

image_input = ds[4]["image"]
ground_truth = ds[4]["label"]
image_prompt = ds[29]["image"]
mask_prompt = ds[29]["label"]

inputs = image_processor(
    images=image_input,
    prompt_images=image_prompt,
    segmentation_maps=mask_prompt,
    num_labels=num_labels,
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

target_sizes = [image_input.size[::-1]]
mask = image_processor.post_process_semantic_segmentation(outputs, target_sizes, num_labels=num_labels)[0]
```

This model was contributed by [EduardoPacheco](https://huggingface.co/EduardoPacheco). The original code can be found [here](https://github.com/baaivision/Painter).
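The resulting `mask` is a `(height, width)` tensor holding one class id per pixel, resized to the input image. As an optional sanity check continuing the example above (not part of the original docs; the grayscale scaling is an arbitrary choice), the prediction can be rendered next to the ground truth:

```python
import numpy as np
from PIL import Image

# Spread class ids over the 0-255 grayscale range so regions are visible;
# the exact scaling does not matter since the images are only for inspection.
scale = 255 // max(num_labels, 1)
Image.fromarray((mask.cpu().numpy() * scale).astype(np.uint8)).save("prediction.png")
Image.fromarray((np.array(ground_truth, dtype=np.int32) * scale).astype(np.uint8)).save("ground_truth.png")
```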
## SegGptConfig

[[autodoc]] SegGptConfig
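As a minimal sketch of the usual Transformers configuration pattern (not specific to this page), a default configuration can be used to build a randomly initialized model:

```python
from transformers import SegGptConfig, SegGptModel

# Default configuration (the defaults mirror the BAAI/seggpt-vit-large setup)
configuration = SegGptConfig()

# Randomly initialized model built from that configuration
model = SegGptModel(configuration)

# The configuration can be read back from the model
configuration = model.config
```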
## SegGptImageProcessor

[[autodoc]] SegGptImageProcessor
    - preprocess
    - post_process_semantic_segmentation

## SegGptModel

[[autodoc]] SegGptModel
    - forward

## SegGptForImageSegmentation

[[autodoc]] SegGptForImageSegmentation
    - forward