Depth Anything
This model was released on 2024-01-19 and added to Hugging Face Transformers on 2024-01-25.
Depth Anything is designed to be a foundation model for monocular depth estimation (MDE). It is jointly trained on labeled images and ~62M unlabeled images, which greatly enlarges the training set. It uses a pretrained DINOv2 model as an image encoder to inherit its rich semantic priors, and DPT as the decoder. A teacher model is first trained on the labeled images and then used to generate pseudo-labels for the unlabeled images. The student model is trained on a combination of the pseudo-labeled and labeled images. To improve the student model’s performance, strong perturbations are applied to the unlabeled images, challenging the student to learn additional visual knowledge from the same image.
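The distillation loop described above could be sketched roughly as follows. This is a minimal illustration of the idea, not the authors' training code; `teacher`, `student`, `strong_augment`, and `loss_fn` are hypothetical stand-ins.

```py
import torch

def train_step(student, teacher, labeled_batch, unlabeled_images, strong_augment, loss_fn):
    # Supervised loss on labeled image/depth pairs.
    images, depths = labeled_batch
    loss = loss_fn(student(images), depths)

    # The frozen teacher produces pseudo depth labels on the clean unlabeled images.
    with torch.no_grad():
        pseudo_depths = teacher(unlabeled_images)

    # The student is trained on a strongly perturbed view of the same images,
    # which forces it to work harder to match the teacher's pseudo-labels.
    loss = loss + loss_fn(student(strong_augment(unlabeled_images)), pseudo_depths)
    return loss
```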
You can find all the original Depth Anything checkpoints under the Depth Anything collection.
The examples below demonstrate how to obtain a depth map with the Pipeline or the AutoModel class.
```py
import torch
from transformers import pipeline

pipe = pipeline(
    task="depth-estimation",
    model="LiheYoung/depth-anything-base-hf",
    dtype=torch.bfloat16,
    device=0,
)
pipe("http://images.cocodataset.org/val2017/000000039769.jpg")["depth"]
```

The same result can be obtained with the AutoModel class and explicit post-processing:

```py
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-base-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-base-hf", dtype=torch.bfloat16)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Cast the pixel values to the model dtype so they match the bfloat16 weights.
inputs = image_processor(images=image, return_tensors="pt").to(model.dtype)

with torch.no_grad():
    outputs = model(**inputs)

# Interpolate the raw prediction back to the original image resolution.
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    target_sizes=[(image.height, image.width)],
)
predicted_depth = post_processed_output[0]["predicted_depth"]

# Normalize to [0, 255] for visualization; cast to float32 first because
# NumPy does not support bfloat16.
depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
depth = depth.detach().cpu().float().numpy() * 255
Image.fromarray(depth.astype("uint8"))
```

- DepthAnythingV2, released in June 2024, uses the same architecture as Depth Anything and is compatible with all code examples and existing workflows, as sketched below. It uses synthetic data and a larger-capacity teacher model to achieve much finer and more robust depth predictions.
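Because the architecture is unchanged, using a V2 checkpoint only requires swapping the model id. A minimal sketch, assuming the `depth-anything/Depth-Anything-V2-Small-hf` Hub conversion is the checkpoint you want:

```py
import torch
from transformers import pipeline

# Same pipeline call as above; only the checkpoint id changes.
pipe = pipeline(
    task="depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",  # assumed checkpoint id
    dtype=torch.bfloat16,
    device=0,
)
pipe("http://images.cocodataset.org/val2017/000000039769.jpg")["depth"]
```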
DepthAnythingConfig
[[autodoc]] DepthAnythingConfig
DepthAnythingForDepthEstimation
[[autodoc]] DepthAnythingForDepthEstimation
    - forward