Segment Anything
huyi / May 2023
Abstract
- segment anything
- new task
- new model
- new dataset (largest segmentation dataset: 1 billion masks on 11 million images)
- impressive zero-shot performance (promptable?)
Introduction
- task: prompt segmentation
- model:
- flexible prompts $\rightarrow$ point, box, mask prompts & free-form text prompt
- ambiguity $\rightarrow$ predict multiple masks for a single prompt
- data engine:
- data engine: co-develop our model with model-in-the-loop dataset annotation
- 3 stages:
- assisted-manual
- semi-automatic (subset of objects with local prompts)
- fully automatic (all objects with global prompt)
- responsible AI
- experiments
Task: Segment Anything
promptable segmentation. A prompt is a set of foreground / background points, a rough box or mask, free-form text, or, in general, any information indicating what to segment in an image.
Model: SAM
Image encoder
ViT
Prompt encoder
- points and boxes: positional encodings + prompt type embedding
- text: CLIP's text encoder
- masks: embedded by convolutions, then summed element-wise with the image embedding
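The sparse-prompt encoding above can be sketched as follows — a minimal numpy sketch assuming random Fourier features for the positional encoding plus a learned per-type embedding (the function name, the 256-d size, and the zero-initialized type table are illustrative assumptions, not SAM's exact implementation):

```python
import numpy as np

def fourier_point_encoding(points, dim=256, scale=1.0, seed=0):
    """Encode normalized (x, y) coordinates with random Fourier features,
    a common choice of positional encoding (a sketch, not SAM's exact one).

    points: (N, 2) array with coordinates in [0, 1].
    Returns an (N, dim) embedding.
    """
    rng = np.random.default_rng(seed)
    # Random Gaussian projection, fixed after initialization.
    B = rng.normal(0.0, scale, size=(2, dim // 2))
    proj = 2 * np.pi * points @ B                       # (N, dim/2)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

# A learned embedding per prompt type (e.g. foreground vs background
# point) is added on top; here it is just a placeholder table.
type_embed = np.zeros((2, 256))   # hypothetical: row 0 = fg, row 1 = bg

pts = np.array([[0.25, 0.5], [0.9, 0.1]])   # one fg point, one bg point
emb = fourier_point_encoding(pts) + type_embed[[0, 1]]   # (2, 256)
```

A box prompt would be encoded the same way, as the embeddings of its two corner points with their own type embeddings.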
Mask decoder
- prompt self-attention
- cross-attention (image ↔ prompt, both directions)
- upsampling + MLP → mask foreground probability at each image location
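The token-to-image attention step can be illustrated with a toy single-head attention in numpy. This is a sketch of one direction only; the real decoder interleaves token self-attention, two-way cross-attention, and MLPs, all with learned projection matrices that are omitted here:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(tokens, image_emb):
    """Single-head scaled dot-product attention: prompt/output tokens
    (queries) attend to the flattened image embedding (keys/values)."""
    d = tokens.shape[-1]
    attn = softmax(tokens @ image_emb.T / np.sqrt(d))   # (T, H*W)
    return attn @ image_emb                             # (T, d)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 64))            # prompt + output tokens
image_emb = rng.normal(size=(16 * 16, 64))   # flattened spatial embedding
out = cross_attention(tokens, image_emb)     # updated tokens, (5, 64)
```

The updated output tokens are then mapped by an MLP and combined with the upsampled image embedding to score foreground probability per pixel.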
Resolving ambiguity
- 3 output masks (whole, part, subpart), each with a confidence score
- backprop only the minimum loss over the 3 masks
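The min-loss selection can be sketched as follows; plain binary cross-entropy stands in for the paper's focal + dice loss, and the array shapes are illustrative:

```python
import numpy as np

def min_loss_over_masks(pred_masks, gt_mask, eps=1e-6):
    """Given K candidate masks for one prompt, compute a per-mask loss
    and return (best_index, min_loss); only the minimum would be
    backpropagated.

    pred_masks: (K, H, W) predicted foreground probabilities.
    gt_mask:    (H, W) binary ground truth.
    """
    p = np.clip(pred_masks, eps, 1 - eps)
    bce = -(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p))
    losses = bce.mean(axis=(1, 2))       # one scalar loss per candidate
    k = int(np.argmin(losses))
    return k, float(losses[k])

gt = np.zeros((8, 8)); gt[2:6, 2:6] = 1.0   # "part"-level ground truth
cands = np.stack([
    np.full((8, 8), 0.5),                   # vague whole-image guess
    np.where(gt == 1, 0.9, 0.1),            # matches the part
    np.full((8, 8), 0.1),                   # near-empty subpart guess
])
best, loss = min_loss_over_masks(cands, gt)  # best == 1
```

Training only the best-matching head lets the model keep valid alternative interpretations instead of averaging them into one blurry mask.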
Efficiency
Loss and training
- loss = linear combination of focal loss and dice loss
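The two terms can be sketched in numpy as below. The formulas are the standard binary focal loss and Dice loss; the 20:1 focal-to-dice weighting matches the ratio stated in the paper, while `alpha`/`gamma` are the usual focal-loss defaults, assumed here:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-6):
    """Binary focal loss (Lin et al.): down-weights easy pixels."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)          # prob of the true class
    a = np.where(y == 1, alpha, 1 - alpha)
    return float((-a * (1 - pt) ** gamma * np.log(pt)).mean())

def dice_loss(p, y, eps=1e-6):
    """1 - Dice coefficient: penalizes region-overlap mismatch."""
    inter = (p * y).sum()
    return float(1 - (2 * inter + eps) / (p.sum() + y.sum() + eps))

def segmentation_loss(p, y, w_focal=20.0, w_dice=1.0):
    # Linear combination, weighted 20:1 focal to dice.
    return w_focal * focal_loss(p, y) + w_dice * dice_loss(p, y)

y = np.zeros((4, 4)); y[1:3, 1:3] = 1.0
perfect = segmentation_loss(y, y)                 # near zero
rough = segmentation_loss(np.full((4, 4), 0.5), y)  # much larger
```

Focal loss handles the heavy foreground/background class imbalance per pixel, while dice loss directly optimizes region overlap; combining them is common in segmentation.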