Segment Anything
huyi / May 2023
Abstract
- segment anything
- new task
- new model
- new dataset (largest segmentation dataset: 1 billion masks on 11 million images)
- impressive zero-shot performance (promptable?)
Introduction
- task: prompt segmentation
- model:
- flexible prompts $\rightarrow$ point, box, mask prompts & free-form text prompt
- ambiguity $\rightarrow$ predict multiple masks for a single prompt
- data engine:
- data engine: co-develop our model with model-in-the-loop dataset annotation
- 3 stages:
- assisted-manual
- semi-automatic (subset of objects with local prompts)
- fully automatic (all objects with global prompt)
- responsible AI
- experiments
Task: Segment Anything
promptable segmentation. A prompt is a set of foreground / background points, a rough box or mask, free-form text, or, in general, any information indicating what to segment in an image.
Model: SAM
Image encoder
ViT
Prompt encoder
- points and boxes: positional encodings + prompt type embedding
- text: CLIP's text encoder
- masks: embedded by convolutions, then summed element-wise with the image embedding
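The sparse-prompt encoding above can be sketched as follows — a minimal numpy sketch assuming random Fourier features for the positional encoding plus a learned per-type embedding (the function name, the 256-d size, and the zero-initialized type table are illustrative assumptions, not SAM's exact implementation):

```python
import numpy as np

def fourier_point_encoding(points, dim=256, scale=1.0, seed=0):
    """Encode normalized (x, y) coordinates with random Fourier features,
    a common choice of positional encoding (a sketch, not SAM's exact one).

    points: (N, 2) array with coordinates in [0, 1].
    Returns an (N, dim) embedding.
    """
    rng = np.random.default_rng(seed)
    # Random Gaussian projection, fixed after initialization.
    B = rng.normal(0.0, scale, size=(2, dim // 2))
    proj = 2 * np.pi * points @ B                       # (N, dim/2)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

# A learned embedding per prompt type (e.g. foreground vs background
# point) is added on top; here it is just a placeholder table.
type_embed = np.zeros((2, 256))   # hypothetical: row 0 = fg, row 1 = bg

pts = np.array([[0.25, 0.5], [0.9, 0.1]])   # one fg point, one bg point
emb = fourier_point_encoding(pts) + type_embed[[0, 1]]   # (2, 256)
```

A box prompt would be encoded the same way, as the embeddings of its two corner points with their own type embeddings.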
Mask decoder
- prompt self-attention
- cross-attention (image ↔ prompt, both directions)
- upsampling + MLP → mask foreground probability at each image location
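The token-to-image attention step can be illustrated with a toy single-head attention in numpy. This is a sketch of one direction only; the real decoder interleaves token self-attention, two-way cross-attention, and MLPs, all with learned projection matrices that are omitted here:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(tokens, image_emb):
    """Single-head scaled dot-product attention: prompt/output tokens
    (queries) attend to the flattened image embedding (keys/values)."""
    d = tokens.shape[-1]
    attn = softmax(tokens @ image_emb.T / np.sqrt(d))   # (T, H*W)
    return attn @ image_emb                             # (T, d)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 64))            # prompt + output tokens
image_emb = rng.normal(size=(16 * 16, 64))   # flattened spatial embedding
out = cross_attention(tokens, image_emb)     # updated tokens, (5, 64)
```

The updated output tokens are then mapped by an MLP and combined with the upsampled image embedding to score foreground probability per pixel.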
Resolving ambiguity
- 3 output masks (whole, part, subpart), each with a confidence score
- backprop only the minimum loss over the 3 masks
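The min-loss selection can be sketched as follows; plain binary cross-entropy stands in for the paper's focal + dice loss, and the array shapes are illustrative:

```python
import numpy as np

def min_loss_over_masks(pred_masks, gt_mask, eps=1e-6):
    """Given K candidate masks for one prompt, compute a per-mask loss
    and return (best_index, min_loss); only the minimum would be
    backpropagated.

    pred_masks: (K, H, W) predicted foreground probabilities.
    gt_mask:    (H, W) binary ground truth.
    """
    p = np.clip(pred_masks, eps, 1 - eps)
    bce = -(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p))
    losses = bce.mean(axis=(1, 2))       # one scalar loss per candidate
    k = int(np.argmin(losses))
    return k, float(losses[k])

gt = np.zeros((8, 8)); gt[2:6, 2:6] = 1.0   # "part"-level ground truth
cands = np.stack([
    np.full((8, 8), 0.5),                   # vague whole-image guess
    np.where(gt == 1, 0.9, 0.1),            # matches the part
    np.full((8, 8), 0.1),                   # near-empty subpart guess
])
best, loss = min_loss_over_masks(cands, gt)  # best == 1
```

Training only the best-matching head lets the model keep valid alternative interpretations instead of averaging them into one blurry mask.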
Efficiency
Loss and training
- loss = linear combination of focal loss and dice loss
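The two terms can be sketched in numpy as below. The formulas are the standard binary focal loss and Dice loss; the 20:1 focal-to-dice weighting matches the ratio stated in the paper, while `alpha`/`gamma` are the usual focal-loss defaults, assumed here:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-6):
    """Binary focal loss (Lin et al.): down-weights easy pixels."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)          # prob of the true class
    a = np.where(y == 1, alpha, 1 - alpha)
    return float((-a * (1 - pt) ** gamma * np.log(pt)).mean())

def dice_loss(p, y, eps=1e-6):
    """1 - Dice coefficient: penalizes region-overlap mismatch."""
    inter = (p * y).sum()
    return float(1 - (2 * inter + eps) / (p.sum() + y.sum() + eps))

def segmentation_loss(p, y, w_focal=20.0, w_dice=1.0):
    # Linear combination, weighted 20:1 focal to dice.
    return w_focal * focal_loss(p, y) + w_dice * dice_loss(p, y)

y = np.zeros((4, 4)); y[1:3, 1:3] = 1.0
perfect = segmentation_loss(y, y)                 # near zero
rough = segmentation_loss(np.full((4, 4), 0.5), y)  # much larger
```

Focal loss handles the heavy foreground/background class imbalance per pixel, while dice loss directly optimizes region overlap; combining them is common in segmentation.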