SA-Co/Silver is a large-scale, diverse benchmark for promptable concept segmentation (PCS) in images. Unlike SA-Co/Gold, each datapoint has a single ground-truth annotation. The benchmark covers 10 domains, from food to robotics to underwater imagery.
Overview
The benchmark contains images paired with noun phrases (NPs), each exhaustively annotated with masks for all object instances matching the phrase. SA-Co/Silver comprises 10 subsets covering diverse visual domains.
Dataset Composition
10 Annotation Domains
BDD100k - Driving Scenes
Urban driving scenarios from the BDD100K (Berkeley DeepDrive) dataset
- 5,546 image-NP pairs
- 13,210 image-NP-masks
- Domain: Autonomous driving
DROID - Robotics
Robot manipulation scenarios from diverse environments
- 9,445 image-NP pairs
- 11,098 image-NP-masks
- Domain: Robotics and manipulation
Ego4D - Egocentric Video
First-person perspective frames from daily activities
- 12,608 image-NP pairs
- 24,049 image-NP-masks
- Domain: Egocentric vision
MyFoodRepo-273 - Food Recognition
Food dishes and ingredients
- 20,985 image-NP pairs
- 28,347 image-NP-masks
- Domain: Food recognition
GeoDE - Geographic Diversity
Images from geographically diverse locations worldwide
- 14,850 image-NP pairs
- 7,570 image-NP-masks
- Domain: Geographic diversity
iNaturalist-2017 - Wildlife
Natural world observations of plants and animals
- 1,439,051 image-NP pairs
- 48,899 image-NP-masks
- Domain: Biodiversity and nature
National Gallery of Art - Art
Artworks from the National Gallery of Art collection
- 22,294 image-NP pairs
- 18,991 image-NP-masks
- Domain: Art and cultural heritage
SA-V - General Video
Diverse video frames from the Segment Anything Video (SA-V) dataset
- 18,337 image-NP pairs
- 39,683 image-NP-masks
- Domain: General video understanding
YT-Temporal-1B - YouTube
Frames from YouTube videos across various categories
- 7,816 image-NP pairs
- 12,221 image-NP-masks
- Domain: Web video
Fathomnet - Underwater
Marine life and underwater environments
- 287,193 image-NP pairs
- 14,174 image-NP-masks
- Domain: Marine biology
Statistics Table
| Domain | # Image-NPs | # Image-NP-Masks |
|---|---|---|
| BDD100k | 5,546 | 13,210 |
| DROID | 9,445 | 11,098 |
| Ego4D | 12,608 | 24,049 |
| MyFoodRepo-273 | 20,985 | 28,347 |
| GeoDE | 14,850 | 7,570 |
| iNaturalist-2017 | 1,439,051 | 48,899 |
| National Gallery of Art | 22,294 | 18,991 |
| SA-V | 18,337 | 39,683 |
| YT-Temporal-1B | 7,816 | 12,221 |
| Fathomnet | 287,193 | 14,174 |
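The counts in the table can be aggregated with a few lines of Python. The sketch below copies the numbers from the table above and computes the totals plus the average number of masks per image-NP pair in each domain. The names `SILVER_STATS` and `summarize` are illustrative and are not part of any SA-Co tooling.

```python
# (image-NP pairs, image-NP-masks) per domain, copied from the statistics table.
SILVER_STATS = {
    "BDD100k": (5_546, 13_210),
    "DROID": (9_445, 11_098),
    "Ego4D": (12_608, 24_049),
    "MyFoodRepo-273": (20_985, 28_347),
    "GeoDE": (14_850, 7_570),
    "iNaturalist-2017": (1_439_051, 48_899),
    "National Gallery of Art": (22_294, 18_991),
    "SA-V": (18_337, 39_683),
    "YT-Temporal-1B": (7_816, 12_221),
    "Fathomnet": (287_193, 14_174),
}


def summarize(stats):
    """Return total pairs, total masks, and masks per pair for each domain."""
    total_pairs = sum(pairs for pairs, _ in stats.values())
    total_masks = sum(masks for _, masks in stats.values())
    masks_per_pair = {name: masks / pairs for name, (pairs, masks) in stats.items()}
    return total_pairs, total_masks, masks_per_pair


total_pairs, total_masks, masks_per_pair = summarize(SILVER_STATS)
print(f"total image-NP pairs: {total_pairs:,}")
print(f"total image-NP-masks: {total_masks:,}")
for name, ratio in sorted(masks_per_pair.items(), key=lambda kv: kv[1]):
    print(f"{name:<26} {ratio:.3f} masks per pair")
```

Domains where the ratio falls below 1 (GeoDE, iNaturalist-2017, National Gallery of Art, Fathomnet) have more image-NP pairs than masks, which suggests that many of their pairs have no matching instance in the image. That reading is an inference from the counts, not something the table states.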
Download Dataset
Annotations
Download the ground-truth (GT) annotations from:
- Hugging Face: facebook/SACo-Silver
- Roboflow: sa-co-silver
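For the Hugging Face option, one possible way to fetch the files is the `huggingface_hub` client. This is a minimal sketch, not the project's official download procedure: it assumes `facebook/SACo-Silver` is hosted as a dataset repository (hence `repo_type="dataset"`), that `huggingface_hub` is installed, and that any access terms on the repository have been accepted. The function name and the default `local_dir` are hypothetical.

```python
from pathlib import Path

# Repository name as listed above.
REPO_ID = "facebook/SACo-Silver"


def download_annotations(local_dir="saco_silver_annotations"):
    """Download the annotation repository and return the local path.

    Assumes the repository is a Hugging Face *dataset* repo; adjust
    repo_type if that assumption turns out to be wrong.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_annotations()` requires network access and returns the directory that holds the downloaded files. The layout inside that directory is not described here, so inspect it before wiring it into an evaluation script.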
Images and Frames
Each domain has different download instructions:
- Image Datasets
- Frame Datasets
Annotation Format
The annotation format is identical to SA-Co/Gold and is derived from the COCO format.
Example from DROID Domain
Images
Annotations
For detailed field descriptions, see the SA-Co/Gold annotation format, which is identical.
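Because the format derives from COCO, a file can be read with the standard `json` module and its annotations grouped by image. The sketch below relies only on the generic COCO keys (`images`, `annotations`, `id`, `image_id`). The miniature `example` dictionary is invented for illustration and does not reproduce a real SA-Co/Silver record, and any SA-Co-specific fields are left out because they are documented with SA-Co/Gold.

```python
import json
from collections import defaultdict


def group_annotations_by_image(coco):
    """Map each image id to the list of annotations that reference it."""
    by_image = defaultdict(list)
    for ann in coco.get("annotations", []):
        by_image[ann["image_id"]].append(ann)
    # Images without any annotation map to an empty list.
    return {img["id"]: by_image.get(img["id"], []) for img in coco.get("images", [])}


# Hypothetical miniature file using only generic COCO keys.
example = {
    "images": [
        {"id": 1, "file_name": "frame_a.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "frame_b.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "bbox": [10, 20, 50, 40], "area": 2000},
        {"id": 11, "image_id": 1, "bbox": [100, 80, 30, 30], "area": 900},
    ],
}

# Round-trip through JSON to mimic reading the structure from disk.
grouped = group_annotations_by_image(json.loads(json.dumps(example)))
for image_id, anns in grouped.items():
    print(f"image {image_id}: {len(anns)} annotation(s)")
```

To use it on a downloaded file, replace the round-trip with `json.load(open(path))`. Entries with an empty list correspond to images that have no mask for the queried phrase.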
Benchmark Results
Overall Performance
| Model | Average cgF1 | IL_MCC | pmF1 |
|---|---|---|---|
| SAM 3 | 49.57 | 0.76 | 65.17 |
| OWLv2* | 11.23 | 0.32 | 31.18 |
| Gemini 2.5 | 9.67 | 0.19 | 45.51 |
| OWLv2 | 8.18 | 0.23 | 32.55 |
| LLMDet-L | 6.73 | 0.17 | 28.19 |
| gDino-T | 3.09 | 0.12 | 19.75 |
Per-Domain Results (SAM 3)
| Domain | cgF1 | IL_MCC | pmF1 |
|---|---|---|---|
| iNaturalist | 70.07 | 0.89 | 78.73 |
| National Gallery of Art | 65.80 | 0.82 | 80.67 |
| Food Recognition | 52.96 | 0.79 | 67.21 |
| Fathomnet | 51.53 | 0.86 | 59.98 |
| BDD100k | 46.61 | 0.78 | 60.13 |
| DROID | 45.58 | 0.76 | 60.35 |
| YT-Temporal-1B | 42.07 | 0.72 | 58.36 |
| Ego4D | 38.64 | 0.62 | 62.56 |
| SA-V | 38.06 | 0.66 | 57.62 |
| GeoDE | 44.36 | 0.67 | 66.05 |
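The SAM 3 row in the overall table appears to be the unweighted mean of the ten per-domain rows. The sketch below recomputes it from the numbers above as a sanity check. Treating the average as an unweighted mean is an assumption inferred from the figures, and `SAM3_PER_DOMAIN` and `macro_average` are illustrative names.

```python
# (cgF1, IL_MCC, pmF1) per domain, copied from the per-domain table.
SAM3_PER_DOMAIN = {
    "iNaturalist": (70.07, 0.89, 78.73),
    "National Gallery of Art": (65.80, 0.82, 80.67),
    "Food Recognition": (52.96, 0.79, 67.21),
    "Fathomnet": (51.53, 0.86, 59.98),
    "BDD100k": (46.61, 0.78, 60.13),
    "DROID": (45.58, 0.76, 60.35),
    "YT-Temporal-1B": (42.07, 0.72, 58.36),
    "Ego4D": (38.64, 0.62, 62.56),
    "SA-V": (38.06, 0.66, 57.62),
    "GeoDE": (44.36, 0.67, 66.05),
}


def macro_average(rows):
    """Unweighted mean of each metric column across domains."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows.values()) / n for i in range(3))


avg_cgf1, avg_il_mcc, avg_pmf1 = macro_average(SAM3_PER_DOMAIN)
print(f"cgF1={avg_cgf1:.2f}  IL_MCC={avg_il_mcc:.2f}  pmF1={avg_pmf1:.2f}")
```

The rounded means agree with the SAM 3 row of the overall table (49.57, 0.76, 65.17), so each domain seems to contribute equally regardless of its size.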
Visualization
View examples from the dataset:
Offline Evaluation
If you have predictions in COCO result format:
Next Steps
Run Evaluations
Learn how to evaluate SAM 3 on SA-Co/Silver
SA-Co/VEval
Explore the video benchmark