Overview

The CocoEvaluator class provides standard COCO evaluation metrics (AP, AR) for segmentation and detection tasks with distributed training support.

CocoEvaluator

Class Initialization

from sam3.eval.coco_eval import CocoEvaluator

evaluator = CocoEvaluator(
    coco_gt,
    iou_types=["segm"],
    useCats=False,
    dump_dir=None,
    postprocessor=None,
    average_by_rarity=False,
    use_normalized_areas=True,
    maxdets=[1, 10, 100],
    exhaustive_only=False,
    all_exhaustive_only=True
)

Parameters

coco_gt
COCO | list[COCO]
required
COCO API object(s) containing ground truth annotations. Pass a single COCO object, or a list of COCO objects for oracle evaluation.
iou_types
list[str]
required
Types of IoU to evaluate: ["segm"] for masks, ["bbox"] for boxes, or both.
useCats
bool
required
Whether to use categories for evaluation. Set False for open-vocabulary tasks.
dump_dir
str | None
required
Directory to dump predictions. If None, predictions are not saved.
postprocessor
object
required
Postprocessor module to convert model outputs to COCO format.
average_by_rarity
bool
default:"False"
Whether to compute AP separately for different object rarity buckets and average.
use_normalized_areas
bool
default:"True"
Whether object areas are normalized by image area. Affects size bucket definitions.
maxdets
list[int]
default:"[1, 10, 100]"
Cutoffs on the number of detections evaluated per image. Metrics are computed at each value in the list.
exhaustive_only
bool
default:"False"
Whether to restrict evaluation to exhaustively annotated images only.
all_exhaustive_only
bool
default:"True"
Whether to require all ground truth sources to be exhaustive (for oracle evaluation).

Methods

update

Update evaluator with model outputs.
evaluator.update(
    model_outputs,
    targets,
    image_ids
)

synchronize_between_processes

Synchronize predictions across distributed processes.
evaluator.synchronize_between_processes()

accumulate

Accumulate evaluation results.
evaluator.accumulate(imgIds=None)

summarize

Compute and print summary metrics.
results = evaluator.summarize()
results
dict
Dictionary containing COCO metrics:
  • coco_eval_masks_AP: Mask AP (averaged over IoU thresholds)
  • coco_eval_masks_AP_50: Mask AP @ IoU=0.5
  • coco_eval_masks_AP_75: Mask AP @ IoU=0.75
  • coco_eval_masks_AP_{size}: AP by size (tiny/small/medium/large/huge)
  • coco_eval_masks_AR: Average Recall
  • Similar metrics for bbox if enabled
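
Because the returned dictionary is flat, grouping its entries by IoU type can make reporting easier. The following is a minimal sketch, assuming the keys follow the `coco_eval_masks_*` and `coco_eval_bbox_*` patterns listed above. `group_metrics` is a hypothetical helper (not part of sam3), and the sample values are made up for illustration.

```python
def group_metrics(results, prefixes=("coco_eval_masks_", "coco_eval_bbox_")):
    """Split a flat metrics dict into one sub-dict per key prefix."""
    grouped = {prefix: {} for prefix in prefixes}
    for key, value in results.items():
        for prefix in prefixes:
            if key.startswith(prefix):
                # Strip the prefix so only the metric name remains (e.g. "AP_50").
                grouped[prefix][key[len(prefix):]] = value
                break
    return grouped


# Illustrative values only, not real evaluation results.
sample = {
    "coco_eval_masks_AP": 0.41,
    "coco_eval_masks_AP_50": 0.62,
    "coco_eval_bbox_AP": 0.45,
}
grouped = group_metrics(sample)
for name, value in grouped["coco_eval_masks_"].items():
    print(f"mask {name}: {value:.3f}")
```

Keys that match none of the prefixes are silently skipped, which keeps the helper tolerant of extra entries in the dictionary.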

compute_synced

Run full evaluation pipeline (sync + accumulate + summarize).
results = evaluator.compute_synced()

Example Usage

Basic Evaluation

from pycocotools.coco import COCO
from sam3.eval.coco_eval import CocoEvaluator

# Load ground truth
coco_gt = COCO("annotations.json")

# Initialize evaluator
evaluator = CocoEvaluator(
    coco_gt=coco_gt,
    iou_types=["segm"],
    useCats=False,  # Open-vocabulary
    dump_dir="./predictions",
    postprocessor=my_postprocessor
)

# During evaluation loop
for batch in dataloader:
    outputs = model(batch)
    evaluator.update(outputs, batch["targets"], batch["image_ids"])

# Compute final metrics
results = evaluator.compute_synced()

print(f"Mask AP: {results['coco_eval_masks_AP']:.3f}")
print(f"Mask AP50: {results['coco_eval_masks_AP_50']:.3f}")
print(f"Mask AP75: {results['coco_eval_masks_AP_75']:.3f}")

Distributed Training

import torch.distributed as dist

# Initialize evaluator on all ranks
evaluator = CocoEvaluator(
    coco_gt=coco_gt,
    iou_types=["segm"],
    useCats=True,
    dump_dir="./predictions",
    postprocessor=postprocessor
)

# Each rank processes its data
for batch in dataloader:
    outputs = model(batch)
    evaluator.update(outputs, batch["targets"], batch["image_ids"])

# Synchronize across ranks
evaluator.synchronize_between_processes()

# Only rank 0 accumulates results, then computes and prints metrics
if dist.get_rank() == 0:
    evaluator.accumulate()
    results = evaluator.summarize()

Box and Mask Evaluation

# Evaluate both boxes and masks
evaluator = CocoEvaluator(
    coco_gt=coco_gt,
    iou_types=["bbox", "segm"],
    useCats=True,
    dump_dir=None,
    postprocessor=postprocessor
)

# ... run evaluation ...

results = evaluator.compute_synced()

print(f"Box AP: {results['coco_eval_bbox_AP']:.3f}")
print(f"Mask AP: {results['coco_eval_masks_AP']:.3f}")

Custom Max Detections

# Evaluate with different max detection thresholds
evaluator = CocoEvaluator(
    coco_gt=coco_gt,
    iou_types=["segm"],
    useCats=False,
    dump_dir=None,
    postprocessor=postprocessor,
    maxdets=[1, 10, 300]  # Custom thresholds
)
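
To illustrate what a max-detections cutoff means: in pycocotools-style evaluation, only the highest-scoring detections up to the cutoff are considered for each image (and for each category when categories are used). The sketch below shows that truncation on plain Python lists. `keep_top_k` is a hypothetical helper for illustration, not the evaluator's internal code.

```python
def keep_top_k(scores, k):
    """Return indices of the k highest-scoring detections, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Four detections in one image; a cutoff larger than the number of
# detections simply keeps all of them.
scores = [0.2, 0.9, 0.5, 0.7]
for k in [1, 10, 300]:
    print(k, keep_top_k(scores, k))
```

With only four detections, the cutoffs 10 and 300 select the same set, so raising `maxdets` changes metrics only for images with more detections than the smaller cutoff.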

Normalized Areas

# When object areas are normalized by image area
evaluator = CocoEvaluator(
    coco_gt=coco_gt,
    iou_types=["segm"],
    useCats=False,
    dump_dir=None,
    postprocessor=postprocessor,
    use_normalized_areas=True  # Adjusts size buckets
)

# Size buckets become:
# - tiny: [0, 0.001]
# - small: [0.001, 0.01]
# - medium: [0.01, 0.1]
# - large: [0.1, 0.5]
# - huge: [0.5, 0.95]
# - whole_image: [0.95, inf]
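
The bucket boundaries above can be turned into a small lookup. This is a sketch under assumptions: it treats each range as half-open (`low <= area < high`), whereas the listing above shows closed ranges, so behavior exactly on a boundary may differ from the evaluator. `size_bucket` is a hypothetical helper, not part of sam3.

```python
# (name, low, high) using the normalized-area ranges listed above.
SIZE_BUCKETS = [
    ("tiny", 0.0, 0.001),
    ("small", 0.001, 0.01),
    ("medium", 0.01, 0.1),
    ("large", 0.1, 0.5),
    ("huge", 0.5, 0.95),
    ("whole_image", 0.95, float("inf")),
]


def size_bucket(mask_area, image_width, image_height):
    """Map a pixel area to a size bucket via its fraction of the image area."""
    normalized = mask_area / (image_width * image_height)
    for name, low, high in SIZE_BUCKETS:
        if low <= normalized < high:  # half-open interval: an assumption
            return name
    return None  # only reached for negative areas


# A 5000-pixel mask in a 640x480 image covers about 1.6% of the image.
print(size_bucket(5000, 640, 480))
```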

Metrics Explained

Average Precision (AP)

AP - Mean AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05
AP_50 - AP at IoU threshold 0.5 (loose localization)
AP_75 - AP at IoU threshold 0.75 (strict localization)
AP_{size} - AP for specific object sizes:
  • tiny: Very small objects (area < 0.1% of image)
  • small: Small objects (0.1% - 1% of image)
  • medium: Medium objects (1% - 10% of image)
  • large: Large objects (10% - 50% of image)
  • huge: Very large objects (50% - 95% of image)
  • whole_image: Nearly entire image (> 95%)
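
As a concrete illustration of the IoU thresholds, the sketch below generates the ten thresholds from 0.5 to 0.95 and checks which of them a single prediction satisfies. It uses boxes rather than masks to stay dependency-free. Both function names are hypothetical, and this is not the evaluator's internal matching code.

```python
def iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """Return the evenly spaced IoU thresholds from start to stop inclusive."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


def box_iou_xyxy(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


thresholds = iou_thresholds()
# Prediction covers 80 of the 100 ground-truth pixels and nothing else.
iou = box_iou_xyxy([0, 0, 10, 10], [0, 0, 10, 8])
matched_at = [t for t in thresholds if iou >= t]
print(f"IoU={iou:.2f}, counts as a match at {len(matched_at)} of {len(thresholds)} thresholds")
```

A prediction with IoU 0.8 contributes as a true positive to AP_50 and AP_75 but is a false positive at the stricter thresholds, which is why AP is typically lower than AP_50.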

Average Recall (AR)

AR - Mean recall at the max detections threshold
AR_50 - AR at maxDets=50 (if maxdets includes 50)
AR_75 - AR at maxDets=75 (if maxdets includes 75)
AR_{size} - Recall by object size

Postprocessor Requirements

The postprocessor must implement:
class MyPostprocessor:
    def process_results(self, outputs, targets, image_ids):
        """
        Convert model outputs to COCO prediction format.
        
        Returns:
            dict: {image_id: {"masks": ..., "boxes": ..., "scores": ..., "labels": ...}}
        """
        predictions = {}
        for img_id, output in zip(image_ids, outputs):
            predictions[img_id] = {
                "masks": output["masks"],  # (N, H, W) binary masks
                "boxes": output["boxes"],  # (N, 4) boxes in XYXY format
                "scores": output["scores"],  # (N,) confidence scores
                "labels": output["labels"],  # (N,) category IDs
            }
        return predictions
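
When writing a custom postprocessor, a quick shape check on its output can catch mistakes before they reach the evaluator. The following is a minimal sketch based only on the shapes described in the comments above (N masks, N boxes in XYXY, N scores, N labels). `check_prediction` is a hypothetical helper, and plain lists stand in for tensors.

```python
def check_prediction(pred):
    """Return a list of problems found in one image's prediction dict."""
    required = ("masks", "boxes", "scores", "labels")
    missing = [key for key in required if key not in pred]
    if missing:
        return [f"missing keys: {missing}"]

    problems = []
    n = len(pred["scores"])
    # Every field must describe the same N detections.
    for key in ("masks", "boxes", "labels"):
        if len(pred[key]) != n:
            problems.append(f"{key} has {len(pred[key])} entries, expected {n}")
    for i, box in enumerate(pred["boxes"]):
        if len(box) != 4:
            problems.append(f"box {i} does not have 4 coordinates")
            continue
        x1, y1, x2, y2 = box
        if x2 < x1 or y2 < y1:
            problems.append(f"box {i} is not in XYXY order")
    return problems


good = {"masks": [[[0, 1], [1, 1]]], "boxes": [[0, 0, 2, 2]], "scores": [0.9], "labels": [1]}
bad = {"masks": [], "boxes": [[5, 5, 1, 1]], "scores": [0.9], "labels": [1]}
print(check_prediction(good))
print(check_prediction(bad))
```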

COCO Format Requirements

Ground Truth

{
  "images": [
    {"id": 1, "width": 640, "height": 480, "file_name": "image.jpg"}
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "segmentation": {"size": [480, 640], "counts": "..."},  // RLE
      "area": 5000,
      "bbox": [x, y, w, h],
      "iscrowd": 0
    }
  ],
  "categories": [
    {"id": 1, "name": "person", "supercategory": "person"}
  ]
}
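
A ground truth file in this layout can be sanity-checked before it is passed to `COCO(...)`. The sketch below verifies that every annotation refers to a known image and category and has a four-element `bbox`. `check_ground_truth` is a hypothetical helper, and the sample dictionary mirrors the structure above with made-up values.

```python
def check_ground_truth(gt):
    """Return a list of referential problems in a COCO-style ground truth dict."""
    image_ids = {img["id"] for img in gt["images"]}
    category_ids = {cat["id"] for cat in gt["categories"]}
    problems = []
    for ann in gt["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        if len(ann["bbox"]) != 4:
            problems.append(f"annotation {ann['id']}: bbox must be [x, y, w, h]")
    return problems


# Made-up sample: the second annotation points at an image that does not exist.
gt = {
    "images": [{"id": 1, "width": 640, "height": 480, "file_name": "image.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50], "area": 5000, "iscrowd": 0},
        {"id": 2, "image_id": 7, "category_id": 1, "bbox": [0, 0, 5, 5], "area": 25, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
}
print(check_ground_truth(gt))
```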

Predictions

Predictions are automatically converted to:
[
  {
    "image_id": 1,
    "category_id": 1,
    "segmentation": {"size": [480, 640], "counts": "..."},
    "score": 0.95,
    "area": 5000
  }
]
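
For the box case, the standard COCO results format uses a `bbox` field in `[x, y, w, h]` order instead of `segmentation`. The sketch below shows roughly how a per-image prediction dictionary could be flattened into such records, including the XYXY to XYWH conversion. It illustrates the target format only and is not the evaluator's actual conversion code; `to_coco_box_results` is a hypothetical name.

```python
def to_coco_box_results(predictions):
    """Flatten {image_id: {"boxes", "scores", "labels"}} into COCO result records."""
    records = []
    for image_id, pred in predictions.items():
        for box, score, label in zip(pred["boxes"], pred["scores"], pred["labels"]):
            x1, y1, x2, y2 = box
            records.append({
                "image_id": image_id,
                "category_id": label,
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # XYXY -> XYWH
                "score": score,
            })
    return records


# Made-up prediction for a single image.
predictions = {1: {"boxes": [[10, 20, 110, 70]], "scores": [0.95], "labels": [1]}}
records = to_coco_box_results(predictions)
print(records)
```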

Notes

  • Uses pycocotools internally
  • Supports distributed evaluation across multiple GPUs
  • Predictions can be dumped to disk for later analysis
  • Size buckets automatically adjusted for normalized areas
  • Compatible with COCO, LVIS, and custom datasets in COCO format
  • For open-vocabulary tasks, set useCats=False
