SAM 3 uses Hydra for configuration management, with YAML files defining all training parameters. This page documents the configuration structure and key options.
Configuration Structure
Training configs are located in sam3/train/configs/ and follow this hierarchy:
```
configs/
├── eval_base.yaml  # Base configuration
├── roboflow_v100/
│   └── roboflow_v100_full_ft_100_images.yaml
├── gold_image_evals/
└── silver_image_evals/
```
Basic Configuration Example
Here’s a minimal training configuration:
```yaml
# @package _global_
defaults:
  - _self_

paths:
  dataset_root: /path/to/dataset
  experiment_log_dir: /path/to/experiments
  bpe_path: /path/to/bpe_simple_vocab_16e6.txt.gz

launcher:
  num_nodes: 1
  gpus_per_node: 4
  experiment_log_dir: ${paths.experiment_log_dir}

submitit:
  use_cluster: False
  timeout_hour: 72
  cpus_per_task: 10
  port_range: [10000, 65000]

trainer:
  _target_: sam3.train.trainer.Trainer
  max_epochs: 20
  mode: train
  accelerator: cuda
  seed_value: 123
```
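Values such as ${paths.experiment_log_dir} are OmegaConf interpolations: they are looked up in the full config, so launcher.experiment_log_dir always follows paths.experiment_log_dir. The sketch below is a simplified, stdlib-only stand-in for that lookup, assuming absolute dotted keys only (no relative keys, custom resolvers, escaping, or cycle detection); resolve is a hypothetical helper, not part of Hydra or OmegaConf.

```python
import re

# Matches ${a.b.c}: a small subset of OmegaConf's interpolation syntax.
_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def _lookup(cfg: dict, dotted: str):
    """Follow an absolute dotted key such as 'paths.dataset_root' from the root."""
    node = cfg
    for part in dotted.split("."):
        node = node[part]
    return node


def _resolve_node(cfg: dict, node):
    if isinstance(node, dict):
        return {key: _resolve_node(cfg, value) for key, value in node.items()}
    if isinstance(node, list):
        return [_resolve_node(cfg, value) for value in node]
    if isinstance(node, str):
        whole = _REF.fullmatch(node)
        if whole:  # the value is a single reference: keep the target's type
            return _resolve_node(cfg, _lookup(cfg, whole.group(1)))
        # references embedded in a longer string are substituted as text
        return _REF.sub(
            lambda m: str(_resolve_node(cfg, _lookup(cfg, m.group(1)))), node)
    return node  # numbers, booleans, None


def resolve(cfg: dict) -> dict:
    """Return a new config with every ${...} reference replaced (no cycle check)."""
    return _resolve_node(cfg, cfg)


config = {
    "paths": {"experiment_log_dir": "/path/to/experiments", "checkpoint_path": None},
    "launcher": {"experiment_log_dir": "${paths.experiment_log_dir}"},
    "trainer": {"checkpoint": {"save_dir": "${launcher.experiment_log_dir}/checkpoints"}},
}
resolved = resolve(config)
print(resolved["trainer"]["checkpoint"]["save_dir"])  # /path/to/experiments/checkpoints
```

Chained references resolve transitively, which is why the checkpoint and logging directories later on this page can point at launcher.experiment_log_dir rather than repeating the path.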
Configuration Sections
Paths
Define dataset and output paths:
```yaml
paths:
  # Dataset location
  dataset_root: /path/to/dataset
  # Experiment outputs (logs, checkpoints)
  experiment_log_dir: /path/to/experiments/run_001
  # BPE tokenizer for text encoding
  bpe_path: /path/to/bpe_simple_vocab_16e6.txt.gz
  # Pretrained checkpoint (optional)
  checkpoint_path: null  # Auto-downloads if null
```
Launcher
Configure distributed training resources:
```yaml
launcher:
  num_nodes: 1  # Number of compute nodes
  gpus_per_node: 4  # GPUs per node
  experiment_log_dir: ${paths.experiment_log_dir}
  multiprocessing_context: forkserver
```
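With num_nodes: 1 and gpus_per_node: 4 the job runs four training processes. Assuming the usual one-process-per-GPU layout of PyTorch distributed training (not confirmed here for the sam3 launcher specifically), the world size and each process's global rank follow directly from these two fields. Both helpers are hypothetical.

```python
def world_size(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of training processes, one per GPU."""
    return num_nodes * gpus_per_node


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Rank of one process across all nodes (node-major ordering)."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank must be in [0, gpus_per_node)")
    return node_rank * gpus_per_node + local_rank


print(world_size(num_nodes=1, gpus_per_node=4))  # 4
print(world_size(num_nodes=2, gpus_per_node=8))  # 16
# GPU 3 on the second node (node_rank=1) of a 2 x 8 job:
print(global_rank(node_rank=1, local_rank=3, gpus_per_node=8))  # 11
```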
Submitit (SLURM)
SLURM cluster configuration:
```yaml
submitit:
  use_cluster: True  # True for SLURM, False for local
  account: null  # SLURM account
  partition: gpu  # SLURM partition
  qos: null  # Quality of Service
  timeout_hour: 72  # Job timeout
  cpus_per_task: 10  # CPUs per task
  port_range: [10000, 65000]  # Distributed training ports
  constraint: null  # Node constraints
  mem_gb: 128  # Memory per node (optional)
```
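When use_cluster is True, these fields describe a SLURM allocation. As a hedged sketch of how they could translate into keyword arguments for submitit's AutoExecutor.update_parameters (the mapping itself is an assumption, not the sam3 launcher's code), note that submitit takes the timeout in minutes and expects SLURM-only options with a slurm_ prefix; null fields are simply left out. to_submitit_params is a hypothetical helper.

```python
def to_submitit_params(submitit_cfg: dict, num_nodes: int, gpus_per_node: int) -> dict:
    """Build kwargs for submitit.AutoExecutor.update_parameters (sketch)."""
    params = {
        "nodes": num_nodes,
        "gpus_per_node": gpus_per_node,
        "tasks_per_node": gpus_per_node,  # assumption: one task (process) per GPU
        "cpus_per_task": submitit_cfg["cpus_per_task"],
        "timeout_min": submitit_cfg["timeout_hour"] * 60,  # submitit uses minutes
    }
    # submitit parameter name -> key in the YAML submitit section
    optional = {
        "mem_gb": "mem_gb",
        "slurm_partition": "partition",
        "slurm_account": "account",
        "slurm_qos": "qos",
        "slurm_constraint": "constraint",
    }
    for submitit_key, cfg_key in optional.items():
        value = submitit_cfg.get(cfg_key)
        if value is not None:  # YAML null: keep submitit's default
            params[submitit_key] = value
    return params


cfg = {"use_cluster": True, "account": None, "partition": "gpu", "qos": None,
       "timeout_hour": 72, "cpus_per_task": 10, "port_range": [10000, 65000],
       "constraint": None, "mem_gb": 128}
params = to_submitit_params(cfg, num_nodes=1, gpus_per_node=4)
print(params["timeout_min"])  # 4320
print(sorted(k for k in params if k.startswith("slurm_")))  # ['slurm_partition']
```

With submitit installed, the result would be applied as executor.update_parameters(**params) on a submitit.AutoExecutor.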
Trainer
Main training parameters:
```yaml
trainer:
  _target_: sam3.train.trainer.Trainer
  # Training duration
  max_epochs: 20
  # Mode: train, val, or train_only
  mode: train
  # Hardware
  accelerator: cuda
  # Reproducibility
  seed_value: 123
  # Validation frequency
  val_epoch_freq: 10
  # Skip first validation
  skip_first_val: True
  # Checkpoint management
  skip_saving_ckpts: false
  # Memory management
  empty_gpu_mem_cache_after_eval: True
  # Gradient accumulation
  gradient_accumulation_steps: 1
```
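gradient_accumulation_steps sums gradients over several forward/backward passes before each optimizer step, so a larger effective batch fits in the same GPU memory at the cost of more iterations. The toy example below is pure Python (a one-parameter model with mean-squared-error loss, not the sam3 training loop) and checks the property that makes this work: when each micro-batch loss is divided by the number of steps and micro-batches are equal in size, the accumulated gradient equals the full-batch gradient.

```python
def grad_mse(w: float, batch: list[tuple[float, float]]) -> float:
    """d/dw of mean((w * x - y) ** 2) over the batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: list[tuple[float, float]], steps: int) -> float:
    """Split the batch into equal micro-batches and accumulate scaled gradients."""
    if len(batch) % steps:
        raise ValueError("batch size must be divisible by steps in this sketch")
    size = len(batch) // steps
    total = 0.0
    for i in range(steps):
        micro = batch[i * size:(i + 1) * size]
        # Equivalent to calling (loss / steps).backward() on each micro-batch.
        total += grad_mse(w, micro) / steps
    return total


data = [(1.0, 2.0), (2.0, 3.5), (3.0, 6.5), (4.0, 7.0)]
full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, data, steps=4)
print(full, accum)  # -20.75 -20.75
```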
Model
Model architecture configuration:
```yaml
trainer:
  model:
    _target_: sam3.model_builder.build_sam3_image_model
    bpe_path: ${paths.bpe_path}
    device: cpus  # Load on CPU first, move to GPU later
    eval_mode: false
    enable_segmentation: True  # Enable mask prediction
    checkpoint_path: ${paths.checkpoint_path}
```
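Each _target_ key names a Python callable; Hydra's hydra.utils.instantiate imports it and calls it with the sibling keys as keyword arguments, building nested _target_ nodes first. A stripped-down, stdlib-only imitation of that behavior follows. It ignores Hydra features such as _partial_, _args_, and _recursive_, and mini_instantiate is a hypothetical name; with Hydra itself the model node above would be built roughly as instantiate(cfg.trainer.model).

```python
import importlib


def _locate(path: str):
    """Import 'package.module.attr' and return attr."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def mini_instantiate(node):
    """Recursively call '_target_' callables with their sibling keys as kwargs."""
    if isinstance(node, list):
        return [mini_instantiate(item) for item in node]
    if not isinstance(node, dict):
        return node
    kwargs = {k: mini_instantiate(v) for k, v in node.items() if k != "_target_"}
    if "_target_" not in node:
        return kwargs  # plain mapping without a target stays a dict
    return _locate(node["_target_"])(**kwargs)


# Standard-library targets stand in for sam3 classes so the sketch runs anywhere.
cfg = {
    "_target_": "types.SimpleNamespace",
    "name": "run_001",
    "timeout": {"_target_": "datetime.timedelta", "hours": 72},
}
obj = mini_instantiate(cfg)
print(obj.name, obj.timeout)  # run_001 3 days, 0:00:00
```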
Data
Dataset and dataloader configuration:
```yaml
trainer:
  data:
    train:
      _target_: sam3.train.data.torch_dataset.TorchDataset
      dataset:
        _target_: sam3.train.data.sam3_image_dataset.Sam3ImageDataset
        img_folder: ${paths.dataset_root}/train/
        ann_file: ${paths.dataset_root}/train/_annotations.coco.json
        transforms: ${scratch.train_transforms}
        load_segmentation: ${scratch.enable_segmentation}
        max_ann_per_img: 500000
        training: true
        limit_ids: 100  # Limit to N images (null for all)
      shuffle: True
      batch_size: ${scratch.train_batch_size}
      num_workers: ${scratch.num_train_workers}
      pin_memory: True
      drop_last: True
      collate_fn: ${scratch.collate_fn}
    val:
      _target_: sam3.train.data.torch_dataset.TorchDataset
      dataset:
        _target_: sam3.train.data.sam3_image_dataset.Sam3ImageDataset
        img_folder: ${paths.dataset_root}/test/
        ann_file: ${paths.dataset_root}/test/_annotations.coco.json
        transforms: ${scratch.val_transforms}
        training: false
      shuffle: False
      batch_size: ${scratch.val_batch_size}
      num_workers: ${scratch.num_val_workers}
```
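The number of iterations in one epoch depends on limit_ids, the batch size, drop_last, and the number of processes. Assuming the training set is sharded the way torch.utils.data.DistributedSampler does by default (each rank receives ceil(N / world_size) samples, padded so all ranks are equal), a rough per-rank count looks like this; iters_per_epoch is a hypothetical helper and the sampler sam3 actually uses may differ.

```python
import math


def iters_per_epoch(num_images: int, batch_size: int, world_size: int,
                    drop_last: bool = True) -> int:
    """Iterations each rank runs per epoch (DistributedSampler-style sharding)."""
    per_rank = math.ceil(num_images / world_size)  # padded so every rank is equal
    if drop_last:
        return per_rank // batch_size  # incomplete final batch is discarded
    return math.ceil(per_rank / batch_size)


# limit_ids: 100, train_batch_size: 1, 4 GPUs
print(iters_per_epoch(100, batch_size=1, world_size=4))  # 25
print(iters_per_epoch(100, batch_size=2, world_size=3))  # 17
```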
Data augmentation pipeline:
```yaml
scratch:
  train_transforms:
    - _target_: sam3.train.transforms.basic_for_api.ComposeAPI
      transforms:
        # Random resize with scale range
        - _target_: sam3.train.transforms.basic_for_api.RandomResizeAPI
          sizes:
            _target_: sam3.train.transforms.basic.get_random_resize_scales
            size: ${scratch.resolution}
            min_size: 480
            rounded: false
          max_size:
            _target_: sam3.train.transforms.basic.get_random_resize_max_size
            size: ${scratch.resolution}
          square: true
        # Pad to fixed size
        - _target_: sam3.train.transforms.basic_for_api.PadToSizeAPI
          size: ${scratch.resolution}
        # Convert to tensor
        - _target_: sam3.train.transforms.basic_for_api.ToTensorAPI
        # Normalize
        - _target_: sam3.train.transforms.basic_for_api.NormalizeAPI
          mean: [0.5, 0.5, 0.5]
          std: [0.5, 0.5, 0.5]
```
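With mean and std both 0.5, the per-channel (x - mean) / std maps pixel values in [0, 1] onto [-1, 1]. The sketch below shows that arithmetic on plain lists, plus the padding needed to reach a square resolution x resolution canvas after resizing. The helper names, the [0, 1] input range after ToTensorAPI, and padding on the bottom/right are assumptions, not the sam3 transform code.

```python
def normalize(pixels, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Per-channel (x - mean) / std for a list of RGB tuples in [0, 1]."""
    return [tuple((c - m) / s for c, m, s in zip(px, mean, std)) for px in pixels]


def pad_amounts(height: int, width: int, size: int = 1008) -> dict:
    """Bottom/right padding to reach a size x size canvas (assumed placement)."""
    if height > size or width > size:
        raise ValueError("image must be resized to fit within size first")
    return {"bottom": size - height, "right": size - width}


print(normalize([(0.0, 0.5, 1.0)]))  # [(-1.0, 0.0, 1.0)]
print(pad_amounts(756, 1008))        # {'bottom': 252, 'right': 0}
```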
Optimizer
Optimizer and scheduler configuration:
```yaml
trainer:
  optim:
    # Mixed precision training
    amp:
      enabled: True
      amp_dtype: bfloat16  # bfloat16 or float16
    # Optimizer
    optimizer:
      _target_: torch.optim.AdamW
    # Gradient clipping
    gradient_clip:
      _target_: sam3.train.optim.optimizer.GradientClipper
      max_norm: 0.1
      norm_type: 2
    # Learning rate schedules
    options:
      lr:
        # Transformer learning rate
        - scheduler:
            _target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
            base_lr: 0.00008  # 8e-5
            timescale: 20
            warmup_steps: 20
            cooldown_steps: 20
        # Vision backbone learning rate
        - scheduler:
            _target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
            base_lr: 0.000025  # 2.5e-5
            timescale: 20
            warmup_steps: 20
            cooldown_steps: 20
          param_names:
            - 'backbone.vision_backbone.*'
        # Language backbone learning rate
        - scheduler:
            _target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
            base_lr: 0.000005  # 5e-6
            timescale: 20
            warmup_steps: 20
            cooldown_steps: 20
          param_names:
            - 'backbone.language_backbone.*'
      # Weight decay
      weight_decay:
        - scheduler:
            _target_: fvcore.common.param_scheduler.ConstantParamScheduler
            value: 0.1
        - scheduler:
            _target_: fvcore.common.param_scheduler.ConstantParamScheduler
            value: 0.0
          param_names:
            - '*bias*'
          module_cls_names: ['torch.nn.LayerNorm']
```
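Each entry under options.lr and options.weight_decay applies its scheduler to the parameters it selects, and the entry without param_names covers the rest. The sketch below illustrates that selection under stated assumptions: Unix-style globs matched with Python's fnmatch.fnmatchcase against full parameter names, first matching entry wins, and hypothetical parameter names. Selection by module_cls_names (here, parameters owned by LayerNorm modules) is left out.

```python
from fnmatch import fnmatchcase


def assign_groups(param_names, groups):
    """Map each parameter to the first group whose globs match it.

    groups: list of (label, patterns); patterns=None marks the default group.
    """
    default = next(label for label, patterns in groups if patterns is None)
    assignment = {}
    for name in param_names:
        assignment[name] = default
        for label, patterns in groups:
            if patterns and any(fnmatchcase(name, p) for p in patterns):
                assignment[name] = label
                break
    return assignment


# Hypothetical parameter names; real sam3 module paths may differ.
params = [
    "backbone.vision_backbone.trunk.blocks.0.attn.qkv.weight",
    "backbone.language_backbone.encoder.layers.0.mlp.fc1.bias",
    "transformer.decoder.layers.0.self_attn.out_proj.weight",
]
lr_groups = [
    ("lr=8e-5", None),
    ("lr=2.5e-5", ["backbone.vision_backbone.*"]),
    ("lr=5e-6", ["backbone.language_backbone.*"]),
]
wd_groups = [("wd=0.1", None), ("wd=0.0", ["*bias*"])]

lr = assign_groups(params, lr_groups)
wd = assign_groups(params, wd_groups)
for name in params:
    print(f"{name}: {lr[name]}, {wd[name]}")
```

Note that * in these globs also matches dots, so 'backbone.vision_backbone.*' catches every parameter nested under the vision backbone.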
Loss Functions
Loss configuration for detection and segmentation:
```yaml
trainer:
  loss:
    all:
      _target_: sam3.train.loss.sam3_loss.Sam3LossWrapper
      # Matching strategy
      matcher:
        _target_: sam3.train.matcher.BinaryHungarianMatcherV2
        focal: true
        cost_class: 2.0
        cost_bbox: 5.0
        cost_giou: 2.0
        alpha: 0.25
        gamma: 2
      # One-to-many matching
      o2m_weight: 2.0
      o2m_matcher:
        _target_: sam3.train.matcher.BinaryOneToManyMatcher
        alpha: 0.3
        threshold: 0.4
        topk: 4
      # Detection losses
      loss_fns_find:
        - _target_: sam3.train.loss.loss_fns.Boxes
          weight_dict:
            loss_bbox: 5.0
            loss_giou: 2.0
        - _target_: sam3.train.loss.loss_fns.IABCEMdetr
          weight_dict:
            loss_ce: 20.0
            presence_loss: 20.0
          pos_weight: 10.0
          alpha: 0.25
          gamma: 2
      # Segmentation loss (optional)
      # loss_fns_find:
      #   - _target_: sam3.train.loss.loss_fns.Masks
      #     weight_dict:
      #       loss_mask: 200.0
      #       loss_dice: 10.0
```
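The matcher pairs predictions with ground-truth boxes by minimizing a weighted cost built from cost_class, cost_bbox, and cost_giou. The sketch below follows the DETR / Deformable DETR convention (focal-style classification cost, L1 box distance, and negative generalized IoU) and finds the assignment by brute force, which only works for a handful of boxes. BinaryHungarianMatcherV2 may compute its costs differently (for example on normalized center-size boxes) and uses a proper Hungarian solver; here boxes are (x1, y1, x2, y2) and all helper names are hypothetical.

```python
import math
from itertools import permutations


def giou(a, b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes."""
    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull


def focal_class_cost(p, alpha=0.25, gamma=2.0, eps=1e-8):
    """Cost of matching a prediction with foreground probability p."""
    pos = alpha * (1 - p) ** gamma * -math.log(p + eps)
    neg = (1 - alpha) * p ** gamma * -math.log(1 - p + eps)
    return pos - neg


def match(pred_probs, pred_boxes, gt_boxes, w_class=2.0, w_bbox=5.0, w_giou=2.0):
    """Return (pred_index, gt_index) pairs with minimal total cost (brute force)."""
    def cost(i, j):
        l1 = sum(abs(p - g) for p, g in zip(pred_boxes[i], gt_boxes[j]))
        return (w_class * focal_class_cost(pred_probs[i]) + w_bbox * l1
                - w_giou * giou(pred_boxes[i], gt_boxes[j]))

    # Each candidate assigns preds[j] to ground-truth box j.
    best = min(permutations(range(len(pred_boxes)), len(gt_boxes)),
               key=lambda preds: sum(cost(i, j) for j, i in enumerate(preds)))
    return [(i, j) for j, i in enumerate(best)]


preds = [(0.0, 0.0, 0.4, 0.4), (0.5, 0.5, 1.0, 1.0), (0.1, 0.6, 0.3, 0.9)]
probs = [0.9, 0.8, 0.2]
gts = [(0.52, 0.5, 1.0, 0.98), (0.0, 0.02, 0.4, 0.4)]
print(match(probs, preds, gts))  # [(1, 0), (0, 1)]
```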
Checkpoint
Checkpoint saving configuration:
```yaml
trainer:
  checkpoint:
    save_dir: ${launcher.experiment_log_dir}/checkpoints
    save_freq: 0  # 0 = only save last checkpoint
    save_list: [5, 10, 15]  # Also save at specific epochs
    # Resume from checkpoint
    resume_from: null  # Path to checkpoint.pt
    # Model initialization
    model_weight_initializer: null
```
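save_freq and save_list decide which epochs keep an extra, permanent copy alongside the latest checkpoint. The sketch assumes a common rule: the latest state is always written to checkpoint.pt, and epoch e is additionally saved as checkpoint_e.pt when save_freq > 0 and e is a multiple of it, or when e appears in save_list. Treat the file names and the rule as assumptions to check against the sam3 trainer; checkpoint_names is hypothetical.

```python
def checkpoint_names(epoch: int, save_freq: int, save_list: list[int]) -> list[str]:
    """File names written at the end of an epoch (assumed naming scheme)."""
    names = ["checkpoint.pt"]  # latest state, overwritten every time
    periodic = save_freq > 0 and epoch % save_freq == 0
    if periodic or epoch in save_list:
        names.append(f"checkpoint_{epoch}.pt")
    return names


for epoch in (4, 5, 10, 19):
    print(epoch, checkpoint_names(epoch, save_freq=0, save_list=[5, 10, 15]))
# 4 ['checkpoint.pt']
# 5 ['checkpoint.pt', 'checkpoint_5.pt']
# 10 ['checkpoint.pt', 'checkpoint_10.pt']
# 19 ['checkpoint.pt']
```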
Logging
Logging and monitoring:
```yaml
trainer:
  logging:
    log_dir: ${launcher.experiment_log_dir}/logs/
    log_freq: 10  # Log every N iterations
    # TensorBoard
    tensorboard_writer:
      _target_: sam3.train.utils.logger.make_tensorboard_logger
      log_dir: ${launcher.experiment_log_dir}/tensorboard
      flush_secs: 120
      should_log: True
    # Weights & Biases (optional)
    wandb_writer: null
    log_level_primary: INFO
    log_level_secondary: ERROR
```
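log_level_primary and log_level_secondary allow different verbosity for different processes. A common multi-GPU pattern, assumed here rather than confirmed for sam3, gives rank 0 the primary level and every other rank the secondary level, so only one process prints routine INFO messages. A sketch with the standard logging module; configure_rank_logging is hypothetical.

```python
import logging


def configure_rank_logging(rank: int, primary: str = "INFO",
                           secondary: str = "ERROR") -> logging.Logger:
    """Give rank 0 the primary level and all other ranks the secondary level."""
    level_name = primary if rank == 0 else secondary
    logger = logging.getLogger(f"train.rank{rank}")
    logger.setLevel(getattr(logging, level_name))
    return logger


print(configure_rank_logging(0).getEffectiveLevel() == logging.INFO)   # True
print(configure_rank_logging(3).getEffectiveLevel() == logging.ERROR)  # True
```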
Distributed Training
Distributed training settings:
```yaml
trainer:
  distributed:
    backend: nccl  # nccl for GPU, gloo for CPU
    find_unused_parameters: True
    gradient_as_bucket_view: True
    static_graph: False
    comms_dtype: null  # bfloat16, float16, or null
    timeout_mins: 30
```
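Most of these keys line up with arguments of torch.distributed.init_process_group (backend, and timeout as a datetime.timedelta) and of torch.nn.parallel.DistributedDataParallel (find_unused_parameters, gradient_as_bucket_view, static_graph). The sketch only builds the keyword dictionaries so it runs without PyTorch. How sam3 forwards them is an assumption, as is applying comms_dtype through one of PyTorch's gradient-compression communication hooks; ddp_setup_kwargs is hypothetical.

```python
from datetime import timedelta

# Hook names that exist in torch.distributed.algorithms.ddp_comm_hooks.default_hooks.
_COMM_HOOKS = {"bfloat16": "bf16_compress_hook", "float16": "fp16_compress_hook"}


def ddp_setup_kwargs(dist_cfg: dict):
    """Split the distributed config into init_process_group / DDP keyword args."""
    init_kwargs = {
        "backend": dist_cfg["backend"],
        "timeout": timedelta(minutes=dist_cfg["timeout_mins"]),
    }
    ddp_kwargs = {
        key: dist_cfg[key]
        for key in ("find_unused_parameters", "gradient_as_bucket_view", "static_graph")
    }
    # comms_dtype: null means gradients are all-reduced in their own dtype (no hook).
    hook = _COMM_HOOKS.get(dist_cfg["comms_dtype"])
    return init_kwargs, ddp_kwargs, hook


cfg = {"backend": "nccl", "find_unused_parameters": True,
       "gradient_as_bucket_view": True, "static_graph": False,
       "comms_dtype": None, "timeout_mins": 30}
init_kwargs, ddp_kwargs, hook = ddp_setup_kwargs(cfg)
print(init_kwargs["timeout"])  # 0:30:00
print(ddp_kwargs)
print(hook)  # None
```

With PyTorch available these would be used as torch.distributed.init_process_group(**init_kwargs) and DistributedDataParallel(model, device_ids=[local_rank], **ddp_kwargs).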
CUDA Settings
CUDA optimization options:
```yaml
trainer:
  cuda:
    cudnn_deterministic: false
    cudnn_benchmark: true
    allow_tf32: false
    matmul_allow_tf32: null  # Override for matmul
    cudnn_allow_tf32: null  # Override for cudnn
```
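These keys correspond to PyTorch's global backend flags. Reading the comments above, matmul_allow_tf32 and cudnn_allow_tf32 override allow_tf32 for one backend each and fall back to it when null; that fallback rule is inferred from the comments. The sketch returns the torch.backends attribute names and values instead of setting them, so it runs without a GPU; resolve_cuda_flags is hypothetical.

```python
def resolve_cuda_flags(cuda_cfg: dict) -> dict:
    """Map the cuda config to torch.backends flag names and values."""
    def pick(override):
        # null override -> inherit the global allow_tf32 setting
        return cuda_cfg["allow_tf32"] if override is None else override

    return {
        "torch.backends.cudnn.deterministic": cuda_cfg["cudnn_deterministic"],
        "torch.backends.cudnn.benchmark": cuda_cfg["cudnn_benchmark"],
        "torch.backends.cuda.matmul.allow_tf32": pick(cuda_cfg["matmul_allow_tf32"]),
        "torch.backends.cudnn.allow_tf32": pick(cuda_cfg["cudnn_allow_tf32"]),
    }


cfg = {"cudnn_deterministic": False, "cudnn_benchmark": True, "allow_tf32": False,
       "matmul_allow_tf32": None, "cudnn_allow_tf32": True}
flags = resolve_cuda_flags(cfg)
for flag, value in flags.items():
    print(f"{flag} = {value}")
```

cudnn_benchmark: true lets cuDNN autotune its convolution algorithms, which pays off when input shapes are fixed (images are padded to one resolution here) but does not guarantee bitwise-reproducible results; enable cudnn_deterministic when reproducibility matters more than speed.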
Scratch Parameters
Common training hyperparameters in the scratch section:
```yaml
scratch:
  # Model
  enable_segmentation: True
  d_model: 256
  # Image processing
  resolution: 1008
  max_ann_per_img: 200
  # Normalization
  train_norm_mean: [0.5, 0.5, 0.5]
  train_norm_std: [0.5, 0.5, 0.5]
  # Batch size
  train_batch_size: 1
  val_batch_size: 1
  gradient_accumulation_steps: 1
  # Workers
  num_train_workers: 10
  num_val_workers: 4
  # Learning rates
  lr_scale: 0.1
  lr_transformer: 0.00008
  lr_vision_backbone: 0.000025
  lr_language_backbone: 0.000005
  lrd_vision_backbone: 0.9  # Layer decay
  wd: 0.1  # Weight decay
  # Scheduler
  scheduler_timescale: 20
  scheduler_warmup: 20
  scheduler_cooldown: 20
```
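Each process loads train_batch_size images per iteration and gradients are accumulated over gradient_accumulation_steps iterations, so the number of images behind one optimizer step is the product of batch size, world size, and accumulation steps. This assumes batch_size is per GPU, as is usual for PyTorch DataLoaders under DDP. It is worth recomputing when trading batch size for accumulation steps in the tips that follow; effective_batch_size is a hypothetical helper.

```python
def effective_batch_size(train_batch_size: int, num_nodes: int,
                         gpus_per_node: int, gradient_accumulation_steps: int) -> int:
    """Images contributing to each optimizer step (batch size assumed per GPU)."""
    world_size = num_nodes * gpus_per_node
    return train_batch_size * world_size * gradient_accumulation_steps


print(effective_batch_size(1, 1, 4, 1))  # 4: values used on this page
print(effective_batch_size(1, 1, 4, 4))  # 16: with gradient_accumulation_steps: 4
print(effective_batch_size(2, 1, 4, 1))  # 8: with train_batch_size: 2
```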
Always check your configuration before committing to a long run: start a local job and review the config it prints for errors or unexpected values.
```bash
python -m sam3.train.train -c configs/your_config.yaml --use-cluster 0
```
Configuration Tips
For Small Datasets
Reduce learning rate: lr_scale: 0.01
More epochs: max_epochs: 50
Frequent validation: val_epoch_freq: 5
For Large Datasets
Standard learning rate: lr_scale: 0.1
Fewer epochs: max_epochs: 20
Less frequent validation: val_epoch_freq: 10
For Memory Constraints
Smaller resolution: resolution: 512
Gradient accumulation: gradient_accumulation_steps: 4
Reduce workers: num_train_workers: 2
For Speed
Disable segmentation: enable_segmentation: False
Larger batch size: train_batch_size: 2
More workers: num_train_workers: 16
Next Steps
Local Training: run training with your configuration
Cluster Training: scale to SLURM clusters