This guide walks through setting up the environment and dependencies for training SAM 3 models.
Prerequisites
Python 3.9 or later
CUDA 11.8 or later (for GPU training)
16GB+ VRAM per GPU
Linux operating system (recommended)
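The Python and operating system prerequisites above can be checked from a short script. This is a minimal sketch; the function name `check_prerequisites` is illustrative and not part of SAM 3, and it does not check CUDA or VRAM.

```python
import platform
import sys


def check_prerequisites(version_info=sys.version_info, system=None):
    """Return a list of warnings; an empty list means both checks passed."""
    system = system or platform.system()
    warnings = []
    # The guide asks for Python 3.9 or later.
    if tuple(version_info[:2]) < (3, 9):
        warnings.append(
            f"Python 3.9+ required, found {version_info[0]}.{version_info[1]}"
        )
    # Linux is recommended, not strictly required.
    if system != "Linux":
        warnings.append(f"Linux is recommended, found {system}")
    return warnings


if __name__ == "__main__":
    for message in check_prerequisites() or ["Python version and OS look fine"]:
        print(message)
```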
Installation
Install SAM 3
First, install the SAM 3 package from source:
git clone https://github.com/facebookresearch/sam3.git
cd sam3
pip install -e .
This installs SAM 3 in editable mode with all core dependencies.
Install Training Dependencies
Install additional packages required for training:
pip install hydra-core submitit fvcore iopath tensorboard
Optional dependencies:
# For Weights & Biases logging
pip install wandb
# For COCO evaluation
pip install pycocotools
Verify PyTorch Installation
Ensure PyTorch is installed with CUDA support:
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"
Example output (your version string may differ):
PyTorch: 2.1.0+cu118
CUDA: True
If CUDA is reported as False, reinstall PyTorch with a CUDA-enabled build:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
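To compare each GPU against the 16GB guideline from the prerequisites, a small report like the one below may help. It is a sketch: `meets_vram_requirement` and `report_gpus` are illustrative names, and because cards marketed as 16GB often report slightly less usable memory, the threshold should be treated as approximate.

```python
def meets_vram_requirement(total_bytes, min_gb=16):
    """Compare reported memory against a threshold in decimal gigabytes."""
    return total_bytes >= min_gb * 10**9


def report_gpus():
    """Print the name and total memory of every visible CUDA device."""
    try:
        import torch  # imported lazily so the helper above works without it
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available")
        return
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        status = "ok" if meets_vram_requirement(props.total_memory) else "below guideline"
        print(f"GPU {index}: {props.name}, {props.total_memory / 10**9:.1f} GB ({status})")


if __name__ == "__main__":
    report_gpus()
```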
Download Assets
Download the BPE tokenizer file required for text encoding:
mkdir -p sam3/assets
cd sam3/assets
wget https://huggingface.co/facebook/sam3/resolve/main/bpe_simple_vocab_16e6.txt.gz
Note the path to this file; you'll need it in your training config.
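A quick way to confirm the download is intact is to decompress it end to end. The sketch below uses only the standard library; `count_vocab_lines` is an illustrative helper, and no particular line count is assumed.

```python
import gzip
from pathlib import Path


def count_vocab_lines(path):
    """Decompress the whole file and return its line count.

    A missing file raises FileNotFoundError; a corrupt or truncated
    download typically raises gzip.BadGzipFile or EOFError.
    """
    with gzip.open(Path(path), "rt", encoding="utf-8") as handle:
        return sum(1 for _ in handle)
```

For example, `count_vocab_lines("sam3/assets/bpe_simple_vocab_16e6.txt.gz")` should return without raising if the file downloaded completely.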
Directory Structure
Set up your training workspace:
workspace/
├── sam3/                  # SAM 3 repository
│   ├── train/             # Training code
│   │   ├── configs/       # Configuration files
│   │   └── train.py       # Main training script
│   └── assets/            # Model assets
│       └── bpe_simple_vocab_16e6.txt.gz
├── datasets/              # Your datasets
│   └── my_dataset/
│       ├── train/
│       └── test/
└── experiments/           # Training outputs
    └── logs/
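The dataset and experiment folders in this layout can be created with a few lines of Python. This is a sketch: `create_workspace` is an illustrative helper, and it deliberately leaves out `sam3/`, which comes from cloning the repository.

```python
from pathlib import Path


def create_workspace(root, dataset_name="my_dataset"):
    """Create the datasets/ and experiments/ folders and return their paths."""
    root = Path(root)
    created = []
    for relative in (
        f"datasets/{dataset_name}/train",
        f"datasets/{dataset_name}/test",
        "experiments/logs",
    ):
        directory = root / relative
        # exist_ok=True makes the call safe to repeat on an existing workspace.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```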
Prepare Your Dataset
Format Annotations
Ensure your dataset uses COCO JSON format:
datasets/my_dataset/
├── train/
│   ├── image_001.jpg
│   ├── image_002.jpg
│   └── _annotations.coco.json
└── test/
    ├── image_001.jpg
    └── _annotations.coco.json
The annotations file should contain:
images: List of image metadata
annotations: Bounding boxes and optional masks
categories: Object categories
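As an illustration of those three top-level keys, the sketch below builds the smallest useful COCO-style dictionary. All values (file name, image size, category name) are made up for the example; in COCO, `bbox` is `[x, y, width, height]`.

```python
import json


def build_minimal_coco():
    """Return a COCO-style dict with one image, one box, and one category."""
    return {
        "images": [
            {"id": 1, "file_name": "image_001.jpg", "width": 640, "height": 480}
        ],
        "annotations": [
            {
                "id": 1,
                "image_id": 1,       # must match an entry in "images"
                "category_id": 1,    # must match an entry in "categories"
                "bbox": [100, 100, 200, 150],
                "area": 30000,
                "iscrowd": 0,
            }
        ],
        "categories": [{"id": 1, "name": "object"}],
    }


if __name__ == "__main__":
    print(json.dumps(build_minimal_coco(), indent=2))
```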
Validate Dataset
Verify your annotations are correctly formatted (requires pycocotools):
from pycocotools.coco import COCO

# Load and index the annotation file
coco = COCO('datasets/my_dataset/train/_annotations.coco.json')
print(f"Images: {len(coco.imgs)}")
print(f"Annotations: {len(coco.anns)}")
print(f"Categories: {len(coco.cats)}")
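Loading the file confirms it parses, but it does not confirm that every annotation points at a real image and category. The sketch below checks that on an already-loaded dictionary using only plain Python; `find_annotation_problems` is an illustrative helper, not part of pycocotools or SAM 3.

```python
def find_annotation_problems(coco):
    """Return a list of messages describing inconsistent annotations."""
    image_ids = {image["id"] for image in coco.get("images", [])}
    category_ids = {category["id"] for category in coco.get("categories", [])}
    problems = []
    for annotation in coco.get("annotations", []):
        ann_id = annotation.get("id")
        if annotation.get("image_id") not in image_ids:
            problems.append(f"annotation {ann_id}: unknown image_id")
        if annotation.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")
        bbox = annotation.get("bbox", [])
        # COCO boxes are [x, y, width, height]; width and height must be positive.
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann_id}: invalid bbox")
    return problems
```

It can be run on the result of `json.load` applied to `_annotations.coco.json`; an empty list means no problems were found by these particular checks.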
Prepare Segmentation Masks (Optional)
If training with segmentation, ensure annotations include mask data:
{
  "id": 1,
  "image_id": 1,
  "category_id": 1,
  "bbox": [100, 100, 200, 150],
  "segmentation": [[x1, y1, x2, y2, ...]], // Polygon or RLE
  "area": 30000,
  "iscrowd": 0
}
Masks can be in polygon format or RLE (Run-Length Encoding).
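In polygon format, each polygon is a flat list of alternating x and y coordinates. The sketch below computes a polygon's area with the shoelace formula, which can be handy for sanity-checking the `area` field; `polygon_area` is an illustrative helper and does not handle RLE masks.

```python
def polygon_area(flat_coords):
    """Area of a simple polygon given as [x1, y1, x2, y2, ...]."""
    if len(flat_coords) < 6 or len(flat_coords) % 2:
        raise ValueError("need an even number of values and at least three points")
    points = list(zip(flat_coords[0::2], flat_coords[1::2]))
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

A rectangle matching the example box, `polygon_area([100, 100, 300, 100, 300, 250, 100, 250])`, gives 30000.0, the same value as the `area` field above.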
Environment Configuration
Set Environment Variables
Create a .env file or export variables:
# CUDA settings
export CUDA_VISIBLE_DEVICES=0,1,2,3
# PyTorch settings
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
# Distributed training
export NCCL_DEBUG=INFO
export NCCL_IB_DISABLE=1 # Disables InfiniBand; set only if it is unavailable or causing problems
# Data paths
export DATASET_ROOT=/path/to/datasets
export EXPERIMENT_ROOT=/path/to/experiments
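If your own scripts rely on `DATASET_ROOT` and `EXPERIMENT_ROOT`, it helps to fail early when they are unset. The sketch below is one way to do that; `resolve_roots` is an illustrative helper, and this guide does not say whether the SAM 3 training code itself reads these variables.

```python
import os
from pathlib import Path


def resolve_roots(env=os.environ):
    """Read the two data-path variables and return them as Path objects."""
    roots = {}
    for name in ("DATASET_ROOT", "EXPERIMENT_ROOT"):
        value = env.get(name)
        if not value:
            raise KeyError(f"{name} is not set")
        roots[name] = Path(value).expanduser()
    return roots
```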
Update your training config with local paths:
paths:
  # Dataset location
  dataset_root: /path/to/datasets/my_dataset
  # Where to save logs and checkpoints
  experiment_log_dir: /path/to/experiments/my_training
  # BPE tokenizer path
  bpe_path: /path/to/sam3/assets/bpe_simple_vocab_16e6.txt.gz
  # Pretrained checkpoint (optional)
  checkpoint_path: null # Downloads from HuggingFace if null
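Before launching a run, it can save time to confirm that the input paths in this section exist. The sketch below works on a plain dictionary mirroring the `paths` keys above; `find_missing_paths` is an illustrative helper, and it skips `experiment_log_dir` on the assumption that an output directory need not exist yet.

```python
from pathlib import Path


def find_missing_paths(paths, must_exist=("dataset_root", "bpe_path", "checkpoint_path")):
    """Return the keys whose configured path does not exist on disk."""
    missing = []
    for key in must_exist:
        value = paths.get(key)
        if value is None:  # null in YAML: nothing to check
            continue
        if not Path(value).exists():
            missing.append(key)
    return missing
```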
GPU Setup
Single GPU
For single GPU training:
launcher:
  num_nodes: 1
  gpus_per_node: 1
submitit:
  use_cluster: False
Multiple GPUs (Single Node)
For multi-GPU training on one machine:
launcher:
  num_nodes: 1
  gpus_per_node: 4 # Number of GPUs
submitit:
  use_cluster: False
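When `CUDA_VISIBLE_DEVICES` is set, `gpus_per_node` should not ask for more devices than it exposes. The sketch below checks that relationship; both function names are illustrative, and the parsing is simplified to counting comma-separated entries.

```python
def count_visible_gpus(cuda_visible_devices):
    """Count the comma-separated entries in a CUDA_VISIBLE_DEVICES value."""
    return len([e for e in cuda_visible_devices.split(",") if e.strip()])


def launcher_fits(launcher, cuda_visible_devices):
    """True if gpus_per_node does not exceed the number of visible devices."""
    return launcher["gpus_per_node"] <= count_visible_gpus(cuda_visible_devices)
```

For example, `launcher_fits({"num_nodes": 1, "gpus_per_node": 4}, "0,1,2,3")` is true, while the same launcher with `"0,1"` is not.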
Cluster Setup
For SLURM cluster training, see Cluster Training.
Verify Installation
Test that everything is set up correctly:
# Check training script
python -m sam3.train.train --help
# Validate configuration
python -c "from hydra import compose, initialize_config_module; \
initialize_config_module('sam3.train', version_base='1.2'); \
cfg = compose(config_name='configs/eval_base.yaml'); \
print('Config loaded successfully')"
Ensure you have sufficient disk space for:
Dataset storage (varies by dataset size)
Checkpoints (~5GB per checkpoint)
Logs and tensorboard files
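A rough disk budget can be estimated from the figures above and compared with the free space on the target drive. This is a sketch: the helper names are illustrative, the 5GB default comes from the per-checkpoint estimate in this guide, and logs are not included in the estimate.

```python
import shutil


def estimate_required_gb(num_checkpoints, dataset_gb, checkpoint_gb=5.0):
    """Dataset size plus checkpoint storage, in gigabytes."""
    return dataset_gb + num_checkpoints * checkpoint_gb


def free_gb(path="."):
    """Free space on the filesystem containing `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 10**9


if __name__ == "__main__":
    needed = estimate_required_gb(num_checkpoints=10, dataset_gb=50)
    print(f"Need about {needed:.0f} GB, have {free_gb():.0f} GB free")
```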
Troubleshooting
CUDA Out of Memory
If you encounter OOM errors:
Reduce batch size in config:
scratch:
  train_batch_size: 1
Reduce image resolution:
scratch:
  resolution: 512 # Default is 1008
Enable gradient accumulation:
scratch:
  gradient_accumulation_steps: 4
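Lowering the batch size and raising gradient accumulation trade memory for the same number of samples per optimizer step. The sketch below shows the usual arithmetic, assuming `train_batch_size` is a per-GPU value and training is standard data-parallel; that assumption is not confirmed by this guide, so check it against your config.

```python
def effective_batch_size(train_batch_size, gpus_per_node, num_nodes=1,
                         gradient_accumulation_steps=1):
    """Samples contributing to one optimizer step under data parallelism."""
    return train_batch_size * gpus_per_node * num_nodes * gradient_accumulation_steps
```

Under that assumption, a batch size of 1 on 4 GPUs with 4 accumulation steps processes `effective_batch_size(1, 4, 1, 4)`, that is 16, samples per optimizer step.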
Import Errors
If modules are not found:
# Add SAM 3 to PYTHONPATH
export PYTHONPATH=/path/to/sam3:$PYTHONPATH
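To see whether a module can be found without actually importing it, the standard library's `importlib.util.find_spec` can be used. The helper name `is_importable` below is illustrative.

```python
import importlib.util


def is_importable(module_name):
    """True if Python can locate the module on the current sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when the parent package is missing.
        return False


if __name__ == "__main__":
    for name in ("sam3", "hydra", "submitit"):
        print(f"{name}: {'found' if is_importable(name) else 'missing'}")
```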
Slow Data Loading
If data loading is slow:
scratch:
  num_train_workers: 8 # Increase workers
  num_val_workers: 4
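More workers only help while there are CPU cores to run them. The sketch below is a rule of thumb, not a SAM 3 recommendation: it assumes the worker count applies per GPU process and divides the available cores accordingly, keeping one core per process in reserve. `suggest_workers` is an illustrative helper.

```python
import os


def suggest_workers(cpu_count, gpus_per_node, reserve=1):
    """Split CPU cores across GPU processes, keeping `reserve` cores free each."""
    per_process = (cpu_count // max(gpus_per_node, 1)) - reserve
    return max(per_process, 1)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(f"Suggested workers per process: {suggest_workers(cores, gpus_per_node=4)}")
```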
Next Steps
Now that your environment is set up:
Configuration: Learn about training configuration options
Local Training: Start training on local GPUs