s3:
  bucket: your-bucket-name

sagemaker:
  training:
    image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.6-cpu-py312-ubuntu22.04-sagemaker-v1
    instance_type: ml.m4.xlarge
    instance_count: 1
    source_dir: examples/training/source
    entry_point: train.py
    hyperparameters:
      epochs: 1
      batch-size: 32
      learning-rate: 0.001
      image-size: 160
      validation-split: 0.2

Training Hyperparameters

Values under sagemaker.training.hyperparameters are passed to the training entry point as command-line arguments. For this example, they map to the arguments defined in examples/training/source/train.py.

Supported by this example:

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `epochs` | int | `1` | Number of training epochs. |
| `batch-size` | int | `32` | Images per training batch. |
| `learning-rate` | float | `0.001` | Adam optimizer learning rate. |
| `image-size` | int | `160` | Resize images to square `image-size x image-size`. |
| `validation-split` | float | `0.2` | Fraction of data used for validation. |
| `max-samples` | int | `0` | Optional cap for smoke tests; `0` means use all images. |
| `seed` | int | `13` | Random seed for reproducible splitting. |
| `num-workers` | int | `2` | DataLoader worker count. |

Do not set train-dir or model-dir in normal SageMaker runs. SageMaker supplies those paths automatically through the SM_CHANNEL_TRAIN and SM_MODEL_DIR environment variables.
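To make the mapping from config values to command-line arguments concrete, here is a minimal sketch of an argument parser that mirrors the hyperparameters listed above. It is not the actual train.py: the function names build_parser and hyperparameters_to_argv are hypothetical, the local fallback paths "data" and "model" are assumptions, and the "--name value" layout of the arguments is assumed from how SageMaker script mode typically invokes an entry point.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented hyperparameters (not the real train.py)."""
    parser = argparse.ArgumentParser(description="Sketch of a SageMaker training entry point")
    # Hyperparameters from config.yaml are assumed to arrive as "--name value" pairs.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--image-size", type=int, default=160)
    parser.add_argument("--validation-split", type=float, default=0.2)
    parser.add_argument("--max-samples", type=int, default=0)
    parser.add_argument("--seed", type=int, default=13)
    parser.add_argument("--num-workers", type=int, default=2)
    # Paths default to the SageMaker environment variables; the local
    # fallbacks "data" and "model" are assumptions for runs outside SageMaker.
    parser.add_argument("--train-dir", default=os.environ.get("SM_CHANNEL_TRAIN", "data"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    return parser


def hyperparameters_to_argv(hyperparameters: dict) -> list[str]:
    """Flatten a hyperparameters mapping into a list of "--name", "value" strings."""
    argv = []
    for name, value in hyperparameters.items():
        argv += [f"--{name}", str(value)]
    return argv
```

Calling build_parser().parse_args(hyperparameters_to_argv({"epochs": 1, "batch-size": 32})) yields a namespace in which hyphenated names become underscored attributes such as args.batch_size. Because the path defaults are read from the environment, leaving train-dir and model-dir out of the config lets SageMaker's values take effect.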

1. Download The Dataset

bash examples/training/download_flower_photos.sh

This creates:

examples/training/data/flower_photos_sagemaker/
  daisy/
  dandelion/
  roses/
  sunflowers/
  tulips/
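Before uploading, it can be useful to confirm that the download produced the one-folder-per-class layout shown above. The following is a small sketch of such a check. The helper name count_images_per_class and the set of accepted file suffixes are assumptions, not part of the example's scripts.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust it if the dataset uses other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(dataset_dir):
    """Return a {class_name: image_count} mapping for a folder-per-class dataset."""
    root = Path(dataset_dir)
    counts = {}
    # Each immediate subdirectory is treated as one class.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

For example, count_images_per_class("examples/training/data/flower_photos_sagemaker") should return one entry per class folder. A missing class or a count of zero suggests the download did not complete.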

2. Run Training

Run the training script and wait until it finishes:

bash examples/training/run_training.sh --config config.yaml --wait

To reuse a dataset that is already uploaded under s3.data_prefix, skip the upload step:

bash examples/training/run_training.sh \
  --config config.yaml \
  --skip-upload \
  --wait

Notes

The default dataset path is examples/training/data/flower_photos_sagemaker.
Uploaded data uses the s3.bucket and s3.data_prefix values from config.yaml.
Training artifacts are written under s3://<bucket>/<model_prefix>/.
The SageMaker model.tar.gz contains model.onnx, model.pt, class_to_idx.json, and metrics.json.
SageMaker packages examples/training/source, installs requirements.txt, and runs train.py.
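As a quick sanity check on the artifact contents listed above, the sketch below inspects a model.tar.gz and reports any expected files that are missing. It assumes the archive has already been downloaded from S3 to a local path. The helper name missing_artifacts is hypothetical.

```python
import posixpath
import tarfile

# File names taken from the notes above.
EXPECTED_FILES = {"model.onnx", "model.pt", "class_to_idx.json", "metrics.json"}


def missing_artifacts(archive_path):
    """Return a sorted list of expected files that are absent from the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Compare base names so that entries such as "./model.pt" also match.
        present = {posixpath.basename(name) for name in archive.getnames()}
    return sorted(EXPECTED_FILES - present)
```

An empty list from missing_artifacts("model.tar.gz") indicates that all four files are present in the archive.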