slalom 6c9f30d290 clean

2026-06-09 12:21:25 -04:00

8.5 KiB

YOLO26 Electric Meter Detection Example

This example trains a YOLO26 object detection model on the Roboflow Universe electric meter dataset using the existing qc-cli SageMaker training flow.

The workflow is intentionally command-driven. Run each step yourself so you can inspect the dataset, update config.yaml, and decide when to submit the SageMaker job.

Dataset:

https://universe.roboflow.com/kemals-workspace-kbc8l/electric-meter-detection-o4tfi/dataset/1

Prerequisites

Project dependencies installed or synced: uv sync
Virtual environment activated
AWS credentials configured for the profile in config.yaml
Infrastructure already deployed with qc-cli infra setup

1. Download The Dataset

https://universe.roboflow.com/kemals-workspace-kbc8l/electric-meter-detection-o4tfi/dataset/1

Download the dataset in YOLO26 format from the Roboflow UI, then extract the downloaded archive into:

examples/meter-detection/data/electric-meter-detection

The data.yaml file should be directly under that folder:

examples/meter-detection/data/electric-meter-detection/data.yaml

Do not move data.yaml into the train/ split folder.

After extracting, confirm the dataset has a YOLO data file and image splits:

find examples/meter-detection/data/electric-meter-detection -maxdepth 2 -type d | sort
find examples/meter-detection/data/electric-meter-detection -name data.yaml -print

Open examples/meter-detection/data/electric-meter-detection/data.yaml and make sure the split paths are relative to that folder:

path: .
train: train/images
val: valid/images
test: test/images

If your downloaded dataset does not include a test/ folder, remove the test: line.

The expected layout is similar to:

examples/meter-detection/data/electric-meter-detection/
  data.yaml
  train/
  valid/
  test/
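
The layout rules above can be checked mechanically before uploading. The following is a minimal sketch and not part of qc-cli: check_dataset_root is a hypothetical helper, and it reads only simple top-level key: value lines instead of parsing full YAML.

```python
from pathlib import Path


def check_dataset_root(root: Path) -> list[str]:
    """Return a list of layout problems; an empty list means the root looks usable."""
    data_yaml = root / "data.yaml"
    if not data_yaml.is_file():
        return [f"missing {data_yaml}"]

    # Read simple top-level "key: value" lines. This is not a full YAML parser.
    entries: dict[str, str] = {}
    for line in data_yaml.read_text().splitlines():
        if ":" in line and not line.startswith((" ", "#")):
            key, _, value = line.partition(":")
            entries[key.strip()] = value.strip().strip("'\"")

    problems = []
    base = root / entries.get("path", ".")
    for split in ("train", "val", "test"):
        if split not in entries:
            # test: is optional, train: and val: are not.
            if split != "test":
                problems.append(f"data.yaml has no {split}: entry")
            continue
        value = entries[split]
        if Path(value).is_absolute():
            problems.append(f"{split}: {value} should be relative to the dataset root")
        elif not (base / value).is_dir():
            problems.append(f"{split}: {value} is not a directory under {base}")
    return problems


if __name__ == "__main__":
    root = Path("examples/meter-detection/data/electric-meter-detection")
    for problem in check_dataset_root(root):
        print(problem)
```

An empty result only means the folder structure matches the expected layout. It says nothing about label quality or image counts.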

2. Configure SageMaker Training

Update config.yaml so the training section points at this example's source directory:

sagemaker:
  training:
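    # Note: this tag selects a CPU image. On a GPU instance such as ml.g4dn.xlarge, use the matching -gpu- tag of the same container so training can use CUDA.
    image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.6-cpu-py312-ubuntu22.04-sagemaker-v1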
    instance_type: ml.g4dn.xlarge
    instance_count: 1
    source_dir: examples/meter-detection/source
    entry_point: train.py
    hyperparameters:
      model: yolo26n.pt
      epochs: 25
      imgsz: 640
      batch: 16
      workers: 2

Use yolo26n.pt for a lightweight first YOLO26 run. If those weights are unavailable in the installed Ultralytics package, use yolo11n.pt as the established fallback:

      model: yolo11n.pt

The source/requirements.txt file is installed by the SageMaker PyTorch container before running train.py.

For a CPU smoke test, use a CPU instance and reduce the workload:

sagemaker:
  training:
    image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.6-cpu-py312-ubuntu22.04-sagemaker-v1
    instance_type: ml.m4.xlarge
    instance_count: 1
    source_dir: examples/meter-detection/source
    entry_point: train.py
    hyperparameters:
      model: yolo26n.pt
      epochs: 1
      imgsz: 320
      batch: 4
      workers: 2
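
One thing worth checking in either configuration is that the cpu or gpu segment of the image tag agrees with the instance family. The sketch below is a heuristic under stated assumptions: image_matches_instance is a hypothetical helper, it assumes the tag contains a literal cpu or gpu segment, and it treats only ml.g* and ml.p* families as GPU instances.

```python
def image_matches_instance(image_uri: str, instance_type: str) -> bool:
    """Heuristic: does the image tag's cpu/gpu marker agree with the instance family?"""
    # Everything after the last ":" is the image tag.
    tag = image_uri.rsplit(":", 1)[-1]
    image_is_gpu = "gpu" in tag.split("-")

    # "ml.g4dn.xlarge" -> family "g4dn". Assumption: GPU families start with g or p.
    family = instance_type.split(".")[1]
    instance_is_gpu = family.startswith(("g", "p"))

    return image_is_gpu == instance_is_gpu


IMAGE = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "pytorch-training:2.6-cpu-py312-ubuntu22.04-sagemaker-v1"
)

print(image_matches_instance(IMAGE, "ml.m4.xlarge"))    # True
print(image_matches_instance(IMAGE, "ml.g4dn.xlarge"))  # False
```

A False result for a CPU image on a GPU instance does not mean the job fails. It means the job would most likely train on the CPU while paying for the GPU.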

3. Check Infrastructure

Confirm the CLI can see the configured SageMaker role and S3 bucket:

qc-cli infra status

4. Upload The Dataset

Upload the downloaded Roboflow dataset to the s3.data_prefix configured in config.yaml:

qc-cli upload examples/meter-detection/data/electric-meter-detection

Directory uploads preserve paths relative to the uploaded directory, so SageMaker receives the dataset root with data.yaml plus the split directories.
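
To illustrate what preserving relative paths means, here is a minimal sketch of the mapping from local files to S3 keys. It is not the qc-cli implementation: s3_keys_for is a hypothetical helper and datasets/meter is a made-up prefix standing in for s3.data_prefix.

```python
from pathlib import Path, PurePosixPath


def s3_keys_for(directory: Path, data_prefix: str) -> dict[Path, str]:
    """Map each file under `directory` to an S3 key that keeps its relative path."""
    keys = {}
    for file in sorted(directory.rglob("*")):
        if file.is_file():
            # The uploaded directory itself is dropped; only the path below it is kept.
            relative = file.relative_to(directory).as_posix()
            keys[file] = str(PurePosixPath(data_prefix) / relative)
    return keys


if __name__ == "__main__":
    root = Path("examples/meter-detection/data/electric-meter-detection")
    for local, key in s3_keys_for(root, "datasets/meter").items():
        print(f"{local} -> {key}")
```

Under this mapping, data.yaml lands directly under the prefix and train/images/... keeps its split folders, which is the shape the training job expects.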

In SageMaker, this uploaded dataset root is mounted at /opt/ml/input/data/train. That train path is the SageMaker channel name, not the YOLO train/ split folder.
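
Inside the container, a training script can locate data.yaml from the channel directory. The following is a sketch of one way to do that, not a copy of source/train.py: find_data_yaml and default_dataset_dir are hypothetical names, while SM_CHANNEL_TRAIN is the environment variable SageMaker sets for the train channel.

```python
import os
from pathlib import Path


def default_dataset_dir() -> Path:
    """Channel directory set by SageMaker, with the documented mount point as fallback."""
    return Path(os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))


def find_data_yaml(dataset_dir: Path) -> Path:
    """Prefer data.yaml at the dataset root, else the shallowest match below it."""
    root_level = dataset_dir / "data.yaml"
    if root_level.is_file():
        return root_level
    matches = sorted(dataset_dir.rglob("data.yaml"), key=lambda p: len(p.parts))
    if not matches:
        raise FileNotFoundError(f"no data.yaml found under {dataset_dir}")
    return matches[0]
```

Note that a YOLO split folder would then sit at a path such as /opt/ml/input/data/train/train/images, where the first train is the channel and the second is the split.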

5. Start Training

Submit the SageMaker training job:

qc-cli train start

The command prints the submitted SageMaker job name. Check progress with:

qc-cli train status

Or pass the job name explicitly:

qc-cli train status qc-cli-YYYYMMDD-HHMMSS

SageMaker Outputs

When the job completes, SageMaker packages the files written under /opt/ml/model into model.tar.gz.

This example writes:

best.pt
model.onnx
metrics.json

The archive is stored under the configured s3.model_prefix.
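
If you download model.tar.gz yourself, the individual files can be pulled out with the Python standard library. This is a minimal sketch assuming the archive is a gzip-compressed tar, as the file name suggests; extract_member is a hypothetical helper.

```python
import tarfile
from pathlib import Path


def extract_member(archive: Path, name: str, dest: Path) -> Path:
    """Copy the first regular file named `name` out of the archive into `dest`."""
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            # Match on the file name only, so leading directories do not matter.
            if member.isfile() and Path(member.name).name == name:
                source = tar.extractfile(member)
                dest.mkdir(parents=True, exist_ok=True)
                target = dest / name
                target.write_bytes(source.read())
                return target
    raise FileNotFoundError(f"{name} not found in {archive}")


if __name__ == "__main__":
    onnx_path = extract_member(Path("model.tar.gz"), "model.onnx", Path("extracted"))
    print(onnx_path)
```

Reading the member with extractfile and writing the bytes explicitly avoids extracting paths chosen by the archive.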

6. Configure Qualcomm AI Hub

Authenticate with Qualcomm AI Hub, replacing YOUR_API_TOKEN with the API token from your AI Hub account:

qai-hub configure --api_token YOUR_API_TOKEN

Add AI Hub settings to config.yaml. The input name and image size must match the ONNX model exported by this example:

aihub:
  device:
    name: Dragonwing IQ-9075 EVK
  target_runtime: tflite
  input_specs:
    images: [[1, 3, 640, 640], float32]
  job_name: meter-detection
  model_name: meter-detection
  output_dir: build/qai-hub/meter-detection

Use the same image size configured in sagemaker.training.hyperparameters.imgsz. For example, a smoke-test model trained with imgsz: 320 requires images: [[1, 3, 320, 320], float32].
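
The relationship between imgsz and the input spec is simple enough to derive rather than type by hand. The sketch below mirrors the YAML shape shown above. Both function names are hypothetical, and the input name images is taken from the example config.

```python
def input_specs_for(imgsz: int, input_name: str = "images") -> dict[str, list]:
    """Build the input_specs entry for a square, 3-channel, batch-1 float32 input."""
    return {input_name: [[1, 3, imgsz, imgsz], "float32"]}


def specs_match_imgsz(input_specs: dict, imgsz: int, input_name: str = "images") -> bool:
    """Check that an existing input_specs entry agrees with the training image size."""
    shape, dtype = input_specs[input_name]
    return list(shape) == [1, 3, imgsz, imgsz] and dtype == "float32"


print(input_specs_for(320))  # {'images': [[1, 3, 320, 320], 'float32']}
print(specs_match_imgsz({"images": [[1, 3, 640, 640], "float32"]}, 320))  # False
```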

7. Prepare AI Hub Inputs

Generate calibration samples and a validation input from the downloaded dataset:

uv run python examples/meter-detection/prepare_aihub_inputs.py --image-size 640

This writes:

examples/meter-detection/data/aihub_calibration/*.npy
examples/meter-detection/data/inputs.npz

The script applies the preprocessing expected by the exported YOLO model: aspect-ratio-preserving letterboxing, RGB channel order, channel-first layout, and pixel values normalized to [0, 1].

Set --image-size to the training imgsz value when it is not 640.
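
To make the letterboxing step concrete, the sketch below computes only the geometry: the resized dimensions and the padding needed to reach a square canvas. It is an illustration and not the code in prepare_aihub_inputs.py. letterbox_geometry is a hypothetical helper, and centering the image with equal padding on both sides is an assumption.

```python
def letterbox_geometry(width: int, height: int, size: int) -> tuple[int, int, int, int]:
    """Return (new_width, new_height, pad_left, pad_top) for a size x size canvas."""
    # One scale factor for both axes keeps the aspect ratio.
    scale = min(size / width, size / height)
    new_width = round(width * scale)
    new_height = round(height * scale)
    # Assumption: the resized image is centered; any odd pixel goes right or bottom.
    pad_left = (size - new_width) // 2
    pad_top = (size - new_height) // 2
    return new_width, new_height, pad_left, pad_top


print(letterbox_geometry(1280, 720, 640))  # (640, 360, 0, 140)
print(letterbox_geometry(480, 640, 640))   # (480, 640, 80, 0)
```

After padding, converting to RGB, moving channels first, and dividing by 255, a single sample would have shape (1, 3, size, size), which is what the input_specs entry describes.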

8. Upload To Qualcomm AI Hub

Use the SageMaker job name printed by qc-cli train start:

qc-cli ai-hub upload \
  examples/meter-detection/data/aihub_calibration \
  examples/meter-detection/data/inputs.npz \
  --from-job qc-cli-YYYYMMDD-HHMMSS

The command downloads the job's model.tar.gz, finds model.onnx, uploads it to AI Hub, and runs quantization, compilation, validation, and profiling. The uploaded source model uses the configured aihub.model_name.

The training example sanitizes the Ultralytics ONNX export before saving model.onnx. This removes value_info entries whose names duplicate a graph input or output, such as output0, because AI Hub rejects models that contain those duplicates.
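
The de-duplication idea can be shown on plain name lists. This is a simplified sketch, not the logic in sanitize_onnx.py: it assumes the fix is to drop value_info entries whose names collide with graph inputs or outputs, and both function names are hypothetical. A real script would operate on the graph.input, graph.output, and graph.value_info fields of an ONNX model.

```python
def duplicated_names(graph_io_names: list[str], value_info_names: list[str]) -> list[str]:
    """Names that appear both as a graph input/output and in value_info."""
    io_names = set(graph_io_names)
    return [name for name in value_info_names if name in io_names]


def sanitized_value_info(graph_io_names: list[str], value_info_names: list[str]) -> list[str]:
    """value_info names with the input/output duplicates removed, order preserved."""
    io_names = set(graph_io_names)
    return [name for name in value_info_names if name not in io_names]


# Made-up names for illustration; only images and output0 come from this example.
io_names = ["images", "output0"]
value_info = ["conv1_out", "output0", "concat_out"]

print(duplicated_names(io_names, value_info))      # ['output0']
print(sanitized_value_info(io_names, value_info))  # ['conv1_out', 'concat_out']
```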

For a model already downloaded by a failed upload attempt, sanitize the extracted ONNX file and retry using the local model. Replace the job name in both paths:

uv run --with onnx python examples/meter-detection/source/sanitize_onnx.py \
  build/qai-hub/meter-detection/qc-cli-YYYYMMDD-HHMMSS/source/extracted/model.onnx \
  --output build/qai-hub/meter-detection/model.aihub.onnx

qc-cli ai-hub upload \
  examples/meter-detection/data/aihub_calibration \
  examples/meter-detection/data/inputs.npz \
  --onnx-path build/qai-hub/meter-detection/model.aihub.onnx

If the meter-detection job is still the last training job in .qc-cli.json, --from-job can be omitted. Keeping it explicit prevents accidentally uploading an artifact from a different training run.

To resume the workflow from a later step without repeating the steps that already completed, use one of:

qc-cli ai-hub upload \
  examples/meter-detection/data/aihub_calibration \
  examples/meter-detection/data/inputs.npz \
  --from-step compile

qc-cli ai-hub upload \
  examples/meter-detection/data/aihub_calibration \
  examples/meter-detection/data/inputs.npz \
  --from-step validate

Download the compiled artifact after the workflow completes:

qc-cli ai-hub download --output build/qai-hub/meter-detection/model.tflite

Training Hyperparameters

Values under sagemaker.training.hyperparameters are passed to source/train.py as command-line arguments.

Name	Type	Default	Description
`model`	string	`yolo26n.pt`	Ultralytics model weights or model YAML.
`epochs`	int	`25`	Number of training epochs.
`imgsz`	int	`640`	Square training image size.
`batch`	int	`16`	Images per training batch.
`workers`	int	`2`	DataLoader worker count.
`patience`	int	`20`	Early stopping patience.
`device`	string	auto	Optional Ultralytics device value such as `0` or `cpu`.
`data-yaml`	string	auto	Optional path to `data.yaml`; normally discovered from the uploaded dataset root.
`dataset-dir`	string	`SM_CHANNEL_TRAIN`	Uploaded dataset root mounted by SageMaker.
`model-dir`	string	`SM_MODEL_DIR`	Directory where `train.py` writes the model artifacts that SageMaker packages.

Do not set dataset-dir or model-dir in normal SageMaker runs. SageMaker sets those automatically through SM_CHANNEL_TRAIN and SM_MODEL_DIR.
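
SageMaker passes each hyperparameter to the entry point as a --name value pair, so the table maps naturally onto an argparse parser. The following is a sketch reconstructed from the table, not the actual parser in source/train.py. build_parser is a hypothetical name, and treating auto as None is an assumption.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the hyperparameter table above."""
    parser = argparse.ArgumentParser(description="YOLO training entry point (sketch)")
    parser.add_argument("--model", default="yolo26n.pt")
    parser.add_argument("--epochs", type=int, default=25)
    parser.add_argument("--imgsz", type=int, default=640)
    parser.add_argument("--batch", type=int, default=16)
    parser.add_argument("--workers", type=int, default=2)
    parser.add_argument("--patience", type=int, default=20)
    # Assumption: "auto" in the table means the value is left unset by default.
    parser.add_argument("--device", default=None)
    parser.add_argument("--data-yaml", default=None)
    # These two normally come from SageMaker's environment, not from config.yaml.
    parser.add_argument("--dataset-dir", default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR"))
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With this shape, the CPU smoke-test configuration corresponds to an invocation like train.py --model yolo26n.pt --epochs 1 --imgsz 320 --batch 4 --workers 2.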
