update ai-hub to first optimize model for Workbench

Remove old examples
2026-06-09 14:55:26 -04:00
parent 6c9f30d290
commit f26e8256f0
12 changed files with 260 additions and 700 deletions
--- a/README.md
+++ b/README.md
@@ -177,27 +177,35 @@ The expected output artifact is SageMaker’s `model.tar.gz`, normally containin
 ```
 qc-cli ai-hub upload <calibration.npz|calibration-dir> <inputs.npz|inputs.npy>
 qc-cli ai-hub upload <calibration> <inputs> --from-step validate
-qc-cli ai-hub quantize <calibration.npz|calibration-dir> [--onnx-path PATH] [--model-s3-uri URI] [--from-job NAME]
+qc-cli ai-hub optimize [--onnx-path PATH] [--model-s3-uri URI] [--from-job NAME]
+qc-cli ai-hub quantize <calibration.npz|calibration-dir> [--model-id ID] [--onnx-path PATH] [--model-s3-uri URI] [--from-job NAME]
 qc-cli ai-hub compile [--model-id ID] [--onnx-path PATH] [--model-s3-uri URI] [--from-job NAME]
 qc-cli ai-hub validate <inputs.npz|inputs.npy> [--model-id ID] [--input-name NAME]
 qc-cli ai-hub profile [--model-id ID]
 qc-cli ai-hub download [--model-id ID] [--output PATH]
 ```
-`ai-hub upload` runs the four Workbench upload steps in order: quantize, compile, validate, and profile. Use `--from-step compile`, `--from-step validate`, or `--from-step profile` to resume from saved local state after a completed earlier step.
+`ai-hub upload` optimizes to ONNX, quantizes, validates, and profiles. When `aihub.target_runtime` is not `onnx`, it
+also compiles the quantized model to that deployment runtime. The initial ONNX optimization gives external models
+Workbench provenance and applies compiler optimization passes before quantization.
 Resume behavior:
 ```text
---from-step quantize  Run quantize, compile, validate, and profile.
+--from-step optimize  Run optimize, quantize, optional final compile, validate, and profile.
---from-step compile   Skip quantize; compile the last quantized model unless an explicit source is passed.
+--from-step quantize  Quantize the last optimized ONNX, then optionally compile, validate, and profile.
---from-step validate  Skip quantize and compile; validate the last compiled model.
+--from-step compile   Skip optimize and quantize; finalize the last quantized model for the target runtime.
---from-step profile   Skip quantize, compile, and validate; profile the last compiled model.
+--from-step validate  Skip optimize, quantize, and compile; validate the last compiled model.
+--from-step profile   Skip optimize, quantize, compile, and validate; profile the last compiled model.
 ```
 When a step runs in the current command, `upload` passes its returned model ID directly to the next step. When a step is skipped, the next step resolves the needed model ID from `.qc-cli.json`. This avoids re-running earlier AI Hub jobs when you only need to continue from a later step.
-`ai-hub compile` resolves model sources in this order: `--model-id`, explicit source options (`--onnx-path`, `--model-s3-uri`, `--from-job`), last quantized model from state, then the last training job from local state. `ai-hub download` is separate because downloading the optimized artifact is outside the four-step Workbench upload loop.
+`ai-hub optimize` compiles an external model with `--target_runtime onnx`. `ai-hub quantize` uses an explicit
+`--model-id`, the last optimized ONNX model, or an explicit/local model source in that order. `ai-hub compile` resolves
+model sources in this order: `--model-id`, explicit source options, last quantized model, then the last training job.
+For `target_runtime: onnx`, upload treats the quantized ONNX as the final model and skips a redundant second compile.
+`ai-hub download` remains separate because downloading is outside the Workbench processing loop.
 AI Hub authentication currently uses the local `qai-hub` SDK configuration. A planned follow-up is to support AWS Systems Manager Parameter Store `SecureString` for team-managed tokens, where `config.yaml` stores only a parameter name such as `/qc-cli/aihub/token`, AWS KMS encrypts the token at rest, and the CLI retrieves it at runtime with `ssm:GetParameter` plus `kms:Decrypt` permissions.
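The resume behavior described in the README hunk above can be sketched as a small step planner. This is a minimal illustration, not code from this commit: `STEPS` and `planned_steps` are hypothetical names, and the sketch assumes that for `target_runtime: onnx` the final compile is dropped regardless of where the run resumes.

```python
# Hypothetical helper, not part of qc-cli: it only mirrors the README text.
STEPS = ["optimize", "quantize", "compile", "validate", "profile"]


def planned_steps(from_step="optimize", target_runtime="tflite"):
    """Return the upload steps that would run, in order."""
    if from_step not in STEPS:
        raise ValueError(f"unknown step: {from_step}")
    # Resume: keep the requested step and everything after it.
    steps = STEPS[STEPS.index(from_step):]
    # Assumption: with an ONNX target the quantized ONNX is already the
    # final model, so the second compile is skipped.
    if target_runtime == "onnx":
        steps = [step for step in steps if step != "compile"]
    return steps


print(planned_steps())
print(planned_steps("quantize", target_runtime="onnx"))
```

With the defaults this lists all five steps; resuming from `quantize` with an ONNX target leaves only quantize, validate, and profile.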
--- a/examples/ai-hub/README.md
+++ b/examples/ai-hub/README.md
@@ -1,79 +0,0 @@
 # Qualcomm AI Hub Example
 This example takes the ONNX model produced by the SageMaker training example and runs the Qualcomm AI Hub upload workflow:
 1. Quantize
 2. Compile
 3. Validate
 4. Profile
 5. Download the compiled artifact
 ## Prerequisites
 Run the training example first and wait for it to complete:
 ```bash
 examples/training/run_training.sh --wait
 ```
 The `config.yaml` file must include AI Hub settings:
 ```yaml
 aihub:
  device:
    name: Samsung Galaxy S25 (Family)
  target_runtime: tflite
  input_specs:
    input: [[1, 3, 160, 160], float32]
  output_dir: build/qai-hub
 ```
 Finally, the user needs to authenticate with Qualcomm AI Hub using:
 ```bash
 qai-hub configure --api_token
 ```
 ## Prepare Inputs
 AI Hub does not consume the raw JPG training images directly. It needs NumPy tensors that match the ONNX model input shape and preprocessing.
 To generate calibration and validation inputs:
 ```bash
 python examples/ai-hub/prepare_inputs.py
 ```
 This writes:
 ```text
 examples/training/data/aihub_calibration/*.npy
 examples/training/data/inputs.npz
 ```
 The script applies the same image preprocessing used by the training example:
 - resize to `160x160`
 - convert to channel-first `1x3x160x160`
 - normalize with ImageNet mean and standard deviation
 ## Upload Model to Qualcomm Workbench
 The model can be uploaded to Qualcomm Workbench using:
 ```bash
 qc-cli ai-hub upload examples/training/data/aihub_calibration examples/training/data/inputs.npz
 ```
 The first argument is the calibration path for the model and the second argument is the input file, both of which were created by the `prepare_inputs.py` script. For more details, add `--help` after the `upload` command.
 The `upload` command runs the following commands in order:
 1. `qc-cli ai-hub quantize`
 2. `qc-cli ai-hub compile`
 3. `qc-cli ai-hub validate`
 4. `qc-cli ai-hub profile`
 Finally the user can download the model from AI Workbench using the command
 ```bash
 qc-cli ai-hub download
 ```
--- a/examples/ai-hub/prepare_inputs.py
+++ b/examples/ai-hub/prepare_inputs.py
@@ -1,74 +0,0 @@
 #!/usr/bin/env python3
 """Prepare Qualcomm AI Hub calibration and validation inputs for the training example."""
 from __future__ import annotations
 import argparse
 from pathlib import Path
 import numpy as np
 from PIL import Image
 IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
 def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--dataset-dir",
        type=Path,
        default=Path("examples/training/data/flower_photos_sagemaker"),
        help="ImageFolder-style dataset used for training.",
    )
    parser.add_argument(
        "--calibration-dir",
        type=Path,
        default=Path("examples/training/data/aihub_calibration"),
        help="Directory where .npy calibration samples will be written.",
    )
    parser.add_argument(
        "--input-file",
        type=Path,
        default=Path("examples/training/data/inputs.npz"),
        help="Validation .npz input file for qc-cli ai-hub validate.",
    )
    parser.add_argument("--input-name", default="input", help="ONNX input name.")
    parser.add_argument("--image-size", type=int, default=160, help="Square image size used by training.")
    parser.add_argument("--samples", type=int, default=16, help="Number of calibration samples to write.")
    return parser.parse_args()
 def preprocess_image(path: Path, image_size: int) -> np.ndarray:
    image = Image.open(path).convert("RGB").resize((image_size, image_size), Image.Resampling.BILINEAR)
    array = np.asarray(image, dtype=np.float32) / 255.0
    array = np.transpose(array, (2, 0, 1))
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)[:, None, None]
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)[:, None, None]
    return ((array - mean) / std)[None, ...].astype("float32")
 def main() -> None:
    args = parse_args()
    images = sorted(p for p in args.dataset_dir.rglob("*") if p.suffix.lower() in IMAGE_EXTENSIONS)
    if not images:
        raise SystemExit(f"No images found under {args.dataset_dir}")
    if args.samples < 1:
        raise SystemExit("--samples must be at least 1")
    args.calibration_dir.mkdir(parents=True, exist_ok=True)
    args.input_file.parent.mkdir(parents=True, exist_ok=True)
    sample_count = min(args.samples, len(images))
    prepared = []
    for index, image_path in enumerate(images[:sample_count]):
        sample = preprocess_image(image_path, args.image_size)
        np.save(args.calibration_dir / f"sample_{index:03d}.npy", sample)
        prepared.append(sample)
    np.savez(args.input_file, **{args.input_name: prepared[0]})
    print(f"Wrote {sample_count} calibration samples to {args.calibration_dir}")
    print(f"Wrote validation input to {args.input_file}")
 if __name__ == "__main__":
    main()
--- a/examples/meter-detection/README.md
+++ b/examples/meter-detection/README.md
@@ -181,7 +181,7 @@ Add AI Hub settings to `config.yaml`. The input name and image size must match t
 aihub:
  device:
    name: Dragonwing IQ-9075 EVK
-  target_runtime: tflite
+  target_runtime: onnx
  input_specs:
    images: [[1, 3, 640, 640], float32]
  job_name: meter-detection
@@ -189,7 +189,7 @@ aihub:
  output_dir: build/qai-hub/meter-detection
 ```
-Use the same image size configured in `sagemaker.training.hyperparameters.imgsz`. For example, a smoke-test model trained with `imgsz: 320` requires `images: [[1, 3, 320, 320], float32]`.
+The ONNX graph is the source of truth. The export normally uses the same value as `sagemaker.training.hyperparameters.imgsz`, but changing `config.yaml` after training does not resize an existing model. For example, a model exported with `imgsz: 320` requires `images: [[1, 3, 320, 320], float32]`.
 ## 7. Prepare AI Hub Inputs
@@ -208,8 +208,6 @@ examples/meter-detection/data/inputs.npz
 The script applies the preprocessing expected by the exported YOLO model: aspect-ratio-preserving letterboxing, RGB channel order, channel-first layout, and pixel values normalized to `[0, 1]`.
-Set `--image-size` to the training `imgsz` value when it is not `640`.
 ## 8. Upload To Qualcomm AI Hub
 Use the SageMaker job name printed by `qc-cli train start`:
@@ -221,7 +219,12 @@ qc-cli ai-hub upload \
  --from-job qc-cli-YYYYMMDD-HHMMSS
 ```
-The command downloads the job's `model.tar.gz`, finds `model.onnx`, uploads it to AI Hub, and runs quantization, compilation, validation, and profiling. The uploaded source model uses the configured `aihub.model_name`.
+The command downloads the job's `model.tar.gz`, finds `model.onnx`, and runs the following AI Hub workflow:
+1. Compile the external ONNX to a Workbench-optimized ONNX model.
+2. Quantize the optimized ONNX model.
+3. Compile the quantized model when the configured deployment runtime is not `onnx`.
+4. Validate and profile the final model.
 The training example sanitizes the Ultralytics ONNX export before saving `model.onnx`. This removes graph input or output names, such as `output0`, that are duplicated in the ONNX `value_info` metadata and rejected by AI Hub.
@@ -238,24 +241,6 @@ qc-cli ai-hub upload \
  --onnx-path build/qai-hub/meter-detection/model.aihub.onnx
 ```
-If the meter-detection job is still the last training job in `.qc-cli.json`, `--from-job` can be omitted. Keeping it explicit prevents accidentally uploading an artifact from a different training run.
-To resume after a completed step, use one of:
-```bash
-qc-cli ai-hub upload \
-  examples/meter-detection/data/aihub_calibration \
-  examples/meter-detection/data/inputs.npz \
-  --from-step compile
-```
-```bash
-qc-cli ai-hub upload \
-  examples/meter-detection/data/aihub_calibration \
-  examples/meter-detection/data/inputs.npz \
-  --from-step validate
-```
 Download the compiled artifact after the workflow completes:
 ```bash
--- a/examples/training/README.md
+++ b/examples/training/README.md
@@ -1,89 +0,0 @@
 # SageMaker Training Example
 This example downloads a small image-classification dataset, uploads it through `qc-cli`, and submits a live SageMaker training job.
 ## Prerequisites
 - AWS credentials configured for the profile in `config.yaml`
 - Infrastructure already deployed with `qc-cli infra setup`
 - `config.yaml` updated with:
 ```yaml
 s3:
  bucket: your-bucket-name
 sagemaker:
  training:
    image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.6-cpu-py312-ubuntu22.04-sagemaker-v1
    instance_type: ml.m4.xlarge
    instance_count: 1
    source_dir: examples/training/source
    entry_point: train.py
    hyperparameters:
      epochs: 1
      batch-size: 32
      learning-rate: 0.001
      image-size: 160
      validation-split: 0.2
 ```
 ## Training Hyperparameters
 Values under `sagemaker.training.hyperparameters` are passed to the training entry point as command-line arguments. For this example, they map to arguments defined in [source/train.py](source/train.py).
 Supported by this example:
 | Name | Type | Default | Description |
 |---|---:|---:|---|
 | `epochs` | int | `1` | Number of training epochs. |
 | `batch-size` | int | `32` | Images per training batch. |
 | `learning-rate` | float | `0.001` | Adam optimizer learning rate. |
 | `image-size` | int | `160` | Resize images to square `image-size x image-size`. |
 | `validation-split` | float | `0.2` | Fraction of data used for validation. |
 | `max-samples` | int | `0` | Optional cap for smoke tests; `0` means use all images. |
 | `seed` | int | `13` | Random seed for reproducible splitting. |
 | `num-workers` | int | `2` | DataLoader worker count. |
 Do not set `train-dir` or `model-dir` in normal SageMaker runs. SageMaker sets those automatically through `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR`.
 ## 1. Download The Dataset
 ```bash
 bash examples/training/download_flower_photos.sh
 ```
 This creates:
 ```text
 examples/training/data/flower_photos_sagemaker/
  daisy/
  dandelion/
  roses/
  sunflowers/
  tulips/
 ```
 ## 2. Run Training
 Run the training script and wait until it finishes:
 ```bash
 bash examples/training/run_training.sh --config config.yaml --wait
 ```
 Use a dataset that is already uploaded to `s3.data_prefix`:
 ```bash
 bash examples/training/run_training.sh \
  --config config.yaml \
  --skip-upload \
  --wait
 ```
 ## Notes
 - The default dataset path is `examples/training/data/flower_photos_sagemaker`.
 - Uploaded data uses the `s3.bucket` and `s3.data_prefix` values from `config.yaml`.
 - Training artifacts are written under `s3://<bucket>/<model_prefix>/`.
 - The SageMaker `model.tar.gz` contains `model.onnx`, `model.pt`, `class_to_idx.json`, and `metrics.json`.
 - SageMaker packages `examples/training/source`, installs `requirements.txt`, and runs `train.py`.
--- a/examples/training/download_flower_photos.sh
+++ b/examples/training/download_flower_photos.sh
@@ -1,40 +0,0 @@
 #!/usr/bin/env bash
 set -euo pipefail
 DATASET_URL="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
 DEST_DIR="${1:-examples/training/data}"
 ARCHIVE_PATH="${DEST_DIR}/flower_photos.tgz"
 RAW_DATASET_DIR="${DEST_DIR}/flower_photos"
 DATASET_DIR="${DEST_DIR}/flower_photos_sagemaker"
 CLASS_NAMES=("daisy" "dandelion" "roses" "sunflowers" "tulips")
 mkdir -p "${DEST_DIR}"
 if [[ -d "${DATASET_DIR}" ]]; then
  echo "Dataset already exists: ${DATASET_DIR}"
  echo "Use this path with run_training.py:"
  echo "  ${DATASET_DIR}"
  exit 0
 fi
 echo "Downloading TensorFlow flower_photos dataset..."
 if command -v curl >/dev/null 2>&1; then
  curl -L "${DATASET_URL}" -o "${ARCHIVE_PATH}"
 elif command -v wget >/dev/null 2>&1; then
  wget -O "${ARCHIVE_PATH}" "${DATASET_URL}"
 else
  echo "Either curl or wget is required." >&2
  exit 1
 fi
 echo "Extracting dataset..."
 tar -xzf "${ARCHIVE_PATH}" -C "${DEST_DIR}"
 echo "Preparing SageMaker directory layout..."
 mkdir -p "${DATASET_DIR}"
 for class_name in "${CLASS_NAMES[@]}"; do
  cp -R "${RAW_DATASET_DIR}/${class_name}" "${DATASET_DIR}/${class_name}"
 done
 echo "Dataset ready: ${DATASET_DIR}"
 find "${DATASET_DIR}" -mindepth 1 -maxdepth 1 -type d -print | sort
--- a/examples/training/run_training.sh
+++ b/examples/training/run_training.sh
@@ -1,112 +0,0 @@
 #!/usr/bin/env bash
 set -euo pipefail
 CONFIG_PATH="config.yaml"
 DATASET_DIR="examples/training/data/flower_photos_sagemaker"
 WAIT=false
 SKIP_UPLOAD=false
 POLL_SECONDS=60
 usage() {
  cat <<EOF
 Usage: $0 [options]
 Options:
  --config PATH          Path to qc-cli config file. Default: config.yaml
  --dataset-dir PATH     Dataset directory to upload. Default: ${DATASET_DIR}
  --skip-upload          Train against data already uploaded to s3.data_prefix.
  --wait                 Poll until training completes.
  -h, --help             Show this help.
 EOF
 }
 while [[ $# -gt 0 ]]; do
  case "$1" in
    --config)
      CONFIG_PATH="$2"
      shift 2
      ;;
    --dataset-dir)
      DATASET_DIR="$2"
      shift 2
      ;;
    --skip-upload)
      SKIP_UPLOAD=true
      shift
      ;;
    --wait)
      WAIT=true
      shift
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      echo "Unknown option: $1" >&2
      usage >&2
      exit 1
      ;;
  esac
 done
 if [[ ! -f "${CONFIG_PATH}" ]]; then
  echo "Config not found: ${CONFIG_PATH}" >&2
  exit 1
 fi
 if [[ "${SKIP_UPLOAD}" == false && ! -d "${DATASET_DIR}" ]]; then
  echo "Dataset not found: ${DATASET_DIR}" >&2
  echo "Run: bash examples/training/download_flower_photos.sh" >&2
  exit 1
 fi
 run() {
  echo "+ $*"
  "$@"
 }
 run uv run qc-cli infra status --config "${CONFIG_PATH}"
 if [[ "${SKIP_UPLOAD}" == false ]]; then
  run uv run qc-cli upload "${DATASET_DIR}" --config "${CONFIG_PATH}"
 fi
 TRAIN_OUTPUT_FILE="$(mktemp)"
 trap 'rm -f "${TRAIN_OUTPUT_FILE}"' EXIT
 run uv run qc-cli train start --config "${CONFIG_PATH}" | tee "${TRAIN_OUTPUT_FILE}"
 JOB_NAME="$(grep -Eo 'qc-cli-[0-9]{8}-[0-9]{6}' "${TRAIN_OUTPUT_FILE}" | tail -n 1)"
 if [[ -z "${JOB_NAME}" ]]; then
  echo "Could not find training job name in qc-cli output." >&2
  exit 1
 fi
 echo "Submitted SageMaker training job: ${JOB_NAME}"
 if [[ "${WAIT}" == false ]]; then
  run uv run qc-cli train status "${JOB_NAME}" --config "${CONFIG_PATH}"
  exit 0
 fi
 while true; do
  STATUS_OUTPUT="$(uv run qc-cli train status "${JOB_NAME}" --config "${CONFIG_PATH}")"
  echo "${STATUS_OUTPUT}"
  if printf '%s\n' "${STATUS_OUTPUT}" | grep -q 'Status:.*Completed'; then
    echo "Training completed successfully."
    exit 0
  fi
  if printf '%s\n' "${STATUS_OUTPUT}" | grep -q 'Status:.*Failed'; then
    echo "Training failed." >&2
    exit 1
  fi
  if printf '%s\n' "${STATUS_OUTPUT}" | grep -q 'Status:.*Stopped'; then
    echo "Training stopped." >&2
    exit 1
  fi
  sleep "${POLL_SECONDS}"
 done
--- a/examples/training/source/requirements.txt
+++ b/examples/training/source/requirements.txt
@@ -1 +0,0 @@
 onnx==1.21.0
--- a/examples/training/source/train.py
+++ b/examples/training/source/train.py
@@ -1,188 +0,0 @@
 #!/usr/bin/env python3
 """SageMaker entry point for CPU image-classification training."""
 from __future__ import annotations
 import argparse
 import json
 import os
 import random
 from pathlib import Path
 import torch
 from torch import nn
 from torch.utils.data import DataLoader, Subset, random_split
 from torchvision import datasets, transforms
 class SmallImageClassifier(nn.Module):
    def __init__(self, class_count: int) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        self.classifier = nn.Linear(64, class_count)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
 def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--image-size", type=int, default=160)
    parser.add_argument("--validation-split", type=float, default=0.2)
    parser.add_argument("--max-samples", type=int, default=0)
    parser.add_argument("--seed", type=int, default=13)
    parser.add_argument("--num-workers", type=int, default=2)
    parser.add_argument("--train-dir", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    return parser.parse_args()
 def build_datasets(args: argparse.Namespace) -> tuple[Subset, Subset, dict[str, int]]:
    transform = transforms.Compose(
        [
            transforms.Resize((args.image_size, args.image_size)),
            transforms.ToTensor(),
            transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ]
    )
    dataset = datasets.ImageFolder(args.train_dir, transform=transform)
    if len(dataset.classes) < 2:
        raise ValueError(f"Expected at least two classes in {args.train_dir}. Found: {dataset.classes}")
    if args.max_samples > 0 and args.max_samples < len(dataset):
        indices = list(range(len(dataset)))
        random.Random(args.seed).shuffle(indices)
        dataset = Subset(dataset, indices[: args.max_samples])
    validation_size = max(1, int(len(dataset) * args.validation_split))
    train_size = len(dataset) - validation_size
    if train_size < 1:
        raise ValueError("Not enough images to create a train/validation split.")
    generator = torch.Generator().manual_seed(args.seed)
    train_dataset, validation_dataset = random_split(dataset, [train_size, validation_size], generator=generator)
    return train_dataset, validation_dataset, getattr(dataset, "dataset", dataset).class_to_idx
 def run_epoch(
    model: nn.Module,
    data_loader: DataLoader,
    criterion: nn.Module,
    optimizer: torch.optim.Optimizer | None,
    device: torch.device,
 ) -> tuple[float, float]:
    training = optimizer is not None
    model.train(training)
    total_loss = 0.0
    total_correct = 0
    total_examples = 0
    for images, labels in data_loader:
        images = images.to(device)
        labels = labels.to(device)
        with torch.set_grad_enabled(training):
            logits = model(images)
            loss = criterion(logits, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        total_loss += loss.item() * images.size(0)
        total_correct += (logits.argmax(dim=1) == labels).sum().item()
        total_examples += images.size(0)
    return total_loss / total_examples, total_correct / total_examples
 def export_onnx(model: nn.Module, model_dir: Path, image_size: int) -> None:
    model.eval()
    dummy_input = torch.randn(1, 3, image_size, image_size)
    torch.onnx.export(
        model,
        dummy_input,
        model_dir / "model.onnx",
        export_params=True,
        opset_version=17,
        do_constant_folding=True,
        input_names=["input"],
        output_names=["logits"],
    )
 def main() -> None:
    args = parse_args()
    random.seed(args.seed)
    torch.manual_seed(args.seed)
    train_dataset, validation_dataset, class_to_idx = build_datasets(args)
    train_loader = DataLoader(
        train_dataset,
        batch_size=args.batch_size,
        shuffle=True,
        num_workers=args.num_workers,
    )
    validation_loader = DataLoader(
        validation_dataset,
        batch_size=args.batch_size,
        shuffle=False,
        num_workers=args.num_workers,
    )
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = SmallImageClassifier(class_count=len(class_to_idx)).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
    print(f"Training on {device}. Classes: {sorted(class_to_idx)}")
    metrics = []
    for epoch in range(1, args.epochs + 1):
        train_loss, train_accuracy = run_epoch(model, train_loader, criterion, optimizer, device)
        validation_loss, validation_accuracy = run_epoch(model, validation_loader, criterion, None, device)
        epoch_metrics = {
            "epoch": epoch,
            "train_loss": train_loss,
            "train_accuracy": train_accuracy,
            "validation_loss": validation_loss,
            "validation_accuracy": validation_accuracy,
        }
        metrics.append(epoch_metrics)
        print(json.dumps(epoch_metrics, sort_keys=True))
    model_dir = Path(args.model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    torch.save(
        {
            "model_state_dict": model.cpu().state_dict(),
            "class_to_idx": class_to_idx,
            "image_size": args.image_size,
        },
        model_dir / "model.pt",
    )
    export_onnx(model, model_dir, args.image_size)
    (model_dir / "class_to_idx.json").write_text(json.dumps(class_to_idx, indent=2), encoding="utf-8")
    (model_dir / "metrics.json").write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    print(f"Saved model artifacts to {model_dir}")
 if __name__ == "__main__":
    main()
--- a/src/commands/ai_hub.py
+++ b/src/commands/ai_hub.py
@@ -1,4 +1,5 @@
 from collections.abc import Mapping, Sequence
+from dataclasses import dataclass
 from datetime import datetime
 from enum import StrEnum
 from pathlib import Path
@@ -12,9 +13,9 @@ from src import state as state_ops
 from src.commands.utils import CONFIG_OPT, CONSOLE, load_cfg
 from src.config import Config
 from src.qualcomm import aihub_jobs
-from src.qualcomm.artifacts import resolve_onnx
+from src.qualcomm.artifacts import ResolvedOnnx, resolve_onnx
-app = typer.Typer(help="Quantize, compile, validate, profile, and download models with Qualcomm Workbench")
+app = typer.Typer(help="Optimize, quantize, compile, validate, profile, and download models with Qualcomm Workbench")
 _RUNTIME_EXTENSIONS = {
    "tflite": "tflite",
@@ -24,12 +25,19 @@ _RUNTIME_EXTENSIONS = {
 class UploadStep(StrEnum):
+    optimize = "optimize"
    quantize = "quantize"
    compile = "compile"
    validate = "validate"
    profile = "profile"
+@dataclass(frozen=True)
+class ResolvedModelSource:
+    model: str | Path
+    model_artifact: str | None = None
 def _input_specs(cfg: Config) -> dict[str, tuple[tuple[int, ...], str]]:
    specs = {name: (tuple(shape), dtype) for name, (shape, dtype) in cfg.aihub.input_specs.items()}
    if not specs:
@@ -101,6 +109,57 @@ def _model_id_or_state(config_path: str, model_id: str | None, *, quantized: boo
    return resolved
+def _resolve_model_source(
+    cfg: Config,
+    config_path: str,
+    *,
+    model_id: str | None = None,
+    previous_model_id: str | None = None,
+    from_job: str | None = None,
+    model_s3_uri: str | None = None,
+    onnx_path: str | None = None,
+) -> ResolvedModelSource:
+    if model_id:
+        return ResolvedModelSource(model_id)
+    has_explicit_source = bool(from_job or model_s3_uri or onnx_path)
+    if previous_model_id and not has_explicit_source:
+        return ResolvedModelSource(previous_model_id)
+    resolved = _resolve_onnx_source(
+        cfg,
+        config_path,
+        from_job=from_job,
+        model_s3_uri=model_s3_uri,
+        onnx_path=onnx_path,
+    )
+    return ResolvedModelSource(resolved.onnx_path, resolved.model_artifact)
+def _resolve_onnx_source(
+    cfg: Config,
+    config_path: str,
+    *,
+    from_job: str | None = None,
+    model_s3_uri: str | None = None,
+    onnx_path: str | None = None,
+) -> ResolvedOnnx:
+    st = state_ops.store(config_path)
+    last_training_job = st.get_last_training_job()
+    saved_model_artifact = None
+    if not from_job and not model_s3_uri and not onnx_path and not last_training_job:
+        saved_model_artifact = st.get_last_model_artifact()
+    return resolve_onnx(
+        cfg=cfg,
+        output_dir=cfg.aihub.output_dir,
+        from_job=from_job,
+        model_s3_uri=model_s3_uri or saved_model_artifact,
+        onnx_path=onnx_path,
+        last_training_job=last_training_job,
+    )
 def _device_selector(device: Device) -> str:
    parts: list[str] = []
    if device.name:
@@ -132,20 +191,23 @@ def _quantize_step(
    cfg: Config,
    config_path: str,
    calibration_path: Path,
-    from_job: str | None,
+    *,
-    model_s3_uri: str | None,
+    model_id: str | None = None,
-    onnx_path: str | None,
+    from_job: str | None = None,
+    model_s3_uri: str | None = None,
+    onnx_path: str | None = None,
 ) -> str:
    st = state_ops.store(config_path)
    specs = _input_specs(cfg)
    try:
-        resolved = resolve_onnx(
+        source = _resolve_model_source(
-            cfg=cfg,
+            cfg,
-            output_dir=cfg.aihub.output_dir,
+            config_path,
+            model_id=model_id,
+            previous_model_id=st.get_last_optimized_model_id(),
            from_job=from_job,
-            model_s3_uri=model_s3_uri or st.get_last_model_artifact(),
+            model_s3_uri=model_s3_uri,
            onnx_path=onnx_path,
-            last_training_job=st.get_last_training_job(),
        )
        calibration_data = _load_calibration(calibration_path, specs)
    except (FileNotFoundError, ValueError) as e:
@@ -153,73 +215,117 @@ def _quantize_step(
        raise typer.Exit(1)
    try:
+        hub_model = (
+            hub.upload_model(str(source.model), name=cfg.aihub.model_name)
+            if isinstance(source.model, Path)
+            else hub.get_model(source.model)
+        )
        result = aihub_jobs.submit_quantize_job(
-            resolved.onnx_path,
+            hub_model,
            calibration_data,
            cfg.aihub.quantize_options,
            job_name=_job_name(cfg, "quantize"),
            model_name=cfg.aihub.model_name,
        )
    except Exception as e:
        CONSOLE.print(f"[red]AI Hub quantize failed: {e}[/red]")
        raise typer.Exit(1)
-    st.update(
+    updates: dict[str, Any] = {
-        last_model_artifact=resolved.model_artifact,
+        "last_quantize_job_id": result["job_id"],
-        last_quantize_job_id=result["job_id"],
+        "last_quantized_model_id": result["model_id"],
-        last_quantized_model_id=result["model_id"],
+    }
-    )
+    if source.model_artifact:
+        updates["last_model_artifact"] = source.model_artifact
+    st.update(**updates)
    CONSOLE.print(f"[green]✓[/green] Quantize job: [bold]{result['job_id']}[/bold]")
    CONSOLE.print(f"[green]✓[/green] Quantized model: [bold]{result['model_id']}[/bold]")
    return str(result["model_id"])
+def _optimize_step(
+    cfg: Config,
+    config_path: str,
+    from_job: str | None,
+    model_s3_uri: str | None,
+    onnx_path: str | None,
+) -> str:
+    st = state_ops.store(config_path)
+    _validate_device(cfg)
+    specs = _input_specs(cfg)
+    try:
+        source = _resolve_onnx_source(
+            cfg,
+            config_path,
+            from_job=from_job,
+            model_s3_uri=model_s3_uri,
+            onnx_path=onnx_path,
+        )
+    except (FileNotFoundError, ValueError) as e:
+        CONSOLE.print(f"[red]{e}[/red]")
+        raise typer.Exit(1)
+    try:
+        hub_model = hub.upload_model(str(source.onnx_path), name=cfg.aihub.model_name)
+        result = aihub_jobs.submit_compile_job(
+            model=hub_model,
+            device=cfg.aihub.device,
+            input_specs=specs,
+            target_runtime="onnx",
+            job_name=_job_name(cfg, "optimize"),
+        )
+    except Exception as e:
+        CONSOLE.print(f"[red]AI Hub ONNX optimization failed: {e}[/red]")
+        raise typer.Exit(1)
+    st.update(
+        last_model_artifact=source.model_artifact,
+        last_optimize_job_id=result["job_id"],
+        last_optimized_model_id=result["model_id"],
+    )
+    CONSOLE.print(f"[green]✓[/green] ONNX optimization job: [bold]{result['job_id']}[/bold]")
+    CONSOLE.print(f"[green]✓[/green] Optimized ONNX model: [bold]{result['model_id']}[/bold]")
+    return str(result["model_id"])
 def _compile_step(
    cfg: Config,
    config_path: str,
-    model_id: str | None,
-    from_job: str | None,
-    model_s3_uri: str | None,
-    onnx_path: str | None,
     *,
-    prefer_quantized: bool,
+    model_id: str | None = None,
+    from_job: str | None = None,
+    model_s3_uri: str | None = None,
+    onnx_path: str | None = None,
 ) -> str:
    st = state_ops.store(config_path)
    _validate_device(cfg)
    specs = _input_specs(cfg)
-
-    model: Any
-    model_artifact: str | None = None
-    has_explicit_source = bool(from_job or model_s3_uri or onnx_path)
-    if model_id:
-        model = model_id
-    elif prefer_quantized and not has_explicit_source and st.get_last_quantized_model_id():
-        model = st.get_last_quantized_model_id()
-    else:
-        try:
-            resolved = resolve_onnx(
-                cfg=cfg,
-                output_dir=cfg.aihub.output_dir,
-                from_job=from_job,
-                model_s3_uri=model_s3_uri,
-                onnx_path=onnx_path,
-                last_training_job=st.get_last_training_job(),
-            )
-        except (FileNotFoundError, ValueError) as e:
-            CONSOLE.print(f"[red]{e}[/red]")
-            raise typer.Exit(1)
-        model = resolved.onnx_path
-        model_artifact = resolved.model_artifact
+    try:
+        source = _resolve_model_source(
+            cfg,
+            config_path,
+            model_id=model_id,
+            previous_model_id=st.get_last_quantized_model_id(),
+            from_job=from_job,
+            model_s3_uri=model_s3_uri,
+            onnx_path=onnx_path,
+        )
+    except (FileNotFoundError, ValueError) as e:
+        CONSOLE.print(f"[red]{e}[/red]")
+        raise typer.Exit(1)
    try:
+        hub_model = (
+            hub.upload_model(str(source.model), name=cfg.aihub.model_name)
+            if isinstance(source.model, Path)
+            else hub.get_model(source.model)
+        )
         result = aihub_jobs.submit_compile_job(
-            model=model,
+            model=hub_model,
             device=cfg.aihub.device,
             input_specs=specs,
             target_runtime=cfg.aihub.target_runtime,
             options=cfg.aihub.compile_options,
             job_name=_job_name(cfg, "compile"),
-            model_name=cfg.aihub.model_name if isinstance(model, Path) else None,
        )
    except Exception as e:
        CONSOLE.print(f"[red]AI Hub compile failed: {e}[/red]")
@@ -229,8 +335,8 @@ def _compile_step(
        "last_compile_job_id": result["job_id"],
        "last_compiled_model_id": result["model_id"],
    }
-    if model_artifact:
+    if source.model_artifact:
-        updates["last_model_artifact"] = model_artifact
+        updates["last_model_artifact"] = source.model_artifact
    st.update(**updates)
    CONSOLE.print(f"[green]✓[/green] Compile job: [bold]{result['job_id']}[/bold]")
    CONSOLE.print(f"[green]✓[/green] Compiled model: [bold]{result['model_id']}[/bold]")
@@ -256,8 +362,9 @@ def _validate_step(
    run = datetime.now().strftime("%Y%m%d-%H%M%S")
    out_dir = Path(cfg.aihub.output_dir) / run / "validation"
    try:
        hub_model = hub.get_model(resolved_model_id)
        result = aihub_jobs.submit_inference_job(
-            resolved_model_id,
+            hub_model,
            cfg.aihub.device,
            inputs,
            out_dir,
@@ -281,8 +388,9 @@ def _profile_step(cfg: Config, config_path: str, model_id: str | None) -> str:
    _validate_device(cfg)
    resolved_model_id = _model_id_or_state(config_path, model_id)
    try:
        hub_model = hub.get_model(resolved_model_id)
        result = aihub_jobs.submit_profile_job(
-            resolved_model_id,
+            hub_model,
            cfg.aihub.device,
            cfg.aihub.profile_options,
            job_name=_job_name(cfg, "profile"),
@@ -295,9 +403,24 @@ def _profile_step(cfg: Config, config_path: str, model_id: str | None) -> str:
    return str(result["job_id"])
@app.command()
 def optimize(
    from_job: str | None = typer.Option(None, "--from-job", help="Training job name whose model artifact should optimize"),
    model_s3_uri: str | None = typer.Option(None, "--model-s3-uri", help="S3 URI of model.tar.gz to optimize"),
    onnx_path: str | None = typer.Option(
        None, "--onnx-path", help="Local ONNX path or ONNX path inside extracted artifact"
    ),
    config: str = CONFIG_OPT,
 ) -> None:
    """Optimize an external model into a Workbench-produced ONNX model."""
    cfg = load_cfg(config)
    _optimize_step(cfg, config, from_job, model_s3_uri, onnx_path)
@app.command()
 def quantize(
    calibration_path: Path = typer.Argument(..., help="Calibration .npz file or directory of .npy samples"),
    model_id: str | None = typer.Option(None, "--model-id", help="AI Hub optimized ONNX model ID"),
    from_job: str | None = typer.Option(None, "--from-job", help="Training job name whose model artifact should quantize"),
    model_s3_uri: str | None = typer.Option(None, "--model-s3-uri", help="S3 URI of model.tar.gz to quantize"),
    onnx_path: str | None = typer.Option(
@@ -307,7 +430,15 @@ def quantize(
 ) -> None:
    """Quantize an ONNX model to INT8."""
    cfg = load_cfg(config)
-    _quantize_step(cfg, config, calibration_path, from_job, model_s3_uri, onnx_path)
+    _quantize_step(
        cfg,
        config,
        calibration_path,
        model_id=model_id,
        from_job=from_job,
        model_s3_uri=model_s3_uri,
        onnx_path=onnx_path,
    )
@app.command()
@@ -322,7 +453,14 @@ def compile(
 ) -> None:
    """Compile a model for the configured Qualcomm AI Hub target."""
    cfg = load_cfg(config)
-    _compile_step(cfg, config, model_id, from_job, model_s3_uri, onnx_path, prefer_quantized=True)
+    _compile_step(
        cfg,
        config,
        model_id=model_id,
        from_job=from_job,
        model_s3_uri=model_s3_uri,
        onnx_path=onnx_path,
    )
@app.command()
@@ -351,7 +489,7 @@ def profile(
 def upload(
    calibration_path: Path = typer.Argument(..., help="Calibration .npz file or directory of .npy samples"),
    input_file: Path = typer.Argument(..., help="Validation .npz or .npy inputs to run on device"),
-    from_step: UploadStep = typer.Option(UploadStep.quantize, "--from-step", help="Resume from this Workbench step"),
+    from_step: UploadStep = typer.Option(UploadStep.optimize, "--from-step", help="Resume from this Workbench step"),
    from_job: str | None = typer.Option(None, "--from-job", help="Training job name whose model artifact should upload"),
    model_s3_uri: str | None = typer.Option(None, "--model-s3-uri", help="S3 URI of model.tar.gz to upload"),
    onnx_path: str | None = typer.Option(
@@ -360,25 +498,48 @@ def upload(
    input_name: str | None = typer.Option(None, "--input-name", help="Input name for .npy validation files"),
    config: str = CONFIG_OPT,
 ) -> None:
-    """Run the four Workbench upload steps: quantize, compile, validate, and profile."""
+    """Optimize, quantize, optionally compile, validate, and profile a model."""
    cfg = load_cfg(config)
-    steps = [UploadStep.quantize, UploadStep.compile, UploadStep.validate, UploadStep.profile]
+    steps = [UploadStep.optimize, UploadStep.quantize, UploadStep.compile, UploadStep.validate, UploadStep.profile]
    selected = steps[steps.index(from_step) :]
    optimized_model_id: str | None = None
    quantized_model_id: str | None = None
    compiled_model_id: str | None = None
    if UploadStep.optimize in selected:
        optimized_model_id = _optimize_step(cfg, config, from_job, model_s3_uri, onnx_path)
    if UploadStep.quantize in selected:
-        quantized_model_id = _quantize_step(cfg, config, calibration_path, from_job, model_s3_uri, onnx_path)
-    if UploadStep.compile in selected:
-        compiled_model_id = _compile_step(
+        if UploadStep.optimize not in selected:
+            optimized_model_id = state_ops.store(config).get_last_optimized_model_id()
+            if not optimized_model_id:
+                CONSOLE.print(
+                    "[red]No optimized ONNX model found. Resume from --from-step optimize or run "
+                    "'qc-cli ai-hub optimize' first.[/red]"
+                )
+                raise typer.Exit(1)
+        quantized_model_id = _quantize_step(
             cfg,
             config,
-            model_id=quantized_model_id,
-            from_job=from_job,
+            calibration_path,
+            model_id=optimized_model_id,
             model_s3_uri=model_s3_uri,
             onnx_path=onnx_path,
-            prefer_quantized=True,
        )
    if UploadStep.compile in selected:
        if cfg.aihub.target_runtime == "onnx":
            compiled_model_id = quantized_model_id or state_ops.store(config).get_last_quantized_model_id()
            if not compiled_model_id:
                CONSOLE.print(
                    "[red]No quantized ONNX model found. Resume from --from-step quantize or run "
                    "'qc-cli ai-hub quantize' first.[/red]"
                )
                raise typer.Exit(1)
            state_ops.store(config).update(last_compiled_model_id=compiled_model_id)
            CONSOLE.print("[green]✓[/green] Target runtime is ONNX; skipping final compile.")
        else:
            compiled_model_id = _compile_step(
                cfg,
                config,
                model_id=quantized_model_id,
            )
    if UploadStep.validate in selected:
        _validate_step(cfg, config, input_file, compiled_model_id, input_name)
    if UploadStep.profile in selected:
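
Reviewer note: the resume logic in `upload` above can be read as "slice the ordered step list at `--from-step`, and treat the final compile as a no-op when the target runtime is already ONNX". A minimal sketch of that planning logic follows. The `plan_steps` helper and the `"reuse-quantized"` label are hypothetical names introduced only for illustration, and the enum here is a stand-in for the CLI's own `UploadStep`, whose exact definition is not shown in this diff.

```python
from enum import Enum


class UploadStep(str, Enum):
    # Hypothetical mirror of the CLI's step enum; definition order is the run order.
    optimize = "optimize"
    quantize = "quantize"
    compile = "compile"
    validate = "validate"
    profile = "profile"


def plan_steps(from_step: UploadStep, target_runtime: str) -> list[str]:
    """Return the actions an upload run would take, in order (illustrative only)."""
    steps = list(UploadStep)
    # Same slicing idea as `steps[steps.index(from_step):]` in the diff.
    selected = steps[steps.index(from_step):]
    plan = []
    for step in selected:
        if step is UploadStep.compile and target_runtime == "onnx":
            # Assumption drawn from the diff: with an ONNX target the quantized
            # model is reused as the "compiled" model and no compile job runs.
            plan.append("reuse-quantized")
        else:
            plan.append(step.value)
    return plan


if __name__ == "__main__":
    print(plan_steps(UploadStep.quantize, "onnx"))
    print(plan_steps(UploadStep.optimize, "tflite"))
```

Keeping the plan as data makes the resume behavior easy to unit-test without submitting any Workbench jobs; the real command additionally checks saved state before each step.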
--- a/src/qualcomm/aihub_jobs.py
+++ b/src/qualcomm/aihub_jobs.py
@@ -28,30 +28,19 @@ def _dataset_entries(inputs: dict[str, Any]) -> dict[str, list[Any]]:
 def submit_compile_job(
-    model: Any,
+    model: Model,
    device: Device,
    input_specs: dict[str, tuple[tuple[int, ...], str]],
    target_runtime: str,
    options: str | None = None,
    job_name: str | None = None,
    model_name: str | None = None,
 ) -> ModelJobResult:
    compile_options = f"--target_runtime {target_runtime}"
    if options:
        compile_options = f"{compile_options} {options}"
-    model_arg = model
-    if isinstance(model, Path):
-        model_arg = str(model)
-    elif isinstance(model, str):
-        candidate = Path(model)
-        model_arg = model if candidate.exists() or candidate.suffix else hub.get_model(model)
-    if model_name and isinstance(model_arg, str) and Path(model_arg).exists():
-        model_arg = hub.upload_model(model_arg, name=model_name)
    job = hub.submit_compile_job(
-        model=model_arg,
+        model=model,
        device=device,
        name=job_name,
        input_specs=input_specs,
@@ -64,14 +53,14 @@ def submit_compile_job(
 def submit_inference_job(
-    model_id: str,
+    model: Model,
    device: Device,
    inputs: dict[str, Any],
    output_dir: str | Path,
    job_name: str | None = None,
 ) -> InferenceJobResult:
    job = hub.submit_inference_job(
-        model=hub.get_model(model_id),
+        model=model,
        device=device,
        inputs=_dataset_entries(inputs),
        name=job_name,
@@ -83,13 +72,13 @@ def submit_inference_job(
 def submit_profile_job(
-    model_id: str,
+    model: Model,
    device: Device,
    options: str | None = None,
    job_name: str | None = None,
 ) -> ProfileJobResult:
    job = hub.submit_profile_job(
-        model=hub.get_model(model_id),
+        model=model,
        device=device,
        name=job_name,
        options=options or "",
@@ -98,17 +87,13 @@ def submit_profile_job(
 def submit_quantize_job(
-    model: str | Path,
+    model: Model,
    calibration_data: dict[str, Any],
    options: str | None = None,
    job_name: str | None = None,
    model_name: str | None = None,
 ) -> ModelJobResult:
-    model_arg = str(model)
-    if model_name and Path(model_arg).exists():
-        model_arg = hub.upload_model(model_arg, name=model_name)
    job = hub.submit_quantize_job(
-        model=model_arg,
+        model=model,
        calibration_data=_dataset_entries(calibration_data),
        weights_dtype=QuantizeDtype.INT8,
        activations_dtype=QuantizeDtype.INT8,
--- a/src/state.py
+++ b/src/state.py
@@ -37,6 +37,10 @@ class CliStateStore:
        value = self.get("last_model_artifact")
        return str(value) if value else None
    def get_last_optimized_model_id(self) -> str | None:
        value = self.get("last_optimized_model_id")
        return str(value) if value else None
    def get_last_quantized_model_id(self) -> str | None:
        value = self.get("last_quantized_model_id")
        return str(value) if value else None
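
Reviewer note: the new `get_last_optimized_model_id` getter is what lets `--from-step quantize` resume from a previously optimized model. A small sketch of that contract is below, assuming a dict-backed store. `InMemoryStateStore` and `model_for_quantize` are hypothetical names used only for illustration; the real `CliStateStore` persists state elsewhere and the CLI exits via `typer.Exit(1)` rather than raising `LookupError`.

```python
from typing import Any, Optional


class InMemoryStateStore:
    """Hypothetical stand-in for CliStateStore, backed by a plain dict."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def update(self, **values: Any) -> None:
        self._data.update(values)

    def get_last_optimized_model_id(self) -> Optional[str]:
        # Same normalization as the getter in the diff: falsy values become None.
        value = self.get("last_optimized_model_id")
        return str(value) if value else None


def model_for_quantize(store: InMemoryStateStore) -> str:
    """Resolve the model to quantize when the optimize step is skipped."""
    model_id = store.get_last_optimized_model_id()
    if not model_id:
        # The CLI prints a hint and exits here; an exception keeps the sketch testable.
        raise LookupError("No optimized ONNX model found; run the optimize step first.")
    return model_id


if __name__ == "__main__":
    store = InMemoryStateStore()
    store.update(last_optimized_model_id="example-model-id")
    print(model_for_quantize(store))
```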