update ai-hub to first optimize model for Workbench

Remove old examples
2026-06-09 14:55:26 -04:00
parent 6c9f30d290
commit f26e8256f0
12 changed files with 260 additions and 700 deletions
--- a/README.md
+++ b/README.md
@@ -177,27 +177,35 @@ The expected output artifact is SageMaker’s `model.tar.gz`, normally containin
 ```
 qc-cli ai-hub upload <calibration.npz|calibration-dir> <inputs.npz|inputs.npy>
 qc-cli ai-hub upload <calibration> <inputs> --from-step validate
-qc-cli ai-hub quantize <calibration.npz|calibration-dir> [--onnx-path PATH] [--model-s3-uri URI] [--from-job NAME]
+qc-cli ai-hub optimize [--onnx-path PATH] [--model-s3-uri URI] [--from-job NAME]
+qc-cli ai-hub quantize <calibration.npz|calibration-dir> [--model-id ID] [--onnx-path PATH] [--model-s3-uri URI] [--from-job NAME]
 qc-cli ai-hub compile [--model-id ID] [--onnx-path PATH] [--model-s3-uri URI] [--from-job NAME]
 qc-cli ai-hub validate <inputs.npz|inputs.npy> [--model-id ID] [--input-name NAME]
 qc-cli ai-hub profile [--model-id ID]
 qc-cli ai-hub download [--model-id ID] [--output PATH]
 ```

-`ai-hub upload` runs the four Workbench upload steps in order: quantize, compile, validate, and profile. Use `--from-step compile`, `--from-step validate`, or `--from-step profile` to resume from saved local state after a completed earlier step.
+`ai-hub upload` optimizes to ONNX, quantizes, validates, and profiles. When `aihub.target_runtime` is not `onnx`, it
+also compiles the quantized model to that deployment runtime. The initial ONNX optimization gives external models
+Workbench provenance and applies compiler optimization passes before quantization.

 Resume behavior:

 ```text
--from-step quantize  Run quantize, compile, validate, and profile.
--from-step compile   Skip quantize; compile the last quantized model unless an explicit source is passed.
--from-step validate  Skip quantize and compile; validate the last compiled model.
--from-step profile   Skip quantize, compile, and validate; profile the last compiled model.
+--from-step optimize  Run optimize, quantize, optional final compile, validate, and profile.
+--from-step quantize  Quantize the last optimized ONNX, then optionally compile, validate, and profile.
+--from-step compile   Skip optimize and quantize; finalize the last quantized model for the target runtime.
+--from-step validate  Skip optimize, quantize, and compile; validate the last compiled model.
+--from-step profile   Skip optimize, quantize, compile, and validate; profile the last compiled model.
 ```

 When a step runs in the current command, `upload` passes its returned model ID directly to the next step. When a step is skipped, the next step resolves the needed model ID from `.qc-cli.json`. This avoids re-running earlier AI Hub jobs when you only need to continue from a later step.

-`ai-hub compile` resolves model sources in this order: `--model-id`, explicit source options (`--onnx-path`, `--model-s3-uri`, `--from-job`), last quantized model from state, then the last training job from local state. `ai-hub download` is separate because downloading the optimized artifact is outside the four-step Workbench upload loop.
+`ai-hub optimize` compiles an external model with `--target_runtime onnx`. `ai-hub quantize` uses an explicit
+`--model-id`, the last optimized ONNX model, or an explicit/local model source, in that order. `ai-hub compile` resolves
+model sources in this order: `--model-id`, explicit source options, last quantized model, then the last training job.
+For `target_runtime: onnx`, upload treats the quantized ONNX as the final model and skips a redundant second compile.
+`ai-hub download` remains separate because downloading is outside the Workbench processing loop.

 AI Hub authentication currently uses the local `qai-hub` SDK configuration. A planned follow-up is to support AWS Systems Manager Parameter Store `SecureString` for team-managed tokens, where `config.yaml` stores only a parameter name such as `/qc-cli/aihub/token`, AWS KMS encrypts the token at rest, and the CLI retrieves it at runtime with `ssm:GetParameter` plus `kms:Decrypt` permissions.
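
The resume behavior documented in this README change can be read as a small planning function. The following is a minimal sketch, not the CLI's actual implementation: the function name `plan_upload_steps`, the `STEPS` list, and the example runtime value `tflite` are assumptions made for illustration. It also assumes that when the target runtime is `onnx`, the compile step is dropped even if `--from-step compile` is requested, which the README implies but does not state outright.

```python
from typing import List

# Step order taken from the README's resume table (hypothetical constant name).
STEPS = ["optimize", "quantize", "compile", "validate", "profile"]


def plan_upload_steps(from_step: str = "optimize", target_runtime: str = "onnx") -> List[str]:
    """Return the steps an upload would run, given a resume point and target runtime.

    Illustrative only: the real CLI may plan steps differently.
    """
    if from_step not in STEPS:
        raise ValueError(f"unknown step {from_step!r}; expected one of {STEPS}")

    # Resume from the requested step and keep everything after it.
    planned = STEPS[STEPS.index(from_step):]

    # For an ONNX target the quantized ONNX is already the final model,
    # so the second compile is redundant and is skipped (assumption noted above).
    if target_runtime == "onnx":
        planned = [step for step in planned if step != "compile"]

    return planned


if __name__ == "__main__":
    print(plan_upload_steps())                      # full run, ONNX target
    print(plan_upload_steps("quantize", "tflite"))  # resume, non-ONNX target
```

Under these assumptions, a full run for an ONNX target plans optimize, quantize, validate, and profile, while resuming at `quantize` for a non-ONNX target also includes the final compile.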