4 Commits

Author SHA1 Message Date
58681cef82 command to create presigned URL for MLFlow 2026-05-27 10:52:08 -04:00
e1c8d6574f omit server name when created with config 2026-05-27 10:23:53 -04:00
35d25d8967 Merge branch 'main' into ml-flow 2026-05-27 08:58:46 -04:00
b907a74525 wip mlflow implementation 2026-05-26 15:03:53 -04:00
23 changed files with 262 additions and 1911 deletions

View File

@@ -65,18 +65,6 @@ sagemaker:
entry_point: null # Optional: script inside source_dir entry_point: null # Optional: script inside source_dir
source_dir: null # Optional: local dir packaged and uploaded automatically source_dir: null # Optional: local dir packaged and uploaded automatically
hyperparameters: {} hyperparameters: {}
aihub:
device:
name: Samsung Galaxy S25 (Family)
target_runtime: tflite
input_specs: {} # Required before running qc-cli ai-hub commands
job_name: null # Optional prefix for AI Hub Workbench jobs
model_name: null # Optional name for uploaded local ONNX models
compile_options: null
profile_options: null
quantize_options: null
output_dir: build/qai-hub
``` ```
`qc-cli init` generates the `infra.stack_name` and `s3.bucket` namespace once and writes it to `config.yaml`. Keep these values stable for a deployment; changing them points the CLI at different infrastructure. `qc-cli init` generates the `infra.stack_name` and `s3.bucket` namespace once and writes it to `config.yaml`. Keep these values stable for a deployment; changing them points the CLI at different infrastructure.
@@ -105,15 +93,21 @@ mlflow:
tracking_server_name: your-tracking-server-name tracking_server_name: your-tracking-server-name
``` ```
When MLflow is enabled, `train start` creates an MLflow run for the SageMaker job. `train status` finalizes that run once the job reaches a terminal state and registers completed model artifacts as experiment model versions using the `experiment-latest` MLflow alias. An experiment version is an immutable trained-source artifact; it records that training produced a model, not that the model is better than earlier versions or ready for release. Install the optional MLflow dependencies before enabling MLflow:
```bash
uv sync --extra mlflow
```
When MLflow is enabled, `train start` creates an MLflow run for the SageMaker job. `train status` finalizes that run once the job reaches a terminal state and registers completed model artifacts as pre-release model versions using the `prerelease-latest` MLflow alias.
To open the managed SageMaker MLflow UI, request a fresh presigned URL: To open the managed SageMaker MLflow UI, request a fresh presigned URL:
```bash ```bash
qc-cli mlflow open --config config.yaml qc-cli infra mlflow-url --config config.yaml
``` ```
This opens a browser to a fresh presigned URL. It works for `mode: create` and for `mode: existing` when the existing server is managed by Amazon SageMaker. In `create` mode, the command uses the CLI-managed tracking server name. In `existing` mode, it uses `mlflow.tracking_server_name`. If the existing MLflow server is external to SageMaker, open it with that server's own URL instead. This works for `mode: create` and for `mode: existing` when the existing server is managed by Amazon SageMaker. In `create` mode, the command uses the CLI-managed tracking server name. In `existing` mode, it uses `mlflow.tracking_server_name`. If the existing MLflow server is external to SageMaker, open it with that server's own URL instead.
## Commands ## Commands
@@ -125,12 +119,6 @@ qc-cli init --output <path> Write config to a custom path
qc-cli init --force Overwrite an existing config file qc-cli init --force Overwrite an existing config file
``` ```
### `mlflow`
```
qc-cli mlflow open Open a presigned MLflow UI URL in a browser
```
### `infra` ### `infra`
``` ```
@@ -138,6 +126,7 @@ qc-cli infra setup Deploy the CDK stack
qc-cli infra setup --no-bootstrap Deploy without running CDK bootstrap qc-cli infra setup --no-bootstrap Deploy without running CDK bootstrap
qc-cli infra setup --cloudformation-execution-policy <arn> Set CDK bootstrap execution policy ARN qc-cli infra setup --cloudformation-execution-policy <arn> Set CDK bootstrap execution policy ARN
qc-cli infra status Show CDK stack/resource status qc-cli infra status Show CDK stack/resource status
qc-cli infra mlflow-url Print a presigned MLflow UI URL
qc-cli infra destroy Destroy stack, retaining S3 data qc-cli infra destroy Destroy stack, retaining S3 data
qc-cli infra destroy --yes Destroy stack without confirmation qc-cli infra destroy --yes Destroy stack without confirmation
qc-cli infra destroy --delete-bucket-data Destroy stack and delete S3 data qc-cli infra destroy --delete-bucket-data Destroy stack and delete S3 data
@@ -172,74 +161,6 @@ qc-cli train list --limit 3 Show a custom number of recent jobs
The expected output artifact is SageMakers `model.tar.gz`, normally containing the trained model file your container writes to `/opt/ml/model`. The expected output artifact is SageMakers `model.tar.gz`, normally containing the trained model file your container writes to `/opt/ml/model`.
### `ai-hub`
```
qc-cli ai-hub upload <calibration.npz|calibration-dir> <inputs.npz|inputs.npy>
qc-cli ai-hub upload <calibration> <inputs> --from-step validate
qc-cli ai-hub quantize <calibration.npz|calibration-dir> [--onnx-path PATH] [--model-s3-uri URI] [--from-job NAME]
qc-cli ai-hub compile [--model-id ID] [--onnx-path PATH] [--model-s3-uri URI] [--from-job NAME]
qc-cli ai-hub validate <inputs.npz|inputs.npy> [--model-id ID] [--input-name NAME]
qc-cli ai-hub profile [--model-id ID]
qc-cli ai-hub download [--model-id ID] [--output PATH]
```
`ai-hub upload` runs the four Workbench upload steps in order: quantize, compile, validate, and profile. Use `--from-step compile`, `--from-step validate`, or `--from-step profile` to resume from saved local state after a completed earlier step.
Resume behavior:
```text
--from-step quantize Run quantize, compile, validate, and profile.
--from-step compile Skip quantize; compile the last quantized model unless an explicit source is passed.
--from-step validate Skip quantize and compile; validate the last compiled model.
--from-step profile Skip quantize, compile, and validate; profile the last compiled model.
```
When a step runs in the current command, `upload` passes its returned model ID directly to the next step. When a step is skipped, the next step resolves the needed model ID from `.qc-cli.json`. This avoids re-running earlier AI Hub jobs when you only need to continue from a later step.
`ai-hub compile` resolves model sources in this order: `--model-id`, explicit source options (`--onnx-path`, `--model-s3-uri`, `--from-job`), last quantized model from state, then the last training job from local state. `ai-hub download` is separate because downloading the optimized artifact is outside the four-step Workbench upload loop.
When MLflow is enabled, AI Hub job-producing commands (`quantize`, `compile`, `validate`, `profile`, and `upload`) log AI Hub metadata to MLflow. Each command execution receives a `qc_cli.aihub_submission_id`; all steps inside one `ai-hub upload` share that submission ID. Runs are nested under the MLflow run for the resolved source model when the CLI can prove that source from local state, such as `--from-job` or a model produced by a prior tracked AI Hub step. Otherwise, AI Hub runs are standalone. `validate` also logs output summaries, and `profile` logs profile metrics plus the raw profile JSON. `ai-hub download` does not create an MLflow run because it does not submit or measure an AI Hub job.
AI Hub authentication currently uses the local `qai-hub` SDK configuration. A planned follow-up is to support AWS Systems Manager Parameter Store `SecureString` for team-managed tokens, where `config.yaml` stores only a parameter name such as `/qc-cli/aihub/token`, AWS KMS encrypts the token at rest, and the CLI retrieves it at runtime with `ssm:GetParameter` plus `kms:Decrypt` permissions.
## Model lifecycle
The CLI uses neutral experiment naming for trained artifacts and reserves release terminology for an explicit promotion step.
Current behavior:
1. `qc-cli train start` submits a SageMaker training job.
2. `qc-cli train status` finalizes the MLflow run after the job reaches a terminal state.
3. If the job completed and `mlflow.register_trained_models` is enabled, the SageMaker `model.tar.gz` is registered as a new MLflow model version with:
- `qc_cli.stage=experiment`
- `qc_cli.artifact_kind=trained_source`
- `qc_cli.source=sagemaker`
4. The MLflow alias `experiment-latest` points at the most recently registered experiment version.
5. AI Hub upload commands create deployable derived artifacts from a trained-source experiment or local ONNX model.
Future release aliases such as `v1` or `production` can point at a selected deployable artifact.
Example future metadata:
```text
qc-cli-model version 12
qc_cli.stage=experiment
qc_cli.artifact_kind=trained_source
qc_cli.source=sagemaker
qc-cli-model-aihub version 3
qc_cli.stage=ai_hub_compiled
qc_cli.artifact_kind=deployable
qc_cli.parent_registered_model_name=qc-cli-model
qc_cli.parent_model_version=12
qc_cli.runtime=tflite
qc_cli.quantization=int8
qc_cli.target_device=Samsung Galaxy S25
```
In that flow, `experiment-latest` remains a training convenience alias. Release selection is a separate promotion decision based on the derived artifact, not on the experiment name.
## AWS permissions required ## AWS permissions required
The IAM user or role running the CLI needs: The IAM user or role running the CLI needs:

View File

@@ -1,118 +0,0 @@
# Qualcomm AI Hub Example
This example takes the ONNX model produced by the SageMaker training example and runs the Qualcomm AI Hub upload workflow:
1. Quantize
2. Compile
3. Validate
4. Profile
5. Download the compiled artifact
## Prerequisites
Run the training example first and wait for it to complete:
```bash
bash examples/training/run_training.sh --config config.yaml --wait
```
If the dataset is already uploaded to S3, use:
```bash
bash examples/training/run_training.sh --config config.yaml --skip-upload --wait
```
The training artifact must contain a static-shape `model.onnx`. The training example exports an input named `input` with shape `1x3x160x160`.
Your `config.yaml` must include AI Hub settings:
```yaml
aihub:
device:
name: Samsung Galaxy S25 (Family)
target_runtime: tflite
input_specs:
input: [[1, 3, 160, 160], float32]
output_dir: build/qai-hub
```
You also need local Qualcomm AI Hub SDK authentication configured.
## Prepare Inputs
AI Hub does not consume the raw JPG training images directly. It needs NumPy tensors that match the ONNX model input shape and preprocessing.
Generate calibration and validation inputs:
```bash
uv run python examples/ai-hub/prepare_inputs.py
```
This writes:
```text
examples/training/data/aihub_calibration/*.npy
examples/training/data/inputs.npz
```
The script applies the same image preprocessing used by the training example:
- resize to `160x160`
- convert to channel-first `1x3x160x160`
- normalize with ImageNet mean and standard deviation
Useful options:
```bash
uv run python examples/ai-hub/prepare_inputs.py \
--dataset-dir examples/training/data/flower_photos_sagemaker \
--calibration-dir examples/training/data/aihub_calibration \
--input-file examples/training/data/inputs.npz \
--samples 16
```
## Run AI Hub
After training completes and inputs are prepared:
```bash
bash examples/ai-hub/run_ai_hub.sh --config config.yaml
```
By default, the script uses the last SageMaker training job recorded in `.qc-cli.json`. It downloads that job's `model.tar.gz`, extracts `model.onnx`, runs the AI Hub workflow, and downloads the compiled artifact.
To use a specific training job:
```bash
bash examples/ai-hub/run_ai_hub.sh \
--config config.yaml \
--from-job qc-cli-YYYYMMDD-HHMMSS
```
To resume from a later Workbench step:
```bash
bash examples/ai-hub/run_ai_hub.sh \
--config config.yaml \
--from-step validate
```
To skip downloading the compiled artifact:
```bash
bash examples/ai-hub/run_ai_hub.sh \
--config config.yaml \
--skip-download
```
## Troubleshooting
If AI Hub reports dynamic input shapes, rerun training with the current training source. AI Hub quantization requires the exported ONNX model to use static input shapes.
If `run_ai_hub.sh` reports missing calibration or input files, run:
```bash
uv run python examples/ai-hub/prepare_inputs.py
```
If validation fails with a missing input name, make sure `config.yaml` and the generated `.npz` both use `input` as the input name.

View File

@@ -1,74 +0,0 @@
#!/usr/bin/env python3
"""Prepare Qualcomm AI Hub calibration and validation inputs for the training example."""
from __future__ import annotations
import argparse
from pathlib import Path
import numpy as np
from PIL import Image
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--dataset-dir",
type=Path,
default=Path("examples/training/data/flower_photos_sagemaker"),
help="ImageFolder-style dataset used for training.",
)
parser.add_argument(
"--calibration-dir",
type=Path,
default=Path("examples/training/data/aihub_calibration"),
help="Directory where .npy calibration samples will be written.",
)
parser.add_argument(
"--input-file",
type=Path,
default=Path("examples/training/data/inputs.npz"),
help="Validation .npz input file for qc-cli ai-hub validate.",
)
parser.add_argument("--input-name", default="input", help="ONNX input name.")
parser.add_argument("--image-size", type=int, default=160, help="Square image size used by training.")
parser.add_argument("--samples", type=int, default=16, help="Number of calibration samples to write.")
return parser.parse_args()
def preprocess_image(path: Path, image_size: int) -> np.ndarray:
image = Image.open(path).convert("RGB").resize((image_size, image_size), Image.Resampling.BILINEAR)
array = np.asarray(image, dtype=np.float32) / 255.0
array = np.transpose(array, (2, 0, 1))
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)[:, None, None]
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)[:, None, None]
return ((array - mean) / std)[None, ...].astype("float32")
def main() -> None:
args = parse_args()
images = sorted(p for p in args.dataset_dir.rglob("*") if p.suffix.lower() in IMAGE_EXTENSIONS)
if not images:
raise SystemExit(f"No images found under {args.dataset_dir}")
if args.samples < 1:
raise SystemExit("--samples must be at least 1")
args.calibration_dir.mkdir(parents=True, exist_ok=True)
args.input_file.parent.mkdir(parents=True, exist_ok=True)
sample_count = min(args.samples, len(images))
prepared = []
for index, image_path in enumerate(images[:sample_count]):
sample = preprocess_image(image_path, args.image_size)
np.save(args.calibration_dir / f"sample_{index:03d}.npy", sample)
prepared.append(sample)
np.savez(args.input_file, **{args.input_name: prepared[0]})
print(f"Wrote {sample_count} calibration samples to {args.calibration_dir}")
print(f"Wrote validation input to {args.input_file}")
if __name__ == "__main__":
main()

View File

@@ -1,156 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
CONFIG_PATH="config.yaml"
CALIBRATION_PATH="examples/training/data/aihub_calibration"
INPUT_FILE="examples/training/data/inputs.npz"
FROM_STEP="quantize"
FROM_JOB=""
MODEL_S3_URI=""
ONNX_PATH=""
INPUT_NAME=""
DOWNLOAD=true
OUTPUT_PATH=""
usage() {
cat <<EOF
Usage: $0 [options]
Options:
--config PATH Path to qc-cli config file. Default: config.yaml
--calibration PATH Calibration .npz file or directory of .npy samples.
Default: ${CALIBRATION_PATH}
--input-file PATH Validation .npz or .npy inputs. Default: ${INPUT_FILE}
--from-step STEP Resume upload from: quantize, compile, validate, profile.
Default: ${FROM_STEP}
--from-job NAME SageMaker training job whose model artifact should upload.
Defaults to the last training job in local qc-cli state.
--model-s3-uri URI S3 URI of model.tar.gz to upload.
--onnx-path PATH Local ONNX path or ONNX path inside extracted artifact.
--input-name NAME Input name for .npy validation files.
--skip-download Do not download the compiled AI Hub artifact after upload.
--output PATH Destination file for ai-hub download.
-h, --help Show this help.
EOF
}
while [[ $# -gt 0 ]]; do
case "$1" in
--config)
CONFIG_PATH="$2"
shift 2
;;
--calibration)
CALIBRATION_PATH="$2"
shift 2
;;
--input-file)
INPUT_FILE="$2"
shift 2
;;
--from-step)
FROM_STEP="$2"
shift 2
;;
--from-job)
FROM_JOB="$2"
shift 2
;;
--model-s3-uri)
MODEL_S3_URI="$2"
shift 2
;;
--onnx-path)
ONNX_PATH="$2"
shift 2
;;
--input-name)
INPUT_NAME="$2"
shift 2
;;
--skip-download)
DOWNLOAD=false
shift
;;
--output)
OUTPUT_PATH="$2"
shift 2
;;
-h|--help)
usage
exit 0
;;
*)
echo "Unknown option: $1" >&2
usage >&2
exit 1
;;
esac
done
if [[ ! -f "${CONFIG_PATH}" ]]; then
echo "Config not found: ${CONFIG_PATH}" >&2
exit 1
fi
case "${FROM_STEP}" in
quantize|compile|validate|profile)
;;
*)
echo "--from-step must be one of: quantize, compile, validate, profile" >&2
exit 1
;;
esac
if [[ ! -e "${CALIBRATION_PATH}" ]]; then
echo "Calibration path not found: ${CALIBRATION_PATH}" >&2
echo "Pass --calibration with a .npz file or directory of .npy samples." >&2
exit 1
fi
if [[ ! -f "${INPUT_FILE}" ]]; then
echo "Input file not found: ${INPUT_FILE}" >&2
echo "Pass --input-file with a validation .npz or .npy file." >&2
exit 1
fi
run() {
echo "+ $*"
"$@"
}
UPLOAD_ARGS=(
"${CALIBRATION_PATH}"
"${INPUT_FILE}"
--from-step "${FROM_STEP}"
--config "${CONFIG_PATH}"
)
if [[ -n "${FROM_JOB}" ]]; then
UPLOAD_ARGS+=(--from-job "${FROM_JOB}")
fi
if [[ -n "${MODEL_S3_URI}" ]]; then
UPLOAD_ARGS+=(--model-s3-uri "${MODEL_S3_URI}")
fi
if [[ -n "${ONNX_PATH}" ]]; then
UPLOAD_ARGS+=(--onnx-path "${ONNX_PATH}")
fi
if [[ -n "${INPUT_NAME}" ]]; then
UPLOAD_ARGS+=(--input-name "${INPUT_NAME}")
fi
run uv run qc-cli ai-hub upload "${UPLOAD_ARGS[@]}"
if [[ "${DOWNLOAD}" == false ]]; then
exit 0
fi
DOWNLOAD_ARGS=(--config "${CONFIG_PATH}")
if [[ -n "${OUTPUT_PATH}" ]]; then
DOWNLOAD_ARGS+=(--output "${OUTPUT_PATH}")
fi
run uv run qc-cli ai-hub download "${DOWNLOAD_ARGS[@]}"

View File

@@ -126,6 +126,10 @@ def export_onnx(model: nn.Module, model_dir: Path, image_size: int) -> None:
do_constant_folding=True, do_constant_folding=True,
input_names=["input"], input_names=["input"],
output_names=["logits"], output_names=["logits"],
dynamic_axes={
"input": {0: "batch_size"},
"logits": {0: "batch_size"},
},
) )

View File

@@ -5,18 +5,20 @@ build-backend = "hatchling.build"
[project] [project]
name = "qc-cli" name = "qc-cli"
version = "0.1.0" version = "0.1.0"
description = "CLI for training and deploying models for Qualcomm AI Hub" description = "CLI for SageMaker ONNX training and Qualcomm AI Hub optimization"
requires-python = ">=3.13" requires-python = ">=3.13"
dependencies = [ dependencies = [
"aws-cdk-lib>=2.180.0", "aws-cdk-lib>=2.180.0",
"typer==0.25.0", "typer==0.25.0",
"boto3>=1.34,<1.42", "boto3>=1.34,<1.42",
"constructs>=10.0.0", "constructs>=10.0.0",
"mlflow>=3.0",
"numpy>=1.26",
"pydantic>=2.13.3", "pydantic>=2.13.3",
"pyyaml>=6.0.3", "pyyaml>=6.0.3",
"qai-hub>=0.49.0", ]
[project.optional-dependencies]
mlflow = [
"mlflow>=3.0",
"sagemaker-mlflow>=0.4.0", "sagemaker-mlflow>=0.4.0",
] ]
@@ -29,6 +31,7 @@ packages = ["src"]
[dependency-groups] [dependency-groups]
dev = [ dev = [
"boto3-stubs[iam,s3,sagemaker]", "boto3-stubs[iam,s3,sagemaker]",
"pytest>=8.0",
"pyright>=1.1.409", "pyright>=1.1.409",
"types-PyYAML", "types-PyYAML",
"ruff>=0.4", "ruff>=0.4",

View File

@@ -21,24 +21,6 @@ def upload_file(
return f"s3://{bucket}/{s3_key}" return f"s3://{bucket}/{s3_key}"
def download_file(
region: str,
profile: str,
s3_uri: str,
local_path: str,
) -> str:
if not s3_uri.startswith("s3://"):
raise ValueError(f"Expected S3 URI, got: {s3_uri}")
bucket_key = s3_uri.removeprefix("s3://")
bucket, _, key = bucket_key.partition("/")
if not bucket or not key:
raise ValueError(f"Expected S3 URI with bucket and key, got: {s3_uri}")
dest = Path(local_path)
dest.parent.mkdir(parents=True, exist_ok=True)
_client(region, profile).download_file(bucket, key, str(dest))
return str(dest)
def upload_dir( def upload_dir(
region: str, region: str,
profile: str, profile: str,

View File

@@ -121,16 +121,6 @@ def get_training_job_status(session: Boto3SessionKwargs, job_name: str) -> Train
) )
def get_model_artifacts(region: str, profile: str, job_name: str) -> str:
resp = boto3.Session(profile_name=profile, region_name=region).client("sagemaker").describe_training_job(
TrainingJobName=job_name
)
artifact = resp.get("ModelArtifacts", {}).get("S3ModelArtifacts")
if not artifact:
raise RuntimeError(f"Training job '{job_name}' does not have model artifacts yet.")
return str(artifact)
def list_training_jobs( def list_training_jobs(
session: Boto3SessionKwargs, session: Boto3SessionKwargs,
max_results: int = 10, max_results: int = 10,

View File

@@ -1,683 +0,0 @@
import json
from collections.abc import Mapping, Sequence
from dataclasses import asdict, dataclass
from datetime import datetime
from enum import StrEnum
from pathlib import Path
from typing import Any
from uuid import uuid4
import qai_hub.hub as hub
import typer
from qai_hub.client import Device
from src import state as state_ops
from src.commands.utils import CONFIG_OPT, CONSOLE, load_cfg
from src.config import Config
from src.qualcomm import aihub_jobs
from src.qualcomm.artifacts import resolve_onnx
from src.tracking.mlflow import AIHubSourceProvenance, AIHubStepRecord, MlflowTracker, Tracker
app = typer.Typer(help="Quantize, compile, validate, profile, and download models with Qualcomm AI Hub")
_RUNTIME_EXTENSIONS = {
"tflite": "tflite",
"qnn_context_binary": "bin",
"onnx": "onnx",
}
class UploadStep(StrEnum):
quantize = "quantize"
compile = "compile"
validate = "validate"
profile = "profile"
@dataclass(frozen=True)
class AIHubStepResult:
job: Any
job_id: str
model_id: str | None = None
output_dir: Path | None = None
outputs: Mapping[str, Any] | None = None
profile: Mapping[str, Any] | None = None
def _input_specs(cfg: Config) -> dict[str, tuple[tuple[int, ...], str]]:
specs = {name: (tuple(shape), dtype) for name, (shape, dtype) in cfg.aihub.input_specs.items()}
if not specs:
CONSOLE.print("[red]aihub.input_specs must define at least one input.[/red]")
raise typer.Exit(1)
return specs
def _load_inputs(
input_file: Path,
specs: Mapping[str, tuple[Sequence[int], str]],
input_name: str | None = None,
) -> dict[str, Any]:
import numpy as np
if not input_file.exists():
raise FileNotFoundError(f"File not found: {input_file}")
if input_file.suffix == ".npz":
loaded = np.load(input_file)
missing = set(specs) - set(loaded.files)
if missing:
raise ValueError(f"Missing input(s) in NPZ: {', '.join(sorted(missing))}")
return {name: loaded[name] for name in specs}
if input_file.suffix == ".npy":
if input_name is None:
if len(specs) != 1:
raise ValueError("--input-name is required when config has multiple inputs")
input_name = next(iter(specs))
if input_name not in specs:
raise ValueError(f"Input name '{input_name}' is not defined in aihub.input_specs")
return {input_name: np.load(input_file)}
raise ValueError("Input file must be .npz or .npy")
def _load_calibration(path: Path, specs: Mapping[str, tuple[Sequence[int], str]]) -> dict[str, Any]:
import numpy as np
if path.is_file():
return _load_inputs(path, specs)
if not path.is_dir():
raise FileNotFoundError(f"Calibration path not found: {path}")
if len(specs) != 1:
raise ValueError("Directory calibration data is supported only for single-input models.")
input_name = next(iter(specs))
samples = [np.load(p) for p in sorted(path.glob("*.npy"))]
if not samples:
raise ValueError(f"No .npy calibration samples found in {path}")
return {input_name: samples}
def _job_name(cfg: Config, operation: str) -> str | None:
if not cfg.aihub.job_name:
return None
return f"{cfg.aihub.job_name}-{operation}"
def _model_id_or_state(config_path: str, model_id: str | None, *, quantized: bool = False) -> str:
st = state_ops.store(config_path)
resolved = model_id or (st.get_last_quantized_model_id() if quantized else st.get_last_compiled_model_id())
if not resolved:
source = "quantized" if quantized else "compiled"
CONSOLE.print(f"[red]No {source} model found. Pass --model-id or run the previous AI Hub step first.[/red]")
raise typer.Exit(1)
return resolved
def _device_selector(device: Device) -> str:
parts: list[str] = []
if device.name:
parts.append(f"name={device.name!r}")
if device.os:
parts.append(f"os={device.os!r}")
if device.attributes:
parts.append(f"attributes={device.attributes!r}")
return ", ".join(parts) if parts else "empty selector"
def _submission_id() -> str:
return f"{datetime.now().strftime('%Y%m%d-%H%M%S')}-{uuid4().hex[:8]}"
def _tracker(cfg: Config) -> Tracker:
try:
return MlflowTracker.from_config(cfg)
except Exception as e:
CONSOLE.print(f"[red]MLflow setup failed: {e}[/red]")
raise typer.Exit(1)
def _training_parent_run_id(config_path: str, training_job: str | None) -> str | None:
if not training_job:
return None
run_id = state_ops.store(config_path).get_training_job(training_job).get("mlflow_run_id")
return str(run_id) if run_id else None
def _source_to_state(source: AIHubSourceProvenance) -> dict[str, Any]:
return {key: value for key, value in asdict(source).items() if value is not None}
def _source_from_state(value: Mapping[str, Any]) -> AIHubSourceProvenance:
return AIHubSourceProvenance(
kind=str(value.get("kind", "aihub_model")),
parent_run_id=str(value["parent_run_id"]) if value.get("parent_run_id") else None,
uri=str(value["uri"]) if value.get("uri") else None,
path=str(value["path"]) if value.get("path") else None,
aihub_model_id=str(value["aihub_model_id"]) if value.get("aihub_model_id") else None,
training_job=str(value["training_job"]) if value.get("training_job") else None,
)
def _source_for_aihub_model(config_path: str, model_id: str) -> AIHubSourceProvenance:
stored = state_ops.store(config_path).get_aihub_model_provenance(model_id)
if stored:
return _source_from_state(stored)
return AIHubSourceProvenance(kind="aihub_model", aihub_model_id=model_id)
def _source_for_resolved_onnx(
config_path: str,
*,
resolved_path: Path,
model_artifact: str | None,
from_job: str | None,
model_s3_uri: str | None,
onnx_path: str | None,
implicit_training_job: str | None,
implicit_model_artifact: str | None,
) -> AIHubSourceProvenance:
if onnx_path and Path(onnx_path).exists() and not from_job and not model_s3_uri:
return AIHubSourceProvenance(kind="local_onnx", path=str(resolved_path))
training_job = from_job
if not training_job and model_artifact and implicit_model_artifact and model_artifact == implicit_model_artifact:
training_job = implicit_training_job
if not training_job and not model_s3_uri and not onnx_path:
training_job = implicit_training_job
return AIHubSourceProvenance(
kind="sagemaker_model_artifact" if model_artifact else "local_onnx",
parent_run_id=_training_parent_run_id(config_path, training_job),
uri=model_artifact,
path=str(resolved_path) if not model_artifact else None,
training_job=training_job,
)
def _model_id_or_state_with_source(
config_path: str,
model_id: str | None,
*,
quantized: bool = False,
) -> tuple[str, AIHubSourceProvenance]:
resolved_model_id = _model_id_or_state(config_path, model_id, quantized=quantized)
return resolved_model_id, _source_for_aihub_model(config_path, resolved_model_id)
def _record_step(
cfg: Config,
tracker: Tracker,
*,
result: AIHubStepResult,
source: AIHubSourceProvenance,
step: str,
submission_id: str,
command: str,
options: str | None = None,
) -> None:
tracker.record_aihub_step(
AIHubStepRecord(
step=step,
submission_id=submission_id,
command=command,
source=source,
job=result.job,
job_id=result.job_id,
model_id=result.model_id,
target_runtime=cfg.aihub.target_runtime,
device=_device_selector(cfg.aihub.device),
options=options,
output_dir=result.output_dir,
outputs=result.outputs,
profile=result.profile,
)
)
def _validate_device(cfg: Config) -> None:
device = cfg.aihub.device
try:
matches = hub.get_devices(name=device.name, os=device.os, attributes=device.attributes)
except Exception as e:
CONSOLE.print(f"[red]Unable to validate AI Hub device {_device_selector(device)}: {e}[/red]")
raise typer.Exit(1)
if matches:
return
CONSOLE.print(f"[red]AI Hub device not found: {_device_selector(device)}[/red]")
CONSOLE.print("Run [bold]qai-hub list-devices[/bold] to see valid device names.")
raise typer.Exit(1)
def _quantize_step(
cfg: Config,
config_path: str,
calibration_path: Path,
from_job: str | None,
model_s3_uri: str | None,
onnx_path: str | None,
tracker: Tracker,
submission_id: str,
) -> AIHubStepResult:
st = state_ops.store(config_path)
specs = _input_specs(cfg)
implicit_training_job = st.get_last_training_job()
implicit_model_artifact = st.get_last_model_artifact()
try:
resolved = resolve_onnx(
cfg=cfg,
output_dir=cfg.aihub.output_dir,
from_job=from_job,
model_s3_uri=model_s3_uri or implicit_model_artifact,
onnx_path=onnx_path,
last_training_job=implicit_training_job,
)
calibration_data = _load_calibration(calibration_path, specs)
except (FileNotFoundError, ValueError) as e:
CONSOLE.print(f"[red]{e}[/red]")
raise typer.Exit(1)
source = _source_for_resolved_onnx(
config_path,
resolved_path=resolved.onnx_path,
model_artifact=resolved.model_artifact,
from_job=from_job,
model_s3_uri=model_s3_uri,
onnx_path=onnx_path,
implicit_training_job=implicit_training_job,
implicit_model_artifact=implicit_model_artifact,
)
try:
result = aihub_jobs.submit_quantize_job(
resolved.onnx_path,
calibration_data,
cfg.aihub.quantize_options,
job_name=_job_name(cfg, "quantize"),
model_name=cfg.aihub.model_name,
)
except Exception as e:
CONSOLE.print(f"[red]AI Hub quantize failed: {e}[/red]")
raise typer.Exit(1)
st.update(
last_model_artifact=resolved.model_artifact,
last_quantize_job_id=result["job_id"],
last_quantized_model_id=result["model_id"],
)
st.update_aihub_model_provenance(str(result["model_id"]), _source_to_state(source))
step_result = AIHubStepResult(
job=result["job"],
job_id=str(result["job_id"]),
model_id=str(result["model_id"]),
)
_record_step(
cfg,
tracker,
result=step_result,
source=source,
step="quantize",
submission_id=submission_id,
command="ai-hub quantize",
options=cfg.aihub.quantize_options,
)
CONSOLE.print(f"[green]✓[/green] Quantize job: [bold]{result['job_id']}[/bold]")
CONSOLE.print(f"[green]✓[/green] Quantized model: [bold]{result['model_id']}[/bold]")
return step_result
def _compile_step(
cfg: Config,
config_path: str,
model_id: str | None,
from_job: str | None,
model_s3_uri: str | None,
onnx_path: str | None,
*,
prefer_quantized: bool,
tracker: Tracker,
submission_id: str,
) -> AIHubStepResult:
st = state_ops.store(config_path)
_validate_device(cfg)
specs = _input_specs(cfg)
model: Any
model_artifact: str | None = None
source: AIHubSourceProvenance
has_explicit_source = bool(from_job or model_s3_uri or onnx_path)
if model_id:
model = model_id
source = _source_for_aihub_model(config_path, model_id)
elif prefer_quantized and not has_explicit_source and st.get_last_quantized_model_id():
model = st.get_last_quantized_model_id()
source = _source_for_aihub_model(config_path, str(model))
else:
implicit_training_job = st.get_last_training_job()
try:
resolved = resolve_onnx(
cfg=cfg,
output_dir=cfg.aihub.output_dir,
from_job=from_job,
model_s3_uri=model_s3_uri,
onnx_path=onnx_path,
last_training_job=implicit_training_job,
)
except (FileNotFoundError, ValueError) as e:
CONSOLE.print(f"[red]{e}[/red]")
raise typer.Exit(1)
model = resolved.onnx_path
model_artifact = resolved.model_artifact
source = _source_for_resolved_onnx(
config_path,
resolved_path=resolved.onnx_path,
model_artifact=resolved.model_artifact,
from_job=from_job,
model_s3_uri=model_s3_uri,
onnx_path=onnx_path,
implicit_training_job=implicit_training_job,
implicit_model_artifact=st.get_last_model_artifact(),
)
try:
result = aihub_jobs.submit_compile_job(
model=model,
device=cfg.aihub.device,
input_specs=specs,
target_runtime=cfg.aihub.target_runtime,
options=cfg.aihub.compile_options,
job_name=_job_name(cfg, "compile"),
model_name=cfg.aihub.model_name if isinstance(model, Path) else None,
)
except Exception as e:
CONSOLE.print(f"[red]AI Hub compile failed: {e}[/red]")
raise typer.Exit(1)
updates: dict[str, Any] = {
"last_compile_job_id": result["job_id"],
"last_compiled_model_id": result["model_id"],
}
if model_artifact:
updates["last_model_artifact"] = model_artifact
st.update(**updates)
st.update_aihub_model_provenance(str(result["model_id"]), _source_to_state(source))
step_result = AIHubStepResult(
job=result["job"],
job_id=str(result["job_id"]),
model_id=str(result["model_id"]),
)
_record_step(
cfg,
tracker,
result=step_result,
source=source,
step="compile",
submission_id=submission_id,
command="ai-hub compile",
options=cfg.aihub.compile_options,
)
CONSOLE.print(f"[green]✓[/green] Compile job: [bold]{result['job_id']}[/bold]")
CONSOLE.print(f"[green]✓[/green] Compiled model: [bold]{result['model_id']}[/bold]")
return step_result
def _validate_step(
cfg: Config,
config_path: str,
input_file: Path,
model_id: str | None,
input_name: str | None,
tracker: Tracker,
submission_id: str,
) -> AIHubStepResult:
_validate_device(cfg)
specs = _input_specs(cfg)
resolved_model_id, source = _model_id_or_state_with_source(config_path, model_id)
try:
inputs = _load_inputs(input_file, specs, input_name)
except (FileNotFoundError, ValueError) as e:
CONSOLE.print(f"[red]{e}[/red]")
raise typer.Exit(1)
run = datetime.now().strftime("%Y%m%d-%H%M%S")
out_dir = Path(cfg.aihub.output_dir) / run / "validation"
try:
result = aihub_jobs.submit_inference_job(
resolved_model_id,
cfg.aihub.device,
inputs,
out_dir,
job_name=_job_name(cfg, "validate"),
)
except Exception as e:
CONSOLE.print(f"[red]AI Hub inference failed: {e}[/red]")
raise typer.Exit(1)
state_ops.store(config_path).update(last_inference_job_id=result["job_id"])
outputs = result.get("outputs")
step_result = AIHubStepResult(
job=result["job"],
job_id=str(result["job_id"]),
model_id=resolved_model_id,
output_dir=out_dir,
outputs=outputs if isinstance(outputs, Mapping) else None,
)
_record_step(
cfg,
tracker,
result=step_result,
source=source,
step="validate",
submission_id=submission_id,
command="ai-hub validate",
)
CONSOLE.print(f"[green]✓[/green] Inference job: [bold]{result['job_id']}[/bold]")
if isinstance(outputs, dict):
for name, value in outputs.items():
CONSOLE.print(f" {name}: shape={getattr(value, 'shape', '?')}")
CONSOLE.print(f"Outputs: [cyan]{out_dir}[/cyan]")
return step_result
def _profile_step(
cfg: Config,
config_path: str,
model_id: str | None,
tracker: Tracker,
submission_id: str,
) -> AIHubStepResult:
_validate_device(cfg)
resolved_model_id, source = _model_id_or_state_with_source(config_path, model_id)
try:
result = aihub_jobs.submit_profile_job(
resolved_model_id,
cfg.aihub.device,
cfg.aihub.profile_options,
job_name=_job_name(cfg, "profile"),
)
except Exception as e:
CONSOLE.print(f"[red]AI Hub profile failed: {e}[/red]")
raise typer.Exit(1)
run = datetime.now().strftime("%Y%m%d-%H%M%S")
out_dir = Path(cfg.aihub.output_dir) / run / "profile"
try:
out_dir.mkdir(parents=True, exist_ok=True)
profile_data = result["job"].download_profile()
if isinstance(profile_data, Mapping):
(out_dir / "profile.json").write_text(json.dumps(profile_data, indent=2), encoding="utf-8")
else:
profile_data = {}
except Exception as e:
CONSOLE.print(f"[red]AI Hub profile download failed: {e}[/red]")
raise typer.Exit(1)
state_ops.store(config_path).update(last_profile_job_id=result["job_id"])
step_result = AIHubStepResult(
job=result["job"],
job_id=str(result["job_id"]),
model_id=resolved_model_id,
output_dir=out_dir,
profile=profile_data,
)
_record_step(
cfg,
tracker,
result=step_result,
source=source,
step="profile",
submission_id=submission_id,
command="ai-hub profile",
options=cfg.aihub.profile_options,
)
CONSOLE.print(f"[green]✓[/green] Profile job: [bold]{result['job_id']}[/bold]")
CONSOLE.print(f"Profile: [cyan]{out_dir}[/cyan]")
return step_result
@app.command()
def quantize(
calibration_path: Path = typer.Argument(..., help="Calibration .npz file or directory of .npy samples"),
from_job: str | None = typer.Option(None, "--from-job", help="Training job name whose model artifact should quantize"),
model_s3_uri: str | None = typer.Option(None, "--model-s3-uri", help="S3 URI of model.tar.gz to quantize"),
onnx_path: str | None = typer.Option(
None, "--onnx-path", help="Local ONNX path or ONNX path inside extracted artifact"
),
config: str = CONFIG_OPT,
) -> None:
"""Quantize an ONNX model to INT8."""
cfg = load_cfg(config)
_quantize_step(
cfg,
config,
calibration_path,
from_job,
model_s3_uri,
onnx_path,
_tracker(cfg),
_submission_id(),
)
@app.command()
def compile(
model_id: str | None = typer.Option(None, "--model-id", help="AI Hub model ID to compile"),
from_job: str | None = typer.Option(None, "--from-job", help="Training job name whose model artifact should compile"),
model_s3_uri: str | None = typer.Option(None, "--model-s3-uri", help="S3 URI of model.tar.gz to compile"),
onnx_path: str | None = typer.Option(
None, "--onnx-path", help="Local ONNX path or ONNX path inside extracted artifact"
),
config: str = CONFIG_OPT,
) -> None:
"""Compile a model for the configured Qualcomm AI Hub target."""
cfg = load_cfg(config)
_compile_step(
cfg,
config,
model_id,
from_job,
model_s3_uri,
onnx_path,
prefer_quantized=True,
tracker=_tracker(cfg),
submission_id=_submission_id(),
)
@app.command()
def validate(
input_file: Path = typer.Argument(..., help="NumPy .npz or .npy inputs to run on device"),
model_id: str | None = typer.Option(None, "--model-id", help="AI Hub compiled model ID"),
input_name: str | None = typer.Option(None, "--input-name", help="Input name for .npy files"),
config: str = CONFIG_OPT,
) -> None:
"""Run an AI Hub inference job using sample inputs."""
cfg = load_cfg(config)
_validate_step(cfg, config, input_file, model_id, input_name, _tracker(cfg), _submission_id())
@app.command()
def profile(
model_id: str | None = typer.Option(None, "--model-id", help="AI Hub compiled model ID"),
config: str = CONFIG_OPT,
) -> None:
"""Profile a compiled model on the configured AI Hub device."""
cfg = load_cfg(config)
_profile_step(cfg, config, model_id, _tracker(cfg), _submission_id())
@app.command()
def upload(
calibration_path: Path = typer.Argument(..., help="Calibration .npz file or directory of .npy samples"),
input_file: Path = typer.Argument(..., help="Validation .npz or .npy inputs to run on device"),
from_step: UploadStep = typer.Option(UploadStep.quantize, "--from-step", help="Resume from this Workbench step"),
from_job: str | None = typer.Option(None, "--from-job", help="Training job name whose model artifact should upload"),
model_s3_uri: str | None = typer.Option(None, "--model-s3-uri", help="S3 URI of model.tar.gz to upload"),
onnx_path: str | None = typer.Option(
None, "--onnx-path", help="Local ONNX path or ONNX path inside extracted artifact"
),
input_name: str | None = typer.Option(None, "--input-name", help="Input name for .npy validation files"),
config: str = CONFIG_OPT,
) -> None:
"""Run the four Workbench upload steps: quantize, compile, validate, and profile."""
cfg = load_cfg(config)
steps = [UploadStep.quantize, UploadStep.compile, UploadStep.validate, UploadStep.profile]
selected = steps[steps.index(from_step) :]
tracker = _tracker(cfg)
submission_id = _submission_id()
quantized_model_id: str | None = None
compiled_model_id: str | None = None
if UploadStep.quantize in selected:
quantized = _quantize_step(
cfg,
config,
calibration_path,
from_job,
model_s3_uri,
onnx_path,
tracker,
submission_id,
)
quantized_model_id = quantized.model_id
if UploadStep.compile in selected:
compiled = _compile_step(
cfg,
config,
model_id=quantized_model_id,
from_job=from_job,
model_s3_uri=model_s3_uri,
onnx_path=onnx_path,
prefer_quantized=True,
tracker=tracker,
submission_id=submission_id,
)
compiled_model_id = compiled.model_id
if UploadStep.validate in selected:
_validate_step(cfg, config, input_file, compiled_model_id, input_name, tracker, submission_id)
if UploadStep.profile in selected:
_profile_step(cfg, config, compiled_model_id, tracker, submission_id)
@app.command()
def download(
model_id: str | None = typer.Option(None, "--model-id", help="AI Hub compiled model ID"),
output: Path | None = typer.Option(None, "--output", "-o", help="Destination file path"),
config: str = CONFIG_OPT,
) -> None:
"""Download the last compiled deployable artifact from AI Hub."""
cfg = load_cfg(config)
resolved_model_id = _model_id_or_state(config, model_id)
ext = _RUNTIME_EXTENSIONS.get(cfg.aihub.target_runtime, cfg.aihub.target_runtime)
dest = output or (Path(cfg.aihub.output_dir) / f"model.{ext}")
try:
written = aihub_jobs.download_model(resolved_model_id, dest)
except Exception as e:
CONSOLE.print(f"[red]AI Hub download failed: {e}[/red]")
raise typer.Exit(1)
state_ops.store(config).update(last_downloaded_model=written)
CONSOLE.print(f"[green]✓[/green] Downloaded model: [cyan]{written}[/cyan]")

View File

@@ -150,6 +150,32 @@ def status(config: str = CONFIG_OPT) -> None:
CONSOLE.print(table) CONSOLE.print(table)
@app.command(name="mlflow-url")
def mlflow_url(config: str = CONFIG_OPT) -> None:
"""Print a presigned URL for the configured MLflow tracking server."""
cfg = load_cfg(config)
tracking_server_name = _mlflow_tracking_server_name(cfg)
try:
url = mlflow.create_presigned_tracking_server_url(
cfg.aws.region,
cfg.aws.profile,
tracking_server_name,
)
except Exception as e:
CONSOLE.print("[yellow]Could not create a SageMaker MLflow UI URL.[/yellow]")
CONSOLE.print(f"Tracking server: [cyan]{tracking_server_name}[/cyan]")
CONSOLE.print(f"Reason: {e}")
CONSOLE.print(
"This command can create presigned URLs only for MLflow tracking servers managed by "
"Amazon SageMaker. If this is an external MLflow server, open it with that server's own URL."
)
raise typer.Exit(1)
CONSOLE.print(f"MLflow tracking server: [cyan]{tracking_server_name}[/cyan]")
CONSOLE.print(f"MLflow UI: {url}")
@app.command() @app.command()
def destroy( def destroy(
config: str = CONFIG_OPT, config: str = CONFIG_OPT,
@@ -211,6 +237,14 @@ def _role_name(configured_name: str, role_arn: str) -> str:
return "-" return "-"
def _mlflow_tracking_server_name(cfg: Config) -> str:
name = cfg.effective_mlflow_tracking_server_name
if not name:
CONSOLE.print("[red]MLflow is disabled in config.yaml.[/red]")
raise typer.Exit(1)
return name
def _destroy_account_id(config_path: str, cfg: Config) -> str: def _destroy_account_id(config_path: str, cfg: Config) -> str:
config_dir = str(Path(config_path).parent) config_dir = str(Path(config_path).parent)
state = read_infra_state(config_dir) state = read_infra_state(config_dir)

View File

@@ -1,40 +0,0 @@
import secrets
from pathlib import Path
import typer
import yaml
from src.commands.utils import CONSOLE
from src.config import GENERATED_STACK_PREFIX, Config, InfraConfig, S3Config
app = typer.Typer()
@app.command()
def init(
output: str = typer.Option("config.yaml", help="Destination path for the config file"),
force: bool = typer.Option(False, "--force", "-f", help="Overwrite an existing config file"),
) -> None:
"""Write a starter config.yaml to the current directory."""
dest = Path(output)
if dest.exists() and not force:
CONSOLE.print(f"[yellow]{dest} already exists.[/yellow] Use --force to overwrite.")
raise typer.Exit(1)
config = _new_isolated_config()
dest.parent.mkdir(parents=True, exist_ok=True)
config_data = config.model_dump(mode="json")
config_data["sagemaker"].pop("role_name", None)
with open(dest, "w") as f:
yaml.safe_dump(config_data, f, sort_keys=False)
CONSOLE.print(f"[green]✓[/green] Config written to [bold]{dest}[/bold]")
CONSOLE.print("Edit [cyan]sagemaker.training.image_uri[/cyan] before running training commands.")
def _new_isolated_config() -> Config:
suffix = secrets.token_hex(6)
namespace = f"{GENERATED_STACK_PREFIX}{suffix}"
config = Config(infra=InfraConfig(stack_name=namespace))
config.s3 = S3Config(bucket=f"{namespace}-data")
return config

View File

@@ -1,41 +0,0 @@
import webbrowser
import typer
from src.aws import mlflow as aws_mlflow
from src.commands.utils import CONFIG_OPT, CONSOLE, load_cfg
app = typer.Typer(help="Manage MLflow tracking server access")
@app.command(name="open")
def open_mlflow(config: str = CONFIG_OPT) -> None:
"""Open a presigned URL for the configured MLflow tracking server."""
cfg = load_cfg(config)
tracking_server_name = cfg.effective_mlflow_tracking_server_name
if not tracking_server_name:
CONSOLE.print("[red]MLflow is disabled in config.yaml.[/red]")
raise typer.Exit(1)
try:
url = aws_mlflow.create_presigned_tracking_server_url(
cfg.aws.region,
cfg.aws.profile,
tracking_server_name,
)
except Exception as e:
CONSOLE.print("[yellow]Could not create a SageMaker MLflow UI URL.[/yellow]")
CONSOLE.print(f"Tracking server: [cyan]{tracking_server_name}[/cyan]")
CONSOLE.print(f"Reason: {e}")
CONSOLE.print(
"This command can create presigned URLs only for MLflow tracking servers managed by "
"Amazon SageMaker. If this is an external MLflow server, open it with that server's own URL."
)
raise typer.Exit(1)
CONSOLE.print(f"MLflow tracking server: [cyan]{tracking_server_name}[/cyan]")
CONSOLE.print(f"MLflow UI: {url}")
if webbrowser.open(url):
CONSOLE.print("[green]✓[/green] Opened MLflow UI in your browser.")
else:
CONSOLE.print("[yellow]Could not open a browser automatically. Open the URL above manually.[/yellow]")

View File

@@ -101,7 +101,7 @@ def start(config: str = CONFIG_OPT) -> None:
CONSOLE.print(f"[green]✓[/green] Job submitted: [bold]{job_name}[/bold]") CONSOLE.print(f"[green]✓[/green] Job submitted: [bold]{job_name}[/bold]")
if run_id: if run_id:
CONSOLE.print(f"MLflow run: [cyan]{run_id}[/cyan]") CONSOLE.print(f"MLflow run: [cyan]{run_id}[/cyan]")
CONSOLE.print("Open MLflow: [cyan]qc-cli mlflow open[/cyan]") CONSOLE.print("Open MLflow: [cyan]qc-cli infra mlflow-url[/cyan]")
CONSOLE.print("Track progress: [cyan]qc-cli train status[/cyan]") CONSOLE.print("Track progress: [cyan]qc-cli train status[/cyan]")
@@ -148,10 +148,10 @@ def status(
updates["registered_model_version"] = version updates["registered_model_version"] = version
st.update_training_job(job_name, **updates) st.update_training_job(job_name, **updates)
if version: if version:
st.set_latest_experiment_model_version(version) st.set_latest_prerelease_model_version(version)
CONSOLE.print(f"MLflow model version: [cyan]{version}[/cyan] ([cyan]experiment-latest[/cyan])") CONSOLE.print(f"MLflow model version: [cyan]{version}[/cyan] ([cyan]prerelease-latest[/cyan])")
if run_id and cfg.mlflow.mode is not MlflowMode.disabled: if run_id and cfg.mlflow.mode is not MlflowMode.disabled:
CONSOLE.print("Open MLflow: [cyan]qc-cli mlflow open[/cyan]") CONSOLE.print("Open MLflow: [cyan]qc-cli infra mlflow-url[/cyan]")
@app.command(name="list") @app.command(name="list")

View File

@@ -1,70 +0,0 @@
from pathlib import Path
import typer
from rich.progress import BarColumn, Progress, SpinnerColumn, TaskProgressColumn, TextColumn
from src.aws import s3 as s3_ops
from src.commands.utils import CONFIG_OPT, CONSOLE, load_cfg
app = typer.Typer()
@app.command()
def upload(
path: Path = typer.Argument(..., help="Local file or directory to upload"),
s3_key: str | None = typer.Option(None, "--s3-key", help="S3 key for file uploads"),
config: str = CONFIG_OPT,
) -> None:
"""Upload a local file or directory to S3."""
cfg = load_cfg(config)
if path.is_file():
key = s3_key or f"{cfg.s3.data_prefix.rstrip('/')}/{path.name}"
try:
with CONSOLE.status(f"Uploading {path.name}..."):
uri = s3_ops.upload_file(cfg.aws.region, cfg.aws.profile, cfg.s3.bucket, str(path), key)
except Exception as e:
CONSOLE.print(f"[red]Upload failed: {e}[/red]")
raise typer.Exit(1)
CONSOLE.print(f"[green]✓[/green] {path.name} -> {uri}")
return
if path.is_dir():
if s3_key is not None:
CONSOLE.print("[red]--s3-key can only be used when uploading a single file.[/red]")
raise typer.Exit(1)
files = [file for file in path.rglob("*") if file.is_file()]
if not files:
CONSOLE.print("[yellow]No files found in directory.[/yellow]")
raise typer.Exit(0)
prefix = cfg.s3.data_prefix
CONSOLE.print(f"Uploading {len(files)} files to s3://{cfg.s3.bucket}/{prefix.rstrip('/')}/")
try:
with Progress(
SpinnerColumn(),
TextColumn("[progress.description]{task.description}"),
BarColumn(),
TaskProgressColumn(),
console=CONSOLE,
) as progress:
task = progress.add_task("Uploading...", total=len(files))
count = s3_ops.upload_dir(
cfg.aws.region,
cfg.aws.profile,
cfg.s3.bucket,
str(path),
prefix,
on_progress=lambda: progress.advance(task),
)
except Exception as e:
CONSOLE.print(f"[red]Upload failed: {e}[/red]")
raise typer.Exit(1)
CONSOLE.print(f"[green]✓[/green] Uploaded {count} files to s3://{cfg.s3.bucket}/{prefix.rstrip('/')}/")
return
CONSOLE.print(f"[red]Path not found: {path}[/red]")
raise typer.Exit(1)

View File

@@ -4,8 +4,7 @@ from typing import Any, Literal, TypedDict
from mypy_boto3_s3.literals import BucketLocationConstraintType from mypy_boto3_s3.literals import BucketLocationConstraintType
from mypy_boto3_sagemaker.literals import TrainingInstanceTypeType from mypy_boto3_sagemaker.literals import TrainingInstanceTypeType
from pydantic import BaseModel, Field, field_validator, model_validator from pydantic import BaseModel, Field, model_validator
from qai_hub.client import Device
class MlflowMode(StrEnum): class MlflowMode(StrEnum):
@@ -81,25 +80,6 @@ class SageMakerConfig(BaseModel):
training: TrainingConfig = Field(default_factory=TrainingConfig) training: TrainingConfig = Field(default_factory=TrainingConfig)
class AIHubConfig(BaseModel):
device: Device = Field(default_factory=lambda: Device("Samsung Galaxy S25 (Family)"))
target_runtime: str = "tflite"
input_specs: dict[str, tuple[list[int], str]] = Field(default_factory=dict)
job_name: str | None = None
model_name: str | None = None
compile_options: str | None = None
profile_options: str | None = None
quantize_options: str | None = None
output_dir: str = "build/qai-hub"
@field_validator("device", mode="before")
@classmethod
def parse_device(cls, value: Any) -> Any:
if isinstance(value, str):
return Device(value)
return value
class MlflowConfig(BaseModel): class MlflowConfig(BaseModel):
mode: MlflowMode = MlflowMode.disabled mode: MlflowMode = MlflowMode.disabled
tracking_server_name: str | None = None tracking_server_name: str | None = None
@@ -124,7 +104,6 @@ class Config(BaseModel):
aws: AwsConfig = Field(default_factory=AwsConfig) aws: AwsConfig = Field(default_factory=AwsConfig)
s3: S3Config = Field(default_factory=S3Config) s3: S3Config = Field(default_factory=S3Config)
sagemaker: SageMakerConfig = Field(default_factory=SageMakerConfig) sagemaker: SageMakerConfig = Field(default_factory=SageMakerConfig)
aihub: AIHubConfig = Field(default_factory=AIHubConfig)
mlflow: MlflowConfig = Field(default_factory=MlflowConfig) mlflow: MlflowConfig = Field(default_factory=MlflowConfig)
@property @property

View File

@@ -1,14 +1,114 @@
import typer import secrets
from pathlib import Path
from src.commands import ai_hub, infra, init, mlflow, train, upload import typer
import yaml
from rich.console import Console
from rich.progress import BarColumn, Progress, SpinnerColumn, TaskProgressColumn, TextColumn
from src.aws import s3 as s3_ops
from src.commands import infra, train
from src.commands.utils import CONFIG_OPT, load_cfg
from src.config import GENERATED_STACK_PREFIX, Config, InfraConfig, S3Config
app = typer.Typer( app = typer.Typer(
help="qc-cli: End-to-end model managment for Qualcomm AI Hub.", help="qc-cli: End-to-end model managment for Qualcomm AI Hub.",
no_args_is_help=True, no_args_is_help=True,
) )
app.add_typer(init.app)
app.add_typer(upload.app)
app.add_typer(mlflow.app, name="mlflow")
app.add_typer(infra.app, name="infra") app.add_typer(infra.app, name="infra")
app.add_typer(train.app, name="train") app.add_typer(train.app, name="train")
app.add_typer(ai_hub.app, name="ai-hub")
console = Console()
@app.command()
def init(
output: str = typer.Option("config.yaml", help="Destination path for the config file"),
force: bool = typer.Option(False, "--force", "-f", help="Overwrite an existing config file"),
) -> None:
"""Write a starter config.yaml to the current directory."""
dest = Path(output)
if dest.exists() and not force:
console.print(f"[yellow]{dest} already exists.[/yellow] Use --force to overwrite.")
raise typer.Exit(1)
config = _new_isolated_config()
dest.parent.mkdir(parents=True, exist_ok=True)
config_data = config.model_dump(mode="json")
config_data["sagemaker"].pop("role_name", None)
with open(dest, "w") as f:
yaml.safe_dump(config_data, f, sort_keys=False)
console.print(f"[green]✓[/green] Config written to [bold]{dest}[/bold]")
console.print(
"Edit [cyan]sagemaker.training.image_uri[/cyan] before running training commands."
)
def _new_isolated_config() -> Config:
suffix = secrets.token_hex(6)
namespace = f"{GENERATED_STACK_PREFIX}{suffix}"
config = Config(infra=InfraConfig(stack_name=namespace))
config.s3 = S3Config(bucket=f"{namespace}-data")
return config
@app.command()
def upload(
path: Path = typer.Argument(..., help="Local file or directory to upload"),
s3_key: str | None = typer.Option(None, "--s3-key", help="S3 key for file uploads"),
config: str = CONFIG_OPT,
) -> None:
"""Upload a local file or directory to S3."""
cfg = load_cfg(config)
if path.is_file():
key = s3_key or f"{cfg.s3.data_prefix.rstrip('/')}/{path.name}"
try:
with console.status(f"Uploading {path.name}..."):
uri = s3_ops.upload_file(cfg.aws.region, cfg.aws.profile, cfg.s3.bucket, str(path), key)
except Exception as e:
console.print(f"[red]Upload failed: {e}[/red]")
raise typer.Exit(1)
console.print(f"[green]✓[/green] {path.name} -> {uri}")
return
if path.is_dir():
if s3_key is not None:
console.print("[red]--s3-key can only be used when uploading a single file.[/red]")
raise typer.Exit(1)
files = [file for file in path.rglob("*") if file.is_file()]
if not files:
console.print("[yellow]No files found in directory.[/yellow]")
raise typer.Exit(0)
prefix = cfg.s3.data_prefix
console.print(f"Uploading {len(files)} files to s3://{cfg.s3.bucket}/{prefix.rstrip('/')}/")
try:
with Progress(
SpinnerColumn(),
TextColumn("[progress.description]{task.description}"),
BarColumn(),
TaskProgressColumn(),
console=console,
) as progress:
task = progress.add_task("Uploading...", total=len(files))
count = s3_ops.upload_dir(
cfg.aws.region,
cfg.aws.profile,
cfg.s3.bucket,
str(path),
prefix,
on_progress=lambda: progress.advance(task),
)
except Exception as e:
console.print(f"[red]Upload failed: {e}[/red]")
raise typer.Exit(1)
console.print(f"[green]✓[/green] Uploaded {count} files to s3://{cfg.s3.bucket}/{prefix.rstrip('/')}/")
return
console.print(f"[red]Path not found: {path}[/red]")
raise typer.Exit(1)

View File

@@ -1 +0,0 @@

View File

@@ -1,129 +0,0 @@
from pathlib import Path
from typing import Any, TypedDict
import qai_hub.hub as hub
from qai_hub.client import CompileJob, Device, InferenceJob, Model, ProfileJob, QuantizeDtype, QuantizeJob
class ModelJobResult(TypedDict):
job: CompileJob | QuantizeJob
job_id: str
model: Model
model_id: str
class InferenceJobResult(TypedDict):
job: InferenceJob
job_id: str
outputs: Any
class ProfileJobResult(TypedDict):
job: ProfileJob
job_id: str
def _dataset_entries(inputs: dict[str, Any]) -> dict[str, list[Any]]:
return {name: value if isinstance(value, list) else [value] for name, value in inputs.items()}
def submit_compile_job(
model: Any,
device: Device,
input_specs: dict[str, tuple[tuple[int, ...], str]],
target_runtime: str,
options: str | None = None,
job_name: str | None = None,
model_name: str | None = None,
) -> ModelJobResult:
compile_options = f"--target_runtime {target_runtime}"
if options:
compile_options = f"{compile_options} {options}"
model_arg = model
if isinstance(model, Path):
model_arg = str(model)
elif isinstance(model, str):
candidate = Path(model)
model_arg = model if candidate.exists() or candidate.suffix else hub.get_model(model)
if model_name and isinstance(model_arg, str) and Path(model_arg).exists():
model_arg = hub.upload_model(model_arg, name=model_name)
job = hub.submit_compile_job(
model=model_arg,
device=device,
name=job_name,
input_specs=input_specs,
options=compile_options,
)
target_model = job.get_target_model()
if target_model is None:
raise RuntimeError(f"Compile job {job.job_id} did not produce a target model.")
return {"job": job, "job_id": str(job.job_id), "model": target_model, "model_id": str(target_model.model_id)}
def submit_inference_job(
model_id: str,
device: Device,
inputs: dict[str, Any],
output_dir: str | Path,
job_name: str | None = None,
) -> InferenceJobResult:
job = hub.submit_inference_job(
model=hub.get_model(model_id),
device=device,
inputs=_dataset_entries(inputs),
name=job_name,
)
out = Path(output_dir)
out.mkdir(parents=True, exist_ok=True)
data = job.download_output_data(str(out))
return {"job": job, "job_id": str(job.job_id), "outputs": data}
def submit_profile_job(
model_id: str,
device: Device,
options: str | None = None,
job_name: str | None = None,
) -> ProfileJobResult:
job = hub.submit_profile_job(
model=hub.get_model(model_id),
device=device,
name=job_name,
options=options or "",
)
return {"job": job, "job_id": str(job.job_id)}
def submit_quantize_job(
model: str | Path,
calibration_data: dict[str, Any],
options: str | None = None,
job_name: str | None = None,
model_name: str | None = None,
) -> ModelJobResult:
model_arg = str(model)
if model_name and Path(model_arg).exists():
model_arg = hub.upload_model(model_arg, name=model_name)
job = hub.submit_quantize_job(
model=model_arg,
calibration_data=_dataset_entries(calibration_data),
weights_dtype=QuantizeDtype.INT8,
activations_dtype=QuantizeDtype.INT8,
name=job_name,
options=options or "",
)
target_model = job.get_target_model()
if target_model is None:
raise RuntimeError(f"Quantize job {job.job_id} did not produce a target model.")
return {"job": job, "job_id": str(job.job_id), "model": target_model, "model_id": str(target_model.model_id)}
def download_model(model_id: str, output_path: str | Path) -> str:
dest = Path(output_path)
dest.parent.mkdir(parents=True, exist_ok=True)
model = hub.get_model(model_id)
result = model.download(str(dest))
return str(result or dest)

View File

@@ -1,83 +0,0 @@
import tarfile
from dataclasses import dataclass
from pathlib import Path
from src.aws import s3 as s3_ops
from src.aws import sagemaker as sm_ops
from src.config import Config
@dataclass(frozen=True)
class ResolvedOnnx:
onnx_path: Path
model_artifact: str | None
run_name: str
def _safe_extract(tar: tarfile.TarFile, dest: Path) -> None:
dest_root = dest.resolve()
for member in tar.getmembers():
target = (dest / member.name).resolve()
if dest_root != target and dest_root not in target.parents:
raise ValueError(f"Unsafe tar member path: {member.name}")
tar.extractall(dest, filter="data")
def _find_onnx(root: Path, explicit: str | None = None) -> Path:
if explicit:
p = Path(explicit)
if not p.is_absolute():
p = root / p
if not p.exists():
raise FileNotFoundError(f"ONNX file not found: {p}")
return p
matches = sorted(root.rglob("model.onnx"))
if not matches:
matches = sorted(root.rglob("*.onnx"))
if not matches:
raise FileNotFoundError(f"No ONNX file found under {root}")
if len(matches) > 1:
joined = ", ".join(str(p.relative_to(root)) for p in matches)
raise ValueError(f"Multiple ONNX files found ({joined}). Pass --onnx-path.")
return matches[0]
def resolve_onnx(
cfg: Config,
output_dir: str,
from_job: str | None = None,
model_s3_uri: str | None = None,
onnx_path: str | None = None,
last_training_job: str | None = None,
) -> ResolvedOnnx:
if onnx_path:
path = Path(onnx_path)
if path.exists():
return ResolvedOnnx(onnx_path=path, model_artifact=None, run_name=path.stem)
job = from_job or last_training_job
artifact = model_s3_uri
if not artifact:
if not job:
raise ValueError("No model source found. Pass --onnx-path, --model-s3-uri, --from-job, or run training first.")
artifact = sm_ops.get_model_artifacts(cfg.aws.region, cfg.aws.profile, job)
run_name = job or Path(artifact).name.removesuffix(".tar.gz").replace("/", "-")
root = Path(output_dir) / run_name / "source"
tar_path = root / "model.tar.gz"
s3_ops.download_file(cfg.aws.region, cfg.aws.profile, artifact, str(tar_path))
extract_dir = root / "extracted"
extract_dir.mkdir(parents=True, exist_ok=True)
try:
with tarfile.open(tar_path, "r:gz") as tar:
_safe_extract(tar, extract_dir)
except tarfile.TarError as e:
raise ValueError(f"Invalid model tarball: {tar_path}") from e
return ResolvedOnnx(
onnx_path=_find_onnx(extract_dir, onnx_path),
model_artifact=artifact,
run_name=run_name,
)

View File

@@ -33,22 +33,6 @@ class CliStateStore:
value = self.get("last_training_job") value = self.get("last_training_job")
return str(value) if value else None return str(value) if value else None
def get_last_model_artifact(self) -> str | None:
value = self.get("last_model_artifact")
return str(value) if value else None
def get_last_quantized_model_id(self) -> str | None:
value = self.get("last_quantized_model_id")
return str(value) if value else None
def get_last_compiled_model_id(self) -> str | None:
value = self.get("last_compiled_model_id")
return str(value) if value else None
def get_last_downloaded_model(self) -> str | None:
value = self.get("last_downloaded_model")
return str(value) if value else None
def set_last_training_job(self, job_name: str) -> None: def set_last_training_job(self, job_name: str) -> None:
self.update(last_training_job=job_name) self.update(last_training_job=job_name)
@@ -64,20 +48,8 @@ class CliStateStore:
state["training_jobs"] = jobs state["training_jobs"] = jobs
self._write(state) self._write(state)
def set_latest_experiment_model_version(self, version: str) -> None: def set_latest_prerelease_model_version(self, version: str) -> None:
self.update(latest_experiment_model_version=version) self.update(latest_prerelease_model_version=version)
def get_aihub_model_provenance(self, model_id: str) -> dict[str, Any]:
provenance = self._aihub_model_provenance(self.read())
value = provenance.get(model_id, {})
return dict(value) if isinstance(value, dict) else {}
def update_aihub_model_provenance(self, model_id: str, provenance: dict[str, Any]) -> None:
state = self.read()
model_provenance = self._aihub_model_provenance(state)
model_provenance[model_id] = provenance
state["aihub_model_provenance"] = model_provenance
self._write(state)
def _write(self, state: dict[str, Any]) -> None: def _write(self, state: dict[str, Any]) -> None:
with open(self.path, "w") as f: with open(self.path, "w") as f:
@@ -87,10 +59,6 @@ class CliStateStore:
value = state.get("training_jobs", {}) value = state.get("training_jobs", {})
return dict(value) if isinstance(value, dict) else {} return dict(value) if isinstance(value, dict) else {}
def _aihub_model_provenance(self, state: dict[str, Any]) -> dict[str, Any]:
value = state.get("aihub_model_provenance", {})
return dict(value) if isinstance(value, dict) else {}
def store(config_path: str) -> CliStateStore: def store(config_path: str) -> CliStateStore:
config_dir = str(Path(config_path).parent) config_dir = str(Path(config_path).parent)

View File

@@ -1,3 +1,3 @@
from src.tracking.mlflow import AIHubSourceProvenance, AIHubStepRecord, MlflowTracker, NoopTracker, Tracker from src.tracking.mlflow import MlflowTracker, NoopTracker, Tracker
__all__ = ["AIHubSourceProvenance", "AIHubStepRecord", "MlflowTracker", "NoopTracker", "Tracker"] __all__ = ["MlflowTracker", "NoopTracker", "Tracker"]

View File

@@ -1,12 +1,8 @@
import os from __future__ import annotations
import re
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Protocol
import mlflow import os
from mlflow.tracking import MlflowClient from dataclasses import dataclass
from typing import Any, Protocol
from src.aws import mlflow as aws_mlflow from src.aws import mlflow as aws_mlflow
from src.config import Config, MlflowMode from src.config import Config, MlflowMode
@@ -15,36 +11,12 @@ from src.config import Config, MlflowMode
class Tracker(Protocol): class Tracker(Protocol):
def start_training_run(self, training_job: Any, *, region: str, profile: str, role_arn: str) -> str | None: ... def start_training_run(self, training_job: Any, *, region: str, profile: str, role_arn: str) -> str | None: ...
def finalize_training_run(self, *, run_id: str | None, training_job_status: Any) -> str | None: ... def finalize_training_run(
self,
def record_aihub_step(self, record: "AIHubStepRecord") -> str | None: ... *,
run_id: str | None,
training_job_status: Any,
@dataclass(frozen=True) ) -> str | None: ...
class AIHubSourceProvenance:
kind: str
parent_run_id: str | None = None
uri: str | None = None
path: str | None = None
aihub_model_id: str | None = None
training_job: str | None = None
@dataclass(frozen=True)
class AIHubStepRecord:
step: str
submission_id: str
command: str
source: AIHubSourceProvenance
job: Any | None = None
job_id: str | None = None
model_id: str | None = None
target_runtime: str | None = None
device: str | None = None
options: str | None = None
output_dir: str | Path | None = None
outputs: Mapping[str, Any] | None = None
profile: Mapping[str, Any] | None = None
@dataclass(frozen=True) @dataclass(frozen=True)
@@ -55,12 +27,10 @@ class NoopTracker:
def finalize_training_run(self, *, run_id: str | None, training_job_status: Any) -> str | None: def finalize_training_run(self, *, run_id: str | None, training_job_status: Any) -> str | None:
return None return None
def record_aihub_step(self, record: AIHubStepRecord) -> str | None:
return None
@dataclass(frozen=True) @dataclass(frozen=True)
class MlflowTracker: class MlflowTracker:
mlflow: Any
tracking_uri: str tracking_uri: str
experiment_name: str experiment_name: str
registered_model_name: str registered_model_name: str
@@ -73,6 +43,14 @@ class MlflowTracker:
os.environ.setdefault("MLFLOW_SUPPRESS_PRINTING_URL_TO_STDOUT", "true") os.environ.setdefault("MLFLOW_SUPPRESS_PRINTING_URL_TO_STDOUT", "true")
try:
import mlflow
except ImportError as e:
raise RuntimeError(
"MLflow is enabled in config but optional dependencies are not installed. "
"Install with: qc-cli[mlflow]"
) from e
tracking_server_name = cfg.effective_mlflow_tracking_server_name tracking_server_name = cfg.effective_mlflow_tracking_server_name
if not tracking_server_name: if not tracking_server_name:
raise RuntimeError("MLflow tracking server name could not be resolved.") raise RuntimeError("MLflow tracking server name could not be resolved.")
@@ -86,6 +64,7 @@ class MlflowTracker:
mlflow.set_experiment(cfg.mlflow.experiment_name) mlflow.set_experiment(cfg.mlflow.experiment_name)
return cls( return cls(
mlflow=mlflow,
tracking_uri=tracking_uri, tracking_uri=tracking_uri,
experiment_name=cfg.mlflow.experiment_name, experiment_name=cfg.mlflow.experiment_name,
registered_model_name=cfg.mlflow.registered_model_name, registered_model_name=cfg.mlflow.registered_model_name,
@@ -93,7 +72,7 @@ class MlflowTracker:
) )
def start_training_run(self, training_job: Any, *, region: str, profile: str, role_arn: str) -> str | None: def start_training_run(self, training_job: Any, *, region: str, profile: str, role_arn: str) -> str | None:
run = mlflow.start_run(run_name=training_job.job_name) run = self.mlflow.start_run(run_name=training_job.job_name)
run_id = str(run.info.run_id) run_id = str(run.info.run_id)
params = { params = {
@@ -111,23 +90,21 @@ class MlflowTracker:
} }
self._log_params(params) self._log_params(params)
self._log_params({f"hyperparameters.{key}": value for key, value in training_job.hyperparameters.items()}) self._log_params({f"hyperparameters.{key}": value for key, value in training_job.hyperparameters.items()})
mlflow.set_tags( self.mlflow.set_tags(
{ {
"qc_cli.stage": "experiment", "qc_cli.stage": "prerelease",
"qc_cli.artifact_kind": "trained_source",
"qc_cli.source": "sagemaker",
"qc_cli.command": "train start", "qc_cli.command": "train start",
"sagemaker.job_name": training_job.job_name, "sagemaker.job_name": training_job.job_name,
} }
) )
mlflow.end_run() self.mlflow.end_run()
return run_id return run_id
def finalize_training_run(self, *, run_id: str | None, training_job_status: Any) -> str | None: def finalize_training_run(self, *, run_id: str | None, training_job_status: Any) -> str | None:
if not run_id: if not run_id:
return None return None
with mlflow.start_run(run_id=run_id): with self.mlflow.start_run(run_id=run_id):
self._log_params( self._log_params(
{ {
"sagemaker.training_status": training_job_status.status, "sagemaker.training_status": training_job_status.status,
@@ -138,53 +115,36 @@ class MlflowTracker:
} }
) )
self._log_final_metrics(training_job_status.raw) self._log_final_metrics(training_job_status.raw)
mlflow.set_tag("qc_cli.command", "train status") self.mlflow.set_tag("qc_cli.command", "train status")
if training_job_status.status != "Completed" or not training_job_status.model_artifacts: if training_job_status.status != "Completed" or not training_job_status.model_artifacts:
mlflow.set_tag("qc_cli.training_terminal_status", training_job_status.status) self.mlflow.set_tag("qc_cli.training_terminal_status", training_job_status.status)
return None return None
if not self.register_trained_models: if not self.register_trained_models:
return None return None
client = MlflowClient() client = self.mlflow.tracking.MlflowClient()
self._ensure_registered_model(client, self.registered_model_name) self._ensure_registered_model(client, self.registered_model_name)
version = client.create_model_version( version = client.create_model_version(
name=self.registered_model_name, name=self.registered_model_name,
source=training_job_status.model_artifacts, source=training_job_status.model_artifacts,
run_id=run_id, run_id=run_id,
tags={ tags={
"qc_cli.stage": "experiment", "qc_cli.stage": "prerelease",
"qc_cli.artifact_kind": "trained_source",
"qc_cli.source": "sagemaker",
"sagemaker.job_name": training_job_status.name, "sagemaker.job_name": training_job_status.name,
}, },
) )
version_number = str(version.version) version_number = str(version.version)
client.set_registered_model_alias(self.registered_model_name, "experiment-latest", version_number) self._set_alias(client, self.registered_model_name, "prerelease-latest", version_number)
mlflow.set_tag("qc_cli.registered_model_name", self.registered_model_name) self.mlflow.set_tag("qc_cli.registered_model_name", self.registered_model_name)
mlflow.set_tag("qc_cli.registered_model_version", version_number) self.mlflow.set_tag("qc_cli.registered_model_version", version_number)
return version_number return version_number
def record_aihub_step(self, record: AIHubStepRecord) -> str | None:
run_name = f"ai-hub {record.step}"
if record.source.parent_run_id:
with mlflow.start_run(run_id=record.source.parent_run_id):
child = mlflow.start_run(run_name=run_name, nested=True)
try:
self._log_aihub_record(record)
return str(child.info.run_id)
finally:
mlflow.end_run()
with mlflow.start_run(run_name=run_name) as run:
self._log_aihub_record(record)
return str(run.info.run_id)
def _log_params(self, params: dict[str, Any]) -> None: def _log_params(self, params: dict[str, Any]) -> None:
cleaned = {key: str(value) for key, value in params.items() if value is not None} cleaned = {key: str(value) for key, value in params.items() if value is not None}
if cleaned: if cleaned:
mlflow.log_params(cleaned) self.mlflow.log_params(cleaned)
def _log_final_metrics(self, training_job: dict[str, Any]) -> None: def _log_final_metrics(self, training_job: dict[str, Any]) -> None:
metrics = {} metrics = {}
@@ -194,135 +154,14 @@ class MlflowTracker:
if name and value is not None: if name and value is not None:
metrics[str(name)] = float(value) metrics[str(name)] = float(value)
if metrics: if metrics:
mlflow.log_metrics(metrics) self.mlflow.log_metrics(metrics)
def _ensure_registered_model(self, client: MlflowClient, name: str) -> None: def _ensure_registered_model(self, client: Any, name: str) -> None:
try: try:
client.get_registered_model(name) client.get_registered_model(name)
except Exception: except Exception:
client.create_registered_model(name) client.create_registered_model(name)
def _log_aihub_record(self, record: AIHubStepRecord) -> None: def _set_alias(self, client: Any, name: str, alias: str, version: str) -> None:
status = self._job_status(record.job) if hasattr(client, "set_registered_model_alias"):
job_id = record.job_id or self._job_attr(record.job, "job_id") client.set_registered_model_alias(name, alias, version)
self._log_params(
{
"aihub.step": record.step,
"aihub.submission_id": record.submission_id,
"aihub.job_id": job_id,
"aihub.job_name": self._job_attr(record.job, "name"),
"aihub.job_type": self._job_attr(record.job, "job_type"),
"aihub.job_url": self._job_attr(record.job, "url"),
"aihub.model_id": record.model_id,
"aihub.target_runtime": record.target_runtime,
"aihub.device": record.device,
"aihub.options": record.options or self._job_attr(record.job, "options"),
"aihub.status": status.get("code"),
"aihub.failure_reason": status.get("message"),
"aihub.output_dir": record.output_dir,
"qc_cli.source_model.kind": record.source.kind,
"qc_cli.source_model.uri": record.source.uri,
"qc_cli.source_model.path": record.source.path,
"qc_cli.source_model.aihub_model_id": record.source.aihub_model_id,
"qc_cli.source_training_job": record.source.training_job,
"qc_cli.parent_mlflow_run_id": record.source.parent_run_id,
}
)
mlflow.set_tags(
{
"qc_cli.source": "ai_hub",
"qc_cli.stage": record.step,
"qc_cli.command": record.command,
"qc_cli.aihub_submission_id": record.submission_id,
}
)
self._log_output_stats(record.outputs)
self._log_profile(record.profile)
if record.output_dir:
output_dir = Path(record.output_dir)
if output_dir.exists() and output_dir.is_dir():
mlflow.log_artifacts(str(output_dir), artifact_path=f"aihub/{record.step}")
def _log_output_stats(self, outputs: Mapping[str, Any] | None) -> None:
if not outputs:
return
import numpy as np
params: dict[str, Any] = {}
metrics: dict[str, float] = {}
for name, value in outputs.items():
safe_name = self._metric_name(name)
arr = np.asarray(value)
params[f"aihub.inference.output.{safe_name}.shape"] = list(arr.shape)
params[f"aihub.inference.output.{safe_name}.dtype"] = str(arr.dtype)
metrics[f"aihub.inference.output.{safe_name}.count"] = float(arr.size)
if arr.size == 0 or not np.issubdtype(arr.dtype, np.number):
continue
numeric = arr.astype(float, copy=False)
finite = numeric[np.isfinite(numeric)]
metrics[f"aihub.inference.output.{safe_name}.nan_count"] = float(np.isnan(numeric).sum())
metrics[f"aihub.inference.output.{safe_name}.inf_count"] = float(np.isinf(numeric).sum())
if finite.size == 0:
continue
metrics[f"aihub.inference.output.{safe_name}.min"] = float(finite.min())
metrics[f"aihub.inference.output.{safe_name}.max"] = float(finite.max())
metrics[f"aihub.inference.output.{safe_name}.mean"] = float(finite.mean())
metrics[f"aihub.inference.output.{safe_name}.std"] = float(finite.std())
metrics[f"aihub.inference.output.{safe_name}.l1_norm"] = float(np.linalg.norm(finite, ord=1))
metrics[f"aihub.inference.output.{safe_name}.l2_norm"] = float(np.linalg.norm(finite, ord=2))
self._log_params(params)
if metrics:
mlflow.log_metrics(metrics)
def _log_profile(self, profile: Mapping[str, Any] | None) -> None:
if not profile:
return
mlflow.log_dict(dict(profile), "aihub/profile.json")
metrics = {
f"aihub.profile.{self._metric_name(path)}": float(value)
for path, value in self._flatten_numeric(profile).items()
}
if metrics:
mlflow.log_metrics(metrics)
def _flatten_numeric(self, value: Any, prefix: str = "") -> dict[str, float]:
if isinstance(value, Mapping):
flattened: dict[str, float] = {}
for key, item in value.items():
child_prefix = f"{prefix}.{key}" if prefix else str(key)
flattened.update(self._flatten_numeric(item, child_prefix))
return flattened
if isinstance(value, list | tuple):
flattened = {}
for index, item in enumerate(value):
child_prefix = f"{prefix}.{index}" if prefix else str(index)
flattened.update(self._flatten_numeric(item, child_prefix))
return flattened
if isinstance(value, bool):
return {}
if isinstance(value, int | float):
return {prefix: float(value)}
return {}
def _job_status(self, job: Any | None) -> dict[str, Any]:
if job is None or not hasattr(job, "get_status"):
return {}
status = job.get_status()
return {
"code": getattr(status, "code", None),
"message": getattr(status, "message", None),
}
def _job_attr(self, job: Any | None, name: str) -> Any:
if job is None:
return None
try:
return getattr(job, name)
except Exception:
return None
def _metric_name(self, value: str) -> str:
return re.sub(r"[^A-Za-z0-9_.-]+", "_", str(value)).strip("._") or "value"

182
uv.lock generated
View File

@@ -210,15 +210,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/06/7c/1e7964f0f267301bb5026fed45369961f7311073412bcd36e09fbe4df0de/aws_cdk_lib-2.253.1-py3-none-any.whl", hash = "sha256:03a6f5080978f9e3576f490d06fbd1f41f159280d34dbca50721de4a19694136", size = 50271288, upload-time = "2026-05-08T16:04:41.956Z" }, { url = "https://files.pythonhosted.org/packages/06/7c/1e7964f0f267301bb5026fed45369961f7311073412bcd36e09fbe4df0de/aws_cdk_lib-2.253.1-py3-none-any.whl", hash = "sha256:03a6f5080978f9e3576f490d06fbd1f41f159280d34dbca50721de4a19694136", size = 50271288, upload-time = "2026-05-08T16:04:41.956Z" },
] ]
[[package]]
name = "backoff"
version = "2.2.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/47/d7/5bbeb12c44d7c4f2fb5b56abce497eb5ed9f34d85701de869acedd602619/backoff-2.2.1.tar.gz", hash = "sha256:03f829f5bb1923180821643f8753b0502c3b682293992485b0eef2807afa5cba", size = 17001, upload-time = "2022-10-05T19:19:32.061Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/df/73/b6e24bd22e6720ca8ee9a85a0c4a2971af8497d8f3193fa05390cbd46e09/backoff-2.2.1-py3-none-any.whl", hash = "sha256:63579f9a0628e06278f7e47b7d7d5b6ce20dc65c5e96a6f3ca99a6adca0396e8", size = 15148, upload-time = "2022-10-05T19:19:30.546Z" },
]
[[package]] [[package]]
name = "blinker" name = "blinker"
version = "1.9.0" version = "1.9.0"
@@ -600,18 +591,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/e3/43/33806117fc8e0992aae890be73990b31d802b66e8a423bf87b80990fce66/databricks_sdk-0.111.0-py3-none-any.whl", hash = "sha256:d14ba186afd2bea03c7157d2f03e0f861a0b8eff528cfdba926d07b9e20384b8", size = 901536, upload-time = "2026-05-25T09:29:58.057Z" }, { url = "https://files.pythonhosted.org/packages/e3/43/33806117fc8e0992aae890be73990b31d802b66e8a423bf87b80990fce66/databricks_sdk-0.111.0-py3-none-any.whl", hash = "sha256:d14ba186afd2bea03c7157d2f03e0f861a0b8eff528cfdba926d07b9e20384b8", size = 901536, upload-time = "2026-05-25T09:29:58.057Z" },
] ]
[[package]]
name = "deprecation"
version = "2.1.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "packaging" },
]
sdist = { url = "https://files.pythonhosted.org/packages/5a/d3/8ae2869247df154b64c1884d7346d412fed0c49df84db635aab2d1c40e62/deprecation-2.1.0.tar.gz", hash = "sha256:72b3bde64e5d778694b0cf68178aed03d15e15477116add3fb773e581f9518ff", size = 173788, upload-time = "2020-04-20T14:23:38.738Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/02/c3/253a89ee03fc9b9682f1541728eb66db7db22148cd94f89ab22528cd1e1b/deprecation-2.1.0-py2.py3-none-any.whl", hash = "sha256:a10811591210e1fb0e768a8c25517cabeabcba6f0bf96564f8ff45189f90b14a", size = 11178, upload-time = "2020-04-20T14:23:36.581Z" },
]
[[package]] [[package]]
name = "docker" name = "docker"
version = "7.1.0" version = "7.1.0"
@@ -929,41 +908,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl", hash = "sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86", size = 37515, upload-time = "2025-04-24T03:35:24.344Z" }, { url = "https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl", hash = "sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86", size = 37515, upload-time = "2025-04-24T03:35:24.344Z" },
] ]
[[package]]
name = "h5py"
version = "3.16.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "numpy" },
]
sdist = { url = "https://files.pythonhosted.org/packages/db/33/acd0ce6863b6c0d7735007df01815403f5589a21ff8c2e1ee2587a38f548/h5py-3.16.0.tar.gz", hash = "sha256:a0dbaad796840ccaa67a4c144a0d0c8080073c34c76d5a6941d6818678ef2738", size = 446526, upload-time = "2026-03-06T13:49:08.07Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/0f/9e/6142ebfda0cb6e9349c091eae73c2e01a770b7659255248d637bec54a88b/h5py-3.16.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:370a845f432c2c9619db8eed334d1e610c6015796122b0e57aa46312c22617d9", size = 3671808, upload-time = "2026-03-06T13:48:19.737Z" },
{ url = "https://files.pythonhosted.org/packages/b0/65/5e088a45d0f43cd814bc5bec521c051d42005a472e804b1a36c48dada09b/h5py-3.16.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:42108e93326c50c2810025aade9eac9d6827524cdccc7d4b75a546e5ab308edb", size = 3045837, upload-time = "2026-03-06T13:48:21.854Z" },
{ url = "https://files.pythonhosted.org/packages/da/1e/6172269e18cc5a484e2913ced33339aad588e02ba407fafd00d369e22ef3/h5py-3.16.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:099f2525c9dcf28de366970a5fb34879aab20491589fa89ce2863a84218bb524", size = 5193860, upload-time = "2026-03-06T13:48:24.071Z" },
{ url = "https://files.pythonhosted.org/packages/bd/98/ef2b6fe2903e377cbe870c3b2800d62552f1e3dbe81ce49e1923c53d1c5c/h5py-3.16.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:9300ad32dea9dfc5171f94d5f6948e159ed93e4701280b0f508773b3f582f402", size = 5400417, upload-time = "2026-03-06T13:48:25.728Z" },
{ url = "https://files.pythonhosted.org/packages/bc/81/5b62d760039eed64348c98129d17061fdfc7839fc9c04eaaad6dee1004e4/h5py-3.16.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:171038f23bccddfc23f344cadabdfc9917ff554db6a0d417180d2747fe4c75a7", size = 5185214, upload-time = "2026-03-06T13:48:27.436Z" },
{ url = "https://files.pythonhosted.org/packages/28/c4/532123bcd9080e250696779c927f2cb906c8bf3447df98f5ceb8dcded539/h5py-3.16.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:7e420b539fb6023a259a1b14d4c9f6df8cf50d7268f48e161169987a57b737ff", size = 5414598, upload-time = "2026-03-06T13:48:29.49Z" },
{ url = "https://files.pythonhosted.org/packages/c3/d9/a27997f84341fc0dfcdd1fe4179b6ba6c32a7aa880fdb8c514d4dad6fba3/h5py-3.16.0-cp313-cp313-win_amd64.whl", hash = "sha256:18f2bbcd545e6991412253b98727374c356d67caa920e68dc79eab36bf5fedad", size = 3175509, upload-time = "2026-03-06T13:48:31.131Z" },
{ url = "https://files.pythonhosted.org/packages/a5/23/bb8647521d4fd770c30a76cfc6cb6a2f5495868904054e92f2394c5a78ff/h5py-3.16.0-cp313-cp313-win_arm64.whl", hash = "sha256:656f00e4d903199a1d58df06b711cf3ca632b874b4207b7dbec86185b5c8c7d4", size = 2647362, upload-time = "2026-03-06T13:48:33.411Z" },
{ url = "https://files.pythonhosted.org/packages/48/3c/7fcd9b4c9eed82e91fb15568992561019ae7a829d1f696b2c844355d95dd/h5py-3.16.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:9c9d307c0ef862d1cd5714f72ecfafe0a5d7529c44845afa8de9f46e5ba8bd65", size = 3678608, upload-time = "2026-03-06T13:48:35.183Z" },
{ url = "https://files.pythonhosted.org/packages/6a/b7/9366ed44ced9b7ef357ab48c94205280276db9d7f064aa3012a97227e966/h5py-3.16.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:8c1eff849cdd53cbc73c214c30ebdb6f1bb8b64790b4b4fc36acdb5e43570210", size = 3054773, upload-time = "2026-03-06T13:48:37.139Z" },
{ url = "https://files.pythonhosted.org/packages/58/a5/4964bc0e91e86340c2bbda83420225b2f770dcf1eb8a39464871ad769436/h5py-3.16.0-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:e2c04d129f180019e216ee5f9c40b78a418634091c8782e1f723a6ca3658b965", size = 5198886, upload-time = "2026-03-06T13:48:38.879Z" },
{ url = "https://files.pythonhosted.org/packages/f1/16/d905e7f53e661ce2c24686c38048d8e2b750ffc4350009d41c4e6c6c9826/h5py-3.16.0-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:e4360f15875a532bc7b98196c7592ed4fc92672a57c0a621355961cafb17a6dd", size = 5404883, upload-time = "2026-03-06T13:48:41.324Z" },
{ url = "https://files.pythonhosted.org/packages/4b/f2/58f34cb74af46d39f4cd18ea20909a8514960c5a3e5b92fd06a28161e0a8/h5py-3.16.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:3fae9197390c325e62e0a1aa977f2f62d994aa87aab182abbea85479b791197c", size = 5192039, upload-time = "2026-03-06T13:48:43.117Z" },
{ url = "https://files.pythonhosted.org/packages/ce/ca/934a39c24ce2e2db017268c08da0537c20fa0be7e1549be3e977313fc8f5/h5py-3.16.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:43259303989ac8adacc9986695b31e35dba6fd1e297ff9c6a04b7da5542139cc", size = 5421526, upload-time = "2026-03-06T13:48:44.838Z" },
{ url = "https://files.pythonhosted.org/packages/3e/14/615a450205e1b56d16c6783f5ccd116cde05550faad70ae077c955654a75/h5py-3.16.0-cp314-cp314-win_amd64.whl", hash = "sha256:fa48993a0b799737ba7fd21e2350fa0a60701e58180fae9f2de834bc39a147ab", size = 3183263, upload-time = "2026-03-06T13:48:47.117Z" },
{ url = "https://files.pythonhosted.org/packages/7b/48/a6faef5ed632cae0c65ac6b214a6614a0b510c3183532c521bdb0055e117/h5py-3.16.0-cp314-cp314-win_arm64.whl", hash = "sha256:1897a771a7f40d05c262fc8f37376ec37873218544b70216872876c627640f63", size = 2663450, upload-time = "2026-03-06T13:48:48.707Z" },
{ url = "https://files.pythonhosted.org/packages/5d/32/0c8bb8aedb62c772cf7c1d427c7d1951477e8c2835f872bc0a13d1f85f86/h5py-3.16.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:15922e485844f77c0b9d275396d435db3baa58292a9c2176a386e072e0cf2491", size = 3760693, upload-time = "2026-03-06T13:48:50.453Z" },
{ url = "https://files.pythonhosted.org/packages/1d/1f/fcc5977d32d6387c5c9a694afee716a5e20658ac08b3ff24fdec79fb05f2/h5py-3.16.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:df02dd29bd247f98674634dfe41f89fd7c16ba3d7de8695ec958f58404a4e618", size = 3181305, upload-time = "2026-03-06T13:48:52.221Z" },
{ url = "https://files.pythonhosted.org/packages/f5/a1/af87f64b9f986889884243643621ebbd4ac72472ba8ec8cec891ac8e2ca1/h5py-3.16.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:0f456f556e4e2cebeebd9d66adf8dc321770a42593494a0b6f0af54a7567b242", size = 5074061, upload-time = "2026-03-06T13:48:54.089Z" },
{ url = "https://files.pythonhosted.org/packages/cc/d0/146f5eaff3dc246a9c7f6e5e4f42bd45cc613bce16693bcd4d1f7c958bf5/h5py-3.16.0-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:3e6cb3387c756de6a9492d601553dffea3fe11b5f22b443aac708c69f3f55e16", size = 5279216, upload-time = "2026-03-06T13:48:56.75Z" },
{ url = "https://files.pythonhosted.org/packages/a1/9d/12a13424f1e604fc7df9497b73c0356fb78c2fb206abd7465ce47226e8fd/h5py-3.16.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:8389e13a1fd745ad2856873e8187fd10268b2d9677877bb667b41aebd771d8b7", size = 5070068, upload-time = "2026-03-06T13:48:59.169Z" },
{ url = "https://files.pythonhosted.org/packages/41/8c/bbe98f813722b4873818a8db3e15aa3e625b59278566905ac439725e8070/h5py-3.16.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:346df559a0f7dcb31cf8e44805319e2ab24b8957c45e7708ce503b2ec79ba725", size = 5300253, upload-time = "2026-03-06T13:49:02.033Z" },
{ url = "https://files.pythonhosted.org/packages/32/9e/87e6705b4d6890e7cecdf876e2a7d3e40654a2ae37482d79a6f1b87f7b92/h5py-3.16.0-cp314-cp314t-win_amd64.whl", hash = "sha256:4c6ab014ab704b4feaa719ae783b86522ed0bf1f82184704ed3c9e4e3228796e", size = 3381671, upload-time = "2026-03-06T13:49:04.351Z" },
{ url = "https://files.pythonhosted.org/packages/96/91/9fad90cfc5f9b2489c7c26ad897157bce82f0e9534a986a221b99760b23b/h5py-3.16.0-cp314-cp314t-win_arm64.whl", hash = "sha256:faca8fb4e4319c09d83337adc80b2ca7d5c5a343c2d6f1b6388f32cfecca13c1", size = 2740706, upload-time = "2026-03-06T13:49:06.347Z" },
]
[[package]] [[package]]
name = "huey" name = "huey"
version = "2.6.0" version = "2.6.0"
@@ -1003,6 +947,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/8a/db/55a262f3606bebcae07cc14095338471ad7c0bbcaa37707e6f0ee49725b7/importlib_resources-7.1.0-py3-none-any.whl", hash = "sha256:1bd7b48b4088eddb2cd16382150bb515af0bd2c70128194392725f82ad2c96a1", size = 37232, upload-time = "2026-04-12T16:36:08.219Z" }, { url = "https://files.pythonhosted.org/packages/8a/db/55a262f3606bebcae07cc14095338471ad7c0bbcaa37707e6f0ee49725b7/importlib_resources-7.1.0-py3-none-any.whl", hash = "sha256:1bd7b48b4088eddb2cd16382150bb515af0bd2c70128194392725f82ad2c96a1", size = 37232, upload-time = "2026-04-12T16:36:08.219Z" },
] ]
[[package]]
name = "iniconfig"
version = "2.3.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/72/34/14ca021ce8e5dfedc35312d08ba8bf51fdd999c576889fc2c24cb97f4f10/iniconfig-2.3.0.tar.gz", hash = "sha256:c76315c77db068650d49c5b56314774a7804df16fee4402c1f19d6d15d8c4730", size = 20503, upload-time = "2025-10-18T21:55:43.219Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12", size = 7484, upload-time = "2025-10-18T21:55:41.639Z" },
]
[[package]] [[package]]
name = "itsdangerous" name = "itsdangerous"
version = "2.2.0" version = "2.2.0"
@@ -1665,6 +1618,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ff/6e/cf826fae916b8658848d7b9f38d88da6396895c676e8086fc0988073aaf8/pillow-12.2.0-cp314-cp314t-win_arm64.whl", hash = "sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb", size = 2556579, upload-time = "2026-04-01T14:45:52.529Z" }, { url = "https://files.pythonhosted.org/packages/ff/6e/cf826fae916b8658848d7b9f38d88da6396895c676e8086fc0988073aaf8/pillow-12.2.0-cp314-cp314t-win_arm64.whl", hash = "sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb", size = 2556579, upload-time = "2026-04-01T14:45:52.529Z" },
] ]
[[package]]
name = "pluggy"
version = "1.6.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/f9/e2/3e91f31a7d2b083fe6ef3fa267035b518369d9511ffab804f839851d2779/pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3", size = 69412, upload-time = "2025-05-15T12:30:07.975Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
]
[[package]] [[package]]
name = "prettytable" name = "prettytable"
version = "3.17.0" version = "3.17.0"
@@ -1756,16 +1718,17 @@ wheels = [
[[package]] [[package]]
name = "protobuf" name = "protobuf"
version = "6.31.1" version = "6.33.6"
source = { registry = "https://pypi.org/simple" } source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/52/f3/b9655a711b32c19720253f6f06326faf90580834e2e83f840472d752bc8b/protobuf-6.31.1.tar.gz", hash = "sha256:d8cac4c982f0b957a4dc73a80e2ea24fab08e679c0de9deb835f4a12d69aca9a", size = 441797, upload-time = "2025-05-28T19:25:54.947Z" } sdist = { url = "https://files.pythonhosted.org/packages/66/70/e908e9c5e52ef7c3a6c7902c9dfbb34c7e29c25d2f81ade3856445fd5c94/protobuf-6.33.6.tar.gz", hash = "sha256:a6768d25248312c297558af96a9f9c929e8c4cee0659cb07e780731095f38135", size = 444531, upload-time = "2026-03-18T19:05:00.988Z" }
wheels = [ wheels = [
{ url = "https://files.pythonhosted.org/packages/f3/6f/6ab8e4bf962fd5570d3deaa2d5c38f0a363f57b4501047b5ebeb83ab1125/protobuf-6.31.1-cp310-abi3-win32.whl", hash = "sha256:7fa17d5a29c2e04b7d90e5e32388b8bfd0e7107cd8e616feef7ed3fa6bdab5c9", size = 423603, upload-time = "2025-05-28T19:25:41.198Z" }, { url = "https://files.pythonhosted.org/packages/fc/9f/2f509339e89cfa6f6a4c4ff50438db9ca488dec341f7e454adad60150b00/protobuf-6.33.6-cp310-abi3-win32.whl", hash = "sha256:7d29d9b65f8afef196f8334e80d6bc1d5d4adedb449971fefd3723824e6e77d3", size = 425739, upload-time = "2026-03-18T19:04:48.373Z" },
{ url = "https://files.pythonhosted.org/packages/44/3a/b15c4347dd4bf3a1b0ee882f384623e2063bb5cf9fa9d57990a4f7df2fb6/protobuf-6.31.1-cp310-abi3-win_amd64.whl", hash = "sha256:426f59d2964864a1a366254fa703b8632dcec0790d8862d30034d8245e1cd447", size = 435283, upload-time = "2025-05-28T19:25:44.275Z" }, { url = "https://files.pythonhosted.org/packages/76/5d/683efcd4798e0030c1bab27374fd13a89f7c2515fb1f3123efdfaa5eab57/protobuf-6.33.6-cp310-abi3-win_amd64.whl", hash = "sha256:0cd27b587afca21b7cfa59a74dcbd48a50f0a6400cfb59391340ad729d91d326", size = 437089, upload-time = "2026-03-18T19:04:50.381Z" },
{ url = "https://files.pythonhosted.org/packages/6a/c9/b9689a2a250264a84e66c46d8862ba788ee7a641cdca39bccf64f59284b7/protobuf-6.31.1-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:6f1227473dc43d44ed644425268eb7c2e488ae245d51c6866d19fe158e207402", size = 425604, upload-time = "2025-05-28T19:25:45.702Z" }, { url = "https://files.pythonhosted.org/packages/5c/01/a3c3ed5cd186f39e7880f8303cc51385a198a81469d53d0fdecf1f64d929/protobuf-6.33.6-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:9720e6961b251bde64edfdab7d500725a2af5280f3f4c87e57c0208376aa8c3a", size = 427737, upload-time = "2026-03-18T19:04:51.866Z" },
{ url = "https://files.pythonhosted.org/packages/76/a1/7a5a94032c83375e4fe7e7f56e3976ea6ac90c5e85fac8576409e25c39c3/protobuf-6.31.1-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:a40fc12b84c154884d7d4c4ebd675d5b3b5283e155f324049ae396b95ddebc39", size = 322115, upload-time = "2025-05-28T19:25:47.128Z" }, { url = "https://files.pythonhosted.org/packages/ee/90/b3c01fdec7d2f627b3a6884243ba328c1217ed2d978def5c12dc50d328a3/protobuf-6.33.6-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:e2afbae9b8e1825e3529f88d514754e094278bb95eadc0e199751cdd9a2e82a2", size = 324610, upload-time = "2026-03-18T19:04:53.096Z" },
{ url = "https://files.pythonhosted.org/packages/fa/b1/b59d405d64d31999244643d88c45c8241c58f17cc887e73bcb90602327f8/protobuf-6.31.1-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:4ee898bf66f7a8b0bd21bce523814e6fbd8c6add948045ce958b73af7e8878c6", size = 321070, upload-time = "2025-05-28T19:25:50.036Z" }, { url = "https://files.pythonhosted.org/packages/9b/ca/25afc144934014700c52e05103c2421997482d561f3101ff352e1292fb81/protobuf-6.33.6-cp39-abi3-manylinux2014_s390x.whl", hash = "sha256:c96c37eec15086b79762ed265d59ab204dabc53056e3443e702d2681f4b39ce3", size = 339381, upload-time = "2026-03-18T19:04:54.616Z" },
{ url = "https://files.pythonhosted.org/packages/f7/af/ab3c51ab7507a7325e98ffe691d9495ee3d3aa5f589afad65ec920d39821/protobuf-6.31.1-py3-none-any.whl", hash = "sha256:720a6c7e6b77288b85063569baae8536671b39f15cc22037ec7045658d80489e", size = 168724, upload-time = "2025-05-28T19:25:53.926Z" }, { url = "https://files.pythonhosted.org/packages/16/92/d1e32e3e0d894fe00b15ce28ad4944ab692713f2e7f0a99787405e43533a/protobuf-6.33.6-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:e9db7e292e0ab79dd108d7f1a94fe31601ce1ee3f7b79e0692043423020b0593", size = 323436, upload-time = "2026-03-18T19:04:55.768Z" },
{ url = "https://files.pythonhosted.org/packages/c4/72/02445137af02769918a93807b2b7890047c32bfb9f90371cbc12688819eb/protobuf-6.33.6-py3-none-any.whl", hash = "sha256:77179e006c476e69bf8e8ce866640091ec42e1beb80b213c3900006ecfba6901", size = 170656, upload-time = "2026-03-18T19:04:59.826Z" },
] ]
[[package]] [[package]]
@@ -1945,6 +1908,22 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/16/6b/330d8ebae582b30c2959a1ef4c3bc344ebde48c2ff0c3f113c4710735e11/pyright-1.1.409-py3-none-any.whl", hash = "sha256:aa3ea228cab90c845c7a60d28db7a844c04315356392aa09fafcee98c8c22fb3", size = 6438161, upload-time = "2026-04-23T11:02:01.309Z" }, { url = "https://files.pythonhosted.org/packages/16/6b/330d8ebae582b30c2959a1ef4c3bc344ebde48c2ff0c3f113c4710735e11/pyright-1.1.409-py3-none-any.whl", hash = "sha256:aa3ea228cab90c845c7a60d28db7a844c04315356392aa09fafcee98c8c22fb3", size = 6438161, upload-time = "2026-04-23T11:02:01.309Z" },
] ]
[[package]]
name = "pytest"
version = "9.0.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
{ name = "iniconfig" },
{ name = "packaging" },
{ name = "pluggy" },
{ name = "pygments" },
]
sdist = { url = "https://files.pythonhosted.org/packages/7d/0d/549bd94f1a0a402dc8cf64563a117c0f3765662e2e668477624baeec44d5/pytest-9.0.3.tar.gz", hash = "sha256:b86ada508af81d19edeb213c681b1d48246c1a91d304c6c81a427674c17eb91c", size = 1572165, upload-time = "2026-04-07T17:16:18.027Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d4/24/a372aaf5c9b7208e7112038812994107bc65a84cd00e0354a88c2c77a617/pytest-9.0.3-py3-none-any.whl", hash = "sha256:2c5efc453d45394fdd706ade797c0a81091eccd1d6e4bccfcd476e2b8e0ab5d9", size = 375249, upload-time = "2026-04-07T17:16:16.13Z" },
]
[[package]] [[package]]
name = "python-dateutil" name = "python-dateutil"
version = "2.9.0.post0" version = "2.9.0.post0"
@@ -2024,29 +2003,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f1/12/de94a39c2ef588c7e6455cfbe7343d3b2dc9d6b6b2f40c4c6565744c873d/pyyaml-6.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:ebc55a14a21cb14062aa4162f906cd962b28e2e9ea38f9b4391244cd8de4ae0b", size = 149341, upload-time = "2025-09-25T21:32:56.828Z" }, { url = "https://files.pythonhosted.org/packages/f1/12/de94a39c2ef588c7e6455cfbe7343d3b2dc9d6b6b2f40c4c6565744c873d/pyyaml-6.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:ebc55a14a21cb14062aa4162f906cd962b28e2e9ea38f9b4391244cd8de4ae0b", size = 149341, upload-time = "2025-09-25T21:32:56.828Z" },
] ]
[[package]]
name = "qai-hub"
version = "0.50.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "backoff" },
{ name = "deprecation" },
{ name = "h5py" },
{ name = "numpy" },
{ name = "packaging" },
{ name = "prettytable" },
{ name = "protobuf" },
{ name = "requests" },
{ name = "requests-toolbelt" },
{ name = "s3transfer" },
{ name = "semver" },
{ name = "tqdm" },
{ name = "typing-extensions" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/f1/d8/d25fea29362a762b0d739ca8bfcfbda8b7af7f028813fa4c76a91edabfb1/qai_hub-0.50.0-py3-none-any.whl", hash = "sha256:a0b1e93fc3e358c02151042676779a793fea028d78b09854a3b4c6e0719bc0ce", size = 123503, upload-time = "2026-05-28T23:08:06.19Z" },
]
[[package]] [[package]]
name = "qc-cli" name = "qc-cli"
version = "0.1.0" version = "0.1.0"
@@ -2055,19 +2011,22 @@ dependencies = [
{ name = "aws-cdk-lib" }, { name = "aws-cdk-lib" },
{ name = "boto3" }, { name = "boto3" },
{ name = "constructs" }, { name = "constructs" },
{ name = "mlflow" },
{ name = "numpy" },
{ name = "pydantic" }, { name = "pydantic" },
{ name = "pyyaml" }, { name = "pyyaml" },
{ name = "qai-hub" },
{ name = "sagemaker-mlflow" },
{ name = "typer" }, { name = "typer" },
] ]
[package.optional-dependencies]
mlflow = [
{ name = "mlflow" },
{ name = "sagemaker-mlflow" },
]
[package.dev-dependencies] [package.dev-dependencies]
dev = [ dev = [
{ name = "boto3-stubs", extra = ["iam", "s3", "sagemaker"] }, { name = "boto3-stubs", extra = ["iam", "s3", "sagemaker"] },
{ name = "pyright" }, { name = "pyright" },
{ name = "pytest" },
{ name = "ruff" }, { name = "ruff" },
{ name = "types-pyyaml" }, { name = "types-pyyaml" },
] ]
@@ -2077,19 +2036,19 @@ requires-dist = [
{ name = "aws-cdk-lib", specifier = ">=2.180.0" }, { name = "aws-cdk-lib", specifier = ">=2.180.0" },
{ name = "boto3", specifier = ">=1.34,<1.42" }, { name = "boto3", specifier = ">=1.34,<1.42" },
{ name = "constructs", specifier = ">=10.0.0" }, { name = "constructs", specifier = ">=10.0.0" },
{ name = "mlflow", specifier = ">=3.0" }, { name = "mlflow", marker = "extra == 'mlflow'", specifier = ">=3.0" },
{ name = "numpy", specifier = ">=1.26" },
{ name = "pydantic", specifier = ">=2.13.3" }, { name = "pydantic", specifier = ">=2.13.3" },
{ name = "pyyaml", specifier = ">=6.0.3" }, { name = "pyyaml", specifier = ">=6.0.3" },
{ name = "qai-hub", specifier = ">=0.49.0" }, { name = "sagemaker-mlflow", marker = "extra == 'mlflow'", specifier = ">=0.4.0" },
{ name = "sagemaker-mlflow", specifier = ">=0.4.0" },
{ name = "typer", specifier = "==0.25.0" }, { name = "typer", specifier = "==0.25.0" },
] ]
provides-extras = ["mlflow"]
[package.metadata.requires-dev] [package.metadata.requires-dev]
dev = [ dev = [
{ name = "boto3-stubs", extras = ["iam", "s3", "sagemaker"] }, { name = "boto3-stubs", extras = ["iam", "s3", "sagemaker"] },
{ name = "pyright", specifier = ">=1.1.409" }, { name = "pyright", specifier = ">=1.1.409" },
{ name = "pytest", specifier = ">=8.0" },
{ name = "ruff", specifier = ">=0.4" }, { name = "ruff", specifier = ">=0.4" },
{ name = "types-pyyaml" }, { name = "types-pyyaml" },
] ]
@@ -2109,18 +2068,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/a0/f4/c67b0b3f1b9245e8d266f0f112c500d50e5b4e83cb6f3b71b6528104182a/requests-2.34.2-py3-none-any.whl", hash = "sha256:2a0d60c172f83ac6ab31e4554906c0f3b3588d37b5cb939b1c061f4907e278e0", size = 73075, upload-time = "2026-05-14T19:25:26.443Z" }, { url = "https://files.pythonhosted.org/packages/a0/f4/c67b0b3f1b9245e8d266f0f112c500d50e5b4e83cb6f3b71b6528104182a/requests-2.34.2-py3-none-any.whl", hash = "sha256:2a0d60c172f83ac6ab31e4554906c0f3b3588d37b5cb939b1c061f4907e278e0", size = 73075, upload-time = "2026-05-14T19:25:26.443Z" },
] ]
[[package]]
name = "requests-toolbelt"
version = "1.0.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "requests" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f3/61/d7545dafb7ac2230c70d38d31cbfe4cc64f7144dc41f6e4e4b78ecd9f5bb/requests-toolbelt-1.0.0.tar.gz", hash = "sha256:7681a0a3d047012b5bdc0ee37d7f8f07ebe76ab08caeccfc3921ce23c88d5bc6", size = 206888, upload-time = "2023-05-01T04:11:33.229Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/3f/51/d4db610ef29373b879047326cbf6fa98b6c1969d6f6dc423279de2b1be2c/requests_toolbelt-1.0.0-py2.py3-none-any.whl", hash = "sha256:cccfdd665f0a24fcf4726e690f65639d272bb0637b9b92dfd91a5568ccf6bd06", size = 54481, upload-time = "2023-05-01T04:11:28.427Z" },
]
[[package]] [[package]]
name = "rich" name = "rich"
version = "15.0.0" version = "15.0.0"
@@ -2273,15 +2220,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/07/39/338d9219c4e87f3e708f18857ecd24d22a0c3094752393319553096b98af/scipy-1.17.1-cp314-cp314t-win_arm64.whl", hash = "sha256:200e1050faffacc162be6a486a984a0497866ec54149a01270adc8a59b7c7d21", size = 25489165, upload-time = "2026-02-23T00:22:29.563Z" }, { url = "https://files.pythonhosted.org/packages/07/39/338d9219c4e87f3e708f18857ecd24d22a0c3094752393319553096b98af/scipy-1.17.1-cp314-cp314t-win_arm64.whl", hash = "sha256:200e1050faffacc162be6a486a984a0497866ec54149a01270adc8a59b7c7d21", size = 25489165, upload-time = "2026-02-23T00:22:29.563Z" },
] ]
[[package]]
name = "semver"
version = "3.0.4"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/72/d1/d3159231aec234a59dd7d601e9dd9fe96f3afff15efd33c1070019b26132/semver-3.0.4.tar.gz", hash = "sha256:afc7d8c584a5ed0a11033af086e8af226a9c0b206f313e0301f8dd7b6b589602", size = 269730, upload-time = "2025-01-24T13:19:27.617Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a6/24/4d91e05817e92e3a61c8a21e08fd0f390f5301f1c448b137c57c4bc6e543/semver-3.0.4-py3-none-any.whl", hash = "sha256:9c824d87ba7f7ab4a1890799cec8596f15c1241cb473404ea1cb0c55e4b04746", size = 17912, upload-time = "2025-01-24T13:19:24.949Z" },
]
[[package]] [[package]]
name = "shellingham" name = "shellingham"
version = "1.5.4" version = "1.5.4"
@@ -2389,18 +2327,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/32/d5/f9a850d79b0851d1d4ef6456097579a9005b31fea68726a4ae5f2d82ddd9/threadpoolctl-3.6.0-py3-none-any.whl", hash = "sha256:43a0b8fd5a2928500110039e43a5eed8480b918967083ea48dc3ab9f13c4a7fb", size = 18638, upload-time = "2025-03-13T13:49:21.846Z" }, { url = "https://files.pythonhosted.org/packages/32/d5/f9a850d79b0851d1d4ef6456097579a9005b31fea68726a4ae5f2d82ddd9/threadpoolctl-3.6.0-py3-none-any.whl", hash = "sha256:43a0b8fd5a2928500110039e43a5eed8480b918967083ea48dc3ab9f13c4a7fb", size = 18638, upload-time = "2025-03-13T13:49:21.846Z" },
] ]
[[package]]
name = "tqdm"
version = "4.67.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/09/a9/6ba95a270c6f1fbcd8dac228323f2777d886cb206987444e4bce66338dd4/tqdm-4.67.3.tar.gz", hash = "sha256:7d825f03f89244ef73f1d4ce193cb1774a8179fd96f31d7e1dcde62092b960bb", size = 169598, upload-time = "2026-02-03T17:35:53.048Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
]
[[package]] [[package]]
name = "typeguard" name = "typeguard"
version = "2.13.3" version = "2.13.3"