qc-cli

A CLI for Qualcomm's MLOps pipeline — browse and download models from Qualcomm AI Hub, fine-tune them on custom datasets using SageMaker, validate inference, and prepare artifacts for Qualcomm hardware deployment.

Requirements

  • Python 3.13+
  • uv
  • AWS account with credentials configured (aws configure) when using qc-cli infra
  • AWS CDK CLI (npm install -g aws-cdk) when using qc-cli infra setup or qc-cli infra destroy

Installation

git clone <repo>
cd qc-cli
uv sync

Run commands with uv run qc-cli <command> or activate the venv first:

source .venv/bin/activate
qc-cli --help

Quick start

# 1. Create config.yaml in the current directory
qc-cli init

# 2. Edit config.yaml — at minimum set sagemaker.training.image_uri

# 3. Provision AWS infrastructure (S3 bucket + SageMaker IAM role).
#    This is the step that requires the AWS CDK CLI.
qc-cli infra setup

# 4. Upload training data, then submit a SageMaker training job.
qc-cli upload ./my-dataset
qc-cli train start
qc-cli train status

Configuration

qc-cli init writes a config.yaml in the current directory. The fields you must fill in before using the tool:

infra:
  stack_name: qc-cli-mlops-1a2b3c4d5e6f

aws:
  region: us-east-1
  profile: default          # AWS CLI profile name

s3:
  bucket: qc-cli-mlops-1a2b3c4d5e6f-data

sagemaker:
  training:
    image_uri: ""           # ECR URI for your training container
    instance_type: ml.m5.xlarge
    instance_count: 1
    entry_point: null       # Optional: script inside source_dir
    source_dir: null        # Optional: local dir packaged and uploaded automatically
    hyperparameters: {}
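
A pre-flight check on these fields could look roughly like the sketch below. It assumes the YAML has already been parsed into a plain dict; the function name `missing_required_fields` and the choice of which keys count as required are illustrative assumptions, not part of qc-cli.

```python
def missing_required_fields(config: dict) -> list[str]:
    """Return dotted paths of settings that are absent or empty.

    The list of required paths is an assumption made for this sketch.
    """
    required = [
        "infra.stack_name",
        "aws.region",
        "s3.bucket",
        "sagemaker.training.image_uri",
    ]
    missing = []
    for path in required:
        node = config
        # Walk the nested dict one key at a time.
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if node in (None, ""):
            missing.append(path)
    return missing
```

With the freshly generated file shown above, only `sagemaker.training.image_uri` would be reported, since it is still an empty string.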

qc-cli init generates a shared namespace for infra.stack_name and s3.bucket once and writes both values to config.yaml. Keep these values stable for a deployment; changing them points the CLI at different infrastructure.

The CLI isolates both application resources and CDK bootstrap resources. The application CloudFormation stack uses infra.stack_name, the S3 bucket uses the same generated namespace because bucket names are globally unique, and the SageMaker IAM role uses a CloudFormation-generated physical name. CDK bootstrap resources are derived internally from infra.stack_name, including a bootstrap stack named <stack_name>-bootstrap and a matching non-default CDK asset bucket qualifier. qc-cli infra destroy removes the application stack but leaves the CDK bootstrap stack in place; the command prints the retained bootstrap stack name.

hyperparameters is a flat map of values passed to the training container. Valid keys depend on the selected training image and entry point.
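
On the container side, SageMaker exposes hyperparameters as a JSON file of string values at /opt/ml/input/config/hyperparameters.json. The sketch below shows one way a training script might read them; the function name `load_hyperparameters` and the decision to decode numeric and boolean strings are assumptions for illustration, not something qc-cli prescribes.

```python
import json
from pathlib import Path


def load_hyperparameters(path="/opt/ml/input/config/hyperparameters.json"):
    """Read SageMaker hyperparameters and decode values where possible."""
    config_file = Path(path)
    if not config_file.exists():
        # Running outside SageMaker: no hyperparameters were provided.
        return {}
    raw = json.loads(config_file.read_text())
    parsed = {}
    for key, value in raw.items():
        try:
            # "10" -> 10, "0.001" -> 0.001, "true" -> True
            parsed[key] = json.loads(value)
        except (json.JSONDecodeError, TypeError):
            # Plain strings such as "adam" are kept unchanged.
            parsed[key] = value
    return parsed
```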

To provision an MLflow tracking server, set:

mlflow:
  mode: create
  experiment_name: qc-cli-training
  registered_model_name: qc-cli-model
  register_trained_models: true

In create mode, the CLI manages the tracking server name from infra.stack_name; you do not need to set tracking_server_name.

To use an existing MLflow tracking server, set:

mlflow:
  mode: existing
  tracking_server_name: your-tracking-server-name

Install the optional MLflow dependencies before enabling MLflow:

uv sync --extra mlflow

When MLflow is enabled, train start creates an MLflow run for the SageMaker job. train status finalizes that run once the job reaches a terminal state and registers completed model artifacts as pre-release model versions using the prerelease-latest MLflow alias.

To open the managed SageMaker MLflow UI, request a fresh presigned URL:

qc-cli infra mlflow-url --config config.yaml

This works for mode: create and for mode: existing when the existing server is managed by Amazon SageMaker. In create mode, the command uses the CLI-managed tracking server name. In existing mode, it uses mlflow.tracking_server_name. If the existing MLflow server is external to SageMaker, open it with that server's own URL instead.
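
For reference, a presigned URL for a SageMaker-managed tracking server can also be requested directly through boto3. The sketch below assumes the SageMaker client is passed in by the caller; the wrapper name `mlflow_ui_url` is hypothetical, and whether qc-cli uses this exact call internally is not confirmed by this README.

```python
def mlflow_ui_url(sagemaker_client, tracking_server_name, expires_in_seconds=300):
    """Request a short-lived URL for the managed MLflow UI.

    `sagemaker_client` is expected to be a boto3 SageMaker client,
    for example boto3.client("sagemaker", region_name="us-east-1").
    """
    response = sagemaker_client.create_presigned_mlflow_tracking_server_url(
        TrackingServerName=tracking_server_name,
        ExpiresInSeconds=expires_in_seconds,
    )
    return response["AuthorizedUrl"]
```

The URL expires quickly, which is why a fresh one is requested each time rather than stored in config.yaml.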

Commands

init

qc-cli init                  Write config.yaml
qc-cli init --output <path>  Write config to a custom path
qc-cli init --force          Overwrite an existing config file

infra

qc-cli infra setup                         Deploy the CDK stack
qc-cli infra setup --no-bootstrap          Deploy without running CDK bootstrap
qc-cli infra setup --cloudformation-execution-policy <arn>  Set CDK bootstrap execution policy ARN
qc-cli infra status                        Show CDK stack/resource status
qc-cli infra mlflow-url                    Print a presigned MLflow UI URL
qc-cli infra destroy                       Destroy stack, retaining S3 data
qc-cli infra destroy --yes                 Destroy stack without confirmation
qc-cli infra destroy --delete-bucket-data  Destroy stack and delete S3 data

--cloudformation-execution-policy is a one-time CDK bootstrap option, not a config.yaml setting. Pass it on infra setup when you need the CDK bootstrap CloudFormation execution role to use a policy other than the default AdministratorAccess:

qc-cli infra setup --cloudformation-execution-policy arn:aws:iam::aws:policy/PowerUserAccess

upload

qc-cli upload <file>                 Upload a single file to S3
qc-cli upload <dir>                  Upload all files in a directory tree to S3
qc-cli upload <file> --s3-key <key>  Upload a file to a custom S3 key

Uploads use s3.bucket and s3.data_prefix from config.yaml. File uploads default to s3://<bucket>/<data_prefix>/<filename>. Directory uploads are recursive, preserve paths relative to the uploaded directory, and place files under s3://<bucket>/<data_prefix>/.
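
The key layout described above can be sketched as follows. This is an illustration only: the function name `s3_keys_for_upload` is hypothetical, and treating --s3-key as the complete object key (rather than a suffix under data_prefix) is an assumption.

```python
from pathlib import Path


def s3_keys_for_upload(local_path, data_prefix, s3_key=None):
    """Map each local file to the S3 object key it would be uploaded to."""
    root = Path(local_path)
    prefix = data_prefix.strip("/")

    def join(relative):
        # Skip the prefix entirely when it is empty.
        return "/".join(part for part in (prefix, relative) if part)

    if root.is_file():
        # Single file: custom key if given, else <data_prefix>/<filename>.
        return {root: s3_key or join(root.name)}

    # Directory: recurse and keep paths relative to the uploaded directory.
    return {
        path: join(path.relative_to(root).as_posix())
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

For a directory my-dataset containing a.csv and sub/b.csv with a data prefix of data, this yields the keys data/a.csv and data/sub/b.csv.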

train

qc-cli train start              Submit a SageMaker training job
qc-cli train status [job-name]  Show job status; defaults to the last submitted job
qc-cli train list               List recent training jobs
qc-cli train list --limit 3     Show a custom number of recent jobs

train start uses s3://<bucket>/<data_prefix>/ as the training channel and writes outputs under s3://<bucket>/<model_prefix>/. If sagemaker.training.source_dir is set, the CLI packages that directory, uploads it beside the job output prefix, and passes sagemaker_program/sagemaker_submit_directory to the SageMaker container.
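
To make the S3 layout concrete, the sketch below builds a request dict in the shape accepted by boto3's create_training_job. It is not the CLI's actual implementation: the function name, the channel name "training", the volume size, and the runtime limit are all assumptions chosen for illustration.

```python
def build_training_request(job_name, role_arn, bucket, data_prefix, model_prefix, training):
    """Assemble a CreateTrainingJob request from config-style values.

    `training` mirrors the sagemaker.training section of config.yaml.
    """
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": training["image_uri"],
            "TrainingInputMode": "File",
        },
        "InputDataConfig": [
            {
                "ChannelName": "training",  # assumed channel name
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": f"s3://{bucket}/{data_prefix}/",
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/{model_prefix}/"},
        "ResourceConfig": {
            "InstanceType": training["instance_type"],
            "InstanceCount": training["instance_count"],
            "VolumeSizeInGB": 30,  # assumed value
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # assumed value
        # SageMaker requires hyperparameter values to be strings.
        "HyperParameters": {
            key: str(value)
            for key, value in (training.get("hyperparameters") or {}).items()
        },
    }
```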

The expected output artifact is SageMaker's model.tar.gz, normally containing the trained model file your container writes to /opt/ml/model.
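
After downloading the artifact, its contents can be checked with the standard library. The helper name `list_model_artifact` below is hypothetical and the snippet is only a quick inspection aid, not part of qc-cli.

```python
import tarfile


def list_model_artifact(archive_path):
    """Return the sorted names of regular files inside a model.tar.gz."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(
            member.name for member in archive.getmembers() if member.isfile()
        )
```

An empty list usually means the training container wrote nothing to /opt/ml/model.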

AWS permissions required

The IAM user or role running the CLI needs:

Service         Actions
S3              CreateBucket, DeleteBucket, PutObject, GetObject, ListBucket, DeleteObject
IAM             CreateRole, GetRole, DeleteRole, AttachRolePolicy, DetachRolePolicy
CloudFormation  CreateStack, UpdateStack, DeleteStack, DescribeStacks, DescribeStackEvents
STS             GetCallerIdentity
SageMaker AI    CreateTrainingJob, DescribeTrainingJob, ListTrainingJobs
SageMaker AI    CreateMlflowTrackingServer, DescribeMlflowTrackingServer, DeleteMlflowTrackingServer (when mlflow.mode is create or existing)

AdministratorAccess covers all of the above.
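
For a narrower grant than AdministratorAccess, the actions listed above can be turned into an IAM policy document. The sketch below is a starting point under stated assumptions: the function name `build_policy` is hypothetical, Resource is left as "*" for brevity, and the listed actions are taken from this README only, so a real deployment may need more.

```python
import json

# Actions from the permissions list, keyed by IAM service prefix.
REQUIRED_ACTIONS = {
    "s3": ["CreateBucket", "DeleteBucket", "PutObject", "GetObject",
           "ListBucket", "DeleteObject"],
    "iam": ["CreateRole", "GetRole", "DeleteRole", "AttachRolePolicy",
            "DetachRolePolicy"],
    "cloudformation": ["CreateStack", "UpdateStack", "DeleteStack",
                       "DescribeStacks", "DescribeStackEvents"],
    "sts": ["GetCallerIdentity"],
    "sagemaker": ["CreateTrainingJob", "DescribeTrainingJob", "ListTrainingJobs"],
}

# Only needed when mlflow.mode is create or existing.
MLFLOW_ACTIONS = ["CreateMlflowTrackingServer", "DescribeMlflowTrackingServer",
                  "DeleteMlflowTrackingServer"]


def build_policy(include_mlflow=False):
    """Return an IAM policy document allowing the listed actions."""
    actions = [
        f"{service}:{name}"
        for service, names in REQUIRED_ACTIONS.items()
        for name in names
    ]
    if include_mlflow:
        actions += [f"sagemaker:{name}" for name in MLFLOW_ACTIONS]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Resource "*" is an assumption; scope it down where possible.
            {"Effect": "Allow", "Action": actions, "Resource": "*"}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(include_mlflow=True), indent=2))
```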