# qc-cli

A CLI for Qualcomm's MLOps pipeline: browse and download models from Qualcomm AI Hub, fine-tune them on custom datasets using SageMaker, validate inference, and prepare artifacts for Qualcomm hardware deployment.

## Requirements

- Python 3.13+
- [uv](https://docs.astral.sh/uv/getting-started/installation/)
- AWS account with credentials configured (`aws configure`) when using the AWS-backed commands (`qc-cli infra`, `qc-cli upload`, `qc-cli train`)
- AWS CDK CLI (`npm install -g aws-cdk`) when using `qc-cli infra setup` or `qc-cli infra destroy`

## Installation

```bash
git clone
cd qc-cli
uv sync
```

Run commands with `uv run qc-cli ` or activate the venv first:

```bash
source .venv/bin/activate
qc-cli --help
```

## Quick start

```bash
# 1. Create config.yaml in the current directory
qc-cli init

# 2. Edit config.yaml: at minimum, set sagemaker.training.image_uri

# 3. Provision AWS infrastructure (S3 bucket + SageMaker IAM role).
#    This is the step that requires the AWS CDK CLI.
qc-cli infra setup

# 4. Upload training data, submit a SageMaker training job, and check its status.
qc-cli upload ./my-dataset
qc-cli train start
qc-cli train status
```

## Configuration

`qc-cli init` writes a `config.yaml` in the current directory. Review these fields before using the tool; at minimum, set `sagemaker.training.image_uri`:

```yaml
infra:
  stack_name: qc-cli-mlops-1a2b3c4d5e6f

aws:
  region: us-east-1
  profile: default          # AWS CLI profile name

s3:
  bucket: qc-cli-mlops-1a2b3c4d5e6f-data

sagemaker:
  training:
    image_uri: ""           # ECR URI for your training container
    instance_type: ml.m5.xlarge
    instance_count: 1
    entry_point: null       # Optional: script inside source_dir
    source_dir: null        # Optional: local dir packaged and uploaded automatically
    hyperparameters: {}
```

`qc-cli init` generates a unique namespace once and writes it into both `infra.stack_name` and `s3.bucket` in `config.yaml`. Keep these values stable for a deployment; changing them points the CLI at different infrastructure. The CLI isolates both the application resources and the CDK bootstrap resources of each deployment.
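The namespace scheme can be sketched in a few lines of Python. This is an illustration under assumptions, not qc-cli's own code: the 12-hex-character random suffix and the `-data` bucket suffix are inferred from the example values above, and the helper names are hypothetical. The checks encode AWS's CloudFormation stack-name rule and a simplified form of the S3 bucket-name rules.

```python
import re
import secrets

# Hypothetical prefix, taken from the example config values.
NAMESPACE_PREFIX = "qc-cli-mlops"

# CloudFormation stack names: start with a letter, then letters, digits, or hyphens; at most 128 chars.
STACK_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]{0,127}$")
# Simplified S3 bucket rule: 3-63 chars of lowercase letters, digits, dots, or hyphens,
# beginning and ending with a letter or digit (other S3 restrictions are not checked).
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def new_namespace() -> str:
    """Generate a namespace once, e.g. 'qc-cli-mlops-1a2b3c4d5e6f' (assumed format)."""
    # token_hex(6) yields 6 random bytes as 12 lowercase hex characters.
    return f"{NAMESPACE_PREFIX}-{secrets.token_hex(6)}"


def derive_names(namespace: str) -> dict[str, str]:
    """Derive the config values that share one namespace and validate them."""
    stack_name = namespace
    bucket = f"{namespace}-data"  # assumed suffix, matching the example config
    if not STACK_NAME_RE.fullmatch(stack_name):
        raise ValueError(f"invalid CloudFormation stack name: {stack_name!r}")
    if not BUCKET_RE.fullmatch(bucket):
        raise ValueError(f"invalid S3 bucket name: {bucket!r}")
    return {"infra.stack_name": stack_name, "s3.bucket": bucket}
```

Generating the namespace once and persisting it is what keeps later runs pointed at the same stack and bucket; regenerating it would produce names for a different deployment.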
The application CloudFormation stack uses `infra.stack_name`; the S3 bucket reuses the generated namespace, since S3 bucket names must be globally unique; and the SageMaker IAM role uses a CloudFormation-generated physical name. CDK bootstrap resources are derived internally from `infra.stack_name`, including a bootstrap stack named `-bootstrap` and a matching non-default CDK asset bucket qualifier. `qc-cli infra destroy` removes the application stack but leaves the CDK bootstrap stack in place; the command prints the retained bootstrap stack name.

`hyperparameters` is a flat map of values passed to the training container. Valid keys depend on the selected training image and entry point.

To provision an MLflow tracking server, set:

```yaml
mlflow:
  mode: create
  experiment_name: qc-cli-training
  registered_model_name: qc-cli-model
  register_trained_models: true
```

In `create` mode, the CLI derives and manages the tracking server name from `infra.stack_name`; you do not need to set `tracking_server_name`.

To use an existing MLflow tracking server, set:

```yaml
mlflow:
  mode: existing
  tracking_server_name: your-tracking-server-name
```

When MLflow is enabled, `train start` creates an MLflow run for the SageMaker job. `train status` finalizes that run once the job reaches a terminal state. If the job completed and `mlflow.register_trained_models` is enabled, it registers the model artifact as a new experiment model version and points the `experiment-latest` MLflow alias at it. An experiment version is an immutable trained-source artifact; it records that training produced a model, not that the model is better than earlier versions or ready for release.

To open the managed SageMaker MLflow UI, request a fresh presigned URL:

```bash
qc-cli infra mlflow-url --config config.yaml
```

This works for `mode: create`, and for `mode: existing` when the existing server is managed by Amazon SageMaker. In `create` mode, the command uses the CLI-managed tracking server name. In `existing` mode, it uses `mlflow.tracking_server_name`.
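The server-name rules for `infra mlflow-url` reduce to a small lookup on `mlflow.mode`. A minimal sketch, assuming the config has already been parsed into a dict; `managed_name` stands in for the CLI-managed name derived from `infra.stack_name` (its exact format is not documented here), and the function name is hypothetical. The resolved name is what SageMaker's `CreatePresignedMlflowTrackingServerUrl` API takes as `TrackingServerName`.

```python
from typing import Any


def tracking_server_for_url(mlflow_cfg: dict[str, Any], managed_name: str) -> str:
    """Pick the tracking server name a presigned-URL request would use (hypothetical helper).

    managed_name: placeholder for the CLI-managed name derived from infra.stack_name.
    """
    mode = mlflow_cfg.get("mode")
    if mode == "create":
        # The CLI owns the server, so tracking_server_name is not consulted.
        return managed_name
    if mode == "existing":
        name = mlflow_cfg.get("tracking_server_name")
        if not name:
            raise ValueError("mlflow.tracking_server_name is required when mode is 'existing'")
        # Only meaningful if this server is managed by SageMaker.
        return name
    raise ValueError(f"MLflow is not configured for a presigned URL (mode={mode!r})")
```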
If the existing MLflow server is external to SageMaker, open it with that server's own URL instead.

## Commands

### `init`

```
qc-cli init                                          Write config.yaml
qc-cli init --output                                 Write config to a custom path
qc-cli init --force                                  Overwrite an existing config file
```

### `infra`

```
qc-cli infra setup                                   Deploy the CDK stack
qc-cli infra setup --no-bootstrap                    Deploy without running CDK bootstrap
qc-cli infra setup --cloudformation-execution-policy Set CDK bootstrap execution policy ARN
qc-cli infra status                                  Show CDK stack/resource status
qc-cli infra mlflow-url                              Print a presigned MLflow UI URL
qc-cli infra destroy                                 Destroy stack, retaining S3 data
qc-cli infra destroy --yes                           Destroy stack without confirmation
qc-cli infra destroy --delete-bucket-data            Destroy stack and delete S3 data
```

`--cloudformation-execution-policy` is a one-time CDK bootstrap option, not a `config.yaml` setting. Pass it on `infra setup` when you need the CDK bootstrap CloudFormation execution role to use a policy other than the default `AdministratorAccess`:

```bash
qc-cli infra setup --cloudformation-execution-policy arn:aws:iam::aws:policy/PowerUserAccess
```

The chosen policy must still allow everything the stack creates, including the SageMaker IAM role; the AWS managed `PowerUserAccess` policy, for example, does not grant `iam:CreateRole`.

### `upload`

```
qc-cli upload                                        Upload a single file to S3
qc-cli upload                                        Upload all files in a directory tree to S3
qc-cli upload --s3-key                               Upload a file to a custom S3 key
```

Uploads use `s3.bucket` and `s3.data_prefix` from `config.yaml`. File uploads default to `s3:////`. Directory uploads are recursive, preserve paths relative to the uploaded directory, and place files under `s3:////`.

### `train`

```
qc-cli train start                                   Submit a SageMaker training job
qc-cli train status [job-name]                       Show job status; defaults to the last submitted job
qc-cli train list                                    List recent training jobs
qc-cli train list --limit 3                          Show a custom number of recent jobs
```

`train start` uses `s3:////` as the training channel and writes outputs under `s3:////`.
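A container used with `train start` follows SageMaker's standard training layout: hyperparameters arrive as strings in `/opt/ml/input/config/hyperparameters.json`, each input channel is mounted under `/opt/ml/input/data/` in a directory named after the channel, and whatever the script writes to `/opt/ml/model` is packaged into `model.tar.gz`. The toy entry point below is a sketch of that contract, not qc-cli code: the channel name `training`, the output file `model.json`, and the function names are hypothetical. It decodes each hyperparameter as JSON when possible, because values may be JSON-encoded or plain strings depending on how the job was submitted.

```python
import json
from pathlib import Path
from typing import Any

# Standard SageMaker training-container root; a parameter so the sketch can run locally.
DEFAULT_ML_ROOT = Path("/opt/ml")


def load_hyperparameters(ml_root: Path = DEFAULT_ML_ROOT) -> dict[str, Any]:
    """Read hyperparameters.json and decode values such as "3" or "0.001" where possible."""
    path = ml_root / "input" / "config" / "hyperparameters.json"
    if not path.exists():
        return {}
    raw: dict[str, Any] = json.loads(path.read_text())
    parsed: dict[str, Any] = {}
    for key, value in raw.items():
        if not isinstance(value, str):
            parsed[key] = value  # already typed; nothing to decode
            continue
        try:
            # "3" -> 3, "true" -> True, "\"train.py\"" -> "train.py"
            parsed[key] = json.loads(value)
        except json.JSONDecodeError:
            # Plain strings such as "adam" are not valid JSON; keep them as-is.
            parsed[key] = value
    return parsed


def train(ml_root: Path = DEFAULT_ML_ROOT, channel: str = "training") -> Path:
    """Toy training step: count input files and write a stand-in 'model' to <ml_root>/model."""
    hparams = load_hyperparameters(ml_root)
    data_dir = ml_root / "input" / "data" / channel  # channel name is an assumption
    files = sorted(p for p in data_dir.rglob("*") if p.is_file()) if data_dir.exists() else []
    model_dir = ml_root / "model"  # SageMaker archives this directory as model.tar.gz
    model_dir.mkdir(parents=True, exist_ok=True)
    model_path = model_dir / "model.json"  # hypothetical artifact name
    model_path.write_text(json.dumps({"num_files": len(files), "hyperparameters": hparams}))
    return model_path


if __name__ == "__main__":
    print(f"wrote {train()}")
```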
If `sagemaker.training.source_dir` is set, the CLI packages that directory, uploads it beside the job output prefix, and passes `sagemaker_program`/`sagemaker_submit_directory` to the SageMaker container. The expected output artifact is SageMaker's `model.tar.gz`, normally containing the trained model file your container writes to `/opt/ml/model`.

## Model lifecycle

The CLI uses neutral experiment naming for trained artifacts and reserves release terminology for an explicit promotion step.

Current behavior:

1. `qc-cli train start` submits a SageMaker training job.
2. `qc-cli train status` finalizes the MLflow run after the job reaches a terminal state.
3. If the job completed and `mlflow.register_trained_models` is enabled, the SageMaker `model.tar.gz` is registered as a new MLflow model version with:
   - `qc_cli.stage=experiment`
   - `qc_cli.artifact_kind=trained_source`
   - `qc_cli.source=sagemaker`
4. The MLflow alias `experiment-latest` points at the most recently registered experiment version.

Planned AI Hub extension:

1. AI Hub compile or quantize jobs will create deployable derived artifacts from a trained-source experiment.
2. Derived artifacts will keep lineage back to the source experiment version instead of replacing it.
3. Release aliases such as `v1` or `production` will point at the selected deployable artifact.

Example future metadata:

```text
qc-cli-model version 12
  qc_cli.stage=experiment
  qc_cli.artifact_kind=trained_source
  qc_cli.source=sagemaker

qc-cli-model-aihub version 3
  qc_cli.stage=ai_hub_compiled
  qc_cli.artifact_kind=deployable
  qc_cli.parent_registered_model_name=qc-cli-model
  qc_cli.parent_model_version=12
  qc_cli.runtime=tflite
  qc_cli.quantization=int8
  qc_cli.target_device=Samsung Galaxy S25
```

In that flow, `experiment-latest` remains a training convenience alias. Release selection is a separate promotion decision based on the derived artifact, not on the experiment name.
## AWS permissions required

The IAM user or role running the CLI needs:

| Actions | Service |
|---|---|
| CreateBucket, DeleteBucket, PutObject, GetObject, ListBucket, DeleteObject | S3 |
| CreateRole, GetRole, DeleteRole, AttachRolePolicy, DetachRolePolicy | IAM |
| CreateStack, UpdateStack, DeleteStack, DescribeStacks, DescribeStackEvents | CloudFormation |
| GetCallerIdentity | STS |
| CreateTrainingJob, DescribeTrainingJob, ListTrainingJobs | SageMaker AI |
| CreateMlflowTrackingServer, DescribeMlflowTrackingServer, DeleteMlflowTrackingServer | SageMaker AI, when `mlflow.mode` is `create` or `existing` |

`AdministratorAccess` covers all of the above.
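To hand the table to an administrator as an IAM policy document, it can be transcribed mechanically. This is a sketch, not a verified least-privilege policy: it copies only the actions listed above, uses `"Resource": "*"`, and has not been checked against every API call the CLI or CDK makes, so treat the output as a starting point. The `build_policy` name and the `mlflow_enabled` switch are hypothetical.

```python
import json

# Actions transcribed from the table, grouped by IAM service prefix.
BASE_ACTIONS = {
    "s3": ["CreateBucket", "DeleteBucket", "PutObject", "GetObject", "ListBucket", "DeleteObject"],
    "iam": ["CreateRole", "GetRole", "DeleteRole", "AttachRolePolicy", "DetachRolePolicy"],
    "cloudformation": ["CreateStack", "UpdateStack", "DeleteStack", "DescribeStacks", "DescribeStackEvents"],
    "sts": ["GetCallerIdentity"],
    "sagemaker": ["CreateTrainingJob", "DescribeTrainingJob", "ListTrainingJobs"],
}
# Only needed when mlflow.mode is set.
MLFLOW_ACTIONS = ["CreateMlflowTrackingServer", "DescribeMlflowTrackingServer", "DeleteMlflowTrackingServer"]


def build_policy(mlflow_enabled: bool) -> dict[str, object]:
    """Return an IAM policy document allowing the table's actions."""
    actions = [f"{svc}:{name}" for svc, names in BASE_ACTIONS.items() for name in names]
    if mlflow_enabled:
        actions += [f"sagemaker:{name}" for name in MLFLOW_ACTIONS]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                # Broad for the sketch; scope to your deployment's resources in practice.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(mlflow_enabled=True), indent=2))
```

Redirecting the output to a file gives a document that can be passed to `aws iam create-policy --policy-document file://...`.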