Files
qai-cli/README.md
2026-05-25 16:48:31 -04:00

145 lines
4.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# qc-cli
A CLI for Qualcomm's MLOps pipeline — browse and download models from Qualcomm AI Hub, fine-tune them on custom datasets using SageMaker, validate inference, and prepare artifacts for Qualcomm hardware deployment.
## Requirements
- Python 3.13+
- [uv](https://docs.astral.sh/uv/getting-started/installation/)
- AWS account with credentials configured (`aws configure`) when using `qc-cli infra`
- AWS CDK CLI (`npm install -g aws-cdk`) when using `qc-cli infra setup` or `qc-cli infra destroy`
## Installation
```bash
git clone <repo>
cd qc-cli
uv sync
```
Run commands with `uv run qc-cli <command>` or activate the venv first:
```bash
source .venv/bin/activate
qc-cli --help
```
## Quick start
```bash
# 1. Create config.yaml in the current directory
qc-cli init
# 2. Edit config.yaml — at minimum set s3.bucket and sagemaker.training.image_uri
# 3. Provision AWS infrastructure (S3 bucket + SageMaker IAM role).
# This is the step that requires the AWS CDK CLI.
qc-cli infra setup
# 4. Upload training data, then submit a SageMaker training job.
qc-cli upload ./my-dataset
qc-cli train start
qc-cli train status
```
## Configuration
`qc-cli init` writes a `config.yaml` in the current directory. The fields you must fill in before using the tool:
```yaml
aws:
region: us-east-1
profile: default # AWS CLI profile name
s3:
bucket: your-unique-bucket-name
sagemaker:
role_name: qc-cli-sagemaker-role
training:
image_uri: "" # ECR URI for your training container
instance_type: ml.m5.xlarge
instance_count: 1
entry_point: null # Optional: script inside source_dir
source_dir: null # Optional: local dir packaged and uploaded automatically
hyperparameters: {}
```
`hyperparameters` is a flat map of values passed to the training container. Valid keys depend on the selected training image and entry point.
To provision an MLflow tracking server, set:
```yaml
mlflow:
mode: create
tracking_server_name: your-tracking-server-name
```
To use an existing MLflow tracking server, set:
```yaml
mlflow:
mode: existing
tracking_server_name: your-tracking-server-name
```
## Commands
### `init`
```
qc-cli init Write config.yaml
qc-cli init --output <path> Write config to a custom path
qc-cli init --force Overwrite an existing config file
```
### `infra`
```
qc-cli infra setup Deploy the CDK stack
qc-cli infra setup --no-bootstrap Deploy without running CDK bootstrap
qc-cli infra setup --cloudformation-execution-policy <arn> Set CDK bootstrap execution policy ARN
qc-cli infra status Show CDK stack/resource status
qc-cli infra destroy Destroy stack, retaining S3 data
qc-cli infra destroy --yes Destroy stack without confirmation
qc-cli infra destroy --delete-bucket-data Destroy stack and delete S3 data
```
### `upload`
```
qc-cli upload <file> Upload a single file to S3
qc-cli upload <dir> Upload all files in a directory tree to S3
qc-cli upload <file> --s3-key <key> Upload a file to a custom S3 key
```
Uploads use `s3.bucket` and `s3.data_prefix` from `config.yaml`. File uploads default to `s3://<bucket>/<data_prefix>/<filename>`. Directory uploads are recursive, preserve paths relative to the uploaded directory, and place files under `s3://<bucket>/<data_prefix>/`.
### `train`
```
qc-cli train start Submit a SageMaker training job
qc-cli train status [job-name] Show job status; defaults to the last submitted job
qc-cli train list List recent training jobs
qc-cli train list --limit 3 Show a custom number of recent jobs
```
`train start` uses `s3://<bucket>/<data_prefix>/` as the training channel and writes outputs under `s3://<bucket>/<model_prefix>/`. If `sagemaker.training.source_dir` is set, the CLI packages that directory, uploads it beside the job output prefix, and passes `sagemaker_program`/`sagemaker_submit_directory` to the SageMaker container.
The expected output artifact is SageMakers `model.tar.gz`, normally containing the trained model file your container writes to `/opt/ml/model`.
## AWS permissions required
The IAM user or role running the CLI needs:
| Action | Service |
|---|---|
| CreateBucket, DeleteBucket, PutObject, GetObject, ListBucket, DeleteObject | S3 |
| CreateRole, GetRole, DeleteRole, AttachRolePolicy, DetachRolePolicy | IAM |
| CreateStack, UpdateStack, DeleteStack, DescribeStacks, DescribeStackEvents | CloudFormation |
| GetCallerIdentity | STS |
| CreateTrainingJob, DescribeTrainingJob, ListTrainingJobs | SageMaker AI |
| CreateMlflowTrackingServer, DescribeMlflowTrackingServer, DeleteMlflowTrackingServer | SageMaker AI, when `mlflow.mode` is `create` or `existing` |
`AdministratorAccess` covers all of the above.