2026-06-12 12:21:41 -04:00
parent 5211d0af14
commit 4c33a016f0
4 changed files with 61 additions and 55 deletions

@@ -105,7 +105,11 @@ mlflow:
tracking_server_name: your-tracking-server-name
```
When MLflow is enabled, `train start` creates an MLflow run for the SageMaker job. `train status` finalizes that run once the job reaches a terminal state and registers completed model artifacts as experiment model versions using the `experiment-latest` MLflow alias. An experiment version is an immutable trained-source artifact; it records that training produced a model, not that the model is better than earlier versions or ready for release.
When MLflow is enabled, `train start` creates an MLflow run for the SageMaker job. Metric upload through
`train start --upload-metrics` or `mlflow upload-metrics` finalizes that run and registers completed model artifacts
as experiment model versions using the `experiment-latest` MLflow alias. `train status` reads SageMaker status only.
An experiment version is an immutable trained-source artifact; it records that training produced a model, not that
the model is better than earlier versions or ready for release.
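The split of responsibilities described above can be modeled roughly as follows. This is only an illustrative sketch, not the CLI's implementation: `Job`, `train_status`, and `upload_metrics` are hypothetical names, and the sketch assumes the upload workflow acts only on jobs whose SageMaker status is `Completed` (how failed or stopped jobs are handled is not modeled here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical stand-in for a SageMaker job plus its MLflow run state."""
    name: str
    status: str = "InProgress"          # SageMaker training job status
    run_finalized: bool = False         # whether the MLflow run was finalized
    registered_alias: Optional[str] = None


def train_status(job: Job) -> str:
    """Model of `train status`: reads SageMaker status, never mutates MLflow state."""
    return job.status


def upload_metrics(job: Job, register_trained_models: bool = True) -> bool:
    """Model of the metrics upload workflow.

    Assumption: nothing happens until the job is Completed. Returns True
    when the run was finalized by this call or by an earlier one.
    """
    if job.status != "Completed":
        return False
    job.run_finalized = True
    if register_trained_models:
        # Registration moves the alias to the newest experiment version.
        job.registered_alias = "experiment-latest"
    return True


job = Job("demo-job")
print(train_status(job), upload_metrics(job))   # status check and upload while running
job.status = "Completed"
print(train_status(job), job.run_finalized)     # status check alone changes nothing
print(upload_metrics(job), job.registered_alias)
```

In this model, calling `train_status` any number of times leaves the run untouched; only `upload_metrics` finalizes it and, when registration is enabled, sets the alias.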
To open the managed SageMaker MLflow UI, request a fresh presigned URL:
@@ -224,10 +228,10 @@ The CLI uses neutral experiment naming for trained artifacts and reserves releas
Current behavior:
1. `qc-cli train start` submits a SageMaker training job.
2. `qc-cli train status` finalizes the MLflow run and registers completed model artifacts.
2. `qc-cli train status` reads and displays SageMaker status only; it does not contact MLflow.
3. `qc-cli train start --upload-metrics` polls every 30 seconds by default, then uploads per-epoch metrics after completion.
4. `qc-cli mlflow upload-metrics [job-name]` uploads or retries metrics for an existing completed job.
5. If the job completed and `mlflow.register_trained_models` is enabled, the SageMaker `model.tar.gz` is registered as a new MLflow model version with:
5. The metrics upload workflow finalizes the MLflow run and, when `mlflow.register_trained_models` is enabled, registers the SageMaker `model.tar.gz` as a new MLflow model version with:
- `qc_cli.stage=experiment`
- `qc_cli.artifact_kind=trained_source`
- `qc_cli.source=sagemaker`