run_ml_job - QuantSpace

Overview

run_ml_job is Stage 3a of the pipeline (alternative to run_dl_job). It triggers the ml-worker-job pipeline job, which trains a classical machine learning model using scikit-learn on the feature-engineered dataset and generates per-ticker price predictions. The tool blocks until the job completes and returns the output blob URL.

Parameters

feature_url

string

required

Blob storage URL pointing to a feature_engine_*.json file. This is the output_url returned by run_feature_worker.

data_extractor_url

string

required

Blob storage URL pointing to a data_extractor_*.json file. This is the output_url returned by run_data_extraction. It is required alongside feature_url so that train/test dates align correctly.

config

object

required

ML model configuration.

ML params

object

required

Model selection and cross-validation settings.

model

string

required

Model type. Accepted values (case-insensitive, spaces/dashes normalized):

Value	Algorithm
`"random_forest"` / `"randomforest"` / `"rf"`	Random Forest
`"xgboost"` / `"xgb"`	XGBoost
`"lightgbm"` / `"lgbm"`	LightGBM
`"catboost"`	CatBoost
`"linear"` / `"linear_regression"`	Linear Regression
`"ridge"`	Ridge Regression
`"lasso"`	Lasso Regression
`"elastic_net"` / `"elasticnet"`	Elastic Net
`"quantile"`	Quantile Regression

Unknown values fall back to "random_forest".
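
The lookup described above can be sketched client-side as follows. This is a minimal illustration, not the worker's actual code: the helper name `normalize_model_name`, the choice of canonical names, and the exact normalization rule (runs of spaces or dashes become a single underscore) are assumptions. Only the alias list and the `random_forest` fallback come from the table above.

```python
import re

# Alias table transcribed from the accepted values listed above.
# The canonical name on the right-hand side is an assumption.
_MODEL_ALIASES = {
    "random_forest": "random_forest",
    "randomforest": "random_forest",
    "rf": "random_forest",
    "xgboost": "xgboost",
    "xgb": "xgboost",
    "lightgbm": "lightgbm",
    "lgbm": "lightgbm",
    "catboost": "catboost",
    "linear": "linear",
    "linear_regression": "linear",
    "ridge": "ridge",
    "lasso": "lasso",
    "elastic_net": "elastic_net",
    "elasticnet": "elastic_net",
    "quantile": "quantile",
}


def normalize_model_name(raw: str) -> str:
    """Hypothetical helper: lower-case, map spaces/dashes to underscores, look up."""
    key = re.sub(r"[\s\-]+", "_", raw.strip().lower())
    # Unknown values fall back to random_forest, as documented.
    return _MODEL_ALIASES.get(key, "random_forest")
```

Because unknown values fall back silently, a misspelled model name would train a Random Forest rather than raise an error, so checking the name before submitting the job may be worthwhile.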

n_estimators

integer

Number of estimators for tree-based models (Random Forest, XGBoost, LightGBM, CatBoost). Default: 100.

test_size

integer

Number of samples in the test fold for cross-validation. Pass an integer (e.g. 15 = 15 days). Default: 15.

If you pass a float less than 1 (e.g. 0.2), it is automatically converted to the default 15. Always use an integer.
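
The coercion rule above might look like the sketch below. The function name `coerce_test_size` is hypothetical, and the handling of floats of 1 or more (truncated to an integer here) is an assumption, since only the behavior for floats below 1 is documented.

```python
DEFAULT_TEST_SIZE = 15


def coerce_test_size(value=None) -> int:
    """Hypothetical sketch of the documented test_size handling."""
    if value is None:
        return DEFAULT_TEST_SIZE
    # Documented: a float below 1 (e.g. 0.2) is replaced by the default,
    # so it is NOT interpreted as a 20% hold-out fraction.
    if isinstance(value, float) and value < 1:
        return DEFAULT_TEST_SIZE
    # Assumption: any other numeric value is truncated to an integer.
    return int(value)
```

For example, `coerce_test_size(0.2)` yields 15 rather than a fractional split, which is why an integer should always be passed.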

n_splits

integer

Number of cross-validation splits (time-series walk-forward). Default: 15.
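
To make the interaction between n_splits and test_size concrete, here is a pure-Python sketch of an expanding-window walk-forward split. It assumes the worker lays out folds the way scikit-learn's `TimeSeriesSplit` does when given a fixed `test_size`. The worker's actual splitter is not documented here, and `walk_forward_splits` is a hypothetical name.

```python
def walk_forward_splits(n_samples: int, n_splits: int = 15, test_size: int = 15):
    """Yield (train_indices, test_indices) ranges for walk-forward CV.

    Each test fold is the next `test_size` samples in time order; the
    training window is everything that precedes it.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError(
            f"need more than {n_splits * test_size} samples, got {n_samples}"
        )
    for test_start in range(first_test_start, n_samples, test_size):
        yield range(0, test_start), range(test_start, test_start + test_size)
```

Under this assumption, the defaults (15 splits of 15 samples) require more than 225 samples per series, so shorter histories would need smaller values for one or both parameters.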

Returns

{
  "status": "Succeeded",
  "output_url": "https://stmcpfabricdev.blob.core.windows.net/data/ml_engine_20260307_125841.json",
  "output_name": "ml_engine_20260307_125841.json",
  "execution_name": "ml-worker-job-abc123xyz"
}

Field	Description
`status`	Job terminal status (`Succeeded`)
`output_url`	Full HTTPS URL to the output blob — pass to `run_po_job`
`output_name`	Blob filename
`execution_name`	Job execution ID for audit/debugging
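
A caller may want to check the response before using it. The sketch below assumes the result arrives as a Python dict with the fields above; `extract_output_url` is a hypothetical helper, and non-success status values are not documented here, so anything other than `Succeeded` is treated as a failure.

```python
def extract_output_url(result: dict) -> str:
    """Return output_url if the job succeeded, else raise with audit details."""
    status = result.get("status")
    if status != "Succeeded":
        # execution_name is included to help look up the run when debugging.
        raise RuntimeError(
            f"run_ml_job ended with status {status!r} "
            f"(execution: {result.get('execution_name')!r})"
        )
    return result["output_url"]
```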

Example

{
  "feature_url": "https://stmcpfabricdev.blob.core.windows.net/data/feature_engine_20260307_125103.json",
  "data_extractor_url": "https://stmcpfabricdev.blob.core.windows.net/data/data_extractor_20260307_124937.json",
  "config": {
    "ML params": {
      "model": "random_forest",
      "n_estimators": 100,
      "test_size": 15,
      "n_splits": 15
    }
  }
}

Resources

Resource	Value
Container Apps Job	`ml-worker-job`
Container name	`ml-worker-job`
Env vars injected	`FEATURE_URL`, `DATA_EXTRACTOR_URL`, `CONFIG`
Output blob prefix	`ml_engine_`
Timeout	600 seconds
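
For orientation, a worker container could read the injected environment variables roughly as follows. This sketch assumes `CONFIG` carries the config object as a JSON-encoded string, which the table above does not state. `read_job_inputs` is a hypothetical name.

```python
import json
import os


def read_job_inputs(env=os.environ) -> dict:
    """Collect the three injected variables into one dict."""
    return {
        "feature_url": env["FEATURE_URL"],
        "data_extractor_url": env["DATA_EXTRACTOR_URL"],
        # Assumption: CONFIG is JSON text, e.g. '{"ML params": {...}}'.
        "config": json.loads(env["CONFIG"]),
    }
```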

Next Step

Pass output_url to run_po_job as input_url.

If you prefer a neural network approach, use run_dl_job instead — it produces an nn_engine_*.json blob that is equally compatible with run_po_job.
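
The hand-off to the next stage can be sketched as below. Only the `input_url` argument is documented on this page, so any other run_po_job parameters are out of scope here. `build_po_job_args` is a hypothetical helper, and the prefix check is an optional safeguard rather than something the pipeline is known to enforce.

```python
# Blob prefixes produced by run_ml_job and run_dl_job respectively.
_STAGE3_PREFIXES = ("ml_engine_", "nn_engine_")


def build_po_job_args(stage3_result: dict) -> dict:
    """Map a Stage 3 result onto the input_url argument of run_po_job."""
    name = stage3_result["output_name"]
    if not name.startswith(_STAGE3_PREFIXES):
        raise ValueError(f"unexpected Stage 3 output blob: {name!r}")
    return {"input_url": stage3_result["output_url"]}
```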
