Pipeline Overview

QuantSpace orchestrates a 5-stage core ML trading pipeline plus 3 reporting/analytics tools. Each tool runs as a container job, writes output to blob storage, and returns an output_url.
Stages 3a and 3b are interchangeable — both run_ml_job and run_dl_job produce a URL compatible with run_po_job. run_plot_job can visualize outputs from multiple stages. run_st_job and run_risk_job consume portfolio optimization output.
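The chaining described above can be sketched in Python. This is an illustrative mock, not the real client: run_stage is a hypothetical stand-in that returns a result in the documented { status, output_url } shape, so the wiring between stages is the only thing shown.

```python
# Illustrative sketch of chaining pipeline stages. run_stage is a
# hypothetical stand-in for a real tool invocation; it fabricates an
# output_url with the documented prefix for that stage.
def run_stage(tool_name, **params):
    prefixes = {
        "run_data_extraction": "data_extractor_",
        "run_feature_worker": "feature_engine_",
        "run_ml_job": "ml_engine_",
        "run_po_job": "portfolio_optimization_",
        "run_trading_job": "trading_report_",
    }
    prefix = prefixes[tool_name]
    return {
        "status": "Succeeded",
        "output_url": f"https://storage.example/data/{prefix}20260307_124937.json",
    }

# Each stage consumes the output_url of the one before it.
data = run_stage("run_data_extraction", tickers=["AAPL"])
feats = run_stage("run_feature_worker", input_url=data["output_url"])
# Stage 3a (run_ml_job) or 3b (run_dl_job) — both feed run_po_job.
preds = run_stage("run_ml_job",
                  feature_url=feats["output_url"],
                  data_extractor_url=data["output_url"])
port = run_stage("run_po_job", input_url=preds["output_url"])
trade = run_stage("run_trading_job", input_url=port["output_url"])
```

Swapping run_ml_job for run_dl_job in the sketch changes nothing downstream, which is the point of making 3a and 3b interchangeable.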

Stage Details

| # | Tool | Job | Output prefix | What it does |
|---|------|-----|---------------|--------------|
| 1 | run_data_extraction | job-data-extraction-worker | data_extractor_ | Downloads OHLCV data (Yahoo Finance or Polygon/Limex) |
| 2 | run_feature_worker | job-feature-worker | feature_engine_ | Computes technical indicators (RSI, MACD, Bollinger Bands) via TA-Lib |
| 3a | run_ml_job | ml-worker-job | ml_engine_ | Trains a scikit-learn model (e.g. RandomForest) and generates predictions. Requires feature_url + data_extractor_url |
| 3b | run_dl_job | dl-job | nn_engine_ | Trains a PyTorch neural network and generates predictions. Requires feature_url + data_extractor_url |
| 4 | run_po_job | po-job | portfolio_optimization_ | Optimizes portfolio weights from ML/DL predictions (e.g. Mean-Variance, HRP) |
| 5 | run_trading_job | trading-worker | trading_report_ | Runs a VectorBT backtest from portfolio weights and returns performance metrics |
| 6 | run_plot_job | job-plot-worker | plot_ | Renders chart HTML from JSON output of previous stages (prices, features, ML, or portfolio data) |
| 7 | run_st_job | job-st-worker | stress_test_ | Generates a stress-test HTML report from portfolio optimization output |
| 8 | run_risk_job | job-risk-worker | risk_ | Generates risk analytics artifacts (HTML/PNG/CSV) from portfolio optimization output |

Job Execution Model

Every tool follows the same execution model:
1. Fetch job config: the server reads the job definition to get the current image and existing environment variables.
2. Merge env overrides: tool parameters are serialized to JSON and merged into the job's env vars.
3. Start the job: the job is triggered and the server blocks until the execution name is returned.
4. Poll for completion: the server polls every 15 seconds until the execution reaches a terminal state (Succeeded, Failed, Stopped, Degraded).
5. Find output blob: after success, the freshly written output file is located in blob storage.
6. Return result: the tool returns { status, output_url, output_name, execution_name }.
Timeout: 600 seconds (10 minutes) per job.
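The poll-for-completion step (step 4 above) can be sketched as a small loop. This is a minimal illustration, assuming a hypothetical get_status callable that returns the execution's current state string; the 15-second interval and 600-second timeout come from the documented model.

```python
import time

# Terminal states and timing from the documented execution model.
TERMINAL_STATES = {"Succeeded", "Failed", "Stopped", "Degraded"}
POLL_INTERVAL_S = 15
TIMEOUT_S = 600

def wait_for_execution(get_status, execution_name,
                       poll_interval=POLL_INTERVAL_S, timeout=TIMEOUT_S):
    """Poll until the execution reaches a terminal state or times out.

    get_status is a hypothetical callable mapping an execution name to
    its current state string.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status(execution_name)
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval)
    raise TimeoutError(f"{execution_name} did not finish within {timeout}s")
```

Using time.monotonic() rather than wall-clock time keeps the deadline immune to system clock adjustments during a long poll.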

Blob Storage

All blobs are stored in the data container. Naming pattern: <prefix><YYYYMMDD_HHMMSS>.<ext>.
| Stage | Prefix | Example |
|-------|--------|---------|
| Data Extraction | data_extractor_ | data_extractor_20260307_124937.json |
| Feature Engineering | feature_engine_ | feature_engine_20260307_125103.json |
| ML | ml_engine_ | ml_engine_20260307_125841.json |
| Deep Learning | nn_engine_ | nn_engine_20260307_125841.json |
| Portfolio Optimization | portfolio_optimization_ | portfolio_optimization_20260307_130215.json |
| Trading | trading_report_ | trading_report_20260307_130504.json |
| Plot | plot_ | plot_20260322_173012.html |
| Stress Test | stress_test_ | stress_test_gaussian_20260322_184723.html |
| Risk | risk_ | risk_dynamic_weights_20260323_093145.html |
To skip a completed stage, pass its existing blob URL directly as input_url to the next tool. Browse the data container in Storage Explorer to find intermediate results.
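The naming pattern above is simple to generate or match. A minimal sketch, assuming UTC timestamps (the actual timezone used by the workers is not stated here):

```python
from datetime import datetime, timezone

def blob_name(prefix, ext, ts=None):
    """Build a blob name following <prefix><YYYYMMDD_HHMMSS>.<ext>."""
    ts = ts or datetime.now(timezone.utc)
    return f"{prefix}{ts.strftime('%Y%m%d_%H%M%S')}.{ext}"

# e.g. blob_name("ml_engine_", "json") -> "ml_engine_20260307_125841.json"
# for a job that finished at 2026-03-07 12:58:41.
```

The same pattern, read in reverse, lets you sort a stage's blobs lexicographically to find the most recent output, which is what the "find output blob" step relies on.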

Tracing

Every tool call emits a custom telemetry event:
| Property | Value |
|----------|-------|
| tool_name | e.g. run_ml_job |
| user_oid | User object ID from JWT |
| user_name | Display name or preferred_username |
| success | "True" or "False" |
| duration_ms | Wall-clock time of the job execution |
Configure with APPINSIGHTS_CONNECTION_STRING in your environment.
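A wrapper that emits these properties around a tool call might look like the following sketch. It is illustrative only: traced_tool_call and the emit sink are hypothetical names, standing in for whatever telemetry client (e.g. Application Insights) is configured via APPINSIGHTS_CONNECTION_STRING.

```python
import time

def traced_tool_call(tool_name, user_oid, user_name, fn, emit):
    """Run fn() and emit the documented telemetry properties.

    emit is a hypothetical sink: emit(event_name, properties_dict).
    success is emitted as the string "True"/"False", matching the
    documented property values.
    """
    start = time.monotonic()
    try:
        result = fn()
        success = "True"
        return result
    except Exception:
        success = "False"
        raise
    finally:
        emit("tool_call", {
            "tool_name": tool_name,
            "user_oid": user_oid,
            "user_name": user_name,
            "success": success,
            "duration_ms": int((time.monotonic() - start) * 1000),
        })
```

Emitting from a finally block guarantees one event per call whether the job succeeds or raises, which keeps success-rate dashboards honest.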