Pipeline Overview

QuantSpace orchestrates a 5-stage core ML trading pipeline plus 3 reporting/analytics tools. Each tool runs as a container job, writes output to blob storage, and returns an output_url.
Stages 3a and 3b are interchangeable — both run_ml_job and run_dl_job produce a URL compatible with run_po_job. run_plot_job can visualize outputs from multiple stages. run_st_job and run_risk_job consume portfolio optimization output.
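The chaining described above can be sketched in Python. This is an illustrative mock, not the real client: run_stage is a hypothetical stand-in that returns a result in the documented { status, output_url } shape, so the wiring between stages is the only thing shown.

```python
# Illustrative sketch of chaining pipeline stages. run_stage is a
# hypothetical stand-in for a real tool invocation; it fabricates an
# output_url with the documented prefix for that stage.
def run_stage(tool_name, **params):
    prefixes = {
        "run_data_extraction": "data_extractor_",
        "run_feature_worker": "feature_engine_",
        "run_ml_job": "ml_engine_",
        "run_po_job": "portfolio_optimization_",
        "run_trading_job": "trading_report_",
    }
    prefix = prefixes[tool_name]
    return {
        "status": "Succeeded",
        "output_url": f"https://storage.example/data/{prefix}20260307_124937.json",
    }

# Each stage consumes the output_url of the one before it.
data = run_stage("run_data_extraction", tickers=["AAPL"])
feats = run_stage("run_feature_worker", input_url=data["output_url"])
# Stage 3a (run_ml_job) or 3b (run_dl_job) — both feed run_po_job.
preds = run_stage("run_ml_job",
                  feature_url=feats["output_url"],
                  data_extractor_url=data["output_url"])
port = run_stage("run_po_job", input_url=preds["output_url"])
trade = run_stage("run_trading_job", input_url=port["output_url"])
```

Swapping run_ml_job for run_dl_job in the sketch changes nothing downstream, which is the point of making 3a and 3b interchangeable.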

Stage Details

| # | Tool | Job | Output prefix | What it does |
|---|------|-----|---------------|--------------|
| 1 | run_data_extraction | job-data-extraction-worker | data_extractor_ | Downloads OHLCV data (Yahoo Finance or Polygon/Limex) |
| 2 | run_feature_worker | job-feature-worker | feature_engine_ | Computes technical indicators (RSI, MACD, Bollinger Bands) via TA-Lib |
| 3a | run_ml_job | ml-worker-job | ml_engine_ | Trains a scikit-learn model (e.g. RandomForest) and generates predictions. Requires feature_url + data_extractor_url |
| 3b | run_dl_job | dl-job | nn_engine_ | Trains a PyTorch neural network and generates predictions. Requires feature_url + data_extractor_url |
| 4 | run_po_job | po-job | portfolio_optimization_ | Optimizes portfolio weights from ML/DL predictions (e.g. Mean-Variance, HRP) |
| 5 | run_trading_job | trading-worker | trading_report_ | Runs a VectorBT backtest from portfolio weights and returns performance metrics |
| 6 | run_plot_job | job-plot-worker | plot_ | Renders chart HTML from JSON output of previous stages (prices, features, ML, or portfolio data) |
| 7 | run_st_job | job-st-worker | stress_test_ | Generates a stress-test HTML report from portfolio optimization output |
| 8 | run_risk_job | job-risk-worker | risk_ | Generates risk analytics artifacts (HTML/PNG/CSV) from portfolio optimization output |

Job Execution Model

Every tool follows the same execution model:
1. Fetch job config: the server reads the job definition to get the current image and existing environment variables.
2. Merge env overrides: tool parameters are serialized to JSON and merged into the job's env vars.
3. Start the job: the job is triggered and the server blocks until the execution name is returned.
4. Poll for completion: the server polls every 15 seconds until the execution reaches a terminal state (Succeeded, Failed, Stopped, Degraded).
5. Find output blob: after success, the freshly written output file is located in blob storage.
6. Return result: the tool returns { status, output_url, output_name, execution_name }.
Timeout: 600 seconds (10 minutes) per job.
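The poll-for-completion step (step 4 above) can be sketched as a small loop. This is a minimal illustration, assuming a hypothetical get_status callable that returns the execution's current state string; the 15-second interval and 600-second timeout come from the documented model.

```python
import time

# Terminal states and timing from the documented execution model.
TERMINAL_STATES = {"Succeeded", "Failed", "Stopped", "Degraded"}
POLL_INTERVAL_S = 15
TIMEOUT_S = 600

def wait_for_execution(get_status, execution_name,
                       poll_interval=POLL_INTERVAL_S, timeout=TIMEOUT_S):
    """Poll until the execution reaches a terminal state or times out.

    get_status is a hypothetical callable mapping an execution name to
    its current state string.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status(execution_name)
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval)
    raise TimeoutError(f"{execution_name} did not finish within {timeout}s")
```

Using time.monotonic() rather than wall-clock time keeps the deadline immune to system clock adjustments during a long poll.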

Blob Storage

All blobs are stored in the data container. Naming pattern: <prefix><YYYYMMDD_HHMMSS>.<ext>.
| Stage | Prefix | Example |
|-------|--------|---------|
| Data Extraction | data_extractor_ | data_extractor_20260307_124937.json |
| Feature Engineering | feature_engine_ | feature_engine_20260307_125103.json |
| ML | ml_engine_ | ml_engine_20260307_125841.json |
| Deep Learning | nn_engine_ | nn_engine_20260307_125841.json |
| Portfolio Optimization | portfolio_optimization_ | portfolio_optimization_20260307_130215.json |
| Trading | trading_report_ | trading_report_20260307_130504.json |
| Plot | plot_ | plot_20260322_173012.html |
| Stress Test | stress_test_ | stress_test_gaussian_20260322_184723.html |
| Risk | risk_ | risk_dynamic_weights_20260323_093145.html |
To skip a completed stage, pass its existing blob URL directly as input_url to the next tool. Browse the data container in Storage Explorer to find intermediate results.
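The naming pattern above is simple to generate or match. A minimal sketch, assuming UTC timestamps (the actual timezone used by the workers is not stated here):

```python
from datetime import datetime, timezone

def blob_name(prefix, ext, ts=None):
    """Build a blob name following <prefix><YYYYMMDD_HHMMSS>.<ext>."""
    ts = ts or datetime.now(timezone.utc)
    return f"{prefix}{ts.strftime('%Y%m%d_%H%M%S')}.{ext}"

# e.g. blob_name("ml_engine_", "json") -> "ml_engine_20260307_125841.json"
# for a job that finished at 2026-03-07 12:58:41.
```

The same pattern, read in reverse, lets you sort a stage's blobs lexicographically to find the most recent output, which is what the "find output blob" step relies on.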

Tracing

Every tool call emits a custom telemetry event:
| Property | Value |
|----------|-------|
| tool_name | e.g. run_ml_job |
| user_oid | User object ID from JWT |
| user_name | Display name or preferred_username |
| success | "True" or "False" |
| duration_ms | Wall-clock time of the job execution |
Configure with APPINSIGHTS_CONNECTION_STRING in your environment.
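A wrapper that emits these properties around a tool call might look like the following sketch. It is illustrative only: traced_tool_call and the emit sink are hypothetical names, standing in for whatever telemetry client (e.g. Application Insights) is configured via APPINSIGHTS_CONNECTION_STRING.

```python
import time

def traced_tool_call(tool_name, user_oid, user_name, fn, emit):
    """Run fn() and emit the documented telemetry properties.

    emit is a hypothetical sink: emit(event_name, properties_dict).
    success is emitted as the string "True"/"False", matching the
    documented property values.
    """
    start = time.monotonic()
    try:
        result = fn()
        success = "True"
        return result
    except Exception:
        success = "False"
        raise
    finally:
        emit("tool_call", {
            "tool_name": tool_name,
            "user_oid": user_oid,
            "user_name": user_name,
            "success": success,
            "duration_ms": int((time.monotonic() - start) * 1000),
        })
```

Emitting from a finally block guarantees one event per call whether the job succeeds or raises, which keeps success-rate dashboards honest.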