Overview

run_data_extraction is the first step of the pipeline. It triggers the job-data-extraction-worker pipeline job, which downloads historical OHLCV (Open, High, Low, Close, Volume) market data from the configured data source and writes the result to blob storage. The tool blocks until the job completes and returns the output blob URL.

Parameters

config (object, required): Data extraction configuration.

Returns

{
  "status": "Succeeded",
  "output_url": "https://stmcpfabricdev.blob.core.windows.net/data/data_extractor_20260307_124937.json",
  "output_name": "data_extractor_20260307_124937.json",
  "execution_name": "job-data-extraction-worker-abc123xyz"
}
| Field | Description |
| --- | --- |
| status | Job terminal status (Succeeded) |
| output_url | Full HTTPS URL to the output blob; pass to run_feature_worker |
| output_name | Blob filename |
| execution_name | Job execution ID for audit/debugging |
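
A caller will typically want to confirm the terminal status before handing output_url to the next step. The sketch below is one way to do that in Python. The helper name extract_output_url is hypothetical, and the sketch assumes the result arrives as a JSON string shaped like the sample above; any status other than Succeeded is treated as a failure here, which is an assumption rather than documented behavior.

```python
import json


def extract_output_url(result_json: str) -> str:
    """Return output_url from a run_data_extraction result, or raise.

    Hypothetical helper: assumes the result has the fields shown in the
    Returns section (status, output_url, output_name, execution_name).
    """
    result = json.loads(result_json)
    status = result.get("status")
    if status != "Succeeded":
        # execution_name is included to make the failure traceable.
        raise RuntimeError(
            f"data extraction ended with status {status!r} "
            f"(execution: {result.get('execution_name')})"
        )
    return result["output_url"]


# Sample result copied from the Returns section.
sample = json.dumps(
    {
        "status": "Succeeded",
        "output_url": "https://stmcpfabricdev.blob.core.windows.net/data/data_extractor_20260307_124937.json",
        "output_name": "data_extractor_20260307_124937.json",
        "execution_name": "job-data-extraction-worker-abc123xyz",
    }
)
print(extract_output_url(sample))
```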

Example

{
  "config": {
    "backtest_params": {
      "Tickers": "custom",
      "Data_source": "yahoo",
      "Learning_start": "2020-01-01",
      "Learning_end": "2024-01-01",
      "Testing_end": "2026-03-01"
    },
    "custom": ["AAPL", "MSFT", "GOOGL", "NVDA"]
  }
}
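
The request above can also be assembled programmatically. The following Python sketch builds the same structure and applies two sanity checks before submission. The function name build_extraction_config is hypothetical, and the date-ordering rule (Learning_start before Learning_end before Testing_end) is an assumption inferred from the example, not a documented constraint.

```python
from datetime import date


def build_extraction_config(
    tickers,
    learning_start: str,
    learning_end: str,
    testing_end: str,
    data_source: str = "yahoo",
) -> dict:
    """Build a run_data_extraction request with a custom ticker list.

    Hypothetical helper; field names mirror the example request.
    """
    tickers = list(tickers)
    if not tickers:
        raise ValueError("at least one ticker is required")

    # Assumption: the three ISO dates must be strictly increasing.
    start, end, test_end = (
        date.fromisoformat(d) for d in (learning_start, learning_end, testing_end)
    )
    if not (start < end < test_end):
        raise ValueError("expected Learning_start < Learning_end < Testing_end")

    return {
        "config": {
            "backtest_params": {
                "Tickers": "custom",
                "Data_source": data_source,
                "Learning_start": learning_start,
                "Learning_end": learning_end,
                "Testing_end": testing_end,
            },
            "custom": tickers,
        }
    }


request = build_extraction_config(
    ["AAPL", "MSFT", "GOOGL", "NVDA"], "2020-01-01", "2024-01-01", "2026-03-01"
)
```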

Resources

| Resource | Value |
| --- | --- |
| Container Apps Job | job-data-extraction-worker |
| Container name | job-data-extraction-worker |
| Env var injected | CONFIG (JSON-serialized config) |
| Output blob prefix | data_extractor_ |
| Timeout | 600 seconds |
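
To illustrate how the CONFIG variable and the output prefix fit together, here is a minimal Python sketch of what the worker side might look like. Both function names are hypothetical, and the timestamp pattern (YYYYMMDD_HHMMSS) is inferred from the sample filename data_extractor_20260307_124937.json rather than taken from the worker's source.

```python
import json
import os
from datetime import datetime


def load_config(environ=os.environ) -> dict:
    """Decode the JSON-serialized config from the CONFIG env var."""
    return json.loads(environ["CONFIG"])


def output_blob_name(now: datetime) -> str:
    """Build a blob name with the data_extractor_ prefix.

    Assumption: the suffix is a YYYYMMDD_HHMMSS timestamp, as the
    sample output name suggests.
    """
    return f"data_extractor_{now:%Y%m%d_%H%M%S}.json"


# Simulate the injected environment instead of reading the real one.
fake_env = {"CONFIG": json.dumps({"backtest_params": {"Data_source": "yahoo"}})}
config = load_config(fake_env)
name = output_blob_name(datetime(2026, 3, 7, 12, 49, 37))
print(config["backtest_params"]["Data_source"], name)
```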

Next Step

Pass output_url to run_feature_worker as input_url.