Safety Stock Calculation V2
Overview
Both XGBoost and Random Forest implementations follow identical data preparation, feature engineering, and safety stock calculation steps. The only difference is the underlying machine learning model used during training. This document covers the unified flow applicable to both implementations.
Step 1 — Input preparation
Convert Inputs & Build DataFrame
The method receives four parameters:
| Parameter | Description |
|---|---|
| HistoryData | Past demand values per time period (e.g. monthly units sold) |
| ForecastData | The known future demand — used as the model's target (label) during training |
| LeadTime | Time taken to receive stock for each period — used to measure supply risk |
| ServiceFactor | A multiplier reflecting desired service level (e.g. 1.65 for 95%) — scales the safety buffer |
All input arrays are converted into NumPy arrays and consolidated into a single Pandas DataFrame, with each row representing a distinct time period.
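In Python with NumPy and Pandas, this consolidation step can be sketched as follows; the input values and variable names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical inputs, one value per time period
history_data = [120, 95, 130, 110, 150, 90]    # past demand (HistoryData)
forecast_data = [125, 100, 128, 115, 155, 95]  # known future demand (ForecastData)
lead_time = [4, 5, 4, 6, 5, 4]                 # periods to receive stock (LeadTime)
service_factor = 1.65                          # 95% service level (ServiceFactor)

# Convert to NumPy arrays and consolidate into a single DataFrame,
# each row representing one distinct time period
df = pd.DataFrame({
    "historical_demand": np.asarray(history_data, dtype=float),
    "future_demand": np.asarray(forecast_data, dtype=float),
    "lead_time": np.asarray(lead_time, dtype=float),
})
print(df.shape)  # one row per period, one column per input series
```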
Step 2 — Feature Engineering
Two additional features are derived from the raw inputs prior to model training. These features provide the model with contextual information beyond the original input values.
2a. Seasonality Index
Measures where each period's demand sits relative to the overall average. This allows the model to understand peak and trough patterns across time.
Seasonality Index Formula
seasonality_index[i] = historical_demand[i] / mean(historical_demand)
A value > 1.0 means that period had above-average demand (peak). A value < 1.0 means below-average demand (trough). A value of exactly 1.0 is an average period.
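A minimal sketch of this computation, using hypothetical demand values:

```python
import numpy as np

# Hypothetical demand history
historical_demand = np.array([120.0, 95.0, 130.0, 110.0, 150.0, 90.0])

# Each period's demand relative to the overall mean:
# > 1.0 marks a peak period, < 1.0 marks a trough
seasonality_index = historical_demand / historical_demand.mean()
print(seasonality_index.round(2))
```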
2b. Supply Variability
Measures how unstable or erratic lead times have been over recent periods using a rolling 3-period window. This captures supply-side risk.
Supply Variability Formula
supply_variability[i] = std(lead_time[i-2], lead_time[i-1], lead_time[i])
A rolling standard deviation over a window of 3 periods (min_periods=1). Periods with consistent lead time yield a low value. Volatile or unpredictable supply yields a high value. NaN values are filled with 0.
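The rolling computation can be sketched with Pandas; the lead-time values are hypothetical:

```python
import pandas as pd

# Hypothetical per-period lead times
lead_time = pd.Series([4, 5, 4, 6, 5, 4], dtype=float)

# Rolling standard deviation over a 3-period window.
# min_periods=1 lets the first two periods use shorter windows;
# the very first window has only one value, so its std is NaN and is filled with 0.
supply_variability = lead_time.rolling(window=3, min_periods=1).std().fillna(0)
print(supply_variability.round(3))
```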
2c. Final Feature Set
The four features passed into the model are:
| Feature | Source | What it tells the model |
|---|---|---|
| historical_demand | Raw input | Actual past demand per period |
| lead_time | Raw input | Time to receive stock per period |
| seasonality_index | Derived (Step 2a) | Relative demand position vs. average |
| supply_variability | Derived (Step 2b) | Instability in lead times recently |
The target variable (y) is future_demand — what actually occurred during the forecast horizon.
Step 3 — Train / Test Split & Model Training
The dataset is split into 80% training and 20% test sets using a fixed random seed (random_state=42) to ensure reproducible results. Both models are configured with closely matched hyperparameters, identical wherever applicable, to keep results comparable:
| Hyperparameter | XGBoost | Random Forest |
|---|---|---|
| n_estimators | 200 | 200 |
| max_depth | 6 | 10 |
| learning_rate | 0.1 | N/A (not applicable) |
| objective / criterion | reg:squarederror | squared_error |
| n_jobs | -1 (all cores) | -1 (all cores) |
| random_state | 42 | 42 |
| min_samples_leaf | N/A (not applicable) | 2 |
Key model difference: XGBoost builds trees sequentially — each tree corrects the errors of the previous one (boosting). Random Forest builds trees independently and in parallel, averaging their predictions (bagging). Both minimize squared error for regression.
Step 4 — Error Evaluation
Evaluate Model Accuracy on Test Set
After training, the model predicts demand on the held-out test set (20%). The predictions are compared to actual values using CalculateErrors(), a base class method that computes standard regression error metrics:
• MAE — Mean Absolute Error: average magnitude of prediction errors
• RMSE — Root Mean Squared Error: penalises large prediction errors more heavily
These error metrics are stored in SafetyStockResult.Error for reporting and comparison purposes. They do not affect the safety stock calculation itself — they are diagnostic outputs only.
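The two metrics can be computed directly with NumPy; the predicted and actual values below are hypothetical:

```python
import numpy as np

# Hypothetical predictions vs. actuals on the held-out test set
y_true = np.array([100.0, 120.0, 90.0, 110.0])
y_pred = np.array([ 98.0, 125.0, 95.0, 108.0])

mae = np.mean(np.abs(y_pred - y_true))           # average error magnitude
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))  # penalises large errors more

print(mae, rmse)
```

Because RMSE squares each error before averaging, it is always at least as large as MAE, and the gap between the two widens when a few predictions are badly wrong.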
Step 5 — Full Dataset Prediction
Predict Demand Across All Periods
Unlike the error evaluation step (which uses only the test split), the model now predicts demand for the entire dataset (X) — all historical periods. This gives a predicted demand value for every row in the DataFrame. The result is stored as a new column: df['predicted_demand']
⚠ Note: Because the model is trained on future_demand as the label and then predicts over all rows including training data, the predicted values on training rows will tend to be very accurate (low error). The meaningful signal comes from test rows and the gap between prediction and history.
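A minimal sketch of this step, with a toy DataFrame and model standing in for the real pipeline:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for the prepared DataFrame
df = pd.DataFrame({
    "historical_demand": [120.0, 95.0, 130.0, 110.0, 150.0, 90.0],
    "lead_time": [4.0, 5.0, 4.0, 6.0, 5.0, 4.0],
})
df["seasonality_index"] = df["historical_demand"] / df["historical_demand"].mean()
df["supply_variability"] = df["lead_time"].rolling(3, min_periods=1).std().fillna(0)
y = df["historical_demand"] * 1.05  # stand-in for future_demand

X = df[["historical_demand", "lead_time", "seasonality_index", "supply_variability"]]
model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)

# Predict over ALL rows (not just the test split) and store as a new column
df["predicted_demand"] = model.predict(X)
```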
Step 6 — Safety Stock Calculation
Compute Final Safety Stock Per Period
The safety stock for each period is calculated using the absolute gap between what the model predicted and what historically occurred, scaled by the service level factor:
Safety Stock Formula
safety_stock[i] = | predicted_demand[i] - historical_demand[i] | × ServiceFactor
The absolute difference represents the demand uncertainty or forecast error for that period. Multiplying by ServiceFactor (e.g. 1.65 for 95% service level) scales the buffer up or down based on the business's risk tolerance. A higher ServiceFactor means more buffer stock.
In plain terms: "How wrong could we be about demand? Buffer by exactly that much, scaled to how much risk we are willing to accept."
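A sketch of the per-period calculation with hypothetical values:

```python
import numpy as np

service_factor = 1.65  # e.g. 95% service level

# Hypothetical per-period values
predicted_demand  = np.array([118.0, 102.0, 127.0, 111.0, 140.0, 96.0])
historical_demand = np.array([120.0,  95.0, 130.0, 110.0, 150.0, 90.0])

# Buffer each period by the absolute forecast gap, scaled by the service factor
safety_stock = np.abs(predicted_demand - historical_demand) * service_factor
print(safety_stock)
```

Note how the period with the largest gap (|140 − 150| = 10) produces the largest buffer, while near-perfect predictions produce almost none.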
Step 7 — Result Assembly & Output
Build SafetyStockResult and Return as JSON
The final result object contains three fields:
| Field | Content |
|---|---|
| Data | Series of safety stock values — one per time period |
| Error | Error metrics from the test set evaluation (MAE, RMSE, etc.) |
| Status | "SUCCESS" if calculation completed without exception |
The result is serialised via .ToJSON() and returned to the caller.
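Assuming the result is assembled as a plain mapping before serialisation (the actual SafetyStockResult class and its .ToJSON() method are implementation details of the source code), the shape of the output can be sketched as:

```python
import json

# Hypothetical stand-in for SafetyStockResult; field names follow the table above
result = {
    "Data": [3.3, 11.55, 4.95, 1.65, 16.5, 9.9],  # safety stock per period
    "Error": {"MAE": 3.5, "RMSE": 3.81},          # test-set diagnostics only
    "Status": "SUCCESS",
}

# Assumed to serialise equivalently to the source's .ToJSON()
payload = json.dumps(result)
```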
Process Flowchart
End-to-end flow of the Safety Stock calculation — applies identically to both XGBoost and Random Forest implementations.
┌───────────────────────────────┐
│             START             │
└──────────────┬────────────────┘
               │
               ▼
┌───────────────────────────────┐
│       INPUT PREPARATION       │
│   HistoryData, ForecastData   │
│    LeadTime, ServiceFactor    │
└──────────────┬────────────────┘
               │
               ▼
┌───────────────────────────────┐
│       DATA PREPARATION        │
│   NumPy arrays → DataFrame    │
└──────────────┬────────────────┘
               │
               ▼
┌───────────────────────────────┐
│      FEATURE ENGINEERING      │
│  Seasonality: demand / mean   │
│   Variability: rolling std    │
└──────────────┬────────────────┘
               │
               ▼
┌───────────────────────────────┐
│        MODEL TRAINING         │
│    (80% Train / 20% Test)     │
│    XGBoost / Random Forest    │
└──────────────┬────────────────┘
               │
               ▼
┌───────────────────────────────┐
│       ERROR EVALUATION        │
│     Store in Result.Error     │
└──────────────┬────────────────┘
               │
               ▼
┌───────────────────────────────┐
│        FULL PREDICTION        │
│     Predict full dataset      │
└──────────────┬────────────────┘
               │
               ▼
┌───────────────────────────────┐
│   SAFETY STOCK CALCULATION    │
│    |pred - hist| × Factor     │
└──────────────┬────────────────┘
               │
               ▼
┌───────────────────────────────┐
│      RESULT PREPARATION       │
│    Data + Error + SUCCESS     │
└──────────────┬────────────────┘
               │
               ▼
┌───────────────────────────────┐
│              END              │
│        Return ToJSON()        │
└───────────────────────────────┘