Safety Stock Calculation V2

Overview

Both the XGBoost and Random Forest implementations follow identical data preparation, feature engineering, and safety stock calculation steps. The only difference is the underlying machine learning model used during training. This document covers the unified flow applicable to both implementations.

Step 1 — Input Preparation

Convert Inputs & Build DataFrame

The method receives four parameters:

| Parameter | Description |
|---|---|
| HistoryData | Past demand values per time period (e.g. monthly units sold) |
| ForecastData | The known future demand; used as the model's target (label) during training |
| LeadTime | Time taken to receive stock for each period; used to measure supply risk |
| ServiceFactor | A multiplier reflecting the desired service level (e.g. 1.65 for 95%); scales the safety buffer |

All input arrays are converted into NumPy arrays and consolidated into a single Pandas DataFrame, with each row representing a distinct time period.
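The conversion described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name build_frame and the length check are assumptions, while the column names historical_demand, future_demand, and lead_time follow the names used later in this document.

```python
import numpy as np
import pandas as pd


def build_frame(history_data, forecast_data, lead_time):
    """Convert raw input sequences to NumPy arrays and combine them into one DataFrame.

    build_frame is a hypothetical helper name used only for this sketch.
    """
    history = np.asarray(history_data, dtype=float)
    forecast = np.asarray(forecast_data, dtype=float)
    lead = np.asarray(lead_time, dtype=float)

    # Assumption: every input covers the same periods, so lengths must match.
    if not (len(history) == len(forecast) == len(lead)):
        raise ValueError("all input arrays must have the same number of periods")

    # One row per time period.
    return pd.DataFrame({
        "historical_demand": history,
        "future_demand": forecast,
        "lead_time": lead,
    })


df = build_frame([100, 120, 90, 150], [110, 115, 95, 140], [5, 7, 6, 9])
print(df)
```

ServiceFactor is a single multiplier rather than a per-period array, so in this sketch it is kept out of the DataFrame and applied only in the final calculation.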

Step 2 — Feature Engineering

Two additional features are derived from the raw inputs prior to model training. These features provide the model with contextual information beyond the original input values.

2a. Seasonality Index

Measures where each period's demand sits relative to the overall average. This allows the model to understand peak and trough patterns across time.

Seasonality Index Formula

seasonality_index[i] = historical_demand[i] / mean(historical_demand)

A value > 1.0 means that period had above-average demand (peak). A value < 1.0 means below-average demand (trough). A value of exactly 1.0 is an average period.
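As a small worked example of the formula above (the demand figures are made up for illustration):

```python
import pandas as pd

# Illustrative demand values; the mean is 100.
df = pd.DataFrame({"historical_demand": [80.0, 100.0, 120.0, 100.0]})

mean_demand = df["historical_demand"].mean()
df["seasonality_index"] = df["historical_demand"] / mean_demand

# Rows come out as 0.8 (trough), 1.0 (average), 1.2 (peak), 1.0 (average).
print(df)
```

If the mean demand were zero the division would produce inf or NaN values; this document does not say how that case is handled, so the sketch leaves it unguarded.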

2b. Supply Variability

Measures how unstable or erratic lead times have been over recent periods using a rolling 3-period window. This captures supply-side risk.

Supply Variability Formula

supply_variability[i] = std(lead_time[i-2], lead_time[i-1], lead_time[i])

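This is computed as a rolling standard deviation over a window of 3 periods (min_periods=1), so the earliest periods use however many values are available. Periods with consistent lead times yield a low value; volatile or unpredictable supply yields a high value. NaN values (for example the first period, where the standard deviation of a single value is undefined) are filled with 0.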
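The rolling calculation can be sketched with pandas as below. The lead-time values are illustrative; the window size, min_periods, and zero-fill follow the description above. Note that pandas uses the sample standard deviation (ddof=1) by default, which is why a single-value window yields NaN.

```python
import pandas as pd

# Illustrative lead times: stable at first, then erratic.
df = pd.DataFrame({"lead_time": [5.0, 5.0, 5.0, 9.0, 4.0]})

df["supply_variability"] = (
    df["lead_time"]
    .rolling(window=3, min_periods=1)  # up to 3 most recent periods
    .std()                             # sample std (ddof=1); NaN for a single value
    .fillna(0)                         # first period has no spread to measure
)

print(df)
```

The first three rows come out as 0 (stable supply), while the last two rise to roughly 2.31 and 2.65 once the lead time jumps to 9 and then drops to 4.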

2c. Final Feature Set

The four features passed into the model are:

| Feature | Source | What it tells the model |
|---|---|---|
| historical_demand | Raw input | Actual past demand per period |
| lead_time | Raw input | Time to receive stock per period |
| seasonality_index | Derived (Step 2a) | Relative demand position vs. average |
| supply_variability | Derived (Step 2b) | Recent instability in lead times |

The target variable (y) is future_demand — what actually occurred during the forecast horizon.

Step 3 — Train / Test Split & Model Training

The dataset is split into 80% training and 20% test sets using a fixed random seed (random_state=42) to ensure reproducible results. Both models use the same number of trees, parallelism, and seed to keep results comparable; max_depth and the model-specific settings differ as shown:

| Hyperparameter | XGBoost | Random Forest |
|---|---|---|
| n_estimators | 200 | 200 |
| max_depth | 6 | 10 |
| learning_rate | 0.1 | N/A |
| objective / criterion | reg:squarederror | squared_error |
| n_jobs | -1 (all cores) | -1 (all cores) |
| random_state | 42 | 42 |
| min_samples_leaf | N/A | 2 |

Key model difference: XGBoost builds trees sequentially — each tree corrects the errors of the previous one (boosting). Random Forest builds trees independently and in parallel, averaging their predictions (bagging). Both minimize squared error for regression.

Step 4 — Error Evaluation

Evaluate Model Accuracy on Test Set

After training, the model predicts demand on the held-out test set (20%). The predictions are compared to actual values using CalculateErrors(), a base class method that computes standard regression error metrics, for example:

• MAE (Mean Absolute Error): average magnitude of prediction errors
• RMSE (Root Mean Squared Error): penalises large prediction errors more heavily

These error metrics are stored in SafetyStockResult.Error for reporting and comparison purposes. They are diagnostic outputs only and do not affect the safety stock calculation itself.
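The two metrics named above can be computed as in this sketch. The function calculate_errors is a hypothetical stand-in; the real CalculateErrors() base class method is not shown in this document and may compute additional metrics.

```python
import numpy as np


def calculate_errors(actual, predicted):
    """Return MAE and RMSE for two equal-length sequences (illustrative stand-in)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = actual - predicted
    return {
        "MAE": float(np.mean(np.abs(residuals))),         # average absolute miss
        "RMSE": float(np.sqrt(np.mean(residuals ** 2))),  # squares weight big misses more
    }


# Residuals are 2, -4 and 0, so MAE = 2.0 and RMSE = sqrt(20 / 3), about 2.58.
errors = calculate_errors([100, 110, 120], [98, 114, 120])
print(errors)
```

RMSE exceeds MAE here because the single 4-unit miss dominates once errors are squared.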

Step 5 — Full Dataset Prediction

Predict Demand Across All Periods

Unlike the error evaluation step (which uses only the test split), the model now predicts demand for the entire dataset (X) — all historical periods. This gives a predicted demand value for every row in the DataFrame. The result is stored as a new column: df['predicted_demand']

⚠ Note: The model is trained with future_demand as the label and then predicts over all rows, including the rows it was trained on. Predictions for those training rows (80% of the data) will therefore tend to sit very close to their labels and understate real forecast error. The more meaningful signal comes from the held-out test rows and from the gap between prediction and history.
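The effect described in the note can be inspected with a sketch like the one below. It uses synthetic data, scikit-learn's RandomForestRegressor (an assumption), and only two of the four features for brevity; the in_training_set column is a hypothetical diagnostic that is not part of the documented flow.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
n = 50
df = pd.DataFrame({
    "historical_demand": rng.uniform(80, 120, n),
    "lead_time": rng.integers(3, 10, n).astype(float),
})
df["future_demand"] = df["historical_demand"] * 1.05 + rng.normal(0, 3, n)

X, y = df[["historical_demand", "lead_time"]], df["future_demand"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestRegressor(
    n_estimators=200, max_depth=10, min_samples_leaf=2, random_state=42
).fit(X_train, y_train)

# Predict for every row, training and test alike.
df["predicted_demand"] = model.predict(X)

# Hypothetical diagnostic: flag the rows the model has already seen.
df["in_training_set"] = df.index.isin(X_train.index)

abs_err = (df["predicted_demand"] - df["future_demand"]).abs()
print("MAE on training rows:", abs_err[df["in_training_set"]].mean())
print("MAE on held-out rows:", abs_err[~df["in_training_set"]].mean())
```

Comparing the two printed values shows how much of the apparent accuracy comes from rows the model was fitted on.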

Step 6 — Safety Stock Calculation

Compute Final Safety Stock Per Period

The safety stock for each period is calculated using the absolute gap between what the model predicted and what historically occurred, scaled by the service level factor:

Safety Stock Formula

safety_stock[i] = | predicted_demand[i] - historical_demand[i] | × ServiceFactor

The absolute difference is used as a per-period proxy for demand uncertainty (forecast error). Multiplying by ServiceFactor (e.g. 1.65 for a 95% service level) scales the buffer up or down based on the business's risk tolerance: a higher ServiceFactor means more buffer stock.

In plain terms: "How wrong could we be about demand? Buffer by exactly that much, scaled to how much risk we are willing to accept."
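A short worked example of the formula, using made-up demand figures and a ServiceFactor of 1.65 (the function name safety_stock is an assumption for this sketch):

```python
import numpy as np


def safety_stock(predicted_demand, historical_demand, service_factor):
    """Per-period safety stock: |predicted - historical| scaled by the service factor."""
    predicted = np.asarray(predicted_demand, dtype=float)
    historical = np.asarray(historical_demand, dtype=float)
    return np.abs(predicted - historical) * service_factor


# Gaps are 10, 5 and 10 units, giving 16.5, 8.25 and 16.5 units of buffer.
ss = safety_stock([110, 95, 130], [100, 100, 120], 1.65)
print(ss)
```

Because the absolute value is taken, a prediction below history (the second period) produces a positive buffer just as a prediction above history does.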

Step 7 — Result Assembly & Output

Build SafetyStockResult and Return as JSON

The final result object contains three fields:

| Field | Content |
|---|---|
| Data | Series of safety stock values, one per time period |
| Error | Error metrics from the test set evaluation (MAE, RMSE, etc.) |
| Status | "SUCCESS" if the calculation completed without exception |

The result is serialised via .ToJSON() and returned to the caller.
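The shape of the result object could be modelled roughly as follows. This is a guess at the structure based only on the three fields listed above: the real SafetyStockResult class and its ToJSON() method are not shown in this document, so the dataclass layout, field types, and JSON encoding here are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SafetyStockResult:
    """Illustrative stand-in for the result object; field types are assumed."""
    Data: list = field(default_factory=list)    # safety stock per period
    Error: dict = field(default_factory=dict)   # e.g. {"MAE": ..., "RMSE": ...}
    Status: str = "SUCCESS"

    def ToJSON(self):
        # Serialise all three fields into a single JSON object.
        return json.dumps(asdict(self))


result = SafetyStockResult(
    Data=[16.5, 8.25, 16.5],
    Error={"MAE": 2.0, "RMSE": 2.58},
)
payload = result.ToJSON()
print(payload)
```

Values taken from NumPy arrays or pandas Series would need converting to plain Python lists and floats (for example with .tolist()) before json.dumps can encode them.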

Process Flowchart

End-to-end flow of the Safety Stock calculation — applies identically to both XGBoost and Random Forest implementations.

         ┌───────────────────────────────┐
         │            START              │
         └──────────────┬────────────────┘
                        │
                        ▼
         ┌───────────────────────────────┐
         │      INPUT PREPARATION        │
         │  HistoricalData, ForecastData │
         │  LeadTime, ServiceFactor      │
         └──────────────┬────────────────┘
                        │
                        ▼
         ┌───────────────────────────────┐
         │       DATA PREPARATION        │
         │   NumPy arrays → DataFrame    │
         └──────────────┬────────────────┘
                        │
                        ▼
         ┌───────────────────────────────┐
         │     FEATURE ENGINEERING       │
         │  Seasonality: demand / mean   │
         │  Variability: rolling std     │
         └──────────────┬────────────────┘
                        │
                        ▼
         ┌───────────────────────────────┐
         │        MODEL TRAINING         │
         │     (80% Train / 20% Test)    │
         │   XGBoost / Random Forest     │
         └──────────────┬────────────────┘
                        │
                        ▼
         ┌───────────────────────────────┐
         │      ERROR EVALUATION         │
         │   Store in Result.Error       │
         └──────────────┬────────────────┘
                        │
                        ▼
         ┌───────────────────────────────┐
         │      FULL PREDICTION          │
         │   Predict full dataset        │
         └──────────────┬────────────────┘
                        │
                        ▼
         ┌───────────────────────────────┐
         │  SAFETY STOCK CALCULATION     │
         │  |pred - hist| × Factor       │
         └──────────────┬────────────────┘
                        │
                        ▼
         ┌───────────────────────────────┐
         │     RESULT PREPARATION        │
         │   Data + Error + SUCCESS      │
         └──────────────┬────────────────┘
                        │
                        ▼
         ┌───────────────────────────────┐
         │             END               │
         │        Return ToJSON()        │
         └───────────────────────────────┘