LightGBM
LightGBM (Light Gradient Boosting Machine) is a powerful, efficient, and fast machine-learning framework developed by Microsoft. While it is commonly used for classification and regression tasks, it can also be applied effectively to time series forecasting by reframing the forecasting problem as supervised learning (for example, using lagged values of the series as features).
Why Use LightGBM for Time Series Forecasting?
✅ Fast Training & Low Memory Usage – Works well with large datasets.
✅ Handles Missing Data – No need for complex imputation strategies.
✅ Feature Importance Analysis – Helps understand key factors affecting predictions.
✅ Parallel & GPU Training Support – Faster computation for big datasets.
Key Parameters in LightGBM
1. Core Model Parameters
These parameters define the structure and behavior of the LightGBM model.
| Parameter | Default | Description |
|---|---|---|
| `boosting_type` | `"gbdt"` | Type of boosting method. Options: `"gbdt"` (Gradient Boosting, default), `"dart"`, `"goss"`, `"rf"`. |
| `n_estimators` | 100 | Number of boosting rounds (higher values can improve accuracy but increase training time). |
| `learning_rate` | 0.1 | Step size for updating models. Lower values make training more stable but require more iterations. |
| `max_depth` | -1 | Maximum depth of trees (-1 means no limit). Higher values allow more splits but may cause overfitting. |
| `num_leaves` | 31 | Number of leaves per tree. Higher values allow more complexity, but too high a value can lead to overfitting. |
2. Regularization Parameters
These parameters help control overfitting and improve generalization.
| Parameter | Default | Description |
|---|---|---|
| `min_data_in_leaf` | 20 | Minimum number of data points required in a leaf node (higher values prevent overfitting). |
| `lambda_l1` | 0.0 | L1 regularization (Lasso). Helps make the model sparse by setting some weights to zero. |
| `lambda_l2` | 0.0 | L2 regularization (Ridge). Helps reduce overfitting by shrinking model coefficients. |
| `bagging_fraction` | 1.0 | Fraction of training data used per iteration (e.g., 0.8 means 80% of the data is used, reducing overfitting). |
| `bagging_freq` | 0 | Frequency of applying bagging (subsampling). 0 disables bagging; 1 applies it every iteration. |
| `feature_fraction` | 1.0 | Fraction of features (columns) used in each boosting iteration (e.g., 0.8 means using 80% of features per tree). |
3. Speed & Efficiency Parameters
These parameters help optimize training speed and memory usage.
| Parameter | Default | Description |
|---|---|---|
| `max_bin` | 255 | Number of bins used to discretize continuous features. Increasing this can improve accuracy but uses more memory. |
| `num_threads` | -1 | Number of CPU cores to use (-1 means use all available cores). |
| `device` | `"cpu"` | Set to `"gpu"` for faster training on large datasets. |
LightGBM Default Parameters:
| Param Name | Description | Default Value |
|---|---|---|
| `boosting_type` | Determines the boosting type. | `gbdt` |
| `n_estimators` | The number of boosting iterations (trees). | 100 |
| `objective` | The objective function to optimize. | `regression` |
| `metric` | The metric used for evaluation. | `l2` |
| `verbosity` | Controls the verbosity of the training output. | 1 |
| `learning_rate` | Step size shrinkage to prevent overfitting. | 0.1 |
| `num_leaves` | The number of leaves in one tree. | 31 |
| `max_depth` | The maximum depth of trees. | -1 |
| `min_data_in_leaf` | Minimum number of data points in each leaf. | 20 |
| `feature_fraction` | Fraction of features to consider for each tree. | 1.0 |
| `bagging_fraction` | Fraction of data to use for each iteration. | 1.0 |
| `bagging_freq` | Frequency of bagging, in boosting rounds. | 0 |
| `lambda_l1` | L1 regularization term. | 0.0 |
| `lambda_l2` | L2 regularization term. | 0.0 |
| `min_gain_to_split` | Minimum gain to make a further split. | 0.0 |