ARIMA_FORECAST
ARIMA Model
ARIMA stands for AutoRegressive Integrated Moving Average and is a widely used statistical method for time series forecasting. It is specified by three order parameters: (p, d, q), which define the structure and behavior of the model.
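For reference, an ARIMA(p, d, q) model can be written compactly with the backshift operator B, where B y_t = y_{t-1}. The symbols are notation introduced here: φ_i for the AR coefficients, θ_i for the MA coefficients, c for a constant and ε_t for white-noise errors. The MA terms use a plus sign, which is the convention in R's arima() documentation; some texts use minus signs instead.

```latex
\phi(B)\,(1 - B)^d\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p,
\qquad
\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q

% Equivalently, with the differenced series w_t = (1 - B)^d y_t:
w_t = c + \phi_1 w_{t-1} + \cdots + \phi_p w_{t-p}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}
```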
Parameters of ARIMA
AR(p) - Autoregression
- The autoregressive component models the relationship between a current observation and its past values.
- It uses a regression equation in which the current value of the time series is expressed as a linear combination of its previous values, plus a constant and a random error term.
- The parameter p represents the number of lagged observations included in the model.
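The Python sketch below makes the "linear combination of previous values" concrete. The surrounding text targets R, but the arithmetic is the same in any language. The helper `ar_forecast` is hypothetical. It assumes the constant c and the coefficients φ_1..φ_p have already been estimated (for example with R's `arima(x, order = c(p, 0, 0))`), and it only computes the one-step-ahead forecast.

```python
def ar_forecast(history, phi, c=0.0):
    """One-step-ahead AR(p) forecast: c + phi_1*y_{t-1} + ... + phi_p*y_{t-p}.

    history: past observations, oldest first.
    phi: coefficients [phi_1, ..., phi_p], assumed already estimated.
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    # phi_1 pairs with the most recent value, phi_2 with the one before, etc.
    recent = history[::-1][:p]
    return c + sum(coef * value for coef, value in zip(phi, recent))


# AR(2) with made-up coefficients: 1.0 + 0.6*13 + 0.3*11, approximately 12.1
print(ar_forecast([10.0, 12.0, 11.0, 13.0], phi=[0.6, 0.3], c=1.0))
```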
I(d) - Integration
- The integration component differences the time series to make it stationary. A stationary series is one whose statistical properties (mean, variance, autocorrelation) do not change over time.
- Differencing subtracts the previous value of the series from the current value: y'_t = y_t - y_{t-1}. Applying this d times removes trends. Seasonal patterns are usually handled separately, by seasonal differencing in a seasonal ARIMA model.
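Here is a minimal Python sketch of differencing and of the reverse "integration" step. Integration cumulatively sums the differences back onto the original scale, which is roughly what R's `diff()` and `diffinv()` do. The function names are hypothetical. For d > 1, `undifference` would be applied d times, each time with the appropriate starting value.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'_t = y_t - y_{t-1}.

    Each pass shortens the series by one observation.
    """
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return list(series)


def undifference(diffs, last_value):
    """Undo one round of differencing by cumulative summation.

    last_value is the last known observation on the undifferenced scale;
    each difference is added to the running level.
    """
    levels = []
    level = last_value
    for delta in diffs:
        level += delta
        levels.append(level)
    return levels


y = [100, 102, 105, 103, 108]
dy = difference(y)
print(dy)                               # [2, 3, -2, 5]
print(undifference(dy, y[0]))           # [102, 105, 103, 108], i.e. y[1:]

# Hypothetical forecasts of the next two differences, mapped back to levels.
print(undifference([1.0, 2.0], y[-1]))  # [109.0, 111.0]
```

This is how forecasts made on the differenced scale are turned back into forecasts of the original series.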
MA(q) - Moving Average
- The moving average component models the current observation as depending on past forecast errors (random shocks) rather than on past values. Despite the name, it is not a moving-average smoothing of the series.
- The current value is expressed as a linear combination of the current error term and the previous q error terms.
- The parameter q represents the number of lagged error terms included in the model.
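The following Python sketch shows how past errors enter an MA(q) model. The helpers are hypothetical and assume that mu (the series mean) and θ_1..θ_q are already known. They recover the errors recursively, treating errors before the first observation as zero. That is one simple approximation; estimators such as R's `arima()` handle start-up values more carefully.

```python
def ma_residuals(y, mu, theta):
    """Recover the shocks e_t recursively from
    e_t = y_t - mu - (theta_1*e_{t-1} + ... + theta_q*e_{t-q}),
    treating shocks before the first observation as zero."""
    q = len(theta)
    errors = []
    for t, value in enumerate(y):
        past = sum(theta[i] * errors[t - 1 - i] for i in range(q) if t - 1 - i >= 0)
        errors.append(value - mu - past)
    return errors


def ma_forecast(y, mu, theta):
    """One-step-ahead MA(q) forecast. The next shock has expected value
    zero, so only the last q recovered shocks contribute."""
    errors = ma_residuals(y, mu, theta)
    q = len(theta)
    n = len(errors)
    return mu + sum(theta[i] * errors[n - 1 - i] for i in range(q) if n - 1 - i >= 0)


# MA(1) example with made-up parameters: mu = 10, theta_1 = 0.5.
y = [11, 9, 10.5]
print(ma_residuals(y, 10, [0.5]))  # [1, -1.5, 1.25]
print(ma_forecast(y, 10, [0.5]))   # 10.625
```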
Possible Values of p, d, q in ARIMA (R Implementation)
When implementing ARIMA in R, the values of p, d, and q are chosen based on the characteristics of the time series data. Here’s a guide to selecting these parameters:
p (Autoregressive Order)
- Represents the number of lagged observations to include in the model.
- Possible values: Non-negative integers (0, 1, 2, 3, ...).
- A value of 0 means no autoregressive component is used.
- Higher values of p capture more complex dependencies in the data but may lead to overfitting.
d (Degree of Differencing)
- Represents the number of times the time series is differenced to achieve stationarity.
- Possible values: Non-negative integers (0, 1, 2, 3, ...).
- A value of 0 means no differencing is applied.
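- A value of 1 is commonly used to remove linear trends, while 2 may be used for quadratic trends. Values above 2 are rarely needed, and over-differencing can introduce artificial negative autocorrelation.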
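Here is a quick check of the linear/quadratic claim on noise-free toy series. It is written in Python, and `diff` is a hypothetical helper. Real data contains noise, so after the right number of differences the series is only stationary around a constant, not exactly constant.

```python
def diff(series):
    """First difference: y_t - y_{t-1}."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


linear = [3 * t + 2 for t in range(6)]   # [2, 5, 8, 11, 14, 17]
quadratic = [t * t for t in range(6)]    # [0, 1, 4, 9, 16, 25]

print(diff(linear))           # [3, 3, 3, 3, 3]  constant after d = 1
print(diff(quadratic))        # [1, 3, 5, 7, 9]  still trending after d = 1
print(diff(diff(quadratic)))  # [2, 2, 2, 2]     constant after d = 2
```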
q (Moving Average Order)
- Represents the number of lagged forecast errors to include in the model.
- Possible values: Non-negative integers (0, 1, 2, 3, ...).
- A value of 0 means no moving average component is used.
- Higher values of q capture more complex error structures but may also lead to overfitting.
Selecting p, d, q in Practice
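- Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF): Plots of these functions, computed on the stationary (differenced) series, suggest candidate values for p and q.
- ACF helps identify the moving average order (q): for a pure MA(q) process, the ACF cuts off after lag q.
- PACF helps identify the autoregressive order (p): for a pure AR(p) process, the PACF cuts off after lag p.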
- Stationarity Check: Use statistical tests such as the Augmented Dickey-Fuller (ADF) test to choose d. Difference the series and re-test until the test indicates stationarity; the number of differences applied is d.
- Model Diagnostics: After fitting the model, check that the residuals resemble white noise, i.e. show no significant autocorrelation. A Ljung-Box test on the residuals is a common formal check.
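The quantities behind these plots and checks can be computed directly. This Python sketch uses hypothetical helper names. It computes the sample ACF, the PACF via the Durbin-Levinson recursion, and the Ljung-Box statistic, then applies them to a simulated AR(1) series. In R the corresponding tools are `acf()`, `pacf()` and `Box.test(x, lag = h, type = "Ljung-Box", fitdf = p + q)`; this code illustrates the calculations and is not a replacement for those functions.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag, normalised by the
    lag-0 sum of squared deviations (the usual estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations phi_kk via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)  # r[i] holds r_{i+1}
    result = []
    phi_prev = []        # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k - 1] - sum(phi_prev[j - 1] * r[k - j - 1] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * r[j - 1] for j in range(1, k))
        phi_kk = num / den
        # Update the order-k coefficients from the order-(k-1) ones.
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1]
                    for j in range(1, k)] + [phi_kk]
        result.append(phi_kk)
    return result


def ljung_box(residuals, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    r = acf(residuals, h)
    return n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, h + 1))


# Illustrative data: simulate an AR(1) series y_t = 0.8 * y_{t-1} + e_t.
random.seed(0)
y = [0.0]
for _ in range(1999):
    y.append(0.8 * y[-1] + random.gauss(0.0, 1.0))

band = 1.96 / math.sqrt(len(y))  # approximate 95% band under white noise
print("ACF :", [round(v, 2) for v in acf(y, 5)])
print("PACF:", [round(v, 2) for v in pacf(y, 5)])
print("band: +/-", round(band, 3))
print("Ljung-Box Q (h=10):", round(ljung_box(y, 10), 1))
```

For an AR(1) input, the ACF should decay gradually. The PACF should show one large spike at lag 1 and values near zero afterwards, which points to p = 1. The Ljung-Box call on the raw series only demonstrates the mechanics. In practice it is applied to the fitted model's residuals, and Q is compared with a chi-square distribution with h - p - q degrees of freedom; a large Q indicates remaining autocorrelation.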