ARIMA_FORECAST

ARIMA Model

ARIMA stands for AutoRegressive Integrated Moving Average and is a widely used statistical method for time series forecasting. It is specified by three order parameters: (p, d, q), which define the structure and behavior of the model.

Parameters of ARIMA

  1. AR(p) - Autoregression

    • The autoregressive component models the relationship between a current observation and its past values.
    • It uses a regression equation where the current value of the time series is expressed as a linear combination of its previous values.
    • The parameter p represents the number of lagged observations included in the model.
  2. I(d) - Integration

    • The integration component differences the time series to make it stationary. A series is stationary when its statistical properties (mean, variance, autocorrelation) remain constant over time.
    • Differencing replaces each value with its change from the previous value: the previous value is subtracted from the current one (y_t - y_(t-1)). This is done d times to remove trends. Seasonal patterns call for seasonal differencing, which a plain ARIMA(p, d, q) model does not include.
  3. MA(q) - Moving Average

    • The moving average component models the current observation as a linear combination of past forecast errors (the residuals of the model at earlier time steps).
    • Despite the name, it is not a rolling mean of the observations: it is a regression of the current value on previous error terms.
    • The parameter q represents the number of lagged error terms included in the model.
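
As a rough illustration of how the three components fit together, the sketch below computes a one-step-ahead forecast for a hypothetical ARIMA(2, 1, 1) model. The function name arma_one_step, the coefficients, the data, and the assumed last forecast error are all made up for illustration. The sign convention (MA terms added) is one common choice, and in a real model the coefficients would be estimated from the data rather than set by hand.

```python
def arma_one_step(values, errors, const, phi, theta):
    """One-step-ahead ARMA forecast (illustrative helper, not a library API).

    values: past observations of the already differenced series, oldest first
    errors: past one-step forecast errors, oldest first
    phi:    AR coefficients [phi_1, ..., phi_p]
    theta:  MA coefficients [theta_1, ..., theta_q]
    """
    # AR part: weighted sum of the p most recent values.
    ar_part = sum(phi[i] * values[-1 - i] for i in range(len(phi)))
    # MA part: weighted sum of the q most recent forecast errors.
    ma_part = sum(theta[j] * errors[-1 - j] for j in range(len(theta)))
    return const + ar_part + ma_part


# Hypothetical inputs for an ARIMA(2, 1, 1) model.
series = [10.0, 12.0, 15.0, 19.0]
# I(d) with d = 1: difference once, giving [2.0, 3.0, 4.0].
diffed = [curr - prev for prev, curr in zip(series, series[1:])]
past_errors = [0.5]            # assumed most recent forecast error
phi = [0.5, 0.25]              # assumed AR coefficients (p = 2)
theta = [0.4]                  # assumed MA coefficient (q = 1)

# Forecast the next difference, then undo the differencing ("integrate").
next_diff = arma_one_step(diffed, past_errors, 0.0, phi, theta)
next_value = series[-1] + next_diff
print(round(next_diff, 2), round(next_value, 2))
```

The forecast is made on the differenced series and then added back to the last observed value, which is what the "Integrated" part of the name refers to.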

Possible Values of p, d, q in ARIMA (R Implementation)

When fitting ARIMA in R (for example with arima(x, order = c(p, d, q)) from the stats package), the values of p, d, and q are chosen based on the characteristics of the time series data. Here’s a guide to the values each parameter can take:

  1. p (Autoregressive Order)

    • Represents the number of lagged observations to include in the model.
    • Possible values: Non-negative integers (0, 1, 2, 3, ...).
    • A value of 0 means no autoregressive component is used.
    • Higher values of p capture more complex dependencies in the data but may lead to overfitting.
  2. d (Degree of Differencing)

    • Represents the number of times the time series is differenced to achieve stationarity.
    • Possible values: Non-negative integers (0, 1, 2, 3, ...); in practice d is rarely greater than 2.
    • A value of 0 means no differencing is applied.
    • A value of 1 is commonly used to remove linear trends, while 2 may be used for quadratic trends.
  3. q (Moving Average Order)

    • Represents the number of lagged forecast errors to include in the model.
    • Possible values: Non-negative integers (0, 1, 2, 3, ...).
    • A value of 0 means no moving average component is used.
    • Higher values of q capture more complex error structures but may also lead to overfitting.
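
The effect of d on trends can be checked with a small sketch. The difference helper and the two toy series below are illustrative assumptions, not part of any library: one round of differencing turns a linear trend into a constant, while a quadratic trend needs two rounds.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'_t = y_t - y_(t-1)."""
    out = list(series)
    for _ in range(d):
        # Each pass shortens the series by one element.
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


linear = [3 * t + 2 for t in range(6)]     # 2, 5, 8, 11, 14, 17
quadratic = [t * t for t in range(6)]      # 0, 1, 4, 9, 16, 25

print(difference(linear, d=1))       # [3, 3, 3, 3, 3]
print(difference(quadratic, d=1))    # [1, 3, 5, 7, 9]  (still trending)
print(difference(quadratic, d=2))    # [2, 2, 2, 2]
```

Each round of differencing costs one observation, and differencing more than necessary tends to add noise rather than remove structure, which is one reason to keep d small.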

Selecting p, d, q in Practice

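  • Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF): Plots of these functions for the stationary (differenced) series suggest candidate values for p and q.
    • ACF helps identify the moving average order (q): for a pure MA(q) process, the ACF cuts off after lag q.
    • PACF helps identify the autoregressive order (p): for a pure AR(p) process, the PACF cuts off after lag p.
  • Stationarity Check: Use a statistical test such as the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root (is non-stationary). Difference the series until the test rejects that hypothesis; the number of differences applied is d.
  • Model Diagnostics: After fitting the model, check that the residuals resemble white noise (no remaining autocorrelation or other patterns), for example with a residual ACF plot or the Ljung-Box test.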
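
To make the ACF, PACF, and white-noise ideas concrete, here is a minimal pure-Python sketch. It assumes a simulated AR(1) series with coefficient 0.7 and a fixed random seed. The function names acf, pacf, and ljung_box_q are illustrative, the PACF is computed with the Durbin-Levinson recursion, and the Ljung-Box Q statistic is used as one common white-noise check. In R, the built-in acf(), pacf(), and Box.test() functions would normally be used instead.

```python
import random


def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(e * e for e in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = [1.0]
    phi_prev = []                      # coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def ljung_box_q(resid, h):
    """Ljung-Box Q statistic over lags 1..h (large Q suggests autocorrelation)."""
    n = len(resid)
    r = acf(resid, h)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, h + 1))


# Simulated data (assumption): white noise and an AR(1) series built from it.
random.seed(0)
n = 2000
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
ar1 = [noise[0]]
for t in range(1, n):
    ar1.append(0.7 * ar1[-1] + noise[t])

r = acf(ar1, 5)
p = pacf(ar1, 5)
print("ACF :", [round(v, 2) for v in r])    # decays gradually for an AR process
print("PACF:", [round(v, 2) for v in p])    # near zero beyond lag 1 for AR(1)

q_noise = ljung_box_q(noise, 10)
q_ar1 = ljung_box_q(ar1, 10)
print("Ljung-Box Q, white noise:", round(q_noise, 1))
print("Ljung-Box Q, AR(1) series:", round(q_ar1, 1))
```

For the simulated AR(1) series, the PACF dropping to roughly zero after lag 1 points to p = 1. A Q statistic that is large relative to a chi-squared distribution with h degrees of freedom (reduced by the number of fitted coefficients when applied to model residuals) indicates that autocorrelation remains.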