Last month, I did a simple forecast of Malaysia's exports based on import trends. Now that the February trade numbers are out, we can evaluate my simple regression.
To recap, this is the relationship between exports and imports based on a sample range of Jan 2000-Dec 2007:
Ln(exports)= 0.79+0.94 Ln(imports(-1))
Based on this, the February forecast was for exports to reach RM35,968 million, with a 95% interval forecast (approximately 2 standard errors either side of the point forecast) of between RM29.4 billion and RM42.5 billion. Compare that to the actual result of RM39,586.9 million, which falls inside the interval but well towards its upper end.
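To put a number on "well towards the upper end", the arithmetic can be sketched as below. The figures are the ones quoted in this post (in RM million); treating the interval as exactly the point forecast plus or minus 2 standard errors is an approximation, and the variable names are purely illustrative.

```python
# Figures quoted in the post, in RM million.
point, lower, upper = 35968.0, 29389.0, 42547.0
actual = 39586.9

# Assumption: the interval is the point forecast +/- 2 standard errors.
std_err = (upper - lower) / 4.0
error = actual - point
z = error / std_err  # forecast error measured in standard errors

inside = lower <= actual <= upper
print(f"standard error ~ RM{std_err:,.1f}m")
print(f"forecast error = RM{error:,.1f}m ({z:.2f} standard errors)")
print(f"actual inside the interval: {inside}")
```

On these numbers the miss is a little over one standard error, so the outcome is inside the interval but not comfortably so.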
This sort of works, except that the size of the error doesn't exactly inspire confidence. I can see two problems here. First, the forecast has an in-built bias from autocorrelation (also called serial correlation), which is endemic to economic time series. Second, there is potential bias from seasonal differences, which I haven't accounted for.
One of the assumptions required for linear regression using ordinary least squares (OLS) to be valid is that the residuals (what's left after fitting the data) are independently and randomly distributed - in other words, their distribution should be approximately Gaussian, and there should be no systematic relationship between one residual and another. In the case of my simple regression, this isn't true. Using a Breusch-Godfrey test* with 12 lags, I find serial correlation at multiple lags – bad.
*The better-known Durbin-Watson test is only valid for testing first-order serial correlation, i.e. a single lag.
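The mechanics of the Breusch-Godfrey test can be sketched in a few lines. This is a minimal illustration, not the output of my software: it runs on synthetic data with deliberately autocorrelated errors (not the trade series), the function name `breusch_godfrey_lm` is made up for this sketch, and pre-sample lagged residuals are set to zero. The statistic is n times the R-squared of an auxiliary regression of the residuals on the original regressors plus their own lags, compared against a chi-square distribution with as many degrees of freedom as lags.

```python
import numpy as np

def breusch_godfrey_lm(y, X, nlags):
    """LM statistic: n * R^2 from regressing OLS residuals on X and lagged residuals."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Lagged residuals, with pre-sample values set to zero.
    lags = np.column_stack([
        np.concatenate([np.zeros(k), resid[:-k]]) for k in range(1, nlags + 1)
    ])
    Z = np.hstack([X, lags])
    gamma, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    aux_resid = resid - Z @ gamma
    # Residuals have zero mean (X contains a constant), so this is the usual R^2.
    r2 = 1.0 - (aux_resid @ aux_resid) / (resid @ resid)
    return n * r2

# Synthetic example: a regression whose errors follow an AR(1) process.
rng = np.random.default_rng(0)
n = 96  # 96 monthly observations, the length of Jan 2000-Dec 2007
x = rng.normal(size=n)
shocks = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + shocks[t]
y = 1.0 + 0.9 * x + u
X = np.column_stack([np.ones(n), x])

lm = breusch_godfrey_lm(y, X, nlags=12)
CHI2_CRIT_5PCT_DF12 = 21.026  # 5% critical value, chi-square with 12 d.o.f.
print(f"LM = {lm:.1f}; reject 'no serial correlation': {lm > CHI2_CRIT_5PCT_DF12}")
```

In practice a packaged routine is the safer route; statsmodels, for instance, provides `acorr_breusch_godfrey` in `statsmodels.stats.diagnostic`.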
How to resolve these issues? I’ve got three choices:
1. Forget about trying a structural model, and use a stochastic model using an ARMA (Auto-Regressive, Moving Average) representation instead. In essence, I’d be modeling exports based on past values of exports. This is great for trending series, where structural changes aren’t an issue – which isn’t the case with Malaysian exports at this time. So that’s out.
2. Tackle the serial correlation problem directly, again using ARMA terms in the regression. I’d still have the structural component here, making this a better choice under the current circumstances.
3. Adjust for the seasonal effect and see if the serial correlation (or any other problems) still exists. Here, I have another two choices – I can apply seasonal adjustment to both export and import series, or I can try modeling the seasonal effect directly.
So that’s four different models to choose from, just because I’ve encountered one little problem. In practice, trying out and evaluating all four models would be necessary. So here are the baseline model results:
Baseline Model
Ln(exports)= 0.79 + 0.94 Ln(imports(-1))
Point forecast: RM35968
Upper bound: RM42547
Lower bound: RM29389
R2: 0.85
And the results from the rest:
ARMA (1,1) Model
Ln(exports)= 4.68 + 1.00 AR(1) - 0.63 MA(1)
Point forecast: RM45370
Upper bound: RM52965
Lower bound: RM37774
R2: 0.90
Structural ARMA(1,0) Model
Ln(exports)= 0.37 + 0.98 Ln(imports(-1)) - 0.36 AR(1)
Point forecast: RM38070
Upper bound: RM44546
Lower bound: RM31515
R2: 0.87
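For readers curious how a regression with an AR(1) error term can be estimated, here is a minimal sketch using the Cochrane-Orcutt iteration. To be clear, this is a swapped-in textbook technique, not necessarily the estimator behind the results above, and it runs on synthetic data whose true parameters (0.37, 0.98, -0.36) merely echo the fitted equation for illustration. The function name `cochrane_orcutt` and all variable names are assumptions of this sketch.

```python
import numpy as np

def cochrane_orcutt(y, x, n_iter=20):
    """Estimate y = a + b*x + u with u_t = rho*u_{t-1} + e_t by iterated quasi-differencing."""
    rho = 0.0
    for _ in range(n_iter):
        # Quasi-difference both sides to strip out the AR(1) part of the error.
        y_star = y[1:] - rho * y[:-1]
        x_star = x[1:] - rho * x[:-1]
        const = np.full(len(y_star), 1.0 - rho)  # the intercept is scaled by (1 - rho)
        X_star = np.column_stack([const, x_star])
        (a, b), *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
        # Re-estimate rho from the residuals of the untransformed equation.
        u = y - a - b * x
        rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
    return a, b, rho

# Synthetic data only: stand-ins for log lagged imports and log exports.
rng = np.random.default_rng(42)
n = 500
x = 10.0 + 0.3 * rng.normal(size=n)
e = 0.05 * rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = -0.36 * u[t - 1] + e[t]
y = 0.37 + 0.98 * x + u

a_hat, b_hat, rho_hat = cochrane_orcutt(y, x)
print(f"intercept={a_hat:.2f}, slope={b_hat:.3f}, rho={rho_hat:.2f}")
```

The point of the design is that the structural slope on imports is kept, while the AR(1) term soaks up the serial correlation left in the residuals.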
Seasonally Adjusted Model
Ln(exports*)= 0.38 + 0.98 Ln(imports*(-1))
Point forecast: RM39407
Upper bound: RM44264
Lower bound: RM34550
R2: 0.93
(*seasonally adjusted. Actual seasonally adjusted exports for February reached RM47045.)
Seasonal Effect Model
Ln(exports)= 3.84 + 0.98 Ln(imports(-1)) - 0.00 D2 + 0.24 D3 + 0.03 D4 + 0.10 D5 + 0.10 D6 + 0.11 D7 + 0.15 D8 + 0.14 D9 + 0.12 D10 + 0.10 D11 + 0.17 D12
Point forecast: RM34686
Upper bound: RM40054
Lower bound: RM29318
R2: 0.82
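The dummy variables D2 to D12 in the seasonal effect model are simple 0/1 indicators for February to December, with January as the omitted base month. A minimal sketch of how such a regression can be set up follows; it uses synthetic data with made-up seasonal shifts (not the trade series), and `month_dummies` is a hypothetical helper written for this illustration.

```python
import numpy as np

def month_dummies(months):
    """Return an (n, 11) 0/1 matrix for months 2..12; January is the omitted base month."""
    months = np.asarray(months)
    return (months[:, None] == np.arange(2, 13)[None, :]).astype(float)

# Synthetic monthly data only.
rng = np.random.default_rng(1)
n = 96
months = np.arange(n) % 12 + 1                 # 1 = January, ..., 12 = December
log_imports_lag = 10.0 + 0.3 * rng.normal(size=n)
true_effects = np.linspace(0.0, 0.2, 12)       # made-up seasonal shifts, January = 0
log_exports = (0.4 + 0.98 * log_imports_lag
               + true_effects[months - 1]
               + 0.02 * rng.normal(size=n))

# Columns: constant, lagged log imports, D2..D12.
X = np.column_stack([np.ones(n), log_imports_lag, month_dummies(months)])
coef, *_ = np.linalg.lstsq(X, log_exports, rcond=None)
print("slope on lagged imports:", round(float(coef[1]), 3))
print("D2..D12 coefficients:", np.round(coef[2:], 3))
```

Each dummy coefficient is read as the average shift in log exports for that month relative to January, holding lagged imports fixed.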
So – which model to choose? Based just on the ability to predict February exports, the obvious candidate would be the structural ARMA model, which has the smallest forecast error relative to the actual result.
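The one-month comparison is easy to reproduce from the figures quoted above (all in RM million). The sketch below simply ranks the models by absolute forecast error; the seasonally adjusted forecast is compared with the seasonally adjusted actual, the others with the unadjusted actual. The labels used as dictionary keys are just for illustration.

```python
# Point forecasts for February quoted in the post, in RM million.
forecasts = {
    "Baseline": 35968,
    "ARMA(1,1)": 45370,
    "Structural ARMA(1,0)": 38070,
    "Seasonal effect": 34686,
}
actual = 39586.9

abs_errors = {name: abs(f - actual) for name, f in forecasts.items()}
# The seasonally adjusted model is judged against the seasonally adjusted actual.
abs_errors["Seasonally adjusted"] = abs(39407 - 47045)

# Print the models from smallest to largest absolute error.
for name, err in sorted(abs_errors.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} RM{err:8,.1f}m")

best = min(abs_errors, key=abs_errors.get)
print("smallest February error:", best)
```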
But what we want is not a model that minimizes error at one point in time, but rather at all points in time. In practice, this means looking at goodness-of-fit measures (R2) or information criteria (such as the Bayesian Information Criterion). On that basis, the best model is the seasonally adjusted model, despite the massive error of roughly RM7.6b for February. I’d also consider the seasonal effect model, because it came in second in terms of information criteria.
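For reference, here is a minimal sketch of how a Bayesian Information Criterion can be computed from a model's residuals, using the common Gaussian form n*ln(RSS/n) + k*ln(n), which drops an additive constant. Statistical packages may scale or normalize it differently, so only the ranking matters. The residuals below are invented purely to show the mechanics and have nothing to do with the models above.

```python
import math

def bic(residuals, n_params):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n). Lower is better."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)

# Invented residuals: model B fits slightly better but uses more parameters.
resid_a = [0.10, -0.12, 0.08, -0.09, 0.11, -0.10, 0.09, -0.08]
resid_b = [0.09, -0.11, 0.08, -0.09, 0.10, -0.10, 0.09, -0.08]

bic_a = bic(resid_a, n_params=2)
bic_b = bic(resid_b, n_params=5)
print(f"BIC A = {bic_a:.2f}, BIC B = {bic_b:.2f}")
print("preferred:", "A" if bic_a < bic_b else "B")
```

The criterion rewards a tighter fit but charges ln(n) for every extra parameter, which is why a model with a slightly smaller residual sum of squares can still lose out.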
What do the models say about March exports? Let’s see which one does best - watch this space next month:
Baseline: RM33832; Range: RM40.0-27.6b
ARMA: RM44618; Range: RM51.5-37.7b
Structural ARMA: RM33712; Range: RM39.9-27.5b
Seasonally adjusted: RM39792*; Range: RM44.7*-34.9*b
Seasonal Effect: RM43861; Range: RM49.4-38.3b
(*seasonally adjusted)