Last month, I did a simple forecast of Malaysia's exports based on import trends. Now that the February trade numbers are out, we can evaluate my simple regression.

To recap, this is the relationship between exports and imports based on a sample range of Jan 2000-Dec 2007:

Ln(exports)= 0.79+0.94 Ln(imports(-1))
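
As a rough sketch of how this equation turns into a forecast, the snippet below plugs last month's imports into the log-linear relationship and converts the result back to levels. The coefficients are the rounded ones quoted above; the function name and the import figure are hypothetical and only there for illustration.

```python
import math

# Coefficients as quoted in the post (rounded to two decimals).
INTERCEPT = 0.79
SLOPE = 0.94


def forecast_exports(prev_month_imports: float) -> float:
    """One-step forecast of exports (RM million) from last month's imports (RM million)."""
    log_forecast = INTERCEPT + SLOPE * math.log(prev_month_imports)
    # Convert the forecast of Ln(exports) back to levels.
    return math.exp(log_forecast)


# Hypothetical input for illustration only, NOT an actual trade figure.
example_imports = 30_000.0
print(f"Forecast exports: RM{forecast_exports(example_imports):,.0f} million")
```

Because the coefficients are rounded, this will not reproduce the published point forecast exactly.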

Based on this, the February forecast was for exports to reach RM35,968 million, with a 95% interval forecast (approximately 2 standard errors either side of the point forecast) of roughly RM29.4 billion to RM42.5 billion. Compare that to the actual result of RM39,586.9 million, which is well into the upper end of the interval forecast.
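
The comparison can be checked with a few lines of arithmetic. This sketch uses the point forecast, the bounds reported for the baseline model, and the actual February figure; the variable names are my own.

```python
# Figures quoted in the post, all in RM million.
point_forecast = 35_968.0
lower_bound = 29_389.0
upper_bound = 42_547.0
actual = 39_586.9

error = actual - point_forecast            # positive means the forecast was too low
pct_error = 100 * error / actual           # error as a share of the actual result
inside = lower_bound <= actual <= upper_bound
# Where the actual sits in the interval: 0 = lower bound, 1 = upper bound.
position = (actual - lower_bound) / (upper_bound - lower_bound)

print(f"Forecast error: RM{error:,.1f} million ({pct_error:.1f}% of actual)")
print(f"Inside 95% interval: {inside}; position in interval: {position:.2f}")
```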

It seems this sort of works, except that the gross error doesn't exactly inspire confidence. Two problems I can see here. First, the forecast has an in-built bias that is endemic to economic statistics, which is due to autocorrelation (also called serial correlation). Second, I have potential bias from seasonal differences as well, which I haven't accounted for.

One of the assumptions required for inference from linear regressions using ordinary least squares (OLS) to be valid is that the residuals (what's left after fitting the data) are independently and identically distributed. In other words, there should not be a systematic relationship between any of the residuals, and ideally their distribution should be approximately Gaussian. In the case of my simple regression, this isn't true. Using a Breusch-Godfrey test* for 12 lags, I've got serial correlation at multiple lags – bad.

*The better-known Durbin-Watson test is only valid for testing serial correlation at the first lag.
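
To show why the Durbin-Watson statistic only speaks to the first lag, here is a minimal sketch of how it is computed: it compares each residual with the one immediately before it and nothing further back. The two residual series are made up for illustration and are not from my regression.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic.

    Values near 2 suggest no first-order serial correlation, values well
    below 2 suggest positive correlation, and values well above 2 suggest
    negative correlation.
    """
    # Sum of squared differences between consecutive residuals (lag 1 only).
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Made-up residuals: one series drifts smoothly, the other flips sign every period.
smooth = [1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1, 0]
alternating = [1, -1] * 6

print(f"Smooth residuals:      DW = {durbin_watson(smooth):.2f}")
print(f"Alternating residuals: DW = {durbin_watson(alternating):.2f}")
```

The smooth series gives a statistic well below 2 and the alternating series one well above 2. Correlation that only shows up at longer lags would go unnoticed, which is why a multi-lag test such as Breusch-Godfrey is the better fit here.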

How to resolve these issues? I’ve got three choices:

1. Forget about trying a *structural* model, and use a *stochastic* model using an ARMA (Auto-Regressive, Moving Average) representation instead. In essence, I’d be modeling exports based on past values of exports. This is great for trending series, where structural changes aren’t an issue – which isn’t the case with Malaysian exports at this time. So that’s out.

2. Tackle the serial correlation problem directly, again using ARMA terms in the regression. I’d still have the structural component here, making this a better choice under the current circumstances.

3. Adjust for the seasonal effect and see if the serial correlation (or any other problems) still exists. Here, I have another two choices – I can apply seasonal adjustment to both export and import series, or I can try modeling the seasonal effect directly.
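
For reference, the three options can be written in generic textbook form as below. This is a sketch under my own notation, with y_t standing for Ln(exports) and x_t for Ln(imports); the symbols are illustrative labels, and the exact parameterization a statistics package reports (for example, how the constant is defined) may differ.

```latex
\begin{align*}
% Option 1: pure ARMA(1,1), exports explained by their own past
y_t &= c + \phi\, y_{t-1} + \varepsilon_t + \theta\, \varepsilon_{t-1} \\
% Option 2: structural regression on lagged imports with AR(1) errors
y_t &= \alpha + \beta\, x_{t-1} + u_t, \qquad u_t = \rho\, u_{t-1} + \varepsilon_t \\
% Option 3 (direct version): structural regression with monthly dummy variables
y_t &= \alpha + \beta\, x_{t-1} + \sum_{m=2}^{12} \gamma_m D_{m,t} + \varepsilon_t
\end{align*}
```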

So that’s four different models to choose from, just because I’ve encountered one little problem. In practice, trying out and evaluating all four models would be necessary. So here are the baseline model results:

**Baseline Model**

Ln(exports) = 0.79 + 0.94 Ln(imports(-1))

Point forecast: RM35968

Upper bound: RM42547

Lower bound: RM29389

R2: 0.85

And the results from the rest:

**ARMA(1,1) Model**

Ln(exports) = 4.68 + 1.00 AR(1) - 0.63 MA(1)

Point forecast: RM45370

Upper bound: RM52965

Lower bound: RM37774

R2: 0.90

**Structural ARMA(1,0) Model**

Ln(exports) = 0.37 + 0.98 Ln(imports(-1)) - 0.36 AR(1)

Point forecast: RM38070

Upper bound: RM44546

Lower bound: RM31515

R2: 0.87

**Seasonally Adjusted Model**

Ln(exports*) = 0.38 + 0.98 Ln(imports*(-1))

Point forecast: RM39407

Upper bound: RM44264

Lower bound: RM34550

R2: 0.93

(*seasonally adjusted. Actual seasonally adjusted exports for February reached RM47045.)

**Seasonal Effect Model**

Ln(exports) = 3.84 + 0.98 Ln(imports(-1)) - 0.00 D2 + 0.24 D3 + 0.03 D4 + 0.10 D5 + 0.10 D6 + 0.11 D7 + 0.15 D8 + 0.14 D9 + 0.12 D10 + 0.10 D11 + 0.17 D12

Point forecast: RM34686

Upper bound: RM40054

Lower bound: RM29318

R2: 0.82
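
A quick sketch of how the dummy terms work, assuming D2 to D12 stand for February to December with January as the base month (the usual convention, though I haven't spelled it out above). Since the model is in logs, each coefficient translates into a multiplicative effect on exports relative to January. The function names are hypothetical.

```python
import math

# Dummy coefficients as reported for the seasonal effect model.
# Assumption: key m is calendar month m, and January is the omitted base month.
dummy_coefs = {2: -0.00, 3: 0.24, 4: 0.03, 5: 0.10, 6: 0.10, 7: 0.11,
               8: 0.15, 9: 0.14, 10: 0.12, 11: 0.10, 12: 0.17}


def seasonal_dummies(month: int) -> dict:
    """Return the D2..D12 indicator values for a calendar month (1-12)."""
    return {m: int(m == month) for m in range(2, 13)}


def seasonal_multiplier(month: int) -> float:
    """Multiplicative effect on exports relative to January, holding imports fixed."""
    return math.exp(dummy_coefs.get(month, 0.0))


for month in range(1, 13):
    pct = 100 * (seasonal_multiplier(month) - 1)
    print(f"Month {month:>2}: {pct:+.1f}% relative to January")
```

Under that assumption, the 0.24 coefficient on D3 implies March exports run roughly 27% above January for the same level of lagged imports.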

So – which model to choose? Based just on the ability to predict February exports, the obvious candidate would be the structural ARMA model, which has the smallest forecast error relative to the actual result.
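
The ranking can be reproduced from the point forecasts listed above. This sketch compares each forecast with the matching actual figure (the seasonally adjusted model is scored against seasonally adjusted exports); the variable names are my own.

```python
# All figures in RM million, taken from the model results above.
actual = 39_586.9        # actual February exports
actual_sa = 47_045.0     # actual seasonally adjusted February exports

# Each entry is (point forecast, actual figure it should be compared with).
forecasts = {
    "Baseline": (35_968.0, actual),
    "ARMA(1,1)": (45_370.0, actual),
    "Structural ARMA": (38_070.0, actual),
    "Seasonally adjusted": (39_407.0, actual_sa),
    "Seasonal effect": (34_686.0, actual),
}

abs_errors = {name: abs(target - f) for name, (f, target) in forecasts.items()}
best = min(abs_errors, key=abs_errors.get)

# Print models from smallest to largest absolute error.
for name, err in sorted(abs_errors.items(), key=lambda kv: kv[1]):
    print(f"{name:<20} RM{err:,.1f} million")
print(f"Smallest February error: {best}")
```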

But what we want is not a model that minimizes error at one point in time, but rather at all points in time. In practice, this means looking at goodness-of-fit measures (R2), or information criteria (such as the Bayesian Information Criterion). On that basis, the best model is the seasonally adjusted model, despite the massive error of roughly RM7.6b for February. I’d also consider the seasonal effect model, because it came in second in terms of information criteria.
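
For anyone unfamiliar with information criteria, here is a minimal sketch of one common form of the BIC for a least-squares regression. It rewards a smaller residual sum of squares but penalizes every extra parameter, and lower is better. The residual sums of squares below are hypothetical numbers chosen for illustration, not values from my regressions, and statistics packages may scale the criterion differently.

```python
import math


def bic(rss: float, n_obs: int, n_params: int) -> float:
    """BIC for a Gaussian least-squares fit, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


# Assumption: 95 usable monthly observations (Jan 2000-Dec 2007, less one for the lag).
n_obs = 95

# Hypothetical residual sums of squares, for illustration only.
small_model = bic(rss=1.00, n_obs=n_obs, n_params=2)    # constant + one slope
large_model = bic(rss=0.80, n_obs=n_obs, n_params=13)   # plus eleven dummies

print(f"Small model BIC: {small_model:.1f}")
print(f"Large model BIC: {large_model:.1f}")
```

In this made-up case the larger model fits better but still ends up with the higher (worse) BIC, because the penalty for eleven extra parameters outweighs the gain in fit.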

What do the models say about March exports? Let’s see which one does best - watch this space next month:

Baseline: RM33832; Range: RM40.0-27.6b

ARMA: RM44618; Range: RM51.5-37.7b

Structural ARMA: RM33712; Range: RM39.9-27.5b

Seasonally adjusted: RM39792*; Range: RM44.7*-34.9*b

Seasonal Effect: RM43861; Range: RM49.4-38.3b

(*seasonally adjusted)