Thursday, April 16, 2009

Forecasting Malaysian Trade: Some simple models

Last month, I did a simple forecast of Malaysia's exports based on import trends. Now that the February trade numbers are out, we can evaluate my simple regression.

To recap, this is the relationship between exports and imports based on a sample range of Jan 2000-Dec 2007:

Ln(exports)= 0.79+0.94 Ln(imports(-1))

Based on this, the February forecast was for exports of RM35,968 million, with a 95% interval forecast (approximately 2 standard errors either side of the point forecast) of between RM29.4 billion and RM42.5 billion. Compare that with the actual result of RM39,586.9 million, which is well into the upper end of the interval forecast.
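The arithmetic behind these numbers can be sketched in a few lines of Python. This is a minimal illustration, not the software used for the post. `baseline_forecast` applies the rounded coefficients above to a lagged import figure, and any import value you pass in is your own input, not data from this post. `interval` applies the ±2 standard error rule. The standard error of about RM3,289.5 million used in the check is backed out from the post's own bounds (half of the RM6,579 million gap between the point forecast and either bound). It is not taken from the regression output.

```python
import math

def baseline_forecast(imports_lag):
    """Point forecast of exports (RM million) from Ln(exports) = 0.79 + 0.94 Ln(imports(-1))."""
    return math.exp(0.79 + 0.94 * math.log(imports_lag))

def interval(point, se, z=2.0):
    """Return (lower, upper) bounds as point -/+ z standard errors."""
    return point - z * se, point + z * se

# February 2009 figures quoted in this post (RM million).
point = 35968.0
se = (42547.0 - point) / 2.0   # implied standard error, backed out from the upper bound
lower, upper = interval(point, se)
actual = 39586.9

print(round(lower), round(upper))                    # 29389 42547
print(lower <= actual <= upper)                      # True
print(round((actual - lower) / (upper - lower), 2))  # 0.78, i.e. the upper part of the interval
```

For a new month you would call `baseline_forecast` with the latest monthly import value in RM million, then pass the result to `interval` together with the regression's standard error.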

It seems this sort of works, except that the size of the error doesn't exactly inspire confidence. I can see two problems here. First, the forecast has a built-in weakness that is endemic to economic time series: autocorrelation (also called serial correlation). Second, there is potential bias from seasonal differences, which I haven't accounted for.

One of the assumptions required for a linear regression estimated by ordinary least squares (OLS) to be valid is that the residuals (what's left after fitting the data) are independently and randomly distributed. In other words, there should not be a systematic relationship between any of the residuals, and for the usual inference their distribution should be approximately Gaussian. In the case of my simple regression, this isn't true. Using a Breusch-Godfrey test* for 12 lags, I find serial correlation at multiple lags. Bad.

*The better-known Durbin-Watson test only checks for serial correlation at a single lag.
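For readers curious about what the Breusch-Godfrey test actually does, here is a minimal pure-Python sketch. It is not the software used for this post, and it stops at the test statistic rather than a p-value. The steps are: fit the regression by OLS, regress the residuals on the original regressors plus p lagged residuals, and compute the LM statistic as the number of observations times the R2 of that auxiliary regression. Under the null of no serial correlation, this is approximately chi-square with p degrees of freedom. The 5% critical value for 12 lags is about 21.03. This sketch drops the first p observations instead of padding the missing lags with zeros, which is one of two common conventions. The simulated data are hypothetical, not Malaysian trade data.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """Regress y on the columns of X; return (coefficients, residuals, R-squared)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    ybar = sum(y) / len(y)
    tss = sum((yi - ybar) ** 2 for yi in y)
    return beta, resid, 1.0 - sum(e * e for e in resid) / tss

def breusch_godfrey(X, y, p):
    """LM statistic for serial correlation up to lag p (first p observations dropped)."""
    _, e, _ = ols(X, y)
    aux = [list(X[t]) + [e[t - j] for j in range(1, p + 1)] for t in range(p, len(y))]
    _, _, r2 = ols(aux, e[p:])
    return len(aux) * r2

# Hypothetical simulated data: y = 0.8 + 0.9 x + u, with u either white noise or AR(1).
random.seed(42)
n = 200
x = [random.gauss(0.0, 1.0) for _ in range(n)]
X = [[1.0, xi] for xi in x]  # constant plus one regressor

def simulate(rho):
    u, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + random.gauss(0.0, 1.0)
        u.append(prev)
    return [0.8 + 0.9 * xi + ui for xi, ui in zip(x, u)]

lm_white = breusch_godfrey(X, simulate(0.0), 12)
lm_ar1 = breusch_godfrey(X, simulate(0.8), 12)
print(round(lm_white, 1), round(lm_ar1, 1))  # compare each with about 21.03
```

With strongly autocorrelated errors, the statistic lands far above the critical value. With white-noise errors, it usually stays below it.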

How to resolve these issues? I’ve got three choices:

1. Forget about a structural model and use a stochastic model with an ARMA (Auto-Regressive, Moving Average) representation instead. In essence, I’d be modeling exports on past values of exports. This is great for trending series where structural change isn’t an issue, which isn’t the case with Malaysian exports at this time. So that’s out.

2. Tackle the serial correlation problem directly, again using ARMA terms in the regression. I’d still have the structural component here, making this a better choice under the current circumstances.

3. Adjust for the seasonal effect and see if the serial correlation (or any other problems) still exists. Here, I have another two choices – I can apply seasonal adjustment to both export and import series, or I can try modeling the seasonal effect directly.

So that’s four different models to choose from, just because I’ve encountered one little problem. In practice, trying out and evaluating all four models would be necessary. So here are the baseline model results:

Baseline Model


Ln(exports)= 0.79 + 0.94 Ln(imports(-1))
Point forecast: RM35968
Upper bound: RM42547
Lower bound: RM29389
R2: 0.85

And the results from the rest:
ARMA (1,1) Model


Ln(exports)= 4.68 + 1.00 AR(1) - 0.63 MA(1)
Point forecast: RM45370
Upper bound: RM52965
Lower bound: RM37774
R2: 0.90
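How you read the ARMA(1,1) output depends on your software's convention. The sketch below assumes an EViews-style specification in which the AR and MA terms describe the disturbance around the constant, i.e. y_t = c + u_t with u_t = rho*u_{t-1} + e_t + theta*e_{t-1}. That convention is an assumption, not something stated in the post. The inputs in the usage lines (last month's log exports and last month's forecast error) are also hypothetical. With rho at 1.00, the constant cancels out. The forecast is then just last month's value less 0.63 times last month's forecast error.

```python
import math

def arma11_forecast(c, rho, theta, last_log, last_shock, steps):
    """Forecast log exports `steps` months ahead, assuming (EViews-style)
    y_t = c + u_t and u_t = rho*u_{t-1} + e_t + theta*e_{t-1}."""
    forecasts = []
    # One step ahead: last month's shock still enters through the MA term.
    u = rho * (last_log - c) + theta * last_shock
    forecasts.append(c + u)
    # Further ahead, future shocks are unknown (expected zero), so only the AR part remains.
    for _ in range(steps - 1):
        u = rho * u
        forecasts.append(c + u)
    return forecasts

# Hypothetical inputs: last observed log exports and last month's one-step forecast error.
logs = arma11_forecast(c=4.68, rho=1.00, theta=-0.63, last_log=10.5, last_shock=0.1, steps=2)
levels = [math.exp(v) for v in logs]  # back to RM million
```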

Structural ARMA(1,0) Model


Ln(exports)= 0.37 + 0.98 Ln(imports(-1)) - 0.36 AR(1)
Point forecast: RM38070
Upper bound: RM44546
Lower bound: RM31515
R2: 0.87
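In the structural version, the AR(1) term adjusts the import-based forecast by a fraction of last month's residual. Because the coefficient is negative, a month in which exports came in above what imports implied pulls next month's forecast slightly down. The minimal sketch below rests on the same assumed EViews-style convention, with the AR term applied to the regression disturbance. The function name and argument values are illustrative only and are not the post's data.

```python
def structural_ar1_forecast(a, b, rho, log_exports_last, log_imports_last, log_imports_prev):
    """One-month-ahead forecast of log exports, assuming
    y_t = a + b*x_{t-1} + u_t with AR(1) errors u_t = rho*u_{t-1} + e_t."""
    # Last month's residual: actual log exports minus what the import term implied.
    last_resid = log_exports_last - (a + b * log_imports_prev)
    # Import-based forecast plus the carried-over share of last month's residual.
    return a + b * log_imports_last + rho * last_resid

# Hypothetical log values, for illustration only.
f = structural_ar1_forecast(0.37, 0.98, -0.36, 10.5, 10.4, 10.3)
```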

Seasonally Adjusted Model


Ln(exports*)= 0.38 + 0.98 Ln(imports*(-1))
Point forecast: RM39407
Upper bound: RM44264
Lower bound: RM34550
R2: 0.93

(*seasonally adjusted. Actual seasonally adjusted exports for February reached RM47045.)
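The post does not say which seasonal adjustment procedure was used. Official statistics typically rely on more elaborate programs such as the US Census Bureau's X-12-ARIMA. Purely as a rough illustration of the idea, here is the classical ratio-to-moving-average (multiplicative) method. It divides each month by a centred 12-month moving average, averages those ratios by calendar month to get seasonal factors, and divides the series by its factors. The function names and the test pattern are hypothetical.

```python
def seasonal_factors(y):
    """Monthly multiplicative seasonal factors by the classical ratio-to-moving-average
    method. y is a monthly series starting in January, at least two years long."""
    n = len(y)
    ratios = [[] for _ in range(12)]
    for t in range(6, n - 6):
        # Centred 2x12 moving average: half weight on the two end months.
        trend = (0.5 * y[t - 6] + sum(y[t - 5:t + 6]) + 0.5 * y[t + 6]) / 12.0
        ratios[t % 12].append(y[t] / trend)
    raw = [sum(r) / len(r) for r in ratios]
    mean = sum(raw) / 12.0
    return [f / mean for f in raw]  # normalise so the factors average to 1

def seasonally_adjust(y):
    """Divide each observation by its calendar month's seasonal factor."""
    factors = seasonal_factors(y)
    return [v / factors[t % 12] for t, v in enumerate(y)]

# Hypothetical series: constant level of 100 with a fixed monthly pattern (averages to 1).
pattern = [0.90, 0.85, 1.05, 0.95, 1.00, 1.00, 1.02, 1.05, 1.03, 1.02, 1.00, 1.13]
y = [100.0 * pattern[t % 12] for t in range(36)]
adjusted = seasonally_adjust(y)  # should come back flat at 100
```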

Seasonal Effect Model


Ln(exports)= 3.84 + 0.98 Ln(imports(-1)) - 0.00 D2 + 0.24 D3 + 0.03 D4 + 0.10 D5 + 0.10 D6 + 0.11 D7 + 0.15 D8 + 0.14 D9 + 0.12 D10 + 0.10 D11 + 0.17 D12
Point forecast: RM34686
Upper bound: RM40054
Lower bound: RM29318
R2: 0.82
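The dummy coding behind the seasonal effect model can be sketched as follows. January is the omitted base month, so its effect sits in the constant, and each D variable switches on only in its own month. The function names are hypothetical, and the coefficients are the rounded values reported above. Because the model is in logs, a coefficient of 0.24 on D3 means March's log exports sit 0.24 above January's for the same lagged imports, which is roughly 27% higher in levels.

```python
def month_dummies(month):
    """Return [D2, ..., D12] for a calendar month (1 = January, the base month)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [1 if month == m else 0 for m in range(2, 13)]

# Seasonal coefficients on D2..D12 as reported above (January is absorbed into the constant).
SEASONAL = [-0.00, 0.24, 0.03, 0.10, 0.10, 0.11, 0.15, 0.14, 0.12, 0.10, 0.17]

def seasonal_shift(month):
    """Amount added to Ln(exports) in this month, relative to January."""
    return sum(c * d for c, d in zip(SEASONAL, month_dummies(month)))
```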

So, which model to choose? Based just on the ability to predict February exports, the obvious candidate would be the structural ARMA model, which has the smallest error relative to the actual result.
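That comparison is simply the gap between each point forecast and the outcome. The sketch below uses the February figures quoted in this post. The seasonally adjusted model is scored against seasonally adjusted exports, since its forecast is on that basis.

```python
# February 2009 point forecasts and outcomes from this post (RM million).
ACTUAL = 39586.9      # actual exports
ACTUAL_SA = 47045.0   # actual seasonally adjusted exports
forecasts = {
    "Baseline": (35968.0, ACTUAL),
    "ARMA(1,1)": (45370.0, ACTUAL),
    "Structural ARMA(1,0)": (38070.0, ACTUAL),
    "Seasonally adjusted": (39407.0, ACTUAL_SA),
    "Seasonal effect": (34686.0, ACTUAL),
}

# Error = outcome minus forecast; positive means the model under-predicted.
errors = {name: actual - point for name, (point, actual) in forecasts.items()}
for name, err in sorted(errors.items(), key=lambda kv: abs(kv[1])):
    print(f"{name:22s} error {err:+9.1f}")
best = min(errors, key=lambda name: abs(errors[name]))
print("Smallest February error:", best)  # Structural ARMA(1,0)
```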

But what we want is not a model that minimizes error at one point in time, but one that does so across all points in time. In practice, this means looking at goodness-of-fit measures (R2) or information criteria (such as the Bayesian Information Criterion). On that basis, the best model is the seasonally adjusted model, despite the massive RM7.6b error for February. I’d also consider the seasonal effect model, because it came in second in terms of information criteria.
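For least-squares models, one common form of the Bayesian (Schwarz) Information Criterion is n ln(RSS/n) + k ln(n), where lower is better. Regression packages often report it scaled by 1/n or with constants added, which does not change the ranking. The sketch below illustrates the trade-off: each extra coefficient has to buy enough reduction in the residual sum of squares to pay its ln(n) penalty. The fits are hypothetical numbers, not the post's regressions. Strictly, such criteria are only comparable across models fitted to the same dependent variable over the same sample.

```python
import math

def bic(rss, n, k):
    """Schwarz/Bayesian information criterion for a least-squares fit with n observations,
    k estimated coefficients and residual sum of squares rss (constants dropped)."""
    return n * math.log(rss / n) + k * math.log(n)

def rank_by_bic(fits):
    """fits maps a model name to (rss, n, k); return names from best to worst."""
    return sorted(fits, key=lambda name: bic(*fits[name]))

# Hypothetical fits: the richer model fits slightly better but pays for 11 extra coefficients.
fits = {"two coefficients": (1.50, 95, 2), "thirteen coefficients": (1.40, 95, 13)}
print(rank_by_bic(fits))
```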

What do the models say about March exports? Let’s see which one does best - watch this space next month:

Baseline: RM33832; Range: RM40.0-27.6b

ARMA: RM44618; Range: RM51.5-37.7b

Structural ARMA: RM33712; Range: RM39.9-27.5b

Seasonally adjusted: RM39792*; Range: RM44.7*-34.9*b

Seasonal Effect: RM43861; Range: RM49.4-38.3b

(*seasonally adjusted)

6 comments:

  1. Hi Hishamh,

    Great post. A rare chance to see how an economist does forecasts.

    Hope you can shed some light on:
    1. For the seasonally adjusted forecast:
    What would the actual March export forecast be if we take away the seasonal adjustment?

    2. What do the terms "f.o.b." (after exports) and "c.i.f." (after imports) mean?

    3. For the seasonal effect model, does D mean the difference between that month's imports and January's imports? The coefficients of D2... D12 -> lower imports in Jan, Feb, wonder why?

    4. Part of imports is for domestic consumption, and part of exports (e.g. local commodities for export) doesn't come from imports, and these could vary. Any way to model them in?

  2. Thanks WY.

    1. That would be the baseline model, based on my original regression - RM33832m

    2. f.o.b. means free on board, c.i.f. means cost, insurance and freight. These refer to the way trade values are measured, i.e. at the point of departure (f.o.b.) or entry (c.i.f.). Essentially, these signify the services element in trade (i.e. not just the goods values). Gross trade values differ from national accounts values because under the national accounts, the services element is stripped away and included under private consumption.

    3. The 11 D variables represent seasonal dummy variables, and January is represented by the constant. The dummy variables take a value of 0 or 1, so your intuition is correct.

    As to why: the shopping season and the holiday season. Early in the year, we have lots of festivals, which reduce working hours and production, and hence demand for imported inputs. Production builds up during the year to meet demand for year-end shopping in advanced economies.

    4. Hey, then it wouldn't be simple! Ideally, I should start from micro-foundations: consumer demand function, firm production function, input costs, the real exchange rate, terms of trade, cost of capital, external demand functions, elasticities etc. etc. ad nauseam.

    In practice, I'm not sure it really matters for Malaysian trade. Import end-use breaks down to approximately 10% consumer goods, 15% capital goods and 75% intermediate goods. Local content is pretty small except for the natural resource based industries (oil, gas, wood for example), but even these sectors import capital goods for production. So modeling exports based solely on imports makes practical sense.

    Another issue with adding more variables is that there is always an underlying assumption that the relationships between the variables are relatively stable, which is questionable under the current circumstances.

  3. Thx for your insightful reply.. Learnt a lot. Will try to digest..

    I forgot to ask this one:
    1. How do we get the lower bound and upper bound?

  4. Each regression has a calculated standard error which can be used to calculate the interval forecast. In the Wiki article, they use the formula:

    upper bound: x + 1.96*s.e.
    lower bound: x - 1.96*s.e.

    which covers 95% of the area under the distribution curve, assuming a normal distribution (x is the point forecast). I tend to assume a t-distribution instead, which has slightly fatter tails; for a sample of this size its critical value is close to 2, which is also a little easier to calculate with:

    upper bound: x + 2*s.e.
    lower bound: x - 2*s.e.

    Call me lazy.
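The practical difference between the two rules is small. This sketch uses Python's standard-library `statistics.NormalDist` to check how much probability a normal distribution places within ±1.96 and ±2 standard errors. It does not compute t-distribution values, which the standard library does not provide.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(z):
    """Probability that a normal variable lies within z standard errors of its mean."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(round(std_normal.inv_cdf(0.975), 3))                # 1.96
print(round(coverage(1.96), 4), round(coverage(2.0), 4))  # 0.95 0.9545
```

So the "2 standard errors" shortcut gives an interval with about 95.4% coverage under normality, marginally wider than the textbook 95%.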

  5. Hey Hishamh,
    I'm currently an intern in a research department and I was given the task of forecasting trade in Malaysia. My knowledge of forecasting is rather shallow as I have only just finished my 2nd year at uni. It would be great if you could shed light on how you did your forecast and which program you used. And would it be plausible to forecast trade levels on a quarterly basis? Your advice is greatly appreciated. Thank you.

  6. Hi Nick,

    I based these models on an empirically proven prior: Malaysian imports and exports are cointegrated. That means there's a long-term relationship between imports and exports, which means it's possible to forecast one with the other. But the approach I've taken is necessarily limited; I haven't, for instance, incorporated a way to forecast imports as well, which would obviously allow for longer forecasts of both.

    But I take from your post that you're actually interested in doing both? That's another kettle of fish, and not something that can be easily discussed through post comments.

    Why don't you email me instead? Email address is in my profile. I'd need to know what kind of forecast duration you've been asked to do, as well as how much time you have to work it out.
