*[I’ve been working on this for over a week now, hope you enjoy it. Since this is a very long post, I’ve split it into three parts. This is Part II]*

__Analysis__

We should start by formalising the correlation into a regression. Using an unbalanced panel estimation with fixed effects on the sample data above (translation: we do a regression that covers all countries simultaneously over time), we get the following results (standard errors in parentheses):

Ln(GDP) = 8.49 (0.06) + 0.26*Ln(CPI) (0.04)

What this says is that a 1% increase in the CPI score is associated with a 0.26% increase in the level of income.

So if you start from a CPI score of 5 and manage to raise it to 6 (an increase of 20%), your associated income level should be on average about 5% higher (the 95% confidence range would be between 3% and 7%). If you’re going from a score of 2 (e.g. Cambodia, Laos) to 7 (US, France), the same rule of thumb puts your income level between 45% and 85% higher (average: 65%), though for a jump that large the linear approximation overstates what the log-log equation implies exactly (closer to 40%).
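As a rough check on the arithmetic above, here is a minimal sketch that applies the reported elasticity (0.26, standard error 0.04) to a change in the CPI score. The function names are made up for illustration, and the range uses a plus-or-minus two standard errors rule of thumb, which is my assumption about how the interval in the text was derived. It also computes the exact log-log figure for comparison.

```python
BETA = 0.26  # reported elasticity of GDP with respect to the CPI score
SE = 0.04    # reported standard error of that elasticity


def approx_effect(cpi_from, cpi_to, beta=BETA):
    """Rule of thumb: elasticity times the percentage change in the CPI score."""
    pct_change = (cpi_to - cpi_from) / cpi_from
    return beta * pct_change


def approx_range(cpi_from, cpi_to, beta=BETA, se=SE, k=2.0):
    """Rough 95% range, assuming beta +/- k standard errors (k=2 is an assumption)."""
    low = approx_effect(cpi_from, cpi_to, beta - k * se)
    high = approx_effect(cpi_from, cpi_to, beta + k * se)
    return low, high


def exact_effect(cpi_from, cpi_to, beta=BETA):
    """Exact implication of a log-log equation: (to / from) ** beta - 1."""
    return (cpi_to / cpi_from) ** beta - 1


for start, end in [(5, 6), (2, 7)]:
    low, high = approx_range(start, end)
    print(f"CPI {start} -> {end}: "
          f"approx {approx_effect(start, end):.1%} "
          f"(range {low:.1%} to {high:.1%}), "
          f"exact {exact_effect(start, end):.1%}")
```

For the small move from 5 to 6 the two calculations nearly coincide; for the move from 2 to 7 the exact figure is well below the rule-of-thumb one, which only strengthens the point that the implied income gain is modest.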

Would that kind of increase be sufficient to qualify as a high income nation? I’m not sure but I don’t think so, certainly not based on the examples I quoted. What about if we look at levels alone?

GDP = 8803 (894) + 965*CPI (205)

Since we’re dealing with current GDP numbers, we can evaluate this against the World Bank’s current threshold for high income, which happens to be USD12,195.

What this means is that all you need is a CPI score of about 3.5 (Thailand; El Salvador) to cross over into becoming a high income nation. Ahem.
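The break-even score can be backed out directly from the levels equation. A small sketch, using only the coefficients and threshold quoted above (the function names are hypothetical):

```python
INTERCEPT = 8803.0   # reported intercept (USD)
SLOPE = 965.0        # reported USD per CPI point
THRESHOLD = 12195.0  # World Bank high-income threshold quoted in the post (USD)


def predicted_gdp(cpi, intercept=INTERCEPT, slope=SLOPE):
    """GDP per capita implied by the levels regression for a given CPI score."""
    return intercept + slope * cpi


def cpi_needed(threshold, intercept=INTERCEPT, slope=SLOPE):
    """CPI score at which the fitted line crosses the income threshold."""
    return (threshold - intercept) / slope


print(f"CPI needed for high income: {cpi_needed(THRESHOLD):.2f}")
print(f"Implied GDP at CPI 0:  {predicted_gdp(0):,.0f}")
print(f"Implied GDP at CPI 10: {predicted_gdp(10):,.0f}")
```

The same line also implies an income of USD8,803 at a score of zero and only USD18,453 at a perfect ten, which is another hint that the fitted relationship shouldn't be taken literally.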

That’s obviously not true, so we need to look into this a little deeper – the correlation, such as it is, isn’t really helpful at all, and can’t be relied upon to give a true picture of the relationship between corruption and income.

What about a non-linear relationship (log GDP against actual CPI score)? I tried it, and it’s not much different from the first attempt above (an increase of 1 point in the CPI score raises GDP per capita by 6%-8%; again not terribly convincing).

So back to first principles – what is the CPI? It’s a continuous (not discrete) scale that ranges from zero to ten. Looking at the individual country scores and testing for unit roots suggests the CPI scores are mainly – though not all – I(0) variables i.e. the CPI scores are stationary variables. On the other hand, GDP per capita numbers are very obviously I(1) variables i.e. non-stationary variables.

If you want to know the difference, here’s a sample of the data for Australia:

In the first graph, the CPI numbers mainly fluctuate between 8.6 and 8.8, with the exception of a couple of years. That’s what a stationary variable looks like – it fluctuates around a central point through time. The GDP data, however, is continuously rising across time i.e. it’s non-stationary.
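To make the distinction concrete, here is a small simulation using only Python's standard library. None of this is the actual Australian data: the series, parameter values and function names are all invented for illustration. One series is pulled back towards a fixed mean (stationary), the other accumulates its shocks (a random walk with drift).

```python
import random


def simulate_stationary(n, mean=8.7, phi=0.5, sigma=0.1, seed=0):
    """AR(1) series that keeps getting pulled back towards `mean`."""
    rng = random.Random(seed)
    x, out = mean, []
    for _ in range(n):
        x = mean + phi * (x - mean) + rng.gauss(0, sigma)
        out.append(x)
    return out


def simulate_random_walk(n, start=10.0, drift=0.03, sigma=0.02, seed=1):
    """Random walk with drift: every shock is permanent, so the level wanders off."""
    rng = random.Random(seed)
    x, out = start, []
    for _ in range(n):
        x = x + drift + rng.gauss(0, sigma)
        out.append(x)
    return out


def half_means(series):
    """Mean of the first half and of the second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return sum(first) / len(first), sum(second) / len(second)


cpi_like = simulate_stationary(200)
gdp_like = simulate_random_walk(200)
print("stationary series, half means: ", half_means(cpi_like))
print("random-walk series, half means:", half_means(gdp_like))
```

The stationary series has much the same average in both halves of the sample, while the random walk's average shifts substantially. A formal unit root test is the proper tool, but comparing sub-sample means gives the intuition.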

There are exceptions; for a subset of countries, the CPI is generally rising with GDP per capita, and for another subset, we have the opposite – the CPI score is falling but GDP per capita is rising. But on the whole, the general case is of a fairly stable CPI score with a continuously rising GDP per capita.

And this gives a partial solution to the problem – **the CPI score, as constructed, cannot have a long term causal relationship with GDP per capita. You need an absolute, not relative, equivalent measure to properly define the relationship between corruption and income.** Changes in stationary I(0) variables cannot “explain” long term changes in I(1) variables; you need to have variables of the same order of integration.

But all hope is not lost – if you can’t make the CPI data non-stationary, it’s fairly trivial to transform GDP per capita data into stationary data by taking the difference in values between each period. In other words, it’s theoretically valid to examine the relationship between the CPI score and real GDP **growth**.
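The transformation itself is a one-liner. A minimal sketch, assuming growth is measured as the first difference of log levels (a common convention, though not necessarily the exact one used here), applied to a made-up series that grows 4.5% a year:

```python
import math


def log_growth(levels):
    """First difference of log levels, roughly the period-on-period growth rate."""
    return [math.log(b) - math.log(a) for a, b in zip(levels, levels[1:])]


# Hypothetical GDP per capita series, growing 4.5% a year from 1,000
gdp = [1000 * 1.045 ** t for t in range(6)]
growth = log_growth(gdp)

print("levels:", [round(v) for v in gdp])
print("growth:", [round(g, 4) for g in growth])
```

The levels trend upward without limit, but the differenced series stays put, which is what makes it comparable with a bounded score like the CPI.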

So, starting all over again, here’s the same dataset, this time plotting the CPI score on the vertical axis and real GDP per capita growth on the horizontal axis:

And one look is all you need – **there is no strong relationship between corruption and economic growth**. Changes in the level of corruption don’t appear to be associated with changes in the rate of growth. There might be a relationship between corruption and the variance of growth (wider scatter at low CPI scores), but not the level of growth itself.

More formally (standard errors in parentheses):

GDP growth = 0.045 (0.01) + 0.001*CPI (0.002)

The intercept (0.045) is statistically significant, but the coefficient for the CPI (0.001) is not statistically different from zero – rather strongly so (p-value=0.6275).
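For anyone wanting to see where the significance verdicts come from, here is a back-of-envelope sketch using the rounded coefficients and standard errors from the equation. It uses a normal approximation to the t distribution, which is my simplification; the function name is hypothetical.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for H0: coef == 0, using a normal approximation."""
    t_stat = coef / se
    return math.erfc(abs(t_stat) / math.sqrt(2))


p_intercept = two_sided_p(0.045, 0.01)   # t-statistic of 4.5
p_cpi = two_sided_p(0.001, 0.002)        # t-statistic of 0.5

print(f"intercept p-value: {p_intercept:.6f}")
print(f"CPI coefficient p-value: {p_cpi:.4f}")
```

With the rounded figures the CPI coefficient's p-value comes out at roughly 0.62, in the same neighbourhood as the reported 0.6275; the gap is presumably down to rounding in the published coefficients and the exact distribution used.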

[BTW, we’ve just discovered the trend estimate for world real GDP per capita growth over the last 15 years (0.045 = 4.5%).]

Does GDP growth affect corruption? Not hardly:

CPI = 4.33 (0.15) + 0.11*GDP growth (0.23)

Same story as above – the intercept is statistically significant, but the coefficient for GDP growth is not (p-value again at 0.6275).
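The identical p-value in both directions is not a coincidence, at least in the simple two-variable case: the t-statistic on the slope depends only on the correlation between the two series and the sample size, so it is the same whichever variable goes on the left. The sketch below shows this with plain OLS on made-up data; the panel estimation used in this post is more involved, so treat this as intuition rather than a replication.

```python
import math
import random


def slope_t_stat(x, y):
    """t-statistic of the slope in a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope / se


rng = random.Random(42)
cpi = [rng.uniform(1, 10) for _ in range(100)]        # hypothetical CPI scores
growth = [rng.gauss(0.045, 0.03) for _ in range(100)]  # hypothetical growth rates

t_growth_on_cpi = slope_t_stat(cpi, growth)
t_cpi_on_growth = slope_t_stat(growth, cpi)
print(t_growth_on_cpi, t_cpi_on_growth)
```

The slopes and standard errors differ between the two regressions, but their ratio does not, so the test of "no relationship" gives the same answer either way.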

The obvious conclusion is that the correlation between the CPI score and real GDP per capita is spurious – they’re both being driven by (an)other unidentified process(es).

I’ll admit that finding surprised me – I expected to find a relationship, even if a very weak one. What could be the possible reasons behind this?

Be prepared to be called a stooge of UMNO.

BTW, thanks for your three posts... took me several readings to get over the statistics jargon.

:) You're welcome

Hafiz,

On the first point, you can still have a relationship between variables of different orders of integration, it's just that the relationship is necessarily short term. For example, in a VECM with I(1) variables, I(0) variables can enter through the error-correction term or as endogenous variables, but should not be included in any co-integrating relationships. So you could in principle look for an effect on the dynamics of a system, but not on its underlying structure.

On the second point, I absolutely agree - corruption is almost bound to cause inefficiencies and inequities. It's just that corruption doesn't seem to be much of a factor on an aggregate macro level in terms of overall income levels or growth. Of course, we're missing the counterfactual to test against, but that's why a panel approach is useful.

Oh, BTW, thanks for the links, interesting reading.
