To wit: Is there a long-term causal relationship between the age profile of a country and its potential to generate income? That's a hard (and complicated) question to answer, so in the best traditions of economics, I'm going to grossly simplify: Is there a statistical relationship between GDP per capita and either the median age of a population or its dependency ratio?

Changing the question sidesteps the problem of modelling the full distribution of age within a population, by focusing instead on simple numeric measures of population structure. The obvious pitfall is getting it all wrong if those measures are not actually related to income. Note that I'm also completely avoiding the issue of causality.

My underlying hypothesis here is that the higher the median age, the greater the potential for generating income, while the dependency ratio should have the opposite effect; I'm also assuming that the relationship is linear. In real life, because of the different impact of the old age cohorts (high savings and consumption) compared to children (who must be fully supported by the working population), I suspect the relationship of income with age and dependency is actually non-linear, particularly for the dependency ratio. If you recall the discussion in the last post, at the 1st and 4th stages of the demographic transition the population is stable, which means there won't be a demographic impact on incomes at those stages. However, there are very few countries (Japan comes to mind) with median ages high enough to test this, so I'm pretty sure I'm safe assuming linearity in stages 2 and 3.

Moving on to the question of methodology, there are a number of approaches to answering the income-age/income-dependency question:

- The single country approach - simplest and least demanding to do, using time series data from a single country only;
- The cross-sectional approach - combines data from a number of countries for a single year, which gives a more general result but ignores the likelihood that the sought-for relationship changes across time, as well as country-specific effects (government policies, availability of arable land etc);
- The pooled/panel approach - combines both cross-sectional and time series data. Great if you can manage it, but has horrendous data requirements depending on how many countries are included in the sample (preferably all).

I'm going to cover all three approaches (hey, I'm ambitious), and hopefully we'll come up with an answer that makes sense, as well as provide some forecasts as to if and when Malaysia may reach high-income status. Since the material is pretty extensive, I'm going to split this into three quick posts, rather than just the one I had originally planned, to make reading a little easier.

**Model I**

Here's the dataset I'm working with (details at the end of the post):

Note the different trends of the youth and old age dependency ratios (both relatively trend-stationary), which result in a non-stationary total dependency ratio. Also, I have high correlations between most of the variables, so there is the potential for multicollinearity problems (which testing confirmed between median age and the old age ratio):

| | GDP_MYS | AGE_MYS | RATIO_T_MYS | RATIO_O_MYS | RATIO_Y_MYS |
|---|---|---|---|---|---|
| GDP_MYS | 1.00 | 0.98 | -0.95 | 0.98 | -0.96 |
| AGE_MYS | 0.98 | 1.00 | -0.74 | 0.99 | -0.97 |
| RATIO_T_MYS | -0.95 | -0.74 | 1.00 | -0.63 | 0.88 |
| RATIO_O_MYS | 0.98 | 0.99 | -0.63 | 1.00 | -0.92 |
| RATIO_Y_MYS | -0.96 | -0.97 | 0.88 | -0.92 | 1.00 |
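To give a feel for how serious a 0.99 correlation is, here is a rough sketch using the variance inflation factor (VIF). This is my own illustration, not necessarily the test used for this post: it assumes a regression with exactly two explanatory variables, in which case the VIF is simply 1/(1 - r²). The function name is made up for the example, and the correlations are read straight off the table.

```python
def vif_two_regressors(r: float) -> float:
    """VIF for a model with exactly two regressors whose correlation is r."""
    return 1.0 / (1.0 - r ** 2)


# Pairwise correlations with median age, taken from the table above.
corr_with_age = {
    "RATIO_T_MYS": -0.74,
    "RATIO_O_MYS": 0.99,
    "RATIO_Y_MYS": -0.97,
}

for name, r in corr_with_age.items():
    print(f"AGE_MYS vs {name}: r = {r:+.2f}, VIF = {vif_two_regressors(r):.1f}")
```

A common rule of thumb treats a VIF above 10 as a warning sign. Pairing median age with the old age ratio gives a VIF of about 50, so it is no surprise that combination causes trouble.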

Next, I ran a series of regressions - GDP against all the variables, singly or in combination. Here are the results:

- LOG(GDP_MYS) = 6.86*LOG(AGE_MYS) - 12.42 + [AR(1)=0.47]
- LOG(GDP_MYS) = -3.60*LOG(RATIO_T_MYS) + 7.55 + [AR(1)=0.72]
- LOG(GDP_MYS) = -2.88*LOG(RATIO_Y_MYS) + 7.54 + [AR(1)=0.71]
- LOG(GDP_MYS) = 2.80*LOG(RATIO_O_MYS) + 16.65 + [AR(1)=0.71]
- LOG(GDP_MYS) = 10.78*LOG(AGE_MYS) + 2.53*LOG(RATIO_T_MYS) - 23.62
- LOG(GDP_MYS) = 8.74*LOG(AGE_MYS) + 3.28*LOG(RATIO_Y_MYS) + 2.14*LOG(RATIO_O_MYS) - 10.77

All coefficients are statistically significant at the 5% level, with the regressions showing very high R-squared figures, and diagnostics that (mainly) work out OK. The AR(1) terms, where present, are there to correct for serial correlation.
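For anyone who wants to see the mechanics, the single-variable equations are just a straight-line fit on the logs of both series. The sketch below is a bare-bones version under loud assumptions: it leaves out the AR(1) correction entirely, the helper `fit_loglog` is a name I made up, and the data is synthetic (built to follow equation 1 exactly), not the actual Malaysian series.

```python
import math


def fit_loglog(x, y):
    """Ordinary least squares fit of log(y) = a + b*log(x); returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    b = sxy / sxx            # slope = elasticity of y with respect to x
    a = mean_y - b * mean_x  # intercept
    return a, b


# Synthetic, made-up data: 18 "years" of median age, with GDP generated
# from log(GDP) = 6.86*log(AGE) - 12.42 and no noise. NOT the post's dataset.
ages = [20.0 + 0.3 * t for t in range(18)]
gdp = [math.exp(-12.42) * a ** 6.86 for a in ages]

intercept, slope = fit_loglog(ages, gdp)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Because the synthetic series has no noise, the fit simply hands back the coefficients it was built from. With real data you would also get residuals, and it is serial correlation in those residuals that the AR(1) terms are meant to mop up.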

I know this is terribly wonkish, and not at all easy to follow if you've never studied statistics, but bear with me. To read the equations, **simply treat the number before each variable on the right-hand side as the percentage change in GDP per capita** for every 1% change in that particular variable. So from equation 1, a +1% rise in the median age results in GDP per capita rising +6.86%. If the number is negative, then the relationship between that variable and GDP is also negative.
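One caveat on that reading: the "coefficient equals percentage change" shortcut is an approximation that works well for small changes. In a log-log equation the exact effect compounds, so bigger moves in the variable give bigger gaps from the shortcut. A quick sketch (the function name is mine, the coefficients are from equations 1 and 2):

```python
def pct_change_in_gdp(elasticity: float, pct_change_in_x: float) -> float:
    """Exact % change in y implied by log(y) = a + elasticity*log(x)."""
    return ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0) * 100.0


# Equation 1: median age, coefficient 6.86
print(f"{pct_change_in_gdp(6.86, 1.0):.2f}")    # shortcut says 6.86
print(f"{pct_change_in_gdp(6.86, 10.0):.2f}")   # shortcut says 68.6

# Equation 2: total dependency ratio, coefficient -3.60
print(f"{pct_change_in_gdp(-3.60, 1.0):.2f}")   # shortcut says -3.60
```

For a 1% change the exact figure stays within a fraction of a percentage point of the shortcut, which is why the simple reading is fine for the discussion here. For a 10% rise in median age the exact figure is well above 68.6%.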

More generally, the results tend to bear out my intuition:

- An **increase** in the median age is associated with an **increase** in GDP per capita
- A **reduction** in the total dependency ratio (both youth and old age cohorts) is associated with an **increase** in GDP per capita
- A **reduction** in the youth dependency ratio is associated with an **increase** in GDP per capita
- An **increase** in the old age dependency ratio is associated with an **increase** in GDP per capita
- The last two results show the wrong signs for the dependency ratio coefficients, which is an artifact of the multicollinearity problem

The forecasts generated by the regressions are summarised in the chart below:

...which is an awfully wide range of estimates. Model selection criteria point squarely at equation 6, which just so happens to generate the highest forecast GDP per capita - tripling GDP per capita from 2010 to 2020. That doesn't seem plausible - although I should point out I'm forecasting 40 years forward from just 18 years of realised data.

Of the single variable regressions, the next best is equation 4 (GDP with the old age ratio), followed by equation 1 (GDP with median age). Both have GDP per capita doubling by 2020, although the forecasts diverge substantially after that.

Will that be enough for Malaysia to achieve high-income status? It will be close. The standard by which income is judged is the World Bank's classification, which converts local currency nominal GNI per capita to USD using the Atlas method. As of 2008, Malaysia's GNI per capita is around USD6,970, about 41.5% below the high-income threshold of USD11,906. Assuming 3% inflation per annum, that means by 2020 we have to hit GNI per capita of about USD16,975. In PPP-corrected terms (assuming price levels increase by the same ratio), that's about 34,270 in current international dollars, or ever so slightly above the 2020 forecast of equation 4 (34,187) and not too far from equation 1 (31,231).

In other words, it will be a close-run thing. Given the uncertainty built into the forecasts and my assumptions, we could potentially pass the high-income barrier before 2020, and equally we might pass it well after. It really depends.
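The threshold arithmetic above is easy to check for yourself. The sketch below uses only the figures quoted in the post (2008 GNI per capita, the 2008 threshold, 3% inflation) and my assumption that the threshold simply scales with inflation. The last line, the implied annual growth rate, is a number I've derived from those inputs rather than one taken from any forecast, and it ignores exchange rate movements under the Atlas method.

```python
threshold_2008 = 11_906   # World Bank high-income threshold, USD (from the post)
gni_2008 = 6_970          # Malaysia GNI per capita, USD (from the post)
inflation = 0.03          # assumed annual inflation, as in the post
years = 2020 - 2008

# How far below the threshold Malaysia was in 2008
gap = 1 - gni_2008 / threshold_2008

# Threshold in 2020 if it just rises with inflation
threshold_2020 = threshold_2008 * (1 + inflation) ** years

# Nominal USD growth per year needed to get from 2008 GNI to that threshold
required_growth = (threshold_2020 / gni_2008) ** (1 / years) - 1

print(f"gap in 2008: {gap:.1%}")
print(f"threshold in 2020: USD {threshold_2020:,.0f}")
print(f"required nominal growth: {required_growth:.1%} per year")
```

The first two lines reproduce the 41.5% gap and the USD16,975 target. The third works out to a little under 8% a year in nominal USD terms, which gives a sense of just how close-run this is.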

On the other hand, it also seems pretty clear that Malaysia's demographic transition will play a role in accelerating the process.

**Technical Notes:**

1. GDP data from the IMF World Economic Outlook Database (April 2009)

2. Population estimates from the US Census Bureau International Data Base

The added population has to be more productive than the current cohort. Otherwise, we will just be averaging downwards.

Looking big population but very poor countries.....

Hi Poglet,

Actually, you're missing the point I'm trying to make. In the demographic transition phase of development, the ratio of the working population to the total population increases as lifespans grow longer and birth rates fall, a situation I'm proxying with an increase in the median age.

That means even if worker productivity stays the same, the total productivity of the country increases as there are more workers to dependents - i.e. the productivity of the population as a whole actually increases, even if individual productivity stays pat.

Poor countries with large populations tend to have high dependency ratios - high birth rates but also high death rates and shorter lifespans, leading to low median ages.

That's not the case with Malaysia. Our birth rates have fallen drastically in the last decade, while lifespans have gotten longer...the exact conditions for the demographic transition phase.