VoxEU Column Development

Your new composite index has arrived: Please handle with care

Policymakers and commentators are constantly looking for new ways to measure development. This column warns against embracing new composite indices with little guidance from economic or other theories. It provides a critical overview of the strengths and weaknesses of using such “mashup” indices of development.

A host of indicators are used to track development. The World Bank’s annual World Development Indicators presents hundreds of such indicators (World Bank, 2009). The United Nations’ Millennium Development Goals are defined using a long list of indicators.

Faced with so many indicators – a “large and eclectic dashboard,” as the Stiglitz report nicely puts it (Stiglitz et al. 2009) – there is an understandable desire to form a single composite index. For some of the composite indices found in practice, economic theory provides useful clues as to how the index should be constructed (GDP, for example). This is not the case for another type of composite index that is becoming popular.

For these, neither the list of underlying data nor the aggregation technique is informed by theory or practice. The maker of the composite indicator has free rein and is largely unconstrained by economic or other theories intended to inform measurement practice. Borrowing from web jargon, I shall call these “mashup indices”.

Probably the most famous example is the Human Development Index (HDI), which is published each year in the United Nations Development Programme’s Human Development Report. The HDI is a composite of:

  • life expectancy,
  • schooling (literacy and enrolment rates), and
  • log GDP per capita at purchasing power parity.

The HDI adds up attainments in these three dimensions, with equal weight (one-third each).
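To make the aggregation concrete, here is a minimal sketch of an equal-weighted index of this kind. The column does not give the rescaling rules, so the goalposts, the two-thirds weight on literacy within schooling, and the function names below are all assumptions chosen for illustration; this is not the official formula.

```python
import math

# Goalposts (min, max) used to rescale each component to the 0-1 range.
# ASSUMPTION: illustrative values, not taken from the column.
LIFE_EXPECTANCY_BOUNDS = (25.0, 85.0)   # years
GDP_BOUNDS = (100.0, 40_000.0)          # PPP dollars per capita


def rescale(value, lower, upper):
    """Linearly map a value onto [0, 1] given its goalposts."""
    return (value - lower) / (upper - lower)


def hdi_sketch(life_expectancy, literacy_rate, enrolment_rate, gdp_per_capita):
    """Equal-weighted sum of three dimension indices (hypothetical helper)."""
    health = rescale(life_expectancy, *LIFE_EXPECTANCY_BOUNDS)
    # ASSUMPTION: literacy carries two-thirds of the schooling index.
    schooling = (2 / 3) * (literacy_rate / 100) + (1 / 3) * (enrolment_rate / 100)
    # Income enters in logs, so each extra dollar counts for less as income rises.
    income = rescale(
        math.log(gdp_per_capita),
        math.log(GDP_BOUNDS[0]),
        math.log(GDP_BOUNDS[1]),
    )
    # One-third weight on each dimension.
    return (health + schooling + income) / 3
```

Every choice in this sketch (the goalposts, the log transform, the equal weights) shapes the resulting country ranking, which is why the construction deserves scrutiny.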

Another is the Multidimensional Poverty Index (MPI), developed by Alkire and Santos (2010) in work done for the 2010 edition of the Human Development Report. The authors choose 10 components:

  • two for health (malnutrition and child mortality),
  • two for education (years of schooling and school enrolment), and
  • six to capture “living standards” (including both access to services and proxies for household wealth).

The three headings – health, education, and living standards – are weighted equally. Poverty is measured separately in each of these 10 dimensions.
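The nested weighting can be sketched as follows. The column says only that the three headings are weighted equally; splitting each heading's weight equally among its indicators is an assumption here, and the six living-standards indicator names are illustrative labels that the column does not list.

```python
from fractions import Fraction

HEADINGS = {
    "health": ["malnutrition", "child_mortality"],
    "education": ["years_of_schooling", "school_enrolment"],
    # ASSUMPTION: illustrative names; the column only says there are six.
    "living_standards": [
        "electricity", "sanitation", "drinking_water",
        "flooring", "cooking_fuel", "assets",
    ],
}


def indicator_weights(headings):
    """Give each heading an equal share, split equally among its indicators."""
    heading_weight = Fraction(1, len(headings))
    return {
        indicator: heading_weight / len(indicators)
        for indicators in headings.values()
        for indicator in indicators
    }


def deprivation_score(deprived, weights):
    """Weighted share of indicators in which a household is deprived."""
    return sum(weights[indicator] for indicator in deprived)


WEIGHTS = indicator_weights(HEADINGS)
```

Under these assumptions each health or education indicator carries a weight of 1/6 and each living-standards indicator 1/18, so a single health deprivation counts for as much as three living-standards deprivations. That is a trade-off built into the index whether or not it is stated.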

Mashups have been devised for other dimensions of development, including governance (notably the Worldwide Governance Indicators), private enterprise regulation (most prominently, the Ease of Doing Business Index), and the environment (such as the Environmental Performance Index).

Probably the most ambitious example was published by Newsweek magazine in August 2010. It tries to identify the “World’s Best Countries”, using a composite of many indicators (many of them already mashup indices) assigned to five groupings: education, health, quality of life, economic competitiveness, and political environment.

The country rankings based on these and other mashup indices often attract media attention. People are naturally keen to see where their country stands. The details of how the composite index was formed, however, rarely get much critical scrutiny. The web-based publications do not (as a rule) comply with prevailing scholarly standards for documenting and defending a new measure.

For and against mashup indices

As data sources become more open and technology develops, creative new mashups can be expected. It is a good time then to take stock of the concerns with existing indices, in the hope of doing better in the future. In a recent working paper (Ravallion 2010), I have tried to provide a critical assessment of the strengths and weaknesses of existing mashup indices of development.

There can be gains in forming a mashup index, often stemming from inadequacies of prevailing indices. Composite indices are often trying to attach a number to an important, but unobserved, concept, for which prevailing theories and measurement practices offer little guidance. And there are clear attractions to finding a way of collapsing a (potentially) large number of dimensions into one.

Yet the current enthusiasm for new mashup indices needs to be balanced by clearer warnings for, and more critical scrutiny from, users. Some past mashup indices do not stand up well to such scrutiny. The appendix offers some specific suggestions for a “handle with care” sign for these new composite indices.

While there is invariably a gap between the theoretical ideal and practical measurement, for past mashup indices the gap is huge. Greater clarity is needed on what exactly is being measured. And more attention needs to be given to the trade-offs embodied in the index. In most cases the trade-off is not even identified in the most relevant space for users to judge, and in cases where it can be derived from the data available it has been found to be questionable – implying, for example, unacceptably low valuations of life in poor countries. Indeed, the value of life implicit in the HDI ranges from a remarkably low level in the poorest countries to a very high level in rich countries (Ravallion 1997).
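A minimal sketch of how such an implicit valuation can be backed out of an additive index with a linear life-expectancy term and a log-income term. The goalposts are assumptions for illustration, and the calculation is a first-order approximation; it is not necessarily the exact method used in Ravallion (1997).

```python
import math

# ASSUMPTION: illustrative goalposts, not taken from the column.
LIFE_SPAN = 85.0 - 25.0                    # range of life expectancy, in years
LOG_INCOME_SPAN = math.log(40_000 / 100)   # range of log income per capita


def income_value_of_extra_year(gdp_per_capita):
    """Income per capita that offsets one year of life expectancy.

    First-order trade-off holding the index constant: the ratio of the
    index's marginal response to life expectancy and to income. Equal
    weights on the two dimensions cancel out of the ratio.
    """
    marginal_index_life = 1 / LIFE_SPAN
    marginal_index_income = 1 / (gdp_per_capita * LOG_INCOME_SPAN)
    return marginal_index_life / marginal_index_income
```

Because income enters in logs, the implied value of an extra year of life is proportional to income per capita (roughly one-tenth of it under these assumed goalposts). A country with 60 times the income of another is therefore assigned 60 times the monetary value for the same extra year of life.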

There is a peculiar inconsistency in the literature on mashup indices whereby prices are deemed to be an unreliable guide to trade-offs, and are largely ignored, while the actual weights being assumed in lieu of prices are often not made explicit in the same space as prices. Thus we have no basis to believe that the weights being used are any better than market prices, when available. Nor do we have any basis for believing that the weights bear any resemblance to defensible shadow prices.

Aggregating under such conditions risks stifling, rather than promoting, debate about what trade-offs are in fact acceptable, when such trade-offs need to be set.

Mashup producers need to be more humble about their products. Their rhetoric is often in marked tension with reality. Not all are as ambitious as Newsweek’s effort to find the “World’s Best Countries” using a mashup of mashups. But exaggerated claims are common even in the more academic efforts. One is struck, for example, that the “multidimensional poverty indices” proposed to date actually embrace far fewer dimensions of welfare than commonly used measures based on consumption at household level (Ravallion 2010). For example, the consumption measure based on a modern household budget or living standards survey will aggregate (actual or imputed) expenditures on literally hundreds of consumption items (up to 1,000 categories in some surveys). The MPI only identifies six factors for “living standards.” So the MPI leaves out a great many of the multiple dimensions of poverty that are included in a standard (“unidimensional”) consumption-based measure.

The uncertainty about the components and their weights is not adequately acknowledged by mashup producers, and users are given little guidance on the robustness of the resulting country rankings. Today’s technologies permit greater openness about the sensitivity of country rankings to choices made about a mashup index’s (many) moving parts. For non-market goods it appears to be highly implausible that the weights would be constant across everyone in a given country, let alone across all the countries (and people) of the world. Even if one knew nothing else about their design, this fact alone should make one sceptical of past mashup indices.

Policy relevance is often claimed, but is rarely so evident on close inspection. It is unclear what can be concluded about “country performance” toward agreed development goals in the absence of an allowance for the (country-specific) contextual factors that constrain that performance. (The words “performance” and “impact” are used too loosely in the mashup industry.) There are also potentially important “targeting applications”, though here policymakers might be better advised to use the component measures appropriate to each policy instrument rather than the mashup index.

A colleague pointed to a medical analogy. While many things impinge on your personal health, you would not want your doctor to base your check-up on a composite index. Similarly, a mashup index of all those dials on your car’s dashboard may fail to reveal that you are about to run out of fuel.

Conclusions

Mashup indices have often contributed to public awareness about important development issues. However, we should not presume that these mashups of pre-existing development data have taught us something we did not know – adding explanation, understanding, or insight where there was none before. That is not what happened when the mashup index was formed. Rather, it took things we already knew and re-packaged them, too often in a way that is opaque to many users and that would be contentious if those users understood what went into the mashup.

Arguably, mashup indices exist because theory and rigorous empirics have not given enough attention to the full range of measurement problems faced in assessing development outcomes. The lessons for measurement from prevailing economic theories only take us so far in addressing the real concerns that practitioners (including policymakers) have about current indicators. A mashup index is unlikely to be a very satisfactory response to those concerns. Theory needs to catch up. It also needs to be recognised that the theoretical perspectives relevant to measurement practice are not just found in economics, but also embrace the political, social, and psychological sciences.

Thankfully, progress in development does not need to wait for that catch-up to happen. A composite index is not essential for many of the purposes of evidence-based policymaking. Recognising the multidimensionality of policy goals does not imply that we should be aggregating fundamentally different things in opaque and often questionable ways.

Note: These are the views of the author, and need not reflect those of the World Bank or any affiliated organisation.

References

Alkire, Sabina and Maria Emma Santos (2010), “Acute Multidimensional Poverty: A New Index for Developing Countries”, Oxford Poverty and Human Development Initiative, Working Paper 38, University of Oxford.

Ravallion, Martin (1997), “Good and Bad Growth: The Human Development Reports”, World Development, 25(5):631-638.

Ravallion, Martin (2010), “Mashup Indices of Development”, Policy Research Working Paper 5432, World Bank.

Stiglitz, Joseph, Amartya Sen, and Jean-Paul Fitoussi (2009), Report by the Commission on the Measurement of Economic Performance and Social Progress.

United Nations Development Programme (various years), Human Development Report, New York: Oxford University Press.

World Bank (2009), World Development Indicators, Washington DC: World Bank.

Appendix: Questions to ask about a new mashup index

What is this index measuring? The fact that the target concept is unobserved does not mean we cannot define it and postulate what properties we would like its measure to have. Yet this is not common in the industry of mashup indices. The frequent lack of conceptual clarity about what exactly one is trying to measure makes it hard to judge the practical choices made about what pre-existing indicators get used in the composite.

What trade-offs are embedded in the index? We need to know the trade-offs – the marginal rates of substitution (MRS) – built into the index if it is to be properly assessed and used. If a policy or economic change entails that one of the positively-valued dimensions of welfare increases at the expense of another dimension, then it is the MRS that determines whether overall welfare has risen or fallen. At one level, the weights in most mashup indices are explicit. Common practice is to identify a set of component variables, group these in some way, and attach equal weight to these groups. However, little or no attention is given to the implied trade-offs in the space of the primary dimensions being aggregated, and whether those weights are defensible. It does not even appear to be the case that the aggregation functions in most of the current mashup indices of development have been chosen with regard to the implied trade-offs.
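As a hedged illustration, suppose the index is additive, with weights $w_j$ applied to rescaled components $f_j(x_j)$. This functional form and the notation are assumptions made for the sketch, not something every mashup index satisfies. Holding the index constant gives the trade-off between primary dimensions $j$ and $k$:

```latex
I = \sum_{j} w_j \, f_j(x_j),
\qquad
dI = 0 \;\Rightarrow\;
\mathrm{MRS}_{jk} \equiv -\frac{dx_k}{dx_j}
  = \frac{w_j \, f_j'(x_j)}{w_k \, f_k'(x_k)} .
```

With equal weights the $w$ terms cancel, so the trade-off is determined entirely by the rescaling functions $f_j$ and $f_k$. Choosing "equal weights" therefore settles very little about the trade-offs in the space of the primary dimensions.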

How robust are the rankings? Theory never delivers a complete specification for measurement. There is inevitably a judgement required about one or more parameters. There is also statistical imprecision about parameter estimates. For these reasons it is widely recommended scientific practice to test the robustness of the derived rankings. Users of prevailing mashup indices are rarely told much about the uncertainties that exist about the series chosen, the quality of the data, and their weights. Few rigorous robustness tests are provided. None of their websites make it easy for users to properly assess the sensitivity of these mashup indices to changing weights. Yet it would be relatively easy to program the required flexibility into the current websites, so users can customise the index with their preferred weights, to see what difference it makes.
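A sensitivity check of this kind needs very little code. The sketch below recomputes a ranking over a grid of alternative weights and reports the best and worst rank each country can attain. The country labels and component scores are made up for illustration, and the function names are hypothetical.

```python
import itertools

# MADE-UP component scores on a 0-1 scale, for illustration only.
SCORES = {
    "A": (0.9, 0.4, 0.5),
    "B": (0.5, 0.8, 0.6),
    "C": (0.6, 0.6, 0.75),
}


def ranking(scores, weights):
    """Countries ordered from best to worst under a weighted sum."""
    index = {
        country: sum(w * x for w, x in zip(weights, components))
        for country, components in scores.items()
    }
    return sorted(index, key=index.get, reverse=True)


def weight_grid(n_components, steps):
    """Yield every weight vector on an evenly spaced grid that sums to one."""
    for combo in itertools.product(range(steps + 1), repeat=n_components):
        if sum(combo) == steps:
            yield tuple(k / steps for k in combo)


def rank_ranges(scores, steps=10):
    """Best and worst rank each country attains across the weight grid."""
    n_components = len(next(iter(scores.values())))
    best = {country: len(scores) for country in scores}
    worst = {country: 1 for country in scores}
    for weights in weight_grid(n_components, steps):
        for position, country in enumerate(ranking(scores, weights), start=1):
            best[country] = min(best[country], position)
            worst[country] = max(worst[country], position)
    return {country: (best[country], worst[country]) for country in scores}
```

In this made-up example every country can be ranked anywhere from first to last depending on the weights chosen, which is exactly the kind of information users are rarely given.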

How should the index be used by policymakers? For these users the sign should read “handle with extreme care” because there are risks of distorting policymaking by creating incentives for policy reform that bear little relationship to overarching development goals, such as reducing poverty. It is not credible that any one of these indices is a sufficient statistic for country performance even with regard to the development outcome being measured. These indices tell us nothing about how we should judge the performance of these countries, given the real constraints they face. For example, we may well rank the countries very differently if we took account of each country’s stage of economic development. Without greater effort to allow for the circumstances and history of a country it is not clear what we learn from these indices.

It has been argued that country comparisons of a mashup index can shame the lowly ranked countries into reform. It is not clear how important the comparisons of such country rankings have been in the learning process about good policymaking; the comparison of experiences with specific policies would seem no less important in practice. But if a country were keen to improve its ranking in some mashup index, it is clear what the government needs to do: it should focus only on the specific components of the index that it is doing poorly on. Government action to improve some selected component indicator that only claims to proxy for deeper characteristics of the economy and society can hardly constitute the sort of progress in policymaking that everyone would like to see.

Source: Ravallion (2010).

 
