Consumption data: New frontiers

Jonas Kolsrud, Camille Landais, Johannes Spinnewijn 04 April 2018



Household consumption is, in many ways, the Arlesienne of economic analysis. Like the character in Alphonse Daudet’s play, who is central to the plot but never seen, household consumption is central to the narrative of economic models and analyses, but desperately elusive when it comes to measurement and implementation.

Households' consumption and savings decisions are believed to be fundamental to understanding long-run growth and development, as well as cyclical economic fluctuations (e.g. Mian et al. 2013). It is also essentially through consumption (and leisure) that economists have thought, for the past 150 years, about the relationship between economic activity and individual welfare. While most of the recent debate on inequality has focused on income and wealth inequality, most economists believe that the distribution of consumption offers the most meaningful way to analyse inequality and social welfare. The distribution and dynamics of consumption are also considered key inputs for assessing the value of social insurance and redistributive policies (Chetty and Finkelstein 2013).

Despite being so fundamental to most of our economic thinking (e.g. Deaton 1992), consumption has long remained an empirical underdog. This is mostly due to data constraints that have limited what we can learn about the distribution and dynamics of consumption across individuals and over time. Indeed, for most of the past 40 years, the main data source for studying consumption was consumption expenditure surveys, whose limitations are quite apparent. These surveys are costly to collect, so they have small samples and often no panel dimension with which to follow consumption within households over time. Furthermore, recorded expenditures are subject to measurement problems that can be quite severe, especially at either end of the income distribution (Attanasio and Pistaferri 2016).

In recent years, the hegemony of traditional consumption surveys has been challenged. First, proprietary data on consumption expenditures are becoming more accessible, offering new insights. While economists have used the Nielsen Consumer Panel data and retail scanner data for almost 15 years, new data from credit card records and from household accounts in financial apps are starting to shed new light on household consumption patterns (Pistaferri 2015).

Backing out expenditures from income and assets

A second promising avenue is being opened by increasing access to rich administrative data, which can be used to compute comprehensive and universal measures of consumption expenditures. A strand of research has recently been pushing this agenda forward, already providing new perspectives on the distribution and dynamics of household consumption (Autor et al. 2017, Eika et al. 2017, Kolsrud et al. 2015, Sodini et al. 2016).

The key idea of this research agenda is to use the accounting identity implied by a household's budget constraint: consumption equals income minus changes in assets. A household's consumption expenditures can thus be computed as a residual from the household's income sources and the evolution of its asset positions. The main challenge for such a computation is, of course, that all household income streams and asset transactions (from real estate to financial assets) need to be observed and well measured. Fortunately, a number of countries, mostly in Scandinavia, offer such opportunities through comprehensive registry data.
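To make the identity concrete, here is a minimal Python sketch of the residual calculation for a single household-year. The record layout (`HouseholdYear` and its field names) is hypothetical, not the authors' data structure, and it collapses the many income and asset items a real registry holds into a few totals. One assumption deserves emphasis: a rise in asset values driven purely by price changes is not saving, so the sketch nets out a separately observed `capital_gains` term. How well this can be done in practice depends on what the registry records about asset holdings and transactions.

```python
from dataclasses import dataclass


@dataclass
class HouseholdYear:
    """One household-year of hypothetical registry records (amounts in SEK)."""

    disposable_income: float    # all income streams and transfers, net of taxes
    net_wealth_start: float     # assets minus debts at the start of the year
    net_wealth_end: float       # assets minus debts at the end of the year
    capital_gains: float = 0.0  # price-driven value changes on assets held (not saving)


def residual_consumption(h: HouseholdYear) -> float:
    """Back out consumption from the budget identity: consumption = income - saving."""
    # Saving is the change in net wealth that did not come from price changes.
    saving = (h.net_wealth_end - h.net_wealth_start) - h.capital_gains
    return h.disposable_income - saving


if __name__ == "__main__":
    hh = HouseholdYear(disposable_income=400_000, net_wealth_start=1_000_000,
                       net_wealth_end=1_080_000, capital_gains=30_000)
    # 400,000 - (80,000 - 30,000)
    print(residual_consumption(hh))
```

Without the capital-gains adjustment, the same household would appear to consume 30,000 SEK less, which is why the measure needs asset transactions and not just year-end balances.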

The advantages of registry-based measures of consumption are manifold. First, a comprehensive registry-based measure can in principle capture all household consumption expenditures. Survey measures, by contrast, sometimes capture only part of total household consumption expenditures (e.g. food expenditures), while proprietary data are often even more limited (credit card expenditures, checkout scanner data, etc.). The residual nature of the registry-based measure also makes it possible to analyse the anatomy of consumption responses by decomposing expenditure changes into their components (income, transfers, changes in assets, debts and durables, etc.). Second, the registries used to construct the residual measures have universal coverage. This obviously offers statistical power compared with the small samples of surveys. Moreover, when focusing on sub-samples of individuals experiencing specific shocks, there is no reason for survey or proprietary data samples to be representative of the full population experiencing these shocks. This is particularly true for bank data or data on credit card users. Third, the panel dimension of registry-based measures offers a critical advantage over cross-sectional survey data for studying consumption responses. With cross-sectional data, identification relies on strong pseudo-panel assumptions about cross-sectional heterogeneity, whereas panel data allow full control for individual fixed effects. Finally, registry-based measures may be more reliable than survey measures, as they rely on third-party-reported registry information, which is not subject to the biases found in survey data (recall errors, attrition, non-response, etc.).
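Because the measure is a residual, any change in consumption splits exactly into changes in its income and saving components. The sketch below illustrates that bookkeeping under simplifying assumptions: the item names are hypothetical, every non-income item is treated as a form of saving (so net new borrowing would enter as a negative `debt_repayment`), and durables are left out. It is an illustration of the accounting, not the authors' classification of items.

```python
def decompose_consumption_change(before: dict[str, float],
                                 after: dict[str, float],
                                 income_items: set[str]) -> dict[str, float]:
    """Contribution of each budget item to the change in residual consumption.

    Consumption = sum of income items - sum of saving items, so each income
    item contributes its own change and each saving item contributes minus
    its change. The contributions sum exactly to the change in consumption.
    """
    contributions = {}
    for item, value_before in before.items():
        delta = after[item] - value_before
        contributions[item] = delta if item in income_items else -delta
    return contributions


if __name__ == "__main__":
    # Hypothetical household (thousand SEK) in the year before and after a job loss.
    before = {"earnings": 300, "transfers": 20, "financial_saving": 30, "debt_repayment": 10}
    after = {"earnings": 200, "transfers": 80, "financial_saving": 0, "debt_repayment": 5}
    parts = decompose_consumption_change(before, after, {"earnings", "transfers"})
    print(parts)
    print(sum(parts.values()))
```

In this made-up example, a 100,000 SEK fall in earnings translates into only a 5,000 SEK fall in consumption, because higher transfers and reduced saving and debt repayment absorb most of the shock: this is the kind of "anatomy" the residual measure makes visible.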

In a recent paper (Kolsrud et al. 2017), we build on previous work on constructing a registry-based measure of consumption in Sweden (Koijen et al. 2014), discuss data and methodological challenges for constructing this measure, and demonstrate some important advantages in a series of applications.

New insights on income and consumption inequality

First, we demonstrate the power of our registry-based measure to study the relationship between income and consumption inequality, especially at the top of the income distribution.

Figure 1 below plots our registry-based measure of consumption against a survey measure for households in the Swedish consumption expenditure survey (HUT). The two measures are highly correlated, but they differ more systematically in the tails. Importantly, the survey measure tends to drastically underestimate consumption relative to the registry-based measure at the high end of the distribution. This reflects two well-known challenges for surveys: incentivising high-income households to report their expenditures accurately, and capturing the Pareto-shaped upper tail of the distribution.

Figure 1 HUT survey vs. registry-based consumption expenditures

Notes: Consumption survey data comes from the Swedish annual consumption survey (HUT). It is merged to register data via individual identification numbers and reported here as total household consumption. The joint coverage of the survey and registry data spans 2003 to 2007.

As a consequence, the registry-based measure appears to dominate the survey measure in capturing how income inequality translates into consumption inequality, especially at the top of the income distribution. Figure 2 below shows that, with our registry-based measure, changes in log income between 2003 and 2007 are strongly correlated with changes in log consumption, and the correlation is highest across the high-income percentiles. Both income growth and consumption growth are largest at the top of the income distribution. The consumption survey data miss these patterns completely.

The registry-based measure thus offers important new insights into the debate on the relationship between trends in income and consumption inequality. In an influential paper using survey data, Krueger and Perri (2006) suggest that the recent surge in income inequality in the US has not been matched by a similar increase in consumption inequality. Our new data suggest that this finding is likely due to measurement error in consumption survey data. Indeed, in Sweden, changes in consumption inequality and income inequality are highly correlated.

Figure 2 Log consumption changes vs. log income changes by percentiles of the income distribution

A. Registry-based measure

B. Household survey data

Notes: The figures compare changes in log consumption with changes in log income across income percentiles. In panel A, we rank individuals by disposable income in 2003 (percentiles are indicated by the numbers above the dots, the first being the poorest). We then plot the difference in log income between 2003 and 2007 on the horizontal axis and the difference in log consumption on the vertical axis. Panel B repeats the same procedure using income and consumption from the survey data.
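The ranking-and-differencing procedure described in these notes can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: the input layout (one tuple of base-year and end-year income and consumption per individual) is an assumption, ties in base-year income are broken arbitrarily by sort order, and bins are formed by rank so that each holds roughly the same number of observations. With `n_bins=100`, 2003 as the base year and 2007 as the end year, each returned pair corresponds to one dot in the figure.

```python
import math
from collections import defaultdict


def log_changes_by_percentile(rows, n_bins=100):
    """Mean log income and log consumption changes by base-year income bin.

    rows: iterable of (income_t0, income_t1, cons_t0, cons_t1), all positive.
    Returns {bin: (mean dlog income, mean dlog consumption)}, bin 1 = poorest.
    """
    ranked = sorted(rows, key=lambda r: r[0])  # rank on base-year income only
    n = len(ranked)
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # [sum dlog y, sum dlog c, count]
    for rank, (y0, y1, c0, c1) in enumerate(ranked):
        b = rank * n_bins // n + 1  # bins 1..n_bins of (roughly) equal size
        acc[b][0] += math.log(y1) - math.log(y0)
        acc[b][1] += math.log(c1) - math.log(c0)
        acc[b][2] += 1
    return {b: (sy / k, sc / k) for b, (sy, sc, k) in sorted(acc.items())}


if __name__ == "__main__":
    # Four hypothetical individuals split into two bins, for illustration only.
    sample = [(100, 110, 90, 99), (200, 200, 150, 150),
              (300, 330, 200, 220), (400, 400, 300, 300)]
    for b, (dy, dc) in log_changes_by_percentile(sample, n_bins=2).items():
        print(b, round(dy, 4), round(dc, 4))
```

Ranking on base-year income, rather than re-ranking in 2007, keeps each dot tied to a fixed group of people, so the plotted changes describe how those groups' income and consumption evolved.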

Consumption dynamics around lifetime events

The long panel dimension of registry-based measures of consumption also makes it possible to study consumption responses to lifetime events such as health shocks, retirement, or unemployment (see Kolsrud et al. 2015), as well as the different means households use to smooth consumption in response to these shocks.

A large body of economic research has analysed how idiosyncratic income changes affect household consumption, but mostly using survey data with partial consumption measures, small samples, and a limited panel dimension. As a result, considerable debate remains about the precise impact of many of these idiosyncratic shocks on household consumption, and about households' ability to insure themselves against income shocks and smooth consumption over time.

We demonstrate that our registry-based measure of consumption can contribute to these debates. Figure 3 below shows the evolution of consumption expenditures around two events: retirement (panel A) and health shocks (panel B). Each panel contrasts the estimates obtained from our registry-based measure with those obtained from consumption survey measures. For both events, we find significant drops in consumption following the event (measured as β, the average effect during the first four years after the event). Thanks to its universal coverage, however, the registry measure delivers much greater precision: the estimates obtained using consumption surveys are far noisier, roughly an order of magnitude less precise.

Figure 3 Evolution of consumption expenditures

A. Consumption around retirement

B. Consumption around health shocks

Notes: The figures show average household consumption around the time of a shock, as measured using registry data (black dots) and survey data (grey triangles). Consumption is normalized to zero in the year preceding the shock, so each point estimate can be interpreted as the percentage change in consumption relative to year −1. For instance, according to the survey data, household consumption four years after a health shock is on average 10% lower than in the year before the shock. The bars represent 95% confidence intervals.
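A simple descriptive version of this normalization, and of the post-event average β, can be sketched as follows. It is not the estimator behind the figure: the published point estimates and confidence intervals come from the authors' own estimation, whereas this sketch just averages raw consumption by event time. The input layout is hypothetical, and which post-event years enter β is an assumption here (`post_years`, defaulting to years 1 to 4).

```python
from collections import defaultdict


def mean_by_event_time(panel):
    """Average consumption by year relative to the shock.

    panel: iterable of (event_time, consumption) pairs, one per household-year,
    where event_time = calendar year minus the year of the shock.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for t, c in panel:
        totals[t] += c
        counts[t] += 1
    return {t: totals[t] / counts[t] for t in sorted(totals)}


def normalized_profile(mean_cons):
    """Percentage change in mean consumption relative to event year -1."""
    base = mean_cons[-1]
    return {t: 100.0 * (c / base - 1.0) for t, c in mean_cons.items()}


def average_post_effect(profile, post_years=(1, 2, 3, 4)):
    """Descriptive analogue of beta: mean of the profile over post-shock years."""
    return sum(profile[t] for t in post_years) / len(post_years)


if __name__ == "__main__":
    # Hypothetical household-years, for illustration only.
    panel = [(-2, 100), (-1, 98), (-1, 102), (0, 95), (1, 90),
             (2, 90), (3, 92), (4, 88)]
    profile = normalized_profile(mean_by_event_time(panel))
    print({t: round(v, 1) for t, v in profile.items()})
    print(round(average_post_effect(profile), 1))
```

Normalizing on year −1 rather than on the event year itself keeps any anticipation effects or partial-year effects in year 0 visible in the profile instead of absorbing them into the baseline.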

Open source!

All our programmes and detailed documentation of the data are available online, and we hope they will help other researchers in their analysis of household consumption. While the Swedish context is close to ideal for constructing a registry-based measure, we hope that our efforts will also spur further research to improve the measurement, extend it beyond the 1999–2007 period in Sweden, and apply it to other countries with potentially less precise information on income or wealth.


Attanasio, O P, and L Pistaferri (2016), “Consumption inequality”, The Journal of Economic Perspectives, 30(2), 3–28.

Autor, D, A R Kostol, and M Mogstad (2017), “Disability Benefits, Consumption Insurance, and Household Labor Supply”, NBER Working Paper 23466.

Chetty, R, and A Finkelstein (2013), “Social insurance: Connecting theory to data”, in A J Auerbach, R Chetty, M Feldstein and E Saez (eds.), Handbook of Public Economics, Volume 5, 111–193, North-Holland.

Deaton, A (1992), Understanding consumption, Oxford University Press.

Eika, L, M Mogstad and O Vestad (2017), “What can we learn about household consumption from information on income and wealth”, Working paper.

Koijen, R, S Van Nieuwerburgh, and R Vestman (2014), “Judging the Quality of Survey Data by Comparison with Truth as Measured by Administrative Records: Evidence From Sweden”, in C D Carroll, T F Crossley, and J Sabelhaus (eds.), Improving the Measurement of Consumer Expenditures, 308–346, University of Chicago Press.

Kolsrud, J, C Landais, P Nilsson, and J Spinnewijn (2015), “The Optimal Timing of Unemployment Benefits: Theory and Evidence from Sweden”, forthcoming, American Economic Review.

Kolsrud, J, C Landais, and J Spinnewijn (2017), “Studying Consumption Patterns using Registry Data: Lessons From Swedish Administrative Data”, Working paper.

Krueger, D, and F Perri (2006), “Does income inequality lead to consumption inequality? Evidence and theory”, The Review of Economic Studies, 73(1), 163–193.

Mian, A, K Rao, and A Sufi (2013), “Household balance sheets, consumption, and the economic slump”, The Quarterly Journal of Economics, 128(4), 1687–1726.

Pistaferri, L (2015), “Household consumption: Research questions, measurement issues, and data collection strategies”, Journal of Economic and Social Measurement, 40(1–4), 123–149.

Sodini, P, S Van Nieuwerburgh, R Vestman, and U von Lilienfeld-Toal (2016), “Identifying the benefits from home ownership: A Swedish experiment”, NBER Working Paper 22882.




Analyst and Researcher, National Institute of Economic Research, Stockholm

Professor of Economics, London School of Economics

Associate Professor in Economics, London School of Economics; Research Fellow, CEPR & IFS


CEPR Policy Research