Using traditional and digital data sources together in economic research

Edward Glaeser, Hyunjin Kim, Michael Luca 17 January 2018

How is Boston’s economy doing? According to the latest data available as of the end of 2017, employment in Suffolk County (which contains Boston) rose from 572,000 to 595,000 between 2014 and 2015. But data from 2015 may seem awfully out of date for policymakers, investors, and voters in Boston’s 2017 mayoral election. This speaks to a broader problem that stymies research and policy alike – it is valuable to have a sense of how the economy is doing this year or even this month, but official government data sources are often released after notoriously long lags.

Traditional public data sources on local economies, including County Business Patterns and other Census products, are typically available after a multi-year lag, and can take even longer when looking for more granular data that come with restricted access. However, private data from platforms such as Yelp, Google, and LinkedIn are essentially available in real time. This raises the potential for digital data to supplement official government statistics, by providing more up-to-date snapshots of the economy. In recent years, research has raised the potential for data sources from online platforms to predict economic outcomes such as inflation, unemployment claims, housing prices, and entrepreneurship (Choi and Varian 2012, Cavallo 2012, Einav and Levin 2014, Wu and Brynjolfsson 2015, Guzman and Stern 2016, Glaeser et al. 2017), as well as to improve policy evaluations (Luca and Luca 2017).

But if online data sources are going to be used to quantify economic activity, it’s important to understand how they compare to the key statistical datasets that have historically been used to measure the economy. With this in mind, we set out to explore the potential for digital data to predict traditional measures of the economy that are widely used by policymakers and academics, and evaluate the conditions under which the data can accurately measure changes in the economy.

In a recent paper, we illustrate how data from Yelp can provide an up-to-date snapshot of the economy of a city or neighbourhood (Glaeser et al. 2017). Yelp’s data can help predict contemporaneous changes in ZIP code-level establishment growth, especially in higher income, higher density parts of America.

Yelp’s coverage of consumer-facing retail establishments is much better than its coverage of other sectors. By 2015, Yelp had reviews on 1.4 million businesses, equivalent to 18% of the number of establishments listed in County Business Patterns. In the restaurant sector, Yelp covers 576,000 restaurants in 22,719 ZIP codes, while County Business Patterns has 542,000 restaurants in 24,790 ZIP codes.[1] Figure 1 shows a map of Yelp’s coverage across the US in 2015.

Figure 1 Yelp coverage of CBP restaurants by ZIP code in 2015
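To make the coverage comparison concrete, the short Python sketch below works through the arithmetic implied by the figures quoted above. The variable names are ours, and the implied County Business Patterns total is only a back-of-the-envelope figure derived from the rounded 18% share, not a number reported in the paper.

```python
# Figures quoted in the text (2015).
yelp_businesses = 1_400_000      # businesses with Yelp reviews
yelp_share_of_cbp = 0.18         # Yelp count as a share of CBP establishments
yelp_restaurants = 576_000
cbp_restaurants = 542_000
yelp_restaurant_zips = 22_719
cbp_restaurant_zips = 24_790

# Restaurants on Yelp relative to restaurants in County Business Patterns.
restaurant_ratio = yelp_restaurants / cbp_restaurants

# Share of CBP restaurant ZIP codes that also appear in Yelp's restaurant data.
zip_ratio = yelp_restaurant_zips / cbp_restaurant_zips

# Rough CBP establishment total implied by the rounded 18% share (an inference,
# not a reported figure).
implied_cbp_establishments = yelp_businesses / yelp_share_of_cbp

print(f"Yelp/CBP restaurant ratio: {restaurant_ratio:.2f}")
print(f"Yelp/CBP ZIP code ratio:   {zip_ratio:.2f}")
print(f"Implied CBP establishments: {implied_cbp_establishments:,.0f}")
```

The restaurant ratio is above one while the all-sector share is far below it, which is the sense in which coverage is much stronger for consumer-facing businesses.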

We test whether Yelp data predict establishment growth in County Business Patterns for the years prior to 2015 at the ZIP code level. There is strong persistence in establishment growth rates, and so we control for two years of lags in County Business Patterns establishment growth, which, along with year fixed effects, can explain 14.8% of the variation in establishment growth across US ZIP codes. Adding Yelp data from the year in question boosts the r-squared to 22.5%. Our point estimate suggests that one extra Yelp business is associated with 0.6 more businesses in County Business Patterns. In many cases, including today, we don’t even have the one-year lag of County Business Patterns growth, which means that the marginal contribution of Yelp data would be even larger.
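The logic of this comparison, a baseline regression with lags and year effects against the same regression with a Yelp variable added, can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the specification or code used in the paper: the variable names, sample size, and data-generating process are all assumptions, and the slope of 0.6 is built in only to echo the point estimate quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_years = 5000, 4  # hypothetical ZIP-year observations

# Synthetic regressors (all assumed, purely for illustration).
year = rng.integers(0, n_years, size=n_obs)
year_dummies = np.eye(n_years)[year]   # full dummy set spans the constant
cbp_lag1 = rng.normal(size=n_obs)      # CBP growth, one year earlier
cbp_lag2 = rng.normal(size=n_obs)      # CBP growth, two years earlier
yelp_growth = rng.normal(size=n_obs)   # same-year growth in Yelp businesses

# Synthetic outcome: persistence, a Yelp signal, year effects, and noise.
year_effects = np.array([0.0, 0.1, -0.1, 0.2])
cbp_growth = (0.2 * cbp_lag1 + 0.1 * cbp_lag2 + 0.6 * yelp_growth
              + year_effects[year] + rng.normal(size=n_obs))


def ols(X, y):
    """Return OLS coefficients and in-sample R-squared of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2


# Baseline: year fixed effects plus two lags of CBP growth.
X_base = np.column_stack([year_dummies, cbp_lag1, cbp_lag2])
# Augmented: the baseline plus contemporaneous Yelp growth.
X_yelp = np.column_stack([X_base, yelp_growth])

_, r2_base = ols(X_base, cbp_growth)
beta_yelp, r2_yelp = ols(X_yelp, cbp_growth)
yelp_coef = beta_yelp[-1]  # coefficient on the Yelp variable

print(f"R-squared without Yelp: {r2_base:.3f}")
print(f"R-squared with Yelp:    {r2_yelp:.3f}")
print(f"Coefficient on Yelp:    {yelp_coef:.2f}")
```

The gain in r-squared between the two fits is the marginal contribution of the Yelp variable. Dropping `cbp_lag1` from both design matrices mimics the situation in which the most recent official data are not yet available.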

Yelp’s predictive power in the restaurant sector is even more impressive, because past restaurant growth doesn’t predict current restaurant growth. Year effects and lagged growth can explain less than 1% of the variation in restaurant growth rates across ZIP codes. Including growth in Yelp-reviewed restaurants boosts the r-squared to 11%. Using a richer set of Yelp variables increases the r-squared to almost 14%.

The fact that Yelp’s current data can predict current economic events suggests that patterns in Yelp data reflect patterns in the underlying economy, rather than simply patterns in the adoption of Yelp. At the same time, Yelp is more predictive in some places than in others, because this online source of information is not used uniformly across places. Richer, denser, and better-educated places have better Yelp coverage, presumably because they have a more internet-savvy population that likes to go out more. We find that one Yelp business is only associated with 0.2 extra County Business Patterns establishments in places that are poorly educated, low income, and less dense. The relatively low coefficient reflects Yelp’s weaker coverage in those areas. The coefficient rises to 0.5 in richer parts of the US, even if they are less dense or less educated. The coefficient increases to almost 0.75 when a ZIP code has money, education, and density. We expect that these spatial differences may decline if Yelp spreads further, but for now, ‘nowcasting’ with Yelp is safer in richer and better-educated cities, such as Boston. Yelp’s predictive power also differs by industry. It provides little ability to predict manufacturing growth, and does best in predicting retail establishment growth and the growth of business and professional services.  
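One simple way to see how such place-specific coefficients arise is to estimate the same slope separately for different groups of ZIP codes. The Python sketch below does this on synthetic data; it is an illustration under stated assumptions rather than the paper's method. The `high_coverage` flag is a hypothetical stand-in for "rich, educated, and dense", and the slopes of 0.2 and 0.75 are built into the simulated data to mirror the estimates quoted above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_zips = 4000  # hypothetical number of ZIP codes

# Hypothetical flag: True for rich, educated, dense ZIP codes.
high_coverage = rng.random(n_zips) < 0.5

# Synthetic change in the number of Yelp businesses per ZIP code.
yelp_change = rng.normal(size=n_zips)

# Assumed relationship: each Yelp business maps to fewer CBP establishments
# where Yelp coverage is weak.
true_slope = np.where(high_coverage, 0.75, 0.2)
cbp_change = true_slope * yelp_change + rng.normal(scale=0.5, size=n_zips)


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


slope_low = slope(yelp_change[~high_coverage], cbp_change[~high_coverage])
slope_high = slope(yelp_change[high_coverage], cbp_change[high_coverage])

print(f"Slope in low-coverage ZIP codes:  {slope_low:.2f}")
print(f"Slope in high-coverage ZIP codes: {slope_high:.2f}")
```

An equivalent approach is a single pooled regression with an interaction between the Yelp variable and the group flag, which also makes it straightforward to test whether the two slopes differ.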

Yelp provides timelier local economic data than County Business Patterns, but it also provides data at a more granular level than is available in the public-facing County Business Patterns data. Block-by-block, street-by-street, Yelp can provide policymakers with recent changes in economic geography. In principle, these data can be a useful input to real estate investors, builders, and even business owners rethinking their location. Furthermore, Yelp makes it possible to measure new outcomes that were never included in traditional data sources. To illustrate in the context of New York City, Figure 2 shows how Yelp data can be used to analyse the types of restaurants that open across neighbourhoods, looking at price levels of their menus.

Figure 2 Number of mid-to-expensive ($$+) Yelp restaurants opening per capita in 2015

Ultimately, digital data from online platforms can offer faster and more geographically detailed images of the economy, but they are a complement to, rather than a substitute for, official government statistics. Our confidence in the value of Yelp data comes entirely from the fact that they are cross-validated with those statistics. The new big data frontier provides enormous opportunities, but it should never be an excuse for cutting the funding of traditional government data, which – when combined with new data sources – will provide a more complete picture of the economy.

References

Cavallo, A (2012), “Scraped Data and Sticky Prices”, MIT Sloan Working Paper.

Choi, H, and H Varian (2012), “Predicting the Present with Google Trends”, Economic Record 88:2–9.

Einav, L, and J Levin (2014), “The Data Revolution and Economic Analysis”, Innovation Policy and the Economy 14.

Glaeser, E L, H Kim, and M Luca (2017), “Nowcasting the Local Economy: Using Yelp Data to Measure Economic Activity”, NBER Working Paper 24010.

Glaeser, E L, S D Kominers, M Luca, and N Naik (2018), “Big Data and Big Cities: The Promises and Limitations of Improved Measures of Urban Life”, Economic Inquiry 56(1): 114–137.

Guzman, J, and S Stern (2016), “Nowcasting and Placecasting Entrepreneurial Quality and Performance”, Working Paper.

Luca, D, and M Luca (2017), “Survival of the Fittest: The Impact of the Minimum Wage on Firm Exit”, Working Paper.

Wu, L, and E Brynjolfsson (2015), “The Future of Prediction: How Google Searches Foreshadow Housing Prices and Sales” in A Goldfarb, S M Greenstein, and C E Tucker (eds.), Economic Analysis of the Digital Economy, Chicago: University of Chicago Press.

Endnotes

[1] These Yelp numbers exclude any businesses in Yelp that are missing a ZIP code, price range, or any recommended reviews.

Topics:  Frontiers of economic research

Tags:  big data, measurement, data, Yelp, Boston, digital data

Fred and Eleanor Glimp Professor of Economics, Harvard University

Doctoral candidate, Harvard Business School

Lee J. Styslinger III Associate Professor of Business Administration, Harvard Business School
