
Data superpowers in the age of AI: A research agenda

Calls for regulation of big tech are getting louder and louder. This column argues that policy proposals should be evaluated through the lens of their impact on the evolution of artificial intelligence. It proposes a holistic framework that encompasses consumer control over data, competition in product markets, incentives to innovation, and implications for international trade. It also highlights the role played by major big tech companies, and the threat of data and artificial intelligence monopolisation.

In advanced economies, large tech companies such as Google, Amazon, Facebook, and Apple (henceforth collectively known as ‘GAFA’) are drawing increasing scrutiny from authorities and the public. Like many corporate giants across industries, GAFA have been taken to task for abusing market power and evading taxes. They have also attracted new, sector-specific accusations, ranging from disregard for user privacy to inadvertent involvement in attempts to subvert democracy.

So far, policymakers have tackled each issue separately. For example, in most OECD countries the question of privacy has been framed as a pure consumer protection problem, while individual instances of non-competitive rents have been tackled with standard antitrust tools. But this segmented approach is likely to be suboptimal, as problems posed by big tech are interconnected. 

Market efficiency, consumer protection, and fair access to information – to name just the main concerns – are all complementary sides of a single story, revolving around the emergence of artificial intelligence (AI) as a general-purpose technology (Trajtenberg 2018). In the long run, the question is: Who gets to own AI?

N-faced markets 

GAFA are sometimes represented as two-sided platforms – that is, firms that derive revenue from facilitating transactions between two groups of agents in markets with cross-network externalities. The value of intermediation depends, for at least one of the groups, on the size of the other group. 

Two-sided markets typically show asymmetric price structures (Rochet and Tirole 2003) – platforms sell below cost on the more price-sensitive side in order to expand the network, and offset their loss by charging the other side for the value of the resulting externality (Evans and Schmalensee 2007, Rysman 2009). A classic example is given by newspapers, where advertisers subsidise readers. 
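To fix ideas, the asymmetric price structure can be captured in a stylised monopoly-platform pricing condition (the following compressed formulation and its notation are ours, in the spirit of the models surveyed in the references above, rather than a formula taken from any one of them):

$$p_i \;=\; c_i \;-\; \alpha_j n_j \;+\; \mu_i, \qquad i,j \in \{1,2\},\; j \neq i,$$

where $p_i$ is the price charged to side $i$, $c_i$ the marginal cost of serving it, $n_j$ participation on the other side, $\alpha_j$ the value a side-$j$ agent attaches to one additional side-$i$ participant, and $\mu_i$ the standard monopoly markup driven by side $i$’s own price sensitivity. The externality term $\alpha_j n_j$ acts as a discount on side $i$’s effective cost: when advertisers value readers highly, the profit-maximising reader price can fall below cost, or to zero.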

Indeed, if GAFA were just middlemen in two-sided markets there would be no need for regulatory innovation. Their handling of information collected from users should be governed by ordinary data and consumer protection rules, and any suspected anti-competitive behaviour should be analysed with ordinary antitrust categories, amended so as to take cross-network externalities into account (Evans and Schmalensee 2014). 

Except that GAFA look more like an N-faced die (with a solid core of AI) than a two-sided platform. They are simultaneously active in a large number of markets, extending well beyond their original flagship goods and services. Amazon first gained dominance in e-commerce, and now also leads in cloud computing and the development of digital home assistants. Along with Apple, a company founded to produce computers, it is branching out into healthcare. Facebook is best known for social networking and instant messaging, but it is also very good at recognising images, and has patented a credit scoring method. Google started as a search engine, but today Gmail and Google Maps are ubiquitous, and the Duplex system can simulate phone calls.

Some of the markets where GAFA operate are two-sided – social platforms and search engines both have users on one side, advertisers on the other, with the expected asymmetric price structure. Other markets, for example cloud storage, are one-sided. The key feature of all N faces, however, is that they double up as entry points for massive amounts of user-provided data; all products on offer, from apps to wearables, include AI components whose performance depends on this data.

Furthermore, AIs interact with each other. An AI embedded in a search engine can learn something about natural language by processing millions of user queries, then pass it on to another AI designed to write automated replies to emails; cross-network externalities in two-sided markets are complemented by cross-product ones in the whole ecosystem. GAFA, aware of how much this effect matters, are channelling significant resources into AI development.

One all-seeing AI?

While estimating the total contribution of AI to productivity growth is not easy, especially as the technology is still maturing (Brynjolfsson et al. 2017), investors believe that algorithms hold promise. According to Stanford University’s 2017 AI Index Report (Shoham et al. 2017), in the US, venture capital funding to AI startups has grown six-fold since 2000; it now represents 4–6% of total annual venture capital investment. Remarkable results have been achieved in commercial applications, but also in fields that have the potential to be more deeply relevant to human welfare, such as medical diagnosis (Brynjolfsson and Mitchell 2017). 

In principle, policymakers should welcome the fact that some of the largest companies in the world, known for the high level of their human capital, are hard at work on innovation that might foster economic and social progress. But they should also be wary of monopolisation risks.

Algorithms already know how to perform and routinise specific tasks autonomously and efficiently; this is sometimes called ‘narrow AI’. Its most successful incarnation is machine learning, a statistical strategy for pattern recognition (Cockburn et al. 2018). Artificial general intelligence – that is, computer intelligence similar to human thinking, with an advanced capacity for flexibility and invention – is not yet part of our daily lives; still, research is making progress, with AIs learning how to improve themselves and manipulate abstract concepts. 

At the moment, data-hungry algorithms prevail. Machine learning tools such as deep neural networks and support vector machines are, in essence, huge nonparametric classifiers; accordingly, they are affected by the ‘curse of dimensionality’ familiar to econometricians (e.g. Pagan and Ullah 1999). To deliver reliable insights, they need to be estimated – or ‘trained’, in machine learning parlance – on very large datasets, and validated on further held-out data. Exclusive ownership of some types of big data by a few corporations may constitute a barrier to entry in markets for AI-based services (Rubinfeld and Gal 2017, Rogoff 2018), where the object of the transaction is generally a prediction delivered by an estimated model (Agrawal et al. 2018).
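The curse of dimensionality can be made concrete with a small simulation (a sketch of ours, not part of the original column; it uses scikit-learn’s k-nearest-neighbours classifier as a stand-in for a data-hungry nonparametric learner, and all parameter choices are illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def knn_accuracy(n_train, dim, n_test=2000):
    """Train a 5-NN classifier on n_train points in `dim` dimensions.

    The label depends on the first coordinate only; the remaining
    dim - 1 coordinates are pure noise that dilutes distances.
    """
    X = rng.normal(size=(n_train + n_test, dim))
    y = (X[:, 0] > 0).astype(int)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X[:n_train], y[:n_train])
    return accuracy_score(y[n_train:], clf.predict(X[n_train:]))

for dim in (2, 10, 50, 200):
    accs = [knn_accuracy(n, dim) for n in (100, 1_000, 10_000)]
    print(f"dim={dim:>3}: "
          + "  ".join(f"n={n}: {a:.2f}"
                      for n, a in zip((100, 1_000, 10_000), accs)))
```

Exact numbers vary by run, but the pattern is robust: with a small training set, accuracy degrades toward chance as noise dimensions are added, and recovers only as the training set grows by orders of magnitude – the sense in which exclusive access to very large datasets matters.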

As machines refine their capacity for symbolic representation, the availability of data may not remain as essential for growth as it is today. However, having the best AIs today might still entrench a competitive advantage in creating the best ones tomorrow. This is especially relevant given that AI may evolve toward a swarm intelligence model (e.g. Bonabeau et al. 1999), whereby algorithms with complementary abilities self-organise to cooperate, creating a de facto collective mind. If the process goes unchecked, the magnitude of network effects could benefit first movers to a degree incompatible with any market contestability.

Monopolisation of AI, given its potential pervasiveness, could spill over from product markets into sensitive non-economic areas – such as political information – in ways that might not be imaginable today.

Research challenges

In order to understand whether and how data superpowers should be regulated in the age of AI, a unifying framework is needed. A research agenda, ideally bringing together economists with scientists and legal experts, would have to include at least three core components:

1) A general theory of property rights over data and of data tradability. 

The information that enables AIs to yield a profit for GAFA is mostly produced by humans while online, but there is no consensus on the rules that should govern what can be provided and under which conditions. Can all data be owned and traded, or are there exceptions (European Data Protection Supervisor 2017)? Assuming that some data can be owned, does the data subject always own them, independently of how they were generated, or do they sometimes belong to the collector? How should the revenue from data be shared between provider and collector along the continuum that runs from a piece of information with a clear monetary value of its own to one that is profitable only when aggregated? Who owns the insights generated by algorithms? Under which circumstances can data be given away for free?

Intuitively, there is a difference between consumption exhaust, such as search queries performed while shopping online; content intentionally provided by users, such as videos shared on a platform; and data explicitly produced for machine learning purposes through repetitive tasks, such as labelling photos that have no personal significance to the labeller. Only in the last case has a transparent market developed, as exemplified by Amazon Mechanical Turk. Is this setup optimal? Or are GAFA extracting rents from one key information asymmetry, namely users’ inability to estimate the value of the data they provide? Would the equilibrium look different if users had a way of coordinating their data provision decisions?

The understanding of property rights has implications for the options available to policymakers intending to avoid AI monopolisation. If, at one extreme, data provision is cast as labour (Lanier 2013, Posner and Weyl 2018), then the goal is turning GAFA’s labour oligopsony into a competitive market, eventually also curtailing non-competitive rents in AI product oligopolies.

If, at the opposite end of the spectrum, data are just another commodity but the development of AI is still socially desirable, regulators should make sure that smaller firms have enough high-quality information to compete effectively with big corporations, and that scientific research does not turn entirely into a for-profit endeavour dependent on private datasets. In the UK, a recent report commissioned by the government suggests the creation of private data trusts with public guarantees (Hall and Pesenti 2017); France’s national AI strategy includes an option to enforce data openness in fields of public interest (Villani 2018).

2) Collection and analysis of empirical evidence on which data confer a competitive edge in developing which applications.

Without knowledge of what matters in making AI better, it is difficult to quantify cross-network and cross-product externalities. This might lead to inefficient regulatory decisions, such as requiring GAFA to share data that turn out to be irrelevant to AI improvement. 

The issue is part of a larger black-box problem. In a parametric setting, predictions are generally explainable – for example, regression coefficients show whether a change in a given independent variable affects the outcome positively or negatively, and by how much. The contribution of individual predictors to a model’s overall fit can be evaluated (though sometimes imperfectly) with variance decomposition techniques. As a general rule, and at least for now, results from AI models cannot be read in the same way – computational complexity is too high to allow for the isolation of single-variable effects.
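The contrast can be illustrated in a few lines (again a hedged sketch of ours, not from the column; the simulated data, variable names, and the choice of permutation importance as the black-box probe are all ours):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(5_000, 3))
# True model: y depends on x0 (strongly, +) and x1 (weakly, -); x2 is irrelevant.
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=5_000)

# Parametric: coefficients are directly readable (sign and magnitude).
ols = LinearRegression().fit(X, y)
print("OLS coefficients:", ols.coef_.round(2))          # ~ [ 2.0, -0.5, 0.0]

# Black box: salience can only be probed from the outside, e.g. by permuting
# one input at a time and measuring the drop in predictive performance.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(forest, X, y, n_repeats=5, random_state=0)
print("Permutation importances:", result.importances_mean.round(2))
```

The regression output states sign and size of each effect; the importance scores only rank inputs by how much shuffling them hurts predictions, with no sign, no units, and no guarantee of capturing interactions.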

Significant resources are being channelled into solving AI explainability (Gunning 2017), partly in the wake of ethical concerns about algorithmic bias (Boddington 2017); if we do not know how computers decide, we have no way of preventing them from assessing creditworthiness based on ethnicity. Inroads have been made on dataset diversity and the elimination of human bias from variable selection; insights on measuring variable salience, however, remain scarce.

A few recent papers partly circumvent the problem by taking the black box as given – rather than trying to pry it open – and examining how its very existence influences outcomes. Among others, Schaefer et al. (2018) quantify how the length of user histories available to a search engine affects click-through rates for search results. While not a substitute for explainability research, more work in this vein could prove useful.

3) Analysis of how regulation of big tech would impact international trade in AI. 

GAFA are not alone in the world. China has its own data superpowers, heavily invested in AI development: search engine Baidu, e-commerce giant Alibaba, and multipurpose social platform Tencent (henceforth ‘BAT’). Currently, there is minimal overlap in the largest markets; GAFA are dominant in the OECD, while BAT lead in China. Compartmentalisation, however, may be fading fast.

On the one hand, BAT are stepping up their attempts to penetrate advanced economies; so far they have been held back by a combination of competitive disadvantage along purely economic dimensions and national security safeguards. The former may not last indefinitely, and the latter may not evolve in the same way everywhere – in the US, increased political friction with China over trade is likely to yield more restrictions, but other OECD jurisdictions might adopt different stances.

On the other hand, demand for AI-based services in emerging countries with young, tech-savvy middle classes is growing. With a combined population of 1.6 billion, India and Indonesia are turning into key GAFA–BAT battlegrounds; the African continent may follow soon. 

As the data and AI markets open up, how would social welfare in OECD countries be affected by the introduction of policies aimed at containing GAFA’s power? Should BAT gain ground as a consequence, consumers on GAFA’s home turf may enjoy positive fallout from enhanced competition, in the form of lower prices and better quality of goods and services, but may also suffer from reduced control over their data (for the intricate politics of online privacy in China, see Chorzempa et al. 2018).

More importantly, the risk of AI monopolisation gets worse in a global market, because of enhanced network effects and economies of scale. If GAFA are weakened by domestic policies, while China takes no action to contain its own behemoths, we may still end up with one all-seeing AI, except that its understanding of human-machine-government relations may reflect values different from those prevalent in liberal democracies. 

This scenario is not dramatically far-fetched. China plans to become a global AI leader by 2025 (State Council of the People’s Republic of China 2017); BAT are instrumental to this goal; Chinese authorities will direct their efforts, oversee their progress, and may attempt to appropriate some – if not most – of their data and results. So what is the effect of behind-the-border policy choices on international competitiveness (Goldfarb and Trefler 2018)? In other words, what should OECD policymakers do in order to maximise efficiency and innovation at home, without handing a worldwide AI monopoly to China? Is there scope for joint OECD–China AI development, and under which conditions? Is there a case for trade restrictions and, if so, which ones? Among non-military sectors, are some fields – say, fintech – more sensitive than others?

One idea worth exploring, building on the swarm intelligence hypothesis outlined above, is the establishment of a network of co-operative AIs, developed and owned by multiple stakeholders. Would this network be able to perform better than an integrated ecosystem managed by a single entity? Could technical mechanisms be introduced that prevent concentration of knowledge and power (Buterin and Weyl 2018)? Depending on how failsafe their governance rules are, and on what is considered politically desirable, such mechanisms could either enable global cooperation in a multi-firm setting or power decentralised national champions. In both cases, monopolisation risks may be reduced.

Authors’ note: The views expressed in this note are personal and should not be attributed to the Bank of Italy, the Peterson Institute for International Economics or Consob. We would like to thank Riccardo Cristadoro, Mario Rasetti, Geza Sapi, and Giovanni Veronese for their comments on an earlier draft.

References

Agrawal, A, J Gans and A Goldfarb (2018), Prediction Machines: The Simple Economics of Artificial Intelligence, Harvard Business Review Press.

Boddington, P (2017), Towards a Code of Ethics for Artificial Intelligence, Springer.

Bonabeau, E, M Dorigo and G Theraulaz (1999), Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press.

Brynjolfsson, E, D Rock and C Syverson (2017), “Artificial intelligence and the modern productivity paradox: A clash of expectations and statistics,” NBER, Working Paper 24001.

Brynjolfsson, E and T Mitchell (2017), “What can machine learning do? Workforce implications,” Science 358(6370): 1530–1534. 

Buterin, V and G Weyl (2018), “Liberation through radical decentralization,” Medium post, 21 May.

Chorzempa, M, P Triolo and S Saks (2018), “China’s social credit system: A mark of progress or a threat to privacy?” Peterson Institute for International Economics, Policy Brief 18-14.

Cockburn, I M, R Henderson and S Stern (2018), “The impact of artificial intelligence on innovation,” in A Agrawal, J Gans and A Goldfarb (eds), The Economics of Artificial Intelligence: An Agenda, NBER.

European Data Protection Supervisor (2017), Opinion 4/2017 on the Proposal for a Directive on Certain Aspects Concerning Contracts for the Supply of Digital Content.

Evans, D and R Schmalensee (2007), “The industrial organization of markets with two-sided platforms,” Competition Policy International 3.

Evans, D and R Schmalensee (2014), “The antitrust analysis of multisided platform businesses”, in R D Blair and D D Sokol (eds), The Oxford Handbook of International Antitrust Economics, Oxford University Press.

Goldfarb, A and D Trefler (2018), “AI and international trade”, NBER, Working Paper 24254.

Gunning, D (2017), Explainable Artificial Intelligence (XAI), DARPA Program Update.

Hall, W and J Pesenti (2017), Growing the Artificial Intelligence Industry in the UK, report prepared for the United Kingdom Departments for Digital, Culture, Media & Sport and Business, Energy & Industrial Strategy.

Lanier, J (2013), Who Owns the Future?, Simon & Schuster.

Pagan, A and A Ullah (1999), Nonparametric Econometrics, Cambridge University Press.

Posner, E and E G Weyl (2018), Radical Markets: Uprooting Capitalism and Democracy for a Just Society, Princeton University Press.

Rochet, J C and J Tirole (2003), “Platform competition in two-sided markets,” Journal of the European Economic Association 1(4): 990–1029.

Rogoff, K (2018), “Big tech is a big problem,” Project Syndicate, 2 July.

Rubinfeld, D and M Gal (2017), “Access barriers to big data,” Arizona Law Review 59: 339–381. 

Rysman, M (2009), “The economics of two-sided markets,” Journal of Economic Perspectives 23(3): 125–143.

Schaefer, M, G Sapi and S Lorincz (2018), “The effect of big data on recommendation quality: The example of internet search,” Düsseldorf Institute for Competition Economics, Discussion Paper 284.

Shoham, Y, R Perrault, E Brynjolfsson, J Clark, J Manyika and C LeGassick (2017), Artificial Intelligence Index, 2017 Annual Report.

State Council of the People’s Republic of China (2017), New Generation Artificial Intelligence Development Plan.

Trajtenberg, M (2018), “Artificial intelligence as the new general-purpose technology: A political economy perspective,” NBER, Working Paper 24245.

Villani, C (2018), “For a meaningful artificial intelligence: towards a French and European strategy”, report prepared for the French Prime Minister.
