VoxEU Column Frontiers of economic research

The generalisability of experimental results in economics

Lab and field experiments help us understand human behaviour by increasing our confidence in estimated causal effects across a range of economic problems. This column highlights the relevance of experimental data and discusses the value of lab experiments relative to field experiments. While lab experiments are the only feasible approach in a number of situations, they tend to inflate scrutiny, which could artificially modify behaviour and threaten the causal interpretation of the estimates. The debate about lab versus field experiments is far from settled. However, economists do agree that, to obtain convincing estimates of causal effects relating to human behaviour, considering several methods jointly is superior to using any single one in isolation.

One of the most basic elements of scientific knowledge is the causal effect – the effect on a variable Y of a change in variable X. Virtually every decision that we make rests upon beliefs about causal effects, from the mundane – the causal effect of an alarm clock on punctuality – to the critical – the causal effect of fiscal stimulus on unemployment.

We would rather make decisions when certain of the relevant causal effects. Sometimes science affords us something close: water freezes at 32°F, and science has given us a clear idea of how liquids react to different variables. But in the social sciences (the study of human decisions), we are rarely confident in our causal effects because there’s so much we still need to understand about human behaviour. Instead, we must rely on estimates that we are constantly updating based on new evidence, be it the formal kind published in peer-reviewed journals, or simply what we observe casually in our day-to-day lives.

Loosely speaking, epistemology is the study of how one estimates causal effects and updates those estimates as new data emerge. The last 50 years have seen laboratory experiments and, more recently, field experiments penetrate economics. There has been a lively debate on how to assimilate experimental data into estimates of causal effects, and the sabre-rattling has been particularly cacophonous when it comes to laboratory versus field experiments as information sources.

An example

More concretely, consider realising the gains from trade. Bob wants to purchase Susan’s mug for $5, but they live far apart and so he will need to send a check. The mug is worth $3 to Susan and $9 to Bob, implying a societal surplus of $9 - $3 = $6 if the transaction occurs. However, if Bob sends the money first, will Susan send the mug? If Susan sends the mug first, will Bob send the money? Signing a legally enforceable contract would facilitate the trade.

However, what if property rights are poorly enforced (e.g., if they live in different countries)? Then their fear may result in them forgoing the trade and the surplus of $6. Formally, they would believe that the causal effect of fully enforcing property rights on the likelihood of the trade being executed correctly is positive and large. (As an aside, economists use a similar analysis when arguing the importance of strong and impartial legal enforcement to economic growth.)
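The arithmetic behind Bob and Susan's dilemma can be laid out in a few lines. The following is a minimal sketch rather than part of the original analysis: the function name `trade_outcomes` and the outcome labels are illustrative assumptions, and gains are measured relative to not trading at all.

```python
def trade_outcomes(seller_value, buyer_value, price):
    """Return (seller_gain, buyer_gain) for each way the exchange can end.

    Gains are measured relative to the no-trade status quo.
    The outcome labels are illustrative, not taken from the column.
    """
    return {
        # Both sides deliver: the seller nets the price minus her valuation,
        # the buyer nets his valuation minus the price.
        "trade completed": (price - seller_value, buyer_value - price),
        # The buyer pays first and the seller keeps the mug.
        "buyer pays, seller defaults": (price, -price),
        # The seller ships first and the buyer never pays.
        "seller ships, buyer defaults": (-seller_value, buyer_value),
        # Fear of default stops the trade altogether.
        "no trade": (0, 0),
    }


outcomes = trade_outcomes(seller_value=3, buyer_value=9, price=5)
for label, (seller_gain, buyer_gain) in outcomes.items():
    total = seller_gain + buyer_gain
    print(f"{label}: Susan {seller_gain:+d}, Bob {buyer_gain:+d}, total {total:+d}")
```

With the figures from the example, a completed trade gives Susan $2 and Bob $4, which together make up the $6 surplus, while whoever moves first stands to lose if the other side defaults.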

In reality, we cannot conveniently manipulate the legal system. We can still glean information by comparing outcomes in strong legal systems (e.g., Germany) to weak ones (e.g., Somalia), but with so much non-random background variation (e.g., climate, history, ethnolinguistic fragmentation), the differences in outcomes are difficult to interpret. Alternatively, laboratory experiments allow us to surgically manipulate property rights to study their importance to trade.

The ‘trust game’ in the lab

This is what Berg et al. (1995) did using the ‘trust game’ – a microcosm of Bob and Susan’s quandary. The sender starts with $10 and the responder starts with $0. The sender can send any amount to the responder, retaining the remainder. The responder receives triple whatever the sender transfers. The responder then decides how to divide the tripled amount between the two, terminating the game. For example, if the sender transfers $4, the sender retains $6 and the responder receives $4 x 3 = $12; finally, the responder can choose to return anywhere between $0 and $12.

The desirable outcome is for the sender to transfer all $10, and for the responder to return at least $10 ($15 under egalitarianism). This mimics Bob and Susan trading the mug. It leaves the two with $30 between them – much more than just $10 with the sender. But the sender may doubt the responder’s trustworthiness. The responder may just choose to pocket whatever the sender transfers since the sender has no legal recourse. Anticipating this, like Bob and Susan, the sender may just decide to avoid transacting, retaining the $10.
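The payoffs described above follow mechanically from the rules of the game. As a minimal sketch (the function name `trust_game` and its parameter names are assumptions made for illustration, not terminology from Berg et al.), they can be computed as follows:

```python
def trust_game(endowment, sent, returned, multiplier=3):
    """Return (sender_payoff, responder_payoff) for one play of the trust game.

    The sender keeps whatever is not sent; the responder receives the
    transfer multiplied by `multiplier` and chooses how much to return.
    """
    if not 0 <= sent <= endowment:
        raise ValueError("sender cannot send more than the endowment")
    received = multiplier * sent
    if not 0 <= returned <= received:
        raise ValueError("responder cannot return more than was received")
    sender_payoff = endowment - sent + returned
    responder_payoff = received - returned
    return sender_payoff, responder_payoff


# The $4 transfer from the text, with nothing returned.
print(trust_game(10, sent=4, returned=0))    # (6, 12)
# Full trust with an egalitarian split of the $30.
print(trust_game(10, sent=10, returned=15))  # (15, 15)
# Full trust exploited by a selfish responder.
print(trust_game(10, sent=10, returned=0))   # (0, 30)
# No trust: the sender simply keeps the $10.
print(trust_game(10, sent=0, returned=0))    # (10, 0)
```

The last two cases capture the sender's dilemma: sending everything maximises the joint payoff, but a selfish responder can leave the sender with nothing.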

Berg et al.’s results suggested that such fears were potentially ill-founded. Even when the trust game was played with anonymous strangers, on average, senders would send $5.16, and responders would return $4.66. The authors appealed to altruism to explain the results, i.e., the players feel bad about the other party receiving a low payoff, motivating behaviour closer to what emerges under fully enforced property rights. Other studies (Fehr et al. 1993) have reported consistent results, and interviews reveal that altruism is the motivation participants most commonly state.

Would reading these papers make Bob reconsider his reluctance to send Susan the check first? More formally, do the results of these laboratory experiments generalise to other environments? It could be the case that certain features of laboratory experiments make them ill-suited for generalisation to field settings, especially when behaviour in the experiment was strongly linked to altruistic behaviour. I was particularly concerned that the scrutiny of a laboratory experiment made people behave more altruistically than in genuinely anonymous environments. An extreme illustration would be someone asking you on television how much you would donate to save gorillas; you would be forgiven for exaggerating to avoid publicly looking cruel.

Going to the field

In a 2006 paper, I investigated these hypotheses using field experiments.

  • One of the advantages of field experiments is that participants are unaware that they are in an experiment, eliminating any added scrutiny from the experiment itself (though natural levels of scrutiny remain).

I went to professional sports memorabilia markets and recruited professional traders to play laboratory trust games. Consistent with the literature, the traders appeared to behave altruistically, implying only a modest positive causal effect of property rights.

I then ran a complementary field experiment. As a sports card enthusiast, I was aware that the market was awash with player cards of different quality (grade), and that the professional sellers were better at discerning grade than the average fan milling around the exhibition. Critically, an individual requesting a high-grade card and receiving a low-grade card was rarely aware of it at the time of the transaction and, if they found out subsequently, they had no legal recourse. Thus, buying a card required the buyer to trust the seller, just as Bob must when buying the mug and the sender must in the trust game. I recruited some archetypal sports fans to go to professional sellers and request a specific card at a specific grade in exchange for a predetermined price, without revealing to the seller that this was an experiment. If traders were completely selfish, then they would hand over the lowest grade of card whatever price was offered to them, confirming the need to enforce property rights. Alternatively, if the traders behaved ‘altruistically’ like the laboratory experiment participants, then they should offer higher quality when offered higher prices.

The results were pretty grim for anyone who believes in the humanity of sports card dealers – card quality was insensitive to price offers. Note that these were the same traders who had apparently exhibited altruistic tendencies in the preceding laboratory experiment. Equally significant were the results a few months after the establishment of a third party that impartially grades sports cards. Sports card dealers now returned higher quality cards when offered higher prices, especially the local ones that stood to gain the most from cultivating a reputation for honest trading.

These results painted a bleaker picture of anonymous trade in the absence of property rights. Were Bob and Susan to read the entire literature, what would be the epistemologically ideal way for them to update their beliefs on the causal effect of property rights on trading behaviour? Which results generalise to their setting more accurately – laboratory or field?

Lab versus field experiments

Some general principles to keep in mind:

  • First, if you believe that laboratory experiments artificially inflate scrutiny, and that this induces artificially altruistic behaviour, then you should lean toward field experiments since they guarantee natural scrutiny levels for the environment in question.
  • Second, there are other reasons that many economists consider laboratory experiments artificial.

As is evident in the aforementioned trust game, scenarios can be very abstract, the range of actions can be unrealistically small, and participants may take on unfamiliar roles (Levitt and List 2007). In contrast, in real settings (including field experiments), participants by definition behave in a realistic way, usually occupying their normal roles. These features reinforce one’s caution about generalising from a laboratory setting.

  • Third, any experiment that requires explicit volunteers runs the risk of yielding results biased by the mere act of volunteering.

For example, suppose that President Obama announces an experiment designed to estimate the causal effect of an after-school program on children’s academic performance, and he calls for volunteers. Potential participants are told that they will be randomised into two groups: those participating vs. not participating in the program. Plausibly, the families that sign up could be the ones that stand to gain the most from participating, a potential source of bias when Obama seeks to extrapolate the estimated causal effect to the entire population. Field experiments can sidestep this issue by making participation an uninformed decision.
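A small simulation can make the volunteering problem concrete. This is a sketch on purely simulated data: every number in it (the score distribution, the distribution of gains, and the rule that only families with a positive gain sign up) is a hypothetical assumption chosen for illustration, not an estimate from any study.

```python
import random
from statistics import mean


def simulate_volunteer_bias(n=20_000, seed=0):
    """Compare the population-average gain with an estimate from volunteers.

    All distributions and the volunteering rule are hypothetical.
    Returns (population_effect, volunteer_estimate).
    """
    rng = random.Random(seed)
    # Each child has a baseline score and an individual gain from the program.
    children = [(rng.gauss(50, 10), rng.gauss(2, 5)) for _ in range(n)]
    population_effect = mean(gain for _, gain in children)

    # Assumption: only families who stand to gain sign up.
    volunteers = [(base, gain) for base, gain in children if gain > 0]

    # Randomise volunteers into program and no-program groups.
    treated, control = [], []
    for base, gain in volunteers:
        if rng.random() < 0.5:
            treated.append(base + gain)  # observed score with the program
        else:
            control.append(base)         # observed score without it

    # Difference in means: unbiased for volunteers, not for the population.
    volunteer_estimate = mean(treated) - mean(control)
    return population_effect, volunteer_estimate


population_effect, volunteer_estimate = simulate_volunteer_bias()
print(f"average gain in the whole population: {population_effect:.2f}")
print(f"experimental estimate from volunteers: {volunteer_estimate:.2f}")
```

Randomisation within the volunteer pool still yields a valid estimate for volunteers, but under these assumptions that estimate overstates the gain for the population as a whole, which is the extrapolation problem described above.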

When it comes to generalisation, Bob might be hyper-conservative; he could say to himself: “List’s field experiment might be nominally closer to my real setting now than the laboratory trust games, but buying and selling a mug with a stranger is still a long way from buying graded sports cards in an exhibition, so frankly, both experiments are useless; the sports card experiment is useful only for the sports card environment.” But even under empirical hyper-conservatism, the solution is running a field experiment in as close as possible a setting to the one he wants to study. For the reasons described above, laboratory experiments are always quite far from a real setting, so their results rarely affect the beliefs of conservative consumers of empirical studies.

Does this eliminate the value of laboratory experiments?

No – there are many classes of causal effects where our experimentally gained knowledge depends principally upon the laboratory. Ethics, legality, and cost are three of the factors that most frequently render field experiments infeasible. For example, one hopes that Federal Reserve Chair Janet Yellen does not plan on deploying field experiments to refine her knowledge of the effect of interest rates on inflation. Similarly, field experiments on the effects of bankruptcy are unacceptable and impractical. In these cases, laboratory experiments often provide the best (albeit imperfect) source of information on causal effects.

While economists disagree about the relative usefulness of laboratory and field experiments, a consensus has emerged in the literature that when forging estimates of causal effects relating to human behaviour, elements of the entire spectrum of empirical techniques operate as complementary sources of evidence. Some of the most informative research projects combine surveys, observational data, and laboratory and field experiments as part of a multi-pronged attack on the question under consideration, nullifying the laboratory vs. field debate. (One example is research on prospect theory, where lab evidence like Knetsch (1989) inspired observational studies like Genesove and Mayer (2001) and field experiments like List (2003), which have, in turn, inspired a new generation of lab, field, and observational studies; see DellaVigna (2009) for an overview.)

References

Berg, J, J Dickhaut and K McCabe (1995), “Trust, Reciprocity, and Social History”, Games and Economic Behaviour, 10, p122-142.

DellaVigna, S (2009), “Psychology and Economics: Evidence from the Field”, Journal of Economic Literature, 47, p315-372.

Fehr, E, G Kirchsteiger and A Riedl (1993), “Reciprocity as a Contract Enforcement Device”, Econometrica, 65, p833-860.

Genesove, D and C Mayer (2001), “Loss Aversion and Seller Behaviour: Evidence from the Housing Market”, Quarterly Journal of Economics, 116, p1233-1260.

Knetsch, J (1989), “The Endowment Effect and Evidence of Nonreversible Indifference Curves”, The American Economic Review, 79, p1277-1284.

Levitt, S and J List (2007), “What Do Laboratory Experiments Measuring Social Preferences Reveal About the Real World?”, Journal of Economic Perspectives, 21, p153-174.

List, J (2003), “Does Market Experience Eliminate Market Anomalies?”, Quarterly Journal of Economics, 118, p41-71.

List, J (2006), “The Behaviouralist Meets the Market: Measuring Social Preferences and Reputation Effects in Actual Transactions”, Journal of Political Economy, 114, p1-37.
