How to measure risky sex: Biological markers or self-reported data?

Lucia Corno, Áureo De Paula

13 January 2015



An estimated 34 million people are living with HIV/AIDS worldwide, and 2.5 million people become infected every year (UNAIDS 2012). Risky sexual behaviours are the main conduit for the spread of the disease, and understanding how they change over time is crucial for designing and evaluating interventions to address the epidemic. But since sexual activity is largely private, what is the best way to measure risky sex?

Biomarkers vs. self-reported data

Early studies on the HIV/AIDS epidemic relied mainly on self-reported data on sexual behaviours (e.g. frequency of sexual contacts, condom use). Such data have been criticised because individuals may misreport their activities and give socially desirable answers (among others, Palen et al. 2008, de Paula et al. 2014). More recently, economists and social scientists have therefore focused on collecting biological markers (biomarkers) on the incidence of curable sexually transmitted infections (STIs) (e.g. chlamydia, gonorrhoea, syphilis) as objective measures of risky sexual behaviours. If the risky sexual behaviours that spread HIV/AIDS also facilitate the dissemination of other STIs, individuals testing positive for a sexually transmitted disease can be tagged as having engaged in risky sex. Biomarkers on STIs are therefore quickly becoming a popular method to measure risky sexual behaviours in large-scale behavioural change interventions aimed at promoting safer sexual practices and reducing HIV prevalence (Gong forthcoming, de Walque et al. 2012).

One important point to bear in mind is that neither measure is perfect. The rate at which self-reported data on sexual behaviours correctly measure risky sex is simply the likelihood of truthful elicitation in a survey. On the other hand, the rate of correct classification of risky sex using biomarkers – that is, testing positive for an STI after having behaved in a risky manner – is equal to the disease transmission rate. In a recent working paper (Corno and de Paula 2014), we evaluate the relative reliability of biomarkers and self-reported measurements as proxies for risky sex and provide strategies to combine various measurements to obtain more precise estimates of risky sex and its correlates.

We start by postulating a standard epidemiological model where the transmission of an STI to an uninfected person is related to (i) the proportion of infected people in the population; (ii) the likelihood of meeting one of those infected people; (iii) the number of partners; (iv) the number of sexual contacts per partner; and, finally, (v) the probability of infection from an infected partner in a given sexual encounter. We use this model to assess how likely a particular biological marker is to misclassify risky sex, which allows a direct comparison with self-reported data on risky sexual behaviours. For example, we can show that if the probability of disease transmission implied by the epidemiological model is lower than the frequency of reporting risky sexual behaviours in a survey, then the biomarkers have a higher probability of misclassifying risky sex than behaviours elicited by a survey questionnaire. This result can be useful to researchers and policymakers when choosing a proxy for risky sexual behaviour from a menu of biological markers and self-reported answers.
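The five ingredients listed above can be combined in a minimal sketch. The functional form below is an assumption, since the column does not state the paper's exact equation: it treats partners as independent draws and each contact with an infected partner as an independent transmission chance. The `meeting_bias` parameter and all numerical inputs are hypothetical and purely illustrative.

```python
def biomarker_detection_prob(prevalence, partners, contacts_per_partner,
                             per_act_transmission, meeting_bias=1.0):
    """Probability that someone who had risky sex tests positive for the STI.

    Assumed functional form (not taken from the paper): partners are drawn
    independently, and each contact with an infected partner transmits
    independently with probability per_act_transmission.
    """
    # Chance that a given partner is infected: population prevalence scaled by
    # a hypothetical meeting_bias factor (1.0 means random mixing).
    p_partner_infected = min(1.0, prevalence * meeting_bias)
    # Chance of infection from one infected partner over repeated contacts.
    p_from_infected = 1.0 - (1.0 - per_act_transmission) ** contacts_per_partner
    # Chance of escaping infection from every partner, then the complement.
    p_escape_all = (1.0 - p_partner_infected * p_from_infected) ** partners
    return 1.0 - p_escape_all


def better_proxy(detection_prob, truthful_report_prob):
    """Return which measure classifies risky sex correctly more often."""
    if detection_prob > truthful_report_prob:
        return "biomarker"
    return "self-report"


# Illustrative inputs only; none of these numbers come from the paper.
p_detect = biomarker_detection_prob(prevalence=0.05, partners=2,
                                    contacts_per_partner=10,
                                    per_act_transmission=0.1)
print(round(p_detect, 4), better_proxy(p_detect, truthful_report_prob=0.6))
```

Under these made-up inputs the biomarker flags only a small share of risky individuals, so a survey with a reasonably high rate of truthful reporting would be the better proxy.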

An illustration using Zambian data

To illustrate this result, we simulate the model using the 2007 Demographic and Health Survey (DHS) from Zambia. The biomarker available in the DHS is syphilis. In Figure 1, we plot the probability of correct measurement by the biomarker, obtained from the epidemiological model, on the vertical axis as a function of the proportion of infected people in the population on the horizontal axis. As expected, if a sexually transmitted disease is common in the population, an individual who engaged in risky sexual intercourse is more likely to be infected than in settings where the same STI is less common. Consequently, the probability of detecting risky sexual behaviours with biomarkers is also higher. The horizontal line indicates the probability of correct measurement using self-reported behaviour, and the upward-sloping curve describes how the probability of correct classification using biomarkers rises with STI prevalence. Their intersection gives the threshold prevalence rate: below it, self-reported answers are the more accurate proxy for risky sexual behaviours; above it, biomarkers are.
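The threshold at the intersection can be computed in closed form under a simple assumed model. The sketch below is not the paper's simulation: it assumes random mixing and a detection probability of 1 - (1 - prevalence * q)^partners, where q is the chance of infection from one infected partner. The function name and all inputs are hypothetical.

```python
def threshold_prevalence(truthful_report_prob, partners, contacts_per_partner,
                         per_act_transmission):
    """Prevalence at which biomarker and self-report accuracy coincide.

    Assumes (not from the paper) random mixing and the detection probability
    1 - (1 - prevalence * q) ** partners, with
    q = 1 - (1 - per_act_transmission) ** contacts_per_partner.
    Returns None if no prevalence in [0, 1] equates the two measures.
    """
    # Chance of infection from one infected partner over repeated contacts.
    q = 1.0 - (1.0 - per_act_transmission) ** contacts_per_partner
    if q <= 0.0 or partners <= 0:
        return None
    # Invert 1 - (1 - p * q) ** partners = truthful_report_prob for p.
    needed = 1.0 - (1.0 - truthful_report_prob) ** (1.0 / partners)
    p = needed / q
    return p if 0.0 <= p <= 1.0 else None


# Illustrative inputs only; none of these numbers come from the paper.
threshold = threshold_prevalence(truthful_report_prob=0.6, partners=2,
                                 contacts_per_partner=10,
                                 per_act_transmission=0.1)
print(threshold)
```

A return value of `None` means the biomarker curve never reaches the self-report line, so self-reported data would be the better proxy at every prevalence level under these assumptions.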

In the case of Zambia, where syphilis prevalence is 4.3%, if a person meets every individual with equal and independent probability regardless of infection status, biomarkers are less likely than self-reported responses to measure risky sexual behaviours correctly. (If syphilitic individuals are more likely to have sex with other syphilis-positive individuals, the syphilis biomarker is an even poorer measure of risky behaviour, as uninfected individuals pursuing risky behaviours are less likely to be infected.) Prevalence is also relatively low for many other STIs in various studies. For example, in the 2004 Malawi Longitudinal Study of Families and Health, only 3.2% of respondents tested positive for gonorrhoea, 2.4% for chlamydia, and a mere 0.3% for trichomoniasis (see Kohler et al. 2013). These results can help identify the best measure of risky behaviours in programmes aimed at reducing HIV prevalence in a given country.

Figure 1. Accuracy of biomarkers

Source: Zambia Demographic and Health Survey 2007.

Notes: “Self-reported data on risky sex” is a binary variable. It takes value 1 if a non-married respondent reported that no condom was used the last time he/she had sexual intercourse, or if a married respondent reported that the last sexual intercourse was not with the spouse/cohabiting partner and no condom was used. It takes value 0 if a non-married respondent reported using a condom during the last intercourse, or if a married respondent reported either no extramarital sex or extramarital sex with a condom.

Combining biomarkers and self-reported data

In targeting particular interventions, it may be important to assess various demographic correlates (e.g. race, age, education) of risky behaviours. We also provide a strategy to combine biomarker and self-reported data to quantify the degree of association between such variables and risky behaviour. We first estimate the likelihood of correct elicitation for self-reported measurements using STI-positive individuals (whom one can certifiably mark as having behaved riskily). We can then use standard econometric models (see, for example, Hausman et al. 1998) to identify the probability that someone at a particular age or education level, for example, will engage in risky behaviours.
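A minimal sketch of this two-step idea follows, in the spirit of a misclassified binary-response model. It is not the authors' estimator: the logit link, the single covariate, the function names, and the toy data are all assumptions made for illustration. Here `alpha1` is the chance that a risky respondent reports no risky sex, estimated from STI-positive respondents, and `alpha0` is the chance that a non-risky respondent reports risky sex.

```python
import math


def truthful_report_rate(reports_among_sti_positive):
    """Share of STI-positive respondents who report risky sex (1 = reported)."""
    return sum(reports_among_sti_positive) / len(reports_among_sti_positive)


def prob_reported_risky(x, intercept, slope, alpha0, alpha1):
    """P(report = 1 | x) when true behaviour follows a logit in x.

    alpha0: chance a non-risky respondent reports risky sex.
    alpha1: chance a risky respondent reports no risky sex.
    """
    p_true = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
    return alpha0 + (1.0 - alpha0 - alpha1) * p_true


def log_likelihood(data, intercept, slope, alpha0, alpha1):
    """Log-likelihood of (x, report) pairs under the misclassified logit."""
    total = 0.0
    for x, report in data:
        p = prob_reported_risky(x, intercept, slope, alpha0, alpha1)
        total += math.log(p) if report == 1 else math.log(1.0 - p)
    return total


# Hypothetical toy data, not taken from the Zambian DHS.
sti_positive_reports = [1, 0, 1, 1]            # reports among STI-positive
alpha1 = 1.0 - truthful_report_rate(sti_positive_reports)
sample = [(20, 1), (35, 0), (28, 1), (41, 0)]  # (age, self-report)
print(log_likelihood(sample, intercept=1.0, slope=-0.05,
                     alpha0=0.0, alpha1=alpha1))
```

With `alpha1` fixed at its first-step estimate, the intercept and slope could be chosen to maximise this log-likelihood with any numerical optimiser. Setting `alpha0` to zero is a further assumption of the sketch.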

Using the same data from Zambia, for example, we estimate that a one-year increase in age for the average person in the sample would reduce the likelihood of risky behaviour by 4 percentage points. Using self-reported behaviours, this reduction is estimated at less than one percentage point, whereas biomarkers would imply a (small) increase of about 0.1 percentage points.


References

Corno, L and A de Paula (2014), “Risky sexual behaviour: biological markers and self-reported data”, CEPR Discussion Paper 10271.

De Walque, D, W H Dow, R Nathan et al. (2012), “Incentivizing safe sex: a randomized trial of conditional cash transfers for HIV and sexually transmitted infection prevention in rural Tanzania”, BMJ Open 2(1): 1–10.

Gong, E (forthcoming), “HIV Testing and Risky Sexual Behaviour”, Economic Journal.

Hausman, J, J Abrevaya, and F Scott-Morton (1998), “Misclassification of the dependent variable in a discrete-response setting”, Journal of Econometrics 87(2): 239–269.

Kohler, H-P, S Watkins, J Behrman et al. (2013), “Cohort profile: The Malawi Longitudinal Study of Families and Health (MLSFH)”, University of Pennsylvania Population Studies Center Working Paper 2013-06.

Palen, L, E A Smith, L L Caldwell, and A Flisher (2008), “Inconsistent Reports of Sexual Intercourse Among South African High School Students”, Journal of Adolescent Health 43(3): 221–227.

de Paula, A, G Shapira, and P Todd (2014), “How Beliefs about HIV Status Affect Risky Behaviors: Evidence from Malawi”, Journal of Applied Econometrics 29(6): 944–964.

UNAIDS (2012), “Global Report. UNAIDS report on the global AIDS epidemic”, Discussion paper, Joint United Nations Programme on HIV/AIDS, Geneva.




Assistant Professor of Economics, Catholic University and Executive Director, Laboratory for Effective Anti-poverty Policies (on-leave from Queen Mary, University of London)

Faculty Member, University College London and Sao Paulo School of Economics