Quantifying stereotyping associations between gender and intellectual ability in films

Ramiro Gálvez, Valeria Tiffenberg, Edgar Altszyler 01 April 2018



A particularly longstanding, prevalent and well-documented stereotype is the belief that men possess higher-level cognitive abilities than women (Broverman et al. 1970, Williams and Best 1982, Kirkcaldy et al. 2007, Upson and Friedman 2012).[1] This ‘brilliance = males’ stereotype has been shown to be endorsed by both boys and girls as young as six (Cvencek 2011, Bian 2017) and is believed to be a factor driving the under-representation of women in science, particularly in the STEM fields (Nosek et al. 2009, Reuben 2014, Leslie 2015, Smyth and Nosek 2015, Storage et al. 2016). Although there is a consensus that this stereotype has strong cultural roots, studies on its perpetuation usually centre on the analysis of cultural behaviours, such as differential guidance provided by parents to their offspring during shared scientific thinking (Crowley 2001) or differential guidance given by science teachers to students according to their gender (Shumow and Schmidt 2013). Notably, there is a dearth of large-scale studies focusing on the presence of this stereotype in mainstream cultural products.

In a recent paper, we study the presence of the ‘brilliance = males’ stereotype in a collection of over 10,000 movie transcripts covering half a century of film history in the Western world (Gálvez et al. 2018). As stereotypes are, in part, a collection of associations that link a group to a set of descriptive characteristics (Gaertner and McLaughlin 1983), we use natural language processing techniques to quantify associations between gender-related words and high-level cognitive ability-related words in films. In doing so, a strong focus is placed on analysing the presence of these associations in films aimed at children.

Materials and methods

Data collection began by downloading from IMDb a series of lists containing the 1,000 top-grossing titles in the US for every year from 1967 up to and including 2016. Then, for each title in these lists, metadata was downloaded. With this data in hand, we filtered out all titles that were not movies (such as TV series), that did not include English among the languages spoken in them, or in whose production none of the US, UK, Canada, or Australia was involved. Finally, for each movie in the resulting set, its most frequently downloaded English subtitle was obtained from OpenSubtitles.[2] This resulted in our final sample of 11,550 film subtitles spanning half a century.
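The three filters described above can be sketched as follows. This is only an illustration: the `Title` record, its field names and the country labels are hypothetical and do not reflect the actual IMDb metadata format.

```python
from dataclasses import dataclass

# Hypothetical record layout; field names are illustrative assumptions,
# not the real structure of the downloaded metadata.
@dataclass(frozen=True)
class Title:
    name: str
    kind: str              # e.g. "movie" or "tv_series"
    languages: frozenset   # languages spoken in the title
    countries: frozenset   # countries involved in production

PRODUCTION_COUNTRIES = frozenset({"US", "UK", "Canada", "Australia"})

def keep_title(title: Title) -> bool:
    """Return True when a title passes all three filters from the text."""
    is_movie = title.kind == "movie"
    has_english = "English" in title.languages
    has_country = bool(title.countries & PRODUCTION_COUNTRIES)
    return is_movie and has_english and has_country

def filter_titles(titles):
    """Keep only the titles that satisfy every filter."""
    return [t for t in titles if keep_title(t)]
```

A title is kept only if it satisfies all three conditions, which is equivalent to dropping it as soon as any one of them fails.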

Figure 1A details the number of films analysed for successive ten-year periods (1967-1976, 1977-1986, …, 2007-2016), for the full-sample and for a sub-sample which contains only films belonging to the family and/or animation genres (family/animation sub-sample). Figure 1B shows the evolution of the ratio between the number of appearances of male pronouns (he, his, him, himself) relative to the number of appearances of female pronouns (she, hers, her, herself). In line with previous research on books (Twenge et al. 2012), in films this ratio has experienced a reduction since the mid-1960s – a phenomenon associated with an improvement in women's status (Twenge et al. 2012) – but has been consistently less favourable towards female pronouns in the family/animation sub-sample when compared to the full sample.

Figure 1 Film frequencies and gender pronoun ratios for successive ten-year periods

Notes: (A) Number of films analysed for successive ten-year periods (1967-1976, 1977-1986, …, 2007-2016), for the full-sample and the family/animation sub-sample. (B) Evolution of the ratio between the number of appearances of male pronouns relative to the number of appearances of female pronouns, for the full-sample and the family/animation sub-sample. In both panels tendencies are estimated through LOESS regressions.
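As an illustration of the pronoun ratio shown in panel B, the following minimal sketch counts the male and female pronouns listed above in a piece of text. The tokenisation rule (lowercase alphabetic runs) is an assumption made for this example and is not necessarily the cleaning procedure used in the study.

```python
import re
from collections import Counter

MALE_PRONOUNS = {"he", "his", "him", "himself"}
FEMALE_PRONOUNS = {"she", "hers", "her", "herself"}

def pronoun_ratio(text: str) -> float:
    """Ratio of male to female pronoun appearances in `text`."""
    # Assumed tokenisation: lowercase runs of letters.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    male = sum(counts[w] for w in MALE_PRONOUNS)
    female = sum(counts[w] for w in FEMALE_PRONOUNS)
    if female == 0:
        raise ValueError("no female pronouns found; ratio is undefined")
    return male / female
```

For example, `pronoun_ratio("He said she took his book and gave it to him.")` counts three male pronouns and one female pronoun, giving a ratio of 3.0.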

To estimate word associations between gender-related words and high-level cognitive ability-related words, we compute positive pointwise mutual information (PPMI) scores (Martin and Jurafsky 2009) between gender pronouns and high-level cognitive ability-related words (e.g. genius, intelligent, clever).[3] PPMI is a metric designed to capture how much more often than chance two words co-occur (higher values indicating stronger associations), and is commonly used for measuring associations between words and concepts. PPMI estimates rely on values contained in a co-occurrence matrix, which presents the number of times a word appears in the context of another one (each row representing a target word and each column representing a context word). Figure 2 contains a snippet illustrating how, given subtitle data, we built these matrices.[4]
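For readers unfamiliar with the metric, the sketch below computes the textbook form of PPMI, max(0, log2[P(w, c) / (P(w) P(c))]), from a small co-occurrence table. The nested-dictionary layout and the toy counts are assumptions made for illustration; the exact computation used with the subtitle data is described in Gálvez et al. (2018).

```python
import math

def ppmi(cooc, target, context):
    """Textbook PPMI between a target word (row) and a context word (column).

    `cooc` maps target word -> {context word -> co-occurrence count}.
    """
    total = sum(sum(row.values()) for row in cooc.values())
    joint = cooc.get(target, {}).get(context, 0)
    if total == 0 or joint == 0:
        return 0.0  # never co-occur: PMI is -infinity, clipped to 0
    target_total = sum(cooc[target].values())
    context_total = sum(row.get(context, 0) for row in cooc.values())
    pmi = math.log2(
        (joint / total) / ((target_total / total) * (context_total / total))
    )
    return max(0.0, pmi)  # negative associations are clipped to 0

# Toy counts, invented purely for illustration.
toy = {
    "him": {"smart": 3, "car": 1},
    "her": {"smart": 1, "car": 3},
}
```

With these toy counts, "smart" co-occurs with "him" 1.5 times more often than chance would predict, so `ppmi(toy, "him", "smart")` equals log2(1.5), while `ppmi(toy, "her", "smart")` is clipped to 0.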

Figure 2 Co-occurrence matrix construction

Notes: Given a target word in a SubRip file (him in the illustration), all neighbouring/context frames are identified. Which frames constitute the neighbourhood depends on the size of a time window (Δt), which we set equal to 30 seconds. The text contained in all context frames is cleaned, tokenized and lemmatized. Then, the number of appearances of every context token is added to the relevant cell in the co-occurrence matrix under construction. The process is repeated for every word in every subtitle under analysis. A co-occurrence matrix presents the number of times a word appears in the context of another word (for example, and simply as an illustration, according to this figure smart appears eleven times in the context of him), and it serves as input for PPMI and statistical significance estimates.
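The construction described in these notes can be sketched as follows. Several details are assumptions made for this example: frames are given as (start time in seconds, text) pairs rather than parsed from a SubRip file, a frame counts as a neighbour when its start time lies within Δt of the target frame's start time, the target's own frame is part of the neighbourhood, and lemmatization is omitted.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Simplified cleaning: lowercase alphabetic tokens, no lemmatization.
    return re.findall(r"[a-z]+", text.lower())

def build_cooccurrence(frames, delta_t=30.0):
    """Build target -> Counter(context word -> count) from subtitle frames.

    `frames` is a list of (start_seconds, text) pairs (assumed layout).
    """
    tokens_per_frame = [(start, tokenize(text)) for start, text in frames]
    cooc = defaultdict(Counter)
    for start_i, tokens_i in tokens_per_frame:
        # Pool the tokens of every frame starting within delta_t of this one.
        context = Counter()
        for start_j, tokens_j in tokens_per_frame:
            if abs(start_j - start_i) <= delta_t:
                context.update(tokens_j)
        for target in tokens_i:
            cooc[target].update(context)
            cooc[target][target] -= 1  # a token is not its own context
    return cooc

frames = [
    (0.0, "He is smart"),
    (10.0, "Give him the book"),
    (100.0, "She is clever"),
]
matrix = build_cooccurrence(frames)
```

In this toy input the first two frames fall inside the same 30-second window, so "smart" is counted once in the context of "him", while "clever" (70 seconds or more away) is not. The nested loop is quadratic in the number of frames, which is acceptable for a sketch but would need a sliding window for full-length subtitles.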


Figure 3 quantifies associations between gender pronouns and words depicting high-level cognitive ability. Figure 3A presents estimates considering all movies from 2000 up to and including 2016, for the full sample and the family/animation sub-sample. The estimates indicate that male pronouns are more strongly associated with high-level cognitive ability-related words than female pronouns are. This pattern is present in both the full sample and the family/animation sub-sample. Figure 3B explores the dynamics of these differences through time. Results from the full sample of movies reveal that differences in associations have been steady for at least half a century, with no evidence of convergence in the trends. Results from the family/animation sub-sample show that differences have also been prevalent in this set of films, although estimates are less stable (we attribute this to the fact that sample sizes for every ten-year period of the family/animation sub-sample are much smaller than their full-sample counterparts, see Figure 1A).[5] Overall, our estimates suggest that, at an aggregate level, the ‘brilliance = males’ stereotype is indeed present in films and that movies specifically aimed at children contain this stereotypical association (which we believe contributes to its early adoption). Moreover, this pattern seems to have been quite persistent over the last 50 years.[6]

Figure 3 Word associations between gender pronouns and high-level cognitive ability-related words

Notes: (A) Estimated association between gender pronouns and high-level cognitive ability-related words when films from 2000 up to and including 2016 are analysed, for the full-sample (n = 2,902) and the family/animation sub-sample (n = 242). Asterisks indicate the results of Fisher's exact tests on the underlying contingency tables: *** significant at the 1% level. (B) Time evolution of the estimated associations taking as input sets of films belonging to successive ten-year periods (1967-1976, 1977-1986, …, 2007-2016). Tendencies are estimated through LOESS regressions. Grey areas indicate that, according to Fisher's exact tests on the underlying contingency tables, differences are not significant at the 5% level.
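To make the significance test concrete, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 contingency table, written with the standard library only. How the study lays out its contingency tables is not specified here; one plausible layout, assumed purely for illustration, puts male and female pronouns in the rows and co-occurrences with cognitive ability-related words versus all other words in the columns.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of every table at most as likely as the observed
    # one (a small tolerance guards against floating-point ties).
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

For the small table [[3, 1], [1, 3]] this gives 34/70, roughly 0.486. Enumerating every table is slow for counts as large as those in a real co-occurrence matrix, where a library routine such as `scipy.stats.fisher_exact` would be the more practical choice.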


The film industry in the Western world has been the subject of controversy in recent times regarding gender equality. Controversies range from the existence of a strong gender pay gap (actors being paid considerably more than actresses) to allegations of widespread prevalence of sexual assault and harassment. Our results suggest that gender inequality is also considerably strong in the contents of its films. Given that stereotypes regarding intelligence have been found to shape intellectual identity and academic performance (Steele 1997), the need to proactively address their presence in films is evident.


Bian, L, S J Leslie and A Cimpian (2017), “Gender stereotypes about intellectual ability emerge early and influence children’s interests”, Science, 355(6323): 389-391.

Broverman, I K, D M Broverman, F E Clarkson, P S Rosenkrantz and S R Vogel (1970), “Sex-role stereotypes and clinical judgments of mental health”, Journal of Consulting and Clinical Psychology, 34(1): 1.

Crowley, K, M A Callanan, H R Tenenbaum and A Allen (2001), “Parents explain more often to boys than to girls during shared scientific thinking”, Psychological Science, 12(3): 258-261.

Cvencek, D, A N Meltzoff and A G Greenwald (2011), “Math–gender stereotypes in elementary school children”, Child Development, 82(3): 766-779.

Furnham, A, E Reeves and S Budhani (2002), “Parents think their sons are brighter than their daughters: Sex differences in parental self-estimations and estimations of their children's multiple intelligences”, The Journal of Genetic Psychology, 163(1): 24-39.

Gaertner, S L and J P McLaughlin (1983), “Racial stereotypes: Associations and ascriptions of positive and negative characteristics”, Social Psychology Quarterly, 23-30.

Gálvez, R H, V Tiffenberg and E Altszyler (2018), “Half a Century of Stereotyping Associations between Gender and Intellectual Ability in Films”, available at SSRN.

Kirkcaldy, B, P Noack, A Furnham and G Siefen (2007), “Parental estimates of their own and their children's intelligence”, European Psychologist, 12(3): 173-180.

Leslie, S J, A Cimpian, M Meyer and E Freeland (2015), “Expectations of brilliance underlie gender distributions across academic disciplines”, Science, 347(6219): 262-265.

Martin, J H and D Jurafsky (2009), Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall.

Nosek, B A et al. (2009), “National differences in gender–science stereotypes predict national sex differences in science and math achievement”, Proceedings of the National Academy of Sciences, 106(26): 10593-10597.

Reuben, E, P Sapienza and L Zingales (2014), “How stereotypes impair women’s careers in science”, Proceedings of the National Academy of Sciences, 111(12): 4403-4408.

Shumow, L and J A Schmidt (2013), Enhancing adolescents' motivation for science. Corwin Press.

Smyth, F L and B A Nosek (2015), “On the gender–science stereotypes held by scientists: explicit accord with gender-ratios, implicit accord with scientific identity”, Frontiers in Psychology, 6: 415.

Steele, C M (1997), “A threat in the air: How stereotypes shape intellectual identity and performance”, American Psychologist, 52(6): 613.

Storage, D, Z Horne, A Cimpian and S J Leslie (2016), “The frequency of “brilliant” and “genius” in teaching evaluations predicts the representation of women and African Americans across fields”, PLoS ONE, 11(3): e0150194.

Twenge, J M, W K Campbell and B Gentile (2012), “Male and female pronoun use in US books reflects women’s status, 1900–2008”, Sex Roles, 67(9-10): 488-493.

Upson, S and L F Friedman (2012), “Where are all the female geniuses?”, Scientific American Mind, 23(5): 63-65.

Williams, J E and D L Best (1982), Measuring sex stereotypes: A thirty nation study. Beverly Hills, CA: Sage.


[1] ‘Intelligence’ is commonly associated with mathematical and spatial intelligence (Furnham 2002).

[2] The authors are particularly thankful to the OpenSubtitles administrators for their help during this process.

[3] The full set of high-level cognitive ability-related words as well as details on its construction are available in Gálvez et al. (2018).

[4] Further details on how PPMI is computed given our data are available in Gálvez et al. (2018).

[5] Only for the 1967-1976 period do estimates indicate that high-level cognitive-related words have a greater association with female pronouns than with male pronouns, although a Fisher's exact test on the underlying contingency table does not reveal this difference to be statistically significant (p ≈ 0.4).

[6] In Gálvez et al. (2018), we present further results on the relation between gender pronouns and gender stereotyping roles.



