Blogpost 2: Bittersweet emotions – replicating Wennerås and Wold

Bittersweet emotions

By Ulf Sandström and Peter van den Besselaar
Posted on January 31st 2020

Bittersweet is probably the best way to describe the emotions that are activated when a researcher is cited, but for the wrong reasons. A good example is the high-profile paper Nepotism and Sexism in Peer-Review by Christine Wennerås and Agnes Wold (W&W), published in Nature in 1997.

The paper has inspired a lot of research on gender bias in science, including the GRANteD project, as it was the first analysis that explicitly included past performance measures. In the language of Thomas Kuhn, we consider the paper a paradigmatic “exemplar”.

However, when looking into the gender studies literature on gender bias in grant allocation, the bittersweet feeling becomes clear. Despite the many citations, very few studies have understood the methodology proposed by W&W, and even fewer have used it.

Although often read that way, the sexism and nepotism study never talked about success rates in grant allocation. The word “success” is never even mentioned in the paper. It is therefore depressing to see how follow-up studies have misrepresented the original paper. And it is easy to understand why this is the case: full information about the peers’ evaluation of grant applications is seldom accessible to researchers. Privacy laws and data protection rules stand between the interests of the researcher and those of the council and/or the individual applicant.

The Wennerås and Wold study used a very elegant and well-argued method for showing that female applicants had to perform better to receive the same grading. They focused on the so-called competence score, which at the Swedish Medical Research Council more or less reflected how the panel evaluated the track record of each applicant.

Among the things that have gone unnoticed is that the paper introduced a size-dependent indicator of performance. Wennerås & Wold simply summed the journal impact factors of an applicant’s publications into one measure. That made possible the type of regression analysis that led to the internationally renowned conclusions.
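A minimal sketch of such a size-dependent indicator, with invented journal names and impact factor values (not W&W’s actual data):

```python
# Hypothetical illustration of a size-dependent performance indicator:
# sum the journal impact factor (JIF) over all of an applicant's
# publications, so productivity and venue enter a single measure.

# Illustrative impact factors; not real values.
JIF = {"Nature": 27.0, "J Immunol": 7.0, "Scand J Med": 1.5}

def total_impact(publications):
    """Sum of journal impact factors over an applicant's publications."""
    return sum(JIF[journal] for journal in publications)

applicant_a = ["Nature", "J Immunol", "Scand J Med"]   # 3 papers
applicant_b = ["J Immunol", "J Immunol"]               # 2 papers

print(total_impact(applicant_a))  # 35.5
print(total_impact(applicant_b))  # 14.0
```

Because the indicator is a sum rather than an average, an applicant with more publications scores higher at equal journal quality, which is what makes it size-dependent.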

Let’s look at how the W&W study has been received by analysing three prominent examples. The first example is the large Waisbren study. Waisbren et al. (2008) used data on 6,319 applications by 2,840 faculty at Harvard Medical School-affiliated institutions. Focusing on gender disparities in grant applications and funding, they find that women and men are equally successful, and that only marginal differences remain after controlling for academic rank. These are interesting results, but not really what Wennerås & Wold were doing: predicting the competence score from past performance and gender. If we consider all articles that cite the Waisbren study, only one paper out of 47 mentions ‘score’ as a central concept.
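To make the contrast concrete: the W&W design regresses the reviewer-assigned competence score on past performance and gender, rather than comparing success rates. A minimal sketch with synthetic data (all numbers, including the built-in gender penalty, are invented for illustration; W&W’s original data are not public):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic applicants: total impact (performance) and gender (1 = female).
impact = rng.gamma(shape=2.0, scale=20.0, size=n)
female = rng.integers(0, 2, size=n)

# Simulated competence scores: equal performance yields a lower score
# for women (an invented bias of -0.2 points, echoing the W&W pattern).
score = 1.0 + 0.02 * impact - 0.2 * female + rng.normal(0, 0.3, size=n)

# Ordinary least squares: score ~ intercept + impact + female.
X = np.column_stack([np.ones(n), impact, female])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
intercept, b_impact, b_female = coef

print(f"impact coefficient: {b_impact:.3f}")   # positive
print(f"gender coefficient: {b_female:.3f}")   # negative: a score penalty
```

A negative gender coefficient at equal performance is the kind of result the W&W design can detect and a success-rate comparison cannot, since success rates conflate evaluation with application behaviour.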

Many studies find that scholarly production and impact are significantly higher for male applicants. This is perhaps the main finding of “the NIH studies”: a series of studies on gender bias based on NIH data from 2010 onwards, when the NIH opened up its project registers. There seems to be unanimous agreement that grant differences are due to women’s lower publication rates (e.g. Fridner et al. 2015; Evans & Moulder 2011; Kaufman & Chevan 2011; DesRoches et al. 2010). This may be the case, but it does not clarify whether the NIH peer review system evaluates female proposals in the same way as those of their male colleagues.

The second example is Marsh et al. (2009), who discuss the W&W study in a somewhat superficial manner and obviously miss the point that it was not a success rate investigation but a study comparing competence scores with performance. Despite these misreadings, they claim that the study cannot be the basis for generalizations. Too many studies, they argue, point in the other direction, and the differences between men and women are quite small.

Another important element in the design of the W&W paper is that they studied young researchers who were applying more or less for the first time. These applicants had just finished their post-doc period and applied for a fellowship. Men and women were basically on an equal footing; their publication records were comparable.

Although Marsh et al. corroborated W&W’s findings with other studies focused on fellowships, they did not notice this dimension of the study, the element that directed the analysis at the early career phase (cf. Van den Besselaar & Sandström 2016).

Furthermore, Marsh and colleagues used the follow-up study by Sandström & Hällsten (2008) to support their finding of no differences in grant allocation. Again, they missed several points in that source as well: first, the competence score indicated a favouring of women, but this was ‘corrected’ by nepotism in favour of men; second, the study indicated the plasticity of peer review: political pressure changed the valuation of track records. This plasticity may be the main explanation for why peer review continues to produce biased effects.

The third example is Ceci & Williams (2011), who focus on the methodological core of the W&W paper. In this case the paper and its method are described correctly. The critique is directed at the statistical analysis performed by W&W: the regression models were criticised, as was the assumption of linearity in the performance model. Ceci & Williams also made an important statement concerning the openness of research data: they stated that they were waiting for the public disclosure of the original Wennerås & Wold data in order to test other possible hypotheses and apply other analyses.
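The linearity critique can be probed empirically once data are available: fit both a linear and a nonlinear (here, quadratic) performance model and compare the fits. A sketch on synthetic data (the flattening score-performance relation is a hypothetical, not a claim about the original data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
impact = rng.gamma(2.0, 20.0, size=n)

# Hypothetical: the true score-performance relation flattens at high
# impact (diminishing returns), which a purely linear model would miss.
score = 1.0 + 0.04 * impact - 0.0004 * impact**2 + rng.normal(0, 0.3, size=n)

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))

X_lin = np.column_stack([np.ones(n), impact])
X_quad = np.column_stack([np.ones(n), impact, impact**2])
rss_lin, rss_quad = rss(X_lin, score), rss(X_quad, score)

print(f"linear RSS: {rss_lin:.1f}, quadratic RSS: {rss_quad:.1f}")
# A clearly lower quadratic RSS would suggest the linear model is misspecified.
```

This is exactly the kind of alternative analysis that public disclosure of the original data would make possible.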

Despite being widely misunderstood, the W&W paper has played an important role in the debate on gender bias in science. We agree with Ceci & Williams that it is worthwhile to replicate the study, also with other methods. As we have been able to re-establish the data, we plan to test the results of Wennerås and Wold in an upcoming study. All data will be made publicly available together with our publication.

Cited literature:

Ceci SJ & Williams WM (2011). Understanding current causes of women’s underrepresentation in science. PNAS 108 (8): 3157-3162

DesRoches CM, Zinner DE, Rao SR, Iezzoni LI, Campbell EG (2010). Activities, Productivity, and Compensation of Men and Women in the Life Sciences. Academic Medicine 85(4): 631-639.

Evans HK & Moulder A (2011). Reflecting on a Decade of Women’s Publications in Four Top Political Science Journals. PS: Political Science & Politics 44(4): 793-798.

Fridner A, Norell A, Åkesson G, Senden MG, Lovseth LT, Schenck-Gustafsson K (2015). Possible reasons why female physicians publish fewer scientific articles than male physicians – a cross-sectional study. BMC Medical Education 15.

Kaufman RR & Chevan J (2011). The Gender Gap in Peer-Reviewed Publications by Physical Therapy Faculty Members: A Productivity Puzzle. Physical Therapy 91(1): 122-131.

Marsh HW, Bornmann L, Mutz R, Daniel HD, O’Mara A (2009). Gender effects in the peer review of grant proposals. Review of Educational Research 79 (3): 1290-1326

Sandström U & Hällsten M (2008). Persistent nepotism in peer review. Scientometrics 74(2): 175-189.

Waisbren SE et al. (2008). Gender differences in research grant applications and funding outcomes for medical school faculty. Journal of Women’s Health 17(2): 207-214

Van den Besselaar P & Sandström U (2016). Gender differences in research performance and its impact on careers: a longitudinal case study. Scientometrics 106(1): 143-162.

Wennerås C & Wold A (1997). Nepotism and sexism in peer-review. Nature 387: 341-343. doi:10.1038/387341a0