This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Gender imbalances in academia have been evident historically and persist today. Over the past 60 years, the participation of women in biomedical disciplines has increased, showing that the gender gap is shrinking. However, preliminary evidence suggests that women, including female researchers, are disproportionately affected by the COVID-19 pandemic in terms of unequal distribution of childcare, elderly care, and other kinds of domestic and emotional labor. Sudden lockdowns and abrupt shifts in daily routines have had disproportionate consequences on their productivity, which is reflected by a sudden drop in research output in biomedical research, consequently affecting the number of female authors of scientific publications.
The objective of this study is to test the hypothesis that the COVID-19 pandemic has had a disproportionate adverse effect on the productivity of female researchers in the biomedical field in terms of authorship of scientific publications.
This is a retrospective observational bibliometric study. We investigated the proportion of male and female researchers who published scientific papers during the COVID-19 pandemic, using bibliometric data from biomedical preprint servers and selected Springer-Nature journals. We used the ordinary least squares regression model to estimate the expected proportions over time by correcting for temporal trends. We also used a set of statistical methods, such as the Kolmogorov-Smirnov test and regression discontinuity design, to test the validity of the results.
A total of 78,950 papers from the bioRxiv and medRxiv repositories and from 62 selected Springer-Nature journals by 346,354 unique authors were analyzed. The acquired data set consisted of papers that were published between January 1, 2019, and August 2, 2020. The proportion of female first authors publishing in the biomedical field during the pandemic dropped by 9.1%, on average, across disciplines (expected arithmetic mean
Our findings document a decrease in the number of publications by female authors in the biomedical field during the global pandemic. This effect was particularly pronounced for papers related to COVID-19, indicating that women are producing fewer publications related to COVID-19 research. This sudden increase in the gender gap was persistent across the 10 countries with the highest number of researchers. These results should be used to inform the scientific community of this worrying trend in COVID-19 research and the disproportionate effect that the pandemic has had on female academics.
As of the date of this writing, the COVID-19 pandemic has claimed hundreds of thousands of lives worldwide and disrupted almost all aspects of human society. The socioeconomic impacts of the pandemic are yet to be assessed and the impending economic crisis and recession are becoming evident [
Stay-at-home orders, lockdowns, and school closures have affected scientists as well, especially those caring for children or other family members [
As a result, the research productivity of female scientists appears to have decreased [
A similar effect has been observed with publications on preprint servers. The proportion of female authors publishing on the most popular economics preprint servers is lower than expected [
Motivated by ongoing research efforts, we expand on the previous research by analyzing a large bibliographic data set in the biomedical field; we also employ different modeling techniques that can further improve our understanding of this phenomenon. The aim of this study is to quantify how the COVID-19 crisis exacerbates the gender gap in scientific publishing in the biomedical field.
Bibliometric data on published papers were collected from three separate sources:
The bioRxiv repository contains 51,171 papers and 225,110 authors; Rxivist is the application programming interface (API) provider for bioRxiv publications [
The medRxiv repository contains 8845 papers and 52,364 authors; data are scraped directly from medrxiv.org.
The Springer-Nature repository contains 19,525 papers and 91,257 authors; data from 62 journals are collected using the Springer-Nature OpenAccess API. Springer-Nature data include high-impact journals, such as Nature Genetics, Nature Medicine, and Nature Immunology, as well as multiple BMC journals, such as BMC Bioinformatics and BMC Genomics.
We included the data from all journals in the biomedical field for which Springer-Nature provides data. A complete list of journals used in the analysis is available in Table S1 of
For each source, we collected the relevant metadata. For each paper in bioRxiv and medRxiv, we kept the
To infer each author's gender from their name, we used a state-of-the-art tool, namely the genderize.io API [
In our data set, we identified the most likely gender of 466,836 authors in total. Out of these, the gender of 348,506 (74.7%) unique authors (214,095 male [61.4%] and 134,411 female [38.6%]) could be inferred with high accuracy, with confidence scores from genderize.io higher than 0.8.
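The confidence filter described above can be sketched as follows. This is an illustrative sketch, not the study's code: the sample records are hypothetical, and they assume the genderize.io response fields `name`, `gender`, and `probability`. Only the 0.8 threshold comes from the text.

```python
from collections import Counter

# Hypothetical records shaped like genderize.io responses (assumed fields:
# name, gender, probability). They are illustrative, not study data.
SAMPLE_RESPONSES = [
    {"name": "maria", "gender": "female", "probability": 0.98},
    {"name": "john", "gender": "male", "probability": 0.99},
    {"name": "robin", "gender": "male", "probability": 0.62},
    {"name": "xq", "gender": None, "probability": 0.0},
]


def confident_genders(responses, threshold=0.8):
    """Keep only names whose inferred gender has a probability above the threshold."""
    kept = {}
    for record in responses:
        gender = record.get("gender")
        if gender is not None and record.get("probability", 0.0) > threshold:
            kept[record["name"]] = gender
    return kept


labels = confident_genders(SAMPLE_RESPONSES)
counts = Counter(labels.values())
# Share of female authors among the confidently labeled names.
share_female = counts["female"] / len(labels)
```

Names that return no gender or a low probability are dropped rather than guessed, which mirrors the exclusion of low-confidence inferences described in the text.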
To identify each author’s country in the bioRxiv and medRxiv data sets, we first located a toponym in each author's affiliation and assigned to it the most likely country code. If there was no toponym, we queried the Global Research Identifier Database, found the institution with the most similar name, and assigned the institution’s country to that author. Additionally, we manually checked the location of the most common affiliation names from the data set that covered most of the authors. The countries of approximately 80% of all authors were determined using this method. The countries of the authors in the Springer-Nature data set were already provided by the API.
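A minimal sketch of this two-step lookup is given below. The two lookup tables are tiny hypothetical stand-ins for a toponym gazetteer and for the Global Research Identifier Database, and the similarity measure (Python's `difflib`) and the 0.6 cutoff are assumptions; the text does not state how name similarity was computed.

```python
import difflib

# Hypothetical stand-ins for a toponym gazetteer and for the Global Research
# Identifier Database; neither table reflects the real resources.
TOPONYM_TO_COUNTRY = {"boston": "US", "milan": "IT", "toronto": "CA"}
INSTITUTION_TO_COUNTRY = {
    "university of oxford": "GB",
    "karolinska institutet": "SE",
    "university of tokyo": "JP",
}


def resolve_country(affiliation, cutoff=0.6):
    """Return a country code for an affiliation string, or None if unresolved."""
    text = affiliation.lower()
    # Step 1: look for a known place name among the affiliation's tokens.
    for token in text.replace(",", " ").split():
        if token in TOPONYM_TO_COUNTRY:
            return TOPONYM_TO_COUNTRY[token]
    # Step 2: fall back to the institution with the most similar name.
    matches = difflib.get_close_matches(
        text, list(INSTITUTION_TO_COUNTRY), n=1, cutoff=cutoff
    )
    return INSTITUTION_TO_COUNTRY[matches[0]] if matches else None
```

Affiliations that match neither step return `None`, which corresponds to the authors whose country could not be determined.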
The papers that dealt specifically with COVID-19 and similar topics were identified by the set of keywords that appeared in their titles or abstracts.
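The keyword-based topic assignment could look like the sketch below. The keyword list is hypothetical, since the study's actual keyword set is not given in this section, and case-insensitive substring matching is an assumption.

```python
import re

# Hypothetical keyword list; the study's actual keyword set is not reproduced here.
COVID_KEYWORDS = ("covid", "sars-cov-2", "coronavirus", "2019-ncov")
COVID_PATTERN = re.compile(
    "|".join(re.escape(keyword) for keyword in COVID_KEYWORDS), re.IGNORECASE
)


def is_covid_paper(title, abstract=""):
    """Flag a paper if any keyword appears in its title or abstract."""
    return bool(COVID_PATTERN.search(title) or COVID_PATTERN.search(abstract))
```

Substring matching is deliberately permissive, so a term such as "covid" also catches "COVID-19" and "COVID19"; a stricter word-boundary rule would trade recall for precision.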
To measure the discrepancy between the expected and observed proportions of female authors, we first established baselines, which were the expected proportions of female researchers that appeared as authors of publications. The expected proportions were calculated using the ordinary least squares (OLS) model and historical data from January 2019 to March 2020 (see the Model section). We then calculated the true observed proportions of female authors who published during the COVID-19 pandemic in 2020 and compared them to the expected baselines. The error for the predicted value was the mean standard error of the prediction. The error of the observed value was calculated as the standard error of the mean:
Using historical data from before March 15, 2020, we calculated the proportion of female authors who published each week. We fit an OLS regression model,
The OLS model weights all data points equally, regardless of the number of samples behind each point. To guarantee the validity of the statistical analysis, we established the conditions under which the data points would be evaluated: at least 10 data points before March 2020 were required to fit the OLS model, and at least 10 data points after March 2020 were required for comparison. This way, we limited the impact of small-sample observations that could skew the estimate.
We additionally evaluated the model by applying the generalized linear model with binomial errors and a logit link function, as the OLS model could overestimate the proportions in binary variables. Both models performed similarly, and the OLS model did not provide any out-of-norm estimates. For the sake of better interpretability and consistency with modeling the nominal number of authors and papers, we decided to use the OLS model.
To better capture the productivity of the population, we counted each publication from each author separately, effectively modeling the proportion of papers authored by the population of female authors. Considering that multiple authorships in the observed period were relatively rare (<5% of all first authors and <10% of all last authors had >1 paper), we considered each authorship independently.
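The baseline comparison described in this section can be sketched as follows. This is not the authors' released code. The week indexing, the function names, the synthetic data, and the choice to average the extrapolated weekly values into a single expected proportion are all assumptions. Only the OLS trend fit on pre-pandemic data and the minimum of 10 data points on each side come from the text.

```python
from statistics import mean


def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def relative_drop(weeks, proportions, cutoff_week, min_points=10):
    """Percentage drop of the observed mean proportion below the OLS baseline."""
    before = [(w, p) for w, p in zip(weeks, proportions) if w < cutoff_week]
    after = [(w, p) for w, p in zip(weeks, proportions) if w >= cutoff_week]
    # Skip groups with too few data points on either side of the cutoff.
    if len(before) < min_points or len(after) < min_points:
        return None
    intercept, slope = fit_ols(*zip(*before))
    # Expected proportion: trend extrapolated to each pandemic week, then averaged.
    expected = mean(intercept + slope * w for w, _ in after)
    observed = mean(p for _, p in after)
    return 100.0 * (expected - observed) / expected


# Synthetic example: a flat 40% share for 60 weeks, then 36% for 20 weeks.
weeks = list(range(80))
proportions = [0.40] * 60 + [0.36] * 20
drop = relative_drop(weeks, proportions, cutoff_week=60)
```

In this synthetic case the baseline stays at 0.40 and the observed mean is 0.36, so the relative drop is about 10%. The drop is relative to the expected proportion, not a difference in percentage points.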
Statistical model. Schematic illustration of the ordinary least squares model used to calculate the expected numbers and proportions.
To estimate the potential causal effects of the pandemic on the proportion of female researchers, we devised a typical nonparametric regression discontinuity design (RDD) with a local linear regression in time, with the following general form:
where
The falsification, or placebo, tests were performed by using fake cutoffs before and after mid-March 2020 and comparing the treatment effect. We identified the optimal cutoff point
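To illustrate the idea of a regression discontinuity in time with a placebo cutoff, a generic sketch is given below. It is not the model specified in this section, whose general form is not reproduced here. It assumes a sharp design, a uniform kernel, a fixed bandwidth, and separate linear fits on each side of the cutoff; the function names and the synthetic series are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rdd_discontinuity(times, values, cutoff, bandwidth):
    """Estimate the jump at the cutoff from local linear fits on both sides."""
    # Center time on the cutoff so each intercept is the fitted value at the cutoff.
    left = [(t - cutoff, v) for t, v in zip(times, values)
            if cutoff - bandwidth <= t < cutoff]
    right = [(t - cutoff, v) for t, v in zip(times, values)
             if cutoff <= t <= cutoff + bandwidth]
    left_at_cutoff, _ = fit_line(*zip(*left))
    right_at_cutoff, _ = fit_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff


# Synthetic weekly series: a slow upward trend with a 0.05 drop at week 20.
weeks = list(range(40))
share = [0.30 + 0.002 * t - (0.05 if t >= 20 else 0.0) for t in weeks]
effect = rdd_discontinuity(weeks, share, cutoff=20, bandwidth=10)
# Placebo test: a fake cutoff inside the pre-drop period should show no jump.
placebo = rdd_discontinuity(weeks, share, cutoff=10, bandwidth=8)
```

On this synthetic series the estimated jump at the true cutoff is about -0.05, while the placebo cutoff yields an estimate close to zero, which is the pattern a falsification test looks for.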
The data and source code for reproducing the results are available at GitHub [
Overall, during the pandemic, scientists posted papers on preprint servers at an increasing rate. On average, we observed 31.2% more papers than expected and a 41.6% increase in the number of authors (39.2% increase for females and 42.9% increase for males). Despite the absolute increase in the numbers of papers and authors across publishers (see Figure S1 and Tables S3-S5 in
In biology, medicine, and related disciplines, the most active contributors are usually listed first. The author listed last is the most senior author, typically the head of the lab. To address the high variability of the number of authors on the publications (
The expected and observed proportions of female authors disaggregated by the order of authorship and the topic.
Author order and paper topic  Expected proportion, mean  Expected proportion, SE  Observed proportion, mean  Observed proportion, SE  Drop, %
First author
All  0.389  0.007  0.353  0.004  9.142
COVID-19  0.389  0.007  0.280  0.007  28.031
Non-COVID-19  0.389  0.007  0.380  0.004  2.372
Last author
All  0.257  0.005  0.236  0.003  7.961
COVID-19  0.257  0.005  0.209  0.007  18.812
Non-COVID-19  0.257  0.005  0.246  0.003  4.416
All authors
All  0.354  0.003  0.348  0.002  1.578
COVID-19  0.354  0.003  0.341  0.009  3.530
Non-COVID-19  0.354  0.003  0.351  0.002  0.934
Solo author
All  0.210  0.030  0.137  0.008  34.586
COVID-19  0.210  0.030  0.137  0.023  34.514
Non-COVID-19  0.210  0.030  0.168  0.013  19.802
The aggregate results suggest that the proportion of female authors publishing on all topics as the first author decreased by 9.1% (expected arithmetic mean
Additionally, we focused our analysis on the papers with a single author and discovered an even greater disparity. We observed 34.5% (
The results suggest that the aggregate gender disparity in academia during the pandemic was due to the increased publication rate of papers about COVID-19 authored by men. To further explore this possibility, we tracked the individual publication records and calculated the probability that the author would publish work about COVID-19. Around 3.7% of men who had publication records in our data set would publish at least one paper about COVID-19, compared to ~2.2% of women. Men who already had a publication before the pandemic were 37% more likely to publish a paper about COVID-19. This suggests that women are being excluded from critical research about COVID-19.
Comparison of the expected and observed proportions of female authors that published during the COVID-19 pandemic. Green bars represent the expected proportion of female authors, estimated by the ordinary least squares model from the historical data from 2019. Orange bars represent the observed proportion of female authors that published during the COVID-19 pandemic. The standard errors of the aggregate analyses are represented as the vertical lines on top of the bars. The papers are divided by topic into three groups: (1) all papers from the data set, (2) papers that deal directly with COVID-19 and related topics, and (3) papers that are not about COVID-19 or related topics. The first row shows the results from all publishers combined. The following rows represent the results for each publisher separately.
When disaggregated by publisher, the relative drop in the proportion of female first authors for COVID-19-related research was 12.6%, 23.2%, and 2.1% for bioRxiv, medRxiv, and Springer-Nature journals, respectively (see
Additionally, we checked whether there was a significant change in the proportion of women authors that occurred in mid-March 2020. To test the hypothesis, we performed an RDD analysis in time (see Methods section). We estimated a vertical discontinuity of the proportion of women over time by the coefficient
Further, we checked whether we could confidently use the proportion of women who published before the pandemic as the reference to estimate the proportion of women who published papers specifically about COVID-19. A hypothesis is that before the pandemic, women were less likely to be represented in the scientific disciplines that would produce COVID-19 research. To check this hypothesis, we first performed a chi-square test on the distribution of disciplines involved in COVID-19 research. We discovered that some disciplines, such as infectious diseases, epidemiology, public health, and global health, were overrepresented (
To assess the temporal trends during the pandemic, we built the linear model
Parameters of the linear model of the proportion of female authors over time during the pandemic.
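Paper topic  First author, coefficient  First author, P value  Last author, coefficient  Last author, P value  All authors, coefficient  All authors, P value  Solo author, coefficient  Solo author, P value
All  -.002  .08  .002  .002  .001  <.001  .000  >.99
COVID-19  .001  .28  .003  <.001  .005  <.001  -.005  .12
Non-COVID-19  -.002  .02  .001  .03  .000  .17  .004  .13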
We identified the most likely country of the authors based on their affiliations (see Methods section) and measured the difference between the expected and observed proportions of female authors during the pandemic.
A significant drop in the proportion of female first authors was consistent across the countries. Regardless of the topic, we observed a 24.9% drop in Italy (
When we observed the proportion of female authors regardless of the authorship order, the drop became less prominent but still consistent across the countries. For example, in Canada, a drop in the proportion of female authors for COVID-19-related papers was 15.7% (
The gender gap for non-COVID-19-related research (see
Percentage drop in proportion of female authors during the pandemic across countries. Orange points mark the percentage decrease in proportion of female authors; green points mark the increase. Horizontal lines represent standard errors. The analysis is divided by topic into three groups: (1) all papers from the data set, (2) papers that deal directly with COVID-19 and related topics, and (3) papers that are not about COVID-19 or related topics. Missing points indicate insufficient sample size.
Further, we explored whether there were any commonalities among the countries with respect to the participation of women in research.
Gender disparity in research and gross domestic product (GDP). The proportion of women active in research is higher in countries with lower per capita GDP (upper). The proportion of female authors of research articles decreased more than expected in countries with lower per capita GDP (lower).
We analyzed bibliographical data from biomedical preprint servers and Springer-Nature journals and showed that the fraction of women publishing during the COVID-19 pandemic dropped significantly across disciplines and research topics. Since the announcement of the global pandemic and the start of lockdowns, we observed a drop of 9.1% in the proportion of women publishing biomedical scientific papers as the first author. Women were significantly excluded from COVID-19-related research, as we measured a 28% drop in the proportion of female first authors in that area of research. This confirms some earlier suggestions that female first authors contributed less to COVID-19 studies than to research in other areas [
For papers on topics other than COVID-19, we did not observe this high discrepancy, and, in the case of medRxiv, we observed more women than projected by the model. The overall gender disparity in research during the pandemic was mostly driven by the higher publication rate of papers on COVID-19 and related topics. It seems that such research is conducted disproportionately by men, as male authors are more likely to appear in first author positions on papers posted on preprint servers and published in peer-reviewed journals.
It appears that the most significant drop in the proportion of female authors happened early in the pandemic. The proportion of women has been increasing gradually for some authorship categories. Note that the observed gradual increase is statistically significant but very slow. One possible explanation for such a sudden drop and a subsequent gradual increase is that most of the COVID-19 papers published early during the pandemic were various epidemic models focusing on cases and death counts. Many of the authors’ affiliations were departments of engineering, mathematics, and physics, which might have a different proportion of women than the population of scientists in biology and medicine. Since research in the biomedical field usually takes longer to conduct and publish, it could lead to a shift in the gender distribution later. However, this argument does not explain the phenomenon entirely, as the base gender gap in science, technology, engineering, and mathematics fields is not higher than in biology [
Another likely explanation of a sudden drop in the proportion of female authors is that caregiving demands have exploded during the pandemic, and these have mostly fallen on women [
The global pandemic has touched almost every nation on the planet. Countries, however, responded differently in containing the spread of the disease. The variability of the measures and their timing, combined with differences in cultural norms and outbreak severity, have had a variable impact on researchers across the world. Country-level analysis better reveals global trends, as the aggregate data can be skewed by countries with a disproportionately large number of publications, such as the United States, which represents almost 29% of all authors in the data set (see Table S11 in
Gender imbalances in academia have been evident historically and still persist today. Various measures of research output, including the proportion of authors, fractionalized authorships [
The factors that led to such extreme and consistent differences in the proportion of female scientists can be numerous. The already existing barriers for female participation in science vary across countries. In some nations, men are more favorably placed than women [
The global pandemic caused an unforeseen crisis that will most certainly affect academia. All the difficulties female scientists faced previously may be exacerbated by the extended lockdowns and the sudden shift in work-life dynamics. It is important to understand the impact of such an extraordinary circumstance on the scientific community that will disproportionately affect research outputs as well as prospects for tenure and promotions [
The strengths of our study include the use of a relatively large and diverse data set from three different publishing platforms. The focus on preprint papers allows for the assessment of the observed effects in a timely manner. We focused on a structured and rigorous statistical analysis, making sure that the results are significant. The data and the code to reproduce the results are available.
Potential limitations warrant consideration. First, the gender of a publication’s author can be wrongly identified. Even though we excluded the results that had a low confidence, a small fraction of the authors could have been misgendered. Additionally, we acknowledge that automated gender classifiers do not recognize the various nonbinary gender identities [
Our findings documented a decrease in the proportion of female authors in the biomedical field who published research papers during the global pandemic. This effect was particularly pronounced for papers related to COVID-19, indicating that women are producing fewer publications related to COVID-19 research. A sudden increase in this gender gap was persistent across the 10 countries with the highest number of researchers. The results should be used to inform the scientific community of this worrying trend in COVID-19 research and the disproportionate effect the pandemic has had on female academics’ research outputs.
Supplementary materials.
API: application programming interface
DARPA: Defense Advanced Research Projects Agency
GDP: gross domestic product
OLS: ordinary least squares
RDD: regression discontinuity design
The authors are grateful to the Defense Advanced Research Projects Agency (DARPA) (contract W911NF192027) for their support.
All authors conceived and designed the study. GM collected and analyzed the data. All authors wrote and revised the manuscript.
None declared.