Dynamics of Gender Representation in Academic Mathematics

UVM Complex Systems Institute
November 17th, 2025

Phil Chodrow
Department of Computer Science
Middlebury College

Hi everyone!

I’m Phil Chodrow. I’m a new-ish assistant professor of computer science at Middlebury College, your neighbor ~1 hour south. I did my PhD in operations research at MIT and a postdoc with Mason Porter in math at UCLA.






I like to work on…
  • Higher-order networks, hypergraphs
  • Math models of social systems: opinion dynamics, hierarchy formation, segregation
  • Data science for social justice
I teach…
  • Machine learning
  • Network science
  • Discrete math






I also like…
  • Aikido
  • Hiking, cycling
  • Tea
  • Chess
  • Books (scifi, fantasy, history)

Questions for today



What mechanisms drive (lack of) gender representation in academic mathematics?

What can we expect to happen if these mechanisms continue to operate as is?

What could the effects of interventions be on long-term gender representation?

The Team

Heather Brooks
Harvey Mudd

Harlin Lee
UNC Chapel Hill

Mason Porter
UCLA

Juan G. Restrepo
CU Boulder

Anna Haensch
Tufts

Phil Chodrow
Middlebury

Inspiration from…

Some early caveats


This talk represents work-in-progress using data about our mathematics community. All results are preliminary!

This talk focuses on the production of PhD graduates and therefore almost exclusively considers doctoral universities.

Gender is not binary, but unfortunately our data (and therefore our story and our model) are.

Quantitative work complements, but never replaces: voices of marginalized scholars, qualitative research and critical theory, activism, and implementation of initiatives and policies.

Our data




Ben Brill, UCLA ’22

A total of 116,306 advisor-student pairs in the US since 1950, representing 21,781 distinct advisors. We observe or estimate math subfields for 94% of these pairs (predictions based on thesis titles). We estimate gender for 95% of PhD students and 97% of advisors.

Data issues…

Misgendering

Incorrectly inferred MSCs (Mathematics Subject Classification codes)

Nonrandom missing data

Various shenanigans

Some hypotheses



Mentorship

Female advisors are more effective in attracting or retaining female graduate students.

Belonging

Greater representation in the grad student community attracts women to programs and subfields.

Attrition

Addressing disparities in career attrition for female faculty would help to close the gender gap.

Leadership

A small number of influential women can dramatically change the culture of a department or research community.

Many math subfields are on qualitatively similar trajectories.

Six largest subfields by record count.

Two-prong modeling strategy


Advisor production

Model the number of graduate students produced by a given advisor.

Technique: maximum likelihood estimation in a bespoke stochastic model.


Student gender

Model the gender of students produced by a given advisor.

Technique: logistic regression.

Generative model of advisor production

Assumptions:

  • Startup depends on subfield.
  • Career length depends on gender.
  • # students per year depends on subfield and gender.

Latent variable model

We model an observed sequence of students \(\color{#086788}{\mathbf{S}} = (\color{#086788}{S}_1, \color{#086788}{S}_2, \ldots, \color{#086788}{S}_T)\) produced by an advisor as a function of an unobserved advisor career \(\color{#07A0C3}{C}\) specified by the startup period length and retirement year.

\[ \begin{aligned} p(\color{#086788}{\mathbf{S}};\color{#F25C54}{\boldsymbol{\theta}}) &= \sum_{\color{#07A0C3}{C}\in\mathcal{C}} p(\color{#086788}{\mathbf{S}}|\color{#07A0C3}{C};\color{#F25C54}{\boldsymbol{\theta}})p(\color{#07A0C3}{C};\color{#F25C54}{\boldsymbol{\theta}}) \end{aligned} \]

The vector \(\color{#F25C54}{\boldsymbol{\theta}}\) contains the parameters to be estimated.

We do this using a hybrid expectation-maximization algorithm: some parameters can be estimated efficiently via EM, while others must be estimated by hill-climbing.
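To make the sum over latent careers concrete, here is a minimal sketch of the marginal likelihood \(p(\mathbf{S};\boldsymbol{\theta})\) in a toy version of the model. Everything specific here is an assumption for illustration, not the model from the talk: students per active year are taken to be Poisson with a single rate, startup length and retirement year are taken to be independent with small discrete distributions, and the function names and numbers are hypothetical.

```python
import math
from itertools import product


def poisson_pmf(k, rate):
    """Poisson probability of k events with mean `rate`.

    A rate of 0 puts all probability on k = 0 (an inactive year).
    """
    if rate == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-rate) * rate**k / math.factorial(k)


def marginal_likelihood(students, rate, p_startup, p_retire):
    """Compute p(S; theta) = sum_C p(S | C; theta) p(C; theta).

    students  : observed yearly student counts (S_1, ..., S_T)
    rate      : ASSUMED Poisson mean of students per active year
    p_startup : dict {startup length a: probability}   (hypothetical)
    p_retire  : dict {retirement year r: probability}  (hypothetical)
    """
    total = 0.0
    # A career C is a pair (startup length a, retirement year r).
    for (a, pa), (r, pr) in product(p_startup.items(), p_retire.items()):
        lik = 1.0
        for t, s in enumerate(students, start=1):
            # ASSUMPTION: the advisor produces students only in years a+1..r.
            active = a < t <= r
            lik *= poisson_pmf(s, rate if active else 0.0)
        # ASSUMPTION: startup and retirement are independent, so p(C) = pa * pr.
        total += lik * pa * pr
    return total


# Hypothetical observed sequence and career distributions.
S = [0, 0, 1, 2, 0, 1, 0, 0]
p_startup = {1: 0.3, 2: 0.5, 3: 0.2}
p_retire = {6: 0.5, 7: 0.3, 8: 0.2}
print(marginal_likelihood(S, rate=0.8, p_startup=p_startup, p_retire=p_retire))
```

Careers that are inconsistent with the observations (for example, a startup period that covers a year in which a student graduated) contribute zero to the sum, which is what lets the data inform the unobserved career.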

Men have estimated careers ~4 years longer

Longer careers \(\times\) more students per year = more students per career

We hypothesize that greater student production per year reflects unequal access to research resources; cf. Zhang et al. (2022).

Logistic model for advisee gender

Estimate the odds that the next student produced by an advisor is female based on subfield, advisor gender, and representation of women in advisor group and subfield.

\[ \begin{aligned} \log (\text{odds F}) = & \beta_0 + \\ & \rho \times (\text{advisor is F}) + \\ & \gamma_1 \times (\text{proportion F advisees in group}) + \\ & \gamma_2 \times (\text{proportion F advisees in group})^2 +\\ & \eta_{1} \times (\text{proportion F in subfield}) + \\ & \eta_{2} \times (\text{proportion F in subfield})^2 \end{aligned} \]

We tried many other models with other features (e.g. decade, nonlinear transformations), but this one was best in cross-validation among those without an explicit term for the topic of the subfield.

Logistic model for advisee gender

\[ \begin{aligned} \log (\text{odds F}) = & \beta_0 + &\quad \beta_0 &= -3.30 \; (0.05)\\ & \rho \times (\text{advisor is F}) + &\quad\rho &= \phantom{-}0.42 \;(0.02)\\ & \gamma_1 \times (\text{proportion F advisees in group}) + &\quad \gamma_1 &= \phantom{-}1.49\; (0.16)\\ & \gamma_2 \times (\text{proportion F advisees in group})^2 + &\quad \gamma_2 &= -0.51 \;(0.24)\\ & \eta_{1} \times (\text{proportion F in subfield}) + &\quad \eta_1 &= \phantom{-}4.16 \; (0.33)\\ & \eta_{2} \times (\text{proportion F in subfield})^2 &\quad \eta_2 &= \phantom{-}1.53 \; (0.51) \end{aligned} \]
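As a quick way to read these coefficients, the sketch below plugs the point estimates from the slide into the logistic formula and converts log-odds to a probability. The function name `prob_female` and the example inputs are hypothetical, and the sketch ignores the uncertainty in the estimates.

```python
import math

# Point estimates copied from the fitted model above.
BETA0, RHO = -3.30, 0.42
GAMMA1, GAMMA2 = 1.49, -0.51
ETA1, ETA2 = 4.16, 1.53


def prob_female(advisor_is_f, p_group, p_subfield):
    """Predicted probability that an advisor's next student is female.

    advisor_is_f : 1 if the advisor is female, 0 otherwise
    p_group      : proportion of female advisees in the advisor's group
    p_subfield   : proportion of women in the subfield
    """
    log_odds = (
        BETA0
        + RHO * advisor_is_f
        + GAMMA1 * p_group
        + GAMMA2 * p_group**2
        + ETA1 * p_subfield
        + ETA2 * p_subfield**2
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical inputs: 20% women in both the group and the subfield.
for label, is_f in [("male advisor", 0), ("female advisor", 1)]:
    print(label, round(prob_female(is_f, 0.2, 0.2), 3))
```

Holding the representation terms fixed, switching the advisor from male to female multiplies the odds by \(e^{\rho} = e^{0.42} \approx 1.5\).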

Homophily effects: advisor-student and student-student

Both the gender of a student’s specific advisor and the overall proportion of female advisors in the subfield’s population contribute to the likelihood that the student is female.

High uncertainty in the model predictions for large \(p_F\) reflects the fact that we have very little data in that region.



Numerical estimation of stationary proportion

If \(p^*\) is the stationary proportion of female graduates in the subfield, then \(p^*\) approximately satisfies \[ \begin{aligned} p^* = \color{#ffaf03}{w_f}\color{#ffaf03}{\sigma_f}(p^*) + \color{#5b427c}{w_m} \color{#5b427c}{\sigma_m}(p^*) \end{aligned} \]

  • \(\sigma_g(p^*)\): prob. next student of an advisor of gender \(g\) is female.
  • \(w_g\): proportion of students in subfield advised by advisors of gender \(g\) (estimated from advisor production model).

Mean-field assumption: advisor groups represent the subfield as a whole.
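The fixed-point equation above can be solved numerically. The sketch below is one way to do it, under stated assumptions: \(\sigma_g\) is taken from the logistic point estimates reported earlier in the talk, the mean-field assumption is implemented by setting both the group and subfield proportions equal to \(p\), and the weight `w_f = 0.2` is a hypothetical placeholder (in the talk it comes from the advisor production model). The function names are also hypothetical.

```python
import math

# Logistic point estimates reported earlier in the talk.
BETA0, RHO = -3.30, 0.42
GAMMA1, GAMMA2 = 1.49, -0.51
ETA1, ETA2 = 4.16, 1.53


def sigma(advisor_is_f, p):
    """sigma_g(p): prob. the next student is female, under the mean-field
    assumption that group and subfield proportions both equal p."""
    log_odds = (
        BETA0
        + RHO * advisor_is_f
        + (GAMMA1 + ETA1) * p
        + (GAMMA2 + ETA2) * p**2
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def stationary_points(w_f, n_grid=1000, tol=1e-10):
    """Find all p in [0, 1] with p = w_f*sigma_f(p) + w_m*sigma_m(p).

    Scans a grid for sign changes of the residual, then refines each
    bracket by bisection.
    """
    w_m = 1.0 - w_f

    def gap(p):
        return w_f * sigma(1, p) + w_m * sigma(0, p) - p

    roots = []
    grid = [i / n_grid for i in range(n_grid + 1)]
    for lo, hi in zip(grid[:-1], grid[1:]):
        if gap(lo) * gap(hi) < 0:
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if gap(lo) * gap(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


# w_f = 0.2 is a HYPOTHETICAL weight, not an estimate from the talk.
print(stationary_points(w_f=0.2))
```

The sketch returns every solution it finds rather than a single value, because an equation of this shape can have more than one fixed point; which one is relevant depends on where the subfield starts. Solutions at large \(p\) rely on the region where the model has very little data, so they should be read with particular caution.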



What might we expect near-term?








10 simulations initialized with 1,000 active advisors, 20% female.

















What could we do to make math a more inclusive profession?





Attrition Hypothesis

Addressing disparities in career attrition for female faculty would help to close the gender gap.

Approach

We can model equalizing career lengths and student production per year.
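One rough way to see why equalization matters: if each advisor's lifetime output is roughly (career length) × (students per year), then the share of students advised by women depends on both quantities. The sketch below computes that share before and after equalizing. The formula is a crude steady-state assumption, and all numbers are hypothetical placeholders; only the roughly 4-year career gap and the 20% female advisor share echo figures mentioned elsewhere in the talk.

```python
def student_share_female(frac_f, career_f, career_m, rate_f, rate_m):
    """Share of students advised by female advisors.

    ASSUMPTION: lifetime output per advisor = career length * students/year,
    and the advisor population is frac_f female at steady state.
    """
    out_f = frac_f * career_f * rate_f
    out_m = (1.0 - frac_f) * career_m * rate_m
    return out_f / (out_f + out_m)


# HYPOTHETICAL inputs: careers of 26 vs. 30 years, 0.25 vs. 0.30 students/year.
baseline = student_share_female(0.20, 26.0, 30.0, 0.25, 0.30)
# Equalized scenario: women get the same career length and yearly rate as men.
equalized = student_share_female(0.20, 30.0, 30.0, 0.30, 0.30)

print(f"baseline share:  {baseline:.3f}")
print(f"equalized share: {equalized:.3f}")
```

Under equalization the share of students advised by women matches the share of female advisors; under the baseline it falls below it, which lowers the weight on the female-advisor term in the stationary equation.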

Some hypotheses

Mentorship

Female advisors are more effective in attracting or retaining female graduate students.

Yes! Female advisors are substantially more likely than male advisors to produce female PhD graduates.

Belonging

Greater representation in the grad student community attracts women to programs and subfields.

Yes! Subfields/advisor groups with greater representation of women tend to attract more women.

Attrition

Addressing disparities in career attrition for female faculty would help to close the gender gap.

Yes! This is an intrinsically inclusive, equitable goal AND may also accelerate progress by 10 to 20 years.

Leadership

A small number of influential women can dramatically change the culture of a department or research community.

We’re exploring… We’re developing models and data analysis to try to detect these effects in our data set.

Next?

Thanks everyone!


Heather Brooks
Harvey Mudd

Harlin Lee
UNC Chapel Hill

Mason Porter
UCLA

Juan G. Restrepo
CU Boulder

Anna Haensch
Tufts

Ben Brill
UCLA ’22

National Science Foundation

ICERM @Brown
Two summer programs!

Preprint coming soon 😬😬😬