Edge-Correlated Growing Hypergraphs

BRAINS Network Seminar | Paris and Middlebury, VT | October 3rd, 2024

Phil Chodrow
Department of Computer Science
Middlebury College

Hi everyone!

I’m Phil Chodrow. I’m an applied mathematician by training. These days my interests include:


  • Network models + algorithms
  • Dynamics on networks
  • Data science for social justice
  • Undergraduate pedagogy

I’m a new(ish) assistant professor of computer science at Middlebury College, a small primarily-undergraduate institution (PUI) in Middlebury, Vermont, USA.


Hypergraphs



A hypergraph is a generalization of a graph in which edges may contain an arbitrary number of nodes.

XGI let’s gooooo
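
As a quick illustration, here is a tiny hypergraph built with the XGI library (a minimal sketch; the node labels and edges are arbitrary):

```python
import xgi  # https://github.com/xgi-org/xgi

# A hypergraph on 5 nodes with edges of sizes 3, 2, and 4.
H = xgi.Hypergraph([[1, 2, 3], [2, 3], [2, 3, 4, 5]])

print(H.num_nodes, H.num_edges)  # 5 3
print(H.edges.members())         # [{1, 2, 3}, {2, 3}, {2, 3, 4, 5}]
```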


What’s special about hypergraphs?


What can we observe or learn in hypergraph structure that we couldn’t observe or learn in a dyadic graph?

One answer for today: 2-edge motifs.


Motifs in graphs

Milo et al. (2002). Network Motifs: Simple Building Blocks of Complex Networks. Science


2-edge motifs in undirected graphs

2-edge motifs in hypergraphs

Claim: What’s special about the structure of hypergraphs is that they have diverse undirected two-edge motifs: intersections.

How common are large intersections in empirical hypergraphs?

For the remainder of this talk, we’ll consider hypergraphs in which each edge has a discrete time-index: \(e_0, \; e_1, \ldots, e_t,\ldots\).

One way to measure the prevalence of intersections is to define a temporal intersection profile: the probability of an intersection of size \(k\) when picking two uniformly random edges.

\[r_k^{(t)} \triangleq \eta_t(\lvert e\cap f \rvert = k)\;,\]

where \(\eta_t\) is the empirical law of the hypergraph \(H^{(t)}\) containing only edges with indices up to \(t\) and \(e\), \(f\) are sampled uniformly from \(H^{(t)}\).

Sometimes there is a natural indexing of edges via timestamps; other times we must assign indices arbitrarily.
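
A minimal sketch of how one might estimate this profile by sampling (plain Python; the helper name and the choice to sample distinct pairs are our assumptions):

```python
import random
from collections import Counter

def intersection_profile(edges, n_samples=100_000, seed=0):
    """Estimate r_k by sampling pairs of distinct edges uniformly at random.

    edges: list of sets ordered by time index; pass edges[:t] for H^(t).
    Returns a dict mapping intersection size k to its estimated probability.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        e, f = rng.sample(edges, 2)  # two distinct edges, uniformly
        counts[len(e & f)] += 1
    return {k: c / n_samples for k, c in sorted(counts.items())}

# Example: profile of the hypergraph containing the first few edges.
edges = [{1, 2, 3}, {2, 3}, {2, 3, 4, 5}, {1, 4}]
print(intersection_profile(edges, n_samples=10_000))
```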

Intersections

In empirical temporal hypergraphs, the intersection profile \(r_k^{(t)}\) often decays at rates close to \(t^{-1}\) or \(t^{-2}\), regardless of \(k\).

If we were generating edges at random, forming an intersection of size \(k\) would require \(k\) independent coincidences, yielding decay like \(t^{-k}\). 🧐

Benson, Abebe, Schaub, and Kleinberg (2018). Simplicial closure and higher-order link prediction. PNAS

Chodrow (2020). Configuration models of random hypergraphs. JCN

Landry, Young, and Eikmeier (2023). The simpliciality of higher-order networks. EPJ Data Sci.

Dashed lines show slopes of \(t^{-1}\) and \(t^{-2}\) decay for reference.





  • What models of growing hypergraphs can generate large intersections?
  • Are these models usefully predictive of future edges?
  • Are they mathematically tractable?

Ongoing work with…

Xie He
Mathematics
Dartmouth College

Peter Mucha
Mathematics
Dartmouth College



Mechanistic, interpretable, learnable models of growing hypergraphs.


Observation: New edges are usually pretty similar to old edges.
General principle: Form new edges by noisily copying old edges.

Benson, Kumar, and Tomkins (2018). Sequences of Sets. KDD

Benson, Abebe, Schaub, and Kleinberg (2018). Simplicial closure and higher-order link prediction. PNAS


Figure from Benson, Kumar, and Tomkins (2018).






In each timestep \(t\):

  • Select an edge \(e\) uniformly at random from \(\mathcal{E}_t\).
  • From \(e\), select a \(\mathrm{Binomial}(|e|, \alpha)\) number of nodes, conditioned on selecting at least one node.
  • Add \(h\) novel nodes with probability \(\beta_h\) (parameter).
  • Add \(\ell\) nodes from the existing hypergraph with probability \(\gamma_\ell\) (parameter).
  • Form the new edge, and repeat.


Formally,

Parameters: \(\alpha \in [0,1]\), \(\boldsymbol{\beta} \in \mathcal{P}^{k}\), \(\boldsymbol{\gamma} \in \mathcal{P}^{k}\)

  • Start with an empty edge \(f = \emptyset\)
  • Select an edge \(e \in H\) uniformly at random.
  • Accept each node from \(e\) into \(f\) with probability \(\alpha\) (condition on at least one).
  • Add \(h\) novel nodes w.p. \(\beta_h\).
  • Add \(\ell\) nodes from \(H \setminus e\) w.p. \(\gamma_\ell\).
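
A minimal simulation sketch of this procedure (Python; the parameter values, the seed edge, and the uniform choice of existing nodes are placeholder assumptions, not fixed by the model description):

```python
import random

def grow_hypergraph(T, alpha, beta, gamma, seed_edge=(0, 1), rng=None):
    """Grow a hypergraph for T steps by noisy edge-copying (sketch).

    alpha: per-node retention probability (assumed > 0).
    beta[h]: probability of adding h novel nodes.
    gamma[ell]: probability of adding ell existing nodes from outside e.
    Returns the list of edges as sets.
    """
    rng = rng or random.Random(0)
    edges = [set(seed_edge)]
    nodes = set(seed_edge)
    next_node = max(nodes) + 1                      # label for next novel node
    for _ in range(T):
        e = rng.choice(edges)                       # uniformly random edge
        f = {v for v in e if rng.random() < alpha}  # retain each node w.p. alpha
        while not f:                                # condition on >= 1 retained
            f = {v for v in e if rng.random() < alpha}
        h = rng.choices(range(len(beta)), weights=beta)[0]
        for _ in range(h):                          # h novel nodes, w.p. beta[h]
            f.add(next_node)
            nodes.add(next_node)
            next_node += 1
        ell = rng.choices(range(len(gamma)), weights=gamma)[0]
        pool = list(nodes - e)                      # existing nodes outside e,
        f |= set(rng.sample(pool, min(ell, len(pool))))  # chosen uniformly here
        edges.append(f)
    return edges

edges = grow_hypergraph(10_000, alpha=0.5, beta=[0.5, 0.5], gamma=[0.5, 0.5])
print(sum(len(e) for e in edges) / len(edges))      # empirical mean edge size
```

Resampling until \(f \neq \emptyset\) implements the conditioning on retaining at least one node.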


What can we say about the structure of this model?


Parameters: \(\alpha \in [0,1]\), \(\boldsymbol{\beta} \in \mathcal{P}^{k}\), \(\boldsymbol{\gamma} \in \mathcal{P}^{k}\)

  • Start with an empty edge \(f = \emptyset\)
  • Select an edge \(e \in H\) uniformly at random.
  • Accept each node from \(e\) into \(f\) with probability \(\alpha\) (condition on at least one).
  • Add \(h\) novel nodes w.p. \(\beta_h\).
  • Add \(\ell\) nodes from \(H \setminus e\) w.p. \(\gamma_\ell\).

Let \(\mu_{\boldsymbol{\beta}}\) be the expected number of novel nodes added.

Let \(\mu_{\boldsymbol{\gamma}}\) be the expected number of nodes added from \(H\setminus e\).


Calculation: mean edge size

Let the mean edge size be \(\langle k \rangle\).

Heuristically, the expected size of a new edge is then \(1 + \alpha (\langle k \rangle - 1) + \mu_{\boldsymbol{\beta}} + \mu_{\boldsymbol{\gamma}}\): one node is guaranteed by the conditioning, each remaining node of \(e\) is retained with probability \(\alpha\), and the added nodes contribute \(\mu_{\boldsymbol{\beta}} + \mu_{\boldsymbol{\gamma}}\) on average.

At stationarity, the expected size of a new edge must equal the mean edge size:

\[ \begin{aligned} \langle k \rangle &= 1 + \alpha (\langle k \rangle - 1) + \mu_{\boldsymbol{\beta}} + \mu_{\boldsymbol{\gamma}} \\ \langle k \rangle &= \frac{1 - \alpha + \mu_{\boldsymbol{\beta}} + \mu_{\boldsymbol{\gamma}}}{1-\alpha}\;. \end{aligned} \]
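
For example, with \(\alpha = 1/2\) and \(\mu_{\boldsymbol{\beta}} = \mu_{\boldsymbol{\gamma}} = 1/2\) (the placeholder parameters in the simulation sketch above), \(\langle k \rangle = \frac{1 - 1/2 + 1/2 + 1/2}{1/2} = 3\).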

What can we say about the structure of this model?




Accept each node from \(e\) into \(f\) with probability \(\alpha\) (condition on at least one).

So, at least one node in each new edge is selected with probability proportional to its degree!

🧐

Heavy-tailed (power-law) degree distributions



Proposition (He, Chodrow, Mucha ’24): As \(t\) grows large, the degrees of \(H_t\) converge in distribution to a power law with exponent
\[ p = 1 + \frac{1-\alpha +\mu_{\boldsymbol{\beta}} +\mu_{\boldsymbol{\gamma}} }{1-\alpha(1 + \mu_{\boldsymbol{\beta}} +\mu_{\boldsymbol{\gamma}} )}\;. \]

Notation:

  • \(\alpha\): retention rate of nodes in sampled edge \(e\).
  • \(\mu_{\boldsymbol{\beta}}\): expected number of nodes added to new edge.
  • \(\mu_{\boldsymbol{\gamma}}\): expected number of nodes from \(H\setminus e\) added to edge.
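
For example, with \(\alpha = 1/4\) and \(\mu_{\boldsymbol{\beta}} = \mu_{\boldsymbol{\gamma}} = 1/4\), the exponent is \(p = 1 + \frac{1 - 1/4 + 1/2}{1 - (1/4)(3/2)} = 1 + \frac{5/4}{5/8} = 3\).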


Some connections to preferential attachment hypergraphs:

Avin, Lotker, Nahum, and Peleg (2019). Random preferential attachment hypergraph. ASONAM

Roh and Goh (2023) Growing hypergraphs with preferential linking. J. Korean Phys. Soc.

Exponents Predicted from Fits to Data


Important: these slopes are not directly fit to the degree distributions at all!

What does it mean to fit the model to data? We’ll talk about that soon.

Edge Sizes

It is also possible to extract the edge-size distribution under our model by solving an eigenproblem \(\mathbf{s} = \mathbf{A}\mathbf{s}\), where \(\mathbf{A}\) is determined by model parameters.
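
As a sketch of the computation (NumPy; the matrix below is a stand-in, since assembling \(\mathbf{A}\) from fitted parameters is model-specific), \(\mathbf{s}\) is the eigenvector of \(\mathbf{A}\) with eigenvalue \(1\), normalized to sum to one:

```python
import numpy as np

# Placeholder column-stochastic matrix; in the model, A is built from
# alpha, beta, and gamma.
A = np.array([[0.5, 0.25, 0.1],
              [0.4, 0.5,  0.4],
              [0.1, 0.25, 0.5]])

vals, vecs = np.linalg.eig(A)
i = np.argmin(np.abs(vals - 1.0))  # eigenvalue closest to 1
s = np.real(vecs[:, i])
s = s / s.sum()                    # normalize to a probability vector
print(s)                           # stationary edge-size distribution
```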



Edge Intersections


Proposition (He, Chodrow, Mucha ’24): There exist constants \(q_k\) such that, as \(t\rightarrow \infty\),

\[ \begin{aligned} r_{k}(t) \simeq \begin{cases} q_k & \quad \text{if } k \leq k_0\\ t^{-1} q_k &\quad \mathrm{otherwise}\;. \end{cases} \end{aligned} \]

Here, \(q_k = \sum_{ij} q_{ijk}\) and \(\{q_{ijk}\}\) solve the system

\[ q_{ijk} = \begin{cases} \frac{1}{2}\sum_{\ell} \left(q_{\ell j 0} \alpha_{i0|\ell j0} + q_{j \ell 0} \alpha_{i0|j\ell 0}\right) & \quad k = 0\\ \frac{1}{2}\beta_{ik|j} \sum_{\ell} (q_{\ell j0} + q_{j\ell 0}) + \frac{1}{2}\sum_{\ell, h\geq k} \left(q_{\ell j h} \alpha_{ik|\ell jh} + q_{j \ell h} \alpha_{ik|j\ell h}\right) &\quad k \geq 1\;, \end{cases} \]

where \(\alpha_{ik|\ell jh}\) and \(\beta_{ik|j}\) are constants determined by the model parameters.

These come out of linearized compartmental equations.


Learning from data

Aim: given the sequence of edges \(\{e_t\}\), estimate:

  • \(\alpha\), the edge retention rate.
  • \(\boldsymbol{\beta}\), the distribution of the number of novel nodes added to each edge.
  • \(\boldsymbol{\gamma}\), distribution of the number of pre-existing nodes added to each edge.

Expectation maximization:

  1. For each edge \(e_t\), form a belief about which prior edge \(f \in H_{t-1}\) the new edge \(e_t\) was copied from.
  2. Maximize the expected complete log-likelihood with respect to \(\alpha\), \(\boldsymbol{\beta}\), and \(\boldsymbol{\gamma}\) under this belief.
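
A heavily simplified sketch of one EM sweep (Python; the likelihood is schematic, omits combinatorial factors, and all names are placeholder assumptions):

```python
import numpy as np

def edge_likelihood(e, f, old_nodes, alpha, beta, gamma):
    """Schematic P(new edge e | parent f) under the copy model."""
    r = len(e & f)                   # nodes retained from f
    if r == 0:
        return 0.0                   # model requires >= 1 retained node
    h = len(e - old_nodes)           # novel nodes in e
    ell = len((e & old_nodes) - f)   # existing nodes added from outside f
    if h >= len(beta) or ell >= len(gamma):
        return 0.0
    keep = alpha**r * (1 - alpha)**(len(f) - r) / (1 - (1 - alpha)**len(f))
    return keep * beta[h] * gamma[ell]

def em_step(edges, alpha, beta, gamma):
    """One E-step sweep: responsibilities over candidate parent edges."""
    old_nodes = set(edges[0])
    responsibilities = []
    for t in range(1, len(edges)):
        w = np.array([edge_likelihood(edges[t], f, old_nodes, alpha, beta, gamma)
                      for f in edges[:t]])
        responsibilities.append(w / (w.sum() + 1e-300))  # belief over parents
        old_nodes |= edges[t]
    # M-step (omitted): maximize the expected complete log-likelihood in
    # alpha, beta, gamma using weighted counts under these responsibilities.
    return responsibilities
```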

Parameter Fits

Social interaction networks tend to have high rates of edge-copying (\(\alpha\)), with smaller rates of novel nodes (\(\mu_{\boldsymbol{\beta}}\)) and extant nodes (\(\mu_{\boldsymbol{\gamma}}\)).

Coauthorship networks have moderate edge-copying, with higher rates of novel nodes (\(\mu_{\boldsymbol{\beta}}\)) and extant nodes (\(\mu_{\boldsymbol{\gamma}}\)).

Biological and information data sets are more scattered.

Simple models can be predictive!

We computed AUC scores for a link prediction task in which our model learns parameters on 20% training data from three metabolic and organic reaction data sets with edges up to size 5. We compared these against previously-reported AUC scores on the same task for two dedicated neural network hypergraph link prediction algorithms.

Yadati et al. (2020). NHP: Neural Hypergraph Link Prediction. CIKM

Yang et al. (2023). LHP: Logical hypergraph link prediction. Expert Systems with Applications

Dataset     \(n\)      \(m\)      Our model AUC   NHP AUC   LHP AUC
iAF1260b    1,668      2,084      0.643           0.582     0.639
iJO1366     1,805      2,253      0.769           0.599     0.638
USPTO       16,293     11,433     0.550           0.662     0.733

We are competitive with hypergraph neural networks using an 11-parameter model!

Summing up

Hypergraphs are locally distinct from graphs in that they have interesting two-edge motifs (intersections).

Learnable, mechanistic models of hypergraph formation allow us to both model this phenomenon and achieve competitive performance on link prediction problems.

arXiv coming soon 😬😬😬

Thanks everyone!

Xie He
Dartmouth College

Peter Mucha
Dartmouth College