Eigenvector Methods for Community Detection in Hypergraphs

# Eigenvector Methods for Community Detection in Hypergraphs
## Claremont Center for the Mathematical Sciences Applied Math Seminar
### Dr. Phil Chodrow, Dept. of Mathematics, UCLA, Sept 20th, 2021

---

exclude: true 
<style type="text/css">
code.r{ 
 font-size: 16px; 
}
pre {
 font-size: 16px !important; 
}
</style>

---

.column.bg-main1[
.content[ 
 ## Graphs and Hypergraphs 
 
 
 A .alert[graph] consists of a set of nodes `$\mathcal{N}$` and a set of edges `$\mathcal{E}$`. Each edge in `$\mathcal{E}$` is a set of two nodes.

In .alert[hypergraphs], edges in `$\mathcal{E}$` can contain *any number* of nodes. 
]

]

---

.column.bg-main1[
  
  ## Hypergraph Data

]

.column.bg-main2[.vmiddle[
 
]]

---

.column.bg-main1[
  
  ## Hypergraph Data

- .alert[**Interaction**]: nodes are agents, edges are interaction events (socializing in groups, attending events).

]

.column.bg-main2[.vmiddle[
 <img src="img/social-interaction.jpeg" width=100%></img> 
]]

---

.column.bg-main1[
  
  ## Hypergraph Data

]

.column.bg-main2[.vmiddle[
 <img src="img/teamwork.jpeg" width=100%></img> 
]]

---

.column.bg-main1[
  
  ## Hypergraph Data

- .alert[**Interaction**]: nodes are agents, edges are interaction events (socializing in groups, attending events).
- .alert[**Collaboration**]: nodes are collaborators, edges are projects or teams (scholarly papers, legislation, etc.).
- .alert[**Co-presence**]: nodes are chemical compounds, edges are drugs formed from those compounds.

]

.column.bg-main2[.vmiddle[
 <img src="img/mccoy.png" width=100%></img> 
]]

---

.column.bg-main1[
  ### The Hypergraph Community Detection Problem

Given some hypergraph data, assign each node to a .alert[**community**] (or "cluster") of "related" nodes. 
 
 "*Related*": often interpreted as "*densely interconnected.*"
 
 Applications in social network analysis, drug discovery, image processing, data visualization...

<div class="footnote">
 .alert2[One review in]: PSC, N. Veldt, A. R. Benson (2021). Generative hypergraph clustering: from blockmodels to modularity. *Science Advances* 7:eabh1303
</div>

]
.column[.content.vmiddle[.stretch[
 <img src="img/detection-1.png" width=100%>
]]]

---

layout: true
class: split-two middle 
 
.column[
  .split-three[ 
  .row.bg-main1[.content.vmiddle[.font_medium[  
.alert[**Graph community detection**] with eigenvectors.
  ]]]     
  .row.bg-main2[.content.vmiddle[.font_medium[
.alert[**Hashimoto operators**] and eigenvector methods for hypergraphs.  
  ]]] 
  .row.bg-main3[.content.vmiddle[.font_medium[ 
.alert[**Detectability thresholds**] and open questions. 
  ]]]
]]

.column[.center[.stretch[
  {{content}} 
]]]
 
---
class: hide-row2-col1 hide-row3-col1 hide-row4-col1 hide-row5-col1

<div class="footnote">
 Image from <a href="http://allthingsgraphed.com/2014/10/09/visualizing-political-polarization/"> All Things Graphed </a> 
 </div>

---
class: hide-row3-col1 hide-row4-col1 hide-row5-col1 
 
<img src="img/hypergraph-nonbacktracking.png" width=100%>

---
class: hide-row4-col1 hide-row5-col1 
 
<img src="img/heatmap-exp-1-with-curve.png" width=100%>

---

<div class="footnote">
 Image from <a href="http://allthingsgraphed.com/2014/10/09/visualizing-political-polarization/"> All Things Graphed </a> 
 </div>
---

class: bg-main2
layout: false
background-image: url(img/opportunity.jpeg)
background-size: contain

---

## ...the adjacency matrix
 
 .center[
 
 <img src="img/sample-matrix.png" width=100%>
 ]

]

---

.column.bg-main1[
  ## Modeling Clustered Graphs

Take `$n$` nodes and divide them into two equal-sized groups `$a$` and `$b$`.

For each pair of nodes `$i$` and `$j$`, draw an edge with probability

`$$p_{ij} = \begin{cases} p &\quad i,j \text{ are in the same group} \\ q &\quad i,j \text{ are in different groups} \end{cases}$$`

We can represent the result as an adjacency matrix `$\mathbf{A}$`, where

`$$a_{ij} = \begin{cases} 1 &\quad (i,j) \in \mathcal{E} \\ 0 &\quad  \text{otherwise} \end{cases}$$`
]
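
A minimal sketch of sampling from this model, assuming NumPy, an even `n`, and no self-loops. The function name `sample_sbm` and the parameter values are illustrative, not from the talk.

```python
import numpy as np

def sample_sbm(n, p, q, seed=None):
    """Sample a symmetric 0/1 adjacency matrix with two equal-sized groups."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)            # first half: group a, second half: group b
    same = labels[:, None] == labels[None, :]     # True where i and j share a group
    probs = np.where(same, p, q)                  # p_ij from the case definition
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # decide each pair once, skip the diagonal
    A = (upper | upper.T).astype(int)             # symmetrize
    return A, labels

A, labels = sample_sbm(200, p=0.3, q=0.05, seed=0)
```

With `p > q`, the two diagonal blocks of `A` are visibly denser than the off-diagonal blocks.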

<img src="img/sbm-matrix.png" width=80%>
]]
]

---

.column.bg-main1[
  ### Clusters from Eigenvectors

The matrix `$\mathbf{P}$` of edge probabilities `$p_{ij}$` has two nonzero eigenvalues, with eigenvectors:

`$\mathbf{v}_1 = {\underbrace{(1,1,\ldots,1,1)}_{n \text{ copies}}}^T$` 
 `$\lambda_1 = \frac{n}{2}(p + q)$`.

`$\mathbf{v}_2 = (\underbrace{1,1,\ldots,1}_{n/2 \text{ copies } (a)}, \underbrace{-1,-1,\ldots,-1}_{n/2 \text{ copies } (b)})^T$` 
 `$\lambda_2 = \frac{n}{2}(p - q)$`.

.alert[Results from random matrix theory]: the "interesting" eigenvalues/vectors of `$\mathbf{A}$` are close to the "interesting" eigenvalues/vectors of `$\mathbf{P}$` with high probability as `$n$` grows large.  
]
]
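
A quick numerical check of these two eigenpairs, assuming `n` nodes in total, equal groups of size `n/2`, and a probability matrix whose diagonal entries are also `p`. The values of `n`, `p`, and `q` are arbitrary.

```python
import numpy as np

n, p, q = 10, 0.6, 0.2
labels = np.repeat([0, 1], n // 2)                 # group a first, then group b
P = np.where(labels[:, None] == labels[None, :], p, q)

v1 = np.ones(n)
v2 = np.where(labels == 0, 1.0, -1.0)
lam1 = n / 2 * (p + q)
lam2 = n / 2 * (p - q)

# Both products should reproduce the vector scaled by its eigenvalue.
ok1 = np.allclose(P @ v1, lam1 * v1)
ok2 = np.allclose(P @ v2, lam2 * v2)
print(ok1, ok2)
```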

<img src="img/sbm-matrix.png" width=80%>
]]
]

---

.column.bg-main1[
  ## An Algorithm

Suppose we don't know the cluster labels. We can estimate them using `$\mathbf{v}_2$`.

1. Compute the eigenvector `$\mathbf{v}_2$` of `$\mathbf{A}$` corresponding to its second-largest eigenvalue. 
2. If `$v_{2i} > 0$`, guess that node `$i$` is in group `$a$`, otherwise in group `$b$`.

Variations on this work for other graph matrices. 
]
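
The two steps above can be sketched as follows, assuming NumPy and a synthetic two-group graph. The helper name `spectral_two_groups` and the parameter values are illustrative, and the labels are only recovered up to swapping `a` and `b`.

```python
import numpy as np

def spectral_two_groups(A):
    """Guess two groups from the sign of the second eigenvector of symmetric A."""
    vals, vecs = np.linalg.eigh(A)      # eigenvalues returned in ascending order
    v2 = vecs[:, -2]                    # eigenvector of the second-largest eigenvalue
    return np.where(v2 > 0, 0, 1)       # 0 = group a, 1 = group b

# Synthetic test graph: two equal groups, dense within (p), sparse between (q).
rng = np.random.default_rng(0)
n, p, q = 200, 0.3, 0.05
labels = np.repeat([0, 1], n // 2)
probs = np.where(labels[:, None] == labels[None, :], p, q)
upper = np.triu(rng.random((n, n)) < probs, k=1)
A = (upper | upper.T).astype(float)

guess = spectral_two_groups(A)
# The sign of an eigenvector is arbitrary, so score both labelings.
accuracy = max(np.mean(guess == labels), np.mean(guess != labels))
```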

<img src="img/sbm-clustering.png" width=75%>
]]

---

.column.bg-main1[
  # Summing Up

Graphs represent .alert[pairwise] interactions and can be easily represented by .alert[2D] objects like adjacency matrices.

In a clustered graph, the 2nd eigenvector of the adjacency matrix can reveal cluster structure.

So now we'd like to .alert2[generalize] to hypergraphs...
]
.column[
.center[
 <img src="img/pol-blogs.png" width=70%>
]
]

---

layout: true
class: split-two middle 
 
.column[
  .split-three[ 
  .row.bg-main1[.content.vmiddle[.font_medium[  
.alert[**Graph community detection**] with eigenvectors. 
  ]]]     
  .row.bg-main2[.content.vmiddle[.font_medium[
.alert[**Hashimoto operators**] and eigenvector methods for hypergraphs.  
  ]]] 
  .row.bg-main3[.content.vmiddle[.font_medium[ 
.alert[**Detectability thresholds**] and open questions. 
  ]]]
]]

---
class: fade-row1-col1 fade-row3-col1
 
<img src="img/hypergraph-nonbacktracking.png" width=100%>

---

.column.bg-main1[

### Matrices for Hypergraphs?

We could transform the hypergraph into a graph.
- .alert[Problem]: loses higher-order information.

We could construct a set of adjacency tensors `$\mathbf{A}^{(2)}$`, `$\mathbf{A}^{(3)}$`, `$\mathbf{A}^{(4)}$`...

`$$a^{(3)}_{ijk} = \begin{cases} 1 &\quad (i,j,k)\in \mathcal{E} \\ 0 &\quad \text{otherwise...}\end{cases}$$`

- .alert[Problem]: we know how to define eigenvectors of a single tensor, but not of a .alert2[set] of tensors.

So, uh, what should we do?....

]
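
A small illustration of the information loss. It assumes the graph transformation is the clique projection (every pair of nodes inside a hyperedge becomes an edge), which is one common choice and not necessarily the one meant on the slide.

```python
from itertools import combinations

def clique_projection(hyperedges):
    """Replace each hyperedge by all pairs of its nodes."""
    return {frozenset(pair)
            for e in hyperedges
            for pair in combinations(sorted(e), 2)}

H1 = [{0, 1, 2}]                  # one three-way interaction
H2 = [{0, 1}, {1, 2}, {0, 2}]     # three separate pairwise interactions

same_graph = clique_projection(H1) == clique_projection(H2)
print(same_graph)  # True: two different hypergraphs, one projected graph
```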

---
class: split-50 bg-main1 
layout: false 
 
.row[ 
.split-three[
.column[ 
 <img src="img/jamie_portrait.jpeg" width=90%> 
 ]
.column[ 
 <img src="img/eikmeier-3.png" width=90%> 
]
.column[ 
 <img src="img/phil_portrait.jpeg" width=90%> 
]

]
]
.row[ 
.split-three[
.column[ 
 .font_large[.alert-no-bold[Jamie Haddock]]
 Mathematics Harvey Mudd College 
 .alert2[@jamie_hadd]
] 
.column[ 
 .font_large[.alert-no-bold[Nicole Eikmeier]]
 Computer Science Grinnell College 
 .alert2[@NicoleEikmeier]
]
.column[ 
 .font_large[.alert-no-bold[Phil Chodrow]]
 Mathematics UCLA
 .alert2[@PhilChodrow] 
]
]
]

---

.column.bg-main1[
## The Hashimoto Operator

The adjacency matrix is `$n\times n$` and  operates on nodes.

The .alert[Hashimoto operator] operates on .alert2[edge-node pairs]:

Let `$(e_1, p_1) \rightarrow (e_2, p_2)$` if:

- `$p_1 \in e_1$` and `$p_2 \in e_2$`
- `$p_1 \in e_2 \setminus \{p_2\}$`
- `$e_1 \neq e_2$`

Then,

.font_smaller[ .font_smaller[
`$$\mathbf{B}[(e_1, p_1), (e_2, p_2)] = \begin{cases} 1 &\quad (e_1, p_1) \rightarrow (e_2, p_2) \\ 
0 &\quad \text{otherwise.}\end{cases}$$`
]]]

"I can get to `$p_2 \in e_2$` from `$e_1$` by passing through `$p_1$`. I can get to `$p_3 \in e_3$` from `$e_2$` by passing through `$p_2$`..."

]
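
A minimal dense sketch of this definition, assuming NumPy and hyperedges given as Python sets. The function name `hashimoto_matrix` is illustrative, and edges are compared by position in the list, so repeated hyperedges count as distinct.

```python
import numpy as np

def hashimoto_matrix(hyperedges):
    """Build B on edge-node pairs (e, p) with p in e."""
    edges = [frozenset(e) for e in hyperedges]
    pointed = [(i, v) for i, e in enumerate(edges) for v in sorted(e)]
    index = {pe: r for r, pe in enumerate(pointed)}
    B = np.zeros((len(pointed), len(pointed)), dtype=int)
    for (i1, p1), r in index.items():
        for (i2, p2), c in index.items():
            # (e1, p1) -> (e2, p2): different edges, p1 lies in e2, and p1 != p2
            if i1 != i2 and p1 in edges[i2] and p1 != p2:
                B[r, c] = 1
    return B, pointed

B, pointed = hashimoto_matrix([{0, 1, 2}, {2, 3}])
```

In this toy example there are `3 + 2 = 5` edge-node pairs, and the only moves pass through the shared node `2`.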

---

.column.bg-main1[
## The Hashimoto Operator

Popularized (for graphs) in Hashimoto K. (1990), *Int. J. Math.*

First formulated for hypergraphs by Storm, C. K. (2006). *The Electronic Journal of Combinatorics*.

"Rediscovered" for hypergraphs by Angelini, M. C., Caltagirone, F., Krzakala, F., & Zdeborová, L. (2015), *Allerton Conference*.

*Also often called the "nonbacktracking operator/matrix."*

]

"I can get to `$p_2 \in e_2$` from `$e_1$` by passing through `$p_1$`. I can get to `$p_3 \in e_3$` from `$e_2$` by passing through `$p_2$`..."
]

---

.column.bg-main1[
## The Hashimoto Operator

Connected to prime cycles and zeta functions on graphs.

Can represent hyperedges of all sizes in the same matrix!

.alert[Second eigenvector] is correlated with communities, if the corresponding eigenvalue is real.  
  - Needs to be aggregated to node level
 
]

---

.column.bg-main1[
## Issue \# 1: Computation

`$\mathbf{B}$` is indexed by edge-node pairs.

So, `$\mathbf{B}$` is of size `$m\langle k\rangle \times m\langle k\rangle$`, where `$m$` is the number of edges and `$\langle k \rangle$` is the average edge size.

A .alert[*small*] data set might have `$n = 300$` nodes, `$m = 8{,}000$` edges, and average edge size `$2.5$`.

So, `$m\langle k \rangle = 8{,}000 \times 2.5 = 20{,}000$`, which is already a pretty big matrix.

So...

]

---

class: bg-main2
background-image: url("img/large-matrix.jpeg")
background-size: contain

---

## Linear Algebra to the Rescue

**Theorem (PSC, JH, NE '21)**: Under mild conditions, if `$\lambda$` is an eigenvalue of `$\mathbf{B},$` then either:

1. `$\lambda \in \{1, -1, -2, \ldots, 1-\bar{k}\}$` and carries no structural information about the hypergraph, or
2. `$\lambda$` is an eigenvalue of the matrix

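$$
\mathbf{B}' = \left[\begin{matrix}
 \mathbf{0} & \mathbb{D} - \mathbf{I}_{\bar{k}n} \\
 (\mathbf{I}_{\bar{k}} - \mathbf{K})\otimes \mathbf{I}_n & \mathbb{A} + (2\mathbf{I}_{\bar{k}} - \mathbf{K})\otimes \mathbf{I}_n
\end{matrix}\right] \in \mathbb{R}^{2\bar{k}n\times 2\bar{k}n}\;.
$$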

.font_smaller[.font_smaller[
- `$\bar{k}$` is the number of distinct edge sizes, `$n$` is the number of nodes. 
- `$\mathbb{A} \in \mathbb{R}^{\bar{k}n\times \bar{k}n}$` collects adjacency information for each hyperedge size.
- `$\mathbb{D} \in \mathbb{R}^{\bar{k}n\times \bar{k}n}$` collects node degrees for each hyperedge size. 
- `$\mathbf{K} \in \mathbb{R}^{\bar{k}\times \bar{k}}$` is the diagonal matrix listing the possible edge sizes. 
- `$\mathbf{I}_{\ell} \in \mathbb{R}^{\ell\times \ell}$` is the identity matrix of size `$\ell$`. 
- `$\otimes$` is the Kronecker product. 
]]
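
A numerical sanity check of the same idea in the ordinary graph case (all edges of size 2). It assumes the classical Ihara-Bass form, in which the small matrix is `[[0, D - I], [-I, A]]`; it does not construct the hypergraph matrix of the theorem. The example graph and variable names are illustrative.

```python
import numpy as np

# Two triangles sharing an edge: n = 4 nodes, m = 5 edges.
edges = [(0, 1), (1, 2), (0, 2), (1, 3), (2, 3)]
n = 4

# Nonbacktracking matrix on the 2m directed edges (u -> v).
darts = edges + [(v, u) for (u, v) in edges]
B = np.array([[1.0 if (v == x and y != u) else 0.0 for (x, y) in darts]
              for (u, v) in darts])

A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
D = np.diag(A.sum(axis=1))
I = np.eye(n)

# Classical 2n x 2n reduced matrix for graphs (assumed form, see lead-in).
B_small = np.block([[np.zeros((n, n)), D - I],
                    [-I, A]])

eig_B = np.linalg.eigvals(B)
eig_small = np.linalg.eigvals(B_small)
# Largest distance from an eigenvalue of B_small to its nearest eigenvalue of B.
max_gap = max(np.min(np.abs(lam - eig_B)) for lam in eig_small)
```

Every eigenvalue of the 8 x 8 matrix should also appear, up to rounding, in the spectrum of the 10 x 10 matrix `B`.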

---

## Proof Sketch

1. `$\mathbf{B}$` can be written as `$\mathbf{S}\mathbf{T} - \mathbf{R}$` for suitable operators `$\mathbf{S}$`, `$\mathbf{T}$` and `$\mathbf{R}$`, which also satisfy handy relations like `$\mathbf{T}\mathbf{S} = \mathbb{A}$`.  
2. Consider `$\det(\lambda\mathbf{I} - \mathbf{B})$`, substitute `$\mathbf{B} = \mathbf{S}\mathbf{T} - \mathbf{R}$`, and use the *push-through identity*:
$$
\det(\mathbf{X} + \mathbf{Y}\mathbf{Z}) = \det(\mathbf{X}) \det(\mathbf{I} + \mathbf{Z}\mathbf{X}^{-1}\mathbf{Y})
$$
(*provided all inverses, sums, and products are defined*). 
3. Battle through a .alert1[**LOT**] of algebraic simplifications, obtaining 
$$
\det(\lambda \mathbf{I} - \mathbf{B}) = \det(\text{boring part})\det(\lambda\mathbf{I} - \mathbf{B}')\,.
$$

.footnote[Approach based on a proof of the graph Ihara-Bass formula in: M. C. Kempton (2016). Non-backtracking random walks and a weighted Ihara’s theorem. *Open Journal of Discrete Mathematics* 6, 207-226
]

---

.column.bg-main1[
## Issue \# 1: Computation

A .alert[*small*] data set might have `$n = 300$` nodes, `$m = 8{,}000$` edges, and average edge size `$2.5$`.

If `$\bar{k} = 3$`, then we can compute eigenvectors in

`$$2n\bar{k} = 1{,}800 \color{#FFD046}{\ll} 20{,}000 = m\langle k\rangle$$`

dimensions instead.

We can do that 100x-1,000x faster! 
]
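
A rough back-of-the-envelope check of that speedup, assuming (this is not stated on the slide) that a dense eigenvector computation on an `$N\times N$` matrix costs somewhere between `$N^2$` and `$N^3$` operations:

$$
\frac{m\langle k\rangle}{2n\bar{k}} = \frac{20{,}000}{1{,}800} = \frac{100}{9} \approx 11.1\,, \qquad \left(\frac{100}{9}\right)^2 \approx 123\,, \qquad \left(\frac{100}{9}\right)^3 \approx 1{,}372\,.
$$

Under that assumption, the reduction in dimension alone accounts for a speedup of roughly two to three orders of magnitude.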

---

.column.bg-main1[

## First Algorithm

2. Compute the second eigenpair `$(\lambda_2, \mathbf{v}_2)$` of `$\mathbf{B}'$`. 
 3. If `$\lambda_2$` is .alert[real], separate `$\mathbf{v}_2 = (\alpha, \beta)$`, with `$\alpha, \beta \in \mathbb{R}^{n\bar{k}}$`. 
 4. If
 `$$u_i = \sum_{k = 1}^{\bar{k}}\alpha_{ik} < 0\;,$$`
 assign `$i$` to cluster `$A$`, else assign `$i$` to cluster `$B$`.

]
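
A sketch of the final aggregation step, assuming NumPy and assuming that `$\alpha$` is stored as `$\bar{k}$` consecutive blocks of length `$n$` (one block per edge size). That storage order and the function name `assign_clusters` are hypothetical; the slide does not specify them.

```python
import numpy as np

def assign_clusters(v2, n, kbar):
    """Split v2 = (alpha, beta), sum alpha over edge sizes, threshold at zero."""
    alpha = np.asarray(v2).real[: n * kbar]     # first half of v2; beta is unused
    u = alpha.reshape(kbar, n).sum(axis=0)      # u_i = sum over k of alpha_{ik}
    labels = np.where(u < 0, "A", "B")
    return labels, u

# Toy vector with n = 3 nodes and kbar = 2 edge sizes.
alpha = [1.0, -2.0, 0.5,    # block for the first edge size
         1.0, -1.0, -1.0]   # block for the second edge size
beta = [0.0] * 6
labels, u = assign_clusters(np.array(alpha + beta), n=3, kbar=2)
```

Here `u = (2, -3, -0.5)`, so node 0 goes to cluster `B` and nodes 1 and 2 go to cluster `A`.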

---

.column.bg-main1[
## Issue \# 2: Edge Sizes

Our first spectral algorithm only works well when edges of different sizes carry .alert[similar types of information] about the clusters.

What if edges of different sizes mean different things?

.alert2[Example]: small interactions might be very likely to be within a cluster, but larger interactions might be more likely to be *between* clusters.

]

---