# 4 Interlude: A Rapid Survey of Research Questions

In this short set of lecture notes, we’ll give a *very* brief tour of some of the most common research questions and methodologies posed in the network science literature. The purpose of these notes is to help you develop your project ideas and connect them with concrete data sources, models, and methodologies.

These notes are *nonexhaustive.* If you’re interested in a topic or methodology not covered here, that’s ok! Network science is a big place. Feel free to ask me about your ideas and we’ll chat.

## 4.1 Network Data Science

In many research settings, we aim to perform a data analysis or prediction task using a real-world network data set.

### Community Detection

The *community detection* (also called *graph clustering*) task asks us to partition a graph into clusters of related nodes. We’ve already talked a bit about community detection using the minimum-cut methodology, and we’ll see more sophisticated methods in Section 5 (see also Chapter 14 in Newman).

### Ranking and Centrality

Ranking and centrality measures are often used to determine nodes that are *important*, *central*, *powerful*, or *influential* in a graph. We saw examples of centrality measures based on counting walks in Section 1: betweenness centrality and Katz centrality.

We often talk about *centrality* in undirected graphs and *rank* in directed graphs, although this is not a hard-and-fast distinction.

Newman 7.1 has discussions of several types of centrality in networks.

### Link Prediction

In the *link prediction* problem, we are given some *partial* information about the edges in a network, and we would like to make a prediction about edges that we haven’t yet observed. For example, companies like Facebook and Twitter would love to know how to predict which users will become friends or followers of other users.

There are *many* ways to perform link prediction. A very classical pipeline looks like this:

- For each node, compute some array of
*features*. Examples of features commonly used in network analysis include:- Any metadata attached to the node (demographics, expressed preferences).
- The degree of the node.
- The centrality of the node, using any centrality method.
- Average demographics of the
*neighbors*of the node. - And lots more!

- Using this array of features, train a statistical or machine learning model on all pairs of nodes to classify whether an edge is present or not, using the computed features.
- Finally, use this model to make predictions about unseen possible edges.

Link prediction is often used as a way to “validate” other kinds of network data science tasks, such as community detection and centrality. When clusters or centrality scores are predictive of future or unseen edges, this can be a reasonable sign that those measures give real insight into the structure or function of the network.

### Node Attribute Prediction

In a rather infamous study, Jernigan and Mistree (2009) found that publicly available information on Facebook could be used to predict sexual orientation. The idea is very simple: they found that gay male users were likely to be friends with other gay male users, and therefore the percentage of a male user’s friends who identify as gay men can be used to predict the orientation of the user himself. This study raises grave concerns about the effective involuntary disclosure of one’s sexual orientation through social networks. Such disclosure can have consequences for one’s social life, political rights, or ability to live safely in many parts of the world.

From a technical standpoint, this study was an early and influential example of *node attribute prediction*. In the node attribute prediction problem, the aim is to predict some unknown information about a node, given information about some other nodes in the network. Often, a good way to do this is by computing average values of features of nodes connected to the target node. Graph neural networks are also an increasingly popular way to perform node attribute prediction tasks.

### Where to get data?

I’ve listed links to several repositories of network data in the appendix. There are lots of other good ways to get data. In many cases, if you see a paper on a topic that interests you, you can find a link to a repository containing the data that they used.

## 4.2 Network Modeling

In many cases, it may be expensive, unethical, or scientifically impossible to obtain exactly the data that we want. In such cases, we often turn to mathematical or computer models of networks. We can also approach these problems with explicitly mathematical interest, aiming to prove theorems about them in certain special cases. You’ve seen some flavor of this approach in the recent course content on random graphs. A third reason to study mathematical models of networks is that some, such as the stochastic blockmodel which we will study in Section 5, are foundations for important network algorithms.

### Processes Evolving on Networks

#### Opinion Dynamics

In *opinion dynamics,* nodes have an opinion about some topic which changes in response to interaction with other nodes. The opinion of each node is often either binary (0 or 1, as in the voter model) or real-valued (as in the DeGroot and bounded confidence models).

#### Epidemic Spread

Epidemic spreading models are similar in some ways to opinion models, with nodes changing states in response to interactions to other nodes. The primary difference is that the states are more complicated. There is a wide range of different modeling assumptions to consider, some of which may be more appropriate for certain kinds of disease. Epidemic models were first considered as differential equations (the perspective emphasized here), but much recent work has studied them on social networks (see Newman, Chapter 16).

Recently, there’s been a lot of interest in the topic of *misinformation spreading.* Many techniques from epidemic modeling have been used in this area as well.

### Adaptive Networks

In the models described in the previous section, what changes over time is a property of the nodes. In adaptive network models, the structure of the network can also change. Adaptive network models have become very popular in modeling epidemic spread. For example, suppose that when a node becomes infected, it also practices **physical distancing**. We might model this as a temporary reduction in that nodes’ connectivity to the rest of the network, and would hope that this behavior could reduce overall epidemic severity.

### Agent-Based Modeling

Many of the models we study in the context of social networks can be viewed as *agent-based models.* An agent-based model consists of a set of agents and a collection of rules determining how those agents interact with each other. Many agent-based models assume that the agents are situated in a network, and interact only with their neighbors on the network. Arbitrarily complex decision-making rules can be expressed through the agent-based modeling framework. The Mesa package for Python is one of many frameworks for implementing agent-based models in programming languages.

## 4.3 “Fancy” Networks

So far in this course, we’ve focused on simple, undirected networks. However, lots of networks have more structure than this, and motivate different kinds of analyses.

**Multilayer**networks have edges of different categories (Kivelä et al. 2014). An example of a multilayer network is an urban transportation network, which might include layers for both streets and subways. Another kind of multilayer network is the social network of UCLA, where edges of different types could represent relationships like friendship, co-membership in clubs, enmity, collaboration on a project, etc. etc.

**Temporal**networks have edges that evolve over time (Holme and Saramäki 2012). Often, these edges represent timestamped interactions. For example, in a network of email communications, each email was sent at a certain time, and the timing of emails is important to the flow of information through the graph.**Directed**networks have edges which express asymmetric relationships between nodes. For example, in animal behavior networks, an edge could represent a fight between two animals in a colony, with the edge “pointing” at the winner and away from the loser.

Some networks are multilayer, temporal, *and* directed. For example, consider Twitter.

**Multilayer**: Multiple types of networked interaction are possible on Twitter, including following, retweeting, blocking, and replying.**Temporal**: Following and retweeting are specific actions that take place at a specific time. The timing of these actions can be important for their interpretation.**Directed**: interactions like following and retweeting are asymmetric – if I follow you, that doesn’t imply that you follow me.

Another important kind of fancy network is a **polyadic** or **higher-order** network. In polyadic networks, edges can join more than two nodes simultaneously. Polyadic networks are most commonly represented using hypergraphs and simplicial complexes. Bick et al. (2021) give a good overview of many of the math and modeling questions associated with these kinds of models.