Using Digital Trace Data in the Social Sciences, University of Konstanz (Summer 2018)

Instructor: Andreas Jungherr

Week 10: Sample Analyses: Networks

In this session, we will focus on the use of Twitter data in network analysis. Network analysis offers a perspective on social processes that emphasizes how structures shape outcomes. Social media data provide a rich basis for research within a network-analytical framework.

As a conceptual background to network analysis, I would recommend:

Easley, D. & Kleinberg, J. (2010). Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge, UK: Cambridge University Press.
Jackson, M. O. (2008). Social and economic networks. Princeton, NJ: Princeton University Press.

For the mathematics underlying network analysis:

Kolaczyk, E. D. (2009). Statistical analysis of network data: Methods and models. Cham, CH: Springer. doi:10.1007/978-0-387-88146-1.

For running network analysis in R:

Kolaczyk, E. D. & Csárdi, G. (2014). Statistical analysis of network data with R. Cham, CH: Springer.

And as a primer for network analysis:

Ognyanova, K. (2017). Network visualization with R.

Code Examples:

Before we begin, let us prepare our workspace:

cd "/Users/(...)/twitterresearch"

ipython

import examples
import database
import json

Prepare a network file

Now let’s start working with network data. As a first step, let’s create .graphml files of the retweet and @reply networks in the messages you collected.

import network

Now, let’s create a retweet network:

network.retweet_links()

And now, let’s do the same for @replies:

network.reply_links()
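
To make clearer what such a network file contains, here is a minimal sketch in plain Python. It is not the code of the course's `network` module: the function names `retweet_edges` and `to_graphml` are made up for illustration, and the example tweets are invented. The sketch assumes tweets in the JSON format of Twitter's REST API, where a retweet carries a `retweeted_status` field. It also assumes that edges point from the retweeting user to the retweeted user, so that a high indegree means being retweeted often.

```python
import xml.etree.ElementTree as ET


def retweet_edges(tweets):
    """Return (retweeting user, retweeted user) pairs from tweet dictionaries."""
    edges = []
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if original:  # only retweets carry this field
            edges.append((tweet["user"]["screen_name"],
                          original["user"]["screen_name"]))
    return edges


def to_graphml(edges):
    """Serialise a directed edge list as a minimal GraphML string."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    # Declare a string attribute "label" for nodes.
    ET.SubElement(root, "key", {"id": "label", "for": "node",
                                "attr.name": "label", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", edgedefault="directed")
    for name in sorted({n for edge in edges for n in edge}):
        node = ET.SubElement(graph, "node", id=name)
        ET.SubElement(node, "data", key="label").text = name
    for source, target in edges:
        ET.SubElement(graph, "edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")


# Invented example tweets, reduced to the fields used above.
tweets = [
    {"user": {"screen_name": "alice"},
     "retweeted_status": {"user": {"screen_name": "bob"}}},
    {"user": {"screen_name": "carol"},
     "retweeted_status": {"user": {"screen_name": "bob"}}},
    {"user": {"screen_name": "bob"}},  # an original tweet, so no edge
]
edges = retweet_edges(tweets)
graphml = to_graphml(edges)
print(graphml)
```

A reply network could be sketched the same way by reading the `in_reply_to_screen_name` field instead of `retweeted_status`.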

Retweet Networks

Now, we are ready to work with the network data in R. For this, start RStudio and call:

setwd(".../twitterresearch")

install.packages("igraph")  # only needed once per R installation
library(igraph)

rt.net <- read_graph("retweets.graphml", format="graphml")
summary(rt.net)

Edges and nodes (here called vertices) can be accessed with the following commands:

This one shows a list of vertex IDs:

V(rt.net)

This one shows a list of edges in source ID -> target ID notation:

E(rt.net)

Let us now look for specific users. So, where is Donald Trump in our network?

trump <- which(V(rt.net)$label == "realDonaldTrump")

And what are his network metrics, like degree …

degree(rt.net, v=trump)

… and indegree.

degree(rt.net, v=trump, mode="in")

Now, let’s write the degree as a node attribute:

V(rt.net)$degree <- degree(rt.net)

The same can be done for in-degree:

V(rt.net)$indegree <- degree(rt.net, mode="in")
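
If you want to see what these two numbers count, the following plain-Python sketch computes them by hand for a tiny, invented edge list. It is only an illustration and not part of the course code. As an assumption, edges are read as (retweeting user, retweeted user), so indegree counts how often a user was retweeted.

```python
from collections import Counter

# Invented retweet edges: (retweeting user, retweeted user).
edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "dave"),
    ("alice", "dave"),
    ("dave", "bob"),
]

# Indegree: how many edges arrive at a node.
indegree = Counter(target for _, target in edges)
# Outdegree: how many edges leave a node.
outdegree = Counter(source for source, _ in edges)

# Total degree is the sum of both, which is what degree() returns
# for a directed graph when no mode is given.
nodes = {n for edge in edges for n in edge}
degree = {n: indegree[n] + outdegree[n] for n in nodes}

for n in sorted(nodes):
    print(n, "degree:", degree[n], "indegree:", indegree[n])
```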

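To visually inspect the degree distribution, we can plot it on log-log axes. Note that the first entry of the returned vector refers to degree 0, so position k on the x-axis corresponds to degree k - 1:

plot(degree_distribution(rt.net, mode="in"), log="xy")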
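
The vector behind this plot is simply the share of nodes at each degree value. Here is a small sketch in plain Python with invented indegree values. The helper name `degree_distribution` mirrors the R function but is written from scratch here. The last line keeps only the points that can appear on log-log axes, since neither degree 0 nor a share of 0 has a logarithm.

```python
from collections import Counter


def degree_distribution(degrees):
    """Share of nodes with degree 0, 1, ..., max(degrees)."""
    counts = Counter(degrees)
    n = len(degrees)
    return [counts[k] / n for k in range(max(degrees) + 1)]


# Invented indegree values for eight nodes.
indegrees = [0, 0, 0, 0, 1, 1, 2, 5]
dist = degree_distribution(indegrees)

# Position 0 of the list is degree 0, position 1 is degree 1, and so on.
points = [(k, share) for k, share in enumerate(dist) if k > 0 and share > 0]
print(dist)
print(points)
```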

Let’s continue by sorting users by indegree:

V(rt.net)[order(-indegree)]

Now, who are the top 10 users by indegree? Let’s identify them and print their names.

V(rt.net)[order(-indegree)]$label[1:10]

By creating a small subgraph of the network, we can gain quick and intuitive insights without being flooded by information about the whole network. For convenience, let’s focus on the top 25 nodes by indegree. You could use any other characteristic of nodes to define a subset of interest to you.

Let’s get the top 25 nodes:

top25.nodes <- V(rt.net)[order(-indegree)][1:25]

Now, we create a small graph by deleting all nodes except the top 25.

small.rt.net <- delete_vertices(rt.net, which(!V(rt.net) %in% top25.nodes))

To plot the network graph, we choose a layout determined by the Fruchterman-Reingold algorithm, which works nicely for small graphs.

l <- layout_with_fr(small.rt.net)

And plot it with corresponding labels:

plot(small.rt.net, layout = l)
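
Deleting all nodes except the top ones leaves what is usually called an induced subgraph: only edges whose two endpoints both survive are kept. The following plain-Python sketch shows this logic on an invented edge list, using the top 2 nodes instead of the top 25. The helper name `induced_subgraph` is made up for this illustration.

```python
from collections import Counter


def induced_subgraph(edges, keep):
    """Keep only edges whose source and target are both in `keep`."""
    keep = set(keep)
    return [(s, t) for s, t in edges if s in keep and t in keep]


# Invented retweet edges: (retweeting user, retweeted user).
edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "dave"),
    ("alice", "dave"),
    ("dave", "bob"),
    ("erin", "alice"),
]

indegree = Counter(target for _, target in edges)
nodes = {n for edge in edges for n in edge}

# Rank nodes by indegree, highest first; ties are broken alphabetically.
ranked = sorted(nodes, key=lambda n: (-indegree[n], n))
top_nodes = ranked[:2]

small_edges = induced_subgraph(edges, top_nodes)
print(top_nodes)
print(small_edges)
```

Note how the edges from alice and carol disappear: the small graph only shows how the most prominent users relate to each other, not who made them prominent.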

We can do the same with a different metric, identifying prominence in the network by another criterion than raw indegree counts, such as betweenness centrality…

V(rt.net)$betweenness <- betweenness(rt.net)
top25.nodes <- V(rt.net)[order(-betweenness)][1:25]
small.rt.net <- delete_vertices(rt.net, which(!V(rt.net) %in% top25.nodes))
l <- layout_with_fr(small.rt.net)
plot(small.rt.net, layout = l)

… or closeness centrality.

V(rt.net)$closeness <- closeness(rt.net, mode="in")
top25.nodes <- V(rt.net)[order(-closeness)][1:25]
small.rt.net <- delete_vertices(rt.net, which(!V(rt.net) %in% top25.nodes))
l <- layout_with_fr(small.rt.net)
plot(small.rt.net, layout = l)
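
To get a feeling for what betweenness and closeness measure, here is a compact plain-Python sketch for directed, unweighted graphs. Betweenness is computed with Brandes' algorithm, and closeness is computed from distances towards each node, analogous to mode="in". The graph is invented, and the closeness normalisation used here (reachable nodes divided by the sum of their distances) is one common convention and an assumption of this sketch; igraph's exact values depend on its `normalized` argument and version.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm; adj maps every node to the nodes it points to."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def in_closeness(adj):
    """Closeness based on shortest-path distances towards each node."""
    # Reverse every edge so that a search from v follows incoming paths.
    reverse = {v: [] for v in adj}
    for source, targets in adj.items():
        for target in targets:
            reverse[target].append(source)
    result = {}
    for v in adj:
        dist = {v: 0}
        queue = deque([v])
        while queue:
            u = queue.popleft()
            for w in reverse[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        # A node that nobody can reach gets no closeness value here.
        result[v] = (len(dist) - 1) / total if total else None
    return result


# Invented graph: a points to b and c, both point to d, d points to e.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
bet = betweenness(adj)
clo = in_closeness(adj)
print(bet)
print(clo)
```

In this toy graph, d has the highest betweenness because every shortest path to e runs through it, while b and c each carry only half of the paths from a to d and e.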

@Reply Networks

In analyzing @Reply networks, we follow the same steps as before:

reply.net <- read_graph("replies.graphml", format="graphml")

summary(reply.net)
V(reply.net)$indegree <- degree(reply.net, mode="in")
V(reply.net)[order(-indegree)]$label[1:10]
V(reply.net)$betweenness <- betweenness(reply.net)
V(reply.net)$closeness <- closeness(reply.net, mode="in")
V(reply.net)$closeness.out <- closeness(reply.net, mode="out")

top25.nodes <- V(reply.net)[order(-closeness)][1:25]
small.reply.net <- delete_vertices(reply.net, which(!V(reply.net) %in% top25.nodes))
l <- layout_with_fr(small.reply.net)
plot(small.reply.net, layout = l)

As an alternative, you could simply load the .graphml file into Gephi or your network analysis program of choice.

As before, this only dips our toes into the potential of network analysis. Make sure to dive deeper according to your interests.

Required Readings:

Jürgens, P. & Jungherr, A. (2016). A tutorial for using Twitter data in the social sciences: Data collection, preparation, and analysis. Social Science Research Network (SSRN). doi:10.2139/ssrn.2710146. (pp. 68-79).

Background Readings:

Easley, D. & Kleinberg, J. (2010). Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge, UK: Cambridge University Press.
Jackson, M. O. (2008). Social and economic networks. Princeton, NJ: Princeton University Press.
Kolaczyk, E. D. (2009). Statistical analysis of network data: Methods and models. Cham, CH: Springer. doi:10.1007/978-0-387-88146-1.
Kolaczyk, E. D. & Csárdi, G. (2014). Statistical analysis of network data with R. Cham, CH: Springer.
Ognyanova, K. (2017). Network visualization with R.
