Visual Analytics: Comparison of @SCOREProject and @UKOER (and template for making your own)

Martin Hawksey

12 years ago

Lou McGill from the JISC/HEA OER Programme Synthesis and Evaluation team recently contacted me as part of the OER Review, asking if there was a way to analyse and visualise the Twitter followers of @SCOREProject and @ukoer. Having recently extracted data for the @jisccetis network of accounts, I knew it was easy to get the information, but making it meaningful was another question.

There are a growing number of sites like twiangulate.com and visual.ly that make it easy to generate numbers and graphics. One limitation I find with these tools is that they produce flat images, so all opportunities for ‘visual analytics’ are lost.

Twiangulate data

create infographics with visual.ly

So here’s my take on the problem: a template constructed with free and open source tools that lets you visually explore the @SCOREProject and @ukoer Twitter following.

In this post I’ll give my narrative on the SCOREProject/UKOER Twitter followership and give you the basic recipe for creating your own comparisons (I should say that the solution isn’t production quality, but I need to move on to other things, so someone else can tidy up).

Let’s start with the output. Here’s a page comparing the Twitter following of SCOREProject and UKOER. At the top, each bubble represents someone who follows SCOREProject or UKOER (hovering over a bubble shows who they are, and clicking it filters the summary table at the bottom).

Bubble size matters

There are three options to change how the bubbles are sized:

Betweenness Centrality (a measure of community bridging capacity; see Sheila’s post on this);
In-Degree (how many other people who follow SCOREProject or ukoer also follow the person represented by the bubble); and
Followers count (how many people follow the person represented by the bubble).
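To make the first two measures a bit more concrete, here is a minimal Python sketch that computes in-degree and (unnormalised) betweenness centrality for a tiny made-up follower network. It is not the R script used for this post; the account names are hypothetical, each edge is assumed to be a unique (follower, followed) pair, and the betweenness function is a plain implementation of Brandes' algorithm for unweighted directed graphs.

```python
from collections import deque


def in_degree(edges):
    """Count, for each account, how many accounts in the network follow it."""
    counts = {}
    for follower, followed in edges:
        counts.setdefault(follower, 0)
        counts[followed] = counts.get(followed, 0) + 1
    return counts


def betweenness(edges):
    """Unnormalised betweenness via Brandes' algorithm (unweighted, directed)."""
    nodes = sorted({node for edge in edges for node in edge})
    adjacency = {node: [] for node in nodes}
    for follower, followed in edges:
        adjacency[follower].append(followed)

    score = {node: 0.0 for node in nodes}
    for source in nodes:
        # Breadth-first search counting shortest paths from the source.
        order = []
        preds = {node: [] for node in nodes}
        sigma = {node: 0 for node in nodes}
        dist = {node: -1 for node in nodes}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {node: 0.0 for node in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score


# Hypothetical accounts: "bridge" sits between two pairs of people.
edges = [
    ("alice", "bridge"),
    ("bob", "bridge"),
    ("bridge", "carol"),
    ("bridge", "dave"),
]
degrees = in_degree(edges)
bridging = betweenness(edges)
```

In this toy network every shortest path from alice or bob to carol or dave runs through "bridge", so it gets the highest betweenness, which is the sort of account that shows up as a large bubble in the betweenness view.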

Clicking on the ‘Grouped’ button lets you see which bubbles/people follow SCOREProject, UKOER or both. By switching between betweenness, in-degree and followers we can visually spot a couple of things:

Betweenness Centrality: SCOREProject has 3 well connected intercommunity bubbles: @GdnHigherEd, @gconole and @A_L_T. UKOER has SCOREProject following it, which unsurprisingly makes SCOREProject a great bridge to the SCOREProject community (if you are wondering where UKOER is: as it doesn’t follow SCOREProject, it doesn’t appear).
In-Degree: Switching to In-Degree we can visually see that the overall volume of the UKOER group grows more, despite the SCOREProject bubble in this group decreasing substantially. This suggests to me that the UKOER following is more interconnected.
Followers count: Here we see SCOREProject is the biggest winner thanks to being followed by @douglasi, who has over 300,000 followers. So whilst SCOREProject is followed by fewer people than UKOER, it has a potentially greater reach if @douglasi ever retweeted a message.
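The grouped view boils down to a three-way split of the two follower lists. As a minimal sketch (the function name and the screen names are made up for illustration, and this is not how the page itself does it), the split can be expressed with Python sets:

```python
def group_followers(score_followers, ukoer_followers):
    """Split followers into the three clusters shown in the grouped view."""
    score, ukoer = set(score_followers), set(ukoer_followers)
    return {
        "SCOREProject only": sorted(score - ukoer),
        "UKOER only": sorted(ukoer - score),
        "both": sorted(score & ukoer),
    }


# Hypothetical screen names, for illustration only.
groups = group_followers(["ann", "ben", "cat"], ["cat", "dan"])
```

Each follower lands in exactly one of the three groups, so the relative size of the clusters gives a quick read on how much the two audiences overlap.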

Colourful combination

Sticking with the grouped bubble view, we can see different colour groupings within the clusters for SCOREProject, UKOER and both. The most noticeable is the light green used to identify Group 4, which has 115 people following SCOREProject compared to 59 following UKOER. The groupings are created using the community structure detection algorithm proposed by Joerg Reichardt and Stefan Bornholdt. To give a sense of who these sub-groups might represent, individual wordclouds have been generated based on the individual Twitter profile descriptions. Clicking on a word within these clouds filters the table. So, for example, you can explore who has used the term manager in their Twitter profile (I have to say the update isn’t instant, but it’ll get there).
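As a rough illustration of what feeds a wordcloud like this, here is a minimal Python sketch that counts words across a group's profile descriptions and finds the people who used a given term. The profiles, the stopword list and the function names are all assumptions for the example; the page itself does this with d3.js and the Google table.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, just for the example.
STOPWORDS = {"the", "and", "for", "with", "into"}


def tokenize(text):
    """Lowercase a profile description and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(descriptions):
    """Count word frequencies across a group's profile descriptions."""
    counts = Counter()
    for text in descriptions:
        for word in tokenize(text):
            if len(word) > 2 and word not in STOPWORDS:
                counts[word] += 1
    return counts


def profiles_matching(profiles, term):
    """Return the accounts whose description contains the clicked term."""
    return [name for name, text in profiles.items() if term in tokenize(text)]


# Hypothetical profiles, for illustration only.
profiles = {
    "user_a": "E-learning manager at a university",
    "user_b": "Project manager and OER advocate",
    "user_c": "Lecturer in open education",
}
counts = word_counts(profiles.values())
managers = profiles_matching(profiles, "manager")
```

The counts would drive the word sizes in the cloud, and the matching list is the equivalent of the filtered table you get after clicking a word.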

Behind the scenes

The bubble chart is coded in d3.js and based on Animated Bubble Chart by Jim Vallandingham. The modifications I made were to allow bubble resizing (lines 37-44). This also required handling the bubble charge slightly differently (line 118). I got the idea of using the bubble chart for comparison from a Twitter Abused post Rape Culture and Twitter Abuse. It also made sense to reuse Jim’s template, which uses Twitter Bootstrap. The wordclouds are also rendered in d3.js, using the d3.wordcloud extension by Jason Davies. Finally, the table at the bottom is rendered using the Google Visualisation API/Google Chart Tools.
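To give a feel for what bubble resizing involves, here is a minimal Python sketch of a radius scale that is rebuilt whenever the sizing measure changes. It assumes a square-root mapping (so bubble area grows roughly in line with the value) and made-up minimum and maximum radii; the actual chart code may scale things differently.

```python
import math


def radius_scale(values, min_radius=2.0, max_radius=85.0):
    """Build a value -> radius function for the currently selected measure.

    The radius limits are illustrative defaults, not values from the chart.
    """
    top = max(values)

    def scale(value):
        if top <= 0:
            return min_radius
        return min_radius + (max_radius - min_radius) * math.sqrt(value / top)

    return scale


# Hypothetical follower counts; switching measure means building a new scale.
followers = [0, 100, 400]
scale = radius_scale(followers)
radii = [scale(value) for value in followers]
```

Because the scale is tied to the largest value of whichever measure is selected, one account with a huge follower count shrinks everyone else's bubble in the followers view, which matches what you see on the page.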

All the components play nicely together although the performance isn’t great. If I have more time I might play with the load sequencing, but it could be I’m just asking too much of things like the Google Table chart rendering 600 rows.

How to make your own

I should say that this recipe probably won’t work for accounts with over 5,000 followers. It also involves using R (in my case RStudio). R is used to do the network analysis/community detection side. You can download a copy of the script here. There’s probably an easier recipe that skips this part, which would be worth revisiting.

We start with taking a copy of Export Twitter Friends and Followers v2.1.2 [Network Mod] (as featured in Notes on extracting the JISC CETIS twitter follower network).
Authenticate the spreadsheet with Twitter (instructions in the spreadsheet) and then get the followers of the accounts you are interested in using the Twitter > Get followers menu option.
Once you’ve got the followers run Twitter > Combine follower sheets Method II
Move to the Vertices sheet and sort the data on the friends_count column
In batches of around 250 rows select values from the id_str column and run TAGS Advanced > Get friend IDs – this will start populating the friends_ids column with data. For users with over 5,000 friends reselect their id_str and rerun the menu option until the ‘next_cursor’ equals 0
Next open the Script editor and open the TAGS4 file and then Run > setup.
Next select Publish > Publish as a service… and allow anyone to invoke the service anonymously. Copy the service URL and paste it into the R script downloaded earlier (also add the spreadsheet key to the R script, and within your spreadsheet select File > Publish to the web).
Run the R script! … and fingers crossed everything works.
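To show what the friend IDs are for, here is a minimal Python sketch that turns an export of the Vertices sheet into follower-to-followed edges, keeping only links between people who are themselves in the sheet. It assumes the export is a CSV with id_str and friends_ids columns and that friends_ids holds a comma-separated list of IDs; the sample data is made up, and the R script does the real work in this recipe.

```python
import csv
import io


def follower_edges(csv_text):
    """Build (follower, followed) pairs between accounts listed in the sheet."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    known = {row["id_str"] for row in rows}
    edges = []
    for row in rows:
        for friend_id in row["friends_ids"].split(","):
            friend_id = friend_id.strip()
            # Skip friends outside the follower set, and self-links.
            if friend_id in known and friend_id != row["id_str"]:
                edges.append((row["id_str"], friend_id))
    return edges


# Hypothetical export: account 1 follows 2, 3 and an outsider (99).
sample = 'id_str,friends_ids\n1,"2,3,99"\n2,"1"\n3,""\n'
edges = follower_edges(sample)
```

An edge list like this is the kind of input the network analysis and community detection steps need.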

The files used in the SCOREProject/UKOER comparison can be downloaded from here. Changes you’ll need to make are adding the output csv files to the data folder, changing references in js/gtable.js and js/wordcloud.js, and changing the labels used in coffee/coffee.vis.

So there you go. I’ve spent way too much of my own time on this and haven’t really explained what is going on. Hopefully the various commenting in the source code removes some of the magic (I might revisit the R code, as in some ways I think it deserves a post of its own). If you have any questions or feedback, leave them in the comments 😉