CFHE12 Analysis: Summary of Twitter activity

You may have noticed I missed an analysis of week 5 of CFHE12, but hopefully I’ll capture what I wanted to say and more in this post. Here I want to pull together a couple of ideas around some of the measurable user activity generated as part of CFHE12. This will mainly focus on Twitter and blogs, and will ignore other channels like the course discussion forum, webinar chat or other spaces participants might have discovered or created. I conclude that there are some simple opportunities to incorporate data from Twitter into other channels, such as summaries of questions and retweets.

Twitter Activity Overview

The headline figures for tweets matching the search ‘#CFHE12 OR OR OR’  for 8th October to 18th November 2012 (GMT):

  • 1,914 Tweets from 489 different Twitter accounts
  • 1,066 links shared
  • 10% (n=206) of the tweets were in @reply to another Twitter account
  • Contributions from accounts in 45 countries (top 5: United States of America – 166; Canada – 50; United Kingdom – 43; Australia – 28; Germany – 12) [1]

Map of #cfhe12 participants

Looking at the week-by-week distribution of contributors and contributions, it can be seen that after the first two weeks the number of tweets posted each week remained consistent at around 200. 75% of the Twitter accounts (n=374) contributed tweets in only one week of CFHE12.
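As an aside, the week-by-week roll-up behind these figures is straightforward to reproduce from a tweet archive. A minimal sketch in Python (the `user`/`week` field names and the sample data are hypothetical, not the actual archive schema):

```python
from collections import defaultdict

def weekly_activity(tweets):
    """Summarise a tweet archive into (tweets per week,
    number of distinct weeks each account contributed in)."""
    tweets_per_week = defaultdict(int)
    weeks_per_user = defaultdict(set)
    for t in tweets:
        tweets_per_week[t["week"]] += 1
        weeks_per_user[t["user"]].add(t["week"])
    weeks_active = {u: len(ws) for u, ws in weeks_per_user.items()}
    return dict(tweets_per_week), weeks_active

# Hypothetical sample data
sample = [
    {"user": "alice", "week": 1},
    {"user": "alice", "week": 2},
    {"user": "bob", "week": 1},
]
per_week, weeks_active = weekly_activity(sample)
# per_week → {1: 2, 2: 1}; weeks_active → {'alice': 2, 'bob': 1}
```

Filtering `weeks_active` for accounts with a count of 1 gives the 75% figure above.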

Twitter contributors and tweets  Number of weeks participants contributed in

Comparing Twitter activity with blog posts aggregated with gRSSHopper doesn’t reveal any correlation, but it’s interesting to note that whilst the volume of tweets remains relatively consistent for weeks 3 to 6, there is a significant drop in blog posts between weeks 4 and 5.

CFHE12 number of tweets and blogs

Looking at the distribution of tweets by day of week and time of day shows a lull on Thursdays, but a peak around 1900hrs GMT which would appear to coincide with the usual time slot used for tweetchats.

CFHE12 Tweets by day of week  CFHE12 Tweets by time of day

Getting more technical: having collected the friend/follower relationships for Twitter accounts using #CFHE12 for each of the weeks, it is possible to analyse new connections between community members. At the end of week 1 the top 52 contributors were joined by 286 follow relationships. By the end of the course 45 new follow relationships had been created, increasing the graph density [0.103774 –> 0.120101597] and reducing the geodesic distance [2.008136 –> 1.925296].
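For anyone wanting to reproduce these two measures, neither needs a graph library. A minimal pure-Python sketch, treating follows as a directed graph (the toy data is illustrative, not the CFHE12 network):

```python
from collections import deque

def density(n, edges):
    """Directed graph density: actual edges over possible ordered pairs."""
    return len(edges) / (n * (n - 1))

def mean_geodesic(nodes, edges):
    """Average shortest-path length over all reachable ordered pairs,
    via a breadth-first search from each node."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
    total = pairs = 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if v != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Toy example: a follows b, b follows c
toy_nodes = ["a", "b", "c"]
toy_edges = [("a", "b"), ("b", "c")]
d = density(len(toy_nodes), toy_edges)   # 2/6 ≈ 0.333
g = mean_geodesic(toy_nodes, toy_edges)  # (1 + 2 + 1) / 3 ≈ 1.33
```

Tools like NodeXL report the same metrics directly, but seeing the calculation makes it clearer why 45 extra edges nudge the density up and the average path length down.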

The graph below highlights the new relationships (bold lines). Nodes are sized by the number of new connections (as part of the archive the friend/follower count is captured with each tweet, so it may be possible to do further analysis). It’s interesting to note that BarnetteAndrew is an isolated node in G7.

cfhe12 interconnection growth

Refining the signals from the Twitter feed

Adam Cooper (CETIS) has recently posted some tips from a presentation by John Campbell on the development of Signals at Purdue, which include using a spreadsheet as a starting point to find out what you need. I’ve already got some basic tools to overview a Twitter archive but used the CFHE12 data to experiment with some more.

By week

Adding to the existing Twitter Activity sparklines, it’s been possible to extract a summary of week-by-week activity (a basic traffic light). Whilst this in a way duplicates the data rendered in the sparkline, it has been useful for filtering the participants based on queries like ‘who has contributed in week 1’ and ‘who has contributed in all the weeks’. If you wanted to take this to the next level you’d combine it with the community friendship graph and pay extra attention to the activity of your sub-community bridges (for more info on this see Visualizing Threaded Conversation Networks: Mining Message Boards and Email Lists for Actionable Insights).

CFHE12 Activity data

Conversation matrix

Graphing the conversations (@reply, @mentions and retweets) using TAGSExplorer gives this ball of mess.

CFHE12 TAGSExplorer View

Trying to find a more structured way of presenting the data, I’ve experimented with an adjacency matrix (shown below, or interactive version here)[2]. Each cell is colour-coded to indicate the number of interactions (replies, mentions and retweets) between users. For example, we can see that gsiemens has had the most interactions with barrydahl. Scanning along the rows gives you a sense of whether a person was interacting with a large number of other accounts or a select few. For example, pgsimoes has interactions mostly with suifaijohnmak. Filtering tweets for pgsimoes (possible from the link in the left column), it looks like it’s an automatic syndication of suifaijohnmak’s blog posts.
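The counting behind such a matrix can be sketched simply: treat every @name appearing in a tweet’s text (which covers replies, mentions and retweets, since retweets embed the original @name) as an interaction from the sender to that account. A minimal Python sketch, with hypothetical field names and invented sample tweets:

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def interaction_matrix(tweets, users):
    """Count interactions (any @-mention, covering replies, mentions
    and retweets) from each user in `users` to every other, returning
    a row-per-sender adjacency matrix."""
    counts = Counter()
    for t in tweets:
        for target in MENTION.findall(t["text"].lower()):
            if target in users and target != t["user"]:
                counts[(t["user"], target)] += 1
    return [[counts[(row, col)] for col in users] for row in users]

# Hypothetical sample
users = ["gsiemens", "barrydahl"]
tweets = [
    {"user": "barrydahl", "text": "RT @gsiemens: openness in HE #cfhe12"},
    {"user": "gsiemens", "text": "@barrydahl good point #cfhe12"},
]
matrix = interaction_matrix(tweets, users)
# matrix → [[0, 1], [1, 0]]
```

The resulting grid of counts is what the conditional formatting in the spreadsheet turns into a heatmap.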

CFHE12 conversation adjacency matrix

Do you have any questions?

CFHE12 Any Questions?

At the beginning of CFHE12 I posted Any Questions? Filtering a Twitter hashtag community for questions and responses. This is a crude tool which filters for tweets containing ‘?’, which might indicate they are a question. By counting the number of tweets in the archive which reply to a question you get the following breakdown:

  • Total questions 309
  • Questions with replies 44
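The counting behind this breakdown can be sketched in a few lines. A minimal Python version, assuming each archived tweet carries an id, its text and the id of the tweet it replies to (field names are hypothetical):

```python
def question_stats(tweets):
    """Flag tweets containing '?' as candidate questions and count how
    many of them received at least one reply within the archive."""
    question_ids = {t["id"] for t in tweets if "?" in t["text"]}
    answered = {t["in_reply_to"] for t in tweets
                if t["in_reply_to"] in question_ids}
    return len(question_ids), len(answered)

# Hypothetical sample: two questions, one of which gets a reply
sample = [
    {"id": 1, "text": "What counts as openness? #cfhe12", "in_reply_to": None},
    {"id": 2, "text": "Is HE broken? #cfhe12", "in_reply_to": None},
    {"id": 3, "text": "@someone access, mainly #cfhe12", "in_reply_to": 1},
]
questions, replied = question_stats(sample)
# questions → 2, replied → 1
```

Crude, as noted: rhetorical questions and quoted ‘?’s are false positives, and any reply posted without the hashtag never enters the archive at all.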

An important point to note is that, as only tweets that meet the search criteria are archived, there may be more responses ‘off tag’. The danger in a medium like Twitter, used in courses like CFHE12, is that questions may go unanswered, misconceptions left uncorrected and feedback never given.

Part of the issue in the case of CFHE12 is that participants are allowed to use tools like Twitter as they please. While this suits ‘visitors and residents’, some additional structure may be beneficial. Two simple approaches would be to direct participants to include the course tag in their replies, and to highlight current questions either by using the ‘Any Questions’ tool or, in the case of courses using gRSSHopper, by filtering the search for ‘CFHE12 AND ?’ in the daily email alert instead of including all the latest course tweets.


Retweets

For a while I’ve had a basic function that extracts the tweets with the most retweets, but <sigh> I think it’s on the list of developments I’ve never got round to blogging about. The routine is a relatively simple bean counter that goes through a list of tweets, removes any hyperlinks, crudely shortens the text by 90% (to account for any annotations) and counts the number of matches before returning a list of the top 12. Slicing the data for each week I get these tables. There is probably more analysis required of what is being retweeted before making a decision about how this data could be used. My main question for the #cfhe12 Twitter community is whether they have a sense of what is being retweeted the most. The other angle is pushing some of this data into other communication channels like the Daily Newsletter or discussion forums.
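A sketch of how such a bean counter might look in Python. This is my reading of the routine rather than the actual code: I’ve interpreted ‘shortens the text by 90%’ as matching on roughly the first 90% of each tweet’s characters, so trailing annotations (‘<- read!’ etc.) don’t break the match, and the sample tweets are invented:

```python
import re
from collections import Counter

LINK = re.compile(r"https?://\S+")

def top_retweets(texts, n=12):
    """Strip hyperlinks, truncate each tweet to ~90% of its length
    (to tolerate trailing annotations), then count prefix-matching
    duplicates and return the n most frequent."""
    counts = Counter()
    cores = []  # canonical shortened texts seen so far
    for text in texts:
        t = LINK.sub("", text).strip()
        t = t[: int(len(t) * 0.9)]
        for core in cores:
            # treat a prefix match either way round as the same tweet
            if t.startswith(core) or core.startswith(t):
                counts[core] += 1
                break
        else:
            cores.append(t)
            counts[t] += 1
    return counts.most_common(n)

# Hypothetical sample
sample = [
    "RT @gsiemens: new post on openness http://example.com #cfhe12",
    "RT @gsiemens: new post on openness http://example.com #cfhe12 <- read!",
    "something else entirely #cfhe12",
]
top = top_retweets(sample, n=2)
```

The first two tweets collapse into one entry with a count of 2 despite the annotation, which is the behaviour the crude shortening is there to buy.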


So hopefully the summary data I extracted and the experimentation with new data views have been useful. Something that has been reinforced in my own mind is that more value could easily be gained from Twitter by providing guidance on its use and/or incorporating data from Twitter into other channels more effectively (rather than dumping everything into a daily email, select some key data). Now that I’ve got a template which splits some of the data into weekly slices it should be easier to deploy and pass data into other systems by changing some of the dates on the summary sheet.

But what do you think? Are there any particular data views you found useful? If you’ve participated in a community that uses Twitter a lot what additional tools would you find useful to keep track of what is going on?

Get the Data


[1] Countries were reconciled by extracting the location recorded in each Twitter account profile and generating geo-coordinates using the recipe here. Locations were extracted for 404 accounts. Co-ordinates were uploaded to GeoCommons and analysed with a boundary aggregation to produce this dataset.

[2] The matrix was generated by exporting conversation data from TAGSExplorer (by adding the query &output=true to the url, e.g. like this), importing into NodeXL and then filtering vertices based on a list of top contributors. This was exported as a Matrix Workbook and imported into the Google Spreadsheet. Conditional formatting was used to heatmap the cells.


  1. Hi Martin-
    I continue to be amazed what you can do with Google docs!

    Concerning the by-week tweeting and blogging: is there anything in the contents to give a hint as to the explanation? A quick hypothesis: to begin with people used twitter and to give instant views/reactions, with a significant number of tweets being about the course; more considered blog posts on the state of HE took a bit longer and after a few posts people are starting to have expressed their extended ideas and made their initial “about the course” posts; but twitter is stable from week 3 as baseline discourse.

    Looking at the adjacency matrix made me wonder whether you have tried clustering and re-ordering. I’m used to the way heatmaps work like this in R (although not using SNA-type modularity calcs) but I also found this rather neat visualisation made using D3 – .

    Cheers, Adam

    1. Hi Adam,

      Content is increasingly on my mind. As well as shuffling around your Text-Mining-Weak-Signals (pre-plunge) I’ve been wondering about Speech Act Analysis as a way of delving in a bit more.

      I had come across the D3 adjacency network from this presentation: A Fast and Dirty Intro to NetworkX (and D3) which uses this modified example. The sort is very nice, would be better if there was a control for the heatmap. The presentation has some other techniques and is worth an explore.

