Using Google Spreadsheet/Apps Script and Google Social Graph to get Twitter edges for visualizing in NodeXL and Gephi

One of the things I really liked about the network analysis and visualisation tool NodeXL which I wrote about last week was the built-in tools for grabbing data from Twitter. I said:

The advantage of NodeXL, particularly for graphing Twitter communities, is it has built-in features for grabbing the data for you. Not only that the coding is clever enough to handle the data collection for mere mortals, so when you hit your rate limit NodeXL waits until it should be able to get more data.

What I didn’t mention at the time was that it can take a long time to get complex network data (as in set it running overnight) and also I was having problems getting this to work. I haven’t looked closely at how NodeXL is generating the data but say for example I wanted to find out if the people I followed also followed each other. So I can get all the user ids of the people I follow using which give me 497 ids.
To see if @psychemedia also follows any of these I could get his list of friend ids and see how many id’s match. Assuming everyone I follow has less than 5000 friends (this is the maximum the Twitter API can return in one call) I can do this in 496 API calls. NodeXL also captures more user information (friend/follower counts, bio etc) which it can do in batches of 100 using
So in summary:

  • 1x my following ids
  • 496 x who are my friends following (but more if any of my friends follow more than 5000)
  • 4 x details user info

Which means at least 501 calls (with a 350 api calls per hour this has to be done in two batches). And if you want to look beyond that and seeing who are the friends of your friends it’s a lot more.
Fortunately there is a way to get this information a lot quicker. A separate API which is mentioned at the very end of Tony’s Visualising Twitter Friend Connections Using Gephi is the Google Social Graph API. The Social Graph API “makes information about public connections between people easily available and useful” and importantly includes connection information from Twitter. Here’s a demo page from Social Graph using my Twitter id as a starting point.
As I’ve already Ported: Tony Hirst’s Using Protovis to Visualise Twitter Connections which uses Social Graph to a Google Spreadsheet it was a quick hack to modify this to take a list of twitter usernames your interested in and create a two column edge list required by NodeXL, Gephi and other network visualisation tools.

How-to get Twitter edges using Google Social Graph

Before I start I should say the endpoint in this is using NodeXL but Gephi is also able to import the edge csv file generated with this spreadsheet.

  1. Grab a copy of this Google Spreadsheet template for Get Twitter Edges from Screen Names

    If you are having problems getting a copy of the template try opening this spreadsheet and File > Make a copy

  2. Paste a list of twitter usernames you want to graph connections for in the source sheet (I’ve a collection of Google Spreadsheets I’ve developed which can be used to get friends/follows, search terms and more. I also recently wrote about SocialBro which can do a .csv export of usernames (but as pointed out by Angelo you can’t export other peoples data)) If you want to play along here’s a list from #eas11
  3. In the Twitter Edges spreadsheet open Tools > Script editor…  and follow the instructions for registering for a Twitter API key and secret (if you have used any of my other Twitter/Google Spreadsheet mashups you can use the same key/secret).
  4. Once you’ve entered the key and secret don’t forget to Run > authorize
  5. Next on line 24 you need to enter the level of data. For this example lets stick with 1.5
  6. Now Run > getConnections.
  7. Once the the yellow status bar with Running getConnections disappears close the script window.
  8. Back in the spreadsheet view open the ‘Edges’ sheet which should be populated with data
  9. Select File > Download as … > CSV (Current Sheet)
  10. Start a new NodeXL template
  11. Open the CSV in Excel and copy the edges data to your new NodeXL template Edges sheet
  12. From the NodeXL ribbon menu click on Prepare Data (left hand side) and select ‘Merge Duplicate Edges’ and then again from Prepare Data select ‘Get Vertices from Edge List’

And now you can do the rest of your analysis of the data (if you haven’t generated Twitter network visualisations in NodeXL before you can follow parts of the the basic recipe here).
and here’s one I made earlier
#eas11 hashtag community
One very important caveat – the data from Social Graph isn’t 100% accurate and as I posted in Social media wars: Measuring the battle lines since Google+ has come along this data might be becoming less reliable.
PS In making this I found that when I passed 50 usernames to the Social Graph API I only got data back for 15 usernames. I wasn’t able to find any documentation on why. Reducing the call to 10 names at a time seemed to work (but means there are bugs in my other Social Graphs bits and pieces:(


Join the conversation

comment 3 comments
  • Angelo Centini

    Following your posts I tried to use both Nodexl and the method described in this post. I realized that for small networks Nodexl seems more efficient, instead for larger networks this solution seems more reliable. In fact you can leave your machine processing the download without interruptions. To download my tiny twitter network (about 250 people I’m following and people they’re following within my network) it took 2 hours with Nodexl. Same network, different settings (broader grade level and followers included) after an entire day I was able to obtain my Twitter community following relationships edges and nodes. For those interested here’s the simple graph I made using Nodexl data import and Gephi visualization platform (nodes weighted by mutual follows into my network).

  • Mark

    Been using NodeXL & Gephi for a while, this looked like a great way to get some of the NodeXL function in Google docs.
    I’ve run the export friends & followers V2.1.2 works great, but when I try to run this it throws an error:
    Invalid JSON string. (line 60)
    Is it possible to run getConnections from username in V2.1.2 or get round the problem I currently have in line 60.
    Sorry I’m not so good with the code, wondered if you had any advice?

Comments are closed.