By @mhawksey

Using Google Spreadsheets to combine Twitter and Google Analytics data to find your top content distributors

When you share a link on Twitter there are a number of services, like bit.ly, which allow you to track the impact of the url in terms of the number of clicks it attracts from other users. At the same time there are a number of ways to monitor people sharing links to your site, the most basic being using a Twitter search like this one for hawksey.info. Using these search results you could start extracting the follower information from the person tweeting, workout potential reach and so on, but wouldn’t you like to know, as with your own bit.ly account, how many visits someone else’s tweet generated to your site? Fortunately there is a way to do this and in this post I give you two tools to help you do it and look at why this information might be useful, but lets first look at how it is possible.

Referral Traffic

In August 2011 Twitter started automatically wrapping links over 19 characters in it’s own shortening service t.co, later in October started wrapping all links in t.co. When you start navigating around the web at each page you visit the server generally knows which page you came from (exceptions include direct traffic and when coming from https://). When you click on a t.co link the site can track where you came from, known as referral traffic (interestingly it looks like Twitter also bypass your url shortener of choice by following the redirects until it reaches a final destination and use that link in the t.co redirect). So when you click on a t.co link posted on Twitter I can detect that, that’s where you came from. Furthermore, each time a person tweets a link it gets wrapped in a unique t.co url even if that url has been shortened before with the exception being new style retweets. This means that when someone clicks on a link tweeted by someone I can trace it back to a single person.
Lets see how this works in practice. When I fire up my Google Analytics account and look at referral traffic I can see it’s dominated by t.co sources. Drilling down into the t.co data I can actually see how many visits each t.co link generated.
 
Searching Twitter for say http://t.co/wEbXrPah allows me to trace it back to this tweet:

So we can say this tweet and it subsequent 5 retweets generated 42 visits to my blog. At this point you might be saying but this tweet above has a bit.ly link, there’s no reference to t.co. It may say bit.ly but underneath the hyperlink is t.co:

For some this isn’t news. In fact, Tom Critchlow was writing about how Twitter’s t.co link shortening service is game changing – here’s why way back in August 2011. His post probably has a better explanation of what is happening and also includes a bookmarklet (appears broken after recent Analytics overhaul) which takes the t.co referral path from your Google Analytics report and searches it on Twitter to find out which persons tweet is sending you the traffic.
I thought this was a neat idea but wanted to get the full impact of seeing the visit count associated with a named person. I tried a couple of ways to inject the data using a bookmarklet with no joy so turned to Google Spreadsheets (and in particular Google Apps Script) to marry data from Twitter with Google Analytics. So I give you:

Method 1: Quick 7 day search

*** Twitter/Google Analytics Top Distributor Sheet v1.0 ***
[File > Make a copy for you own version]

With this Google Spreadsheet I can authenticate with my Google Analytics account which then allows me to extract t.co data using the Google Analytics Core Reporting API. I then pass each t.co link to the Twitter Search API to find out who tweeted it first. This is all wrapped in a custom formula getGATwitterRef(startDate, endDate, optional numberOfResults) which generates a table of results like this:

So big thanks go to Brian E. Bennett (@bennettscience) for generating 16 visits, Alberto Cottica (@alberto_cottica), @futuresoup and others for also generating traffic. But who generated the 14 visits? The reason this row is blank is while the link is still generating traffic to my site the link was first tweeted over 7 days ago meaning it is outwith the capabilities of the Twitter Search API.
What a terrible shame never mind 9 out of 10 isn’t bad … ah but hold on I’ve got a Google Spreadsheet template that can archive Twitter searches (TAGS). So I give you:

Method 2: TAGS v4.0 with Google Analytics integration

*** TAGS v4.0 with Google Analytics integration ***
[File > Make a copy for you own version]
Update: 11th Sept. 2014 TAGS has moved on and this method might no longer work

So by using a search term for your domain you can start collecting ten of thousands of tweets over months and years, which you then query against your Google Analytics data.
‘What the flip’ you might be asking. Here’s the explanation. By using the same search query at the beginning of this post for all tweets containing ‘hawksey.info’ (and because Twitter is wrapping everything in t.co it knows these even if they began life as a bit.ly or goo.gl) I can build up a corpus of tweets containing links to my site. If you also look at this archive I’m building you’ll see in the column labelled ‘text’ there is not a bit.ly or goo.gl in sight, all the links are t.co.
So all I need to do is extract a t.co referral path I’m interested in from my Google Analytics data and find who first tweeted it in my archive giving me the number of visits that link generated.
So now I can say thank you @BillMew for the mystery 14 visits, thank you @TweetSmarter for the 38 visits generated from your tweet last month (you’ll see some #N/A I think because my collection went offline for a bit, working sweet now)

Why

Whilst there could be a seriously creepy side to this (lets not forget people like Google have made serious bucks knowing where you go and what you share), there are a couple of reasons why I was interested in following up this concept. One was in relation to the Learning Registry/JLeRN experiment (background info here) which is trying to create a framework beyond metadata to include activity/paradata around resources. The idea is this data can be used to provide feedback and improve discoverability of resources. So potentially you’ve got some rich data to push into a node .. errm I think, maybe #haventplayedwiththespecyet
Another thought was during the OER Visualisation project I discovered that social sharing of resources appears, insert caveats, to be rare. If you could recognising and reward the people pushing content it might encourage them to distribute more (and as I highlighted in How do I ‘like’ your course? The value of Facebook recommendation there is real and measureable value in people distributing information through their networks) – the flip side to this being as soon as you start measuring something someone else will start gaming the system.
There’s also a degree of profiling you could do. If someone ends up at your resource having clicked on a linked shared by A that person may have something in common with them so you could target additional resources to them based on what A might like.
I’m sure there are others. As I’ve shared the tools to do this it’s only fair that you share your ideas in the comments.

But there is potentially more …

I will leave you with one last thought. I haven’t mentioned much about the code, which is available and open source (my bits anyway) via the Script Editor. To help construct the Google Analytics query I used the Data Feed Query Explorer. Here’s a permalink to the main query structure I used. If you open the link and hit the ‘Authenticate with Google Analytics’ button choosing one of your own analytics ids you can see what data comes back. I’ve been conservative only pulling what I need but if you click on the ‘dimensions’ box you can see I could also be pulling where the visits were coming from, time of day, and more. All potentially valuable intelligence to give you a picture of how a resource is being shared if you can unlock it of course.

Exit mobile version