Using Google Spreadsheets to combine Twitter and Google Analytics data to find your top content distributors

When you share a link on Twitter there are a number of services, like bit.ly, which allow you to track the impact of the url in terms of the number of clicks it attracts from other users. At the same time there are a number of ways to monitor people sharing links to your site, the most basic being using a Twitter search like this one for hawksey.info. Using these search results you could start extracting the follower information from the person tweeting, workout potential reach and so on, but wouldn’t you like to know, as with your own bit.ly account, how many visits someone else’s tweet generated to your site? Fortunately there is a way to do this and in this post I give you two tools to help you do it and look at why this information might be useful, but lets first look at how it is possible.

Referral Traffic

In August 2011 Twitter started automatically wrapping links over 19 characters in it’s own shortening service t.co, later in October started wrapping all links in t.co. When you start navigating around the web at each page you visit the server generally knows which page you came from (exceptions include direct traffic and when coming from https://). When you click on a t.co link the site can track where you came from, known as referral traffic (interestingly it looks like Twitter also bypass your url shortener of choice by following the redirects until it reaches a final destination and use that link in the t.co redirect). So when you click on a t.co link posted on Twitter I can detect that, that’s where you came from. Furthermore, each time a person tweets a link it gets wrapped in a unique t.co url even if that url has been shortened before with the exception being new style retweets. This means that when someone clicks on a link tweeted by someone I can trace it back to a single person.
Lets see how this works in practice. When I fire up my Google Analytics account and look at referral traffic I can see it’s dominated by t.co sources. Drilling down into the t.co data I can actually see how many visits each t.co link generated.

Searching Twitter for say http://t.co/wEbXrPah allows me to trace it back to this tweet:

#HowTo show a #cool interactive visualization of event hashtags: bit.ly/rtoVa9
— TweetSmarter (@TweetSmarter) February 23, 2012

So we can say this tweet and it subsequent 5 retweets generated 42 visits to my blog. At this point you might be saying but this tweet above has a bit.ly link, there’s no reference to t.co. It may say bit.ly but underneath the hyperlink is t.co:

For some this isn’t news. In fact, Tom Critchlow was writing about how Twitter’s t.co link shortening service is game changing – here’s why way back in August 2011. His post probably has a better explanation of what is happening and also includes a bookmarklet (appears broken after recent Analytics overhaul) which takes the t.co referral path from your Google Analytics report and searches it on Twitter to find out which persons tweet is sending you the traffic.
I thought this was a neat idea but wanted to get the full impact of seeing the visit count associated with a named person. I tried a couple of ways to inject the data using a bookmarklet with no joy so turned to Google Spreadsheets (and in particular Google Apps Script) to marry data from Twitter with Google Analytics. So I give you:

Method 1: Quick 7 day search

*** Twitter/Google Analytics Top Distributor Sheet v1.0 ***
[File > Make a copy for you own version]

With this Google Spreadsheet I can authenticate with my Google Analytics account which then allows me to extract t.co data using the Google Analytics Core Reporting API. I then pass each t.co link to the Twitter Search API to find out who tweeted it first. This is all wrapped in a custom formula getGATwitterRef(startDate, endDate, optional numberOfResults) which generates a table of results like this:

So big thanks go to Brian E. Bennett (@bennettscience) for generating 16 visits, Alberto Cottica (@alberto_cottica), @futuresoup and others for also generating traffic. But who generated the 14 visits? The reason this row is blank is while the link is still generating traffic to my site the link was first tweeted over 7 days ago meaning it is outwith the capabilities of the Twitter Search API.
What a terrible shame never mind 9 out of 10 isn’t bad … ah but hold on I’ve got a Google Spreadsheet template that can archive Twitter searches (TAGS). So I give you:

Method 2: TAGS v4.0 with Google Analytics integration

*** TAGS v4.0 with Google Analytics integration ***
[File > Make a copy for you own version]
Update: 11th Sept. 2014 TAGS has moved on and this method might no longer work

So by using a search term for your domain you can start collecting ten of thousands of tweets over months and years, which you then query against your Google Analytics data.
‘What the flip’ you might be asking. Here’s the explanation. By using the same search query at the beginning of this post for all tweets containing ‘hawksey.info’ (and because Twitter is wrapping everything in t.co it knows these even if they began life as a bit.ly or goo.gl) I can build up a corpus of tweets containing links to my site. If you also look at this archive I’m building you’ll see in the column labelled ‘text’ there is not a bit.ly or goo.gl in sight, all the links are t.co.
So all I need to do is extract a t.co referral path I’m interested in from my Google Analytics data and find who first tweeted it in my archive giving me the number of visits that link generated.
So now I can say thank you @BillMew for the mystery 14 visits, thank you @TweetSmarter for the 38 visits generated from your tweet last month (you’ll see some #N/A I think because my collection went offline for a bit, working sweet now)

Why

Whilst there could be a seriously creepy side to this (lets not forget people like Google have made serious bucks knowing where you go and what you share), there are a couple of reasons why I was interested in following up this concept. One was in relation to the Learning Registry/JLeRN experiment (background info here) which is trying to create a framework beyond metadata to include activity/paradata around resources. The idea is this data can be used to provide feedback and improve discoverability of resources. So potentially you’ve got some rich data to push into a node .. errm I think, maybe #haventplayedwiththespecyet
Another thought was during the OER Visualisation project I discovered that social sharing of resources appears, insert caveats, to be rare. If you could recognising and reward the people pushing content it might encourage them to distribute more (and as I highlighted in How do I ‘like’ your course? The value of Facebook recommendation there is real and measureable value in people distributing information through their networks) – the flip side to this being as soon as you start measuring something someone else will start gaming the system.
There’s also a degree of profiling you could do. If someone ends up at your resource having clicked on a linked shared by A that person may have something in common with them so you could target additional resources to them based on what A might like.
I’m sure there are others. As I’ve shared the tools to do this it’s only fair that you share your ideas in the comments.

But there is potentially more …

I will leave you with one last thought. I haven’t mentioned much about the code, which is available and open source (my bits anyway) via the Script Editor. To help construct the Google Analytics query I used the Data Feed Query Explorer. Here’s a permalink to the main query structure I used. If you open the link and hit the ‘Authenticate with Google Analytics’ button choosing one of your own analytics ids you can see what data comes back. I’ve been conservative only pulling what I need but if you click on the ‘dimensions’ box you can see I could also be pulling where the visits were coming from, time of day, and more. All potentially valuable intelligence to give you a picture of how a resource is being shared if you can unlock it of course.

Join the conversation

comment 21 comments

Jaspal

March 7, 2012 at 12:18 am

Martin, what can I say…magic!
BTW I got odd behaviour from GDocs when trying to run the scripts first time – I had to open the onOpen script in script editor and run it before GDocs ‘recognised’ any of the other scripts. All running fine now though .Thanks!

Martin Hawksey
March 7, 2012 at 7:52 am
Thanks Jaspal. Surprised social media metric sites haven’t picked up on this yet. Postrank already have authorised access to my GA account, so they have access to the data they just need to glue together.
(Haven’t worked out solution for the onOpen issue, might have to dig out documentation on the onInstall function)
Martin
Martin Hawksey
March 16, 2012 at 1:16 pm
@Jaspal found and opened an issue about the onOpen behaviour

Su Butcher

March 7, 2012 at 8:34 am

Hi Martin,
This is marvellous. I’ve been using your TAGS template after finding that ‘Twapperkeeper’ (awful name) was being bought out by Hootsuite and would be a paid for service, and as you know your advice has helped us sort out ArchitectMap too.
As a complete novice to coding and scripts (though enthusiastic!) I shall take a look and see if I can make this work for me. My motivation is thanking people on twitter who help you, and helping them help you more. Finding helpful people on twitter is one of my hobbies!

Johan Myrberger

March 7, 2012 at 12:42 pm

Really cool stuff, thanks! Will for sure give it a try, no doubt it will be useful.
On a related side-note; when using earlier versions of the TAGS spreadsheets, for popular tags, I run into issues when the spreadsheet gets too large. both when the “number of cells”-limit is reached (of course), but also (as it seems) when you get a large spreadsheet, but still with space left. Have you noticed this from your end? Any ideas of good workarounds?

Martin Hawksey
March 7, 2012 at 3:58 pm
Hi Johan, I don’t normally have a problem but I’m usually only interested in small data. On the occasion that I’ve needed to do this I download the data and clear the archive then merge the data offline (usually in excel)
Something I keep meaning to do but have never got round to doing is pushing the data into Google Fusion Tables which is possible with Apps Script.
Martin

Johan Myrberger

March 13, 2012 at 11:05 am

Hm, trying the template now…
For some reason I get an error in the sheet TopDistributors (cells B5 and B18) stating “error: Authorization is required to perform that action.”
The GA-authentication (and connection to specific profile) seems to have gone well though. Not sure where to start trouble-shooting, any quick hints?
(And pouring the data into Fusion Tables would be nice for sure)

Martin Hawksey
March 15, 2012 at 10:01 pm
@Johan that error message is caused if there’s no GA-authentication – only thing I can think of is something went wrong internally but if you are able to select the GA profile you want to use it suggests everything is working … hmm not sure, my default fall-back is make a fresh copy and try again. If you figure what went wrong it would be useful to know
- Martin Hawksey
  March 16, 2012 at 1:18 pm
  @Johan – experiencing a similar behaviour – I think it’s because the results are cached. I closed the spreadsheet and reopened later and it was all working fine
  Martin

Johan Myrberger

March 16, 2012 at 2:28 pm

Oh well, it’s working for me as well at the moment (although I did close and reopened the spreadsheet a couple of times when I first had the issue it didn’t work, but now when I tried it works like a charm..)
I started to think that it was due to the GA account I hooked up with was not “owned” by me, only shared with me, but apparently that was not the issue…
Have a nice weekend!

Google Analytics rolling out social network activity streams: Paradata heaven? – MASHe

March 20, 2012 at 4:01 pm

[…] up given the Twitter/Facebook/Google+ social network war it’s unlikely that you are going to see individual tweet analytics as I achieved here in the near future.Access to the data It’ll be interesting if Google will make ‘activity […]

Sheila’s work blog » @jisccetis - how others see us

March 22, 2012 at 1:27 pm

[…] change number 1, @jisccetis is following people now. Armed now with our found new twitter and google intelligence, we have a much clearer idea of who is retweeting our posts and driving traffic to our blogs and […]

Automatic translation of TAGS Twitter archives using Google Apps Script ‘Language’ services – MASHe

March 25, 2012 at 9:32 am

[…] solution to archive tweets in a Google Spreadsheet using Google Apps Script (must write more about v4) … ponder, ponder, 5 minutes later by adding: objects[i]["text_en"] = […]

Tracking activity: Diigo site bookmark activity data in a Google Spreadsheet (who is saving your stuff) JISC CETIS MASHe

May 23, 2012 at 8:19 pm

[…] resources, the beginnings of a PostRank style RSS social engagement tracker, and more recently Using Google Spreadsheets to combine Twitter and Google Analytics data to find your top content dist… (it’s been a eye-opener to see how much individual activity data there is … if you know where […]

RSS Feed Social Share Counting Google Spreadsheet JISC CETIS MASHe

June 11, 2012 at 10:30 pm

[…] and page view counts for out blog, category or project feeds. The solution was a quick add-on to Using Google Spreadsheets to combine Twitter and Google Analytics data to find your top content dist…, recycling some other bits of code I had lying around (mostly from Introducing a RSS social […]

How JISC CETIS dashboard social activity around blog posts using a splash of data science JISC CETIS MASHe

August 8, 2012 at 9:27 pm

[…] some existing code to get stats from Google Analytics made it easy to pass a url to one of two GA accounts and get stats back. Because Scott is on […]

Tony Hirst

August 24, 2012 at 9:06 am

“because Twitter is wrapping everything in t.co it knows these even if they began life as a bit.ly or goo.gl”
I just did a test on this: I think t.co urls map onto unique (user,url) pairs for original tweets/link sharing (not sure what happens if I RT someone and cut/replace an original link? I suspect my new t.co link will be used?), so if I tweet a link to hawksey.info, or a bitly shorturl to hawksey.info, they will receive different t.co links.
The Twitter search searches over resolved URLs, so it will turn up a tweet containing a link shared to hawksey.info as well as a teet containing a shorturl that points to hawksey.info
(I guess you know this – I’m just clarifying for myself…!)

Martin Hawksey
August 24, 2012 at 9:38 am
yes Twitter appears to follow redirects and records the destination (which doesn’t appear to be publicly accessible via the api), the t.co then points to the to the url submitted by the user.
Something I hadn’t noticed until now is in the entities data returned (example below) it would be easy for a twitter client to replace a t.co with the original link. I’m not aware of any clients doing this and I imagine if twitter found out they would be shutting their api access sharpish.
{"urls":[{"expanded_url":"http://bit.ly/O9lxOI","indices":[81,101],"display_url":"bit.ly/O9lxOI","url":"http://t.co/Jr7yuADG"}],"hashtags":[{"text":"moocmooc","indices":[102,111]}],"user_mentions":[]}

Comments are closed.