While writing another blog post today that referred to Google Analytics, I wondered:
.@briankelly how many institutions are using google analytics? #iwmw12
— Martin Hawksey (@mhawksey) June 7, 2012
The response wasn’t promising:
RT @mhawksey: "how many institutions are using google analytics? #iwmw12". Anyone know answer?Suggest add tag #ganalyticsyes
— Brian Kelly (@briankelly) June 7, 2012
My thought was to detect the Google Analytics urchin code on website homepages. Knowing Tony Hirst had done something similar, I asked, and at 4:08pm the response was:
@briankelly @mhawksey looking for urchin: no, but could extend HE homepage feed auto detect scraper wiki to do is?
— Tony Hirst (@psychemedia) June 7, 2012
@mhawksey @briankelly there’s also a complementary view that maybe needs revising… views.scraperwiki.com/run/uk_he_feed…
— Tony Hirst (@psychemedia) June 7, 2012
At 4:17pm I replied:
@psychemedia @briankelly 1st pass bit.ly/Nk9HVd
— Martin Hawksey (@mhawksey) June 7, 2012
So how was it done?
I didn’t like the prospect of tweaking Tony’s ScraperWiki code, but spotted that he was getting a list of institutions from Universities UK. Using the Scraper Chrome Extension I was able to export all the institution URLs to a Google Spreadsheet.
Having played around with Google Analytics before, I knew that a site using it would have a unique profile ID in its source in the format UA-XXXXXX-X. I found a regular expression to extract it and used it in the following Google Apps Script:
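function getUA(url) {
  // Fetch the page, identifying the request with a custom User-Agent
  var requestData = {
    method : "get",
    headers: { "User-Agent":"http://docs.google.com"}
  };
  var html = UrlFetchApp.fetch(url,requestData).getContentText();
  // Match Google Analytics profile IDs of the form UA-XXXXXX-X
  var urlPattern = /\bUA-\d{4,10}-\d{1,4}\b/ig;
  // Return the first ID found; match() returns null when there is none,
  // so this line throws for pages without an ID
  return html.match(urlPattern)[0];
}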
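To try the regular expression without fetching anything, here is a minimal sketch that works on an HTML string and returns null instead of throwing when no ID is present. The function name findTrackingId and the sample HTML are illustrative assumptions, not part of the original spreadsheet.

```javascript
// Hypothetical helper (not from the original post): returns the first
// Google Analytics profile ID found in `html`, or null if there is none.
function findTrackingId(html) {
  var pattern = /\bUA-\d{4,10}-\d{1,4}\b/i;
  var match = html.match(pattern);
  return match ? match[0] : null;
}

// Made-up page sources for illustration only.
var withGa = "<script>_gaq.push(['_setAccount', 'UA-1234567-1']);</script>";
var withoutGa = "<html><body>No tracking here</body></html>";

console.log(findTrackingId(withGa));    // UA-1234567-1
console.log(findTrackingId(withoutGa)); // null
```

Returning null rather than indexing straight into the match result makes it easier to tell "no ID on this page" apart from a failed fetch.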
I could then use this as a custom formula in column C to extract an urchin code from each website. This worked for most sites, but I got a couple of errors, apparently for sites not using Google Analytics. Validating some of the results, I noticed the cause was that UrlFetchApp wasn’t following browser redirects, e.g. http://www.cardiffmet.ac.uk/ redirects to http://www3.cardiffmet.ac.uk/English/Pages/home2.aspx. This is a problem I’ve had before, so I recycled the code below, which uses expandurl.appspot.com to follow a link to its destination.
function extractLink(text){
  // create a url pattern
  var urlPattern = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  var feedproxyPattern = /(\b(http:\/\/feedproxy.google.com))/i;
  // extract link from email msg
  var url = text.match(urlPattern)[0];
  //if (feedproxyPattern.test(url)){ // if feedproxy url see if cached (or resolve end url)
    var cache = CacheService.getPublicCache(); // using Cache service to prevent too many urlfetch
    var cached = cache.get(url);
    if (cached != null) { // if value in cache return it
      return cached;
    }
    var requestData = {
      method : "get",
      headers: { "User-Agent":"GmailProductivitySheet - Google Apps Script"}
    };
    try {
      // try and get link endpoint using http://expandurl.appspot.com/
      var result = UrlFetchApp.fetch("http://expandurl.appspot.com/expand?url="+encodeURIComponent(url), requestData);
      var j = Utilities.jsonParse(result.getContentText());
      var link = (result.getResponseCode()===200)? Utilities.jsonParse(result.getContentText()).end_url:url;
    } catch(e) {
      // if http://expandurl.appspot.com/ doesn't work just return extracted url
      var link = url;
    }
    cache.put(url, link, 3600);
    return link;
  //}
  return url;
}
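The pattern in that function (check the cache, resolve on a miss, fall back to the original URL if the lookup fails) can be sketched in plain JavaScript. This is only an illustration: resolveWithCache, makeMemoryCache and the stub resolver are hypothetical stand-ins for CacheService and expandurl.appspot.com, so it runs outside Apps Script.

```javascript
// Hypothetical sketch: `cache` needs get/put, `resolve` maps a URL to its
// final destination and may throw (standing in for the expandurl lookup).
function resolveWithCache(url, cache, resolve) {
  var cached = cache.get(url);
  if (cached != null) {
    return cached; // cache hit: skip the lookup entirely
  }
  var link;
  try {
    link = resolve(url) || url; // empty answer: keep the original URL
  } catch (e) {
    link = url; // lookup failed: keep the original URL
  }
  cache.put(url, link);
  return link;
}

// Tiny in-memory cache with the same get/put shape (assumption for the demo).
function makeMemoryCache() {
  var store = {};
  return {
    get: function (key) {
      return Object.prototype.hasOwnProperty.call(store, key) ? store[key] : null;
    },
    put: function (key, value) { store[key] = value; }
  };
}

// Stub resolver using the redirect mentioned in the post.
var lookups = 0;
function stubResolve(url) {
  lookups++;
  return "http://www3.cardiffmet.ac.uk/English/Pages/home2.aspx";
}

var demoCache = makeMemoryCache();
var first = resolveWithCache("http://www.cardiffmet.ac.uk/", demoCache, stubResolve);
var second = resolveWithCache("http://www.cardiffmet.ac.uk/", demoCache, stubResolve);
console.log(first === second, lookups); // true 1
```

The second call is answered from the cache, which is the point of the design: each distinct URL costs at most one external fetch.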
Using this formula in column D for the rows with errors, I got a fresh URL to pass to the getUA function. Here’s the final spreadsheet (I’ve copied and pasted some of the formula results as values to save my quota) and the answer to my question:
134 institutional websites, 118 (88%) with Google Analytics code
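The percentage is just the count of rows holding a UA ID divided by the total number of rows. As a rough sketch, assuming the results column is available as an array of cell values (gaShare is a hypothetical name, and the sample array below is synthetic, built only to match the totals above):

```javascript
// Hypothetical helper: counts cells that look like a UA-XXXXXX-X profile ID
// and reports their share of all rows, rounded to a whole percent.
function gaShare(cells) {
  var total = cells.length;
  var withGa = cells.filter(function (cell) {
    return /^UA-\d{4,10}-\d{1,4}$/i.test(String(cell).trim());
  }).length;
  var percent = total ? Math.round(100 * withGa / total) : 0;
  return { total: total, withGa: withGa, percent: percent };
}

// Synthetic column: 118 placeholder IDs and 16 error cells.
var column = [];
for (var i = 0; i < 118; i++) { column.push("UA-1234567-1"); }
for (var k = 0; k < 16; k++) { column.push("#ERROR!"); }

var share = gaShare(column);
console.log(share.total, share.withGa, share.percent); // 134 118 88
```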
But as Ranjit Sidhu reminded me:
@mhawksey @briankelly but how many are ACTUALLY using GA more then then they did logfiles ?n
— Ranjit Sidhu (@rssidhu) June 7, 2012