88% of Universities UK members use Google Analytics: How I calculated the answer in under 10 minutes

While writing another blog post today, which included a reference to Google Analytics, I pondered:

The response wasn’t promising:

My thought was to detect the Google Analytics urchin code on website homepages. Knowing Tony Hirst had done something similar, I asked, and at 4:08pm the response was:

At 4:17pm

So how was it done?

I didn’t like the prospect of tweaking Tony’s ScraperWiki code, but I spotted that he was getting a list of institutions from Universities UK. Using the Scraper Chrome Extension I was able to export all the institution URLs to a Google Spreadsheet:

Scraper Window

Having played around with Google Analytics before, I knew that a site using it would have a unique profile ID in its source in the format UA-XXXXXX-X. I found a regular expression to extract it and used it in the following Google Apps Script:

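function getUA(url) {
  // identify the request with a custom User-Agent header
  var requestData = {
    method : "get",
    headers: { "User-Agent":"http://docs.google.com"}
  };
  var html = UrlFetchApp.fetch(url,requestData).getContentText();
  // Google Analytics profile IDs look like UA-XXXXXX-X
  var urlPattern = /\bUA-\d{4,10}-\d{1,4}\b/ig;
  // match() returns null when no ID is found, so [0] throws and the cell shows an error
  return html.match(urlPattern)[0];
}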
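To see what the regular expression matches without making any network calls, here is a minimal sketch in plain JavaScript. The function name `findTrackingIds` and the sample HTML strings are hypothetical, made up for illustration. Unlike `getUA`, it returns an empty array rather than throwing when nothing matches.

```javascript
// Hypothetical helper (not part of the original script): returns every
// distinct UA-style ID found in a string of HTML, or [] if there are none.
function findTrackingIds(html) {
  var pattern = /\bUA-\d{4,10}-\d{1,4}\b/ig; // same pattern as getUA
  var matches = html.match(pattern) || [];   // match() returns null when nothing is found
  var seen = {};
  return matches.filter(function (id) {
    var key = id.toUpperCase();              // treat "ua-..." and "UA-..." as the same ID
    if (seen[key]) { return false; }
    seen[key] = true;
    return true;
  });
}

// Made-up sample pages for illustration only.
var withCode = "<script>_gaq.push(['_setAccount', 'UA-1234567-1']);</script>";
var withoutCode = "<html><body>No tracking here</body></html>";

console.log(findTrackingIds(withCode));    // [ 'UA-1234567-1' ]
console.log(findTrackingIds(withoutCode)); // []
```

In the spreadsheet itself, `getUA` would be called from a cell as a custom function, for example `=getUA(B2)` if the institution URL happened to be in column B (the column letter is an assumption).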

I could then use a custom formula in column C to extract an urchin code from each website. This worked for most sites, but I got a couple of errors for sites that appeared not to be using Google Analytics. Validating some of the results, I noticed that in some cases it was because UrlFetchApp wasn’t following browser redirects, e.g. http://www.cardiffmet.ac.uk/ redirects to http://www3.cardiffmet.ac.uk/English/Pages/home2.aspx. This is a problem I’ve had before, so I recycled the code below, which uses expandurl.appspot.com to follow a link to its destination.

function extractLink(text){
  // create a url pattern
  var urlPattern = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  var feedproxyPattern = /(\b(http:\/\/feedproxy.google.com))/i;
  // extract link from email msg
  var url = text.match(urlPattern)[0];
  //if (feedproxyPattern.test(url)){
    // if feedproxy url see if cached (or resolve end url)
    var cache = CacheService.getPublicCache(); // using Cache service to prevent too many urlfetch
    var cached = cache.get(url);
    if (cached != null) { // if value in cache return it
      return cached;
    }
    var requestData = {
      method : "get",
      headers: { "User-Agent":"GmailProductivitySheet - Google Apps Script"}
    };
    var link;
    try {
      // try and get link endpoint using http://expandurl.appspot.com/
      var result = UrlFetchApp.fetch("http://expandurl.appspot.com/expand?url="+encodeURIComponent(url), requestData);
      // JSON.parse replaces the deprecated Utilities.jsonParse
      link = (result.getResponseCode()===200)? JSON.parse(result.getContentText()).end_url : url;
    } catch(e) {
      // if http://expandurl.appspot.com/ doesn't work just return extracted url
      link = url;
    }
    cache.put(url, link, 3600); // cache the result for an hour
    return link;
  //}
  //return url; // fallback for when the feedproxy check above is re-enabled
}
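The two functions can be chained so that the redirect lookup is only used when the first attempt fails. The sketch below is one possible way to do that, not what the spreadsheet actually does. The wrapper `getUAWithFallback` and the stand-in functions are hypothetical, and the lookup and resolver are passed in as arguments so the example runs outside Apps Script.

```javascript
// Hypothetical wrapper: lookup and resolve are passed in so the sketch runs
// outside Apps Script; in the spreadsheet they would be getUA and extractLink.
function getUAWithFallback(url, lookup, resolve) {
  try {
    return lookup(url);               // first attempt against the listed URL
  } catch (e) {
    var resolved = resolve(url);      // follow the redirect to the real page
    if (resolved === url) {
      return null;                    // nothing new to try
    }
    try {
      return lookup(resolved);        // second attempt against the destination
    } catch (e2) {
      return null;                    // no tracking code found either way
    }
  }
}

// Stand-ins for illustration only; the URLs and the ID are made up.
var fakePages = { "http://example.org/home": "UA-1234567-1" };
function fakeLookup(url) {
  if (!fakePages[url]) { throw new Error("no match"); } // mimics getUA failing
  return fakePages[url];
}
function fakeResolve(url) {
  return url === "http://example.org/" ? "http://example.org/home" : url;
}

console.log(getUAWithFallback("http://example.org/", fakeLookup, fakeResolve)); // UA-1234567-1
console.log(getUAWithFallback("http://example.net/", fakeLookup, fakeResolve)); // null
```

Returning `null` instead of an error makes it easier to tell "no code found" apart from a failed fetch, at the cost of one extra request for each site that fails the first time.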

Using this formula in column D for the rows that returned errors, I got a fresh URL to point the getUA function at. Here’s the final spreadsheet (I’ve copied and pasted some of the formula results as values to save my quota) and the answer to my question:

134 institutional websites, 118 (88%) with Google Analytics code
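The headline figure is a simple share: sites with a tracking code divided by all sites checked. As a rough sketch, assuming the results column has been read into an array with an empty value wherever no code was found (the `summarise` helper and the placeholder IDs are hypothetical):

```javascript
// Hypothetical helper: counts non-empty results and returns the share as a
// whole-number percentage.
function summarise(results) {
  var withCode = results.filter(function (r) { return Boolean(r); }).length;
  return {
    total: results.length,
    withCode: withCode,
    percent: results.length ? Math.round(100 * withCode / results.length) : 0
  };
}

// Rebuild the totals above: 118 sites with a code out of 134.
var results = [];
for (var i = 0; i < 134; i++) {
  results.push(i < 118 ? "UA-XXXXXX-X" : null); // placeholder IDs, not real data
}
console.log(summarise(results)); // { total: 134, withCode: 118, percent: 88 }
```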

But as Ranjit Sidhu reminded me


5 comments
  • Tony Hirst

    Hmmm, I also got different results. My uni table has 136 records, and I found 115 GA tracking codes…

    • Martin Hawksey

      But how long did it take you? 😉 I did wonder about accuracy. When I’m on a PC next it would be worth doing a comparison

  • Paul Walk

    This demonstrates that many institutions installed the Google Analytics JavaScript call in their site’s source at some point. It doesn’t follow that they are using Google Analytics…

    • Martin Hawksey

      ‘Using’ being the operative word. How many institutions are actually using analytics as part of their informed decision making?
