Recently I’ve been interested in tracking activity around resources. This comes off the back of the OER Visualisation project, where I started looking at social share data around educational resources, the beginnings of a PostRank-style RSS social engagement tracker, and more recently Using Google Spreadsheets to combine Twitter and Google Analytics data to find your top content distributors (it’s been an eye-opener to see how much individual activity data there is … if you know where to look).
Working on the vague use case of ‘academic finds an interesting resource and bookmarks it for later’, my assumption is there might be more social bookmarking than sharing via services like Twitter. To see what data is accessible I turned my attention to Diigo. For those that don’t know, Diigo started as an online bookmarking service but has kept adding sharing, note-taking and highlighting features, and continues trying to win over the Delicious crowd.
Diigo does have an official API, but it is based around individual users rather than sites. Site-level data is available, and here is an example for my hawksey.info domain. The page returns the last 20 bookmarks made by users for my site. Clicking on a bookmark lets you see how many people have also publicly bookmarked the page, the date, and how they tagged it (there’s probably more to do here on crowdsourced metadata … for another day).
Back to the top level data. Obviously you could visit this page each day to see who has been bookmarking your material or maybe even find a service that emails the webpage to you each day. I’m more interested in how this data might be centralised in one place so that you can combine it with other information. It probably won’t be a surprise that I chose Google Spreadsheets to have a crack at this.
Embedded below is this Diigo Site Tracker Google Spreadsheet. Click the link, then File > Make a copy for your own version, and enter your site URL in cell B3.
In the spreadsheet you can see the Diigo profile url for the person who has bookmarked a link, what was bookmarked and, by scraping the details page, how many times the link has already been saved.
How it was made
If you have been following my other work you might think this is powered by Google Apps Script, but you’d be wrong. The spreadsheet is entirely powered by the built-in importXML function. As you’ll see from the documentation, the function can handle a range of markup languages including HTML. So we can point the function at a webpage, but how do we get back the parts we want? This is usually the bit I trip over. To query the part of the page you want back you need to use XPath.
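As a sketch of the general pattern (the URL and XPath query here are placeholders, not the actual formula used in the spreadsheet), an importXML call looks like this:

```
=IMPORTXML("http://www.example.com/somepage", "//a/@href")
```

The first argument is the page to fetch and the second is the XPath query to run against it; this example would return every link href on the page, one per cell.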
XPath lets you drill down into the part of the page you want. The key I’ve found to unlocking XPath is a browser extension (I’m currently using this one) which lets me see the XPath for the part of the page I’m looking at. I then use this information in the importXML function (it’s worth noting that Google Spreadsheets limits you to 50 imports per spreadsheet, so to scale this solution to get data from other services I’d probably have to switch to Apps Script or something else).
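If the spreadsheet route doesn’t suit you, the same idea works in code. Below is a minimal Python sketch of what importXML does under the hood: parse some markup and run an XPath query over it. The HTML snippet here is hypothetical, not Diigo’s actual markup, and a real page would likely need a more forgiving HTML parser than the strict XML one used here.

```python
# Minimal illustration of extracting data with an XPath-style query.
# Python's xml.etree supports a limited XPath subset, enough for this demo.
import xml.etree.ElementTree as ET

# Hypothetical bookmark listing, stand-in for a fetched page
html = """
<html><body>
  <ul>
    <li class="link"><a href="http://example.com/page1">Page 1</a></li>
    <li class="link"><a href="http://example.com/page2">Page 2</a></li>
  </ul>
</body></html>
"""

root = ET.fromstring(html)
# Equivalent in spirit to the XPath //li/a/@href
links = [a.get("href") for a in root.findall(".//li/a")]
print(links)  # ['http://example.com/page1', 'http://example.com/page2']
```

The same query string a browser extension gives you can usually be dropped straight into importXML, which is what makes the spreadsheet approach so quick.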
So that was Diigo; your homework is to do something similar with Delicious, and I’ll give you my answer tomorrow 😉 [I might even be able to show you how you can link this to Google Analytics data].