As part of LAK13 I’ve already written a series of blog posts highlighting a couple of ways to extract data from Canvas VLE. Prompted by a question by On and my colleague Sheila MacNeill I wanted to show you a way of getting feed data into a spreadsheet without using any code. The solution is to use Yahoo Pipes, but as this post will highlight this isn’t entirely straight forward and you need to be aware of several tricks to get the job done. As LAK13 isn’t using Google Groups for this post I’ll be using the Learning Analytics Google Group as a data source.
Sniffing for data
First we need to find a data source. Looking for an auto-detected RSS/Atom feed by visiting the group homepage reveals nothing. [I always forget browsers seem are moving away from telling you when they detect a feed. To get around this I use the Chrome RSS Subscription Extension which indicates with the orange RSS icon when a page has a feed.]
Looking for an official Google Groups API as an alternative method turns up this open issue from August 2007 for a Groups API aka there’s no API 🙁 Digging deeper we find Groups did have data feeds in their old interface. So with a bit of url magic I can land on the old Groups interface for Learning Analytics which gives us the orange light
Requesting the View all available feeds page we get some additional feed options:
At this point I could grab the Atom links and with a bit of tweaking process it with my existing Google Apps Script Code, but lets look at a ‘no code’ solution.
Feeding Google Sheets with Yahoo Pipes
At this point it’s worth reminding you that you could use the importFeed formula in a Google Spreadsheet which would import the data from a Google Group. The issue however is it’s limited to the last 20 items so we need a better way of feeding the sheet.
A great tool for manipulating rss/atom (other data) feeds is Yahoo Pipes. Pipes gives you a drag and drop programming environment where you can use blocks to perform operations and wire the outputs together. I learned most of my pipework from the Pipemaster – Tony Hirst and if you are looking for a starting point this is a good one.
Here I’ve created a pipe that takes a Google Group shortname does some minor manipulation, which I’ll explain later, and output a result. When we run the pipe we get some export options:
The one I’m looking for is .csv because it’ll easily import into Google Sheets, but it’s not there … Just as we had to know the old Google Group interface has RSS feeds, with Yahoo Pipes we have to know the csv trick. Here’s the url for ‘Get as JSON’:
and if we swap &_render=json for &_render=csv by magic we have a csv version of the output (whilst we are here also notice the group name used when the pipe is run is also in the url. This means if we know the group shortname we don’t need to enter a name a ‘Run pipe’, we can build the url to get the csv.
Now in a Google Spreadsheet if you enter the formula =importData("http://pipes.yahoo.com/pipes/pipe.run?_id=14ffe600e0c0a9315007b922e41be8ad&_render=csv&group=learninganalytics")
we get the groups last 100 messages in a spreadsheet.
Extra tricks
There were a couple of extra tricks I skipped worth highlighting. RSS/Atom feeds permit multilevel data, so an element like ‘author’ can have sub elements like ‘name’, ‘email’. CSVs on the other hand are 2D, rows and columns.
When Yahoo Pipes generates a csv file it ignores sub elements, so in this case it’ll generate an author column but won’t include name or email. To get around this we need to pull the data we want into the first level (doing something similar for content).
The next little trick is to get the feed dates in a format Google Spreadsheets recognise as a date rather than a string. In the feed dates are in ISO 8601 format e.g. 2013-03-05T13:13:06Z. By removing the ‘T’ and ‘Z’ Google Spreadsheets will automatically parse as a date. To do this we use a Regex block to look for T or Z (T|Z) replacing with a single space (which is why the ‘with’ box looks empty).
I’ve wrapped the data in a modified version of the dashboard used for the Canvas data feed.
*** Google Groups Activity Dashboard ***
Limitations
A couple of big limitations to be aware of:
How long will Google Group data feeds last
Given we’ve had to dig out the data feeds from the old Google Group interface my suspicion is once this is shut off for good the feeds will also disappear. Who knows, Google may actually have a Group API by then ;s
Limited to last 100 messages
The eagle eyed amongst you will have spotted I was able to increase the number of messages returned to 100 by adding num=100 to the feed query. This is the limit though and you can’t use paging to get older results. There are a couple of ways you could store the feed results like using the FeedWordPress plugin. I’m experimenting with using IF I do THAT on {insert social network/rss feed/other} THEN add row to Google Spreadsheet, but as the data isn’t stored nicely (particularly dates which are saved like ‘February 15, 2013 at 01:48PM’ *sigh*) it makes it harder to use.
I think that’s me in terms of ways for extracting discussion boards data … for now. One other technique related to Google Spreadsheets is screen scraping using importXML. To see this in action you can read Dashboarding activity on public JISCMail lists using Google Sheets (Spreadsheets).
Sheila’s work blog » What can I do with my educational data? (#lak13)
[…] (as ever) pointed the way to some further exploration. And I'll be following up on Martin Hawksey's recent post about discussion group data collection […]
Sheila MacNeill
Thanks Martin – will give this a whirl and no doubt break something in your lovely reciepe in the process:-)
S
Martin Hawksey
what could possibly go wrong ;s
On
Hi Martin,
I followed the step by step procedure & try to repulicate ur results. The instruction is clear for a non programmer like me. Also, whenever you introduce a new term, you put the link with it. It’s not only convenience but also very effective. I like it very much. I stopped at the pipe section at this moment because I needed to read Tony Hirst ppt first. Once I’ve done that, I can come back to follow the remaining steps. Will let you know how it goes.
Thanks again, Martin. Much appreciated.
Cheers,
On