By @mhawksey

OER Visualisation Project: Beginnings of linking data from PROD to Google Spreadsheet and early fruit [day 8] #ukoer

One of my ambitions from Day 1 of the OER Visualisation project was to start linking PROD data in to Google Spreadsheets. Whilst this was primarily designed to help me with the project after speaking to some of the JISC/JISC CETIS people it sounds like it would help them and others.  

Here’s a spreadsheet which is the beginnings of a PROD Datastore (Warning: work in progress). Data is currently being populated from a Talis Datastore using a number of different sparql queries which are outputting csv data via a sparqlproxy (this will be switched to the Kasabi store when a suitable proxy has been sorted). You might find the SPARQL queries used to fetch this data useful for making your own so I’ve compiled them in this commentable document.  

What can we do with the spreadsheet? 

Creating pivot table reports

I used to find pivot table creation quite daunting but they are a great way to filter and analyse large sets of data. The PROD spreadsheet contains 2 example pivot reports one for technology and the other for standards (if you want to enable pivot table options you’ll need to File > Make a copy of the spreadsheet then on the pivot sheets select Data > Pivot Table Report).

The example ‘Technology Pivot’ is summarising the data from phase 1&2 of the OER Programme. You can see there is a number of technologies were recorded (over 100), the top three being YouTube, Flash and Slideshare. This data can be shown graphically using Google Spreadsheets chart tools and embedded as an interactive or static graphic.

The charts in Google Spreadsheets aren’t that exciting but there is a framework for extending these, which I’ll come back to later. Something I was hoping to do was link the data from Google Spreadsheet to IBM’s to more powerful visualisation service Many Eyes. For example below are examples of the technology pivot data as a bubble diagram and comparison Treemap and it would have been nice to automatically generate these. Tony had posted on this a couple of years ago using Many Eyes Wikified and Google Spreadsheets, but alas this part of the service was pulled last year) .

Visualising project relationships

One of the great things about the PROD data is there is a lot of relationship data already there. For example if you look at the PROD page for the ENABLE project you can see there are details of the projects that ENABLE builds on or was built on by, related projects and even comments that relate to the individual relationships.

This relationship data can all be extracted from PROD and in this case imported to the Spreadsheet. On the Relates_to sheet I’ve imported details of all the JISC funded project ‘relates_to’ relationships. What can we do with this data? Well at a basic level in column B we have a source id and column F has a target id which makes it suitable for using in a force layout diagram. Fortunately I’ve been playing around with online force layout diagrams for a while and most recently created a Google Spreadsheet Gadget to display this info (this is how you can extend the basic chart selection).

Whilst this gadget still needs to be verified by Google for anyone to see the results we can use the spreadsheet as a datasource for the gadget’s big brother EDGESExplorer. Publishing the spreadhseet to the web, using the built in tools to do this, we can reformat the data in EDGESExplorer to see how all JISC funded projects stored in PROD are related (click on the image below for the interactive version, you can explorer individual nodes by clicking on them). 

I think this graph provides a useful interface for seeing and exploring the relationship between JISC funded work. To become really useful some additional interfacing is required but there’s code I can reuse from my Guardian Tag Explorer and Twitter Conversation explorer tools, which gives you more project info and a link back to the appropriate PROD pa
ge (btw interesting post on ouseful.info on Information Literacy, Graphs, Hierarchies and the Structure of Information Networks).

Summary

So to recap: a Google Spreadsheet is being populated from PROD (live data). Users can create reports and charts within the Spreadsheet environment (live data) or export data to other services like Many Eyes (dead – as in the live link is broken – data). Finally we can publish live data from the Spreadsheet for use in other tools like EDGESExplorer.

My question for you is what data would you like in the Spreadsheet? Summary of projects by institution? Breakdown of projects by partners? Projects by JISC Programme Managers? Let me know what you would like to see 😉

Exit mobile version