One of my ambitions from Day 1 of the OER Visualisation project was to start linking PROD data in to Google Spreadsheets. Whilst this was primarily designed to help me with the project, after speaking to some of the JISC/JISC CETIS people it sounds like it would help them and others too.
Here’s a spreadsheet which is the beginnings of a PROD Datastore (Warning: work in progress). Data is currently being populated from a Talis Datastore using a number of different SPARQL queries, which output CSV data via a sparqlproxy (this will be switched to the Kasabi store when a suitable proxy has been sorted). You might find the SPARQL queries used to fetch this data useful for making your own, so I’ve compiled them in this commentable document.
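As a rough illustration of the "SPARQL query in, CSV out" pattern described above, here is a minimal Python sketch. The proxy address, the parameter names (query, service-uri, output), the store address and the vocabulary URI are all placeholders I have made up for the example, not the actual sparqlproxy or Talis settings, so swap in the real values for whichever proxy you use.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical proxy address; replace with the real proxy service.
PROXY_URL = "http://example.org/sparqlproxy"


def build_csv_query_url(sparql, endpoint, proxy=PROXY_URL):
    """Build a GET URL asking the proxy to run a query and return CSV.

    The parameter names used here are assumptions for illustration only.
    """
    params = {"query": sparql, "service-uri": endpoint, "output": "csv"}
    return proxy + "?" + urlencode(params)


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Placeholder query; the vocabulary URI is invented for this sketch.
QUERY = """
SELECT ?project ?name
WHERE { ?project <http://example.org/vocab#name> ?name }
LIMIT 10
"""

url = build_csv_query_url(QUERY, "http://example.org/store/sparql")

# A canned response stands in for what the proxy might send back.
sample = "project,name\nhttp://example.org/p/1,ENABLE\n"
rows = parse_csv(sample)
```

A URL built this way is the sort of thing a spreadsheet can fetch directly; in Google Spreadsheets one option would be the ImportData function, which reads CSV from a web address.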
What can we do with the spreadsheet?
Creating pivot table reports
I used to find pivot table creation quite daunting, but pivot tables are a great way to filter and analyse large sets of data. The PROD spreadsheet contains 2 example pivot reports, one for technology and the other for standards (if you want to enable pivot table options you’ll need to File > Make a copy of the spreadsheet, then on the pivot sheets select Data > Pivot Table Report).
The example ‘Technology Pivot’ summarises the data from phases 1 and 2 of the OER Programme. You can see that a large number of technologies were recorded (over 100), the top three being YouTube, Flash and Slideshare. This data can be shown graphically using Google Spreadsheets chart tools and embedded as an interactive or static graphic.
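If it helps to see what a pivot report like this is doing under the hood, the summary boils down to counting rows per technology and sorting by the count. Here is a small Python sketch of that idea; the project names and rows are invented sample data, not the real PROD records.

```python
from collections import Counter

# Made-up (project, technology) rows standing in for the spreadsheet data.
rows = [
    ("Project A", "YouTube"),
    ("Project A", "Flash"),
    ("Project B", "YouTube"),
    ("Project C", "Slideshare"),
    ("Project C", "YouTube"),
    ("Project D", "Flash"),
]


def pivot_count(records):
    """Count rows per technology, most frequent first."""
    return Counter(tech for _, tech in records).most_common()


summary = pivot_count(rows)
for technology, count in summary:
    print(technology, count)
```

The same grouping works for the standards report by pointing it at the standards column instead.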
The charts in Google Spreadsheets aren’t that exciting but there is a framework for extending these, which I’ll come back to later. Something I was hoping to do was link the data from Google Spreadsheets to IBM’s more powerful visualisation service, Many Eyes. For example, below are the technology pivot data as a bubble diagram and a comparison treemap, and it would have been nice to generate these automatically. Tony had posted on this a couple of years ago using Many Eyes Wikified and Google Spreadsheets, but alas this part of the service was pulled last year.
Visualising project relationships
One of the great things about the PROD data is there is a lot of relationship data already there. For example if you look at the PROD page for the ENABLE project you can see there are details of the projects that ENABLE builds on or was built on by, related projects and even comments that relate to the individual relationships.
This relationship data can all be extracted from PROD and in this case imported to the Spreadsheet. On the Relates_to sheet I’ve imported details of all the JISC funded project ‘relates_to’ relationships. What can we do with this data? Well at a basic level in column B we have a source id and column F has a target id which makes it suitable for using in a force layout diagram. Fortunately I’ve been playing around with online force layout diagrams for a while and most recently created a Google Spreadsheet Gadget to display this info (this is how you can extend the basic chart selection).
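To show why a source id in column B and a target id in column F is all a force layout needs, here is a hedged Python sketch that turns rows of that shape into a nodes-and-links structure. The row contents are invented, the other columns are left blank, and the output shape (a list of nodes plus links that point at node positions) is just one common convention, not necessarily what my gadget or EDGESExplorer expects.

```python
def build_graph(rows, source_col=1, target_col=5):
    """Turn sheet rows into node and link lists.

    Columns B and F of the sheet correspond to indexes 1 and 5 here.
    """
    nodes = []   # one entry per distinct project id
    index = {}   # project id -> position in the nodes list
    links = []   # one entry per relationship row
    for row in rows:
        src, tgt = row[source_col], row[target_col]
        if not src or not tgt:
            continue  # skip rows with a missing id
        for node_id in (src, tgt):
            if node_id not in index:
                index[node_id] = len(nodes)
                nodes.append({"id": node_id})
        links.append({"source": index[src], "target": index[tgt]})
    return {"nodes": nodes, "links": links}


# Invented sample rows: only columns B and F are filled in.
sample_rows = [
    ["", "p1", "", "", "", "p2"],
    ["", "p1", "", "", "", "p3"],
    ["", "p3", "", "", "", "p2"],
    ["", "p4", "", "", "", ""],  # no target id, so this row is skipped
]

graph = build_graph(sample_rows)
```

Each project appears once as a node however many relationships it has, which is what lets the layout pull well-connected projects towards the middle.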
Whilst this gadget still needs to be verified by Google before anyone can see the results, we can use the spreadsheet as a datasource for the gadget’s big brother, EDGESExplorer. Publishing the spreadsheet to the web, using the built-in tools to do this, we can reformat the data in EDGESExplorer to see how all JISC funded projects stored in PROD are related (click on the image below for the interactive version; you can explore individual nodes by clicking on them).
I think this graph provides a useful interface for seeing and exploring the relationship between JISC funded work. To become really useful some additional interfacing is required, but there’s code I can reuse from my Guardian Tag Explorer and Twitter Conversation explorer tools, which give you more project info and a link back to the appropriate PROD page (btw interesting post on ouseful.info on Information Literacy, Graphs, Hierarchies and the Structure of Information Networks).
So to recap: a Google Spreadsheet is being populated from PROD (live data). Users can create reports and charts within the Spreadsheet environment (live data) or export data to other services like Many Eyes (dead – as in the live link is broken – data). Finally we can publish live data from the Spreadsheet for use in other tools like EDGESExplorer.
My question for you is what data would you like in the Spreadsheet? Summary of projects by institution? Breakdown of projects by partners? Projects by JISC Programme Managers? Let me know what you would like to see 😉
Join the conversation
Lorna M. Campbell
Hi Martin, this is looking really good, though I must admit the EDGESexplorer demo makes me dizzy!
In terms of what data I’d like to see, it would certainly be useful to see a breakdown of projects by institution. I’d also be interested in seeing clusters of project partners, particularly in terms of whether institutions always tend to work with the same partners or whether they form different clusters for different programmes. Another factor it might be interesting to illustrate is whether particular institutions focus on the use of particular technologies. I think we already know the answer to that one but it would be good to be able to view the evidence!
> EDGESexplorer demo makes me dizzy!
Some more product testing may be required 😉
The partnerships idea is a good one and there are a number of ways this could be displayed. Is geography a factor in this, are they generally local collaborations? I wonder if including partnership radius would be useful?
@lorna @martin I think it’s worth remembering that sometimes you get most from a visualisation when you have a conversation with the data, asking questions of it through queries and display parameters, and getting (partial) answers back in the form of pictures that may only be meaningful in the context of the current state of the conversation.
Getting meaning from data may thus be most effective when it is an active process. Lorna asks “In terms of what data I’d like to see, it would certainly be useful to see a breakdown of projects by institution.”, but that can be displayed in many ways, and may lead to other questions. Martin can produce snapshots of views of different visualisations that can be inserted into graphs, but that is a completely different context. I think it’s worth distinguishing between visualisations that can be used (inter)actively to explore datasets and find value/insight within the dataset and visualisations that communicate those insights once discovered. It’s way too easy to say ‘I want a great visualisation for this’ and then be unhappy with the result because the visualisation that’s produced is actually a tool that works in the visual analytics phase but is just complicated and meaningless when used as a presentation graphic. Sometimes, the most effective charts might be bar charts, or scatterplots; but it might also be the case that working out how to segment your data in order to produce those charts can be helped by using the shiny visual analytics tools…
In my own tinkerings, I spend a lot of time thinking about the tools and practice that can be used to support visual analysis of data as an active process, but most of the graphics I produce are crap as presentation graphics or infographics. In fact, they typically aren’t intended as such. They’re snapshots of a process that show what someone engaged in the process of using those tools can stumble upon that helps them understand the data better. So for example, all the network graphs I post might be better reported as a simple table saying ‘these are the communities I found and these are the notable people in them’.
Going back to: “In terms of what data I’d like to see, it would certainly be useful to see a breakdown of projects by institution”, do you want to see a presentation graphic (that may be a text based table) that summarises this, or do you want a tool that lets you explore the PROD database in a graphical way and that allows *you* *yourself* to ask this question, and others like it, in a variety of ways? 😉