TAGSExplorer: Queryable Twitter archive exploration with Google Visualization API Query Language integration

Martin Hawksey

13 years ago

Since pushing out my Twitter archive visualisation tool, TAGSExplorer, it's nice to see people are already pushing out tweets for their own archives. After a full-on development period of a couple of weeks, TAGSExplorer is reasonably stable, but like most other web services there is some continual tweaking going on behind the scenes. One of the biggest tweaks comes from a suggestion from Tony (Hirst). Tony thought it would be great to give some more control over the data visualised, essentially turning it into a queryable visualisation tool.

For example, if you take the archive for #mozfest, which has almost 8,000 tweets in it, the visualisation you get is impressive but will send your browser into overdrive computing almost 2,000 nodes and 1,000 edges. Part of the problem is you get a lot of isolated nodes around the edge, which clutter the view and get in the way of navigation. These nodes represent people who tweeted #mozfest but didn’t @reply anyone with this tag.

So how can we easily filter these out? Fortunately Tony has a lot of experience with using Google Spreadsheets as a database and way back in 2009 started developing an explorer tool to help users build spreadsheet queries. This tool (here’s one of the latest versions) lets you input your spreadsheet url and provides tools for writing queries in the Google Visualization API Query Language. To take it back one step: TAGSExplorer uses the Google Visualization API to read data from a Google Spreadsheet, so by using the API Query Language we can refine the data pulled in.

For example, if I take the #mozfest data and plug it into Tony’s Guardian Datastore Explorer I can select columns A,B,C,D,J,L,M,N (the minimum for TAGSExplorer to work are the columns from_user, text, created_at, time (optional for sort), source, id_str, profile_image_url, status_url) and start defining ‘where’. This is a bookmark to select where B contains ‘@’ and not B starts with ‘RT’, which only selects tweets that contain a @reply or @mention, excluding all RTs. Using the Datastore Explorer gives me a preview of the data pulled back by the query.
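To make the effect of that ‘where’ clause concrete, here is a minimal Python sketch that mimics it locally. It is only an illustration, not how TAGSExplorer or the Query Language is implemented: the sample rows and the matches_where helper are made up for this example, and the text field stands in for column B.

```python
# Hypothetical sample rows; a real TAGS archive has more columns than this.
tweets = [
    {"from_user": "alice", "text": "@bob great session at #mozfest"},
    {"from_user": "bob", "text": "RT @alice: great session at #mozfest"},
    {"from_user": "carol", "text": "Enjoying #mozfest"},
    {"from_user": "dave", "text": "Thanks to @alice for the demo #mozfest"},
]


def matches_where(text: str) -> bool:
    """Mimic the clause: B contains '@' and not B starts with 'RT'."""
    return "@" in text and not text.startswith("RT")


# Keep only tweets that mention someone and are not retweets.
selected = [t for t in tweets if matches_where(t["text"])]
print([t["from_user"] for t in selected])
```

The retweet and the tweet with no @ are dropped, which is exactly what removes the isolated nodes from the graph.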

I can now drop the gqc and gqw parts of this bookmark (eg &gqc=A%2CB%2CC%2CD%2CJ%2CL%2CM%2CN&gqw=%28B%20contains%20%27@%27%20and%20not%20B%20starts%20with%20%27RT%27%29) straight into a TAGSExplorer url, for example http://hawksey.info/tagsexplorer/?key=0AqGkLMU9sHmLdEZFejItVVh2RGVQTjlsWHBVWlBWN2c&sheet=oau&gqc=A%2CB%2CC%2CD%2CJ%2CL%2CM%2CN&gqw=%28B%20contains%20%27@%27%20and%20not%20B%20starts%20with%20%27RT%27%29, which reduces the number of nodes displayed to fewer than 900 (refining this further with “starts with ‘@’” reduces the node count to 643).
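If you would rather build these urls in code than copy them from a bookmark, something like the following Python sketch should do it. The build_tagsexplorer_url helper is my own illustrative name, not part of TAGSExplorer; the key, sheet, gqc and gqw parameters are the ones from the url above, and I am assuming the where clause is simply wrapped in brackets and percent-encoded with the @ left as-is, as it appears there.

```python
from urllib.parse import quote

BASE_URL = "http://hawksey.info/tagsexplorer/"


def build_tagsexplorer_url(key, sheet, columns, where):
    """Assemble a TAGSExplorer url with a column list (gqc) and where clause (gqw)."""
    # Columns are comma separated, with the commas percent-encoded as %2C.
    gqc = quote(",".join(columns), safe="")
    # The where clause is wrapped in brackets; '@' is left unencoded.
    gqw = quote(f"({where})", safe="@")
    return f"{BASE_URL}?key={key}&sheet={sheet}&gqc={gqc}&gqw={gqw}"


url = build_tagsexplorer_url(
    key="0AqGkLMU9sHmLdEZFejItVVh2RGVQTjlsWHBVWlBWN2c",
    sheet="oau",
    columns=list("ABCDJLMN"),
    where="B contains '@' and not B starts with 'RT'",
)
print(url)
```

Swapping the where argument for a different clause gives you a new filtered view without going back through the Datastore Explorer.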

Dipping into the Google Query Language Reference you can see there are a lot of other select options. For example, here’s a Guardian Datastore example which selects the part of the #mozfest archive that contains ‘@’ and filters responses for the 4th November, which can be dropped into TAGSExplorer via the url giving this:

[dotted lines are @mentions triggered by mentions=true in the querystring. This and retweets=true are both still undocumented, oops]

Admittedly getting your head around the query language isn’t straightforward, but for the pro-user TAGSExplorer is now a queryable data visualisation tool, which I think is pretty cool!

BTW if you’re more used to writing tq queries as per the documentation then TAGSExplorer can also parse these (e.g. here’s the last example using a tq querystring rather than gqc and gqw)
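For anyone unsure what a tq query looks like, it is a single ‘select … where …’ string in the Query Language. Here is a rough Python sketch of composing one from a column list and a where clause; build_tq is a hypothetical helper, and I am assuming the encoded string is passed in a parameter named tq, so check that against a working url before relying on it.

```python
from urllib.parse import quote


def build_tq(columns, where):
    """Compose a Query Language string from column letters and a where clause."""
    return f"select {','.join(columns)} where {where}"


tq = build_tq(list("ABCDJLMN"), "B contains '@' and not B starts with 'RT'")
print(tq)

# Percent-encode the whole query so it can travel in a querystring.
# Assumption: the parameter is named 'tq'.
tq_param = "tq=" + quote(tq, safe="")
print(tq_param)
```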