Update 2: When I pushed the Twitter timeline through the generator I noticed that a number of tweets weren’t in English, so I’ve now passed the results through the Google Translate API. Click the same link above to see the result.
Update 3: Some inline updates: extra tweet filtering and the inclusion of .csv upload in the Twitter Subtitle Generator.
Originally I was more interested in mashing the Google I/O Android Keynote with Twitter subtitles because I could, but the process was useful in highlighting some areas for further development. The first, which Tony and I have discussed before, is the need for a way to curate the Twitter timeline to sort the wheat from the chaff. For the Google I/O presentation I downloaded the archive in .csv format from Twapper Keeper and ‘tweaked it’ in Excel. Filtering for tweets meta tagged as EN (English) took the tweets for the 45-minute presentation down from 5420 to 4638 (not surprisingly, the majority of Twitter users ignore the language setting, leaving it as the default regardless of the language they tweet in). Removing retweets (‘RT’s) then took it from 4638 to a more manageable 3124 tweets. Update: I also noticed that a number of tweets had exactly the same timestamp, so I filtered these out, leaving 1790 tweets.
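For anyone who would rather script the three filters than repeat them in Excel, here is a rough Python sketch of the same steps. It is not what I actually ran. The column names (`iso_language_code`, `text`, `time`) are assumptions about the export, so adjust them to match your file. Treating any tweet that contains a standalone ‘RT’ as a retweet, and keeping only the first tweet for each timestamp, are also my guesses at reasonable rules.

```python
import re

# Assumed column names; change these to match the headers in your own export.
LANG_COL = "iso_language_code"
TEXT_COL = "text"
TIME_COL = "time"

# Old-style retweets carry a standalone "RT", usually followed by "@user".
RETWEET = re.compile(r"\bRT\b")


def filter_tweets(rows):
    """Keep English, non-retweet rows, at most one per timestamp.

    `rows` is a list of dicts, one per tweet, keyed by column name.
    """
    seen_times = set()
    kept = []
    for row in rows:
        # Step 1: language meta tag must be EN.
        if row[LANG_COL].strip().lower() != "en":
            continue
        # Step 2: drop anything that looks like a retweet.
        if RETWEET.search(row[TEXT_COL]):
            continue
        # Step 3: drop later tweets that share a timestamp already seen.
        if row[TIME_COL] in seen_times:
            continue
        seen_times.add(row[TIME_COL])
        kept.append(row)
    return kept
```

Because the timestamp check runs last, a retweet never ‘uses up’ a timestamp that a later original tweet could have kept.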
Getting this far highlighted the next issue: converting the filtered csv file into a timed-text XML format. Previously I’ve shown how you can Convert time stamped data to timed-text (XML) subtitle format using Google Spreadsheet Script and could easily have gone down that route again, but I wanted to try something new. As the Twitter Subtitle Generator already integrates with the Twapper Keeper service, it seemed a small step to get the tool to read a csv file rather than the Twapper Keeper feed. This was made so much easier by a PHP function which returns a multi-dimensional array from a CSV file, optionally using the first row as a header so that each row comes back as an associative array. Sweet!
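To illustrate the idea (this is not the PHP function or the generator’s code), here is a minimal Python sketch of both halves: reading a csv so each row is keyed by the header row, then writing the rows out as timed text. The column names (`from_user`, `text`, `time`), the assumption that `time` holds Unix seconds, the five-second display duration and the TTML namespace are all my own choices for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"  # assumed namespace for the output


def read_csv_rows(handle):
    """Return each csv row as a dict keyed by the first (header) row."""
    return list(csv.DictReader(handle))


def format_offset(seconds):
    """Turn a number of seconds into an HH:MM:SS clock value."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_timed_text(rows, start_time, duration=5):
    """Build a timed-text XML string from tweet rows.

    `start_time` is the Unix time at which the video starts; tweets sent
    before it are skipped. `duration` is how long each subtitle is shown.
    """
    tt = ET.Element("tt", {"xmlns": TTML_NS})
    div = ET.SubElement(ET.SubElement(tt, "body"), "div")
    for row in rows:
        offset = int(row["time"]) - start_time
        if offset < 0:
            continue
        p = ET.SubElement(div, "p", {
            "begin": format_offset(offset),
            "end": format_offset(offset + duration),
        })
        # ElementTree escapes characters such as < and & when serialising.
        p.text = f'{row["from_user"]}: {row["text"]}'
    return ET.tostring(tt, encoding="unicode")


if __name__ == "__main__":
    sample = io.StringIO("from_user,text,time\nalice,Nice demo,1005\n")
    print(to_timed_text(read_csv_rows(sample), start_time=1000))
```

Keeping the csv reader separate from the XML writer means the writer does not care where the rows come from, so a feed and an uploaded file can share it.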
For once my code was clean enough that I could drop this function in and point it at the csv file I created. I haven’t worked this functionality into the ‘generator’ yet, but at least it is another piece of the jigsaw. Update: couldn’t resist, so I’ve added functionality to upload a csv for subtitling.
So below is a short demo of the output. Click here to see the full 45-minute presentation with Twitter subtitles.