Tuesday, December 13, 2011

NMA object browser

Success! Everything came together at the end of semester.

I got on the web, got all the data hooked up and built a simple browser interface that worked.

NMA object browser - displaying drawing (right) in context of all drawings the NMA holds (centre) and all object types (left) 
The idea was to show as much context as possible on the screen at the same time to aid greater understanding of objects in the NMA's varied collections, which is particularly useful where individual item records are sparsely populated, and to encourage browsing to other like or different objects.

On the left is a list of all object types, with counts, ordered by count. Clicking on an object type brings up all of the items of that type in the centre window. These are displayed as a thumbnail grid of images, or where images are not available as catalogue reference numbers (IRN) with truncated titles. Of course the preference would have been to have all images, but I felt it was important to include all items, and the truncated titles are still often informative and visually look ok.

Mousing over an item brings up the title in a popup box in the right corner. This is simply a div with a z-index to ensure it is on top of everything else. Upon mouse out, the popup div is removed using JQuery $.remove(). Easy. One issue to note - having to remember to write a mouse out function as well as a mouse over function was a little tedious, although I can see it could be useful. It would have been nice if there was a single mouse hover event, similar to the hover property in CSS - which I used to make links underlined when moused over. Using the CSS cursor property, I was also able to make spans and divs look like <links> with the hand pointer.

The objects are sorted chronologically, using provenance date over associated date if both are available. Items without dates are retained, and placed at the end of the list. The year is displayed under the item image/title. Displaying as an inline histogram of sorts adds a rich contextual dimension - otherwise to find out dates one would have to zoom into an individual item record, and even then there would be no way of knowing how many other items were from the same year and what was the spread of years for items of that type. Thankfully the source data was in a consistent format with year first, and then day and month after if available, as a single string, which allowed for sorting by simply extracting the first four characters using the JavaScript substr() method.
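A minimal sketch of that sort, for illustration only. The field names provenanceDate and associatedDate, and the sample date strings, are assumptions rather than the real NMA record fields.

```javascript
// Hypothetical field names: provenanceDate and associatedDate stand in for
// whatever the NMA records actually call these fields.
function itemYear(item) {
  // Prefer the provenance date over the associated date when both exist.
  var dateString = item.provenanceDate || item.associatedDate;
  if (!dateString) {
    return null; // undated item
  }
  // The year comes first in the string, so the first four characters are enough.
  var year = parseInt(dateString.substr(0, 4), 10);
  return isNaN(year) ? null : year;
}

function sortChronologically(items) {
  // Copy the array so the original order is left untouched.
  return items.slice().sort(function (a, b) {
    var yearA = itemYear(a);
    var yearB = itemYear(b);
    if (yearA === null && yearB === null) return 0;
    if (yearA === null) return 1; // undated items go to the end
    if (yearB === null) return -1;
    return yearA - yearB;
  });
}

// Made-up sample records.
var sample = [
  { irn: 3, associatedDate: "1950" },
  { irn: 1 },
  { irn: 2, provenanceDate: "1901 12 March", associatedDate: "1988" }
];
var sorted = sortChronologically(sample);
console.log(sorted.map(function (item) { return item.irn + ": " + itemYear(item); }));
```

Returning null for undated items keeps them in the list, so the comparator can push them to the end instead of dropping them.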

On the right a summary of an individual item record is displayed. The first item in the sorted list of object types is automatically loaded. Other items can be loaded by clicking on them in the display grid, or by clicking next/previous to cycle through the list. Mousing over the next/previous links also brings up the popup with title, as a preview. It was a critical design intention to keep the zoomed in display on the same screen as the full collection context, rather than in a pop up or new tab. The full record on the NMA catalogue is still linked to, for further information.

NMA object browser - mousing over a collection title highlights the objects within that collection
Like the sorting chronologically, an important intention of the browser was to hook up other ways of sorting and sub-sorting the NMA collections. I attempted and adequately demonstrated the potential of this, by hooking up collections data to the list of items of object type. I was able to build a list of the collections that these items were part of and, at the top of the centre window, list the 5 collections that contained the most items of object type, with counts. Mousing over a collection title highlights the collection by fading all the items not in the collection. This is achieved by changing the opacity, accessed neatly with getElementById(id).style (the id is passed without a leading #). The mouse over worked very nicely, except there were two interface issues: for long lists you couldn't scroll to the bottom of the list without mousing out; and I didn't have room at the top of the window to list more than 5 collections.
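A rough sketch of the two steps described above: counting the collections and working out which items to fade. The collection and irn field names, the function names and the 0.2 fade value are all assumptions for illustration.

```javascript
// Count how many items each collection holds and return the largest few.
function topCollections(items, limit) {
  var counts = Object.create(null); // no inherited properties to clash with names
  items.forEach(function (item) {
    if (item.collection === undefined) return; // no collection recorded
    counts[item.collection] = (counts[item.collection] || 0) + 1;
  });
  return Object.keys(counts)
    .sort(function (a, b) { return counts[b] - counts[a]; })
    .slice(0, limit)
    .map(function (name) { return { name: name, count: counts[name] }; });
}

// Work out the opacity each item should get when a collection is moused over.
function opacitiesFor(items, collectionName) {
  var opacities = {};
  items.forEach(function (item) {
    opacities[item.irn] = item.collection === collectionName ? 1 : 0.2;
  });
  return opacities;
}

// Made-up sample items.
var items = [
  { irn: 101, collection: "A" },
  { irn: 102, collection: "B" },
  { irn: 103, collection: "A" },
  { irn: 104 }
];
console.log(topCollections(items, 5));
console.log(opacitiesFor(items, "A"));
// In the browser each value could then be applied with something like
// document.getElementById(irn).style.opacity = opacities[irn];
```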

There would be some easy extensions to this browser, which I would pursue if it was to be hosted on the NMA labs website - that is basically more links, more context. Clicking on a collection would bring up that entire collection in the browser centre window, and browse mode could be flipped to browse by collection, with all collections listed on the left. With the same architecture it would be easy to add other browse modes, such as material type, date or associated people.

So although I didn't get time to add all of this extra context, or refine further the browser interface, or draw some graph/visualisations, I am very glad that I challenged myself and built it to be native to the web. Once set up with data, the linking and mouse events work seamlessly. I feel like I could have fun linking up more and more and more, and am now ready to tackle some more websites!

I do think I could tidy up the code and data work a little. For example I forgot about global variables for much of the project and found myself getting convoluted in passing information to functions. I also could have prebuilt more of the lists, and done all the sorting, in Processing - to speed things up at the browser/client end.

That said, the next step really would have been to develop the NMA API to handle all the data calls, and this would allow the data to always be up to date. I wouldn't want to prebuild lists every month, when the NMA catalogue is added to.

Even if in a rudimentary form, I have established my confidence in showing everything in a big data set in a meaningful way. A great project to finish the Masters of Digital Design. Big thanks to Mitchell, and also to the National Museum for the privilege of working with this special data (which now must be wiped from our systems).

Sunday, December 4, 2011

NMA project update - debugging

All my focus seems to be on the technical side of things - getting it to work in the web world has been overwhelming to say the least.

The last couple of weeks I have been doing some serious debugging. There are so many seemingly simple things that catch you! Here are some updates.

First to report. I prebuilt a list with each object type and a count in Processing. It made loading super fast! Great news. I am now able to handle easily the whole collection - at least at this zoomed out level and as a simple text list.

I still am sorting the list in the browser with JavaScript, although there is no need to do this dynamically so I could be saving more resources by presorting too.

A prebuilt list of object types

I also prebuilt lists of all the items of each object type, but I got totally stuck linking it up to the super list. I tried to embed the object type as a key in a mouseclick function so that the list of items would be loaded when the object type was clicked on. I realised, as before, that as the mouseclick event would not happen at the time of drawing, I would have to hardwire the key. This time however I couldn't get it to pass in the key as a string. Previously I was able to escape by using single and double quotation marks, but not this time. After much consternation it was an easy fix - visiting Mitchell. I needed to use a backslash ' \' to denote that the character immediately after it is to be taken literally.
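One way the backslash escape might look when building the html string, as a sketch only. The handler name loadItems and the function objectTypeLink are hypothetical, not the names used in the project.

```javascript
// Build the html for one object type, hardwiring the key into the onclick.
function objectTypeLink(objectType) {
  // Escape backslashes first, then single quotes, so the key survives
  // inside the single-quoted JavaScript string in the handler.
  var escaped = objectType.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  // \" puts a literal double quote inside this double-quoted string.
  return "<span onclick=\"loadItems('" + escaped + "')\">" + objectType + "</span>";
}

console.log(objectTypeLink("record covers"));
```

This sketch does not escape characters that are special in html itself (such as double quotes or angle brackets in the object type), which a fuller version would need to handle.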

So great, now I was able to load a list of lists, and navigate to each... well almost..

Lists of item titles displayed for object types that have been clicked on

... not all the object types worked! It appeared to be those with spaces between multiple words.

Mitchell also showed me Firebug and how to write to the console - console.log(), which is neater than using JavaScript popup alerts to debug as I was earlier. The great thing about Firebug is that it is able to follow all the script, css etc from linked files and to show how it modifies the html.

So my initial thought was that it couldn't load the json files that had spaces in the file names. This would make sense as I had previously learned that the web doesn't like spaces and often replaces them with % signs and other characters.

However I was able to prove that I was loading the data by writing to the console the list length and each item's title. This was puzzling indeed.

Firebug console showing item titles for 'record covers' object type which don't display on screen
So further investigations led me finally to discover that the id attribute can't contain spaces. This was my problem - I was using the object type to name the header div, so that I could call it later to append the list of items of that type.

The solution: I needed to find a way to parse the object type name and replace the space with a dash or underscore, which the id attribute would accept. (The id attribute is fussy - it won't allow the name to start with a number either).

Luckily I found a string.replace() function native to JavaScript. It took a bit of figuring out however. I couldn't get a standard regular expression shorthand like \s, which matches all whitespace, to work. I could however get my custom specified regex pattern enclosed by forward slashes to work, ie literally slash space slash, '/ /'. I then had to follow it with a 'g' to indicate global, which replaces all the spaces, not just the first as is the default otherwise.

So now I have to keep track of key and key2, because I still need the original with the space for display and to locate the file names. Anyway it works!

'head ornaments' and 'performance costumes' lists of item titles of that object type
However there were still a couple of object types that had parentheses in their name and so had the same problem with the id attribute. Writing a regex for parentheses was hard! Again I had to escape using the backslash, but I couldn't get it to work all in the one replace() function listing all characters in my regex pattern to replace at once as tutorials seemed to indicate was possible. I eventually through trial got it to work one character at a time with three subsequent replace() functions - one for the space, and one each for the opening and closing parentheses.
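For comparison, a sketch of how the three replacements could be folded into one call using a character class. The function name toIdKey, the underscore replacement and the "id_" prefix are assumptions, not what the project used.

```javascript
// Turn an object type name into something safe to use as an id attribute.
function toIdKey(objectType) {
  // One character class covers whitespace and both parentheses; inside
  // square brackets the parentheses need no backslash. The g flag replaces
  // every match rather than only the first.
  var key2 = objectType.replace(/[\s()]/g, "_");
  // Prefix names that start with a digit, since the id must not start with one.
  if (/^[0-9]/.test(key2)) {
    key2 = "id_" + key2;
  }
  return key2;
}

console.log(toIdKey("head ornaments"));
console.log(toIdKey("bags (hand)")); // made-up object type with parentheses
```

The original key still has to be kept for display and for file names, and two different names could in principle collapse to the same id, so this is only a starting point.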

Once this was all working, I was also able to implement a few tidy ups.

I am now using the <span> tag rather than the <a> anchor tag, so that it doesn't refresh and lose my place on the page. With CSS cursor and hover properties I am still able to get the <span> to appear as a link.

I was also able to use the JQuery $.empty() to empty my container div that holds the list of items. This, coupled with a custom data-display = "on/off" attribute to act as a switch, means that clicking on an object type a second time hides the list of items rather than drawing them twice. Neat.

Finally I plugged back in the images, put the item titles (truncated) in my place markers where there are no images, and linked them back to the NMA online catalogue. It is starting to come together, and look neat. Hooray!

'paintings' and 'bowls', items displayed with images if available else titles

Tuesday, November 15, 2011

NMA project update - drawing with images

I have been experimenting with how to draw and layout content - the image grid particularly. Here are a few of my iterations.

I was excited to try some html 5, so I dove right in and played with <canvas>. After figuring out how to get the canvas and make a context in it to draw to, I was able to layout the grid much as I would in Processing. As I ran through my loop of items I kept track of their x and y positions, and I could easily draw rectangles for items without images - canvas was intended for drawing.
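A small sketch of the grid arithmetic, with the canvas calls left out so it runs anywhere. It works the position out from the item index rather than keeping running x and y totals; the function name, cell size and gap are assumptions.

```javascript
// Work out the top-left corner of each cell in a grid that wraps at the canvas width.
function gridPositions(count, canvasWidth, cellSize, gap) {
  var step = cellSize + gap;
  // Number of whole cells that fit across the canvas (at least one).
  var columns = Math.max(1, Math.floor((canvasWidth + gap) / step));
  var positions = [];
  for (var i = 0; i < count; i++) {
    positions.push({
      x: (i % columns) * step,
      y: Math.floor(i / columns) * step
    });
  }
  return positions;
}

var positions = gridPositions(5, 100, 40, 10);
console.log(positions);
// In the browser each placeholder could then be drawn with the 2D context,
// for example context.fillRect(p.x, p.y, 40, 40) for each position p.
```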

Having smaller squares as place holders for items without images works pretty well to make the visualisation more compact and get more images (= interesting) on screen at once.

Object type sets drawn with an absolute position in a <canvas>

Working with canvas I even was able to make it scalable - by calling the draw function with both window.onload and window.onresize events, and then setting the canvas size to window.innerWidth and window.innerHeight.

However the problem with this approach is that everything that is drawn is not an html object, and therefore is static / can't be interacted with. In fact to have mouseclick or mouseover events would require separately keeping track of the mouse position, as you would in Processing. This is madness when html objects already have mouse interaction natively built in.

So next I went back to appending html using the JQuery $.append(string). Here I threw back all the images with absolute position, set again by keeping track of x and y positions as I looped through the items. The images had a straight html link also.

Object type sets (items with images only) drawn with absolute position as html <img>

This was ok, but clearing the old images, running through all the data and recalculating x and y positions each time the window was resized made it slow and clunky (and only working with 800 of the 48,000 items). Also I was appending html after collecting it for an entire object type set - it would be nice (and easy) to append after looping past each item so that on a slow load you would be able to see the visualisation being populated.

A better way I quickly discovered was to use the document flow. The trick that had eluded me was an object display property 'inline-block', which allowed me to create objects (I started using the generic <div>) that had fixed dimensions and that followed one after another inline - ie all my earlier attempts with the document flow had resulted in each list item having a new line. One note - it appears, like canvas, to not be supported everywhere. To get this to work I had to upgrade to the latest versions of Internet Explorer and Mozilla Firefox.

Object types sets drawn in the document flow as fixed dimension divs, inline-block

Looks good. The fixed dimension divs without images simply have a background colour. This actually is a very straight forward way to draw rectangles.

I still however had one small hiccup. The mouse interaction obviously doesn't happen at the same time that the html is being written in the loop. I had to find a way to get the object to remember its reference irn so that it could link to other stuff. I did this by writing, at the time of the initial append, an onclick to call a clickFunction with the particular irn already hardwired to pass in - ie onclick='clickFunction(" + item.irn + ")'.

An alert demonstrating that an item's reference irn can be recalled and that therefore interaction is possible

I also made the object id be the irn so that it can be called from anywhere - this should allow me to highlight objects in the future! Apparently there is a convention for data attributes, ie storing data as a property rather than content - this seems counter intuitive, but perhaps it is a useful way to attach data but keep it hidden.

That's all for now.

Sunday, October 23, 2011

NMA project update - getting data organised, again

Ok so now that I am on the web, I have to get data organised (again). Here are the results of some of my playing.

In this first sketch I ran through a for loop of the collection to build an associative array where the key is the object type and the value is a count of how many items there are of that type. I don't think I fully understand JavaScript associative arrays yet - I have been thinking of them like hash maps in Processing, but I think that really they are just a normal object (and that even a singular object is the first item in an array), and that keys are not keys but actually object properties. When testing if my associative array already contains a particular object type I use objectTypeList.hasOwnProperty(object_type). When getting items I can call them either with bracket notation, objectTypeList[object_type], or with dot notation when the property name is written out literally, but they don't have an index and it appears I can't get their length - to run through a for loop I can use for(key in objectTypeList).
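A minimal sketch of that counting loop. It assumes each item has an object_type field, and the sample items are made up.

```javascript
// Build an object whose property names are object types and whose values are counts.
function countObjectTypes(items) {
  var objectTypeList = {};
  for (var i = 0; i < items.length; i++) {
    var objectType = items[i].object_type;
    if (objectTypeList.hasOwnProperty(objectType)) {
      objectTypeList[objectType] += 1; // seen before, so add one
    } else {
      objectTypeList[objectType] = 1; // first item of this type
    }
  }
  return objectTypeList;
}

var typeCounts = countObjectTypes([
  { title: "Blue bowl", object_type: "bowls" },
  { title: "Landscape", object_type: "paintings" },
  { title: "Red bowl", object_type: "bowls" }
]);

// A plain object has no length, so loop over its property names instead.
for (var key in typeCounts) {
  console.log(key + ": " + typeCounts[key]);
}
```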

To display, I am adding each item to an unordered html list with $.append, which I am also using to format the object type as bold. It would be better to put the object type and the count in different html tags with unique classes so that the formatting can be done separately with CSS.  The yellow background is however thanks to CSS.

A list of object types and count of items of that type

If I wanted to get all the keys out as an array I could do so with Object.keys(), however this appears to only have support in the very newest browsers. For now I will run through a for loop and get each individually. This leaves me wondering if I am missing a better approach to organising my data?

A list of object types and titles of each item of that type
Breakthrough! Yay! This second sketch here is organised. I have built lists of each object type, adding the items of that type - no need for custom classes, the items appear good to go straight from the JSON as JavaScript objects. Displaying all the titles is proof that I can access the individual items. I am in control! Now the count is the length of the lists.

Problem - handling undefined keys. Don't know how to skip. It appears all items have a title and object type recorded, but there is variable use of most other parameters. Something to come back to..

Next step in getting organised. Sorting. JavaScript .sort() worked nicely on this list of keys, which I extracted as described above. The sort function defaults to sorting alphabetically for a list of strings. To sort numerically I had to write a simple comparator function that compared the item list lengths. Once I have a sorted list of keys I can loop through it and, using the key, still access the individual items.
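The two sorts might look something like this sketch, where lists is a made-up object mapping each object type to its array of items and sortedKeys is a hypothetical helper name.

```javascript
// Pull the keys out of the object, then sort them two ways.
function sortedKeys(lists) {
  var keys = [];
  for (var key in lists) {
    if (lists.hasOwnProperty(key)) {
      keys.push(key);
    }
  }
  // With no comparator, sort() orders strings alphabetically.
  var alphabetical = keys.slice().sort();
  // The comparator puts the longest item list first (count descending).
  var byCount = keys.slice().sort(function (a, b) {
    return lists[b].length - lists[a].length;
  });
  return { alphabetical: alphabetical, byCount: byCount };
}

var lists = {
  paintings: [{ title: "Landscape" }],
  bowls: [{ title: "Blue bowl" }, { title: "Red bowl" }, { title: "Green bowl" }],
  "motor cars": [{ title: "Sedan" }, { title: "Coupe" }]
};
var result = sortedKeys(lists);
console.log(result.alphabetical);
console.log(result.byCount);
```

Because only the keys are sorted, the items themselves stay where they are and can still be reached with lists[key].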

A list of object types sorted alphabetically
A list of object types sorted by count descending of items of that type
And finally I was able to load images in, simply using the <img> html tag and the $.append. What a relief! Now I have all the basic ingredients to make some sort of browser for the NMA collection.

A list of object types and images of each item of that type if available
Actually it turns out the problem above about identifying undefined properties/keys was very simple to resolve with an if(key == undefined).

A list of object types and images of each item of that type if available with a count of available images
And just to tidy up my weekend of getting data organised, I was able to make each image a link to the corresponding item record in the NMA online catalogue. Quite satisfying!

Item record in the NMA online catalogue linked from my list of object types
A big thanks to Mitchell for getting me started with some of his sketches.

My next steps will be to try working with the full dataset (I have only been using the first 800 items here), and to draw only what is on screen so there are not huge numbers of images flying around (see if I can get ajax to work now?). Then I will need to learn to draw so that I can get more sophisticated formatting and so that I can design some analytical visualisations (charts, graphs etc). For these I could try Processing.js or Raphael or D3.js which supersedes Protovis.

Thursday, October 20, 2011

NMA project update - how do I get on the web?

Big scary hurdle. Don't know where to get started. Moving out of my comfortable Processing world. Have to learn many new things at once...

First step I thought would be to get my data into the browser. I thought I would need to make a database and then an API to call to it. My website back of house allowed MySQL databases and had phpMyAdmin installed to manage. Ok so I know it isn't standard to develop and test online, but I didn't/don't want to learn how to set up a local server, at least just yet, on top of everything else that is new. So I looked at phpMyAdmin - I can upload XML, but not JSON, and only files less than 100mb. So the very first thing to do is export a clean small XML version of the data. Stuck already. Stayed stuck for days.

I tried to write XML with proXML, a library for Processing. But I couldn't figure out how to actually put any data content in the XML elements. I could make elements. I could give them attributes and add children. I could check if elements had data content (text) and get that text. Seems such a basic thing. And the documentation for the library was otherwise good. I tried lots of ways that I made up myself to add data, but could only write elements that were not correctly formed. I couldn't find any help on forums either.

Then I tried to write XML using Java StAX. This library required you to explicitly code opening and closing tags and start document etc. Writing to a stream and remembering to flush was ok. My output used Java FileWriter, which I thought should work. But it didn't! I kept getting error: access denied. Why? I couldn't figure it out - for ages. Turns out, after investigation prompted by Mitchell, that the Java file path name wasn't relative and so it was trying to write at my top level C: drive! Problem fixed. Exported clean XML.

However at Mitchell's suggestion I decided to instead change approach and, at least initially, try to directly load JSON into the browser and work with it. Hopefully the files won't be too big - usually in web you would (with an API calling to a database) only load the bits you actually needed at any particular time. Mitchell kindly gave me some sketches to hack to get started.

So working with JQuery I made my first JavaScript sketches, which selected html elements and changed their formatting or added content. Yay, achieved something! Next I tried to load some data - but was badly stuck again. I couldn't get $.ajax({ url: "dataURL" }).responseText; to work, nor $.getJSON. I eventually was able to write some JSON elements in the html file and work with these, but I still couldn't get JSON to load from a file. In fact I was just about to give up after most of a day of trying different combinations of $.ajax, $.getJSON, JSON.parse(data) and even eval() which I understood to be a big no no because it didn't parse to check for valid JSON and so was a security threat. I had tested online and locally, neither worked.

I searched help forums to find why the functions were hanging or resulting in variables that were undefined. Then I realised that there were some syntax errors in the JSON data - they were just a bunch of objects floating, not separated by commas in an array. I fixed this, but it still wouldn't work! Searched some more but still couldn't find a solution.
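To illustrate the syntax problem with made-up data: a run of objects sitting side by side is not valid JSON, while the same objects separated by commas inside square brackets parse as an array. The tryParse helper is hypothetical.

```javascript
// Objects just floating one after another: not valid JSON.
var floating = '{"title": "Bowl"} {"title": "Painting"}';
// The same objects separated by a comma inside an array: valid JSON.
var wrapped = '[{"title": "Bowl"}, {"title": "Painting"}]';

// Return the parsed data, or null if the text is not valid JSON.
function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch (error) {
    return null;
  }
}

console.log(tryParse(floating)); // null, because parsing fails
console.log(tryParse(wrapped).length); // 2
```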

Finally - brainwave - try a different browser. Works!!!
Don't know why I didn't think to try this earlier. Browsers are notoriously fussy.

So nothing would work for me in Chrome. Don't know why. In Internet Explorer $.getJSON works, but I still couldn't get $.ajax to work. Don't know why!

Anyway here is my very very sweet hello world.

A list of item titles with their object type in brackets

Next, now that the data is in the browser, I will begin to play with it...

End frustration.

Wednesday, October 5, 2011

Getting data organised

My first task with the NMA project was to get started working with the data. Mitchell Whitelaw helpfully set us up with some example code.

Our data came in a verbose xml that was too big to keep in memory in Processing, so Mitchell showed us how to split the data in Processing and parse it into JSON format one line at a time, extracting only the data we needed. JSON is a lightweight format based on JavaScript that works well with Java (Processing).

Mitchell also demonstrated loading images from the collection (you can't load all at once - there are 20,000 in 3 different sizes!) and picking random objects to show, using a class for items. He also showed us hashmaps, which I first used with myTram - calling a key is much easier to work with than trying to remember an index position. The hashmap here contains arraylists of items organised by object type.

I used the hashmap to select a random object type to show all of the objects of that type in the collection. Clicking through random object types is not a bad way to start browsing. The data was indeed organised!

Showing an object type - motor cars, there are 11 in the NMA collection
Next I wanted to be able to sort the data, so that I could view it other than randomly. It was easy to sort an array alphabetically or numerically using the Processing sort array function, so I converted my arraylist of object types to an array, and hey presto I had a Ben Ennis Butler inspired histogram! It was indeed easy to scroll through object types and see how many of each there were.

Object type histogram, alphabetically sorted - advertising cards
Due to memory I only visualised the first 20 object types, but in the future I could have a more sophisticated way of not bothering with what was not on screen.

After this, however, I was stuck. I wanted to sort numerically by the number of items of each object type. I couldn't do this with arrays, because even if I extracted an array of all the counts and sorted this, there would be no way to synchronise it with any other lists.

The answer - to make another class for objtypes, and then to use comparators which instruct how to compare objects. In this case the comparator says when sorting an arraylist of object types to compare them based on the size of their corresponding arraylist of items.

I visualised this simply as a list for now. I would have to think about what to do visually with the scale difference between the most numerous couple of object types (6000, 3000, 2000) and the quick drop off (to a few hundred) and then a long tail (2, 1). Mitchell suggested something like a treemap that was compact.

List of most numerous object types - there are 6,000 mineral samples in the collection

List of some of the object types for which there are only 1 in the collection

I think that now I have the organisation to get started in making mockup visualisations in Processing - I still have to figure out how to translate to an online world. Hopefully I can experiment with the NMA API before building my own MySQL database.

Tuesday, October 4, 2011

Interactive word frequency cloud

Following the data visualisation unit, I was lucky enough to have the opportunity to work over summer as a research assistant for Andrew MacKenzie to develop a tool to explore survey responses from residents, architects and builders who had rebuilt in Duffy after the 2003 Canberra bushfires. The word cloud was built with supervision from Mitchell Whitelaw and is based on code he developed for the A1 Explorer.

Word frequency cloud (architects only, responses to all questions) with substantial control panel for filtering at right
Word frequency cloud with correlations to 'wanted' highlighted and all occurrences of 'wanted'  listed on right
The data can be filtered by response to particular questions, the category of respondent (resident who rebuilt, new resident, architect, builder etc) and individual respondent - so it is possible to see a cloud of everything or any subgroup of responses or an individual response. A list of standard 'stop' words and any words with less than 3 characters have been removed. Further words can be added to an exclusion list, by clicking, which is helpful to look beyond boring words or extremely frequent words that can obscure differentiation between less frequent words.

All of these filtering options end up in a large control panel, which took a bit of juggling to fit on screen. It may have been neater to hide it in drop down or pop up menus. However I think it was important to highlight the current view position within the entire data set.

Mousing over a word highlights corresponding words that occur in proximity and brings up a scrollable list of all occurrences of the highlighted word in fragmentary context of the five words pre and post it.
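A rough sketch of the counting and context steps, written in JavaScript for consistency with the later posts rather than as a copy of the original tool. The function names, the tokenising rule, the stop word list and the sample responses are all made up for illustration.

```javascript
// Split text into lowercase words (letters and apostrophes only).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Count words across responses, skipping stop words and words under 3 characters.
function wordFrequencies(responses, stopWords) {
  var counts = Object.create(null);
  responses.forEach(function (response) {
    tokenize(response).forEach(function (word) {
      if (word.length < 3 || stopWords.indexOf(word) !== -1) return;
      counts[word] = (counts[word] || 0) + 1;
    });
  });
  return counts;
}

// List each occurrence of a word with up to `span` words either side of it.
function occurrences(responses, target, span) {
  var fragments = [];
  responses.forEach(function (response) {
    var words = tokenize(response);
    words.forEach(function (word, i) {
      if (word === target) {
        fragments.push(words.slice(Math.max(0, i - span), i + span + 1).join(" "));
      }
    });
  });
  return fragments;
}

var responses = [
  "We wanted a bigger house and we wanted more light",
  "The garden was what I wanted"
];
var frequencies = wordFrequencies(responses, ["the", "and", "was", "what"]);
console.log(frequencies);
console.log(occurrences(responses, "wanted", 5));
```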

An appropriate way to understand and navigate data?

So this is another example of a show everything and zoom in visualisation. However the reason I posted it is primarily to make a brief observation about the appropriateness of visualisation techniques to understand/navigate data. A distinction between understanding and navigation is perhaps important.

In the case of Mitchell Whitelaw's A1 Explorer the word cloud visualises item titles in the National Archives A1 Series. Titles generally are specific and succinct, and considered. The A1 Explorer is a visualisation that reveals some of the topics and relationships in the series, but it is also an interface to the digitised items themselves.

Similarly a word cloud of a carefully crafted speech, such as Obama's inauguration speech, reveals succinctly some of the themes. It is probable that some speeches are written with word cloud analysis in mind. Political rhetoric noticeably employs frequently repeated, memorable, mantras. Of course, as Jodi Dean writes, a word cloud is in many ways a very superficial analysis that ignores sentences, stories and narratives.

A different example, designed specifically for visualisation as a word cloud, was curated by the ABC who, to mark Julia Gillard's first year as Prime Minister, called for the public to submit 3 words that characterise their perceptions of Gillard and also of opposition leader Tony Abbott. Not surprisingly the most frequently submitted words aligned closely with the rhetoric that had been most prominent in the media.

Even if visualising words by themselves is appropriate, a critical challenge for word clouds and like visualisation techniques is to be able to locate the small, hidden, items, because they are perhaps the most interesting or important. It might be that quantitative data analysis can only ever take us so far, and that curation is necessary to go beyond? However when it comes to big data, quantitative might be our only way in - a starting point for exploration.

Andrew MacKenzie has said that the word clouds were very helpful as a research tool and their revelations support his observations during and other analysis subsequent to the interviews. My feeling is that there was substantial noise because of the nature of the raw survey data. The responses were not carefully crafted like an Obama speech or considered even like a title or a 3 word perception of Gillard - they were spontaneous and people thought as they spoke. The word cloud doesn't distinguish initial response from more considered closing summary remark. It doesn't take account of rambles, tangents or emphasis placed on particular ideas. That said the quantitative analysis also ignores any bias the researcher might have had in looking for particular ideas.