Archive for February, 2006

Monthly Archive

Daniel Eatock, the designer of the ubiquitous Big Brother Eye logo, posts weekly photographic “conceptual observations” — interesting patterns and alignments of objects. Very interesting. (via)

So we’ve been thrashing out some ideas for our Distributed Systems practical, which is insanely tight on time.

The challenge is to build a distributed file system across the computers in the fourth year lab. The primary goal is robustness, but we’re also striving to make it as fast as possible. It’ll be tested by adding some text files to the system on some computer, allowing them to disseminate for a while, and then stressing the system through a combination of Steve flooding it with traffic and Paddy fervidly unplugging random machines from the network. We will presumably be assessed on the availability of the original files once the dust settles.

Here are my own, possibly naïve plans for how this is going to go down. These ideas are a distillation of some heated exchanges we had on Monday. They’re subject to change as I become privy to new information, man.

* The first task is to discover the other machines on the network. It has been suggested that Apple’s Bonjour networking system is ideal for dealing with “dynamic node membership”. Each machine will cultivate a peer list, and add or delete machines as they join, or are forcefully extricated, from the network.

* Each file will have to be broken up into chunks. The spec says they have to be split into 1KB chunks. Each of these will need to be packaged as an object, and be tagged with relevant metadata so we can tell which file it was originally part of. We then serialise the object for transfer, and deserialise when it’s received. This adds a considerable amount of overhead to the system, but it’s necessary as far as I can see. Some of the metadata properties each fragment needs are the source file’s original filename1 and the part of the file it corresponds to (more on this in a moment).

* Each file is first slurped into a ByteArray. This means we can send various slices of the file to other machines on the network. Each slice (being the packaged 1KB chunks) will also have to store which part of the file they contain, which is simply their index into the file ByteArray. When reassembling the file, we first build up an empty ByteArray of the same size. We then send a call out to the other machines on the network, and they send back the fragments of the file we want. The full array is then folded back into a usable file.

* Each computer on the network will need to build up and maintain a listing of all the files — and fragments of files — that it currently stores (and yes, this means the hard drives on all the computers will fill up inexorably; this is the price we pay for true robustness). Each file is to be addressable by filename and by the hash of its contents. For this we need a HashMap with two keys for every value. The value they both point to is of course a key into another Map somewhere else, to avoid dangerous duplication.

* This index table will also store a record of the original file size and hash value of the full files2. It does not need to store any reference to the data stored on any of the other computers.

* When a search request for a file (by filename or hash value) is received, the machine running our system (the “local machine”) first searches its own index to see if it has the file on its own hard drive, or has replicated parts of the file itself. If the full file hasn’t been found, it searches over k nearest neighbouring machines from its peer list, which are hopefully ordered by their ping.

* Yes, this may mean having to search every machine on the network in pursuit of a single missing kilobyte of the file. Creating a partly-centralised system like distributed hash tables has too many problems of their own. Primary among them being the fact that the hash table itself needs to be distributed and replicated, which is the problem we’re trying to solve in the first place. *-) We still need to look a little deeper into this aspect.

* The key to this all working is that when we replicate a file, or part of a file, to another machine, we also send its corresponding record in the index table. This means that any computer that has part of a file also has a record of that in an easily-searchable format. So, when the search request propagates out to one of these remote nodes, it can very quickly ascertain whether it has part of the requested file or not, and return yay or nay.

* This discussion doesn’t attempt to guess at the “magic number” of nodes we should replicate file fragments to at a time. Logic would dictate that we should try to have every fragment of every file residing on at least three computers at a time if robustness is going to be maintained. There is likely some number to be deduced, based on the number of nodes in the network, the amount of files and amount of free space available, but we’re unlikely to work this out without first implementing the system.

  1. Instead of storing the filename as text, it would be a nice optimisation to store a reference to the file’s record in the index table instead. ?
  2. This can be easily extended so that the index table holds additional metadata about each file, like singer and album details for music files. These would then also be made searchable. ?

Update 2006/05/18: We successfully completed the assignment. Here’s my writeup comparing and contrasting current peer-to-peer systems.

[PDF] Peer-to-Peer Systems Individual Research Report prepared for completion of Comp 4.14: Distributed Systems

Reminiscent of Blur’s charming video for Coffee and TV, “Los Angeles Let’s Be Friends” is a terrific little music video, full of colour and visual bombast. I wonder how many “augmented reality” visualisations have added in elements with such vibrancy and life? (via)

Sure, the SRG is a fantastic place to come to work in the morning, but there are still a few small changes we could put into operation to make it all the better.

This paper describes visualising the process of looking after a medical patient, using multiple, simultaneous, tightly-coupled views.

Supporting Protocol-Based Care in Medecine via Multiple Coordinated Views,” by Wolfgang Aigner and Silvia Miksch. At CMV ’04.

 author = {Wolfgang Aigner and Silvia Miksch},
 title = {Supporting Protocol-Based Care in Medicine 
   via Multiple Coordinated Views},
 booktitle = {CMV '04: Proceedings of the Second International 
   Conference on Coordinated \& Multiple Views in Exploratory 
   Visualization (CMV'04)},
 year = {2004},
 isbn = {0-7695-2179-7},
 pages = {118--129},
 doi = {},
 publisher = {IEEE Computer Society},
 address = {Washington, DC, USA},

Though the middle sections of this paper do not have much specifically to do with my own field of visualisation, they do contain a lot of good passages on visualisation techniques. Visualising time-orientated data is particularly interesting. They refer to the somewhat misleadingly named “fisheye view”, in which the focused element is enlarged and centred, and all other elements are distorted by shrinking and moving them outwards, which is a nice way to better make use of display space.

There is also a good introduction to using multiple simultaneous views, each focusing on different aspects of the data. They go about this by splitting the display area into a Logical View and a Temporal View. This approach could also be used to visualise sensor data — in effect explicitly denoting the scientific and information visualisation aspects.

There is also a section on setting up a user study to ascertain the needs, general practices and expectations of seasoned medical professionals. This is relevant as I may well have to set up some testing time with domain experts to test if our new visualisation environment is indeed more useful for collaboration than the way things are currently done.

The characteristics that the doctors and other staff identified as being important in such a system were non-surprising: intuitive interactions, a clean interface and flexibility were all mentioned. Designing for domain experts also raises a new point: the system will have to include a means for data input, which is a part of the UI that I hadn’t given much thought before.