- Currently Listening to:
- Metallica — Orion
So we’ve been thrashing out some ideas for our Distributed Systems practical, which is insanely tight on time.
The challenge is to build a distributed file system across the computers in the fourth year lab. The primary goal is robustness, but we’re also striving to make it as fast as possible. It’ll be tested by adding some text files to the system on some computer, allowing them to disseminate for a while, and then stressing the system through a combination of Steve flooding it with traffic and Paddy fervidly unplugging random machines from the network. We will presumably be assessed on the availability of the original files once the dust settles.
Here are my own, possibly naïve plans for how this is going to go down. These ideas are a distillation of some heated exchanges we had on Monday. They’re subject to change as I become privy to new information, man.
* The first task is to discover the other machines on the network. It has been suggested that Apple’s Bonjour networking system is ideal for dealing with “dynamic node membership”. Each machine will cultivate a peer list, and add or delete machines as they join, or are forcefully extricated, from the network.
* Each file will have to be broken up into chunks. The spec says they have to be split into 1KB chunks. Each of these will need to be packaged as an object, and be tagged with relevant metadata so we can tell which file it was originally part of. We then serialise the object for transfer, and deserialise when it’s received. This adds a considerable amount of overhead to the system, but it’s necessary as far as I can see. Some of the metadata properties each fragment needs are the source file’s original filename1 and the part of the file it corresponds to (more on this in a moment).
* Each file is first slurped into a ByteArray. This means we can send various slices of the file to other machines on the network. Each slice (being the packaged 1KB chunks) will also have to store which part of the file they contain, which is simply their index into the file ByteArray. When reassembling the file, we first build up an empty ByteArray of the same size. We then send a call out to the other machines on the network, and they send back the fragments of the file we want. The full array is then folded back into a usable file.
* Each computer on the network will need to build up and maintain a listing of all the files — and fragments of files — that it currently stores (and yes, this means the hard drives on all the computers will fill up inexorably; this is the price we pay for true robustness). Each file is to be addressable by filename and by the hash of its contents. For this we need a HashMap with two keys for every value. The value they both point to is of course a key into another Map somewhere else, to avoid dangerous duplication.
* This index table will also store a record of the original file size and hash value of the full files2. It does not need to store any reference to the data stored on any of the other computers.
* When a search request for a file (by filename or hash value) is received, the machine running our system (the “local machine”) first searches its own index to see if it has the file on its own hard drive, or has replicated parts of the file itself. If the full file hasn’t been found, it searches over k nearest neighbouring machines from its peer list, which are hopefully ordered by their ping.
* Yes, this may mean having to search every machine on the network in pursuit of a single missing kilobyte of the file. Creating a partly-centralised system like distributed hash tables has too many problems of their own. Primary among them being the fact that the hash table itself needs to be distributed and replicated, which is the problem we’re trying to solve in the first place. *-) We still need to look a little deeper into this aspect.
* The key to this all working is that when we replicate a file, or part of a file, to another machine, we also send its corresponding record in the index table. This means that any computer that has part of a file also has a record of that in an easily-searchable format. So, when the search request propagates out to one of these remote nodes, it can very quickly ascertain whether it has part of the requested file or not, and return yay or nay.
* This discussion doesn’t attempt to guess at the “magic number” of nodes we should replicate file fragments to at a time. Logic would dictate that we should try to have every fragment of every file residing on at least three computers at a time if robustness is going to be maintained. There is likely some number to be deduced, based on the number of nodes in the network, the amount of files and amount of free space available, but we’re unlikely to work this out without first implementing the system.
- Instead of storing the filename as text, it would be a nice optimisation to store a reference to the file’s record in the index table instead. ?
- This can be easily extended so that the index table holds additional metadata about each file, like singer and album details for music files. These would then also be made searchable. ?
Update 2006/05/18: We successfully completed the assignment. Here’s my writeup comparing and contrasting current peer-to-peer systems.
Peer-to-Peer Systems Individual Research Report prepared for completion of Comp 4.14: Distributed Systems
- WebSockets in Firefox Mozilla Hacks – the Web developer blog
This is actually the primary value in WebSockets. The key thing is that the cost of sending a small amount of data is also very small. It’s possible to do bi-directional communication today with comet-style applications, but small amounts of data often require a huge amount of overhead to do so.
A second-hand story to give you a sense of scale: Google Wave, which tries to do real-time communication with keystrokes has a several-kilobyte overhead for just about every keystroke because of the TCP startup, teardown and HTTP message headers involved with what should be only a few bytes sent over the wire.
I haven’t tried, but I’m going to guess that if you built Quake on top of HTTP comet that the interactive experience would be poor. So this is where WebSockets really shine.
- TheBrain.com - Welcome to TheBrain
- It's simpler and cheaper to just join the dots - The Irish Times - Thu, May 27, 2010
THE PEOPLE of Nantes must have been thrilled in 2004 when Time magazine described it as “the most livable city in all of Europe”. And one of the main reasons was that it has a first-class public transport system, something Dublin doesn’t have and is unlikely to get any time soon.
Dublin has the unique distinction of being the only city to have two free-standing light rail lines that don’t link up
- Thoughts on OnLive - Wolfire Games Blog
When it was first announced, it was met with incredulity from pretty much everyone, especially from game journalists and game developers. Most stories about it inevitably focused on how unbelievable it is and how it's at least several years too early -- the infrastructure couldn't possibly support it. OnLive's heavy optimism combined with their policy of secrecy definitely gave off the fumes of vaporware. If their product was so great, why not let it speak for itself?
Well, today the veil and all media embargoes have been lifted. I have been part of the beta program, and have been playing it on my MacBook Pro and PC pretty extensively, so I thought I'd add my two cents. First of all, let me start with the most important point: this is the real deal. It's not perfect, but at least for beta users in San Francisco, it works pretty well.
- Geographic routing - Wikipedia, the free encyclopedia
Greedy forwarding can lead into a dead end, where there is no neighbor closer to the destination. Then, face routing helps to recover from that situation and find a path to another node, where greedy forwarding can be resumed. A recovery strategy such as face routing is necessary to assure that a message can be delivered to the destination. The combination of greedy forwarding and face routing was first proposed in 1999 under the name GFG (Greedy-Face-Greedy) [5]. It guarantees delivery in the so-called unit disk graph network model. Various variants, which were proposed later, also for non-unit disk graphs, are based on the principles of GFG [
- Border Gateway Protocol - Wikipedia, the free encyclopedia
The Border Gateway Protocol (BGP) is the core routing protocol of the Internet. It maintains a table of IP networks or 'prefixes' which designate network reachability among autonomous systems (AS). It is described as a path vector protocol. BGP does not use traditional Interior Gateway Protocol (IGP) metrics, but makes routing decisions based on path, network policies and/or rulesets.
- CiteSpace: visualizing patterns and trends in scientific literature
CiteSpace focuses on finding critical points in the development of a field or a domain, especially intellectual turning points and pivotal points. CiteSpace provides various functions to facilitate the understanding and interpretation of network patterns and historical patterns, including identifying the fast-growth topical areas, finding citation hotspots in the land of publications, decomposing a network into clusters, automatic labeling clusters with terms from citing articles, geospatial patterns of collaboration, and unique areas of international collaboration.
CiteSpace supports structural and temporal analyses of a variety of networks derived from scientific publications, including collaboration networks, author co-citation networks, and document co-citation networks. It also supports networks of hybrid node types such as terms, institutions, and countries, and hybrid link types such as co-citation, co-occurrence, and directed citing links.
- Zuckerberg: “We Are Building A Web Where The Default Is Social”
Today at Facebook’s F8 conference), Mark Zuckerberg laid out his plan to turn the Web into “instantly social experiences.” The building blocks to this super-social Web are Facebook’s new Open Graph and Social Plugins, which include new “like” buttons everywhere on sites outside Facebook.com, auto-login capabilities for those sites without clicking on Facebook Connect, and even a Facebook social bar which includes several of these plugins plus Facebook chat (goodbye, Meebo).
We’ve reported on all of these new features before, but today Zuckerberg put them into context: “we are building a Web where the default is social.” How is Facebook doing this? First and foremost, Facebook has redesigned its Graph API for developers so that not only can they see the social connections between people, but they can also see and create the connections people have with their interests—things, places, brands, and other sites. Zuckerberg calls it the Open Graph (as opposed to the Social Graph). It is rea
- iPad/iPhone OS 3.2 Stops Renewing DHCP Lease, Keeps Using IP Address
- Evil bit - Wikipedia, the free encyclopedia
The evil bit is a fictional IPv4 packet header field proposed in RFC 3514, a humorous April Fools' Day RFC from 2003 authored by Steve Bellovin. The RFC recommended that the last remaining unused bit in the IPv4 packet header be used to indicate whether a packet had been sent with malicious intent, thus making computer security engineering an easy problem.