
The World Free Web

Eric Ries has spotted an irony in the Slashdot effect: When a site is Slashdotted, the information on it becomes harder to reach not because it has become scarce, but because it is being copied to thousands upon thousands of other machines. Shouldn't this make it easier to get instead of harder? In today's editorial, Eric suggests a way to turn the situation around so popular information becomes instantly more accessible rather than less.

For those who don't already know, FreeNet is a fully distributed information storage system somewhat akin to Gnutella, except that it is anonymous and far more resilient to attack. The basic concept is that each node in the network replicates content that passes through it on the way to another node. This has many advantages, including the fact that popular information tends to propagate across the network and become more abundant over time. I won't go into all of the features of the FreeNet project here because you can read all about it on their Web page.

Compare FreeNet to the current WWW architecture. As information on a Web server becomes more popular, it becomes more difficult for users to access. Witness the impressive "Slashdot effect" which occurs when thousands of users suddenly overwhelm a Web server. The Slashdot effect is caused by the centralization of information. When information is centrally located, this location becomes a single point of failure for the distribution of that information. The irony of the Slashdot effect is that it is caused by users making thousands of copies of the relevant information. It's not as if the information has become scarce -- quite the opposite. Users ought to be able to share that information with each other, decentralizing it. [Editor's note: In fact, this already happens. What do you see almost immediately in the comments to a story that links to a movie trailer or the photos of Hemos's wedding? "Here's a mirror" and "Here's another". Making this more efficient by turning it into part of the system would actually just be the next step in an already-established practice.]

I propose that FreeNet be integrated into the Mozilla cache structure, allowing users to form a sort of "browsers' cooperative" in which pages are freely shared in a giant collective caching structure. (Of course, this should not be limited just to the Mozilla browser, but I think it's a good starting point.)

I have given the structure of this cooperative some thought, and I want to give an overview of how I think the system should work. First comes a technical overview, which details the relatively simple integration work that needs to be done. Second, I will give a more abstract view of how I think the social structure of such a cooperative ought to be formed. I have dubbed this system the World Free Web (or WFW, not to be confused with the WWF or WCW).

Technical Overview

When a user makes a request for a page using this new enhanced WFW browser, three things would happen simultaneously:

  1. The browser makes an HTTP request to the requisite server.
  2. The browser checks the on-disk and in-memory caches.
  3. The browser submits a FreeNet request to the local FreeNet node.

Now, whichever of these methods returns a valid result first is displayed to the user. The user need not have any knowledge of which method was used, although if they get outdated or garbage data, they can always hit "shift reload" which should force the browser to use method #1 to re-fetch the data. This is similar to the way many proxy servers work today. Behind the scenes, as each page is inserted into the browser cache, it is also inserted into the local FreeNet node. The entire process is transparent to the end user.
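
To make the race concrete, here is a minimal sketch in Python. The cache and FreeNet lookups are stubs I have invented purely for illustration; none of this is actual Mozilla or FreeNet code.

    import concurrent.futures
    import urllib.request

    def fetch_http(url):
        # Method 1: an ordinary HTTP request to the origin server.
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()

    def check_local_cache(url):
        # Method 2: stub standing in for the on-disk and in-memory browser caches.
        return None

    def fetch_freenet(url):
        # Method 3: stub standing in for a request to the local FreeNet node.
        return None

    def fetch_page(url):
        """Return (page, source) from whichever method produces a valid result first."""
        with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
            futures = {
                pool.submit(fetch_http, url): "http",
                pool.submit(check_local_cache, url): "cache",
                pool.submit(fetch_freenet, url): "freenet",
            }
            for fut in concurrent.futures.as_completed(futures):
                try:
                    page = fut.result()
                except Exception:
                    continue          # e.g. the origin server is down or unreachable
                if page:              # first valid (non-empty) result wins
                    # A real implementation would also insert the winning page into
                    # the browser cache and the local FreeNet node at this point.
                    return page, futures[fut]
        return None, None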

In the course of normal operation, this whole scheme will behave in a very similar way to normal Web browsing. There are really only two cases in which the FreeNet node would provide a page faster than the HTTP request:

  1. When the site in question is down or under heavy load.
  2. When there is high network lag between the user and the site.

Philosophically, this scheme has one primary benefit: Each user of the Web, even a non-techie type, becomes a contributor to the network infrastructure instead of simply a drain on resources. This is much like the old days of Usenet, when each person shared her news feed with others. More on this below.

Practically, there are many more extended benefits. For instance, one of the problems with distributed systems such as FreeNet is the lack of feedback or ratings on the quality of the information. The WFW can automatically provide reliable feedback on the validity of information. If a user hits the "shift reload" button after getting a page from FreeNet, that page is likely to be of suspect quality (either a bogus result or out of date). Large Web sites will no longer have a monopoly on the ability to handle a large number of users. This is just one example of the kinds of things users can do when they band together.
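
As a rough illustration of how such implicit feedback might be recorded, here is a small sketch; the data structure and threshold are assumptions of mine, not anything specified by FreeNet or the WFW.

    from collections import defaultdict

    # For each FreeNet key, count how often it was served and how often the user
    # immediately forced a reload afterwards (an implicit "this copy was bad" vote).
    freenet_feedback = defaultdict(lambda: {"hits": 0, "forced_reloads": 0})

    def record_hit(key):
        freenet_feedback[key]["hits"] += 1

    def record_forced_reload(key):
        freenet_feedback[key]["forced_reloads"] += 1

    def is_suspect(key, threshold=0.5):
        stats = freenet_feedback[key]
        return stats["hits"] > 0 and stats["forced_reloads"] / stats["hits"] >= threshold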

But it gets better. Once this starts to catch on, proxy servers and Web servers can be adapted to start participating in the system. Imagine a new HTTP response code that indicates that the server is too busy to handle your request right now, but that the data you want was just inserted into FreeNet with a given key. Small sites get to leverage the bandwidth and storage of their users to reduce costs.
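
A sketch of how a WFW-aware client might react to such a response follows; the status code and the X-Freenet-Key header are placeholders I have made up for illustration, not anything that exists in HTTP today.

    TOO_BUSY_TRY_FREENET = 529   # hypothetical status code

    def handle_response(status, headers, body, fetch_freenet):
        if status == TOO_BUSY_TRY_FREENET and "X-Freenet-Key" in headers:
            # The origin server is overloaded but points us at a FreeNet copy.
            return fetch_freenet(headers["X-Freenet-Key"])
        return body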

FreeNet itself benefits in a number of ways. Since many more people are using FreeNet just by using their browsers, the amount of information and overall storage capacity in FreeNet is increased by several orders of magnitude. All the virtues of FreeNet's design become stronger as the number of users increases. Having more nodes increases the overall resilience of the network to attack, and having nodes run transparently by "normal" users makes it harder to accuse FreeNet users of engaging in suspicious activity.

Social Organization

To be successful, I believe the WFW should work as a true grassroots user movement. Supporting this will require adding slightly to the underlying FreeNet protocol. However, a few things should be noted. First, WFW nodes could still act as normal FreeNet nodes, performing all the operations a typical FreeNet node would. The rules I am about to outline would only apply to WFW nodes talking to other WFW nodes, and need not apply when they are talking to normal FreeNet nodes.

First of all, one of the big problems with FreeNet as it currently stands is the bootstrapping process of finding out about other FreeNet nodes. Currently, FreeNet maintains an optional central repository of nodes which is available via the Web. This is not a great long-term solution, as it reintroduces centralization into a system that should be fully distributed. My proposal is that the WFW be a closed "club" structure. In order to join, you have to get an existing member to sponsor you. In many cases, this member could just be your ISP, but it does not need to be.

Each WFW node could have an ACL that keeps track of other nodes that the current node is willing to accept requests from. When a node is introduced into the system via a sponsorship, at first this node will only be allowed to make requests via the sponsoring node. The node will also handle requests through its parent node, but as it fulfills these requests (and hence, becomes more and more useful to the rest of the network), other nodes will start to accept direct connections from it. This produces the proper incentives to marginalize the effects of spammers. If you are going to sponsor people, your node will be the primary victim of any malicious activity they engage in, and you will be able to cut off their access if they do engage in such behavior. Only after a node has proved its utility to the rest of the network will it gradually be brought closer to the strongly-connected center, and if it starts to change its behavior, it will gradually be pushed out towards the periphery.
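
Here is a rough model of that sponsorship and ACL idea; the class, the method names, and the promotion threshold are my own sketch, not a specified protocol.

    class WFWNode:
        def __init__(self, node_id):
            self.node_id = node_id
            self.acl = set()        # nodes this node will accept requests from
            self.reputation = {}    # node id -> count of requests usefully served

        def sponsor(self, newcomer):
            # A sponsor is initially the only node willing to talk to the newcomer.
            self.acl.add(newcomer.node_id)

        def accepts_request_from(self, other_id):
            return other_id in self.acl

        def note_fulfilled_request(self, other_id):
            # As a node serves more requests through us, we eventually start
            # accepting direct connections from it (drawing it toward the core).
            self.reputation[other_id] = self.reputation.get(other_id, 0) + 1
            if self.reputation[other_id] >= 10:      # arbitrary threshold
                self.acl.add(other_id)

        def note_bad_behaviour(self, other_id):
            # Spam or garbage pushes the node back out towards the periphery.
            self.reputation[other_id] = 0
            self.acl.discard(other_id)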

Clearly, there is much more work to be done. The WFW is a first step towards accomplishing a more intelligent mainstream net architecture which recognizes that information cannot and should not be controlled by an elite few. But this is just an outline, a sketch of what's coming. I am hoping to get people involved in a development effort -- a few from the FreeNet team, a few from the Mozilla team at first -- but then there's plenty more work to be done. If this is to succeed, it will have to be a community effort. Consider this your official invitation. If you'd like to get involved, the project has a home at http://enzyme.sourceforge.net/WFW/.

While you're thinking about these issues, you might want to check out Professor David Gelernter's latest manifesto, The Second Coming.


Eric Ries (eries@CatalystRecruiting.com) is working on a BS in Computer Science and a BA in Philosophy at Yale University. He is currently CTO of the Internet startup company Catalyst Recruiting (http://www.CatalystRecruiting.com/) and its cousin, the Enzyme open-source project (http://enzyme.sourceforge.net/). His previous work experience ranges from Microsoft to the San Diego Supercomputer Center. He has been published on Java and other topics in both books and magazines. He was co-author of The Black Art of Java Game Programming, among others, and was the Games & Graphics editor for the Java Developer's Journal. His complete resume is available at http://i.am/EricRies/.


T-Shirts and Fame!

We're eager to find people interested in writing editorials on software-related topics. We're flexible on length, style, and topic, so long as you know what you're talking about and back up your opinions with facts. Anyone who writes an editorial gets a freshmeat t-shirt from ThinkGeek in addition to 15 minutes of fame. If you think you'd like to try your hand at it, let jeff.covey@freshmeat.net know what you'd like to write about.

Recent comments

14 Mar 2001 19:45 abo

Re: Bandwidth Problem

> One of the posters mentioned a problem
> with bandwidth... such as do you really
> want to download a 16MB file from some
> guy with a 28.8 connection? Well... I'm
> not a programmer, but I am an
> engineer... If you know who has the
> file, couldn't you simultaneously fetch
> portions of it from multiple sources?
> In this case, the more available
> sources, the easier -- your end -- of
> the pipe saturates... in this case a
> good thing.
>
> -- Phenym


Hmmm... interesting idea. I suspect that the overheads in co-ordinating the partial downloads from all over the place might kill the idea, but it's worth exploring...
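
For illustration, here is a rough sketch of that idea using HTTP Range requests; the chunk size and mirror handling are made up, and real servers may ignore Range and return the whole file.

    import concurrent.futures
    import urllib.request

    def fetch_range(url, start, end):
        req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return start, resp.read()

    def swarm_download(mirrors, total_size, chunk=1 << 20):
        # Split [0, total_size) into chunks and spread them across the mirrors.
        ranges = [(i, min(i + chunk, total_size) - 1) for i in range(0, total_size, chunk)]
        parts = {}
        with concurrent.futures.ThreadPoolExecutor(max_workers=len(mirrors)) as pool:
            futures = [
                pool.submit(fetch_range, mirrors[n % len(mirrors)], s, e)
                for n, (s, e) in enumerate(ranges)
            ]
            for fut in concurrent.futures.as_completed(futures):
                start, data = fut.result()
                parts[start] = data
        return b"".join(parts[s] for s in sorted(parts))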

23 Aug 2000 13:41 brlewis

AOLers already are using proxy caches
Those downplaying the usefulness of proxy caches erred in using AOLers as an example. AOL uses proxy caches now and has for a long time.


Dynamic content can be cached reasonably with an Expires or Last-Modified header. Cache-busting is the main reason for Squid and other proxy caches not being more popular than they are.

16 Aug 2000 14:02 phenym

Bandwidth Problem
One of the posters mentioned a problem with bandwidth... such as do you really want to download a 16MB file from some guy with a 28.8 connection? Well... I'm not a programmer, but I am an engineer... If you know who has the file, couldn't you simultaneously fetch portions of it from multiple sources? In this case, the more available sources, the easier -- your end -- of the pipe saturates... in this case a good thing.


-- Phenym

15 Aug 2000 03:15 abo

Standard Caching problems...
The problems of dynamic distributed caching are pretty clear:

security: how does a client know the data came from the desired source, and how does a source know that only the desired client(s) got it? I'd suggest PGP/SSL-type signatures and encryption. Be aware that data targeted for specific clients cannot be cached, except specifically for those recipients.

freshness: large amounts of (all?) content are dynamic to some degree. How do you avoid serving stale content while still caching? HTTP already supports object meta-data in the form of Expires headers, and also supports freshness-checking with If-Modified-Since (IMS) requests (a conditional GET is sketched below). The problem is, how do you know in advance by when the data might have changed (time to do an IMS request), or, even better, by when it should definitely have changed (no point in caching anymore)?

storage: with limited distributed storage space, how do you optimise hit rates? This requires optimising the distribution of data and determining what data to actually cache. Throwing away suspected stale stuff will free up space, but might miss out on IMS hits. Keeping stuff that is fresh is a waste if no one is going to request it.

bandwidth: huge amounts of storage are useless without the bandwidth needed to get at it. How do you use that bandwidth effectively? Do you really want someone on the other side of the globe to be fetching 16M objects from your little part of the distributed cache over your 28.8k modem? Would they even want to? This is possibly the biggest problem with distributed, client-based caches; servers have all the bandwidth, so why fetch from a low-bandwidth cache? Putting large caches at high-bandwidth distribution points is nearly always more effective than having low-bandwidth clients share caches.

These requirements all interact and contradict each other. It's a big juggling act. I'd have to say HTTP is pretty comprehensive in its attempts at ensuring at least freshness (if very ad hoc in design); unfortunately, not everyone is using it to its full capability.
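
To make the freshness checking above concrete, here is a minimal conditional GET using If-Modified-Since; a 304 response means the cached copy is still fresh. This is just an illustration of the existing HTTP mechanism, not new machinery.

    import urllib.error
    import urllib.request

    def revalidate(url, last_modified):
        # Ask the server for the object only if it changed since `last_modified`.
        req = urllib.request.Request(url, headers={"If-Modified-Since": last_modified})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.read()        # object changed: a fresh copy was returned
        except urllib.error.HTTPError as err:
            if err.code == 304:
                return None               # 304 Not Modified: keep using the cached copy
            raise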

The most exciting solution for bandwidth I've seen is rproxy, which uses the rsync algorithm for doing delta-based updates to stale cached objects. This could replace IMS fetches entirely, and reduce bandwidth significantly on misses, even for totally dynamic content.

Freenet is a cool idea, but I see its primary aim as dynamically distributing data to protect it from censorship. Efficient distribution of data is one of the technical problems that must be overcome to meet that aim. I'd be quite surprised (and pleased) if they manage to do this so effectively that it becomes a general solution to data distribution problems outside their primary objective.

01 Aug 2000 05:07 vinci

META information
It seems to me that the problem can be fixed with HTTP and some XML tags (or HTML META tags). What is needed is the information:
What mirrors exist
Which mirrors should be used


On the other hand, there is the problem of what the browser does with the information. We had Mosaic, which interpreted the LINK element; others did not. The best way would be to develop a tag set that could also be integrated into XHTML. I suppose there is already one that makes it possible to decide what to do next after the header of a document is retrieved. It is also possible to distinguish between stable and dynamic parts of a page (maybe only the ads are dynamic). I don't think we need more new software; we need widely accepted protocols.
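
As a sketch of that markup idea, a page could declare its mirrors in LINK elements and a client could collect them; the rel value "mirror" here is hypothetical, not a registered link relation.

    from html.parser import HTMLParser

    class MirrorCollector(HTMLParser):
        # Collect href values from <link rel="mirror" href="..."> elements.
        def __init__(self):
            super().__init__()
            self.mirrors = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "link" and attrs.get("rel") == "mirror" and "href" in attrs:
                self.mirrors.append(attrs["href"])

    collector = MirrorCollector()
    collector.feed('<head><link rel="mirror" href="http://mirror.example/page.html"></head>')
    print(collector.mirrors)   # ['http://mirror.example/page.html']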

I like the description of the browser side in the article (like the "What's Related" feature), but I think buttons and menus should be more flexible (functions should be loadable dynamically, not built in!). I can see the classical Web page coming to an end. Why shouldn't every Web page have a pull-down menu for its contents? If an index is recognized by the browser, it could display it however it or the user wants. This would result in compatibility with every browser type (even Lynx).

XML was invented so that new features and elements would not have to be implemented every time someone has an idea. It is time for the WWW to support these ideas more effectively. And this will also mean: not limited to browsers at all!
