The Problem With Mirrors

Mirrors are extremely useful when used to their full potential, but this rarely happens. The problem is not with mirrors themselves but with the way we use them. I want average users, who don't (and shouldn't need to) know many technical details, to be able to make the best use of mirrors automatically.

As fiber to the home (15-30 megabit speeds) and cable/DSL (1-6 megabit speeds) become more common, some servers have trouble filling a user's download pipe. One way to increase performance is to download from multiple sources at once. This is mainly useful for large files.

Mirrors are confusing to an inexperienced Web user. The Fedora Project has 110 mirror sites in North America alone. Which do you choose? Which has all the files you want? Which is quickest?

In this case, not all mirrors carry all files. Some might not have all large ISOs (the Fedora Core 4 DVD image is around 2.5 gigabytes), or might only carry a subset of files (some kernel.org mirrors only have .tar.gz or .bz2 files, some have both). Or they might just be out of sync. That means you have to navigate through them to find out if they really have the file you need.

This is basically a usability problem. With some downloads, complications arise from users needing to select their Operating System, language, and location. I hope to make things easier.

Mirrors are great. We need to keep using them, but we need a better, more automatic way to use them. Peer-to-Peer (P2P) in general and BitTorrent specifically are amazing. They make it so individuals can share their bandwidth and distribute files that would otherwise cost too much through traditional server-to-client downloads.

But P2P and regular hyperlinks are not that reliable. A hyperlink is a single link to a file. If that file is gone or moved, or the server is temporarily down, that's it: 404 error. You can search by filename, but there is no unique identifier with which to find that file again on the Web. P2P sharing is ephemeral. Most files are not available constantly or for the long term. I'm sure everyone has found a .torrent they really want, but that no one is sharing any more. A BitTorrent download cannot complete unless every piece of the file is available from someone in the swarm; in practice, it will sit at 99.9% until a seed (someone with the full file) starts sharing. There is no fallback plan.

I have been working on a file format called Metalink that bundles the various methods of downloading a file (P2P, HTTP, FTP) in order to improve usability, performance, reliability, and efficiency over a single P2P method or a regular hyperlink. One of the main goals is to make the download process simpler for the end user. I hope Free and Open Source software projects will find this format useful.

Performance is increased because you download from multiple sources at the same time. Reliability is greater because there are multiple avenues or alternate locations from which to get a file. Hyperlinks have a single point of failure. Metalinks do not; every source would have to fail at the same time for a file to be unavailable. And it is more efficient because it spreads the load more evenly across multiple sources (P2P or Web/FTP servers) by segmenting (a.k.a. multi-threading or accelerating) downloads. That means that different portions of each file are downloaded from different servers.
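To make the idea of segmenting concrete, here is a minimal sketch of how a client might divide one file into byte ranges and hand them out to mirrors in turn. The function name plan_segments, the 1 MiB default segment size, and the round-robin assignment are all assumptions made for illustration; a real client would more likely favor faster or preferred mirrors and would then fetch each range with an HTTP Range request or FTP restart.

```python
from itertools import cycle


def plan_segments(file_size, mirrors, segment_size=1_048_576):
    """Assign consecutive byte ranges of a file to mirrors in round-robin order.

    Returns a list of (mirror_url, first_byte, last_byte) tuples. Byte
    positions are inclusive, which matches the HTTP Range header convention.
    This is a hypothetical helper, not part of any Metalink client.
    """
    if file_size <= 0 or segment_size <= 0:
        raise ValueError("file_size and segment_size must be positive")
    if not mirrors:
        raise ValueError("at least one mirror is required")

    plan = []
    mirror_cycle = cycle(mirrors)
    for first_byte in range(0, file_size, segment_size):
        # The final segment may be shorter than segment_size.
        last_byte = min(first_byte + segment_size, file_size) - 1
        plan.append((next(mirror_cycle), first_byte, last_byte))
    return plan


if __name__ == "__main__":
    mirrors = [
        "http://mirrors.isc.org/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz",
        "ftp://ftp.ussg.iu.edu/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz",
    ]
    for url, first, last in plan_segments(109237237, mirrors, 32 * 1_048_576):
        print(f"bytes {first}-{last} from {url}")
```

Because each range is independent, a segment that fails on one mirror can simply be reassigned to another without restarting the whole download.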

The minimum requirement for integrating Metalink into a program is that the program already supports segmented downloads. Clients should also have a way to check MD5 and SHA-1 sums. And if they have BitTorrent and other P2P methods (ed2k links, magnet links, Gnutella) built in, even better. The perfect client will be able to share and access files across many P2P networks.
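Checking MD5 and SHA-1 sums is straightforward with a standard library. The sketch below uses Python's hashlib and reads the file in chunks so that large ISOs do not need to fit in memory. The helper names file_digest and verify_download are hypothetical, chosen only for this example.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex, algorithm="md5"):
    """Compare a file's digest with the value published for it."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify.py <file> <expected-md5>
    path, expected = sys.argv[1], sys.argv[2]
    print("OK" if verify_download(path, expected) else "MISMATCH")
```

The same functions work for SHA-1 by passing algorithm="sha1".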

A few clients are implementing Metalink right now and should be available shortly.

Here is an example Metalink for OpenOffice.org 2.0.1 with links for a BitTorrent .torrent, magnet, ed2k, FTP, and HTTP. A really useful Metalink will include combinations for different operating systems and languages.

<?xml version="1.0" encoding="UTF-8"?>
<metalink version="2.0" xmlns="http://www.m3talink.org/"
  origin="http://www.openoffice.org/mmm/OpenOffice.org-2.0.1.metalink"
  type="static" pubdate="2005-12-21-22:07:22"
  refreshdate="2005-12-23-03:24:18">

<files>
  <file name="OOo_2.0.1_LinuxIntel_install.tar.gz">
    <identity>OpenOffice.org</identity>
    <version>2.0.1</version>
    <description>OpenOffice.org 2.0.1 - free office suite</description>
    <tags>OpenOffice.org, office suite, OpenDocument, open source</tags>
    <language>en-US</language>
    <os>Linux-x86</os>
    <size>109237237</size>
    <verification>
      <md5>e0d123e5f316bef78bfdf5a008837577</md5>
    </verification>
    <publisher>
      <name>OpenOffice.org</name>
      <url>http://www.openoffice.org/</url>
    </publisher>
    <license>
      <name>LGPL</name>
      <url>http://www.gnu.org/copyleft/lesser.html</url>
    </license>
    <copyright>Copyright 2000-2005 Sun Microsystems Inc.</copyright>
    <resources>
      <magnet>
        <url>magnet:?xt=urn:sha1:TWTEVOAO2IIEV67QT2ZITTXHXEUR4EXD&amp;xt=urn:kzhash:07b7760f1c05440c779479b50dd9dd5d96708cf47b7cef1181058119637ff20ab7d38af0&amp;xt=urn:tree:tiger:VKFOQ3RETGBCLWOJAMX53EQR4OWNV7CUEOAVY6Q&amp;xt=urn:ed2k:8966658d3b75ff12e1260371ad257098&amp;xl=109237237&amp;dn=OpenOffice.org_2.0.1_LinuxIntel_install.tar.gz&amp;xs=http://ftp.snt.utwente.nl/pub/software/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
        <preference>90</preference>
      </magnet>
      <ed2k>
        <url>ed2k://|file|OpenOffice.org_2.0.1_LinuxIntel_install.tar.gz|109237237|8966658D3B75FF12E1260371AD257098|h=3JVTR3O2DYGSBYCDCHKBOBXL2IJ6A3H3|s=http://ftp.snt.utwente.nl/pub/software/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz|/</url>
        <preference>90</preference>
      </ed2k>
      <bittorrent>
        <torrent>
          <url>http://borft.student.utwente.nl:6969/file?info_hash=%53%13%06%4e%30%c4%1e%e2%6f%e2%b0%24%8f%1b%e7%1e%97%ae%ec%ca</url>
        </torrent>
        <preference>100</preference>
      </bittorrent>
      <http>
        <url>http://mirrors.isc.org/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
        <location>US</location>
        <preference>80</preference>
      </http>
      <ftp>
        <url>ftp://ftp.ussg.iu.edu/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
        <location>US</location>
        <preference>20</preference>
      </ftp>
      <http>
        <url>http://mirrors.ibiblio.org/pub/mirrors/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
        <location>US</location>
        <preference>20</preference>
      </http>
      <ftp>
        <url>ftp://openofficeorg.secsup.org/pub/software/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
        <location>US</location>
        <preference>40</preference>
      </ftp>
    </resources>
  </file>
</files>

</metalink>

The goal is simplicity. A user will click this one .metalink, and the client will download the file in segments from P2P and mirrors. After the download is complete, the client computes the checksum of the downloaded file and compares it with the one listed in the Metalink to verify that the file is identical to the original.
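As a rough illustration of the first step a client performs, the sketch below reads a trimmed copy of the example above and lists its sources in the order a client might try them. It relies only on the element names that appear in the example (resources, url, preference); the function name ranked_resources is hypothetical, and the rule that a higher preference value is tried first is an assumption, since the format description here does not spell it out.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the example Metalink, kept short for illustration.
SAMPLE = """<metalink version="2.0" xmlns="http://www.m3talink.org/">
<files>
  <file name="OOo_2.0.1_LinuxIntel_install.tar.gz">
    <size>109237237</size>
    <resources>
      <http>
        <url>http://mirrors.isc.org/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
        <location>US</location>
        <preference>80</preference>
      </http>
      <ftp>
        <url>ftp://ftp.ussg.iu.edu/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
        <location>US</location>
        <preference>20</preference>
      </ftp>
      <bittorrent>
        <torrent>
          <url>http://borft.student.utwente.nl:6969/file?info_hash=%53%13%06%4e%30%c4%1e%e2%6f%e2%b0%24%8f%1b%e7%1e%97%ae%ec%ca</url>
        </torrent>
        <preference>100</preference>
      </bittorrent>
    </resources>
  </file>
</files>
</metalink>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def ranked_resources(metalink_xml):
    """Return (preference, kind, url) tuples, highest preference first.

    Assumption: a higher <preference> value means "try this source first".
    Sources without a <preference> element are treated as preference 0.
    """
    root = ET.fromstring(metalink_xml)
    ranked = []
    for element in root.iter():
        if local_name(element.tag) != "resources":
            continue
        for resource in element:
            url = None
            preference = 0
            # iter() also finds <url> when it is nested, as in <torrent>.
            for child in resource.iter():
                name = local_name(child.tag)
                if name == "url" and child.text:
                    url = child.text.strip()
                elif name == "preference" and child.text:
                    preference = int(child.text)
            if url is not None:
                ranked.append((preference, local_name(resource.tag), url))
    # sorted() is stable, so equal preferences keep their document order.
    return sorted(ranked, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    for preference, kind, url in ranked_resources(SAMPLE):
        print(f"{preference:3d}  {kind:10s}  {url}")
```

A fuller client would also filter on the location, os, and language elements before ranking, which is how the automatic selection described in this article could work.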

So, to sum up, these are the benefits over traditional methods:

  • It combines FTP and HTTP with Peer-to-peer (P2P, shared bandwidth).
  • It uses a standard unified format that collects links for automatic accelerated (segmented) downloads from multiple sources.
  • Automatic load balancing distributes traffic so individual servers are under less strain.
  • There's no Single Point of Failure as with FTP or HTTP URLs, so there's more fault tolerance.
  • There's no long, confusing list of possibly outdated mirrors and P2P links.
  • It makes the download process simpler for users (automatic selection of language, Operating System, location, etc.).
  • It stores more descriptive and useful information for Electronic Software Distribution.
  • There's no separate MD5/SHA-1 file or manual process for verification.
  • It uniquely identifies files, so even if all references to it in the Metalink stop working, the same file can be found via a P2P or Web search.
  • It can finish BitTorrent downloads even if no full seeds are shared.
  • For FTP/HTTP downloads, users need only an updated client, not a separate one as with P2P (for example, the official BitTorrent client is a 6.5 megabyte download).
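The fault-tolerance point in the list above can be sketched in a few lines: a client walks the list of sources and moves on whenever one fails. The function download_with_fallback and its fetch parameter are hypothetical names used only for this illustration; fetch stands in for whatever routine actually retrieves a URL and is assumed to raise OSError on failure (as urllib's URLError does, for instance).

```python
def download_with_fallback(urls, fetch):
    """Try each URL in order and return (url, data) for the first that works.

    `fetch` is a caller-supplied callable (an assumption of this sketch) that
    returns the downloaded bytes or raises OSError when a source is down.
    """
    failures = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as error:
            # Remember why this source failed, then fall through to the next.
            failures.append((url, str(error)))
    raise OSError(f"all {len(urls)} sources failed: {failures}")


if __name__ == "__main__":
    def fake_fetch(url):
        # Stand-in for a real downloader: pretend the first mirror is down.
        if "isc.org" in url:
            raise OSError("connection refused")
        return b"file contents"

    sources = [
        "http://mirrors.isc.org/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz",
        "ftp://ftp.ussg.iu.edu/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz",
    ]
    used, data = download_with_fallback(sources, fake_fetch)
    print(f"downloaded {len(data)} bytes from {used}")
```

With a plain hyperlink, the first failure would have been the end of the download.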

I'd be interested in any comments you have.

Recent comments

06 Sep 2007 21:09 RobertGoretsky

Setting the Preference Parameter On The Server?
I understand that the metalink configuration provides a 'preference' parameter for each link that determines how likely the client should be to select that particular link. I assume that this parameter would not be static, but rather would be dynamically set by the web server providing the metalink. But how would the server know how to set this? It seems that you may lose some of the intuitive "I live near X, so I will choose the server near X" functionality you get with regular mirror hyperlinks. Your thoughts on this?

Robert H. Goretsky

Hoboken, NJ

22 Oct 2006 11:52 antini

Metalink tools
Bram Nejit has released Metalink tools (http://prog.infosnel.nl/metalinks/) which are extremely useful for making metalinks, by generating many different checksums and importing mirror lists.

12 Sep 2006 22:03 antini

BSD/Linux Distributions using Metalink
DesktopBSD (http://desktopbsd.net/), BLAG Linux (http://www.blagblagblag.org/download), StartCom Linux (http://linux.startcom.org/), Berry Linux (http://yui.mine.nu/berry/edownload.php), Ubuntu Christian Edition (http://www.christianubuntu.com/)

07 Sep 2006 16:52 antini

Re: New and updated Metalink clients
Speed Download (http://www.yazsoft.com) (Mac) now supports Metalinks. It looks and works great, check it out.

14 Aug 2006 23:32 Mark8

Thank you
Great advice, thank you!
