
Zero Install and the Web of Software

The GnuCash installation instructions warn non-programmers against even trying to install it. The word “nightmare” is used. Ideally, the process should be quite simple. If the project were distributed using Zero Install, users could safely fetch and run it, with all its required dependencies, using a single command.

Zero Install is a fundamentally different way to access software. Instead of copying software from the Web onto our computers, we cache it. It’s a faster, easier to understand, and safer way to get software, suitable for both broadband and dialup users. In this editorial, we will see how software is accessed via Zero Install and how we can distribute our own programs through it.

Introduction

So how would GnuCash be run with Zero Install? A commandline user would run it like this:

$ /uri/0install/gnucash.org/gnucash

GnuCash would then run. GUI users would open the directory /uri/0install/gnucash.org in their favorite filemanager and click on gnucash. They might also bookmark it, or drag it to their panel to make it easier to get to in future. Shell users might symlink it into their bin directory or create an alias. Note that no password is needed, since this doesn’t affect other users of the system, and that the method for running the program is the same whether it is “installed” or not.

Of course, GnuCash needs many, many libraries to work, such as GTK and Glade. That’s OK; instead of trying to load libgtk.so and having the linker report that it can’t find it, GnuCash will load /uri/0install/gtk.org/libgtk.so and Zero Install will cache it if it hasn’t already done so. Every dependency that a piece of software has is referred to by its location on the Internet, allowing it to be fetched as needed. There is no need to record what software is installed; all software is available at all times.
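To make the naming scheme concrete, here is a minimal Python sketch that splits a /uri/0install path into the Internet site it names and the resource within that site. The helper name split_resource is hypothetical and is not part of Zero Install; it only illustrates how the first path component identifies where a resource comes from.

```python
from pathlib import PurePosixPath

# Root of the Zero Install filesystem, as used throughout the article.
ZERO_INSTALL_ROOT = PurePosixPath("/uri/0install")


def split_resource(path):
    """Split a /uri/0install path into (site, path within that site).

    Hypothetical helper for illustration only; raises ValueError if the
    path is not under the Zero Install root or names no site.
    """
    parts = PurePosixPath(path).relative_to(ZERO_INSTALL_ROOT).parts
    if not parts:
        raise ValueError("path names no site: %r" % (path,))
    return parts[0], "/".join(parts[1:])


print(split_resource("/uri/0install/gnucash.org/gnucash"))
print(split_resource("/uri/0install/gtk.org/libgtk.so"))
```

The first element of each result is the site the resource would be fetched from; the second is its location within that site.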

What about APT?

By now, most people are probably thinking “Doesn’t Debian’s APT system make software installation easy, too?” But APT, like other traditional package managers, has a number of fundamental design problems. I’ll only talk about APT here; this is not to be mean to Debian, but rather because APT is the best example of a traditional package manager.

The main problem is security. When APT is installing a program, it gives it complete access to the machine, running a script provided by the program as the root user. This means that you can’t let users install any packages they please, since even something as apparently harmless as a documentation package could wipe the system. This has the immediate effects of requiring a password from the user, limiting the range of software that can be installed, and making system administrators nervous about adding additional APT repositories to the system.

Other problems include:

Scalability: The more software Debian provides, the longer it takes to update the packages list.

Speed: Doing a “dist-upgrade” to keep up with the latest security fixes and features downloads the full contents of every updated package instead of just updating the index files and refetching the software on demand.

Bugs in the install and uninstall scripts can prevent the system from working or even damage important data.

When disk space is low, it is difficult to know what can be safely uninstalled.

Many packages include data you never need, such as translations for other languages.

Many packages have dependencies on packages you don’t care about, such as LaTeX requiring Perl whether you need that particular feature or not.

The system can’t be shared easily with another distribution: you can’t use Debian-packaged software on a Red Hat system, for example.

See Zero Install: What’s wrong with APT? for a more complete comparison.

A real example

GnuCash isn’t available through Zero Install yet, but some other software is. Here is an example walk-through, showing how the ROX desktop may be run with Zero Install. If you wish, you can install Zero Install now (something that future Linux distributions will hopefully do for you) and follow along with these instructions. However, I’ll try to write so you can still see what’s going on even without having it installed, as the system is still somewhat experimental.

We could use an already-installed file manager for these examples, but I’ll assume we haven’t got one yet and show the process using the commandline. Initially, the /uri/0install directory is empty:

$ ls /uri/0install
$

However, when we try to access a subdirectory (an Internet site), Zero Install will fetch and cache the whole directory structure for that site. A progress indicator utility is available, which will pop up during longer downloads; depending on the speed of your network connection, you may or may not see it.

$ cd /uri/0install/rox.sourceforge.net
$ ls
apps lib rox@

We can run the rox script here to start the filer, opening the apps directory:

$ ./rox apps

When we do this, the apps directory opens in a filer window (again, the progress indicator may pop up while ROX-Filer is fetched):

[Screenshot: The apps directory]

This directory shows a number of applications. These are application directories (described by me in a previous freshmeat editorial), so each is just a directory with an icon and a program inside it. Clicking on the icon will run the program. So if we click on the Edit icon, Edit runs:

[Screenshot: The Edit text editor]

Edit needed the ROX-Lib library to run, and it simply ran it from the rox.sourceforge.net/lib directory. In turn, ROX-Lib required pygtk, which it got from a different site (zero-install.sourceforge.net). If we now check the 0install directory, we can see the new entries:

$ ls /uri/0install
rox.sourceforge.net zero-install.sourceforge.net

Edit also requires Python and GTK. In a full Zero Install system, we would see that gtk.org and python.org had been cached too. Currently, however, these must be installed in the traditional manner.

If we close Edit’s window and run it again, it will start instantly, because everything it needs is now cached on disc. Likewise, if we run Memo (which also requires ROX-Lib and pygtk), only Memo itself will be fetched, since its dependencies are already present. Running cached software is just as fast as running traditionally-installed software.

To save having to navigate to the applications each time they are used, users could drag them to a Start menu or to the desktop background, bookmark the apps directory, symlink programs into a bin directory, create a shell alias or keyboard shortcut, or use any other way of making software easier to get to. Here, the user has dragged some of the applications to the panel:

[Screenshot: Edit and Memo on the panel]
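The fetch-once behaviour in this walk-through can be modelled with a small Python simulation. This is only a sketch: the dependency table and the exact paths beneath each site are illustrative assumptions, and the real system fetches resources lazily as the filesystem is accessed rather than consulting a table like this one.

```python
# Illustrative resource names; the paths beneath each site are assumptions.
EDIT = "rox.sourceforge.net/apps/Edit"
MEMO = "rox.sourceforge.net/apps/Memo"
ROX_LIB = "rox.sourceforge.net/lib/ROX-Lib"
PYGTK = "zero-install.sourceforge.net/pygtk"

# Hypothetical dependency table following the walk-through above.
DEPENDS = {
    EDIT: [ROX_LIB],
    MEMO: [ROX_LIB],
    ROX_LIB: [PYGTK],
    PYGTK: [],
}


def run(resource, cache):
    """Simulate running `resource`; return what had to be fetched.

    `cache` is a set of already-fetched resources and is updated in place.
    """
    fetched, seen, pending = [], set(), [resource]
    while pending:
        item = pending.pop()
        if item in seen:
            continue  # already handled during this run
        seen.add(item)
        if item not in cache:
            cache.add(item)  # first access: fetch and keep it
            fetched.append(item)
        pending.extend(DEPENDS[item])
    return fetched


cache = set()
print(run(EDIT, cache))  # first run fetches Edit and its dependencies
print(run(EDIT, cache))  # second run fetches nothing
print(run(MEMO, cache))  # only Memo itself is new
```

The second and third calls show why a repeat launch is instant and why Memo costs only its own download once ROX-Lib and pygtk are cached.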

Upgrading software

Zero Install uses a very aggressive caching scheme. If it has already fetched something, it won’t try to fetch it again. This is because software is typically used in a different way than Web pages (which are also often cached). When I open a Web page in my browser, I don’t want to see yesterday’s version. When I run the Gimp, I’d probably rather use a month-old version than wait for the new version to be fetched.

However, we can force an upgrade very easily. Click on the Refresh button in the filer’s toolbar, and the cache (for the whole rox.sourceforge.net site) is updated. If nothing has changed, this requires downloading approximately 1K of data. If the site has changed, about 30K will be downloaded (the new index). Unchanged files on the site do not need to be re-downloaded, but anything that has changed will be refetched next time it is accessed. Also, only the initial 1K (with the index’s digital signature) needs to come from the named site; the rest may then be safely fetched using closer mirrors or peer-to-peer systems.

In fact, sites normally make multiple versions of each package available (whether using Zero Install or not). If you instruct your filer to open Edit as a directory (rather than run it), you’ll see that it actually contains all the different versions (use the right-button popup menu and Look Inside if you’re using ROX-Filer):

[Screenshot: The Edit text editor]

You can thus run any previous version of Edit too, and the main application actually just runs the latest version (identified by the latest symlink).

This illustrates a rather interesting feature of Zero Install. Programmers often have to decide when to drop an old feature, such as support for a version 1 file format, to avoid programs becoming too large. With Zero Install, old code is still easily available, but is not fetched until it is accessed. Thus, support for old formats need never be dropped, since the new version can just call an older version if required.

Zero Install removes this size/features trade-off in other areas, too. For example, programs sometimes include debugging symbols. These make the programs considerably larger, but allow users to send much more helpful bug reports. Normal binary packages don’t include them, to save space, but with Zero Install, they can be placed in separate .debug files, and will only be fetched if the user tries to fire up the debugger, getting the best of both worlds.
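The refresh behaviour described in this section (fetch a new index, keep unchanged files, refetch changed ones on next access) can be sketched in a few lines of Python. The article does not describe the index format, so the sketch assumes a simple mapping from path to a version tag; the function name refresh and the data layout are hypothetical.

```python
def refresh(cached_files, old_index, new_index):
    """Evict cached files whose index entry changed or disappeared.

    cached_files: dict mapping path -> cached contents (modified in place)
    old_index, new_index: dicts mapping path -> version tag (assumed format)
    Returns the sorted list of evicted paths; these would be refetched
    the next time they are accessed.
    """
    stale = [path for path in cached_files
             if new_index.get(path) != old_index.get(path)]
    for path in stale:
        del cached_files[path]
    return sorted(stale)


old_index = {"apps/Edit": "v1", "apps/Memo": "v1"}
new_index = {"apps/Edit": "v2", "apps/Memo": "v1"}
cached = {"apps/Edit": b"old Edit", "apps/Memo": b"Memo"}

print(refresh(cached, old_index, new_index))  # paths dropped from the cache
print(sorted(cached))                         # paths still cached
```

Nothing is downloaded at refresh time beyond the index itself; eviction simply means the next access falls through to a fetch.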

Distributing your own programs

Not only is Zero Install easier for users, it’s much easier for programmers. One of the big savings is that, since everything is always available, you’ll never be tempted to add a hacked-up XML parser or the like to your application to save users from having to install one. Another is that the process of making software available is just that much easier. Here’s how you do it:

Start by creating the directory structure you want to export. Most simply, you can just copy the binary in, but you could also provide a directory structure with different versions and binaries for different platforms, etc. You should think about this early, because you must expect other people to link to your programs. Try not to keep moving things around! Here, for simplicity, we have a directory called MySite containing a single executable called MyProg:

[Screenshot: A site directory to export]

The next step is to run the 0build program. This will build the index file and tar each directory. The output files need to go in a directory exported by your Web server. If you’re running Apache, this just means copying to your /var/www directory (or wherever your installation is set to look). You also need to specify the server’s hostname. If you’re using another computer as the Web server, you can just put the files in a local directory and use rsync or the like to copy them across. Haven’t got 0build? Don’t worry, it’s in Zero Install:

$ alias 0build=/uri/0install/zero-install.sourceforge.net/bin/0build
$ cd ~/MySite
$ 0build /var/www example.org

When you do this, you’ll be prompted to create a GPG key. This adds a small amount of security; anyone who has accessed your site at least once before will have her system automatically check, when upgrading, that the upgrade is authorized. If you’re new to GPG, just select the offered defaults for key type, size, and expiry. The “Real name” can be anything, but the “email” address must be 0install@example.org (replacing example.org with the server name, of course).

You only have to do the GPG work the first time. In the future, just run 0build in ~/MySite to rebuild the index with the same settings as last time (0build will check which files have changed and only create new archives where necessary). You can now see if it worked (possibly from a different computer):

$ cd /uri/0install/example.org
$ ls
MyProg

And that’s it! Your program can, of course, link against, run, or otherwise use any other software in the Zero Install system just by using its full pathname starting with /uri/0install.

There is one caveat, however: Because Zero Install always uses its cached copy of the index, the cached version of a site may be too old. You do, therefore, need to check the version of the resource you are using and force a refresh if it is too old. This isn’t actually anything new, since even traditionally-installed software needs to check for versions; the difference is that instead of reporting an error, you update the cache. Often, the library you are using will provide some way to do this. ROX-Lib, for example, is located (even in a traditional system) using the “findrox” module. This takes a version number and checks that the installed version is recent enough:

import findrox; findrox.version(1, 9, 11)
import rox

When used without Zero Install, the “version” function reports an error in a dialog box, telling the user where to download ROX-Lib. With Zero Install, it simply updates the cache (using the 0refresh command). You can also check the state of the cache in other ways. For example, Python programs normally use the /usr/bin/env program to search for the Python interpreter in PATH:

#!/usr/bin/env python2.3
print "Hello world!"

Under Zero Install, /bin/0run can be used to check the cache and run Python through it:

#!/bin/0run python.org/python2.3 2003-01-01
print "Hello world!"

This checks that /uri/0install/python.org/python2.3 exists and is at least as recent as the given date. If not, the cache is updated using 0refresh. For more information about packaging software for Zero Install (such as the recommended directory layout), see the Documentation for packagers.
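The check that 0run performs can be approximated in Python as follows. This is a sketch under stated assumptions, not 0run’s actual implementation: the article does not say how “at least as recent as the given date” is determined, so the example uses the file’s modification time, and the function name needs_refresh is hypothetical.

```python
import datetime
import os


def needs_refresh(path, minimum_date):
    """Return True if `path` is missing or older than `minimum_date`.

    Assumption: "recent enough" is judged by the file's modification
    time; the real 0run may use different information.
    """
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return True  # missing (or unreadable): the cache must be updated
    return datetime.date.fromtimestamp(mtime) < minimum_date


minimum = datetime.date(2003, 1, 1)
if needs_refresh("/uri/0install/python.org/python2.3", minimum):
    print("cache entry missing or too old; a refresh would be requested here")
else:
    print("cache entry is recent enough")
```

A launcher built on this idea would request a refresh in the first branch and then run the interpreter either way.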

Current status

Zero Install currently works on Linux 2.4 and 2.6 systems. Hopefully, developers on other systems will port it to their platforms. Up-to-date RPM and Deb packages will be needed to make installation of the system itself as painless as possible. While the implementation is still experimental, the interface seen by users and programs using the system (i.e., the /uri/0install filesystem) should remain stable. At worst (if the index format changes, for example), site maintainers will only need to rerun 0build to update.

Conclusions

By using a filesystem layout following the Internet DNS system, a Zero Install system is able to namespace and directly locate all of its resources. This removes the need for installers to move files to well-known locations (such as /usr/bin), since all files are already in their “standard location”. It also allows Zero Install software to be used alongside packages such as Debs and RPMs.

Because every resource in Zero Install is namespaced in this way, it is possible to share a single cache between distributions (imagine dual-booting Debian and Red Hat, but sharing the /usr partition). Wherever the same resource is used, only one copy is needed; where different resources are used, both copies are present (python.org/python2.3 vs. redhat.com/python2.3, if Red Hat wanted to use a non-standard build of Python, for example).

Because no install or uninstall scripts need to be run, users can run any software they please without risking the system or other users’ security. This enables a truly distributed “web of software” to which anyone can contribute and removes the need for software to be packaged specially for multiple distributions.

Disk space can be recovered by deleting parts of the cache (either manually by the system administrator or by a cron job looking for unused files). If anything deleted is needed again, it will be refetched.

While Zero Install doesn’t solve the problem of security for individual users (a malicious program “Foo” can still delete the files of a user who runs it), it does simplify the problem considerably, paving the way for better user security systems in the future (clearly, there is no point in users not trusting Foo in a traditional system; if it’s installed, it has already had the chance to do any damage it wants). See the Zero Install Security Documentation for more information on these topics.

Zero Install is already being used to distribute the ROX desktop’s applications, and I hope that this article will inspire more people to make their software available this way. Future work will include polishing the core system and working toward better binary compatibility in the Linux world generally (the autopackage.org people share this goal with us, and we are using some of their software already to make our binaries more portable).
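The cron-style clean-up mentioned in these conclusions might look something like the following Python sketch. It is an illustration only: the article does not say where the cache is stored, so the directory passed in is a placeholder, and the function name unused_files is hypothetical. It relies on file access times, which some systems update lazily or not at all depending on mount options, so the results should be treated as candidates rather than a definitive list.

```python
import os
import time


def unused_files(cache_root, max_idle_days, now=None):
    """List files under `cache_root` not accessed for `max_idle_days` days.

    Uses the last-access time (st_atime); nothing is deleted here.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds in a day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.lstat(full).st_atime < cutoff:
                    stale.append(full)
            except OSError:
                continue  # file vanished while scanning
    return sorted(stale)


# Placeholder path: substitute the real cache directory on your system.
for path in unused_files("/path/to/zero-install-cache", max_idle_days=90):
    print("candidate for removal:", path)
```

Because anything removed is simply refetched on next access, an overly eager clean-up costs bandwidth rather than correctness.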

Recent comments

06 Dec 2004 18:37 wsc9tt

Re: How to make it more secure
Basically the same as the SFS (Self-certifying File System).


A very good way for a site to include system links to other software collections that are trusted. This would complete the chain and nothing can be forged.

27 Jan 2004 20:54 nx12

It's not a solution
As far as I can see, it doesn't solve the main problem of today's installation: distribution-specific directories and config files. So you still have to write distro-specific installation scripts, or whatever you do to install anything. I don't see any improvement compared with apt-get or the Portage system. Either it will build an absolutely separate app/lib directory in userspace or elsewhere, or it will have the same distro-specific incompatibility problems.
I see it just as something close to dynamic linking. Nothing exceptional.

20 Jan 2004 05:41 damjan

Re: We could specify exact versions

>
> % If binary
> % compatibility existed on Linux, there
> % would be absolutely no need for any
> % weird package systems. Just look at
> % Windows.
>
>
> Even under windows one cannot trust
> DLL's to be backwards compatible (hence
> "DLL hell"). IMHO it would be best if
> each application specified the exact
> versions of each library that it
> required, and checked that the MD5
> matched. The application would only
> link against a new library version, if a
> trusted database reported that it had
> been thoroughly tested against that new
> version.


In Windows it's even worse: there are no different versions of a library, like libpng.so.2 and libpng.so.3.
Linux already has this, but the problem is when you need to integrate two different software packages (like apache and Python

16 Jan 2004 08:08 tal197

Re: Two Holes and complex builds?

> 1) Setuid()/setgid() binaries?
> Certainly root isn't running any installation scripts written by unknown
> parties, but a non-trivial class of applications exist (email) that tend to
> rely on setuid()/setgid(). I took a quick look at my system here and Mutt
> and KDE are the standout setuid()/setgid() offenders.


Mutt isn't setuid here (Debian/unstable). Any idea why it would be? For KDE, the only thing I can think of is kpppd. The solution is for such programs to go through sudo, giving the admin a central point to configure all root access. Of course, sudo itself can't be in Zero Install (we don't allow setuid binaries under /uri/0install, for obvious reasons).

As an admin, this means you don't have to worry about packages installing setuid binaries behind your back, of course.


> 2) Hacked originating sites =
> compromised 0install clients. GPG keys are nice and all, but how many people
> actually walk around with their GPG private keys on a floppy disk or USB
> keychain drive? IIRC a few months ago there was a site/source compromise where
> the MD5 signatures matched the tar but the binaries had a trojan inserted.


Modifying MD5 sums to match binaries is trivial if you have access to the site (to be secure, you need some way to verify the MD5 sums themselves are correct, which zero install does with GPG).

As for breaking the GPG: yes, you could find out who admins the site, find their personal machine, get through the firewall/NAT, break into their machine (assuming they're running sshd on their laptop), install a keystroke logger, get the GPG passphrase and private key, break into the server and upload your trojaned build.

But it's a lot of work. Consider that most software doesn't come with a digital signature at all, and most people don't check them even if they do, and zero install will usually be more secure.


> For that matter, the 0install system can be used as a kind of Denial of Service
> engine in a "dumb-user" environment: all one has to do is set up
> a teaser website where the list of dependencies includes huge lists of
> large files; if left unattended at home/small/medium size sites it can
> saturate the Internet access pipe until the disk is filled.


That's true of your web cache, too, of course. Note that the user has to run the software (and leave it running while it downloads everything); the daemon process doesn't follow dependencies itself. For most home machines, you could achieve the same thing by telling a user to run 'wget -r' on a big site.

(you could also put per-user quota limits in the fetching daemon, but the system isn't big enough that we've had a problem yet)


> 3) Complex builds can strike the system
> down. Ex: PHP4 with all its extensions.
> Say extension #1 in PHP4 depends on
> liba; extension #2 in PHP4 depends on
> libc, which in turn depends on libb and
> liba, and for whatever reason two
> different versions of liba are specified
> between the two extensions. Boom.


However, it would be 'Boom' for everyone, not just your system, so it would get fixed ;-)

Mike Hearn (autopackage.org) is working on implementing the Solaris linker behaviour, which allows this situation without problems.

12 Jan 2004 13:42 glamm

Two Holes and complex builds?
1) Setuid()/setgid() binaries? Certainly root isn't running any installation scripts written by unknown parties, but a non-trivial class of applications exist (email) that tend to rely on setuid()/setgid(). I took a quick look at my system here and Mutt and KDE are the standout setuid()/setgid() offenders. A normal user can't install a setuid()/setgid() application, so this means that 0install needs to run as root (at least during the chown() call). Not an unfixable problem, though.

2) Hacked originating sites = compromised 0install clients. GPG keys are nice and all, but how many people actually walk around with their GPG private keys on a floppy disk or USB keychain drive? IIRC a few months ago there was a site/source compromise where the MD5 signatures matched the tar but the binaries had a trojan inserted. Mode 600 doesn't protect GPG keys on a rooted machine.

For that matter, the 0install system can be used as a kind of Denial of Service engine in a "dumb-user" environment: all one has to do is set up a teaser website where the list of dependencies includes huge lists of large files; if left unattended at home/small/medium size sites it can saturate the Internet access pipe until the disk is filled.

3) Complex builds can strike the system down. Ex: PHP4 with all its extensions. Say extension #1 in PHP4 depends on liba; extension #2 in PHP4 depends on libc, which in turn depends on libb and liba, and for whatever reason two different versions of liba are specified between the two extensions. Boom. It's difficult enough for a person to properly configure a PHP4 installation with the myriad library/sublibrary dependencies and the interaction with Apache.

Granted, it's unlikely that the end user is going to be installing PHP4, but it's not impossible that other reasonably complex applications have similar problems.
