
VM Code as a Software Distribution Mechanism

Dave Gudeman writes: "A developer who wants to make a piece of software available to others faces the daunting task of software delivery. There are several strategies for delivering software, primarily source code, machine binaries, and virtual machine binaries, each with its own advantages and disadvantages. I'm going to discuss each of the alternatives, then suggest a variation that is potentially better than any of the other solutions for commercial as well as Open Source software projects."

The simplest solution for the user of the software is to deliver machine binaries with a system-dependent installation script so the user does not have to do anything but run the script. This method is expensive for the distributor, who has to test, maintain, and deliver multiple distributions and installation scripts. And, with this method, it is inevitable that some systems will not be supported. The disadvantages of this method may be summarized by saying that machine binaries are too dependent on the user's hardware and OS platform.

From the distributor's point of view, the easiest delivery method is bare source code, since it requires no work other than making the code available. However, this does not make the problems of distribution go away; it just moves them to the user. In order to compile the program, the user needs to have a development system compatible with the developer's, including a compiler, translators, libraries, and tools such as make and yacc. And even with the proper tools, if the user's hardware or OS is different from the developer's, the user may need to do various porting work. The disadvantages of this method may be summarized by saying that source code is too dependent on the developer's hardware, OS, and development platform.

So binary distributions are too dependent on the target system and source code distributions are too dependent on the development system. These platform dependencies can be largely eliminated by delivering virtual machine (VM) binaries. This method has been popularized by Java and its class files, but it has been successfully used in other systems for decades. VM binaries are independent of the target system in the sense that virtually any computer can have a VM interpreter. Typically, VM implementations are not independent of the development system since they support only a single programming language, but there is no particular reason why this should be the case. In fact, although the Java VM was intended to execute a single language (Java), many other languages can now be translated into JVM class files (see http://grunge.cs.tu-berlin.de/~tolk/vmlanguages.html). Programs written in these languages can run on any machine that has a Java interpreter, making them relatively independent of both the target and development systems.

At first glance, it seems that Java class files might solve all of our software distribution problems. We can translate all programming languages into JVM class files and distribute all programs in that form. But the problem with interpreted software in general, and class files in particular, is that it is much slower and requires more memory than compiled software. JIT compilers can partially address the speed problem by generating machine code on the fly, but they cannot do serious optimization since that would take too long. This will prevent Java-style JIT implementations from ever really competing with native code solutions in performance.

Of course, there is no reason in principle why we could not deliver machine-independent class files and have the installation run an optimizing compiler on the class file to produce native code. This would generally produce better code than a JIT compiler could, but it would still not be comparable to what a high-quality optimizing compiler could do with the original source code. The reason is that much of the semantic information you need for effective optimization is lost in the translation to class file format. Another problem with this solution is that optimization can take a very long time, so running an optimizing compiler is impractical for applets and inconvenient for installing traditional applications.

So far, I have concluded that the most machine-independent form of software distribution is VM binaries, that it is necessary to optimize VM binaries, and that optimizing VM binaries on the user's machine is ineffective and inconvenient. The obvious alternative is to deliver VM binaries that have already been optimized. There is a problem with this as well: different target platforms require different optimization. But there are some optimizations that can be done at a higher level. For example, it is always a win to evaluate complex expressions at translation time instead of at run time, and it is always a win to remove dead or unreachable code. What is needed is a VM that allows more extensive system-independent optimization in the VM code. This sort of preoptimized code could be loaded by a program like a JIT compiler and executed at optimized native code speeds. The challenge is to find a way of optimizing VM files without relying on machine-dependent optimization.
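
As a rough sketch of the kind of translation-time evaluation I have in mind (the expression classes below are invented purely for illustration and are not a proposal for the OPIS format), here is a tiny constant-folding pass written in Java:

    // Invented expression classes, purely to illustrate constant folding as a
    // machine-independent rewrite; this is not a proposal for the OPIS itself.
    abstract class Expr { }

    class Num extends Expr {
        final int value;
        Num(int v) { value = v; }
        public String toString() { return Integer.toString(value); }
    }

    class Ref extends Expr {
        final String name;
        Ref(String n) { name = n; }
        public String toString() { return name; }
    }

    class BinOp extends Expr {
        final char op;
        final Expr left, right;
        BinOp(char op, Expr l, Expr r) { this.op = op; left = l; right = r; }
        public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    class Folder {
        // Collapse any subtree whose operands are all known constants.
        static Expr fold(Expr e) {
            if (!(e instanceof BinOp)) return e;
            BinOp b = (BinOp) e;
            Expr l = fold(b.left), r = fold(b.right);
            if (l instanceof Num && r instanceof Num) {
                int lv = ((Num) l).value, rv = ((Num) r).value;
                return new Num(b.op == '+' ? lv + rv : lv * rv);
            }
            return new BinOp(b.op, l, r);
        }

        public static void main(String[] args) {
            // (2 * 3) + x is folded to (6 + x) before the code is ever shipped,
            // so the multiplication never runs on the user's machine.
            Expr e = new BinOp('+', new BinOp('*', new Num(2), new Num(3)), new Ref("x"));
            System.out.println(fold(e));   // prints (6 + x)
        }
    }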

To see how we could approach this, consider the process of translating a source language to a VM language and then to machine language for native execution.

Source ------> VM ------> Machine

The source language is very different from the machine language, and the VM language is somewhere in between. The question is, where is it? Is it closer to the source language, closer to the machine language, or somewhere in the middle? At one end of the scale, the VM for a traditional compiler is the machine itself, and at the other end are common scripting languages where the VM is the source language. Java class files are near the middle. The closer a VM is to the source language the easier it is to do the source-to-VM translation. The closer the VM is to the machine language, the easier it is to do the VM-to-machine translation.

For fast JIT compilation, we want the fastest possible VM-to-machine translation, so this suggests moving the VM toward the machine and away from the source. But going against this is the fact that we want to remain machine-independent, and that moves us away from the machine. Still, there are commonalities between machines. For example, most modern machines are register-based, and this suggests that the VM should be register-based so we can do preliminary register allocation in the VM.
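
As an illustration of the difference (the mnemonics and register numbers below are invented, not an actual instruction set), here is the same statement, a = b + c * d, written both ways, sketched in Java:

    // Invented mnemonics and register numbers, for illustration only: the
    // statement a = b + c * d encoded for a stack VM and for a register VM.
    class VmEncodingSketch {
        public static void main(String[] args) {
            String[] stackForm = {              // JVM-style: operands live on a stack
                "push b", "push c", "push d", "mul", "add", "store a"
            };
            String[] registerForm = {           // three-address, register-based
                "mul r3, r1, r2",               // r3 = c * d   (c in r1, d in r2)
                "add r4, r0, r3"                // r4 = b + r3  (b in r0, a in r4)
            };
            System.out.println(String.join("; ", stackForm));
            System.out.println(String.join("; ", registerForm));
        }
    }

With registers already named in the VM code, a loader can map them onto hardware registers with relatively little work, which is the point of doing the preliminary allocation once, before distribution.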

We can view the process of translation more generally as one that involves many source languages and many machines, and the challenge is to find the point for the VM that allows for the most optimization and the simplest VM-to-machine translation for the largest set of machines. Let's call such a VM an Optimizable Portable Instruction Set or OPIS. I have studied these issues a little as part of implementing an optimizing compiler, and I am confident that a reasonably good OPIS can be designed. However, this is a research project that would require expertise in implementing many different programming languages and in writing compilers for many different machines.

Is this something the Open Source community is capable of? Can the community model be applied to a large research project or is research too different from development for the model to carry over? In a sense, the community model is very similar to the academic research model; the work is distributed over many researchers each doing what he or she is most interested in, and the result is often rapid progress in the state of knowledge.

I would like to hear from anyone who might be interested in participating in such a research project and anyone who knows of related work in the area.


Dave Gudeman (dgudeman@azstarnet.com) received his PhD in computer science from the University of Arizona in 1994. His research areas were programming language design and implementation, and his dissertation involved the design of an optimizing compiler for a concurrent constraint programming language. He is currently working for a small software company designing databases and GIS applications. His contributions to free software include the Janus Compiler (for research purposes only) and a Java XML reader/writer called Harp (the source code for Harp will be posted on SourceForge Real Soon Now).


Recent comments

25 Oct 2000 10:48 fastt

Re: Reprise from the author
I thought I'd have a go at clarifying the reasons for my original response. I have no interest in a flame war and am happy to agree to disagree, but I at least want to point out a few things so that in the future, we can all have a more productive discussion about the topic you presented.

Todd Fast wishes to berate me for not mentioning runtime code specialization.


My intent was not to berate, and I did not intend what I wrote to be derisive. However, I did write my response to attempt to qualify your otherwise unqualified statements regarding JIT compilation, namely these largely unsupported statements:

"But the problem with interpreted software in general, and class files in particular, is that it is much slower and requires more memory than compiled software. JIT compilers can partially address the speed problem by generating machine code on the fly, but they cannot do serious optimization since that would take too long. This will prevent Java-style JIT implementations from ever really competing with native code solutions in performance."

Furthermore, you use the above statement as a conclusion to bolster other conclusions:

"So far, I have concluded that the most machine-independent form of software distribution is VM binaries, that it is necessary to optimize VM binaries, and that optimizing VM binaries on the user's machine is ineffective and inconvenient."


While I will agree with you that this was not the main thrust of your editorial, you did say it, and if you include it as a premise to your conclusion, it's fair game for criticism.


You also had this to say:

"Of course, there is no reason in principle why we could not deliver machine-independent class files and have the installation run an optimizing compiler on the class file to produce native code. This would generally produce better code than a JIT compiler could, but it would still not be comparable to what a high-quality optimizing compiler could do with the original source code. The reason is that much of the semantic information you need for effective optimization is lost in the translation to class file format. Another problem with this solution is that optimization can take a very long time, so running an optimizing compiler is impractical for applets and inconvenient for installing traditional applications."


I call attention to these excerpts:

"...generally produce better code than a JIT compiler could, but it would still not be comparable to what a high-quality optimizing compiler could do with the original source code..."

"Another problem with this solution is that optimization can take a very long time, so running an optimizing compiler is impractical for applets and inconvenient for installing traditional applications."


More below.


In the first place I don't think any production JIT compilers do this


There are at least two. The first is Javasoft's Hotspot (http://www.javasoft.com/products/hotspot/index.html) product, which is the default VM technology included with JDK1.3. It is also available for JDK1.2 as an add-on. Hotspot has been in production for something like 1-1.5 years, I believe. The other is Sun's Exact VM, which is something of a competitor to Hotspot. I don't know its current release status, but I do know that it has been used successfully in commercial production environments, with significant benefit.


in the second place this technique benefits only a small minority of programs


What you mean by "small minority" is unclear, but it is indisputable that the vast majority--if not all--of modern Java programs benefit from this technique, given its inclusion in every shipped Java VM. Anecdotally, in every program that is CPU bound, I have seen Hotspot only improve performance, in many cases by an order of magnitude.


and in the third place, static optimization is still critical in systems that do implement this technique, all of which leave my original position unaffected by your response.


I agree, as does everyone else in the field, that static optimization is still necessary and advantageous. However, given your premises that:

"...an optimizing compiler...[would] generally produce better code than a JIT compiler could, but it would still not be comparable to what a high-quality optimizing compiler could do with the original source code..."

and

"Another problem with this solution is that optimization can take a very long time, so running an optimizing compiler is impractical for applets and inconvenient for installing traditional applications."

I think it's fair to bring to light facts relevant to conclusions supported by them. Given the information to which I've been exposed and the current state of the art, it seems that the above premises are at least flawed, if not outright incorrect. It is clear--at least to me--how this new information influences your position.

"And may I suggest, Mr. Fast, that you are inviting flames by taking a quote out of context, introducing a completely new subject, and then urging the author to "learn more about such technologies before dismissing their advantages out of hand". I could hardly have dismissed these technologies when I never mentioned them."


Although it may not have been your original intent to discuss these factors as relevant to your conclusion, I believe I've outlined above just how integral your assumptions about JIT technology are to your overall position.


I don't disagree with your goal of outlining possibilities for VM code as a software distribution mechanism. I simply want to see it discussed with all the relevant information taken into account.


Finally, let me say that despite disagreeing violently with nearly every other conclusion you present in your article, "Java: a dissenting opinion", I agree with and very much like your analysis of the true shortcomings behind the use of pointers--it is perhaps the first time I've seen this subtlety outlined.

19 Sep 2000 09:46 arjanmolenaar

Why not sources?

I think VM code will result in the same problems we have with sources:

compiler vs. VM optimizer
tools (like yacc) vs. optional libraries/classes
libraries vs. libraries/classes

What's the win? What does Open Source gain? Distributing half-binary versions of the program? That's not Open Source!


BTW, for low-level actions you'll get better performance by using the advantages of a specific OS, so you probably want to use the native interfaces anyway.


Conclusion:


It doesn't solve it! It just takes the problem to the binary realm (try to fix a bug there!).


Using portable libraries like GLib/OpenGL/OpenAL and staying close to POSIX is the best way to go IMHO.

12 Sep 2000 06:16 dgudeman

Reprise from the author
A few comments to the comments:

First, thanks to all the helpful people who have pointed out systems similar to the one I described. I've been planning to do the survey work on this idea for a while now and you all have given me some useful pointers.

Second, I wish I had ended the article with a more concrete statement of my goals in writing it. The point wasn't so much to introduce a new idea (the idea was old even six years ago when I was doing compiler research) as to try to provoke some interest in the project in the open source community. The technical difficulty of the project would be large, but much larger is the political difficulty of generating interest among developers in using a new VM. Yet I think the open software community has shown enormous power in influencing software development, and has a vested interest in this sort of project.

Third, some comments to my critics:

Todd Fast wishes to berate me for not mentioning runtime code specialization. In the first place I don't think any production JIT compilers do this, in the second place this technique benefits only a small minority of programs, and in the third place, static optimization is still critical in systems that do implement this technique, all of which leave my original position unaffected by your response. And may I suggest, Mr. Fast, that you are inviting flames by taking a quote out of context, introducing a completely new subject, and then urging the author to "learn more about such technologies before dismissing their advantages out of hand". I could hardly have dismissed these technologies when I never mentioned them.

jetson123 does not know what I mean by a "Java-style JIT compilation" and suggests different implementation strategies for Java. What I mean is a Just-In-Time compiler meant to improve the speed of a normal interpreter. You are probably correct when you say "...if you were to spend the enormous effort of coming up with a new VM ... you probably wouldn't do significantly better than the best current Java environments", but that is not my purpose anyway. My purpose is to develop a strategy for distributing applications written in any language at all (Java, C++, C, Python, Perl, Sather, ML, Unicon, Janus, and other languages that have not even been invented yet), and to do so efficiently. And I have to disagree with you that the success of the current Java VM says anything at all about the technical merits of the approach. What it tells us is that Sun had a good marketing strategy and that the anti-Microsoft forces have considerable power when they focus their efforts.

In fact, I have a rather low opinion of Java technology in general, the language, the APIs, and the VM, and it is my hope that some day the open source community will embrace something better for a machine-independent platform. If you want to know why anyone might possibly not like Java, you may want to look at http://www.azstarnet.com/~dgudeman/javacrit.htm, although it is somewhat out of date.

Sesse suggests that a VM is not enough; one also needs to define an API and address the issue of revisions. He is correct on both points, but I didn't want to get too ambitious in one editorial... :)

11 Sep 2000 06:34 eterps

Slim binaries.
Have a look at http://www.ics.uci.edu/~franz/. I recommend reading the document about 'slim binaries'. It uses a very different approach: a tree representation of the code instead of virtual machine instructions. This way, it is much more efficient to optimize (and also to compress the binary!).
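
For contrast with a flat instruction stream, here is a minimal sketch of what a tree-shaped representation of a = b + c * d might look like; the node class is invented for illustration and is not the actual slim binaries encoding:

    // Invented node class, for illustration only; not the real slim binaries format.
    class TreeNode {
        final String label;                  // e.g. "assign", "+", "*", or a variable name
        final TreeNode[] children;
        TreeNode(String label, TreeNode... children) {
            this.label = label;
            this.children = children;
        }
        public String toString() {
            if (children.length == 0) return label;
            StringBuilder sb = new StringBuilder(label).append("(");
            for (int i = 0; i < children.length; i++)
                sb.append(i == 0 ? "" : ", ").append(children[i]);
            return sb.append(")").toString();
        }
    }

    class SlimTreeSketch {
        public static void main(String[] args) {
            // a = b + c * d kept as structure a code generator can still analyze,
            // rather than as an already-flattened push/mul/add/store sequence.
            TreeNode stmt = new TreeNode("assign",
                new TreeNode("a"),
                new TreeNode("+",
                    new TreeNode("b"),
                    new TreeNode("*", new TreeNode("c"), new TreeNode("d"))));
            System.out.println(stmt);        // assign(a, +(b, *(c, d)))
        }
    }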

10 Sep 2000 08:25 fastt

Learn more about JIT technologies before dismissing them
The article mentions several times that JIT compilers, such as those used in the Java VM, are only partially effective at optimizing VM code, and not nearly as effective as a static compiler. The conclusion seems to be this:

"This would generally produce better code than a JIT compiler could, but it would still not be comparable to what a high-quality optimizing compiler could do with the original source code."

Contrary to this conclusion, new dynamic compilation technologies (technically not JIT compilation, though still in essence serving the same role) can improve optimization beyond that possible with a static compiler. The reason is that static optimization cannot take into account actual runtime characteristics of the code in question. In contrast, dynamic adaptive optimization technologies such as JavaSoft's Hotspot (http://java.sun.com/products/hotspot/whitepaper.html) can instrument dynamically compiled code to profile its actual usage. This results in the ability to perform dynamic optimizations (and unoptimizations) at runtime--like fully unrolling a performance-critical loop or massive inlining--that can seldom be done at compile time. Furthermore, it allows the optimizer to spend time on only the portions of the code that benefit the most from optimization. Therefore, significant runtime optimization need not require significant runtime resources or cause performance degradation.

Of course, you could attempt to simulate such optimizations in your source code, say by unrolling a loop manually, but because you don't know which loops might actually be used the most at runtime--user A may use feature X, whereas user B may use feature Y--you must make guesses or assumptions about the actual usage. Such assumptions cannot be accurate for all uses of the program in question, and in general, the more one leans toward optimizing one type of usage, the more other types of usage suffer. The result is software that is far less flexible than it could otherwise be. In the worst case, the program will actually run worse than expected because it has been improperly optimized, or optimized to such a degree for one use that it becomes less useful or useless for other uses. This will not happen with adaptive dynamic optimization.
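
A rough illustration of this point (the class and method names here are invented for the example): which of the two loops below deserves aggressive optimization depends entirely on which feature a given user exercises, something a runtime profiler can observe but a static compiler can only guess at.

    // Invented example: two code paths whose relative importance depends on
    // what a particular user actually does at run time.
    class UsageDependentHotspots {
        static long featureX(int n) {            // hot for user A
            long sum = 0;
            for (int i = 0; i < n; i++) sum += i * 3L;
            return sum;
        }
        static long featureY(int n) {            // hot for user B
            long product = 1;
            for (int i = 1; i < n; i++) product *= (i % 7) + 1;
            return product;
        }
        public static void main(String[] args) {
            boolean prefersX = args.length > 0 && args[0].equals("x");
            // A static compiler must guess which loop to unroll or inline around;
            // an adaptive runtime can watch the actual counts and decide.
            System.out.println(prefersX ? featureX(1000000) : featureY(1000000));
        }
    }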

Also, the person who mentioned that "JIT is only required when you want to run the binary for the first time" is incorrect. JIT compilation occurs each time the program is run and preferably continues throughout the program's lifetime (thereby spreading the cost of compilation across the entire program's life, and taking advantage of runtime profile information). Remember that as programs become larger and more complex, the types of uses they will encounter over time (within the same run) vary more and more. This is the root of the need for adaptable software. Consider as an example a web server that undergoes significant cyclic load variations. It is of great benefit that the server can be reconfigured for the type of load it is experiencing at any given time. For example, a server serving a high rate of dynamically generated web pages does a significantly different kind of work than a server serving static content. In such situations, an adaptive optimizing compiler, by gathering runtime profile information, can continue to adapt the program to best suit the current use.

These sorts of optimizations go far beyond the kinds of optimizations that are possible at compile time. Thus, VM code has some significant advantages over static machine-dependent code. I urge you to learn more about such technologies before dismissing their advantages out of hand.
