If you want to write really fast code, what should you do? Hone your
assembly skills? Erik Greenwald thinks that's a mistake too many
people make, and that it's possible to write even better-optimized
code in a high-level language.
Many people seem to think that the best road to blindingly fast
programs is writing as much as possible in assembly. Typically, this
results in buggy software that is actually slower than the equivalent
written in a higher-level language. Premature optimization can cost huge
amounts
of time and effort. No optimization should occur before the program
works without crashing or exhibiting weird behavior.
Make it Work.
Attempting to optimize a program that doesn't even run is a pointless
task. If your program crashes, then not only is it impossible to
profile (binaries compiled for profiling write their data file only
when the program exits normally), but users won't want to run it. Make
sure you have no syntactic or memory errors in your program before
even attempting to optimize it. I strongly suggest using all
facilities in the language to extend warnings and errors, as well as
using tools like Splint
and Electric Fence
to verify your program will not crash.
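For instance, the usual invocations look something like this (the file
names are invented for the example, and "-W" is the older spelling of
what newer gcc calls "-Wextra"):

```shell
# Turn warnings up as far as they go, and link against Electric Fence
# so stray heap accesses crash immediately instead of corrupting data.
gcc -Wall -W -o myapp myapp.c -lefence

# Run Splint over the source for a static check.
splint myapp.c
```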
Make it Right.
Optimization can often make a program less readable, so you want to guarantee
that the program does exactly what it should BEFORE you start optimizing. Any
logical bugs should be tracked down and squished. I suggest a lot of test
cases, especially at the bounds. "Off by one" errors can be tricky to find in
clear code, and virtually impossible in code that has been optimized beyond
recognition.
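As a hypothetical illustration (the function and values are invented
for the example), index-clamping code is exactly where boundary tests
pay off:

```c
#include <assert.h>

/* Hypothetical example: clamp an index into the valid range
 * [0, len - 1].  An off-by-one slip here -- say, comparing against
 * len instead of len - 1 -- walks right past the end of the array. */
static int clamp_index(int i, int len)
{
    if (i < 0)
        return 0;
    if (i > len - 1)
        return len - 1;
    return i;
}
```

Testing at -1, 0, len - 1, and len exercises every boundary of this
function; those four cases are where an off-by-one bug would hide.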
Make it Fast.
Once we confirm that the program runs free of both syntactic and logical bugs,
we're ready to begin the optimization. Often people will attempt to optimize
prematurely or in the wrong place, resulting in wasted effort. To make
optimizing easy and truly beneficial, a certain order of optimizations should
be followed.
The first step in the optimization process is to locate and isolate
the performance bottlenecks. Amdahl's Law states that the greatest
opportunity for improvement is where the most time is spent. If a
function is using 50% of the program's processor time and the time
spent in that function is halved, the program's total run time drops
by 25%. If we alter another function so that it runs eight times as
fast, but it only accounts for 10% of the CPU utilization, we shave
less than 9% off the run time (that 10% shrinks to 1.25%, saving
8.75%). Obviously, the key to a good
optimization is to improve the parts that consume the most time.
There are several tools available to do profiling, but I will
concentrate on GNU
gprof. It's a free tool that works with GCC and reports the
amount of time spent inside each function. Most libraries can be
compiled with profiling support, so you can even use it to see if a
library function is the bottleneck of your application. The gprof
manual is available at http://www.gnu.org/manual/gprof/,
and the software package is available at the GNU FTP site (and is
probably already installed on your system if you use Linux). To get
started using gprof, you only need to know how to compile your program
to generate profiling information and how to view the generated
information. GCC has a flag to enable profiling information
generation, and the "gprof" command is used to view it. Note that you
must actually run the program once to produce the profiling data file
(gmon.out), that gprof has to be run in the directory where the
program was run, and that it takes the executable as an argument.
$ gcc -o myapp myapp.c -pg
$ ./myapp
$ gprof ./myapp
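As a sketch of what to expect (the program is invented for the
example), a function doing deliberately wasteful work will dominate
gprof's flat profile:

```c
/* Hypothetical example: a deliberately inefficient function to
 * profile.  Built with -pg and run once with a large n, gprof's flat
 * profile should show nearly all of the program's time inside
 * slow_sum(), marking it as the place worth optimizing. */
static long slow_sum(long n)
{
    long total = 0;
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++)
            total += (i + j) % 7;   /* O(n^2) busywork */
    return total;
}
```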
Once we know where the time is being consumed, we can start trimming
down the time consumption. When you find a function or set of
functions that consume a large amount of time, the first question you
should ask yourself is "Can I approach this problem in a better,
smarter way?" Big-O notation is often used to describe the efficiency
of an algorithm, and is useful in this step. When I'm exploring the
difference between different algorithms, I like to use a higher level
language than what I'm writing the finished code in. This allows me to
experiment with some radical changes to the algorithms with very
little time and effort invested to explore those avenues.
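As a sketch of the kind of algorithmic difference Big-O notation
captures (both functions are invented for the example), compare a
linear scan against a binary search over sorted data:

```c
#include <stddef.h>

/* Hypothetical example: two searches over a sorted int array.  Both
 * return the index of 'key', or -1 if it is absent.  The linear scan
 * is O(n); the binary search is O(log n) and pulls ahead as n grows. */
static int linear_search(const int *a, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return (int)i;
    return -1;
}

static int binary_search_sorted(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;              /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid;
        else
            return (int)mid;
    }
    return -1;
}
```

No amount of micro-tuning the linear scan will beat switching to the
better algorithm once the array gets large; that is the point of
settling the algorithm before polishing the code.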
While the algorithm itself may be sound, the implementation isn't always. If
you are satisfied with the algorithm, you should look for how you expressed
that algorithm in the programming language to see if there is something you
could have done better. The biggest improvement is usually to cache
computations instead of computing them several times. Functions that are called
frequently may benefit from being inlined. A good knowledge of how the language
is implemented can be very useful here.
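A common instance of caching (sketched here with invented code) is
hoisting a repeated computation out of a loop; strlen() is a favorite
offender:

```c
#include <string.h>

/* Hypothetical example: count lowercase ASCII letters.  The slow
 * version re-runs strlen() on every iteration, making the loop
 * O(n^2); the fast version caches the length once. */
static int count_lower_slow(const char *s)
{
    int count = 0;
    for (size_t i = 0; i < strlen(s); i++)  /* strlen() every pass */
        if (s[i] >= 'a' && s[i] <= 'z')
            count++;
    return count;
}

static int count_lower_fast(const char *s)
{
    int count = 0;
    size_t len = strlen(s);                 /* computed once, cached */
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 'a' && s[i] <= 'z')
            count++;
    return count;
}
```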
Just about every modern compiler has some fairly hefty optimization
capabilities these days. Finding the right combination of flags to
suit your code may take a little experimentation and reading. The GCC
man page describes what the different flags do in a fair amount of
detail. The first flags to consider are the "-O" flags. "-O" by itself
does light optimization, and is equivalent to "-O1". To disable
optimizations, you can use "-O0", and to set more aggressive
optimization levels, you can use "-O2" or "-O3". Setting the machine
type is also useful, as the compiler can build the assembly to use
extended opcodes and registers. Most of us have x86 machines, so
some of the flags useful for that architecture are "-m486",
"-mpentium", and "-mpentiumpro", but be careful -- if you set your machine
type too high, earlier machines may not be able to execute the
code. The last series of flags you should consider are the specific
optimization enables. To enable faster but less IEEE-strict floating
point math, for example, you can add "-ffast-math" to the compile
line, but the "-O" levels already turn on many of the individual "-f"
flags and are probably going to generate better
code than if you try to set the "-f" flags yourself. If you know your
program would benefit from a flag and the optimization level doesn't
add that flag for you, you can force it.
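Putting those pieces together, a compile line might look like this
(the file names are invented, and "-mpentium" is the gcc 2.x-era
spelling discussed above; newer gcc writes it as "-march=pentium"):

```shell
# -O2 for aggressive general optimization, a machine type so the
# compiler may use newer opcodes, and -ffast-math added explicitly
# because we know this program tolerates looser floating point rules.
gcc -O2 -mpentium -ffast-math -o myapp myapp.c
```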
Modern compilers do a very good job of figuring out how to convert
your code into efficient assembly. While the #pragma preprocessor
directives and certain keywords like "register" are still part of the
language, they should generally be considered deprecated.
When it comes to optimizing, the kneejerk reaction of a lot of people
is to write in assembly. This thinking is very wrong. The compiler can
almost always generate better assembly than a mere mortal. If you feel
the need to write a piece in assembly, you should first examine the
assembly that the compiler generates. To make gcc generate an assembly
source file instead of an object or binary executable, simply call it
with the "-S" flag. Examine the generated assembly file, and if you
are absolutely sure that you can build better assembly than gcc, knock
yourself out. I strongly suggest writing the assembly as inline
assembly in the source and wrapping both it and the original C version
in a preprocessor conditional, so the program can still compile on
different platforms.
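A minimal sketch of that pattern (the function is invented for the
example, and the asm is x86-specific):

```c
/* Hypothetical example: add two ints with x86 inline assembly when
 * building with gcc on x86, falling back to plain C everywhere else.
 * The preprocessor conditional keeps the program portable. */
static int add_ints(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0" : "+r"(a) : "r"(b));   /* a += b */
    return a;
#else
    return a + b;                                /* portable C version */
#endif
}
```

On any platform where the conditional fails, the C branch compiles
instead, so the hand-tuned path is an opt-in rather than a portability
hazard.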
Given this order of optimizing, you can greatly reduce the time and
effort that goes into making quality software that operates in an
expedient manner. I've only mentioned a few tools to help facilitate
development; there are plenty of others out there (some do slightly
different things, and some target different
compilers/platforms). Happy optimizing.
Erik Greenwald (http://math.smsu.edu/~br0ke/)
is a fourth year computer science student at Southwest Missouri State
University. A Linux addict since 1996, he is active in Free Software
development. He can be reached at firstname.lastname@example.org.
T-Shirts and Fame!
We're eager to find people interested in writing editorials on
software-related topics. We're flexible on length, style, and topic,
so long as you know what you're talking about and back up your
opinions with facts. Anyone who writes an editorial gets a freshmeat
t-shirt from ThinkGeek in
addition to 15 minutes of fame. If you think you'd like to try your
hand at it, let email@example.com
know what you'd like to write about.