
GCC Myths and Facts

Since my good old Pentium 166 days, I've liked to search for the best optimizations possible so programs can take the maximum advantage of hardware/CPU cycles. If I have a nice piece of hardware, why not run it at its full power, using every little feature? Shouldn't we all try to get the best results from the money invested in our machines?

This article is written for the average desktop Linux user and with the x86 architecture and C/C++ in mind, but some of its content can be applied to all architectures and languages.

GCC 3 Improvements

GCC 3 is the biggest step forward since GCC 2, representing more than ten years of accumulated work and two years of intense development. It has major benefits over its predecessor, including:

Target Improvements

  • A new x86 backend, generating much-improved code.
  • Support for a generic i386-elf target.
  • A new option to emit x86 assembly code using an Intel-style syntax.
  • Better code generated for floating point-to-integer conversions, leading to better performance by many 3D applications.

Language Improvements

  • A new C++ ABI. On the IA-64 platform, GCC is capable of interoperating with other IA-64 compilers.
  • A significant reduction in the size of symbol and debugging information (thanks to the new ABI).
  • A new C++ support library and many C++ bugfixes, vastly improving conformance to the ISO C++ standard.
  • A new inliner for C++.
  • A rewritten C preprocessor, integrated into the C, C++, and Objective C compilers, with many improvements, including ISO C99 support and improvements to dependency generation.

General Optimizations

  • Infrastructure for profile-driven optimizations.
  • Support for data prefetching.
  • Support for SSE, SSE2, 3DNOW!, and MMX instructions.
  • A basic block reordering pass.
  • New tail call and sibling call elimination optimizations.

Why do some programmers and users fail to take advantage of these amazing new features? I admit that some of them are still "experimental", but not all of them. Perhaps the PGCC (Pentium compiler group) project gave rise to several misunderstandings which persist today. (PGCC offered several Pentium-specific optimizations. I looked at it when it first started, but benchmarks showed that the improvement was only about 2%-5% over GCC 2.7.2.3.)

We should clear the air about the GCC misconceptions. Let's start with the most loved and hated optimization: -Ox.

Myths

I use -O69 because it is faster than -O3.

This is wrong!

The highest optimization is -O3.

From the GCC 3.2.1 manual:

       -O3    Optimize yet more.  -O3 turns on all optimizations
              specified by -O2 and also turns on the
              -finline-functions and -frename-registers options.

The most skeptical can verify this in gcc/toplev.c:


/* Scan to see what optimization level has been specified.
   That will determine the default value of many flags. */

-snip-

  if (optimize >= 3)
    {
      flag_inline_functions = 1;
      flag_rename_registers = 1;
    }


If you are using GCC, there's no point in using anything higher than 3.
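
Anything above 3 falls through to exactly the same set of flags as -O3, and you can check that yourself. A minimal sketch (foo.c is a placeholder source file); the two assembly files should come out identical:

  gcc -O3 -S -o foo-O3.s foo.c
  gcc -O69 -S -o foo-O69.s foo.c
  diff foo-O3.s foo-O69.s    # no output expected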

-O2 turns on loop unrolling.

In the GCC manpage, it's clearly written that:

-O2 turns on all optional optimizations except for loop unrolling [...]

Skeptics: check toplev.c.

So when you use -O2, which optimizations are you using?

The -O2 flag turns on the following flags:

  • -O1, which turns on:
    • -fdefer-pop (see -fno-defer-pop)
    • -fthread-jumps
    • -fdelayed-branch (on, but specific machines may handle it differently)
    • -fomit-frame-pointer (only on if the machine can debug without a frame pointer; otherwise, you need to specify it yourself)
    • -fguess-branch-probability (see -fno-guess-branch-probability)
    • -fcprop-registers (see -fno-cprop-registers)
  • -foptimize-sibling-calls
  • -fcse-follow-jumps
  • -fcse-skip-blocks
  • -fgcse
  • -fexpensive-optimizations
  • -fstrength-reduce
  • -frerun-cse-after-loop
  • -frerun-loop-opt
  • -fcaller-saves
  • -fforce-mem
  • -fpeephole2 (a machine-dependent option; see -fno-peephole2)
  • -fschedule-insns (if supported by the target machine)
  • -fregmove
  • -fstrict-aliasing
  • -fdelete-null-pointer-checks
  • -freorder-blocks

There's no point in using -O2 -fstrength-reduce, etc., since -O2 already implies all of this.
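
In practice, this means a Makefile full of hand-listed -O2 sub-options can be collapsed. A hypothetical before/after (the extra flags below are all implied by -O2):

  # before: every extra flag is already implied by -O2
  CFLAGS = -O2 -fstrength-reduce -fcse-follow-jumps -fgcse -fexpensive-optimizations
  # after: exactly the same optimizations
  CFLAGS = -O2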

Facts

The truth about -O*

This leaves us with -O3, which is the same as -O2 and:

  • -finline-functions
  • -frename-registers

-finline-functions is useful in some cases (mainly with C++) because it lets you control the maximum size of functions considered for inlining (600 by default) with -finline-limit. Unfortunately, if you set a high value, at compile time you will probably get an error complaining about lack of memory. This option needs a huge amount of memory, increases compilation time, and makes the binary bigger. Sometimes you see a gain, and sometimes you don't.
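
For example, to experiment with a larger inlining threshold (the value and file name are only illustrative; measure before adopting it):

  g++ -O3 -finline-limit=1000 -c bigmodule.cpp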

-frename-registers attempts to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization most benefits processors with lots of registers. It can, however, make debugging impossible, since variables will no longer stay in a "home register". Since the i386 is not a register-rich architecture, I don't think it will have much impact.

A higher -O does not always mean improved performance. -O3 increases code size, which can introduce instruction-cache penalties and end up slower than -O2. However, -O2 is almost always faster than -O.
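
The only reliable way to decide is to measure. A minimal sketch (prog.c and its workload are hypothetical; use realistic input and repeat the runs):

  gcc -O2 -o prog-O2 prog.c
  gcc -O3 -o prog-O3 prog.c
  time ./prog-O2
  time ./prog-O3
  ls -l prog-O2 prog-O3    # compare binary sizes as well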

-march and -mcpu

With GCC 3, you can specify the type of processor you're using with -march or -mcpu. Although they seem the same, they're not: one specifies the architecture to generate code for, and the other the CPU to tune for. The available options are:

  • i386
  • i486
  • i586
  • i686
  • pentium
  • pentium-mmx
  • pentiumpro
  • pentium2
  • pentium3
  • pentium4
  • k6
  • k6-2
  • k6-3
  • athlon
  • athlon-tbird
  • athlon-4
  • athlon-xp
  • athlon-mp

-march implies -mcpu, so when you use -march, there's no need to use -mcpu.

-mcpu generates code tuned for the specified CPU, but it does not alter the ABI and the set of available instructions, so you can still run the resulting binary on other CPUs (it turns on flags like mmx/3dnow, etc.).

When you use -march, you generate code for the specified machine type, and the available instructions will be used, which means that you probably cannot run the binary on other machine types.
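
As an illustration (foo.c is a placeholder), both forms are valid but make different portability promises:

  gcc -O2 -mcpu=pentium3 -c foo.c     # tuned for a Pentium III, still runs on any i386
  gcc -O2 -march=pentium3 -c foo.c    # may use Pentium III-only instructions; not safe on older CPUs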

Conclusion

Fine-tune your Makefile, remove those redundant options, and take a look at the GCC manpage. I bet you will save yourself a lot of time. There's probably a bug somewhere that can be smashed by turning off some of GCC's default flags.

This article discusses only a few of GCC's features, but I won't broaden its scope. I just want to try to clarify some of the myths and misunderstandings. There's a lot left to say, but nothing that can't be found in the Fine Manual, HOWTOs, or around the Internet. If you have patience, a look at the GCC sources can be very rewarding.

When you're coding a program, you'll inevitably run into bugs. Occasionally, you'll find one that's GCC's fault. When you do, stop to think about the time and effort that's gone into the compiler project and all that it's given you. You might think twice before simply flaming GCC.


Comments

15 Feb 2003 01:47 thefreek

some points still missing
"-mcpu generates code tuned for the specified CPU[...] so you can still run the resulting binary on other CPUs (it turns on flags like mmx/3dnow, etc.)"

The phrase in ( ) is NOT true (afaik) - gcc will >schedule< instructions according to the specified -mcpu, but will not use instructions that aren't available on generic i386/Pentium processors (I'm not sure if the 386 is still the "reference"); as such, MMX or 3DNow! instructions are not used.
As an example, I can compile with -mcpu=i686, but still won't see cmov in the ASM-code; -march=i686 may run on my K6, but only if the compiler sees no need for cmov, for example.

The article is great in informing about common misconceptions.
But I miss a rough (or more detailed) explanation of what the specific optimizations do.
E.g. -fomit-frame-pointer will give you another GP register to use (%ebp) at the cost of debugging no longer being available (at least on x86); this register is normally used to indicate the stack frame of the current function, but costs ~2 extra instructions per function call as it needs to be maintained, so it's a good option to specify.
And there are lots of other options, fstrict-aliasing, fstrength-reduce, ...

And please - developers - don't turn on -g (debugging) by default...
Thank you.

15 Feb 2003 04:12 noselasd

Frame Pointer
Let's not forget -fomit-frame-pointer (see here (burks.brighton.ac.uk/b...)); this frees a register in the CPU, which in itself is good, and therefore also lets the compiler do more optimizations. Maybe not the biggest point on Alphas with lots of internal registers, but nice for x86. Note, it also makes debugging impossible on x86, but normal users won't need that.

15 Feb 2003 04:24 ccat

Optimizations in general
When I was writing ncc, I thought about the basic levels of optimization. These seem to be:

0) No optimization. Compiler just produces correct code. -O(-1)

1) Decent optimization. Compiler is a little smarter. This is the default gcc output -O1

2) Good optimization. Compiler does some basic jump prediction, inlining and architecture hacks. At this level we can talk about a decent compiler. This is not gcc -O2 yet!

3) Extreme optimizations: Here the compiler tries to be very smart. By looking at the output assembly, one would not be able to understand the structure of the program. There are two subcategories

a) Black magic stuff. Move blocks of code around, things disappear and reappear elsewhere, etc. This is usually gcc -O2/O3

b) Extreme architecture hacks. This is the main advantage of Intel's C compiler.

- This is the upper limit -
- Here, some visionaries dream of extreme features -

4) Infinite compiler intelligence which approximates "The programmer". This is a utopia for all compiler developers.

So what we really need from our compiler is to get to level 2. From then on we can optimize it manually. For the paranoid there is always assembly.

I'd like to add that
1) Many interactive programs do not need much optimization. Just good design.
2) In many of the programs that do need optimization, it's easy for the programmer to abstract the heavy loops and spend some more time with human optimizations on them.

For example in quake, Carmack has written a special copy_to_screen function which is based on using the cache efficiently.

Anyway, this is a nice article because it sums up the huge gcc manual. Now that's optimization!

15 Feb 2003 05:44 haering

Re: some points still missing

> And please - developers - don't turn on
> -g (debugging) by default...

Why not? If executable size is a problem, you can 'strip' them.

15 Feb 2003 06:29 Avatar michaelrsweet

Re: Frame Pointer
IIRC, -fomit-frame-pointer prevents libsafe and other tools from working, since they need the frame pointer to compute the upper bounds of the stack in the current function.

15 Feb 2003 06:45 quotemstr

What about -Os?
Since most software isn't cpu-bound, and since memory and disk are also limited resources, why not try -Os?

`-Os'
Optimize for size. `-Os' enables all `-O2' optimizations that do
not typically increase code size. It also performs further
optimizations designed to reduce code size.

Code compiled with this option would run just as fast (in wall-clock time, since it isn't cpu-bound) but would consume less memory, leaving more space for disk caches.
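
A quick way to see the difference on your own code (prog.c is just a placeholder):

gcc -O2 -o prog-O2 prog.c
gcc -Os -o prog-Os prog.c
size prog-O2 prog-Os    # the -Os text segment should be the same size or smaller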

15 Feb 2003 07:12 jimfaulkner

Rename-registers
I thought that register renaming benefits
register-starved architectures the most?

On a processor with 8 general purpose registers
(x86), functions are more likely to use the same
register for their variables, so register renaming would
be more beneficial.

On a processor with 32 general purpose registers
(sparc), the compiler is much less likely to run out of
registers for holding variables, so register renaming
does not do so much good.

At least that's what my compiler design professor told
me.

15 Feb 2003 07:20 Lostguy

Re: some points still missing

> Why not? If executable size is a
> problem, you can 'strip' them.

What about leaving the default build without '-g' and creating a debug rule for people who will do debugging? Most users don't even know what strip is. They only need a working and fast app.

15 Feb 2003 07:21 schneelocke

Re: Optimizations in general

> 4) Infinite compiler intelligence which
> approximates "The programmer".
> This is a utopia for all compiler
> developers.

A utopia indeed. From what I recall, this - generating the best code possible under all circumstances - is provably impossible. And of course, even defining just what is the "best code" is hard enough in itself. :)

15 Feb 2003 09:36 imipak

Three thoughts
First, the maximum optimization on modern GCCs is, indeed, -O3. This has not always been the case. Higher optimizations have existed in (much) earlier versions, usually undocumented.

I believe the highest optimization ever recognized was a massive -O6. We're talking 10 years or so ago, here. At some point, -O6 and -O5 vanished, leaving the KotH to -O4, which itself shortly vanished.

These were never official optimizations, to the best of my knowledge, and use of them (even on those GCCs that supported them) was usually considered kamikaze coding.

The second point I want to raise is with architecture support. The number of architectures supported in GCC is declining. I took a look at the pages for GCC, and it's scary. I want to take this opportunity to borrow the Cluebat from UserFriendly and swat anyone who has contributed to this decline.

If GCC/Glibc2 are to become universally acceptable, they must FIRST be universally usable. We are far from that state. Glibc2 has gone through numerous revisions and clean-ups, and doesn't even run on a tenth of the systems Glibc1 did.

Sorry, but that ain't progress, in my books. Nobody is going to adopt an environment they can't use. That's the bottom line. It doesn't matter that the latest GCC and Glibc are brilliant (although Glibc needs better pointer handling). What matters is that if the only choice of development environment for a platform is proprietary, then Joe and Jane Average Programmer will believe that proprietary programming is what works.

People don't listen to what others say, they listen to what others do.

Last, but by no means least, more languages need to be moved into GCC. Guile, Smalltalk, perhaps ELisp - these are all candidates. (Elisp? Yeah! All you need is support for an Elisp bytecode target, and you've got an Elisp bytecode compiler in GCC.)

It would be cool if GPLed compilers for Cobol and Algol could make their way into GCC, too. Why??? Because Cobol is still heavily used, and Algol makes for a great teaching language.

15 Feb 2003 10:56 bryanhenderson

Re: utopia
If it's utopia, then it doesn't just approximate "the programmer," it matches it.

And it does more than match it. Much of the optimization that compilers do today exceeds the capability of a typical human programmer, and we'd want to keep that.

15 Feb 2003 11:04 bryanhenderson

Re: -g by default
Developers turn on -g by default as a means of setting policy for their users -- the ones not sophisticated or interested enough to do strips or use non-default configuration options. They want copies of their programs in the field to have the debugging symbols in them so they can solve problems with them easily. They determine that this debuggability is more important than the resource savings of omitting -g.

15 Feb 2003 11:06 bbrain

Re: some points still missing

> They only need a working and fast app.

. . . except when it breaks and they need to send in debugging information. This is why gcc should support saving the symbol table for a stripped binary. It would still need some kind of loader hack, though.

15 Feb 2003 11:12 bryanhenderson

Re: Don't spread Gcc more thin

I would rather have a Gcc developer spend his time making it work better for IA32 than making it work on some other architecture.

Similarly, I'd rather have Gcc get really good at a few good languages than be so-so in a dozen of them.

People have a little bit of choice over what architecture and language they use. If all of them have mediocre compilers, that's not much of a choice. If one of them has a superlative compiler, it's something to think about switching to.

I believe Gcc development resources are limited.

15 Feb 2003 12:12 fvoges

Re: some points still missing

>
> % Why not? If executable size is a
> % problem, you can 'strip' them.
>
> What about leave the default build
> without '-g' and create a debug rule for
> people who will do debugging ? Most
> users don't even know what's strip.
> They only need a working and fast
> app.
>

100% true. Even as a developer, I've only used -g while debugging an app. Never for production releases.

15 Feb 2003 12:19 khuber

Re: some points still missing

> And please - developers - don't turn on
> -g (debugging) by default...
> Thank you.

Why? The debugging symbols are only used by a debugger. They take up no memory when you run the binary. Symbols only consume some disk space. As a developer it is so much easier for me to have the symbols.

-Kevin

15 Feb 2003 12:20 khuber

Re: some points still missing

>
> 100% true. Even as a developer, I've
> only used -g while debuging an app.
> Never for production releases.
>

How do you debug core files? It is very time consuming without symbols.

-Kevin

15 Feb 2003 12:21 0x0d0a

More comments
First of all, excellent article, Joao. This is the most to-the-point, useful, and opinion-free editorial I've seen on Freshmeat.

A few more suggestions as regards optimization -- I've done a bit of benchmarking with different options. Most specific tweaks you can try (above -O3) make very little difference on average code, with the exception of three options.

First, -fomit-frame-pointer can provide a small boost (admittedly, not as much as I'd expect), at least on x86. The drawback is that you will not be able to get backtraces from core dumps or dying apps. This might be worth using if you have a program that's *almost* fast enough, but not quite, like an emulator or movie player, and you're not doing development on it (or care about sending in bug reports).

Second, -ffast-math *can* be very helpful, though most programs will not see much of a benefit, since usually you don't see a ton of floating-point operations in most software. This *can*, as per the gcc man page, break correct software, but I've yet to run into a package that it causes problems with.

Third, -fstrict-aliasing produces a speedup of around 10% in snes9x. While strict ANSI code should not be broken by it, it's relatively easy for someone to write code that *does* break with -fstrict-aliasing. I haven't seen many problems with it.

Fourth, -DNDEBUG isn't technically a compiler flag, but will tell the preprocessor not to evaluate assert() conditions. Decent for production builds. Most developers avoid having assert()s in inner loops *anyway*, so this is unlikely to provide a huge speedup. Also, while code with side effects should not be placed in assert() statements, this is easy to do -- and code of this nature will break with -DNDEBUG on. For most software, very minor benefits.
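
For example (a hypothetical call), something like

  assert(fread(buf, 1, len, fp) == len);

works fine in a debug build, but once -DNDEBUG turns assert() into a no-op, the fread() itself silently disappears.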

Fifth, -DG_DISABLE_ASSERT is a similar flag to -DNDEBUG, but applies to g_assert() in the glib package (used by gnome and gtk software). Again, for most software, very minor or nonexistent benefits.

Sixth, there are a few new arch types in gcc 3.2. If you used to use -march=i686 but have a pentium 2, you should now be using -march=pentium2.

Seventh, while the real-world benefits appear to be minimal, I've written some simple tests to see if the optimizer rips out branches that should obviously be dead code. gcc does not do so without -fexpensive-optimizations. OTOH, while I feel that -fexpensive-optimizations generates more appealing machine language, I haven't seen any huge performance benefits granted by it.

And just for the heck of it, (while this isn't really optimization-related), always compile with -pipe and -Wall. -Wall *will* help you find bugs, and -pipe will speed up compilation (in some packages, by a lot).

15 Feb 2003 12:47 omegax

pt
Do you study in Coimbra?! :) Don't you have a version of this article in Portuguese?

15 Feb 2003 13:08 ct2gnl

Re: pt

> Do you study in Coimbra?! :)


Yes.

> Don't you have a version of this article in Portuguese?


No, I don't.

15 Feb 2003 13:44 ccat

Re: utopia

>
> And it does more than match it. Much of
> the optimization that compilers do today
> exceeds the capability of a typical
> human programmer, and we'd want to keep
> that.
>

An example from Stroustrup:

int f (int n)
{
    if (n == 1) return 1;
    return n * f(n - 1);
}

A super-intelligent compiler would replace f(3) with the value 6.

That's the utopia.

On the other hand I do agree that compilers today produce better *assembly* than the typical human programmer.

15 Feb 2003 14:39 Avatar woods

Re: some points still missing

>
> % Why not? If executable size is a
> % problem, you can 'strip' them.
>
> What about leave the default build
> without '-g' and create a debug rule for
> people who will do debugging ? Most
> users don't even know what's strip.
> They only need a working and fast
> app.
>

Indeed! Why not leave '-g' enabled by default, especially for simple languages like C?

Users can indeed always strip a binary if it seems too big for them.

However users can never unstrip a binary.

The key word here is "working", and many applications need a lot of help with that part. Using '-g' allows the user to give (with appropriate recipes supplied by the developer) much better feedback when something fails catastrophically, as things all too often do. Developers can't always get direct access to the core dump, and sometimes providing an exact matching binary with debugging symbols is not possible either.

As for "fast", well in most modern applications that comes about through good design, not compiler optimisation. (and -g does not necessarilly slow anything down on the average modern system)

15 Feb 2003 14:54 ed_avis

Re: utopia
Have a look at C-Mix (www.diku.dk/forskning/...) as a candidate for your 'super intelligent compiler'. It should handle the factorial example you give.

However it's still true that no compiler can optimize perfectly.

15 Feb 2003 14:58 ed_avis

Re: some points still missing
I kinda like the system Windows uses where the debugging symbols are in a separate file alongside the executable. So compiling a program generates both foo.exe and foo.sym (IIRC). You can choose to install the symbols alongside the executable, or not.

15 Feb 2003 15:42 Avatar shlomif

Nice Article
A very good article, that clarified a few
things to me. Keep up the good work! In one of my
projects I used to set up the optimization flag as
-O3 in the makefile. When compiling for testing
and debugging, I use -g without any optimization
flags, so gdb will be happy.

Note that I recently encountered a bug that was
present only when compiling with -O2. Apparently,
without it, a variable I declared was initialized to
NULL, which was the value I implicitly expected it
to have. With -O2 it was initialized to random
bytes and so was not NULL.

So it is useful to run some automated tests on the
final compiled executable.

15 Feb 2003 16:01 Avatar shlomif

Re: Three thoughts

> The second point I want to raise is with
> architecture support. The number of
> architectures supported in GCC is
> declining. I took a look at the pages
> for GCC, and it's scary. I want to take
> this opportunity to borrow the Cluebat
> from UserFriendly and swat anyone who
> has contributed to this decline.
>
> If GCC/Glibc2 are to become universally
> acceptable, they must FIRST be
> universally usable. We are far from that
> state. Glibc2 has gone through numerous
> revisions and clean-ups, and doesn't
> even run on a tenth of the systems
> Glibc1 did.
>
> Sorry, but that ain't progress, in my
> books. Nobody is going to adopt an
> environment they can't use. That's the
> bottom line. It doesn't matter that the
> latest GCC and Glibc are brilliant
> (although Glibc needs better pointer
> handling). What matters is that if the
> only choice of development environment
> for a platform is proprietary, then Joe
> and Jane Average Programmer will believe
> that proprietary programming is what
> works.
>
> People don't listen to what others say,
> they listen to what others do.
>


That's interesting information. I'll have to check it
out myself to verify this is the case. The question
is how much interest is there in porting Glibc
and/or gcc to these architectures. I know Cygnus
got a lot of income from doing just that.


> Last, but by no means least, more
> languages need to be moved into GCC.
> Guile, Smalltalk, perhaps ELisp - these
> are all candidates. (Elisp? Yeah! All
> you need is support for an Elisp
> bytecode target, and you've got an Elisp
> bytecode compiler in GCC.)
>


Guile is a Scheme _Interpreter_. Interpreters are
much simpler to write and maintain than
compiler front-ends, especially for symbolic
high-level languages such as Scheme, Perl,
Python, etc. There are Scheme compilers out
there, but I don't think the GNU people wish to
pursue this direction because guile is not intended
to run very quickly as it is. Not more than perl or
python, in any case.

Same goes for Elisp. Better keep the code clean
than start hacking on a useless GCC front-end. I
don't know too much about Smalltalk.


> It would be cool if GPLed compilers for
> Cobol and Algol could make their way
> into GCC, too. Why??? Because Cobol is
> still heavily used,


Right. There is Tiny COBOL or whatever, but I
don't know if it's as flexible as gcc. Do the various
COBOL implementations adhere to some kind of
common standard?

Of course, if you ask me, from what I've heard and
know of COBOL, it is so limited and brain-dead,
that it would be a good idea to re-implement all
this aging COBOL code that can be found around
in something more sensible. (from C to Java to Perl
and friends.) I heard a statistic that claimed that
most of the code in Israel is in COBOL, and I was
quite surprised to hear that.

But of course, you still need COBOL compilers.


> and Algol makes for
> a great teaching language.
>


Algol is a very old language. Last I heard it was
superseded by Pascal as far as learning is
concerned. I believe one can find better languages
to teach programming today than Pascal as well.

Aren't there some Algol interpreters around?

15 Feb 2003 16:54 trizt

Re: Don't spread Gcc more thin

> I would rather have a Gcc developer
> spend his time making it work better for
> IA32 than making it work on some other
> architecture.

Here I can't agree with you at all; GCC is an important tool for many operating systems which run on a lot of different CPUs. I do think that PPC and SPARC should be as well supported as x86 CPUs; of course, old CPUs such as the 8088 and m68k won't get as much attention.

> Similarly, I'd rather have Gcc get
> really good at a few good languages than
> be so-so in a dozen of them.

I do agree here... the main manpower should be aimed at the languages that are supported now (IMHO C and C++; I don't care much for the rest).

15 Feb 2003 18:32 leonbrooks

Re: utopia

> A super intelligent compiler would replace f(3) to the
value 6.
>
> That's the utopia.

DEC Fortran did this in 1980. A friend was
benchmarking it and got unreasonably good results, made
it spit out the asm, which was a single instruction:
print-fixed-string-and-exit with the correct end result.

15 Feb 2003 23:26 reduz

Re: some points still missing

> I kinda like the system Windows uses
> where the debugging symbols are in a
> separate file alongside the executable.
> So compiling a program generates both
> foo.exe and foo.sym (IIRC). You can
> choose to install the symbols alongside
> the executable, or not.


Not only that, it makes linking faster, and that
without mentioning microsoft's awesome incremental
compiler WHICH BINUTILS LACKS HORRIBLY, THUS
MAKING MY PROJECTS TAKE MINUTES LINKING..
AND USING SEVERAL DOZENS OF MEGABYTES!!

but as someone said about the world of open source
"Welcome to the world of half implemented features"

16 Feb 2003 03:09 noselasd

Re: some points still missing

>
> %
> % 100% true. Even as a developer, I've
> % only used -g while debuging an app.
> % Never for production releases.
> %
>
>
> How do you debug core files? It is very
> time consuming without symbols.
>
> -Kevin

Do you think my father, my boss, and most other normal users will ever need/want to do that? They would rather see some speed. (Yes, I know -g doesn't have any speed penalties, but -fomit-frame-pointer should also be "default".)

16 Feb 2003 06:42 thefly

Re: some points still missing

>
> Not only that, it makes linking faster,
> and that
> without mentioning microsoft's awesome
> incremental
> compiler WHICH BINUTILS LACKS HORRIBLY,
> THUS
> MAKING MY PROJECTS TAKE MINUTES
> LINKING..
> AND USING SEVERAL DOZENS OF MEGABYTES!!
>
>
> but as someone said about the world of
> open source
> "Welcome to the world of half
> implemented features"

Why don't you use your beautiful closed windows programming environment in order to implement the other half of the unimplemented features instead of complaining about the job of volunteers?

17 Feb 2003 23:39 jwreschnig

Re: Nice Article

> Note that I recently encountered a bug
> that was
> present only when compiling with -O2.
> Apparently,
> without it, a variable I declared was
> initialized to
> NULL, which was the value I implicitly
> expected it
> to have. With -O2 it was initialized to
> random
> bytes and so was not NULL.

This isn't a bug. C variables aren't guaranteed to contain any particular value (this may be different in C99? I'm not sure, but I doubt it). Not having to make sure it's initialized to zero saves time; ergo, an optimization. :)

GCC probably turns it on by default to deal with compiling code that doesn't assign initial values, so it doesn't blatantly crash. IMO this is a bad idea, since it encourages C programmers to think variables are NULL by default.

18 Feb 2003 00:46 renoo

Re: utopia

>
> An example from Stroustrup:
>
> int f (int n)
> {
> if (n==1) return 1;
> return n * f(n-1);
> }
>
> A super intelligent compiler would
> replace f(3) to the value 6.
>
> That's the utopia.
>

No, that's the future. In fact, computing facto at compile time can already be done in C++ using templates.

template <int n>
struct facto
{
    static const int value = facto<n - 1>::value * n;
};

template <>
struct facto<1>
{
    static const int value = 1;
};

I think the compiler can replace f(3) by 6 by copy propagation techniques.

Utopia is perhaps something like this:

int fibo(int n)
{
    if (n < 2) return n;
    return fibo(n - 1) + fibo(n - 2);
}

18 Feb 2003 01:17 lcam

a sad observation (flame on?)
Your mileage may vary, but...

My experience with GCC 3.2 is that, for fairly well-written (read: optimized, not neat) integer/mem code compiled with some sensible optimization options, the generated code is slightly slower and larger compared to GCC 2.96's output.

I haven't had a chance to look at the output too carefully, but I've noticed a number of examples (a filesystem driver, a memory allocator and garbage collector, etc). Some code is, admittedly, smaller and faster, but this does not justify the impact on other fronts.

I'm glad there's some new stuff that would surely benefit modern architectures, 3D graphics, and floating point conversions, but my impression is that there is a slight decline in the quality of the produced code, and THIS comes at the price of slower compilation. Humm.

18 Feb 2003 01:38 omegax

Re: pt

>
> % Do you study in Coimbra?! :)
>
>
> Yes.
>
> % Don't you have a version of this
> article in Portuguese?
>
>
> No, I don't.

If you ever think about writing a Portuguese version, let me know! Take care.

18 Feb 2003 01:46 ccat

Re: utopia

>
> I think the compiler can replace f(3) by
> 6 by copy propagation techniques.
>
> Utopia is perhaps something like this:

Yes. That was merely an example of "extreme features". One can think of 100 similar examples (not necessarily related to maths), which a compiler "could" optimize "if" it had an extremely complex optimization algorithm.

The fibo example proves that even propagation techniques are not always possible (what if the entire program is about find_prime_no (10^10)? This is a constant that could be computed at compile time, but unfortunately it will take several years for the compiler to compute it.)

18 Feb 2003 22:37 didosevilla

Re: utopia

> int f (int n)
> {
> if (n==1) return 1;
> return n * f(n-1);
> }
>
> A super intelligent compiler would
> replace f(3) to the value 6.

Well, you could wind up getting a compiler that goes on a vain attempt at solving the halting problem if you tried to do this. What would your compiler do if it got f(-1), an expression that never halts? It wouldn't be able to figure out that it's doing infinite recursion (if you found an algorithm to determine this in the fullest generality, you would have solved the halting problem, and proved Alan Turing and Kurt Goedel wrong).

Finding the fastest code to perform a certain task is clearly an undecidable problem (this is worse than NP-complete, as it can be mathematically shown that an algorithm simply doesn't exist). The best I think that can be accomplished is a few heuristics that conform to some simple facts we know about how to speed code up.

18 Feb 2003 23:13 nobody2

thin article, butt presuming I am convinced.
How? How do I install a new gcc version? There appear to be many dependencies that run circular and are not happy with a new(er) version of gcc. The "how to" URL is dated 3 years ago and is thin with respect to the "install section".
(covert request for assistance...)

gcc in .rpm &/or src.rpm won't install and complains about
libc.so.c & libstdc++ & glibc & none of them are happy.

sure, it is possible to upgrade the OS, but how about gcc?

usually failed dependencies are easily resolved, but not so with gcc.

How about a gcc how to upgrade article addressing failed
dependencies... -clueless seeking clues...

The good news is finding out more about gcc as it fails to obey....
( smile )

19 Feb 2003 00:15 myg

Re: Rename-registers

> I thought that register renaming
> benefits
> register-starved architectures the most?

That's what I thought when I read that. I'm almost sure your compiler professor is correct. Relaxing strict register assignments is much more likely to produce better x86 code... assuming it's properly implemented.

I haven't taken a look at GCC's optimizer since pre 2.95. Anybody care to do a quick analysis?

19 Feb 2003 10:37 MinnaKirai

Re: some points still missing
Incremental compiling is protected under US patent 5,586,328. Open-Source implementations will be illegal until 2017.

21 Feb 2003 01:27 jancs

Re: Optimizations in general
So, in general, if I am a regular user who possesses a dual Celeron box, compiles programs for his own use, and rarely bothers with bug hunting, would such CFLAGS be OK (gcc 3.2.1):
-O2 -march=pentium2 -mcpu=pentium2 -fomit-frame-pointer ?
I don't know about -pipe, but I have often noticed that -Wall is used by default.

Does it make sense to use such optimizations if the largest part of the system is built with -O2 -march=i386 -mcpu=i686?

21 Feb 2003 05:14 ccat

Re: Optimizations in general

> So, in common, if i am regular user who
> posess dual celeron box, compiles
> programs for it's own use and rarely
> bothers about bug-hunt, would be such
> cflags ok (gcc 3.2.1):
> -O2 -march=pentium2 -mcpu=pentium2
> -fomit-frame-pointer ?
> I do nto know about -pipe, but i often
> noticed that -Wall is used by default.
>
> Does it have sense to use such
> optimizations if the most part of system
> is built with -O2 -march=i386
> -mcpu=i686?
>

If I was distributing a program that could benefit from optimization, I'd make ./configure set the best values for each system.

I never change ./configure defaults.

IMHO these options should concern the developers, provided autoconf can determine/set architecture flags correctly.

21 Feb 2003 06:42 jancs

Re: Optimizations in general

> % Does it have sense to use such
> % optimizations if the most part of
> system
> % is built with -O2 -march=i386
> % -mcpu=i686?
>
> If I was distributing a program that
> could benefit from optimization, I'd
> make ./configure set the best values for
> each system.
>
> I never change ./configure defaults.

To compile the programs, I use the Slackware build framework, and it contains the mentioned CFLAGS set to i386/i686. As I watched the compilation process dump on screen, configure takes them as defaults (maybe I am wrong?)

21 Feb 2003 14:43 ccat

Re: Optimizations in general

>
> % % Does it have sense to use such
> % % optimizations if the most part of
> % system
> % % is built with -O2 -march=i386
> % % -mcpu=i686?
> %
> % If I was distributing a program that
> % could benefit from optimization, I'd
> % make ./configure set the best values
> for
> % each system.
> %
> % (AS A USER) I never change ./configure defaults.
>
>
> To compile the programs, i use the
> building frame of slackware, and it
> contains mentioned cflags set to
> i386/i686. As i watched the compilation
> process dump on screen, configure takes
> them as defaults (may be i am wrong?)

I don't know. Probably yes, it overrides the defaults to better values.

It's apparent from this discussion that maybe gcc should provide a -mselect-best-arch-for-current-system option, where it would check the CPU of the machine and set the best optimization flags for it.

Then everybody would be happiest.

24 Feb 2003 14:52 dmaas

Re: Nice Article

> This isn't a bug. C variables aren't
> guaranteed to contain any particular
> value (this may be different in C99? I'm
> not sure, but I doubt it). Not having to
> make sure it's initialized to zero saves
> time; ergo, an optimization. :)

global and static variables ARE null (or zero) by default.

int foo;
void func()
{
static int bar;
int baz;
}

"foo" and "bar" are guaranteed to be initialized to zero. "baz" is not, it'll be undefined.

("initialized to zero" just means the compiler/linker puts them in the executable's "bss" segment, which is mapped with zeros before execution begins. This is actually more efficient than explicitly initializing with "int foo = 0;" since they won't take up space in the binary on disk, as "int foo = 0" will!)

25 Feb 2003 11:22 bogado

Re: utopia

>
> % int f (int n)
> % {
> % if (n==1) return 1;
> % return n * f(n-1);
> % }
> %
> % A super intelligent compiler would
> % replace f(3) to the value 6.
>
>
> Well, you could wind up getting a
> compiler that goes on a vain attempt at
> solving the halting problem if you tried
> to do this. What would your compiler do
> if it got f(-1), an expression that
> never halts? It wouldn't be able to
> figure out that it's doing infinite
> recursion (if you found an algorithm to
> determine this in the fullest
> generality, you would have solved the
> halting problem, and proved Alan Turing
> and Kurt Goedel wrong).
>
> Finding the fastest code to perform a
> certain task is clearly an undecidable
> problem (this is worse than NP-complete,
> as it can be mathematically shown that
> an algorithm simply doesn't exist). The
> best I think that can be accomplished is
> a few heuristics that conform to some
> simple facts we know about how to speed
> code up.

This algorithm always stops even if you ask for f(-1); you must remember that the C int is not an infinite set like the mathematical Z. When the counter gets to the value INT_MIN, it wraps around to INT_MAX and goes down from there until it gets to 1. The simple code below will show this:

---
#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("%d %d %d %d\n", INT_MIN - 1, INT_MIN, INT_MAX, INT_MAX + 1);
}
---

Linux Red Hat 8.0 on an Intel Pentium 4 CPU shows:

2147483647 -2147483648 2147483647 -2147483648

25 Feb 2003 17:02 themeld

Re: utopia

> The fibo example proves that even
> propagation techiques are not always
> possible (what if the entire program is
> about find_prime_no (10^10)? This is a
> constant that could be computed at
> compile time, but unfortunatelly it will
> take several years for the compiler to
> compute it)

This is trivially dealt with. Any system to do recursive constant propagation like this generally has (implicitly or explicitly) a limit to the depth of evaluation. Explicit can be in the sense of rules like "only do recursive function constant evaluation up to a depth N". Implicit can be in the sense of the compiler using recursive functions to do the expansion and stack overflowing itself when it gets too deep. The latter of course would trigger a bug report by someone and probably get translated to the former.

The one (unlikely?) possibility is if the compiler's evaluation technique uses tail recursion or some other method that won't trigger a stack overflow, in which case it will, as you said, run for several years. Personally I would file that behavior as either a bug that can be fixed with the explicit recursion limit, or under the general heading of GIGO (Garbage In ...)

25 Feb 2003 19:28 didosevilla

Re: utopia

> This algorithm always stops even if you
> ask f(-1), you must remember that the C
> int is not an infinit set as the math Z.
> when the counter gets to the value
> MININT it will turn into MAXINT and goes
> down from there tilll it gets in 1. The
> simple code below will show this :

Aye, but you still get wrong code. :P But I'm not just talking about the factorial function, of course. Let's generalize the situation. What if you had a function which for certain unspecified (and probably unknown) inputs goes into an infinite loop through some complex contortions? A compiler presented with code with an application of this function to one of those unspecified constants that induces infinite loops would also loop forever attempting to fold the constant. Remember that there's no algorithm capable of finding infinite loops in their fullest generality (the halting problem again).

Also, for a pure functional language (with no side effects, static binding, and so forth), such a strategy of folding constants from function applications might be feasible (but see above for caveats), but for an imperative language which depends on side effects, the algorithm explodes in complexity. If the value of a function happens to depend on side effects outside of its scope, what do you do? Throw away all of the work you made before figuring out that you have to run the program in its entirety? Jeez, if you're going to all this trouble, don't bother compiling your program. Write an interpreter. :)

26 Feb 2003 03:25 bogado

Re: utopia

>
> Aye, but you still get wrong code. :P
> But I'm not just talking about the
> factorial function, of course. Let's
> generalize the situation. What if you
> had a function which for certain
> unspecified (and probably unknown)
> inputs goes into an infinite loop
> through some complex contortions? A
> compiler presented with code with an
> application of this function to one of
> those unspecified constants that induces
> infinite loops would also loop forever
> attempting to fold the constant.
> Remember that there's no algorithm
> capable of finding infinite loops in
> their fullest generality (the halting
> problem again).
>
> Also, for a pure functional language
> (with no side effects, static binding,
> and so forth), such a strategy of
> folding constants from function
> applications might be feasible (but see
> above for caveats), but for an
> imperative language which depends on
> side effects, the algorithm explodes in
> complexity. If the value of a function
> happens to depend on side effects
> outside of its scope, what do you do?
> Throw away all of the work you made
> before figuring out that you have to run
> the program in its entirety? Jeez, if
> you're going to all this trouble, don't
> bother compiling your program. Write an
> interpreter. :)

You are completely right; I was just pointing out a fact that some people don't see. Many problems in programs, including security-related ones, come from unexpected side effects like this one. Or maybe I was simply being a tight a**. :-)

My suggestion is simple: why not create a modifier for functions that would hint
to the compiler that the function depends only on its parameters and nothing else. The compiler would then be able to mark calls to these functions that have static parameters, and then, when the linker sees those calls, it would simply call the
function and replace the call entirely with the result.

This could open up a lot of security problems (think about it: a compiler calling compiled code), and I'm not sure it should ever be implemented.

26 Feb 2003 15:20 unknownlamer

Re: Three thoughts

> Guile is a Scheme _Interpreter_.
> Interpreters are
> much more simple to write and maintain
> than
> compiler front-ends, especially for
> symbolic
> high-level languages such as Scheme,
> Perl,
> Python, etc. There are Scheme compilers
> out
> there, but I don't think the GNU people
> wish to
> pursue this direction because guile is
> not intended
> to run very quickly as it is. Not more
> than perl or
> python, in any case.
>
> Same goes for Elisp. Better keep the
> code clean
> than start hacking on a useless GCC
> front-end. I
> don't know too much about Smalltalk.

Actually, Guile is eventually (hopefully soon) going to compile to bytecode and probably to machine code using a GCC frontend. Guile does need to run fast because Emacs will eventually be ported to run on Guile instead of using Elisp (There will be an Elisp translator so you will be able to use either one).

28 Feb 2003 16:19 gurensan

Re: thin article, butt presuming I am convinced.

> How? how to install a new gcc version?
> there appear to be many dependendancies
> that run circular and are not happy with
> a new(er) version of gcc. the "how to"
> url is dated 3 years and is thin with
> respect to the "install section".
> (covert request for assistance...)
>
> gcc in .rpm &/or src.rpm wont install
> and complain about
> libc.so.c & libstdc++ & glibc & none of
> them are happy.
>
> sure, it is possible to upgrade the OS,
> but how about gcc?
>
> usually failed dependancies are easily
> resolved but not so with gcc.
>
> How about a gcc how to upgrade article
> addressing failed
> dependencies... -clueless seeking
> clues...
>
> The good news is finding out more about
> gcc as it fails to obey....
> ( smile )
>

You are referring to rpms, which are not the domain of
the GCC maintainers. See whoever made your distro
about this. The GCC people are only responsible for
making sure the thing compiles on the test systems,
and only then in a full release. If you try beta (or
alpha) code, you're asking for it.

03 Mar 2003 04:27 ajsoft

Re: a sad observation (flame on?)

> Your mileage may vary, but...
>
> My experience with GCC 3.2 is that, for
> a fairly well-written (read: optimized,
> not neat) integer/mem code compiled with
> some sensible optimization options, the
> code generated that is slightly slower
> and larger compared to GCC 2.96 output.
>
> I haven't had a chance to look at the
> output too carefully, but I've noticed a
> number of examples (a filesystem driver,
> a memory allocator and garbage
> collector, etc). Some code is,
> admittably, smaller and faster, but this
> does not justify the impact on other
> fronts.
>
> I'm glad there's some new stuff that
> would sure benefit modern architectures,
> 3D graphics and floating point
> conversions, but my impression that
> there is a slight decline in the quality
> of produced code, and THIS comes at a
> price of slower compilation. Humm.
>
>

My experience and viewpoint are quite the opposite.

I have seen gcc 3.2.1 outperform gcc 2.95.3 on
video compression applications by 3-5%, on an
Athlon running Linux and also on Cygwin. Pretty
impressive. (gcc 2.96 is buggy btw, beware -- pull
down mplayer if you doubt this.)

(If anyone can give some performance numbers on
commercial compilers (esp. Intel's) vs gcc on Linux/x86
systems, please post!)

I'm quite willing to trade compile time for improved
application performance, and with the speed and cost of
modern machines, I can't imagine who wouldn't.

Looking forward to gcc3-built Linux distributions.

-- ajs

04 Mar 2003 00:39 bstarynk

quick compilation (using tinycc)

Sometimes a very small compilation time matters (even at the expense of a bit slower execution time). For that, please consider using TinyCC, an opensource C99 compiler for Linux/x86. See tinycc.org (www.tinycc.org/) for details.

TinyCC compiles C code 5-10 times faster than GCC, but the resulting generated code runs about 30% slower.

Using TinyCC might be interesting in applications which generate C code and dynamically load it. Alternatively, dynamic code-generating programs (or metaprograms) might consider using the GNU lightning (www.gnu.org/software/l...) library (and my Qish (freshmeat.net/projects...) runtime & GC might help too).

04 Mar 2003 06:27 sibn

Re: a sad observation (flame on?)

> I have seen gcc 3.2.1 outperform gcc
> 2.95.3 on
> video compression applications by 3-5%,
> on an
> Athlon running Linux and also on Cygwin.
> Pretty
> impressive. (gcc 2.96 is buggy btw,
> beware -- pull
> down mplayer if you doubt this.)
>

This is getting awfully tired. gcc 2.96 was buggy in its original tarball, to be sure. It had somewhere in the region of 350 patches issued that corrected this problem. Mplayer had bad code in it, which prevented it from compiling with 2.96-300(ish), and this was never formally admitted by the mplayer developers.

They silently fixed the problem, and their spin stuck: people still think to this day that 2.96 was a bad compiler (and like I said, 2.96-0 WAS, but we don't live in that era any more). This is the same type of fudmongering often seen by diehard Windows fans who say they tried Linux but it was too complex, and immature.

For them, it may have been.... 8 years ago when they tried it. They continue to refer back to this experience though, as if it reflects any measure of reality. They labor under the delusion that GNU never evolves, and that the state of the system is the same as it was 8 years ago.

This is so obviously false it is becoming difficult to find linux users who believe that it's a difficult operating system to install and use.

The only people who truly believe this are the ones who haven't touched it in 5 years-- just like the people who tried gcc 2.96, or read the bad press it got when it was new.

At the time gcc 3 was released, 2.96 was better. gcc 3.1 may well be better than 2.96, but at the time that gcc 3.0 was released, gcc 2.96 was more mature, better tested, had wider deployment, and was more reliable. It was also the most standards-compliant gcc to date.

05 Mar 2003 09:15 jcduque

GCC myths
You may check out the busybox Makefile. busybox uses:

-Os -march=i386 -fomit-frame-pointer \
-Wall -mpreferred-stack-boundary=2 \
-malign-functions=0 -malign-jumps=0 \
-Wshadow

although the -W options do not really optimize your code.

07 Mar 2003 22:10 r6144

Re: Frame Pointer
But do note that code size increases quite a bit (because stack references via ESP are one byte larger) when omitting the frame pointer. If you are compiling something small and computationally-intensive like gzip, this may help. But if you are compiling something big and not that computationally-intensive, such as the kernel or mozilla, it is often better to preserve the frame pointer and use -Os to reduce code size even more. You may even gain speed because the code fits more nicely in the cache.

16 Mar 2003 09:01 olsner

Re: some points still missing

> Incremental compiling is protected under
> US patent 5,586,328. Open-Source
> implementations will be illegal until
> 2017.

"Open-Source implementations available for distribution in the US ...", you mean?

22 Mar 2003 14:17 firecode

Re: utopia

> Well, you could wind up getting a
> compiler that goes on a vain attempt at
> solving the halting problem if you tried
> to do this. What would your compiler do
> if it got f(-1), an expression that
> never halts? It wouldn't be able to
> figure out that it's doing infinite
> recursion (if you found an algorithm to
> determine this in the fullest
> generality, you would have solved the
> halting problem, and proved Alan Turing
> and Kurt Goedel wrong).
>
> Finding the fastest code to perform a
> certain task is clearly an undecidable
> problem (this is worse than NP-complete,
> as it can be mathematically shown that
> an algorithm simply doesn't exist). The
> best I think that can be accomplished is
> a few heuristics that conform to some
> simple facts we know about how to speed
> code up.

You are correct but only in theory. However, in practice NP-completeness and Turing's results have very little meaning (IMHO). One can usually come up with algorithms where probability of failure can be made to be small enough.

For example many pattern recognition (PR) problems are NP-complete and/or mathematically ill-conditioned (I think), but one can usually make probability of failure small enough.

If one has more a priori problem specific information about the specific problem than in many PR problems then it's possible to have guaranteed bounds for error.
For example in one practical case: c*10^-n probability for failure and computational requirements: O(n). Now take n = 1000. (c

22 Mar 2003 18:03 bkaindl

Re: some points still missing
You can do this by:
- compiling with -g
- copying the binary: cp -pv progname progname.debug
- stripping the debug information from progname
- installing progname and keeping progname.debug.

Then you have the stripped binary and the debug binary,
and if you need to debug or analyze a core file, just use
the debug binary, for example with gdb:

gdb progname.debug core
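
Spelled out as commands (progname and main.c are just placeholders), the whole sequence might look like this:

gcc -g -o progname main.c
cp -pv progname progname.debug
strip progname
# install the stripped progname; keep progname.debug somewhere safe
gdb progname.debug core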

26 Mar 2003 07:01 pixelbeat

auto select gcc options
I've written a script that picks the optimal gcc options
for x86 hardware. It only works on Linux, but
this combination handles a significant percentage
of gcc users, so here you go (www.pixelbeat.org/scri...)

30 Mar 2003 04:30 anubi

Think about when optimizing and what
First of all, most people don't even know what optimizing means. Otherwise you can't explain why many (ahem) programmers use things like VBasic. OK, this applies only to Win stuff, but still, that is a lot of people.
And in general, we should think about what optimizing means. For example, if you do scientific calculations or are writing some 3D game, it can be critical... but if you are just writing a chat program or a mail client, speed is not so important (it just has to be decent). The GUI slows things down, because users tend to be slower than the CPU... and if you've got a 56k modem... OK, you can have the best code, but you can't get past that limit...
So... let's think about it, maybe it is obvious...
From the other side, I get Eclipse and it takes 45 seconds on my Slackware 8.1 Linux box (that means a quick OS) with a 1700MHz CPU ... it is not really acceptable...
Think about it....

10 Apr 2003 01:22 Avatar pabs3

Re: some points still missing

> "Open-Source implementations available
> for distribution in the US ...", you
> mean?

No he means "non-licenced implementations existing in the USA"

18 Apr 2003 04:56 mnenov

Re: Think about when optimizing and what
> from the other side I get Eclipse and
> it takes 45 seconds on my slackware 8.1
> linux box (that means a quick OS) with
> 1700MHz CPU ... it is not really
> accettable...
> Think about it....

And what about running OpenOffice?! I think it is a lot of work to make it start and run so slowly!!! Maybe some optimization is needed, but I do not believe that the compiler can help in this case!

30 Apr 2003 01:28 bash99

gcc 3.2 is much improved, only lags a little behind icc
I just did an informal test on a P4 Xeon 1.8GHz with gcc 3.2 (Red Hat 8.0 default) and icc 7.0.

I compiled openssl 0.96b.

icc flags:
icc -O3 -ip -tpp7 -xW
gcc flags:
gcc -O3 -march=pentium4 -pipe -fomit-frame-pointer

In general, both are much faster than gcc 2.96, and icc is about 1% faster than gcc 3.2, but in some cases gcc 3.2 is faster than icc.

I haven't tested the profile feedback optimization features yet.

03 May 2003 08:49 DavidTC

Re: utopia

> My sugestion is simple, why not create a
> modifier for functions that would hint
> the compiler that this function depends
> only on it's parameters and nothing
> else. The compiler would then be able to
> mark call's to this functions that have
> static parameters and then when the
> liker see those calls it would simply
> call the
> function and replace the call entirely
> with the result.
>
>
>

gcc already has this. Run 'info gcc' and search for 'pure', which means the function does nothing except return a result based on global variables and the passed parameters. gcc 3 has an even stricter one called 'const', which means it doesn't even read global variables.

While I doubt gcc actually does this, it would be perfectly legal to just evaluate a function marked 'const' at *compile* time and substitute in its value.
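
A minimal sketch of what those attributes look like in C (the function names are just for illustration):

/* 'pure': no side effects; the result may depend on the arguments
   and on global variables the function reads. */
int count_nonzero(const int *v, int n) __attribute__((pure));

/* 'const': stricter -- the result depends only on the arguments,
   with no reads of global memory at all. */
int square(int x) __attribute__((const));

int square(int x)
{
    return x * x;
}

int count_nonzero(const int *v, int n)
{
    int i, c = 0;
    for (i = 0; i < n; i++)
        if (v[i])
            c++;
    return c;
}

/* With these hints, GCC is allowed to merge repeated calls such as
   square(7) + square(7) into a single call, since the second call
   cannot produce a different result. */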

03 May 2003 08:55 DavidTC

Re: some points still missing

> You can do this by:
> - Compile with -g
> - copying the binary: cp -pv progname
> progname.debug
>
> - striping the debug information from
> progname
> - install progname and progname.debug(or
> keep it).
>
> Then you have the stripped binary and
> the debug binary
> and if you need to debug or analyze a
> core file, just use
> the debug binary for example with gdb to
> analyze the core file:
>
> gdb progname.debug core

Or you can just ship stripped *known* binaries, and then ask users (who don't compile themselves) to email the core to you, where you have the unstripped binary.

If they compile themselves, of course, it should probably default to debug builds.

17 May 2003 21:51 hodeleri

Re: a sad observation (flame on?)

>
> > (gcc 2.96 is buggy btw, beware -- pull
> > down mplayer if you doubt this.)
> >
>
> This is getting awfully tired. gcc 2.96
> was buggy in its original tarball, to be
> sure.

I don't see a GCC 2.96 tarball on gcc.gnu.org. It was never a release, so of course it started out buggy!

04 Aug 2003 23:36 tcfelker

Re: Think about when optimizing and what

> And what about running OpenOffice ?! I
> think it is a lot of work to make it
> start and run so slow !!! May me some
> optimization is needed, but I do not
> believe that the compiler can help in
> this case!
>

I haven't tried this yet (I intend to), but with huge apps like OpenOffice, you'd probably get more benefit from -Os, which optimizes for size. Most of loading OpenOffice is loading its huge binary and its custom GUI code.

15 Aug 2003 10:44 traal

Re: What about -Os?

> Since most software isn't cpu-bound, and
> since memory and disk are also limited
> resources, why not try -Os?

Well, perhaps because regardless of what the documentation says, -Os and -O2 do exactly the same thing? ;-)

(At least on gcc 3.2.3 and gcc 2.95.3; I don't have any other versions around I could try.)

02 Nov 2003 06:44 midiclub

Re: GCC myths

> -Os -march=i386 -fomit-frame-pointer \
> -Wall -mpreferred-stack-boundary=2 \

I don't like those options. First, the best generic optimisation target for modern architectures is a Pentium, and you can really assume everyone has at least an old Pentium. Trying to optimise code (by hand or by genetic mutation) for Pentium 2 and 3, as well as for Athlon, almost never shows more than a 2% speed increase, provided the Pentium code was really optimal, and taking into account that GCC doesn't effectively use MMX and SSE except for special vector intrinsics, which are very rarely used in code.

Also, beginning with the Pentium there are large penalties for an alignment of 2; at least an alignment of 4 has to be used. For values larger than 4 bytes (like doubles), the Pentium 2 has even larger optimal alignments.

So, there is a set of options that fairly satisfies everything from an old Pentium onwards, but the one you quote is not it at all.

What I'd like to know is: is there any way to specify one platform setting for one piece of source and another for the rest? Otherwise I'd need to write a custom tool. For really dense performance-oriented code, multiple versions should be compiled and selected at run-time.
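
One way to get that last part without a custom tool is to put each variant in its own source file, compile each file with its own -march flag, and pick between them through a function pointer at startup. A rough C sketch, where filter_generic(), filter_sse() and cpu_has_sse() are hypothetical names:

/* In practice filter_generic() and filter_sse() would live in
   separate source files, each built with its own -march flag
   (e.g. -march=i386 vs. -march=pentium3); the CPU probe is
   stubbed out here. */
#include <stddef.h>

static void filter_generic(float *dst, const float *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] * 0.5f;
}

static void filter_sse(float *dst, const float *src, size_t n)
{
    /* Same algorithm; the speedup would come from the flags used
       to build this translation unit. */
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] * 0.5f;
}

/* Stubbed CPU probe; a real one would check CPUID or /proc/cpuinfo. */
static int cpu_has_sse(void)
{
    return 0;
}

/* Chosen once at startup, then always called through the pointer. */
static void (*filter)(float *, const float *, size_t) = filter_generic;

void select_filter(void)
{
    if (cpu_has_sse())
        filter = filter_sse;
}

void run_filter(float *dst, const float *src, size_t n)
{
    filter(dst, src, n);
}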

-eye/midilcub

02 Nov 2003 06:50 midiclub

Re: Think about when optimizing and what

> Most of
> loading OpenOffice is loading it's huge
> binary and it's custom GUI code.

You'd be surprised how tiny the GUI code probably is. Just look at FLTK.

The key concept for OpenOffice is modularization: it should simply not load parts which are not immediately required.

However, you can help it by roping, which is traditional on Suns. There was a tool for Linux which disassembles the files, builds call graphs, and then assembles the code back again; the resulting code loads and even runs up to 30% faster! In the Windows world, only DigitalMars C++ and VC++ 7.1 are able to do this, as far as I'm aware.

-eye

02 Nov 2003 06:53 midiclub

Re: quick compilation (using tinycc)

> Using TinyCC might be interesting in
> applications which generates C code and
> dynamically loads it. Alternatively,
> dynamic code generating programs (or
> metaprograms) might consider using the
> GNU lightning library (and my Qish
> runtime & GC might help too).

So is Tick C, which generates code that optimises itself at run-time. It makes sense when some parameters are fixed at execution time and hold for a while. It has a crappy optimiser, but can nonetheless show speeds a multiple of GCC's.

-eye

07 May 2004 05:01 qerub

Re: Think about when optimizing and what
prelink?

17 May 2004 13:08 d_weasel

Re: Think about when optimizing and what

> First of all most people don't even know
> what means optimizing. Otherwise you
> can't explain why many (ahem)
> programmers use things like VBasic.

Second of all you don't even know what means engrish!
If you are going to take the time to bash something at least take the to properly formulate your sentences. This way your opponents don't have flame fodder sitting all over the place!

Ack!! What a horrible blanket statement. There are tons of successful projects developed in Visual Basic. It a quite stable environement to develop and debug from. There is very little you can't do with VB, and the longer I use it the more I realize that you can do basically anything with it, given the right skills. Since you can call just about any API call directly from VB its is just as efficient as other languages. The only overhead you might have is a few K of the VB runtime environment, that it likely already installed and being actively used on your Window machine. It certainly not the best tool for every job, but no language is. Its also far better than alot of other languages/IDEs out there.

17 May 2004 13:10 d_weasel

Re: Think about when optimizing and what

>
> % First of all most people don't even
> know
> % what means optimizing. Otherwise you
> % can't explain why many (ahem)
> % programmers use things like VBasic.
>
>
> Second of all you don't even know what
> means engrish!
> If you are going to take the time to
> bash something at least take the to
> properly formulate your sentences. This
> way your opponents don't have flame
> fodder sitting all over the place!
>
> Ack!! What a horrible blanket statement.
> There are tons of successful projects
> developed in Visual Basic. It a quite
> stable environement to develop and debug
> from. There is very little you can't do
> with VB, and the longer I use it the
> more I realize that you can do basically
> anything with it, given the right
> skills. Since you can call just about
> any API call directly from VB its is
> just as efficient as other languages.
> The only overhead you might have is a
> few K of the VB runtime environment,
> that it likely already installed and
> being actively used on your Window
> machine. It certainly not the best tool
> for every job, but no language is. Its
> also far better than alot of other
> languages/IDEs out there.

Ack I just stuck my foot in my mouth, by improperly 'formulating' a sentence about properly formulating sentences....hahah

*feels like a fool*
close enough!

02 Oct 2004 04:31 tomfm

Re: some points still missing

> Incremental compiling is protected under
> US patent 5,586,328. Open-Source
> implementations will be illegal until
> 2017.

Only if the patent holds up in court, which seems unlikely, given that the technique is many years older than that.

04 Dec 2004 23:41 oliverthered

Re: Think about when optimizing and what

> For example if you do
> scientic calculations, you are writing
> some 3d game it can be critical... but
> if you are just writing a chat program
> or a mail client, speed is not so
> important

What if the person running the scientific calculations is also talking to someone else over the internet using a chat client?

1: All applications should be optimized.

2: If you're running a scientific application (or povray) or anything else that needs high throughput or a lot of CPU time, profile the code and recompile using the profile; this will help more than -O? (see the commands after this comment).

3: Optimization is a tradeoff with features, stability, and release time. Sometimes you want a fast turnaround with a simple toolkit even if it runs twice as slow; that's kinda what you get with Basic/VB. What you don't get is software that you'd want to use in a server or critical environment, and that's the tradeoff a lot of people took but regretted.

Personally, I'd have gone with Delphi for fast-turnaround RAD in the days of VB.
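
To make point 2 concrete, a classic profile-feedback build with GCC 3.x-era flags looks roughly like this (myprog.c is just a placeholder; newer releases also offer -fprofile-generate/-fprofile-use, and the 4.1.1 manual excerpt further down documents -fprofile-use):

gcc -O2 -fprofile-arcs -o myprog myprog.c
./myprog        # run on a representative workload to collect profile data
gcc -O2 -fbranch-probabilities -o myprog myprog.c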

16 Sep 2006 16:27 hzmonte

Optimization - does O3 always generate faster code than O2?
Is it possible that code generated using the -O2 option runs faster than code generated using -O3, for example? Is it possible that an optimization technique used by -O3 is counter-productive for a particular algorithm?

And is there a more detailed explanation (preferably with examples) of each optimization technique used by gcc than what is in the GCC manual?

The article says: "In the GCC manpage, it's clearly written that: -O2 turns on all optional optimizations except for loop unrolling [...]" (In the 4.1.1 manual, the exact wording is: "The compiler does not perform loop unrolling or function inlining when you specify -O2.", which is even more confusing.) True, but -O2 turns on all flags that are turned on by -O. And -O turns on -floop-optimize, which "optionally" does loop unrolling. Therefore, I guess the conclusion (at least for gcc 3) is

1. -O2 does not mandate loop unrolling;

2. with -O or -O2, loop unrolling may or may not be turned on.

However, based on the 4.1.1 wording, there is simply no loop unrolling under -O2, period. It somehow implies that if there is any loop unrolling optionally turned on by -O, -O2 would disable it. That is strange.

And how does -floop-optimize2 work?

GCC 4.1.1 manual :

-fprofile-use

Enable profile feedback directed optimizations, and optimizations generally profitable only with profile feedback available.

The following options are enabled: -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops, -ftracer.

-funroll-loops

Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies -frerun-cse-after-loop. It also turns on complete loop peeling (i.e. complete removal of loops with small constant number of iterations). This option makes code larger, and may or may not make it run faster.

Enabled with -fprofile-use.

-floop-optimize

Perform loop optimizations: move constant expressions out of loops, simplify exit test conditions and optionally do strength-reduction and loop unrolling as well.

Enabled at levels -O, -O2, -O3, -Os.

-floop-optimize2

Perform loop optimizations using the new loop optimizer. The optimizations (loop unrolling, peeling and unswitching, loop invariant motion) are enabled by separate flags.

That is, -O turns on -floop-optimize, which optionally does loop unrolling. On the other hand, -fprofile-use enables -funroll-loops. And none of the -Ox flags turns on -fprofile-use. Also, none of the -Ox flags turns on -floop-optimize2. And it appears that the manual says that once -floop-optimize2 is turned on, loop unrolling is enabled by a separate flag, presumably -funroll-loops, and implies that -floop-optimize2 would "disable" -floop-optimize because -floop-optimize2 would force the loop optimization techniques to be individually turned on. It follows that if I do this:

gcc -O -fprofile-use myprog.c or

gcc -O -floop-optimize2 myprog.c

No loop optimization is performed, because any loop optimization that would otherwise be turned on by -O is turned off by -floop-optimize2. If I want to do loop unrolling using the so-called "new loop optimizer", and also benefit from the other optimizations (except loop optimization) offered by -O, then I need to do this:

gcc -O -floop-optimize2 -funroll-loops myprog.c

This would do:

1. the optimizations (except loop optimization) offered by -O

2. loop unrolling offered by the "new loop optimizer"

but would not do any other loop optimization.

Is my understanding correct?

16 Sep 2006 17:02 hzmonte

Re: Optimization - does O3 always generate faster code than O2?
%Therefore, I guess the conclusion (at least for gcc 3) is
%1. -O2 does not mandate loop unrolling;
%2. with -O or -O2, loop unrolling may or may not be turned on.

To clarify, what gcc 3 means may be:
-O2 does not perform loop unrolling unless it is already performed by -O. Therefore, loop unrolling may or may not be performed under -O or -O2.
And it seems -O3 does not turn on loop unrolling either (unless it is already performed by -O).
Is my understanding correct?

With the wording in the 4.1.1 Manual, I have no clue what it means. In particular, it says "The compiler does not perform loop unrolling or function inlining when you specify -O2." It does not say "-O2 does not perform loop unrolling"; it says "the compiler does not perform loop unrolling". So it seems -O2 will turn off any loop unrolling that is enabled by -O!
