in Compilers

x86: More code – less code

Sometimes more code in a source language, can mean less code in a generated language.

NOTE: below we used gcc 5.3 with -std=c++14 -O3 flags.

On the example of C programming language, and its translation to x64 assembly, let’s take a look at the following example:

// 7 times `x`.
int foo(int x) {
  return x + x + x + x + x + x + x;
}

int main() {
  return foo(1);
}

A generated code (applying an optimizing compiler) would look something like this:

foo(int):
        leal    (%rdi,%rdi,4), %eax
        leal    (%rax,%rdi,2), %eax
        ret
main:
        movl    $7, %eax
        ret

While if we add even more code in the source language, say 8 times for x:

// 8 times `x`.
int foo(int x) {
  return x + x + x + x + x + x + x + x;
}

int main() {
  return foo(1);
}

We can get less generated code, taking just one instruction for the actual calculation:

foo(int):
        leal    0(,%rdi,8), %eax
        ret
main:
        movl    $8, %eax
        ret

How this happens and why, is related to the format of the effective address calculation arithmetics. And in fact, the lea instruction semantically is more related to exactly address manipulations (like array elements access, fields of objects, etc).

However, compilers often reuse lea to do simple arithmetic evaluations, although only those that fit the format of the lea (we covered such optimizations in this article). You can also get additional info how this generic memory access format looks like here.

In case when the capabilities of lea are exceeded, compiler should fall-back to e.g. add instruction since the scale can only be 1, 2, 4 or 8. Having the code for 6 times x, we again get two instructions, but now already with less optimal addl instruction:

// 6 times `x`.
int foo(int x) {
  return x + x + x + x + x + x;
}

int main() {
    return foo(1);
}

In assembly they are leal and addl:

foo(int):
        leal    (%rdi,%rdi,4), %eax
        addl    %edi, %eax
        ret
main:
        movl    $6, %eax
        ret

In any case though, an optimizing compiler evaluated the whole resulting expression at static time, and the result from main function is a direct constant, instead of a runtime function call. Also notice, the optimizing compiler of course doesn’t do a bunch of add instructions (6, 7 or 8 times), but instead as we said uses an “unrelated” lea instruction for this.

NOTE: try compiling without full optimization, i.e. excluding -O3 flag.

Have fun with compilation 😉

P.S.: And of course depends on a compiler version, different translation strategies can be chosen. E.g. that example with 7 times x instead of two leas can still be compiled with one, though less optimal instruction — imul (actual multiplication). This is what we get in gcc 6:

foo(int):
        imull   $7, %edi, %eax
        ret
main:
        movl    $7, %eax
        ret

Write a Comment

Comment