Tips from the trenches: tweak your way to improved C compiler output
I'm never satisfied with the status quo. Every solution is a compromise shaped by
available facts, costs, skills, and time. With this mindset, why would I assume that a C
compiler output gives the best results for my application? Loaded down by years of
squeezing bits into unimaginably small memory spaces and not wholly trusting the
infallibility of technology, I've come to appreciate C compilers, but to get great
performance from them takes some knowledge on our end, too. Today's compilers are
marvelous and even come with optimizers to tweak functions within the confines set by
compiler designers. You can manually select or trade off execution speed vs memory size,
how much optimization to apply, and even how to most efficiently use memory and registers.
My problem isn't necessarily with compilers but with our expectations of them. We
expect a compiler to do a fantastic job without it having an understanding of our
intentions, application or logic design. Is this problem really any different from using a
spell checker? A properly spelled word used in the wrong context passes the spell checker
but still reads as an error to the person looking at the text.
A compiler can't possibly know the nuances of my application. Thus I can either accept
its limitations and hope the processor has enough memory and speed for my code both today
and in the future, or I can help the compiler by adopting coding styles that move towards
the desired performance. Now, many readers are rightly asking, "Why devote time to
tuning C code when time is money?" However, they're not considering the costs of
running out of memory or missing performance targets. Thus, first check
out the following suggestions before bombarding me with critical e-mail. I've accumulated
these techniques, which are independent of processor, compiler or application, from my
years writing C code for embedded processors.
- Look at the assembly language The compiler's assembly listing is
your only way to peek under the hood to see how well it translated your intentions. Most
software people, though, ignore this selectable output due to deadline pressures,
inexperience with assembly language, or simply because it never occurred to them. An
assembly listing serves as a valuable tool when optimizing performance.
Mind you, I'm not advocating that designers tweak assembly code. Instead, use it to
check how the compiler translated your C code into machine instructions. A simple review,
which doesn't take long, might offer insight into how wasteful certain coding constructs can be.
Note that I don't immediately check assembly output every time I compile a module.
Doing so would waste time and money. Instead, first get a module or function working and
then check the assembly output. This offers a way to take advantage of a high-level
language's speed with a simple QA check.
What are some of the things to look for? In general, search for assembly constructs
that defy the simplicity of C code. For example, repeatedly assigning the same
floating-point constant to different variables is less memory efficient than using a
global constant without much of a speed hit.
Another example is the Switch/Case statement. With some compilers, that statement's
overhead is horrendous. See if it makes sense to recode a small Switch/Case into a
conditional statement. On a related note, some compilers allow you to force the Switch
argument into a more "natural" representation for the processor, thereby saving
code. The bottom line is that without a look at the assembly-language output you can't
catch these things.
- Bit fields Coming from an assembly-language background, I find
it second nature to manipulate bits. So when I found bit-field operations in the C
language I instantly started using them. With some processors, those with
bit-manipulation opcodes, the generated code is concise and straightforward. In
contrast, consider a processor that doesn't implement bit-manipulation instructions such
as the 80C188. Every time you set, clear or check a bit, the compiler must
generate shifts, ANDs and maybe ORs. A simple logic statement to set a bit becomes
multiple instructions (the number dependent on bit position) that chew up memory and waste
processor cycles. Multiply this loss by the number of times you use a particular bit, and
memory starts evaporating.
Today, I avoid using C-language bit fields because of the memory and speed penalties.
Instead, I map bits into a word and do comparisons on the entire word, which might be a
char, int or long, depending on overall system needs. This approach might not be as
intuitive as treating a bit field as a variable, but good comments and variable names
minimize this downside.
- Char vs Int I once thought that to conserve memory resources all
I had to do was pick the smallest memory type specifier. Well, it's really never that
simple. Again, using the 16-bit 80C188 as an example, specifying a char requires the
compiler to load the char into the LSB of the 16-bit register and a 0 (or sign-extend the
LSB) into the MSB. This sequence occurs every time you use the variable, which invariably
leads to more ROM space and slower execution. In this case, an int is the preferred
natural word size. However, because the 8051 is an 8-bit CPU, its preferred word size is 8
bits. Using a 16-bit value, where an 8-bit would suffice, is again wasteful.
- Signed vs unsigned variables I couldn't believe that signed vs
unsigned would really change the compiler output, but the assembly listing convinced me.
This example occurred with a PIC processor. In a For loop, using an
unsigned loop variable took more opcodes than switching the loop variable to a signed
character. The lesson here is to never take anything for granted.
- Compound conditional statements For many reasons, programmers
should limit the nesting of compound conditional statements. Nesting should never go more
than three deep due to the sheer complexity and difficulty in testing all program
pathways. Further, compound conditional statements use lots of memory and create
unpredictable timings that might affect performance.
Try to limit these compound statements. However, sometimes they might seem mandatory.
In those situations you can probably find an alternative. Sometimes I turn the
conditional logic into a table lookup. Specifically, I evaluate each conditional clause and for a True condition set a
unique bit in a variable. Then I use the variable as an index into a table, which might
contain pointers to different functions, values or outcomes for the program to use. This
technique saves memory, gives a more predictable path through the program, allows easy
testing of all pathways and is extremely simple to explain during a code review.
- Dereferencing pointers I'm a big proponent of pointers in C.
Some people shy away from them, and others blast them for giving the programmer too much
flexibility and leeway. Yet they provide a solution to many programming dead ends. However,
one problem is having pointers to pointers all over. It's so easy to end up with a
statement such as ptr1->ptr2->value. It works perfectly fine, but every time you use
it the compiler dereferences ptr1 and ptr2, and each dereferencing can take a few
instructions. Multiply that amount by the number of times this construct might appear, and
you end up wasting both memory and execution time.
Instead use a temporary variable that removes the indirections. For example, assign
ptr3 = ptr1->ptr2. This method allows you to use ptr3->value, thereby saving one
level of dereferencing.
The savings add up
These suggestions might not seem like they add up to much, but in the long run they
make a big difference. For example, on a recent 68HC11 project, I found that the C library
functions ate up roughly 14k bytes of ROM, leaving me roughly 18k bytes for the
application software. Of course, I filled it up and then went through the assembly listing
to look for savings. By implementing some of the techniques discussed above, I uncovered
an additional 5k bytes of ROM space. This amount might not seem like much, but it made the
difference between not completing the project and finishing it with all required features.