Does the compiler optimize switch-case?


Question: Is there a way to get the compiler to sort the cases of a switch statement and search them binary-search style? I want to reduce the worst-case number of comparisons.

I'm building a user interface where an incoming command character from the user is compared against 20 recognized commands. When a comparison matches, the MCU knows what to do.

So if the user enters the first recognized command character on the comparison list, the first comparison matches. But if it's the last command character on the list, it takes 20 comparisons to get a match. And if the user enters something that isn't on the list, it also takes 20 comparisons to determine that. My code does this quite frequently.

It's easy for me to order the recognized command characters and build a binary-search-style tree structure out of them that reduces the worst-case number of comparisons. But that would make the code less readable, and it would make the list of recognized commands harder to edit.
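
Something like this sketch is what I have in mind (the command set and names here are just illustrative): keep the characters in one sorted table, so the list stays easy to edit, and let a small routine do the halving.

/* Sorted table of the recognized command characters. */
static const char commands[] = "ABDEGHKMNPQRSTUVWXYZ";

/* Binary search: returns the command's index (for a handler
   table), or -1 if c is not recognized. Worst case is about
   log2(20) ~ 5 probes instead of 20. */
static int find_command(char c)
{
    int lo = 0, hi = (int)sizeof commands - 2;  /* skip the NUL */

    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (commands[mid] == c)
            return mid;
        if (commands[mid] < c)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

But I was hoping the compiler could derive something like this from a plain switch on its own.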

Regards,

Borge


As far as I know, the compiler will handle the switch with either a series of comparisons or a jump table, depending on the number of cases and their values. I would guess it's extremely unlikely that the compiler could sort the cases and do a binary-chop search. Are 20 comparisons really going to slow things down that much? How difficult is it to keep your command table in alphabetical order?

Four legs good, two legs bad, three legs stable.


It's difficult-ish :-)

The thing is, I'm not only doing this for the user interface but also for messages arriving from a CD drive on the DSA serial bus. And I'd like to keep those hex codes grouped by functionality, using defines, rather than sorted by hex value.

Just thought it would be a nifty trick to do binary searches!

Regards,
Borge


A jump table will usually give the best performance. You can mitigate the inefficiency of a sequential compare by "sorting" based on command likelihood.

Analyze the emitted code to see which alternative the compiler has chosen. As John suggests, my experience is that for many situations the improvement is small relative to what follows when the command is executed.

C: i = "told you so";


The compiler will try to do the best it can, but what that is depends on what it is given. If the commands are consecutive integers and you list them in sequential order, the compiler will have a much easier time building a jump table. If the values are widely and erratically spaced, out of order, or some cases fall through into the next, the compiler will have a much harder time.
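
For instance, consecutive case values like these make an easy jump-table candidate (just a sketch; the handler names are made up):

void do_stop(void), do_play(void), do_pause(void), do_eject(void), do_unknown(void);

/* Dense, consecutive values 0..3: the compiler can index a jump
   table directly with cmd instead of comparing case by case. */
void run_command(unsigned char cmd)
{
    switch (cmd) {
    case 0:  do_stop();    break;
    case 1:  do_play();    break;
    case 2:  do_pause();   break;
    case 3:  do_eject();   break;
    default: do_unknown(); break;
    }
}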

Regards,
Steve A.

The Board helps those that help themselves.


I agree: either make a jump table, or, if it's a textual search, make a helper app that creates the binary search tree (similar in function to what Flex/Bison do, only simpler). So instead of writing a C file, you write a file in your own command syntax, which is then pre-processed into C code and compiled and linked with your project. This keeps it easy to read and edit, and gives you an optimized runtime result.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


If you really want, you could use your own jump table, e.g.,

goto jump_table[fred&31];

As others have noted, it's not likely to matter much.

As a wise man, Ken Thompson, once said, "When in doubt, use brute force."

Moderation in all things. -- ancient proverb


skeeve wrote:
If you really want, you could use your own jump table, e.g.,
goto jump_table[fred&31];

Sadly that won't work, because goto requires a label. Standard C does not offer an indirect goto.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


glitch wrote:
I agree: either make a jump table, or, if it's a textual search, make a helper app that creates the binary search tree (similar in function to what Flex/Bison do, only simpler). So instead of writing a C file, you write a file in your own command syntax, which is then pre-processed into C code and compiled and linked with your project. This keeps it easy to read and edit, and gives you an optimized runtime result.
I have a similar problem to the OP, except I currently have 253 commands to parse. Luckily the commands are broken into 3 or 4 words separated by an underscore. Bad news is that I now have about 136 words to recognize.

Once I have the recognized words they are assembled into a 4-byte "command" (1 byte for each word) and the command table is searched.

The words and commands are sorted into a "most likely first" order, but still it's ugly. Linear searches are, well, linear.

I have also been thinking about some sort of binary/b-tree/hash-linked-list sort of search. I looked at flex but didn't get far. My last use of (then) LEX is more than a decade in the past at this point.

If someone else has looked at this, I wouldn't mind a pointer. Or I may get to it when the noise of releasing new product dies down.

Stu

PS: Forgot to add -- I also need to parse parameters, but that's another story. :wink:

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!


You could perform some sort of simple hash on the command text that yields an 8- or 16-bit value, then use that value to index into a hash table to arrive at the appropriate command.
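
A minimal sketch of the idea (the hash choice and table size are illustrative, not a recommendation):

#include <stdint.h>

#define TABLE_SIZE 256u              /* 8-bit hash -> 256 slots */

/* djb2-style string hash, folded to 8 bits. */
static uint8_t hash_command(const char *s)
{
    uint16_t h = 5381;
    while (*s)
        h = (uint16_t)(h * 33u) + (uint8_t)*s++;
    return (uint8_t)(h & (TABLE_SIZE - 1u));
}

Since different commands can hash to the same slot, each slot should still store (or point to) the command text, so one final strcmp() confirms the match.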

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


glitch wrote:
skeeve wrote:
If you really want, you could use your own jump table, e.g.,
goto jump_table[fred&31];

Sadly that won't work, because goto requires a label. Standard C does not offer an indirect goto.

It's a gcc extension, but I got the syntax wrong:
goto *jump_table[fred&31];
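
Fleshed out, the labels-as-values extension looks something like this (a sketch; gcc-only, not standard C, and the labels are illustrative):

/* gcc extension: &&label takes a label's address, and goto *
   jumps to a computed address. */
void dispatch(unsigned char fred)
{
    static void *const jump_table[4] = {
        &&cmd_a, &&cmd_b, &&cmd_c, &&cmd_bad
    };

    goto *jump_table[fred & 3];

cmd_a:   /* handle command A */   return;
cmd_b:   /* handle command B */   return;
cmd_c:   /* handle command C */   return;
cmd_bad: /* unrecognized */       return;
}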

Moderation in all things. -- ancient proverb


Cool, I wasn't aware of that GCC extension. Assuming it cleans up the stack properly, that's a nice extension, making jump tables much easier to implement. Of course it's not really all that different from

return jump_table[fred&31]();

All that would be saved, speed-wise, is the cost of the extra push/pop of the return address from the call/ret pair. The stack will be smaller too, because the calls would not be nested.
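
For comparison, a portable function-pointer table might look like this (a sketch; the handlers are hypothetical, and every slot is filled so any masked index is safe to call):

typedef void (*cmd_fn)(void);

static void cmd_a(void)       { /* handle command 'a' */ }
static void cmd_b(void)       { /* handle command 'b' */ }
static void cmd_unknown(void) { /* reject everything else */ }

static cmd_fn const jump_table[4] = {
    cmd_a, cmd_b, cmd_unknown, cmd_unknown
};

void dispatch(unsigned char fred)
{
    jump_table[fred & 3]();      /* mask keeps the index in range */
}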

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


In a previous project I actually had a case where a binary search tree rescued the code. I was writing a protocol for a wireless subwoofer that had to receive, transmit, or re-transmit in different time slots. There were quite a few time slots and I just didn't have the time for a linear search. Rather than porting to assembly, I ended up with an if-else-if tree three levels deep. That worked like a charm. Instead of anywhere from 1 to 20 comparisons, I went down to a flat 3 or 4.

The real-time requirements in the UI code are much kinder, so I wouldn't bother writing and maintaining multiple levels of tests. I was just curious whether this was built in anywhere.

Borge


stu_san wrote:

> I looked at flex but didn't get far.

I've got patches for flex and byacc to adapt them to a more AVR-like way (store all tables in flash ROM rather than RAM). It works, but be warned: it's not going to be small. The "noise floor" is about 15 KiB of ROM usage. The good news is: it's damn fast (as lex/yacc has always been), and if you start adding code, it doesn't bloat much more. As a proof of concept, I wrote a simple (and ugly) BASIC interpreter with full floating-point support, arrays, etc., and it was still below 32 KiB of ROM. Naturally, a BASIC interpreter only makes sense with enough RAM to store the program in, but lex/yacc themselves don't use much RAM of their own once the tables are moved to ROM.

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.


Quote:
Linear searches are, well, linear.

Consider augmenting your linear search with a cache. Similar to glitch's suggestion, you can use a fast hash to map a candidate into a cache table; if the table entry matches, the search is done. If not, continue with the slow (full) search.

if (cache_match(candidate))          /* fast path: cache hit    */
    return cache_data;
result = do_slow_search(candidate);  /* full linear search      */
update_cache(candidate, result);     /* remember for next time  */
return result;

Of course, cache size, hash function, and actual command sequence greatly affect cache savings.

C: i = "told you so";


stu_san wrote:
I have a similar problem to the OP, except I currently have 253 commands to parse. Luckily the commands are broken into 3 or 4 words separated by an underscore. Bad news is that I now have about 136 words to recognize.

Once I have the recognized words they are assembled into a 4-byte "command" (1 byte for each word) and the command table is searched.

The words and commands are sorted into a "most likely first" order, but still it's ugly. Linear searches are, well, linear.

With binary search, you'll need at least 8 comparisons. They don't need to be string comparisons. At the top level, pick, if you can, a character to test that will split the command list roughly in half. Within each branch, treat the remaining commands as a new top level. When you're down to one command, do a string comparison to confirm validity.

One could do something similar with nested jump tables.

In either case, you will likely want to write a code-generator.
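
By hand, the character-split idea looks like this for four commands (a toy sketch; for 136 words it is roughly what the code-generator would emit):

#include <string.h>

enum { CMD_NONE, CMD_DOWN, CMD_LEFT, CMD_RIGHT, CMD_UP };

static int match(const char *w)
{
    /* The first letter splits {DOWN, LEFT} from {RIGHT, UP}. */
    if (w[0] <= 'L') {
        if (w[0] == 'D')
            return strcmp(w, "DOWN") == 0 ? CMD_DOWN : CMD_NONE;
        return strcmp(w, "LEFT") == 0 ? CMD_LEFT : CMD_NONE;
    }
    if (w[0] == 'R')
        return strcmp(w, "RIGHT") == 0 ? CMD_RIGHT : CMD_NONE;
    return strcmp(w, "UP") == 0 ? CMD_UP : CMD_NONE;
}

Two character tests narrow four commands down to one; a single strcmp() then confirms validity.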

Moderation in all things. -- ancient proverb


There are several approaches; which one depends on whether you want a fast average-time, a fixed-time, or a minimum-time response.

For large symbol tables, a hashtable or a binary chop is worthwhile. The hashtable can be fixed-time if reasonably sparse.

For moderate quantities, a table of indices at which to start a linear search is not unreasonable. An alphabetic index can avoid any multiplication completely.

An intelligent search by a hand-crafted if-else structure can work well with fixed tables. And remembering your previous search can often help.

You have the classic O(N), O(N²), higher-order, or exponential algorithm times. For smaller N, a simpler algorithm can be faster. In the limit, you always choose the lowest order of N.

David.


Generally, a full hash table implementation is considered O(1), and a binary search is O(log N).

Quote:
With binary search, you'll need at least 8 comparisons.

This is wrong. Perhaps you mean worst case?

C: i = "told you so";


Quote:

I've got patches for flex and byacc to adapt them to a more AVR-like way (store all tables in flash ROM rather than RAM). It works, but be warned: it's not going to be small. The "noise floor" is about 15 KiB of ROM usage. The good news is: it's damn fast (as lex/yacc has always been), and if you start adding code, it doesn't bloat much more.

Interesting. I haven't done lex-yacc/flex-bison for years, and never in an AVR environment. But I used to do quite a bit of "cross compilers" and interpreters for simulators and the like, in a couple of past lives building "engineering support tools". I've since moved away from "compiler writing" to embedded apps, mostly industrial. So the LALR scanners/parsers are still "the" way?

I do remember a couple of apps, long ago when memory space was dear, where we cut the allowed characters down to 64 instead of 256 (handling anything else as an exception patch), and that cut our size down quite a bit on an old small mini. We did somewhat the same thing in early DOS days to keep a program small.

For stu's case, I try to structure the communications protocol to avoid parsing "words". Most of my apps have Modbus-like read/write "register" commands, which makes for a nice tight AVR implementation.

I'm rusty on it, but for stu's description I'd try a couple of different approaches--one extreme would do the whole job in flex, the other would do minimal flex and put the smarts into bison. If memory space was tight I'd think a tweaked flex-only would do a nice fast job.

Some of it depends on the goals/requirements. If there are a LOT of these parses and speed is important then perhaps extra flash/next bigger AVR wouldn't be a big deal.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


cpluscon wrote:
Generally, a full hash table implementation is considered O(1), and a binary search is O(log N).

Quote:
With binary search, you'll need at least 8 comparisons.

This is wrong. Perhaps you mean worst case?

I should have written "8 levels of comparisons."
Assuming 129..256 text entries, in the best case, the longest path will have at least 8 comparisons.
If you can't avoid 60-40 splits, the longest path will be even longer.
In the very worst case, each comparison distinguishes precisely one entry from all the others.
That is equivalent to linear search and its shortest path has a single comparison.

I would expect either binary search or a hash table to do its job in a time comparable to the time required to read the command. For doing either, I'd recommend a code generator.

Moderation in all things. -- ancient proverb


Thanks for all the pointers -- I am quite familiar with hashing and such, just hadn't implemented it.

I was thinking of a wide-node tree (26 pointers per node, so no comparison is needed) for parsing the words. I also limit the parsed word size to 4 characters and ignore anything past the 4th; this makes parsing the words faster and saves on typing. :wink:
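
A node for that kind of 26-way trie might look like this (a sketch assuming uppercase A..Z words only; the names are illustrative):

/* next[c - 'A'] walks the trie with an indexed load, no compares.
   A nonzero word_id marks a node where a complete word ends. */
struct trie_node {
    struct trie_node *next[26];
    unsigned char     word_id;
};

With at most 4 significant characters, a word lookup is at most four indexed loads, though 26 pointers per node will eat RAM or flash quickly.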

BTW, I'm already using an ATmega2560, so until the Xmega392 shows up, I'm already on the largest available processor. :D

I also had thought about creating a code "compiler", but what I have works and, for the most part, works well. Just wanted to get some ideas about how much work on lex/yacc-like stuff had been done for AVR.

Jörg, I may bug you about your patches for flex/bison at some point. 15 KiB in my world is acceptable for this function, although I may come up with something easier. See my .sig for comment on this.

Thanks everyone!

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!


Quote:
I should have written "8 levels of comparisons."
Assuming 129..256 text entries, in the best case, the longest path will have at least 8 comparisons.
If you can't avoid 60-40 splits, the longest path will be even longer.
In the very worst case, each comparison distinguishes precisely one entry from all the others.
That is equivalent to linear search and its shortest path has a single comparison.

Performing a binary search on a sorted list of size 256 will take at most 9 comparisons. There is no case for which more than 9 comparisons are needed. So 60-40 splits never apply.

OTOH, a binary search tree is used when insertion and/or deletion are required. This is a data structure consisting of nodes, each having left and right pointers to other nodes. Such a structure would not be used for stu_san's application unless the command set changes.

A binary search tree can degenerate so that searches require more comparisons, up to N, when the tree is not balanced. There are solutions to this problem that involve rebalancing the tree after insertion/deletion.

C: i = "told you so";


cpluscon wrote:
Performing a binary search on a sorted list of size 256 will take at most 9 comparisons. There is no case for which more than 9 comparisons are needed. So 60-40 splits never apply.
"Bi-" means two, not one-half.
Consider the following sorted list of 243 strings
AAAAA
AAAAB
AAAAC
AAABA
...
CCCCC.
ATtinys...ATmegas can only compare one pair of bytes at a time.
Five comparisons will have a roughly 67-33 split.
The longest path will require ten comparisons.

Moderation in all things. -- ancient proverb


This horse is nearly dead, but just in case....

skeeve wrote:
Five comparisons will have a roughly 67-33 split.
The longest path will require ten comparisons.

By "comparison" I mean an evaluation of difference of two instances of data structure entries (which could be bytes, integers, floats, widgets, or in your example, 5 character strings). In the latter case, the maximum number of "comparisons" is 8, no more. (The minimum number is 1 - for BBBBB.) There are 116 entries for which 8 is required.

If you want to go to the level of processor instructions, naturally things get more complicated. For your example, my simulation shows the most comparisons required for any string in the list is 56, for CBACB (or ABCAB depending on whether you check less-than or greater-than first). The minimum is of course 10, again for BBBBB, since less-than and greater-than both need to be checked, five times.

If you prefer to count a byte-wise less-than/greater-than operation as a single comparison then the minimum and maximum counts are 5 and 29. This could be justified if the pertinent opcodes perform a single difference and base the less-than/greater-than evaluation on status flags.

C: i = "told you so";


cpluscon wrote:
This horse is nearly dead, but just in case....
Yeah. Work for Doughnut Jimmy.
I think our readers, if any, have all the useful information they are going to get.
Quote:
skeeve wrote:
Five comparisons will have a roughly 67-33 split.
The longest path will require ten comparisons.

By "comparison" I mean an evaluation of difference of two instances of data structure entries (which could be bytes, integers, floats, widgets, or in your example, 5 character strings).
A sensible definition, but not always the most useful one.
Quote:
In the latter case, the maximum number of "comparisons" is 8, no more. (The minimum number is 1 - for BBBBB.)
I gather you were doing trinary search.
Quote:
There are 116 entries for which 8 is required.

If you want to go to the level of processor instructions, naturally things get more complicated. For your example, my simulation shows the most comparisons required for any string in the list is 56, for CBACB (or ABCAB depending on whether you check less-than or greater-than first). The minimum is of course 10, again for BBBBB, since less-than and greater-than both need to be checked, five times.

If you prefer to count a byte-wise less-than/greater-than operation as a single comparison then the minimum and maximum counts are 5 and 29. This could be justified if the pertinent opcodes perform a single difference and base the less-than/greater-than evaluation on status flags.

With an AVR, a two-way branch is a comparison and a branch. A three-way is a comparison and two branches. Whether the additional branch is worth it depends on the frequency of equality.

In many cases, a multi-byte comparison will be decided in the first few bytes. In those cases, the distinction between a byte comparison and a widget comparison is a small factor.

My example is pathological if one is allowed to pick the significance order of the characters.
OTOH if one has to use the order given, common prefixes can slow things down.
OYAH if one is typing in strings, how important is speed?
Dealing with speling variations would seem more important.

Moderation in all things. -- ancient proverb


theusch wrote:

So the LALR scanners/parsers are still "the" way?

Interesting you should mention that. Lex/yacc (flex/bison) still seem to be used on a regular basis. However, GCC somewhat recently (within the last year, IIRC) got rid of its flex/bison-based parser for C (and C++, IIRC) and replaced it with a hand-written parser because it proved to be faster.


Quote:
replaced it with a hand-written parser because it proved to be faster.

Wow--interesting. With today's PC speeds, would a few percent improvement in scanning or parsing even be noticeable? I'd think symbol-table manipulation would dwarf it; it always did in my compiler work in past lives.

But then maybe the apps are one or more orders of magnitude larger than when I was working on PCs running at a few MHz.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.