(var/2) vs. (var>>1) issue on avr-gcc

Go To Last Post
22 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 1

This code is a snippet from a software UART, I've stripped timing and control code to focus on the title issue.

 

#include<avr/io.h>

int main () {
	/* Sample bits */
	uint8_t c = 0;
	for (uint8_t i = 0; i < 8; i++) {
		/* Take sample */
		//c >>= 1;
		c /= 2;
		if ( PIND & (1 << PIND6) ) {
			c |=  0x80;
		}
	}
	return c;
}

 

There is a line where I could use either a right shift by 1 or divide by 2, and I expected avr-gcc to generate the same code due to optimizations. But to my surprise, the code using the division is better than the shift one!

I tried several versions of avr-gcc. The shift code is worse because the uint8_t variable is promoted to 16 bits, shifted, then cropped back to 8 bits:

; code using right shift


00000080 <main>:
  80:   98 e0           ldi     r25, 0x08       ; 8
  82:   80 e0           ldi     r24, 0x00       ; 0
  84:   28 2f           mov     r18, r24
  86:   30 e0           ldi     r19, 0x00       ; 0
  88:   35 95           asr     r19
  8a:   27 95           ror     r18
  8c:   82 2f           mov     r24, r18
  8e:   4e 99           sbic    0x09, 6 ; 9
  90:   80 68           ori     r24, 0x80       ; 128
  92:   91 50           subi    r25, 0x01       ; 1
  94:   b9 f7           brne    .-18            ; 0x84 <main+0x4>
  96:   90 e0           ldi     r25, 0x00       ; 0
  98:   08 95           ret

 

When using the division by 2, the code is what I expected:

; code using division

00000080 <main>:
  80:   98 e0           ldi     r25, 0x08       ; 8
  82:   80 e0           ldi     r24, 0x00       ; 0
  84:   86 95           lsr     r24
  86:   4e 99           sbic    0x09, 6 ; 9
  88:   80 68           ori     r24, 0x80       ; 128
  8a:   91 50           subi    r25, 0x01       ; 1
  8c:   d9 f7           brne    .-10            ; 0x84 <main+0x4>
  8e:   90 e0           ldi     r25, 0x00       ; 0
  90:   08 95           ret

 

This makes quite a difference on a tight loop where timing is quite critical. I don't understand why the variable is promoted to 16 bits in the shift case, maybe the C standard mandates this? Opinions welcome.

This topic has a solution.
Last Edited: Thu. Aug 9, 2018 - 02:29 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

El Tangas wrote:
the uint8_t variable is promoted to 16 bits

That's a 'C' standard compliance thing.

 

Keil's C51 compiler for the 8051 has an option to disable that: http://www.keil.com/support/man/...

 

May be worth digging to see if (AVR) GCC has a similar option ...

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

OK, so the standard mandates promotion to int, but then some optimization pass notices this is not needed, but only for the division, the shift is left as 16 bits. I found the avr-gcc option "-mint8" but it's just too blunt to be used, it makes the size of every int 8 bits.

The problem seems to be that the promotion of unsigned char to int changes the signal behaviour and this confuses the optimizer.

 

Moreover, the promotion is costly because new registers are allocated. If we start with an int variable, that is:

	int c = 0;

the code is actually smaller because no new registers are allocated:

00000080 <main>:
  80:   28 e0           ldi     r18, 0x08       ; 8
  82:   80 e0           ldi     r24, 0x00       ; 0
  84:   90 e0           ldi     r25, 0x00       ; 0
  86:   95 95           asr     r25
  88:   87 95           ror     r24
  8a:   4e 99           sbic    0x09, 6 ; 9
  8c:   80 68           ori     r24, 0x80       ; 128
  8e:   21 50           subi    r18, 0x01       ; 1
  90:   d1 f7           brne    .-12            ; 0x86 <main+0x6>
  92:   08 95           ret

 

These quirks are interesting, I'll have to keep them in mind.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This is one of those areas where one would hope that the specifically-focussed, commercial 8-bit compilers would justify their price tag ...

(as Keil does)

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

El Tangas wrote:
I tried several versions of avr-gcc.

I cannot reproduce your results, using my vanilla test project, Studio 7, "right out of the box".

#include<avr/io.h>

int main () {
  7a:	98 e0       	ldi	r25, 0x08	; 8
	/* Sample bits */
	uint8_t c = 0;
  7c:	80 e0       	ldi	r24, 0x00	; 0
	for (uint8_t i = 0; i < 8; i++) {
		/* Take sample */
		//c >>= 1;
		c /= 2;
  7e:	86 95       	lsr	r24
		if ( PIND & (1 << PIND6) ) {
  80:	4e 99       	sbic	0x09, 6	; 9
			c |=  0x80;
  82:	80 68       	ori	r24, 0x80	; 128
  84:	91 50       	subi	r25, 0x01	; 1
#include<avr/io.h>

int main () {
	/* Sample bits */
	uint8_t c = 0;
	for (uint8_t i = 0; i < 8; i++) {
  86:	d9 f7       	brne	.-10     	; 0x7e <main+0x4>
		if ( PIND & (1 << PIND6) ) {
			c |=  0x80;
		}
	}
	return c;
  88:	90 e0       	ldi	r25, 0x00	; 0
  8a:	08 95       	ret

0000008c <_exit>:
  8c:	f8 94       	cli

0000008e <__stop_program>:

No promotion; no register-whomping.  Is the -mint8 a default?

 

I don't see it in the compile options:

        Building file: .././main.c
        Invoking: AVR/GNU C Compiler : 5.4.0
        "C:\Program Files (x86)\Atmel\Studio\7.0\toolchain\avr8\avr8-gnu-toolchain\bin\avr-gcc.exe"  -x c -funsigned-char -funsigned-bitfields -DDEBUG  -I"C:\Program Files (x86)\Atmel\Studio\7.0\Packs\atmel\ATmega_DFP\1.2.132\include"  -Os -ffunction-sections -fdata-sections -fpack-struct -fshort-enums -mrelax -g2 -Wall -mmcu=atmega328p -B "C:\Program Files (x86)\Atmel\Studio\7.0\Packs\atmel\ATmega_DFP\1.2.132\gcc\dev\atmega328p" -c -std=c99 -MD -MP -MF "main.d" -MT"main.d" -MT"main.o"   -o "main.o" ".././main.c"
        Finished building: .././main.c

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The promotion happens if you use the shift version instead of the division (that is, the line that is commented out).

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
No promotion; no register-whomping.

Time for the OP to post his toolchain version!

 

Jim

 

Click Link: Get Free Stock: Retire early!

share.robinhood.com/jamesc3274

 

 

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

El Tangas wrote:
The promotion happens if you use the shift version instead of the division.

OK, I'll bite... ;)

0000007a <main>:
#include<avr/io.h>

int main () {
  7a:	98 e0       	ldi	r25, 0x08	; 8
	/* Sample bits */
	uint8_t c = 0;
  7c:	80 e0       	ldi	r24, 0x00	; 0
	for (uint8_t i = 0; i < 8; i++) {
		/* Take sample */
		c >>= 1;
  7e:	86 95       	lsr	r24
		//c /= 2;
		if ( PIND & (1 << PIND6) ) {
  80:	4e 99       	sbic	0x09, 6	; 9
			c |=  0x80;
  82:	80 68       	ori	r24, 0x80	; 128
  84:	91 50       	subi	r25, 0x01	; 1
#include<avr/io.h>

int main () {
	/* Sample bits */
	uint8_t c = 0;
	for (uint8_t i = 0; i < 8; i++) {
  86:	d9 f7       	brne	.-10     	; 0x7e <main+0x4>
		if ( PIND & (1 << PIND6) ) {
			c |=  0x80;
		}
	}
	return c;
  88:	90 e0       	ldi	r25, 0x00	; 0
  8a:	08 95       	ret

0000008c <_exit>:
  8c:	f8 94       	cli

for

#include<avr/io.h>

int main () {
	/* Sample bits */
	uint8_t c = 0;
	for (uint8_t i = 0; i < 8; i++) {
		/* Take sample */
		c >>= 1;
		//c /= 2;
		if ( PIND & (1 << PIND6) ) {
			c |=  0x80;
		}
	}
	return c;
}

As mentioned,

theusch wrote:
I cannot reproduce your results, using my vanilla test project, Studio 7, "right out of the box".

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

This reply has been marked as the solution. 
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 1

Just to point out that I have a sneaking suspicion that El Tangas (being a great fan of such things) may actually be using the C++ compiler not the C compiler (so main.cpp rather than main.c). C++ is a whole new ball game as it is much more stringent about types/promotions etc.

 

EDIT: yup just run the experiment with main.cpp and /2 is the LSR but >>1 is the messy stuff

 

EDIT2: I don't know enough about C++ to know if the generic << operator (not one associated with a specific class of object) can be overloaded but you could maybe envisage a situation where you redefine a specifically uint8_t variant of it that sticks to byte wide operation perhaps?

 

EDIT3: you say >>, I say <<, let's call the whole thing off! :-)

Last Edited: Thu. Aug 9, 2018 - 01:49 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 1

clawson wrote:
EDIT2: I don't know enough about C++ to know if the generic << operator (not one associated with a specific class of object) can be overloaded but you could maybe envisage a situation where you redefine a specifically uint8_t variant of it that sticks to byte wide operation perhaps?

And here for a decade or more Johan has been trying to pound "no bloat" into us.  "Fake news."

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

In this case, the bloat is a compiler thing,

not a language thing.

It might depend on optimization flags.

avr-g++ might have different defaults from avr-gcc.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 1

I think it's widely understood that avr-g++ is not as "honed" as avr-gcc. In part this is because the C++ team are sticklers for adherence to standards and are not willing to accept changes specifically for some no-name 8bit micro that no one has ever heard of that might upset their standard compliance (even if it might make for more efficient code for that specific target).

 

Bottom line, you aren't going to turn the oil tanker!

 

(there's quite a lot of scope for making efficiency changes to avr-gcc if you are happy to stick with C)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
the [GCC] C++ team are sticklers for adherence to standards and are not willing to accept changes specifically for some no-name 8bit micro that no one has ever heard of that might upset their standard compliance (even if it might make for more efficient code for that specific target)

Again, this is where the commercial specifically-focussed AVR compilers can win.

 

Whether any gain is worth the price tag is another question ...

 

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You're right! Damn C++cheeky that's why theusch couldn't reproduce it.

No, operators for built in types can't be overloaded, that would allow complete customization of the language and we can't have such a thing...

 

Ok, I'll mark your post as a solution, before the C++ guys find this thread and set out to "fix" the division non-compliance to standards!

Last Edited: Thu. Aug 9, 2018 - 02:30 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

El Tangas wrote:
Ok, I'll mark your post as a solution, before the C++ guys find this thread and set out to "fix" the division non-compliance to standards!
To be clear: the avr-gcc version is standard-compliant: c gets the correct values and all the other side-effects are the same.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yeah, I just prefer when the standard is met with less instructions and/or cycles.

 

It never even occur to me that avr-gcc would have a different optimizer for C and C++, that's why I like this forum, there are people with lots of knowledge and experience here. Always learning...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

not willing to accept changes specifically for some no-name 8bit micro that no one has ever heard of

They need to get educated!  However, I suppose we get what we pay for in compilers...can't complain too much when free.

We'll see how long free is supported by microchip.

When in the dark remember-the future looks brighter than ever.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I don't think the C++ standard requires generating inefficient code.  Both C and C++ are all about speed.  More likely there just hasn't been as much work on making the same optimizations that are already present in the C compiler.

 

--Mike

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

avrcandies wrote:
I suppose we get what we pay for in compilers

Indeed - see #2, #4, #13

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Sorry I could not resist so here is a ASM version that do the same :

	ldi r24,0x80
	clc
L1:	ror r24
	sbic 0x09,6
	ori r24,0x80
	brcc L1
	ret

but normally I would do it with a mask I shift up and when it's zero we are done.

something like this:

	ldi r24,0x00
	ldi r25,0x01 
L1:	sbic 0x09,6
	or  r24,r25
	add r25,r25
	brne L1
	ret

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

avr-mike wrote:

I don't think the C++ standard requires generating inefficient code.  Both C and C++ are all about speed.  More likely there just hasn't been as much work on making the same optimizations that are already present in the C compiler.

 

--Mike

 


That's what I said. The C++ folks are only interested in ensuring the x86/64/ARM versions are standards perfect. The AVR8 is an irrelevance so they wouldn't contemplate taking on board any change to make AVR8 more efficient at the risk of it perturbing the important versions of the tool. For example they wont accept __flash.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The C++ folks are only interested in ensuring the x86/64/ARM versions are standards perfect. The AVR8 is an irrelevance so they wouldn't contemplate taking on board any change to make AVR8 more efficient

I'll bet they are probably really just upset they never got a t-shirt...payback

 

 

When in the dark remember-the future looks brighter than ever.

Last Edited: Thu. Aug 9, 2018 - 09:07 PM