avr-gcc produces awful code when using global register variables...

Go To Last Post
15 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This came up on the Arduino forums...  Avr-gcc supports putting global variables in registers, right?  As in:

volatile register unsigned char xxx asm("r3");

ISR(TIMER0_OVF_vect, NAKED) {

    xxx <<= 1;
    ++xxx;
}

 

But this ends up producing code that is much worse than I would expect:

 /usr/local/CrossPack-AVR-48/bin/avr-gcc -g -c -Os -mmcu=atmega328p foo.c
BillW-MacOSX-2<5027> avr-objdump -S foo.o

00000000 <ISR>:
volatile register unsigned char xxx asm("r3");

ISR(TIMER0_OVF_vect, NAKED) {
    xxx <<= 1;
   0:	83 2d       	mov	r24, r3
   2:	88 0f       	add	r24, r24
   4:	38 2e       	mov	r3, r24
    ++xxx;
   6:	8f 5f       	subi	r24, 0xFF	; 255
   8:	38 2e       	mov	r3, r24
}
   a:	08 95       	ret

instead of

  add r3,r3  ;; shift
  inc r3

 

Initially I though that this might be because of the 2nd-class status of low-numbered registers, but I get about the same thing if I use r16.  Or if I use slightly different source code or different optimization flags.

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This seems to be perfectly appropriate. If you want to crash your machine, use NAKED.

If you know what you are doing, you would either put the ISR() directly in the vector table or use asm() sequence in the first place.

It is not something that I have ever considered doing. It always seems wiser to improve the algorithm rather than cripple the Compiler.
Having said that, it might be interesting to see what Codevision does in this situation. It is less efficient with local variables but may win with a global register.

Oh, and surely any arithmetic operation is going to affect SREG. Hence you need to preserve it in your ISR().

David.

Last Edited: Thu. Sep 3, 2015 - 06:53 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

(What I "need to preserve" is orthogonal to the question.  You get the same code using register globals in main() and elsewhere.  I just used an ISR to better illustrate a possible place where you might want to use such globally defined registers...)

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The whole point of assigning a register (which involves using asm() itself ;) is because at the time critical point (the ISR) you are going to implement the access yourself in asm() (and probably "naked"). 

 

There's no point assigning a variable to a register and then just making accesses in C, the compiler's code generator is almost bound to use r24 to work on stuff. So just wrap the add and Inc that you want in asm(). 

Last Edited: Thu. Sep 3, 2015 - 08:00 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

westfw wrote:
... I just used an ISR to better illustrate a possible place where you might want to use such globally defined registers...)
Could you post a .s file output by the avr-gcc, or send it to me as a PM? If it has lots of such patterns, even better. I am finishing up a program that could shrink the code.

- John

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

jfiresto wrote:

 

westfw wrote:

... I just used an ISR to better illustrate a possible place where you might want to use such globally defined registers...)

Could you post a .s file output by avr-gcc, or send it to me as a PM? If it has lots of such patterns, even better. I am finishing up a program that could shrink the code.

 

- John

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Another issue is that, though they would be meaningful,

gcc does not believe in volatile register global variables.

Your code should have generated a warning hinting at that.

 

Also, according to documentation, naked is not guaranteed to work with anything other than in-line assembly.

The current code trashes a register that it should not trash.

You might get rid of the naked and check whether the result is any better.

avr-gcc often does awful things with ISR boilerplate.

Your only recourse might be assembly.

Moderation in all things. -- ancient proverb

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Here is the code generated by avr-gcc 3.4.6:

 

% avr-gcc -Os -mmcu=atmega329 -S foo.c
foo.c:1: warning: volatile register variables don't work as you might wish
% cat foo.s
	.file	"foo.c"
	.arch atmega329
__SREG__ = 0x3f
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__tmp_reg__ = 0
__zero_reg__ = 1
	.global __do_copy_data
	.global __do_clear_bss
	.text
.global	ISR
	.type	ISR, @function
ISR:
/* prologue: frame size=0 */
/* prologue end (size=0) */
	lsl r3
	inc r3
/* epilogue: frame size=0 */
	ret
/* epilogue end (size=1) */
/* function ISR size 3 (2) */
	.size	ISR, .-ISR
/* File "foo.c": code    3 = 0x0003 (   2), prologues   0, epilogues   1 */

It will not compile for an atmega328p, however.sad

- John

Last Edited: Thu. Sep 3, 2015 - 07:27 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just for fun, try different optimization than Os.

 

For GCC gurus:  In any real app, can one blindly allocate R3 to a variable and expect any library code used to leave it alone? [not as I remember]

 

Avr-gcc supports putting global variables in registers, right?  As in:

...

But this ends up producing code that is much worse than I would expect:

;)  Well, it was >>you<< that chose to use the infinite-value toolchain. ;)

 

#include <io.h>
register unsigned char xxx;

void main (void)
{
#asm("sei");
while (1) {}

}
// Timer 0 overflow interrupt service routine
interrupt [TIM0_OVF] void timer0_ovf_isr(void)
{
xxx <<= 1;
xxx++;
}

                 ;// Timer 0 overflow interrupt service routine
                 ;interrupt [TIM0_OVF] void timer0_ovf_isr(void)
                 ;0000 000C {
                 _timer0_ovf_isr:
                 ;.FSTART _timer0_ovf_isr
000040 93ea      	ST   -Y,R30
000041 b7ef      	IN   R30,SREG
                 ;0000 000D xxx <<= 1;
000042 0c22      	LSL  R2
                 ;0000 000E xxx++;
000043 9423      	INC  R2
                 ;0000 000F }
000044 bfef      	OUT  SREG,R30
000045 91e9      	LD   R30,Y+
000046 9518      	RETI
                 ;.FEND

Now, still not what you want.  BUT--in any real app, only instructions that don't affect SREG can be used without save/restore.  Are there any instructions to do x2+1 without affecting SREG?  Not that I can think of.

 

Set a bit?  Clear a bit?  Sure, in CodeVision:

                 ;// Timer 0 overflow interrupt service routine
                 ;interrupt [TIM0_OVF] void timer0_ovf_isr(void)
                 ;0000 000C {
                 _timer0_ovf_isr:
                 ;.FSTART _timer0_ovf_isr
                 ;0000 000D 
                 ;0000 000E PORTD |= 0x04;
000040 9a5a      	SBI  0xB,2
                 ;0000 000F GPIOR0 |= 0x08;
000041 9af3      	SBI  0x1E,3
                 ;0000 0010 GPIOR0 &= ~(1<<0);
000042 98f0      	CBI  0x1E,0
                 ;0000 0011 }
000043 9518      	RETI
                 ;.FEND
                 ;

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I had not got around to trying it for myself.    But Lee has shown how "skinny" an ISR() can be !!!   Even if written completely in C.

 

Incidentally,   you seldom want to do arithmetic in ISR()s.    Often it is just a question  of starting or stopping a peripheral or setting / clearing a flag.    If it is in SBI/CBI range,   you have a two word ISR().

 

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

(What I "need to preserve" is orthogonal to the question.  You get the same code using register globals in main() and elsewhere.  I just used an ISR to better illustrate a possible place where you might want to use such globally defined registers...)

 

Fair enough.  But what about:

For GCC gurus:  In any real app, can one blindly allocate R3 to a variable and expect any library code used to leave it alone? [not as I remember]

 

 David:

Having said that, it might be interesting to see what Codevision does in this situation. It is less efficient with local variables but may win with a global register.

Now, I'll jump in and do some CV crowing in situations such as skinny ISR generation where I know it does a very good job.  But in some/many instances, the "classic" optimization techniques are better with GCC.  that said, I tend to write striaghtforward and fairly simple (as C goes) code, and I end up pretty much with WYSIWYG.

 

I don't exactly know what David meant.  In CV, one can designate R2-R14 to use for global register variables and/or "bit" variables.  Library code doesn't touch them without preservation.  Quite liberal to have half the machine for your own private use, even if it is the somewhat crippled low registers.

 

In addition, R16-R21 (or is it R22) are used in functions for registerable variables.  Thus my apps will have 16-bit "work" (R16-R17) and 8-bit "scratch" (R18) and I use the heck out of them.  But my flat coding style and heavy use of globals is heretical and offends many sensibilities.

 

David, did you mean a variable local to the ISR?  Or what?  In a non-ISR context won't I get:

#include <io.h>
register unsigned char xxx;

void main (void)
{
unsigned char scratch;

    xxx <<= 1;
    xxx++;
    
    scratch <<= 1;
    scratch++;
while (1) {}

}

 

                 	.CSEG
                 _main:
                 ;.FSTART _main
                 ;0000 0006 unsigned char scratch;
                 ;0000 0007 
                 ;0000 0008     xxx <<= 1;
                 ;	scratch -> R16
00003d 0c22      	LSL  R2
                 ;0000 0009     xxx++;
00003e 9423      	INC  R2
                 ;0000 000A 
                 ;0000 000B     scratch <<= 1;
00003f 0f00      	LSL  R16
                 ;0000 000C     scratch++;
000040 5f0f      	SUBI R16,-1
                 ;0000 000D while (1) {}
                 _0x3:
000041 cfff      	RJMP _0x3
                 ;0000 000E 
                 ;0000 000F }
                 _0x6:
000042 cfff      	RJMP _0x6
                 ;.FEND

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0
                 ;.FSTART _timer0_ovf_isr
000040 93ea      	ST   -Y,R30
000041 b7ef      	IN   R30,SREG
                 ;0000 000D xxx <<= 1;
000042 0c22      	LSL  R2
                 ;0000 000E xxx++;
000043 9423      	INC  R2
                 ;0000 000F }
000044 bfef      	OUT  SREG,R30
000045 91e9      	LD   R30,Y+
000046 9518      	RETI
                 ;.FEND

Now, still not what you want. 

 

No, that's exactly what I want.   I'M SORRY I USED NAKED IN MY EXAMPLE!   My main reason was to cut down on the length of the posted object code.

This is not an application question, this is a compiler question.  gcc in general seems to make poor use of the low registers; this is a particularly blatant example.

(it would be wonderful if someone noticed "hey!  We forgot to set this new flag in the code generator that marks the low registers as registers!"  And it would be acceptable if someone said "we decided not to treat the low registers as registers because it's a PITA to handle the special cases.")  (or maybe: "since low registers are callee-saved, their "cost to use" was set higher than other registers.  Maybe too high.  And a global register variable doesn't decrement the cost for that register. (and it shouldn't, because then generic code might use it!)")

 

 

the compiler's code generator is almost bound to use r24 to work on stuff.

Apparently.  but WHY?  I mean, I could understand if gcc had originated as a 6502 compiler with accumulator-based architecture assumptions, but the 68k and x86 (original gcc targets) both had (relatively) general purpose registers.   In this case, it seems the compiler barely recognizes that it's dealing with a register at all.

 

Here's a revised example:

register unsigned char xxx asm("r3");
void foo() {
    xxx <<= 1;
    ++xxx;
}

With gcc 4.8, it produces the following for ALL optimization settings beyond O0:

00000000 <foo>:
   0:	83 2d       	mov	r24, r3
   2:	88 0f       	add	r24, r24
   4:	8f 5f       	subi	r24, 0xFF	; 255
   6:	38 2e       	mov	r3, r24
   8:	08 95       	ret

gcc 4.3.3 produces something slightly different (in an interesting way) (but not better.)

   0:	83 2d       	mov	r24, r3
   2:	88 0f       	add	r24, r24
   4:	38 2e       	mov	r3, r24
   6:	33 94       	inc	r3
   8:	08 95       	ret

.s file pm'ed to jfiresto...

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I am not too sure what I was meaning!

 

I had not tried CV with "my" global register.    You have since shown that it uses R2 very well.   Most importantly,  it knew when to preserve SREG and when not to.

 

GCC is better with stack local variables and expressions in general.    Mind you,   I am sure there are different situations when the 'other' Compiler wins.

 

I far prefer to write straightforward C and let the Compiler make the best of it.   If your application is so sensitive,   it is time to move to another CPU.    (or hand-optimise some ASM)

 

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

If the register variable is in the call-saved list (i.e. r2-r17 & r28/29), then library code is obliged to preserve it. The issue arises when the register variable is used (read or written) in another thread i.e. an ISR. You could arrange to disable interrupts while that library code is running, but that may mean a performance hit, and is hard to maintain since it requires a knowledge of the library's implementation.
The general rule is don't use them in ISRs, which is a bummer since that is precisely where you are most likely to want them.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

gcc in general seems to make poor use of the low registers;

Blame Atmel and their choice of 5 bits in an LDI that limits it to R16..R31. That will always make the upper 16 registers "better" than the lower 16 registers and it seems the designers of the AVR port of GCC recognized this and favour R16..R31 for doing any "real" work. While it may spill into the lower registers when really pushed the code generator clearly favours the upper 16 whenever it can.

 

Oh and as to Lee's point above. You are right the AVR-LibC code (which unlike Codevision comes pre-compiled) has not been built with any reserved registers in mind so as soon as you start mixing avr-gcc register reservations and library code you are asking for trouble.

 

GCC (AVR) has the command line option -ffixed-n so you can build code to avoid using R7 (say) with -ffixed-7 and the compiler will avoid using that register in any code it generates but to be able to use library code "safely" you'd have to pull the source for printf() or strcpy() or whatever and ensure that was also build -ffixed-7 to avoid using it to.

 

A rather nasty "hack" is to link with the library functions you want to use the simply grep the disassembly for references to "r7" and if you dont see it you are probably safe. But that is only for this week's compiler/library - if you upgrade the compiler and rebuild you might find (or worse, not find!) that the new code generator/prebuit library now started to use R7.

 

So, if code is going to use "uint8_t myvar asm("r7")" then make the time-critical part that needs fast access to r7 in naked asm() (or, perhaps better, .S source) and don't call any library functions unless you are positive R7 is not used.