GCC (non)usage of r0

I'm a bit puzzled by the fact that GCC religiously saves R0 in IRQ prologues (and restores it in epilogues), but basically doesn't seem to use that register in any generated code. (Some of the avr-libc ASM inlines use it though, e.g. for reading an EEPROM byte.)

What's up with that? For IRQ handlers that means a constant needless overhead of two instructions ... and often some additional overhead to save a scratch register (say, r24) that's not actually needed (since r0 is available). Since I'm right now squishing code into an ATtiny24, saving two or four instructions per IRQ handler seems very attractive. I'd kind of expect that it would be both possible and desirable to just tell GCC to treat r0 like any other call-clobbered (?) register, and that would ease register pressure (in some cases) and shrink code. That is, possible for someone who understands the deep magicks associated with those invocations to the dark gods of GCC internals ... not me!


r0 is used as a scratch register.

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.


Sure, I've seen docs saying it's used as a scratch register.

But that seems to be just docs, and I've not seen any code generated which actually uses it that way. That's my point: *all* registers are "scratch registers", modulo the need to preserve some across function calls. But it's r0 that seems to be ignored, other than that irrelevant logic to save and restore it in IRQ handlers.

Hence my question: what's up with that non-use? More specifically, is there any reason to think it's not just a bug? (Of the "stupid codegen" flavor, not the "incorrect codegen" type.)

FWIW: right now I'm looking at what gcc 4.3.0 generates. It's got a few patches applied from bugzilla, but only the one for bugid 32871 ought to have the potential to affect this.


From a larger project of mine:

% avr-objdump -d firmware.elf | grep r0 | wc -l
    624

Jörg Wunsch


And "gcc -S" for mine shows the exact wasteful prologue I described:

        .text
.global __vector_13
        .type   __vector_13, @function
__vector_13:
        push __zero_reg__
        push r0
        in r0,__SREG__
        push r0
        clr __zero_reg__
        push r18
        push r19
        push r24
        push r25

R0 is not used at all in the body of the function. Nor, for that matter, anywhere else in this firmware except when calling __eeprom_read_byte_1C1D1E and inside memcpy_P() ... both of which come from hand-coded assembly language.

In this and similar routines, R0 could replace another register. Like r24 or r25 here.


Alas, the ISR prologue is still somewhat hard-coded, and partially
independent of the generated code. GCC simply does not track its
usage of __tmp_reg__ internally, so it doesn't know whether it
really will be used or not.

You're welcome to enhance that. ;-)

Jörg Wunsch


Well, the ISR prologue is just one of the issues. Presumably that's at least straightforward to change, although adding R0 to the set of registers that are tracked would be more complicated. I'd be more likely to change a different part of the ISR prolog: the needless clearing of R1, a.k.a. __zero_reg__. Since nothing ever sets R1 to anything else, skipping the push/pop and clear of it could at least become an AVR8-specific compiler flag: something like -freadonly-r1 could save three words in every ISR (handy with small flash configs), without needing to restructure R0 handling. Although that would imply some other register would be needed for saving SREG. Like R0. ;)

The other issue is the strong reluctance to actually *use* R0 ... that might be partly explained by the availability in this case of a known alternate choice (say, R24 in this case), but that leaves the problem of why it even bothered to allocate r24 when R0 was already available. It's as if it didn't know that R0 was already allocated...

I'm more than a bit reluctant to dive into GCC internals, but would like to understand more about this issue since it seems like such an obvious case of code pessimization. Is it perhaps the case that removing all knowledge of a "__tmp_reg__" and just classifying R0 correctly (as "call-clobbered"?) would free the code generation to use it whenever it needs a temporary? Or at least "should free", absent bugs.


Here's the problem: How does GCC know when the ISR is going to be called? How can GCC find out what should be pushed and popped?

And if you think that R1 is never set to anything else, then please look at the AVR Instruction Set datasheet and the MUL, MULS, and MULSU instructions.


Quote:
How does GCC know when the ISR is going to be called?
That's a rhetorical question. It doesn't. It's responsible for generating code that can safely be interrupted between any two instructions. (Whether the C source is correct under that assumption is a different issue.) In the same way, AVR8 runtime libraries must also allow interrupts at any point. Code that's not IRQ-safe must defend itself against IRQs (e.g. by temporarily disabling them to walk data structures modified by IRQ handlers).
Quote:
How can GCC find out what should be pushed and popped?
Also rhetorical. Registers that a function uses must be saved and restored, unless the caller knows they may be clobbered. By definition, IRQ handlers may not clobber anything; they may be "called" at any instant.

The R0 issues I noted are that the generated code isn't actually using R0 ... and, separately, that it's needlessly saved/restored, even though it's always available for use. It's never used to hold intermediate values or local variables. The IRQ handlers were just one place where this pessimization was blazingly obvious, since there was a stretch of code (with no subroutine calls) that saved and restored registers it never even used, and which allocated a register for something that R0 could have handled.

Re the potential R1 IRQ handler improvement, I noticed that a "readonly-r1" option wasn't compatible with use of MUL. OK, trivial to deal with ... and it's not like the processors with small amounts of flash even support that instruction. There's a similar issue with __udivmodsi4() in libgcc.S: it mangles r1 too. (A non-issue in my case, which doesn't use that. I've been well trained to avoid division in systems code!) So that particular tweak would need some attention to ensure it's not applied when those math operations are in use ... with __udivmodsi4() the only thing seeming to make that improvement tricky.


mojojojo wrote:
Even though [r0 is] always available for use.
The point that you're missing is that r0 is not available for indiscriminate use in an ISR. If it were, it couldn't be used reliably anywhere else.

Consider this code excerpt (not in an ISR):

lds r0, var1
or  r0, r24

Clearly, this code would not work correctly if an interrupt occurs just after the first instruction and the ISR modifies r0.
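To make the hazard concrete, here is a host-side C sketch (it runs on a PC, not on an AVR) that models just the two instructions above plus an ISR that uses r0 as scratch. The cpu_state struct and the isr()/foreground() functions are hypothetical names made up for this illustration; only the ordering matters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: only the state the two instructions touch. */
typedef struct {
    uint8_t r0;    /* scratch register */
    uint8_t r24;   /* operand register */
    uint8_t var1;  /* a byte in RAM */
} cpu_state;

/* Hypothetical ISR body that uses r0 as scratch.
   save_r0 != 0 models the "push r0" / "pop r0" pair around it. */
static void isr(cpu_state *s, int save_r0)
{
    uint8_t saved = s->r0;      /* the value a "push r0" would preserve */
    s->r0 = 0xFF;               /* ISR work clobbers r0 */
    if (save_r0)
        s->r0 = saved;          /* "pop r0" */
}

/* Models "lds r0, var1" then "or r0, r24", optionally with an interrupt
   arriving between the two instructions. Returns the final r0. */
static uint8_t foreground(cpu_state *s, int irq_between, int isr_saves_r0)
{
    s->r0 = s->var1;            /* lds r0, var1 */
    if (irq_between)
        isr(s, isr_saves_r0);   /* interrupt taken here */
    s->r0 |= s->r24;            /* or r0, r24 */
    return s->r0;
}
```

With var1 = 0x10 and r24 = 0x01, the uninterrupted run and the run with a restoring ISR both end with 0x11. Without the restore, the OR operates on the ISR's leftover 0xFF.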

Don Kinzer
ZBasic Microcontrollers
http://www.zbasic.net


I think that mojojojo wants avr-gcc to behave the way that he would like, regardless of the rest of the world.

He is quite free to write his own ISRs in assembly anyway. And of course if he is not using R0 inside his ISR, there is no need to stack it.

David.


dkinzer wrote:
The point that you're missing is that r0 is not available for indiscriminate use in an ISR. If it were, it couldn't be used reliably anywhere else.
Go back and read what I actually said. The point is that since it's saved and restored, it *can* be used; that's why the save/restore idiom exists. But GCC is being stupid here, and isn't even trying. So either it shouldn't bother saving registers it doesn't use ... or it should learn somehow that R0 is fully available for use (since it's being saved and restored).
david.prentice wrote:
I think that mojojojo wants avr-gcc to behave the way that he would like, regardless of the rest of the world.
Ditto: go back and read what I actually said. I'm pointing out some cases of poor code generation, nothing more and nothing less.

Folks, if you can't understand that saving and restoring a register that's not even referenced by that IRQ handler is just poor code generation ... you need to go back to school for a while. Likewise if you can't understand that functions allocating a new register (like R24) when they already have an unused register in hand (R0) is bad code generation.

This isn't The End Of The World, but it's just something rather puzzling. GCC could very obviously do better than this, since it does so in most other places.


The joy of GCC is that if there's some behaviour you don't like YOU can fix it ;)


R0 has been intentionally excluded from the set of registers that need to be saved and restored as a part of explicit function calls/returns. Hence, it is a scratch register - a function makes the assumption that it is free to modify R0 at any time without any need to save its previous contents, and that within any contiguous portion of its own body, without any intervening explicit function calls, the content of R0 will only change if the function explicitly changes it. The fact that GCC apparently sometimes fails to take fullest advantage of that scratch register space is unfortunate.

If an ISR detects that it is clobbering R0 inside its own body, it obviously must save R0 like it would any other register. If the ISR detects any function calls that are not inlined, then it cannot know for sure whether or not R0 will be touched by that function, therefore it must save R0. In any other case, of course, it is not necessary to save R0. The fact that GCC apparently fails to detect such situations is unfortunate.

The ISR, not knowing whether or not it is interrupting a snippet of code in which R1 is non-zero, must assume that R1 has an unknown initial value upon entering the interrupt. If the ISR's body itself makes use of R1 as a zero-register or for any other purpose, then it must save-clear-and-restore R1. If the ISR calls any functions that are non-inlined, then it cannot know for sure whether or not that function will be making use of R1 as a zero-register (or for any other purpose), therefore it must save-clear-and-restore R1. In any other circumstance, it isn't really necessary to save-clear-and-restore R1. The fact that GCC apparently fails to detect such situations is also unfortunate.

I think that's all mojojojo is saying. None of the changes mojojojo is proposing would necessarily have to be in contradiction to the established register-use conventions, if implemented correctly.

To which we respond, if it upsets you enough to want it fixed, feel free to submit a patch to the GCC maintainers. Speaking for myself personally, so far it hasn't hurt me enough to justify the time it would take to make the changes.


lfmorrison wrote:
Hence, it is a scratch register - a function makes the assumption that it is free to modify R0 at any time without any need to save its previous contents ... The fact that GCC apparently sometimes fails to take fullest advantage of that scratch register space is unfortunate.
OMG ... someone understands my point! Stop teh presses!

Although more accurately: GCC doesn't *ever* use R0 in the code I'm looking at. And it thus wastes code space. Code space is increasingly scarce on this 2KB part that I'm using, so I'm scanning for obvious wastage ... but when GCC is the source of the wastage, it's harder to fix.

lfmorrison wrote:
To which we respond, if it upsets you enough to want it fixed, feel free to submit a patch to the GCC maintainers. Speaking for myself personally, so far it hasn't hurt me enough to justify the time it would take to make the changes.
I certainly understand that ramification of the GPL. I've submitted plenty of patches, although I can't recall any specifically for GCC. :)

Of course, the reason I asked my question is to get more information about the technical underpinnings of that pessimization. I haven't actually gotten any such information though, which makes it less likely I'd try to come up with fixes for these problems.


mojojojo wrote:
lfmorrison wrote:
Hence, it is a scratch register - a function makes the assumption that it is free to modify R0 at any time without any need to save its previous contents ... The fact that GCC apparently sometimes fails to take fullest advantage of that scratch register space is unfortunate.
OMG ... someone understands my point! Stop teh presses!

Although more accurately: GCC doesn't *ever* use R0 in the code I'm looking at. And it thus wastes code space.

gcc needs to deal with code you don't look at.
Even if gcc never used R0, there would still be the possibility that someone's assembly routine would use it.
Quote:
Code space is increasingly scarce on this 2KB part that I'm using, so I'm scanning for obvious wastage ... but when GCC is the source of the wastage, it's harder to fix.
The problem acknowledged is not specific to R0.
In ISRs, gcc is very pessimistic when it comes to saving registers.
If you use any registers at all, it will save a whole slew of registers.
If, from an ISR, you call a function in the same file,
gcc will not use the available register information.
If, from an ISR, you call another ISR,
gcc will save registers as though it were an ordinary function.

Even without modifying the compiler, there is a fairly simple workaround.
Use the compiler.
Edit the resulting assembly.
Use the assembler.

International Theophysical Year seems to have been forgotten..
Anyone remember the song Jukebox Band?


Quote:
Code space is increasingly scarce on this 2KB part that I'm using, so I'm scanning for obvious wastage ... but when GCC is the source of the wastage, it's harder to fix.

Then it might be that you are trying to cram too much functionality into your 2KB part with C code; I would not expect anything fancy from that combination.

So you should either consider a bigger part or assembler. Or if you think GCC is the problem, try IAR compiler which offers a 4KB limited memory version (KickStart) so you can fill your 2KB part with that. See if it produces more compact code for your application.

- Jani


mojojojo wrote:
lfmorrison wrote:
Hence, it is a scratch register - a function makes the assumption that it is free to modify R0 at any time without any need to save its previous contents ... The fact that GCC apparently sometimes fails to take fullest advantage of that scratch register space is unfortunate.
OMG ... someone understands my point! Stop teh presses!

A "normal" function is free to make that assumption. That is because the caller will preserve it, if it has used it.

An ISR, on the other hand, is NOT a normal function, and it can make no such assumption. An ISR is not "called", so the main code has no chance to save any scratch registers before the ISR is executed. As such, the ISR must save any registers it modifies.

Now, the fact that nothing in the ISR modifies R0 makes it seem odd that it is being saved. However, some instructions implicitly use R0, so GCC is probably taking the defensive position of always saving it. For example, the code might use the "mul" instruction, which places the result in R1:R0, while the code only ever looks at the high byte in R1. As a result R0 appears to be unused, but it has in fact been modified.
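A host-side C sketch of that MUL behaviour follows. It assumes a hypothetical regpair struct standing in for r1:r0; the function names are also made up. MUL always writes the full 16-bit product, so r0 is overwritten even when the C code only needs the high byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the AVR register pair r1:r0. */
typedef struct {
    uint8_t r0;  /* low byte of a MUL result */
    uint8_t r1;  /* high byte; normally __zero_reg__ */
} regpair;

/* Models MUL: the full 16-bit unsigned product always lands in r1:r0. */
static void avr_mul(regpair *p, uint8_t a, uint8_t b)
{
    uint16_t prod = (uint16_t)(a * b);
    p->r0 = (uint8_t)(prod & 0xFFu);
    p->r1 = (uint8_t)(prod >> 8);
}

/* Hypothetical C-level operation: (x * factor) / 256, i.e. only the high
   byte is wanted. Only r1 is read, yet r0 has been overwritten as well. */
static uint8_t scale_u8(regpair *p, uint8_t x, uint8_t factor)
{
    avr_mul(p, x, factor);
    return p->r1;
}
```

After scale_u8(&p, 200, 128), the caller gets 100, since 200 * 128 = 0x6400. Meanwhile r0 has silently become 0x00 and r1 is 100 rather than zero. That is also why generated code has to clear __zero_reg__ again after a MUL.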

Note that if you call another function from within an ISR, GCC must save ALL scratch registers, as it has no idea which ones the called function might touch. If the function is defined in the same module as the ISR, GCC does have some idea as to which registers are used, and it could optimize (better still, declare the function "static inline" and save the call overhead altogether). The same does not hold true for functions that are defined in another compilation unit.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


> to get more information about the technical underpinnings of that pessimization.

avr-gcc-list at nongnu.org is much more likely a source of that knowledge than
here. None of the long-time AVR-GCC hackers reads here.

Jörg Wunsch


glitch wrote:
mojojojo wrote:
lfmorrison wrote:
Hence, it is a scratch register - a function makes the assumption that it is free to modify R0 at any time without any need to save its previous contents ... The fact that GCC apparently sometimes fails to take fullest advantage of that scratch register space is unfortunate.
OMG ... someone understands my point! Stop teh presses!

A "normal" function is free to make that assumption. That is because the caller will preserve it, if it has used it.

An ISR, on the other hand, is NOT a normal function, and it can make no such assumption. An ISR is not "called", so the main code has no chance to save any scratch registers before the ISR is executed. As such, the ISR must save any registers it modifies.

Now, the fact that nothing in the ISR modifies R0 makes it seem odd that it is being saved. However, some instructions implicitly use R0, so GCC is probably taking the defensive position of always saving it. For example, the code might use the "mul" instruction, which places the result in R1:R0, while the code only ever looks at the high byte in R1. As a result R0 appears to be unused, but it has in fact been modified.

Note that if you call another function from within an ISR, GCC must save ALL scratch registers, as it has no idea which ones the called function might touch. If the function is defined in the same module as the ISR, GCC does have some idea as to which registers are used, and it could optimize (better still, declare the function "static inline" and save the call overhead altogether). The same does not hold true for functions that are defined in another compilation unit.


I'm fairly sure the OP is already aware of these facts. I think he's asking why the ISR prologue cannot be made more intelligent to deal with these issues.

Something like this would satisfy him I think:

1) Does the ISR itself contain any op-codes that directly or indirectly affect the state of R0?  NO: go to 2.  YES: go to 5.

2) Does any function inlined within the ISR contain any op-codes that directly or indirectly affect R0?  NO: go to 3.  YES: go to 5.

3) Does the ISR call any other (non-inline) function?  NO: go to 4.  YES: go to 5.

4) It is guaranteed that R0 is safe to ignore.  Don't bother saving/restoring R0.  DONE

5) It is impossible given the current information to determine whether R0 might be trashed.  To be on the safe side, save/restore it.  DONE

A similar decision tree could be constructed for deciding whether or not it is necessary to save/restore R1.
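The decision tree above maps onto a short C function. This is only a sketch. It assumes the compiler can fill in the three facts in the hypothetical isr_facts struct, and that bookkeeping is precisely what avr-gcc was said earlier in the thread not to do for __tmp_reg__.

For R0 there is one more wrinkle. The standard prologue quoted at the top of the thread uses r0 itself to carry SREG, so "the ISR touches R0" would be true unless SREG were saved some other way. The same function applies to R1 if the facts describe R1; any MUL counts as touching it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-ISR facts a compiler would need to have tracked
   for one register (R0 or R1). */
typedef struct {
    bool body_touches_reg;     /* step 1: the ISR's own instructions */
    bool inlined_touches_reg;  /* step 2: code inlined into the ISR */
    bool calls_function;       /* step 3: any call that is not inlined */
} isr_facts;

/* Returns true if the prologue/epilogue must save and restore the register. */
static bool must_save(const isr_facts *f)
{
    if (f->body_touches_reg)    return true;  /* 1 -> 5: save */
    if (f->inlined_touches_reg) return true;  /* 2 -> 5: save */
    if (f->calls_function)      return true;  /* 3 -> 5: callee may clobber it */
    return false;                             /* 4: safe to skip */
}
```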


The main problem is that at the beginning of AVR-GCC development nobody paid attention to the special meaning of r0 and r1.
So r0 was chosen to save SREG and r1 to hold the zero value.

But unfortunately older AVRs need r0 for LPM, and ATmegas need r1:r0 for the MUL instructions.
So the three PUSH/POP pairs are needed to avoid conflicts.

The problem could be solved by using r2 as the zero register and r3 to save SREG.
Then r2 would never be touched, and r3 would only need to be pushed for nested interrupts.
But it seems nobody wants to do so.

Peter


It is true that I have never seen GCC making use of r0 when compiling C statements. But there is a single situation where GCC does use r0, namely when reserving stack space in function prologues:

in r28,__SP_L__
in r29,__SP_H__
sbiw r28,36
in __tmp_reg__,__SREG__
cli
out __SP_H__,r29
out __SREG__,__tmp_reg__
out __SP_L__,r28

This doesn't detract from the points made by the OP, I just wanted to mention that r0 is at least used for something.

-Brad


In my large project:

grep "r0" TestApp.lss | cut -d' ' -f13- | sort

Yields:

	dec	r0
	in	r0, 0x29	; 41
	in	r0, 0x29	; 41
	in	r0, 0x3f	; 63
	in	r0, 0x3f	; 63
	in	r0, 0x3f	; 63
	in	r0, 0x3f	; 63
	in	r0, 0x3f	; 63
	in	r0, 0x3f	; 63
	in	r0, 0x3f	; 63
	in	r0, 0x3f	; 63
	in	r0, 0x3f	; 63
	in	r0, 0x3f	; 63
	in	r0, 0x3f	; 63
	ld	r0, Z+
	ld	r0, Z+
	ld	r0, Z+
	ld	r0, Z+
	mov	r0, r18
	out	0x3f, r0	; 63
	out	0x3f, r0	; 63
	out	0x3f, r0	; 63
	out	0x3f, r0	; 63
	out	0x3f, r0	; 63
	out	0x3f, r0	; 63
	out	0x3f, r0	; 63
	out	0x3f, r0	; 63
	out	0x3f, r0	; 63
	out	0x3f, r0	; 63
	out	0x3f, r0	; 63
	pop	r0
	pop	r0
	push	r0
	push	r0
	sbrs	r0, 0
	sbrs	r0, 0
	st	X+, r0
	st	X+, r0
	st	X+, r0
	st	X+, r0
 	adc	r26, r0
 	and	r0, r0
 	cp	r0, r22
 	elpm	r0, Z+
 	in	r0, 0x3f	; 63
 	in	r0, 0x3f	; 63
 	in	r0, 0x3f	; 63
 	in	r0, 0x3f	; 63
 	ld	r0, Z+
 	lpm	r0, Z+
 	lsr	r0
 	mov	r0, r26
 	mov	r30, r0
 	movw	r24, r0
 	out	0x3f, r0	; 63
 	out	0x3f, r0	; 63
 	out	0x3f, r0	; 63
 	out	0x3f, r0	; 63
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	pop	r0
 	push	r0
 	push	r0
 	st	X+, r0
 	sub	r19, r0

While some of those uses are from inline assembly, I'd still wager that GCC is using it for *something*.

- Dean :twisted:

Make Atmel Studio better with my free extensions. Open source and feedback welcome!


Add "| grep -iv 3f" to that to remove all the __SREG__ saves and restores, and it's a shorter list.

It also seems odd that you have 16 "pop r0" and only 4 "push r0". I guess something else could be getting popped into r0 temporarily, but then there should be some clears. Or maybe the name __tmp_reg__ is used for some pushes?
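Dean's grep and the extra SREG filter can also be folded into one small C sketch; the names are hypothetical. It counts lines that mention r0 as a whole register name and skips SREG accesses. It also tallies push and pop lines separately, so a mismatch like the 16/4 one stands out. Like the grep, it matches only the literal name r0.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int refs;    /* lines mentioning r0, excluding SREG accesses */
    int pushes;  /* of those, "push" lines */
    int pops;    /* of those, "pop" lines */
} r0_stats;

/* True if "r0" appears as a whole register name, so "r30" or "r20" do not count. */
static int mentions_r0(const char *line)
{
    for (const char *p = strstr(line, "r0"); p != NULL; p = strstr(p + 1, "r0")) {
        int starts = (p == line) || !isalnum((unsigned char)p[-1]);
        int ends = !isalnum((unsigned char)p[2]);
        if (starts && ends)
            return 1;
    }
    return 0;
}

/* SREG is I/O address 0x3f; a narrower version of the "grep -iv 3f" filter. */
static int is_sreg_access(const char *line)
{
    return strstr(line, "0x3f") != NULL || strstr(line, "0x3F") != NULL ||
           strstr(line, "__SREG__") != NULL;
}

static r0_stats scan_listing(const char *const *lines, size_t n)
{
    r0_stats s = { 0, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        const char *l = lines[i];
        if (!mentions_r0(l) || is_sreg_access(l))
            continue;
        s.refs++;
        if (strstr(l, "push") != NULL)
            s.pushes++;
        else if (strstr(l, "pop") != NULL)
            s.pops++;
    }
    return s;
}
```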

-Brad


dl8dtl wrote:

avr-gcc-list at nongnu.org is much more likely a source of that knowledge than
here. None of the long-time AVR-GCC hackers reads here.

Except, of course, Joerg and I ...

But, yes, avr-gcc-list is a much better place for this discussion.


danni wrote:

> But it seems nobody wants to do so.

It changes the ABI, so any and all compiled code will be rendered
binary incompatible, without any chance that the linker might complain
about it. Trying to link "old" and "new" object files together will
silently end up in disaster. That's why nobody is taking that lightly.

Jörg Wunsch

Last Edited: Wed. Apr 23, 2008 - 08:12 AM

Eric wrote:

[avr-gcc-list]
> Except, of course, Joerg and I ...

Well, I don't feel like a GCC hacker, really. I don't have the slightest
grasp about GCC RTL and such, tried to understand, but generally failed.

Jörg Wunsch


Is it possible to add a compiler switch to WinAVR?
The default setting would be to use r0 and r1 - no compatibility problems.
When setting the switch, r2 and r3 would be used instead of r0 and r1: tight ISRs, but in some cases incompatible with old code.

I don't know the necessary effort to implement a switch like that, but I guess it isn't impossible.

Regards
Sebastian


Kinda hard to do, albeit not impossible.

Note that you couldn't use the standard library then.

If you ask me: this might be part of a well-planned ABI change if we
really want that, but it should be ensured that the binary incompatible
object files will at least trigger a linker warning.

Jörg Wunsch


So it is unfortunately not only changing __tmp_reg__ and __zero_reg__ definition and changing the ISR prologue/epilogue.
Nice idea of an innocent WinAVR user... bubble burst. Bummer!

Regards
Sebastian


I am sure that mojojojo would like to eliminate his register pushes.

Without altering any switches or compiler re-builds, he can just create some dependencies so that make will:

crunched.s: file.c
	gcc -S -o file.s file.c
	sed -f some_script.sed file.s > crunched.s

file.o: crunched.s
	gcc -c -o file.o crunched.s

He can edit his C files in the normal way. He can either make a crunching rule, or just have a specific dependency for his named files.

I have done similar things to pre-process assembly files or post-process non-standard error files. Once set up, make looks after everything.

David.


After looking at the gcc-4.3.0/gcc/config/avr files, I think the best fix probably has to be along the lines of teaching the GCC internal representation (RTL) to treat __tmp_reg__ (R0) differently. Somewhere, it's excluded from normal allocation policy; and various RTL patterns (sigh) know that, and use it internally ... preventing more general use of R0. The calling conventions have long been clear that GCC can use R0 for any state that doesn't persist across function calls, so that change can't break anything that's not already broken. (And it'd shrink some of those ISRs, since the compiler would be able to use R0 instead of R24.)

Don't over-emphasize the IRQ handler scenarios. Those are informative because they happen to produce code sequences that very obviously demonstrate this particular poor behavior (because of the needless save/restore). They aren't the only cases where R0 handling is flaky though, and not all ISRs can avoid pushing R0 ... they may call some other function (which is allowed to clobber R0), unlike the ones I was looking at.

Re the R1 thing, I'd have a patch for it if I could tell when __udivmodsi4() is used. Although that's likely to be more problematic as a general optimization, for numerous reasons. It could only really help IRQ handlers on systems where MUL instructions (and __udivmodsi4) are not used, so most modern non-Tiny AVRs couldn't use it by default.

@david.prentice: a postprocessor like that really needs to be part of the compiler itself, since analysing the object code to tell whether the R0 push/pop is *needed* is a bit complicated.


I suggested the post-processor because it means you could adapt it for your own purposes.

Yes. It would be your responsibility to remove instructions only for your own satisfaction. I would imagine you are thinking about one or two ISRs. All other code would be left to the compiler. So you just identify and process one or two named code blocks.
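The "named code blocks" approach can be sketched in C as a stand-in for the sed step. crunch() is a hypothetical helper. It copies a .s file through unchanged, except that inside one named function (from its label up to its reti) it drops lines that exactly match a caller-supplied list.

Deciding which lines are safe to drop stays the user's job. In particular, the standard prologue uses r0 to carry SREG, so deleting push r0/pop r0 on their own would be wrong. Lines are assumed to be shorter than 255 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy in -> out. Inside the block that starts at "label:" and ends at the
   first line containing "reti", drop lines equal to any of drop[0..ndrop-1].
   Returns the number of lines dropped. */
static int crunch(FILE *in, FILE *out, const char *label,
                  const char *const *drop, size_t ndrop)
{
    char line[256];
    int in_block = 0, dropped = 0;
    size_t label_len = strlen(label);

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';            /* strip the newline */
        if (!in_block && strncmp(line, label, label_len) == 0 &&
            line[label_len] == ':')
            in_block = 1;                            /* entered the named function */

        int skip = 0;
        if (in_block) {
            for (size_t i = 0; i < ndrop; i++)
                if (strcmp(line, drop[i]) == 0) { skip = 1; break; }
            if (strstr(line, "reti") != NULL)
                in_block = 0;                        /* end of the ISR */
        }
        if (skip)
            dropped++;
        else
            fprintf(out, "%s\n", line);
    }
    return dropped;
}
```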

David.


IMHO, one shouldn't be focused on such a micro-optimization. Removing a push r0 and pop r0 only removes 2 instructions per ISR. Is code space *that* tight? You should either be using a larger chip, or focus on other parts of the AVR back-end to look for better optimization in other areas. Not get obsessed with just these two instructions.

What compiler switches are you using, and specifically optimization type switches?


Quote:
Is code space *that* tight?
With GCC 4.2.2, yes it is -- very much so. With GCC 4.3.0, much less so ... if I prevent bad inlining decisions with "noinline" attributes, it takes three to five percent less space. Currently around 60 bytes free; though I confess I've yet to *run* with code from gcc 4.3.0 (it's on Linux, a handy edit/build environment), since I'm sticking to the latest WinAVR release for code I download and debug. No fancy compiler options, just "-Os" and other standards you'd see in WinAVR checkboxes.

I need that space to allow some longer string constants, and (most important) to handle a few more commands. Larger chip is not in the cards; the boards are already built. (Except that I can prototype on a tiny84 ... if I make sure the "real" code fits in a tiny24.)

Agreed about better optimization in other areas, but as I said: that was just too darn blatant; it's a register that's just not used at all! The shrink from GCC 4.3.0 is attractive, but that doesn't seem "ready" to use yet. Once it's ready, I can look at ways to let the optimizer stages chew on more code ... there seems to be interprocedural optimization going on, but it's missing some obvious things because of functions living in different files.

I've yet to look at how 4.3.0 handles them, but I noticed that 4.2.2 is stupid about global register variables. Rather than loading constants directly into them, it loads into a temp register then moves temp into the global register. Which takes more code space than just using GPIORx for such values. (I got a significant size shrink by moving some key variables into GPIORx registers ...) This code should be able to move several values into registers, since most of them are never used, and I'd think that would provide a bit more shrinkage.


Try these compiler flags:
-mrelax
-fno-inline-small-functions
-fno-tree-scev-cprop
-fno-split-wide-types

Try this, but it could make your code size larger depending on the size of your application
-mcall-prologues

You can also try doing whole program optimization. If so, then use the "Makefile.wpo" template with the 20080411 WinAVR release. It contains the proper flags, but avr-gcc has to be called differently than normal and that Makefile template handles this correctly. Note that whole program optimization does not always reduce code size, again this is application dependent.


There *is* no WinAVR 20080411, at least nothing downloadable ... ;)

Since this fits in 2KB, "-fwhole-program --combine" saves a bunch of bytes (52); I'm glad to know I can make GCC do that without forcing everything into a single file (by hand). The "-fno-inline-small-functions" saved another ten bytes on top of the "noinline" attributes I already had; a bit surprising. "-mrelax" segfaulted, and the other options did nothing (in terms of code size). Looking at the code, the only obvious wastage is with the USI "overflow" ISR saving registers that no code uses; I guess there are still some iffy assumptions being made.

That's probably enough space to finish cramming in what needs to be crammed. Thanks for the tip about whole-program, that's nice.


Instead of "-mrelax", do:
-Wl,--relax
for your *linker* flags.


mojojojo wrote:
There *is* no WinAVR 20080411, at least nothing downloadable ... ;)

Did you look here: https://sourceforge.net/project/... ?


EW wrote:
-Wl,--relax
No difference, on top of the whole-program optimization at least. But no coredumps either.

Hmm, it's back! It was removed for a while, due to (evidently) codegen bugs...

(edit) I noticed one issue with the -fwhole-program --combine *.c compile. All but two of the bytes saved came from removing an embedded copyright string, which should not be removed. And marking it as being __attribute__ ((externally_visible)) does not prevent that from being deleted; contrary to docs, a bug. So it's not really the help I thought it was; too bad.


Unfortunately, 20080411 has now been removed for good due to a code-gen issue. I will re-release soon with the offending patch removed.

Sorry about your experience with -combine -fwhole-program. It's not always perfect.


R1 and R0 are all but hidden from the compiler. They are used in the final instruction sequences; each of these assembler sequences corresponds to a pattern in GCC's internal RTL representation.

A sequence might be only one instruction for byte operations, or four or more instructions for long operations.

(That is why you sometimes see two assembler instructions next to each other that look like they should have been optimised together.)

R1 was used to help with the many situations where a zero is needed, adding two zero-extended chars together being a common one.
This avoids the need for the compiler to use word registers or find a scratch register (which may have been more problematic in the early days).
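The following is a host-side C sketch of why a register that always holds zero is handy in that case. AVR has ADD and ADC but no add-with-carry-immediate. Propagating the carry of a byte addition into the high byte of a zero-extended result therefore needs a register containing 0. The helpers are hypothetical models of ADD and ADC, and the sequence is illustrative rather than the exact code avr-gcc emits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t carry; } sreg_t;   /* only the C flag is modelled */

/* Models ADD Rd,Rr: 8-bit sum, carry flag set on overflow. */
static uint8_t add8(uint8_t a, uint8_t b, sreg_t *sreg)
{
    unsigned sum = (unsigned)a + b;
    sreg->carry = (uint8_t)(sum >> 8);
    return (uint8_t)sum;
}

/* Models ADC Rd,Rr: 8-bit sum plus the incoming carry. */
static uint8_t adc8(uint8_t a, uint8_t b, sreg_t *sreg)
{
    unsigned sum = (unsigned)a + b + sreg->carry;
    sreg->carry = (uint8_t)(sum >> 8);
    return (uint8_t)sum;
}

/* (unsigned int)a + (unsigned int)b done byte-wise. */
static uint16_t add_zext(uint8_t a, uint8_t b)
{
    const uint8_t zero_reg = 0;            /* stands in for r1 / __zero_reg__ */
    sreg_t sreg = { 0 };
    uint8_t hi = 0;                        /* high byte of zero-extended a */
    uint8_t lo = add8(a, b, &sreg);        /* add lo, b */
    hi = adc8(hi, zero_reg, &sreg);        /* adc hi, __zero_reg__ folds in the carry */
    return (uint16_t)((hi << 8) | lo);
}
```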

Then there are other situations where a single-byte temporary is needed to juggle some operations. For example, loading a register pair such as R31:R30 with the contents of memory pointed to by R31:R30. Shifts are another case.
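The R31:R30 case can be sketched the same way, assuming a toy 256-byte RAM and hypothetical helper names. Loading the low byte straight into r30 changes the pointer before the high byte is fetched. Parking the low byte in a scratch register (r0 here) until both loads are done avoids that. This illustrates the ordering problem only; it is not the exact sequence avr-gcc emits.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RAM_SIZE 256u

/* Hypothetical toy machine: a small RAM, the Z pair r31:r30, and r0. */
typedef struct {
    uint8_t ram[TOY_RAM_SIZE];
    uint8_t r0, r30, r31;
} toy_avr;

static uint16_t z_of(const toy_avr *m)
{
    return (uint16_t)((m->r31 << 8) | m->r30);
}

/* Toy RAM: addresses wrap at TOY_RAM_SIZE. */
static uint8_t rd(const toy_avr *m, uint16_t addr)
{
    return m->ram[addr % TOY_RAM_SIZE];
}

/* Wrong order: r30 is overwritten before the high byte is fetched,
   so the second load uses a pointer that has already changed. */
static void load_z_naive(toy_avr *m)
{
    m->r30 = rd(m, z_of(m));
    m->r31 = rd(m, (uint16_t)(z_of(m) + 1));
}

/* With a scratch register: keep the low byte in r0 until Z is no longer needed. */
static void load_z_with_tmp(toy_avr *m)
{
    m->r0  = rd(m, z_of(m));                  /* low byte -> r0 */
    m->r31 = rd(m, (uint16_t)(z_of(m) + 1));  /* high byte, Z still intact */
    m->r30 = m->r0;                           /* move low byte into place */
}
```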

It's true that most of this could probably be done in other ways, treating R0 and R1 as general registers. The ABI could be maintained for R0 by allowing any function to modify it. R1 would be more of a problem, as assembler code no doubt assumes it is zero.

Overall code saving would be small. The best advantage would probably be for long or float operations, since r0..r4 could be used.

Changing gcc to do this would not be too difficult, but testing it would be. It only takes one problem with scratch, spill or reload operations and you're busted, back to square one. None of these cases are easily simulated.

More gain can be achieved in other ways - like using the registers more effectively.


A while ago there was quite a discussion on avr-gcc-list about changing the register allocation algorithm around: http://lists.gnu.org/archive/htm...

It should be noted that R0 and R1 were left alone to not disturb binary compatibility.

Math is cool.
jevinskie.com