ISR __zero_reg__ optimization

So I have a nice ISR, like so:

ISR(TIMER2_OVF_vect)
{
	uint8_t cycle = SSTART_CYCLE;
	cycle--;
	if (~cycle)
	{
		uint8_t tempduty = TEC_DUTY;
		tempduty++;
		uint8_t tempvTEC = vTEC;
		if (tempvTEC <= tempduty)
		{
			TEC_DUTY = tempvTEC;
			CLEARBIT(TEC_INTERRUPT);
		}
		else
		{
			TEC_DUTY = tempduty;
			SSTART_CYCLE = SSTART_FREQ;
		}
	}
	else
	{
		SSTART_CYCLE = cycle;
	}
}

which compiles (at -O3) to:

000000c6 <__vector_11>:
      c6:	1f 92       	push	r1
      c8:	0f 92       	push	r0
      ca:	0f b6       	in	r0, 0x3f	; 63
      cc:	0f 92       	push	r0
      ce:	11 24       	eor	r1, r1
      d0:	8f 93       	push	r24
      d2:	9f 93       	push	r25
      d4:	8a b5       	in	r24, 0x2a	; 42
      d6:	80 91 b3 00 	lds	r24, 0x00B3
      da:	8f 5f       	subi	r24, 0xFF	; 255
      dc:	90 91 37 03 	lds	r25, 0x0337
      e0:	89 17       	cp	r24, r25
      e2:	40 f0       	brcs	.+16     	; 0xf4 <__vector_11+0x2e>
      e4:	90 93 b3 00 	sts	0x00B3, r25
      e8:	80 91 70 00 	lds	r24, 0x0070
      ec:	8e 7f       	andi	r24, 0xFE	; 254
      ee:	80 93 70 00 	sts	0x0070, r24
      f2:	04 c0       	rjmp	.+8      	; 0xfc <__vector_11+0x36>
      f4:	80 93 b3 00 	sts	0x00B3, r24
      f8:	85 e0       	ldi	r24, 0x05	; 5
      fa:	8a bd       	out	0x2a, r24	; 42
      fc:	9f 91       	pop	r25
      fe:	8f 91       	pop	r24
     100:	0f 90       	pop	r0
     102:	0f be       	out	0x3f, r0	; 63
     104:	0f 90       	pop	r0
     106:	1f 90       	pop	r1
     108:	18 95       	reti

All well and good. Now, correct me if I'm wrong, but the generated code doesn't actually use or clobber __zero_reg__ (r1), so the push, eor and pop of it are useless. Are the requirements for __zero_reg__ so strong that GCC won't optimize them away, or is it something the great makers haven't told the compiler it can do?

Edward

The ISR prologues and epilogues are simply hard-coded to always
provide __zero_reg__.

OTOH, omitting it would save you 5 (push r1, eor r1,r1 and pop r1) out of some 45 or so CPU cycles.
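The 5-cycle figure can be checked against the listing above. The sketch below just tallies the prologue and epilogue instructions, assuming the standard cycle counts for a classic AVR core with a 16-bit program counter (an ATmega168-class part is assumed; the thread does not name the device). The constant and function names are made up for illustration.

```c
/* Per-instruction cycle counts for a classic AVR core with a 16-bit
   program counter (assumption: ATmega168-class device). */
enum { C_PUSH = 2, C_POP = 2, C_IN = 1, C_OUT = 1, C_EOR = 1, C_RETI = 4 };

/* push r1; push r0; in r0,0x3f; push r0; eor r1,r1; push r24; push r25 */
int prologue_cycles(void)
{
    return C_PUSH + C_PUSH + C_IN + C_PUSH + C_EOR + C_PUSH + C_PUSH;
}

/* pop r25; pop r24; pop r0; out 0x3f,r0; pop r0; pop r1; reti */
int epilogue_cycles(void)
{
    return C_POP + C_POP + C_POP + C_OUT + C_POP + C_POP + C_RETI;
}

/* Only the __zero_reg__ handling: push r1; eor r1,r1; pop r1 */
int zero_reg_cycles(void)
{
    return C_PUSH + C_EOR + C_POP;
}
```

With these counts the prologue costs 12 cycles and the epilogue 15, of which the __zero_reg__ handling accounts for 5. The rest of the roughly 45 cycles presumably comes from the interrupt response, the vector jump and the ISR body itself.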

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.

Now what would be the best way to omit them? Use ISR_NAKED, and then how do I deal with those pushes and pops in C?

Edward

We need to check how this code compiles with GCC 4.3.0, which is due out soon (it currently has a release candidate). In 4.3.0, the prologues and epilogues were changed to produce RTL instead of fixed assembly, so they have a chance to be optimized. I'm not saying they *will* be optimized... but it should be tested to see whether it produces any better code.

Quote:
Now what would be the best way to omit them? Use ISR_NAKED, and then how do I deal with those pushes and pops in C?

asm volatile ("PUSH r24");

etc. perhaps?

Or, if it's really precious to you, lift the entire ISR and put it in a .S file; in fact, use the listing above as your starting point and then hand-optimize it.

Cliff

If you start doing your own pushing and popping, I think you would want to use asm only. If you keep the ISR in C, how will you know which registers the compiler decided to use? You would have to check the generated code after every compile. I doubt it would change unless you change your ISR code, but you would still have to remember to check it every time you change the ISR (guess what happens when you forget).

I suspect there are bigger fish to fry that would save you more than a few bytes/cycles.

EW: Great to know, will check it out.

clawson and curtvm: I was thinking of using asm only, but since I don't know anything about coding in it, how do I do that and still allow myself (or the compiler) to change the locations of variables without messing up the ISR?

Edward

Jörg Wunsch

Thanks to you guys, I'm now learning assembler and trying to optimize this.
Going through the assembly, I've just noticed:

      d4:	8a b5       	in	r24, 0x2a	; 42
      d6:	80 91 b3 00 	lds	r24, 0x00B3

(snippet taken from the above ISR).
Now it seems to me that "in r24, 0x2a" really does nothing, as r24 is immediately overwritten (there are no loops back to d6). Location 0x2a is a general-purpose I/O register aliased to SSTART_CYCLE, so as I read it the code is pretty much broken. I just want some confirmation before I file a bug report.

Edward

Last Edited: Sat. Mar 1, 2008 - 09:54 PM

I could be wrong, but I think your problem may be this line:

if (~cycle)

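Integer promotion converts the uint8_t operand to int (16 bits on AVR) before ~ is applied, so the upper byte of ~cycle is always 0xFF and the result is never zero: the test is always true. The compiler optimizes the dead branch away, which is why, when you look at the assembler output, the value read from SSTART_CYCLE is never actually used. The read itself (in r24, 0x2a) presumably survives only because SSTART_CYCLE is a volatile register that the compiler must still access.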

Perhaps you should be doing this instead:

if (!cycle)
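The promotion rule is easy to confirm on a host compiler. The sketch below (the helper names are hypothetical) evaluates both conditions for every possible uint8_t value. On a PC int is usually 32 bits rather than AVR's 16, but the conclusion is the same: after promotion, ~cycle always has bits set above bit 7, so it is never zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* The original test "if (~cycle)": cycle is promoted to int before ~,
   so the result always has its high bits set and is never zero. */
bool bitwise_not_test(uint8_t cycle)
{
    return ~cycle != 0;
}

/* The intended test: true only when the countdown has reached zero. */
bool logical_not_test(uint8_t cycle)
{
    return !cycle;
}

/* Count how many of the 256 possible uint8_t values make a test true. */
int count_true(bool (*test)(uint8_t))
{
    int n = 0;
    for (int v = 0; v <= UINT8_MAX; v++)
        if (test((uint8_t)v))
            n++;
    return n;
}
```

count_true(bitwise_not_test) is 256 and count_true(logical_not_test) is 1. Casting the result back, (uint8_t)~cycle, would not help either: it is zero only when cycle is 0xFF, not 0.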

Doh! Yeah, that fixed it. I had bitwise operators on the brain at the time, apparently.
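For sanity-checking the corrected logic off-target, here is a host-side model of the ISR body with if (!cycle) in place of if (~cycle). The struct and function names are hypothetical stand-ins for SSTART_CYCLE, TEC_DUTY, vTEC and the overflow-interrupt enable bit, and SSTART_FREQ = 5 is an assumption read from the ldi r24, 5 in the listing.

```c
#include <stdbool.h>
#include <stdint.h>

#define SSTART_FREQ 5 /* assumed: matches "ldi r24, 5" in the listing */

/* Hypothetical stand-ins for the registers and globals used by the ISR. */
struct soft_start {
    uint8_t cycle;  /* SSTART_CYCLE */
    uint8_t duty;   /* TEC_DUTY */
    uint8_t target; /* vTEC */
    bool irq_on;    /* timer 2 overflow interrupt enable */
};

/* Model of the corrected ISR body. */
void timer2_ovf(struct soft_start *s)
{
    uint8_t cycle = s->cycle;
    cycle--;
    if (!cycle) {
        uint8_t tempduty = s->duty;
        tempduty++;
        if (s->target <= tempduty) {
            s->duty = s->target; /* target reached: stop ramping */
            s->irq_on = false;
        } else {
            s->duty = tempduty;  /* one step up, then restart the countdown */
            s->cycle = SSTART_FREQ;
        }
    } else {
        s->cycle = cycle;        /* still counting down */
    }
}

/* Simulate overflows until the model disables itself; return the count. */
int run_until_done(struct soft_start *s, int limit)
{
    int n = 0;
    while (s->irq_on && n < limit) {
        timer2_ovf(s);
        n++;
    }
    return n;
}
```

Starting with cycle = SSTART_FREQ, duty = 0 and target = 3, the model steps the duty up once every 5 overflows and disables itself on the 15th overflow with the duty equal to the target.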

Edward