bad code generation and ISR optimization

#1

I've been looking into the generated code, as I found it slow. I found this rather alarming post from October:

 

  https://community.atmel.com/forum...

 

No progress has been made since, and no one has been assigned. A very bad situation. This is the thread that needs your attention:

 

  https://gcc.gnu.org/bugzilla/sho...

 

 

In my case I have an ISR and I need to get things done in the first few instructions. When I enable optimization at -O2 or higher, the ISR gets a prologue like this:

 

00000C1C   push	{r4, r5, r6, r7, lr}
00000C1E   mov	r7, r11
00000C20   mov	r6, r10
00000C22   mov	r5, r9
00000C24   mov	r4, r8
00000C26   push	{r4, r5, r6, r7}	

 

Up from this at -O1:

 

00000C88   push	{r4, r5, r6, r7, lr}		 

Which, by the way, could have been done much later in the routine, to great benefit in my case.
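One thing that may be worth trying is to split the handler, so the urgent accesses sit in a tiny function and the register-hungry work lives in a separate out-of-line helper. The sketch below is only an illustration: `fake_port`, `work_result`, `isr_slow_part` and `example_irq_handler` are made-up names, and the "work" is a placeholder. `__attribute__((noinline))` is a real GCC attribute, but whether the handler's prologue actually shrinks has to be checked in the disassembly.

```c
#include <stdint.h>

/* Hypothetical stand-ins: on real hardware fake_port would be a
   memory-mapped peripheral register. */
static volatile uint32_t fake_port;
static volatile int32_t  work_result;

/* Register-hungry part. Kept out of line so that any extra register
   saves it needs happen inside this function, not in the handler. */
__attribute__((noinline))
static void isr_slow_part(uint32_t sample)
{
    int32_t acc = 0;
    for (uint32_t i = 0; i < 8; i++) {
        /* placeholder computation: weighted sum of the low 8 bits */
        acc += (int32_t)((sample >> i) & 1u) * (int32_t)(i + 1);
    }
    work_result = acc;
}

/* Hypothetical handler name; the real one comes from the vector table. */
void example_irq_handler(void)
{
    uint32_t sample = fake_port;        /* time-critical read */
    fake_port = sample | 0x80000000u;   /* time-critical write/acknowledge */
    isr_slow_part(sample);              /* everything else comes afterwards */
}
```

The idea is that the handler itself then only needs a couple of low registers, so the compiler has less reason to save r8-r11 before the first useful instruction.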

 

The other problem, with which you might be able to help, is this:

 

    if (sync_convert_im > 0)
00000CC8   ldr	r3, [pc, #768]
00000CCA   ldr	r3, [r3, #4]
// rest is equal
00000CCC   cmp	r3, #0
00000CCE   ble	#8		

vs the bloated:

   if (sync.convert_im > 0)
00000CC8   ldr	r3, [pc, #772]
00000CCA   adds	r3, #248
00000CCC   ldrb	r3, [r3]
00000CCE   sxtb	r3, r3
// rest is equal
00000CD0   cmp	r3, #0
00000CD2   ble	#8		 

 

Note that:

- the use of structs worsens the code compared to the use of globals!

- the use of bytes gets penalized.

 

So for faster code (at least at level -O1) use 32-bit variables, which results in:

    if (sync.convert_im > 0)
00000CC8   movs	r3, #248
00000CCA   ldr	r2, [pc, #768]
00000CCC   ldr	r3, [r2, r3]
// rest is equal
00000CCE   cmp	r3, #0
00000CD0   ble	#8		 

 

Still worse than globals. So what am I to do? Is there a way to maintain the structs, but get better code?

 

Last Edited: Wed. Feb 10, 2016 - 01:42 PM
#2

If you have special requirements, then hand-coded asm is probably your solution.
You could also try the MBED compiler or IAR to see how they do.
Otherwise, fix the gcc compiler or get an M3/4/7.

It is well known that byte-sized variables cost heavily in code. This is the downside of a 32-bit CPU. There's also the trade-off with structs: packing them saves memory but costs code size and performance, while leaving them unpacked gives smaller, faster code but uses more memory.
The size of the struct can also have an impact. Normally you'd do a literal load for the base address, then an indirect read or write with an offset. I'm not sure what range of offsets is allowed.
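To make the packed/unpacked trade-off concrete, here is a small sketch using GCC's `packed` attribute. The struct and function names are invented for illustration. On the M0, which does not support unaligned accesses, a read of `count` in the packed version typically has to be assembled from single bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: padding is inserted so that count is 4-byte aligned. */
struct unpacked_rec {
    uint8_t  flag;
    uint32_t count;
};

/* Packed layout (GCC/Clang attribute): no padding, count is misaligned. */
struct __attribute__((packed)) packed_rec {
    uint8_t  flag;
    uint32_t count;
};

/* Compare the generated code for these two accessors in the listing. */
uint32_t read_unpacked(const struct unpacked_rec *r) { return r->count; }
uint32_t read_packed(const struct packed_rec *r)     { return r->count; }
```

With a typical GCC target the unpacked struct takes 8 bytes and the packed one 5, so you save 3 bytes of RAM per instance and pay for it on every access.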

Last Edited: Thu. Feb 11, 2016 - 09:32 PM
#3

Thanks Kartman. And thanks for calling attention to the size of the struct. I'll put any arrays at the end. Then I'll fix GCC ;) Anyone using LLVM?
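A rough sketch of what that reordering could look like. Only `convert_im` comes from the listings above; the struct tag, the other members and the helper function are assumptions. As far as I know, the 16-bit Thumb `ldr` with an immediate offset only reaches word offsets 0 to 124 (byte loads 0 to 31), and a sign-extending byte load has no immediate form at all, which would explain the extra instructions at offset 248.

```c
#include <stddef.h>
#include <stdint.h>

struct sync_state {
    /* Hot fields first, as 32-bit values, so their offsets stay small. */
    int32_t  convert_im;     /* from the original code, now at offset 0 */
    int32_t  convert_re;     /* hypothetical second hot field */
    /* Bulk data last, so it does not push the hot fields out of range. */
    uint8_t  buffer[248];    /* hypothetical array */
};

/* With a small offset this can become a single ldr from the base address. */
int sync_needs_convert(const struct sync_state *s)
{
    return s->convert_im > 0;
}
```

The listing still needs checking after the change, since the compiler is free to pick a different addressing sequence.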

 

I found the Keil MDK Extension which would have solved my problems:

 

   https://gallery.atmel.com/Produc...

 

Alas, no luck, it's not available for AS7. Did they have to break it? ;)

 

Yet another really big motive (besides seeing my data live) to move to Keil, I guess. I'll have to take the plunge, but I'd prefer to finish the project first.

 

As this project uses so few advanced features, I bet code compiled for the M3 would work :(  Maybe there is some compiler flag I could try.

#4

I doubt code for the M3 will work on the M0, since the compiler will generate M3-specific instructions. The only way to be sure is to try it. I'd actually look at the generated asm first.