Posted: Oct 15, 2007 - 01:35 PM
Joined: Oct 15, 2007
Posts: 3
Location: Wheat Ridge, CO 80033
This long post may be cooked down to three questions:
1) Can we get GCC to optimize RCALL/RET to RJMP?
2) Can we get GCC to optimize LDI/OR to SBR?
(and likewise LDI/AND to CBR?)
3) Can we reduce or eliminate the long list of PUSHes at ISR entry?
My most recent AVR project was to "fix" a Tiny26 assembly-language program that I had to take on "cold" -- that is, no documentation other than the .asm file, no contact with earlier developer(s?) possible, ...
After stumbling around a bit, I decided the best tactic was to "de-compile" the assembly language into its equivalent C program, both for my own comprehension as well as for discussability [warning -- that might not be a real English word!] with the client (this was their only AVR project, they were C programmers, etc.)
Happy enough ending (so far -- toi toi toi, knock on wood, how do you say it in Norsk?), we found the code path that was doing all the right things (the earlier developers were not total dummies, after all) but not setting the right LED color on in one circumstance (understandable oversight that should have been caught in testing). In the C "pseudocode" it was immediately obvious that the LED setting was following logic with the wrong priority of conditions evaluated; in assembly it was hopeless to figure that out.
So since then, on my own time, I thought it would be interesting to see if the whole project could be ported from assembly into C this way, if my C-ish decompiling could be "recompiled" back into the original code.
The answer I have so far seems to be "almost" (which is, of course, not good enough if the original project filled the available code space). The AVR-GCC compiler (so far as I have figured out how to use it) costs me extra instructions in 3 ways:
ISR entry has a long prologue of register saving PUSH instructions.
Function calls from tail ends of functions could be just RJMP, but compile to RCALL/RET instead.
Single-bit setting/clearing could be SBR or CBR but compiles to LDI/OR or LDI/AND instead.
I have particular examples appended after my signature block. I would be interested to know who/where might be more interested in them, or whether IAR C compiler (or other) does much better.
Best regards,
--
Peter F. Klammer, P.E. / NETRONICS Professional Engineering, Inc.
3200 Routt Street / Wheat Ridge, Colorado 80033-5452
(303)915-2673 / f:(303)274-6182 / e:PKlammer@NETRONICS-PE.com
" Net:Working Programmable Electronics!"
---
Here is an example snippet of the code I was given ...
Code:
toggle_yellow:
    ; toggle only when tick_cnt is zero. So, we get 0.5Hz
    tst tick_cnt
    breq toggle_y
    ret
toggle_y:
    sbrs misc_flag,error_state
    rjmp toggle_y_on
    cbr misc_flag,(1<<error_state) ; toggle to off
    rjmp yellow_led_off
    // return through yellow_led_off
toggle_y_on:
    sbr misc_flag,(1<<error_state) ; toggle to on
    rjmp yellow_led_on
    // return through yellow_led_on
9 instructions.
Here is how I reverse-engineered the code with renaming and C-ish comments:
Code:
;; ! void vLEDsYellowBlinking( void ) {
vLEDsYellowBlinking:
;; toggle only when bTimeTicks is zero. So, we get 0.5Hz
;; !     if( bTimeTicks NE 0 ) {
    tst bTimeTicks
    breq EndIf_27
If_27:
;; !         return ;
    ret
;; !     }
EndIf_27:
;; !     if( fIsBlinkingLEDonNow ) {
    sbrs bFlags1,nIsBlinkingLEDonNow
    rjmp ElseIf_28
If_28:
;; !         fIsBlinkingLEDonNow = FALSE ;
    cbr bFlags1,(1<<nIsBlinkingLEDonNow) ; toggle to off
;; !         vLEDsYellowOff() ;
    rjmp vLEDsYellowOff
    // return through vLEDsYellowOff
;; !     } else {
ElseIf_28:
;; !         fIsBlinkingLEDonNow = TRUE ;
    sbr bFlags1,(1<<nIsBlinkingLEDonNow) ; toggle to on
;; !         vLEDsYellowOn() ;
    rjmp vLEDsYellowOn
    // return through vLEDsYellowOn
EndIf_28:
;; !     }
vLEDsYellowBlinking_end:
;; ! }
9 instructions.
If I just keep my C-ish comments, I have "de-compiled" the code into C:
Code:
void vLEDsYellowBlinking( void ) {
    if( bTimeTicks NE 0 ) {
        return ;
    }
    if( fIsBlinkingLEDonNow ) {
        fIsBlinkingLEDonNow = FALSE ;
        vLEDsYellowOff() ;
    } else {
        fIsBlinkingLEDonNow = TRUE ;
        vLEDsYellowOn() ;
    }
}
12 lines of C.
Compiled with -O0 :
Code:
351: void vLEDsYellowBlinking( void ) {
+000000AF: 93CF PUSH R28 Push register on stack
+000000B0: 93DF PUSH R29 Push register on stack
+000000B1: B7CD IN R28,0x3D In from I/O location
+000000B2: B7DE IN R29,0x3E In from I/O location
352: if( bTimeTicks NE 0 ) {
+000000B3: 2F83 MOV R24,R19 Copy register
+000000B4: 2388 TST R24 Test for Zero or Minus
+000000B5: F469 BRNE PC+0x0E Branch if not equal
355: if( fIsBlinkingLEDonNow ) {
+000000B6: 2D84 MOV R24,R4 Copy register
+000000B7: 7880 ANDI R24,0x80 Logical AND with immediate
+000000B8: 2388 TST R24 Test for Zero or Minus
+000000B9: F029 BREQ PC+0x06 Branch if equal
356: fIsBlinkingLEDonNow = FALSE ;
+000000BA: 2D84 MOV R24,R4 Copy register
+000000BB: 778F ANDI R24,0x7F Logical AND with immediate
+000000BC: 2E48 MOV R4,R24 Copy register
357: vLEDsYellowOff() ;
+000000BD: D034 RCALL PC+0x0035 Relative call subroutine
+000000BE: C004 RJMP PC+0x0005 Relative jump
359: fIsBlinkingLEDonNow = TRUE ;
+000000BF: 2D84 MOV R24,R4 Copy register
+000000C0: 6880 ORI R24,0x80 Logical OR with immediate
+000000C1: 2E48 MOV R4,R24 Copy register
360: vLEDsYellowOn() ;
+000000C2: D01A RCALL PC+0x001B Relative call subroutine
+000000C3: 91DF POP R29 Pop register from stack
+000000C4: 91CF POP R28 Pop register from stack
+000000C5: 9508 RET Subroutine return
23 instructions.
Compiled with -O2 :
Code:
351: void vLEDsYellowBlinking( void ) {
+0000002D: 2333 TST R19 Test for Zero or Minus
+0000002E: F429 BRNE PC+0x06 Branch if not equal
355: if( fIsBlinkingLEDonNow ) {
+0000002F: FC47 SBRC R4,7 Skip if bit in register cleared
+00000030: C004 RJMP PC+0x0005 Relative jump
359: fIsBlinkingLEDonNow = TRUE ;
+00000031: E880 LDI R24,0x80 Load immediate
+00000032: 2A48 OR R4,R24 Logical OR
360: vLEDsYellowOn() ;
+00000033: DFF3 RCALL PC-0x000C Relative call subroutine
+00000034: 9508 RET Subroutine return
356: fIsBlinkingLEDonNow = FALSE ;
+00000035: E78F LDI R24,0x7F Load immediate
+00000036: 2248 AND R4,R24 Logical AND
357: vLEDsYellowOff() ;
+00000037: DFF2 RCALL PC-0x000D Relative call subroutine
+00000038: 9508 RET Subroutine return
12 instructions.
Compiled with -O3 :
Code:
351: void vLEDsYellowBlinking( void ) {
+00000069: 2333 TST R19 Test for Zero or Minus
+0000006A: F431 BRNE PC+0x07 Branch if not equal
+0000006B: FC47 SBRC R4,7 Skip if bit in register cleared
+0000006C: C005 RJMP PC+0x0006 Relative jump
+0000006D: E880 LDI R24,0x80 Load immediate
+0000006E: 2A48 OR R4,R24 Logical OR
+0000006F: 9AD4 SBI 0x1A,4 Set bit in I/O register
+00000070: 98DC CBI 0x1B,4 Clear bit in I/O register
+00000071: 9508 RET Subroutine return
+00000072: E78F LDI R24,0x7F Load immediate
+00000073: 2248 AND R4,R24 Logical AND
+00000074: 9AD4 SBI 0x1A,4 Set bit in I/O register
+00000075: 9ADC SBI 0x1B,4 Set bit in I/O register
+00000076: 9508 RET Subroutine return
14 instructions.
Compiled with -Os :
Code:
351: void vLEDsYellowBlinking( void ) {
+00000027: 2333 TST R19 Test for Zero or Minus
+00000028: F459 BRNE PC+0x0C Branch if not equal
355: if( fIsBlinkingLEDonNow ) {
+00000029: FE47 SBRS R4,7 Skip if bit in register set
+0000002A: C005 RJMP PC+0x0006 Relative jump
356: fIsBlinkingLEDonNow = FALSE ;
+0000002B: E78F LDI R24,0x7F Load immediate
+0000002C: 2248 AND R4,R24 Logical AND
385: vPIN_SET_OUTPUT( YellowLED_bar ) ;
+0000002D: 9AD4 SBI 0x1A,4 Set bit in I/O register
386: vPIN_bar_DEASSERT( YellowLED_bar ) ;
+0000002E: 9ADC SBI 0x1B,4 Set bit in I/O register
+0000002F: 9508 RET Subroutine return
359: fIsBlinkingLEDonNow = TRUE ;
+00000030: E880 LDI R24,0x80 Load immediate
+00000031: 2A48 OR R4,R24 Logical OR
379: vPIN_SET_OUTPUT( YellowLED_bar ) ;
+00000032: 9AD4 SBI 0x1A,4 Set bit in I/O register
380: vPIN_bar_ASSERT( YellowLED_bar ) ;
+00000033: 98DC CBI 0x1B,4 Clear bit in I/O register
+00000034: 9508 RET Subroutine return
There are two optimizations that AVR-GCC seems to be missing:
1) Tail-return function calls: RCALL/RET can always be replaced by RJMP, no?
2) Register single-bit bitfield operations: SBR is better than LDI/OR, and CBR beats LDI/AND, no?
Besides that, ISR entry has many PUSH instructions that I have not determined yet how to suppress.

Posted: Oct 15, 2007 - 01:50 PM
Joined: Jul 18, 2005
Posts: 34381
Location: (using avr-gcc in) Finchingfield, Essex, England
1) No, I don't think so - I already tried to find a way to do this to make a "tight" bootloader where a lot of functions ended with a CALL/RET and I just wanted it to JUMP - I couldn't find a way to force this kind of optimisation - so I'll be interested if you locate something.
2) If you locate your bit vars in the 0x00..0x1F I/O registers then the compiler will use CBI and SBI to change individual bits. I typically use something like:
Code:
#include <avr/io.h>   /* for _SFR_MEM_ADDR() and TWAR */

typedef struct
{
    unsigned char bit0:1;
    unsigned char bit1:1;
    unsigned char bit2:1;
    unsigned char bit3:1;
    unsigned char bit4:1;
    unsigned char bit5:1;
    unsigned char bit6:1;
    unsigned char bit7:1;
} io_reg;

#define BIT_buffer_status    ((volatile io_reg*)_SFR_MEM_ADDR(TWAR))->bit0
#define BIT_recv_error       ((volatile io_reg*)_SFR_MEM_ADDR(TWAR))->bit1
#define BIT_active           ((volatile io_reg*)_SFR_MEM_ADDR(TWAR))->bit2
#define BIT_use_sample1      ((volatile io_reg*)_SFR_MEM_ADDR(TWAR))->bit3
#define BIT_byte_count       ((volatile io_reg*)_SFR_MEM_ADDR(TWAR))->bit4
#define BIT_receive_132      ((volatile io_reg*)_SFR_MEM_ADDR(TWAR))->bit5
#define BIT_wait_uart_low    ((volatile io_reg*)_SFR_MEM_ADDR(TWAR))->bit6
#define BIT_RS232_calibrated ((volatile io_reg*)_SFR_MEM_ADDR(TWAR))->bit7
Then "BIT_RS232_calibrated = 1" will be compiled to a single SBI etc. This relies on picking an "unused" read/write 8-bit register in the 0x00..0x1F range. Some of the modern AVRs have dedicated GPIORn registers located in this range to be used in this way (but TWAR on a mega16 in this example does the job).
3) Any registers (and SREG) used in an ISR will be push/pop'd, but it helps the compiler a LOT if all the code in the ISR is self-contained (or provided by 'static inline' functions in the same file). Otherwise, as soon as an ISR calls another function, the compiler has no choice but to push/pop pretty much all the registers that may be clobbered by that function (it can't analyse which ones the function actually uses because the function could well be in a separate compilation unit).
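To illustrate point 3, here is a minimal sketch, assuming an ATtiny26 and avr-libc's ISR() macro. The names tick_count, tick_update and timer0_overflow_handler, and the choice of the Timer0 overflow vector, are made up for illustration. Because the helper is 'static inline' and lives in the same file, the compiler can see exactly what the handler touches and should only need to save those registers.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/interrupt.h>
#endif

/* Hypothetical counter shared between the ISR and the main loop. */
static volatile uint8_t tick_count;

/* static inline, same file: the body is visible to the compiler,
 * so the handler does not need to call out to an unknown function. */
static inline void tick_update(void)
{
    tick_count++;
}

#ifdef __AVR__
/* Assumed vector: Timer0 overflow on the ATtiny26. */
ISR(TIMER0_OVF0_vect)
#else
/* Stand-in so the sketch also builds on a PC. */
void timer0_overflow_handler(void)
#endif
{
    tick_update();
}
```

Moving tick_update() into a separate source file (without 'static inline') is the case where the long list of PUSHes would be expected to come back.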

Posted: Oct 15, 2007 - 02:02 PM
Joined: Dec 08, 2004
Posts: 4502
Location: Nova Scotia, Canada
If you call any functions as part of servicing an ISR, you can avoid many (but not always all) register pushes/pops by eliminating the function calls. For example, it may be possible to convert some of the functions into macros.
Alternatively, if you cannot convert all of the functions used by the ISR into macros (or if always inlining them would result in missed optimization opportunities in terms of eliminating duplicated code which is shared between the ISR and the mainline), then you should obtain some benefit by declaring any such functions as static and defining them within the same compilation unit (C source file) as any ISRs and mainline code which call them.
In terms of the SBR and CBR instructions (actually pseudonyms for the ORI and ANDI instructions, respectively), remember that they can only operate on registers r16 through r31. If the target of the operation resides in r0 through r15, then there is no alternative but to use the LDI/OR or LDI/AND formulation. So maybe there's justification in this case to call into question GCC's reasons for allocating the affected variable in r4 in the first place.
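As a small illustration of that equivalence, the following sketch models the two pseudo-instructions as plain C helpers. The function names sbr() and cbr() are only chosen to mirror the mnemonics; they are not part of any library.

```c
#include <stdint.h>

/* SBR Rd,K assembles to ORI Rd,K: set the bits given in mask. */
static inline uint8_t sbr(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg | mask);
}

/* CBR Rd,K assembles to ANDI Rd,(0xFF - K): clear the bits given in mask. */
static inline uint8_t cbr(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg & (uint8_t)~mask);
}
```

With a mask of 0x80 (bit 7), sbr() ORs in 0x80 and cbr() ANDs with 0x7F, which are the same constants that appear in the LDI/OR and LDI/AND pairs in the listings above.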
Alternatively, as Cliff says, you might be able to give the compiler some help by explicitly placing such variables in unused "low" I/O registers. In the ATtiny26, a good candidate (for example, if you're not using the internal EEPROM) might be EEDR. As another alternative, it might be desirable to switch up to the pin-compatible ATtiny261, which actually has a number of general-purpose I/O register slots set aside for specifically that sort of function.
As for the RCALL/RET issue:
I don't see it showing up anywhere in the -O3 or -Os cases, where GCC apparently inlined the call to the function vLEDsYellowOff() for you automatically.

Posted: Oct 15, 2007 - 03:16 PM
Joined: Dec 20, 2002
Posts: 6697
Location: Dresden, Germany
> 1) Can we get GCC to optimize RCALL/RET to RJMP?
Yes, you can. The magic keyword is called `linker relaxations'. You
can trigger it by passing --relax to the linker. By now, you have to
use -Wl,--relax when calling the linker through the compiler driver.
Future GCC versions have been taught that the AVR supports linker
relaxations, so the simpler -mrelax option will be available for them.
_________________ Jörg Wunsch
Please don't send me PMs, use email instead.
Please read the `General information...' article before.

Posted: Oct 15, 2007 - 03:21 PM
Joined: Jul 18, 2005
Posts: 34381
Location: (using avr-gcc in) Finchingfield, Essex, England
Jörg,
Wow, thanks for that - I looked high and low for a way to do this!
(and welcome back - I was starting to worry that you hadn't been here for quite a while!)

Posted: Oct 15, 2007 - 03:38 PM
Joined: Dec 20, 2002
Posts: 6697
Location: Dresden, Germany
> Wow, thanks for that - I looked high and low for a way to do this!
It's been part of Björn Haase's work, who also provided us the
ATmega256x patch.
> (and welcome back - I was starting to worry that you hadn't been
> here for quite a while!)
Apart from the usual ``busy with many things'', it's autumn school
vacation time here in Germany now, so I've been off for a week.
It's the first school vacation for my 6-year-old son, so we enjoyed
a two-day bicycle trip down the Elbe valley (among some other things).
_________________ Jörg Wunsch

Posted: Oct 22, 2007 - 03:26 AM
Joined: Oct 15, 2007
Posts: 3
Location: Wheat Ridge, CO 80033
dl8dtl wrote:
> > 1) Can we get GCC to optimize RCALL/RET to RJMP?
> Yes, you can. The magic keyword is called `linker relaxations'. You
> can trigger it by passing --relax to the linker. By now, you have to
> use -Wl,--relax when calling the linker through the compiler driver.
> Future GCC versions have been taught that the AVR supports linker
> relaxations, so the simpler -mrelax option will be available for them.
Q: is it possible to accomplish this through AVR Studio?
I've just about got a C port that's shorter & faster than the ASM! Yes, I got the one-instruction SBR/CBR (ORI/ANDI) when I changed the register number assignment. Thanks, all!
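For anyone following along, here is a minimal sketch of what such a register change might look like, assuming the flags byte is declared as a GCC global register variable (the declaration itself is not shown in this thread). The register r16 and the helper names vBlinkFlagSet and vBlinkFlagClear are assumptions for illustration only; whether r16 is actually safe to reserve depends on the rest of the project and the libraries it links against.

```c
#include <stdint.h>

#define nIsBlinkingLEDonNow 7   /* bit number, as seen in the listings */

#ifdef __AVR__
/* Assumption: r16 is free for this purpose in the project.
 * A register in r16..r31 is what ORI/ANDI can operate on directly;
 * r4 (as in the listings) forces the LDI/OR and LDI/AND pairs. */
register uint8_t bFlags1 __asm__("r16");
#else
uint8_t bFlags1;                /* plain global when built on a PC */
#endif

/* Set the "blinking LED is on" flag bit. */
static inline void vBlinkFlagSet(void)
{
    bFlags1 |= (uint8_t)(1u << nIsBlinkingLEDonNow);
}

/* Clear the "blinking LED is on" flag bit. */
static inline void vBlinkFlagClear(void)
{
    bFlags1 &= (uint8_t)~(1u << nIsBlinkingLEDonNow);
}
```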

Posted: Oct 22, 2007 - 11:33 AM
Joined: Jul 18, 2005
Posts: 34381
Location: (using avr-gcc in) Finchingfield, Essex, England
Any such change is always possible in Studio simply by ticking the "use external makefile" box and using a "proper" makefile.

Posted: Oct 22, 2007 - 12:34 PM
Joined: Dec 20, 2002
Posts: 6697
Location: Dresden, Germany
But to be fair, they've also got a GUI item to pass additional options
during the compile and/or the link stage.
_________________ Jörg Wunsch

Posted: Oct 22, 2007 - 02:23 PM
Joined: Sep 21, 2005
Posts: 2111
I always use the Studio custom options to do stuff like this (I sure don't need to screw up make files on top of everything else I can possibly do wrong).
I have added '-Wl,--relax' to the linker options, but still end up with rcall/ret. I'm guessing my version (still using winavr20060421) is an uptight relaxer, or I have the above option wrong.

Posted: Oct 22, 2007 - 02:54 PM
Joined: Jul 18, 2005
Posts: 34381
Location: (using avr-gcc in) Finchingfield, Essex, England
I just tried adding this to a Studio project but got an unhandled exception when trying to build. So I then switched to a command line and used an Mfile-based Makefile with:
Code:
LDFLAGS = -Wl,-Map=$(TARGET).map,--cref
LDFLAGS += -nostdlib
LDFLAGS += -Wl,--relax
LDFLAGS += $(EXTMEMOPTS)
LDFLAGS += $(patsubst %,-L%,$(EXTRALIBDIRS))
LDFLAGS += $(PRINTF_LIB) $(SCANF_LIB) $(MATH_LIB)
and this also led to an "unhandled exception", which leads me to believe that:
Code:
N:\fred>avr-ld -v
GNU ld version 2.17 + coff-avr-patch (20050630)
may have a problem with --relax
(and it is the --relax doing it: if I comment out that --relax then the code builds just fine)

Posted: Oct 22, 2007 - 03:03 PM
Joined: Sep 21, 2005
Posts: 2111
mine is - GNU ld version 2.16.1 + coff-avr-patch (20050630), no crashes, but no relaxing either
my Studio generated make file has 'LDFLAGS += -Wl,--relax', so I think that's correct

Posted: Oct 22, 2007 - 03:16 PM
Joined: Sep 21, 2005
Posts: 2111
It could be my 'test' is wrong:
Code:
void func1(void){
    asm volatile("nop");
}
void func2(void){
    func1();
}
int main(void){
    func2();
    return 0;
}
(compiled with -Os), I end up with a rcall/ret in func2. That should be changed to a rjmp (if the linker is willing, anyway), correct?
I may have to put this in the useful tricks file until I update to the latest version of winavr.

Posted: Oct 22, 2007 - 03:19 PM
Joined: Dec 20, 2002
Posts: 6697
Location: Dresden, Germany
> mine is - GNU ld version 2.16.1 + coff-avr-patch (20050630), no
> crashes, but no relaxing either
That's for sure too old. Björn Haase's relax patch came in later.
No idea why it crashes for Cliff. Any chance to run it through a
debugger? (I just tried it here on a Linux installation, and it works
as expected there for a mini application I'm just working on.)
_________________ Jörg Wunsch

Posted: Oct 22, 2007 - 03:22 PM
Joined: Dec 20, 2002
Posts: 6697
Location: Dresden, Germany
> (compiled with -Os), I end up with a rcall/ret in func2.
I end up with two functions, both containing just a NOP. ;-)
> That should be changed to a rjmp (if the linker is willing, anyway), correct?
I don't think the relaxations are doing that already. I think all they are
doing so far is replacing CALL/JMP by RCALL/RJMP when noticing the target
is within range for the shorter relative instructions.
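To keep the compiler from inlining func1 away at -Os, a variant of the test along these lines might help. This is only a sketch: it assumes GCC's noinline attribute, and call_count is a made-up side effect so that the callee has a body. It merely keeps the call at the tail of func2 visible in the listing; it does not by itself turn RCALL/RET into RJMP.

```c
#include <stdint.h>

/* Hypothetical side effect so the callee is not empty. */
static volatile uint8_t call_count;

/* noinline asks GCC to keep func1 as a separate function. */
__attribute__((noinline)) void func1(void)
{
    call_count++;
}

/* The call to func1 is the last thing func2 does (a tail call). */
void func2(void)
{
    func1();
}
```

Disassembling func2 with and without -Wl,--relax should then show whether the call at its tail is changed at all.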
_________________ Jörg Wunsch

Posted: Oct 22, 2007 - 04:00 PM
Joined: Jul 18, 2005
Posts: 34381
Location: (using avr-gcc in) Finchingfield, Essex, England
Jörg,
Without symbols for the copy of ld.exe I'm using, I'm not sure how useful this will be, but a look at EAX shows it to contain 0, so this is a classic access through a NULL pointer error.
Cliff

Posted: Oct 22, 2007 - 10:48 PM
Joined: Dec 20, 2002
Posts: 6697
Location: Dresden, Germany
Yeah Cliff, you're right, not useful at all without the symbolic debugging
information. I've got no clues about PE-COFF symbolic debugging... Perhaps
there's a GDB version that could debug these? It would be interesting to
know whether Eric can also reproduce that crash.
_________________ Jörg Wunsch

Posted: Oct 22, 2007 - 11:51 PM
Joined: Mar 01, 2001
Posts: 4227
Location: Rocky Mountains
dl8dtl wrote:
> It would be interesting to
> know whether Eric can also reproduce that crash.
Somebody email me the test case and I'll try it.

Posted: Oct 23, 2007 - 08:27 AM
Joined: Dec 20, 2002
Posts: 6697
Location: Dresden, Germany
> Somebody email me the test case and I'll try it.
It ought to be enough to use a standard Makefile, and add
LDFLAGS += -Wl,--relax
to it (as Cliff's snippet suggests). He also uses -nostdlib, no idea
whether that would have any influence on the crash.
_________________ Jörg Wunsch

Posted: Oct 23, 2007 - 11:17 AM
Joined: Jul 18, 2005
Posts: 34381
Location: (using avr-gcc in) Finchingfield, Essex, England
I'd already removed the -nostdlib and found that it still crashes but...
Ah ha, got it! I thought it couldn't have been as simple as the --relax alone. I checked and found that the other "non standard" thing I was doing in this bootloader code, built for the mega16 to try and stay within 512 bytes, was -mshort-calls. It's an interaction between -mshort-calls and --relax that causes the crash. Either alone works fine (but I kind of need the benefit of both - I'd abandoned the development of this C based 901 compatible bootloader when it got to 490 bytes even with -mshort-calls)
Cliff