ISR prologue/epilogue optimization in avr-gcc


Sort of a question about how avr-gcc optimizes ISRs:

 

It seems like r0, r1, and SREG always get saved and restored whether they need it or not. This adds up to 14 cycles to an interrupt.

 

Here's an example of a super-minimal ISR that doesn't use r0, r1, or sreg:

 

ISR(HAL_OVF_VECTOR)
{
  a6:   1f 92           push    r1
  a8:   0f 92           push    r0
  aa:   0f b6           in  r0, 0x3f    ; 63
  ac:   0f 92           push    r0
  ae:   11 24           eor r1, r1
  b0:   8f 93           push    r24
    ready=1;
  b2:   81 e0           ldi r24, 0x01   ; 1
  b4:   80 93 20 02     sts 0x0220, r24
}
  b8:   8f 91           pop r24
  ba:   0f 90           pop r0
  bc:   0f be           out 0x3f, r0    ; 63
  be:   0f 90           pop r0
  c0:   1f 90           pop r1
  c2:   18 95           reti

 

Here's a more realistic scenario. It does use SREG, but it doesn't use r1, and r24 or r25 could have been used to save/restore SREG. The compiler could have saved 9 cycles out of ~30 by not pushing and popping r0 and r1.

 

ISR(HAL_PWM_OVF_VECTOR)
{
  a6:   1f 92           push    r1
  a8:   0f 92           push    r0
  aa:   0f b6           in  r0, 0x3f    ; 63
  ac:   0f 92           push    r0
  ae:   11 24           eor r1, r1
  b0:   8f 93           push    r24
  b2:   9f 93           push    r25
    HAL_TRACE_PORT ^= HAL_TRACE_PIN;
  b4:   95 b1           in  r25, 0x05   ; 5
  b6:   80 e2           ldi r24, 0x20   ; 32
  b8:   89 27           eor r24, r25
  ba:   85 b9           out 0x05, r24   ; 5
}
  bc:   9f 91           pop r25
  be:   8f 91           pop r24
  c0:   0f 90           pop r0
  c2:   0f be           out 0x3f, r0    ; 63
  c4:   0f 90           pop r0
  c6:   1f 90           pop r1
  c8:   18 95           reti
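
As a rough check of the 9-cycle figure, here is a small Python tally of the second listing. The cycle counts are an assumption (classic megaAVR timing: PUSH/POP 2, IN/OUT/EOR/LDI 1, RETI 4 on parts with a 16-bit program counter); XMEGA and newer cores differ, and interrupt entry latency is not included.

```python
# Assumed cycle counts for a classic megaAVR core (not XMEGA/AVRxt).
CYCLES = {"push": 2, "pop": 2, "in": 1, "out": 1, "eor": 1, "ldi": 1, "reti": 4}


def cycles(mnemonics):
    """Sum the assumed cycle cost of a sequence of mnemonics."""
    return sum(CYCLES[m] for m in mnemonics)


# Mnemonics of the HAL_PWM_OVF_VECTOR listing, in program order.
isr = [
    "push", "push", "in", "push", "eor", "push", "push",  # prologue
    "in", "ldi", "eor", "out",                            # body
    "pop", "pop", "pop", "out", "pop", "pop",             # epilogue
    "reti",
]

# Instructions that exist only for r0/r1:
# push r1, push r0, eor r1,r1, pop r0, pop r1.
# (The SREG save itself would move to r24, so its cost stays.)
r0_r1_only = ["push", "push", "eor", "pop", "pop"]

total = cycles(isr)
saved = cycles(r0_r1_only)
print(total, saved)  # prints: 31 9
```

Under those assumptions the ISR costs 31 cycles and the r0/r1 handling accounts for 9 of them.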

 

I found some speculation on Stack Overflow that r0 and r1 are "not tracked by GCC's register allocator":

 

http://stackoverflow.com/questio...

 

Is there a good reason for this that would be obvious to someone more closely involved with gcc? It seems like this could be a pretty simple and self-contained optimization to do after assembly generation. I guess I'd be open to educating myself to contribute a patch if that's what it takes...

 

I'm on gcc version 4.9.2

 

edit: title

Last Edited: Sun. Oct 9, 2016 - 02:39 AM

The "preamble" and "postamble" are usually referred to as the prologue and epilogue, respectively.  The reason R0 and R1 need to be saved is that avr-gcc uses R0 as __tmp_reg__ and R1 as __zero_reg__.  An ISR cannot be permitted to alter them.  See this page for an explanation/example of why this is necessary.  This page also describes "NAKED" ISRs, where the programmer provides the prologue and epilogue code.

 

Greg Muth

Portland, OR, US

Xplained Boards mostly

Atmel Studio 7.0 on Windows 10 VM hosted by Ubuntu 17

 


I understand that any registers that the ISR uses must be saved on the stack and restored before reti().

 

The issue here is that gcc is saving registers that were NOT used in the ISR

 

 

The ISR_NAKED attribute is a good direction. It would work, but I'd like to avoid creating a handwritten prologue/epilogue because that forces me to either

a) write the rest of the ISR in inline assembly as well

b) check the generated assembly listing to be sure the compiler didn't behave in an unanticipated way

 

I see both those options as making my code less portable and more difficult for others to edit, so I'd like to stay reliant on the compiler as much as possible

Last Edited: Sun. Oct 9, 2016 - 02:36 AM

The issue here is that gcc is saving registers that were NOT used in the ISR

I don't think that gcc "understands" ISRs sufficiently to optimize the register saves.  It has its idea of the function call ABI, and which registers are caller-saved vs callee-saved, and as far as it's concerned, an ISR function is just a function.   Any additional register saving (R0 and R1 in particular) that isn't normally part of the ABI has to be done by the prologue, which is all assembler and not "further optimizable."

(although, it does know somehow that more registers have to be saved from the ISR than from normal code, when calling an additional function from the ISR...)

 


The prologues and epilogues of ISRs are one of avr-gcc's failings.

Another workaround is to write the ISR in plain assembly.

International Theophysical Year seems to have been forgotten..
Anyone remember the song Jukebox Band?


If/when I have an interrupt routine (or function) that is (or might be) impacting time-critical operation, I will analyze the output of the C code, then rewrite it in assembler to eliminate the cruft.

 

Edit: and I usually leave the (commented out) original C code in the file so that someone later can understand what needs to happen in the routine.

Last Edited: Sun. Oct 9, 2016 - 11:54 AM

The issue here is that gcc is saving registers that were NOT used in the ISR

I misunderstood your post.

Greg Muth

Portland, OR, US

Xplained Boards mostly

Atmel Studio 7.0 on Windows 10 VM hosted by Ubuntu 17

 


If you want reasonable ISR speed, write it in plain ASM, and reserve some low registers for that ISR only.


sparrow2 wrote:
If you want reasonable ISR speed, write it in plain ASM, and reserve some low registers for that ISR only.

Here we go again.

 

I guess I'd need to agree with that -- >>IF<< you are locked into GCC.  As I've often said, CodeVision's "smart" ISR peeks to see what if anything is altered, including SREG flags.  And CV's global "register" variables when indeed bound to low GP registers are of great advantage.  I've posted examples over the years, including this past week. http://www.avrfreaks.net/comment...

 

 

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Sun. Oct 9, 2016 - 02:08 PM

Read this:

 

https://gcc.gnu.org/bugzilla/sho...

 

Nothing has changed since that comment in 2005.


clawson wrote:
Read this:

 

https://gcc.gnu.org/bugzilla/sho...

 

Nothing has changed since that comment in 2005.

Is the task really as far-reaching as Haase suggests?

It seems to me the thing to do would be to hold off prologue and epilogue generation until after the rest of the ISR code has been generated. That code would tell the compiler what registers need to be saved.

 

If one wants it badly enough and cannot get it from avr-gcc, a post-processor for the assembly might be in order. The Python or Awk code would do what I suggested the compiler do.

Edit: more like the reverse: test to see what can be removed.
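
A very rough sketch of such a "what can be removed" test, in Python, working on an objdump-style listing like the ones in the first post. Everything here is an assumption for illustration: the function name `removable_saves`, the exact prologue/epilogue pattern, and the two instruction sets, which are deliberately short and conservative. It only reports; it does not rewrite anything, and a real tool would also have to follow branches and calls.

```python
import re

# Instructions assumed NOT to modify SREG (short whitelist; anything else
# is treated as flag-changing).
FLAG_NEUTRAL = {"push", "pop", "ldi", "lds", "sts", "ld", "ldd", "st", "std",
                "in", "out", "mov", "movw", "sbi", "cbi", "nop"}
# Instructions that touch r0/r1 implicitly or leave the ISR: assume "used".
IMPLICIT_R0_R1 = {"mul", "muls", "mulsu", "fmul", "fmuls", "fmulsu",
                  "lpm", "elpm", "spm", "call", "rcall", "icall", "eicall"}

# The fixed prologue/epilogue seen in the gcc 4.9.2 listings above.
PROLOGUE = [("push", ["r1"]), ("push", ["r0"]), ("in", ["r0", "0x3f"]),
            ("push", ["r0"]), ("eor", ["r1", "r1"])]
EPILOGUE = [("pop", ["r0"]), ("out", ["0x3f", "r0"]), ("pop", ["r0"]),
            ("pop", ["r1"]), ("reti", [])]

# "  a6:   1f 92           push    r1" -> mnemonic and operand text.
LINE = re.compile(r"^\s*[0-9a-f]+:\s+(?:[0-9a-f]{2} )+\s*(\w+)\s*([^;]*)")


def parse(listing):
    """Extract (mnemonic, [operands]) pairs; source lines are skipped."""
    insns = []
    for line in listing.splitlines():
        m = LINE.match(line)
        if m:
            ops = [o.strip() for o in m.group(2).split(",") if o.strip()]
            insns.append((m.group(1), ops))
    return insns


def removable_saves(listing):
    """Report which saves look unnecessary, or None if the shape is unknown."""
    insns = parse(listing)
    n, m = len(PROLOGUE), len(EPILOGUE)
    if insns[:n] != PROLOGUE or insns[-m:] != EPILOGUE:
        return None  # unfamiliar prologue/epilogue: leave it alone
    body = insns[n:-m]
    uses_r0_r1 = any(mn in IMPLICIT_R0_R1 or "r0" in ops or "r1" in ops
                     for mn, ops in body)
    touches_sreg = any(mn not in FLAG_NEUTRAL or "0x3f" in ops
                       for mn, ops in body)
    return {"r0_r1": not uses_r0_r1, "sreg": not touches_sreg}


LISTING = """
ISR(HAL_OVF_VECTOR)
{
  a6:   1f 92           push    r1
  a8:   0f 92           push    r0
  aa:   0f b6           in  r0, 0x3f    ; 63
  ac:   0f 92           push    r0
  ae:   11 24           eor r1, r1
  b0:   8f 93           push    r24
    ready=1;
  b2:   81 e0           ldi r24, 0x01   ; 1
  b4:   80 93 20 02     sts 0x0220, r24
}
  b8:   8f 91           pop r24
  ba:   0f 90           pop r0
  bc:   0f be           out 0x3f, r0    ; 63
  be:   0f 90           pop r0
  c0:   1f 90           pop r1
  c2:   18 95           reti
"""

print(removable_saves(LISTING))
```

For the first ISR in this thread both entries come back True, meaning neither the r0/r1 saves nor the SREG save look necessary. If only `r0_r1` is True, the SREG save would still have to be moved to another register before r0 could be dropped.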

International Theophysical Year seems to have been forgotten..
Anyone remember the song Jukebox Band?

Last Edited: Sun. Oct 9, 2016 - 05:24 PM

OP is using GCC, not CV.

 

And yes, it seems that CV does a better job with ISRs.

I have played with CV (a long time ago, and only the free version), and for simple and "direct" code it does a way better job than GCC (because it's written for the AVR), but I have to say that GCC's overall optimization is better (it is a more intelligent compiler).


Michael, they aren't going to change the fundamental way GCC works just for AVR. So anything it requires has to fit in with the compilation mechanism for ARM and x86 - they can't just add new phases for AVR.

Having said that LTO and the plugins it uses are probably designed for exactly this (and it wasn't around in 2005) so if someone wants to explore LTO for AVR this may be the solution.


avr-gcc treats registers R0 and R1 as fixed, which means that the register allocator will never use them, and no liveness information for these registers is available.  These registers are just used during final asm printing if a scratch or zero register is handy.  T flag is handled the same way, and the condition code is not modeled as a register at all.  For reduced Tiny, R16 / R17 are used instead.

 

There is nothing in GCC that prevents you from handling R0, R1 and SREG (including T flag) as registers that are available for register allocation and data flow analysis.  Apart from the fact that introducing such a change basically means a rewrite of GCC's avr back end, and doing it close to optimal will increase the effort substantially, I doubt it would noticeably improve overall code performance.  It's not unlikely the overall performance would decrease, but you won't even know before accomplishing the work...

 

Hence, for the time being, you'll have to bear with that avr-gcc implementation detail, or switch to naked functions for ISRs (which implies inline asm), or switch compiler brand — or do the avr-gcc source transition.

 

If you prefer documentation over guessing, the GCC wiki explains how R0 and R1 are implemented: http://gcc.gnu.org/wiki/avr-gcc#...

 

avrfreaks does not support Opera. Profile inactive.

Last Edited: Sun. Oct 9, 2016 - 06:06 PM

Georg, what about LTO? Could that be used to analyse code (specifically blocks ending RETI) and spot when the SREG or R1 preservation stuff could be removed? 


LTO has absolutely nothing to do with this issue; it just might have a more global view of the matter.  Even with the current implementation it would be feasible to add bookkeeping for R0, R1, SREG and T, but it would also be quite some effort and error prone.  The problem is that usage of these registers might highly depend on the context, e.g. which value is used in an addition, if the source register overlaps the destination, if the source register dies after the insn, how the condition code is supposed to be used, etc.
 

avrfreaks does not support Opera. Profile inactive.


I simply meant that LTO might give an opportunity at the very end of the build to mop up such "missed optimisation". Perhaps it cannot be used for this kind of mechanism? 


I'd expect that most of LTO operates at an "intermediate code" level that doesn't see individual AVR instructions (and doesn't know that R0/1 are "registers" or that the ISR prologue/epilogue does anything).

 


How could that be? The input to the linker (ELF .o and .a files) is already binary. All the "internal" stuff like GIMPLE is long gone at this stage of the process.


Nope, the compiler actually streams and reads back IR. In fact, if you compile with -fno-fat-lto-objects (if it isn't the already configured default), then there won't be any binary code at all in the ELF object file :).

Regards

Senthil

 

blog | website


saaadhu wrote:

Nope, the compiler actually streams and reads back IR. In fact, if you compile with -fno-fat-lto-objects (if it isn't the already configured default), then there won't be any binary code at all in the ELF object file :).

Now I'm totally lost.  "IR"?  InfraRed?  Internal Representation?

 

While ELF might indeed carry some baggage, as far as I can tell it doesn't have a relocation table or similar  (LOC86, anyone?)

https://en.wikipedia.org/wiki/Ex...

In computing, the Executable and Linkable Format (ELF, formerly called Extensible Linking Format) is a common standard file format for executables, object code, shared libraries, and core dumps.

So, what does what the compiler does have to do with what the linker does?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


theusch wrote:
Internal Representation?
That's where my money was headed ;-)

 

I know Senthil knows about 1,000 times more about this stuff than any of us here, but it does come as something of a shock to learn that ELF contains not plain binary sections (.data, .text, .eeprom, etc) but some kind of "internal representation". Obviously when ELF from a compilation (-c) is viewed it has to contain the binary itself and, because all the unresolved branches, calls and jumps are to a 0 offset, some "fixup" tables too that say what needs resolving at link time. But I always imagined the output of the assembler (during compilation) was just plain AVR opcodes, not some kind of compiler internal representation of it. If that's the case, it means that libbfd and the objdump and objcopy tools that use it are a whole heap more clever than I had ever imagined, as they must have some kind of "internal representation" to "plain binary" conversion system so that when, for example, the .hex is extracted from the ELF it is doing more than just picking up the "binary blob" that is .text (and .data).


I meant internal representation. All the gnu.lto* sections you see below hold serialized IR. No, binutils doesn't know anything about what's inside them. When you link with -flto, the compiler driver sequences deserialization to get back IR for the translation unit, "actual" code generation and assembly before passing on the "real" .o file(s) to the linker.

 

$ cat simple.c

int main() { return 0; }

$ avr-gcc -flto simple.c -c

$ avr-objdump -h simple.o

simple.o:     file format elf32-avr

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000012  00000000  00000000  00000034  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000046  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000046  2**0
                  ALLOC
  3 .gnu.lto_.inline.122e9b53350ba447 0000001f  00000000  00000000  00000046  2**0
                  CONTENTS, READONLY, EXCLUDE
  4 .gnu.lto_main.122e9b53350ba447 000000ee  00000000  00000000  00000065  2**0
                  CONTENTS, READONLY, EXCLUDE
  5 .gnu.lto_.symbol_nodes.122e9b53350ba447 00000021  00000000  00000000  00000153  2**0
                  CONTENTS, READONLY, EXCLUDE
  6 .gnu.lto_.refs.122e9b53350ba447 0000000f  00000000  00000000  00000174  2**0
                  CONTENTS, READONLY, EXCLUDE
  7 .gnu.lto_.decls.122e9b53350ba447 000000c4  00000000  00000000  00000183  2**0
                  CONTENTS, READONLY, EXCLUDE
  8 .gnu.lto_.symtab.122e9b53350ba447 00000014  00000000  00000000  00000247  2**0
                  CONTENTS, READONLY, EXCLUDE
  9 .gnu.lto_.opts 00000047  00000000  00000000  0000025b  2**0
                  CONTENTS, READONLY, EXCLUDE
 10 .comment      00000031  00000000  00000000  000002a2  2**0
                  CONTENTS, READONLY

 

If you compile with slim LTO objects (-fno-fat-lto-objects), you don't even get code in the .text section:

$ avr-gcc -flto -fno-fat-lto-objects simple.c -c
$ avr-objdump -h simple.o

simple.o:     file format elf32-avr

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000000  00000000  00000000  00000034  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000034  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000034  2**0
                  ALLOC
  3 .gnu.lto_.inline.4aabd39e75da184c 00000020  00000000  00000000  00000034  2**0
                  CONTENTS, READONLY, EXCLUDE
  4 .gnu.lto_main.4aabd39e75da184c 000000ed  00000000  00000000  00000054  2**0
                  CONTENTS, READONLY, EXCLUDE
  5 .gnu.lto_.symbol_nodes.4aabd39e75da184c 0000001e  00000000  00000000  00000141  2**0
                  CONTENTS, READONLY, EXCLUDE
  6 .gnu.lto_.refs.4aabd39e75da184c 0000000f  00000000  00000000  0000015f  2**0
                  CONTENTS, READONLY, EXCLUDE
  7 .gnu.lto_.decls.4aabd39e75da184c 00000128  00000000  00000000  0000016e  2**0
                  CONTENTS, READONLY, EXCLUDE
  8 .gnu.lto_.symtab.4aabd39e75da184c 00000014  00000000  00000000  00000296  2**0
                  CONTENTS, READONLY, EXCLUDE
  9 .gnu.lto_.opts 000000ad  00000000  00000000  000002aa  2**0
                  CONTENTS, READONLY, EXCLUDE
 10 .comment      0000002a  00000000  00000000  00000357  2**0
                  CONTENTS, READONLY
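
To tell the two kinds of object apart without reading the table by eye, something like the following Python sketch could be run over the `avr-objdump -h` text. The function name `classify` and the "fat"/"slim" labels are just illustrative, and the rule is an assumption based on the two listings above: `.gnu.lto_*` sections mean IR is present, and a non-empty `.text` means machine code is present too. A fat object for a file with no functions would be misreported as slim.

```python
def classify(objdump_h):
    """Return 'fat', 'slim' or 'non-LTO' from `avr-objdump -h` output text."""
    sizes = {}
    for line in objdump_h.splitlines():
        parts = line.split()
        # Section rows look like: Idx Name Size VMA LMA File-off Algn
        if len(parts) >= 7 and parts[0].isdigit():
            sizes[parts[1]] = int(parts[2], 16)
    has_ir = any(name.startswith(".gnu.lto_") for name in sizes)
    if not has_ir:
        return "non-LTO"
    return "fat" if sizes.get(".text", 0) > 0 else "slim"


# Rows copied from the two listings above (other sections omitted).
FAT = """\
  0 .text         00000012  00000000  00000000  00000034  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  4 .gnu.lto_main.122e9b53350ba447 000000ee  00000000  00000000  00000065  2**0
                  CONTENTS, READONLY, EXCLUDE
"""
SLIM = """\
  0 .text         00000000  00000000  00000000  00000034  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  4 .gnu.lto_main.4aabd39e75da184c 000000ed  00000000  00000000  00000054  2**0
                  CONTENTS, READONLY, EXCLUDE
"""

print(classify(FAT), classify(SLIM))
```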

Regards

Senthil

 

blog | website


OK so I looked at this:

$ cat avr.c
#include <avr/io.h>

int foo(int, long);

int main(void) {
    DDRB = 0xFF;
    while(1) {
        PORTB ^= 0xFF;
        PORTD = foo(12345, 0xBABEFACE);
    }
}
$ avr-gcc -c -mmcu=atmega16 -Os avr.c -o avr.elf
$ ls -l avr.elf
-rw-r--r-- 1 uid23021 domain_users 760 Oct 18 14:56 avr.elf

That's a completely fictitious program that calls a function that does not exist. I created a .elf but it is unlinked (I only used -c) and it is 760 bytes. If I use avr-readelf...

$ avr-readelf -a avr.elf
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Atmel AVR 8-bit microcontroller
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          132 (bytes into file)
  Flags:                             0x85
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         8
  Section header string table index: 5

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 00001e 00  AX  0   0  1
  [ 2] .rela.text        RELA            00000000 0002e0 000018 0c      6   1  4
  [ 3] .data             PROGBITS        00000000 000052 000000 00  WA  0   0  1
  [ 4] .bss              NOBITS          00000000 000052 000000 00  WA  0   0  1
  [ 5] .shstrtab         STRTAB          00000000 000052 000031 00      0   0  1
  [ 6] .symtab           SYMTAB          00000000 0001c4 0000d0 10      7  11  4
  [ 7] .strtab           STRTAB          00000000 000294 00004c 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

There are no program headers in this file.

Relocation section '.rela.text' at offset 0x2e0 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
00000016  00000c12 R_AVR_CALL        00000000   foo + 0
0000001c  00000203 R_AVR_13_PCREL    00000000   .text + 4

There are no unwind sections in this file.

Symbol table '.symtab' contains 13 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS avr.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 0000003f     0 NOTYPE  LOCAL  DEFAULT  ABS __SREG__
     6: 0000003e     0 NOTYPE  LOCAL  DEFAULT  ABS __SP_H__
     7: 0000003d     0 NOTYPE  LOCAL  DEFAULT  ABS __SP_L__
     8: 00000034     0 NOTYPE  LOCAL  DEFAULT  ABS __CCP__
     9: 00000000     0 NOTYPE  LOCAL  DEFAULT  ABS __tmp_reg__
    10: 00000001     0 NOTYPE  LOCAL  DEFAULT  ABS __zero_reg__
    11: 00000000    30 FUNC    GLOBAL DEFAULT    1 main
    12: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND foo

No version information found in this file.

Also:

$ avr-objdump -x avr.elf

avr.elf:     file format elf32-avr
avr.elf
architecture: avr:5, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000001e  00000000  00000000  00000034  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000052  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000052  2**0
                  ALLOC
SYMBOL TABLE:
00000000 l    df *ABS*	00000000 avr.c
00000000 l    d  .text	00000000 .text
00000000 l    d  .data	00000000 .data
00000000 l    d  .bss	00000000 .bss
0000003f l       *ABS*	00000000 __SREG__
0000003e l       *ABS*	00000000 __SP_H__
0000003d l       *ABS*	00000000 __SP_L__
00000034 l       *ABS*	00000000 __CCP__
00000000 l       *ABS*	00000000 __tmp_reg__
00000001 l       *ABS*	00000000 __zero_reg__
00000000 g     F .text	0000001e main
00000000         *UND*	00000000 foo

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
00000016 R_AVR_CALL        foo
0000001c R_AVR_13_PCREL    .text+0x00000004
$ avr-objdump -S avr.elf

avr.elf:     file format elf32-avr

Disassembly of section .text:

00000000 <main>:
   0:	8f ef       	ldi	r24, 0xFF	; 255
   2:	87 bb       	out	0x17, r24	; 23
   4:	88 b3       	in	r24, 0x18	; 24
   6:	80 95       	com	r24
   8:	88 bb       	out	0x18, r24	; 24
   a:	89 e3       	ldi	r24, 0x39	; 57
   c:	90 e3       	ldi	r25, 0x30	; 48
   e:	4e ec       	ldi	r20, 0xCE	; 206
  10:	5a ef       	ldi	r21, 0xFA	; 250
  12:	6e eb       	ldi	r22, 0xBE	; 190
  14:	7a eb       	ldi	r23, 0xBA	; 186
  16:	0e 94 00 00 	call	0	; 0x0 <main>
  1a:	82 bb       	out	0x12, r24	; 18
  1c:	00 c0       	rjmp	.+0      	; 0x1e <__zero_reg__+0x1d>

 

So here are the 760 bytes:

Offset      0  1  2  3  4  5  6  7   8  9  A  B  C  D  E  F

00000000   7F 45 4C 46 01 01 01 00  00 00 00 00 00 00 00 00    ELF
00000010   01 00 53 00 01 00 00 00  00 00 00 00 00 00 00 00     S
00000020   84 00 00 00 85 00 00 00  34 00 00 00 00 00 28 00   „   …   4     (
00000030   08 00 05 00 8F EF 87 BB  88 B3 80 95 88 BB 89 E3        ˆ³€•ˆ»‰ã
00000040   90 E3 4E EC 5A EF 6E EB  7A EB 0E 94 00 00 82 BB    ãNìZïnëzë ”  ‚»
00000050   00 C0 00 2E 73 79 6D 74  61 62 00 2E 73 74 72 74    À .symtab .strt
00000060   61 62 00 2E 73 68 73 74  72 74 61 62 00 2E 72 65   ab .shstrtab .re
00000070   6C 61 2E 74 65 78 74 00  2E 64 61 74 61 00 2E 62   la.text .data .b
00000080   73 73 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ss
00000090   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
000000A0   00 00 00 00 00 00 00 00  00 00 00 00 20 00 00 00
000000B0   01 00 00 00 06 00 00 00  00 00 00 00 34 00 00 00               4
000000C0   1E 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00
000000D0   00 00 00 00 1B 00 00 00  04 00 00 00 00 00 00 00
000000E0   00 00 00 00 E0 02 00 00  18 00 00 00 06 00 00 00       à
000000F0   01 00 00 00 04 00 00 00  0C 00 00 00 26 00 00 00               &
00000100   01 00 00 00 03 00 00 00  00 00 00 00 52 00 00 00               R
00000110   00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00
00000120   00 00 00 00 2C 00 00 00  08 00 00 00 03 00 00 00       ,
00000130   00 00 00 00 52 00 00 00  00 00 00 00 00 00 00 00       R
00000140   00 00 00 00 01 00 00 00  00 00 00 00 11 00 00 00
00000150   03 00 00 00 00 00 00 00  00 00 00 00 52 00 00 00               R
00000160   31 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00   1
00000170   00 00 00 00 01 00 00 00  02 00 00 00 00 00 00 00
00000180   00 00 00 00 C4 01 00 00  D0 00 00 00 07 00 00 00       Ä   Ð
00000190   0B 00 00 00 04 00 00 00  10 00 00 00 09 00 00 00
000001A0   03 00 00 00 00 00 00 00  00 00 00 00 94 02 00 00               ”
000001B0   4C 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00   L
000001C0   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
000001D0   00 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00 
000001E0   04 00 F1 FF 00 00 00 00  00 00 00 00 00 00 00 00     ñÿ
000001F0   03 00 01 00 00 00 00 00  00 00 00 00 00 00 00 00 
00000200   03 00 03 00 00 00 00 00  00 00 00 00 00 00 00 00 
00000210   03 00 04 00 07 00 00 00  3F 00 00 00 00 00 00 00           ?
00000220   00 00 F1 FF 10 00 00 00  3E 00 00 00 00 00 00 00     ñÿ    >
00000230   00 00 F1 FF 19 00 00 00  3D 00 00 00 00 00 00 00     ñÿ    =
00000240   00 00 F1 FF 22 00 00 00  34 00 00 00 00 00 00 00     ñÿ"   4
00000250   00 00 F1 FF 2A 00 00 00  00 00 00 00 00 00 00 00     ñÿ*
00000260   00 00 F1 FF 36 00 00 00  01 00 00 00 00 00 00 00     ñÿ6
00000270   00 00 F1 FF 43 00 00 00  00 00 00 00 1E 00 00 00     ñÿC
00000280   12 00 01 00 48 00 00 00  00 00 00 00 00 00 00 00       H
00000290   10 00 00 00 00 61 76 72  2E 63 00 5F 5F 53 52 45        avr.c __SRE
000002A0   47 5F 5F 00 5F 5F 53 50  5F 48 5F 5F 00 5F 5F 53   G__ __SP_H__ __S
000002B0   50 5F 4C 5F 5F 00 5F 5F  43 43 50 5F 5F 00 5F 5F   P_L__ __CCP__ __
000002C0   74 6D 70 5F 72 65 67 5F  5F 00 5F 5F 7A 65 72 6F   tmp_reg__ __zero
000002D0   5F 72 65 67 5F 5F 00 6D  61 69 6E 00 66 6F 6F 00   _reg__ main foo
000002E0   16 00 00 00 12 0C 00 00  00 00 00 00 1C 00 00 00 
000002F0   03 02 00 00 04 00 00 00                                    

The ELF header itself tells us it occupies the first 0x34 (52) bytes. The .text is 0x1E bytes at 0x0034. The 0x31 bytes from 0x0052 onwards are the .shstrtab. The 0xD0 bytes at 0x1C4 are the .symtab. The 0x4C bytes at 0x294 are the .strtab. The 0x18 bytes at 0x02E0 are the .rela.text that has details of the fixups to be made at link time.

 

So the only "unknown" bit in there is from 0x0083 to 0x1C3. Is that the "internal representation" ?

 

EDIT: except I notice it said:

Start of section headers:          132 (bytes into file)
Size of section headers:           40 (bytes)

132 = 0x84. So 0x84 to 0xAC at least are the section headers.
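
The full extent of the section header table can be worked out from three ELF header fields. A small Python sketch, using the first 52 bytes of the hex dump above and the standard ELF32 header layout (the variable names are mine):

```python
import struct

# First 0x34 bytes of avr.elf, copied from the hex dump above.
header = bytes.fromhex(
    "7F454C46010101000000000000000000"
    "01005300010000000000000000000000"
    "84000000850000003400000000002800"
    "08000500"
)

# ELF32 header, little endian: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx.
fields = struct.unpack("<16sHHIIIIIHHHHHH", header)
e_shoff, e_shentsize, e_shnum = fields[6], fields[11], fields[12]

table_end = e_shoff + e_shentsize * e_shnum  # first byte after the table
print(hex(e_shoff), hex(table_end - 1))      # prints: 0x84 0x1c3
```

So the 8 section headers of 40 bytes each run from 0x84 to 0x1C3, right up to the .symtab at 0x1C4. That accounts for the whole "unknown" region; there is no internal representation in this (non-LTO) object.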

Last Edited: Tue. Oct 18, 2016 - 02:44 PM

clawson wrote:

OK so I looked at this:

$ cat avr.c
#include <avr/io.h>

int foo(int, long);

int main(void) {
    DDRB = 0xFF;
    while(1) {
        PORTB ^= 0xFF;
        PORTD = foo(12345, 0xBABEFACE);
    }
}
$ avr-gcc -c -mmcu=atmega16 -Os avr.c -o avr.elf
$ ls -l avr.elf
-rw-r--r-- 1 uid23021 domain_users 760 Oct 18 14:56 avr.elf

You're missing the -flto command line flag. Turn it on and you should see the gnu.lto* sections under Section Headers:

Regards

Senthil

 

blog | website


Oh I see - so that "carries over" the internal rep into the binary then?

 

Ah yes...

  [ 4] .gnu.lto_.profile PROGBITS        00000000 000034 000013 00   p  0   0  1
  [ 5] .gnu.lto_.jmpfunc PROGBITS        00000000 000047 000041 00   p  0   0  1
  [ 6] .gnu.lto_.inline. PROGBITS        00000000 000088 000026 00   p  0   0  1
  [ 7] .gnu.lto_.purecon PROGBITS        00000000 0000ae 000012 00   p  0   0  1
  [ 8] .gnu.lto_main.b7d PROGBITS        00000000 0000c0 0001c8 00   p  0   0  1
  [ 9] .gnu.lto_.symbol_ PROGBITS        00000000 000288 000035 00   p  0   0  1
  [10] .gnu.lto_.refs.b7 PROGBITS        00000000 0002bd 00000f 00   p  0   0  1
  [11] .gnu.lto_.decls.b PROGBITS        00000000 0002cc 000218 00   p  0   0  1
  [12] .gnu.lto_.symtab. PROGBITS        00000000 0004e4 000027 00   p  0   0  1
  [13] .gnu.lto_.opts    PROGBITS        00000000 00050b 000046 00   p  0   0  1

and size jumps from 760 bytes to 3,148 bytes.

Last Edited: Tue. Oct 18, 2016 - 03:30 PM

Yes, when compiling, the flag instructs gcc to write the IR into the binary. When linking, the flag tells gcc to deserialize and perform code gen with IR from all translation units. See https://gcc.gnu.org/onlinedocs/g... for the gory details.

Regards

Senthil

 

blog | website


it does come as something of a shock to learn that ELF contains not plain binary sections (.data, .text, .eeprom, etc) but some kind of "internal representation".

Only with the -flto option.  My "interpretation" is that -flto stops the "compile" step at the end of pass N (some form of internal representation), rather than going all the way to N+M (assembler/binary) before passing control on to the linker.  It freaked me out not so much because of what would be in the .o files (let's face it; an ELF-format file can hold anything!), but because it meant that link was now doing a LOT of things that I'd more traditionally think of as being done by a compiler...

 

(Hmm.  Now I'm wondering how -flto handles C source with inline assembler...)

 


Clearly it does both, on the basis that you might then use the binary in a non-LTO link, so there's both the half-complete internal representation and a "final" copy too, as before.

 

I suppose that now, to know whether the internal representation could be used for spotting and removing the unnecessary register preservation in ISRs, one would need to explore the form of that internal representation.

 

When I suggested LTO above I had a much more naive view of how it might work. I was thinking more of a kind of "pattern matching" process that could just look at generated ISR binary (that is code between an entry point label and a RETI opcode) and somehow determine if the "bit in the middle" might be doing anything that affected R1 and SREG or not and then ditch the R1 and/or SREG preservation stuff if you found it not to affect them. But I guess it's more complex (but possibly easier) if you can "see" the reason why the code was generated at this stage?
 

Last Edited: Tue. Oct 18, 2016 - 05:54 PM

clawson wrote:

When I suggested LTO above I had a much more naive view of how it might work. I was thinking more of a kind of "pattern matching" process that could just look at generated ISR binary (that is code between an entry point label and a RETI opcode) and somehow determine if the "bit in the middle" might be doing anything that affected R1 and SREG or not and then ditch the R1 and/or SREG preservation stuff if you found it not to affect them. But I guess it's more complex (but possibly easier) if you can "see" the reason why the code was generated at this stage?

 

That would be more like linker relaxation. Would be too much work to disassemble the entire ISR and do that, I guess. Right now, the relaxation logic uses relocs to find out "interesting" locations to look at (say calls or jumps), disassembles just a few bytes around that location to see what it can do, and deletes/writes back new data there.

Regards

Senthil

 

blog | website


westfw wrote:

it does come as something of a shock to learn that ELF contains not plain binary sections (.data, .text, .eeprom, etc) but some kind of "internal representation".

Only with the -flto option.  My "interpretation" is that -flto stops the "compile" step at the end of pass N (some form of internal representation), rather than going all the way to N+M (assembler/binary) before passing control on to the linker.  It freaked me out not so much because of what would be in the .o files (let's face it; an ELF-format file can hold anything!), but because it meant that link was now doing a LOT of things that I'd more traditionally think of as being done by a compiler...

 

(Hmm.  Now I'm wondering how -lto handles C source with inline assembler...)

 

 

Well, the driver invokes the compiler and assembler again for each object file with LTO'd data, before finally invoking the linker for the actual "link". The actual linker doesn't know or care about this at all. The only help binutils provides is a plugin that lets gcc look inside library archives - this way, gcc gets to do LTO on object files from them as well.

 

Oh, and the streaming is done at the GIMPLE level, so inline assembly would simply be a gimple_asm node.

Regards

Senthil

 

blog | website


FYI, GNU tools now can generate better code for such situations:

 

#include <avr/io.h>
#include <avr/interrupt.h>
#include <util/atomic.h>

uint8_t volatile ready;

ISR (__vector_1)
{
  ready = 1;
}

ISR (__vector_2)
{
  PORTD ^= 1;
}

ISR (__vector_3, ISR_NOBLOCK)
{
  ATOMIC_BLOCK (ATOMIC_RESTORESTATE)
  {
    uint8_t r = ready;
    ready = (r << 4) | (r >> 4);
  }
}

ISR (__vector_4)
{
  __asm ("inc r2" ::: "r2");
}

int main;

$ avr-gcc-8 -mmcu=atmega8 isr.c -Os && avr-objdump -d a.out

 

00000048 <__vector_1>:
  48:   8f 93           push    r24
  4a:   81 e0           ldi     r24, 0x01       ; 1
  4c:   80 93 62 00     sts     0x0062, r24     ; 0x800062 <ready>
  50:   8f 91           pop     r24
  52:   18 95           reti

00000054 <__vector_2>:
  54:   8f 93           push    r24
  56:   8f b7           in      r24, 0x3f       ; 63
  58:   8f 93           push    r24
  5a:   9f 93           push    r25
  5c:   82 b3           in      r24, 0x12       ; 18
  5e:   91 e0           ldi     r25, 0x01       ; 1
  60:   89 27           eor     r24, r25
  62:   82 bb           out     0x12, r24       ; 18
  64:   9f 91           pop     r25
  66:   8f 91           pop     r24
  68:   8f bf           out     0x3f, r24       ; 63
  6a:   8f 91           pop     r24
  6c:   18 95           reti

0000006e <__vector_3>:
  6e:   78 94           sei
  70:   8f 93           push    r24
  72:   9f 93           push    r25
  74:   9f b7           in      r25, 0x3f       ; 63
  76:   f8 94           cli
  78:   80 91 62 00     lds     r24, 0x0062     ; 0x800062 <ready>
  7c:   82 95           swap    r24
  7e:   80 93 62 00     sts     0x0062, r24     ; 0x800062 <ready>
  82:   9f bf           out     0x3f, r25       ; 63
  84:   9f 91           pop     r25
  86:   8f 91           pop     r24
  88:   18 95           reti

0000008a <__vector_4>:
  8a:   2f 92           push    r2
  8c:   2f b6           in      r2, 0x3f        ; 63
  8e:   2f 92           push    r2
  90:   23 94           inc     r2
  92:   2f 90           pop     r2
  94:   2f be           out     0x3f, r2        ; 63
  96:   2f 90           pop     r2
  98:   18 95           reti

 

You'll need GCC that implements PR20296 (future v8) and Binutils that implement PR21683, e.g. 2.29.

avrfreaks does not support Opera. Profile inactive.


Thanks Georg-Johann, your contributions to avr-gcc are turning this into a really excellent C compiler!!
 


Some things will no longer work, such as clobbering SREG by writing to its memory location. If such or similarly weird code is used, the feature has to be turned off with -mno-gas-isr-prologues, or per function with the no_gccisr attribute.  The GCC online docs should catch up within the next few days.

 

Maybe someone has spare time and would enjoy trying to break it with legal use cases.

avrfreaks does not support Opera. Profile inactive.


SprinterSB wrote:
ISR (__vector_2) { PORTD ^= 1; }

Out of curiosity, is there no extra baggage with e.g. PIND = 1; to do only an SBI?

 

Bad example -- how about PIND |= 1: (which is another bad example)

 

Sigh -- PORTD |= 1;

 

[I'm asking whether you can get an ISR with a single instruction, e.g. SBI]

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


> Out of curiosity, is there no extra baggage with e.g. PIND = 1; to do only an SBI?

 

If so, it's up to the user to write PIND |= 1.

 

And no, adding information about all SFRs of all > 200 MCUs to the compiler is not something anyone is going to implement and support.

 

Moreover, replacing PORTD ^= 1 by PIND |= 1 is just plain wrong from the language perspective.

avrfreaks does not support Opera. Profile inactive.


SprinterSB wrote:
If so, it's up to the user to write PIND |= 1.

I think this has been discussed before.  A real grey area -- the programmer calls for an RMW on PIND, which if carried out would also toggle other set bits in the PIN register.
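
To make the grey area concrete, here is a small host-side model (plain C, not AVR code) of a port whose PIN register toggles PORT bits when a 1 is written to it, as on the ATmega88. The port_model type and the helper names are made up for illustration, and the model assumes every pin is an output, so that reading PIN returns the PORT latch.

```c
#include <stdint.h>

/* Toy model of one I/O port with a write-one-to-toggle PIN register.
   Assumption: all pins are outputs, so a PIN read returns the PORT latch. */
typedef struct { uint8_t port; } port_model;

uint8_t pin_read(const port_model *p)
{
    return p->port;
}

/* Writing v to PIN toggles every PORT bit that is 1 in v. */
void pin_write(port_model *p, uint8_t v)
{
    p->port ^= v;
}

/* Literal C semantics of "PINx |= mask": read, OR, write back. */
void pin_or_assign(port_model *p, uint8_t mask)
{
    pin_write(p, (uint8_t)(pin_read(p) | mask));
}

/* Effect of a single "sbi PINx, bit": only that one bit is toggled. */
void pin_sbi(port_model *p, uint8_t bit)
{
    pin_write(p, (uint8_t)(1u << bit));
}
```

Starting from a latch value of 0xA4, pin_or_assign with mask 0x01 leaves 0x01 (every bit that was high gets toggled off, and bit 0 is toggled on), while pin_sbi on bit 0 leaves 0xA5. That difference is exactly why the RMW form and the single SBI are not interchangeable.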

 

SprinterSB wrote:
Moreover, replacing PORTD ^= 1 by PIND |= 1 is just plain wrong from the language perspective.

I never said to do PIND|=1;

 

==============

All I was looking for is to see if the new mechanism would actually produce a single-instruction (plus RETI) ISR.  Just curious.

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

#include <avr/interrupt.h>

ISR (__vector_1)
{
  PIND = 1;
}

ISR (__vector_2)
{
  PIND |= 1;
}

 

ISR (__vector_3)
{
  PIND = 0;
}

 

int main;

$ avr-gcc-8 -mmcu=atmega88 isr.c -Os && avr-objdump -d a.out

 

00000056 <__vector_1>:
  56:   8f 93           push    r24
  58:   81 e0           ldi     r24, 0x01       ; 1
  5a:   89 b9           out     0x09, r24       ; 9
  5c:   8f 91           pop     r24
  5e:   18 95           reti

00000060 <__vector_2>:
  60:   48 9a           sbi     0x09, 0 ; 9
  62:   18 95           reti

00000064 <__vector_3>:
  64:   1f 92           push    r1
  66:   1f b6           in      r1, 0x3f        ; 63
  68:   1f 92           push    r1
  6a:   11 24           eor     r1, r1
  6c:   19 b8           out     0x09, r1        ; 9
  6e:   1f 90           pop     r1
  70:   1f be           out     0x3f, r1        ; 63
  72:   1f 90           pop     r1
  74:   18 95           reti

 

 

 

avrfreaks does not support Opera. Profile inactive.

Last Edited: Mon. Jul 10, 2017 - 03:39 PM

I hate this JavaScript scrap.  How do I edit that gaga? ...I think you got the point of the example even with the great UI going bananas...

avrfreaks does not support Opera. Profile inactive.


PIND|=1 has problems that have been discussed at length.

In avr-gcc, the obvious solution is

#define sbi1(ioreg, bitnum) \
             asm(" sbi %0, %1" :: "I"(_SFR_IO_ADDR(ioreg)), "I"(bitnum))

GNU's avr-libc already has an sbi macro that does the wrong thing for PINx and isn't all that great for PORTx.

International Theophysical Year seems to have been forgotten..
Anyone remember the song Jukebox Band?


Wow, thanks! :) But I think you forgot the download links. I'm eager to test, but I could not find any Windows builds..


SprinterSB wrote:
00000060 <__vector_2>:
  60:   48 9a           sbi     0x09, 0 ; 9
  62:   18 95           reti

So the "smart ISR" looks pretty good.

 

skeeve wrote:
PIND|=1 has problems that have been discussed at length.

I know; I wasn't really trying to dig that up.  I just wanted to give an example that could/should resolve into a single instruction with no flag changing in SREG.

 

After I wrote that, I realized that in trying to carry forward the "toggle" example as something useful, there be dragons.

 

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Would it be legal "C" if the generated code for ISR __vector_2 were implemented this way:

    sbis PORTD,0    ; bit 0 set? then skip the rjmp
    rjmp L0         ; bit was clear: go and set it
    cbi PORTD,0     ; bit was set: clear it
    rjmp L1
L0: sbi PORTD,0
L1: reti

 That way you avoid any register and flag use.


blubbb wrote:

Wow, thanks! :) But I think you forgot the download links. I'm eager to test, but I could not find any Windows builds..

As far as I know, the only one regularly doing Windows builds of avr-gcc these days is Atmel/Microchip. You'll probably have to wait for their next build of the toolchain, and check if they've incorporated the change Georg-Johann announced above.

 

clawson wrote:

Thanks Georg-Johann, your contributions to avr-gcc are turning this into a really excellent C compiler!!

+ ULONG_MAX ;-)

"He used to carry his guitar in a gunny sack, or sit beneath the tree by the railroad track. Oh the engineers would see him sitting in the shade, Strumming with the rhythm that the drivers made. People passing by, they would stop and say, "Oh, my, what that little country boy could play!" [Chuck Berry]

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]


sparrow2 wrote:
would it be legal "C" if the generated code for ISR __vector 2 was implemented this way :

...

 

I'll have to try to dig out the recent discussion on this. ;)  IIRC just this construct was discussed.  Was it in the context of "setting a bit to an arbitrary value without using 'if' "?  Something like that.

 

Anyway, that construct doesn't have the momentary "glitch" of "set it to 0 first, then see if it should be set to 1".  But it does have jitter: the set-to-low happens 1 or 2 cycles earlier than the set-to-high.

 

What problem are we trying to solve here?  ;)  As I opined, toggling rarely comes up in practice in a real app -- at least IME.  And when it does, it is a don't-care situation re a couple of clocks.

 

If important, don't let the C compiler mess with it.  Make your own macro that carries out the SBI to the PIN register for the correct bit.

 

[edit] This is the thread I was thinking of:  http://www.avrfreaks.net/forum/p... In #3, the if/else (on a register bit) is resolved using the sequence you laid out.

 

[edit]  If more than one bit is to be toggled, then I cannot think of any other "clean" alternative than LDI and OUT to the PIN.  One register save/restore.  No flags, right?

 

 

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Mon. Jul 10, 2017 - 08:10 PM

First, I missed the discussion you linked to.

And yes, I guess it depends on whether a glitch is allowed, but on the other hand C doesn't care about that.

And to avoid a glitch you can just do this:

    sbis PORTD,0
    rjmp L0
    nop             ; pad so both paths take the same number of cycles
    cbi PORTD,0
    rjmp L1
L0: sbi PORTD,0
L1: reti

But what is a one-clock glitch in an ISR that has jitter anyway?
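
For reference, the timing of the two variants can be tallied as below. This is only a sketch: the cycle counts are those of the classic (non-XMEGA) AVR core, where SBI/CBI and RJMP take 2 cycles, NOP takes 1, and SBIS takes 1 cycle when it does not skip and 2 when it skips a one-word instruction. The function names are invented for illustration.

```c
/* Instruction timings on a classic AVR core (assumption: not XMEGA). */
enum {
    CYC_SBIS_NOSKIP = 1,
    CYC_SBIS_SKIP   = 2,   /* skipping a one-word instruction */
    CYC_RJMP        = 2,
    CYC_NOP         = 1,
    CYC_SBI_CBI     = 2
};

/* Bit was set: sbis skips the rjmp, optional nop, then cbi.
   Returns the cycles from the sbis until the cbi has finished. */
int cycles_bit_set(int with_nop)
{
    return CYC_SBIS_SKIP + (with_nop ? CYC_NOP : 0) + CYC_SBI_CBI;
}

/* Bit was clear: sbis falls through, rjmp L0, then sbi. */
int cycles_bit_clear(void)
{
    return CYC_SBIS_NOSKIP + CYC_RJMP + CYC_SBI_CBI;
}
```

Without the nop the "clear it" path takes 4 cycles and the "set it" path 5; with the nop both take 5, so the edge lands at the same offset in either direction.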

 


> That way you avoid any register and flag use.

 

Regarding GCC, this wouldn't help much.  GCC is still as dumb as before and doesn't know anything about SREG and friends.  Hence emitting long, strange sequences to avoid SREG in situations where other code might clobber SREG anyway would be counterproductive.

 

GCC just shifts the unpleasant job of scanning to the assembler, which then generates a prologue preamble to please GCC. This was the only feasible approach...

 

The upside is that this also scans through inline asm (-:
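
To give a rough feel for what "scanning" means here, the following is a toy sketch in host-side C. It is emphatically not the rule set GNU as applies: it only knows a short whitelist of opcodes that never write SREG and treats everything else as a potential clobber. The function names are hypothetical; the opcode words are the byte pairs from the disassembly listings in this thread, read little-endian (so "8f 93" becomes 0x938f).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy classifier: returns true only for a few opcodes known not to
   write SREG. Sets *second_word for two-word instructions. */
bool sreg_neutral(uint16_t op, bool *second_word)
{
    *second_word = false;

    if ((op & 0xFC0F) == 0x900F) return true;   /* push Rr / pop Rd */
    if ((op & 0xFC0F) == 0x9000) {              /* lds Rd,k / sts k,Rr */
        *second_word = true;                    /* 16-bit address follows */
        return true;
    }
    if ((op & 0xF000) == 0xE000) return true;   /* ldi Rd,K */
    if ((op & 0xFC00) == 0x2C00) return true;   /* mov Rd,Rr */
    if ((op & 0xFE0F) == 0x9402) return true;   /* swap Rd */
    if ((op & 0xFD00) == 0x9800) return true;   /* cbi A,b / sbi A,b */
    if ((op & 0xF800) == 0xB000) return true;   /* in Rd,A (read only) */
    if ((op & 0xF800) == 0xB800)                /* out A,Rr ... */
        return (op & 0xFE0F) != 0xBE0F;         /* ... unless A is 0x3f (SREG) */
    return false;                               /* unknown: assume a clobber */
}

/* Walk the opcode words up to the first reti (0x9518). Returns true if
   some instruction might change SREG, or if no reti is found at all. */
bool isr_may_clobber_sreg(const uint16_t *code, size_t n_words)
{
    size_t i = 0;
    while (i < n_words) {
        bool two;
        if (code[i] == 0x9518) return false;    /* reached reti */
        if (!sreg_neutral(code[i], &two)) return true;
        i += two ? 2 : 1;                       /* skip the address word */
    }
    return true;                                /* ran off the end */
}
```

Fed the words of the "ready = 1" ISR (push, ldi, sts, pop, reti) it reports no clobber, while the eor in the "PORTD ^= 1" ISR makes it report one. The skipping of the second word of lds/sts matters: otherwise the address 0x0062 would be misread as an instruction.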

 

avrfreaks does not support Opera. Profile inactive.

Last Edited: Tue. Jul 11, 2017 - 08:47 PM

theusch wrote:

What problem are we trying to solve here?  ;)  As I opined, toggling rarely comes up in practice in a real app -- at least IME.  And when it does, it is a don't-care situation re a couple of clocks.

 

Not sure anyone cares, but I had a use for toggling pins at speed.

 

Given a byte bus (unidirectional, asynchronous) with two control signals "Data Ready" (on the bus and stable) and "Data Ack" (got it, send another) it was more efficient to toggle the control signals than try to make them strobes or level-defined.  Saved having to reset them every time (eg. if "Data Ready" == 1 then the transmitter has to set it = 0 before putting the next byte on the bus, then set it high again after "out PORTx".  With toggling, it's just "out PORTx, sbi PINxx" and you're off to the races).
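
A host-side sketch of such a toggle-based handshake might look like the following. It is an assumption about the protocol rather than the actual firmware: each change of "Data Ready" means "a new byte is on the bus" and each change of "Data Ack" means "got it", so neither side ever has to return a line to an idle level. The link_t type and the function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared wires between transmitter and receiver. */
typedef struct {
    uint8_t bus;     /* the data byte */
    bool    ready;   /* toggled by the transmitter for every new byte */
    bool    ack;     /* toggled by the receiver for every byte taken */
} link_t;

/* Transmitter: allowed to send once ack has caught up with ready.
   On the AVR this would be "out PORTx" followed by one "sbi PINx". */
bool tx_send(link_t *l, uint8_t byte)
{
    if (l->ready != l->ack)
        return false;            /* previous byte not acknowledged yet */
    l->bus = byte;
    l->ready = !l->ready;        /* one toggle announces the byte */
    return true;
}

/* Receiver: a byte is pending whenever ready differs from ack. */
bool rx_poll(link_t *l, uint8_t *out)
{
    if (l->ready == l->ack)
        return false;            /* nothing new on the bus */
    *out = l->bus;
    l->ack = !l->ack;            /* one toggle acknowledges it */
    return true;
}
```

With both lines starting at the same level, a second tx_send before the receiver has polled is refused, and a second rx_poll before the next send finds nothing, which is the flow control the level-based scheme needs two writes per byte to achieve.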

 

S.

 

PS - Another application that I didn't actually get to build because the customer bailed on me involved (not exactly this, but imagine it as...) a 32-bit shift register in which the first 16 bits are "don't care".  Loading up that register with "sbi PINxx, sbi PINxx, sbi PINxx" for a shift-in clock was twice as fast as "sbi PORTxx, cbi PORTxx, sbi PORTxx, ..."   Edited to add postscript.  S.

Last Edited: Wed. Jul 12, 2017 - 05:23 AM

 
 

avrfreaks does not support Opera. Profile inactive.

Last Edited: Mon. Jul 17, 2017 - 10:34 PM

blubbb wrote:

I think you forgot the download links. I'm eager to test, but I could not find any Windows builds.

 

You can find a mingw32 build from today at

 

https://sourceforge.net/projects...

 

It's built from GCC trunk, Binutils master, AVR-LibC trunk + https://savannah.nongnu.org/patc...

 

Unzip to where you like it and use by absolute path or via PATH.

avrfreaks does not support Opera. Profile inactive.