avr-gcc generates no useful code for simple function


I wanted to see if avr-gcc supports alloca so I generated a little function file and peered at the assembly.

To me this is not dead code, but the assembly does nothing.   Can you reproduce?   Can you make sense of it?

 

zz.c:

#include <alloca.h>

float foo(float y, float z) {
  float *x = alloca(2*sizeof(float));

  x[0] = y*z;
  x[1] = y+z + x[0];
  return x[1];
}

$ avr-gcc -mmcu=atmega328p -DF_CPU=16000000UL -Os -c zz.c -o zz.o
$ avr-objdump -d zz.o > zz.asm
$ avr-gcc --version
avr-gcc (GCC) 5.4.0

zz.asm:

00000000 <foo>:
   0:	4f 92       	push	r4
   2:	5f 92       	push	r5
   4:	6f 92       	push	r6
   6:	7f 92       	push	r7
   8:	8f 92       	push	r8
   a:	9f 92       	push	r9
   c:	af 92       	push	r10
   e:	bf 92       	push	r11
  10:	cf 92       	push	r12
  12:	df 92       	push	r13
  14:	ef 92       	push	r14
  16:	ff 92       	push	r15
  18:	6b 01       	movw	r12, r22
  1a:	7c 01       	movw	r14, r24
  1c:	49 01       	movw	r8, r18
  1e:	5a 01       	movw	r10, r20
  20:	0e 94 00 00 	call	0	; 0x0 <foo>
  24:	2b 01       	movw	r4, r22
  26:	3c 01       	movw	r6, r24
  28:	a5 01       	movw	r20, r10
  2a:	94 01       	movw	r18, r8
  2c:	c7 01       	movw	r24, r14
  2e:	b6 01       	movw	r22, r12
  30:	0e 94 00 00 	call	0	; 0x0 <foo>
  34:	9b 01       	movw	r18, r22
  36:	ac 01       	movw	r20, r24
  38:	c3 01       	movw	r24, r6
  3a:	b2 01       	movw	r22, r4
  3c:	0e 94 00 00 	call	0	; 0x0 <foo>
  40:	ff 90       	pop	r15
  42:	ef 90       	pop	r14
  44:	df 90       	pop	r13
  46:	cf 90       	pop	r12
  48:	bf 90       	pop	r11
  4a:	af 90       	pop	r10
  4c:	9f 90       	pop	r9
  4e:	8f 90       	pop	r8
  50:	7f 90       	pop	r7
  52:	6f 90       	pop	r6
  54:	5f 90       	pop	r5
  56:	4f 90       	pop	r4
  58:	08 95       	ret

 

This topic has a solution.
Last Edited: Fri. Dec 13, 2019 - 11:41 PM

The return value of the function can be determined at compile time - so there's no need to generate any code ?

 

This is always the trouble with trivial "test" functions in an optimising compiler

This reply has been marked as the solution. 

You haven't linked the file, so the call targets are still unresolved fixups and are bound to show up as 0.


MattRW wrote:

To me this is not dead code, but the assembly does nothing.   Can you reproduce?   Can you make sense of it?

Eh?

The listing you show for foo does plenty, it certainly isn't doing nothing.

Those calls are probably to floating point routines (remember this code is unlinked).


I added your function to a simple test program, this is the extract relating to foo from the overall .lss file (ie. from objdump on the executable, which is now fully linked). So you can now see what those calls are actually doing. Looks like floating point stuff to me.

00000080 <foo>:
  80:   4f 92           push    r4
  82:   5f 92           push    r5
  84:   6f 92           push    r6
  86:   7f 92           push    r7
  88:   8f 92           push    r8
  8a:   9f 92           push    r9
  8c:   af 92           push    r10
  8e:   bf 92           push    r11
  90:   cf 92           push    r12
  92:   df 92           push    r13
  94:   ef 92           push    r14
  96:   ff 92           push    r15
  98:   6b 01           movw    r12, r22
  9a:   7c 01           movw    r14, r24
  9c:   49 01           movw    r8, r18
  9e:   5a 01           movw    r10, r20
  a0:   0e 94 91 00     call    0x122   ; 0x122 <__addsf3>
  a4:   2b 01           movw    r4, r22
  a6:   3c 01           movw    r6, r24
  a8:   a5 01           movw    r20, r10
  aa:   94 01           movw    r18, r8
  ac:   c7 01           movw    r24, r14
  ae:   b6 01           movw    r22, r12
  b0:   0e 94 ba 01     call    0x374   ; 0x374 <__mulsf3>
  b4:   9b 01           movw    r18, r22
  b6:   ac 01           movw    r20, r24
  b8:   c3 01           movw    r24, r6
  ba:   b2 01           movw    r22, r4
  bc:   0e 94 91 00     call    0x122   ; 0x122 <__addsf3>
  c0:   ff 90           pop     r15
  c2:   ef 90           pop     r14
  c4:   df 90           pop     r13
  c6:   cf 90           pop     r12
  c8:   bf 90           pop     r11
  ca:   af 90           pop     r10
  cc:   9f 90           pop     r9
  ce:   8f 90           pop     r8
  d0:   7f 90           pop     r7
  d2:   6f 90           pop     r6
  d4:   5f 90           pop     r5
  d6:   4f 90           pop     r4
  d8:   08 95           ret

 


I did the same:

C:\One\SysGCC\avr\bin>type main.c
#include <avr/io.h>

float foo(float, float);

int main(void) {
    PORTB = foo(3.14, 2.71);
}
C:\One\SysGCC\avr\bin>type zz.c
#include <alloca.h>

float foo(float y, float z) {
  float *x = alloca(2*sizeof(float));

  x[0] = y*z;
  x[1] = y+z + x[0];
  return x[1];
}

C:\One\SysGCC\avr\bin>avr-gcc -mmcu=atmega328p -DF_CPU=16000000UL -g -Os main.c zz.c -o zz.elf

C:\One\SysGCC\avr\bin>avr-objdump -S zz.elf

zz.elf:     file format elf32-avr


Disassembly of section .text:

00000000 <__vectors>:
   0:   0c 94 34 00     jmp     0x68    ; 0x68 <__ctors_end>
   4:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
   8:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
   c:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  10:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  14:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  18:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  1c:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  20:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  24:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  28:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  2c:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  30:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  34:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  38:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  3c:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  40:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  44:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  48:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  4c:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  50:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  54:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  58:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  5c:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  60:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>
  64:   0c 94 3e 00     jmp     0x7c    ; 0x7c <__bad_interrupt>

00000068 <__ctors_end>:
  68:   11 24           eor     r1, r1
  6a:   1f be           out     0x3f, r1        ; 63
  6c:   cf ef           ldi     r28, 0xFF       ; 255
  6e:   d8 e0           ldi     r29, 0x08       ; 8
  70:   de bf           out     0x3e, r29       ; 62
  72:   cd bf           out     0x3d, r28       ; 61
  74:   0e 94 6d 00     call    0xda    ; 0xda <main>
  78:   0c 94 d7 01     jmp     0x3ae   ; 0x3ae <_exit>

0000007c <__bad_interrupt>:
  7c:   0c 94 00 00     jmp     0       ; 0x0 <__vectors>

00000080 <foo>:
  80:   4f 92           push    r4
  82:   5f 92           push    r5
  84:   6f 92           push    r6
  86:   7f 92           push    r7
  88:   8f 92           push    r8
  8a:   9f 92           push    r9
  8c:   af 92           push    r10
  8e:   bf 92           push    r11
  90:   cf 92           push    r12
  92:   df 92           push    r13
  94:   ef 92           push    r14
  96:   ff 92           push    r15
  98:   6b 01           movw    r12, r22
  9a:   7c 01           movw    r14, r24
  9c:   49 01           movw    r8, r18
  9e:   5a 01           movw    r10, r20
  a0:   0e 94 6a 01     call    0x2d4   ; 0x2d4 <__mulsf3>
  a4:   2b 01           movw    r4, r22
  a6:   3c 01           movw    r6, r24
  a8:   a5 01           movw    r20, r10
  aa:   94 01           movw    r18, r8
  ac:   c7 01           movw    r24, r14
  ae:   b6 01           movw    r22, r12
  b0:   0e 94 7e 00     call    0xfc    ; 0xfc <__addsf3>
  b4:   9b 01           movw    r18, r22
  b6:   ac 01           movw    r20, r24
  b8:   c3 01           movw    r24, r6
  ba:   b2 01           movw    r22, r4
  bc:   0e 94 7e 00     call    0xfc    ; 0xfc <__addsf3>
  c0:   ff 90           pop     r15
  c2:   ef 90           pop     r14
  c4:   df 90           pop     r13
  c6:   cf 90           pop     r12
  c8:   bf 90           pop     r11
  ca:   af 90           pop     r10
  cc:   9f 90           pop     r9
  ce:   8f 90           pop     r8
  d0:   7f 90           pop     r7
  d2:   6f 90           pop     r6
  d4:   5f 90           pop     r5
  d6:   4f 90           pop     r4
  d8:   08 95           ret

000000da <main>:
#include <avr/io.h>

float foo(float, float);

int main(void) {
    PORTB = foo(3.14, 2.71);
  da:   24 ea           ldi     r18, 0xA4       ; 164
  dc:   30 e7           ldi     r19, 0x70       ; 112
  de:   4d e2           ldi     r20, 0x2D       ; 45
  e0:   50 e4           ldi     r21, 0x40       ; 64
  e2:   63 ec           ldi     r22, 0xC3       ; 195
  e4:   75 ef           ldi     r23, 0xF5       ; 245
  e6:   88 e4           ldi     r24, 0x48       ; 72
  e8:   90 e4           ldi     r25, 0x40       ; 64
  ea:   0e 94 40 00     call    0x80    ; 0x80 <foo>
  ee:   0e 94 ea 00     call    0x1d4   ; 0x1d4 <__fixunssfsi>
  f2:   65 b9           out     0x05, r22       ; 5
  f4:   80 e0           ldi     r24, 0x00       ; 0
  f6:   90 e0           ldi     r25, 0x00       ; 0
  f8:   08 95           ret

000000fa <__subsf3>:
  fa:   50 58           subi    r21, 0x80       ; 128

000000fc <__addsf3>:
  fc:   bb 27           eor     r27, r27
  fe:   aa 27           eor     r26, r26
 100:   0e 94 95 00     call    0x12a   ; 0x12a <__addsf3x>
 104:   0c 94 30 01     jmp     0x260   ; 0x260 <__fp_round>
 108:   0e 94 22 01     call    0x244   ; 0x244 <__fp_pscA>
 etc.

 


You GCC gurus will need to help me out -- is OP's concern that there is no apparent return value?  Rather it is just the content of certain registers.  R19-R22?  Nope, R22-R25...

For example, an 8-bit value is returned in R24 and a 32-bit value is returned in R22...R25.

Also https://www.microchip.com/webdoc...

 


Last Edited: Fri. Dec 13, 2019 - 04:56 PM

The return is the output of the final:

 bc:   0e 94 7e 00     call    0xfc    ; 0xfc <__addsf3>

Following the GCC ABI, this function returns its 4-byte "float" in R25:R24:R23:R22, which is exactly where the final __addsf3 call leaves its result.
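To make the register layout concrete, here is a small host-side sketch that rebuilds a float from four register bytes, lowest register first. The helper name float_from_regs is made up for illustration, and the code assumes the host uses 32-bit IEEE 754 floats with the same byte order for integers and floats, as avr-gcc does for its float type.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rebuild a float from the four bytes that avr-gcc
   keeps in r22 (lowest byte) through r25 (highest byte).
   Assumes a 32-bit IEEE 754 float on the machine running this code. */
float float_from_regs(uint8_t r22, uint8_t r23, uint8_t r24, uint8_t r25) {
    uint32_t bits = (uint32_t)r22
                  | ((uint32_t)r23 << 8)
                  | ((uint32_t)r24 << 16)
                  | ((uint32_t)r25 << 24);
    float f;
    memcpy(&f, &bits, sizeof f);  /* copy the bit pattern, no conversion */
    return f;
}
```

Feeding it the ldi constants from the main listing above (0xC3, 0xF5, 0x48, 0x40 for r22..r25) gives back the 3.14 argument, and the r18..r21 constants (0xA4, 0x70, 0x2D, 0x40) give back 2.71.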


I think I would  come up with simpler tests.

 

int main(void) {
    char* p = __builtin_alloca(1);
    *(volatile char*)(0x100) = *p;
    for(;;){};
}
#if 0

result
000000c4 <main>:
  c4:    cf 93           push    r28
  c6:    df 93           push    r29
  c8:    cd b7           in    r28, 0x3d    ; 61
  ca:    de b7           in    r29, 0x3e    ; 62
  cc:    1f 92           push    r1
  ce:    ed b7           in    r30, 0x3d    ; 61
  d0:    fe b7           in    r31, 0x3e    ; 62
  d2:    81 81           ldd    r24, Z+1    ; 0x01
  d4:    80 93 00 01     sts    0x0100, r24    ; 0x800100 <_end>
  d8:    ff cf           rjmp    .-2          ; 0xd8 <main+0x14>
#endif

 

edit-

A simple test that requires no linking will not get confused by addresses yet to be filled in by the linker, I guess. Or simply go all the way and get the linker involved. Or just use an IDE like MPLAB X to create simple tests, where you get it all (map file, asm listing, etc.) in a few short steps. The above took about 30 seconds to create a project, fill in some code and compile, and another 10 seconds to check the .lss listing box and set the optimization to -Os.

Last Edited: Sat. Dec 14, 2019 - 01:55 AM

The OP's concern is that the assembly shows no floating point calls.   But that's gonna happen at link time.

If I link, the calls to the floating point routines do indeed show up.


Also no calls to alloca() itself...

(though writing that as an actual function is a bit mind-boggling!)

 

 


Just think about it. alloca cannot be implemented by a (library) call because it's supposed to allocate stack. If you did it in a function, the stack pointer would have moved by the time it returned and RET would no longer work. Same for deallocating.



I think you could do it with an asm function...

Ugly, though, and I'm not sure how it would interact with other local variables...

In modern C, it's just equivalent to a byte array local variable, right?

 


westfw wrote:

in modern C, it’s just equiv to a byte array local variable, right?

It would be the same as, or very similar to, using a C99 variable-length array. (There might be some subtle difference, not sure, I've never used either of them, but same idea.)

In the OP's example, since the size is constant, you may as well just have a normal fixed-size array:

{
    float x[2];
    x[0] = etc.
}
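Spelled out in full, the fixed-size version might look like the sketch below. The name foo_fixed is only chosen here to keep it apart from the original foo; the arithmetic is copied from the OP's function.

```c
/* Same arithmetic as the OP's foo, but with an ordinary fixed-size
   local array instead of alloca. foo_fixed is a made-up name. */
float foo_fixed(float y, float z) {
    float x[2];

    x[0] = y * z;
    x[1] = y + z + x[0];
    return x[1];
}
```

With a constant size, the optimizer is free to keep both elements in registers, so one would expect the generated code to be no worse than the alloca version.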

 


westfw wrote:

also no calls to alloca() itself...

(though writing that as an actual function is a bit mind-boggling!)

 

I believe the way this is handled is that <alloca.h> defines alloca(N) as __builtin_alloca(N) and the compiler expands __builtin_alloca into suitable asm code.

I'd guess the asm code is modifying the frame pointer, or something along those lines.

 

Same goes for a number of other __builtin_ symbols; in gcc-8 sources I see references to __builtin_memcpy, _memset, _memmove.
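As a rough illustration of the builtin being used directly, here is a sketch where the size is only known at run time, which is the case alloca is really meant for. It assumes a GCC-compatible compiler that provides __builtin_alloca; the function name sum_of_squares is invented for the example.

```c
#include <stddef.h>

/* Sum of the squares 1..n, using a scratch array carved out of the
   current stack frame. __builtin_alloca is expanded inline by the
   compiler, so no library call appears in the generated code.
   Assumes n is small enough for the scratch array to fit on the stack. */
long sum_of_squares(size_t n) {
    long *scratch = __builtin_alloca(n * sizeof *scratch);
    long total = 0;

    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)(i + 1) * (long)(i + 1);
    for (size_t i = 0; i < n; i++)
        total += scratch[i];
    return total;  /* the scratch space is released when we return */
}
```

On a small AVR the stack is tiny, so a run-time size like this needs a sanity bound in real code.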

 

 

Last Edited: Sat. Dec 14, 2019 - 04:25 PM

westfw wrote:
in modern C, it’s just equiv to a byte array local variable, right?

 

Almost, but not quite.

 

An automatic variable (including a VLA) in C obeys block-imposed storage duration rules: once the block ends, the storage duration of the variable ends as well (i.e. it is destroyed/deallocated, at least conceptually). Meanwhile, memory allocated with `alloca` ignores block boundaries: it stays allocated until the function exits.
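A small sketch of that difference, assuming a platform that ships <alloca.h> (the function and struct names are made up): each node is allocated inside the loop body, yet all of them are still valid after the loop, because alloca memory lives until the function returns. A VLA or any other automatic variable declared inside the loop body would be gone at the end of each iteration, so the same list could not be built that way.

```c
#include <alloca.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Build a linked list holding 1..n on the stack, then add it up. */
int sum_with_alloca(int n) {
    struct node *head = NULL;

    for (int i = 1; i <= n; i++) {
        /* Allocated inside the loop block, but not freed when it ends. */
        struct node *p = alloca(sizeof *p);
        p->value = i;
        p->next = head;
        head = p;
    }

    int sum = 0;
    for (struct node *p = head; p != NULL; p = p->next)
        sum += p->value;  /* every node is still valid here */
    return sum;           /* all nodes are released on return */
}
```

The flip side is that calling alloca in a loop keeps growing the stack frame, which is easy to overlook on a part with only a couple of kilobytes of RAM.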

Last Edited: Tue. Dec 24, 2019 - 07:29 PM