Empty ASM statements change code?

#1

Optiboot apparently gets about 14 bytes bigger with the gcc 7.3.0 compiler that Arduino is distributing...

Hmmph.

 

For the sake of analysis, I created an ASM macro:

#define TICK(name) asm(".global " #name "\n .equ " #name ", .-1b\n1:\n")

I can sprinkle statements like:

if (cmd == FOO) {
    TICK(FOO_START);
    // some code
    TICK(FOO_END);
} else if (cmd == BAR) {
    // more code
    TICK(BAR_END);
}

throughout the code, and the elf file will contain nice absolute symbols whose value is the distance between "ticks", and it should give me an idea where things have gotten bigger.
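To see what the assembler actually receives, the macro's string pasting can be reproduced in plain C. The sketch below is illustrative only: `TICK_TEXT` and `tick_text_size_init` are hypothetical helpers that are not part of Optiboot; they build the same string that `TICK(size_init)` would hand to the assembler. In GNU as, `1b` refers to the most recent earlier definition of the local label `1:`, so each tick records the distance back to the previous tick and then plants a new `1:` for the next one.

```c
/* Hypothetical helper (not in the original code): expands to the same
   string literal that TICK(name) passes to asm(). */
#define TICK_TEXT(name) ".global " #name "\n .equ " #name ", .-1b\n1:\n"

/* Returns the assembler text that TICK(size_init) would emit:
     .global size_init        export the symbol
     .equ size_init, .-1b     value = here minus the previous "1:" label
     1:                       new local label for the next tick */
const char *tick_text_size_init(void)
{
    return TICK_TEXT(size_init);
}
```

Printing the returned string shows the three assembler lines. This presumably also means that some `1:` label has to exist before the first TICK in the file.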

It seems to work pretty well:

avr-nm -S optiboot_atmega328.elf | grep A
 :
0000000c A set_device_size
00000024 A size_getparameter
0000007e A size_init
0000001a A size_loadaddress
0000002a A size_readPage
0000001c A size_readSign
0000000c A size_setdevext
00000012 A size_universal
0000006c A size_writePage

 

EXCEPT that the use of the macro changes the size of the executable.  Even though it produces no actual code, doesn't have any clobbers specified, and is not volatile (which in theory gives gcc free rein to move it around for the sake of optimization.)

 

This is less than ideal (though still useful, probably.)  Does anyone have ideas on how I can make my empty asm() statements have LESS impact on code generation?

 

#2

I'm confused. If there is no extra code, then where is there a problem? What does a complete small test program show, including listing of generated code and full map?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

#3

It seems to defeat some optimizations or something.   The asm() statement doesn't produce any code, but the compiler output changes anyway...

 

With tick:

make atmega328
avr-gcc (AVR_8_bit_GNU_Toolchain_3.6.1_495) 5.4.0
avr-gcc -g -Wall -Os -fno-split-wide-types -mrelax -mmcu=atmega328p -DF_CPU=16000000L  -DBAUD_RATE=115200 -DLED_START_FLASHES=3              -c -o optiboot.o optiboot.c
avr-gcc -g -Wall -Os -fno-split-wide-types -mrelax -mmcu=atmega328p -DF_CPU=16000000L  -DBAUD_RATE=115200 -DLED_START_FLASHES=3            -Wl,--section-start=.text=0x7e00 -Wl,--section-start=.version=0x7ffe -Wl,--relax -nostartfiles -o optiboot_atmega328.elf optiboot.o
avr-size optiboot_atmega328.elf
   text    data     bss     dec     hex filename
    490       0       0     490     1ea optiboot_atmega328.elf

 

Without tick:

avr-size optiboot_atmega328.elf
   text    data     bss     dec     hex filename
    482       0       0     482     1e2 optiboot_atmega328.elf

 

 

It looks like pretty subtle optimization issues.  Here's the first difference from the existing program.  I don't know if I'll be able to make a small example. (I'll try...)

;; Source code.

    else if(ch == STK_SET_DEVICE) {
      // SET DEVICE is ignored
      getNch(20); 
      TICK(set_device_size);
    }
    else if(ch == STK_SET_DEVICE_EXT) {
      // SET DEVICE EXT is ignored
      getNch(5);
      TICK(size_setdevext);
    }
    else if(ch == STK_LOAD_ADDRESS) {
      // LOAD ADDRESS
      address.bytes[0] = getch();
      address.bytes[1] = getch();
      :


;;Without TICK macro:
;;
    else if(ch == STK_SET_DEVICE) {
    7e9c:	82 34       	cpi	r24, 0x42	; 66
      getNch(20); 
;;; ***HERE*** the compiler detects common "tail" (getNch, goto end of loop)
    7e9e:	11 f4       	brne	.+4      	; 0x7ea4 <main+0xa0>
    7ea0:	84 e1       	ldi	r24, 0x14	; 20
    else if(ch == STK_SET_DEVICE_EXT) {
    7ea2:	03 c0       	rjmp	.+6      	; 0x7eaa <main+0xa6>
      getNch(5);
    7ea4:	85 34       	cpi	r24, 0x45	; 69
    7ea6:	19 f4       	brne	.+6      	; 0x7eae <main+0xaa>
    7ea8:	85 e0       	ldi	r24, 0x05	; 5
    else if(ch == STK_LOAD_ADDRESS) {
    7eaa:	83 d0       	rcall	.+262    	; 0x7fb2 <getNch>
    7eac:	5e c0       	rjmp	.+188    	; 0x7f6a <main+0x166>


;; With TICK macro defined:
;;
    else if(ch == STK_SET_DEVICE) {
    7e9c:	82 34       	cpi	r24, 0x42	; 66
      getNch(20); 
    7e9e:	19 f4       	brne	.+6      	; 0x7ea6 <main+0xa2>
    7ea0:	84 e1       	ldi	r24, 0x14	; 20
      TICK(set_device_size);
;;;  ***HERE*** we get the getNch, rjmp main+0x16e twice.
    7ea2:	8b d0       	rcall	.+278    	; 0x7fba <getNch>
    else if(ch == STK_SET_DEVICE_EXT) {
    7ea4:	66 c0       	rjmp	.+204    	; 0x7f72 <main+0x16e>
      getNch(5);
    7ea6:	85 34       	cpi	r24, 0x45	; 69
    7ea8:	19 f4       	brne	.+6      	; 0x7eb0 <main+0xac>
      TICK(size_setdevext);
    7eaa:	85 e0       	ldi	r24, 0x05	; 5
    else if(ch == STK_LOAD_ADDRESS) {
    7eac:	86 d0       	rcall	.+268    	; 0x7fba <getNch>
      address.bytes[0] = getch();
    7eae:	61 c0       	rjmp	.+194    	; 0x7f72 <main+0x16e>
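
The difference between the two listings looks like tail merging: without the markers, both branches load a constant and share one `rcall getNch` / `rjmp` sequence; with them, each branch keeps its own copy. A minimal sketch of that shape in portable C follows, for experimenting outside Optiboot. The command values come from the `cpi` operands in the listings; `MARK`, `MARKERS`, `handle`, `last_count` and the stub `getNch` are hypothetical stand-ins, and whether any particular compiler version merges the tails here is not guaranteed.

```c
/* Command codes taken from the cpi operands in the listings. */
#define STK_SET_DEVICE     0x42
#define STK_SET_DEVICE_EXT 0x45

/* Build with -DMARKERS to place a comment-only asm after each call.
   The template holds only an assembler comment, so no code is emitted. */
#ifdef MARKERS
#define MARK() __asm__("/* mark */")
#else
#define MARK() ((void)0)
#endif

/* Records the last argument passed to the stub below. */
int last_count = -1;

/* Stand-in for Optiboot's getNch(); it only remembers its argument. */
void getNch(int count)
{
    last_count = count;
}

/* Two branches that end in the same call, as in the bootloader loop.
   Returns 1 if the command was recognised, 0 otherwise. */
int handle(unsigned char ch)
{
    if (ch == STK_SET_DEVICE) {
        getNch(20);
        MARK();
    } else if (ch == STK_SET_DEVICE_EXT) {
        getNch(5);
        MARK();
    } else {
        return 0;
    }
    return 1;
}
```

Compiling this twice with `-Os -S`, once with `-DMARKERS` and once without, and diffing the two assembly files is one way to check whether the markers block the merge on a given toolchain.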

 

#4

Is this something to do with "memory barriers"? I think "asm()" may invoke one which upsets the compiler's ability to optimize.

#5

IIRC R0 and T of SREG aka cc are implicitly clobbered by design.

The other seven bits of SREG are implicitly clobbered as a result of the inline assembly implementation.

The implementation change to keep avr-gcc away from the headsman would unclobber the above seven bits.

Edit: Correction per ralphd, #17

Iluvatar is the better part of Valar.

Last Edited: Fri. Nov 13, 2020 - 05:12 PM
#6

westfw wrote:

 

EXCEPT that the use of the macro changes the size of the executable.  Even though it produces no actual code,

My first electronics teacher (high school) told me when you measure something, you change it!  Looks like that holds for software too!

 

 

 

(Possum Lodge oath) Quando omni flunkus, moritati.

"I thought growing old would take longer"

 

#7

skeeve wrote:

IIRC R1 and T of SREG aka cc are implicitly clobbered by design.

The other seven bits of SREG are implicitly clobbered as a result of the inline assembly implementation.

The implementation change to keep avr-gcc away from the headsman would unclobber the above seven bits.

I don't think that applies to basic asm statements, which this is.  Basic asm is any asm statement without operands.

 

However, all basic asm statements are implicitly volatile:

https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html

... although with only a label, I can't see why it would have any effect on code generation:

GCC does not parse the assembler instructions themselves and does not know what they mean or even whether they are valid assembler input.

In particular, the OP's use appears to be exactly one of the foreseen use cases:

  • Extended asm statements have to be inside a C function, so to write inline assembly language at file scope (“top-level”), outside of C functions, you must use basic asm. You can use this technique to emit assembler directives, define assembly language macros that can be invoked elsewhere in the file, or write entire functions in assembly language. Basic asm statements outside of functions may not use any qualifiers.

As already noted:

Do not expect a sequence of asm statements to remain perfectly consecutive after compilation. If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement. Note that GCC’s optimizers can move asm statements relative to other code, including across jumps.

In addition:

GCC has no visibility of symbols in the asm and may discard them as unreferenced.

Curiously:

It also does not know about side effects of the assembler code, such as modifications to memory or registers. Unlike some compilers, GCC assumes that no changes to general purpose registers occur. This assumption may change in a future release.

Which suggests that the effects observed by the OP should not occur.

 

OP, you may want to use extended asm to see if it makes a difference:

 

#define TICK(name) asm(".global " #name "\n .equ " #name ", .-1b\n1:\n" : : )
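
For reference, the basic and extended forms differ only in whether colon-separated operand sections follow the template string; there is no comma before the first colon. A small sketch, assuming a GCC-compatible compiler, with hypothetical macro names and a comment-only template so that no instructions are emitted:

```c
/* Basic asm: template string only, no operand sections. */
#define MARK_BASIC()    __asm__("/* mark */")

/* Extended asm with empty output and input operand lists. */
#define MARK_EXTENDED() __asm__("/* mark */" : : )

/* Extended asm, explicitly volatile, with a "memory" clobber. */
#define MARK_BARRIER()  __asm__ __volatile__("/* mark */" : : : "memory")

/* Uses each marker once between ordinary arithmetic.
   Returns a + b + 1. */
int sum_with_marks(int a, int b)
{
    int total = a;
    MARK_BASIC();
    total += b;
    MARK_EXTENDED();
    total += 1;
    MARK_BARRIER();
    return total;
}
```

None of the three markers emits an instruction, so the function returns the same result whichever forms are used; any size difference would come from how the optimizer treats each form.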

 

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

#8

ki0bk wrote:

westfw wrote:

EXCEPT that the use of the macro changes the size of the executable.  Even though it produces no actual code,

My first electronics teacher (high school) told me when you measure something, you change it!  Looks like that holds for software too!


I had trouble grasping OP's statement.  What is an "executable" in this construct?  A debugging format with source-code information?  Well, of course that would change when more lines are added, wouldn't it?

 

Then we see the puzzling "no actual code".  But finally, the dump shows that there ARE more instructions. 

 

Probably the gurus will point to where the "memory barrier" or analogous is documented.  My first thought was similar, as when you do #asm in CV all bets are off about internal values being carried over.  I suppose I could construct an empty case and see what happens; I guess I never really considered doing it.

 

 


#9

theusch wrote:

I had trouble grasping OP's statement.  What is an "executable" in this construct?  A debugging format with source-code information?

 

The OP basically wants to generate global labels inside the elf file to mark interesting places in the code, to help with size optimization profiling I think. But only the elf file should be larger, not the binary code itself. However, the binary is larger.

Last Edited: Thu. Nov 12, 2020 - 07:22 PM
#10

El Tangas wrote:
the binary is larger

Is it?

 

I think that was theusch's point - what was the thing that was "bigger" ?

 

only the elf file should be larger, not the binary code itself

I wonder if it's something like padding being added; so there's not any extra executable code - but it does make the binary larger ... ?

#11

awneil wrote:

El Tangas wrote:

the binary is larger

 

Is it?

 

Well, that's my understanding from post #3.

#12

I think that was theusch's point - what was the thing that was "bigger" ?

.text segment size as reported by avr-size.  That's ONLY the stuff that ends up in flash, right?   (I guess I didn't actually SAY that in the first post...)

 

#13

when you measure something, you change it!  Looks like that holds for software too!

Oh, certainly!  We even had a name for them at my last employer: "Heisenbugs"

 

However, all basic asm statements are implicitly volatile:

I had not realized that.  Hmm.

you may want to use extended asm to see if it makes a difference.

However...  No difference.

 

 

The other seven bits of SREG are implicitly clobbered as a result of the inline assembly implementation.

Ah.  This sounds like a likely culprit.

 

The implementation change  to keep avr-gcc away from the headsman would unclobber the above seven bits.

And thank you for providing a bit of insight as to what that whole change is about!

 

#14

westfw wrote:

Ah.  This sounds like a likely culprit.

Or this:

For basic asm with non-empty assembler string GCC assumes the assembler block does not change any general purpose registers, but it may read or write any globally accessible variable.

From the same page I linked to in #7.

 

#15

it may read or write any globally accessible variable.

 I'm not sure that the code has anything that the compiler would think is a "globally accessible variable."  SFRs?  The example in https://www.avrfreaks.net/commen... is all registers...

 

#16

westfw wrote:

Optiboot apparently gets about 14 bytes bigger with the gcc 7.3.0 compiler that Arduino is distributing...

Hmmph.

 

For the sake of analysis, I created an ASM macro:

#define TICK(name) asm(".global " #name "\n .equ " #name ", .-1b\n1:\n")

I can sprinkle statements like:

if (cmd == FOO) {
    TICK(FOO_START);
    // some code
    TICK(FOO_END);
} else if (cmd == BAR) {
    // more code
    TICK(BAR_END);
}

throughout the code, and the elf file will contain nice absolute symbols whose value is the distance between "ticks", and it should give me an idea where things have gotten bigger.

 

Clever macro.  I wonder if it could be tripped up if you use any avr-libc inline asm functions that use the "1:" label.  I think some of them use the ".L" form of local labels, so maybe there would be no conflict.

 

I can't offer any help on fixing the problem, though I can confirm I've seen similar weirdness before, and not just with avr-gcc 7.3.

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 

#17

skeeve wrote:

IIRC R1 and T of SREG aka cc are implicitly clobbered by design.

 

I think you have a typo there, and meant to say R0, not R1:

"Register r0 may be freely used by your assembler code and need not be restored at the end of your code."

 


 

#18

ralphd wrote:
I think you have a typo there, and meant to say R0, not R1:

"Register r0 may be freely used by your assembler code and need not be restored at the end of your code."

Correction made.

Thank you for the benefit of the doubt.


#19

My guess is that basic asm inside a function is treated like extended asm:

R0 and SREG are implicitly clobbered.

Would the .s from the unsprinkled code work if SREG and R0 were clobbered by imaginary sprinkles?


#20

skeeve wrote:

... by imaginary sprinkles?

I have them on my imaginary sundae when trying to lose weight.


#21

I've uploaded a stripped and cleaned copy of the code/makefile to https://github.com/WestfW-patche...

(It's still all of atmega328 optiboot; i just deleted a lot of the code and conditionals for options and other processors to make things clearer.)
(I hope to make additional changes and provide additional annotated output files to make it easier to look at the problem, but this is a start.)

 

Would the .s from the unsprinkled code work if SREG and R0 were clobbered by imaginary sprinkles?

I actually don't think so.  r0 only seems to get used by inner-loop spm-related stuff, and the code that's getting longer is pretty much a string of if/elseif statements continually resetting SREG.

 

#22

westfw wrote:

 

Would the .s from the unsprinkled code work if SREG and R0 were clobbered by imaginary sprinkles?

I actually don't think so.  r0 only seems to get used by inner-loop spm-related stuff, and the code that's getting longer is pretty much a string of if/elseif statements continually resetting SREG.

I think that that is your answer:

Your blank asms are clobbering SREG and more instructions are needed to fix.


#23

(oops.  I got the sense of the question wrong.   I think the code would work fine if the macros clobbered SREG.)

 

#24

westfw wrote:
I think the code would work fine if the macros clobbered SREG.
The compiler may not think so.  Of course, it's been wrong before :-)