avr-gcc inline assembler concrete documentation?


Is anyone aware of a very good, technical description of avr-gcc inline asm?  It took me only a few months to learn straight asm (i.e. write main.S, compile with avr-gcc), but after years of AVR 8-bit development I still haven't mastered inline asm.

I've read the official docs numerous times:

https://www.nongnu.org/avr-libc/...

gcc.gnu.org/wiki/avr-gcc

 

I've read lots of code examples with inline asm, and a few blog posts such as Jim Eli's:

https://ucexperiment.wordpress.c...

While generally helpful, it's littered with mistakes such as using "=r" with "ldi" (which means the constraint should be "=d" instead).
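To make the "=r" versus "=d" point concrete, here is a minimal sketch (the function name load_42 is made up for illustration). LDI only accepts r16..r31, which is what the "d" constraint selects; with "=r" the compiler may pick r0..r15 and the assembler then rejects the instruction. The non-AVR branch is only there so the snippet also builds on a host compiler.

```c
#include <stdint.h>

/* Hypothetical helper: load the constant 42 into a register with LDI. */
static inline uint8_t load_42(void)
{
    uint8_t value;
#if defined(__AVR__)
    /* "=d" limits the output to r16..r31, the only registers LDI can target. */
    __asm__ ("ldi %0, 42" : "=d" (value));
#else
    value = 42;  /* host fallback so the sketch compiles off-target */
#endif
    return value;
}
```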

 

Much of what I've learned is through slow trial and error looking through disassembled code.  Sometimes I figure out how something works, and despite working consistently across different compiler versions, I wonder if it will change in the future.

Other times it does things (particularly with register allocation and use) that make no sense to me, and may even be bugs.

 

For a specific example of the kind of things I'm trying to do with inline asm, I wrote a blog post a few years ago about lightweight asm wrappers.

http://nerdralph.blogspot.com/20...

It's now four years later, and I still haven't figured a way to get the compiler to use r24 for an input parameter. i.e.

asm volatile ("rcall _foo" :: "r24" (arg));

 

So has anyone found an avr-gcc guru that has written down the secrets of inline assembler?

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


There are two parts to the documentation. Generic stuff about asm() in GCC in the GCC manual and then specific AVR stuff in the AVR-LibC manual. I don't believe there's anything beyond that but most users of inline asm have found that info alone sufficient.


ralphd wrote:
It took me only a few months to learn straight asm (i.e. write main.S, compile with avr-gcc), but after years of AVR 8-bit development I still haven't mastered inline asm

So why not just use proper, "straight" asm?

 

IMO, inline assembler is always a kludge - so why not just do it properly?

 

https://www.avrfreaks.net/commen...

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

awneil wrote:
IMO, inline assembler is always a kludge - so why not just do it properly?

And "straight" asm would be portable across tool chains as well.


Well, assembler is inherently not portable.

 

But keeping it in separate source files, with well-defined interfaces to the HLL, will make it easier to refactor, and leaves the HLL (a lot more) portable ...


awneil wrote:

ralphd wrote:

It took me only a few months to learn straight asm (i.e. write main.S, compile with avr-gcc), but after years of AVR 8-bit development I still haven't mastered inline asm

 

So why not just use proper, "straight" asm?

 

IMO, inline assembler is always a kludge - so why not just do it properly?

 

https://www.avrfreaks.net/commen...

 

Because I want to do things that I still haven't found ways of doing in straight asm.  For example one of the soft UART libraries I wrote years ago is pure asm, interfacing with C through the standard ABI.

http://nerdralph.blogspot.com/20...

 

The tx function uses one parameter (r24), and clobbers two other registers, for a total of three registers used.  Yet the standard ABI requires gcc to reserve r18-27 & Z for the called function.  That usually means the compiler adds a lot of wasted instructions to push & pop registers before and after the function call.

With the updated version I'm working on, using inline asm, the compiler is free to use 8 more of the upper registers.  If there's a way to do this without resorting to inline asm, I'd like to know, because I rather dislike writing inline asm.

 

Another reason I'm using inline asm is so I can use __builtin_avr_delay_cycles().

 


ralphd wrote:
It's now four years later, and I still haven't figured a way to get the compiler to use r24 for an input parameter. i.e.
This might do what you want.

{
    register unsigned char foo asm("r24");
    foo=arg;
    asm ("call _foo");
}

If the compiler rejects the foo definition, methinks it cannot be done.

You'd need a MOV within the asm.

Iluvatar is the better part of Valar.


Yeah, I was going to suggest the same, except maybe tell the compiler that r24 is going to be needed, or the compiler may optimize it away, who knows.

 

inline void eelog(uint8_t data)
{
	register uint8_t my_reg asm ("r24") = data;

	asm volatile (
	"rcall eelog_\n"
	:
	: "r" (my_reg)
	);
}

 

edit: BTW, I found some stuff about inline asm I haven't seen before, so:

https://rn-wissen.de/wiki/index.php/Inline-Assembler_in_avr-gcc (german)

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob_plain;f=gcc/config/avr/constraints.md;hb=HEAD (source code of gcc that defines(?) all existing constraints)

Last Edited: Thu. Jan 30, 2020 - 01:42 AM

skeeve wrote:

ralphd wrote:
It's now four years later, and I still haven't figured a way to get the compiler to use r24 for an input parameter. i.e.
This might do what you want.

{
    register unsigned char foo asm("r24");
    foo=arg;
    asm ("call _foo");
}

If the compiler rejects the foo definition, methinks it cannot be done.

You'd need a MOV within the asm.

 

Thanks, that works (with foo as an input parameter).  At least with avr-gcc 5.3.0, the compiler optimizes away the assignment, and just uses r24 for the argument from the calling code.

 

 


Is concrete documentation written out with hammer and chisel?

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

    asm ("call _foo")

 

Will that prevent the compiler from using the defined ABI in general?    That's potentially useful!  Presumably you can mix inline asm and real assembler to put arguments in whatever registers you want?

 

Can you set up "real assembler" to be further optimized at link time (-flto)?  I'd guess not - the compiler isn't actually doing that optimization at assembly level, is it?

 

 


The generic gcc documentation (extended asm) suggests this will work:

{
    register unsigned char foo asm("r24");
    foo=arg;
    asm ("call _foo" : : "r"(foo));
}

Note that so far as the compiler can tell, the call to _foo will only affect register r0.


westfw wrote:
Will that prevent the compiler from using the defined ABI in general?    That's potentially useful!  Presumably you can mix inline asm and real assembler to put arguments in whatever registers you want?
Yes and yes.

To the compiler, the inline asm statement is not a function call.

The compiler only applies the function call ABI to function calls using C syntax.

 

Occasionally we get questions regarding software interrupts.

Calling an ISR should get the desired effect:

asm ("cli\n"
     "call isrname" );

 

 


El Tangas wrote:

source code of gcc that defines(?) all existing constraints)

https://gcc.gnu.org/viewcvs/gcc/...

Is the avr-gcc source file that defines constraints.  This is more than documented in

https://gcc.gnu.org/onlinedocs/g...

for example, but it's unlikely you ever need something like "Constant 3-byte integer that allows AND without clobber register."

 

Then there are the generic constraints

https://gcc.gnu.org/onlinedocs/g...

From these, you only want to use "r", "i", "s" and "n".  For very special purposes "m" might be helpful, but on AVR a mnemonic depends on the address, hence "m" is not very useful.

 

For the print modifiers, see

https://gcc.gnu.org/viewcvs/gcc/...

 

Apart from constraints and print modifiers, AVR inline asm is like inline asm for any other architecture.

 

 

 
   

 

avrfreaks does not support Opera. Profile inactive.


westfw wrote:

    asm ("call _foo")

 

Will that prevent the compiler from using the defined ABI in general?    That's potentially useful!  Presumably you can mix inline asm and real assembler to put arguments in whatever registers you want?

 

Can you set up "real assembler" to be further optimized at link time (-flto)?  I'd guess not - the compiler isn't actually doing that optimization at assembly level, is it?

 

Avoiding the heavy register cost of the standard ABI was why I originally came up with the lightweight asm call idea.

http://nerdralph.blogspot.com/20...

Now with the trick described above, you can use any register (instead of just x, y, or z) for function parameters (and return data).

 

As for the optimizer, it doesn't do much with asm.  The only thing I've seen it do is tail elimination.  One of my recent picoboot builds was 2 bytes smaller than I expected.  When I looked at the disassembly, "rcall foo", "ret" was optimized to "rjmp foo".

 

With LTO in 5.3.0 and 7.4.0, writing in C is much better than it was prior to 4.9.2 due to whole-program inter-procedural optimization and constant propagation.  I try to avoid writing OO C++ code for AVRs because gcc doesn't devirtualize as well as it could.

 

 

 


SprinterSB wrote:

El Tangas wrote:

source code of gcc that defines(?) all existing constraints)

https://gcc.gnu.org/viewcvs/gcc/...

Is the avr-gcc source file that defines constraints.  This is more than documented in

https://gcc.gnu.org/onlinedocs/g...

for example, but it's unlikely you ever need something like "Constant 3-byte integer that allows AND without clobber register."

 

Then there are the generic constraints

https://gcc.gnu.org/onlinedocs/g...

From these, you only want to use "r", "i", "s" and "n".  For very special purposes "m" might be helpful, but on AVR a mnemonic depends on the address, hence "m" is not very useful.

 

For the print modifiers, see

https://gcc.gnu.org/viewcvs/gcc/...

 

Apart from constraints and print modifiers, AVR inline asm is like inline asm for any other architecture.

 
   

 

I hadn't read those before, though I'm not doing too well figuring out how to use them.  I'm trying to pass the address for rcall as an input parameter, but always get an error.

'asm volatile ("rcall %1" : "+r"(ch) : "m"(pu_tx) : "r18", "r19");'

I tried "i" and got the same error, "Error: garbage at end of line".

I also tried "p", and get the same "garbage at end of line" error.

 

I'm getting the impression that constraints are really meant for implementing the back-end for different targets, and using it for inline asm is more of an afterthought.

 


ralphd wrote:
I hadn't read those before, though I'm not doing too well figuring out how to use them.  I'm trying to pass the address for rcall as an input parameter, but always get an error.

'asm volatile ("rcall %1" : "+r"(ch) : "m"(pu_tx) : "r18", "r19");'

I tried "i" and got the same error, "Error: garbage at end of line".

I also tried "p", and get the same "garbage at end of line" error.

My guess is that it is coming from the assembler.

Use -save-temps to find out for sure.

I've not found constraints for rcall or call.

Constraints for branch instructions include "label".

I think that means you have to put the actual label in the string. pu_tx?

register char ch asm("r24");

Correct?

You will need something like that for pu_tx to know where to put its return value.

 

Once you get rid of the input parameter and make the string "rcall pu_tx"

the result might work.

I infer that pu_tx clobbers r18 and r19.

 

For future-proofing, you might add cc to the clobber list.

Currently avr-gcc implicitly adds SREG's flags to the clobber list.

If the bounty is claimed, that will change.


When beginning on AVR many years ago, I was obsessed with code space & efficiency and wrote several routines in inline assembly.

 

Here's an example that calls functions using the defined ABI. You may find it useful.

 

//Binary 20-Bit to 5 ASCII hex chars (Writes to dest)
//---------------------------------------------------------------
//IP: char *dest       = R25,R24
//IP: uint32_t data    = R23,R22,R21,R20
//OP: char* end_of_str = R25,R24
//---------------------------------------------------------------
char *BIN2AH20 (char *dest, uint32_t data)
{
    __asm__ (	        			\
    " mov	R24,%C[data]"	"\n"\
    " andi	R24,lo8(15)"	"\n"\
    " call	BIN2AHC"		"\n"\
    " st	Z+,R24"			"\n"\
    " movw	R24,R30"		"\n"\

    " mov	R22,%B[data]"	"\n"\
    " call	BIN2AH8"		"\n"\
    " mov	R22,%A[data]"	"\n"\
    " call	BIN2AH8"		"\n"\
    " movw	%[dest],R24"	"\n"\

    :[dest] "+z" (dest)
    :[data] "r" (data)
    :"r24","r25", "r22");

    return dest;
}

I had commented this out years ago but it still compiles OK.

 


skeeve wrote:

ralphd wrote:
I hadn't read those before, though I'm not doing too well figuring out how to use them.  I'm trying to pass the address for rcall as an input parameter, but always get an error.

'asm volatile ("rcall %1" : "+r"(ch) : "m"(pu_tx) : "r18", "r19");'

I tried "i" and got the same error, "Error: garbage at end of line".

I also tried "p", and get the same "garbage at end of line" error.

My guess is that it is coming from the assembler.

Use -save-temps to find out for sure.

I've not found constraints for rcall or call.

Constraints for branch instructions include "label".

I think that means you have to put the actual label in the string. pu_tx?

register char ch asm("r24");

Correct?

You will need something like that for pu_tx to know where to put its return value.

 

Once you get rid of the input parameter and make the string "rcall pu_tx"

the result might work.

I infer that pu_tx clobbers r18 and r19.

 

For future-proofing, you might add cc to the clobber list.

Currently avr-gcc implicitly adds SREG's flags to the clobber list.

If the bounty is claimed, that will change.

 

Thanks for the suggestions, but it seems to be taking me further down the rabbit hole with no exit in sight.

It wasn't evident from the documentation that "label" was a valid constraint, so I had never tried it.  When I just did, I get some strange errors including:

"pu.c:(.text.print+0x12): undefined reference to `r14'"

I added -save-temps to the gcc options (and removed -flto because it generates incomprehensible temps), and the line in the .s file causing the error is, "rcall r14".

Here's the inline asm line I used:

asm volatile ("rcall %1" : "+r"(ch) : "label"(_pu_tx): "r18", "r19");

I also tried "label"(&_pu_tx) in case _pu_tx wasn't treated as the address of the function.

 

I'm going to call it quits, at least for now, as I have it working a slightly different way:

inline void pu_tx(char c)
{
    register char ch asm("r24") = c;
    asm volatile ("" : "+r"(ch) :: "r18", "r19");
    _pu_tx();
}

 

The asm line with no code "" sets up the input param in r24, and declares r18 and r19 clobbered.  Then the "_pu_tx();" gets compiled to rcall _pu_tx, which is a naked function written in C, containing mostly inline asm.

You can find the code (still what I'd call beta) here:

https://github.com/nerdralph/ner...

 

You'll see I call " __builtin_avr_delay_cycles(PUTXWAIT()); " in _pu_tx, which ordinarily would use any available upper register, so I pin some dummy registers to try to ensure the delay uses r18.  Now that I'm writing this, I realize that's a bad choice since __builtin_avr_delay_cycles can't use r19:18 with a long delay that requires the sbiw instruction.  I guess I'll change that to leave one of the 4 upper register pairs free.  It's kind of a serendipitous example of how hard (at least for me) it is to get the compiler to generate the asm code that I want.

 

 


ralphd wrote:
I tried "i" and got the same error, "Error: garbage at end of line".

 

You are right, I tried all kinds of constraints and always get that error. Intermediate files show that the argument gets wrapped in some gs() operator, probably only documented in the head of the developer who invented it, since apparently even the assembler doesn't know what it is.

 

main:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
	ldi r24,lo8(42)
/* #APP */
 ;  21 ".././main.cpp" 1
	call gs(eelog_)

 ;  0 "" 2
/* #NOAPP */
	ldi r30,lo8(95)
	ldi r31,0
.L2:
	ld r24,Z
/* #APP */
 ;  21 ".././main.cpp" 1
	call gs(eelog_)

 ;  0 "" 2
/* #NOAPP */
	sbiw r30,1
	cpi r30,31
	cpc r31,__zero_reg__
	brne .L2
	ldi r25,0
	ldi r24,0
/* epilogue start */
	ret
	.size	main, .-main
	.ident	"GCC: (GNU) 7.3.0"

 

Now, if this intermediate file was .S instead of .s, maybe I could hack a #define gs(x) x inside the inline assembly to get rid of this abomination.


ralphd wrote:
I'm trying to pass the address for rcall as an input parameter, but always get an error.

'asm volatile ("rcall %1" : "+r"(ch) : "m"(pu_tx) : "r18", "r19");'

Suppose you have an asm function that returns an int in R18 but otherwise does not change anything except read / write from memory:

int call (void)
{
    register int r18 asm ("18");
    extern int callee (void);

    asm ("%~call %x1" : "=r" (r18) : "i" (callee) : "memory");
    return r18;
}

I prefer C/C++ prototypes to just dropping the symbol into the asm. The signature of the prototype does not matter, it's just for the record (you wouldn't call it from C anyway, because callee uses a different ABI).

call:
	rcall callee
	movw r24,r18
	ret

This code is for ATmega8 (has MOVW but no CALL). In practice, it's likely you want this as static inline.  In C++, you'd write 'extern "C" int callee();' or 'extern int callee() __asm("callee");'.

 


Last Edited: Fri. Jan 31, 2020 - 04:14 PM

SprinterSB wrote:

ralphd wrote:
I'm trying to pass the address for rcall as an input parameter, but always get an error.

'asm volatile ("rcall %1" : "+r"(ch) : "m"(pu_tx) : "r18", "r19");'

Suppose you have an asm function that returns an int in R18 but otherwise does not change anything except read / write from memory:

int call (void)
{
    register int r18 asm ("18");
    extern int callee (void);

    asm ("%~call %x1" : "=r" (r18) : "i" (callee) : "memory");
    return r18;
}

I prefer C/C++ prototypes to just dropping the symbol into the asm. The signature of the prototype does not matter, it's just for the record (you wouldn't call it from C anyway, because callee uses a different ABI).

call:
	rcall callee
	movw r24,r18
	ret

This code is for ATmega8 (has MOVW but no CALL). In practice, it's likely you want this as static inline.  In C++, you'd write 'extern "C" int callee();' or 'extern int callee() __asm("callee");'.

 


 

Thanks.  Not only does that work, but the resulting code is even a bit smaller.  I don't think I could've come up with that just from reading the official docs.  Only the German doc that El Tangas posted might've gotten me close.

Looking at the differences in the generated code was enlightening.  Because callee (_pu_tx) is naked in my code, I had thought that meant the function was responsible for saving/restoring ALL registers (like an ISR).  However it's now apparent that when calling a naked function, the compiler still assumes it clobbers registers according to the standard ABI (r18-27 & Z).

 

I'm even starting to understand inline asm now (just a bit).  It seems the trick is to stop thinking like a normal software developer and to start thinking like a compiler developer.  It reminds me of meta programming languages (which I don't particularly like) where you have to write code that generates the code you actually want.

 

p.s. I noticed you didn't use volatile after the asm statement.  I'm not clobbering memory, so I think I still need the volatile, right? i.e.:

     asm volatile ("rcall %x1" : "+r"(ch) : "i"(_pu_tx) : "r26", "r27");

 

p.p.s. Kudos to you for being so helpful.  When I made the original post, I thought the chances were about 50/50 for someone having useful knowledge to share about inline asm.


Last Edited: Fri. Jan 31, 2020 - 04:29 PM

Ah, so the secret incantation to remove gs() was that x inserted in the argument. Praise the Machine Spirit!

Yes, it is documented in the German link:

%xn: Since 4.5. Outputs a label without the gs() operand modifier.

 

But c'mon... how were we supposed to find that? This document has important information that can't be found anywhere else, except some hints in the source code. I had never seen it prior to this thread.

 

Note to self: relaaaax... remember: infinite value... breathe...


gs() is supposed to be meaningful to the assembler.

It's supposed to be used on platforms with more than 64K words of program memory.

'Twould be nice if it were harmless on others.

In a one-off, I'd just put the label in the string.

In macros that expand to inline assembly, I've used C's implicit string concatenation.
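A small sketch of that string-concatenation approach, under my own assumptions: the macro names ASM_STR and RCALL_TEMPLATE are invented here, and pu_tx is just the label from earlier in the thread. The label is stringified by the preprocessor and pasted into the template, so no operand (and no gs() wrapper) is involved.

```c
#include <string.h>

/* Hypothetical helpers: two-level stringification so macro arguments expand first. */
#define ASM_STR_(x) #x
#define ASM_STR(x)  ASM_STR_(x)

/* Adjacent string literals are concatenated by the compiler into one template. */
#define RCALL_TEMPLATE(label) "rcall " ASM_STR(label) "\n\t"

/* Returns the generated template so the text can be inspected. */
static const char *rcall_template_pu_tx(void)
{
    return RCALL_TEMPLATE(pu_tx);
}
```

On the target this would be used roughly as asm volatile (RCALL_TEMPLATE(pu_tx) ::: "r18", "r19"); with the clobber list adjusted to whatever the callee really touches.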

Had not heard of %xn before.

 

volatile is not necessary if an output is used.

Clobbers, including memory, do not count as outputs.


ralphd wrote:

The tx function uses one parameter (r24), and clobbers two other registers, for a total of three registers used.  Yet the standard ABI requires gcc to reserve r18-27 & Z for the called function.  That usually means the compiler adds a lot of wasted instructions to push & pop registers before and after the function call.

With the updated version I'm working on, using inline asm, the compiler is free to use 8 more of the upper registers.  If there's a way to do this without resorting to inline asm, I'd like to know, because I rather dislike writing inline asm.

 

Ralph,

 

Have you tried putting all the code in one file and declaring the non-main functions static?   IIRC that has resulted in removing the spurious push/pop operations.

If you want to prevent inlining you can add __attribute__((noinline)) to function definitions (e.g., static void __attribute__((noinline)) foo(int a) { ... }).
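For reference, a sketch of where the attribute goes on a static helper. The names scale_by_three and use_helper are invented for illustration. This only shows the syntax; whether it changes the push/pop traffic around the call depends on the compiler version and options, so check the disassembly.

```c
/* Hypothetical helper: static limits visibility to this file,
 * noinline asks GCC to keep it as a real out-of-line function. */
static int __attribute__((noinline)) scale_by_three(int a)
{
    return a * 3;
}

/* Caller in the same translation unit. */
int use_helper(int x)
{
    return scale_by_three(x) + 1;
}
```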

 

I believe the compiler optimizes up to the translation-unit (i.e., c-file) level only, and declaring functions static guarantees the API does not have to be honored.

 

Matt

Last Edited: Sun. Feb 2, 2020 - 04:17 PM

ralphd wrote:

skeeve wrote:

I think that means you have to put the actual label in the string. pu_tx?

register char ch asm("r24");

Correct?

No.

 

You have either:

  • A label or immediate like int address.  Then use "%~call %x0" with constraint "i".
  • The address in a register. Then use "%!icall" with constraint "z".
  • Or, if Z is part of the ABI and cannot be used for indirect call, then use "push %A0\;push %B0\;ret" with constraint "r".

 

I also tried "label"(&_pu_tx) in case _pu_tx wasn't treated as the address of the function.

Don't try random things.  The likelihood that you hit the jackpot is 0.

 

Constraint "label" is the union of "l", "a", "b" and "e".

I noticed you didn't use volatile after the asm statement.  I'm not clobbering memory, so I think I still need the volatile, right?

"volatile" means volatile: You have some actions that must not be optimized away like SFR access.

 

Memory clobber means you are changing memory, or your asm must not be reordered w.r.t. other memory accesses.

 

Suppose your asm is just computing the sum of 2 registers. Then it need neither be volatile nor memory clobber.  In particular, if you do not use the result, the compiler may optimize out the asm altogether.
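A minimal sketch of that "sum of two registers" case (add_u8 is a made-up name). There is no volatile and no "memory" clobber: the asm only reads and writes its operands, so the compiler is free to move it, or to drop it if the result is unused. The "cc" clobber follows the future-proofing suggestion earlier in the thread. The non-AVR branch is just a host fallback.

```c
#include <stdint.h>

/* Hypothetical helper: 8-bit add done in inline asm. */
static inline uint8_t add_u8(uint8_t a, uint8_t b)
{
#if defined(__AVR__)
    /* "+r": a is both read and written; "r": b is read only.
     * ADD changes SREG, hence the "cc" clobber. */
    __asm__ ("add %0, %1" : "+r" (a) : "r" (b) : "cc");
#else
    a = (uint8_t)(a + b);  /* host fallback with the same wrap-around */
#endif
    return a;
}
```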

 

But c'mon... how were we supposed to find that?

It's in German because back then, it was the only embedded-oriented, AVR-oriented wiki that worked reasonably (mediawiki).  The avrfreaks wiki was a pain in the bum, and even today it's unpopular and deserted.  Anyone who can read .de can pick it up and port it to whatever wiki / site they prefer :-)

 

gs() is supposed to be meaningful to the assembler.

gs() only makes sense when _taking_ an address for later indirect call.  It makes no sense when calling a label. Why would you have / use a linker stub (trampoline) for a _call_?

 

 

declaring functions static guarantees the API does not have to be honored.

Wrong.  The compiler _always_ calls functions according to the ABI.  If a function is inlined, then there is no call-ABI to comply to.  If a function is not inlined, hence called, the compiler always (like in ALWAYS) uses the ABI.  If the compiler pops a clone for a function, then it creates a new function with a different prototype (that's the whole point of cloning and partial inlining), but the compiler passes arguments following the ABI for that new function.

 

If a CALL instruction uses a different interface, then it is always a transparent call, i.e. the compiler does not see a function call at all.  An example is the asm ("%~call..) from above that's just text but not a call in any way.

 


Thanks to @SprinterSB's help, I've pushed a beta of picoUART to github.

https://github.com/nerdralph/ner...

 

While the code works well (i.e. outputs a properly-timed signal), I find all the inline asm makes the code look messy.  Does anyone have suggestions for how to write clean & maintainable code that has lots of inline asm?

 

 

 

 

    __asm__ __volatile__ (
                        "0:                                       \n\t"
    /* read next char */  "lpm      %[c], %a0+                    \n\t" // 3
                          "tst      %[c]                          \n\t" // 1
                          "breq     1f                            \n\t" // 1/2
                          "%~call   %x2                           \n\t" // 3
                          "rjmp     0b                            \n"   // 2
                        "1:                                       \n\t"
                        :
                               "+e" (s),
                           [c] "+r" (c)
                        :
                               "i" (_pu_tx)
                        :
                          "r19", "r24", "r25"
                         );

 

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


That's wrong: LPM always uses Z, not some "e" = pointer register (X, Y, or Z).  And it's likely that _pu_tx is clobbering memory like storing the values read from flash somewhere in RAM.
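A minimal sketch of the Z requirement, assuming a device that has the "LPM Rd, Z+" form (flash_read_inc is an invented name). The pointer gets the "+z" constraint so the compiler must place it in r31:r30, and %a1 prints that operand as Z. The non-AVR branch just reads ordinary memory so the snippet builds on a host.

```c
#include <stdint.h>

/* Hypothetical helper: read one byte through *p and advance *p by one. */
static inline uint8_t flash_read_inc(const char **p)
{
    const char *s = *p;
    uint8_t c;
#if defined(__AVR__)
    /* "+z" pins s to the Z pointer, the only register LPM can read through. */
    __asm__ ("lpm %0, %a1+" : "=r" (c), "+z" (s));
#else
    c = (uint8_t)*s++;  /* host fallback: plain memory read */
#endif
    *p = s;
    return c;
}
```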


SprinterSB wrote:
That's wrong
I was merely formatting the OP's code linked in #29.  All content is his.

 

But that is true of course.  As written, if the compiler selects anything but Z, the assembler will barf.  If it happens to select Z, it will silently work, albeit by accident.

 

The correct constraint should be:

                               "+z" (s),


 

Last Edited: Wed. Feb 5, 2020 - 06:11 PM


awneil wrote:

I found some concrete documentation: ...

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Does that make you shit bricks?


SprinterSB wrote:

That's wrong: LPM always uses Z, not some "e" = pointer register (X, Y, or Z).  And it's likely that _pu_tx is clobbering memory like storing the values read from flash somewhere in RAM.

 

Ugh.  I only added inline asm for prints_P because __flash doesn't work for C++.  It was quite simple in straight C:

void prints_P (const __flash char* s){
    char c;
    while ((c = *s++)) pu_tx(c);   // extra parentheses: the assignment is intentional
}

I might just take it out so it's just the transmit & receive with no helper functions since there's at least 3 different ways I'm aware of (__flash for C, PROGMEM for C++, and Arduino's __FlashStringHelper).

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


ralphd wrote:
It was quite simple in straight C:
It's not really that much more complex in C++ (which is the same as the old PROGMEM way we were doing things for years before __flash existed):

void prints_P (const char* s){
    char c;
    while (c = pgm_read_byte(s++)) pu_tx(c);
}

All that changes is that the *s dereference needs a bit of hand-holding to make it do LPM, not LD.
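A self-contained sketch of that PROGMEM route, for readers who haven't used it. The pu_tx here is a hypothetical stand-in that just collects bytes in a buffer (the real one in this thread is a soft-UART routine), and the non-AVR macros exist only so the snippet compiles on a PC.

```c
#include <stdint.h>
#ifdef __AVR__
#include <avr/pgmspace.h>     /* PROGMEM, pgm_read_byte */
#else
#define PROGMEM               /* host stand-ins, for illustration only */
#define pgm_read_byte(p) (*(const uint8_t *)(p))
#endif

/* Hypothetical stand-in for the transmit routine: stores bytes in a buffer. */
static char tx_buf[16];
static uint8_t tx_len;
static void pu_tx(char c)
{
    if (tx_len < sizeof tx_buf)
        tx_buf[tx_len++] = c;
}

/* Print a NUL-terminated string that lives in flash. */
static void prints_P(const char *s)
{
    char c;
    while ((c = (char)pgm_read_byte(s++)) != 0)
        pu_tx(c);
}

/* The string must be placed in flash explicitly. */
static const char hello[] PROGMEM = "hi";
```

Calling prints_P(hello) then sends 'h' and 'i' through pu_tx. Passing a RAM string to prints_P on an AVR would read the wrong address space, which is the hazard __flash lets the compiler catch in C.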


clawson wrote:

ralphd wrote:

It was quite simple in straight C:

 

It's not really that much more complex in C++ (which is the same as the old PROGMEM way we were doing things for years before __flash existed):

void prints_P (const char* s){
    char c;
    while (c = pgm_read_byte(s++)) pu_tx(c);
}

All that changes is that the *s dereference needs a bit of hand-holding to make it do LPM, not LD.

 

Sure, it's not difficult.  Just not as clean and simple as C.  Using __flash is also more efficient, as (*s++) compiles to "lpm r, Z+", but when using pgm_read_byte(s++) it compiles to "lpm r, Z" and "adiw r, 1".

There may be a way to implement "pgm_read_byte(s++)" with an autoincrement constraint '>', but I've done enough experimenting with inline asm for now.

https://gcc.gnu.org/onlinedocs/g...

 

 

 

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


ralphd wrote:
compiles to "lpm r, Z+", but when using pgm_read_byte(s++) it compiles to "lpm r, Z" and "adiw r, 1".
Does that actually matter? Are you writing something like video generation where every last cycle counts then?


clawson wrote:

where every last cycle counts then?

I *think* the OP's objective is every last byte due to his specialty in itty-bitty boot loaders.  I'm sure he will come along to correct me if I'm wrong.

Letting the smoke out since 1978

 

 

 

 


I didn't like how ugly the code turned out, so I'm re-writing it in C, with minimal use of inline asm.  I've discovered that in the past several years avr-gcc optimizes well enough that I can often get the compiler to generate the asm code I want.  Here's one example:

psave.b2 = f.lo8 & 0x01 ? 1 : 0;

psave.b2 is a bitfield (bit 2), and f.lo8 is a byte.  When I started using avr-gcc almost a decade ago, the above code would compile to several asm instructions.  As of avr-gcc 5 (and maybe even 4.9), it compiles to the optimal bst + bld 2-instruction sequence.

It would be even better if I didn't have to use a union with a bitfield struct, but it's still a lot cleaner than a whole bunch of inline asm.  And I think there are a lot more embedded programmers who are familiar with bitfield structs, so integrating a C version of my softuart code into other software will be simpler.
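A sketch of what such a union might look like. Only the names psave.b2 and f.lo8 come from the post; the type names and the rest of the layout are assumptions for illustration. The anonymous struct needs C11 or GNU extensions, and bit order within the byte is implementation-defined.

```c
#include <stdint.h>

/* Assumed layout: one byte viewed either whole or as eight 1-bit fields. */
union bits8 {
    uint8_t byte;
    struct {
        uint8_t b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
    };
};

/* Assumed container for the source byte. */
struct frame {
    uint8_t lo8;
};

/* Copy bit 0 of f->lo8 into bit 2 of *psave, leaving the other bits alone. */
static inline void copy_bit0_to_bit2(union bits8 *psave, const struct frame *f)
{
    psave->b2 = f->lo8 & 0x01 ? 1 : 0;
}
```

Whether this really becomes bst + bld depends on the compiler version and optimization flags, so it is worth checking the disassembly rather than taking it on faith.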

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


Here's the C version, with only 3 asm statements (or just 2 if you don't count the pinned register).

https://github.com/nerdralph/pic...

 

I had to use asm goto in the tx/rx loops to get the tightest timing.  The closest I could get with pure C was one extra cycle for both loops.
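For readers who haven't met asm goto, here is a minimal sketch of the mechanism only. It is not the code from the repository linked above; the helper name and the bit test are made up for illustration. The asm statement takes no outputs and lists the C labels it may jump to after the clobbers. The non-AVR branch is a stand-in so the snippet compiles on a PC.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if bit 0 of v is set, otherwise 0. */
static inline uint8_t bit0_is_set(uint8_t v)
{
#ifdef __AVR__
    __asm__ goto (
        "sbrs %[v], 0        \n\t"  /* skip the next instruction if bit 0 is set */
        "rjmp %l[is_clear]"         /* otherwise branch to the C label */
        :                           /* no output operands */
        : [v] "r" (v)
        :                           /* no clobbers */
        : is_clear                  /* labels the asm may jump to */
    );
    return 1;
is_clear:
    return 0;
#else
    return (uint8_t)(v & 1);        /* host stand-in */
#endif
}
```

The point of the construct is that the branch target is a C label, so the compiler still lays out the loop while the asm decides the exact skip/jump instructions.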

 

 

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


I've discovered that in the past several years avr-gcc optimizes well enough that I can often get the compiler to generate the asm code I want.

  1. That means a lot, coming from you.  (seriously!)
  2. My main worry, once I've gotten the compiler to generate code that I like, is that it will suddenly change because of some "improvement" to the compiler.  (Optiboot has several issues of the form "doesn't fit when compiled with <new version> of avr-gcc.")

 


westfw wrote:

I've discovered that in the past several years avr-gcc optimizes well enough that I can often get the compiler to generate the asm code I want.

  1. That means a lot, coming from you.  (seriously!)
  2. My main worry, once I've gotten the compiler to generate code that I like, is that it will suddenly change because of some "improvement" to the compiler.  (Optiboot has several issues of the form "doesn't fit when compiled with <new version> of avr-gcc.")

 

 

I also commented on how hard it is to get the compiler to generate the asm code that I want.  And the parts of the code that I really care about are hot spots like the loops in bit-banged communication.  For something as large as even a minimal stk500 bootloader, I wouldn't even try writing it in C.  I'll go so far as to say it would be impossible to write an Arduino-compatible bootloader in C and still have it fit in 256 bytes like I did with picobootArduino.  You might be able to do it with heavy use of inline asm, but that's much worse than just writing it all in asm.

 

I still have concerns that future compiler versions may change the resulting asm.  One of the ways I minimize that risk is trying several different compiler versions (4.9, 5.4, 7.3, 9.2) to confirm that the compiler has been consistent in how it compiles a given segment of code.  I also look at the output asm to make sure it's as good as I would do with hand-tuned asm.  If the compiler generates one extra instruction in a loop, I'll try writing the C in different ways to see if I can find one that compiles to the optimal asm.  I talked about that technique in my Fastest SPI in the West blog post. http://nerdralph.blogspot.com/20...

I know you've seen the post before, but I mentioned it for the benefit of others reading this thread.

 

 

 

 

 

 

 

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


ralphd wrote:
I still have concerns that future compiler versions may change the resulting asm. 
So write the entire thing in Asm so there's no reliance on the C compiler's code generation model at all? For 256 bytes or whatever, what would be the point in writing some bits in C and some in inline Asm? You might as well just put the entire thing into a .S and forget the C bits (which are surely easy to replicate - probably more efficiently).


clawson wrote:

ralphd wrote:
I still have concerns that future compiler versions may change the resulting asm. 
So write the entire thing in Asm so there's no reliance on the C compiler's code generation model at all? For 256 bytes or whatever, what would be the point in writing some bits in C and some in inline Asm? You might as well just put the entire thing into a .S and forget the C bits (which are surely easy to replicate - probably more efficiently).

 

I write open-source code for altruistic reasons.  I want my code to be useful to others, and that often means modifying and incorporating my code into their projects.  Many more people can write good embedded systems code in C than can do it in AVR asm.  So when I can write something in C/C++ that compiles to near-optimal code, I generally do.  That way more people can benefit from it.

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


How does giving them inline asm() written in "gobble-di-gooK" make it "better" than clear Asm? In either case they need to understand AVR assembler to "maintain" it. But in the inline case they have to not only know the asm but the archaic and impenetrable runic scribblings that are avr-gcc's inline syntax too ! How does that actually help?


clawson wrote:

How does giving them inline asm() written in "gobble-di-gooK" make it "better" than clear Asm? In either case they need to understand AVR assembler to "maintain" it. But in the inline case they have to not only know the asm but the archaic and impenetrable runic scribblings that are avr-gcc's inline syntax too ! How does that actually help?

 

I think you either didn't read the whole thread, or you are intentionally making a straw man argument.  Just a few comments up I said, "I didn't like how ugly the code turned out, so I'm re-writing it in C, with minimal use of inline asm."
 

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


clawson wrote:
How does giving them inline asm() written in "gobble-di-gooK" make it "better" than clear Asm? In either case they need to understand AVR assembler to "maintain" it. But in the inline case they have to not only know the asm but the archaic and impenetrable runic scribblings that are avr-gcc's inline syntax too ! How does that actually help?
I've managed to use it, so it cannot be completely impenetrable.  I prefer it to IAR's.

In any case, the skill set needed to read well-commented inline assembly is less than the skill set needed to write it.

Iluvatar is the better part of Valar.