megaAVR mixing assembly with [gcc] C language, confusion (saving & restoring registers).

Total votes: 0

Hello everyone,

 

I am having trouble understanding this concept. I want to sprinkle some assembly at places in C program where it is slow/can be improved and my preferred way is to write assembly routine in separate file and include it in the C program where it sees it as a function.

 

Basically I am doing exactly what they describe in the AT1886 app note.

 

To reiterate what is mentioned in app note.

 

  • Register r2-r17, r28, r29 -  An assembly language routine called from C that uses these registers will need to save and restore the contents of any of these registers it uses.

 

I get it. So I have to push these registers to stack before using and pop them back at the end of the routine.

 

  • Register r18-r27, r30, r31 - The registers are available for any code to use. Also described as "Can freely use" 

 

This is the confusion: how can these registers be free to use? As I understand it, this means there is no need to push/pop them. But how can this be right? What if prior to calling this assembly routine in C program, these registers are in use by C for whatever reason, and I overwrite them in my assembly routine? Wouldn't that affect the execution of the rest of the C program?

 

Can someone explain this in detail in a noob-friendly way? 

Thank you for your time.
 

This topic has a solution.

“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?” - Brian W. Kernighan
“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” - Antoine de Saint-Exupery

Last Edited: Mon. Oct 4, 2021 - 12:00 PM
This reply has been marked as the solution. 
Total votes: 0

Heisen wrote:
What if prior to calling this assembly routine in C program, these registers are in use by C for whatever reason,

The C compiler knows that these registers are free for use by any called function - so it will save & restore them if it needs to.
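
The split between the two register groups can be written down as a small lookup. This is only a sketch for illustration: `avr_reg_class` and the enum names are made up here and are not part of avr-gcc or avr-libc. The table simply encodes the avr-gcc register layout discussed in this thread (r2-r17, r28, r29 call-saved; r18-r27, r30, r31 call-used), plus the special roles of r0 and r1.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not part of avr-gcc or avr-libc. */
typedef enum {
    REG_TMP,        /* r0: scratch register, a called function may clobber it */
    REG_ZERO,       /* r1: compiled code assumes it holds 0; put 0 back before returning */
    REG_CALL_SAVED, /* r2-r17, r28, r29: the called routine must save/restore if it uses them */
    REG_CALL_USED,  /* r18-r27, r30, r31: the called routine may clobber; the caller saves if it cares */
    REG_INVALID     /* not a register number in 0..31 */
} avr_reg_class_t;

/* Classify an AVR register number according to the avr-gcc calling convention. */
avr_reg_class_t avr_reg_class(uint8_t r)
{
    if (r > 31) return REG_INVALID;
    if (r == 0) return REG_TMP;
    if (r == 1) return REG_ZERO;
    if ((r >= 2 && r <= 17) || r == 28 || r == 29) return REG_CALL_SAVED;
    return REG_CALL_USED; /* r18-r27, r30, r31 */
}
```

Read this way, an assembly routine called from C only has to push/pop the registers for which this returns `REG_CALL_SAVED`; for `REG_CALL_USED` registers the saving, if any is needed, is the calling C code's job.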

 

ADDENDUM - References

 

https://www.nongnu.org/avr-libc/user-manual/group__asmdemo.html

 

https://gcc.gnu.org/wiki/avr-gcc#Register_Layout

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
Last Edited: Fri. Oct 1, 2021 - 09:40 AM
Total votes: 0

awneil wrote:
so it will save & restore them if it needs to.

Ah ha, so only when it needs to. Is this why in my small test programs the compiler never saves these call-used registers which I used in assembly routine, because rest of the C program had nothing to do? Compiler is way too smart.

 

Okay, so any registers used in ISRs written in assembly must be saved and restored manually; the compiler will not do any saving and restoring for me. 

Total votes: 0

Heisen wrote:
Is this why in my small test programs the compiler never saves these call-used registers which I used in assembly routine, because rest of the C program had nothing to do?

(almost) certainly.

 

Okay, so any registers used in ISRs written in assembly must be saved and restored manually; the compiler will not do any saving and restoring for me. 

Of course - the compiler has no idea when an interrupt will occur, so cannot save registers before the call.

 

A compiler-generated ISR will save & restore all it needs to.

Total votes: 0

 

About register R0. It is clearly mentioned that it must be saved and restored if it is used in an ISR, but what if it's a simple routine?

 

In the app note they say

 

But on the other site

 

Which one is correct?

 

                                                                                                                                                                                                                           

 

And about register R1, I understand that it needs to stay zero, but some instructions can leave it in a non-zero state. The only ones I could find are

 

MUL

MULS

MULSU

FMUL

FMULS

FMULSU

 

Apart from these, are there any others, if anyone can remember?

So if these instructions are not used, there is no need to set r1 to 0 before returning from the assembly routine.

Last Edited: Fri. Oct 1, 2021 - 11:51 AM
Total votes: 1

This might help you AVR Assembler - 101

Happy Trails,

Mike

JaxCoder.com

Total votes: 0

The words before the table are more clear: R0 is a call-used register,

i.e. a register that a called function may use without saving and restoring.

If the caller needs the value of a call-used register,

it must save said value before the function call.

Moderation in all things. -- ancient proverb

Total votes: 0

mike32217 wrote:
This might help you AVR Assembler - 101

And perhaps a study of the AVR Instruction Set document.

 

I don't quite get it.  First, we need to confirm which toolchain is involved.  Each will have its own code generation model.  Are we to assume and extrapolate from the link given?

This application note describes how to mix both C and assembly code in an
AVRGCC project using Studio 6 IDE. ...

Is it that hard just to state the conditions in the original statement?

 

Heisen wrote:
I want to sprinkle some assembly at places in C program where it is slow/can be improved ...

Aaah, now we are getting somewhere.  Let's quantify at least a few of these situations.  Show examples of such bloated code, and how many cycles you might be saving.  After all, instead of inline assembler you are proposing ...

Heisen wrote:
my preferred way is to write assembly routine in separate file and include it in the C program where it sees it as a function

... with all the attendant save/restore as well as function invocation and cleanup.

 

Yeah, I'd like to see a few of the "bad" sections and the new and improved solutions.  For now, the last register saving can be ignored for proof-of-concept.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Total votes: 0

Inline assembly is a horse of a different number of legs.

As OP seemed to be asking about function calls,

OP got responses about function calls.

 

Absent evidence to the contrary,

I generally assume a GNU toolchain.

Doesn't bite very often.

Total votes: 0

Lee, given that AT1886 mentioned by OP in #1 is entitled "APPLICATION NOTE Atmel AT1886: Mixing Assembly and C with AVRGCC" is there any doubt he's using avr-gcc? 

Total votes: 0

theusch wrote:

This application note describes how to mix both C and assembly code in an
AVRGCC project using Studio 6 IDE. ...

Is it that hard just to state the conditions in the original statement?

My apologies for not mentioning, I have made this mistake multiple times. It's AVR-GCC. Brain keeps forgetting.

 

                                                                                                                                                                                                          

 

theusch wrote:
Show examples of such bloated code, and how many cycles you might be saving.

It's not a lot, but for example, say I save 5 clock cycles from the ISR, and my ISR runs at a fixed interval, indefinitely. In my app, the time it takes to run 32 consecutive ISRs counts as one step. So the actual clocks saved per step are 5 times 32, which is 160 clks. Now the big question is what can be done with these precious 160 clock cycles that we have squeezed out. Can we use them to our advantage? A few things that can be done:

 

  • It allows us to increase the frequency of our ISR.
  • Run some extra code that needs to complete in between 32 ISRs.
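
The arithmetic above can be sketched in a few lines of C. The function names are made up for illustration, and the 16 MHz clock used in the note below is an assumption, not something stated in the post.

```c
#include <stdint.h>

/* Clock cycles freed per "step", where one step is a fixed number of ISR runs. */
uint32_t cycles_saved_per_step(uint32_t cycles_saved_per_isr, uint32_t isrs_per_step)
{
    return cycles_saved_per_isr * isrs_per_step;
}

/* Convert a cycle count to nanoseconds for a given CPU clock in Hz.
   The intermediate product is done in 64 bits to avoid overflow. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t f_cpu_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000u) / f_cpu_hz);
}
```

With the numbers from the post, `cycles_saved_per_step(5, 32)` is 160. At an assumed 16 MHz clock, `cycles_to_ns(160, 16000000)` is 10000, i.e. 10 microseconds gained per step.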

 

                                                                                                                                                                                                          

 

mike32217 wrote:
This might help you AVR Assembler - 101

Thanks, nice article.

 

                                                                                                                                                                                                          

 

skeeve wrote:

If the caller needs the value of a call-used register,

it must save said value before the function call.

Got it. Thanks.

Last Edited: Sat. Oct 2, 2021 - 03:17 PM
Total votes: 0

Heisen wrote:
if I save like say 5 clock cycles from the ISR

If it's an ISR, then you don't have to worry about it being called by C, and the whole call-used / call-saved thing becomes irrelevant ...

 

 

Total votes: 0


awneil wrote:
If it's an ISR, then you don't have to worry about it being called by C, and the whole call-used / call-saved thing becomes irrelevant ...

 

 

I think I have to. Or are we interpreting things differently?

Total votes: 0

If it's an ISR you have to leave all the registers unchanged.

Unless you know that the C compiler doesn't use them because you have told it not to!

 

If it's only one interrupt you need to speed up, perhaps reserve some registers (r2-r6 or something like that) that are used only as register variables holding the "main" interrupt variables. 

Total votes: 0

sparrow2 wrote:

If it's an ISR you have to leave all the registers unchanged.

Unless you know that the C compiler doesn't use them because you have told it not to!

Yes, understood.

 

sparrow2 wrote:
If it's only one interrupt you need to speed up, perhaps reserve some registers (r2-r6 or something like that) that are used only as register variables holding the "main" interrupt variables. 

Oh, clever, but how do I tell the C compiler to, let's say, never use register R2 in the rest of the C program? Can we even do that?

Total votes: 0

It is

 

Heisen wrote:
clever

 

to program it without high-level language and get rid of such artificial problems.

Last Edited: Sat. Oct 2, 2021 - 04:24 PM
Total votes: 0

+1

 

#15

Yes, I know that you can, but I can never remember how (it's somewhere in the compiler docs).

Last Edited: Sat. Oct 2, 2021 - 05:01 PM
Total votes: 0

to program it without high-level language and get rid of such artificial problems.

To your point, I sometimes wonder how to tell the compiler to keep a few important/most used variables in registers (fast access), versus putting them in ram (much slower access). 

For example, if you are making a motor controller, you may want to keep RPM in a register, while hardly used ambient temperature is fine in ram. I keep meaning to look into this.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Total votes: 0

avrcandies wrote:
how to tell the compiler

 

I can not tell you that.

But I can tell you there are no such problems with assembler.

Everything is in your own hands.

If you want to benefit from compilers, you have to accept their disadvantages.

No profit without costs.

Last Edited: Sat. Oct 2, 2021 - 06:08 PM
Total votes: 0

I think the compiler switch is -ffixed-<reg>, such as -ffixed-r16, to instruct it not to use r16 in code generation, but beware of lib code! 

Total votes: 0

Another benefit of ASM is that if you have a couple of 16-bit variables that you share between "main" and an interrupt, access to them can be atomic without problems (use MOVW in main).   

Total votes: 0

ASM is more fancy.

I always have rSREG, rz, ro (zero, one) rGF (or GPIOR) for flags, r0 and r1 for MUL... XL and XH sometimes for fast buffering when happen on Interrupt...
Each reg in ASM is precious and valuable. They are 32 plus GPIOR, plus /parts of/ unused IO registers.

 

ASM code reducing in 'Tiny13 (other devices are not worthy) is always an adventure. Try to reduce from 98 to 88%, with a chance to end at 82%, yes, Tiny does deserve such efforts, and btw. it is the best brain-teaser of them all.

Last Edited: Sun. Oct 3, 2021 - 06:42 AM
Total votes: 0

Heisen wrote:
I think I have to.

Yes, of course an ISR always has to save everything it uses - unless you can be sure that none of the rest of the system ever uses particular registers.

 

My point is that you cannot rely on the compiler doing that - because the compiler never calls an ISR.

 

Which also means that the compiler's parameter-passing and value return stuff is irrelevant.

 

So this does make ISRs an ideal candidate for ASM in an otherwise C system.

 

Total votes: 1

awneil wrote:

Yes, of course an ISR always has to save everything it uses - unless you can be sure that none of the rest of the system ever uses particular registers.

 

My point is that you cannot rely on the compiler doing that - because the compiler never calls an ISR.

 

...

 

 

So this does make ISRs an ideal candidate for ASM in an otherwise C system.

I tried to start a skirmish in the C vs ASM War earlier in the thread, but the enemy turned tail and did not provide examples of this "slow/can be improved" code, along with the "better" ASM solution(s).

 

But I must take exception to awneil rolling out the old claims, as I have in the past every time I've noticed them made here.

 

[side note/full disclosure:  I've only done this type of detailed work on "traditional" AVR8 models.  There may well be Cortex or other architectures that are different, but I've seen threads about how it may well be impractical to have normal people program them in assembler?]

 

I'll try to dig up the thread I'm remembering about an ISR "challenge" that 'you can't do this in C'.  Now, during my productive period my toolchain of choice was CodeVisionAVR.  If you let the toolchain do its work:

 

-- "Smart" ISR will only save and restore what is needed.

-- "Smart" ISR will only save and restore SREG if affected.

-- Most low GP registers are indeed available for "stay in register" duty.

 

And a quick re-read of the posts above poking at "you can't do that in C" touch upon these points.

 

Sure there is a place for crafting stuff 'cause you are smarter than the compiler.  But IME it doesn't come up every day, and it isn't for normal punters.  While you try to come up with the counter examples, I'll try to hunt up the remembered thread...no sense re-inventing.

 

From 2005, one example from me, but if you look at my posts in the thread you will see that I've been objecting to bloat characterizations for over 15 years, with the same reasoning:

https://www.avrfreaks.net/forum/...

 

Also in 2005, before GCC thought that ISRs were important, there was a to-do about null ISR and how GCC made a bunch of instructions.  Is this the basis for y'all's claims?  Maybe you used the wrong toolchain, as shown by the relevant C rebuttal from me:

https://www.avrfreaks.net/forum/...

https://www.avrfreaks.net/commen...

The follow-on example, non-null, was shown to have serious flaws in the "better" ASM implementation.  It does give a link to the thread I was thinking of...

 

https://www.avrfreaks.net/forum/...

The first "challenge" that you-can't-do-this-as-well-in-C is at #20 and #21.

 

Then, I outlined cases where IMO/IME you CAN'T do it as well in C:

theusch wrote:

Quote:

But wait, there must be a better code example. Would any of the ASM gurus help me, please? There MUST be a really good reason to use ASM!

 

If you would have used the rotate method for your bit reversal (or other algorithms) it would be nearly impossible to duplicate in C, as C has no concept of rotating registers or the CARRY bit. In practice, I've only run into it in the case of bit-reversal (as is your example) and one case of CRC generation.

I >>will<< drop into #asm from time to time in CV programs, not necessarily to save the last cycle or word, but rather to avoid building a very complex C expression. An example of mine would also involve those darned hw layout people: a 2x6 multiplexed keypad, with the 6 row bits scattered all around the AVR. I found that the resulting #asm code was more straightforward to read & understand than the complex C expression with masking & shifting of each bit capture. I believe that most of the C compilers have facilities for inline ASM, but I'm only sure of CV.

Another example might be (say) 24-bit arithmetic, where 16 bits just ain't wide enough and 32 bits just ain't fast enough (or takes too many registers, and results in too many register "spills" to SRAM). Or arithmetic wider than the 32-bits or whatever your compiler supports.
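
For the bit-reversal case mentioned in the quote above, the closest portable C gets to the rotate-through-carry method is a shift-and-OR loop like the sketch below. The name `reverse8` is made up for illustration. Whether a given compiler turns this into the tight LSR/ROL pair an assembly programmer would write is not guaranteed, which is the point the quote makes.

```c
#include <stdint.h>

/* Reverse the bit order of a byte: bit 0 becomes bit 7, bit 1 becomes bit 6, and so on.
   Each pass moves the lowest bit of x into the lowest bit of r, mimicking
   "shift right into carry, rotate left from carry". */
uint8_t reverse8(uint8_t x)
{
    uint8_t r = 0;
    for (uint8_t i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (x & 1u));
        x >>= 1;
    }
    return r;
}
```

For example, `reverse8(0x01)` gives `0x80` and `reverse8(0x12)` gives `0x48`.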

 

...and then went on with my quadrature example; no bloat in sight.

 

There is another "challenge" thread somewhere, but I want to leave you with another piece of the thread above, about thinking in machine language and programming in C:

theusch wrote:

But, let's put this in context: >>You<<, as the programmer, looked at the expression and thought: "I only need the top 2 bits, and that is stored at address+3. I'll grab the value at that address, and extract the top 2 bits & put them into the low 2 bits of a register, then store the result." >>I<<, as the programmer, can have EXACTLY THE SAME THOUGHT PROCESSES. We can be writing in two different languages, but C does not prevent the programmer from doing what needs to be done. >>You<<, as the programmer often "know" what is in a register pair or 16-bit storage location. >>I<< can also know that. If my variable is a signed 16-bit value, an int, I can know, just as you do, that at this particular stage of computation there is a small positive integer stored in that location. The ASM programmer would just grab the low byte & use it. The C programmer can give the compiler writer a chance with a well-placed cast:

uchar2 = (unsigned char)sint * uchar1;

will generate smaller and faster code than the vanilla

uchar2 = sint * uchar1;

where the compiler would not only be forced to do the 16-bit multiply, but handle mixed signed vs. unsigned as well. You as the ASM programmer make those decisions in every code sequence; the C programmer can have exactly the same thought processes.
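
The two expressions in the quote can be put side by side as below. The function names are made up for illustration; the variable names follow the quote. Because the result is stored in an 8-bit variable, and the low byte of a product depends only on the low bytes of its operands, both forms store the same value. The cast only tells the compiler that the upper byte of `sint` does not matter. Code size and speed are not shown here and depend on the compiler.

```c
#include <stdint.h>

/* "Vanilla" form: the signed 16-bit operand takes part in the multiply as-is. */
uint8_t mul_plain(int16_t sint, uint8_t uchar1)
{
    return (uint8_t)(sint * uchar1);
}

/* Narrowed form: only the low byte of sint is used, as an unsigned value. */
uint8_t mul_narrowed(int16_t sint, uint8_t uchar1)
{
    return (uint8_t)((unsigned char)sint * uchar1);
}
```

For a small positive `sint`, as in the quote, both `mul_plain(7, 9)` and `mul_narrowed(7, 9)` give 63.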

 

Last Edited: Sun. Oct 3, 2021 - 12:56 PM
Total votes: 0

For people that don't know the CV compiler:

It's a compiler made for the AVR, so it's very good at things that are special to the AVR; you can place variables in (the low) registers.

It uses Z (r31:r30) as its "main" 16-bit register (often good, but it can get crowded when pointers are used).

When things get complex, the compiler is far from the best.

 

GCC, by comparison:

It is not easy to use registers directly from C.

It uses R25:R24 as its "main" 16-bit register.

It is a better compiler for complex things.

 

Total votes: 0

Got it, crystal clear now.

Last Edited: Sun. Oct 3, 2021 - 10:01 PM
Total votes: 0

theusch wrote:
I tried to start a skirmish in the C vs ASM War earlier in the thread, but the enemy turned tail and did not provide examples of this "slow/can be improved" code, along with the "better" ASM solution(s).

I am just scratching the surface of ASM. What I say might not translate into real world exactly. :P But working on it.

 

theusch wrote:

-- "Smart" ISR will only save and restore what is needed.

-- "Smart" ISR will only save and restore SREG if affected.

Okay, this is interesting.

 

In AVR-GCC, suppose we have an interrupt. It could be any interrupt, like:

 

ISR(TIMER1_COMPA_vect)
{

}

The compiler generates this, even though the ISR is empty

00000040  PUSH R1		Push register on stack
00000041  PUSH R0		Push register on stack
00000042  IN R0,0x3F		In from I/O location
00000043  PUSH R0		Push register on stack
00000044  CLR R1		Clear Register 

00000045  POP R0		Pop register from stack
00000046  OUT 0x3F,R0		Out to I/O location
00000047  POP R0		Pop register from stack
00000048  POP R1		Pop register from stack
00000049  RETI 		Interrupt return 

0x3F is SREG.

 

Even if we assume that SREG register will change in ISR, in ASM we can simply do.

TIMER1_COMPA_vect:

        push r16
        in r16, 0x3F

        out 0x3F, r16
        pop r16
        reti

This is much better. We saved 5 instructions right there (about 9 clks, since PUSH and POP each take 2 cycles on a megaAVR).

Now I see the level of control in ASM is astonishing.

 

If we don't change R0, R1 in the ISR, there is no need to back them up. But the compiler assumes that in an ISR, registers R1, R0 and SREG >>will<< change, so it must save/restore them every single time.

Total votes: 0

grohote wrote:
ASM is more fancy.

100% agree.

 

grohote wrote:
it is the best brain-teaser of them all.

I just see it as a game.

Total votes: 0

If we don't change R0, R1 in the ISR, there is no need to back them up.

The HARDWARE may change them in some cases as the result of MUL is stored there.

 

I would not use them at all, I think R1 is expected to be zero pretty much at all times with the compiler to save time in some operation.

John Samperi

Ampertronics Pty. Ltd.

https://www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

Total votes: 0

The compiler generates this, even though the ISR is empty

Giving more smarts to the compiler is never-ending evolution.  It should just generate an reti  in this case.  Perhaps some optimization level does.

Talking to a compiler can be like talking to a little kid-- it may turn out great or very surprising!

 

If you program B = 2*A*A*A  - A*(A*A + A*A)

Does it set up a lengthy calculation, or does it just set  B = 0 ??  Do you want to be able tell it which one you prefer?
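
To make the question concrete, here is the expression written out for an unsigned 32-bit type. The function names are made up for illustration. With unsigned integers the arithmetic wraps modulo 2^32, so the expression is exactly zero for every input and an optimizing compiler is allowed to replace it with a constant; whether it actually does depends on the compiler and optimization level. With floating-point types a compiler generally cannot make that substitution by default, because rounding and overflow can change the result.

```c
#include <stdint.h>

/* The expression from the post, evaluated the long way in unsigned arithmetic. */
uint32_t b_long_way(uint32_t a)
{
    return 2u * a * a * a - a * (a * a + a * a);
}

/* What an optimizer may reduce it to, since 2a^3 - a(2a^2) = 0 modulo 2^32. */
uint32_t b_folded(uint32_t a)
{
    (void)a; /* the input no longer matters */
    return 0u;
}
```

Both functions return 0 for any argument, for example `b_long_way(3)` and `b_long_way(0xFFFFFFFF)`.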

Last Edited: Mon. Oct 4, 2021 - 12:24 AM
Total votes: 0

The compiler generates this, even though the ISR is empty

[why couldn't I hit Quote under that post?]

 

Old news.  I gave links with the discussion of the null ISR.  GCC has been at least semi-smart for some time.

 

I want to sprinkle some assembly at places in C program where it is slow/can be improved ...

I still want some examples of that.  While examining and critiquing generated code at the low level is indeed vital, you cannot just do a peephole without full context.

 

Tell version and build options and target model.

 

 

 

Total votes: 0

Hi all. Signed up to share some fun GCC outputs that are fresh in my mind from the past few days. If there are clean solutions to any of these I'm all ears. 5.4.0 with -O3. I've generated these with godbolt.org but get the same results locally.

ISR(TIMER0_COMPA_vect)
{
    GPIOR0 |= 1;
}

ISR(TIMER0_COMPB_vect)
{
    GPIOR0 |= 1;
}
__vector_15:
        push r1
        push r0
        in r0,__SREG__
        push r0
        clr __zero_reg__
        sbi 0x1e,0
        pop r0
        out __SREG__,r0
        pop r0
        pop r1
        reti
__vector_14:
        push r1
        push r0
        in r0,__SREG__
        push r0
        clr __zero_reg__
        push r18
        push r19
        push r20
        push r21
        push r22
        push r23
        push r24
        push r25
        push r26
        push r27
        push r30
        push r31
        call __vector_15
        pop r31
        pop r30
        pop r27
        pop r26
        pop r25
        pop r24
        pop r23
        pop r22
        pop r21
        pop r20
        pop r19
        pop r18
        pop r0
        out __SREG__,r0
        pop r0
        pop r1
        reti
    

The content of the second interrupt, generated presumably in the name of code reuse, is quite hilarious. ISR_NAKED lets me drop the fat but I'd like something better.

 

Generally I find the compiler struggles when mixing different sized operands together.

 

extern uint32_t A;
void T(uint16_t X, uint16_t Y)
{
    A = ((uint32_t)X << 16) + Y;
}
T(unsigned int, unsigned int):
        ldi r26,0
        ldi r27,0
        movw r26,r24
        clr r25
        clr r24
        add r24,r22
        adc r25,r23
        adc r26,__zero_reg__
        adc r27,__zero_reg__
        sts A,r24
        sts A+1,r25
        sts A+2,r26
        sts A+3,r27
        ret

Bitwise or is worse but the correct code is produced with a union structure. As a bonus an inline Make64/32/16 function makes this operation explicit.
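
The post does not show its union, so the following is only a guess at what such a helper might look like; `u32_parts_t` and `make32` are made-up names. It relies on reading a union member other than the one last written, which GCC supports in C. The field order assumes a little-endian target such as AVR, with a preprocessor check so the sketch also gives the right answer on a big-endian host.

```c
#include <stdint.h>

/* Hypothetical helper type: a 32-bit value viewed as two 16-bit halves. */
typedef union {
    uint32_t u32;
    struct {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
        uint16_t hi, lo;   /* big-endian host: high half comes first in memory */
#else
        uint16_t lo, hi;   /* little-endian, as on AVR: low half comes first */
#endif
    } w;
} u32_parts_t;

/* Build a 32-bit value from its halves without any shifts or adds. */
uint32_t make32(uint16_t hi, uint16_t lo)
{
    u32_parts_t p;
    p.w.lo = lo;
    p.w.hi = hi;
    return p.u32;
}
```

`make32(X, Y)` gives the same value as `((uint32_t)X << 16) + Y`; for example `make32(0x1234, 0x5678)` is `0x12345678`. In a real project this would probably be a `static inline` function in a header.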

 

uint8_t T(uint16_t X)
{
    return X >> 6;
}
T(unsigned int):
        clr __tmp_reg__
        lsl r24
        rol r25
        rol __tmp_reg__
        lsl r24
        rol r25
        rol __tmp_reg__
        mov r24,r25
        mov r25,__tmp_reg__
        ret

Change the return type to 16 bits and the code generated is identical. Multiply seems to have more specialisation over the set of operand sizes but there's room for improvement. It generates optimal code for an 8x16=16 but the instant you want an expanding 8x16=32 it will outsource to __umulhisi3 even though it's the exact same work just don't throw away that top byte.

 

In my current project one of my variables needs 20 bits of range which is a nightmare for C. How do I store it efficiently? Then, how do I 8x24 and 16x24 multiply without driving myself mad with inline assembly? Zero issues in asm.
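
One portable way to approach the storage question is a 3-byte container with pack/unpack helpers, as sketched below. All names here (`u24_t`, `u24_pack`, `u24_unpack`, `mul_8x24`) are made up for illustration. This says nothing about how tight the generated AVR code is, which is the poster's actual complaint; it only shows that the operation itself can be expressed. As far as I know, avr-gcc 4.7 and later also offer a non-standard `__uint24` type, which is not used here so that the sketch stays portable.

```c
#include <stdint.h>

/* Hypothetical 3-byte container for values of up to 24 bits (e.g. a 20-bit quantity). */
typedef struct {
    uint8_t b[3];   /* b[0] is the least significant byte */
} u24_t;

/* Store the low 24 bits of v; anything above bit 23 is dropped. */
u24_t u24_pack(uint32_t v)
{
    u24_t r;
    r.b[0] = (uint8_t)v;
    r.b[1] = (uint8_t)(v >> 8);
    r.b[2] = (uint8_t)(v >> 16);
    return r;
}

/* Recover the stored value as a 32-bit integer. */
uint32_t u24_unpack(u24_t x)
{
    return (uint32_t)x.b[0] | ((uint32_t)x.b[1] << 8) | ((uint32_t)x.b[2] << 16);
}

/* 8 x 24 -> 32-bit product; it cannot overflow because 255 * (2^24 - 1) < 2^32. */
uint32_t mul_8x24(uint8_t a, u24_t x)
{
    return (uint32_t)a * u24_unpack(x);
}
```

A 20-bit value such as `0xABCDE` survives a round trip through `u24_pack` and `u24_unpack` unchanged.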

 

I find it generally difficult to express in C specialty numerical operations. It's easy in asm for example to do an 8x16 multiply where I want the upper 16 bits as the output.
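
In C, the operation described above can at least be stated as follows; `mul_8x16_hi16` is a made-up name. The 8x16 product has at most 24 bits, so the "upper 16 bits" are bits 23..8 of the product. Whether avr-gcc turns this into the short MUL sequence one would write by hand, or calls a library multiply, is not something this sketch can promise.

```c
#include <stdint.h>

/* Multiply an 8-bit value by a 16-bit value and return bits 23..8 of the
   24-bit product, i.e. the product scaled down by 256. */
uint16_t mul_8x16_hi16(uint8_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * b) >> 8);
}
```

For example, `mul_8x16_hi16(128, 512)` is 256, and `mul_8x16_hi16(255, 65535)` is `0xFEFF`.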

 

My final example is simply copying a 16 bit IO register into a 16 bit memory location.

uint16_t T;
ISR (TIMER1_CAPT_vect)
{
    T = ICR1;
}
__vector_10:
        push r1
        push r0
        in r0,__SREG__
        push r0
        clr __zero_reg__
        push r24
        push r25
        lds r24,134
        lds r25,134+1
        sts T+1,r25
        sts T,r24
        pop r25
        pop r24
        pop r0
        out __SREG__,r0
        pop r0
        pop r1
        reti

This does not require two temporary registers and yet there they are.

 

That's all for now!


I'm not sure that AVR GCC has ever made good -O3 code (perhaps it has changed, but it doesn't look like it).

 

In general use -O2 or -Os  (and g for debugging)


awneil wrote:

So this does make ISRs an ideal candidate for ASM in an otherwise C system.

theusch wrote:
But I must take exception to awneil rolling out the old claims

 

What "old claims" ?

 

I'm simply saying that  if  you're going to mix some assembler in your mostly-C system, then ISRs are an ideal place to do that - because you then don't have the issues of the call interface with C.

 

Nothing more.

 

 

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
Last Edited: Mon. Oct 4, 2021 - 09:05 AM

 

Heisen wrote:
The compiler generates this, even though the ISR is empty
if you are using avr-gcc 5.4.0 (the one in Studio 7) that is true. But try a later avr-gcc and see what happens. For example:

 

5.4.0:

 

11.1.0:

(in fact if you go back through Godbolt versions you'll see this in their 9.x too - I think it was actually introduced at gcc 8 in fact).

 

What is now generated are special asm macros. These do need a later Binutils so that the avr-as assembler understands them, but basically you will end up with a virtually empty ISR, as this great fix by Georg-Johann Lay ("SprinterSB") no longer generates that "preserve SREG, CLR R1" stuff every time, only when it is actually needed.

 

EDIT: confirm fixed in V8 : https://gcc.gnu.org/bugzilla/sho... if you follow the links you arrive at: https://sourceware.org/bugzilla/... which explains the __gcc_isr N macros.

Last Edited: Mon. Oct 4, 2021 - 09:16 AM

It's easy in asm for example to do an 8x16 multiply where I want the upper 16 bits as the output

I'd hope you could add some shifting and it would be compiled smartly to just grab the bytes (without actually shifting!)

It should also be smart enough not to waste any steps saving the low 8 bits in the first place. 

 

More specifically, sometimes you will have limited ranges, such as 10-bit ADC values (stored in 16 bits), and you may multiply by a value known to be limited to 12 bits (0-4095).

In asm, knowing this, you can do a truncated 16-bit x 16-bit multiply (10b x 12b --> 22b) with a 3-byte result, whereas generically (16b x 16b) there are extra steps to generate a 4-byte result. 

This is not to blame the compiler, it has no way of knowing these values are of a limited range in the variable's width.

On the other hand, this shortcutting can be dangerous when someone comes along later with a new 14 bit ADC & complains the ADC is "working strange".
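One way to make that hidden assumption visible in C is to write the ranges down as constants and let the build fail when they stop fitting. This is only a sketch: ADC_BITS, GAIN_BITS and scale_adc are hypothetical names, and the C version still does a generic widening multiply rather than the hand-trimmed asm one.

```c
#include <stdint.h>

#define ADC_BITS   10   /* assumed ADC resolution */
#define GAIN_BITS  12   /* assumed width of the scale factor */

/* Fails the build if a wider ADC would overflow the 3-byte result. */
_Static_assert(ADC_BITS + GAIN_BITS <= 24, "product no longer fits in 3 bytes");

/* Caller guarantees adc < 2^ADC_BITS and gain < 2^GAIN_BITS. */
static inline uint32_t scale_adc(uint16_t adc, uint16_t gain)
{
    return (uint32_t)adc * gain;   /* at most 22 significant bits */
}
```

With the values above the largest product is 1023 * 4095 = 4189185, which is below 2^22. Changing ADC_BITS to 14 still builds here (26 bits would not), so the limit in the assertion has to match whatever the asm routine really assumes.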

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Mon. Oct 4, 2021 - 09:55 AM

awneil wrote:
What "old claims" ?

I outlined them, and pointed with links to the old discussions.  Yes, there are places for ASM in an AVR8 app.  Yes, they can be important to the app.  Yes, GCC does a marvelous optimizing job with complex stuff.  Yes, GCC has lost a lot of weight and the ISRs are much skinnier.

 

So far, the Q.E.D. examples given in the thread are all of the obese ISRs in GCC.  Many of the others have to do with reserving registers.  So if you change the title to reflect not "C language" but rather "with GCC's version of the C language for AVR targets", then I can sit quietly in the corner.

 

Now, regarding GCC's optimizer and handling of complex code:  I've thought about this some years back, after having written reams of production AVR8 code.   I'd say most are complex and detailed apps, with lots of stuff (ADC, comms, sensors, output devices, displays, ...) crammed into the chip.

 

But looking at my code itself, it isn't complex.  Nearly all is pretty straightforward.  Most ISRs will fit on a screen.  Data structures are quite simple (after all, how complex can you jam into an AVR8-class chip even the bigger ones?), so no real problems requiring the optimizing compiler to simplify a page-long expression.

 

Yes, if I would have had the ISR bloat outlined by some of the complainants above I would have been a campaigner for machine language ISRs as well.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


theusch wrote:
if you change the title to reflect not "C language" but rather "with GCC's version of the C language for AVR targets", then I can sit quietly in the corner.
I put a "[gcc]" in the thread title - hope that works for ya ?


Complex is many things, but one of the good things about C is how flexible pointers are; therefore a good C program is both small and (relatively) efficient.

I remember once (about 15 years back) I downloaded CV to do some tests, and even a function to swap two numbers made some ugly bloated code (perhaps it's better now, I don't know). 

 

One of the problems with C (any C) over ASM is that you don't know the generated code, and if you add variables, or even just declare the same variables in a new order, the speed of the program might change.

And if a new version of the compiler comes along, perhaps the average speed increases, but perhaps it gets slower exactly where you need the speed. 


awneil wrote:

What "old claims" ?

theusch wrote:
I outlined them,

But I made no claims


sparrow2 wrote:
One of the problems with C (any C) over ASM is that you don't know the generated code

What C compiler are you using that does not show the generated code?  Every one I have used does just that!

 

Keys to wealth:

Invest for cash flow, not capital gains!

Wealth is attracted, not chased! 

Income is proportional to how many you serve!

Lets go Brandon!


ki0bk wrote:

sparrow2 wrote:

One of the problems with C (any C) over ASM is that you don't know the generated code

 

What C compiler are you using that does not show the generated code?  

I think the point is that you don't know it in advance - you have to go looking for it after the fact.

 

 


I think the point is that you don't know it in advance - you have to go looking for it after the fact.

MOST of the time what you don't know doesn't hurt.  Oh it is swapping 2 bytes that it doesn't need to & my hot water calculation is then taking an extra 200ns.

So you live a bit dumb, happy, and generally satisfied...works out most of the time. 

On the other hand, lose a bit here, then lose some more over there, and some over here & you wonder why your graphics 3d game update seems "sluggish".     


For #41, it's explained in #42. Yes, you can see the generated code, but how does that help you? It worked yesterday, and the code you changed today has nothing to do with the code that is now failing (not because the code is wrong, but because the timing changed).

 

Some cases are easier to understand than others, like an array of int changing size from 30 to 35 (getting bigger than 64 bytes).

 

And about OP's problem, here is a case where a problem like this can be avoided:

 

A volatile uint variable counter has the code counter++; in a timer ISR

And the code in main:

 

  if (counter > 5000) {
     counter = 0;
     /* do "time gone" code */
  }

 

This is of course a no-go, and making it atomic costs time. By keeping counter in registers you can avoid the problem (at least in ASM, and perhaps in C, but it depends on how the code is generated; none of the compilers guarantee that it will work). 
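For completeness, here is roughly what "making it atomic" looks like in C with avr-libc, as a sketch only. ATOMIC_BLOCK and ATOMIC_RESTORESTATE come from avr-libc's <util/atomic.h>; the function name time_gone and the empty host stand-ins in the #else branch are my own additions so the fragment also compiles off-target, where they protect nothing.

```c
#include <stdint.h>

#ifdef __AVR__
#include <util/atomic.h>          /* avr-libc: ATOMIC_BLOCK, ATOMIC_RESTORESTATE */
#else
/* Host stand-ins so the sketch compiles off-target; they protect nothing. */
#define ATOMIC_BLOCK(mode)
#define ATOMIC_RESTORESTATE
#endif

volatile uint16_t counter;        /* incremented in the timer ISR */

/* Returns 1 and resets the counter once more than `limit` ticks have passed. */
static uint8_t time_gone(uint16_t limit)
{
    uint8_t expired = 0;
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        /* Interrupts are off here, so the ISR cannot run between the two
           byte accesses of the 16-bit compare or the 16-bit store. */
        if (counter > limit) {
            counter = 0;
            expired = 1;
        }
    }
    return expired;
}
```

The cost is the few cycles needed to save SREG, disable interrupts and restore SREG around the test, which is the "costs time" part of the trade-off.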

 

 

 

  


sparrow2 wrote:
A volatile uint variable counter has the code counter++; in a timer ISR

And the code in main:

 

  if (counter > 5000) {
     counter = 0;
     /* do "time gone" code */
  }

 

This is of course a no-go, and making it atomic costs time. By keeping counter in registers you can avoid the problem (at least in ASM, and perhaps in C, but it depends on how the code is generated; none of the compilers guarantee that it will work).

A better choice is for main to leave counter alone and have the ISR set a flag.
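A sketch of that flag approach, under the assumption that the flag is a single byte (so main reads and writes it in one access on AVR). The names TICKS_PER_EVENT, timer_tick and take_time_gone are made up for the illustration; on a real target the body of timer_tick would sit inside the timer ISR.

```c
#include <stdint.h>

#define TICKS_PER_EVENT 5000u      /* assumed period, taken from the example above */

static uint16_t counter;           /* touched only by the ISR */
volatile uint8_t time_gone_flag;   /* one byte, shared between ISR and main */

/* Body of the timer ISR; on a real target this would sit inside ISR(...). */
static void timer_tick(void)
{
    if (++counter > TICKS_PER_EVENT) {
        counter = 0;
        time_gone_flag = 1;
    }
}

/* Called from main: returns 1 once per elapsed period. */
static uint8_t take_time_gone(void)
{
    if (!time_gone_flag)
        return 0;
    time_gone_flag = 0;
    return 1;
}
```

Main never touches the 16-bit counter, so there is no multi-byte access to protect and no need to disable interrupts.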

Keeping counter in registers would make using a library difficult.

Moderation in all things. -- ancient proverb


This was just an example; let's say main reads a counter that counts in an interrupt.

 

add:

And I wrote it that way because there actually are people that don't see that the code I wrote is wrong.

Last Edited: Tue. Oct 5, 2021 - 04:00 PM

Certain things seem so easy to understand, after you get the idea.

One of the most complicated things, which took some of my time explaining to a girlfriend who became a systems analyst, was convincing her that "the data contained in a certain memory address could also be an address for other data"... 

 

If you think in assembly it becomes much simpler.

 

Each register is a small cup in the kitchen counter, it can hold any cooking ingredient, spices, liquid, grains, etc.

For the ingredients that you use more frequently, like flour, salt, sugar, oil, you could reserve some cups, even write those names on the cups.

The advantage of doing that (reserving cups) is that you don't need to wash and clean them after every use.

The flour cup can still have flour at its bottom; you can pour in more flour without risking anything.

For other, less used ingredients, you should in fact wash and dry the cup after every use, since you don't know what you will use the cup for next.

 

Reserving cups for often used ingredients means exactly that: you can not use them for any other ingredients, and, if it becomes necessary to use them (for lack of free cups), you will need to wash them clean before the new use and wash them clean after it, even restoring the name "flour" on the tag.

 

C will reserve certain registers for more frequent use, in fact, to speed up the process, but it will NOT reserve all the registers, since other processes and routines can and will want to have free registers for their own use.

 

What does "free registers" mean?   A "free cup" means that the cook can get a clean cup, without any name tag, and start to use it during the omelet preparation, mixing and stirring eggs. He can do that for half an hour, knowing that particular cup is messed up with eggs, but he will only use it for eggs anyway.  At the end of the omelet, he will wash and clean that cup and return it to the other "free cups".   Also, a "free cup" is known never to be needed, at any time during the lunch preparation, for the reserved ingredients like flour, salt, etc.; those already have their own exclusive, tagged cups.  A free cup is really that, free: you can use it without messing with the reserved ones, but it is your responsibility to keep track of it. A free register is free until you take it over and use it for something; you can not use the same free register for two independent routines without paying attention.   Being free doesn't mean the cook can mix eggs and the next minute use the same egg-dirty cup to prepare spices without cleaning it first.

 

In assembly language, before you start to write the code, you think about what will be the most used variables or pointers in your code, and "reserve" registers for only that use, saving pushes and pops on the stack, saving clock cycles.

 

It is very common in assembly to name certain registers as "TEMP", "TEMP1", etc... they have no specific function and can be used at anytime, anywhere, with exceptions.

They have a very short life, for example in a counter that uses a few code lines; you will be quite sure that you WILL NOT use a TEMP register in a long routine that must hold the value in the register for many instructions.  But you know that such TEMP registers should be free when entering a routine, and should carry nothing on exit, just "temporary" use.   The exception is when an interrupt happens: if such a TEMP register will be used in the interrupt routine, it must be pushed on entry and popped on exit.

 

Also, upon running the assembler, it can create a list of registers used in the code, even saying where they were used.  You can use that to improve your code.  A register that has massive use MUST be reserved, and maybe analyzed to see if the code is not wasting cycles saving such register among routines, sometimes is smart to spread its use to other registers with less use.

 

In C the compiler TENDS to be smart and optimize those things, but it is not perfect.

 

Because of how C creates variables, it tends to create them in RAM and use registers just during the processing time of those variables.

I often see C pushing a bunch of registers, loading variables from RAM into those registers, running the routine, then storing the registers back to RAM and popping the original register values from the stack.

 

In several of my assembly programs, I don't use RAM to store variables, they are all in registers.

Ask me why my codes are fast and small.  Of course, not large codes with hundreds of variables. 

 

Every time a register is saved on the stack, there go 4 clock cycles in the trash (2 for the PUSH, 2 for the POP).  Not counting the new data being loaded into that register and, at the end, the value being sent somewhere else.

If counting all of this, a minimum of 8 clock cycles per register:

 

PUSH  R18
LDS   R18, FOOVAR
...
STS   FOOVAR, R18
POP   R18

 

If FOOVAR could be fixed in a reserved R18, it is there all the time, just use it; there is no need to waste 8 cycles, but nothing else can use that cup dirty with flour.

This is why AVR has 32 registers, a bunch of code can be written without using RAM.
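With avr-gcc, the closest C equivalent I know of is a GCC global register variable. The sketch below assumes avr-gcc and uses r2 rather than R18, because R18 is one of the call-clobbered registers discussed at the top of this thread, while the avr-libc FAQ suggests r2 through r7 are typically safe for this. The names foovar and bump_foovar are hypothetical, and the #else branch is only a stand-in so the fragment also compiles on a host.

```c
#include <stdint.h>

#ifdef __AVR__
/* GCC global register variable: foovar lives in r2 for the whole program.
   r2 is call-saved in the avr-gcc ABI; r18 would be clobbered by any call. */
register uint8_t foovar __asm__("r2");
#else
uint8_t foovar;   /* host stand-in so the sketch compiles off-target */
#endif

static void bump_foovar(void)
{
    foovar++;     /* on AVR this can work on r2 directly, with no LDS/STS */
}
```

Every translation unit has to see the same declaration, and precompiled library code knows nothing about the reservation, which is the library difficulty mentioned earlier in the thread.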

 

Wagner Lipnharski
Orlando Florida USA


Registers? Free registers? We never had that luxury when I were a lad...

 

6502: A, X, Y

INS8060: A

6800: A, B, X

8080: A, B, C, D, E, H, L - luxury

Z80: all those of the 8080, twice, and more! Pure decadence...

 

All these new-fangled chips with dozens of registers and hidden register renaming and other exciting things... it'll all end in tears, I tell you!

 

Neil


barnacle wrote:
Z80: all those of the 8080, twice, and more! Pure decadence...

 

When I realized that AVR does have 32 registers, I said AHA! and switched from Z80.

(The ADC was also a reason, and many others, of course).


wagnerlip wrote:

In assembly language, before you start to write the code, you think about what will be the most used variables or pointers in your code

 

Definitely not before.

Much of this only arises when you write the code, as you understand your application problem better and better.

Nevertheless, it makes sense to carefully pre-define certain registers. A fixed system of register usage can make code easier to write and existing code easier to understand. The selection of these registers for specific purposes should be based on their different properties and possible uses.

One system I prefer is to use X, Y, Z as the most versatile standard registers for main programs and interrupts.

They are backed up on interrupt entry as quickly as possible, with movw, into the less versatile R10-R15.

Usually these registers are sufficient in interrupts; if necessary, a few more registers (from R16-R25) have to be pushed.

R9 saves the flag status in interrupts. With newer AVRs there are two interrupt priorities.  The highest-priority interrupt routine (which can interrupt a lower-priority one) should then really only do very important things with as few registers as possible.  A register for the flag status (R8) can be reserved for this, and a pointer register can be set aside (movw) as required, or R5-R7 can be provided separately for backup (movw), or a few registers can be pushed. My programs always contain a cyclical system interrupt to coordinate tasks and time delays.  This mostly uses a running 8- or 16-bit counter, which is also located in fixed registers for fast access from anywhere. R0-R4 (R0/R1 for multiplication) or the universally usable R16-R25 can be used freely as temporary variables or for fixed purposes.

Last Edited: Sat. Oct 16, 2021 - 03:26 PM

Math is demanding and register-hungry.

Float or 32b math does require a lot of registers, only 12 will survive.

 
