Inline assembly to average two large uint16_t

uint16_t largeNum1;
uint16_t largeNum2;
uint16_t avgOfLargeNums;

// avgOfLargeNums = (largeNum1 + largeNum2) / 2; // this won't work if the sum of the two large numbers overflows uint16_t. Fix with a little inline asm:

asm ("add	%A[p1_avg], %A[p2]\n /* add low bytes of p1 and p2 (result overwrites low byte of p1) */\
      adc	%B[p1_avg], %B[p2]\n /* add with carry high bytes of p1 and p2 (result overwrites high byte of p1) overflow stored in carry bit */\
      ror	%B[p1_avg]\n         /* rotate high byte of avg to the right one bit through carry (carry goes in to b7 and old b0 goes to carry)*/\
      ror	%A[p1_avg]\n"        /* rotate low byte of avg to the right one bit through carry.  This divides result by 2 (including the 17th bit)*/\
      :[p1_avg]"=r" (avgOfLargeNums) /* Output named p1_avg is write only any register "=r" and will be written to avgOfLargeNums */\
      :"0" (largeNum1), [p2]"r" (largeNum2):); // First input uses same name/register as output "0" and is read from largeNum1, Second input named p2 is any register "r" and is read from largeNum2

// Results in the following assembly (LDS and STS instructions were added by the compiler):
 100:   a0 91 30 3f 	lds	r26, 0x3F30	; 0x803f30 <largeNum1>
 104:	b0 91 31 3f 	lds	r27, 0x3F31	; 0x803f31 <largeNum1+0x1>
 108:	80 91 2e 3f 	lds	r24, 0x3F2E	; 0x803f2e <largeNum2>
 10c:	90 91 2f 3f 	lds	r25, 0x3F2F	; 0x803f2f <largeNum2+0x1>
 110:	a8 0f       	add	r26, r24
 112:	b9 1f       	adc	r27, r25
 114:	b7 95       	ror	r27
 116:	a7 95       	ror	r26
 118:	a0 93 1e 3f 	sts	0x3F1E, r26	; 0x803f1e <avgOfLargeNums>
 11c:	b0 93 1f 3f 	sts	0x3F1F, r27	; 0x803f1f <avgOfLargeNums+0x1>
        

I have learned a lot browsing this forum and I wanted to contribute, so here is an inline assembly block for averaging two large uint16_t's.  I used the inline assembler cookbook and quite a bit of trial and error to get it working for my ATtiny416...

 

Enjoy -- Bob

Last Edited: Mon. Apr 20, 2020 - 08:18 PM

There are some alternatives (with tradeoffs) to using inline assembly.

  1. Cast the values to uint32_t (or __uint24, since this is AVR, if you don't mind the lack of portability) to avoid the overflow, provided the extra width doesn't hurt any performance requirements.
  2. Divide before summing (avgOfLargeNums = largeNum1 / 2 + largeNum2 / 2), provided your application can tolerate the introduced error.
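
A minimal sketch of both alternatives in plain C, for comparison. The function names avg_widen and avg_halves are made up for illustration; __uint24 is left out because it is specific to avr-gcc.

```c
#include <stdint.h>

/* Alternative 1: widen to 32 bits so the 17-bit sum cannot overflow. */
uint16_t avg_widen(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + (uint32_t)b) / 2u);
}

/* Alternative 2: halve each operand first. No overflow, but the result
   is one too low when both inputs are odd. */
uint16_t avg_halves(uint16_t a, uint16_t b)
{
    return (uint16_t)(a / 2u + b / 2u);
}
```

With inputs 3 and 5, avg_widen returns 4 and avg_halves returns 3, which is the error the second alternative introduces.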

github.com/apcountryman/build-avr-gcc: a script for building avr-gcc

github.com/apcountryman/toolchain-avr-gcc: a CMake toolchain for cross compiling for the Atmel AVR family of microcontrollers


The inline assembly is 4 instructions (not counting lds/sts).

If you cast to uint32_t,  the code increases to 17 instructions.

If you cast to __uint24 the code is 8 instructions.

If you divide first, the code is 7 instructions (and you lose a bit of precision).

Obviously it would be ideal if C had a built-in AVG function/operator, and they could do this automatically for all types, but I doubt that will happen any time soon.

 

I wasn't aware of the __uint24 type on AVR.  Very interesting!  That may prove useful elsewhere.  --Thanks


What, I wonder, is the average of 63100 and 63300 using this code? I rather suspect the answer will be 30,432.

 

(this is why most folks would trust this kind of thing to a C compiler as it knows about things like overflow and how to avoid them - it's true this does "cost" in terms of size/speed but sometimes the size/speed is worth it).
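
For reference, 30,432 is the value you get if the 17th bit of the sum is thrown away before the divide. A small sketch of that arithmetic follows; the name avg_truncated is hypothetical, and the explicit cast is there so the sum wraps at 16 bits even on hosts where int is wider. Whether the assembly in the first post actually discards that bit is a separate question.

```c
#include <stdint.h>

/* Average computed from a sum that has been truncated to 16 bits. */
uint16_t avg_truncated(uint16_t a, uint16_t b)
{
    uint16_t sum = (uint16_t)(a + b);   /* 63100 + 63300 = 126400 wraps to 60864 */
    return (uint16_t)(sum / 2u);        /* 60864 / 2 = 30432 */
}
```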


AVG = A - (A-B)/2

 

This doesn't overflow, but the calculation needs to be signed.


clawson wrote:

What, I wonder, is the average of 63100 and 63300 using this code? I rather suspect the answer will be 30,432.

 

Doesn't the adc set the carry flag (if the add overflows), which would then be picked up by the first ror, so you should end up with 63200?

 

 

Last Edited: Mon. Apr 20, 2020 - 11:46 AM

#4

That is the whole idea with ASM: the carry is the 17th bit, so the result is correct.
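
To make the role of the carry concrete, here is a host-side model of the add / adc / ror / ror sequence in C. It is only a sketch: the function name avg_carry_model is invented, and the carry flag is modelled as an ordinary variable rather than the SREG bit.

```c
#include <stdint.h>

/* Byte-by-byte model of add / adc / ror / ror with an explicit carry. */
uint16_t avg_carry_model(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFFu), a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFFu), b_hi = (uint8_t)(b >> 8);
    unsigned carry, t;

    t = (unsigned)a_lo + b_lo;             /* add: low bytes */
    a_lo = (uint8_t)t;
    carry = t >> 8;

    t = (unsigned)a_hi + b_hi + carry;     /* adc: high bytes plus carry */
    a_hi = (uint8_t)t;
    carry = t >> 8;                        /* this is bit 16 of the sum */

    t = a_hi;                              /* ror high: carry -> b7, b0 -> carry */
    a_hi = (uint8_t)((carry << 7) | (t >> 1));
    carry = t & 1u;

    t = a_lo;                              /* ror low: carry -> b7 */
    a_lo = (uint8_t)((carry << 7) | (t >> 1));

    return (uint16_t)(((unsigned)a_hi << 8) | a_lo);
}
```

For 63100 (0xF67C) and 63300 (0xF744) the adc step leaves the carry set, the first ror shifts it into bit 15, and the model returns 0xF6E0, which is 63200.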


Nice, but your inline assembler is an unreadable mess:

 

#include <stdint.h>

uint16_t avg(uint16_t a, uint16_t b)
{

  uint16_t avg;

  __asm__ __volatile__ (
  /* add LSB         */ "add     %A[a], %A[b]                       \t\n" // 1
  /* add MSB         */ "adc     %B[a], %B[b]                       \t\n" // 1
  /* divide by 2...  */ "ror     %B[avg]                            \t\n" // 1
  /* ... including C */ "ror     %A[avg]                            \t\n" // 1
                      :
                        [avg] "=r" (avg)
                      :
                        [a]    "0" (a),
                        [b]    "r" (b)
                      :
                       );

  return avg;

}

$ avr-gcc -g -c -Os foo.c -o foo.o
$ avr-objdump -S foo.o

foo.o:     file format elf32-avr


Disassembly of section .text:

00000000 <avg>:
uint16_t avg(uint16_t a, uint16_t b)
{

  uint16_t avg;

  __asm__ __volatile__ (
   0:   86 0f           add     r24, r22
   2:   97 1f           adc     r25, r23
   4:   97 95           ror     r25
   6:   87 95           ror     r24
                      :
                       );

  return avg;

}
   8:   08 95           ret

 

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


sparrow2 wrote:
AVG = A - (A-B)/2

 

This doesn't overflow, but the calculation needs to be signed.

The subtraction can overflow,

but if the overflow is ignored and everything is done in twos-complement,

I think it gives the right answer for unsigned.

Edit: It doesn't.

Note that signed arithmetic overflow is undefined and conversion of unsigned to signed

is partially implementation-defined.

In the case of avr-gcc, it does the right thing for this code.

Portability is clearly not a concern,

but the formula deserves a comment on the

degree to which it relies on compiler details.

 

What do standards for safety-critical code say about this tradeoff?

I'm pretty sure that casting to a larger type is preferred if it is fast enough.

What if it isn't?

Inline assembly will compile or not,

but the given formula could be quietly broken by a compiler change.

 

BTW the formula might not be fast enough.

IIRC the current C standard precludes doing the division by two with just a signed shift.
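
A small sketch of why the shift and the division differ for negative values: C division truncates toward zero, while an arithmetic right shift rounds toward negative infinity. Right-shifting a negative value is implementation-defined in C; the comment below assumes an arithmetic shift, which is what GCC documents. The function names are illustrative only.

```c
#include <stdint.h>

int16_t half_by_division(int16_t x)
{
    return (int16_t)(x / 2);    /* truncates toward zero: -3 / 2 == -1 */
}

int16_t half_by_shift(int16_t x)
{
    return (int16_t)(x >> 1);   /* implementation-defined for x < 0;
                                   with an arithmetic shift, -3 >> 1 == -2 */
}
```

The two agree for non-negative and for even negative inputs, so a compiler has to add a correction for odd negative values before it can replace the divide with a shift.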

Iluvatar is the better part of Valar.

Last Edited: Tue. Apr 21, 2020 - 09:12 PM

sparrow2 wrote:

AVG = A - (A-B)/2

 

This doesn't overflow, but the calculation needs to be signed.

This seems to work fine using unsigned!

Tested using Windows Calculator (Programmer mode) with Word-size values: entering the hex values for 63300 and 63100 gives a 63200 average. It did not seem to matter which value was A or B; both ways gave the same value.

 

Jim

 

 

 

(Possum Lodge oath) Quando omni flunkus, moritati.

"I thought growing old would take longer"

 


ki0bk wrote:

sparrow2 wrote:

 

AVG = A - (A-B)/2

 

This doesn't overflow, but the calculation needs to be signed.

 

 

This seems to work fine using unsigned!

Tested using Windows Calculator (Programmer mode) with Word-size values: entering the hex values for 63300 and 63100 gives a 63200 average. It did not seem to matter which value was A or B; both ways gave the same value.

I think not.

Say A and B are uint16_t, with a 16-bit int.

Say A=100, B=200

A - B = 65436    (-100)

the divide is a problem, it will be unsigned, so 65436 / 2 = 32718

so you have

100 - 32718 =  32918    (-32618)
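
The same walk-through as a C sketch, with the function name avg_sub_unsigned chosen just for this example. The casts force 16-bit wraparound at each step so it behaves the same on a host where int is 32 bits.

```c
#include <stdint.h>

/* A - (A - B)/2 evaluated entirely in 16-bit unsigned arithmetic. */
uint16_t avg_sub_unsigned(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(a - b);      /* 100 - 200 wraps to 65436 */
    uint16_t half = (uint16_t)(diff / 2u);  /* unsigned divide: 32718 */
    return (uint16_t)(a - half);            /* 100 - 32718 wraps to 32918 */
}
```

It returns a sensible value when A >= B, for example 150 for (200, 100), and goes wrong when A < B.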

 


Don't fall into a trap....depending on what you need, sometimes you don't need to average, just use the sum.   People divide for no reason sometimes.

Someone asked me how to divide by 11. I said forget it, just make your threshold 11x bigger.

 

Maybe same with RMS...skip the square root.
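
One way to read this advice in code, as a hedged sketch: rather than dividing an accumulated sum by the sample count and comparing with a threshold, multiply the threshold by the count once and compare the sum directly. The names NUM_SAMPLES, THRESHOLD and sum_over_threshold are invented for the example, and the 32-bit accumulator is an assumption made so that the sum itself cannot overflow.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_SAMPLES 11u      /* hypothetical sample count */
#define THRESHOLD   1000u    /* hypothetical per-sample threshold */

/* Tests whether the average exceeds THRESHOLD without dividing.
   Unlike an integer divide, no fractional part is discarded. */
bool sum_over_threshold(uint32_t sum)
{
    return sum > (uint32_t)THRESHOLD * NUM_SAMPLES;
}
```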

 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!


But the whole point here is that the sum will overflow!!!


avrcandies wrote:
sometimes you don't need to average, just use the sum

Agreed. If you think you need to average a bunch of ADC readings, just bear in mind when overflow can occur and use the sum instead.

 


But the whole point here is that the sum will overflow!!!

My point was generic: people may work hard to divide some result when all they had to do was add.

 

I remember telling a student worker once to put the pcb in a plastic bin in case his software overflowed....I think he thought I was serious (or nuts).  



MrKendo wrote:
the divide is a problem, it will be unsigned, so 65436 / 2 = 32718

Ah, Windows Calculator was sign-extending and so doing a signed divide! I missed that, so yes, the divide needs to be signed for that to work correctly.

 

Jim

 

 


 


For the formula A-(A-B)/2,

I've been assuming that A and B were the result of casting unsigned

to signed and that the result was cast back to unsigned.

(unsigned)((signed)A - ((signed)A-(signed)B)/2) .

 

Given M = 2**16, A = 0 and B = M - 2q > M/2, (signed)B = -2q.

The quotient is (0 - -2q)/2 = q

(unsigned)(0 - q) = M-q

The correct answer is clearly M/2-q.
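
The counterexample can be tried with concrete numbers. The sketch below picks q = 1000, so B = 63536; the function name avg_sub_signed is made up. Converting an out-of-range uint16_t to int16_t is implementation-defined, and the code assumes the usual two's-complement wraparound that GCC provides.

```c
#include <stdint.h>

/* (unsigned)((signed)A - ((signed)A - (signed)B)/2) with 16-bit types. */
uint16_t avg_sub_signed(uint16_t a, uint16_t b)
{
    int16_t sa = (int16_t)a;                  /* assumes two's-complement wrap */
    int16_t sb = (int16_t)b;                  /* 63536 becomes -2000 */
    int16_t half = (int16_t)((sa - sb) / 2);  /* (0 - -2000) / 2 = 1000 */
    return (uint16_t)(sa - half);             /* 0 - 1000 wraps to 64536 */
}
```

For A = 0 and B = 63536 it returns 64536 (M - q) instead of the true average 31768 (M/2 - q), while inputs that land on the same side of M/2, such as 63100 and 63300, still come out right.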



Signed numbers are like floating point.  If you think you need them, you don't understand the problem.  cheeky  S.


A C formula that works: (A>>1) + (B>>1) + (A & 1U & B).

It rounds down.

Absent a rather clever compiler,

I'm pretty sure it's slower than the inline assembly.

Rounding up or rounding ties to even

are left as exercises for the reader.
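
Wrapped in a function, the round-down formula might look like the sketch below. The name avg_floor is hypothetical; the formula itself is the one given above.

```c
#include <stdint.h>

/* Floor of (a + b) / 2 without ever forming the 17-bit sum.
   (a & b & 1) restores the 1 that is lost when both operands are odd
   and each shift drops a low bit. */
uint16_t avg_floor(uint16_t a, uint16_t b)
{
    return (uint16_t)((a >> 1) + (b >> 1) + (a & b & 1u));
}
```

For 63100 and 63300 this returns 63200, and for 3 and 5 it returns 4, where halving each operand alone would give 3.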
