ADD with Saturation [need inline Asm help]

Go To Last Post
36 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm looking to optimize this function:

// Add signed to unsigned using saturation, return unsigned
uint8_t addwsat(uint8_t a, int8_t b) {
    if(b>=0) {
        if(a>255-b) return 255;
    }
    else {
        if(a<(-b)) return 0;
    }
    return a+b;
}

which the compilers converts to:

uint8_t addwsat(uint8_t a, int8_t b) {
        MOV       R23,R24        Copy register
        MOV       R20,R24        Copy register
        LDI       R21,0x00       Load immediate
        MOV       R18,R22        Copy register
        CLR       R19            Clear Register
        SBRC      R18,7          Skip if bit in register cleared
        LAT       R19            Load and Toggle
if(b>=0) {
        SBRC      R22,7          Skip if bit in register cleared
        RJMP      PC+0x000A      Relative jump
if(a>255-b) return 255;
        SER       R24            Set Register
        LDI       R25,0x00       Load immediate
        SUB       R24,R18        Subtract without carry
        SBC       R25,R19        Subtract with carry
        CP        R24,R20        Compare
        CPC       R25,R21        Compare with carry
        BRGE      PC+0x0C        Branch if greater or equal, signed
        SER       R24            Set Register
        RET                      Subroutine return
if(a<(-b)) return 0;
        CLR       R24            Clear Register
        CLR       R25            Clear Register
        SUB       R24,R18        Subtract without carry
        SBC       R25,R19        Subtract with carry
        CP        R20,R24        Compare
        CPC       R21,R25        Compare with carry
        BRGE      PC+0x03        Branch if greater or equal, signed
        LDI       R24,0x00       Load immediate
        RET                      Subroutine return
return a+b;
        MOV       R24,R22        Copy register
        ADD       R24,R23        Add without carry
}
        RET                      Subroutine return

I'm convinced that there is a more optimum way to achieve the same result. Any suggestions?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Why not use an int16_t intermediate register - just do the add.

If high-byte == 0x01 return 0xFF
If high-byte == 0xFF return 0x00
if high-byte == 0x00 return low-byte

Not sure if it's going to be "tighter" but maybe worth a look.

EDIT: haven't counted but it "looks" tighter:

00000080 :
  union {
	int16_t i;
	uint8_t b[2];
  } join;
  
  join.i = a + b;
  80:	77 27       	eor	r23, r23
  82:	67 fd       	sbrc	r22, 7
  84:	70 95       	com	r23
  86:	68 0f       	add	r22, r24
  88:	71 1d       	adc	r23, r1
  if (join.b[1] == 1) {
  8a:	71 30       	cpi	r23, 0x01	; 1
  8c:	11 f4       	brne	.+4      	; 0x92 
  8e:	8f ef       	ldi	r24, 0xFF	; 255
  90:	08 95       	ret
	return 0xFF;
  }
  else if (join.b[1] == 0xFF) {
  92:	7f 3f       	cpi	r23, 0xFF	; 255
  94:	11 f4       	brne	.+4      	; 0x9a 
  96:	80 e0       	ldi	r24, 0x00	; 0
  98:	08 95       	ret
    return 0x00;
  }
  else {
     return join.b[0];
  9a:	86 2f       	mov	r24, r22
  }
} 
  9c:	08 95       	ret

Clean copy of source:

uint8_t addwsat(uint8_t a, int8_t b) __attribute__((noinline));
uint8_t addwsat(uint8_t a, int8_t b) {
  union {
	int16_t i;
	uint8_t b[2];
  } join;
  
  join.i = a + b;
  if (join.b[1] == 1) {
	return 0xFF;
  }
  else if (join.b[1] == 0xFF) {
    return 0x00;
  }
  else {
     return join.b[0];
  }
}

(EDIT2: except it doesn't work - I think I have a sign promotion issue somewhere)

EDIT3: no, it does work (as far as I can tell). My initial test code was:

	DDRB = addwsat((uint8_t)240, (int8_t)237)

Only thing is that 237 is not a valid int8_t value ;-)

I've now bounds checked it at both ends and it seems to work.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes, thanks! It is working.
Although my assembly output is different:

@00000A0D: addwsat
286:        join.i = a + b;
        CLR       R23            Clear Register
        SBRC      R22,7          Skip if bit in register cleared
        LAT       R23            Load and Toggle
        ADD       R22,R24        Add without carry
        ADC       R23,R1         Add with carry
287:        if (join.b[1] == 1) {
        CPI       R23,0x01       Compare with immediate
        BRNE      PC+0x03        Branch if not equal
        SER       R24            Set Register
        RET                      Subroutine return
290:        else if (join.b[1] == 0xFF) {
        CPI       R23,0xFF       Compare with immediate
        BRNE      PC+0x03        Branch if not equal
        LDI       R24,0x00       Load immediate
        RET                      Subroutine return
294:           return join.b[0];
        MOV       R24,R22        Copy register
296:      } 
        RET                      Subroutine return

I am using AVR-GCC 4.3.3 with -Os optimization. Is this the latest version?

I bet it can be optimized further by using inline assembly, by checking directly the carry bits...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

What is lat? XMEGA?

No RSTDISBL, no fun!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes, the target is an xmega, although the instruction set is mostly the same.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I got LAT when I built for mega168. It's the Atmel's debugger name for some other more normal opcode. (in fact it is COM)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
I bet it can be optimized further by using inline assembly, by checking directly the carry bits...

Sure it can be. AVRs have hardware unsigned overunderflow(both SREG_C and SREG_H), signed overunderflow(which is called SREG_V) and if signed, then it can distinguish overflow from underflow (which is called SREG_S). These are calculated automatically. I suppose you could make this code about 6-8 opcodes +ret long.
But since that is -Os, I do not see the point.

No RSTDISBL, no fun!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
I got LAT when I built for mega168. It's the Atmel's debugger name for some other more normal opcode. (in fact it is COM)
The latest instruction manual appears to have some sort of a LAT instruction. It is probably used in one of those tiny tiny MCUs? If that is the case, then this must be a bug in disassembler?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Have you seen that?

Warning: Grumpy Old Chuff. Reading this post may severely damage your mental health.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I wanted to optimize the assembly output, but the compiler gives me an error at the LAT instruction:

../asmutil.S:23: Error: unknown opcode `lat'

Any ideas on how to fix it?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

LAT=COM

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Shirley, LAT is a pseudonym for COM. i.e. flip every bit.

You simply use your pre-processor to change the mnemonic. Or personally, I would create a conditional macro.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

No.

LAT – Load And Toggle
Description:
Operation:
(Z) ← Rd ⊕ (Z), Rd ← (Z)

Syntax: Operands: Program Counter:
LAT Z,Rd 0 ≤ d ≤ 31 PC ← PC + 1

16-bit Opcode: 1001 001r rrrr 0111
Words: 1 (2 bytes)
Cycles: 1

Warning: Grumpy Old Chuff. Reading this post may severely damage your mental health.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Observe:

uint8_t addwsat(uint8_t a, int8_t b) __attribute__((noinline));
uint8_t addwsat(uint8_t a, int8_t b) {
  union {
   int16_t i;
   uint8_t b[2];
  } join;
 
  join.i = a + b;
  if (join.b[1] == 1) {
   return 0xFF;
  }
  else if (join.b[1] == 0xFF) {
    return 0x00;
  }
  else {
     return join.b[0];
  }
} 
// test.s
addwsat:
.LFB2:
.LM1:
.LVL0:
/* prologue: function */
/* frame size = 0 */
.LM2:
	clr r23
	sbrc r22,7
	com r23
// test.lss
00000080 :
  union {
   int16_t i;
   uint8_t b[2];
  } join;
 
  join.i = a + b;
  80:	77 27       	eor	r23, r23
  82:	67 fd       	sbrc	r22, 7
  84:	70 95       	com	r23
// Atmel's weird and whacky disassembler in Studio
---- test.c ---------------------------------------------------------------------------------------
6:        uint8_t addwsat(uint8_t a, int8_t b) {
+00000040:   2777        CLR       R23            Clear Register
+00000041:   FD67        SBRC      R22,7          Skip if bit in register cleared
+00000042:   9570        LAT       R23            Load and Toggle

You pays yer money and you takes yer choice as to how it's annotated.

Clearly this IS a fault in 4.18SP3's disassembler. Why you'd use this when you have access to avr-as source of avr-obdump's own disassembly I don't know?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Why is the assembly in Studio weird and wacky if it generates code that works?
And like MBedder says, the LAT and COM are not the same instruction...
Well, while we figure that out, I optimized the assembly a little bit, 2 instructions less:

; join.i = a + b; 
    eor   r23, r23 
    sbrc  r22, 7 
    com   r23 
    add   r24, r22 
    adc   r23, r1 
; if (join.b[1] == 1) { 
    cpi   r23, 0x01   ; == 1? 
    brne   .+4        ; no=>jump
    ldi   r24, 0xFF   ; return 255 
    ret 
; else if (join.b[1] == 0xFF) { 
    cpi   r23, 0xFF   ; == 255 ? 
    brne  .+2         ; no=>return join.b[0]
    ldi   r24, 0x00   ; return 0 
    ret 
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Why is the assembly in Studio weird and wacky if it generates code that works?

It doesn't generate the code?!? It takes code that has been generated by a C compiler and tries to disassemble it and on this occasion gets it wrong.

BTW do not use PC relative addressing in branches - labels cost nothing!

In fact if you are going to use GCC generated Asm why not use the .s files?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes, I am placing the assembly function in a .s file.
But I'm still wondering why I can't use the LAT instruction if I wanted to...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

But I still wondering why I can't use the LAT instruction if I wanted to..

Because it's the WRONG instruction. The instruction that inverts the contents of a register to make 0x00 into 0xFF is COM. Atmel's stupid disassembler made a mistake when it disassembled the code that he C compiler had created.

And when I talk about .s I don't mean the ones you are creating but the one that the C compiler generates for each .c file you build. You simply give the option --save-temps and for every foo.c that is built you get a foo.i (the pre-proc file) and foo.s (the avr-as source output by the compiler and fed into the assembler).

This is the .s file for that routine (including an empty main):

	.file	"test.c"
__SREG__ = 0x3f
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__CCP__  = 0x34
__tmp_reg__ = 0
__zero_reg__ = 1
 ;  GNU C (WinAVR 20100110) version 4.3.3 (avr)
 ; 	compiled by GNU C version 3.4.5 (mingw-vista special r3), GMP version 4.2.3, MPFR version 2.4.1.
 ;  GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
 ;  options passed:  -fpreprocessed test.i -mmcu=atmega168 -Os -Wall
 ;  -std=gnu99 -funsigned-char -funsigned-bitfields -fpack-struct
 ;  -fshort-enums -fverbose-asm
 ;  options enabled:  -falign-loops -fargument-alias -fauto-inc-dec
 ;  -fbranch-count-reg -fcaller-saves -fcommon -fcprop-registers
 ;  -fcrossjumping -fcse-follow-jumps -fdefer-pop -fearly-inlining
 ;  -feliminate-unused-debug-types -fexpensive-optimizations
 ;  -fforward-propagate -ffunction-cse -fgcse -fgcse-lm
 ;  -fguess-branch-probability -fident -fif-conversion -fif-conversion2
 ;  -finline-functions -finline-functions-called-once
 ;  -finline-small-functions -fipa-pure-const -fipa-reference -fivopts
 ;  -fkeep-static-consts -fleading-underscore -fmath-errno
 ;  -fmerge-constants -fmerge-debug-strings -fmove-loop-invariants
 ;  -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls
 ;  -fpack-struct -fpeephole -fpeephole2 -freg-struct-return -fregmove
 ;  -freorder-functions -frerun-cse-after-loop -fsched-interblock
 ;  -fsched-spec -fsched-stalled-insns-dep -fsigned-zeros
 ;  -fsplit-ivs-in-unroller -fsplit-wide-types -fstrict-aliasing
 ;  -fstrict-overflow -fthread-jumps -ftoplevel-reorder -ftrapping-math
 ;  -ftree-ccp -ftree-copy-prop -ftree-copyrename -ftree-dce
 ;  -ftree-dominator-opts -ftree-dse -ftree-fre -ftree-loop-im
 ;  -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=
 ;  -ftree-reassoc -ftree-salias -ftree-scev-cprop -ftree-sink -ftree-sra
 ;  -ftree-store-ccp -ftree-ter -ftree-vect-loop-version -ftree-vrp
 ;  -funit-at-a-time -fverbose-asm -fzero-initialized-in-bss

 ;  Compiler executable checksum: 61d68a374065d489330774d2533cbbdf

	.text
.global	addwsat
	.type	addwsat, @function
addwsat:
/* prologue: function */
/* frame size = 0 */
	clr r23	 ;  tmp49
	sbrc r22,7	 ;  tmp49
	com r23	 ;  tmp49
	add r22,r24	 ;  tmp49, a
	adc r23,__zero_reg__	 ;  tmp49
	cpi r23,lo8(1)	 ;  join,
	brne .L2	 ; ,
	ldi r24,lo8(-1)	 ;  D.1336,
	ret
.L2:
	cpi r23,lo8(-1)	 ;  join,
	brne .L4	 ; ,
	ldi r24,lo8(0)	 ;  D.1336,
	ret
.L4:
	mov r24,r22	 ;  D.1336, tmp49
	ret
	.size	addwsat, .-addwsat
.global	main
	.type	main, @function
main:
/* prologue: function */
/* frame size = 0 */
.L7:
	rjmp .L7	 ; 
	.size	main, .-main

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
Because it's the WRONG instruction
What I meant is what if I wanted to use the LAT instruction in general (not just in this ADDWSAT function). Neither the editor or the assembler recognize it.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

These instructions (along with LAC, LAS and XCH) are valid only for new XMega CPU core (V3XJ). They can be successfully assembled after selecting an XMega as a target or after issuing the following #pragma directive:

#pragma AVRPART CORE CORE_VERSION V3XJ 

        xch     z,r1            ; Exchange r1 with RAM[Z}

        lac     z,r2            ; Load And Clear R2 bits of RAM{Z}
        las     z,r3            ; Load And Set R3 bits of RAM{Z}
        lat     z,r4            ; Load And Toggle R4 bits of RAM{Z}

Warning: Grumpy Old Chuff. Reading this post may severely damage your mental health.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Ok, I have optimized much further:

; ADD with Saturation
; uint8_t addwsat(uint8_t a, int8_t b);
.global addwsat
.func addwsat
addwsat:
    add   r24, r22  ; add a + b
    sbrs  r22, 7    ; skip if b is negative
    rjmp  checkoverflow
checkunderflow:
    brcs  exitadd1  ; carry not set-> underflow
    ldi   r24, 0x00 ; return 0
exitadd1:
    ret
checkoverflow:
    brcc  exitadd2  ; carry set-> overflow
    ldi   r24, 0xFF ; return 255
exitadd2:
    ret
.endfunc

Now I think the code is small enough that it might be efficient to make it inline. Can someone help with this? I don't know much about register clobbering and such...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Ok, I am reading the inline assembler cookbook and I am trying to write the inline assembly code, I still need some help, so far I have this:

#define addwsat(_a,_b)                  \
    ({                                  \
        asm (                           \
        "add   %[a], %[b]"    "\n\t"    \
        "sbrs  %[b], 7"       "\n\t"    \
        "rjmp  2f"            "\n\t"    \
        "brcs  1f"            "\n\t"    \
        "ldi   %[a], 0x00"    "\n\t"    \
"1: "   "ret"                 "\n\t"    \
"2: "   "brcc  3f"            "\n\t"    \
        "ldi   r24, 0xFF"     "\n\t"    \
"3: "   "ret"                           \
        : [a] "=r" (_a)                 \
        : [a] "r" ((uint8_t)(_a)),  [b] "r" ((int8_t)(_b)));\
    }) 

But the jumps to the RET instruction, and the RET instruction itself, need to be replaced by a "jump outside of this section of code". Also, can someone check the overall code?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Why would you have "RET" in inline code? It's not CALL'd - it's inline - that's the point. You do want one common exit point that all code paths ultimately jump to and that falls through to the next Asm or C statements.

Shall I move this to the GCC forum now as you are more likely to be seen by the inline Asm experts there?

Cliff

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Exactly, that is what I meant about the ret. Yes, please move to the gcc forum.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I do not get your intention. You are trying to squeeze a 12w code into the one which takes only 8w. You explicitly point out your aim is to make the code smaller (-Os). And now you want to make it inlined. Can you explain what is the purpose of it?

No RSTDISBL, no fun!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This particular function is used in a loop that needs to be executed very fast. But regardless, Smaller code and Faster code are not mutually exclusive...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Here is the code I use to add-saturate resp. sub-saturate unsigned char + char, 7 Instructions each:

Addition

;; unsigned R16
;; signed   R0
;; R16 := R16 + R0 with Saturation
; Transform [0x00, 0xff] -> [0x80, 0x7f]
subi R16, 0x80
add  R16, R0
brvc 0f
; Signed overflow -> MAX
ldi  R16, 0x7f
sbrc R0, 7
; R0 is negative -> load MIN
ldi  R16, 0x80
0:
; Back-Transform [0x80, 0x7f] -> [0x00, 0xff]
subi R16, 0x80

Subtraktion

;; unsigned R16
;; signed   R0
;; R16 := R16 - R0 with Saturation
; Transform [0x00, 0xff] -> [0x80, 0x7f]
subi R16, 0x80
sub  R16, R0
brvc 0f
; Signed Overflow -> MAX
ldi  R16, 0x80
sbrc R0, 7
; R0 ist negativs -> MIN
ldi  R16, 0x7f
0:
; Back-Trafo [0x80, 0x7f] -> [0x00, 0xff]
subi R16, 0x80

avrfreaks does not support Opera. Profile inactive.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

And here wrapped into Inline Assembler for you :-)

#include 

static inline uint8_t
addsat (uint8_t a, int8_t b)
{
    asm ("subi #0, 0x80" "\n\t"
         "add  #0, #1"   "\n\t"
         "brvc 0f"       "\n\t"
         "ldi  #0, 0x7f" "\n\t"
         "sbrc #1, 7"    "\n\t"
         "ldi  #0, 0x80" "\n"
         "0:"            "\n\t"
         "subi #0, 0x80"
         : "+d" (a)
         : "r" (b));

    return a;
}

You have to replace all # by % because this f***orum software crashed with %.

avrfreaks does not support Opera. Profile inactive.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thanks for your code! That is just what I needed...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Use % (all FIVE chars) in the places where you need to display a % that's within quotes and it then works.

asm ("subi %0, 0x80" "\n\t"

1) Studio 4.18 build 716 (SP3)
2) WinAvr 20100110
3) PN, all on Doze XP... For Now
A) Avr Dragon ver. 1
B) Avr MKII ISP, 2009 model
C) MKII JTAGICE ver. 1

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Use %

Welcome to the "inner circle" - those of us who know how to type % into a post so it isn't turned into % sign :lol:

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yeah, thanks, you and another 'freaks veteran schooled me on that one . I'm a straightup "crime-fighter" ( dissipating posting frustrations wherever I go ) with that one. :)

1) Studio 4.18 build 716 (SP3)
2) WinAvr 20100110
3) PN, all on Doze XP... For Now
A) Avr Dragon ver. 1
B) Avr MKII ISP, 2009 model
C) MKII JTAGICE ver. 1

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

indianajones11 wrote:
Use %

That works just once. If someone quotes that code he will run in the same error
www.avrfreaks.net wrote:
Bad Request

Your browser sent a request that this server could not understand.
Apache Server at www.avrfreaks.net Port 80

and wonder what's wrong with his browser.

Why is it not possible to fix that bug in the first place?

avrfreaks does not support Opera. Profile inactive.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Why is it not possible to fix that bug in the first place?

Atmel don't have two sheckels to rub together :-(

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Why is it not possible to fix that bug in the first place?

Haven't we been here before, SprinterSB?

Atmels budget for AVRfreaks is asymptotically close to zero.

Do us freaks like it? Nah!
Will that change? Prolly not in the near future.
Is it the only 'freaks site PITA? Far from it.

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0