## simple way to round?

36 posts / 0 new
Author
Message

Is there a simple way to round a 24 bit number to the nearest 100 (ex: 1234567 gives 1234600, 14445970 gives 1446000, 223344 gives 223300).

This is using avr assembler. I suppose I may have to resort to some large mult or divide routines again.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

divide by 100. if the remainder > 50 then add 1 to the dividend then multiply the dividend by 100. If you already have the number as bcd, then test the 10's digit for >=5 and add 100 else zero the 1's and 10's digits.

value = value + 50
value = value - (value mod 100)

Regards,
Steve A.

The Board helps those that help themselves.

Is there an easy way to do the mod? I suppose there's no other way than a long 24 bit division..wan't sure if there were any "tricks" that would work.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

You could easily round to the nearest 128...
Or you could do it during printing, when you're dividing anyway.

There are assorted "interesting" algorithms for dividing or multiplying by a constant. This "divide by 10" function (from "Hacker's Delight") looks particularly interesting because of all the shifts that are multiples of 8bits.

```unsigned divu10(unsigned n) { unsigned q, r;
q = (n >> 1) + (n >> 2);
q = q + (q >> 4);
q = q + (q >> 8);
q = q + (q >> 16);
q = q >> 3;
r = n - q*10;
return q + ((r + 6) >> 4);
// return q + (r > 9);
}```

Thanks--That's an interesting trick, I suppose it can be done in asm, even with the *10 required.
To round to the nearest 100 wiould require doing this twice.
I wonder if ther is something even trickier & shorter. I did get my program working using a long divide & long multiply (add 50, div by 100 & mult by 100), so now its more of an interesting quetion.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Surely it depends on the rounding you need.
http://en.wikipedia.org/wiki/Rounding#Types_of_rounding

Four legs good, two legs bad, three legs stable.

Quote:
Is there an easy way to do the mod?
Mod is simply a divide where you keep the remainder instead of the result. So if you already have the (add 50, div by 100 & mult by 100) solution, then you only need a rearrangement of the code to get my solution which saves you the multiply.

Regards,
Steve A.

The Board helps those that help themselves.

Why do you want to do this? The more we know about why (and under what circumstances) you want to do it, the better answers we might provide.

Quote:
I wonder if ther is something even trickier & shorter.
There was similar "divide by 100" code, but it didn't look like it would optimize as well for a CPU without a barrel shifter.

Here is some pseudocode that might help.
The loops should be completely unrolled.
Use a macro for the big one.
I think it will come out to 65 instructions total.

```hi, mid, lo=src
if carry
lo+=56
if carry
lo+=56

while lo>=100
lo-=100

if lo>=50
result=src+(100-lo)
else
result=src-lo```

It just rounds to a multiple of 100.
If you want decimal digits, this won't give them to you.

Iluvatar is the better part of Valar.

I don't get [skeeve's code] at all.
If you HAVE modulo, you'd just do
r = (v+50) - ((v+50)%100)
But Modulo is usually just as expensive and long as division.
Where do 36 and 56 come from?

westfw wrote:
I don't get [skeeve's code] at all.
If you HAVE modulo, you'd just do
r = (v+50) - ((v+50)%100)
But Modulo is usually just as expensive and long as division.
Where do 36 and 56 come from?
56==0x100%100
Once you unroll the big loop, mask becomes constants and the assembler can do the percent work.
That is why the big loop unrolls to only 65 instructions.

Iluvatar is the better part of Valar.

Quote:

I think it will come out to 65 instructions total.

Quote:

That is why the big loop unrolls to only 65 instructions.

Quote:

But Modulo is usually just as expensive and long as division.

+1 on the last...riddle me that--how are the two %100 each loop done "free"?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

theusch wrote:
Quote:

But Modulo is usually just as expensive and long as division.

+1 on the last...riddle me that--how are the two %100 each loop done "free"?
skeeve wrote:
Once you unroll the big loop, mask becomes constants and the assembler can do the percent work.

Iluvatar is the better part of Valar.

Ah. I get it. p Part of it, anyway...
When you said to unroll the loop, you meant to expand all the unrolled constants as well.

```if hi & 1
lo += (1*36)%100
if carry
lo +=56
if mid & 1
lo += (1*56)%100
:
; next unrolled iteration
if hi & 2
lo += (2*36)%100
if carry
lo += 56
if mid & 2
lo += (2*56)%100
:
```

That should work out pretty well using .irp and such; perhaps I'll give it a try. Thanks for the hints!

mid shouldn't nest inside hi.

Iluvatar is the better part of Valar.

Hmm. Does this look right? It expands to a lot more than 65 instructions (100+); don't forget that you can't do "sbic sreg, x" (that would have helped.)

```.irp mask, 1, 2, 4, 8, 16, 32, 64, 128
breq 0f
brcc 0f
subi NL, -56
breq 1f
brcc 1f
subi NL, -56
1:
.endr```

Quote:
But Modulo is usually just as expensive and long as division.
Modulo is division, only what you keep as the return value is different.

Regards,
Steve A.

The Board helps those that help themselves.

The .irp code seems like it would work.
It expands to 80 instructions.
One can get it down to 65 if one realizes that my nested ifs do not need to be nested.

@Koshchi: long division naturally produces a remainder,
but not all division is done that way, especially division by a constant.

Iluvatar is the better part of Valar.

But if you have a form of division that does not produce a remainder, then it can't be used for modulo, so I don't see the relevance.

Regards,
Steve A.

The Board helps those that help themselves.

What do you mean with easy ? :)
Size, AVG speed, worst case speed, easy to understand?
I don't like the +50 version because it can overflow.
If it's worst case speed (and a AVR with HW mul) I would go for some ASM code that do the division with a mul by 1/100.
mul by 167772 and forget about the lower 24 bit, could be a way.

Quote:
I don't like the +50 version because it can overflow.

Surely not unless the round operation itself would actually overflow?

westfw wrote:
Quote:
I don't like the +50 version because it can overflow.

Surely not unless the round operation itself would actually overflow?
2**24-1 will round down.
2**24-1+50 will overflow.

BTW with a slightly different algorithm, my big loop can be replaced by 57 instructions.

With hardware multiply and careful ordering,
my big loop can be replaced by 39 words and 41 cycles.

Iluvatar is the better part of Valar.

```  ; rounds input to multiple of 100
; input  src2, src1, src0
; output src2, src1, src0
; they need not be upper registers
; clears r1
; 35 words, up to 40 cycles
; temporaries s56, s36, sum0 need to be upper registers
; temporaries s0, sum1 do not
; s56, s36 could be the same register

ldi s36, 36  ; 2**16  mod 100
mul src0, s36
mov sum0, r0
mov sum1, r1

ldi s56, 56  ; 2**8 mod 100
mul src2, s56

clr s0
adc sum1, s0 ; <= 92

; 0x100*sum1 + sum0 == 36*src2 + 56*src1 + src0

; r1, sum0 <- 36*sum1 + sum0
mul sum1, s56
adc r1, s0 ; <= 21

; r1, sum0 <- 36*r1 + sum0
mul r1, s56
adc r1, s0   ; <= 4

; r1, sum0 <- 36*r1 + sum0
mul r1, s56  ; clears r1
brcc 1f
brcc 1f
1:
sbrc sum0, 7 ; 128
subi sum0, 100 ; <=155

cmpi sum0, 100
brlt .+2
subi sum0, 100 ; <=99

; s0, s0, sum0 <- sum0 or sum0-100
cmpi sum0, 50
brlt .+4
subi sum0, 100
dec s0  ; s0 repurposed as upper 2 bytes of subtrahend

; src2, src1, src0 -= s0, s0, sum0
sub src0, sum0
subc src1, s0
subc src2, s0```

Iluvatar is the better part of Valar.

Debugging code you don't understand is difficult!
It's a good thing that this comment was there: "; 0x100*sum1 + sum0 == 36*src2 + 56*src1 + src0", cause that's not what the first few lines actually calculated.

Here is a corrected version that should be callable from C (with a "long" as argument, where we only use 24 bits.) It worked for all powers of 2. It's still not perfect, though; I'll try to do additional debugging, but am hoping something jumps out. First errors:

```***ERROR***Rounding 228
Got:     100
***ERROR***Rounding 229
Got:     100
***ERROR***Rounding 230
Got:     100```
```.global round100
round100:
; rounds input to multiple of 100
; input  src2, src1, src0
; output src2, src1, src0
;;; If we use r24:r22, we should be compatible with C sending us a long
;;; (and sending it back.)
#define src2 r24
#define src1 r23
#define src0 r22
; they need not be upper registers
; clears r1 (and r0)
push r0
push r1

; 35 words, up to 40 cycles
; temporaries s56, s36, sum0 need to be upper registers
; s56, s36 could be the same register
#define s36 r26
#define s56 r26
#define sum0 r27
; temporaries s0, sum1 do not
#define s0  r18
#define sum1 r19

;;; convenience macros
subi \r, -(\c)
.endm
.macro cmpi r, c
cpi \r, \c
.endm

ldi s36, 36	; 2**16  mod 100
mul src2, s36
mov sum0, r0
mov sum1, r1

ldi s56, 56  ; 2**8 mod 100
mul src1, s56

clr s0
adc sum1, s0 ; <= 92

; 0x100*sum1 + sum0 == 36*src2 + 56*src1 + src0

; r1, sum0 <- 36*sum1 + sum0
mul sum1, s56
adc r1, s0 ; <= 21

; r1, sum0 <- 36*r1 + sum0
mul r1, s56
adc r1, s0   ; <= 4

; r1, sum0 <- 36*r1 + sum0
mul r1, s56  ; clears r1
brcc 1f
brcc 1f
1:
sbrc sum0, 7 ; 128
subi sum0, 100 ; <=155

cmpi sum0, 100
brlt .+2
subi sum0, 100 ; <=99

; s0, s0, sum0 <- sum0 or sum0-100
cmpi sum0, 50
brlt .+4
subi sum0, 100
dec s0  ; s0 repurposed as upper 2 bytes of subtrahend

; src2, src1, src0 -= s0, s0, sum0
sub src0, sum0
sbc src1, s0
sbc src2, s0

pop r1
pop r0
ret
```

And the Arduino test sketch. (no, Arduino doesn't like .S files. I let the compile fail, then manually link in the assembled .S file and subsequent upload steps.)

```void setup()
{
Serial.begin(9600);
}
extern "C" unsigned long round100(unsigned long);

unsigned long n;
void loop()
{
unsigned long r;
unsigned char i = 0;

for (n=1; n < (1L<<24); n+=1) {
i++;
r = round100(n);
if (r != ((n+50)/100L)*100L) {
Serial.print("***ERROR***");
i=0;
}
if (i == 0) {
Serial.print("Rounding ");
Serial.println(n);
Serial.print("Got:     ");
Serial.println(round100(n));
}
}
Serial.println("Done!");

;
}
```

In this section, the comments say *36, but the code has *56. I get more correct answers if I go with the code, but...

And I was wrong about powers of 2 working; 2048 didn't work.

```  ; r1, sum0 <- 36*sum1 + sum0
mul sum1, s56
adc r1, s0 ; <= 21

; r1, sum0 <- 36*r1 + sum0
mul r1, s56
adc r1, s0   ; <= 4

; r1, sum0 <- 36*r1 + sum0
mul r1, s56  ; clears r1
brcc 1f
brcc 1f

I think the problem is the first brlt instruction.
Changing the brlt's to brcc's should do the trick.
brlt is for a signed test.
For the second test, the values are in 0..127,
whether or not interpreted as signed,
so the difference is not important.
What are the pop instructions for?
I didn't see the corresponding pushes.

Also, some of the comments still have 36 where they should have 56.

The code runs on the principle that
0x10000*d2 + 0x100*d1 + d0 == 36*d2 + 56*d1 + d0 mod 100.
Except for the initial input, d2 was 0.

Iluvatar is the better part of Valar.

Quote:
Changing the brlt's to brcc's should do the trick.

Changing to "brlo" (unsigned equiv of brlt, which is actually brcs) helps. Still gets errors:

```***ERROR***Rounding 21456
Got:     21444
***ERROR***Rounding 21457
Got:     21444
***ERROR***Rounding 21458
```

Quote:
What are the pop instructions for?
I didn't see the corresponding pushes.

they're there: r0 isn't necessary, I guess, but r1 is the gcc "known zero" register, so it needs saved.

```  ; clears r1 (and r0)
push r0
push r1```

Quote:

so it needs saved.

Surely all it actually needs is clearing to 0:

`eor r1, r1`

at the end?

clawson wrote:
Quote:

so it needs saved.

Surely all it actually needs is clearing to 0:

`eor r1, r1`

at the end?

As noted in a comment, r1 is already clear.
Perhaps another comment is in order.

Iluvatar is the better part of Valar.

(I see how it works, now. Mostly.)

I found my problem:

```  add sum0, r0
brcc 1f
brcc 1f

Remember that "addi" is a macro that does actually does a subi (because avr doesn't have addi!) Apparently that makes the state of the carry bit opposite what I'd expect from an add operation. It worked better when i reverse the condition, but since 56 is already in a register anyway, I thought it would be clearer if I just did a reg-reg add.
The code now works up to an input of: 14548980
It now looks like:

``` ;; deleted; see subsequent post.
```
Last Edited: Sat. Apr 20, 2013 - 08:01 AM

If this "function" should be able to be in a ISR they (r0 and r1) need to be stored, and you don't know the value!

Ahh. It turns out that the assumption that r1 is cleared by the last multiply is not quite true. (I knew I had saved it for a reason!) Changing it thusly results in correct answers over the whole range:

```;;; one more time...  0x100*r1 + sum0 = 36*(0) + 56*r1 + sum0
; r1, sum0 <- 56*r1 + sum0
mul r1, s56  ; clears r1
cpse r1, s0  ;  (except occasionally)
```

Quote:
If this "function" should be able to be in a ISR

The function is currently written to the ABI specs of gcc. If you wanted to call it from an ISR, the ISR would be responsible for saving R0 and R1 (and the standard gcc ISR prologue does do this.) If you wanted to include the code in a non-C ISR, it would have to save a LOT more registers.

BTW: what made things finally 'click' for me was that THIS:

Quote:
0x10000*d2 + 0x100*d1 + d0 == 36*d2 + 56*d1 + d0 mod 100.

is repeated multiple times. Each time, the result gets smaller. Once the result fits in a single byte, the "modulo 100" becomes a trivial "up to two subtractions." Very neat.
.
Final (?) version:

```;;; minimized code to round a 24bit number to the nearest 100 (decimal)
;;; Algorithm by Michael Hennebry ("Skeeve")
;;; https://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=131682
;;;
;;; The code runs on the principle that
;;;   0x10000*d2 + 0x100*d1 + d0 == 36*d2 + 56*d1 + d0 mod 100.

.global round100
round100:
; rounds input to multiple of 100
; input  src2, src1, src0
; output src2, src1, src0
;;; If we use r24:r22, we should be compatible with C sending us a long
;;; (and sending it back.)  All registers used are "call-used" and shouldn't
;;; need to be saved (except for r1, which is assigned "known zero" by C, but
;;; used by the avr multiply instructions.
#define src2 r24
#define src1 r23
#define src0 r22
; they need not be upper registers
; clears r1
;;; if r1 is really cleared, we don't need to save it.  But be safe.
push r1

; 35 words, up to 40 cycles
; temporaries s56, s36, sum0 need to be upper registers
; s56, s36 could be the same register
#define s36 r26
#define s56 r26
#define sum0 r27
; temporaries s0, sum1 do not
#define s0  r18
#define sum1 r19

ldi s36, 36	; 2**16  mod 100
mul src2, s36
mov sum0, r0
mov sum1, r1

ldi s56, 56  ; 2**8 mod 100
mul src1, s56

clr s0
adc sum1, s0 ; <= 92

; 0x100*sum1 + sum0 == 36*src2 + 56*src1 + src0

;;; repeat...  0x100*sum1 + sum0 = 36*(0) + 56*sum1 + sum0
; r1, sum0 <- 56*sum1 + sum0
mul sum1, s56
adc r1, s0 ; <= 21  (sum < 0x21FF)

;;; repeat...  0x100*r1 + sum0 = 36*(0) + 56*r1 + sum0
; r1, sum0 <- 56*r1 + sum0
mul r1, s56
adc r1, s0   ; <= 4  (sum < 0x4FF)

;;; one more time...  0x100*r1 + sum0 = 36*(0) + 56*r1 + sum0
; r1, sum0 <- 56*r1 + sum0
mul r1, s56  ; clears r1
cpse r1, s0  ;  (except occasionally)

brcc 1f		;last 56*r1 + sum0
brcc 1f
1:
sbrc sum0, 7 ; 128
subi sum0, 100 ; <=155

cpi sum0, 100
brlo 2f
subi sum0, 100 ; <=99
2:
; s0, s0, sum0 <- sum0 or sum0-100
cpi sum0, 50
brlt 3f
subi sum0, 100
dec s0  ; s0 repurposed as upper 2 bytes of subtrahend
3:
; src2, src1, src0 -= s0, s0, sum0
sub src0, sum0
sbc src1, s0
sbc src2, s0

pop r1
ret
```

Quote: