Nope remainder is where you for example divide 13 by 3 and get the dividend 4 with a remainder of 1. 3 goes into 13 four times with 1 left over.

It's true that 13/3 can also be seen as 4.33 where the 4 is the integer result and the .333 (which represent the "remainder 1") is the fractional result but it would be wrong to refer to the .33 part of this as the "remainder".

Posted by Catalin Ioan Stanciu: Sat. Feb 6, 2021 - 02:17 PM(Reply to #108)

1

2

3

4

5

Total votes: 0

These are measurements without counting INPUT and without BCD packing.

I am wrong now, and I was wrong also the first time I did the measurement:

my original version (V2) is AVG=50.532 Clks on 0-65535

my actual version (see bellow) AVG=51.98 Clks on 0-65535

Actual version has this code bellow:

;------

add r16,r0
adc r1,r1
breq PC+11
; r1 can be 1 or 3
dec r1
breq PC+3 ; it means it was 1, but now is 0
inc r17
mov r1,SIX
add r1,SIX ; now r1 is SIX or TWELVE
inc r17 ; increment Quotient
add r16,r1 ; add SIX or TWELVE
brcc PC+3 ; possible one do-it-again on Carry resulted from last instruction
inc r17
add r16,SIX
; previous 13 instructions: min. 4 and max. 14 clk

;---------------

and this is worse than V2 version:

;----------

lsr r1 ; r1=0 and possible 1 shifted to C
brcc PC+4 ; branch if zero (evident)
inc R17 ; increment Quotient(R1 was one)
add r16,SIX ; add SIX; because we INC Quotient
brcs PC-2 ; possible one do-it-again on Carry resulted from last instruction
; On previous 5 instructions
; if High(INPUT)>$A8 5 or 9 clk, depending on RB256
; if High(INPUT)<$A8 3 or 5 clk, depending on RB256

add r16,r0 ; we add Remainder on the old one
brcc PC+4 ; but if we have CARRY?
inc r17 ; increment Quotient
add r16,SIX ; add SIX; because we INC Quotient
brcs PC-2 ; possible one repeat on Carry resulted from last instruction
; if there were 6 or 9 on previous paragraph we have only 3 clk here
; if there were 3 or 5 on previous paragraph we have 3 to max. 9 here
; so, this is somehow compensated (on both) to 6 to max. 15 clk

;---------

Were I can see that "min. 4 and max. 14 clk" is worse than "6 to max. 15 clk"

yes the 99 split works fine without correction for 65, the only errors are at 69 79 89 and 99.(I write 64 because the check is for that bit)

See #41 there are all the numbers.

Add:

One thing could actually be to make a version that don't use the HW mul. That will be slower (about 70-80 clk), but by far the fastest on on chips like tiny85.

Posted by Catalin Ioan Stanciu: Tue. Feb 9, 2021 - 07:11 AM(Reply to #116)

1

2

3

4

5

Total votes: 0

See Att. retested Atiny85 BCD UnPacked version.

Result in: R20:R19:R18:R17:R16 ( R0 is also used )

Max 74 & Min 56 Cycles and measured AVG=65.18

Used Code Segment memory 76 Words

The Unpacked version is less or equal than 74 Cycles.

Above modifications in Red, due to:

mov r0,r17 ;
add r0,r0 ; These are replacement for MUL ( +3 clk )
add r0,r17 ; as suggested by
add r0,r0 ; https://www.avrfreaks.net/users/...
sbc r19,r19 ; r19 = minus carry

Posted by Catalin Ioan Stanciu: Wed. Feb 10, 2021 - 10:41 AM(Reply to #116)

1

2

3

4

5

Total votes: 0

47.86 clk

MySecond Version on Two Bytes AVG under50 Cycles. (AVRe)

Tested.

; 3529736 Counter with program
; 393215 Counter run empty
; 3136520 Diffrence
; 47.86 Diffrence/65536 My Second Under 50 Cyles Average!
; uses 7 MUL's so it should be 43 Words length Min/Max 47/50

It was indeed a welcome pause to do the:

Atiny85 BCD UnPacked version.

Result in: R20:R19:R18:R17:R16 ( R0 is also used )

Max 74 & Min 56 Cycles and measured AVG=65.18

Used Code Segment memory 76 Words

The Unpacked version is less or equal than 74 Cycles.

I just tried a couple and got broadly similar answers for cycles in the simulator. I was a bit surprised that megas (MUL) actually return almost identical numbers to tiny.

The core code (itoa leads to ultoa_ncheck) is here:

I am reopening this post for a small, but important issue.

I am using the word "Remainder" as in a result of Modulo(Integer,1000).

In https://www.avrfreaks.net/commen...

You are using remainder as the decimal part of an integer division.

When I look at https://en.wikipedia.org/wiki/Re..., I might

think that I haven't incorrectly used the word.

Am I wrong?

- Log in or register to post comments

TopNope

remainderis where you for example divide 13 by 3 and get the dividend 4 with aremainderof 1. 3 goes into 13 four times with 1 left over.It's true that 13/3 can also be seen as 4.33 where the 4 is the integer result and the .333 (which represent the "remainder 1") is the fractional result but it would be wrong to refer to the .33 part of this as the "remainder".

- Log in or register to post comments

TopSo, I just raised the problem to El Tangas:

"so the next digits pop up from the remainder just by mul with 10"

I think that's the decimal part of the result of dividing the

"Two Bytes" Number by 1000.

He multiplies the number by constant 256^3/1000;

that is multiplied by 256^3 and then dividing by 1000; he is making an unambiguous Approximation.

Thus, you have to round the Decimal part on very, very many divisions. It's normal.

The Problem is Remainder is an Integer, not a Fractional Number.

So, I thing here he was unintentionally misleading.

When you multiply by 10 the decimal part of a division it's obvious that the "next digits pop up".

This is not True about Remainder.

- Log in or register to post comments

TopI was trying to understand El Tangas's algorithm, and I encounterd difficulties.

Than it poped my mind that maybe he's not meaning Remainder.

I think that many people are using Remainder as "the decimal part of a fractional number",

but it is not absolutely correct.

Thank you.

- Log in or register to post comments

Topclawson wrote:Catalin Ioan Stanciu wrote:That's right, I mean the fractional part, not "remainder" in a strict mathematical sense. Sorry if it caused confusion.

IIRC I used some kind of fixed point format.

- Log in or register to post comments

TopI studied your algo. Though uses aproximation it's avg. just few fractions of Cycle faster than mine.

I tried hard to get one instruction faster on my algo, but in vain.

I guess aproximation works harder on many Bytes. I'll try there. :-)

I'll update all my sources, from time to time, on first post. (#1)

Thanks a lot.

- Log in or register to post comments

TopAre you now talking about a 16 or 24 bit version , packed / unpacked and what is the number of clocks ?

Then I will see if I can solve at better way :)

For others yes I do this for fun, like a crossword, since we know it can be less than than 1% of the AVR's time there is no need!

- Log in or register to post comments

TopI'm hopeless. I'm still at contest; can't abstain.

Still the Two Bytes; were I might have found something; though Not Sure Yet!

If you discount the BCD packing, on my 16 bit, I'm at 49,53 clk AVG. <---WRONG , it is 51.53

That's one instruction, and AVG will fall below 49.

Yes, I now I'll be up on MAX; but as I said: "I can't abstain".

- Log in or register to post comments

Toptry this on debug, AS7:

ldi r16,low(01234)

ldi r17,high(01234)

it gives:

and without first zero:

ldi r16,low(1234)

ldi r17,high(1234)

gives:

01234 means Octal(1234) which equals Hex (29C). It seems that I got screwd up by this!

- Log in or register to post comments

TopThese are measurements without counting INPUT and without BCD packing.

I am wrong now, and I was wrong also the first time I did the measurement:

my original version (

V2) isAVG=50.532 Clkson 0-65535my actual version (see bellow) AVG=51.98 Clks on 0-65535

Actual version has this code bellow:

;------

add r16,r0

adc r1,r1

breq PC+11

; r1 can be 1 or 3

dec r1

breq PC+3 ; it means it was 1, but now is 0

inc r17

mov r1,SIX

add r1,SIX ; now r1 is SIX or TWELVE

inc r17 ; increment Quotient

add r16,r1 ; add SIX or TWELVE

brcc PC+3 ; possible one do-it-again on Carry resulted from last instruction

inc r17

add r16,SIX

; previous 13 instructions: min. 4 and max. 14 clk

;---------------

and this is worse than V2 version:

;----------

lsr r1 ; r1=0 and possible 1 shifted to C

brcc PC+4 ; branch if zero (evident)

inc R17 ; increment Quotient(R1 was one)

add r16,SIX ; add SIX; because we INC Quotient

brcs PC-2 ; possible one do-it-again on Carry resulted from last instruction

; On previous 5 instructions

; if High(INPUT)>$A8 5 or 9 clk, depending on RB256

; if High(INPUT)<$A8 3 or 5 clk, depending on RB256

add r16,r0 ; we add Remainder on the old one

brcc PC+4 ; but if we have CARRY?

inc r17 ; increment Quotient

add r16,SIX ; add SIX; because we INC Quotient

brcs PC-2 ; possible one repeat on Carry resulted from last instruction

; if there were 6 or 9 on previous paragraph we have only 3 clk here

; if there were 3 or 5 on previous paragraph we have 3 to max. 9 here

; so, this is somehow compensated (on both) to 6 to max. 15 clk

;---------

Were I can see that "min. 4 and max. 14 clk" is worse than "6 to max. 15 clk"

"much ado about nothing"

- Log in or register to post comments

TopDo you know the record on 24 and 32 bits?

- Log in or register to post comments

TopNope.

And it will start to get crowded in the registers so perhaps it should include store result in RAM.

- Log in or register to post comments

Top(FYI)

My "posted here"

Three Bytes(24bit)version isAVG=145.76from 00:00:00 to FF:FF:FFMy "posted here" Three Bytes(24bit) version is AVG=148.07 from FF:00:00 to FF:FF:FF

(it'll take hours to count from 00:00:00 to FF:FF:FF; it should thus be under 150 clk)

- Log in or register to post comments

Top49.86 clk

My First Version on Two Bytes AVG under 50 Cycles.

## Attachment(s):

- Log in or register to post comments

Topnice job

yes the 99 split works fine without correction for 65, the only errors are at 69 79 89 and 99.(I write 64 because the check is for that bit)

See #41 there are all the numbers.

Add:

One thing could actually be to make a version that don't use the HW mul. That will be slower (about 70-80 clk), but by far the fastest on on chips like tiny85.

- Log in or register to post comments

TopYou made me do that nice job; and I'm very glad you did it!

You are right about 65; it was in my mind to change something in comment there, too.

But, I'm still searching on why can't I get one instr. faster. Can't move along until that hepens.

HW mul is very powerfull; I can make "a touch" on the subject, as a pause.

Yes, it would be a good thing to make a change in my thoughts; at least for a while.

- Log in or register to post comments

TopGuess first "MUL by 6" can be relaced with something like:

add R16,R16

mov Rn,R16

add Rn,Rn

add Rn,R16

brcc ; check Carry

The rest can be done with with "my 12.5clk version"; see. #25 .

I'll try.

- Log in or register to post comments

TopAVG 73.18 CLK

This is a DRAFT, still

untested, but seems to work; I'll try to fasten and update.Att. Removed. It is WRONG. Has 26200/65536 errors on output. I'll be back.

ReplacedR18with .DEF SIX=R19;becauseI was using R18!Now

works 0-65535.## Attachment(s):

- Log in or register to post comments

TopOn an AVR you normally don't check for carry, but have a register that is zero, so you just ADC with zero (I always use R15, don't use R1 as GCC)

- Log in or register to post comments

TopYes, this I'll have to learn.

It's just a rapid transformation; copy and paste ... Draft, No Deep Thinking. Please excuse.

Thanks.

- Log in or register to post comments

TopSee Att. tested

Atiny85 BCD Packed version.Result BCD packed in R18:R17:R16

Used Registers: R0,R2,R3,R16,R17,R18

Max 83 & Min 65 Cycles and measured AVG=73.18

Used Code Segment memory 84 Words

I believe The Unpacked version will be Max<80 Cycles, also.

## Attachment(s):

- Log in or register to post comments

TopI'm doing the same but, Here I have

bothto:- Increment Quotient-

AddSix to RemainderAnd I know i spent few days optimizing this, 'cause the "below 50" uses this.

- Log in or register to post comments

TopSee Att. retested

Atiny85 BCD UnPackedversion.Result in:

R20:R19:R18:R17:R16( R0 is also used )Max 74 & Min 56 Cycles and measured

AVG=65.18Used Code Segment memory

76 WordsThe Unpacked version is

less or equalthan74 Cycles.Above modifications in Red, due to:

mov r0,r17 ;

add r0,r0 ; These are replacement for MUL ( +3 clk )

add r0,r17 ; as suggested by

add r0,r0 ;

https://www.avrfreaks.net/users/...sbc r19,r19 ; r19 = minus carry

sub R17,r19 ; add carry

## Attachment(s):

- Log in or register to post comments

TopIf this code

is replaced by

and nothing gets broken (I didn't check) a cycle may be saved.

- Log in or register to post comments

TopI checked.

This is finesse!- Log in or register to post comments

Top47.86clkMySecondVersion on Two Bytes AVGunder50 Cycles. (AVRe)Tested.; 3529736 Counter with program

; 393215 Counter run empty

; 3136520 Diffrence

;

47.86Diffrence/65536 My Second Under 50 Cyles Average!; uses 7 MUL's so it should be

43 Wordslength Min/Max47/50It was indeed a welcome pause to do the:

## Attachment(s):

- Log in or register to post comments

TopPresumably, things like itoa() and printf() must be doing basically this - so has anyone looked at their implementations ... ?

Top Tips:

oscillator: https://www.avrfreaks.net/comment...- Log in or register to post comments

TopNo.

We are speaking here about

65 ns (at 16MHz)! <-- There, in red, It wasmistakenlywritten "S".This is, I have to do it "

because it is there" (was first uttered byEdmund Hillary whenheandTenzing Norgay conquered Mount Everest in 1953).Added later: May be, some should see my "

/1000" algo.- Log in or register to post comments

TopCatalin Ioan Stanciu wrote:lowercase 's' for seconds.

(uppercase 'S' = siemens)

https://www.avrfreaks.net/commen...

Top Tips:

oscillator: https://www.avrfreaks.net/comment...- Log in or register to post comments

Tops= second.

I do apologizefor any inconvenience.- Log in or register to post comments

TopSeven for the Dwarven Lords, in their halls of stone.

- Log in or register to post comments

TopI deleted the original post.

It was inappropriate.

- Log in or register to post comments

TopSeven Units for Seven Quantities ... ?

Top Tips:

oscillator: https://www.avrfreaks.net/comment...- Log in or register to post comments

Tophere was the answer back in 2016

https://www.avrfreaks.net/commen...

So itoa() take about 875clk for 12345

(And since AVR's with and without HW take about the same time I guess it use a normal div routine.)

But perhaps the libs have changed.

Add:

if it's a signed or unsigned version don't really make a difference, (neg16 take 3 clk)

- Log in or register to post comments

Topsparrow2 wrote:So just the binary-to-BCD part should be a little less?

Top Tips:

oscillator: https://www.avrfreaks.net/comment...- Log in or register to post comments

Topcited from "Eurythmics, Annie Lennox, Dave Stewart - Sweet Dreams (Are Made Of This) (Official Video)".

- Log in or register to post comments

Topsparrow2 wrote:The core code (itoa leads to ultoa_ncheck) is here:

http://svn.savannah.gnu.org/view...

- Log in or register to post comments

Topbut that code use 32bit , there should be a 16bit version !?

- Log in or register to post comments

TopFor the sake of the argument,

let us say that the Assembly Language is like a motorcycle

and other programming languages are like a car.

Now, if you are on a motorcycle you know that drivers may fail to see motorcycles in plain sight.

You may feel you want to ride a motorcycle or not,

but for me being on a motorcycle there is no point in arguing with someone who may fail to see me.

- Log in or register to post comments

TopI have a 4-byte(32bit unsigned) version of about 170clk; hope I'll finish bugs and testing in the next days.

Is there realy a need for numbers larger than 256^4-1 = 4_294_967_295 ?

This post is about speed and anyone may question the need for speed.

As I've already already said,

nobody's compelled to do caving discovery or mind entertainment exercises.

- Log in or register to post comments

Top## Pages