Nope remainder is where you for example divide 13 by 3 and get the dividend 4 with a remainder of 1. 3 goes into 13 four times with 1 left over.
It's true that 13/3 can also be seen as 4.33 where the 4 is the integer result and the .333 (which represent the "remainder 1") is the fractional result but it would be wrong to refer to the .33 part of this as the "remainder".
These are measurements without counting INPUT and without BCD packing.
I am wrong now, and I was wrong also the first time I did the measurement:
my original version (V2) is AVG=50.532 Clks on 0-65535
my actual version (see bellow) AVG=51.98 Clks on 0-65535
Actual version has this code bellow:
;------
add r16,r0
adc r1,r1
breq PC+11
; r1 can be 1 or 3
dec r1
breq PC+3 ; it means it was 1, but now is 0
inc r17
mov r1,SIX
add r1,SIX ; now r1 is SIX or TWELVE
inc r17 ; increment Quotient
add r16,r1 ; add SIX or TWELVE
brcc PC+3 ; possible one do-it-again on Carry resulted from last instruction
inc r17
add r16,SIX
; previous 13 instructions: min. 4 and max. 14 clk
;---------------
and this is worse than V2 version:
;----------
lsr r1 ; r1=0 and possible 1 shifted to C
brcc PC+4 ; branch if zero (evident)
inc R17 ; increment Quotient(R1 was one)
add r16,SIX ; add SIX; because we INC Quotient
brcs PC-2 ; possible one do-it-again on Carry resulted from last instruction
; On previous 5 instructions
; if High(INPUT)>$A8 5 or 9 clk, depending on RB256
; if High(INPUT)<$A8 3 or 5 clk, depending on RB256
add r16,r0 ; we add Remainder on the old one
brcc PC+4 ; but if we have CARRY?
inc r17 ; increment Quotient
add r16,SIX ; add SIX; because we INC Quotient
brcs PC-2 ; possible one repeat on Carry resulted from last instruction
; if there were 6 or 9 on previous paragraph we have only 3 clk here
; if there were 3 or 5 on previous paragraph we have 3 to max. 9 here
; so, this is somehow compensated (on both) to 6 to max. 15 clk
;---------
Were I can see that "min. 4 and max. 14 clk" is worse than "6 to max. 15 clk"
yes the 99 split works fine without correction for 65, the only errors are at 69 79 89 and 99.(I write 64 because the check is for that bit)
See #41 there are all the numbers.
Add:
One thing could actually be to make a version that don't use the HW mul. That will be slower (about 70-80 clk), but by far the fastest on on chips like tiny85.
Result in: R20:R19:R18:R17:R16 ( R0 is also used )
Max 74 & Min 56 Cycles and measured AVG=65.18
Used Code Segment memory 76 Words
The Unpacked version is less or equal than 74 Cycles.
Above modifications in Red, due to:
mov r0,r17 ;
add r0,r0 ; These are replacement for MUL ( +3 clk )
add r0,r17 ; as suggested by
add r0,r0 ; https://www.avrfreaks.net/users/...
sbc r19,r19 ; r19 = minus carry
MySecond Version on Two Bytes AVG under50 Cycles. (AVRe)
Tested.
; 3529736 Counter with program
; 393215 Counter run empty
; 3136520 Diffrence
; 47.86 Diffrence/65536 My Second Under 50 Cyles Average!
; uses 7 MUL's so it should be 43 Words length Min/Max 47/50
It was indeed a welcome pause to do the:
Atiny85 BCD UnPacked version.
Result in: R20:R19:R18:R17:R16 ( R0 is also used )
Max 74 & Min 56 Cycles and measured AVG=65.18
Used Code Segment memory 76 Words
The Unpacked version is less or equal than 74 Cycles.
I just tried a couple and got broadly similar answers for cycles in the simulator. I was a bit surprised that megas (MUL) actually return almost identical numbers to tiny.
The core code (itoa leads to ultoa_ncheck) is here:
I am reopening this post for a small, but important issue.
I am using the word "Remainder" as in a result of Modulo(Integer,1000).
In https://www.avrfreaks.net/commen...
You are using remainder as the decimal part of an integer division.
When I look at https://en.wikipedia.org/wiki/Re..., I might
think that I haven't incorrectly used the word.
Am I wrong?
- Log in or register to post comments
TopNope remainder is where you for example divide 13 by 3 and get the dividend 4 with a remainder of 1. 3 goes into 13 four times with 1 left over.
It's true that 13/3 can also be seen as 4.33 where the 4 is the integer result and the .333 (which represent the "remainder 1") is the fractional result but it would be wrong to refer to the .33 part of this as the "remainder".
- Log in or register to post comments
TopSo, I just raised the problem to El Tangas:
"so the next digits pop up from the remainder just by mul with 10"
I think that's the decimal part of the result of dividing the
"Two Bytes" Number by 1000.
He multiplies the number by constant 256^3/1000;
that is multiplied by 256^3 and then dividing by 1000; he is making an unambiguous Approximation.
Thus, you have to round the Decimal part on very, very many divisions. It's normal.
The Problem is Remainder is an Integer, not a Fractional Number.
So, I thing here he was unintentionally misleading.
When you multiply by 10 the decimal part of a division it's obvious that the "next digits pop up".
This is not True about Remainder.
- Log in or register to post comments
TopI was trying to understand El Tangas's algorithm, and I encounterd difficulties.
Than it poped my mind that maybe he's not meaning Remainder.
I think that many people are using Remainder as "the decimal part of a fractional number",
but it is not absolutely correct.
Thank you.
- Log in or register to post comments
TopThat's right, I mean the fractional part, not "remainder" in a strict mathematical sense. Sorry if it caused confusion.
IIRC I used some kind of fixed point format.
- Log in or register to post comments
TopI studied your algo. Though uses aproximation it's avg. just few fractions of Cycle faster than mine.
I tried hard to get one instruction faster on my algo, but in vain.
I guess aproximation works harder on many Bytes. I'll try there. :-)
I'll update all my sources, from time to time, on first post. (#1)
Thanks a lot.
- Log in or register to post comments
TopAre you now talking about a 16 or 24 bit version , packed / unpacked and what is the number of clocks ?
Then I will see if I can solve at better way :)
For others yes I do this for fun, like a crossword, since we know it can be less than than 1% of the AVR's time there is no need!
- Log in or register to post comments
TopI'm hopeless. I'm still at contest; can't abstain.
Still the Two Bytes; were I might have found something; though Not Sure Yet!
If you discount the BCD packing, on my 16 bit, I'm at 49,53 clk AVG. <---WRONG , it is 51.53
That's one instruction, and AVG will fall below 49.
Yes, I now I'll be up on MAX; but as I said: "I can't abstain".
- Log in or register to post comments
Toptry this on debug, AS7:
ldi r16,low(01234)
ldi r17,high(01234)
it gives:
and without first zero:
ldi r16,low(1234)
ldi r17,high(1234)
gives:
01234 means Octal(1234) which equals Hex (29C). It seems that I got screwd up by this!
- Log in or register to post comments
TopThese are measurements without counting INPUT and without BCD packing.
I am wrong now, and I was wrong also the first time I did the measurement:
my original version (V2) is AVG=50.532 Clks on 0-65535
my actual version (see bellow) AVG=51.98 Clks on 0-65535
Actual version has this code bellow:
;------
add r16,r0
adc r1,r1
breq PC+11
; r1 can be 1 or 3
dec r1
breq PC+3 ; it means it was 1, but now is 0
inc r17
mov r1,SIX
add r1,SIX ; now r1 is SIX or TWELVE
inc r17 ; increment Quotient
add r16,r1 ; add SIX or TWELVE
brcc PC+3 ; possible one do-it-again on Carry resulted from last instruction
inc r17
add r16,SIX
; previous 13 instructions: min. 4 and max. 14 clk
;---------------
and this is worse than V2 version:
;----------
lsr r1 ; r1=0 and possible 1 shifted to C
brcc PC+4 ; branch if zero (evident)
inc R17 ; increment Quotient(R1 was one)
add r16,SIX ; add SIX; because we INC Quotient
brcs PC-2 ; possible one do-it-again on Carry resulted from last instruction
; On previous 5 instructions
; if High(INPUT)>$A8 5 or 9 clk, depending on RB256
; if High(INPUT)<$A8 3 or 5 clk, depending on RB256
add r16,r0 ; we add Remainder on the old one
brcc PC+4 ; but if we have CARRY?
inc r17 ; increment Quotient
add r16,SIX ; add SIX; because we INC Quotient
brcs PC-2 ; possible one repeat on Carry resulted from last instruction
; if there were 6 or 9 on previous paragraph we have only 3 clk here
; if there were 3 or 5 on previous paragraph we have 3 to max. 9 here
; so, this is somehow compensated (on both) to 6 to max. 15 clk
;---------
Were I can see that "min. 4 and max. 14 clk" is worse than "6 to max. 15 clk"
"much ado about nothing"
- Log in or register to post comments
TopDo you know the record on 24 and 32 bits?
- Log in or register to post comments
TopNope.
And it will start to get crowded in the registers so perhaps it should include store result in RAM.
- Log in or register to post comments
Top(FYI)
My "posted here" Three Bytes(24bit) version is AVG=145.76 from 00:00:00 to FF:FF:FF
My "posted here" Three Bytes(24bit) version is AVG=148.07 from FF:00:00 to FF:FF:FF
(it'll take hours to count from 00:00:00 to FF:FF:FF; it should thus be under 150 clk)
- Log in or register to post comments
Top49.86 clk
My First Version on Two Bytes AVG under 50 Cycles.
Attachment(s):
- Log in or register to post comments
Topnice job
yes the 99 split works fine without correction for 65, the only errors are at 69 79 89 and 99.(I write 64 because the check is for that bit)
See #41 there are all the numbers.
Add:
One thing could actually be to make a version that don't use the HW mul. That will be slower (about 70-80 clk), but by far the fastest on on chips like tiny85.
- Log in or register to post comments
TopYou made me do that nice job; and I'm very glad you did it!
You are right about 65; it was in my mind to change something in comment there, too.
But, I'm still searching on why can't I get one instr. faster. Can't move along until that hepens.
HW mul is very powerfull; I can make "a touch" on the subject, as a pause.
Yes, it would be a good thing to make a change in my thoughts; at least for a while.
- Log in or register to post comments
TopGuess first "MUL by 6" can be relaced with something like:
add R16,R16
mov Rn,R16
add Rn,Rn
add Rn,R16
brcc ; check Carry
The rest can be done with with "my 12.5clk version"; see. #25 .
I'll try.
- Log in or register to post comments
TopAVG 73.18 CLK
This is a DRAFT, still untested, but seems to work; I'll try to fasten and update.
Att. Removed. It is WRONG. Has 26200/65536 errors on output. I'll be back.
Replaced .DEF SIX=R18 with .DEF SIX=R19; because I was using R18!
Now works 0-65535.
Attachment(s):
- Log in or register to post comments
TopOn an AVR you normally don't check for carry, but have a register that is zero, so you just ADC with zero (I always use R15, don't use R1 as GCC)
- Log in or register to post comments
TopYes, this I'll have to learn.
It's just a rapid transformation; copy and paste ... Draft, No Deep Thinking. Please excuse.
Thanks.
- Log in or register to post comments
TopSee Att. tested Atiny85 BCD Packed version.
Result BCD packed in R18:R17:R16
Used Registers: R0,R2,R3,R16,R17,R18
Max 83 & Min 65 Cycles and measured AVG=73.18
Used Code Segment memory 84 Words
I believe The Unpacked version will be Max<80 Cycles, also.
Attachment(s):
- Log in or register to post comments
TopI'm doing the same but, Here I have both to:
- Increment Quotient
- Add Six to Remainder
And I know i spent few days optimizing this, 'cause the "below 50" uses this.
- Log in or register to post comments
TopSee Att. retested Atiny85 BCD UnPacked version.
Result in: R20:R19:R18:R17:R16 ( R0 is also used )
Max 74 & Min 56 Cycles and measured AVG=65.18
Used Code Segment memory 76 Words
The Unpacked version is less or equal than 74 Cycles.
Above modifications in Red, due to:
mov r0,r17 ;
add r0,r0 ; These are replacement for MUL ( +3 clk )
add r0,r17 ; as suggested by
add r0,r0 ; https://www.avrfreaks.net/users/...
sbc r19,r19 ; r19 = minus carry
sub R17,r19 ; add carry
Attachment(s):
- Log in or register to post comments
TopIf this code
is replaced by
and nothing gets broken (I didn't check) a cycle may be saved.
- Log in or register to post comments
TopI checked. This is finesse!
- Log in or register to post comments
Top47.86 clk
My Second Version on Two Bytes AVG under 50 Cycles. (AVRe)
Tested.
; 3529736 Counter with program
; 393215 Counter run empty
; 3136520 Diffrence
; 47.86 Diffrence/65536 My Second Under 50 Cyles Average!
; uses 7 MUL's so it should be 43 Words length Min/Max 47/50
It was indeed a welcome pause to do the:
Attachment(s):
- Log in or register to post comments
TopPresumably, things like itoa() and printf() must be doing basically this - so has anyone looked at their implementations ... ?
Top Tips:
- Log in or register to post comments
TopNo.
We are speaking here about 65 ns (at 16MHz)! <-- There, in red, It was mistakenly written "S".
This is, I have to do it "because it is there" (was first uttered by Edmund Hillary when he and Tenzing Norgay conquered Mount Everest in 1953).
Added later: May be, some should see my "/1000" algo.
- Log in or register to post comments
Toplowercase 's' for seconds.
(uppercase 'S' = siemens)
https://www.avrfreaks.net/commen...
Top Tips:
- Log in or register to post comments
Tops= second. I do apologize for any inconvenience.
- Log in or register to post comments
TopSeven for the Dwarven Lords, in their halls of stone.
- Log in or register to post comments
TopI deleted the original post.
It was inappropriate.
- Log in or register to post comments
TopSeven Units for Seven Quantities ... ?
Top Tips:
- Log in or register to post comments
Tophere was the answer back in 2016
https://www.avrfreaks.net/commen...
So itoa() take about 875clk for 12345
(And since AVR's with and without HW take about the same time I guess it use a normal div routine.)
But perhaps the libs have changed.
Add:
if it's a signed or unsigned version don't really make a difference, (neg16 take 3 clk)
- Log in or register to post comments
TopSo just the binary-to-BCD part should be a little less?
Top Tips:
- Log in or register to post comments
Topcited from "Eurythmics, Annie Lennox, Dave Stewart - Sweet Dreams (Are Made Of This) (Official Video)".
- Log in or register to post comments
TopThe core code (itoa leads to ultoa_ncheck) is here:
http://svn.savannah.gnu.org/view...
- Log in or register to post comments
Topbut that code use 32bit , there should be a 16bit version !?
- Log in or register to post comments
TopFor the sake of the argument,
let us say that the Assembly Language is like a motorcycle
and other programming languages are like a car.
Now, if you are on a motorcycle you know that drivers may fail to see motorcycles in plain sight.
You may feel you want to ride a motorcycle or not,
but for me being on a motorcycle there is no point in arguing with someone who may fail to see me.
- Log in or register to post comments
TopI have a 4-byte(32bit unsigned) version of about 170clk; hope I'll finish bugs and testing in the next days.
Is there realy a need for numbers larger than 256^4-1 = 4_294_967_295 ?
This post is about speed and anyone may question the need for speed.
As I've already already said,
nobody's compelled to do caving discovery or mind entertainment exercises.
- Log in or register to post comments
TopPages