Mixing C with ASM Assembler

Go To Last Post
83 posts / 0 new

Pages

Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Sort of related to another current topic....

I have an ASM realtime application that I want to extend to use bilinear interpolation (http://en.wikipedia.org/wiki/Bil...) which looks a bit tough in ASM from scratch!

So my idea is to code the BI in C (I'm a complete novice C programmer and I hate it and it's environment passionately), grab the relevant bits of ASM and import it into my ASM. I'll obviously have to decode the ASM a little to work out the registers/memory location interfaces to and from it

Any mods to C code (for debugging etc) might be quite tortuous to re-inport into the ASM.

Do I have a chance and if so, what C compiler should I use? Or even BASCOM?

Or has anyone got any ASM BI implementations they will sell for virtual beer money :P

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Most of the compilers/IDEs allow C and asm source files to be mixed and built separately then linked together so why consider the step of writing C, getting ASM output and then using this in an ASM only project? As you say, any bug fix in the C will involve tortuous reprocessing of the output again. Why not just use a combined C and ASM project in the first place and maintain the C because, if you write a BI in it, then you are probably going to get quite good at writing/debugging C anyway!

I won't start a compiler war (well OK, maybe a little bit) but I will point out that for a "toe in the water" experiment then WinAVR that includes avr-gcc is a cheap place to start! Also, if you are already doing all your ASM work within the AVR Studio IDE then it's the one compiler that will intergrate into that currently so that you can continue working in an editing/building environment you are already familiar with. However, you will have to port your existing ASM into the GCC format from the Atmel format if you are going to have a mixed C/ASM project.

Cliff

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Call me old-fashioned (Chorus:"You're old-fashioned!"), but I've got several thousand lines of fantastically structured and commented ASM in multiple source files where I know exactly what is going on in every register at any point in time. And I need it to be like that, due to the realtime nature of the application and what might go wrong in the thing it's controlling if there is a software fault.

I'm just not convinced that I should port it (does that mean putting _ASM in front of every line of assembler I have?) to be a 2% C and 98% ASM project just because I want some BI functionality

Here's a typical linear interpretation example requirement (BI is 'just' LI done twice), in pseudo-ASM, if anyone wants to work out some ASM reference code for me!

r16 = 24; x1 value
r17 = 50; x2 value
r19, r18 = 0x24EE ; y1 value - 9454 decimal
r20, r21 = 0x37A1; y2 value - 14241 decimal
r22 = 47; x value at which y should be calculated  

should result in

r25, r24 = 0x3578; y value - 13688 decimal

Game on, for those with some spare time on their hands - probably based on the AVR202 app note utility routines?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I think I know what needs to be done and you want it in assembler right?
How long before it's needed?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Do you have any integer math functions written already or are you really starting from scratch?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

RetroDan wrote:
Do you have any integer math functions written already or are you really starting from scratch?

Well there's quite a lot in the basic AVR instruction set!

Starting from scratch in general - you may find Atmel's Application Note #202 a useful starting point.

You don't even need an AVR to do this - just write it in AVR Studio and use the debugger/simulator to watch/check the program's behaviour.

Must have an optimal (ie short) run time - ie must use a minimum of clock cycles.

You gonna have a go for me? :D

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

RetroDan wrote:
I think I know what needs to be done and you want it in assembler right?
How long before it's needed?

OK, yep and yesterday.

By the way, there's some interesting test cases where the y values are much closer together than the example above.....

EDIT: ...and techniques from Application Note #200 will be needed as well!
EDIT2:...ATMega16 is the target

Last Edited: Thu. Mar 2, 2006 - 02:55 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I ran through the math and found that whatever routine you used for the above is introducing a small rounding error. The true answer for Y at X=47 is 16688.75 which was truncated instead of being rounded up to 13689. It's a small error but they can add up over a long period. It was one little rounding error that caused the loss of a mars mission.

Just to be sure I calculated using the same data as above: Y would be 12399
if X were 40 (actual answer is 12399.85) does that jive with whatever software your using now?
Shouldn't be too hard to convert to assembler. I wouldn't resort to matching up brackets and looking for null pointers just yet.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Sorry, I'm a newbie to AVRs but they seem fairly simple, where can I grab those app notes?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

RetroDan wrote:
I ran through the math and found that whatever routine you used for the above is introducing a small rounding error.

It was the Windows calculator. Rounding to an integer answer either way is fine

RetroDan wrote:
It was one little rounding error that caused the loss of a mars mission.

Luckily I'm not doing that!

RetroDan wrote:
Just to be sure I calculated using the same data as above: Y would be 12399 if X were 40 (actual answer is 12399.85) does that jive with whatever software your using now?

Yep, exactly

RetroDan wrote:
Shouldn't be too hard to convert to assembler.

Agreed, but that's the challenge. If your solution finds it's way into my application you will be acknowledged!

RetroDan wrote:
I wouldn't resort to matching up brackets ...

Don't know what you mean....

RetroDan wrote:
...and looking for null pointers just yet.

It's basic ASM that's required. Hopefully you won't be needing pointers....

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

BTW I've got one here in assembler I did years ago, but it's not in AVR code.
I'll look it over and see if I can remember what the heck I was doing. If I can figure out this code shouldn't be too hard to convert. I didn't comment it, very rare from me, must have been in big hurry for it. If I remember there was some tricky business with negative slopes though.
What this for? Automobile Accelerations? Maps?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

RetroDan wrote:
Sorry, I'm a newbie to AVRs but they seem fairly simple, where can I grab those app notes?

http://www.atmel.com/dyn/product...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Geez, I been up all night... would be nice to bite right into this one.
Sorry, years ago I was right into C, before they had all these neat debuggers and stuff and found I was spending all my debugging time matching-up the braces and checking for null pointers in C. If you ever give it a try you'll see what I mean, but then maybe not, I think a lot of the new editors will match-up the braces for you,

Anyhow, what is the objective (that's not the right word...fatigue is setting in) do you need it to be smallest, fastest, most accurate, etc.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Okay I'm looking over those notes.... beautiful they should save a lot of time.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Priority 1 - Understandability/maintainability
Priority 2 - speed
Priority 3 - size
Priority 4 - accuracy...which might mean that novel solutions are applicable rather than all that math

The x values are always 8-bit, and x1 <= x2 always
The y values are always 16-bit, ...and yes, you reminded me, negative slopes need to be handled as well ie y1 > y2

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Okay so you want it well documented and well structured.

Geez, I'm gonna grab a coffee and see how long I can last.
If you'd a caught me yesterday I'd probably be done by now I been up
all night just screwing around with AVR assembler.

Send over some bennies will ya!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

MartinM57 wrote:
Priority 1 - Understandability/maintainability
Priority 2 - speed
Priority 3 - size
Priority 4 - accuracy...which might mean that novel solutions are applicable rather than all that math

The x values are always 8-bit, and x1 <= x2 always
The y values are always 16-bit, ...and yes, you reminded me, negative slopes need to be handled as well ie y1 > y2

Well I'll take a crack at it but no promises.
I say we get a working version then I'll take a crack at optimizing it, that what I really like doing.

X is always 8 bits and your SURE that X1 will always be less than X2?
That might offer a some short-cuts.
Are the values going to fluctuating all over the place or are they going to be changing gradually within a certain range?

Do they use the full 8 and 16 bit range or do they fluctuate in a subset.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Oh ya, we're dealing with UNSIGNED integer here correct? 255 does mean -127 does it?
There's no negative numbers involved correct?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

<<X is always 8 bits and your SURE that X1 will always be less than X2?>>
Yes and yes

<<Are the values going to fluctuating all over the place or are they going to be changing gradually within a certain range? >>
Doesn't matter - I just need to be able to set up the registers as in my example and then just call the routine

<<Do they use the full 8 and 16 bit range or do they fluctuate in a subset>>
Full range on both

<<we're dealing with UNSIGNED integer here correct? 255 does mean -127 does it?>>
Yes and no - unsigned throughout and 255 means 255

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just out of interest, 255 would generally mean -1 in an 8 bit signed context.

Four legs good, two legs bad, three legs stable.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I think -127 is the One's Compliment and -1 is the Two's Compliment.
Two's compliment is usually used because the routines are easier.
Of course it could just be my fatigue.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Few quick questions. Do I have to preserve all those registers or can I say overwrite X1,X2 etc during caclulations. Also are there registers I can use that are NOT being used elsewhere I can save & restore everything but that of course will increase size and slow down things.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Overwrite anything you want (within reason!) EDIT r0 to r15 are generally unused so you could have all of them (but note some ASM instructions don't work on these registers)

You could also use some SRAM if you want to

And if you want any constants, there's some room for them in the program memory/Flash

Got me a delivery date yet? 8)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

RetroDan wrote:
I think -127 is the One's Compliment and -1 is the Two's Compliment.
Two's compliment is usually used because the routines are easier.
Of course it could just be my fatigue.

I think it must be your fatigue.
One's complement of 255 is 0 (in eight bit context).
And please note the spelling.
Anyway, I'll go away now. Two's complement three's a crowd.

Four legs good, two legs bad, three legs stable.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

UPDATE: Sorry, I just had to go crash, it was the fatigie (Up been up since Monday)
Too bad this is on a new machine using a new editor or I think I'd be at least debugging
it by now. I've got the basic routine done but can't seem to get it to compile I keep getting a compile error I couldn't nail down so I crashed for a few hours.

It's very small (under 40 lines of code) but I took so many short-cuts I''m worried about accuracy.

I'm gonna' grab a coffee and start in on it as soon as my head clears.

You familiar with AVR coding? A few quick tips might really speed-up things. For instance I have been avoiding the use of r0 to r15 because I notice every sample code I see only uses r16-r32 what is the difference? What can't be done in those?

Also, if I'm declaring word-wide registers are they LSB:MSB or MSD:LSB

Hmm, what else.... oops kettle's boilling... gotta run.

TTYL

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Oh ya, you wanna' send me a private message with your email we could continue this offline?
Assuming, I can figure out how to read private messages.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You have a PM!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

UPDATE:

The first trials with 8 bit math were abysmal. The truncation errors made results unusable.
I figured that would happen, but I wanted to try it first because it would be very small and super fast.

I expanded the math routines and although it goes awry sometimes, when it behaves itself the errors are in the 0.03 percent area which should be suitable on an output of the 0-65,000 range you need. Which I think is very good without resorting to Floating-Point Routines that would expand the code and slow it down a lot. At the moment the code is under 150 bytes.

As soon as I can cage the code and make it behave I'll start optimizing for speed and you'll have a very tight, very fast useable routine in assmbler without resorting to C.

TTFN

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

And, as always, I'll re-iterate the challenge: Post the result, the requirements, and the criteria that you are crowing about.

[I'm surprised that you did not write it in FORTRAN, as that translates directly to optimal machine code.]

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I believe the criteria appears in the first message of this thread.

Correct,Returned,Error
10559,10557,2
13688,13680,8
14241,14241,0
23447,23441,6
32653,32641,6
51064,51059,5
51985,51976,9

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
As soon as I can cage the code and make it behave I'll start optimizing for speed and you'll have a very tight, very fast useable routine in assmbler without resorting to C.

How you getting on Dan?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Almost there. Upto about 250 lines of code but it should be fast... short-fast math routines. When done it should extrapolate as well as interpolate.

Should have something for you to start testing shortly, where should I send it?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

RetroDan wrote:
Almost there. ...When done it should extrapolate as well as interpolate.

Should have something for you to start testing shortly, where should I send it?

Great.

Unless the extrapolate is for free then not needed - I don't want bloat-ware!

Send to email address(es) please....

This might be an interesting test case:
x1 = 24
x2 = 28
y1 = 17
y2 = 19
x = 26

The answer should be 18, of course, and any error is unacceptable.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I think my email is moving again (darn ISPs). Talk to you there.

PS. The extrapolation turns out to be a coding "free-bee"

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I hope nobody minds, I used this challenge to experment with my coding style.

This Interpolation function is roughly optimized for size. (~50 words)

Requires avrasm2.exe

Enjoy.
--Brian

.include "m8def.inc"


/* Interpolate has been tested for the following cases:
xi	x1	x2	y1	y2	result	hex-result
------------------------------------------------------------
47	24	50	9454	14241	13689	3579
47	24	50	14241	9454	10006	2716
0	0	255	0	65535	0	0
255	0	255	0	65535	65535	FFFF
0	0	255	65535	0	65535	FFFF
255	0	255	65535	0	0	0
203	199	210	333	43007	15851	3DEB
203	199	210	43007	333	27489	6B61
26	24	28	17	19	18	12
*/

.equ TestXI = 47
.equ TestX1 = 24
.equ TestX2 = 50
.equ TestY1 = 9454
.equ TestY2 = 14241
.equ TestResult = 13689


/******************
Interpolate Function Interface
---
This should be moved to a header/include file
*/

#define DefInterpolateParams 	\
.def X1 =	R16		\
.def X2 =	R17		\
.def Y1B0 =	R18		\
.def Y1B1 =	R19		\
.def Y2B0 =	R20		\
.def Y2B1 = 	R21		\
.def XI = 	R22		

#define UndefInterpolateParams	\
.undef X1			\
.undef X2			\
.undef Y1B0			\
.undef Y1B1			\
.undef Y2B0			\
.undef Y2B1			\
.undef XI		

#define DefInterpolateResult	\
.def ResultB0=R18		\
.def ResultB1=R19


/******************
Simple test routine for Interpolate
*/
	;setup stack
	ldi R16, LOW(RAMEND)
	out SPL, R16
	ldi R16, HIGH(RAMEND)
	out SPH, R16

	;setup special regs
.def zero = R15
.def one = R14
	clr zero
	clr one
	inc one

	;Test Interpolate
	DefInterpolateParams
	ldi X1, TestX1
	ldi X2, TestX2
	ldi Y1B0, LOW(TestY1) 
	ldi Y1B1, HIGH(TestY1) 
	ldi Y2B0, LOW(TestY2) 
	ldi Y2B1, HIGH(TestY2) 
	ldi XI, TestXI
	rcall Interpolate
	UndefInterpolateParams
	DefInterpolateResult

	;Make sure that Interpolate result matches expectations
	cpi ResultB0, LOW(TestResult)
	brne BadResult
	cpi ResultB1, HIGH(TestResult)
	brne BadResult
.undef ResultB0
.undef ResultB1


GoodResult: rjmp GoodResult


BadResult: rjmp BadResult


/******************
Interpolate -- Calculates: y = (y2-y1) * (x-x1) / (x2-x1) + y1  rounded to nearest integer
----
Parameters: (all unsigned)
 R16 = x1
 R17 = x2
 R18/19 = y1 (low/high)
 R20/21 = y2 (low/high)
 R22 = xi
 zero = 0
 one = 1

Result:
 R18/19 = y (unsigned)

Regs clobbered:
 R0-R12
 R16-R22

Notes:
-The result is rounded to the nearest integer in direction of the (y2-y1) slope; Negative -> round down; Positive -> round up.
-The function size can be reduced by 50% if a shared unsigned 24bit/8bit divide is available
*/


Interpolate:


; *** Parameter Register Definition ***
	DefInterpolateParams


; *** Parameter Validation ***

; TODO: Assert that (x1!=x2)
; TODO: Assert that (x1 <= x <= x2)


; *** Calculate Y_Range ***
	sub Y2B0, Y1B0
	sbc Y2B1, Y1B1
.undef Y2B0
.undef Y2B1
.def Y_RangeB0 = R20
.def Y_RangeB1 = R21

	clt					; clear inverted Y_Range flag

	;Ensure that Y_Range is positive
	brcc ip_yNotNeg

	set					; set inverted Y_Range flag

	;invert Y_Range
	neg Y_RangeB0
	adc Y_RangeB1, zero			; = Y_RangeB1 + borrow
	neg Y_RangeB1				; = -Y_RangeB1 - borrow

ip_yNotNeg:


; *** Calculate X_Offset ***
	sub XI, X1				; X_Offset = XI - X1
.undef XI
.def X_Offset = R22


; *** Calculate Quotient(24b) = Y_Range(16b) * X_Offset(8b) ***
.def QuotientB0 = R2
.def QuotientB1 = R3
	mul X_Offset, Y_RangeB0
	movw QuotientB0, R0

	mul X_Offset, Y_RangeB1
.def QuotientB2 = R1
	
	add QuotientB1, R0
	adc QuotientB2, zero
	

; *** Calculate trunc(abs(Result)) = Quotient / X_Range ***

	;Divisor = ( X_Range = (x2-x1) ) << 16
	sub X2, X1
.undef X2
.undef X1
.def DivisorB2 = R17


	;clear Divisor low-word
.def DivisorB0 = R6
.def DivisorB1 = R7
	clr DivisorB0
	clr DivisorB1


	;clear Dividend	
.def DividendB0 = R8				;Dividend will always be <= 16bits, because (xi-x1)<=(x2-x1)
.def DividendB1 = R9
	movw DividendB0, DivisorB0


.def DivCount = R16
	ldi DivCount, 16

; *** do {
ip_DivLoop:
	; Divisor is shifted before the first comparison; That's ok because the Dividend will be <= 16bits.
	lsr DivisorB2				
	ror DivisorB1
	ror DivisorB0

	;Diff = Quotient - Divisor
.def DiffB0 = R10
.def DiffB1 = R11
.def DiffB2 = R12

	movw DiffB0, QuotientB0		; Diff = Quotient
	mov DiffB2, QuotientB2

	sub DiffB0, DivisorB0		; Diff -= Divisor
	sbc DiffB1, DivisorB1
	sbc DiffB2, DivisorB2

	;if( Diff>=0 )  Quotient = Diff
	brcs ip_ZeroBit

	movw QuotientB0, DiffB0
	mov QuotientB2, DiffB2

ip_ZeroBit:
	; ResultBit = ~ borrow( Quotient - Divisor ) 
	rol DividendB0			; Bit = borrow(Quotient - Divisor)
	rol DividendB1

	eor DividendB0, one		; Bit = ~ borrow( Quotient - Divisor )

; *** } while (--DivCount)
	dec DivCount
	brne ip_DivLoop

.undef DivisorB1
.undef DivisorB2
.undef DiffB0
.undef DiffB1
.undef DiffB2
.undef DivCount	

.undef QuotientB0	
.undef QuotientB1
.undef QuotientB2
.def Remainder = R2	;[QuotientB0]


; *** Round Dividend ***
	; Round Division result away from zero  ( round-up, if Remainder >= (divisor+1)/2

	lsr DivisorB0			; C = Divisor LSB
	
	cpc Remainder, DivisorB0	;  = Remainder - ( Divisor/2 + DivisorLSB )
.undef Remainder
.undef DivisorB0

	; if ( !borrow ) Result += 1;
	brcs ip_noBorrow
	add DividendB0, one
	adc DividendB1, zero
ip_noBorrow:


; *** Invert Dividend if needed ***
	brtc ip_noFinalInvert

	neg DividendB0
	adc DividendB1, zero		; = DividendB1 + borrow
	neg DividendB1			; = -DividendB1 - borrow
ip_noFinalInvert:


; *** Result = Y1 + Dividend ***
	add Y1B0, DividendB0
	adc Y1B1, DividendB1
.undef Y1B0
.undef Y1B1
.undef DividendB0
.undef DividendB1
;.def ResultB0=R18
;.def ResultB1=R19

	ret

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I came up with something very similar to yours, bcb. However, I think that my divide routine is more efficient than yours. I took advantage of the fact that we know that the divisor is 8 bits or less. Because of that you can do just one subtract instead of 3, then shift the quotient left instead of the divisor right. It also allows you to use a compare before actually subtracting, eliminating the need for all that moving back and forth.

Once I've written it up more formally and tested it, I'll post it. Also, I didn't do the round at the end, so I'll add that.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Nice catch Steve!

After glancing at my code above, I'm guessing that would save 4 moves and 2 subtractions in the ip_DivLoop.

And it'll free 5 regs by eliminating Diff and shrinking Divisor.

--Brian

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

There is also one optimization that might be done, but it depends entirely on the data set. If you know that the maximum difference between any two y values is, for instance, a 12 bit value, then you know that the dividend will also be at most a 12 bit value since you are dividing by a number that is guaranteed to be >= to the one you multiplied by. So you could shift the quotient by four first, then do the loop only 12 times. In fact, you could even shift before the multiply, so you are shifting a 16 bit number instead of a 24 bit one. Of course this makes it less general, but it could be done for specific data sets.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

As a newbie AVR programmer, here is my two cents worth...

At the end of the divide you have something similar to this:

ZBIT:   ROL ANS         ;inside of
          ROL ANSH       ;the loop
          EOR ANS,ONE ;<==
            DEC N
              BRNE LOOPIE

Rather than invert the incomming bits how about...

ZBIT:   ROL ANS
          ROL ANSH
          DEC N
            BRNE LOOPIE
          COM ANS     ;<== outside
          COM ANSH   ;<==

This way we invert the final result instead. This take the EOR out of the loop and saves clocks.

We also no longer require the ONE register and the two steps required to initialize it:

          CLR ONE
          INC ONE
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Koshchi wrote:
...we know that the divisor is 8 bits or less. Because of that you can do just one subtract instead of 3, then shift the quotient left instead of the divisor right. It also allows you to use a compare before actually subtracting, eliminating...

Wow, I'd really like to see that modification. That should really speed-up the loop.

Here's an additional challenge for those smart than me. I've often wondered if an 16x8 or 24x8 (or even a 32bit) couldn't be done from the opposite "perspective" and just loop 8 times doing everything twice (or three times for 24 bit). Theoretically it could be faster because of only 8 DECs, and looping-back (about 4 clocks right?) as versus 16 times for a potential savings of 32 clocks just from the loop's own overhead on a 16x8 division and 64 on a 24bit (96 for 32bit).

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

JUST A QUICK NOTE:

Although I did not analyse why...

I tried just removing the one or two "extra" SUBs from inner loop and the routine fails.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Whoops, I think I found a bug in the above program. Perhaps it's just my version...

Here's what I tried:
X1=1, X2=250, X=200 ;so X1<=X<=X2
Y1=1, Y2=14241
Correct Anwser is: 11,380
Above code gives: 327

Second Test:
X1=10,X2=220,X=200
Y1=25,Y2=15000
Correct Answer: 13,573
Above gives: 154

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Dan, I think it's just your version. :)

The Interpolation function from my first post gives 11382 and 13574 for those two tests. The full answers should be 11381.56 and 13573.81.

--Brian

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You're right about complementing the final Dividend instead of inverting each carry.

I went ahead and made the changes suggested by you and Steve.

Here are some stats. v2 uses Steves idea and v3 uses both ideas.

Ver FuncWords Cycles CyclesSaved %faster
v1		47		293			
v2		43		244		49		16.72
v3		44		230		14		05.73

(Using x=47, x1=24, x2=50, y1=9454, and y2=14241)

And here's the new function:

Interpolate:

; *** Parameter Register Definition ***
	DefInterpolateParams


; *** Parameter Validation ***

; TODO: Assert that (x1!=x2)
; TODO: Assert that (x1 <= x <= x2)


; *** Calculate Y_Range ***
	sub Y2B0, Y1B0
	sbc Y2B1, Y1B1
.undef Y2B0
.undef Y2B1
.def Y_RangeB0 = R20
.def Y_RangeB1 = R21

	clt					; clear inverted Y_Range flag

	;Ensure that Y_Range is positive
	brcc ip_yNotNeg

	set					; set inverted Y_Range flag

	;invert Y_Range
	neg Y_RangeB0
	adc Y_RangeB1, zero			; = Y_RangeB1 + borrow
	neg Y_RangeB1				; = -Y_RangeB1 - borrow

ip_yNotNeg:


; *** Calculate X_Offset ***
	sub XI, X1				; X_Offset = XI - X1
.undef XI
.def X_Offset = R22


; *** Calculate Quotient(24b) = Y_Range(16b) * X_Offset(8b) ***
.def QuotientB0 = R2
.def QuotientB1 = R3
	mul X_Offset, Y_RangeB0
	movw QuotientB0, R0

	mul X_Offset, Y_RangeB1
.def QuotientB2 = R1
.undef Y_RangeB0
.undef Y_RangeB1
	
	add QuotientB1, R0
	adc QuotientB2, zero
	

; *** Calculate trunc(abs(Dividend)) = Quotient / X_Range ***

	;Divisor = ( X_Range = (x2-x1) ) << 16
	sub X2, X1
.undef X2
.undef X1
.def DivisorB2 = R17				;Divisor B0 & B1 are not stored because they equal 0

	;clear Dividend	
.def DividendB0 = R20				;Dividend will always be <= 16bits, because (xi-x1)<=(x2-x1)
.def DividendB1 = R21
	clr DividendB0
	clr DividendB1


	;setup ip_DivLoop
.def DivCount = R16
	ldi DivCount, 16

; *** do {
ip_DivLoop:
	; Quotient is shifted before the first comparison; That's ok because the initial Quotient is known to be smaller than the Divisor
	lsl QuotientB0
	rol QuotientB1
	rol QuotientB2		;High bit is shifted into carry

	brcs ip_BigQuotient	;if C=1, Quotient is guaranteed to be larger than Divisor

	;Determine if Quotient is smaller than Divisor
	cp QuotientB2, DivisorB2
	
	;if( QuotientB2 < DividendB2 ) then don't do subtraction
	brcs ip_DontSub

ip_BigQuotient:
	sub QuotientB2, DivisorB2	;Carry is unknown after this
	clc				;Ensure that Carry=0 after a successful subtraction

ip_DontSub:	;if this branch _was_ taken, carry is set

	; Roll ResultBit into Dividend
	;  assumes:
	;   - that C=1, after an unsuccessful subtraction
	;   - that C=0, after a successful subtraction
	;  Therefore the dividend must be complemented to get the final answer
	rol DividendB0			; ResultBit = C
	rol DividendB1

; *** } while (--DivCount)
	dec DivCount
	brne ip_DivLoop

	; Complement the Dividend to get the Final result
	com DividendB0
	com DividendB1

.undef DivCount	

.undef QuotientB0	
.undef QuotientB1
.undef QuotientB2
.def Remainder = R1  ;[QuotientB2]


; *** Round Dividend ***
	; Round Division result away from zero  ( round-up, if Remainder >= (divisor+1)/2

	lsr DivisorB2			; C = Divisor LSB
	
	cpc Remainder, DivisorB2	;  = Remainder - ( Divisor/2 + DivisorLSB )
.undef Remainder
.undef DivisorB2

	; if ( !borrow ) Result += 1;
	brcs ip_noBorrow
	subi DividendB0, LOW(-1)
	sbci DividendB1, HIGH(-1)
ip_noBorrow:


; *** Invert Dividend if needed ***
	brtc ip_noFinalInvert

	neg DividendB0
	adc DividendB1, zero		; = DividendB1 + borrow
	neg DividendB1			; = -DividendB1 - borrow
ip_noFinalInvert:


; *** Result = Y1 + Dividend ***
	add Y1B0, DividendB0
	adc Y1B1, DividendB1
.undef Y1B0
.undef Y1B1
.undef DividendB0
.undef DividendB1
;.def ResultB0=R18
;.def ResultB1=R19

	ret

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

FYI.. I wasn't able to implement Steve's idea as well as I would have liked.

It required special handling for Quotients with the high bit set ( >=16777216 ). That required adding the "brcs ip_BigQuotient" and "clc" inside of the division loop.

If you find a way around that, let me know! :)

--Brian

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I think I still see that bug in your code.

I predict it will fail when Y! and Y2 are far apart and a big difference between X and X1.

I still see the problem in the coding of your second version, however I adjusted my version here to account for that and the bug went away.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I see another optimization. At the end of the divide you convert the number from Ones Compliment to a Positive Number then check the negation flag and then reconvert it to Two's Compliment if Slope was negative:

D248SKP: ROL ANS ;TAIL-END OF DIVISION
         ROL ANSH      
         DEC N            ;DECREMENTING COUNTER
          BRNE D248LP  ;LOOPING BACK

;DONE: WE HAVE ONES COMP IN ANS
          BRTS MKNEG
MKPOS:   COM ANS       ;WE COLLECTED INVERSE CARRYS
         COM ANSH       ;SO LETS INVERT
          JMP R248XT
MKNEG:   INC ANS
         ADC ANSH,ZERO
R248XT:  ADD   ANS,Y1   ;DO THE FINAL ADDITION     
	     ADC   ANSH,Y1H
         RET

This should save 3 program steps for every negative number but ads an extra jump for a positive number.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I was just looking at that myself. I did find a slightly better way of doing it. In your solution it takes 12 or 13 clocks per loop, with the 13 being when you can subtract and the high bit is not set. My solution gives 12 clocks per loop in all circumstances, with the penalty of basically duplicating the second half of the loop.

Of course if you can insure that the maximum difference between x2 and x1 is 127, this is not needed, shortening the loop to a constant 11.

One thing I noticed in your code, you are clearing the Dividend before using. This is unnecessary since the whole thing is being replaced anyways.

The only other real difference between our code is that I did a very clever (and therefore obtuse) thing with the round and negate section. Since 2's complement is 1's complement + 1, and we have a 1's complement to do and possibly a +1 with rounding, I mushed it all up. Doing so saved unbelievably one whole command and one whole clock cycle! Here's my code. See if you can follow it.

.def x1 = r16
.def x2 = r17
.def y1L = r18
.def y1H = r19
.def y2L = r20
.def y2H = r21
.def x0 = r23

.def ansL = r24
.def ansH = r25

.def RA = r2
.def RB = r3
.def RC = r1

Interp:
	clr r4			;set r4 to 0 so we can use it later
	clt			;clear neg flag
;find (x - x1)
	sub x0, x1		;subtract x1 from x
;find (y2 - y1)
	sub y2L, y1L		;subtract y1 from y2	
	sbc y2H, y1H
	brge NotNeg		;branch if we are not negative
;y dif is negative, so negate and set flag
	neg y2L		;negate yDif
	adc y2H, r4		;adjust high byte for carry
	neg y2H
	set			;set neg flag
NotNeg:
;find (x dif * y dif) - 24 bit result
	mul y2l, x0		;multiply low byte
	movw RA, r0		;move answer to quotient
	mul y2h, x0		;multiply high byte
	add RB, r0		;add mid byte
	adc RC, r4		;adjust high byte for overflow
NoOvr:
;find (x2 - x1)
	sub x2, x1		;subtract x1 from x
	ldi x1, 1
	ldi x0, 16		;set divide count
Divide:			;24bit by 8bit divide, unsigned
	lsl RA			;roll the quotient left	
	rol RB	
	rol RC
	brcs subanyways
	cp RC, x2		;compare divisor with high byte
	brcs NotLess	 	;branch if we can't subtract
	sub RC, x2		;do the subtraction
NotLess:
	rol ansl		;roll the carry into the answer
	rol ansh
	dec x0			;decrement the loop count
	brne Divide		;branch if not done
;Round answer based on remainder and negate if we need to
;This is a clever hack. In the divide we cheated because
;the carry we shifted into the answer was the inverse of
;what you would normally want. So what we end up with is
;the complement of the answer. Since negation is complement
;and add, and round is add, we can cheat a little
;All this to save a couple of lines of code and a couple
;of clock cycles, but why be straight forward and boring
;when you can be obtuse!
	inc x2
	lsr x2			;shift dividend
	cpc x2, RC		;compare
	brcs Round		;branch if we need to round
	brtc Comp
	;c=0, t=1 (negative, no round) We already have complement, so just add 1
	adiw ansL, 1
Round:
	brts AddY1
	;c =1, t = 0 (positive, round) We need to subtract 1 and fall into complement
	sbiw ansL, 1
Comp: ;c=0, t=0	(positive, no round) We need to complement
	com ansL
	com ansH
	;c=1, t=0 (negative, round) We already have the complement, and the add and round cancel, so do nothing
AddY1:
	add ansL, y1L
	adc ansH, y1H
	ret

subanyways:
	sub RC, x2		;do the subtraction	
	rol ansl		;roll the carry into the answer
	rol ansh
	eor ansl, x1
	dec x0			;decrement the loop count
	brne Divide		;branch if not done

Total clocks - min 222, max 236
Total words - 48

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Another idea would be to convert to postive and after checking for negative slope rather than negating the number just go straight to a subtraction instead of an addition

           COM TMP     ;CONVERT TO POS
           COM TMPH
             BRTS SUBEM
ADDEM: ADD Y1,TMP
             ADC YH,TMPH
               RET
SUBEM: SUB Y1,TMP
              SBC Y1H,TMPH
                RET

With this method there is no penalty for a negative number except for the BRTS jump. However, it adds an extra exit-point that some might consider bad form.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Oops, wrong version of the code. After the last line it should have:

rjmp RoundAndNegate

with RoundAndNegate: just after the divide loop.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Please excuse me if I'm wrong, I've only been into the ASM coding for about a week...

I think I see the same bug in your code Steve.

Should I tell you what fixed my version or do you want to spot it yourself?
Assuming of course this newbie isn't making some sort of stupid mistake.

Pages