Sum of many 2-byte signed numbers?

Go To Last Post
30 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

How can one keep a running sum of many two-byte signed integers?

 

The path taken thus far has been to determine if the first signed integer  is negative. If so, it is placed in the lower two bytes of the total and  the upper three bytes of the total are filled with 0xFF FF FF.  If found to be positive it is just placed in the lower two bytes.

 

From that point on, the lower bytes are added.  If the Vflag is clear, nothing is done.  Where I get into trouble is when the Vflag is set (when I must modify the next higher byte(s)).

 

Is there a way to properly handle handle the overflow based on the other S-register flags?

 

Moderator:  If this is not the proper forum for this message, please transfer it.

 

Last Edited: Mon. May 1, 2017 - 05:58 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I think you are making it too hard.

 

-- Start with 0.  No need to examine the first number for signedness.

-- Add each new number to the total.  No need to examine either the number or total for signedness.

 

How many 16-bit numbers are you going to add to the total?  If up to e.g. 256, then use a 24-bit or 32-bit "accumulator".  32-bit accumulator would handle up to 64K 16-bit numbers with max value.

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

 Add each new number to the total.  No need to examine either the number or total for signedness.

  Perhaps the unstated question (or answer) is that each sum must simply propagate over ALL 5 bytes (even though only 2 byte numbers are being added)..then the current sum is represented by the 5 byte total. 

 

 add sum0, low_byte    

 adc sum1, high_byte 

 adc sum2, zero   (some register set to zero)

 adc sum3, zero

 adc sum4, zero

 

for signed numbers just extend the leading ones

 

  

reset:  clr sum0
  	clr sum1
  	clr sum2
  	clr sum3
  	clr sum4
 
  	ldi ZL, low(-23555)
  	ldi ZH, high(-23555)
  	rcall addup

  	ldi ZL, low(30555)
  	ldi ZH, high(30555)
  	rcall addup

  	ldi ZL, low(-12345)
  	ldi ZH, high(-12345)
  	rcall addup 
 
done:   nop
	rjmp done

 
;  zh:zl contains number to add
addup:  clr extend  ;for positive value
   	sbrc ZH,7   ;check if neg value
   	ser extend  ;extend leading ones for negative values

	add sum0, ZL
  	adc sum1, ZH 
	adc sum2, extend
  	adc sum3, extend
  	adc sum4, extend
  	ret

 

 

 

 

 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Tue. May 2, 2017 - 07:45 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

avrcandies wrote:
then the current sum is represented by the 5 byte total.

Indeed, OP's mention of 5 bytes is interesting.  That is 2**24 of max value (actually 2**25 as it is signed so max is 2**15) or 16 million to 32 million "samples".  If the typical value is e.g. 1/10 of max then 10x more, etc.

 

I guess OP's mention of the V and S flags implies assembler.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

One way is to simply sign-extend each 2-byte value to 5 bytes.

Another is to have separate sums for positive and negative values.

Form the "sum" of the negatives by subtracting.

Subtract the "sum" of negatives from the sum of positives.

 

If efficiency is a consideration,

you might want 3- and 4-byte intermediate sums.
 

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

avrcandies wrote:
for signed numbers just extend the leading ones

Aah, yes, I missed that case when I said "no fussing".

 

Just for fun, let's see what a couple compilers do adding a signed int to a signed long...

 

GCC:

volatile int addend;
volatile long accumulator;
		accumulator += addend;
  8a:	80 91 00 01 	lds	r24, 0x0100	; 0x800100 <_edata>
  8e:	90 91 01 01 	lds	r25, 0x0101	; 0x800101 <_edata+0x1>
  92:	40 91 02 01 	lds	r20, 0x0102	; 0x800102 <accumulator>
  96:	50 91 03 01 	lds	r21, 0x0103	; 0x800103 <accumulator+0x1>
  9a:	60 91 04 01 	lds	r22, 0x0104	; 0x800104 <accumulator+0x2>
  9e:	70 91 05 01 	lds	r23, 0x0105	; 0x800105 <accumulator+0x3>
  a2:	09 2e       	mov	r0, r25
  a4:	00 0c       	add	r0, r0
  a6:	aa 0b       	sbc	r26, r26
  a8:	bb 0b       	sbc	r27, r27
  aa:	84 0f       	add	r24, r20
  ac:	95 1f       	adc	r25, r21
  ae:	a6 1f       	adc	r26, r22
  b0:	b7 1f       	adc	r27, r23
  b2:	80 93 02 01 	sts	0x0102, r24	; 0x800102 <accumulator>
  b6:	90 93 03 01 	sts	0x0103, r25	; 0x800103 <accumulator+0x1>
  ba:	a0 93 04 01 	sts	0x0104, r26	; 0x800104 <accumulator+0x2>
  be:	b0 93 05 01 	sts	0x0105, r27	; 0x800105 <accumulator+0x3>

 

I'd have to work it out on paper, but the highlighted lines are the sign-extend?

 

CodeVision:  (CWD can be read as "Convert Word to Double-word I think")

 

00003c d010      	RCALL __CWD1
00003d d00a      	RCALL __ADDD12
...
                 __ADDD12:
000048 0fea      	ADD  R30,R26
000049 1ffb      	ADC  R31,R27
00004a 1f68      	ADC  R22,R24
00004b 1f79      	ADC  R23,R25
00004c 9508      	RET
                 
                 __CWD1:
00004d 2f6f      	MOV  R22,R31
00004e 0f66      	ADD  R22,R22
00004f 0b66      	SBC  R22,R22
000050 2f76      	MOV  R23,R22
000051 9508      	RET

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

is this for speed size or just to get it to work?

 

the easy one is like the GCC sign extend and make the 5 byte addition , (optimize have one register that is 0 and become FF if MSB of number is 1, and use that register for the last 3 adds).

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

this is all you need

;  zh:zl contains number to add
addup:  clr extend  ;for positive value
    	sbrc ZH,7   ;check if neg value
    	ser extend  ;extend leading ones for negative values

 	add sum0, ZL
   	adc sum1, ZH 
 	adc sum2, extend
   	adc sum3, extend
   	adc sum4, extend
   	ret

 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

On re-reading, I think OP's problem is just how to do a multibyte add.
I think OP sign-extends correctly,
but for addition he uses the V flag (overflow) instead of the C flag (carry).

; sign-extend  3 cycles
CLR A2   ; zero
SBRC A1,7  ; if negative
 DEC A2    ; 0xFF

; S += A   5 cycles
ADD S0, A0
ADC S1, A1
ADC S2, A2
ADC S3, A2  ; A3 would be same as A2
ADC S4, A2

Note that it takes 3 cycles to note the sign of the addend
and 5 cycles to do a 5-byte addition.
To do better than 8 cycles at least one will need improving.
Note that it takes at least 4 cycles just to fetch 2 bytes.

For 5 bytes to be useful, I expect that this is in a large loop.
The calculation above ignored the branching required for said loop.
Enough unrolling might make it neglible.
Nesting the loop so that the inner loop only does 3-byte
additions would remove 2 cycles from most additions.
Also, counting negative numbers instead
of adjusting A2 requires only 2 cycles.
Adjust the sum outside the inner loop.
The brings us down to 5 cycles for most additions,
4 cycles to fetch each addend and however
many cycles it took to generate each addend.

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Nesting the loop so that the inner loop only does 3-byte
additions would remove 2 cycles from most additions.

can you explain?  I believe you must ALWAYS add all 5 bytes, or risk chaos.  You could put in extra "skip" checks, but those would make things even longer. 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I still want to hear more about the 10s of millions of samples. ;)

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

avrcandies wrote:

Nesting the loop so that the inner loop only does 3-byte
additions would remove 2 cycles from most additions.

can you explain?  I believe you must ALWAYS add all 5 bytes, or risk chaos.  You could put in extra "skip" checks, but those would make things even longer. 

sum5=0
for
    sum3=0
    for ; at most 0x100 times  might need unrolling
        sum3+=addend2
    sum5+=sum3

 

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

with 8 clk for this, we need to know more about how many numbers we talk about, how big they normally are, and does the sum go up/down or will the sum be around 0.

which format do they come in, and what else does the micro do?

do you need the partial sum while you are doing this, or is this the sum of a known amount of numbers?

 

if it's the sum of 2^n amount of numbers it could be faster to handle this as positive numbers (I guess add 32768 can be done a fast way), and then add . the final result then need to be shifted but that is only once.    

 

An other way could be to add all negative numbers one place and the positive an other place. (but if that is smart depends of what is really needed)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Don't add the negative numbers separately.

Test and branch or jump is 2 or 3 cycles.

Just counting the negative numbers only requires 2 cycles.

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Curious that OP seems to have disappeared - I wonder if he's not getting email notifications about this thread or something?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thanks everyone.  I set up for a five byte sum because I haven't decided how many additions to do.

 

I did get a notification after the first message from theusch.  The message gave me more confidence and I found I had made an error in one of the stim files I wrote.  I received no notification of the subsequent messages.

 

The goal is to add up a running sum as the data comes in and then, at some point, find the average.  The input data is in the range 0x00 to 0x1F and this is converted to two byte SIN and two byte COS signed values stored in a look up table.  The SINs and COSs are summed in two separated additions.  The average of each is then found.  The code seems to work at 128 measurements for all inputs 0x02 (22.5 degrees), for all inputs 0x1D (337.5 degrees), and for alternating inputs 0x02; 0x1D (the latter average express 0x0000 as 0x8000).

 

Yet to be written is a more complex stim file.  Also, my code may not be very efficient.  Luckily, the single byte data occurs only every 800 uS so it should get the job done.

 

As it now exists, my code uses BRCC to skip adjusting the higher bytes of the sum.  Otherwise the next higher byte is incremented by an ADC with a cleared register and a BRCC if the still higher byte can be left unmodified.

Last Edited: Wed. May 3, 2017 - 02:52 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

make a 0x20 array where you ++ (one more of this value), and then only do the rest one time.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

John3 wrote:
I set up for a five byte sum because I haven't decided how many additions to do.
theusch wrote:
I still want to hear more about the 10s of millions of samples. ;)

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

make a 0x20 array where you ++ (one more of this value), and then only do the rest one time.

if you do y=sin(x), taking the avg of many x, then find sin using discrete look up is not the same as taking the avg of (many sin(x)) ,due to quantization. 

If you did an avg of many x followed by a true high res sin(x) calc, you would get the resolution.

The question is how much sin resolution is finally needed.  

 

 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The final resolution should be a good as possible, but anything much over five degrees probably is overkill.  This is a hobby project to try to adapt a 1970's era radio direction finder to use with a modern day mapping program.

 

The input data is in 11.25 degree increments.  Sine and cosine values are from a table.  The table was made by multiplying by 16383 and converting to signed two-byte hex.  As each reading comes in, the sine and cosine value for it is added to the appropriate sum.  After 2^N (right now 128) the average sine and average cosine are found by right shifting the sums.

 

The quadrant is found by testing bit 7 of the high byte of the average sine and average cosine.

 

What is left is to resolve the angle represented by the sine and cosine averages.

 

The 10's of millions of samples can be limited to, perhaps, only 1024.  :)  That is approximately under one second of measurements.  It's also OK to postpone beginning a series of measurements while the result is output by the UART/USART.

 

Last Edited: Wed. May 3, 2017 - 05:18 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

linear relation:

in out

1  3

2  6

3  9

4  12

in: 1,2,3,4,1,2,2,2,2==> avg=2.111111, LUT out==>6 (not so accurate)

outs: 3,6,9,12,3,6,6,6,6==>avg=6.33333 much better

COULD interpolate 2.111 in the lookup table for same result

 

Beware when the relation is nonlinear, an extreme example:

Nonlinear square relation:

in out

1  1

2  4

3  9

4  16

in: 1,2,3,4,1,2,2,2,2==> avg=2.111111, LUT out==>4 (not so accurate)

outs:  1 4 9 16 1 4 4 4 4==>avg=5.2222 much better and correct

CAN'T interpolate 2.111 in the lookup table for same result (and 2.2111^2===4.457)

 

Bottom line---if your final answer comes straight from look up, it will be as low res as the table itself.  If the table is linear, interpolation can be used to improve.  If the table is nonlinear, then the avg of the input correlated to the avg of the output can range from good to horrible (depending upon the degree of nonlinearity).

 

 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Wed. May 3, 2017 - 06:01 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

No :

if we  only receive values [0..0x1f] it can't be better, just faster

 

add:

remember there is not a linear relation to the numbers!

 

You have to do this :

0x00 recerived 210 times

0x01 recerived 110 times

0x02 recerived 550 times

0x03 recerived 1260 times

 

We need to do this:

sin(0x00) *210 +sin(0x01) *110 +sin(0x02) *550 +sin(0x03) *1260.........

 

and then div with the sum (but if it a 2^n numbers it's just a shift).

It don't save much in addition we started with (the is a good gain if written in ASM), but 1000's of LUT reads will become 32 

 

(where sin (x) is a LUT)

 

 

 

Last Edited: Wed. May 3, 2017 - 05:55 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

We need to do this:

sin(0x00) *210 +sin(0x01) *110 +sin(0x02) *550 +sin(0x03) *1260.........

This approach if used "literally" is not so hot, since there are several totals to keep track of (1 total for each LUT item), plus a multiply for each.

For high accuracy it is best to take the incoming angle, look up the sin and form the answer summation then finally avg of those values.  Of course taking a million samples to reduce variance doesn't get very far if the lut values have limited accuracy themselves.  

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

John3 wrote:
The 10's of millions of samples can be limited to, perhaps, only 1024.
theusch wrote:
Indeed, OP's mention of 5 bytes is interesting. That is 2**24 of max value (actually 2**25 as it is signed so max is 2**15) or 16 million to 32 million "samples". If the typical value is e.g. 1/10 of max then 10x more, etc.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

But you are missing one key point here, OP receive only the values 0..31 and nothing else! 

It will make no difference if you make the avg of  200 sums of the LUT or 1 LUT and multiply with 200!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

sparrow2 wrote:
But you are missing one key point here, OP receive only the values 0..31 and nothing else!
I was working off the original post, which implied 16-bit signed values.

 

Indeed, if 8-bit signed values, then 256 fit into a 16-bit accumulator.

 

If really 5 bits extended to 8 (and if indeed 0-31 and not signed), then many thousands fit into a 16-bit accumulator and this whole discussion becomes relatively moot.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Read #16 again

it's because OP add the sin(x) and not x, but because x only can be 32 different numbers, it suggest just to find out how many of each, and only do the LUT once (for each input value group.)

 

But from the information in #16 we now know that there is 800us between numbers, so no real need for optimization.  

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm not clear what more OP needs to know.

By now he should know how to add a 2-byte signed number to a 5-byte signed number.

Five bytes is overkill, with only 128 numbers to add, three bytes is enough.

Even with twice that many, three bytes would be enough/

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm not clear what more OP needs to know.

I suggest he looks up & studies CORDIC

 

https://www.dropbox.com/s/a8iaiopcvsz6tbn/CORDIC_Demo.asm 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm not clear what more OP needs to know. 

 

I'm not  either.  :)   Seven years ago I used CORDIC rotation to figure angles with an accelerometer, but that was the last time I did anything with micro-controllers. 

 

I have written some code to move the bearing into the first geometric quadrant and hope to resole the bearing there.  That needs to be finished.   Then in quadrant I, I hope to find if the completed bearing needs more than 45 degrees added in.  That will determine whether the adjusted average sin or adjusted average cosine will be used to further reduce the angle.

 

I realize I need to learn more about using the stimulator.  Right now, the stim file generates the interrupt and the input data.  It would be nice to separate the interrupt stim from the data stim, but synchronize them so there is just one data input per interrupt.  

 

That may be the topic for another thread.