How to "trim" a 32-bit INT?


Greetings -

 

I have three 32-bit signed integer values, each a processed version of a signed 16-bit acceleration axis value. I need to do further processing to detect "events" in this data in real time.

 

At the very least, I would like to trim, or truncate, absolute-value versions of these. That is, I would like to convert the signed 32-bit values into unsigned 16-bit values, using only the equivalent of the top 16 bits of the 32-bit originals.

 

At first inspection, it appears that an absolute value operation ought to come first, though it seems it would be faster and take less code space to work on 16-bit values. Oh, did I say that I am very short on code space, so I really need to pay attention to the executing code? Maybe there is no practical way around this, but maybe I am also overlooking something. For example, maybe there is a simple way to truncate a signed 32-bit value to a signed 16-bit one? This is something I don't know much about :(  Well, this means that I know even less about this than I do about some other things that I supposedly know something about!

 

So, I am soliciting suggestions about a general algorithm for converting int32_t to uint16_t, preserving the high 16 bits. I suggest "general" because, depending on how those 32-bit values are filled, it may be possible to trim to 8-bit. Any takers?

 

Many thanks

Jim

 

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net


maybe gcc can give ideas

 

#include <stdint.h>

/* Return the upper 16 bits of |x| (the unused local variable is dropped). */
uint16_t foo(int32_t x) {
  if (x < 0) {
    return (-x) / 0x10000;
  } else {
    return x / 0x10000;
  }
}

$ avr-gcc -c -S -Os zz.c -o zz.s

foo:
	sbrs r25,7
	rjmp .L1
	mov r19,r23
	mov r18,r22
	mov r21,r25
	mov r20,r24
	clr r24
	clr r25
	clr r26
	clr r27
	sub r24,r18
	sbc r25,r19
	sbc r26,r20
	sbc r27,r21
	mov r25,r27
	mov r24,r26
.L1:
	ret

 


That seems simple enough!

 

Did not realize that the unary minus sign could be used like that.

 

Thanks

Jim

 


Last Edited: Mon. May 27, 2019 - 12:48 AM

It is still early here, but I think the difficult part is making sure that you do not end up with only 15 bits when the number is negative.

I would first strip the minus sign if it is present, and then return the 16 bits you want with a shift and a type cast.

I think the function MattRW wrote does that, but that is easily checked by feeding it a known number.


Good point, and I was concerned about that. I think that Matt's algorithm does that, but will double check.

 

Jim


It looks like Matt's code will give 15-bit numbers for both positive and negative inputs, so change the divisor to 0x8000.

There's also an edge case for when the input is the most negative 32-bit number (0x80000000).  The negative of this is maybe even undefined in the C standard (I don't remember), but in practice you get the same number back, which when divided by 0x8000 and converted to uint16_t gives zero!

So add a check for 0x80000000 and return 0xFFFF.
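
Both fixes (the 0x8000 divisor and the special case for 0x80000000) can be folded together by taking the magnitude in unsigned arithmetic, where negation never overflows. This is only a minimal sketch, assuming the goal is the top 16 magnitude bits with saturation at 0xFFFF; the name `abs_top16` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: top 16 magnitude bits of a signed 32-bit value. */
uint16_t abs_top16(int32_t x)
{
    /* Negate in unsigned arithmetic so 0x80000000 does not overflow. */
    uint32_t mag = (x < 0) ? (0u - (uint32_t)x) : (uint32_t)x;

    /* Drop the 15 low bits; only 0x80000000 gives a 17-bit result here. */
    uint32_t top = mag >> 15;

    /* Saturate that single out-of-range case to 0xFFFF. */
    return (top > 0xFFFFu) ? 0xFFFFu : (uint16_t)top;
}
```

With this, 32768 and -32768 both map to 1, and the most negative input maps to 0xFFFF instead of 0.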

 

--Mike

 


Yes.  If you want 16 bits, you need to do more.   What about this?   But is the extra code worth the extra bit?

 

[EDIT: can't be right, I'm thinking now]

#include <stdint.h>

uint16_t foo(int32_t x) {
  if (x < 0) {
    return (2*((-x)/0x8000));
  } else {
     return x/0x8000;
  }
}

$ avr-gcc -c -S -Os zz.c -o zz.s

foo:
	sbrs r25,7
	rjmp .L2
	ldi r18,0
	ldi r19,lo8(-128)
	ldi r20,0
	ldi r21,0
	rcall __divmodsi4
	com r21
	com r20
	com r19
	neg r18
	sbci r19,lo8(-1)
	sbci r20,lo8(-1)
	sbci r21,lo8(-1)
	mov r25,r19
	mov r24,r18
	lsl r24
	rol r25
	ret
.L2:
	mov r27,r25
	mov r26,r24
	mov r25,r23
	mov r24,r22
	ldi r18,15
	1:
	asr r27
	ror r26
	ror r25
	ror r24
	dec r18
	brne 1b
	ret

 

Last Edited: Mon. May 27, 2019 - 01:08 PM

MattRW wrote:

$ avr-gcc -c -S -Os zz.c -o zz.s

 

That's the name of most of my programs!!!  ;-)

 

--Mike

 


avr-mike wrote:

MattRW wrote:

$ avr-gcc -c -S -Os zz.c -o zz.s

 

That's the name of most of my programs!!!  ;-)

 

--Mike

 

 

Great minds think alike!


ka7ehk wrote:

So, I am soliciting suggestions about a general algorithm for converting int32_t to uint16_t, preserving the high 16 bits. I suggest "general" because, depending on how those 32-bit values are filled, it may be possible to trim to 8-bit. Any takers?

 

You said you wanted the absolute value. What's wrong with the straightforward

 

int32_t src = ...;
uint16_t dst = labs(src) >> 16;

?

 

It is not clear which 16 bits you are talking about in the subsequent messages, though. The 16 upper value bits of the absolute value (i.e. not including the zero sign bit)? Then it's just

 

int32_t src = ...;
uint16_t dst = labs(src) >> 15;

 

I don't understand what is the point of any convoluted/multi-storied manipulations of the value when you can just directly state what you want. This is also an efficient way to do this, considering that `abs` is a built-in function in GCC.

 

In C code you will have to choose the proper version of `abs` depending on how the width of `int32_t` relates to the width of `int`. Or you can write your own `abs` as `(src >= 0 ? src : -src)`. (I used `labs` under the assumption that `int` is a 16-bit type.)

 

ka7ehk wrote:
Did not realize that the unary minus sign could be used like that.

 

Um... But this is the only way to use unary minus! What other uses could you possibly be implying?

Last Edited: Tue. May 28, 2019 - 05:32 AM

It's even faster and smaller in hard-coded assembler.  Presuming you can load and store SRAM to/from any arbitrary register, and the 32-bit signed value is in r3(MSB)..r0  :

 

lsl r1   ; Fetch the 16th bit
rol r2
rol r3   ; And the sign bit gets thrown away

And then put it back into SRAM with :

st Z+, r3
st Z+, r2

(I has a Big-Endian). And you're done.  This, of course, presumes a lack of sign extension.  For eight bits, just skip the 2nd store line.  S.

 

PS - The Code Window works now!  I dunno who fixed what, but thanks!  S.

Last Edited: Wed. May 29, 2019 - 06:33 AM

After a bit of thought, I came up with an even dirtier trick.

 

See, my assembler routine doesn't need any high-side registers, but I didn't put in the preamble nor postamble that would protect registers and pointers and all that.  The problem was preserving the SREG - the status register.  To efficiently preserve that, you needed a high-side register.  So, you should not do what I said to do, and do something slightly different instead...

 

32s-16u-TRUNCATE:
push r0  ; just preserving registers, here.
push r1
push r2  ; And we don't need r3- see below
push ZL
push ZH  ; 'cause we'll be mucking with the Z pointer

; But we don't have to store and restore the SREG
; 'cause we're about to get clever

ldi ZL, low(32s-SOURCE)
ldi ZH, high(32s-SOURCE)

;We're big-endian here, so let's get the MSB first
ld r2, Z+
ld r1, Z+
ld r0, Z+
; We have no need for the low byte
; We're going to throw it away anyhow

rol r0  ; This is a change from my first idea
rol r1  ; Moving on
rol r2  ; Shoving the sign bit out into the carry

; Which trashes the carry flag in SREG
; Typically, we'd have to deal with this by storing and restoring
; the SREG, but because of the change...

ldi ZL, low(16u-DESTINATION)
ldi ZH, high(16u-DESTINATION)
; With carefully organized memory, this can be made more efficient

st Z+, r2
st Z+, r1   ; Deliberately throwing away r0, until...

ror r0      ; Putting the carry flag back!  HA!

pop ZH
pop ZL
pop r2
pop r1
pop r0

ret     ; And Bob's your uncle.

 


@Scroungre: to get the absolute value requires more than just discarding the sign bit. See #2.
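
To illustrate the difference on a 2's-complement value, here is a small C sketch comparing "drop the sign bit" with a real negation. The function names are invented for this example.

```c
#include <stdint.h>

/* Masks off bit 31 only, which is what shifting the sign bit out amounts to. */
uint32_t clear_sign_bit(int32_t x)
{
    return (uint32_t)x & 0x7FFFFFFFu;
}

/* True magnitude: negate negative inputs, using unsigned arithmetic. */
uint32_t magnitude(int32_t x)
{
    uint32_t u = (uint32_t)x;
    return (x < 0) ? (0u - u) : u;
}
```

For x = -1 (0xFFFFFFFF), `clear_sign_bit` gives 0x7FFFFFFF while `magnitude` gives 1. The two agree only for non-negative inputs.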


Hmmm, Just recognized that I need to change the process I proposed in the first message. I need to average the truncated value, in order to remove the sensor offset. That has to be done BEFORE extracting the absolute value, since average of a signed time series is not the same as average of the absolute value of the same signed time series.

 

So, I need to truncate, saving the sign. Here is the rub. Out of the 32 bit original, only 23 bits (plus sign) are actually used. What I would really like to do is compress the 32 bit signed into 16 bit signed, discarding several of those unused high bits and some low bits so that the result fits into a signed 16 bit.

 

The rationale for doing this is that computing the resulting average, subtracting the average from the 16-bit original, extracting the absolute value (from the offset-corrected 16-bit value), and the subsequent operations would all benefit in terms of speed and code space. Some of those subsequent operations include a squaring operation to obtain the squared magnitude of the equivalent space vector, and this would be huge if I operated on the existing 32-bit values. All of this needs to be done in "real time", so time IS important. And, of course, I do not have much flash or SRAM remaining!

 

My apologies if this is confusing. Maybe a brief summary would help:

  1. I have 3 acceleration vector components, signed 32 bits.

  2. I need to generate the magnitude-squared of the equivalent space vector

  3. Out of the 32-bit originals, 8 of the upper bits (just below the sign bit) are "unused" (always zero)

  4. I would like to truncate the 32 bit originals into 16 bits, discarding those unused 8 upper bits and 8 low bits to make a 16-bit signed integer.

 

Thanks, again, for considering my little modified challenge!

 

Jim


Last Edited: Wed. May 29, 2019 - 09:14 PM

With my poor C skills, I would use unions or pointers:

1. AND byte 3 with 0x80 (extract the sign bit)

2. AND byte 2 with 0x7F (make room for the sign bit in bit 23)

3. Add the two results together in byte 2 and bingo, you have your signed 16-bit value in bytes 1 and 2.
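
A rough C sketch of the union idea, assuming a little-endian layout (as avr-gcc uses) so that `bytes[1]` and `bytes[2]` are the two middle bytes. The type and function names are hypothetical. It also assumes the input is a properly sign-extended 24-bit quantity, in which case bit 23 already equals the sign bit and the masking steps can be skipped.

```c
#include <stdint.h>

/* Hypothetical overlay of a 32-bit value and its four bytes. */
typedef union {
    int32_t value;
    uint8_t bytes[4];   /* bytes[0] is the least significant byte (little-endian assumed) */
} split32_t;

int16_t middle16(int32_t src)
{
    split32_t s;
    s.value = src;

    /* Assemble bits 8..23 of the original into a 16-bit pattern. */
    uint16_t raw = (uint16_t)(s.bytes[1] | ((uint16_t)s.bytes[2] << 8));

    /* Reinterpret as signed; GCC wraps modulo 2^16 here. */
    return (int16_t)raw;
}
```

For example, 0x00123456 yields 0x1234 and -256 (0xFFFFFF00) yields -1.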

 


ka7ehk wrote:

  3. Out of the 32-bit originals, 8 of the upper bits (just below the sign bit) are "unused" (always zero)

 

That is already incorrect/misleading, under the assumption that you are talking about a regular 2's-complement platform. You just said that your original values are signed. That means that the "unused" bits will be either all-zero (for non-negative values) or all-ones (for negative values). "Unused" bits on a 2's-complement platform always contain a copy of the sign bit (!). The upper portion of a negative value is always filled with 1's. So, your "always zero" claim seems to make no sense.

 

In any case, on a 2's complement platform in order to truncate a 32-bit signed integer into a 16-bit signed integer in accordance with the algorithm you described (discard upper 9 bits, discard lower 8 bits, preserve sign) you don't need to worry about the sign bit as a separate entity at all. It is just plain and simple

 

int32_t src = ...;
int16_t dst = src >> 8;

 

There's no need for any manipulations with sign bit or any bit masks.
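
A few worked values may make this concrete. The sketch below wraps the same shift in a function (the name is made up). It assumes the compiler implements `>>` on negative values as an arithmetic shift, which is implementation-defined in C but is what GCC does.

```c
#include <stdint.h>

/* Keep bits 8..23 of a sign-extended 24-bit value as a signed 16-bit result. */
int16_t trunc_s32_to_s16(int32_t src)
{
    /* Arithmetic right shift (assumed) carries the sign along automatically. */
    return (int16_t)(src >> 8);
}
```

With that assumption, 0x00123456 becomes 0x1234, -256 becomes -1, and the extremes -8388608 and 8388607 become -32768 and 32767.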

 

(And no, there is no way to make it more efficient by using assembler.)

Last Edited: Thu. May 30, 2019 - 01:31 AM

All 1's or all 0's - whatever. They are not significant to the final numeric value except as place holders. The point is that I care about the most significant bit, don't care about the 8 below the high bit, and don't care about the lowest 8. I would like to convert what remains into a 16-bit signed int.

 

Union, as suggested by angelu, might be a way to do just that. 

 

Another way might be to do an 8x right shift, then mask the remaining low 16 bits to populate an int16_t. That ought to retain the sign, I think. On the undesirable side, that would require doing shifts on 4 bytes, repeated 8 times. Right now, that union sounds pretty good.

 

Thanks

Jim

 


Last Edited: Thu. May 30, 2019 - 12:54 AM

ka7ehk wrote:

Union, as suggested by angelu, might be a way to do just that.

 

You can use a union as a way to discard the lower 8 bits, but normally it is not worth it. The compiler is smart enough to realize that in a `>> 8` shift the lowest byte of the original value can be discarded as a whole.

 

ka7ehk wrote:
Another way might be to do an 8x right shift, then mask the remaining low 16 bits to populate an int16_t. That ought to retain the sign, I think. On the undesirable side, that would require doing shifts on 4 bytes, repeated 8 times. Right now, that union sounds pretty good.

 

 

No, of course not. No self-respecting compiler will implement a shift by 8 bits as a sequence of single-bit shifts. It is also not clear why you want to "mask" anything. If the upper 8 value bits are unused (as you stated) then a `>> 8` shift will produce a value that will fit perfectly in an `int16_t` without any masking.

 

As I said

 

int32_t src = ...;
int16_t dst = src >> 8;

 

and that's it. Everything else is a solution looking for a problem.

 


 

int16_t __attribute__ ((noinline)) foo(int32_t src) 
{
  return src >> 8;
}

Compiling for `-Os`

clr r27
sbrc r25,7
dec r27
mov r26,r25
mov r25,r24
mov r24,r23
ret

 

No repetitive shifts, as you see. No shifts at all, actually. This is still less efficient than it could have been: the compiler bothers to form values in `r27` and `r26`, which is completely unnecessary. But if the code is allowed to inline into the actual calling context, it might easily become much more efficient.

Last Edited: Thu. May 30, 2019 - 01:56 AM

I have been unable to find an explanation of what is supposed to happen when you assign a 32-bit signed int to a 16-bit signed int. Does it take the low 16 or the high 16? Is it specified somewhere? I would hope so but have been unable to find it.

 

Question: Why is the cast operator not used?  Yes, you have shown that it is not "necessary" but my meagre knowledge says "you ought to cast". What in the standard says that you don't need to? Is not the purpose of a cast to change from one data type to a different data type? And, is that not what is being done here?

 

Thanks

Jim


Last Edited: Thu. May 30, 2019 - 02:05 AM

ka7ehk wrote:

I have been unable to find an explanation of what is supposed to happen when you assign a 32-bit signed int to a 16-bit signed int. Does it take the low 16 or the high 16? Is it specified somewhere? I would hope so but have been unable to find it.

 

It is specified in the language standard.

 

1. If the original `int32_t` value fits into the range of `int16_t` type then it is guaranteed that the assignment will preserve the value. E.g. `-5` of type `int32_t` is guaranteed to become `-5` of type `int16_t`. Which means, of course, that in 2's-complement representations the lower 16 bits are taken.

 

2. If the original `int32_t` value does not fit into the range of `int16_t` type, then the result is implementation-defined. In real life this also means that the compiler will simply take the lower 16 bits of the original value.

 

In your case, under your conditions (as you stated them), the value will fit, meaning that case 1 applies.
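
Both cases can be seen in a tiny C sketch. The function name is invented; the out-of-range result shown is what GCC documents (reduction modulo 2^16), not something the language standard guarantees.

```c
#include <stdint.h>

/* Implicit narrowing conversion from int32_t to int16_t. */
int16_t narrow(int32_t v)
{
    int16_t result = v;   /* no cast needed: arithmetic conversions are implicit */
    return result;
}
```

Here `narrow(-5)` is guaranteed to be -5 (case 1). `narrow(0x12345)` does not fit, so it falls under case 2; GCC keeps the low 16 bits and returns 0x2345.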

 

ka7ehk wrote:

Question: Why is the cast operator not used?  

 

Because in C and C++ conversions between arithmetic types are implicit. There's no need for an explicit cast.

 

However, some overcautious compilers might issue warnings for narrowing conversions. Just to suppress such warnings you might want to add a cast. By using a cast you tell the compiler that "yes, this is what I really want to do". But the language itself does not require any casts in this case.

 

The only exception to this rule is `{}` initializers in C++ (C++11 and later). Such initializers do not allow implicit narrowing conversions. But this is not our case.

 

ka7ehk wrote:
Is not the purpose of a cast to change from one data type to a different data type? And, is that not what is being done here?

 

Yes, but all data type conversions in C and C++ separate into two categories:

 

1. Conversions that can be done implicitly.

2. Conversions that require an explicit cast.

 

Conversions between arithmetic types in C and C++ belong to the first category.

 

P.S. This applies to contexts that naturally imply a conversion, e.g. assigning a 32-bit value to a 16-bit variable. If the conversion is not implied by the context, you have to use an explicit cast to change the type, e.g. on AVR (where `int` is 16 bits wide) multiplying two 16-bit values produces a 16-bit result unless you explicitly cast one of the operands to a 32-bit type.
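
As a small illustration of the multiplication case, here is a sketch of a widening square. The function name is hypothetical. The cast on one operand forces the product to be computed in 32 bits even where `int` is only 16 bits wide.

```c
#include <stdint.h>

/* Square a 16-bit value without losing the upper bits of the product. */
int32_t square_s16(int16_t v)
{
    /* The cast promotes the whole multiplication to int32_t. */
    return (int32_t)v * v;
}
```

Without the cast, a 16-bit `int` platform would evaluate `v * v` in 16 bits, which overflows as soon as |v| exceeds 181.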

 

Last Edited: Thu. May 30, 2019 - 01:47 PM

AndreyT wrote:

It is just plain and simple

 

int32_t src = ...;
int16_t dst = src >> 8;

While I agree with Andrey and kind of wonder why it's not as simple as shift/division I wonder if that was really supposed to be >>8. Surely a 32 to 16 downsample involves a >>16 ??

ka7ehk wrote:

I have been unable to find an explanation of what is supposed to happen when you assign a 32-bit signed int to a 16-bit signed int. Does it take the low 16 or the high 16? Is it specified somewhere? I would hope so but have been unable to find it.

Surely the C standard or K&R explain truncation? What it does is simply take the lower bits - this is NOT what you want. You said previously "compress 32 bits to 16 bits". The only sensible way to do that is to take the top (most significant) bits and discard the lower bits.  That's either /65536 or, because it's a power of two, the more obvious >>16

 

EDIT: oh I see the reason for >>8 - because you said you want bit 23, not bit 31, to become the new MSB? Still trying to picture in my mind how that makes any sense - is this because, while the type is int32_t, the numbers are never more than 24 bits? So a >>16 would discard too many relevant bits?

Last Edited: Thu. May 30, 2019 - 08:23 AM

int32_t x = whatever ;

uint16_t y = x >> 16 ;

The largest known prime number: 2^82589933-1

In my humble opinion, I'm always right. 


clawson wrote:

AndreyT wrote:

 

It is just plain and simple

 

int32_t src = ...;
int16_t dst = src >> 8;

While I agree with Andrey and kind of wonder why it's not as simple as shift/division I wonder if that was really supposed to be >>8. Surely a 32 to 16 downsample involves a >>16 ??

 

The shift value depends on what bits you want to preserve and what bits you want to discard. The OP clearly stated that he wants to discard the lower 8 value-bits and the upper 8 value-bits (because the latter are supposedly unused). So, we shift by 8 (thus discarding the lower bits) and then force the result into `int16_t` (thus discarding the upper bits).

 

In this case 8, not 16.

 


That works as long as the range in the int32 is just -8,388,608 .. +8,388,607. If it's outside the range the higher bits would be truncated.
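
If that range cannot be guaranteed, a check is cheap. The following is only a sketch with invented names; the saturating behaviour is an assumption about what would be wanted, not something stated in the thread. It also relies on `>>` being an arithmetic shift for negative values, as in GCC.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if v fits in 24 signed bits, i.e. -8,388,608 .. +8,388,607. */
bool fits_in_24_bits(int32_t v)
{
    return v >= -8388608L && v <= 8388607L;
}

/* Hypothetical variant of the >> 8 truncation that saturates instead of wrapping. */
int16_t trunc_saturating(int32_t v)
{
    if (v < -8388608L) return INT16_MIN;
    if (v >  8388607L) return INT16_MAX;
    return (int16_t)(v >> 8);   /* in range: plain shift is safe */
}
```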


The lower 8 bits are being discarded for two reasons: (1) they are really not necessary for the subsequent operations, and (2) I need to reduce the numeric width so that a downstream square operation does not take so much memory and time.

 

All of this is to compare a given X,Y,Z acceleration (vector magnitude) to a trigger threshold. To do that, I need to determine a block average (for each axis), subtract that from the sensor reading (so that I have close to zero-based values), compute a sum of squares (of the three axes, to get magnitude squared of the net vector acceleration), then compare that to the square of the trigger threshold.
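
In C, the comparison step might look roughly like the sketch below. The function and parameter names are placeholders, and it assumes the block average has already been subtracted so that each axis value fits in an int16_t. Under that assumption the sum of three squares is at most 3 * 2^30, which still fits in a uint32_t.

```c
#include <stdint.h>
#include <stdbool.h>

/* delta[]: offset-corrected X, Y, Z values (assumed to fit in int16_t).
 * threshold: trigger level in the same units as delta[]. */
bool exceeds_threshold(const int16_t delta[3], uint16_t threshold)
{
    uint32_t mag_sq = 0;

    for (uint8_t i = 0; i < 3; i++) {
        int32_t d = delta[i];           /* widen before squaring */
        mag_sq += (uint32_t)(d * d);    /* each square is at most 2^30 */
    }

    /* Compare squared magnitudes, so no square root is needed. */
    return mag_sq > (uint32_t)threshold * threshold;
}
```

The squared threshold is a constant per configuration, so it could also be computed once and stored rather than recomputed on every wake-up.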

 

All of this gets done during periodic wake-ups to service the sensor FIFO, so the longer it takes, the more battery energy it uses. I'm also down to a very few K of FLASH and 200 bytes of SRAM, so everything I can do to reduce size is really important. 

 

That's the short version!

 

Thanks

Jim


This may be too crude, but I'd be pleased to see you test it out.   I just stepped through one example on my simulator to see that it looks to be working.

All math is signed.   When squaring hi:lo, it only accumulates hi*hi, hi*lo and lo*hi; lo*lo is ignored.

You pass a pointer to an array of 3 int16_t.  EDIT: Oh, and it assumes the bias is removed before calling.

EDIT: Above should read "accumulation of hi8(hi*lo) and hi8(lo*hi)".

 

 

  int16_t v[3], r;

  v[0] = 124;
  v[1] = 31123;
  v[2] = -30999;

  r = sumsq3_16(v);

 

	.section .text
	.global sumsq3_16

	.type sumsq3_16,function
	;; int16_t sumsq3_16(int16_t *v)
	;; v: pointer to array of 3 int16's (r25:r24)
sumsq3_16:
	ldi r23,3
	movw r30,r24
	clr r20
	clr r24			; result
	clr r25
0: 	ld r20,Z+
	ld r21,Z+
	;;
	muls r21,r21		; v.hi*v.hi => r1:r0
	add r24,r0
	adc r25,r1
	mulsu r21,r20		; v.hi*v.lo => r1:r0
	sbc r25,r20
	add r24,r1
	adc r25,r20
	mulsu r21,r20		; v.hi*v.lo => r1:r0
	sbc r25,r20
	add r24,r1
	adc r25,r20
	;;
	dec r23
	brne 0b
	;;
	clr r1			; reset R1 to zero
	ret

 

Last Edited: Fri. May 31, 2019 - 05:25 PM

Hi Jim,

 

If you are low on resources, you are going to want to avoid 32 bit as much as you can - eliminating even storing the values in 32 bit variables in the first place if that is possible.

 

int32_t x;

uint16_t y;

 

y=((uint16_t)(x >> 16)) & 0x7fff;

 

This will shift x to the right by 16 bits (keeping the top 16 bits) and convert it to a uint16_t.  Then it will strip out the top bit in the event that it was negative.

 

Again, a better plan would be to not have the int32_t x in the first place unless it is absolutely necessary.  Where is the source?  Can it be brought in as a 16 bit value from the beginning?

 

Good luck!

 

Alan

 


It is a 16 bit value  from the sensor, but I average, then do not divide by N so as to retain as much resolution as possible. That is the source of the 32 bit value. HOWEVER, the top 8 bits are unused (only placeholders). Since I only need to compare the value against a threshold, I can discard the low 8, leaving 16 for subsequent analysis.

 

Jim

 


Have you tried a 24-bit variable? Perhaps it would save a tad of code.


ka7ehk wrote:
All of this is to compare a given X,Y,Z acceleration (vector magnitude) to a trigger threshold.

I'm probably being unhelpful  at this stage of the design; but that requirement looks like something the accelerometer should be able to handle autonomously. I.e. you program a set of threshold registers and an averaging filter and the chip asserts an IRQ line if the acceleration threshold is exceeded.

 

BTW: I'm starting to notice this behaviour in a few commercial products now; for instance Bluetooth headphones that "wake up" when you lift them up off the desk.

 

Does the sensor you've chosen support this mode of operation ?

 


ka7ehk wrote:
It is a 16 bit value  from the sensor, but I average, then do not divide by N so as to retain as much resolution as possible. That is the source of the 32 bit value.

I wonder if not averaging makes much difference.

Particularly if you can choose N as a power of 2, it might be worth considering: maintain the sum of the readings as int32_t, then just divide by N to get the average as int16_t.

You then have 16-bit values which can be squared to get the magnitude_squared as 32 bits, e.g.

int16_t x_ave, y_ave, z_ave;
uint32_t mag_sq = (uint32_t)((int32_t)x_ave * x_ave) +
                  (uint32_t)((int32_t)y_ave * y_ave) +
                  (uint32_t)((int32_t)z_ave * z_ave);

 


You probably already know this, but keep the division to powers of 2 (2, 4, 8, 16, 32, 64, etc.) so it only has to slide bits right instead of performing an actual division.
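
A minimal sketch of that idea, assuming a block of 16 samples (the block length, macro names and function name are all placeholders). It relies on `>>` being an arithmetic shift for negative sums, as in GCC, so negative averages round toward minus infinity rather than toward zero.

```c
#include <stdint.h>

#define BLOCK_SHIFT 4                    /* hypothetical: N = 2^4 = 16 samples */
#define BLOCK_LEN   (1u << BLOCK_SHIFT)

/* Average one block of 16-bit samples using a shift instead of a divide. */
int16_t block_average(const int16_t samples[BLOCK_LEN])
{
    int32_t sum = 0;                     /* 16 samples of 16 bits need at most 20 bits */

    for (uint8_t i = 0; i < BLOCK_LEN; i++) {
        sum += samples[i];
    }

    return (int16_t)(sum >> BLOCK_SHIFT);
}
```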

 

Maybe accruing values in a 24-bit or 32-bit variable won't be so bad if you do nothing else with that variable and then divide it by a power of 2 back down into a 16-bit one.

 

My experience with 32 bit variables, and especially manipulating them is that it doesn't take much to produce 10-20 instructions for even operations you think are no big deal.  Best thing to do is to dig through your lss file and look at your instructions and what the compiler emits for them.

 

I always set the debugging to default -g2 even for release because it doesn't change the contents of the hex file or binary produced, but it does give me an lss with source instructions to review.

 


Doh.  I forgot to update for "special int32s".  This one is fixed for an array of three int32's where, for each int32 A:B:C:D, only B:C are considered.  If the int32 is not in the range [-2^23,2^23) then the result will be wrong.  The difference from above is the added "adiw" instructions.

 

	.section .text
	.global sumsq3_s16s32
	.type sumsq3_s16s32,function
	;; int16_t sumsq3_s16s32(int32_t *v)
	;; v: pointer to array of 3 int32's (r25:r24)
	;; => approx sum of squares (r25:r24)
	;; easily modifiable to: int16_t sumsq_16(int16_t *v, uint8_t n)
	;; n: length of array (r23 == 3)
sumsq3_s16s32:
	ldi r23,3
	movw r30,r24
	clr r20
	clr r24			; result
	clr r25
	adiw Z,1
0: 	ld r20,Z+
	ld r21,Z+
	adiw Z,2
	;;
	muls r21,r21		; v.hi*v.hi => r1:r0
	add r24,r0
	adc r25,r1
	mulsu r21,r20		; v.hi*v.lo => r1:r0
	sbc r25,r20
	add r24,r1
	adc r25,r20
	mulsu r21,r20		; v.hi*v.lo => r1:r0
	sbc r25,r20
	add r24,r1
	adc r25,r20
	;;
	dec r23
	brne 0b
	;; 
	clr r1			; reset R1 to zero
	ret

 


The averaging is to get a "baseline" that can be subtracted so that the triggering is based on changes. There is enough processing that I THINK that conversion to 16 bit may be a net gain. Implementation will tell.

 

Yes, the sensor has all this built in, but I did not connect up the interrupt line that function uses. Polling is still possible, I think, and that has to be explored. This is still in very early design as a capability increment on an existing product. I cannot afford to change the circuit board at this point.

 

Jim
