How to "trim" a 32-bit INT?

Greetings -

I have 3 32 bit signed integer values that represent a processed version of signed 16-bit acceleration axis values. I need to do further processing to detect "events" in this data in real-time.

At the very least, I would like to trim, or truncate absolute value versions of these. That is, I would like to convert signed 32-bit values into unsigned 16-bit, using only the equivalent of the top 16 bits of the 32 bit originals.

At first inspection, it appears that an absolute value operation ought to be the first thing, though it seems like it would be faster and take less code space to work on 16-bit values. Oh, did I say that I am very short on code space so I need to really pay attention to the executing code. Maybe there is no practical way around this, but maybe, also, I am overlooking something. For example, maybe there is a simple way to truncate a signed 32-bit to signed 16-bit? This is something I don't know much about :(  Well, this means that I know even less about this than I do some other things that I supposedly know something about!

So, I am soliciting suggestions about a general algorithm for converting int32_t to uint16_t, preserving the high 16 bits. I suggest "general" because, depending on how those 32-bit values are filled, it may be possible to trim to 8-bit. Any takers?

Many thanks

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net

maybe gcc can give ideas

#include <stdint.h>

uint16_t foo(int32_t x) {
    /* take the magnitude, then keep the upper 16 bits */
    if (x < 0) {
        return (-x)/0x10000;
    } else {
        return x/0x10000;
    }
}

$ avr-gcc -c -S -Os zz.c -o zz.s

foo:
sbrs r25,7
rjmp .L1
mov r19,r23
mov r18,r22
mov r21,r25
mov r20,r24
clr r24
clr r25
clr r26
clr r27
sub r24,r18
sbc r25,r19
sbc r26,r20
sbc r27,r21
mov r25,r27
mov r24,r26
.L1:
ret

That seems simple enough!

Did not realize that the unary minus sign could be used like that.

Thanks

Jim

Last Edited: Mon. May 27, 2019 - 12:48 AM

It still is early here, but I think the difficult part is that you have to take care that you do not take only 15 bits if you have a negative number.

I would first strip the minus sign if it is present, and then return the 16 bits you want by doing a shift and a type cast.

I think the function MattRW has made does that, but that is easily checked by giving it a known number.

Good point, and I was concerned about that. I think that Matt's algorithm does that, but will double check.

Jim

It looks like Matt's code will give 15-bit numbers for both positive and negative inputs, so change the divisor to 0x8000.

There's also an edge case for when the input is the most negative 32-bit number (0x80000000).  The negative of this is maybe even undefined in the C standard (I don't remember), but in practice you get the same number back, which when divided by 0x8000 and converted to uint16_t gives zero!

So add a check for 0x80000000 and return 0xFFFF.
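
A minimal sketch of that suggestion, for illustration only. The name `abs_top16` is made up here and is not from the thread. It negates in unsigned arithmetic, so the most negative input never goes through a signed `-x`, and it clamps the single result that does not fit in 16 bits:

```c
#include <stdint.h>

/* Hypothetical helper: |x| scaled down by 0x8000, saturated to 16 bits. */
uint16_t abs_top16(int32_t x)
{
    /* Unsigned negation is well defined even for x == INT32_MIN. */
    uint32_t mag = (x < 0) ? ((uint32_t)0 - (uint32_t)x) : (uint32_t)x;
    uint32_t top = mag >> 15;            /* same as mag / 0x8000 */

    /* Only INT32_MIN yields 0x10000 here; clamp it to 0xFFFF. */
    return (top > 0xFFFFu) ? (uint16_t)0xFFFFu : (uint16_t)top;
}
```

The clamp costs a compare and a branch, so whether it is worth the flash depends on whether 0x80000000 can actually occur in the data.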

--Mike

Yes.  If you want 16 bits, you need to do more.  What about this?  But is the extra code worth the extra bit?

[EDIT: can't be right, I'm thinking now]

#include <stdint.h>

uint16_t foo(int32_t x) {
    if (x < 0) {
        return (2*((-x)/0x8000));
    } else {
        return x/0x8000;
    }
}

$ avr-gcc -c -S -Os zz.c -o zz.s

foo:
sbrs r25,7
rjmp .L2
ldi r18,0
ldi r19,lo8(-128)
ldi r20,0
ldi r21,0
rcall __divmodsi4
com r21
com r20
com r19
neg r18
sbci r19,lo8(-1)
sbci r20,lo8(-1)
sbci r21,lo8(-1)
mov r25,r19
mov r24,r18
lsl r24
rol r25
ret
.L2:
mov r27,r25
mov r26,r24
mov r25,r23
mov r24,r22
ldi r18,15
1:
asr r27
ror r26
ror r25
ror r24
dec r18
brne 1b
ret

Last Edited: Mon. May 27, 2019 - 01:08 PM

MattRW wrote:

$ avr-gcc -c -S -Os zz.c -o zz.s

That's the name of most of my programs!!!  ;-)

--Mike

avr-mike wrote:

MattRW wrote:

$ avr-gcc -c -S -Os zz.c -o zz.s

That's the name of most of my programs!!!  ;-)

--Mike

Great minds think alike!

ka7ehk wrote:

So, I am soliciting suggestions about a general algorithm for converting int32_t to uint16_t, preserving the high 16 bits. I suggest "general" because, depending on how those 32-bit values are filled, it may be possible to trim to 8-bit. Any takers?

You said you wanted the absolute value. What's wrong with the straightforward

int32_t src = ...;
uint16_t dst = labs(src) >> 16;

?

It is not clear what 16 bits you are talking about though in the subsequent messages. 16 upper value bits of absolute value (i.e. not including the zero sign bit)? Then it's just

int32_t src = ...;
uint16_t dst = labs(src) >> 15;

I don't understand what is the point of any convoluted/multi-storied manipulations of the value when you can just directly state what you want. This is also an efficient way to do this, considering that `abs` is a built-in function in GCC.

In C code you will have to choose the proper version of `abs` depending on how the width of `int32_t` relates to the width of `int`. Or you can write your own `abs` as `(src >= 0 ? src : -src)`. (I used `labs` under assumption that `int` is a 16-bit type.)
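
As a compilable sketch of the first variant (the wrapper name `abs_hi16` is an assumption for illustration). `labs` takes a `long`, which is at least 32 bits wide on any conforming C implementation, so it is a safe choice whether `int` is 16 or 32 bits. Note that `labs` of the most negative value is undefined where `long` is 32 bits, as on AVR:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: upper 16 bits of the absolute value. */
uint16_t abs_hi16(int32_t src)
{
    /* The result of labs() is non-negative, so >> is a plain logical shift. */
    return (uint16_t)(labs((long)src) >> 16);
}
```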

ka7ehk wrote:
Did not realize that the unary minus sign could be used like that.

Um... But this is the only way to use unary minus! What other uses could you possibly be implying?

Last Edited: Tue. May 28, 2019 - 05:32 AM

It's even faster and smaller in hard-coded assembler.  Presuming you can load and store SRAM to/from any arbitrary register, and the 32-bit signed value is in r3(MSB)..r0  :

lsl r1   ; Fetch the 16th bit
rol r2
rol r3   ; And the sign bit gets thrown away

And then put it back into SRAM with :

st Z+, r3
st Z+, r2

(I has a Big-Endian). And you're done.  This, of course, presumes a lack of sign extension.  For eight bits, just skip the 2nd store line.  S.

PS - The Code Window works now!  I dunno who fixed what, but thanks!  S.

Last Edited: Wed. May 29, 2019 - 06:33 AM

After a bit of thought, I came up with an even dirtier trick.

See, my assembler routine doesn't need any high-side registers, but I didn't put in the preamble nor postamble that would protect registers and pointers and all that.  The problem was preserving the SREG - the status register.  To efficiently preserve that, you needed a high-side register.  So, you should not do what I said to do, and do something slightly different instead...

32s-16u-TRUNCATE:
push r0  ; just preserving registers, here.
push r1
push r2  ; And we don't need r3- see below
push ZL
push ZH  ; 'cause we'll be mucking with the Z pointer

; But we don't have to store and restore the SREG
; 'cause we're about to get clever

ldi ZL, low(32s-SOURCE)
ldi ZH, high(32s-SOURCE)

;We're big-endian here, so let's get the MSB first
ld r2, Z+
ld r1, Z+
ld r0, Z+
; We have no need for the low byte
; We're going to throw it away anyhow

rol r0  ; This is a change from my first idea
rol r1  ; Moving on
rol r2  ; Shoving the sign bit out into the carry

; Which trashes the carry flag in SREG
; Typically, we'd have to deal with this by storing and restoring
; the SREG, but because of the change...

ldi ZL, low(16u-DESTINATION)
ldi ZH, high(16u-DESTINATION)
; With carefully organized memory, this can be made more efficient

st Z+, r2
st Z+, r1   ; Deliberately throwing away r0, until...

ror r0      ; Putting the carry flag back!  HA!

pop ZH
pop ZL
pop r2
pop r1
pop r0

ret     ; And Bob's your uncle.

@Scroungre: to get the absolute value requires more than just discarding the sign bit. See #2.
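
To illustrate the difference in C (both function names are made up for this sketch): shifting the sign bit out and taking the magnitude agree for non-negative inputs, but not for negative ones, because a two's complement negative value has its upper bits set to ones.

```c
#include <stdint.h>

/* Shift the sign bit out and keep the next 16 bits, as the rotate sequence does. */
uint16_t drop_sign_top16(int32_t x)
{
    return (uint16_t)(((uint32_t)x << 1) >> 16);
}

/* Take the true magnitude first, then keep the same 16 bits.
   (INT32_MIN would still need the clamp discussed earlier in the thread.) */
uint16_t magnitude_top16(int32_t x)
{
    uint32_t mag = (x < 0) ? ((uint32_t)0 - (uint32_t)x) : (uint32_t)x;
    return (uint16_t)(mag >> 15);
}
```

For `x = -1` the first function returns 0xFFFF while the second returns 0.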

Hmmm, Just recognized that I need to change the process I proposed in the first message. I need to average the truncated value, in order to remove the sensor offset. That has to be done BEFORE extracting the absolute value, since average of a signed time series is not the same as average of the absolute value of the same signed time series.

So, I need to truncate, saving the sign. Here is the rub. Out of the 32 bit original, only 23 bits (plus sign) are actually used. What I would really like to do is compress the 32 bit signed into 16 bit signed, discarding several of those unused high bits and some low bits so that the result fits into a signed 16 bit.

The rationale for doing this is that computing the resulting average as well as the subtraction of the average from the 16-bit original and extracting the absolute value (from the offset-corrected 16-bit value) and subsequent operations would benefit in terms of speed and code space. Some of those subsequent operations include a squaring operation to obtain the squared magnitude of the equivalent space vector and this would be huge if I operate on the existing 32 bit values. All of this needs to be done in "real time" so time IS important. And, of course, I do not have much flash or SRAM remaining!

My apologies if this is confusing. Maybe a brief summary would help:

1. I have 3 acceleration component vectors, signed 32 bits.

2. I need to generate the magnitude-squared of the equivalent space vector

3. Out of the 32-bit originals, 8 of the upper bits (just below the sign bit) are "unused" (always zero)

4. I would like to truncate the 32 bit originals into 16 bits, discarding those unused 8 upper bits and 8 low bits to make a 16-bit signed integer.

Thanks, again, for considering my little modified challenge!

Jim

Last Edited: Wed. May 29, 2019 - 09:14 PM

With my poor C skills, I would use unions or pointers:

1. AND byte 3 with 0x80 (extract the sign bit)

2. AND byte 2 with 0x7F (make room for the sign bit in bit 23)

3. add the two above in byte 2 and bingo, you have your 16-bit signed value in bytes 1 and 2.

ka7ehk wrote:

3. Out of the 32-bit originals, 8 of the upper bits (just below the sign bit) are "unused" (always zero)

That is already incorrect/misleading, under assumption that you are talking about regular 2's-complement platform. You just said that your original values are signed. That means that the "unused" bits will be either all-zero (for non-negative values) or all-ones (for negative values). "Unused" bits on a 2's-complement platform always contain a copy of the sign bit (!). The upper portion of a negative value is always filled with 1's. So, your "always zero" claim seems to make no sense.

In any case, on a 2's complement platform in order to truncate a 32-bit signed integer into a 16-bit signed integer in accordance with the algorithm you described (discard upper 9 bits, discard lower 8 bits, preserve sign) you don't need to worry about the sign bit as a separate entity at all. It is just plain and simple

int32_t src = ...;
int16_t dst = src >> 8;

There's no need for any manipulations with sign bit or any bit masks.
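
Wrapped in a function so it can be checked with known values (the name `trim24_to_16` is hypothetical). This is a sketch that assumes the input stays within the 24-bit signed range and that the compiler shifts negative values arithmetically, which GCC does:

```c
#include <stdint.h>

/* Hypothetical name. Assumes src is within [-8388608, 8388607]. */
int16_t trim24_to_16(int32_t src)
{
    /* >> on a negative value is implementation-defined in C;
       GCC performs an arithmetic (sign-preserving) shift. */
    return (int16_t)(src >> 8);
}
```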

(And no, there is no way to make it more efficient through using assembler.)

Last Edited: Thu. May 30, 2019 - 01:31 AM

All 1's or all 0's - whatever. They are not significant to the final numeric value except as place holders. The point is that I care about the most significant bit, don't care about the 8 below the high bit, and don't care about the lowest 8. I would like to convert what remains into a 16-bit signed int.

Union, as suggested by angelu, might be a way to do just that.

Another way might be to do an 8x right shift, then mask the remaining low 16 bits to populate an int16_t. That ought to retain the sign, I think. On the undesirable side, that would require doing shifts on 4 bytes, repeated 8 times. Right now, that union sounds pretty good.

Thanks

Jim

Last Edited: Thu. May 30, 2019 - 12:54 AM

ka7ehk wrote:

Union, as suggested by angelu, might be a way to do just that.

You can use union as a way to discard the lower 8 bits, but normally it is not worth it. The compiler is smart enough to realize that in a `>> 8` shift the lowest byte of the original value can be discarded as whole.
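
For comparison, a sketch of what the union route could look like. The type name `split32_t` and the function name are assumptions for illustration, and the byte indices assume a little-endian layout, which is what avr-gcc uses:

```c
#include <stdint.h>

typedef union {
    int32_t full;
    uint8_t byte[4];   /* byte[0] is the least significant on little-endian */
} split32_t;

int16_t trim_via_union(int32_t src)
{
    split32_t u;
    u.full = src;
    /* Keep bytes 1 and 2; drop byte 0 (low bits) and byte 3 (sign copies). */
    return (int16_t)(uint16_t)(u.byte[1] | ((uint16_t)u.byte[2] << 8));
}
```

For inputs within the 24-bit signed range this gives the same result as `src >> 8`, just spelled out byte by byte.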

ka7ehk wrote:
Another way might be to do an 8x right shift, then mask the remaining low 16 bits to populate an int16_t. That ought to retain the sign, I think. On the undesirable side, that would require doing shifts on 4 bytes, repeated 8 times. Right now, that union sounds pretty good.

No, of course not. No self-respecting compiler will implement a shift by 8 bits as a sequence of single-bit shifts. It is also not clear why you want to "mask" anything. If the upper 8 value bits are unused (as you stated) then a `>> 8` shift will produce a value that will fit perfectly in an `int16_t` without any masking.

As I said

int32_t src = ...;
int16_t dst = src >> 8;

and that's it. Everything else is a solution looking for a problem.

int16_t __attribute__ ((noinline)) foo(int32_t src)
{
    return src >> 8;
}

Compiling for `-Os`

clr r27
sbrc r25,7
dec r27
mov r26,r25
mov r25,r24
mov r24,r23
ret

No repetitive shifts, as you see. No shifts at all, actually. This is still less efficient than it could have been: the compiler bothers to form values in `r27` and `r26`, which is completely unnecessary. But if the code is allowed to inline into the actual calling context, it might easily become much more efficient.

Last Edited: Thu. May 30, 2019 - 01:56 AM

I have been unable to find an explanation of what is supposed to happen when you assign a 32 bit signed int to a 16 bit signed int. Does it take the low 16 or the high 16? Is it specified somewhere? I would hope so but have been unable to find it.

Question: Why is the cast operator not used?  Yes, you have shown that it is not "necessary" but my meagre knowledge says "you ought to cast". What in the standard says that you don't need to? Is not the purpose of a cast to change from one data type to a different data type? And, is that not what is being done here?

Thanks

Jim

Last Edited: Thu. May 30, 2019 - 02:05 AM

ka7ehk wrote:

I have been unable to find an explanation of what is supposed to happen when you assign a 32 bit signed int to a 16 bit signed int. Does it take the low 16 or the high 16? Is it specified somewhere? I would hope so but have been unable to find it.

It is specified in the language standard.

1. If the original `int32_t` value fits into the range of `int16_t` type then it is guaranteed that the assignment will preserve the value. E.g. `-5` of type `int32_t` is guaranteed to become `-5` of type `int16_t`. Which means, of course, that in 2's-complement representations the lower 16 bits are taken.

2. If the original `int32_t` value does not fit into the range of `int16_t` type, then the result is implementation-defined. In real life this also means that the compiler will simply take the lower 16 bits of the original value.

In your case, under your conditions (as you stated them), the value will fit, meaning that the branch 1 applies.
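
A small sketch of both branches (the function name `narrow` is hypothetical). The in-range cases are guaranteed by the standard; the out-of-range case shows GCC's documented behaviour, which reduces the value modulo 2^16:

```c
#include <stdint.h>

int16_t narrow(int32_t v)
{
    return (int16_t)v;   /* same conversion an assignment would perform */
}
```

With GCC, `narrow(-5)` is -5 (branch 1), and `narrow(0x12345)` is 0x2345, i.e. the lower 16 bits (branch 2).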

ka7ehk wrote:

Question: Why is the cast operator not used?

Because in C and C++ conversions between arithmetic types are implicit. There's no need for an explicit cast.

However, some overcautious compilers might issue warnings for narrowing conversions. Just to suppress such warnings you might want to add a cast. By using a cast you tell the compiler that "yes, this is what I really want to do". But the language itself does not require any casts in this case.

The only exception from this rule is `{}` initializers in C++ (C++11 and later). Such initializers do not allow implicit narrowing conversions. But this is not our case.

ka7ehk wrote:
Is not the purpose of a cast to change from one data type to a different data type? And, is that not what is being done here?

Yes, but all data type conversions in C and C++ separate into two categories:

1. Conversions that can be done implicitly.

2. Conversions that require an explicit cast.

Conversions between arithmetic types in C and C++ belong to the first category.

P.S. This applies to contexts that naturally imply a conversion, e.g. assigning a 32-bit value to a 16-bit variable. If the conversion is not implied by the context, you have to use an explicit cast to change the type, e.g. on a platform where `int` is 16 bits wide (such as AVR), multiplying two 16-bit values produces a 16-bit result unless you explicitly cast one of the operands to a 32-bit type.
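
A short sketch of the multiplication case (the function name `square_wide` is made up here). Casting one operand before the multiply forces the arithmetic itself to be done in 32 bits:

```c
#include <stdint.h>

int32_t square_wide(int16_t a)
{
    /* Cast one operand first so the multiply itself is done in 32 bits. */
    return (int32_t)a * a;
}
```

Writing `(int32_t)(a * a)` instead would, with a 16-bit `int`, multiply in 16 bits first and only widen the already overflowed result.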

Last Edited: Thu. May 30, 2019 - 01:47 PM

AndreyT wrote:

It is just plain and simple

int32_t src = ...;
int16_t dst = src >> 8;

While I agree with Andrey and kind of wonder why it's not as simple as shift/division, I wonder if that was really supposed to be >>8. Surely a 32 to 16 downsample involves a >>16 ??

ka7ehk wrote:

I have been unable to find an explanation of what is supposed to happen when you assign a 32 bit signed int to a 16 bit signed int. Does it take the low 16 or the high 16? Is it specified somewhere? I would hope so but have been unable to find it.

Surely the C standard or K&R explain truncation? What it does is simply take the lower bits - this is NOT what you want. You said previously "compress 32 bits to 16 bits". The only sensible way to do that is take the top (most significant) bits and discard the lower bits.  That's either /65536 or, because it's a binary multiple, the more obvious >>16

EDIT: oh I see the reason for >>8 - because you said you want bit 23 not bit 31 to become the new MSB ? Still trying to picture in my mind how that makes any sense - is this because, while the type is int32_t the numbers are never more than 24 bits? So a >>16 would discard too many relevant bits?

Last Edited: Thu. May 30, 2019 - 08:23 AM

int32_t x = whatever ;

uint16_t y = x >> 16 ;

The largest known prime number: 2^82589933 - 1

It's easy to stop breaking the 10th commandment! Break the 8th instead.

clawson wrote:

AndreyT wrote:

It is just plain and simple

int32_t src = ...;
int16_t dst = src >> 8;

While I agree with Andrey and kind of wonder why it's not as simple as shift/division I wonder if that was really supposed to be >>8. Surely a 32 to 16 downsample involves a >>16 ??

The shift value depends on what bits you want to preserve and what bits you want to discard. The OP clearly stated that he wants to discard the lower 8 value-bits and the upper 8 value-bits (because the latter are supposedly unused). So, we shift by 8 (thus discarding the lower bits) and then force the result into `int16_t` (thus discarding the upper bits).

In this case 8, not 16.

That works as long as the range in the int32 is just -8,388,608 .. +8,388,607. If it's outside the range the higher bits would be truncated.
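
If out-of-range inputs cannot be ruled out, one option is to clamp before shifting. This is only a sketch with a made-up name (`trim24_sat`), and the two compares cost extra flash, so it may not be worth it when the range is guaranteed by design:

```c
#include <stdint.h>

/* Hypothetical helper: clamp to the 24-bit signed range, then drop 8 bits. */
int16_t trim24_sat(int32_t src)
{
    if (src > 8388607L)  src = 8388607L;
    if (src < -8388608L) src = -8388608L;
    return (int16_t)(src >> 8);   /* assumes an arithmetic shift, as with GCC */
}
```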

The lower 8 bits are being discarded for two reasons: (1) they are really not necessary for the subsequent operations, and (2) I need to reduce the numeric width so that a downstream square operation does not take so much memory and time.

All of this is to compare a given X,Y,Z acceleration (vector magnitude) to a trigger threshold. To do that, I need to determine a block average (for each axis), subtract that from the sensor reading (so that I have close to zero-based values), compute a sum of squares (of the three axes, to get magnitude squared of the net vector acceleration), then compare that to the square of the trigger threshold.
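
The last two steps of that description could be sketched in C roughly as follows. The function names are hypothetical, and the sketch assumes each offset-corrected component already fits in an `int16_t`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sum of squares of three offset-corrected components.
   Each square is at most 2^30, so the sum of three fits in 32 bits. */
uint32_t mag_squared(int16_t dx, int16_t dy, int16_t dz)
{
    return (uint32_t)((int32_t)dx * dx)
         + (uint32_t)((int32_t)dy * dy)
         + (uint32_t)((int32_t)dz * dz);
}

/* Compare against the squared threshold, so no square root is needed. */
bool is_event(int16_t dx, int16_t dy, int16_t dz, uint16_t threshold)
{
    return mag_squared(dx, dy, dz) > (uint32_t)threshold * threshold;
}
```

If the threshold is a constant, its square can be precomputed so that only the three multiplies and one 32-bit compare remain at run time.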

All of this gets done during periodic wake-ups to service the sensor FIFO, so the longer it takes, the more battery energy it uses. I'm also down to a very few K of FLASH and 200 bytes of SRAM, so everything I can do to reduce size is really important.

That's the short version!

Thanks

Jim

This may be too crude, but I'd be pleased to see you test it out.   I just stepped through one example on my simulator to see it looks to be working.

All math is signed.   When squaring hi:lo, it only accumulates hi*hi, hi*lo and lo*hi; lo*lo is ignored.

You pass pointer to an array of 3 int16_t.  EDIT: Oh, and assumes the bias is removed before calling.

EDIT: Above should read "accumulation of hi8(hi*lo) and hi8(lo*hi)".

int16_t v[3], r;

v[0] = 124;
v[1] = 31123;
v[2] = -30999;

r = sumsq3_16(v);

.section .text
.global sumsq3_16

.type sumsq3_16,function
;; int16_t sumsq3_16(int16_t *v)
;; v: pointer to array of 3 int16's (r25:r24)
sumsq3_16:
ldi r23,3
movw r30,r24
clr r20
clr r24			; result
clr r25
0: 	ld r20,Z+
ld r21,Z+
;;
muls r21,r21		; v.hi*v.hi => r1:r0
mulsu r21,r20		; v.hi*v.lo => r1:r0
sbc r25,r20
mulsu r21,r20		; v.hi*v.lo => r1:r0
sbc r25,r20
;;
dec r23
brne 0b
;;
clr r1			; reset R1 to zero
ret
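
In C terms, one reading of the description above (hi*hi plus the high bytes of the two cross products, with lo*lo ignored) is sketched below for a single component. This is an interpretation of the prose, not a translation of the assembler listing, and the name `approx_sq_hi16` is made up:

```c
#include <stdint.h>

/* Approximates (v * v) >> 16 using 8-bit partial products only. */
int16_t approx_sq_hi16(int16_t v)
{
    int8_t  hi = (int8_t)(v >> 8);      /* signed high byte (arithmetic shift) */
    uint8_t lo = (uint8_t)(v & 0xFF);   /* unsigned low byte */

    int16_t hh = (int16_t)((int16_t)hi * hi);   /* hi*hi, 0..16384 */
    int16_t hl = (int16_t)((int16_t)hi * lo);   /* hi*lo, signed * unsigned */

    /* hi*lo and lo*hi each contribute their high byte; lo*lo is dropped. */
    return (int16_t)(hh + 2 * (hl >> 8));
}
```

With the example values from the post, 31123 gives 14779 (exact: 14780) and -30999 gives 14660 (exact: 14662), so the result is off by a few counts and can even come out as -1 for small negative inputs.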

Last Edited: Fri. May 31, 2019 - 05:25 PM

Hi Jim,

If you are low on resources, you are going to want to avoid 32 bit as much as you can - eliminating even storing the values in 32 bit variables in the first place if that is possible.

int32_t x;

uint16_t y;

y=((uint16_t)(x >> 16)) & 0x7fff;

This will shift x to the right 16 bits only (keeping the top 16 bits) and convert it to a uint16_t.  Then it will strip out the top bit in the event that it was a negative.

Again, a better plan would be to not have the int32_t x in the first place unless it is absolutely necessary.  Where is the source?  Can it be brought in as a 16 bit value from the beginning?

Good luck!

Alan

It is a 16 bit value  from the sensor, but I average, then do not divide by N so as to retain as much resolution as possible. That is the source of the 32 bit value. HOWEVER, the top 8 bits are unused (only placeholders). Since I only need to compare the value against a threshold, I can discard the low 8, leaving 16 for subsequent analysis.

Jim

Have you tried with a 24-bit variable? Perhaps it saves a tad of code.

ka7ehk wrote:
All of this is to compare a given X,Y,Z acceleration (vector magnitude) to a trigger threshold.

I'm probably being unhelpful  at this stage of the design; but that requirement looks like something the accelerometer should be able to handle autonomously. I.e. you program a set of threshold registers and an averaging filter and the chip asserts an IRQ line if the acceleration threshold is exceeded.

BTW: I'm starting to notice this behaviour in a few commercial products now; for instance Bluetooth headphones that "wake up" when you lift them up off the desk.

Does the sensor you've chosen support this mode of operation ?

ka7ehk wrote:
It is a 16 bit value  from the sensor, but I average, then do not divide by N so as to retain as much resolution as possible. That is the source of the 32 bit value.

I wonder if not averaging makes much difference.

Particularly if you can choose N as a power of 2, it might be worth considering: maintain the sum of the readings as int32_t, then just divide by N to get the average as int16_t.

You then have 16 bit values which can be squared to get the magnitude_squared as 32 bit eg.

int16_t x_ave, y_ave, z_ave;
uint32_t mag_sq = (uint32_t)((int32_t)x_ave * x_ave) +
                  (uint32_t)((int32_t)y_ave * y_ave) +
                  (uint32_t)((int32_t)z_ave * z_ave);

You probably already know this, but keep the division to factors of 2 (2, 4, 8, 16, 32, 64, etc.) so it only has to slide bits right instead of perform actual division.
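
As a sketch of that idea, with a block length of 16 chosen purely for illustration (the macro and function names are hypothetical):

```c
#include <stdint.h>

#define AVG_SHIFT 4                    /* hypothetical: N = 16 samples */
#define AVG_COUNT (1u << AVG_SHIFT)

int16_t block_average(const int16_t *samples)
{
    int32_t sum = 0;                   /* 16 samples of 16 bits need 20 bits */
    for (uint8_t i = 0; i < AVG_COUNT; i++) {
        sum += samples[i];
    }
    /* Arithmetic shift (GCC behaviour) rounds toward minus infinity,
       whereas sum / AVG_COUNT would round toward zero. */
    return (int16_t)(sum >> AVG_SHIFT);
}
```

The rounding difference only shows up for negative sums that are not exact multiples of N, which rarely matters for an offset estimate.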

Maybe accruing values in a 24-bit or 32-bit variable won't be so bad if you do nothing else on that 24-bit or 32-bit variable, and then divide it by a power of 2 to bring it back into a 16-bit one.

My experience with 32 bit variables, and especially manipulating them is that it doesn't take much to produce 10-20 instructions for even operations you think are no big deal.  Best thing to do is to dig through your lss file and look at your instructions and what the compiler emits for them.

I always set the debugging to default -g2 even for release because it doesn't change the contents of the hex file or binary produced, but it does give me an lss with source instructions to review.

Doh.  I forgot to update for "special int32s".  This one is fixed for an array of three int32's where for each int32 A:B:C:D only B:C are considered.  If the int32 is not in the range [-2^23,2^23) then the result will be wrong.  Difference from above is the added "adiw" instructions.

.section .text
.global sumsq3_s16s32
.type sumsq3_s16s32,function
;; int16_t sumsq3_s16s32(int32_t *v)
;; v: pointer to array of 3 int32's (r25:r24)
;; => approx sum of squares (r25:r24)
;; easily modifiable to: int16_t sumsq_16(int16_t *v, uint8_t n)
;; n: length of array (r23 == 3)
sumsq3_s16s32:
ldi r23,3
movw r30,r24
clr r20
clr r24			; result
clr r25
0: 	ld r20,Z+
ld r21,Z+
;;
muls r21,r21		; v.hi*v.hi => r1:r0
mulsu r21,r20		; v.hi*v.lo => r1:r0
sbc r25,r20
mulsu r21,r20		; v.hi*v.lo => r1:r0
sbc r25,r20
;;
dec r23
brne 0b
;;
clr r1			; reset R1 to zero
ret