Taking the high, middle and low bytes of a 24-bit value


Hi all,

Bit of a sanity check required here I think.

I'm working with an SPI memory device that requires a 24-bit address, which I have to transfer in 8-bit bytes of course - so ultimately require 3 bytes.

Say I'm starting to write at address 129606, can I simply take the high, middle and low bytes as below?

// 129606 = 0x1FA46
char 0_timestamp_start_high = (0x1FA46 >> 16);
char 0_timestamp_start_middle = (0x1FA46 >> 8);
char 0_timestamp_start_low = (0x1FA46);

Many thanks in advance


Variable names cannot start with 0 but otherwise it should do something along the lines of what you want. By the way "char" is not a good choice for the type here. Either use "unsigned char" (preferable) or "signed char". "char" with neither signed nor unsigned is a 3rd distinct type and should only be used when dealing with characters. In fact for the signed and unsigned variants most people might choose to use the stdint.h types uint8_t and int8_t to emphasise that it is 8 bits and whether it's being treated unsigned (u) or not.
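To illustrate why the signedness matters here, a minimal sketch follows. The function names are made up for this example, and the result of converting a value above 127 to signed char is implementation-defined (gcc wraps it modulo 256), so treat the negative result as typical rather than guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Store the low byte of v in a signed char and let it promote to int.
   Converting a value above 127 to signed char is implementation-defined;
   gcc wraps it modulo 256, which is what the self-check below assumes. */
int low_byte_as_signed(uint32_t v)
{
    signed char c = (signed char)(v & 0xFFu);
    return c;
}

/* Store the low byte of v in a uint8_t and let it promote to int. */
int low_byte_as_unsigned(uint32_t v)
{
    uint8_t c = (uint8_t)(v & 0xFFu);
    return c;
}

/* Hypothetical self-check using the middle byte (0xFA) of 0x1FA46. */
void signedness_selftest(void)
{
    assert(low_byte_as_unsigned(0x1FA46UL >> 8) == 0xFA); /* 250 */
    assert(low_byte_as_signed(0x1FA46UL >> 8) == -6);     /* 250 - 256 */
}
```

The byte on the wire is the same either way; the difference only shows up once the value is promoted, compared, or printed.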

Oh and if you ever want to check stuff like this use a C program on a PC which is much easier to study than something on an AVR:

uid23021@lxl0060u:~$ cat test.c
#include <stdio.h>
#include <stdint.h>

uint8_t  timestamp_start_high = (0x1FA46 >> 16);
uint8_t  timestamp_start_middle = (0x1FA46 >> 8);
uint8_t  timestamp_start_low = (0x1FA46); 

int main(void) {
  printf("%02X, %02X, %02X\n", timestamp_start_high, timestamp_start_middle, timestamp_start_low);
}
uid23021@lxl0060u:~$ gcc -o test test.c
test.c:5:1: warning: large integer implicitly truncated to unsigned type [-Woverflow]
test.c:6:1: warning: large integer implicitly truncated to unsigned type [-Woverflow]
uid23021@lxl0060u:~$ ./test
01, FA, 46


Thanks very much - got more than I bargained for in that reply!

I think I might change a lot of my code to use the new defined datatypes, they do seem somewhat clearer.

However, when you say "something along the lines of what you want", are you implying I'm still going to get compiler warnings (as opposed to errors) regarding truncation?

This was my exact worry, the line below is simply dumping a much larger than 8-bit value on a little 8-bit variable. Is there a better way to perform this truncation, i.e remove the leading bits that I don't need?

Is there indeed a better way to perform the high and middle byte parts too?

char 0_timestamp_start_low = (0x1FA46);

Quote:

Is there indeed a better way to perform the high and middle byte parts too?


"Better" is in the eye of the beholder.

Is the application at 99% of used flash/program space? Then a shorter sequence (fewer code words) is important.

Does every cycle of the operation count? (e.g., repetitive graphics operations on each pixel) Then a sequence taking the least cycles is important.

Are the above of critical importance in most apps? Usually not.

Does your coding standard (and/or personal preference) require that a build be completely clean, with no warnings? Then you want to do more to get rid of them.

Examine the generated code for the shown fragment, and you'll probably find that the compiler has pre-processed the constant expressions and turned it into a load-store.

In operation, are you really going to have constant addresses? Or will the address be in a variable, such as a uint32_t?
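If the address does end up in a uint32_t variable, the split might look something like the sketch below. The function name and the 3-byte output buffer are assumptions made for this example, not anything from the original post.

```c
#include <assert.h>
#include <stdint.h>

/* Split the low 24 bits of addr into three bytes, most significant first:
   out[0] = bits 23..16, out[1] = bits 15..8, out[2] = bits 7..0.
   The explicit casts state that the truncation is intentional. */
void addr24_to_bytes(uint32_t addr, uint8_t out[3])
{
    out[0] = (uint8_t)(addr >> 16);
    out[1] = (uint8_t)(addr >> 8);
    out[2] = (uint8_t)addr;
}

/* Hypothetical self-check using the address from the first post. */
void addr24_selftest(void)
{
    uint8_t b[3];
    addr24_to_bytes(0x01FA46UL, b); /* 129606 decimal */
    assert(b[0] == 0x01 && b[1] == 0xFA && b[2] == 0x46);
}
```

The three bytes can then be handed to whatever SPI transmit routine the project already has, in that order.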

One can use a union and extract the bytes that way. Some purists will object, as this isn't portable across little-endian/big-endian.

See if this clears up the warning:

uint8_t  timestamp_start_low = 0x1FA46 & 0xff;

or maybe

uint8_t  timestamp_start_low = (uint8_t)(0x1FA46 & 0xff);

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


The warnings I got when I tried that test were because an int is being squeezed into a char; the compiler had my best interests at heart and was warning me that I might be losing something. As it happens that loss was intentional, but I guess a typecast would have been the way to tell the compiler I know what I'm up to. Or how about this, which compiles without any warning and with no typecast:

uint8_t  timestamp_start_high = ((0x1FA46 & 0xFF0000) >> 16);
uint8_t  timestamp_start_middle = ((0x1FA46 & 0x00FF00) >> 8);
uint8_t  timestamp_start_low = (0x1FA46 & 0x0000FF); 

Here I'm cutting out the 8bits of interest each time with an AND mask then just shifting the result into the correct (8bit) resting place.

There are, of course other ways to crack this chestnut:

uid23021@lxl0060u:~$ cat test.c
#include <stdio.h>
#include <stdint.h>

uint8_t  timestamp_start_high = ((0x1FA46 & 0xFF0000) >> 16);
uint8_t  timestamp_start_middle = ((0x1FA46 & 0x00FF00) >> 8);
uint8_t  timestamp_start_low = (0x1FA46 & 0x0000FF); 

typedef union {
  uint32_t bignum;
  uint8_t  bytes[4];
} joined_t;

int main(void) {
  printf("%02X, %02X, %02X\n", timestamp_start_high, timestamp_start_middle, timestamp_start_low);

  joined_t joined;
  joined.bignum = 0x1FA46;
  printf("%02X, %02X, %02X\n", joined.bytes[2], joined.bytes[1], joined.bytes[0]);

  uint32_t bignum = 0x1FA46;
  uint8_t * p =(uint8_t *)&bignum;
  printf("%02X, %02X, %02X\n", *(p+2), *(p+1), *p);
  printf("%02X, %02X, %02X\n", p[2], p[1], p[0]);
}
uid23021@lxl0060u:~$ gcc -o test test.c
uid23021@lxl0060u:~$ ./test 
01, FA, 46
01, FA, 46
01, FA, 46
01, FA, 46

However, things like the union and the pointer need me to know how the compiler/processor chooses to lay out the large value in successive bytes of memory. I wrote the above on a PC, which is little-endian (lowest byte first). As it happens AVR is also little-endian, so the same code would work on an AVR.

Oh and as you can see in the last one, array access is really just pointer access in disguise.
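To make the byte-order caveat concrete, here is a rough sketch that reuses the joined_t union from the test program and compares it with plain shifting. The helper names are invented for this example, and it only considers plain little-endian and big-endian layouts.

```c
#include <assert.h>
#include <stdint.h>

typedef union {
    uint32_t bignum;
    uint8_t  bytes[4];
} joined_t;

/* Returns 1 if the lowest-addressed byte holds the least significant byte. */
int layout_is_little_endian(void)
{
    joined_t probe;
    probe.bignum = 1;
    return probe.bytes[0] == 1;
}

/* Layout-independent: byte n (0 = least significant) by shifting. */
uint8_t byte_by_shift(uint32_t value, unsigned n)
{
    return (uint8_t)(value >> (8u * n));
}

/* Layout-dependent: byte n via the union, corrected for byte order.
   n must be in the range 0..3. */
uint8_t byte_by_union(uint32_t value, unsigned n)
{
    joined_t j;
    j.bignum = value;
    return layout_is_little_endian() ? j.bytes[n] : j.bytes[3u - n];
}

/* Hypothetical self-check: both routes agree on every byte of 0x1FA46. */
void layout_selftest(void)
{
    unsigned n;
    for (n = 0; n < 4; n++) {
        assert(byte_by_shift(0x1FA46UL, n) == byte_by_union(0x1FA46UL, n));
    }
}
```

The shift version needs no knowledge of the memory layout at all, which is why it gives the same answer on any target.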


Quote:

As it happens AVR is also little endian...

Oh, no, here we go again... :twisted:

What is the definitive test for determining endianness?

In my mind, the AVR has so few multi-byte operations that I'd call it agnostic.

But in practice, it "leans" toward little-endian. Of greater practical significance IMO is that the mainstream C compilers for the AVR work little-endian.



Program space wise I'm not pushed at all, I've got a number of data buffers etc and my total SRAM usage is at ~65%, but program wise I'm fine.

Personal preference does indeed require that all warnings be gone.

Interesting...

This gets rid of a warning...

unsigned char timestamp_0_start_low = (0x1FA46 & 0xFF);

...this gets rid of another

unsigned char timestamp_0_start_middle = ((0x1FA46 & 0xFF) >> 8);

...but this does not, during the high byte shift?

unsigned char timestamp_0_start_high = ((0x1FA46 & 0xFF) >> 16);

________________________________________________

Same with the method you suggested - the warnings for low and middle byte disappear, but not for high byte.

uint8_t  timestamp_start_high = ((0x1FA46 & 0xFF0000) >> 16);
uint8_t  timestamp_start_middle = ((0x1FA46 & 0x00FF00) >> 8);
uint8_t  timestamp_start_low = (0x1FA46 & 0x0000FF);

EDIT - Actually, not quite - in fact, there is no error at all for the high byte, even with the ANDing of the 0xFF0000. I presume this is because there are no bits left to the left of the unsigned char.

So the compiler doesn't warn you of losing bits to the right, only losing them to the left?

EDIT 2 - Ah, I think I've got it. No warning is raised when shifting 0x1FA46 by 16 because, although I'm treating it as a 24-bit value, it is actually stored as a 32-bit value on the AVR, so its top byte is all 0s. After shifting right 16 places only 0x01 remains, with nothing but leading 0s above the low byte, so there is nothing to lose and therefore nothing to warn about?
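One way to check that reasoning is to ask, for each shifted constant, whether the value still fits in 8 bits. The sketch below does that with a made-up helper; it is an illustration of the idea, not a description of how gcc decides when to warn.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if v can be stored in a uint8_t without losing any set bits. */
int fits_in_uint8(uint32_t v)
{
    return v <= 0xFFu;
}

/* Hypothetical self-check for the three expressions from the first post. */
void truncation_selftest(void)
{
    assert(fits_in_uint8(0x1FA46UL >> 16));  /* 0x01: fits, nothing lost    */
    assert(!fits_in_uint8(0x1FA46UL >> 8));  /* 0x1FA: bit 8 would be lost  */
    assert(!fits_in_uint8(0x1FA46UL));       /* 0x1FA46: bits 8..16 lost    */
}
```

This matches the earlier PC test, where only the middle and low lines produced a truncation warning.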


+1 for using a union as clawson recommended above. Not only does it improve readability, it also provides for multiple access methods to the data.

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius


Quote:
Quote:
As it happens AVR is also little endian...

Oh, no, here we go again...

I believe that in this case that should have read:
Quote:
avr-gcc on the AVR is also little endian...

Regards,
Steve A.

The Board helps those that help themselves.


Quote:
...this gets rid of another

unsigned char timestamp_0_start_middle = ((0x1FA46 & 0xFF) >> 8);

But it is incorrect, regardless. It would need to be 0xFF00. But I would use this form:

unsigned char timestamp_0_start_middle = (0x1FA46 >> 8) & 0xFF;

This should get rid of any warnings since the compiler will know that no matter what the original value, the result will fit into a byte. But if you want to be extra sure, you can always do the explicit cast:

unsigned char timestamp_0_start_middle = (unsigned char)((0x1FA46 >> 8) & 0xFF);

Or simply:

unsigned char timestamp_0_start_middle = (unsigned char)(0x1FA46 >> 8);

as the mask becomes redundant when the cast is present.
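The difference between the mask placements can be checked with a small sketch like this one. The function names are invented for the comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Mask before the shift with 0xFF: only bits 7..0 survive the AND,
   and the shift by 8 then throws all of them away. Always yields 0. */
uint8_t middle_mask_ff_then_shift(uint32_t v)
{
    return (uint8_t)((v & 0xFFu) >> 8);
}

/* Shift first, then mask: bits 15..8 are moved down, then isolated. */
uint8_t middle_shift_then_mask(uint32_t v)
{
    return (uint8_t)((v >> 8) & 0xFFu);
}

/* Mask in place with 0xFF00, then shift: same result as above. */
uint8_t middle_mask_ff00_then_shift(uint32_t v)
{
    return (uint8_t)((v & 0xFF00u) >> 8);
}

/* Hypothetical self-check with the address from the thread. */
void middle_byte_selftest(void)
{
    assert(middle_mask_ff_then_shift(0x1FA46UL) == 0x00);
    assert(middle_shift_then_mask(0x1FA46UL) == 0xFA);
    assert(middle_mask_ff00_then_shift(0x1FA46UL) == 0xFA);
}
```

So the 0xFF version is quiet about truncation only because it always produces zero.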

Regards,
Steve A.



Quote:

avr-gcc on the AVR is also little endian...

AFAIK it would be:

avr-gcc, IAR, CV, ICC, Rowley on the AVR are little endian (not sure about MikroC - you just know they'd do it differently just to be perverse!).


Not trying to start a war, actually want to know...

theusch wrote:
Quote:
As it happens AVR is also little endian...
Oh, no, here we go again... :twisted:

What is the definitive test for determining endianness?

How would you characterise these?:
adiw
sbiw
mul (and friends)
(e)ijmp
(e)icall
(e)lpm
spm
any of the indexed modes, i.e Rd,(-)X/Y/Z(+) or (-)X/Y/Z(+),Rd
any PC => stack or stack => PC operation [(r)jmp/(r)call/ret(i)]

I would characterise all of those as inherently little-endian. In the case of stack-operating instructions (call, ret, etc.) and indexed operations, this little-endianness is w.r.t. the SRAM mapping. In the case of the arithmetic instructions it is w.r.t. the numbering of the register file, which is also mapped into SRAM such that (for example) the result of mul is placed (little-endian) into r1:r0, i.e. data addresses 0x00-0x01 on classic AVRs.

theusch wrote:
In my mind, the AVR has so few multi-byte operations that I'd call it agnostic.
Pretty much every instruction which could have an endianness and which is available to an 8-bit architecture is implemented, and is little-endian. What more would be required for it not to be considered 'agnostic'?

Quote:
But in practice, it "leans" toward little-endian. Of greater practical significance IMO is that the mainstream C compilers for the AVR work little-endian.

While I agree that from a HLL programmer's perspective the endianness of the language implementation is more important, does that not follow directly from the hardware? Sure, I could write a big-endian implementation of C for the AVR... but why would I? Has anyone done so? I'd have to jump through a fair number of hoops, and I can only imagine it would be sub-optimal.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


Quote:

the register file, which is also mapped into SRAM

I did a bit of poking with Google, and didn't really find a checklist/criteria/definition to distinguish.

The SRAM mapping of GP registers might be the clue--I'm trying to think of any AVR8 operation that actually affects SRAM. PUSH/POP are only one byte. Indexing can go either way. Multi-byte operations? CALL/RET?

So it would seem to come down to the few new (with introduction of Mega) instructions in the ADIW family? The LDS/STS to SRAM could be done in the other order.

As I said, the AVR seems to lean to little-endian. Consistently, right? Are there counter examples?

When working with a big-endian protocol such as Modbus, indeed there is a lot of byte swapping. But in the end that is due to the fact that the C compiler used wants multi-byte values little-endian.
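For a big-endian wire format, the swap can be avoided altogether by building the bytes with shifts instead of copying memory. A minimal sketch, with helper names that are assumptions for this example and not part of any Modbus library:

```c
#include <assert.h>
#include <stdint.h>

/* Write a 16-bit value into dst most significant byte first,
   regardless of how the compiler stores uint16_t in memory. */
void put_be16(uint8_t *dst, uint16_t value)
{
    dst[0] = (uint8_t)(value >> 8);
    dst[1] = (uint8_t)value;
}

/* Read a 16-bit value whose most significant byte comes first in src. */
uint16_t get_be16(const uint8_t *src)
{
    return (uint16_t)(((uint16_t)src[0] << 8) | src[1]);
}

/* Hypothetical self-check: a round trip through the wire format. */
void be16_selftest(void)
{
    uint8_t wire[2];
    put_be16(wire, 0x1234u);
    assert(wire[0] == 0x12 && wire[1] == 0x34);
    assert(get_be16(wire) == 0x1234u);
}
```

Written this way the code states the wire order directly and does not care what the compiler does internally.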



theusch wrote:
I did a bit of poking with Google, and didn't really find a checklist/criteria/definition to distinguish.
There does seem to be quite a variety of metrics, with many examples of machines that mix little- and big- across the different metrics.

Quote:
Multi-byte operations? CALL/RET?
Those (along with other PC=>stack operations like the interrupt mechanism) would be the most direct argument for a conclusion on endianness w.r.t. SRAM.

Quote:
Are there counter examples?
Possibly:
#include <stdint.h>
#include <stdio.h>
#include "serial.h" // roll-my-own serial

int main(void) {

  uint8_t foo, bar;

  serial_configure();

  __asm__ __volatile__ (
                        "rcall  foo%=               \n"
                      "foo%=:                       \n\t"
                        "in     r30,      __SP_L__  \n\t"
                        "in     r31,      __SP_H__  \n\t"
                        "ldd    %[foo],   Z+1       \n\t"
                        "ldd    %[bar],   Z+2       \n\t"
                        "adiw   r30,      2         \n\t"
                        "out    __SP_L__, r30       \n\t"
                        "out    __SP_H__, r31       \n\t"
                      :
                        [foo] "=r" (foo),
                        [bar] "=r" (bar)
                      :
                      :
                        "r30", "r31" 
                       );

  printf("sph+1==0x%02X sph+2==0x%02X\r\n", foo, bar);

  while(1);

}
sph+1==0x00 sph+2==0x2D
  __asm__ __volatile__ (
  58:	00 d0       	rcall	.+0      	; 0x5a 

0000005a :
  5a:	ed b7       	in	r30, 0x3d	; 61
  5c:	fe b7       	in	r31, 0x3e	; 62
  5e:	81 81       	ldd	r24, Z+1	; 0x01
  60:	92 81       	ldd	r25, Z+2	; 0x02
  62:	32 96       	adiw	r30, 0x02	; 2
  64:	ed bf       	out	0x3d, r30	; 61
  66:	fe bf       	out	0x3e, r31	; 62
                      :
                      :
                        "r30", "r31" 
                       );

We see that the address pushed onto the stack is 0x002D, which is of course a word address. As a byte address it is 0x005A which agrees with the assembly.

At first glance the address appears to be stored big-endian in that the MSB of the return address is stored in SRAM before the LSB. However considering that the stack grows downwards and access to the SRAM-based stack is 8-bit, in fact the LSB of the return address is actually pushed onto the stack first. In this context the PC=>stack operation is little-endian.

Code that needs to read/write a return address from the stack needs to account for this apparent inversion.
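As a rough sketch of what "accounting for the inversion" could look like in C, assuming a device with a 2-byte return address as in the output above. The function name is hypothetical, and the two arguments are the bytes read at SP+1 and SP+2.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the return address from the two bytes found on the stack.
   The byte at SP+1 is the high byte and the byte at SP+2 is the low byte
   of the word address; doubling it gives the byte address in flash. */
uint32_t return_byte_address(uint8_t at_sp_plus_1, uint8_t at_sp_plus_2)
{
    uint16_t word_address =
        (uint16_t)(((uint16_t)at_sp_plus_1 << 8) | at_sp_plus_2);
    return (uint32_t)word_address << 1;
}

/* Hypothetical self-check with the values printed by the test program:
   word address 0x002D corresponds to byte address 0x005A. */
void return_address_selftest(void)
{
    assert(return_byte_address(0x00, 0x2D) == 0x005AUL);
}
```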

I recall reading (many years ago) a discussion on endianness as it pertains to just this kind of scenario. I can't find this discussion of course... and I recall it not coming to any specific conclusion.

Quote:
Indexing can go either way
When incrementing/decrementing/offsetting the index, yes. I was thinking more of the endianness of the index in the first place, i.e. Z is r30/r31 with the LSB in r30.

Quote:
But in the end that is due to the fact that the C compiler used wants multi-byte values little-endian.
Point taken.

JJ


 


The way you write this seems to be important for avr gcc.

 

Even if these 3 lines should be the same :

SPI0.DATA=(addr>>16);						// send msb addr
SPI0.DATA=((addr>>16)&0xff);				// send msb addr
SPI0.DATA=((addr&0xff0000)>>16);			// send msb addr

The two first produce unneeded code (eor opcodes).

 

void testAddr(uint32_t addr) {
    SPI0.DATA=(addr>>16);						// send msb addr
 234:	aa 27       	eor	r26, r26
 236:	bb 27       	eor	r27, r27
 238:	80 93 c4 08 	sts	0x08C4, r24	; 0x8008c4 <__TEXT_REGION_LENGTH__+0x7008c4>
 23c:	08 95       	ret

void testAddr(uint32_t addr) {
	SPI0.DATA=((addr>>16)&0xff);				// send msb addr
 234:	aa 27       	eor	r26, r26
 236:	bb 27       	eor	r27, r27
 238:	80 93 c4 08 	sts	0x08C4, r24	; 0x8008c4 <__TEXT_REGION_LENGTH__+0x7008c4>
 23c:	08 95       	ret

void testAddr(uint32_t addr) {
	SPI0.DATA=((addr&0xff0000)>>16);			// send msb addr
 234:	80 93 c4 08 	sts	0x08C4, r24	; 0x8008c4 <__TEXT_REGION_LENGTH__+0x7008c4>
 238:	08 95       	ret

 

So use this form :

SPI0.DATA=((addr&0xff0000)>>16);

 

AVR inside


Koshchi wrote:

unsigned char timestamp_0_start_middle = (unsigned char)((0x1FA46 >> 8) & 0xFF);

Or simply:

unsigned char timestamp_0_start_middle = (unsigned char)(0x1FA46 >> 8);

as the mask becomes redundant when the cast is present.

I find either of these two methods easier to read, although I was going to suggest a union or array method, but that has already been covered.   Thanks Cliff.

 

Jim

 

 

(Possum Lodge oath) Quando omni flunkus, moritati.

"I thought growing old would take longer"

 


doomstar wrote:

The way you write this seems to be important for avr gcc.

 

Even if these 3 lines should be the same :

SPI0.DATA=(addr>>16);						// send msb addr
SPI0.DATA=((addr>>16)&0xff);				// send msb addr
SPI0.DATA=((addr&0xff0000)>>16);			// send msb addr

The two first produce unneeded code (eor opcodes).

 

doomstar wrote:

void testAddr(uint32_t addr) {
    SPI0.DATA=(addr>>16);						// send msb addr
 234:	aa 27       	eor	r26, r26
 236:	bb 27       	eor	r27, r27
 238:	80 93 c4 08 	sts	0x08C4, r24	; 0x8008c4 <__TEXT_REGION_LENGTH__+0x7008c4>
 23c:	08 95       	ret

I found that hard to believe, so i just tried it with the following.

Initially i couldn't get the eor, all 3 were the same, just sts, ret.

Making xxx volatile, however, then I get the same as you, first 2 ways have the eors.

What ? Can't make any sense of it. And why r26, r27 ?

This is with

avr-gcc --version
avr-gcc (GCC) 4.9.2

and -Os

//uint8_t xxx;
volatile uint8_t xxx;

void testf(uint32_t x);
void testf(uint32_t x)
{
    //xxx = x >> 16;
    //xxx = (x >> 16) & 0xff;
    xxx = (x & 0xff0000) >> 16;
}

 


Of course everyone knows this is a 6 YO thread I guess.

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly


Better Late than never !


Post #15 (from 2020) is a valid point, see #17, at least with avr-gcc (GCC) 4.9.2

Maybe newer version doesn't have this anomaly ?

Why changing uint8_t xxx to volatile uint8_t xxx would cause 2 out of 3 of those ways of writing the same thing to introduce eor r26, r26 and eor r27, r27 makes no sense!


At least in the case of avr-gcc, do not try to coax

the compiler into generating the desired assembly.

Write legibly.

If that fails use inline assembly.

Do you really want to check every build to

make sure that the coaxing was successful?

 

I specified avr-gcc because gcc has a mechanism

for connecting inline assembly to auto variables.

So far as I know, no other AVR compiler does.

For other compilers, one might need to resort

to writing an entire function in assembly

Iluvatar is the better part of Valar.


skeeve wrote:

Do you really want to check every build to

make sure that the coaxing was successful?

For years and many apps, I used the union method for coaxing.  But I only checked the first development and not every build.  True, all would need to change porting to 68000.  I never really considered porting any of my AVR8 apps to another platform.

 

 


Last Edited: Thu. Aug 27, 2020 - 11:34 PM

If (known) speed matters, don't use C.

The next compiler release, or just some extra code (so that the use of the registers changes), can change the speed of different parts a lot.

So if speed matters, use ASM.

Last Edited: Sun. Aug 30, 2020 - 04:48 PM

theusch wrote:
For years and many apps, I used the union method for coaxing.  But I only checked the first development and not every build.  True, all would need to change porting to 68000.  I never really considered porting any of my AVR8 apps to another platform.
At one time the union method was invalid C and therefore not guaranteed to work.

The union method has since been legalized, so the result is now implementation-defined.

As it does not involve actual arithmetic on the input,

I'd expect the code to be consistent between builds.
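For anyone uneasy about the union's status, copying the object representation with memcpy is another way to get at the same bytes. The sketch below uses made-up helper names; like the union, it exposes the in-memory byte order, so which index holds the low byte still depends on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of value into out, lowest address first. */
void u32_to_memory_bytes(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Rebuild a uint32_t from bytes laid out in this target's memory order. */
uint32_t memory_bytes_to_u32(const uint8_t in[4])
{
    uint32_t value;
    memcpy(&value, in, sizeof value);
    return value;
}

/* Hypothetical self-check: a round trip is lossless on any byte order. */
void memcpy_selftest(void)
{
    uint8_t raw[4];
    u32_to_memory_bytes(0x1FA46UL, raw);
    assert(memory_bytes_to_u32(raw) == 0x1FA46UL);
}
```

On a little-endian target such as avr-gcc's AVR, raw[0] would hold 0x46, matching the union output shown earlier in the thread.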
