How to define a 6-byte integer constant?


Hi,

I am using 6 byte integers, which are defined as an array of 6 bytes:

unsigned char i[6];

Now, I would like to assign the lower 6 bytes of an 8 byte precomputed constant to i, e.g.

i= 0x1000000000000ll/19;

Of course this won't work due to different types.
Can I do this with a type cast and if so, what would it look like?

Thanks for your help!


An integer is an integer and a string is a string.

You appear to want to have a string representation of an integer:

integer 12345 -> string "12345"

Bear in mind that -12345 -> string "-12345" which needs 7 bytes of storage for the char[] array.

Yes you can use the pre-processor to create the initialised string. You evaluate the constant expression and then 'stringize' the expression.

You will need to do some experiments because of the rules of how CPP evaluates a pre-process time expression.

But really you will be safer with initialising a constant integer expression. And letting the run-time do the 'ltoa()'.

David.


I guess I interpret the OP's question differently? I thought what he was looking for was something like:

union {
 struct {
   uint8_t i[6];
   uint8_t unused[2];
 };
 uint64_t longlong;
} combined;

combined.longlong = 0x1000000000000ll/19;
// now combined.i[] has the lower 6 bytes

(BTW this works in GCC as it supports anonymous structs but in standard C the struct would need to be named and referenced as combined.structname.i[])
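For reference, a rough sketch of the same union with the inner struct given a name, which is the form plain C99 needs. The names `combined_t`, `parts` and `set_combined` are made up for illustration, and which six bytes show up in `i[]` still depends on the byte order of the target.

```c
#include <stdint.h>

/* Same layout as above, but the inner struct is a named member. */
typedef union {
    struct {
        uint8_t i[6];
        uint8_t unused[2];
    } parts;                 /* hypothetical member name */
    uint64_t longlong;
} combined_t;

/* Store a 64-bit value; parts.i[] then aliases the first six bytes in memory. */
combined_t set_combined(uint64_t value)
{
    combined_t c;
    c.longlong = value;
    return c;
}
```

The bytes are then reached as `c.parts.i[0]` through `c.parts.i[5]`.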



Quote:

An integer is an integer and a string is a string.

True from a high language level view, but a long long integer is just an array of 8 bytes if you look at the machine level.

No, I don't want a string representation of an integer.
If my constant is e.g 0x123456789aLL,
I want

i[0]=0x9a;
i[1]=0x78;
...
i[4]=0x12;
i[5]=0;

It is easy to assign the chars individually like above. But the chars should come from a precomputed 6 byte integer constant, actually from the lower 6 bytes of a precomputed 8 byte integer constant (long long), as 6 byte integer is not a standard type.


I am always horrified when unions are used for kludges.

The standard C language provides automatic casting between integer types.
The standard C library provides functions for converting between arithmetic expressions and strings.

So you should always be able to convert anything you like.

I know that there are various unpleasant 'tricks' for non-portable access via unions. I have no problems with these techniques. But it seems to me that you should do this in the privacy of your own home with consenting adults.

David.


Quote:
I guess I interpret the OP's question differently?

Basically yes, your interpretation is correct, Clawson.

But, can I also do this without the union 6bytes+8bytes?
I want to keep many of those 6-byte integer constants in flash, and I don't want to waste the extra 100x2 bytes, as I'm tight on flash.


So you want to have some 48 bit integers, and have tables of these constants in flash memory.

You should be able to do this with the pre-processor and some macros.

#define U(x)       ((x)&0xFF)
#define BITS_48(x) {U((x)>>0), U((x)>>8), U((x)>>16),\
                    U((x)>>24), U((x)>>32), U((x)>>40)}

typedef unsigned char bum[6];

bum constant = BITS_48(0x1000000000000LL/19);

Untested, but the Compiler should do the work for you.

Before you do some actual maths with these variables, you will need to cast dereferenced pointers to LONGLONG and mask off the bits 48..63 of the result.
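One possible way to get the value back out without casting pointers is to rebuild it byte by byte. This is only a sketch: `load48` is a made-up helper name, and it assumes the bytes were stored least significant first, as the macro above does.

```c
#include <stdint.h>

#define U(x)       ((x) & 0xFF)
#define BITS_48(x) { U((x) >> 0),  U((x) >> 8),  U((x) >> 16), \
                     U((x) >> 24), U((x) >> 32), U((x) >> 40) }

typedef unsigned char bum[6];

/* Hypothetical helper: rebuild the 48-bit value from its little-endian bytes.
   Only shifts and ORs are used, so the result does not depend on the host's
   byte order and nothing is read beyond the six bytes. */
uint64_t load48(const unsigned char *b)
{
    uint64_t v = 0;
    for (int k = 5; k >= 0; k--) {
        v = (v << 8) | b[k];
    }
    return v;
}
```

Because only six bytes are ever touched, no masking of bits 48..63 is needed afterwards.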

I would go mental typing 100 of these constants that use 600 bytes of flash, when regular long longs would only use 800 bytes of flash.

David.

Last Edited: Wed. Apr 21, 2010 - 10:54 AM

This will make David's flesh creep :-)

uint64_t longlong = 0x1000000000000LL/19;
uint8_t * ptr = (uint8_t *) &longlong;
for (uint8_t j=0; j<6; j++) {
 i[j] = *ptr++;
}

But really, why do you need the bytes separated anyway? If you want 48 of 64 bits why not just:

uint64_t longlong = (0x1000000000000LL/19) & 0xFFFFFFFFFFFF;
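The pointer loop above picks up whichever six bytes come first in memory, so it only gives the low 48 bits on a little-endian target. A sketch of a variant that does not depend on byte order follows; `store48` is a made-up name, and the run-time cost of 64-bit shifts on an AVR is not considered here.

```c
#include <stdint.h>

/* Hypothetical helper: write the low 48 bits of value into out[0..5],
   least significant byte first, regardless of the host's byte order. */
void store48(uint8_t out[6], uint64_t value)
{
    for (uint8_t j = 0; j < 6; j++) {
        out[j] = (uint8_t)(value >> (8 * j));
    }
}
```

Called as `store48(i, 0x1000000000000ULL / 19);` it fills the OP's `i[]` array.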

Quote:
But, can I also do this without the union 6bytes+8bytes?

Guess again - it's a union - there are only 8 bytes involved there.


Not quite.

4-byte integers are not long enough for me and 8-byte integers are too long, therefore I have defined my own 6-byte integers and written arithmetic routines in assembler for them.

All I need is a clever way to assign 6 byte precalculated integer constants to them without bothering about the bytes.


Cliff,
I think what the OP actually wants is to initialize 48 bit values in Flash:

unsigned char i[6] PROGMEM = INITIALIZATION;

He is looking for an easy way to use 64 bit values for INITIALIZATION.

David's code is the only possible solution for that I can think of.

Stefan Ernst


No. Actually I do not get upset by this. It is a straightforward case of dereferencing.

Likewise if iphi has :

bum constant = BITS_48(0x1000000000000LL/19); 

uint64_t longlong = (*(uint64_t *)&constant) & 0xFFFFFFFFFFFFLL;

And it would be wise to write a macro for the above abortion.

#define VAL48(x)    ((*(uint64_t *)&(x)) & 0xFFFFFFFFFFFFLL)
...
uint64_t longlong = VAL48(constant);

I think it is the improper use of unions that I do not like. I am quite happy with the 'normal' use of a union:

typedef struct {
    int type;
    union {
        uint64_t ll;
        uint16_t ss;
    }u;
} composite;  

David.


Quote:

I think it is the improper use of unions that I do not like

I'm trying to understand how your example differs from the "improper" one?


clawson wrote:
Quote:

I think it is the improper use of unions that I do not like

I'm trying to understand how your example differs from the "improper" one?

David "recycles" the space for two different variables to be used in disjunct situations. That's the intended use of union as per specification.

Nevertheless, I personally do like the "illegal" function of quasi-typecasting through union as well, even if I am aware of the dangers.

JW


Quote:

David "recycles" the space for two different variables to be used in disjunct situations.

So did I. I defined 8 bytes to be used either as a uint64_t or a struct with a 6 byte and a 2 byte array in it (at least that's what I *thought* I did??).

As two experts have now suggested I did otherwise what am I missing? :oops:


Cliff,
the difference between improper and proper use of a union is the access sequence to the members.

combined.longlong = 0x1000000000000ll/19;
// now combined.i[] has the lower 6 bytes

Writing to one member (longlong) and reading a different member (i) after that is not covered by the standard. After writing to longlong you can only expect a proper value in longlong and in no other member.

David's example suggests a proper use, because the intention of "type" is most likely to hold the information about which member was written last.
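To make the "type remembers the last write" idea concrete, here is a small sketch. The enum `kind_t` and the functions `set_ll`, `set_ss` and `get_ll` are assumptions added for illustration (David's struct uses a plain `int type`); the point is only that a member is read when the tag says it was the one written last.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { HOLDS_LL, HOLDS_SS } kind_t;   /* hypothetical tag values */

typedef struct {
    kind_t type;          /* records which member of u was written last */
    union {
        uint64_t ll;
        uint16_t ss;
    } u;
} composite;

void set_ll(composite *c, uint64_t v) { c->type = HOLDS_LL; c->u.ll = v; }
void set_ss(composite *c, uint16_t v) { c->type = HOLDS_SS; c->u.ss = v; }

/* Read u.ll only if it was the member written last; report failure otherwise. */
bool get_ll(const composite *c, uint64_t *out)
{
    if (c->type != HOLDS_LL) {
        return false;
    }
    *out = c->u.ll;
    return true;
}
```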

Stefan Ernst


You should not assume any particular relationship between the members of union.

You should assume that storing to one member *destroys* the value of the other, i.e. sets it to an undefined value.

And I am no expert >:-< I am THE C-hater.

J.


However, there appears to be controversy over the matter, as "abusing" union for "typecasting" (and alike) is very popular amongst programmers, in spite of being against the original intention and hence non-portable.

See discussion of verse 531 in the Derek Jones book:

Quote:
Accessing the same storage locations using different types depends on undefined and implementation-defined behaviors. The standard only defines the behavior if the member being read from is the same member that was last written to. Performing such operations is often unconditionally recommended against in coding guideline documents.

[...]

If developers do, for whatever reason, want to make use of type punning, is the use of a union type better than the alternatives (usually casting pointers)? When a union type is used, it is usually easy to see which different types are involved by looking at the definitions. When pointers are used, it is usually much harder to obtain a complete list of the types involved (all values, and the casts involved, assigned to the pointer object need to be traced). Union types would appear to make the analysis of the code much easier than if pointer types are used.

JW


long long is only very awkwardly implemented in AVR-GCC.
It does not just cost 2 more bytes of data per value.
You also need several kbytes of code for the long long library, and tons of code on every access.

You can use long long for constant math at compile time.

But you should avoid long long math at runtime.
Otherwise the code grows (more than with float) and the CPU load rises dramatically.

Peter


clawson wrote:
I guess I interpret the OP's question differently? I thought what he was looking for was something like:

union {
 struct {
   uint8_t i[6];
   uint8_t unused[2];
 };
 uint64_t longlong;
} combined;

combined.longlong = 0x1000000000000ll/19;
// now combined.i[] has the lower 6 bytes

(BTW this works in GCC as it supports anonymous structs but in standard C the struct would need to be named and referenced as combined.structname.i[])

The OP wants a 6-byte solution.
This is an 8-byte union.
The OP needs a macro and a compound literal:
#define we_have_six(llong) ((uint48_t){{ \
   (llong),     \
   (llong)>>8,  \
   (llong)>>16, \
   (llong)>>24, \
   (llong)>>32, \
   (llong)>>40  \
}})

typedef struct { unsigned char bytes[6]; } uint48_t;

uint48_t fred;
fred=we_have_six(0x1000000000000ULL/19);

Iluvatar is the better part of Valar.


Quote:

The OP wants a 6-byte solution.

Yes and the i[6] array provides it.

As for not using a union to interpret data in more than one way I've never come across any situation where that does not work. In fact one of the standard solutions postulated here when people want to send the 4 bytes of a float off up the UART is:

union {
  float f;
  uint8_t bytes[4];
} combined;

combined.f = 3.1415926;
for (i=0; i<4; i++) {
  uart_send(combined.bytes[i]);
}

Is there REALLY anyone here who does not think that would send the 4 bytes of the IEEE754 off up the UART channel?

But as noted above it's surely no better or worse than the casting a byte pointer onto the float and looping four times sending *p ?

And yes I know there's an endianness issue in this but it doesn't take very long to find out whether PI is 0x40490FDA or 0xDA0F4940 and once both ends know the byte order they can disassemble/reassemble appropriately.
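For comparison, a sketch of a third route that uses neither a union nor a pointer cast: `memcpy` the float into a same-sized integer and shift the bytes out in a fixed order. The names `float_bits` and `float_to_wire` are made up here, the code assumes a 32-bit float, and least-significant-byte-first is chosen as the wire order purely for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (IEEE 754 single precision on most targets). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy the object representation of f into a 32-bit integer. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Split into bytes with an explicit wire order: least significant byte first. */
void float_to_wire(float f, uint8_t out[4])
{
    uint32_t bits = float_bits(f);
    for (int k = 0; k < 4; k++) {
        out[k] = (uint8_t)(bits >> (8 * k));
    }
}
```

Each element of `out[]` can then be handed to something like `uart_send()`, and the receiver knows the order without having to experiment.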

Cliff


Quote:
Is there REALLY anyone here who does not think that would send the 4 bytes of the IEEE754 off up the UART channel?

Yes. I have transferred data between machines of different endianness. And machines that used different float representation.

So I regard any serialisation or access with care. I generally wrap it in a macro.

You then have no problems between different platforms. And you get any efficiencies that you need when the macro simply does the sort of dereferencing you have posted.

I would guess that anyone who had used big-endian machines is horrified by the cavalier approach of the 'Intel camp'.

However if you want a real war, all you have to do is promote one indentation style over another.

David.


Cliff,

I believe you know very well what's the issue here.

It's just the question of how far on the safe side one wants to stay.

Standards are here to state where the borderline lies; but standards are not made for ever either.

JW


No Jan I truly don't - If I want to interpret a block of data in many ways I'd happily cast a union onto it. I just don't see the difference between doing that and actually instantiating a union, assigning one member set within it then reading out a different interpretation. I just don't see where the problem lies? In experience I've never known such techniques to fail - either the cast or the actual object creation.

I mean this is how the vast majority of TCP/IP works, a union of structs is cast onto the anonymous buffered receive data to interpret in different ways depending on an early type indicating byte that shows which format to use.

Cliff


clawson wrote:
Quote:

The OP wants a 6-byte solution.

Yes and the i[6] array provides it.

As for not using a union to interpret data in more than one way I've never come across any situation where that does not work.

The problem with the union isn't heresy, it's obesity.
The union has 8 bytes.
The OP is planning to put 100 of them in flash
and does not want to waste the 200 bytes.
If the OP just wants initialization and does not need assignment,
he just needs a macro:
#define we_have_six(llong) { \
   (llong),     \
   (llong)>>8,  \
   (llong)>>16, \
   (llong)>>24, \
   (llong)>>32, \
   (llong)>>40  \
}
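As a sketch of how such an initializer macro could fill a whole table at six bytes per entry: the names `SIX_BYTES`, `uint48_bytes` and `k48_table` are invented for this example, the explicit casts are only there to keep truncation warnings quiet, and on an AVR the table would presumably also need `PROGMEM`, which is left out so the snippet builds anywhere.

```c
#include <stdint.h>

/* One byte per initializer, least significant byte first. */
#define SIX_BYTES(llong) {        \
    (uint8_t)((llong) >> 0),      \
    (uint8_t)((llong) >> 8),      \
    (uint8_t)((llong) >> 16),     \
    (uint8_t)((llong) >> 24),     \
    (uint8_t)((llong) >> 32),     \
    (uint8_t)((llong) >> 40)      \
}

typedef uint8_t uint48_bytes[6];   /* hypothetical type name */

/* Two example entries; the 64-bit arithmetic happens at compile time only. */
const uint48_bytes k48_table[] = {
    SIX_BYTES(0x1000000000000ULL / 19),
    SIX_BYTES(0x123456789aULL),
};
```

`sizeof(k48_table)` comes out as 12 here, so 100 entries would take 600 bytes rather than 800.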

Iluvatar is the better part of Valar.


clawson wrote:
No Jan I truly don't - If I want to interpret a block of data in many ways I'd happily cast a union onto it.

As long as the current standard says reading of unwritten member of union is undefined, the unaware programmer will run into troubles. Even worse: he might be using "known good" libraries e.g. with a newer version of a compiler which makes use of the liberty of NOT returning the content of unwritten member of union in some optimisation.

This is very similar to what we see often here: people from other platforms/compilers write the loop delay and find it surprising it won't work.

clawson wrote:
I just don't see the difference between doing that and actually instantiating a union, assigning one member set within it then reading out a different interpretation.
No, there's no difference. Both are broken when looked at it strictly.

clawson wrote:
In experience I've never known such techniques to fail - either the cast or the actual object creation.
Oh, Cliff, in my experience, if in the night I blast through the city streets with my car 150+ (for you, roughly 100+), nothing wrong happens.

clawson wrote:
I mean this is how the vast majority of TCP/IP works, a union of structs is cast onto the anonymous buffered receive data to interpret in different ways depending on an early type indicating byte that shows which format to use.
Yes, the popularity of the technique (and, as quoted above, also a rational view) is what drives the changes in the forthcoming version of C standard (Derek Jones is AFAIK member of THE committee). I personally do use it often, too.

However, the present status quo is, that these techniques are illegal and thus prone to surprising failure.

JW


Quote:

However, the present status quo is, that these techniques are illegal and thus prone to surprising failure.

That'd be a bit sad for the Linux kernel - it uses the union of structs technique all over the place!

But how could it "go wrong" anyway? union{} surely means "these are different data interpretations that all live at the same address". So it's not like one variant in the union is at a different place. So it then comes down to data ordering. But again, apart from struct{} packing/alignment issues I've never known a compiler that didn't order struct{} elements in the order defined. Is THAT what the issue is here? That there's no actual GUARANTEE that struct{} members will have a known order? I suppose that for efficient packing a compiler might choose to group all the uint8_t's up at one end or similar?

Cliff


clawson wrote:
Quote:

However, the present status quo is, that these techniques are illegal and thus prone to surprising failure.

That'd be a bit sad for the Linux kernel - it uses the union of structs technique all over the place!

That's quite a specific situation: it is not going to be ported to a different compiler, is it? I'd say there are much more gcc-specific nonportabilities there, isn't it?

clawson wrote:
But how could it "go wrong" anyway? union{} surely means "these are different data interpretations that all live at the same address".
No. It says, "these different variables MAY live at the same address". There are some restrictions to this, say they HAVE to start at the same address (I am lazy to look up the exact verse now so don't take me literally), but apart from this, the compiler is mostly free to scatter it all around the memory at its will.

clawson wrote:
So it's not like one variant in the union is at a different place. So it then comes down to data ordering. But again, apart from struct{} packing/alignment issues [...]
... which may get grave enough in themselves ...

clawson wrote:
I've never known a compiler that didn't order struct{} elements in the order defined.
Nono, the standard does not allow swapping the order of struct members; but alignment issues, together with bitfields and the like, may throw things pretty far apart.

clawson wrote:
Is THAT what the issue is here? That there's no actual GUARANTEE that struct{} members will have a known order?

No, it's worse than that. It's my poor English: I've spelled it out above. The compiler is free to return you anything if you try to read a member of a union which is not the last one written. Say

union {
  uint8_t a, b;
}u;

u.a = 5;
/* some stuff here, not influencing u, but making the compiler flush "5" from registers */
printf("I don't use printf at all so I don't know the formatting strings", u.b);

The compiler (optimiser) is free to print WHATEVER, as u.b 's content is undefined at that point, and the most optimal is NOT to reload from memory anything before calling printf.

One day I - as the ultimate C hater - should write a malicious C compiler, which would be 100% ANSI/ISO/MISRA/whatever conformant, yet deliberately throw mess whenever an undefined feature is used in the program... :-)

JW


clawson wrote:
But how could it "go wrong" anyway?
It could go wrong if the compiler decides to use it for optimization. If member A is read while the last write went to member B, then the compiler could decide to eliminate the read and return some garbage. (*)

clawson wrote:
union{} surely means "these are different data interpretations that all live at the same address".
Sorry, but exactly that is the problem. It does NOT mean "different data interpretations", it means "different data at different time at the same address".

(*) Of course I don't really expect that to happen, because union abuse is so extremely popular. ;-)

Stefan Ernst


sternst wrote:
clawson wrote:
union{} surely means "these are different data interpretations that all live at the same address".
Sorry, but exactly that is the problem. It does NOT mean "different data interpretations", it means "different data at different time at the same address".
Even less than that: "different data at different time MAY BE at the same address (with some funny and not really useful restrictions)"

JW


wek wrote:
Even less than that: "different data at different time MAY BE at the same address (with some funny and not really useful restrictions)"
I think you are wrong here. The standard says:
Quote:
All pointers to members of the same union object compare equal.

Stefan Ernst


sternst wrote:
wek wrote:
Even less than that: "different data at different time MAY BE at the same address (with some funny and not really useful restrictions)"
I think you are wrong here. The standard says:
Quote:
All pointers to members of the same union object compare equal.
Yes; but if the members of union are structs (which used quite often) or even arrays, the "nested" members may be placed at the compiler's wish, thanks to the "alignment" relaxations.

Thus, while the first member of struct/array in union must have in common at least some part of it with other members of the same union, all other may be completely elsewhere.

This is a deliberately twisted interpretation of the standard, to bring out the point.

JW


wek wrote:
Yes; but if the members of union are structs (which used quite often) or even arrays, the "nested" members may be placed at the compiler's wish, thanks to the "alignment" relaxations.
The question was whether all members of a union are at the same address or not. A "nested" member is not a member of the union. Only the struct/array as a whole is the union member. So I still say: all members of a union are at the same address.

And I found an additional quote in the standard to support it:

Quote:
A pointer to a union object, suitably converted,
points to each of its members

Stefan Ernst


wek wrote:
Yes; but if the members of union are structs (which used quite often) or even arrays, the "nested" members may be placed at the compiler's wish, thanks to the "alignment" relaxations.

Thus, while the first member of struct/array in union must have in common at least some part of it with other members of the same union, all other may be completely elsewhere

Not quite... structure members and array elements must be allocated sequentially in memory. While structure member order is not defined, it is left to the implementation, array element order is guaranteed. The only thing is that padding may be introduced to maintain alignment to support the underlying architecture. So members cannot be somewhere else entirely, they will fall in subsequent bytes in memory. The size of the union will be the size of the largest object it contains. [it is not "may" occupy the same space... it is MUST occupy the same space]

So while the spec does not explicitly say that union members may be used for interpreting the same data in different ways, it implies it (call it a side effect) by requiring that each union member occupy the same memory space. The technique is not portable, as it is up to the implementation/architecture as to what order the bits/bytes are stored in, and how much padding [if any] is used.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


glitch wrote:
The only thing is that padding may be introduced to maintain alignment to support the underlying architecture. So members cannot be somewhere else entirely, they will fall in subsequent bytes in memory.

My malicious ANSI-C compiler will do the following:

union {
  struct {
    uint8_t a;
    uint16_t b;
  } s1;
  struct {
    uint8_t a;
// my malicious compiler inserts 2 byte "padding" here
    uint8_t b[2];
  } s2;
  uint8_t a[3]; // my malicious compiler inserts 4 byte padding between each element of this array
} u;

This is fully conforming, as there is absolutely no restriction on padding; yet only the first byte of the members overlaps.

But, I repeat, the *real* devil is elsewhere, namely in 6.2.6.1/7: writing to member of union causes other members of the same union to take undefined value. This is completely regardless of what is the actual layout of the members.

So, this is not only nonportable, but also inherently dangerous, strictly conforming to standard.

JW


Quote:

// my malicious compiler inserts 2 byte "padding" here

Even if you use -fpacked-struct or whatever its option to force that is? I'd expect padding for union elements that are not as long as the longest element to then be at the end.


If your compiler is truly adding the padding you are saying, that has got to be the worst compiler implementation I have ever seen. I can understand 16 or 32bit alignment on members, but adding 32bits of spacing between each array element is just plain silly. [actually I'll go even further and say the compiler is non-conforming due to the padding in the array, unless uint8_t actually returns a size of 5 bytes, which would also be non-conforming]

As for 6.2.6.1/7 it may be "undefined" in terms of the text, in that they are not defining a specific result. But it is an implementation defined result. So as long as you know the implementation you can safely do overlapping structures to access the same data in different ways.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


clawson wrote:
Quote:

// my malicious compiler inserts 2 byte "padding" here

Even if you use -fpacked-struct or whatever its option to force that is? I'd expect padding for union elements that are not as long as the longest element to then be at the end.
Hah! Where did you see a -fpacked-struct in the standard? :-)

I said, 100% conforming and 100% malicious!

Just for starter, have a look at Annex J - the list of unspecified, undefined and implementation-defined behaviour runs two dozens of pages long! So easy to cater for the hatred... ;-)

JW


Not conforming...

There will never be padding in between elements of an array. That is specifically not allowed.

The C99 standard says this: "An array type describes a contiguously allocated nonempty set of objects...".

Whereas, a structure is "sequentially", not "contiguously" allocated, thus allowing for padding between members [or after so that the next object starts on an alignment boundary].

And since the data type is uint8_t, its size will be exactly 1 byte [8 bits] no additional padding bits are allowed for this type. [also specifically stated in the spec]
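The array and `uint8_t` guarantees can be checked at compile time, while struct padding can only be measured. A small sketch, assuming a C11 compiler for `_Static_assert`; the struct name `s1` follows the earlier example and `s1_padding` is a made-up helper.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

struct s1 {
    uint8_t  a;
    uint16_t b;
};

/* These hold on any implementation that provides uint8_t at all. */
_Static_assert(CHAR_BIT == 8, "uint8_t implies 8-bit bytes");
_Static_assert(sizeof(uint8_t) == 1, "uint8_t occupies exactly one byte");
_Static_assert(sizeof(uint8_t[3]) == 3 * sizeof(uint8_t),
               "no padding between array elements");

/* Padding bytes the implementation placed between s1.a and s1.b.
   The amount is up to the implementation: typically 1 on a 32-bit or
   64-bit target and 0 on an 8-bit AVR. */
size_t s1_padding(void)
{
    return offsetof(struct s1, b) - sizeof(uint8_t);
}
```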

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


glitch wrote:
If your compiler is truly adding the padding you are saying, that has got to be the worst compiler implementation I have ever seen.

Sure. Deliberately. Isn't that the definition of malice, be deliberately bad?

That union can be used for "typecasting" is an accident rather than intention (intention was to share memory between variables of disjunct functions with no mutual relationship); unfortunately it's also a quite good idea and it spread. This is exactly in the hackish/viral spirit of C and *nix: [further rants self-censored as I don't want to be kicked out off this forum today].

But, jokes aside, the malicious compiler would do exactly the same job (just with other means) as xLint-s and MISRA and other "code checkers": find the weak points where the programmer's intention could potentially be misunderstood by the compiler(s).

glitch wrote:
[actually I'll go even further and say the compiler is non-conforming due to the padding in the array, unless uint8_t actually returns a size of 5 bytes, which would also be non-conforming]
Yeah, maybe. I am not sure uint8_t could not be 5-bytes but I am also not that sure with requirements on [non-]padding arrays [edit read your last post later]. Structs surely can be padded at free will of the compiler, though, so I can replace the array with a struct of 3 bytes [edit or any other type I would be allowed to pad at my will] easily.

glitch wrote:
As for 6.2.6.1/7 it may be "undefined" in terms of the text, in that they are not defining a specific result. But it is an implementation defined result.

Not at all. The standard is very, very clear in defining, distinguishing and using the terms "undefined", "unspecified" and "implementation defined"; see 3.4. This item is definitively "unspecified" (also listed in annex J1).

glitch wrote:
So as long as you know the implementation you can safely do overlapping structures to access the same data in different ways.

The implementation MAY assure you of how the unspecified and (maybe to a lesser extent) undefined items behave, but there's no obligation for it to do so (as there is for the "implementation defined" items, see 4/8). (Just a note, 4/5 defines a "strictly conforming program" as one which does not depend on unspecified etc., so the standard also discourages going this way.) At the end of the day, it may be a matter of luck: a new platform or a new compiler with no such reassurance (which I have never seen written down, btw. - anybody willing to scan through gcc's documentation?), and you might have to completely rewrite your personal set of libraries.

I repeat, I do like this feature, I do use it a lot, and apparently this will make it in some form to the new standard. But the present status is, that using union for this purpose *is* dangerous.

JW


clawson wrote:

That'd be a bit sad for the Linux kernel - it uses the union of structs technique all over the place!

I do not think that I have looked at Linux kernel code since the days of Minix-386.

Please can you show one example.

Linux will run on several different architectures. And I would not be surprised if you can use both BIG-END ARM and LITTLE-END ARM versions.

So it would be very unwise to ignore endian issues. As I said earlier, there is no problem with unions for storing different objects. But there is always a 'type' member that determines what sort of object is currently stored in the union.

There is no problem with taking advantage of a particular memory layout via a function or function-like macro. The macros that I posted earlier would be conditional to the platform and architecture.

David.


David,

OK this one looks like a case in point. It comes from linux-2.6.33.2/net/wireless where core.h contains:

enum cfg80211_event_type {
	EVENT_CONNECT_RESULT,
	EVENT_ROAMED,
	EVENT_DISCONNECTED,
	EVENT_IBSS_JOINED,
};

struct cfg80211_event {
	struct list_head list;
	enum cfg80211_event_type type;

	union {
		struct {
			u8 bssid[ETH_ALEN];
			const u8 *req_ie;
			const u8 *resp_ie;
			size_t req_ie_len;
			size_t resp_ie_len;
			u16 status;
		} cr;
		struct {
			u8 bssid[ETH_ALEN];
			const u8 *req_ie;
			const u8 *resp_ie;
			size_t req_ie_len;
			size_t resp_ie_len;
		} rm;
		struct {
			const u8 *ie;
			size_t ie_len;
			u16 reason;
		} dc;
		struct {
			u8 bssid[ETH_ALEN];
		} ij;
	};
};

In this struct the early entry 'type' is an enum that dictates which of four interpretations (cr, rm, dc or ij) is to be used. The use of this union is then:

static void cfg80211_process_wdev_events(struct wireless_dev *wdev)
{
	struct cfg80211_event *ev;
	unsigned long flags;
	const u8 *bssid = NULL;

	spin_lock_irqsave(&wdev->event_lock, flags);
	while (!list_empty(&wdev->event_list)) {
		ev = list_first_entry(&wdev->event_list,
				      struct cfg80211_event, list);
		list_del(&ev->list);
		spin_unlock_irqrestore(&wdev->event_lock, flags);

		wdev_lock(wdev);
		switch (ev->type) {
		case EVENT_CONNECT_RESULT:
			if (!is_zero_ether_addr(ev->cr.bssid))
				bssid = ev->cr.bssid;
			__cfg80211_connect_result(
				wdev->netdev, bssid,
				ev->cr.req_ie, ev->cr.req_ie_len,
				ev->cr.resp_ie, ev->cr.resp_ie_len,
				ev->cr.status,
				ev->cr.status == WLAN_STATUS_SUCCESS,
				NULL);
			break;
		case EVENT_ROAMED:
			__cfg80211_roamed(wdev, ev->rm.bssid,
					  ev->rm.req_ie, ev->rm.req_ie_len,
					  ev->rm.resp_ie, ev->rm.resp_ie_len);
			break;
		case EVENT_DISCONNECTED:
			__cfg80211_disconnected(wdev->netdev,
						ev->dc.ie, ev->dc.ie_len,
						ev->dc.reason, true);
			break;
		case EVENT_IBSS_JOINED:
			__cfg80211_ibss_joined(wdev->netdev, ev->ij.bssid);
			break;
		}
		wdev_unlock(wdev);

Here it is processing wireless device events. The routine picks up the next generic event packet from a linked list with list_first_entry() and then, depending on the value of ev->type, handles the retrieved packet in different ways.
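The same pattern can be boiled down to a few lines. This is only a minimal sketch of a tagged union, not kernel code: the names event_type, struct event and event_summary() are made up for illustration, and the union is given a name (u) so that it compiles as plain C99 without anonymous members.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical event kinds, loosely modelled on the kernel excerpt above. */
enum event_type { EVENT_JOINED, EVENT_DISCONNECTED };

struct event {
	enum event_type type;            /* tag: says which member is valid */
	union {
		struct { uint8_t bssid[6]; } ij;               /* EVENT_JOINED */
		struct { size_t ie_len; uint16_t reason; } dc; /* EVENT_DISCONNECTED */
	} u;                             /* named, so this is plain C99 */
};

/* Reads only the union member that the tag says was stored. */
unsigned event_summary(const struct event *ev)
{
	switch (ev->type) {
	case EVENT_JOINED:
		return ev->u.ij.bssid[5];    /* last byte of the address */
	case EVENT_DISCONNECTED:
		return ev->u.dc.reason;      /* reason code */
	}
	return 0;                        /* unknown tag */
}
```

A caller sets type together with the matching member (for example type = EVENT_DISCONNECTED and u.dc.reason = 3) and never reads a member other than the one selected by type.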

Cliff

Cliff,

This looks completely respectable to me. The enum determines the type of object. The subsequent code operates on the object in a manner relevant to that object.

If you can find a case where it uses say ev->dc.xxxx for an EVENT_ROAMED then I would be really worried.

A cupboard is often used for storing cups. It could equally well store apples. As long as you do not try eating a cup or pouring tea into an apple, life gets along just fine.

David.

This is the case of recycling the same space for variables with completely disjoint uses.

In other words, a textbook example of the originally intended use of a union...

JW

But this is effectively just casting a differing interpretation onto anonymous bytes - I thought this was one of the things you guys were moaning about?

Strictly speaking, if the union was written through the same element that is now being read, then that's OK.

If it was written through some other element (a pure array of bytes), that's broken.

If it was written through a pointer to byte, then that's broken, too; but that's just another of the many fundamental flaws in both the ... ehm ... design of the language and its common usage.

Still wonder why I hate it? :-)

JW

Last Edited: Thu. Apr 22, 2010 - 12:20 PM

Quote:
But this is effectively just casting a differing interpretation onto anonymous bytes - I thought this was one of the things you guys were moaning about?

No. This is what a union is designed for: i.e. recycling the same space for variables with completely disjoint uses.

My objection is when you store something as a widget and then use it as a gobbledygook.

For example you store a 'long' and then access it as a pointer or a char or something completely different.

I know and you know that there may be some efficiency gained by inspecting bits 24..31 directly in memory. I also know that it is fraught with danger.
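To make the trade-off concrete, here is a small sketch with two made-up helper functions. The first gets bits 24..31 by shifting the value, which gives the same answer on any byte order. The second looks at the bytes in memory and is only right where the assumption in its comment holds (a little-endian target such as the AVR).

```c
#include <stdint.h>
#include <string.h>

/* Portable: the shift works on the value, so byte order is irrelevant. */
uint8_t bits_24_31_by_shift(uint32_t v)
{
	return (uint8_t)(v >> 24);
}

/* Layout-dependent: reads one byte of the object representation.
 * Index 3 is only correct on a little-endian target (assumption). */
uint8_t bits_24_31_by_memory_le(uint32_t v)
{
	uint8_t bytes[sizeof v];

	memcpy(bytes, &v, sizeof v);
	return bytes[3];
}
```

For 0x12345678 the shift version always returns 0x12. The memory version returns 0x12 on a little-endian machine and 0x78 on a big-endian one, which is exactly the danger.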

David.

Quote:
Please can you show one example.
Example

david.prentice wrote:
So it would be very unwise to ignore endian issues. As I said earlier, there is no problem with unions for storing different objects. But there is always a 'type' member that determines what sort of object is currently stored in the union.
My solution does not require knowledge of the compiler's endianness.
The "solutions" involving unions do not work because
they do not produce the constant expressions needed
for initialization of global and static variables.
Unions have their uses, but this is not one of them.

Iluvatar is the better part of Valar.

I think that we produced a storage solution about 40 messages ago. The constant was placed in memory in little-endian style.

The OP actually has his own 48 bit functions and has full control of how he accesses these constants.

As far as I understand, he is not trying to cast to uint64_t's or to serialise away from the AVR. So our diversion into discussions about use/abuse of unions is not really relevant.

My macro for storage should work with any Compiler that supports 'long long' expressions. It is always little-endian. The OP would have to follow the same style.
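For readers who skipped the earlier pages, an initialiser macro in this style could look roughly like the sketch below. It is not a copy of the macro posted earlier: the name INT48_LE and the variable one_nineteenth are invented here, and it assumes a compiler with <stdint.h> and 64-bit integer constant expressions. Each element is a constant expression, so it can initialise global and static arrays, and the byte order in memory is fixed by the macro rather than by the compiler.

```c
#include <stdint.h>

/* Hypothetical helper: expands to six constant byte expressions,
 * least significant byte first, whatever the target's endianness. */
#define INT48_LE(x)                          \
	{ (uint8_t)((uint64_t)(x) >>  0),        \
	  (uint8_t)((uint64_t)(x) >>  8),        \
	  (uint8_t)((uint64_t)(x) >> 16),        \
	  (uint8_t)((uint64_t)(x) >> 24),        \
	  (uint8_t)((uint64_t)(x) >> 32),        \
	  (uint8_t)((uint64_t)(x) >> 40) }

/* The constant from the first post, stored as a 6-byte little-endian array. */
const uint8_t one_nineteenth[6] = INT48_LE(0x1000000000000ll / 19);
```

With the example value 0x123456789aLL from earlier in the thread, element [0] comes out as 0x9a and element [4] as 0x12.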

David.

david.prentice wrote:
I think that we produced a storage solution about 40 messages ago. The constant was placed in memory in little-endian style.
Oops. Don't know how I missed it.
Though I did provide a mechanism for doing assignments,
the OP seems to be interested exclusively in initialization.
Quote:
The OP actually has his own 48 bit functions and has full control of how he accesses these constants.

As far as I understand, he is not trying to cast to uint64_t's or to serialise away from the AVR. So our diversion into discussions about use/abuse of unions is not really relevant.

My macro for storage should work with any Compiler that supports 'long long' expressions. It is always little-endian. The OP would have to follow the same style.

Iluvatar is the better part of Valar.

sternst wrote:
Cliff,
the difference between improper and proper use of a union is the access sequence to the members.

combined.longlong = 0x1000000000000ll/19;
// now combined.i[] has the lower 6 bytes

Writing to one member (longlong) and reading a different member (i) after that is not covered by the standard. After writing to longlong you can only expect a proper value in longlong and in no other member.

There is a GCC optimisation, enabled by

-fstrict-aliasing

(and switched off by -fno-strict-aliasing), which allows GCC to re-order or remove reads and writes to variables, or to put variables into different registers, if they have different types, even if they refer to the same memory location.

I haven't seen GCC doing this to C code yet, but the C++ compiler does.

The GCC extension for unions is a specific exception to this optimisation, allowing old-style behaviour, but only if you access the memory through the union. That is, not by casting a pointer.
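As a rough illustration (the function names are invented for this sketch), the two variants below copy the first six bytes of a 64-bit value as they lie in memory. The first goes through a union member, which relies on the GCC behaviour just described. The second uses memcpy(), which is well defined in standard C and does not depend on any aliasing option. Neither one fixes the byte order: the six bytes are the low six only on a little-endian target.

```c
#include <stdint.h>
#include <string.h>

/* Type punning through a union: relies on the compiler (here GCC)
 * permitting reads of a member other than the one last written. */
void first6_via_union(uint64_t v, uint8_t out[6])
{
	union { uint64_t whole; uint8_t bytes[8]; } u;

	u.whole = v;
	memcpy(out, u.bytes, 6);     /* first six bytes in memory order */
}

/* Copying the object representation with memcpy(): standard C. */
void first6_via_memcpy(uint64_t v, uint8_t out[6])
{
	memcpy(out, &v, 6);          /* same six bytes, no union needed */
}
```

Both functions produce identical output on a given machine; the difference is only in which rules they lean on.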