char vs unsigned char in for loops (++ on char objects)

#1

OK, so I stumbled across this, and it seems like a compiler issue. It is at least something that I was not expecting.

What I'm seeing is that when a variable is declared as "char" and used in a for loop, the loop does not work as expected, but when the variable is declared as "unsigned char", or -funsigned-char is used for compiling, it works as expected.

While I might buy that the comparison being done in the for loop is not really valid for type "char", what does not seem valid is that the char variable is updated/incremented as a 16-bit value that then gets passed to a child function; i.e. values larger than 0xff can exist in the char variable and be passed to the child function.

And note that the values larger than 8 bits are not simply sign extension; they are actual 16-bit values.

Here is the tiny test code that shows the issue:

void bar(int c);

void foo(void)
{
        char c;

        for (c = 1; c; c++)
                bar(c);
}

Here is the code when using type "char":

00000000 <foo>:
   0:	cf 93       	push	r28
   2:	df 93       	push	r29
   4:	c0 e0       	ldi	r28, 0x00	; 0
   6:	d0 e0       	ldi	r29, 0x00	; 0
   8:	ce 01       	movw	r24, r28
   a:	0e 94 00 00 	call	0	; 0x0 
   e:	21 96       	adiw	r28, 0x01	; 1
  10:	00 c0       	rjmp	.+0      	; 0x12 <__zero_reg__+0x11>

Here is the code when the type is "unsigned char" for c or when using -funsigned-char on the command line:

00000000 <foo>:
   0:	1f 93       	push	r17
   2:	11 e0       	ldi	r17, 0x01	; 1
   4:	81 2f       	mov	r24, r17
   6:	90 e0       	ldi	r25, 0x00	; 0
   8:	0e 94 00 00 	call	0	; 0x0 
   c:	1f 5f       	subi	r17, 0xFF	; 255
   e:	01 f4       	brne	.+0      	; 0x10 
  10:	1f 91       	pop	r17
  12:	08 95       	ret

The result is very similar when using:

void bar(int c);

void foo(void)
{
        char c;

        for (c = 1; c < 0x0fe; c++)
                bar(c);
}

I stumbled across this accidentally when I mistyped some code while testing an Arduino project, and the only reason I saw it at all is that the Arduino IDE does not turn on -funsigned-char.

Both C and C++ exhibit this behavior.

Note:
Updated the subject, since this really is about
the ++ operator on char types.

--- bill

#2

> what does not seem valid is to update/increment the char
> variable value as a 16 bit value

Then go back to #1, and study the C language, chapter "integer promotion",
and "sign extension".

Rule of thumb: use "char" if you are referring to a printable character,
something that you entered from a keyboard, display it on an LCD, or
pass it to the various str*() functions for processing.

If you actually want a small integer number, #include <stdint.h>, and
use uint8_t or int8_t, depending on whether you want an unsigned or a
signed small integer. If you don't insist on it being exactly 8 bits
wide but just want to use the most efficient type of at least 8 bits,
use uint_least8_t/int_least8_t, or uint_fast8_t/int_fast8_t, depending
on whether you are more interested in optimizing for space or speed,
respectively.

And, never mix those three types, except by a typecast. Even though
"char" is technically equivalent to either "uint8_t" or "int8_t", treat
it as a completely distinct type.
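
For instance, a minimal sketch of the loop from post #1 rewritten this way (bar() as declared there); since unsigned arithmetic wraps by definition, the loop is well defined and stops after bar(255):

#include <stdint.h>

void bar(int c);

void foo(void)
{
        uint8_t c;

        /* uint8_t arithmetic wraps modulo 256, so after bar(255)
           the increment takes c back to 0 and the loop exits */
        for (c = 1; c != 0; c++)
                bar(c);
}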

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.
Please read the `General information...' article before.

#3

Heh. I've been reading ARM documentation. The ALU does 32-bit math ONLY, so math on smaller data types ends up being slower and bigger, as it carefully truncates results to the correct length. Sort of ... odd feeling. Must make the compiler optimization step "fun."

#4

Rollover is only defined for unsigned types.

In general, if no negative numbers are needed, you should use an unsigned type as the loop counter.
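
As a hedged illustration (plain hosted C, not AVR-specific): incrementing past the top of an unsigned type wraps by definition, while storing an out-of-range result back into a signed char is only implementation-defined:

#include <stdio.h>

int main(void)
{
        unsigned char u = 255;
        u++;                    /* defined: wraps modulo 256 to 0 */
        printf("%u\n", u);      /* prints 0 */

        signed char s = 127;
        s++;                    /* 127 + 1 is computed as int, but converting
                                   128 back to signed char is implementation-
                                   defined; gcc happens to wrap it to -128 */
        printf("%d\n", s);      /* -128 with gcc, not guaranteed by C */
        return 0;
}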

Peter

#5

dl8dtl wrote:
> > what does not seem valid is to update/increment the char
> > variable value as a 16 bit value
>
> Then go back to #1, and study the C language, chapter "integer promotion",
> and "sign extension".

I'm very familiar with types and promotion.
(I've been using C on embedded systems for 28 years...)

Ignore the rollover part, and look at the char-to-16-bit promotion on the char variable increment.

The bigger-than-8-bit value being created by what should be sign extension for the 8-to-16-bit promotion does not seem to be correct in this case. Normally when sign extension occurs, the sign bit is propagated into all the upper bits; that is not happening here. The char value 0x80 is passed down to the child function as 0x0080, not 0xff80, and other values like 0xff are passed down as 0x00ff, not 0xffff, which would imply unsigned, or the use of 16 bits for the char type.

In this case it appears to be treating the char as a 16-bit value, and so values larger than 0xff, like 0x0100 and beyond, are also passed down in the 16-bit integer.

It's like the variable type char is sometimes considered to be signed 8 bits and sometimes larger than 8 bits.

Consider this even simpler example:

void bar(int c);

void foo(void)
{
        char c;

        c = 0x0;
        while (1)
                bar(c++);
}

Should this ever pass a value to bar() that is larger than 8 bits, without the full sign extension, i.e. values like 0x0100 etc.? I think it shouldn't, since a char is only 8 bits (sizeof returns 1 for it), but yet it does: it marches right up through every integer value beyond 7 bits, beyond 8 bits, and all the way up.
127, 128, 129, ..... 300,... 1000, ....
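
For reference, a hedged sketch of what the C abstract machine is supposed to do for each bar(c++), step by step (tmp is just an illustrative name):

int tmp = (int)c;     /* integer promotion: always -128..127 for a signed 8-bit char */
bar(tmp);             /* so bar() should only ever see a promoted char value */
c = (char)(tmp + 1);  /* the add happens as int; converting an out-of-range
                         result such as 128 back to a signed char is
                         implementation-defined */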

And as a final example: if the char variable is declared as a global, to force the value to be pushed back into memory and re-fetched as 8 bits, the results are different than when it is declared as a local.

void bar(int c);

char c;

void foo(void)
{
        c = 0x0;
        while (1)
                bar(c++);
}

These two examples produce different "promotion", so bar() receives a different set of values: bar() will get 8-bit sign-extended values with the global declaration, but will get 16-bit values when the variable is declared locally.

To me this seems incorrect, as both are still operating on a "char", which is only 8 bits.

Now, if you change the type to either "unsigned char" or "signed char", the compiler ensures that the char variable value does not extend beyond 8 bits and is promoted to 16 bits the way I am expecting.

======

BTW, while I know that the implementation of char can be somewhat implementation dependent, the above two examples work the same on my Linux machine: they both sign extend the 8-bit value to integers, so the value goes from 127 to -128 at 0x80, then counts from -128 back down to zero and back up again, as I would expect.

--- bill

#6

bperrybap wrote:
> What I'm seeing is that when a variable is declared as "char" and it is used in a for loop, the loop does not work as expected, but when the variable is declared as an "unsigned char" or -funsigned-char is used for compiling, it works as expected.
>
> While I might buy that the comparison being done in the for loop is not really valid for a type "char" what does not seem valid is to update/increment the char variable value as a 16 bit value that then gets passed to a child function. i.e. values larger than 0xff can exist for the char variable value and be passed to the child function.
>
> And note that the values larger than 8 bits are not simply sign extension, they are actual 16 bit values.
>
> Here is the tiny test code that shows the issue:
>
> void bar(int c);
>
> void foo(void)
> {
>         char c;
>
>         for(c = 1; c; c++)
>                 bar(c);
> }

The fundamental problem is that conversion to a signed type is only fully defined over the common range of the types.
When converting to a signed char,
only values from -128..127 are reliable.
For anything outside that range, the result is implementation-defined (or an implementation-defined signal is raised); in effect, demons are allowed to come out your nose.
That is why some of the values passed to bar() could not be obtained by converting from char to int.
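
If you need a guaranteed result when narrowing, one portable option is to do the reduction yourself in unsigned arithmetic before any signed conversion; a minimal sketch (wrap_to_schar is a hypothetical helper, not anything standard):

/* reduce an int to the value a two's-complement signed char would
   hold, using only fully defined unsigned arithmetic */
int wrap_to_schar(int wide)
{
        unsigned u = (unsigned)wide & 0xFFu;    /* modulo reduction: defined */
        return (u > 127u) ? (int)u - 256 : (int)u;
}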

Michael Hennebry
"Religious obligations are absolute." -- Relg

#7

But the fundamental question for me still is: why is a variable declared as a char allowed to take on more than an 8-bit value in some situations? See the second pair of examples above (not the one you quoted, but the examples with the simple while loop).

sizeof reports a size of 1 for a char variable, yet the value can contain more than 8 bits.

The first of those two examples, using the simple while loop, showed a variable declared as a char being incremented as a 16-bit value, with the resulting 16-bit value passed down. The second of the two was the same code but functioned as expected.

So depending on where the variable lives, the promotion and resulting behavior is different. To me that is just plain scary: the same code can treat the data differently when it is promoted from 8 bits to 16 bits, depending on where the variable lives.

And a few more examples:

The following two code fragments generate identical code

void bar(int c);

void foo(void)
{
        char c = 0;

        while (1)
        {
                bar(c++);
        }
}

and:

void bar(int c);

void foo(void)
{
        unsigned int c = 0;

        while (1)
        {
                bar(c++);
        }
}

Both call bar() with an incrementing 16-bit unsigned value. Why should a char variable be allowed to hold more than 8 bits?

But yet, change the declaration of c to be static, or make it a global, or make it an "unsigned char" or a "signed char", and it works as expected, incrementing an 8-bit value and then promoting it to a 16-bit value using sign extension as necessary.

Why should the compiler behave differently in this one case, given that the variable type "char" is the same in all of them?

Also, consider the next two examples:

void bar(int c);

void foo(void)
{
        char c = 0;

        while (1)
        {
                bar(c);
                c += 1;
        }
}

and:

void bar(int c);

void foo(void)
{
        char c = 0;

        while (1)
        {
                bar(c);
                c++;
        }
}

The first one generates what I believe is the correct code, incrementing c as an 8-bit value and then sign extending into the upper 8 bits as necessary, while the second one treats c as an unsigned int, increments it as a 16-bit value, and passes that incremented 16-bit value down to bar().

It seems to me like ++ has an issue with local chars when they are later promoted to ints and passed to functions.
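
For what it's worth, the standard gives both spellings the same abstract semantics on a char, so a hedged sketch of the equivalence would be:

/* both forms perform: promote c to int, add 1, convert the
   result back to char, and store it */
c += 1;         /* equivalent to c = (char)((int)c + 1) */
c++;            /* same stored result; only the value of the
                   expression itself differs (the old c) */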

Or am I totally missing or not understanding something?

--- bill

#8

As danni said, the behaviour of overflow on signed types is undefined; as such, it is permitted for them to increase indefinitely, wrap around, do both at the same time, or crash the program.
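
As an illustration of that latitude (using int, where overflow itself is genuinely undefined; count_up and bar are hypothetical names here), a compiler may treat the loop condition below as always true and emit an infinite loop:

void bar(int i);

/* hypothetical example, just for illustration */
void count_up(void)
{
        int i;

        /* i > 0 can only become false through signed overflow, which is
           undefined behavior, so an optimizer is free to assume the
           condition always holds */
        for (i = 1; i > 0; i++)
                bar(i);
}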

#9

bperrybap wrote:
> But the fundamental question for me still is, why is a variable declared as a char
> allowed to take on more than an 8 bit value?

The nasal demons are not a joke. Nasal demons can make chars do whatever nasal demons want.

If you prefer, your program crashes as soon as it is required to convert 128 into a char. You are observing the wreckage.

Michael Hennebry
"Religious obligations are absolute." -- Relg

#10

Quote:
> In general, if no negative number was needed, you should use unsigned type as loop counter.

I agree - char is not the thing you should do arithmetic with. Besides, its signedness is defined through compiler options; relying on that property is error prone.

AFAIK a sizeof() macro gives the information of the storage size. Depending on the implementation it can return absolutely any value for char, including 8. But when it returns 1, does that mean it can occupy 2 bytes (two 8-bit registers) when it is an auto?

I thought not..

It looks like there is:

    cast c8 to c16, loop(pass c16,c16++)

instead of (what I would expect):

    loop(cast c8 to c16,pass c16,c8++)
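
In C terms, a hedged rendering of the pseudocode above (treating c8 as the char variable and c16 as a hypothetical int copy):

/* what the generated code appears to do: widen once, then count in 16 bits */
int c16 = (int)c8;
for (;;) {
        bar(c16);
        c16++;
}

/* versus the expected ordering: widen on every call, count in 8 bits */
for (;;) {
        bar((int)c8);
        c8++;
}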

No RSTDISBL, no fun!

#11

Brutte wrote:
> AFAIK a sizeof() macro gives the information of the storage size. Depending on the implementation it can return absolutely any value for char, including 8.

No. sizeof(char) is always 1, no matter how many bits a char has.
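
A minimal hosted-C illustration: the size in bytes is pinned to 1 by definition, while the number of bits in that byte is reported by CHAR_BIT from <limits.h>:

#include <limits.h>
#include <stdio.h>

int main(void)
{
        /* sizeof(char) is 1 by definition of the language */
        printf("sizeof(char) = %u\n", (unsigned)sizeof(char));
        /* CHAR_BIT is at least 8; it is exactly 8 on AVR */
        printf("CHAR_BIT = %d\n", CHAR_BIT);
        return 0;
}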

Stefan Ernst
