AVR Freaks Forum Index

bperrybap
PostPosted: Sep 16, 2011 - 02:54 AM
Resident


Joined: May 03, 2009
Posts: 564
Location: Dallas, TX USA

Ok so I stumbled across this and it seems like a compiler issue. It is at least something that I was not expecting.

What I'm seeing is that when a variable is declared as "char" and it is used in a for loop, the loop does not work as expected, but when the variable is declared as an "unsigned char" or -funsigned-char is used for compiling, it works as expected.

While I might buy that the comparison being done in the for loop is not really valid for a type "char" what does not seem valid is to update/increment the char variable value as a 16 bit value that then gets passed to a child function. i.e. values larger than 0xff can exist for the char variable value and be passed to the child function.

And note that the values larger than 8 bits are not simply sign extension, they are actual 16 bit values.

Here is the tiny test code that shows the issue:

Code:
void bar(int c);

void foo(void)
{
char c;

        for(c = 1; c; c++)
                bar(c);
}



Here is the code when using type "char":

Code:
00000000 <foo>:
   0:   cf 93          push   r28
   2:   df 93          push   r29
   4:   c0 e0          ldi   r28, 0x00   ; 0
   6:   d0 e0          ldi   r29, 0x00   ; 0
   8:   ce 01          movw   r24, r28
   a:   0e 94 00 00    call   0   ; 0x0 <foo>
   e:   21 96          adiw   r28, 0x01   ; 1
  10:   00 c0          rjmp   .+0         ; 0x12 <__zero_reg__+0x11>


Here is the code when the type is "unsigned char" for c or when using -funsigned-char on the command line:
Code:

00000000 <foo>:
   0:   1f 93          push   r17
   2:   11 e0          ldi   r17, 0x01   ; 1
   4:   81 2f          mov   r24, r17
   6:   90 e0          ldi   r25, 0x00   ; 0
   8:   0e 94 00 00    call   0   ; 0x0 <foo>
   c:   1f 5f          subi   r17, 0xFF   ; 255
   e:   01 f4          brne   .+0         ; 0x10 <foo+0x10>
  10:   1f 91          pop   r17
  12:   08 95          ret


The result is very similar when using:
Code:
void bar(int c);

void foo(void)
{
char c;

        for(c = 1; c < 0x0fe; c++)
                bar(c);
}


I stumbled across this accidentally when I mistyped some code while testing an Arduino project. The only reason I saw it at all is that the Arduino IDE does not turn on -funsigned-char.

Both C and C++ exhibit this behavior.

Note:
Updated to change subject since it really is about
the ++ operator on char types.

--- bill


Last edited by bperrybap on Sep 18, 2011 - 08:29 PM; edited 1 time in total
 
dl8dtl
PostPosted: Sep 16, 2011 - 08:48 AM
Raving lunatic


Joined: Dec 20, 2002
Posts: 7365
Location: Dresden, Germany

> what does not seem valid is to update/increment the char
> variable value as a 16 bit value

Then go back to #1, and study the C language, chapter "integer promotion",
and "sign extension".

Rule of thumb: use "char" if you are referring to a printable character,
something that you entered from a keyboard, display it on an LCD, or
pass it to the various str*() functions for processing.

If you actually want a small integer number, #include <stdint.h>, and
use uint8_t or int8_t, depending on whether you want an unsigned or a
signed small integer. If you don't insist on it being exactly 8 bits
wide but just want to use the most efficient type of at least 8 bits,
use uint_least8_t/int_least8_t, or uint_fast8_t/int_fast8_t, depending
on whether you are more interested in optimizing for space or speed,
respectively.

And never mix those three types, except by a typecast. Even though
"char" is technically equivalent to either "uint8_t" or "int8_t", treat
it as a completely distinct type.
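
A minimal sketch of that advice, assuming a C99 toolchain that provides <stdint.h>. The helper name sum_bytes is made up for illustration. The data uses an exact-width type, the counter uses a "fast" type, and plain char is left for text:

```c
#include <stdint.h>

/* Hypothetical helper: sum a buffer of small unsigned integers.
 * uint8_t is exactly 8 bits wide; uint_fast8_t is "at least 8 bits,
 * whatever is fastest", which suits a counter that never exceeds 255. */
uint16_t sum_bytes(const uint8_t *data, uint_fast8_t count)
{
    uint16_t total = 0;
    uint_fast8_t i;

    for (i = 0; i < count; i++)
        total = (uint16_t)(total + data[i]);
    return total;
}
```

Called as sum_bytes(buf, 3) on a three-element uint8_t array, it returns the sum of the elements reduced to 16 bits.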

_________________
Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.
Please read the `General information...' article before.
 
westfw
PostPosted: Sep 16, 2011 - 05:32 PM
Posting Freak


Joined: Jun 19, 2002
Posts: 1321
Location: SF Bay area

Heh. I've been reading ARM documentation. The ALU does 32bit math ONLY, so math on smaller data types ends up being slower and bigger as it carefully truncates results to the correct length. Sort of ... odd feeling. Must make the compiler optimization step "fun."
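
A small sketch of what that truncation looks like at the C level (the function name add8 is only for illustration). The addition itself is done in int after the integer promotions, and the narrowing back to 8 bits is a separate step:

```c
#include <stdint.h>

/* a + b is evaluated as int after the integer promotions, so the
 * intermediate sum can exceed 255. The cast back to uint8_t reduces
 * it modulo 256, which is well defined for unsigned types. */
uint8_t add8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

For example, add8(200, 100) computes 300 as an int and then narrows it to 44.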
 
danni
PostPosted: Sep 16, 2011 - 05:56 PM
Raving lunatic


Joined: Sep 05, 2001
Posts: 2586


Rollover is only defined for unsigned types.

In general, if no negative numbers are needed, you should use an unsigned type as the loop counter.
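
As a sketch of the unsigned case, here is the loop from the first post with an unsigned char counter, wrapped in a made-up function that counts its iterations. It assumes an 8-bit char, as on AVR and typical desktop hosts:

```c
/* Counts how many times the loop body runs when an unsigned char
 * counter starts at 1 and the loop ends once it wraps to 0. */
unsigned int count_until_wrap(void)
{
    unsigned int iterations = 0;
    unsigned char c;

    for (c = 1; c; c++)   /* UCHAR_MAX + 1 wraps to 0 for unsigned char */
        iterations++;
    return iterations;
}
```

With an 8-bit char the body runs for c = 1 through 255 and then the loop terminates.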


Peter
 
bperrybap
PostPosted: Sep 16, 2011 - 09:34 PM
Resident


Joined: May 03, 2009
Posts: 564
Location: Dallas, TX USA

dl8dtl wrote:
> what does not seem valid is to update/increment the char
> variable value as a 16 bit value

Then go back to #1, and study the C language, chapter "integer promotion",
and "sign extension".


I'm very familiar with types and promotion.
(I've been using C on embedded systems for 28 years...)

Ignore the rollover part, and look at the char to 16 bit promotion on the char variable increment.

The resulting bigger than
8 bit value that is being created by what should
be sign extension for the 8 to 16 bit promotion does not seem to be correct in this case.

Normally when sign extension occurs, the sign bit is propagated to all the upper bits to the left.
This is not happening in this case.
The char value 0x80, is passed down to the child function as 0x0080 not 0xff80
and other values like 0xff are passed down to the child as 0x00ff not 0xffff
which would imply unsigned or the use of 16 bits for the char type.
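
To make the expected bit patterns concrete, here is a host-side sketch using explicit fixed-width types (the function names are invented for the example). It shows what 8-to-16-bit widening gives for a signed and an unsigned source:

```c
#include <stdint.h>

/* Widen an 8-bit value to 16 bits and return the raw bit pattern. */
uint16_t widen_signed(int8_t v)
{
    return (uint16_t)(int16_t)v;   /* the sign bit is copied into the upper byte */
}

uint16_t widen_unsigned(uint8_t v)
{
    return (uint16_t)v;            /* the upper byte is filled with zeros */
}
```

widen_signed(-128) yields 0xff80 and widen_signed(-1) yields 0xffff, while widen_unsigned(0x80) yields 0x0080.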

In this case it appears to be treating the char as a 16 bit value.

And so values larger than 0xff like 0x0100 and beyond are
also passed down in the 16 bit integer.


It's like the variable type char is sometimes considered to be signed 8 bits and sometimes larger than 8 bits.

Consider this even simpler example:
Code:
void bar(int c);   /* defined elsewhere */

void foo(void)
{
   char c;

   c = 0x0;

   while(1)
      bar(c++);
}


Should this ever pass a value to bar() that is
larger than 8 bits, without the full sign extension,
i.e. values like 0x0100? I think it shouldn't, since a char is only 8 bits (sizeof returns 1 for it), and yet
it does. It marches right up through every integer
value beyond 7 bits, beyond 8 bits and all the way up:
127, 128, 129, ... 300, ... 1000, ...


And as a final example if the char variable is declared
as a global value to require
pushing the value back into memory and re-fetching it
as 8 bits, the results are different than if declared
as a local.

Code:
void bar(int c);   /* defined elsewhere */

char c;

void foo(void)
{
   c = 0x0;

   while(1)
      bar(c++);
}



These two examples produce different "promotion"
so bar() receives a different set of values.
bar() will get 8 bit sign extended values with the
global declaration, but will get 16 bit values
when the variable is declared locally.

To me, this seems incorrect as both are still
operating on a "char" which is only 8 bits.

Now if you change the type to either "unsigned char" or "signed char",
the compiler will ensure that the char variable value does not extend beyond 8 bits and is promoted to 16 bits the way I am expecting.

======

BTW, while I know that the implementation of char can be somewhat implementation dependent, the above two examples work the same on my Linux machine. They both sign extend the 8 bit value to an int, so the value goes from 127 to -128 at 0x80
and then counts from -128 back up to zero and onward again, as I would expect.

--- bill
 
skeeve
PostPosted: Sep 17, 2011 - 12:19 AM
Raving lunatic


Joined: Oct 29, 2006
Posts: 3041


bperrybap wrote:
What I'm seeing is that when a variable is declared as "char" and it is used in a for loop, the loop does not work as expected, but when the variable is declared as an "unsigned char" or -funsigned-char is used for compiling, it works as expected.

While I might buy that the comparison being done in the for loop is not really valid for a type "char" what does not seem valid is to update/increment the char variable value as a 16 bit value that then gets passed to a child function. i.e. values larger than 0xff can exist for the char variable value and be passed to the child function.

And note that the values larger than 8 bits are not simply sign extension, they are actual 16 bit values.

Here is the tiny test code that shows the issue:

Code:
void bar(int c);

void foo(void)
{
char c;

        for(c = 1; c; c++)
                bar(c);
}
The fundamental problem is that conversion to a signed type is only defined over the common range of the types.
When converting to a signed char,
only values from -128..127 are reliable.
Conversions from outside that range are allowed to make demons come out your nose.
That is why some of the values passed to bar could not be obtained by converting from char to int.
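
A sketch of how that common range can be checked before narrowing, using the limits from <limits.h>. The function name fits_in_schar is made up for the example:

```c
#include <limits.h>
#include <stdbool.h>

/* True when v can be stored in a signed char without leaving its range. */
bool fits_in_schar(int v)
{
    return v >= SCHAR_MIN && v <= SCHAR_MAX;
}
```

With 8-bit two's complement chars, fits_in_schar(127) is true and fits_in_schar(128) is false.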

_________________
Michael Hennebry
"Well, if anyone would know about hubris,
it would be the man who built God." -- Root
 
bperrybap
PostPosted: Sep 17, 2011 - 01:17 AM
Resident


Joined: May 03, 2009
Posts: 564
Location: Dallas, TX USA

But the fundamental question for me still is: why is a variable declared as a char
allowed to take on more than an 8 bit value in some situations? See the second two examples above
(not the one you referenced, but the two examples with a simple while loop).

sizeof reports a size of 1 for a char variable yet the value
can contain more than 8 bits.

The first of the 2 examples using the simple while loop showed an example of a variable declared as a char
that was being incremented as a 16 bit value
and the resulting 16 bit value passed down.

The 2nd of the two examples was the same code
but functioned as expected.

So depending on where the variable lives,
the promotion and resulting behavior is different.

To me that is just plain scary. That you can have the same
code and the way it treats the data when it is promoted
from 8 bits to 16 bits varies depending on where the
variable lives.

And a few more examples:

The following two code fragments generate identical code

Code:
void bar(int c);
void foo(void)
{
char c = 0;

        while(1)
        {
                bar(c++);
        }
}

Code:
void bar(int c);
void foo(void)
{
unsigned int c = 0;

        while(1)
        {
                bar(c++);
        }
}


Both call bar() with an incrementing 16 bit unsigned value.
Why should a char variable be allowed to hold more than
8 bits?

But yet change the c char declaration to be a static,
or make it a global or make it a "unsigned char"
or a "signed char" and it works as expected
by incrementing an 8 bit value and then promoting
it to a 16 bit value using sign extension as necessary.

Why should the compiler have different behaviors
for all but this one case, given that the variable type "char"
is the same for all cases?

Also, consider the next two examples:

Code:
void bar(int c);
void foo(void)
{
char c = 0;

        while(1)
        {
                bar(c);
                c += 1;
        }
}


Code:
void bar(int c);
void foo(void)
{
char c = 0;

        while(1)
        {
                bar(c);
                c++;
        }
}


The first one will generate what I believe is the correct code by incrementing c as an 8 bit value then sign extending into the upper 8 bits if necessary, while the second one treats c as an unsigned int
and increments it as a 16 bit value and passes that incremented 16 bit value down to bar().

It seems to me like ++ has an issue with local chars
when they are later promoted to ints and passed to functions.



Or am I totally missing or not understanding something?

--- bill
 
TimothyEBaldwin
PostPosted: Sep 17, 2011 - 01:16 PM
Hangaround


Joined: Aug 26, 2008
Posts: 223


As danni said, the behaviour of overflow on signed types is undefined, as such it is permitted for them to increase indefinitely, wrap around, do both at the same time, or crash the program.
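
If wraparound of a signed 8-bit counter is actually wanted, one way to get it without depending on what the compiler does with an out-of-range result is to spell the wrap out. This is only a sketch, and the name inc_wrap_s8 is invented for it:

```c
#include <stdint.h>

/* Increment an 8-bit signed value, wrapping from 127 to -128 explicitly.
 * Every intermediate value stays inside the range of int8_t. */
int8_t inc_wrap_s8(int8_t v)
{
    if (v == INT8_MAX)
        return INT8_MIN;
    return (int8_t)(v + 1);
}
```

Used as c = inc_wrap_s8(c) in place of c++, the counter goes ..., 126, 127, -128, -127, ... on any conforming compiler.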
 
skeeve
PostPosted: Sep 17, 2011 - 04:15 PM
Raving lunatic


Joined: Oct 29, 2006
Posts: 3041


bperrybap wrote:
But the fundamental question for me still is, why is a variable declared as a char
allowed to take on more than an 8 bit value?
The nasal demons are not a joke.
Nasal demons can make chars do whatever nasal demons want.

If you prefer, your program crashes as soon as it is required to convert 128 into a char.
You are observing the wreckage.

_________________
Michael Hennebry
"Well, if anyone would know about hubris,
it would be the man who built God." -- Root
 
Brutte
PostPosted: Sep 17, 2011 - 10:00 PM
Raving lunatic


Joined: Oct 05, 2006
Posts: 3055
Location: Poland

Quote:
In general, if no negative numbers are needed, you should use an unsigned type as the loop counter.


I agree - char is not the thing you should make arithmetic operations with. Besides its signedness is defined through compiler's options. Relying on this property is error prone.
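
Whether plain char ends up signed or unsigned for a given set of compiler options can be checked through <limits.h>. A minimal sketch (the function name is made up):

```c
#include <limits.h>
#include <stdbool.h>

/* CHAR_MIN is 0 when plain char is unsigned and negative when it is signed. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Building the same file with and without -funsigned-char should flip the result.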

AFAIK a sizeof() macro gives the information of the storage size. Depending on the implementation it can return absolutely any value for char, including 8. But when it returns 1, does it mean it can occupy 2 bytes (two 8-bit registers) when it is an auto?

I thought not..

It looks like there is:
    cast c8 to c16,
    loop(pass c16,c16++)


instead of(what I would expect):
    loop(cast c8 to c16,pass c16,c8++)
 
sternst
PostPosted: Sep 17, 2011 - 10:06 PM
Raving lunatic


Joined: Jul 23, 2001
Posts: 2728
Location: Osnabrueck, Germany

Brutte wrote:
AFAIK a sizeof() macro gives the information of the storage size. Depending on the implementation it can return absolutely any value for char, including 8.
No. sizeof(char) is always 1, no matter how many bits a char has.

_________________
Stefan Ernst
 
Brutte
PostPosted: Sep 18, 2011 - 12:28 AM
Raving lunatic


Joined: Oct 05, 2006
Posts: 3055
Location: Poland

ISO/IEC 9899 wrote:
When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

Yes, you are right, that is an exception! In case of 32-bitters that must be difficult or even impossible to keep an auto in 1/4-th of the register but that is how sizeof works.

_________________
No RSTDISBL, no fun!
 
bperrybap
PostPosted: Sep 18, 2011 - 08:14 PM
Resident


Joined: May 03, 2009
Posts: 564
Location: Dallas, TX USA

I would like to focus the discussion on the detailed matter, not on why a char was used or what should have been used.
Like I said originally, its use in this case was accidental, and it exposed what I think is mishandling
of a char object type by the ++ operator.

So here is the detailed matter:
The use of the ++ operator on chars.

When the char is an automatic, the ++ operator appears
to treat the char object as a 16 bit unsigned value.
It definitely creates a value for the object that is outside the range values defined in <limits.h> for a char.

Yet for other char objects including
a static char, or an unsigned char or a signed char,
the ++ operator works as expected confining the
object to its defined MIN and MAX values.

Also when using an automatic char with other integer math,
it seems to work as expected with full sign extension during promotion and the object itself is limited to 8 bits or the values in <limits.h>

These two do not generate the same final value
when var is declared as an automatic char:

Code:
var++; // gives 16 bit value to var
var +=1; // limits var to 8 bits



Is it valid to allow a char object type to return different results for the same operator depending on whether it is automatic or static?
And is it valid to ever allow an object to have a value that is outside its range as defined in <limits.h>,
i.e. a char object value outside the bounds of CHAR_MIN and CHAR_MAX?



--- bill
 
Koshchi
PostPosted: Sep 18, 2011 - 10:28 PM
10k+ Postman


Joined: Nov 17, 2004
Posts: 14663
Location: Vancouver, BC

It seems to be very specific circumstances. If foo() and bar() are in the same file such that bar() gets inlined, then c is treated as an 8 bit number in all cases.

It appears to me to be a case of over aggressive optimization: "I know that the function takes an int, and I need to promote the char to an int to do the ++ anyways, so I might as well keep it an int all the time". But to me it is clearly wrong. If there is some "magic" that is supposed to happen because of automatic promotion, it should certainly happen in all cases of 8 bit variables. And in any case, that "magic" could only make sense with something like bar(c++), where the increment and the function call are in the same statement.

_________________
Regards,
Steve A.

The Board helps those that help themselves.
 
dl8dtl
PostPosted: Sep 19, 2011 - 02:31 PM
Raving lunatic


Joined: Dec 20, 2002
Posts: 7365
Location: Dresden, Germany

> But to me it is clearly wrong.

It might look strange at first, but as others have
pointed out, it's simply the result of an integer overflow,
which is explicitly mentioned in the C standard as an example
of ``undefined behaviour''. As such, as surprising as it might
seem at first glance, GCC is clearly *not* wrong in that. It
would not be wrong in passing 42 to the called function either,
or in making the program prompt the operator for a password. ;-)

_________________
Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.
Please read the `General information...' article before.
 
skeeve
PostPosted: Sep 19, 2011 - 04:30 PM
Raving lunatic


Joined: Oct 29, 2006
Posts: 3041


Brutte wrote:
ISO/IEC 9899 wrote:
When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

Yes, you are right, that is an exception! In case of 32-bitters that must be difficult or even impossible to keep an auto in 1/4-th of the register but that is how sizeof works.
IBM 360's managed it.
The descendants of 80386's are still managing it.
Also, there is no reason an implementation cannot have 32-bit chars.

_________________
Michael Hennebry
"Well, if anyone would know about hubris,
it would be the man who built God." -- Root
 
Brutte
PostPosted: Sep 19, 2011 - 04:49 PM
Raving lunatic


Joined: Oct 05, 2006
Posts: 3055
Location: Poland

Quote:
Also, there is no reason an implementation cannot have 32-bit chars.

Then how to check for the size of the char(or something that holds it), when sizeof(char) always returns 1? Well, not with sizeof().
I wonder how one would call memcpy with 32-bit chars..
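
For what it's worth, the number of bits in a char is available as CHAR_BIT in <limits.h>, and memcpy counts in units of char. A small sketch, with helper names invented for the example:

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Number of bits in an object whose size, as reported by sizeof, is given. */
size_t bits_in(size_t size_in_chars)
{
    return size_in_chars * CHAR_BIT;
}

/* memcpy takes its length in chars, so copying n chars is simply n,
 * however many bits each char happens to have. */
void copy_chars(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n * sizeof(char));
}
```

On an implementation with 32-bit chars, bits_in(sizeof(char)) would be 32 while sizeof(char) would still be 1.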

_________________
No RSTDISBL, no fun!
 
skeeve
PostPosted: Sep 19, 2011 - 04:59 PM
Raving lunatic


Joined: Oct 29, 2006
Posts: 3041


bperrybap wrote:

Like I said originally its use in this case was accidental and exposed what I think is mishandling
of a char object type by the ++ operator.
No, you did the mishandling.
You lied to the compiler:
You told it that c would always be in the range -128..126 and it relied on that information.
When c reached 127, c++ failed and you saw the wreckage.
Quote:
When the char is an automatic, the ++ operator appears
to treat the char object as a 16 bit unsigned value.
It definitely creates a value for the object that is outside the range values defined in <limits.h> for a char.
No. When the char is local bar can neither use it nor change it.
A local static would allow the same result.
Quote:
Is it valid to allow a char object type to return different results for the same operator depending on whether it is automatic or static?
And is it valid to ever allow an object to have a value that is outside its range as defined in <limits.h>
i.e a char object value outside the bounds of CHAR_MIN and CHAR_MAX
After converting 128 to a char that is signed,
everything is valid, including the aforementioned nasal demons,
which by the way, I did not make up.

_________________
Michael Hennebry
"Well, if anyone would know about hubris,
it would be the man who built God." -- Root
 
bperrybap
PostPosted: Sep 19, 2011 - 08:46 PM
Resident


Joined: May 03, 2009
Posts: 564
Location: Dallas, TX USA

OK, so let's actually start to quote sections of the C standard.
(I have C88/89 and a draft C99)

I'll reference my C99 draft copy. Hopefully it
is close enough to the final standard for this.

In C99 Down in the section J.3 Implementation-defined behavior
Quote:
[#1] A conforming implementation shall document its choice
of behavior in each of the areas listed in this subclause.
The following are implementation-defined:


A bit lower down in J.3.4 Characters
Quote:
-- Which of signed char or unsigned char has the same
range, representation, and behavior as ``plain'' char
(6.2.5, 6.3.1.1).


In this case, neither signed char nor unsigned char behaves the same as a plain char.
In fact, even static char behaves differently from plain char.

The out of range char is also available to further
local expressions or assignments. For example if you
assign c to an integer variable after the c++ the value
assigned to the integer variable can be outside the
range of allowed values for a char.

There are too many other examples of the AVR implementation going out of its way
to ensure the required range values when using ++ where this is being handled correctly for me to believe that this behavior was intended.

In other words, if the intent was to hide behind
the cloak of 3.18 undefined behavior
and cite the specific overflow example given of:

Quote:
[#3] EXAMPLE An example of undefined behavior is the
behavior on integer overflow.


to justify the validity of this behavior,
Why bother at all? Why not just let things spin out of range in all cases for chars?
It would generate faster and smaller code in all cases.

I'll also cite this example:

Code:
void bar(int);
int bar2(void);
void foo(void)
{
char  c = 0;

        while(1)
        {
//              c = bar2();
                c++; // ++c; works the same way
                bar(c);
        }
}


If the function call to bar2() is uncommented the ++ operator generates code to stay within bounds correctly
yet without that call, it generates a value of c outside its defined range.
So it isn't all cases of ++ operating on local char types that increment the object beyond its range just certain ones.

In section 6.5.3.1 Prefix increment and decrement operators
under semantics
Quote:
[#2] The value of the operand of the prefix ++ operator is
incremented. The result is the new value of the operand
after incrementation. The expression ++E is equivalent to
(E+=1). See the discussions of additive operators and
compound assignment for information on constraints, types,
side effects, and conversions and the effects of operations
on pointers.


section 6.5.6 Additive operators (as referenced above)
Quote:

in the Constraints section:
[#2] For addition, either both operands shall have
arithmetic type, or one operand shall be a pointer to an
object type and the other shall have integer type.
(Incrementing is equivalent to adding 1.)


In this case, as I showed in earlier posts, the following statements

Code:

++c; // creates a 16 bit value
c++; // creates a 16 bit value
c +=1; // creates an 8 bit value


(++c and c++ behave the same way)

do not generate the same code or resulting value for c.
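
For comparison, here is a host-side sketch with unsigned char, where the result of wrapping is defined, so the three spellings have to agree. The function names are made up, and an 8-bit char is assumed:

```c
/* Apply each form of "add one" to the same starting value. */
unsigned char pre_inc(unsigned char c)  { ++c;    return c; }
unsigned char post_inc(unsigned char c) { c++;    return c; }
unsigned char add_one(unsigned char c)  { c += 1; return c; }
```

All three return 0 when given 255 and 8 when given 7.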

So I'm with Koshchi, in that I believe that this is an error/oversight due to over aggressive optimization.

--- bill
 
skeeve
PostPosted: Sep 20, 2011 - 12:45 AM
Raving lunatic


Joined: Oct 29, 2006
Posts: 3041


bperrybap wrote:
OK, so let's actually start to quote sections of the C standard.
(I have C88/89 and a draft C99)

I'll reference my C99 draft copy. Hopefully it
is close enough to the final standard for this.

Doesn't matter.
It's long been the case that you may not put 128 into an 8-bit signed variable.
Quote:
Why bother at all? Why not just let things spin out of range in all cases for chars?
This part is a bit less whiny, so I'll respond.
On most computers, unsigned integers of all kinds are represented as bits with powers of two for their weights.
The formulae for unsigned integer arithmetic correspond to what most computers actually do.
At one time, signed integers were not always represented in twos complement.
That implied that the right thing to do when assigning an out-of-range value to a signed integer was not at all obvious.
The standards committees decided that there was not a right thing to do,
hence undefined behaviour.
Quote:
So I'm with Koshchi, in that I believe that this is an error/oversight due to over aggressive optimization.
Sometimes, when you lie to it,
the compiler is allowed to wash your mouth out with soap.

_________________
Michael Hennebry
"Well, if anyone would know about hubris,
it would be the man who built God." -- Root
 
All times are GMT + 1 Hour