Disable NaN [solved]

28 posts / 0 new

I've got a controls problem which, due to extreme stiffness in normal coordinates, can lead to NaN showing up. The problem with NaN is that it's contagious: it will eventually spread like a virus through all the calculations.

Of course, I can check at every iteration whether NaN is in the output, or, better, I can check the parts of the code that can possibly produce a NaN and squash it there. However, this is a giant waste of resources for something that can potentially occur but won't in the vast majority of cases.

The best solution (from a computational-efficiency standpoint) would simply be to "disable" NaN. I realize that theoretically this could do some funny things, but in my case it can't. I know this is a long shot, but I think the problem is interesting, so it's worth a try. Is there any compiler option to redefine NaN as something else?

EDIT:
===============
Turns out that there is no GCC option to deactivate NaN, but Clawson showed how to override the __fp_nan routine in avr-libc's libm to get this effect: https://www.avrfreaks.net/index.p...

Last Edited: Thu. Dec 9, 2010 - 05:03 PM
---

Can't you just write a macro wrapper around all fp function calls that checks return values for Not A Number?

As you say, if you get it then something is wrong so it seems very unwise to simply ignore it. I'd just find all the places where it may occur and put in protection to do something sensible when it does.
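For illustration, the wrapper idea above might be sketched as follows. FP_GUARD and the fallback argument are invented names, and the statement-expression syntax is a GCC extension (fine for avr-gcc):

```c
#include <math.h>

/* Illustrative sketch of the macro-wrapper idea: evaluate a
 * floating-point expression once and substitute a fallback value
 * if it came out as NaN. */
#define FP_GUARD(expr, fallback)              \
    ({                                        \
        float fpg__ = (expr);                 \
        isnan(fpg__) ? (fallback) : fpg__;    \
    })
```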

---

clawson wrote:
Can't you just write a macro wrapper around all fp function calls that checks return values for Not A Number?

Yes. But if I were to do that for all return values, I'd be looking at several thousand `if` checks per loop, and my loop speed is on the order of 500Hz. So that seems like an awful lot of calculations to throw away.

Quote:

As you say, if you get it then something is wrong so it seems very unwise to simply ignore it. I'd just find all the places where it may occur and put in protection to do something sensible when it does.

Which is better than the above, but once again the sensible protection leads to a lot of wasted loops for a problem which might never occur. My analysis suggests that the loop is mathematically capable of recovering from the problem if it occurs, but the nan prevents that. Any real value would (eventually) work, just not nan.

---

kubark42 wrote:
Quote:

As you say, if you get it then something is wrong so it seems very unwise to simply ignore it. I'd just find all the places where it may occur and put in protection to do something sensible when it does.

Which is better than the above, but once again the sensible protection leads to a lot of wasted loops for a problem which might never occur. My analysis suggests that the loop is mathematically capable of recovering from the problem if it occurs, but the nan prevents that. Any real value would (eventually) work, just not nan.

I'm dubious about the claim that testing for NaN takes too long.
Floating point math without floating point hardware is *slow*.
Four more comparisons should not be a problem.
Your options are fix or prevent.
To fix, you must detect.
You don't necessarily need to detect every NaN.
Be sure to test the one that controls the radiation dose.
To prevent, find out what causes the NaNs.
Problems like acos(1.00001) can be fixed.

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles

---

Quote:

Four more comparisons should not be a problem.

Two, you only need to check two of the 4 bytes to know you've got NaN:

s111 1111 1axx xxxx xxxx xxxx xxxx xxxx

So you check the bottom 7 bits of the first byte and the top two of the second byte. 'a' says whether it's a "quiet" or "signalling" NaN.
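In C, that two-byte test might be sketched like this (is_nonfinite is an invented name; the byte order assumes little-endian storage as on AVR, and note that an all-ones exponent also matches +/-Inf, not just NaN):

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the two-byte test described above: an IEEE-754 single
 * is non-finite exactly when its eight exponent bits are all ones.
 * The bottom 7 bits of the top byte hold exponent bits 7..1, and
 * the top bit of the next byte holds exponent bit 0.  This flags
 * +/-Inf as well as both NaN flavours.  Assumes little-endian. */
static int is_nonfinite(float f)
{
    uint8_t b[4];
    memcpy(b, &f, sizeof b);  /* b[3] = sign + exp[7:1], b[2] = exp[0] + mantissa */
    return (b[3] & 0x7F) == 0x7F && (b[2] & 0x80) == 0x80;
}
```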

---

skeeve wrote:
kubark42 wrote:
Quote:

As you say, if you get it then something is wrong so it seems very unwise to simply ignore it. I'd just find all the places where it may occur and put in protection to do something sensible when it does.

Which is better than the above, but once again the sensible protection leads to a lot of wasted loops for a problem which might never occur. My analysis suggests that the loop is mathematically capable of recovering from the problem if it occurs, but the nan prevents that. Any real value would (eventually) work, just not nan.

I'm dubious about the claim that testing for NaN takes too long.
Floating point math without floating point hardware is *slow*.
Four more comparisons should not be a problem.

As I said, my loop runs at 500Hz. So even supposing that I only had one spot where the nan could come (I don't), that's 1000 (using clawson's hint) operations extra per second. However, to make things worse, each loop I have to integrate approx 500 times my dynamic system. So now we're at 500,000 operations extra per loop, still assuming I only have one equation where this could be occurring. (I haven't counted them yet.)

I know how to fix the problem. I'm simply curious if there is another way, that's all. I'm not as familiar as I'd like to be with the C compiler, so asking questions seems like a good way to learn.

---

Designing your algorithm is the most important step.

Get that right and you probably reduce the f-p operations by an order of magnitude.

Everyone is used to the concept of integer overflow when you do 1234 x 4567 and try to put the result in a uint16_t. So you check your operand values before you enter a loop. C does not tell you about underflow or overflow.

You do the same with f-p. Only you have the advantage of Nan.

David.

---

This is interesting:

http://en.wikipedia.org/wiki/NaN...

There aren't that many ways to create NaN, so perhaps it's easier to test to prevent one being created in the first place?

Of those, 0/0 seems like the most likely one?

---

If you decide to go that route, here is an inline macro to check for a NaN. It substitutes 0.0 but the four clr instructions can be replaced with ldi to load any value you'd like.

//
// This inline assembly macro examines a floating point value
// and replaces it with 0.0 if the FP value is either +/-
// infinity, a quiet NaN, or a signaling NaN.  Depending on
// the path taken, the code requires either 7 or 10 cycles.
//
#define FIX_NAN(f)              \
({                              \
    float val__ = f;            \
    __asm__ __volatile__        \
    (                           \
        "mov r0, %C0"  "\n\t"   \
        "rol r0"       "\n\t"   \
        "mov r0, %D0"  "\n\t"   \
        "rol r0"       "\n\t"   \
        "inc r0"       "\n\t"   \
        "brne 1f"      "\n\t"   \
        "clr %D0"      "\n\t"   \
        "clr %C0"      "\n\t"   \
        "clr %B0"      "\n\t"   \
        "clr %A0"      "\n\t"   \
        "1:"           "\n\t"   \
        : "=&d" (val__)         \
        : "0" (val__)           \
        : "r0"                  \
    );                          \
    val__;                      \
})

Don Kinzer
ZBasic Microcontrollers
http://www.zbasic.net

---

clawson wrote:
This is interesting:

http://en.wikipedia.org/wiki/NaN...

There aren't that many ways to create NaN, so perhaps it's easier to test to prevent one being created in the first place?

Of those, 0/0 seems like the most likely one?

Hah, I wish. No, things seem to explode in both directions, so all the Inf and 0 combinations are possible, methinks. Good link, though, thanks for finding it.

Quote:

Designing your algorithm is the most important step.

Get that right and you probably reduce the f-p operations by an order of magnitude.

Everyone is used to the concept of integer overflow when you do 1234 x 4567 and try to put the result in a uint16_t. So you check your operand values before you enter a loop. C does not tell you about underflow or overflow.

You do the same with f-p. Only you have the advantage of Nan.

My algorithm is well designed, not to worry. The problem is that when you're in normal-space these kinds of problems are likely to occur. One step in the right direction would be the SR-EKF, but that option isn't available (yet), as the fundamental mathematics that would allow us to apply it to our situation has yet to be developed.

I can use some additional tricks to keep this from occurring, such as checking for singular matrices, etc., but that doesn't change the fact that a fundamental problem in real-time, robust control is things getting "stuck", and NaN is a good way to get stuck. If floating point simply overflowed the way integers do, everything would (eventually) be fine and things would reconverge. Since Inf never truly occurs, and neither does 0 (both show up due to round-off, but not due to the intrinsic mathematics), it's not so crazy to ask that when NaN comes along it instead be returned as, for instance, the largest number that a float can express.

If I can demonstrate mathematically that an algorithm always reconverges after throwing an almost-NaN, then I can validate the approach for robust control and free up some processing power that is otherwise spent checking for a once-in-a-never likelihood.
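The "largest expressible float" substitution could be prototyped like this (squash_nan is an invented name, and preserving the sign via signbit() is my own design choice, not something prescribed here):

```c
#include <float.h>
#include <math.h>

/* Sketch of the idea above: map NaN to the largest magnitude a
 * float can express instead of letting it poison later math. */
static float squash_nan(float x)
{
    if (isnan(x))
        return signbit(x) ? -FLT_MAX : FLT_MAX;
    return x;
}
```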

---

dkinzer wrote:
If you decide to go that route, here is an inline macro to check for a NaN. It substitutes 0.0 but the four clr instructions can be replaced with ldi to load any value you'd like.
//
// This inline assembly macro examines a floating point value
// and replaces it with 0.0 if the FP value is either +/-
// infinity, a quiet NaN, or a signaling NaN.  Depending on
// the path taken, the code requires either 7 or 10 cycles.
//
#define FIX_NAN(f)              \
({                              \
    float val__ = f;            \
    __asm__ __volatile__        \
    (                           \
        "mov r0, %C0"  "\n\t"   \
        "rol r0"       "\n\t"   \
        "mov r0, %D0"  "\n\t"   \
        "rol r0"       "\n\t"   \
        "inc r0"       "\n\t"   \
        "brne 1f"      "\n\t"   \
        "clr %D0"      "\n\t"   \
        "clr %C0"      "\n\t"   \
        "clr %B0"      "\n\t"   \
        "clr %A0"      "\n\t"   \
        "1:"           "\n\t"   \
        : "=&d" (val__)         \
        : "0" (val__)           \
        : "r0"                  \
    );                          \
    val__;                      \
})

Nice, thanks for that. I'll bookmark it for the future, as it's handy no matter what.

---

Are you sure an 8-bit microcontroller is the right tool for the job?

Stealing Proteus doesn't make you an engineer.

---

kubark42 wrote:
skeeve wrote:
kubark42 wrote:
Quote:

As you say, if you get it then something is wrong so it seems very unwise to simply ignore it. I'd just find all the places where it may occur and put in protection to do something sensible when it does.

Which is better than the above, but once again the sensible protection leads to a lot of wasted loops for a problem which might never occur. My analysis suggests that the loop is mathematically capable of recovering from the problem if it occurs, but the nan prevents that. Any real value would (eventually) work, just not nan.

I'm dubious about the claim that testing for NaN takes too long.
Floating point math without floating point hardware is *slow*.
Four more comparisons should not be a problem.

As I said, my loop runs at 500Hz. So even supposing that I only had one spot where the nan could come (I don't), that's 1000 (using clawson's hint) operations extra per second. However, to make things worse, each loop I have to integrate approx 500 times my dynamic system. So now we're at 500,000 operations extra per loop, still assuming I only have one equation where this could be occurring. (I haven't counted them yet.)

Apparently you also have at least 250,000 other floating point operations per second.
I infer that you meant "per second". "Per loop" would be a bit much.
If you have a 20 MHz processor,
each of the aforementioned floating point
operations could only require 80 cycles.
If 3 more cycles is really too much,
note that you probably do not have to test all of them,
just enough to avoid metastasis.

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles

---

ArnoldB wrote:
Are you sure an 8-bit microcontroller is the right tool for the job?

Good point. Perhaps saturated arithmetics (like those found in DSPs) would be a better match?

---

skeeve wrote:
Apparently you also have at least 250,000 other floating point operations per second.
I infer that you meant "per second". "Per loop" would be a bit much.
If you have a 20 MHz processor,
each of the aforementioned floating point
operations could only require 80 cycles.
If 3 more cycles is really too much,
note that you probably do not have to test all of them,
just enough to avoid metastasis.

Yup, yup, that's about right. And, no, right now I'm not using a 20MHz processor. I've got my sights on something bigger, because this is clearly too much, even for streamlining things with fixed-point code. This is for a research project, but the principles apply equally well to AVR as they do to what I'm using. Not all control situations require 14-dimensional state models in normal-space with a 1 second innovation horizon. Fortunately!

If you want robust control, you have to be able to account for as many situations as possible. If you can account for a situation architecturally instead of computationally you can win out. Or at the very least you can say something interesting. So the extra couple cycles isn't the end of the world, but it's annoying to have to turn the code into spaghetti with tests all over the place in order not to have any risk of metastasis.

As I've said above, I know how to fix the problem with tests. But it would have been interesting to learn how to simply disable nan. I guess such a thing doesn't exist, although I'm sure that if I wrote my own floating-point code (I wouldn't do such a thing, but I know people have in order to get more practical precision for likely real-world problems) it would be very easy. Anyway, thanks to all who responded, it was an interesting conversation, and the assembly code will come in handy for future projects.

---

andreie wrote:
ArnoldB wrote:
Are you sure an 8-bit microcontroller is the right tool for the job?

Good point. Perhaps saturated arithmetics (like those found in DSPs) would be a better match?

Saturated arithmetics. I just learned something good right there. Can saturated arithmetics be configured in GCC?
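As far as I know, some GCC targets support the embedded-C _Sat fixed-point types, but saturation is also easy to emulate in plain C; a sketch (sat_add16 is an invented name):

```c
#include <stdint.h>

/* Saturating 16-bit addition in plain C.  DSPs do this in a single
 * instruction; here the sum is widened to 32 bits so it cannot
 * overflow, then clipped to the int16_t range. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}
```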

---

kubark42 wrote:
As I've said above, I know how to fix the problem with tests. But it would have been interesting to learn how to simply disable nan. I guess such a thing doesn't exist, although I'm sure that if I wrote my own floating-point code (I wouldn't do such a thing, but I know people have in order to get more practical precision for likely real-world problems) it would be very easy. Anyway, thanks to all who responded, it was an interesting conversation, and the assembly code will come in handy for future projects.
Somewhat easier than rolling your own might be to edit the gcc code.
IIRC all non-finites have a 1 in bits 24 to 30.
Clearing any of them would perform the desired conversion.
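(Strictly, a non-finite single has all of bits 23 to 30, the whole exponent field, set to 1.) As a host-side illustration of the trick, clearing one exponent bit does force a non-finite value back into the finite range; force_finite is an invented name, and this is a demonstration, not a recommendation:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Every non-finite IEEE-754 single has exponent bits 23..30 all
 * set, so clearing one of them yields a finite (if arbitrary)
 * value.  Finite inputs would be mangled too, so real code would
 * clear the bit only on the NaN path. */
static float force_finite(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    u &= ~(UINT32_C(1) << 30);   /* clear the top exponent bit */
    memcpy(&x, &u, sizeof x);
    return x;
}
```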

Also, it's likely you wouldn't have to test all
floating point results to prevent metastasis.
If, for example, you are computing a sum of many,
testing just the sum would be good enough in most cases.
If the sum was finite,
none of the intermediates would need testing.
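The sum example can be made concrete (checked_sum and its fallback argument are invented for illustration):

```c
#include <math.h>

/* One test covers many operations: NaN propagates through
 * addition, so a single isnan() on the accumulator detects a NaN
 * in any of the summed terms. */
static float checked_sum(const float *v, int n, float fallback)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += v[i];
    return isnan(s) ? fallback : s;
}
```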

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles

---

IME nans are a sign of mistaken coding or calibration. Such "numbers" don't occur in the natural world for a reason. So fix your algorithm and the nans go away. Check for them at critical junctions to protect against the bugs you don't find.

C: i = "told you so";

---

cpluscon wrote:
IME nans are a sign of mistaken coding or calibration. Such "numbers" don't occur in the natural world for a reason.

That's simply not true. You have an overly limited definition of "natural world".

---

Perhaps if you were to say what you actually want to do, and show your current algorithm.

I am always amazed by how my ideas get trumped by others with improved algorithms. I am not talking about a 10% peephole optimisation but rather 400% or greater.

There are also a vast range of stability effects. e.g. sensitivity to accuracy of inputs.

The very last thing that you need is silent ignorance of NaN.

David.

---

david.prentice wrote:
Perhaps if you were to say what you actually want to do, and show your current algorithm.

I am always amazed by how my ideas get trumped by others with improved algorithms. I am not talking about a 10% peephole optimisation but rather 400% or greater.

There are also a vast range of stability effects. e.g. sensitivity to accuracy of inputs.

As mentioned in my first post, I've got a controls problem in normal-space. There's not much point in boring everyone to death with complex mathematics, and moreover it's part of a research project so this isn't really the forum to discuss and (pre)publish such things. However, I can state clearly that it's not an algorithmic problem. It is in fact, intrinsic to the question of observability.

"What happens when you try to observe a system at an non-observable point?"
That depends.

In normal space, "non-observable" is easy to determine, which is one of the reasons it's a useful tool. You determine non-observability by looking to see what, if any, elements of the system Jacobian become indeterminate under certain conditions.

Alarm bells! Think about that for a second. The Jacobian of your system has indeterminate elements. In other words, the derivative of your system, at certain points, is indeterminate. That's not an algorithmic problem, nothing will make that go away. And there's no mathematical transformation to get around that, it's an intrinsic property of observability.

Of course, many systems are not necessarily unstable around points that are non-observable. This is demonstrated by the fact that observers work quite well in physical space, even if normal-space is tricky. And this particular system wouldn't be unstable if I were using saturated mathematics. But it goes into metastasis because the nan contaminates all calculations, and will never go away until it is forced to.

So while I appreciate the help and comments, I really am skeptical about the ability to improve this, especially since I am looking at it from a theoretical point of view, whereas I feel the suggested improvements will be on a practical, case-by-case basis. Not that there's anything wrong with the practical approach (indeed I prefer it to theory), but you can't always choose...

---

If you insist, you can modify the floating point code to do whatever you want, which I suspect will be even worse than nans. Perhaps you just need more precision....

C: i = "told you so";

Last Edited: Sun. Dec 12, 2010 - 12:51 AM
---

Quote:

you can modify the floating point code to do whatever you want,

The source for FP in libm.a is here:

http://svn.savannah.nongnu.org/v...

This is the div routine:

http://svn.savannah.nongnu.org/v...

As noted above it returns NaN when 0/0 is attempted. You can see this occurring in:

.L_nan:	rjmp	_U(__fp_nan)

So it's __fp_nan you need to change and it will presumably change all occurrences of NaN.

__fp_nan is here:

http://svn.savannah.nongnu.org/v...

ENTRY   __fp_nan
	ldi	rA3, 0xFF
	ldi	rA2, 0xC0
	ret
ENDFUNC

So just modify that component of libm.a. I think you should be able to over-ride the library function with -Wl,-u__fp_nan I think?

Cliff

PS fp32def.h has this:

#define	rA0	r22
#define	rA1	r23
#define	rA2	r24
#define	rA3	r25

So those would appear to be the four registers your replacement routine has to set to the IEEE754 interpretation of 0.0
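If the override works as described, a drop-in replacement that returns 0.0 instead of NaN might look like this. This is an untested sketch following the avr-libc source conventions quoted above; all four result registers are cleared, since 0.0f is the all-zero bit pattern:

```asm
ENTRY   __fp_nan
	clr	rA3	; 0x00000000 is the IEEE-754 pattern for +0.0
	clr	rA2
	clr	rA1
	clr	rA0
	ret
ENDFUNC
```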

---

clawson wrote:
Quote:

you can modify the floating point code to do whatever you want,

The source for FP in libm.a is here:

http://svn.savannah.nongnu.org/v...

This is the div routine:

http://svn.savannah.nongnu.org/v...

As noted above it returns NaN when 0/0 is attempted. You can see this occurring in:

.L_nan:	rjmp	_U(__fp_nan)

So it's __fp_nan you need to change and it will presumably change all occurrences of NaN.

__fp_nan is here:

http://svn.savannah.nongnu.org/v...

ENTRY   __fp_nan
	ldi	rA3, 0xFF
	ldi	rA2, 0xC0
	ret
ENDFUNC

So just modify that component of libm.a. I think you should be able to over-ride the library function with -Wl,-u__fp_nan I think?

Cliff

PS fp32def.h has this:

#define	rA0	r22
#define	rA1	r23
#define	rA2	r24
#define	rA3	r25

So those would appear to be the four registers your replacement routine has to set to the IEEE754 interpretation of 0.0

:shock:

Amazing. I never would have thought it would be so easy. Simply amazing. Thanks a mil.

---

Quote:
The Jacobian of your system has indeterminate elements. In other words, the derivative of your system, at certain points, is indeterminate.

As with any iterative solution, you have to devise algorithms that are reasonably stable, both with respect to calculation precision and to perturbation of the inputs.

Some algorithms converge faster and are less sensitive.
To be honest, I would perform your analysis with regular Math packages on a PC.
You would then test their sensitivity to limited precision math. Only then would you trust 32bit floats.

You will probably end up with using one method for one range of input values, and another for a different range.

David. (who ended up getting a bit lost with the OU 'M373 Optimisation' course last year)

---

Quote:

I never would have thought it would be so easy

You can tell how unsure I am about over-riding __fp_nan in libm.a by my use of the phrase "I think" twice in one sentence ;-)

But it's kind of the same as when you use the different printf's and use -Wl,-u,vfprintf to say "forget the library copy of vfprintf, I'm providing a new one".

Off to try an experiment... ;-)

---

Oh wow - it's MUCH easier than I thought. Clearly __fp_nan is a "weak link" so all you need to do to over-ride it is simply provide one...

Attachment(s): 

---

Hehe, that's bloody awesome. I'm updating my original post to say that you've solved it.