## Floating point design help needed

Hello,

I need some help designing some software.

Equation:

a = (1200*x/r) * ( (1+(r/1200))^m - 1)

Desired range of Values :
a = 0 to 5,00,00,000
x = 0 to 1,00,000
m = 0 to 1200
r = 0 to 50

Or I need to know what maximum values I can get.

Processor : 8 bit
Float size : 32 bit
Precision : 15 digits (similar to excel)
Preference : Smaller code size over calculation speed.

Use Cases:
I. Enter a, m & r. Get x.
II. Enter x, m, & r. Get a.

Can it be done with 32-bit floats, or will I need a 64-bit library?

:)

Last Edited: Mon. Feb 27, 2012 - 08:19 AM

Quote:
a = 0 to 5,00,00,000
x = 0 to 1,00,000

Do you mean these groups of two zeroes to be three?

(As in 5,000,000,000 and 1,000,000)

Chuck Baird

"I wish I were dumber so I could be more certain about my opinions. It looks fun." -- Scott Adams

http://www.cbaird.org

Nope. In India, we write 50000000 as 5,00,00,000 (so a goes up to 50,000,000 and x up to 100,000).
:)

It's probably good to have that clarified before this discussion goes too much further.

Chuck Baird

yes, right

Quote:
Lemme knw if I m missing some info

You're missing some characters: "Let me know if I am missing some info." Another Indian idiosyncrasy?

For floating point: have you looked at your compiler's specifications? Which compiler are you going to use? At a guess, you're going to need 64-bit floats for 15 digits of precision (note that this means significant digits, not decimal places). This may determine your choice of compiler.

Quote:
Float size : 32 bit
Precision : 15 decimal places (similar to excel)
Sorry, but 32-bit floating point does not give 15 decimal digits; it gives only about 7 significant digits. You need 64-bit floating point to get 15.

Regards,
Steve A.

The Board helps those that help themselves.

Koshchi,
Thanks. I updated my initial post to say 15 digits.

I need to make a generic library that I can use on both AVR and 8051. That is why I specified:
Processor : 8 bit
Float size : 32 bit

Are there any 64-bit libraries I can use?

For 15 digits you need an implementation of double (64 bits), not single-precision floating point (32 bits). For GCC, you can find one in the last message here:
https://www.avrfreaks.net/index.p...
It is very slow, so if speed is a concern, you might opt for fixed point math (but for your needs you have to do some customizations). Just read my messages in that thread and here:
https://www.avrfreaks.net/index.p...

Quote:
you might opt for fixed point math (but for your needs you have to do some customizations)

Could you explain a bit more, or point me to a reference link?
I can customize or change the limits a bit, since the specs are mine.

Thanks, avra

Obviously you can compile and run this on a PC, but everything would compute using 80 bit floating point and be saved as a 64-bit double. Perhaps there is a way to tell c++ to compile using float, but I don't know how to tell it that. In Fortran, if everything is real*4, it will be evaluated as real*4. I think.

Imagecraft compiler user

Last Edited: Mon. Feb 27, 2012 - 04:16 PM

Quote:
but everything would compute using 80 bit floating point
That depends entirely on the compiler. You usually have to specify "long double" to get 80 bit precision.
Quote:
Perhaps there is a way to tell c++ to compile using float
Simply using "float" instead of "double" should do that. But again, it depends on the implementation.

Regards,
Steve A.

kpanchamia wrote:
a = (1200*x/r) * ( (1+(r/1200))^m - 1)

Desired range of Values :
a = 0 to 5,00,00,000
x = 0 to 1,00,000
m = 0 to 1200
r = 0 to 50

Or I need to know what maximum values I can get.

Processor : 8 bit
Float size : 32 bit
Precision : 15 digits (similar to excel)

10**15 is nearly 2**50.
64-bit IEEE floats only give 53 bits (about 15.9 decimal digits), so there is little room to spare.
You will need to be careful to avoid catastrophic round-off.
The following formula might help:
(1+(r/1200))^m - 1 = expm1(m*log1p(r/1200))
expm1(y) computes e**y - 1 .
log1p(y) computes ln(1+y) .
Both are designed to be accurate for small values.
If these are unavailable, the following formulas might be useful:
(1+p)*(1+q)-1=p+q+pq.
(1+p)**(d+f) - 1 = ((1+p)**d-1) + ((1+p)**f-1) + ((1+p)**d-1)*((1+p)**f-1)
Is m an integer?
It looks a bit like an interest calculation.

Iluvatar is the better part of Valar.

Koshchi: I think all floats and doubles that get read into the 387 are extended to 80 bit.

Imagecraft compiler user

Inside the floating-point unit, this may be so. As far as C/C++ is concerned, it is not necessarily true: a variable only keeps the precision the compiler decides to store.

Regards,
Steve A.

My point was: if there is an FPU instruction to do a single-precision multiply, and the Fortran math library uses it, I don't know how to get the C runtime to use it too. If one could run the equation in single precision and get a good enough answer, then it would compile and run on the AVR in single precision.

Imagecraft compiler user

skeeve,
I think (1+p)*(1+q)-1=p+q+pq might be too complicated. Yes, m is an integer. It is an interest calculation :)

@all: I can't decide whether to work toward getting 64-bit double on WinAVR or to use fixed-point math.
How do I decide whether fixed point is feasible in my application? Are there any related equations or calculations?

(This may be off-topic)
Can I tell the compiler (say, WinAVR) that instead of an 8-bit exponent and a 23-bit mantissa, I want numbers stored with, say, a 4-bit exponent and a 27-bit mantissa?

Quote:
How do I decide whether fixed point is feasible in my application? Are there any related equations or calculations?

(This may be off-topic)
Can I tell the compiler (say, WinAVR) that instead of an 8-bit exponent and a 23-bit mantissa, I want numbers stored with, say, a 4-bit exponent and a 27-bit mantissa?

How do you decide? Do your calculations in integer and take note of the errors.

As for 'telling' the compiler about different floating point arrangements, no luck here. The libraries are written to be IEEE 754 compliant. If you want something different, you'll have to write it yourself. Since this is a financial calculation, you might want to do it in BCD (binary-coded decimal) and avoid the rounding errors caused by binary floating point. You can do arbitrary-precision math; there's code out there.
For a description:
http://en.wikipedia.org/wiki/Arb...

You then decide how many digits you want.

Quote:

As for 'telling' the compiler about different floating point arrangements, no luck here. The libraries are written to be IEEE 754 compliant. If you want something different, you'll have to write it yourself.

With the possible exception of the IBM decimal library code?

http://speleotrove.com/decimal/

Thanks, kartman and clawson.

I will work out a couple of options and let you guys know.

Nick Gammon ported the Gnu "BC" arbitrary precision math library to Arduino (ATmega328): http://arduino.cc/forum/index.ph...

bobgardner wrote:
Koshchi: I think all floats and doubles that get read into the 387 are extended to 80 bit.

Using the x87 (FPU), the internal registers are 80 bits wide, though a C double is loaded as a 64-bit value, and storing a result back to a double rounds it to 64 bits.

This is quite unusual; the SSE2 (and later) instruction sets on x86/x64 work in at most 64-bit double precision, as do the PowerPC FPU and the ARM VFPv3 (Cortex-A8, A9, probably A15).

I'm not aware of any other mainstream architecture with a full 80-bit FPU in hardware. I am happy to be proven wrong :)

-- Damien

avra's ref to http://www.mikrocontroller.net/t...
looks good. I'll try that.

Can I use Pascal's triangle for solving (1+(r/1200))^m ?

westfw, awesome. I was looking for that.

PS: Guys, I have a job, and this is my personal work, so I may take some time to implement your suggestions and give feedback. It's great to see so much help, though.