Floating point design help needed


Hello,

I need some help designing a piece of software.

Equation:

a = (1200*x/r) * ( (1+(r/1200))^m - 1)

Desired range of Values :
a = 0 to 5,00,00,000
x = 0 to 1,00,000
m = 0 to 1200
r = 0 to 50

Or I need to know what maximum values I can get.

Processor : 8 bit
Float size : 32 bit
Precision : 15 digits (similar to excel)
Preference : Lesser code size over time for calc.

Use Cases:
I. Enter a, m & r. Get x.
II. Enter x, m, & r. Get a.

Can it be done with 32 bits, or will I need a 64-bit library?
Lemme knw if any more info is reqd

Thanks for your time
:)

Last Edited: Mon. Feb 27, 2012 - 08:19 AM

Quote:
a = 0 to 5,00,00,000
x = 0 to 1,00,000

Do you mean these groups of two zeroes to be three?

(As in 5,000,000,000 and 1,000,000)

Chuck Baird

"I wish I were dumber so I could be more certain about my opinions. It looks fun." -- Scott Adams

http://www.cbaird.org


Nope. In India, we write 50000000 as 5,00,00,000
:)


It's probably good to have that clarified before this discussion goes too much further.

Chuck Baird

"I wish I were dumber so I could be more certain about my opinions. It looks fun." -- Scott Adams

http://www.cbaird.org


yes, right


Quote:
Lemme knw if I m missing some info

You're missing some characters - "Let me know if I am missing some info." Another Indian idiosyncrasy?

For floating point - have you looked at your compiler's specifications? Which compiler are you going to use? At a guess, you're going to need 64 bit floats for 15 digits of precision - note that this is not decimal places. This may determine your choice of compiler.


Quote:
Float size : 32 bit
Precision : 15 decimal places (similar to excel)
Sorry, but 32 bit floating point does not give 15 decimal places; it gives only about 7 significant digits. You need 64 bit floating point to get 15.

Regards,
Steve A.

The Board helps those that help themselves.


Koshchi,
Thanks. I updated my initial post to say 15 digits.

I need to make a generic library which I can use with AVR & 8051. Hence I had mentioned
Processor : 8 bit
Float size : 32 bit

Are there any 64 bit libs I can use?


For 15 digits you need an implementation of double (64 bits), not single floating point (32 bits). For GCC you can find it in the last message here:
https://www.avrfreaks.net/index.p...
It is very slow, so if speed is a concern, you might opt for fixed point math (but for your needs you will have to do some customizations). Just read my messages in that thread and here:
https://www.avrfreaks.net/index.p...


Quote:
you might opt for fixed point math (but for your needs you have to do some customizations)

Kindly explain a bit more or point me towards a reference link.
I can do customizations or change the limits a bit, since the specs are mine.
Meanwhile I will go through your other links.

Thanks avra


Obviously you can compile and run this on a pc, but everything would compute using 80 bit floating point and save as 64 bit double. Perhaps there is a way to tell c++ to compile using float, but I don't know how to tell it that. In fortran, if everything is real*4 it will eval as real*4. I think.

Imagecraft compiler user

Last Edited: Mon. Feb 27, 2012 - 04:16 PM

Quote:
but everything would compute using 80 bit floating point
That depends entirely on the compiler. You usually have to specify "long double" to get 80 bit precision.
Quote:
Perhaps there is a way to tell c++ to compile using float
Simply using "float" instead of "double" should do that. But again, it depends on the implementation.
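
One way to see what a given implementation does, sketched under the assumption of a C99 compiler: `<float.h>` defines `FLT_EVAL_METHOD`, which reports the precision used for intermediate results. The function names here are illustrative only.

```c
#include <float.h>

/* Describe a C99 FLT_EVAL_METHOD value (meanings as given by the C standard). */
const char *eval_method_name(int method)
{
    switch (method) {
    case 0:  return "float and double are each evaluated in their own precision";
    case 1:  return "float and double are both evaluated as double";
    case 2:  return "everything is evaluated as long double";
    default: return "indeterminate or implementation-defined";
    }
}

/* What the compiler building this file reports for itself. */
const char *this_compiler_eval_method(void)
{
    return eval_method_name(FLT_EVAL_METHOD);
}
```

A result of 0 means that expressions written in `float` really are computed in `float`, which is what a PC trial run needs in order to predict the behaviour of a 32-bit float library on the target.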

Regards,
Steve A.

The Board helps those that help themselves.


kpanchamia wrote:
a = (1200*x/r) * ( (1+(r/1200))^m - 1)

Desired range of Values :
a = 0 to 5,00,00,000
x = 0 to 1,00,000
m = 0 to 1200
r = 0 to 50

Or I need to know what maximum values I can get.

Processor : 8 bit
Float size : 32 bit
Precision : 15 digits (similar to excel)

10**15 is nearly 2**50.
64-bit IEEE floats give only 53 bits of significand.
You will need to be careful to avoid catastrophic round-off.
The following formula might help:
(1+(r/1200))^m - 1 = expm1(m*log1p(r/1200))
expm1(y) computes e**y - 1.
log1p(y) computes ln(1+y).
Both are designed to be accurate for small values.
If these are unavailable, the following formulas might be useful:
(1+p)*(1+q)-1=p+q+pq.
(1+p)**(d+f) - 1 = ((1+p)**d-1) + ((1+p)**f-1) + ((1+p)**d-1)*((1+p)**f-1)
Is m an integer?
It looks a bit like an interest calculation.
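
If m is an integer, the two identities above can be combined into a square-and-multiply loop that needs no library calls at all. This is only a sketch under that assumption; the function name is hypothetical and p stands for r/1200.

```c
/* (1+p)^m - 1 for integer m >= 0, working only with "value minus one"
   quantities so that a small p is never absorbed into a leading 1. */
double growth_minus_one_int(double p, unsigned m)
{
    double result = 0.0;   /* (1+p)^0 - 1 */
    double power  = p;     /* (1+p)^(2^k) - 1, starting at k = 0 */

    while (m != 0) {
        if (m & 1u)
            result = result + power + result * power;  /* p + q + pq step */
        power = 2.0 * power + power * power;           /* same step with q = p */
        m >>= 1;
    }
    return result;
}
```

For m up to 1200 the loop runs at most 11 times, and it uses only additions and multiplications, so the same code could be tried with whatever float or fixed point type ends up being chosen.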

Iluvatar is the better part of Valar.


Koshchi: I think all floats and doubles that get read into the 387 are extended to 80 bit.

Imagecraft compiler user


Inside the floating point unit, this may be so. As far as C/C++ is concerned, this is not true. It will only store the precision that the compiler decides to.

Regards,
Steve A.

The Board helps those that help themselves.


My point was, if there is an FPU instruction to do a single precision multiply, and the Fortran math library uses these instructions, I don't know how to get the C runtime to use them. If one could run the equation using single precision and get a good enough answer, then it would compile and run on the AVR using single precision.

Imagecraft compiler user


skeeve,
I think (1+p)*(1+q)-1=p+q+pq might be too complicated. Yes, m is an integer. It is an interest calculation :)

@all: I can't decide whether I should work towards getting 64 bit double on WinAVR or use fixed point math.
How do I decide whether fixed point is possible in my application - any related equations or calculations?

(This may be off-topic)
Can I tell the compiler say WinAVR, that instead of 8 bits expo and 23 bits mantissa, I want the number to be stored as say 4 bits expo and 27 bits mantissa?


Quote:
How do I decide whether fixed point is possible in my application - any related equations or calculations?

(This may be off-topic)
Can I tell the compiler say WinAVR, that instead of 8 bits expo and 23 bits mantissa, I want the number to be stored as say 4 bits expo and 27 bits mantissa?


How do you decide? Do your calculations in integer and take note of the errors.

As for 'telling' the compiler about different floating point arrangements, no luck here. The libraries are written to be IEEE754 compliant. You want something different, then you'll have to write it yourself. Since this is a financial calculation, you might want to do it in BCD (binary coded decimal) and avoid the rounding errors caused by binary floating point. You can do arbitrary precision math - there's code out there.
For a description:
http://en.wikipedia.org/wiki/Arb...

You then decide how many digits you want.


Quote:

As for 'telling' the compiler about different floating point arrangements, no luck here. The libraries are written to be IEEE754 compliant. You want something different, then you'll have to write it yourself.

With the possible exception of the IBM decimal library code?

http://speleotrove.com/decimal/


Thanks kartman, clawson

I will work out a couple of options and let you guys know.


Nick Gammon ported the GNU "BC" arbitrary precision math library to Arduino (ATmega328): http://arduino.cc/forum/index.ph...


bobgardner wrote:
Koshchi: I think all floats and doubles that get read into the 387 are extended to 80 bit.

Using the x87 (FPU) the internal accumulator is 80-bits, though you can only feed it 64-bit values, and retrieving a value will get rounded to a 64-bit value.

This is quite unique; the SSE2 (and greater) instruction sets on x86/x64 operate only on a 64-bit accumulator, as does the PowerPC FPU and the ARM VFPv3 (Cortex A8, A9, probably A15).

I'm not aware of a mainstream architecture that has a full 80bit FPU in hardware. I am happy to be proven wrong :)

-- Damien


avra's ref to http://www.mikrocontroller.net/t...
looks good. I'll try that.

Can I use Pascal's triangle for solving (1+(r/1200))^m ?
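
For what it's worth, row m of Pascal's triangle gives the binomial coefficients C(m,k), so (1+p)^m - 1 = C(m,1)p + C(m,2)p^2 + ... + C(m,m)p^m with p = r/1200. A minimal sketch, assuming p >= 0 and integer m; the function name and the cut-off threshold are illustrative choices.

```c
/* (1+p)^m - 1 as the binomial sum C(m,1)p + C(m,2)p^2 + ... + C(m,m)p^m,
   for p >= 0. Each term is built from the previous one, so row m of
   Pascal's triangle is never stored. */
double growth_minus_one_binomial(double p, unsigned m)
{
    double term = 1.0;   /* C(m,0) * p^0 */
    double sum  = 0.0;   /* the k = 0 term is the "1" that gets subtracted */
    unsigned k;

    for (k = 1; k <= m; ++k) {
        term *= p * (double)(m - k + 1) / (double)k;  /* now C(m,k) * p^k */
        sum += term;
        if (term <= sum * 1e-17)
            break;       /* remaining terms are negligible at double precision */
    }
    return sum;
}
```

Since the leading 1 never enters the sum, there is no subtraction to lose digits. For small p the loop stops after a handful of terms, while large p and m need many more multiplications and divisions.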

westfw, awesome. I was looking for that.

PS: Guys, I have a day job and this is my personal work, so I may take some time to implement your suggestions and give feedback. It's great to see so much help though.


I went with avra's ref to http://www.mikrocontroller.net/t...

Thanks Guys