Speed of float/trig maths

Go To Last Post
20 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi - I wanted to run this by yall before I attempted to code up the beast. I would like to do some fairly hefty float math at 50hz. Specifically, say I have floats X, Y, and Z. First I want to find S1, S2, S3, and H, all floats:

H = (X^2 + Y^2 + Z^2)^.5

S1 = tan-1(Y/X)

W = cos-1(H/12)

S3 = 2 * W

S2 = sin-1(Z/H) - W

After that, I want to multiply S1, S2, and S3, individually, by an unsigned int, then divide that by a (constant) float.

I want to do all of this in 12.5ms at 10Mhz, which is 125Khz. If necessary, I can switch to a 20Mhz crystal, so I'd have 250Khz to do all of this in. Will I be able to do all of this? I'm using AVR-GCC and an ATMEGA168. My code is currently taking up about 1/4 of the 16KB of flash on the ATMEGA168, so it'd be fine if the above code consumed most of the rest of that. My memory usage is fairly low - I think I'm using under 100 bytes of the 1KB RAM. Also, note that I believe all the math functions used above would return doubles, even though I don't need that much precision, Ias far as I know.

What do yall think? Thanks!

-Mike

Last Edited: Mon. Jun 5, 2006 - 06:57 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Never needed the trig functions, so I can't tell you how fast they are. However, from a coding perspective, X^2 != X squared. In C you need to literally use X*X, as X^2 is interpreted as X XOR 0x02.

- Dean :twisted:

Make Atmel Studio better with my free extensions. Open source and feedback welcome!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'd precalculate tan-1[], cos-1[] and sin-1[] look up tables so that a line such as:

S2 = sin-1(Z/H) - X

just becomes:

S2 = sin-1[Z/H] - X 

And the table lookup probably takes about 10 machine cycles rather than the 10,000 it might take to calculate it on the fly.

Cliff

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Do you really need floating-point? Fixed-point would be *much* faster. You could use Taylor series for the trig functions.

Leon

Leon Heller G1HSM

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I have a little benchmark that times 1000 sins in a loop. Seems like I remember they take several ms each. Regular fp adds and mults are about 60 per ms, so 50hz=20ms, plenty time for couple dozen fp ops and 3 or 4 trancendentals. (my tests were at 16mhz). Go for it.

Imagecraft compiler user

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

leon_heller wrote:
Do you really need floating-point? Fixed-point would be *much* faster. You could use Taylor series for the trig functions.

Leon


Fixed point would be fine, but I'm not very sure how I'd write the code to do that. Are there pre-built libraries for that, or would this be a write my own assembly thing?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
I'd precalculate tan-1[], cos-1[] and sin-1[] look up tables so that a line such as:

S2 = sin-1(Z/H) - X

just becomes:

S2 = sin-1[Z/H] - X 

And the table lookup probably takes about 10 machine cycles rather than the 10,000 it might take to calculate it on the fly.

Cliff


This seems like a valid way to do it. How would you go about generating that table, and adding it to the code? And also the float Z/H would somehow need to be converted to an int.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

bobgardner wrote:
I have a little benchmark that times 1000 sins in a loop. Seems like I remember they take several ms each. Regular fp adds and mults are about 60 per ms, so 50hz=20ms, plenty time for couple dozen fp ops and 3 or 4 trancendentals. (my tests were at 16mhz). Go for it.

I should have been more clear though - I only have the honor of using 12.5/20ms/hz. During the first 7.5ms of every hz the AVR will be busy doing other stuff. From what you're saying though, it sounds like I should be able to just barely squeak by - and probabaly there won't be any issue at if I change crystals. I'll write and run a similar benchmarking program later today and report back results.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

nleahcim wrote:
This seems like a valid way to do it. How would you go about generating that table, and adding it to the code? And also the float Z/H would somehow need to be converted to an int.

Personally I'd use Excel on a PC to generate a table of trig values and then just massage this into a list of comma separated const array initialisers.

There was a thread about this very thing within the last week. If you search for "Excel" I'll bet you'd find it.

Cliff

EDIT: the thread I was thinking of: https://www.avrfreaks.net/index.p...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

If he was using a 1mhz 8080, he would be stumped. Fortunately, he's using a 20mhz risc processor, and the old wives tale that sw fp is too slow for any cpu is finally laid to rest. He did the right thing. He speced his time budget at runtime. He DIDNT spec his sw development effort to do lots of tricky scaled integer trigonometric identities and other special case stuff. Get it working with the off the shelf tools, then speed up the 10% thats too slow.

Imagecraft compiler user

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

nleahcim wrote:
leon_heller wrote:
Do you really need floating-point? Fixed-point would be *much* faster. You could use Taylor series for the trig functions.

Leon


Fixed point would be fine, but I'm not very sure how I'd write the code to do that. Are there pre-built libraries for that, or would this be a write my own assembly thing?

I had to do that once for a 16-bit ADI DSP, using a mixture of floating-point and fixed-point. I used assembler for the time-critical algorithm, called from the main C program. Most of the basic assembler routines I needed were available from ADI, apart from a fixed-point square root routine which I wrote myself.

Leon

Leon Heller G1HSM

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Mike,

H = (X^2 + Y^2 + Z^2)^.5 

X = cos-1(H/12) 

H & X are dependant on each other. Either H or X has to be assigned some value in order to determine the value of the other.

Will the real H and X please stand up!

Edit:
Or should I say: "Which comes first, the Chicken or the Egg?"

You can avoid reality, for a while.  But you can't avoid the consequences of reality! - C.W. Livingston

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

microcarl wrote:
Mike,

H = (X^2 + Y^2 + Z^2)^.5 

X = cos-1(H/12) 

H & X are dependant on each other. Either H or X has to be assigned some value in order to determine the value of the other.

Will the real H and X please stand up!

Edit:
Or should I say: "Which comes first, the Chicken or the Egg?"


Oops - that was a typo. Some of those should be Ws. W is not something I need at the end - it's just used in both S2 and S3 so I put it in it's own temporary variable

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Mike, one other question...

Do you think that:

H = (X^2 + Y^2 + Z^2)^.5

will be faster or more efficient then:

H = sqrt (X^2 + Y^2 + Z^2);

???

The later will be more readable to the un-aware or less fimiliar programmer.

You can avoid reality, for a while.  But you can't avoid the consequences of reality! - C.W. Livingston

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

if it would be, the library code would just map sqrt onto pow(...,.5), at least it should.

there are different ways to compute a square root, and a lot of optimizations, therefore I think I'd stick to sqrt. but maybe better run some test, you never know :D

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

microcarl wrote:
Mike, one other question...

Do you think that:

H = (X^2 + Y^2 + Z^2)^.5

will be faster or more efficient then:

H = sqrt (X^2 + Y^2 + Z^2);

???

The later will be more readable to the un-aware or less fimiliar programmer.


I was just writing out the math, not the functions that I plan on using. I was planning on using the sqrt function. I would assume it'd be better optimized than the power function.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Update... I added pow(x,.5) to my fp benchmark... its half as fast as sqrt! I get about 2000 per sec for sin, log, sqrt, but I was surprised that pow was about 1000 per sec.

Imagecraft compiler user

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

one possibility to implement pow ( a,b) is obviously exp(log(a)*b) which would explain that it takes twice the time

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi,

As there've been one or two threads recently about speeding up trigonometry and one of the suggestions has been to used fixed lookup tables to avoid using the library functions I've put together a little PC application to help generate such LUTs

I haven't exactly beaten this to death in the testing - so call this an "alpha" release. ;) [e.g notice how the radio buttons don't even line up!!]

All you do is run the program, pick the function, decide whether you need a quarter, half or full circle of values (they're obviously just a +ve/-ve reflection). Decide how many sample points you want. I default this to every degree but you might choose 128 or 256 or something. Then say how you want the table formatted - that is what data type and finally click the [generate] button.

In the edit control you will have to manually select the generated text and Ctrl-C copy it to the clipboard (I'll add a [copy to clipboard] button later when I've reminded myself how to do that!). Then just paste it into your C editor.

Cliff

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

nice, now the user's main problem is whether just doing linear interpolation or maybe use some more cycles for some higher order interpolation. "neville's algorithm" might here a keyword to start at ... 2nd ordner polynomial interpolation should cost 3 times as much as the linear one, and the error should be half as much as with the linear approximation, assuming the bounds for the second derivative are as high as those of the first derivative, which'd be the case for sin(x) e.g.