speed up division

Go To Last Post
19 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

hello, i've been trying to figure out how to speed up the time to calc:

index = 103680/timer1

on an atmega8. the 'timer1' is a 16-bit timer count (ranges from 0x0400 to 0xffff) and the result, 'index' , is the index into an 100 element lookup table.

seems like there should be some way to simplifiy this sucker, but i'm not coming up with anything. any help, tips, shoves in the right direction would be greatly appreciated.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm sure the software guys will have lots of comments.

I'd just mention that with only 100 output options, perhaps a smart lookup table of the correct table index based upon the timer1 value would be faster than calculating it on the fly.

Welcome to the forum.

JC

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Why do you have to divide 103680 by the timer1 value at all ? Surely there's a better option to achieve the result you're after.
Can you tell us what you're trying to do ?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

He's obviously trying to calculate a frequency from the measured period and linearize/scale the result by using a lookup table. Since the 1/T maximum value doesn't exceed 8 bits with the ranges given above, the divide routine can be minimized to 24 bits / 16 bits = 8 bits. Written smartly in assembler, it wouldn't be too lengthy/slow.

Warning: Grumpy Old Chuff. Reading this post may severely damage your mental health.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This Guy, are you in love with me?

Four legs good, two legs bad, three legs stable.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yep, thats right. Using the timer to calc a freq. The lookup table has to reside in eeprom, so its small. I thought about using the smaller eeprom lookup table to produce a larger lookup table sized for the timer1 values, but that seems lame. I would prefer the "smart asm" or clever math trick, but I cant seem to come up with one.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

As a silly question. "Why do you worry about a uint32_t division?"

Ok. It will take several microseconds, but is your eyesight faster than this?

In practice, you probably capture the period and want to know the frequency. You possibly want to maintain an average over several periods.

So add up 10 periods. Then do the division.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just out of interest where does 103680 come from (it's presumably something multiplied by something or something divided by something?). Perhaps you can adjust one of those "somethings" to make things a bit more "binary friendly"?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yeah, here is the sticky part. The 103680 is the result of some very long math, involving lots of operations (including divisions), so this is the most reduced I can get it. The lookup table contains user defined delays (they set the delays through a gui) for different freqs. The problem is, the division takes so long that it limits the delay range, i.e. i need a gaurd interval to allow time for the calc to take place. since the lookup is user defined, it has to be in eeprom, so space is limited.

ok, one solution i did come up with...

103680/256 = 405 so if you drop the lower byte of timer1 then you have an 8-bit divided by 8 bit which would be 405/timer1_upper_byte which gets you the correct index most of the time, and the max error is 1 index. so, i'm kind of going down that path. to correct the error, i might make a '1 bit' lookup table to correct for the error (it would be 1 or 0, so you add that as needed.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

How disasterous is it if you choose the wrong element (off by 1) from the table? That is, if you select Table[i-1] or Table[i+1] when the perfect calculation calls for Table[i]? Because if you can tolerate a little slop at the boundry between two index values, you can use a much smaller (fewer bits) divide routine.

EDIT: I see you're looking at similar options.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Have a second array of 100 elements that contains the maximum value the timer must be to evaluate to an index: e.g. a timer value of 51840 will result in 2, 34560 in 2, 25920 in 3, etc. So the first 4 values in the array will be 65535, 51840, 34560, 25920. Then do a binary search from the center of the array testing to see if the current timer value is >= the value in the array element. That will give you a maximum of ~7 tests.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Ahh! I like that idea! Good suggestion.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

And for those of you to young to remember or don't know what to make of John Brown's post, an obvious play on words, Google Herb Alpert or Burt Bacharach - This Guy's in Love With You".

Rick

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Here is an update, I toke Koshchi's advice, and timing went from 625 clock cycles to 20 clock cycles. And the 20 was the worst case search. Thanks for the ideas.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

one question. do you agree that I should make the max values in lookup table: 103680/(x + 0.5) instead of 103680/x so that instead of "51840 will result in 2" and using 51840 in the LUT, use 41472, to minimize rounding error. does that make sense to you guys?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

That is up to you to decide since you know how the resulting values will be used. But your original code did not round.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

yeah, that was another problem from the original code, 2.9 would evaluate to 2, etc. mainly wanted to get a sanity check on my plan to better handle rounding. thanks again for the suggestion. reasonably sized lut and much much faster.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

this_guy wrote:
Here is an update, I toke Koshchi's advice, and timing went from 625 clock cycles to 20 clock cycles. And the 20 was the worst case search. Thanks for the ideas.

I am curious how the value of 20 clock cycles can be obtained in this case. We are dealing with 16bit numbers (array stored in memory). Just to reed an element to compare it, it will take 2 times 2 cycles (LD/LDD instructions). Only with this without any comparison it is 7 * 4 = 28 cycles in the worst case scenario. Or am I missing something?

Thanks for explanation.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

boborm,

Sorry i did not see your question until now. I dont have a good answer for you, and your question is a good one. I pulled the 20 clock cycles from reading clock before and after executing the search for every possible value and then output the worst time. I did not actually go through and figure the clock cycles required. I'll go back and double check everything and reply back.