Strange behavior with volatile


Hello

I'm using AVR Studio v4.18 with WinAVR 20090313 and an ATmega8.

I have a variable and the following code

uint32_t frequency;
....
frequency= (uint32_t)((((uint64_t)TCNT1+((uint64_t)65536*(uint64_t)timer1_ovf)) * (uint64_t)2034635417)/1000000000);

I need uint64_t to avoid overflow. I also multiply by 2034635417 instead of 2.034635417 and divide the result by 1000000000 at the end, so that it fits in the uint32_t variable.

My code including the above statements compiles to 1660 bytes.

If i use

volatile uint32_t frequency;

then the size becomes 6393 bytes!!!

I want to avoid float operations, so I prefer uint64_t.
I'm using volatile just for debugging, to be able to view the variable's value inside the Proteus simulator.
I have never seen such strange behavior before, but I have also never used uint64_t.

Any idea what might be wrong with volatile?

Thank you
Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


alexan_e wrote:
Any idea what might be wrong with volatile?
Probably nothing. What do you do with "frequency" after you set it? If nothing, then your entire expression probably gets optimized out.


Quote:
alexan_e wrote:
Any idea what might be wrong with volatile?
Probably nothing.

As an addition to that, 64-bit arithmetic is NOT optimized in avr-gcc (I believe it uses the generic routines), so if the code was indeed being optimized out in the first place, you can expect a huge increase in size as soon as it is actually kept.

Martin Jay McKee

As with most things in engineering, the answer is an unabashed, "It depends."


Quote:

i also use multiply with 2034635417

I'm just a simple old bit-pusher. As TCNT1 is only 16 bits of precision, surely we don't need a full 32-bits of precision with the multiplier. Where does that number come from? The final result is only going to be good to ~16 bits, anyway.

In all probability there is a small ratio that will approximate your huge numbers, and all will work nicely with a 32-bit intermediate result and a 16-bit final result.

Quote:
% factor 2034635417
2034635417 = 13187 154291

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

 target        |error|                     numerator     denominator
 2.034635417   3.38236796526559E-06        470           231 
 2.034635417   6.61533256351987E-06        881           433 
 2.034635417   3.38236796526559E-06        940           462 
 2.034635417   3.13721686762136E-06        1351          664 
 2.034635417   3.38236796526559E-06        1410          693 
 2.034635417   9.37821883661982E-06        1469          722 
 2.034635417   6.61533256351987E-06        1762          866 
 2.034635417   1.45450837996108E-06        1821          895 
 2.034635417   3.38236796526559E-06        1880          924 
 2.034635417   7.92486988476782E-06        1939          953 

I'd say these would be "close enough".
470/231 is the smallest multiplier.
1821/895 is the closest with these small numbers. E-06 loses no precision. *470 means you can still use 4700 or 47000 to get some fractional information, while still staying in 16->32->16 bits.

Also interesting is the implication of using TCNT1 in a high-precision "frequency" calculation. I'd think that would be ICP1 or something derived from ICP1.

If I let the VisualBASIC program chunk along a while longer, I find

 2.034635417   1.0454651633296E-07         5052          2483 
...
 2.034635417   3.33333360913457E-10        7813          3840 

Both of those should give you more [meaningless] precision than you desire.
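As a rough sketch of how one of these ratios might be applied in C: the helper below multiplies first and divides second, all in 32 bits. The name `scale_by_ratio` and the "return 0 on overflow" convention are assumptions made for illustration, not something from the posts above.

```c
#include <stdint.h>

/* Hypothetical helper: returns count * num / den using only 32-bit math.
 * Returns 0 when den is 0 or when count * num would not fit in 32 bits,
 * in which case the caller needs a smaller ratio or a wider type. */
uint32_t scale_by_ratio(uint32_t count, uint16_t num, uint16_t den)
{
    if (den == 0 || (num != 0 && count > UINT32_MAX / num)) {
        return 0;
    }
    /* Multiply before dividing so the fractional part is not lost early. */
    return (count * num) / den;
}
```

With a 16-bit count, 470/231 always stays inside 32 bits; larger numerators such as 7813 leave less headroom for the count.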

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

frequency= (uint32_t)((((uint64_t)TCNT1+((uint64_t)65536*(uint64_t)timer1_ovf)) * (uint64_t)2034635417)/1000000000); 

it looks like the above is simply

timer1_ovf:TCNT1 * 2034635417 / 1000000000.

where timer1_ovf:TCNT1 is a 32-bit number.

so that above becomes, in integer math:

timer1_ovf:TCNT1 * 2 + 0.034635417*timer1_ovf:TCNT1.

0.034635417*timer1_ovf:TCNT1 = (0.03125*timer1_ovf:TCNT1) + ( 0.003385417 *timer1_ovf:TCNT1)
the first term is timer1_ovf:TCNT1 / 32,

and the 2nd term can be decomposed further, to
timer1_ovf:TCNT1 * 1/512 + 0.001432292000 * timer1_ovf:TCNT1.

and you keep decomposing the 2nd term until you get to 1/64K or exhaust the effective digits.

my professor called that "numerical analysis 101" in my 2nd week of programming class.
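The decomposition described above can be sketched as a small routine that peels off one power-of-two term at a time. This is only an illustration: the name `binary_terms` and its interface are made up here, and it assumes the fraction is below 1 and that doubling the remainder fits in 64 bits.

```c
#include <stdint.h>

/* Hypothetical helper: expands frac_num/frac_den (a fraction below 1) into
 * terms of the form 1/2^k for k = 1..max_shift. Each k that is part of the
 * expansion is stored in shifts[]; the number of terms found is returned.
 * The caller must provide room for max_shift entries. */
int binary_terms(uint64_t frac_num, uint64_t frac_den, int max_shift, int shifts[])
{
    int count = 0;

    for (int k = 1; k <= max_shift; k++) {
        frac_num <<= 1;                /* compare the remainder against 1/2^k */
        if (frac_num >= frac_den) {    /* the term 1/2^k fits */
            frac_num -= frac_den;
            shifts[count++] = k;
        }
    }
    return count;
}
```

For the fractional part 34635417/1000000000 and a limit of 16 bits, this yields the shifts 5, 9, 10, 12, 13, 14 and 16, i.e. 1/32 + 1/512 + 1/1024 + 1/4096 + 1/8192 + 1/16384 + 1/65536.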


Thank you for your answers,
my project is an LC meter using an AVR.
The capacitor or inductor is connected to a comparator to make it oscillate, and the AVR measures the frequency and calculates the component value.
Timer1's clock source is set to falling-edge pulses on the T1 input, and the signal is measured for a period of about 0.5 sec. I use 15 timer0 overflows at 7813Hz: 15*256=3840 ticks, which is 0.491488545 sec, so I need to multiply the count by 2.034635417 to get counts per second (Hz).

The factor I'm using is a floating point number, so I multiply it by 10^9 (to avoid float) and divide the result by 10^9 again afterwards. This is why I need uint64_t.
For example: (70000*2034635417)/1000000000
In the example above, 70000 * 2034635417 = 142424479190000 = 0x8188C87D4BF0 (48 bit).
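For reference, the scaled-integer calculation can be wrapped in a small function like the sketch below. The function and macro names are invented for illustration; the arithmetic is the same multiply by 2034635417 and divide by 10^9 as in the original expression.

```c
#include <stdint.h>

#define SCALE_E9  2034635417ULL   /* 2.034635417 scaled by 10^9 */
#define ONE_E9    1000000000ULL

/* Hypothetical wrapper: combines the 16-bit timer value with the overflow
 * count and converts the total to Hz. The product needs up to 48 bits for
 * counts around 70000, hence the 64-bit intermediate. */
uint32_t counts_to_hz_u64(uint16_t tcnt1, uint16_t timer1_ovf)
{
    uint64_t count = (uint64_t)tcnt1 + 65536ULL * timer1_ovf;
    return (uint32_t)((count * SCALE_E9) / ONE_E9);
}
```

A count of 70000 corresponds to one overflow plus a TCNT1 value of 4464, and comes out as 142424 Hz.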
My code calculates only the frequency for now and i use proteus simulation to check if the result frequency is correct.

So if I understand correctly, when I don't use volatile the code is optimized, my expression is removed, and this is why I see the small code size.
Yes, this is probably what is going on, because
I tried adding an extra line of code, just for testing, so that the compiler wouldn't remove my expression:

if (frequency==125000) TCNT0=0x01;

and the results were almost the same (big size) with or without volatile.

I have also tested my expression using float instead of uint64_t (added libm.a) and the code size is actually much smaller, 2566 bytes using volatile:

volatile uint32_t frequency;
...
frequency= (uint32_t)((((float)TCNT1+((float)65536*(float)timer1_ovf)) * (float)2.034635417));

or even the same code (with multiply/divide)

volatile uint32_t frequency;
...
frequency= (uint32_t)((((float)TCNT1+((float)65536*(float)timer1_ovf)) * (float)2034635417)/1000000000);

I thought that avoiding float would produce smaller code but i guess uint64 and int64 are the exceptions to that rule.

Thank you theusch for doing all those calculations.
I think ICP is better suited to lower-frequency or duty-cycle measurement, but I will give it a try.
I think using T1 as the clock input for timer1 gives a decent result: simulation for a measurement period of 0.5 sec as above shows about 0.006% error (450028Hz measured instead of 450000Hz).

Thank you friends
Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown

Last Edited: Fri. Jan 7, 2011 - 12:16 AM

millwood wrote:

it looks like the above is simply
timer1_ovf:TCNT1 * 2034635417 / 1000000000.

Your math is obviously more advanced than mine, i tried to understand what you mean but i can't understand what is the : symbol.
What is the meaning of "timer1_ovf:TCNT1"?

My original expression is

( (TCNT1 + (65536*timer1_ovf)) * 2034635417) / 1000000000;

Thank you
Alex

Ok now i got it, must be the late hour,
you actually said what it was but i have just realized it, it is the total count of timer1, a 32 bit integer.

Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


A + 65536*B essentially creates a 32-bit number, assuming that A and B are 16-bit numbers themselves, where the highest 16 bits are B and the lowest 16 bits are A, denoted B:A.

The process I walked through is essentially how you do floating point math with a cheat, by decomposing the multiplication into a series of shifts.

There are additional ways to speed up the process, like using a union to construct and deconstruct a 32-bit number.

Basically, numerical analysis 101.
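A minimal sketch of the B:A construction with a plain shift and OR (the function name is made up for illustration):

```c
#include <stdint.h>

/* Hypothetical helper: builds the 32-bit value high:low, i.e.
 * low + 65536 * high, from two 16-bit halves. */
uint32_t combine_count(uint16_t high, uint16_t low)
{
    return ((uint32_t)high << 16) | low;
}
```

The union approach mentioned above does the same job by overlaying two 16-bit fields on a 32-bit one, but which field ends up as the high half depends on the byte order of the target, so it is not shown here.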


alexan_e wrote:
My code including the above statements compiles to 1660 bytes.

If i use

volatile uint32_t frequency;

then the size becomes 6393 bytes!!!

While everyone is off looking at the equation, let me revisit this.

If I understand you correctly, all you do different is modify

uint32_t frequency;

to

volatile uint32_t frequency;

:?:

If that is so, the optimizer is probably doing the math in the frequency variable, which means every access will require a reload of frequency (perhaps at every step of the calculation?).

My question is why do you need frequency to be volatile? Is it passed back from an ISR? I hope you are not trying to do this calculation in an ISR!! Instead, capture TCNT1 and timer1_ovf in other variables and set a (volatile) flag to tell main that new values have arrived. Also check out the ATOMIC_BLOCK macros (see the avr-libc manual) for a way to safely access these from non-ISR space.

You should isolate the calculation from the volatile by using a temp variable (or, worst case, a function call) and just set frequency after the calc is done. Remember, volatile forces the compiler to load/store the variable from/to memory for every access.

64-bit math is not AVR friendly, as others have said. Mixing it with a volatile will probably not be pretty.
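A sketch of the temp-variable suggestion, assuming the timer values have already been captured into ordinary variables and are passed in as arguments (the function name is made up for illustration). Whether this changes the generated code at all depends on the compiler and optimization level; the point is only that the volatile object is written exactly once.

```c
#include <stdint.h>

volatile uint32_t frequency;   /* visible to the debugger/simulator */

/* Hypothetical helper: does the whole calculation in non-volatile locals
 * and touches the volatile variable only for the final store. */
void update_frequency(uint16_t tcnt1, uint16_t timer1_ovf)
{
    uint32_t count  = ((uint32_t)timer1_ovf << 16) | tcnt1;
    uint32_t result = (uint32_t)(((uint64_t)count * 2034635417ULL) / 1000000000ULL);

    frequency = result;        /* single volatile write */
}
```

On the AVR itself, the copy of TCNT1 and the overflow counter would be taken inside an ATOMIC_BLOCK before calling a function like this.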

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!


stu_san wrote:
My question is why do you need frequency to be volatile?

I only use the interrupt to count the overflows, all the calculation is done inside the main.
The volatile is temporarily used so that i can view the variable value inside proteus simulator.
The difference in size was because I was not using the frequency variable anywhere, so that part was optimized away and removed by the compiler when volatile was not used.

I have found that using float the code is much smaller than using uint64_t, 2566 bytes instead of 6393 bytes.

Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


Quote:

I have found that using float the code is much smaller than using uint64_t, 2566 bytes instead of 6393 bytes.


I'll bet that *470/231 is smaller.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


1866 bytes. Can you please tell me how you search for these numbers? What does the VB code do to find them?

Thank you
Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


size is an issue that may or may not matter: if your code already uses floating point math, then the incremental cost of using a fp-based approach is minimal.

speed is another matter: all the approaches proposed, other than the one I gave you, are roughly equal in execution time - about 9ms on a 1MIPS avr.

the approach I proposed runs around 0.5ms.

all without optimization.


Quote:

what does the VB code do to search for these.


Brute-force, trying every numerator/denominator pair.

There is a setting for "how close" the result needs to be before the results are printed. And generally I limit the loops to a range of denominators like up to 10000, and also short-cut the loops when the numerator gets so big that further searches are fruitless.

Sure, it could be made more sophisticated. But the runs I used for your example take only some seconds on a slowish PC, so I don't sweat it.
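The VB program itself is not shown, so the following is only a guess at the general idea, written in C: for every denominator up to a limit, take the nearest numerator and keep the pair with the smallest error. The name `best_ratio` and the integer-only error comparison are choices made for this sketch, and it assumes all the products fit in 64 bits (true for a 10-digit target and denominators up to 10000).

```c
#include <stdint.h>

/* Hypothetical brute-force search: finds num/den with 1 <= den <= max_den
 * that is closest to target_num/target_den. The first (smallest) denominator
 * wins when two candidates are equally close. */
void best_ratio(uint64_t target_num, uint64_t target_den, uint32_t max_den,
                uint32_t *best_num, uint32_t *best_den)
{
    uint64_t best_err = 0;   /* |num*target_den - target_num*den| of the best pair */

    *best_num = 0;
    *best_den = 0;
    for (uint32_t den = 1; den <= max_den; den++) {
        /* Nearest integer numerator for this denominator. */
        uint64_t num    = (target_num * den + target_den / 2) / target_den;
        uint64_t exact  = target_num * den;
        uint64_t approx = num * target_den;
        uint64_t err    = approx > exact ? approx - exact : exact - approx;

        /* err/den < best_err/best_den, cross-multiplied to avoid float. */
        if (*best_den == 0 || err * *best_den < best_err * den) {
            *best_num = (uint32_t)num;
            *best_den = den;
            best_err  = err;
        }
    }
}
```

Called with 2034635417/1000000000 and a denominator limit of 300 it settles on 470/231; with a limit of 10000 it finds 7813/3840.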

Example of use in my apps: Take a thermistor app, where a bias resistor is used to make readings linear in an area of interest. (see https://www.avrfreaks.net/index.p... )

I let Excel fit a line. The slope will often be a number such as yours. I then use the VB program to find rise/run equivalents, and it is almost always small integers. I have a routine shown below that does the "y = rise * x / run + intercept". I wouldn't think it would take 500 microseconds, even at 1MHz.

//
// **************************************************************************
// *
// *		C A L C Y
// *
// **************************************************************************
//
//	calcy()	--	returns a value after solving y = mx + b = (rise/run)x + intercept
//
int 		calcy	(	int x,
						int rise,
						int run,
						int intercept)
{
	return 	( (int)	(	(	(long int)x
							* (long int)rise
						)
							/ (long int)run
					)
						+ intercept
			);	/* Calc. y=mx+b */
}

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


2034635417/1000000000: the closest to that is 133342/65536.

so a simpler approach may be to multiply it by 133342 (which can be broken down to a bunch of left shifts) and then shift right by 16. that probably is the fastest approach without losing precision.

the 32-bit version of it would be 8738692575 / (65536*65536). that's not too hard either.
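One way the multiply-then-shift idea could look in C, as a sketch only. Because 133342 = 2*65536 + 2270, the product can be split so that everything stays in 32 bits for counts up to 1892056; that split and the name `scale_q16` are my own choices for illustration, not something stated above.

```c
#include <stdint.h>

/* Hypothetical helper: returns (count * 133342) >> 16 without a 64-bit
 * intermediate, using 133342 = 2 * 65536 + 2270.
 * Valid while count * 2270 fits in 32 bits, i.e. count <= 1892056. */
uint32_t scale_q16(uint32_t count)
{
    return 2u * count + ((count * 2270u) >> 16);
}
```

For a count of 250000 this gives 508659, one higher than the 64-bit calculation with the full 2034635417/1000000000 factor, because 133342/65536 is slightly larger than 2.034635417.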


well, on the same 1mips avr, a straight multiplication followed by right-shifting 16/32-bit takes about 1 - 1.5ms.

makes sense as the biggest overhead is in division.

you can further speed that up by using a combination of left-shifts in lieu of multiplication.


Quote:

2034635417/1000000000: the closest to that is 133342/65536.

LOL. 2.04E-6 is "closest"? Not any of those small ratios I gave above?

Quote:

well, on the same 1mips avr, a straight multiplication followed by right-shifting 16/32-bit takes about 1 - 1.5ms.

makes sense as the biggest overhead is in division.


That would be interesting, as https://www.avrfreaks.net/index.p... work showed y=mx+b to be something less than 200us at 1MHz. True, that was with no division. Let me resurrect that...found it. ;)

loop overhead: 55k cycles

y=m*x+b using longs: 172k cycles - overhead = 117k cycles. That's 117 microseconds. So your shifting takes like 800 cycles? Sounds strange.

y=rise*x/run+b using longs: 915k cycles - overhead = 860k cycles. That is indeed close to 1000 cycles, and indicates that the long division averaged about 700 cycles for the numerator and denominator values used during the test. (Those aren't constant but are based on the loop counter.)

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.



Quote:
2.04E-6 is "closest"?

the name of the game, as any good "numeric analysis" intro book would teach you, is to reduce the math operations to shifts, as computers are very good at shifting (integers), not as good at doing math, contrary to what most lay persons would think.

so rather than * 2034635417/1000000000, you try to see if you can reduce it to multiplication + (right) shifting, as division is where most of the time is consumed.

so naturally, the question becomes: if you were to right shift by 16, what would the multiplier be? thus 133342.

now, you can "cheat" by shifting fewer digits. But you will find that on an 8-bit mcu, you want to do either 8 bit shifts or 16 bit shifts. not some other numbers.

hopefully, you will realize by now why *470/231 is a poor way to go.

But if you can decompose it into a series of shifts:

2034635417/1000000000 -> 2 + 4/256 + 1246/65536, aka into a series of 8/16 bit shifts, you can retain high precision without losing speed.

Quote:
So your shifting takes like 800 cycles?

typically, when rational people make a comparison, they want to make sure that the two things they are trying to compare are indeed "comparable".

I don't know of your "exercise", but I would point out for you that the 1.5ms figure (for multiplication + shifting) is comparable to the 9ms figure (for multiplication + division) I quoted earlier. so of the 9ms, 7.5ms was spent on division, and 1.5ms on shifting - roughly.

I know the above simple things may sound daunting to you so if you need additional clarification, I am happy to help.


Quote:
into a series of 8/16 bit shifts

the reason you want to keep it on an 8-bit boundary is so that you can use unions to perform high speed shifts. that approach has its own problems but on a given hardware / compiler platform, it can be lightning fast.


the 0.5ms figure was achieved using similar techniques, but without using multiplication at all. Rather than decomposing the factor into just 8/16 bit shifts, which requires the use of multiplication, I decomposed it into a series of shifts, which does not use any multiplication:

2034635417/1000000000 =
2 + 1/32 + 1/512 + 1/1024 + 1/4096 + 1/8192 + 1/16384 + ....

also, I used successive shifts to reduce time: 1/512 is simply shifting an additional 4 bits from 1/32, etc.

so for a 32-bit type numerator, the most you have to do is shift 32 times.

this is a trade-off between multiple shifts but no multiplication vs limited shifts but with multiplication.
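A sketch of the successive-shift version, not the actual code behind the 0.5ms figure (which is not shown in the thread). It takes the first fractional term as 1/32, which is what the 0.034635417 part of the factor requires, and stops after 1/16384. The function name is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: approximates n * 2.034635417 as
 * 2n + n/32 + n/512 + n/1024 + n/4096 + n/8192 + n/16384
 * using only shifts and adds. Each term is derived from the previous
 * one, so the shift distances stay small. */
uint32_t scale_shift_add(uint32_t n)
{
    uint32_t acc  = n << 1;   /* 2n */
    uint32_t term = n >> 5;   /* n/32 */

    acc += term;
    term >>= 4; acc += term;  /* n/512   */
    term >>= 1; acc += term;  /* n/1024  */
    term >>= 2; acc += term;  /* n/4096  */
    term >>= 1; acc += term;  /* n/8192  */
    term >>= 1; acc += term;  /* n/16384 */
    return acc;
}
```

Every shift throws away low bits and the series is cut short, so the result lands a few counts low: a count of 250000 gives 508650 here versus 508658 from the 64-bit calculation. Pre-scaling the count by a few bits before the shifts would recover most of that, at the cost of headroom.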


the same general principle applies if you want high speed multiplication - less valuable now, as many modern mcus have hardware multipliers.


Quote:

so of the 9ms, 7.5ms was spent on division, and 1.5ms on shifting - roughly.

I know the above simple things may sound daunting to you so if you need additional clarification, I am happy to help you with.


Yeah--how come my test program then does the 32-bit multiply, 32-bit divide, and 32-bit add in ~800 cycles and your test program takes ~8000 cycles? 1.5ms for shifting? 1500 cycles at 1MHz? C'mon.

Anyway, I see your point. Given the 16-bit nature of the TCNT1 input, and given your references to Numerical Analysis 101, no "shifts" are even needed. 32-bit scaling factor (your 133k) times TCNT1, then just take the high 16 bits as the result. Done. That's something less than 200 cycles. I might indeed have to modify my calcy() accordingly.

I was thinking I could then save an entry in my scaling factors table (rise, run, intercept) but no luck there as now they are 32-bit scaling factor; same size as 16-bit rise plus run.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


theusch wrote:

Yeah--how come my test program then does the 32-bit multiply, 32-bit divide, and 32-bit add in ~800 cycles and your test program takes ~8000 cycles? 1.5ms for shifting? 1500 cycles at 1MHz? C'mon.

Again, I am not sure how comparable those numbers are. if you look at the original code, s/he is doing operations on 64-bit types before casting them back to 32-bits.

and I maintained that in my calculation.

I am not sure how yours did it.

and not sure how optimization was done for yours too.

and I have no idea what your reference to 1Mhz is all about.

Quote:
Anyway, I see your point.

I am glad you did, however long it took.


millwood wrote:
2034635417/1000000000: the closest to that is 133342/65536.

so a simpler approach may be to multiply it by 133342 (which can be broken down to a bunch of left shifts) and then shift right by 16. that probably is the fastest approach without losing precision.

the 32-bit version of it would be 8738692575 / (65536*65536). that's not too hard either.

It seems very easy to calculate the multiplication value this way, so thank you for explaining it. 7813/3840 is more accurate but takes more effort to find; I don't know about the speeds at runtime.

I use timer1 to measure a frequency up to about 500KHz, so for a 0.5 sec measurement the timer1 value will be about 250000 max.
A small calculation with the timer1 value shows that
250000*133342=33335500000 (0x7C2F35CE0) 35 bit
250000*7813=1953250000 (0x746C3AD0) fits in 32 bit
The results will differ between applications, but the smaller multiplier will always have "more headroom" to fit in 32 bits.

So in my case the second ratio, 7813/3840, can be calculated completely in 32 bits.

133342/65536 gives an error of -2.03417E-06 (using 2.034635417 - (133342/65536))
470/231 gives an error of 3.38237E-06
7813/3840 gives an error of 3.33333E-10

result of calculated frequency as integer (for 500000)are
using 133342/65536 =499999
using 470/231 =499998
using 7813/3840 =499999

so the result as an integer is very close in all 3 ways but best at first and third way.

I have only one question, why do you use 65536 (0x10000 which is a 17 bit number) and not 65535 (0xffff which is a 16 bit number), is it because of the 8 shifts you said earlier?

Thank You
Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


Quote:
so the result as an integer is very close in all 3 ways but best at first and third way.

depending on what you are going after. if you are going after precision, staying with floating point math will save you all the trouble.

if you are going after speed, go with the 1st approach.

Quote:
I have only one question, why do you use 65536 (0x10000 which is a 17 bit number) and not 65535 (0xffff which is a 16 bit number), is it because of the 8 shifts you said earlier?

to shift right n bits is to divide by 2^n.


Quote:

133342/65536 gives an error of -2.03417E-06 (using 2.034635417 - (133342/65536))
470/231 gives an error of 3.38237E-06
7813/3840 gives an error of 3.33333E-10

result of calculated frequency as integer (for 500000)are
using 133342/65536 =499999
using 470/231 =499998
using 7813/3840 =499999

so the result as an integer is very close in all 3 ways but best at first and third way.


You do realize, don't you, that all your "best" decisions are superfluous, as you are only starting with a 16-bit TCNT1. You can calculate to 99 decimal places if you want to. Anything beyond a few digits is not meaningful, and there are only 64k unique values, which are going to be pretty close to 0, 2, 4, 6, ... .

(and I still don't see how any decent results can be obtained using any TCNT1 method up to 500kHz, after doing a ~500kHz excitation/response ultrasonic app)

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


theusch wrote:
You do realize, don't you, that all your "best" decisions are superfluous, as you are only starting with a 16-bit TCNT1. You can calculate to 99 decimal places if you want to.

s/he is using timer1_ovf and TCNT1 to reconstruct a 32-bit timer.


millwood wrote:
Quote:

Quote:
I have only one question, why do you use 65536 (0x10000 which is a 17 bit number) and not 65535 (0xffff which is a 16 bit number), is it because of the 8 shifts you said earlier?

to shift right n bits is to divide by 2^n.

Yes i know that but i was asking why you chose to divide with 65536 which is a 17 bit number and not with 65535 which is the maximum 16bit number.
Is it because 16 shifts will be faster?

millwood wrote:

s/he is using timer1_ovf and TCNT1 to reconstruct a 32-bit timer.

(I am a he)
millwood is right, I am using timer1 overflows and as i wrote in the previous post i expect a max total value of about 250000 (timer1 ticks) so this number is big enough to convert a 0.000004 float number to an integer, yes i don't need all 9 float digits but i need 6 of them for sure.

theusch, why does it seem strange that i can count a frequency using timer1, timer1 uses the (square wave) pulses in input T1 (falling edge) as a clock source for a time window of 15 timer0 overflows(about 0.5 sec).
There is a small error because I have to start timer1 first and timer0 in the next code line, and also, when timer0 has given the 15th overflow, I need a few cycles to check the overflow counter and turn off timer1. This is why I'm getting a few more counts in timer1, but the relative error is still small.
I use a crystal of 8MHz (with mega8), but do you think that input capture for one period of 500KHz (2us) can give a more accurate result.
Also note that if i increase my measure time window to 1 sec then the accuracy will be 2 times better than before (and there will also be no need to use the float factor), but i would like to get the result faster.

Thank you
Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


Quote:

theusch, why does it seem strange that i can count a frequency using timer1

Yes, that is a good way to count high frequencies.

But there is no use agonizing over 10 decimal places. As you said, there is a bit of slop (a few cycles) in the timing anyway. In any case, you are like +/- a count in TCNT1.

Now, calculate the results ideally with a calculator or whatever, for TCNT1 values of 10000, 10001, and 10002. The "real" result is somewhere between the 10000 and 10002 values for a reading of 10001. The result is only good to about 4 significant digits anyway, so why agonize over e.g. 64-bit arithmetic or doubles or like 10 significant digits? It is meaningless.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


theusch wrote:
But there is no use agonizing over 10 decimal places. As you said, there is a bit of slop (a few cycles) in the timing anyway.

it depends on the cause(s) of the slop / miscounts. they may or may not impact the accuracy here. for example, isr latency, typically 20 - 30 counts, does not impact the accuracy as they are "consistently" late by the same latency. so you will undercount going into the isr and you will overcount exiting the isr, and miss no count in the process.

Quote:
The result is only good to about 4 significant digits anyway, so why agonize over e.g. 64-bit arithmetic or doubles or like 10 significant digits? It is meaningless.

frequency meters are typically six digits long so I think that's why the original poster wanted to get his meter to 6 digits too.

with a 32-bit counter, you can obviously get to more than 6 digits. so why settle for 4 digits?

I don't fully understand the significance of multiplying the count by a constant (calibration?). to me, you just display whatever the count you have in a second or a fraction of a second and you have your frequency. Say that you counted X pulses in 10ms, and your frequency is essentially (X * 100 = X * 64 + X * 32 + X * 4), which can be done with a series of speedy shifts.
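The X * 100 example written out with shifts, as a small sketch (the function name is made up for illustration):

```c
#include <stdint.h>

/* Hypothetical helper: x * 100 as x*64 + x*32 + x*4, using shifts only. */
uint32_t times100(uint32_t x)
{
    return (x << 6) + (x << 5) + (x << 2);
}
```

A compiler may well generate something similar on its own for a multiplication by a constant, so it is worth checking the generated code before hand-optimizing.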


Quote:
I use a crystal of 8MHz (with mega8), but do you think that input capture for one period of 500KHz (2us) can give a more accurate result.

the issue with capture is that unless you know the frequency you are trying to measure, you will have a hard time knowing if overflow has taken place between two captures.

counting is the way to go. you just need to figure out how long you want to count, which is in part determined by your desire for accuracy.

what I did is to implement an auto-ranging function: you first count the incoming pulse with a short period of time. if the count is deemed "too high", you shorten the time period and measure again. if the count is deemed "too low", you lengthen the time period and measure again.
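A rough sketch of what such an auto-ranging step could look like. Everything concrete here is an assumption for illustration: the three gate times, the "too low" and "too high" thresholds, and the function names are all hypothetical and would need tuning for real hardware.

```c
#include <stdint.h>

#define NUM_GATES       3
#define COUNT_TOO_LOW   1000UL     /* hypothetical threshold */
#define COUNT_TOO_HIGH  100000UL   /* hypothetical threshold */

/* Hypothetical gate times in milliseconds, shortest first. */
static const uint16_t gate_ms[NUM_GATES] = { 10, 100, 1000 };

/* Picks the gate index for the next measurement from the last count. */
uint8_t next_gate(uint8_t gate, uint32_t count)
{
    if (count > COUNT_TOO_HIGH && gate > 0) {
        return (uint8_t)(gate - 1);          /* too many counts: shorten */
    }
    if (count < COUNT_TOO_LOW && gate + 1 < NUM_GATES) {
        return (uint8_t)(gate + 1);          /* too few counts: lengthen */
    }
    return gate;                             /* in range: keep */
}

/* Converts a count taken with the given gate to Hz. The gate times above
 * all divide 1000 evenly, so this is a plain integer multiply. */
uint32_t gate_count_to_hz(uint8_t gate, uint32_t count)
{
    return count * (1000u / gate_ms[gate]);
}
```

Choosing gate times that divide one second evenly keeps the conversion to Hz a simple integer multiplication.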

you will find that this type of hardware is good maybe to 2Mhz. so implementing 3 time periods is good enough.

with a 32-bit counter (to 4Mhz), you probably don't need to use that approach, unless you want fast display update.


theusch wrote:
Quote:

Now, calculate the results ideally with a calculator or whatever, for TCNT1 values of 10000, 10001, and 10002. The "real" result is somewhere between the 10000 and 10002 values for a reading of 10001. The result is only good to about 4 significant digits anyway, so why agonize over e.g. 64-bit arithmetic or doubles or like 10 significant digits? It is meaningless.

But my (timer1_ovf*65536)+TCNT1 will be up to 250000, so why do you calculate using low timer values like 10000?
In my example above I showed that a number like 0.000004 will become an integer.
Using my original factor of 2.034635417, the digits up to 2.034635 are significant.

250000*2.034635 = 508658
250000*2.03463 = 508657

Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


millwood wrote:
theusch wrote:

I don't fully understand the significance of multiplying the count by a constant (calibration?). to me, you just display whatever the count you have in a second or a fraction of a second and you have your frequency. Say that you counted X pulses in 10ms, and your frequency is essentially (X * 100 = X * 64 + X * 32 + X * 4), which can be done with a series of speedy shifts.

Your way is the same as mine: you are multiplying by a constant number to get the frequency in Hz, you are just doing it much faster with shifts instead of a number like mine.
Since the measure time is always the same, this multiply factor is also always the same.

I chose to use a measure window of x*timer0 overflows to avoid losing any time setting a starting value in TCNT0 inside the overflow interrupt; otherwise I could easily measure for 100ms, 200ms or 500ms and do a simple multiplication with an integer.
Unfortunately no x*timer0 overflow count is close to a round number, so I have used 15*256*(1/7813Hz)=0.49148854473313...sec
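For anyone following the arithmetic, the multiply factor is just the reciprocal of that gate time. A small sketch that derives the 10^9-scaled factor from the timer0 clock and the number of overflows (the function name is made up for illustration, and it assumes a non-zero overflow count):

```c
#include <stdint.h>

/* Hypothetical helper: returns timer_hz / (overflows * 256), scaled by 10^9
 * and rounded to nearest. This is the factor that converts a pulse count
 * taken over the gate window into Hz. overflows must not be 0. */
uint32_t gate_factor_e9(uint32_t timer_hz, uint16_t overflows)
{
    uint64_t ticks = (uint64_t)overflows * 256u;   /* timer0 ticks per window */
    return (uint32_t)(((uint64_t)timer_hz * 1000000000ULL + ticks / 2) / ticks);
}
```

With 7813Hz and 15 overflows this reproduces 2034635417, i.e. 7813/3840 scaled by 10^9.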

Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


Quote:
for example, isr latency, typically 20 - 30 counts, does not impact the accuracy as they are "consistently" late by the same latency.
No they are not. The latency will depend on what opcode is being run at the time the interrupt is triggered. And if other ISRs are enabled, you may have to wait for another ISR to finish.
Quote:
which can be done with a series of speedy shifts.
Or even speedier multiply.

Regards,
Steve A.

The Board helps those that help themselves.

Last Edited: Sat. Jan 8, 2011 - 05:34 PM

maybe you should think about configuring tmr0/oscillator so that it yields a number that's more "multiplification" friendly? for example, you can set your timer to overflow at every 200us.

Quote:
15*256*(1/7813Hz)

i assume that 256 is your prescaler? if you have used a prescaler on the input pulse train, and you don't have a way to read your prescaler, the last 8 bits of your frequency reading are basically not reliable - you will always undercount the pulses and the amount of undercounting is not knowable to the code.


millwood wrote:
maybe you should think about configuring tmr0/oscillator so that it yields a number that's more "multiplification" friendly? for example, you can set your timer to overflow every 200us

200us seems very short for counting pulses, but for any friendly timing (I would say at least 100ms) I would have to set TCNT0 to a starting value in every overflow interrupt; I would lose time doing this and the result would lose accuracy.

millwood wrote:
Quote:
15*256*(1/7813Hz)

i assume that 256 is your prescaler? if you have used a prescaler on the input pulse train, and you don't have a way to read your prescaler, the last 8 bits of your frequency reading are basically not reliable - you will always undercount the pulses and the amount of undercounting is not knowable to the code.

No, timer1 has no prescaler; it counts the pulses on input T1 for a duration set by timer0.

This is the time it takes for timer0, which is 8-bit, to overflow 15 times with a 7813Hz clock:
15*256*(1/7813Hz) = 0.49148854473313...sec
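
The window arithmetic above can be checked with integer math only. The sketch below is a hedged illustration; the macro and function names are hypothetical, and it uses the 7813Hz figure from the posts.

```c
#include <stdint.h>

/* Values taken from the thread. Note that 8 MHz / 1024 is exactly
   7812.5 Hz; 7813 Hz is the rounded figure used in the posts.
   All names here are hypothetical, chosen for illustration. */
#define TIMER0_CLOCK_HZ 7813UL
#define TIMER0_COUNTS   256UL   /* 8-bit timer: counts per overflow */
#define GATE_OVERFLOWS  15UL

/* Timer0 ticks in one measurement window: 15 * 256 = 3840. */
static uint32_t gate_ticks(void)
{
    return GATE_OVERFLOWS * TIMER0_COUNTS;
}

/* Window length in whole microseconds, truncated.
   3840 * 1000000 = 3.84e9 still fits in an unsigned 32-bit value. */
static uint32_t gate_microseconds(void)
{
    return (uint32_t)(gate_ticks() * 1000000UL / TIMER0_CLOCK_HZ);
}
```

The ratio of the two constants, 7813/3840, is the same 2.034635417 factor used in the original expression.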

Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


Quote:
clock 7813Hz

the clock for the mcu is 7813Hz? how are you going to count those MHz pulses?


The MCU clock is 8MHz.
Timer0 uses a 1024 prescaler, so it runs at about 7813Hz (8000000Hz/1024 = 7812.5Hz).

Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


then reconfigure tmr0 so it doesn't come up with such an odd number.

how about 5000 interrupts * 200us/interrupt = 1s?

in this case, it makes more sense to use the 8-bit timer/counter (tmr0) as counter of the input pulses and use the 16-bit timer (tmr1) as the timer to gate tmr0.


Y'all are far more sophisticated than the Old Bit Pusher. I'd just take my TCNT1 every 1/8 or 1/10 second, and park the sample into an 8 (or 10) place circular buffer. When I wanted to display or further process, I'd add them up. For 500kHz, adding them up does indeed cost a few 32-bit adds. Nary a float or conversion factor in sight. "Partials" aren't lost--they end up in the next sample.

I'd have to do some thinking about the right way to clear TCNT1. For that, your overflow counting has an advantage, as long as you handle the "just overflowed" case.
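
The circular-buffer idea could look roughly like the sketch below. It is an illustration under assumptions, not the poster's code: all type and function names are hypothetical, and it sidesteps the "how to clear TCNT1" question by never clearing the counter and taking the difference between successive reads instead, which is a swapped-in technique rather than something described in the post.

```c
#include <stdint.h>

#define SLOTS 8u   /* one slot per 1/8 s sample (assumed sample rate) */

/* Hypothetical ring of the most recent per-interval pulse counts. */
typedef struct {
    uint16_t sample[SLOTS];
    uint8_t  next;            /* index of the slot to overwrite next */
} sample_ring_t;

/* Pulses since the previous read of a free-running 16-bit counter.
   Unsigned 16-bit subtraction gives the right answer across one
   wrap-around, so the counter never has to be cleared. */
static uint16_t delta16(uint16_t now, uint16_t prev)
{
    return (uint16_t)(now - prev);
}

/* Store the newest interval count, overwriting the oldest one. */
static void ring_put(sample_ring_t *r, uint16_t count)
{
    r->sample[r->next] = count;
    r->next = (uint8_t)((r->next + 1u) % SLOTS);
}

/* Sum of all slots: pulses seen over the last SLOTS intervals.
   With 8 slots of 1/8 s each this is the frequency in Hz. */
static uint32_t ring_sum(const sample_ring_t *r)
{
    uint32_t sum = 0;
    for (uint8_t i = 0; i < SLOTS; i++) {
        sum += r->sample[i];
    }
    return sum;
}
```

At 500kHz each 1/8 s interval holds 62500 pulses, which still fits in 16 bits; only the sum needs 32-bit adds.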

For ICP, I got "close enough" for the app with 20MHz AVR and 50ns ticks. That gave me ~40 ticks at my max signal rate.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


millwood wrote:
in this case, it makes more sense to use the 8-bit timer/counter (tmr0) as counter of the input pulses and use the 16-bit timer (tmr1) as the timer to gate tmr0.

Using a prescaler of 64, the timer clock becomes 125000Hz.
1/125000Hz = 8us per tick.

Switching the timers is a good idea.
If I use timer1, then 62500*8us = 500000us = 0.5sec.
I only have to set the starting value of the timer once, before I start the measurement; then I multiply the result by 2 to get the frequency.

Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


Quote:
For ICP, I got "close enough" for the app with 20MHz AVR and 50ns ticks. That gave me ~40 ticks at my max signal rate.

it is more an issue of whether the pins can follow the pulse train at that kind of frequency.

Quote:
8us per tick

I couldn't quite follow the math here, but you want the tmr that controls the duration of the measurement (originally tmr0 in your design and now tmr1 in my proposal) to generate as few interrupts as possible so the mcu doesn't miss any pulses.

that means you want tmr1 (in my proposal) to be set to trip after as long an interval as possible, without generating a computational issue for you later.


millwood wrote:

Quote:
8us per tick

I couldn't quite follow the math here, but you want the tmr that controls the duration of the measurement (originally tmr0 in your design and now tmr1 in my proposal) to generate as few interrupts as possible so the mcu doesn't miss any pulses.

that means you want tmr1 (in my proposal) to be set to trip after as long an interval as possible, without generating a computational issue for you later.

The MCU core uses an 8MHz crystal.
I have set timer1 to use a prescaler of 64, so the timer1 clock is 8000000Hz/64 = 125000Hz.

The period at 125000Hz is 1/125000Hz = 0.000008sec = 8us.
Timer1 increments its counter (TCNT1) every 8us,
so for a duration of 0.5sec (500000us) I need 500000us/8us = 62500 ticks.

So if I start timer1 from 65536-62500 = 3036 (TCNT1=3036), exactly 0.5sec will have passed when I get the overflow interrupt.

During that 0.5sec I will be counting the overflow interrupts of timer0, which uses the T0 input as its clock (falling edges of the signal I want to measure).
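
The preload calculation above can be written as a small helper, so the start value follows from the clock, prescaler and gate time instead of being a magic number. This is a minimal sketch; `timer1_preload` and its parameter names are hypothetical.

```c
#include <stdint.h>

/* Start value for a 16-bit up-counting timer so that it overflows
   after gate_ms milliseconds. Hypothetical helper for illustration.
   The caller must pick values for which the tick count lands in
   the range 1..65536; nothing is checked here. */
static uint16_t timer1_preload(uint32_t f_cpu_hz,
                               uint32_t prescaler,
                               uint32_t gate_ms)
{
    uint32_t timer_hz = f_cpu_hz / prescaler;        /* 8000000/64 = 125000 */
    uint32_t ticks    = timer_hz * gate_ms / 1000UL; /* 125000*500/1000 = 62500 */
    return (uint16_t)(65536UL - ticks);              /* 65536 - 62500 = 3036 */
}
```

With the numbers from the post, `timer1_preload(8000000UL, 64UL, 500UL)` yields the 3036 that is written to TCNT1 before the measurement starts.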

Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


Quote:

But my (timer1_ovf*65536)+TCNT1 will be up to 250000, why do you calculate using low timer values of 10000?
In my example above I showed that a number like 0.000004 will become an integer.
Using my original factor of 2.034635417, the digits up to 2.034635 are significant.

250000*2.034635 = 508658
250000*2.03463 = 508657


Your end result is +/-2 anyway (since your base is an integer count +/-1 and you are multiplying by ~2). From the table below, I used Excel to put in your fancy multiplier, as well as the small ratio numbers presented above. A reported value of (say) 250001 from the table results in a frequency +/-2 from 508660. The "real" value is somewhere between 508658 and 508662. Any more displayed precision is simply superfluous.

Count     x 2.034635417   x 470/231     x 7813/3840

10000     20346.35417     20346.32035   20346.35417
10001     20348.38881     20348.35498   20348.3888
10002     20350.42344     20350.38961   20350.42344

250000    508658.8543     508658.0087   508658.8542
250001    508660.8889     508660.0433   508660.8888
250002    508662.9235     508662.0779   508662.9234
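
The ratio columns in the table suggest that the scaling can be done in 32-bit integer math, without uint64_t or float. The sketch below is one way to do that, assuming the count never exceeds the 250000 mentioned in the thread; `scale_count` is a hypothetical name.

```c
#include <stdint.h>

/* Scale a raw pulse count by the ratio num/den using 32-bit math.
   The result is truncated toward zero. Hypothetical helper.
   The product count * num must fit in 32 bits: with num = 7813
   that holds for counts up to 549720, which covers the 250000
   maximum mentioned in the thread. */
static uint32_t scale_count(uint32_t count, uint32_t num, uint32_t den)
{
    return (count * num) / den;
}
```

`scale_count(count, 7813, 3840)` reproduces the last column of the table to the nearest integer below, and `scale_count(count, 470, 231)` reproduces the small-ratio column.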

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Reversing the timers gave great results.
I start timer1 from 3036 and stop the measurement when it raises the overflow interrupt; I count the incoming pulses with the overflow interrupts of timer0 (input T0).
This way I get an exact frequency measurement even at 2MHz.
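
With a 0.5sec gate, turning the timer0 readings into a frequency needs only a shift, an add and a doubling. The sketch below assumes the overflow count is kept in a variable such as `timer0_ovf` and that the leftover TCNT0 value is read when the gate closes; both names and the function itself are hypothetical.

```c
#include <stdint.h>

/* Frequency in Hz from a 0.5 s gate.
   timer0_ovf: number of timer0 overflows seen during the gate
               (each overflow is 256 input pulses).
   tcnt0:      value left in the 8-bit counter when the gate closed.
   Hypothetical helper; the result has a resolution of 2 Hz because
   the half-second count is doubled. */
static uint32_t frequency_hz(uint32_t timer0_ovf, uint8_t tcnt0)
{
    uint32_t pulses = (timer0_ovf << 8) + tcnt0;  /* ovf * 256 + remainder */
    return pulses * 2UL;                          /* 0.5 s gate -> per second */
}
```

For a 2MHz input, one million pulses arrive in the gate, i.e. 3906 overflows plus 64 counts, and `frequency_hz(3906, 64)` returns 2000000.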

Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown


Quote:
This way I get an exact frequency measurement even at 2MHz.

would be interesting if you could push it over 2MHz. that was the barrier I had with PICs (2.5MHz to be exact).

how fast is your cpu running?


The CPU clock is 8MHz.
With a 5000000Hz input, the results were 4999998-5000002Hz.

With a 6000000Hz input, the results were 6000022-6000026Hz.

With a 7000000Hz input, the result was about 6999958Hz.

All of this is from simulation in Proteus, but from experience with previous projects the simulation results reflect the actual circuit behavior.

Alex

"For every effect there is a root cause. Find and address the root cause rather than try to fix the effect, as there is no end to the latter."
Author Unknown