Time taken for a mathematical operation


Hi!!
I have a question about the time taken to execute a math operation such as sine.
Suppose I use code like:
// the code
y = sin(x);   // where y and x are of type float

How many clock cycles will it take if I set the clock to 1 MHz?
I am using AVR GCC to program the micro.
I am a newbie and will be very glad of your help!


Use the simulator.

Leon Heller G1HSM


leon_heller wrote:
Use the simulator.

He has to write the code before he can use the simulator.

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius


Thanks for the reply!
simulator in the sense-AVR studio?


Of course!


I know the search on this forum is a bit poor, but somewhere there are some numbers on how fast the different C compilers are for this kind of thing.
I think sin(x) takes about 1000-2000 clk (on a mega AVR, and way slower on a tiny).
Be aware that the speed depends on the value of x.


Thanks for the help!!


The fact is that the sin() algorithm is not a trivial one. Lookup tables are not used, so the sine must be calculated somehow. One solution is summing the Taylor series for sin(x), with x in radians:
sin(x) = x - (x^3)/3! + (x^5)/5! - (x^7)/7! + ... and so forth until the required accuracy is reached. Other trig functions can be calculated from the trigonometric identities. Sorry, no magic: it takes time!
To find out how long it takes, just use the C maths library and do the simulation in AVR Studio as suggested.
If you only need a few sine values, or less accuracy, you could use a lookup table followed by some linear interpolation.

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?


In this:

#include <avr/io.h>
#include <math.h>

float f;

int main(void) {
   /* note: PINB / 180 is integer division, so the argument is 0 or 3.1415926 */
   f = sin(PINB / 180 * 3.1415926);
}

The call to sin() alone (not the parameter setup) takes 1012 cycles.

I had to use a complicated parameter because even if I use:

   f = sin(45.0 / 180 * 3.1415926);

the GCC optimiser is so smart it calculates the entire thing at compile time and just generates code to put the fixed result into 'f'.


What math lib and compiler settings are you using?
With -Os and the standard lib (I never use float) the sine takes about 24000 clk!
(AS4 Simulator 2 with a mega128 project.)


Somehow I am always thinking about code optimization during development. It is a trade-off: if you need speed you lose space (hence a lookup table), and vice versa. If the lookup table would be bigger than the trigonometry algorithm, don't use a lookup table; if you still need more processing speed, you can choose a faster controller.

KISS - Keep It Simple Stupid!


If your application has strict timing requirements, beware that some implementations of math routines have non-deterministic timing: there might be inputs where the execution time is much, much longer than normal.

An article on DIY trig functions:
http://www.ganssle.com/articles/atrig.htm


The real choice here is whether to use float at all: sin() adds little if float multiplication is already used, but the program clawson showed is about 1600 bytes!
Here are some numbers:
sin(0)       33 clk
sin(0.1)   3974 clk
sin(1)     4135 clk
sin(2)     4176 clk
sin(3)     3962 clk
sin(100)   4605 clk

edit:
Now I know why the numbers are high: sin() (that is, the float multiply) doesn't use the hardware multiplier, so these are the numbers for a tiny AVR!
I will find out why!


Quote:
the GCC optimiser is so smart it calculates the entire thing at compile time

Is it? How does it know that sin() is a trigonometric function, and how does it know the code for it, if the sin() code (definitely float*float) was written for the AVR in assembler?

It seems to me AVR GCC needs to have two codes for sin() to do that. The first one is an AVR implementation (mul, mulsu and such) and the second one is a host implementation to calculate sin(pi/34) and substitute a constant value into the AVR's hex.

Or perhaps it has only one implementation and the compiler itself loops through AVR asm code calculating/simulating it?

By the way, I know it is off-forum, but it is on-topic.

No RSTDISBL, no fun!


Quote:
It seems to me AVR GCC needs to have two codes for sin() to do that. The first one is an AVR implementation (mul, mulsu and such) and the second one is a host implementation to calculate sin(pi/34) and substitute a constant value into the AVR's hex.

Yes, why not? It does that with other numeric computations, so why not with trig?

Quote:
How does it know the sin() is a trigonometric function

By looking at its name?

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]


vageesh wrote:
how many clock cycles will it take if i set the clock as 1 MHz?
... the same number as for all clock frequencies ...

Ross McKenzie ValuSoft Melbourne Australia


Quote:

and the standard lib

BIG MISTAKE! You must link libm.a (-lm) with avr-gcc if you use <math.h>!
Quote:
but the program clawson show is about 1600 byte

Not when libm.a is used:

Size after:
AVR Memory Usage
----------------
Device: atmega168

Program:    1202 bytes (7.3% Full)
(.text + .data + .bootloader)

Quote:
It seems to me AVR GCC needs to have two codes for sin() to do that. The first one is an AVR implementation (mul, mulsu and such) and the second one is a host implementation to calculate sin(pi/34) and substitute a constant value into the AVR's hex.

It clearly does:

00000090 <main>:
#include <math.h>

float f;

int main(void) {
   f = sin(45.0 / 180 * 3.1415926);
  90:   83 ef         ldi   r24, 0xF3   ; 243
  92:   94 e0         ldi   r25, 0x04   ; 4
  94:   a5 e3         ldi   r26, 0x35   ; 53
  96:   bf e3         ldi   r27, 0x3F   ; 63
  98:   80 93 00 01   sts   0x0100, r24
  9c:   90 93 01 01   sts   0x0101, r25
  a0:   a0 93 02 01   sts   0x0102, r26
  a4:   b0 93 03 01   sts   0x0103, r27

Obviously the sin() in this case is done by the compiler itself, the 8086 program using 8086 library code; that leads to a numeric constant, and the AVR code generator then just generates code to load this constant.
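The four ldi immediates in the listing are the bytes of an IEEE 754 single-precision float, least significant byte in r24, so the stored constant is 0x3F3504F3. A small host-side sketch to decode it, assuming the host also uses IEEE 754 floats (the function name is hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Rebuild the float that the four ldi instructions load into r24..r27.
   r24 holds the least significant byte, r27 the most significant. */
float float_from_avr_regs(uint8_t r24, uint8_t r25, uint8_t r26, uint8_t r27)
{
    uint32_t bits = (uint32_t)r24
                  | ((uint32_t)r25 << 8)
                  | ((uint32_t)r26 << 16)
                  | ((uint32_t)r27 << 24);
    float value;
    memcpy(&value, &bits, sizeof value);   /* reinterpret the bit pattern */
    return value;
}
```

float_from_avr_regs(0xF3, 0x04, 0x35, 0x3F) comes out at about 0.7071068, which is sin(pi/4): the call was evaluated at compile time and only the result reached the AVR code.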

Last Edited: Sat. Jul 2, 2011 - 02:58 PM

Several decades ago I wondered how long a routine took. My solution was to call the routine 10 times in a loop. I figure that divides the loop overhead by 10. Then you run the loop 10,000 or 100,000 times and you can time it with your watch. I assume you are a Young Guy and you might not have ever seen one of those obsolete gizmos, but engineers over 50 used to all have one on their wrist at all times during the day to see the time at a glance.
OK, I looked up my AVR results... at 18.432MHz, I get 60,000 fp mults or adds a sec and 2000 sins a sec. Scale that to your clk rate. I'll post the prog if anyone wants to see it.

Imagecraft compiler user


Quote:

and 2000 sins a sec

That is 9,216 cycles. GCC did the call to sin() in the following in 1,629:

#include <avr/io.h>
#include <math.h>

float f;

int main(void) {
   PORTB = 39;
   f = sin(PORTB / 180.0 * 3.1415926);
}

(again I had to use PORTB to prevent the aggressive optimiser)

(Did I mention the other day I had some reservations about ICC?)


Except I don't believe any extrapolated results from a simulator. If you want to run a couple 1000 sins and actually measure the time, I'll believe that. I didn't know you had to reserve a copy of the Imagecraft compiler. I always got my copy right when I ordered it. Maybe because I'm such a Good Customer?


Quote:

Except I don't believe any extrapolated results from a simulator. If you want to run a couple 1000 sins

I'd trust the simulator in AVR Studio 4 in this respect. I've never seen it mis-calculate clock cycles.

I agree that we need to run many calls to sin, mainly because of what Jayjay argued: some trig implementations have varying execution times depending on the argument passed. It just might be that 39 degrees is a case that suits avr-libc extremely well.

If we want to measure the time for the trig only, and not get other stuff (i.e. the test bench involved), then the simulator is an excellent instrument. Running a test e.g. 10K times on a real chip and measuring by looking at a LED and your wristwatch makes it very hard to discern between the time for the 10K loop and the trig call proper. And the discussion will

How about whole degrees, every degree, from 0 to 180? Span of time-measuring is from point of call to point of return? Most meaningful unit to report results in is clock cycles? Data is a table with two columns: Degree value and clock cycles?

I'll smack it into Excel or OO Calc and make a diagram.


Bob,

This took 13s on a mega16 in an STK500 clocked using the STK500 at 3.6864MHz:

#include <avr/io.h>
#include <math.h>

float f;

int main(void) {
   uint16_t i;
   DDRC = 0xFF;
   for (i=0; i<20000; i++) {
     PORTB = i;
     f = sin(PORTB / 180.0 * 3.1415926);
   }
   PORTC = 0xFF;
   while(1);
} 

(the PORTC lights went out at the end of the period). Normalised for 1MHz that would be 47.9s. At 18.432MHz it would have been 2.6s

Note that I'm involving PORTB to force the sin() to be called. It also means that the /180 and *3.1415926 are being done each time too (or rather that will be one division by 57.3 each time), before the sin(). I used the iterator as the input value to get over the "39 happens to be a good value" effect (if present).

EDIT: thinking about it I don't need PORTB or even the conversion from degrees to radians. This works:

#include <avr/io.h>
#include <math.h>

float f;

int main(void) {
   uint16_t i;
   DDRC = 0xFF;
   for (i=0; i<20000; i++) {
     f = sin(i);
   }
   PORTC = 0xFF;
   while(1);
} 

That completes in 6.1s. So 22.48s at 1MHz or 1.22s at your 18.432MHz
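The normalisation above is just cycles = seconds x clock frequency. A small sketch of that arithmetic (the helper names are hypothetical), which also turns a calls-per-second figure, such as the 2000 sins a sec quoted earlier at 18.432MHz, into cycles per call:

```c
/* Hypothetical helpers for turning wall-clock measurements into CPU cycles,
   which makes results independent of the crystal frequency. */

/* Cycles spent per loop iteration, given elapsed seconds, clock in Hz and loop count. */
double cycles_per_iteration(double seconds, double f_cpu_hz, unsigned long iterations)
{
    return seconds * f_cpu_hz / (double)iterations;
}

/* Cycles per call from a "calls per second" rate measured at f_cpu_hz. */
double cycles_from_rate(double calls_per_second, double f_cpu_hz)
{
    return f_cpu_hz / calls_per_second;
}

/* Time the same number of cycles would take at a different clock. */
double rescale_seconds(double seconds, double from_hz, double to_hz)
{
    return seconds * from_hz / to_hz;
}
```

For the 6.1s run, cycles_per_iteration(6.1, 3686400.0, 20000) gives about 1124 cycles per pass, and that figure includes the loop and the integer-to-float conversion as well as sin(). cycles_from_rate(2000.0, 18432000.0) gives the 9,216 cycles mentioned above.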


I put the call 10 times in the loop to divide out the loop overhead. Your result was 819 per sec. Did I quote 2000 per sec? Ready to reserve your own copy? Nyuk Nyuk.

Last Edited: Sat. Jul 2, 2011 - 03:46 PM

Quote:

I put the call 10 times in the loop to divide out the loop overhead.


Pointless: it's about 10 cycles amongst 1000 or more.

EDIT but OK I tried:

   for (i=0; i<20000; i++) {
     f = sin(i);
     f = sin(i);
     f = sin(i);
     f = sin(i);
     f = sin(i);
     f = sin(i);
     f = sin(i);
     f = sin(i);
     f = sin(i);
     f = sin(i);
   }

Sadly the overall time was the same. I'm afraid it's a constant battle against the GCC optimiser: it obviously recognises there's no point doing it 10 times, as the result is only used once. Perhaps if I make 'f' volatile - but that's going to introduce more overhead than the loop iteration...


Here's the version that times the loop overhead, then subtracts it out after adding the sins. I use tabs=2 and it lines up nice.


I tried this:

   for (i=0; i<20000; i++) {
	asm(""::);
   }

(the central statement generates no code but leads to the loop being generated).

Sadly the led was on for so short a time I couldn't measure it - a fraction of a second. So I tried:

   uint32_t i;
   for (i=0; i<2000000; i++) {
	asm(""::);
   }

That took 6 seconds. So the loop overhead previously was 1/100th of this, that is 0.06s; in fact the above uses the less efficient uint32_t, so the original uint16_t loop is quicker still. Like I said, trivial.


Quote:

Except I don't believe any extrapolated results from a simulator.

Bob, we've gone round and round on this. With real numbers. Comparing compilers. ImageCraft did not come out well. When the same topic was revisited some months later, your results magically got 10x faster. When challenged on this, there was no reply.

Now, this was some years back, and there is a note from ImageCraft on a FP re-do. But read all of both threads and then, Bob, tell us which steroids you have been feeding your compiler.

https://www.avrfreaks.net/index.p...
https://www.avrfreaks.net/index.p...

IIRC "fpbench" was re-visited again, with GCC numbers as well. Searching... Can't seem to find the thread I was looking for, but this one has links to a lot of prior discussions:
https://www.avrfreaks.net/index.p...

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Meta-meta: I find it interesting that you determine the loop overhead for an empty loop and then assume that the compiler will produce the same loop code (or at least the same overhead) when you have stuff in the loop.

Did you verify this (e.g. by looking at the generated code)?


Yeah But LT... doesn't it make sense to run the tests every decade or so to keep the compiler writers honest? The Real Interesting thing I remember about the argument was that a bunch of y=mx+b calcs using longs was 6x faster than the same calcs using floats on the AVR, but only 2x faster on my android (ARM so it has 32 bit regs but no hw fp)


list file for above c file I think....


So, with so much time taken by the computation, is the following idea OK?
Create an array with the values of the sine function over a single cycle at a certain resolution, and use those values with some interpolation.
For me memory is not an issue.


Read the posts. It depends on how fast you need it.

If your design is a solar tracker and you need one sin(x) every 10 minutes, then you don't need a LUT. Just use sin(x) and do not worry about the time.

But if you tried your algorithm in a simulator and from it you get AVR must be clocked at 321MHz, then you can consider a LUT (or better an ARM with FPU).


Well, I've taken Bob's rat's nest of code, scattered with literal constants, use of compiler-specific functionality and assumptions about processor speed, and tried to make it work with GCC, but failed. Hopefully someone else has more patience than I. The point I was actually going to make was that my compiler would throw away most of the pointless code in the loops, so I would get near-zero timing on many of the loops. But I'm afraid I can't make it that far.

Does anyone have a sensible benchmark that is cross-platform we could all try?


Quote:

Yeah But LT... doesn't it make sense to run the tests every decade or so to keep the compiler writers honest?

Certainly. But I still want to know why your code mysteriously got 2x faster. You never chose to address that. You choose not to address it now. Which number that you are spewing, based on your watch rather than counting cycles, are you touting now?
Quote:

OK, I looked up my AVR reaults... at 18.432MHz, I get 60,000 fp mults or adds a sec and 2000 sins a sec. Divide that by your clk rate. I'll post the prog if anyone wants to see it.

29-July-2004 https://www.avrfreaks.net/index.p... You claimed 985 sin()/second. The thread was nebulous about the AVR clock speed, but the code indicates 14.7MHz. That would make about 1231 at 18.4MHz.

17-October-2006 https://www.avrfreaks.net/index.p... You claimed 2023 sin()/second. Twice that of the previous run, but an unspecified clock speed.

So the steroids have made your test complete in half the time. (If you really do care

Quote:

to run the tests every decade or so to keep the compiler writers honest

then why haven't you given current numbers rather than going back to the numbers from the 2006 run?

Note that if you re-run and post the test results again, I'm guessing the numbers are still going to be a fraction of those for CV, unless ImageCraft did indeed rework FP primitives and library. I suspect that is why you don't want to take this up. But in that case, the dare/challenge is confusing.

Yes, Bob, rule-of-thumb guidelines as well as "rough mental math" to check sanity is very useful. But don't come and spew supposedly hard numbers without something to back it up and then post inconsistent results.

Lee


The code is written for a mega32 isn't it - how can you run that at 18.4MHz anyway?

(Personally I'm happy running benchmarks at 3.6864MHz just because it is baud friendly - at the end of the day they have to be normalised to 1MHz anyway, or perhaps better, simply converted to CPU cycles - you can't put a double speed crystal on the AVR and suddenly claim my compiler is twice as efficient you know!)

Perhaps you needed to use a high F_CPU to reduce the timer interrupt overhead?

Do we all even agree that the following from Bob's code (ignore the wacky timing mechanisms) are the valid things to test anyway?

  while(c != 'q'){
    n=1000;
    printf("\nbegin %d overhead loops\n",n);
  	t1=gettics();
    for(i=0; i1 meg)\n",n);
	  t1=gettics();
    for(i=0; i < n; i++){
	    memcpy(buf2,buf1,sizeof(buf1)); //took 5 sec for 10 megs... about 2 megs/sec
    }
	  t2=gettics();
    dt=t2-t1; //5usec tics
	  secs=dt*secpertic;
	  flops=n/(8*secs); //K/sec
//  	dumpstack();
    printf("%lu tics  %#8.6f secs %#7.1f K/sec\n",dt,secs,flops);

If so then forget seconds - we need a solution that will work across compilers to count cycles in each.

(For me the obvious solution is simulator 2 in AS4 - YMMV)


Itchy trigger finger Cliff. The upcoming spitfire flight is getting to you!


Quote:

If so then forget seconds - we need a solution that will work across compilers to count cycles in each.

(For me the obvious solution is simulator 2 in AS4 - YMMV)


Quote:

Itchy trigger finger Cliff. The upcoming spitfire flight is getting to you!

As that is (probably) what I used in the past go-round, I'll oblige. Now, Cliff--does Bob's Dinosaur-Of_Choice Mega32 have Sim2 support?

So, we need to agree on a target AVR model. I don't THINK it will make any difference Mega128-Mega32-Mega8-Mega88-Mega324-Mega640, but let's pick one.

And since Cliff and I want to abstract from AVR clock rate, let's abstract to operations per second per Megahertz. That means with simulator ticks choosing a 1MHz clock means the simulator gives us our numbers almost directly.

So, Cliff, pick an AVR model supported by Sim2. IIRC the latest 'Studio I have loaded is 4.18 but I'd be happy to upgrade if there is a hue and cry.

Now, Bob doesn't care to acknowledge references to what he has posted/claimed in the past, so I'll re-post my adapted FPBENCH that I used before. It is similar to your approach, Cliff, except I have removed all the printf() stuff to make more friendly for an arbitrary toolchain setup. I used the simulator ticks from breakpoint to breakpoint. (You'll probably need to make "c" volatile so it will not disappear.)

//file fpbench.c
//test avr flops
//Mar 4 2003 Bob G (bobgardner@aol.com) compile w iccavr 6.27
//Jan 22 04 Bob G compile with iccavr 6.30
//Feb 4 04 Bob G add timer
//July 29 04 Bob G add y=mx+b

// Ripped up by Lee Theusch  29 July 2004 CV version, AVRStudio
// Assume 14.745600MHz crystal

/*
			Ms.		Net		Ops/sec
Overhead	36.69	0		---
FP Add		44.18	7.49	133511
FP Mult		61.78	25.09	39857
FP Mult1	60.74	24.05	41580
FP Mult0	39.81	3.12	320513
FP Div		70.49	33.8	29586
FP sin()	318.88	282.19	3544
FP log()	299.26	262.57	3809
LONG y=mx*b	11.94	-24.75	-40404	83752
FP y=mx+b	70.22	33.53	29824
BLOCK Move	559.42	522.73	1913
*/

#include 
#include 
#include 

#define INTR_OFF() asm("CLI")
#define INTR_ON()  asm("SEI")


//-----globals----------
unsigned int tofs;
unsigned long t1,t2;
unsigned long dt,ohdt,net;
int n,i,j;
float x,y,z,m,b;
long ix,iy,iz,im,ib;
float sec,flops;
char buf1[1024],buf2[1024];

//----------------
void main(void){
//fpbench main program
char c;

  c=0;
  n=1000;

	c = 1; // a spot to set a breakpoint
// Overhead Loop
    for(i=0; i

Results are in the linked thread. I will repeat, but it is a holiday weekend here for us Yanks and my remote-control appears to be hung up.

Lee


clawson wrote:
Does anyone have a sensible benchmark that is cross-platform we could all try?
CoreMark
Recent Atmel additions to its results database are a UC3A3 and AP7000.
Interesting results for Freescale Kinetis Cortex-M4.
IIRC, XMEGA may have some cycle count improvements;
if so, someone running CoreMark on a XMEGA A-series would put it on CoreMark's map.
But, CoreMark does not directly answer the OP's questions.

"Dare to be naïve." - Buckminster Fuller


Quote:

I will repeat,...

Bob sez it is a "good thing" to repeat his FPBENCH every decade.

Good thing too--CV has apparently improved FP multiplication (though the overhead loop lost some cycles). Below is the source file, and two sets of results for different compiler versions.

AVRstudio 4.18. SIM2. Mega128 target. (Mega644 has same results.)

Since these are hard numbers from a repeatable test, Bob will remain strangely silent until he pulls his next rule-of-thumb number out of the air, inconsistent with any he had given before.

Lee

Loop Repetitions
1000                    CodeVision 1.25.9                       CodeVision 2.04.5

                GROSS   NET     CYCLES/ OPS/            GROSS   NET     CYCLES/ OPS/
TEST            CYCLES  CYCLES  OP      MHz             CYCLES  CYCLES  OP      MHz
OVERHEAD        535006                                  555006
ADD             643475  108469  108     9219            671475  116469  116     8586
MULT           1367988  832982  833     1201            752512  197506  198     5063
MULT1           729859  194853  195     5132            749859  194853  195     5132
MULT0           581006   46000   46    21739            601006   46000   46    21739
DIV            1289682  754676  755     1325           1317682  762676  763     1311
SIN            3699126 3164120 3164      316           3783126 3228120 3228      310
LOG            4119284 3584278 3584      279           4099324 3544318 3544      282
SLOPE (long)    170022 -364984  170     5882            170022 -384984  170     5882
SLOPE (float)   849968  314962  315     3175            868966  313960  314     3185
#include 
#include 
#include 

#define INTR_OFF() asm("CLI")
#define INTR_ON()  asm("SEI")


//-----globals----------
float x,y,z,m,b;
volatile char c;
long ix,iy,im,ib;
char buf1[1024],buf2[1024];

//----------------
void main(void){
//fpbench main program
int n,i,j;

  c=0;
  n=1000;

	c = 1; // a spot to set a breakpoint
// Overhead Loop
    for(i=0; i
//cliff added this to widen the thread so the table looks right
=========================================================================================


Sheesh. It wasn't a conspiracy. I ran the test with a timer tic enabled, and Holy Smokes! If I run the timer tic real fast, my benchmark slows down. My intention was to measure the execution time of the test loop, subtract the overhead of the for loop and assignments, and printf the results. If I run it with fpbench >foo I scoop the results off to a file. The reason it looks like I kept cuttin and pastin extra cases on the end, was that I was cuttin and pastin cases on the end. I have run it on a 16MHz mega32 and on my hotwired MT128 where I changed the xtal to 18.432. I think the general results are: fp mult and add have cost 'X', divs are slower, sins and logs are slower. I was curious about whether 'short cycling' a mult by 0 or by 1 helped. I ran this exact same bench on all my PCs since the XT, and it wasn't till the Pentium that fp mult by 0 and 1 got speeded up.

I post a program that calculates ops per sec, and you guys change it to ops per mhz? Cliff says my program is a rat's nest, full of literal constants and compiler dependencies. Man this is a tough crowd. If he had posted a file full of stuff like __PROGMEM__ and I gave a big rant about compiler dependent stuff, the shoe would be on the other foot. Do you object to running the timer to collect the timing info? Could I make you happier by adding an ops/mhz result to my ops per sec results?

OK, here's my results from my 16MHz mega32, using 100usec interrupt.

Last Edited: Tue. Jul 5, 2011 - 07:37 PM

Quote:

The reason it looks like I kept cuttin and pastin extra cases on the end, was that I was cuttin and pastin cases on the end.

That's fine.

You still have never answered how your numbers magically got twice as fast from one post to the next.

You've responded with your "sheesh", but no numbers. As a Wise Sage opined:

Quote:
Yeah But LT... doesn't it make sense to run the tests every decade or so to keep the compiler writers honest?


I'm getting 40 flops per ms at 16mhz, that extrapolates to 46 flops per ms at 18.432. I have a theory that if I reduce the timer tic freq I can get faster results. But in the Real World, the timer will be ticking, so maybe that 40 flops per ms is the new revised Real and True BobG Rule of Thumb for fp overhead.


bobgardner wrote:

so maybe that 40 flops per ms is the new revised Real and True BobG Rule of Thumb for fp overhead.

"Rule of Thumb"? ARM? :lol:

Try creative approaches specific to the 8-bit microcontroller. I suggest using tables and putting the tables into a serial EEPROM. Then use TWI to access the sine table for an angle from the table in EEPROM. EEPROMs for this application cost only about $1 USD.

Try avoiding float, and assume that the 32-bit result needs to be shifted to make it less than one, as a sine value is. Try using sines derived from angles based on a circle that has 256 degrees instead of 360. These equations might then be done by creative bit shifting instead of float routine calls.

Don't use precision that is beyond the real-world needs of the specific application, also.


Quote:

Try creative approaches specific to the 8-bit microcontroller. I suggest using tables and putting the tables into a serial EEPROM. Then use TWI to access the sine table for an angle from the table in EEPROM. EEPROMs for this application cost only about $1 USD.


Gotta be slower. Now, it might be a solution for the OP, but I'm in the middle of Bob-Bashing (tm).

Taking Bob's latest results, which are from ImageCraft version 7.23, and normalizing to OPS/MHz (by dividing the FLOPS number by 16 'cause the tests were run at 16MHz):

TEST     CodeVision 2.04.5    ImageCraft ver. 7.23
         OPS/MHz              OPS/MHz

ADD       8586                2693
MULT      5063                2765
MULT1     5132                2500
MULT0    21739                5208
DIV       1311                 993
SIN        310                  97
LOG        282                 108
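To turn these figures back into the cycle counts the original question asked about: OPS/MHz is operations per second divided by the clock in MHz, so cycles per operation is simply 1,000,000 divided by OPS/MHz. A small set of helpers, assuming the normalization used in the table above (the function names are illustrative):

```c
/* Normalize a measured rate to operations per second per MHz of CPU clock. */
double ops_per_mhz(double ops_per_sec, double f_cpu_mhz)
{
    return ops_per_sec / f_cpu_mhz;
}

/* CPU cycles spent on one operation, given an OPS/MHz figure. */
double cycles_per_op(double ops_per_mhz_value)
{
    return 1e6 / ops_per_mhz_value;
}

/* Wall-clock time of one operation in microseconds at a given clock. */
double us_per_op(double ops_per_mhz_value, double f_cpu_mhz)
{
    return cycles_per_op(ops_per_mhz_value) / f_cpu_mhz;
}
```

For example, the CodeVision SIN figure of 310 works out to roughly 3,200 cycles per call, about 3.2 ms at a 1 MHz clock, while the ImageCraft figure of 97 is roughly 10,300 cycles.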

GCC, IAR, Rowley anyone?

I'd speculate IAR and Rowley will beat CV in most cases. By how much? And GCC will be somewhere between the CV and ImageCraft numbers.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Tue. Jul 5, 2011 - 09:32 PM

The last V7 release was 7.23. I assume you didn't use my 'wacky timing mechanism', so my timings include the servicing of the 100 usec timer interrupt.

Imagecraft compiler user


Quote:

So my timings include the 100usec timer interrupt servicing.

Not my problem. But anyway, if your "soft timer" has a resolution of 100us, then just setting up timer1 at /1024 gives you (at 16 MHz) a 4+ second reach with a resolution of 64us. Using /256 gives a 1+ second reach and 16us resolution, which covers all but your longest tests.

So why not just use free-running timer1, and capture the TCNT value at each end of a test run? Better resolution and no interference during the test.
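The arithmetic behind those prescaler numbers, sketched so it can be checked on a PC. On the AVR the two captures would simply be reads of TCNT1 (with TCCR1B selecting the prescaler); the helper names `tick_us`, `reach_s` and `elapsed_ticks` are hypothetical.

```c
#include <stdint.h>

/* Microseconds per Timer1 tick for a given CPU clock and prescaler. */
double tick_us(double f_cpu_hz, unsigned prescaler)
{
    return prescaler * 1e6 / f_cpu_hz;
}

/* Longest interval a free-running 16-bit counter can span (65536 ticks). */
double reach_s(double f_cpu_hz, unsigned prescaler)
{
    return 65536.0 * prescaler / f_cpu_hz;
}

/* Ticks between two TCNT1 captures; unsigned 16-bit subtraction stays
   correct across a single counter wrap. */
uint16_t elapsed_ticks(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}
```

At 16 MHz this gives 64 us and about 4.19 s for /1024, and 16 us and about 1.05 s for /256. Because the subtraction only tolerates one wrap, a test run has to finish within the reach figure.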

Version edited above.

Last Edited: Tue. Jul 5, 2011 - 09:48 PM

In case I have missed the plot, can someone please post some authoritative code (or a link)?

I am happy to massage it into something portable. The important thing is that anyone can compile and run immediately with any AVR or compiler.

Once there is a bog-standard version, you can make valid comparisons.
Likewise, you can post compiler-specific tweaks and the resultant effects.
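A compiler-neutral skeleton along those lines might look like the following. It is a sketch, not an agreed benchmark: it assumes each port supplies a free-running 32-bit tick source through the hypothetical `tick_fn` hook and knows its tick rate, and `time_loop`, `net_ticks`, `ops_per_sec` and `bench_identity` are illustrative names. Volatile input and output keep the compiler from folding or discarding the work, and the same loop around a trivial function is timed so its overhead can be subtracted.

```c
#include <stdint.h>

/* Hypothetical hook each port supplies: a free-running 32-bit tick count. */
typedef uint32_t (*tick_fn)(void);
typedef float (*unary_op)(float);

volatile float bench_in = 0.5f;  /* volatile input: blocks constant folding */
volatile float bench_out;        /* volatile sink: blocks dead-code removal */

/* Trivial operation used to measure loop and call overhead. */
float bench_identity(float x) { return x; }

/* Ticks taken by n calls of op, loop and call overhead included. */
uint32_t time_loop(tick_fn ticks, unary_op op, unsigned n)
{
    uint32_t t0 = ticks();
    for (unsigned i = 0; i < n; i++)
        bench_out = op(bench_in);
    return ticks() - t0;  /* unsigned subtraction survives one wrap */
}

/* Gross loop time minus the overhead loop, never below one tick. */
uint32_t net_ticks(uint32_t gross, uint32_t overhead)
{
    return gross > overhead ? gross - overhead : 1u;
}

/* Operations per second from a net tick count and the tick rate in Hz. */
double ops_per_sec(unsigned n, uint32_t net, double tick_hz)
{
    return (double)n * tick_hz / (double)net;
}
```

To time a library routine such as sin(), wrap it in a small `float f(float)` function, time 1000 calls of it and 1000 calls of `bench_identity`, and pass both results to `net_ticks` before computing the rate.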

Meanwhile the current number bandying is fairly pointless.

I would have guessed that there is little difference between compiler libraries, e.g. ±30%.
I would also guess ImageCraft, CV, Rowley, GCC, IAR.
I doubt that anyone would notice any performance difference.

David.


Quote:

I am happy to massage it into something portable. The important thing is that anyone can compile and run immediately with any AVR or compiler.

Well, I posted what I used above. Cliff also did something very similar.

I don't know whether it can be made entirely portable, given that the standard chip-include names differ between compilers. Hmmm, maybe the chip include isn't needed unless you self-time like Bob does, but then you actually have to >>run<< on the same platform.

Quote:

Meanwhile the current number bandying is fairly pointless.

Quote:

I doubt that anyone would notice any performance difference.

David, I tend to rail against Bob's Grand Pronouncements. They come out ex cathedra, but usually when you dig down, the Emperor has no clothes.

From the earlier links, you can see that this has been going on since 2004 (on this exact topic). I gave my reasons for protesting when I posted the links.

No-one notices? Not unless you measure. The numbers above show CV at least 2x faster in most tests (as would be expected unless ImageCraft had a rewrite in the past few years).

Lee

OK, Lee. Here's what I did since we last talked this afternoon (how 'bout that Casey Anthony gettin' away with murder in Orlando, anyway?): I changed to using timer1 with ps=3 so it's counting 1us clocks, but more importantly, it's only interrupting every 65536 usecs, and my results are Real Close to yours. Guess that 100usec interrupt was really loading it down. And of course, there's nuthin' up my sleeve. I'll upload the C source if someone calls me out. It's all the same except the timer and gettics.

fpbench July 5 11 Bob Gardner

begin 1000 overhead loops
9987 tics  0.009987 secs

begin 1000 fp adds
15540 tics  5553 net  0.005553 secs 180082.9 flops 11518

begin 1000 fp mults
15413 tics  5426 net  0.005426 secs  184297.8 flops   

begin 1000 fp mults by 1
15986 tics  5999 net  0.005999 secs  166694.5 flops

begin 1000 fp mults by 0
12847 tics  2860 net  0.002860 secs 349650.4 flops 21853

begin 1000 fp divs
25081 tics  15094 net  0.015094 secs 66251.5 flops

begin 1000 fp div by 1
25788 tics  15801 net  0.015801 secs 63287.1 flops

begin 1000 sin
164625 tics  154638 net  0.154638 secs  6466.7 sins/sec 404 

begin 1000 log
148240 tics  138253 net  0.138253 secs  7233.1 logs/sec

begin 1000 sqrt
134488 tics  124501 net  0.124501 secs  8032.1 sqrts/sec

begin 1000 pow
352751 tics  342764 net  0.342764 secs  2917.5 pows/sec

begin 1000 y=mx+b using longs
3704 tics  0.003704 secs 269978.4 y=mx+b/sec

begin 1000 y=mx+b using floats
22595 tics  0.022595 secs 44257.6 y=mx+b/sec

begin 8000 1/8k block moves (1000K->1 meg)
132876 tics  0.132876 secs  7525.8 K/sec
done. any char to repeat...

Wednesday: these are bogus results because I was calculating the timer count wrong. Disregard.
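Since the problem turned out to be in calculating the timer count, here is one common way to extend a 16-bit Timer1 into a 32-bit tick count, written as a pure function so the logic can be checked off-target. It is not a reconstruction of Bob's gettics. It assumes an overflow ISR that increments a counter, and a caller that, with interrupts disabled, reads that counter, then TCNT1, then the pending overflow flag (TOV1, in TIFR1 or TIFR depending on the part). `combine_ticks` is a hypothetical name.

```c
#include <stdint.h>

/* overflows:   count kept by the Timer1 overflow ISR
   tcnt:        TCNT1, read after overflows with interrupts disabled
   ovf_pending: nonzero if the overflow flag was still set when checked */
uint32_t combine_ticks(uint16_t overflows, uint16_t tcnt, int ovf_pending)
{
    /* An overflow that happened before TCNT1 was read (tcnt is small) but
       whose ISR has not run yet must be counted here; one that happened
       after the read (tcnt near 0xFFFF) must not. */
    if (ovf_pending && tcnt < 0x8000u)
        overflows++;
    return ((uint32_t)overflows << 16) | tcnt;
}
```

The small-versus-large test on `tcnt` is only reliable if interrupts stay disabled for well under half a timer period. At one tick per microsecond the 32-bit result wraps after about 71.6 minutes.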

Last Edited: Wed. Jul 6, 2011 - 02:24 PM
