Avoiding floating point math


What are the best techniques to avoid using fp math when programming for AVR?

I usually multiply the operands by 10, work at that larger scale, and at the end add 5 and divide by 10 to round the result. As an example, 10/6 = 1.(6). If I just use the integer result I get 1, but if I do (10*10/6+5)/10 I get 2, which is a closer approximation. However, this seems highly inefficient to me, and I wonder if there's an easier way. I read in this forum that I could multiply by powers of 2 so that the divisions would be simple bit shifts, but didn't quite understand how that would work. If anyone could elaborate I'd appreciate it.

Thanks,
Rodrigo


You are talking about "rounding", e.g. 1.99 is a lot nearer 2 than 1.

Your method is fine. Put it in a macro to make it look tidier if you like:

#define ROUNDED_DIV10(a, b) ((((a) * 10) / (b) + 5) / 10)
#define ROUNDED_DIV2(a, b)  ((((a) * 2) / (b) + 1) / 2)

The second form gives the same nearest-integer rounding with the cheaper scale of 2. You can write it as shifts if you like, but the compiler will probably emit shifts anyway.
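For a power-of-2 scale the divides really do become shifts. A minimal sketch of the same rounding written with shifts (the macro name is illustrative, and unsigned operands with headroom for the shift are assumed):

#define ROUNDED_DIV_SHIFT(a, b) ((((a) << 1) / (b) + 1) >> 1)  /* (2a/b + 1)/2 */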

David.


Another approach is to use fixed point arithmetic. In your case, multiply by 10 and divide by 6 and your result will be 16; read it as 1.6, a very close approximation. If you need more precision multiply by 100, which gives you two decimal places. Of course it is more convenient to use powers of 2: after some operations you have to renormalize the result (for example, after a multiplication you have to divide the result by the scale to get back to one decimal place), and it is faster to divide by a power of 2 than by a power of 10.
Or, if you do some complex math, try an external FPU; you can find such an FPU for the AVR which communicates over SPI, with a math library for avr-gcc.
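To make the power-of-2 idea concrete, here is a minimal fixed-point sketch (the Q8.8 format and the helper name are illustrative, not from the post; avr-gcc's arithmetic right shift on signed values is assumed):

#include <stdint.h>

typedef int16_t q88_t;                 /* value stored as real_value * 256 */

static q88_t q88_mul(q88_t a, q88_t b)
{
    int32_t p = (int32_t)a * b;        /* product carries 16 fraction bits */
    return (q88_t)((p + 128) >> 8);    /* renormalize to 8 fraction bits, rounded */
}

For example, 1.5 * 2.25 becomes q88_mul(384, 576) = 864, i.e. 864/256 = 3.375.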


And how would I go about doing rounding (the +5 I add when I multiply by 10) if I multiply by 16 instead of 10?


In exactly the same way, but add 8.
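As a sketch (illustrative helper; unsigned values with headroom for the multiply are assumed), with a scale of 16 you add half the scale, 8, before the final divide, and the compiler turns the /16 into a shift:

unsigned int rounded_div16(unsigned int x, unsigned int y)
{
    unsigned int scaled = (x * 16u) / y;   /* quotient carrying 4 extra bits */
    return (scaled + 8u) / 16u;            /* +8 = half of 16; /16 compiles to a shift */
}

With x = 10, y = 6 this gives (160/6 + 8)/16 = 34/16 = 2, the rounded result.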


Unless you are doing a lot of math, doing a floating point calculation once in a while on an 8-bit AVR isn't the end of the world. Try enabling floating point math in AVR Studio, write a floating point calculation, run the simulator, and see how many clock cycles it takes - not as many as you'd think. I was too lazy to write fixed point myself, and used floating point for a lot of calculations in one of my projects; it works well enough that I haven't bothered to go back and fix it.

Marcus


The technique that I most often use is to find a power-of-2 scale to multiply my "floating point" values by that gives me sufficient resolution.

For example: 1.3333 x 4.75 = 6.333175

Let's say that my answer only needs to be accurate to about 0.5%. I'd use 256 as my scale.

1.3333 x 256 ~= 341
4.75 x 256 = 1216
341 x 1216 = 414656

We know that this number is 65536 (256 x 256, or 2^16) times larger than the actual answer: 414656/65536 ~= 6.327148.

6.327148/6.333175 ~= 0.99905, i.e. about 0.095% error. What about rounding? Let's look again at the numbers.

414656 dec = 0110 0101 0011 1100 0000 binary

Since bit 15 (the most significant of the 16 bits discarded by the division) is 0, no rounding is necessary after the division. If bit 15 had been 1, the result after division would be increased by 1.

For multiplication, the scales of the two numbers can be different as long as you keep track of them. For addition, the scales should be the same for all numbers so the scale can be divided out in one operation if necessary. Otherwise you will have to go through a normalization process, which rather defeats the whole purpose of scaled integer math.

Try some examples with numbers of interest for your application. Keeping scales to powers of 2 will make your life much easier.
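A sketch of the multiply just described (the helper name is illustrative; both inputs are assumed to be pre-scaled by 256):

#include <stdint.h>

uint16_t scaled_mul(uint16_t a_q8, uint16_t b_q8)
{
    uint32_t p = (uint32_t)a_q8 * b_q8;      /* e.g. 341 * 1216 = 414656, scaled by 65536 */
    return (uint16_t)((p + 0x8000UL) >> 16); /* adding 0x8000 rounds on bit 15 */
}

scaled_mul(341, 1216) returns 6, the rounded integer answer.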


To MarcusW: The problem lies not in the cycles wasted by floating point calculations, but in the precious flash space occupied by the math library.


Yes TFrancuz, that's exactly my issue ;)

Thanks guys, you cleared this up for me. Basically I should do the same but multiply and divide by powers of two, as these operations are easier on the microcontroller.


OK, I just added the floating point stuff in AVR Studio. It takes up a little less than 3000 bytes - about 2 percent of the program memory on a mega128.

Marcus


If you are using a smaller Tiny, then you may need to worry about the code size of the odd f-p mult or divide. Or perhaps you are using a size limited evaluation compiler.

Otherwise why not let the compiler take the strain?

#define SCALE_INT(a, f) ((int)((float)(f) * (a) + 0.5))

int result = SCALE_INT(integer, 1.2345);

You need no extra libraries. The compiler will make the appropriate calculation and do the rounding.
Of course you can do this more efficiently by choosing some special integer ratios. But unless you have very severe time or space issues, why not just write clear code?

The macro might look horrible, but you alter it once and get your desired result with every call.

David


But assume the variable "integer" changes during the course of the program. In that case, wouldn't the floating point calculation be done by the microcontroller rather than the preprocessor?


Quote:

To MarcusW: The problem lies not in cycle numbers wasted by floating point calculations, but in precious space occupied by math library.

Editorial comments:

How to avoid floating point? The same way you avoid drugs--just say no! It isn't like avoiding noise. All the inputs to your AVR are going to be small integer numbers--A/D results, timer counts, pin states. No avoiding needed there. Whatever operations you do with them are up to you--where is the need for FP? And the output is again a small integer number, like a pin state or PWM value or the like.

Certainly you can come up with the counterexample. I like to use a FP mult with a conversion factor to change to display units when there is a wide dynamic range, such as for displaying pressure. The single FP mul turns out to be quite economical in terms of both time and space versus an elaborate fixed-point scheme or other decimal point adjustment.

Perhaps you are doing a formula with trig functions or exp() or the like. Why, back when I was your age I remember doing whole graphics packages with rotation and perspective and shading, only using an occasional rendering with floats and FP trig functions. As with the AVR, the 8088 was just too darned slow to do the "real" rotation, so you did it in integer math using approximations and compromises--the cos of 0 or even 2 degrees is sooo close to 1 that you can do the "real time" rotation with the approximations and then redraw with full accuracy when the rotation stops for the final rendering.

Now, let's talk about this "precious space" used by the "library". I'll wager that if your program does an assortment of mults in various sizes and combinations of signedness, the FP mult will take less space than what you already have for your integer work. And the same goes for div. A few hundred words, maybe.

Sure, if you use every trig and transcendental function in the library, the code size may get bigger. But then you presumably have a reason to use all those functions, and they are justified.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Neojag wrote:
But assume the variable "integer" changes during the course of the program. In that case wouldn't the code with floating point calculations be done by the microcontroller and not the preprocessor?

Yes. It will be done by the AVR. But how often do you do this calculation? The AVR will do it plenty fast enough. One calculation that takes 1 ms is not going to break the bank. If you do millions of these calculations, then rethink your algorithms. Or use the integer ratio.

Personally I suspect that this paranoia is all due to the avr-gcc _delay_us() feature. If you are doing one-off calculations then time is fairly irrelevant.

Are you using a Tiny2313 ? Are you running out of flash ?

David.


David, but again, if you use float at one point the whole math library must be included. In that case it is better to switch to floating point calculations throughout. But to use float just to round a number doesn't make any sense to me.
To theusch: no, it wouldn't take more space. Integer calculations are simple and thus don't occupy too much space. Besides that, most of your subroutines can be reused for more complex calculations (e.g. extending the length of the numbers), and in some cases you can use the very effective MUL instructions. In my case the fixed point procedures which multiply 16x16 and 32x32 bit numbers, divide 32/16 and 16/8, and take a square root occupy maybe 200 bytes or less.
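As a sketch of that MUL point (the helper name is illustrative), a widening 16x16 -> 32 multiply written as

#include <stdint.h>

static inline uint32_t mul16x16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;    /* compiler expands the widening multiply itself */
}

lets the compiler use the AVR's hardware MUL instruction (on devices that have it) instead of a shift-and-add loop.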


Fixed point calculation has one more advantage: it can easily be done with higher accuracy than 4-byte floating point.


Why not try it and see for yourself ?

Any compiler has internal routines for multiplying integers, casting from float to integer etc. Of course you use some code space for these functions. But you do not pull in libm.a for these internals.

Again: if you have a 7k program in an 8k chip, you must be careful, else you may have to buy a 16k chip. But a 7k program in a 16k chip is not going to break the bank with the odd f-p mult and cast function.

1. if you write clear and concise code, both you and someone else can read it.
2. if you do need a scaling operation, then using a macro keeps your project consistent. You can debug or change the functionality in one place.

I may prefer to use fixed point, others may find the floating point method more understandable. Most projects can afford the extra code.

As Lee has pointed out, you start with, say, a 10-bit ADC. You do not need 32-bit accuracy. The lsb is fairly iffy anyway.

David.


david.prentice wrote:
Neojag wrote:
But assume the variable "integer" changes during the course of the program. In that case wouldn't the code with floating point calculations be done by the microcontroller and not the preprocessor?

Yes. It will be done by the AVR. But how often do you do this calculation ? The AVR will do it plenty fast enough. One calculation that takes 1 ms is not going to break the bank. If you do millions of this calculation, then rethink your algorithms. Or use the integer ratio.

Personally I suspect that this paranoia is all due to the avr-gcc _delay_us() feature. If you are doing one-off calculations then time is fairly irrelevant.

Are you using a Tiny2313 ? Are you running out of flash ?

David.

I'm using a mega8 and have about 61% of my flash occupied. However, if I just write 10.5 in a calculation instead of 10, the compiled code size increases to 80%!


I would suggest that 80% full is still a safe level. You will always add some more features, and you will probably have enough room for them.

Bear in mind that once a casting function has been linked from the (libc.a) library, you can use it as many times as you like. The calling code is trivial.

It is like using the printf() family. You only have to have one call to suck in the code from the library. But if you can afford this flash space, you are using convenient debugged code. It is not difficult to write your own functions, but why re-invent the wheel ?

David.


That's "bloating point" alright. Remember 10.5 = 2688/256.

This message paid for by the committee to only use bloating point math when necessary.


Quote:

David, but again, if you use float in one point the whole math library must be included.

Quote:

I'm using a mega8 and I have about 61% of my flash occupied. However, I just have to write 10.5 in a calculation instead of 10 and the code size increases to 80% during compilation!

Perhaps with YOUR brand of compiler. [With mine, the entire "anything" is only included if needed and actually used.]

Be sure to write the 10.5 expression so all the other pieces can be integer and only the one FP operation needs to be done.

I'd like to see this expression that took 2k of flash changing a 10 to a 10.5. Please post.

Remember that 10.5 is 105/10 = 21/5.

Lee


Quote:
Remember that 10.5 is 105/10 = 21/5.

?

JC


David, you are right, but… it's only casting. If he adds multiplication or division, it will consume more memory. So what is the point of using float just to round numbers if you cannot afford floats for other math operations?


Here is a test program from a recent go-round on cycle counts for float operations.

#include <mega8.h>   /* header name eaten by the forum software; CodeVision Mega8 header assumed */

float a, b, c, d;

void main(void){
    a = 1.234;
    b = 2222222.345;
    c = a*b;
    d = a+b;

    do{

    } while (1);
}

The total size of this program is 341 words built for Mega8/Speed. As the "null" program is 75 words, that gives about 270 words for FP conversion, add, and mul. Hardly 2k, and certainly not "use float at one point and the whole math library must be included". A recent thread asked whether there were any disadvantages to using the GCC tool chain. The answers were a resounding "Certainly not!". If y'all are GCC users, perhaps that thread needs to be revisited.

The c=a*b; FP mult by itself is 213-75=138 words, hardly a killer, as a 32-bit integer mul takes about 75. c=a+b; is 240-75=165 words for the FP add. That is indeed much more than with integers that can be serviced by the AVR instruction set, but still not a flash-sucker -- if there are a number of these operations in the app, the flash cost per-each is only a few words.

Lee


Quote:

Quote:
Remember that 10.5 is 105/10 = 21/5.

?

JC


The thread title is "avoiding". Neojag said
Quote:
However, I just have to write 10.5 in a calculation instead of 10 and the code size increases to 80% during compilation!
which is 2k, and another thread said 3k. Wow. Anyway, JC, why use the 10.5 in the calculation when the 21/5 would give the same result and avoid the apparent bloat with Neojag's brand of compiler.

Lee

[edit] LOL! Nevermind, JC, I got it now. I guess that replacing 10.5 with 21/2 was not that straightforward after all.


I think that the time has come for some real examples.

I know that Lee does enjoy a little wind-up. There is no dispute that you can carefully craft code to perform miracles. You can also encourage code bloat.

You can see your library objects with avr-nm -n. You should be able to recognise the internal primitive functions and see where they come from.

I just feel that my bicycle is adequate for the job. I can get to the pub. Hopefully I do not fall off on the way back. Do I need a hand-made bicycle ? Do I need to re-design it for each pub that I visit ? Does it matter if I get there one minute quicker ?

David.


Example of what's happening to me:

with a function like this:

void ALARMS_timerSetTime(int alarmDelay) {

	int cycles = CYCLES_PER_MS;
	cycles = alarmDelay*cycles;
	TimerRecharge=65535-cycles;

}

I get:
Program: 4090 bytes (49.9% Full)
(.text + .data + .bootloader)

If I just add a .0 to force floating point, like this:

void ALARMS_timerSetTime(int alarmDelay) {

	int cycles = CYCLES_PER_MS;
	cycles = alarmDelay*cycles;
	TimerRecharge=65535.0-cycles;

}

I get:
Program: 6642 bytes (81.1% Full)
(.text + .data + .bootloader)

It increases by about 2.5 KB, and that's what's worrying me ;)


That's because by adding the ".0" you're forcing the calculation to be floating point, and since one of the components of the calculation is a variable, the compiler cannot optimize it away.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


Yes, I understand that. What surprised me is that just the fact of using floating point operations takes more than 25% of the microcontroller's flash space! What exactly does the compiler include?


Quote:

It increases about 2.5Kb, and that's what's worrying me

Well, the first thing that worries me is that "cycles" is an int--signed and with a max of +/- 32k.

"alarmDelay" is also an int. Given that the two are multiplied together, you lose half your range plus run the danger of overflowing and end up with a negative number.

I have no idea what type TimerRecharge might be.

Now let's really get into no-man's land, and the esoteric rules of C. [Why you didn't just make everything "unsigned int" I have no idea.] You have 65535 - (signed int). Since there is no 65535u I >>think<< (I try not to paint myself into these corners) that the compiler is forced to make it a long and do long arithmetic. For someone worried about program size you could take care in typing your variables and in the operations that you ask the compiler to perform.

Obviously it serves no purpose to change 65535 to 65535.0. Doing so forces several primitives to be pulled in: convert-to-float, subtract, convert-from-float. But for purposes of checking, let's assume that TimerRecharge is an int and all the arithmetic fits.

Remember from above that with my compiler and a Mega8 target a "null" program is 75 words. Adding your function and a call to it resulted in 144 words for the 65535 version, and 405 words for the 65535.0 version. So the 3 FP primitives totaled 260 words. From above we saw that FP add was 165, so the convert functions are about 50 words each--about right.

No, it does not take 2kB, 1k words.
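For comparison, an all-unsigned version of that function might look like this (a sketch; the CYCLES_PER_MS value and the type of TimerRecharge are assumed):

#include <stdint.h>

#define CYCLES_PER_MS 40u               /* illustrative value */
volatile uint16_t TimerRecharge;        /* assumed type */

void ALARMS_timerSetTime(uint16_t alarmDelay)
{
    uint16_t cycles = alarmDelay * CYCLES_PER_MS;  /* caller must guarantee no overflow */
    TimerRecharge = 65535u - cycles;               /* stays in 16-bit unsigned arithmetic */
}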


Quote:

What exactly does the compiler include?

I would be asking the same thing if it were my compiler.


Generally you are right, but it shows how easily you can get into trouble using the float type. I think it's worth switching to float if you want to use the type widely in your program. If you only have to do a couple of operations, fixed point calculations are the right solution.


theusch wrote:
Since there is no 65535u I >>think<< (I try not to paint myself into these corners)

BZZZZZZ! Incorrect. U is a valid suffix to force the literal to be unsigned. It does not need to be promoted to a long, since it fits within an unsigned 16-bit value.

[edit]
Re-reading what you said, I may have mis-interpreted it. Without the U suffix the compiler would indeed be forced to promote the calculation to long, since 65535 does not fit within a signed int.


Quote:

"alarmDelay" is also an int. Given that the two are multiplied together, you lose half your range plus run the danger of overflowing and end up with a negative number.

Aye, I've retyped them as unsigned ints. There is code in the project that ensures their multiplication won't overflow the cycles variable (go above 2^16-1). TimerRecharge is an unsigned int that is loaded into TCNT1 every time it overflows, to obtain a constant period.

Quote:

Obviously it serves no purpose to change 65535 to 65535.0.

Yes, as I've said it was just for testing purposes.


OK, I think I'm really missing something here.
This compiles to 134 bytes:

#include <avr/io.h>   /* header name eaten by the forum; avr-gcc header assumed */

int main() {

	int moo;
	moo=2*6;

	while (1) {
	   TCNT0=moo;
	}
}

This compiles to 134 bytes as well:

#include <avr/io.h>   /* header name eaten by the forum; avr-gcc header assumed */

int main() {

	int moo;
	moo=2.4567*6.2231;

	while (1) {
	   TCNT0=moo;
	}
}

Shouldn't the decimal part in the operands force the compiler to use floating point math and thus increase the code size? I'm using the same compiler settings as in the code I posted first, and doing the exact same thing (adding a decimal part) doesn't seem to do anything here.

However, if I declare moo as float it compiles to 2466 bytes.

I'm confused :S

Edit: Using the optimization flag -O1 brings my code (the "float moo" one) size down from 2466 bytes to 102 bytes. Now I'm really confused...


I've found that the fp routines in the ImageCraft compiler take much less room than the fp printout, so you can go ahead and do the fp operations; just convert 'em to ints (maybe scaled up to see the decimal part) to print 'em out during debug. Using printf pulls in about another 4k of stuff in addition to the couple of k for the math routines themselves.

Imagecraft compiler user


Neojag wrote:

Shouldn't the decimal part in the operands force the compiler to use floating point math and thus increase the code size? I'm using the same compiler settings as in the code I posted first, and doing the exact same thing (adding a decimal part) doesn't seem to do anything here.

It won't grow, because both values in the calculation are known at compile time, so the compiler is able to calculate the result, and use only that in the generated code, so no floating point math is done at runtime.


Oh, I understand! Thanks!

Indeed, if I multiply 2*var and 2.5*var the code size increases from 104 bytes to 2058 bytes. So apparently the simple use of fp math does drag in a lot of code in this compiler (with my settings at least). Declaring a variable as float instead of int also increases the code by 2304 bytes. So apparently, with these settings and this compiler, using any form of floating point in a runtime calculation always increases the code by about 2K, right?


Playing games with the compiler on data types, or depending on the compiler to "figure out" what you want with data types, is a very dangerous game, and will likely get you into trouble. Sometimes the code may actually seem to work (maybe for days, weeks or years) until you get a variable that's outside the range of what the program would normally see; then a variable overflow or underflow occurs and everything spins out of control.

Here is some good C advice.

1. Be explicit in variable type declarations, and I don't mean slapping a "char", "int", or "long" on the front of some variable. Explicit means signed/unsigned and size. Examples:
unsigned char
signed char
unsigned short
signed short
unsigned long
signed long
float
double

Although the actual data sizes for these are machine dependent according to K&R, I would be willing to bet that you couldn't find a C compiler that didn't treat char as 8 bits, short as 16 bits, and long as 32 bits.

The point of all this is to make sure that you get the results you expect. If you have to change data types (often you will have to), do the type casting yourself, and don't depend on the compiler to change data types for you.
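A short sketch of what "explicit" looks like in practice (the names and values are illustrative):

#include <stdint.h>

uint8_t  adc_hi = 0x02, adc_lo = 0x7F;
uint16_t adc    = ((uint16_t)adc_hi << 8) | adc_lo;  /* widen before shifting */
int32_t  scaled = (int32_t)adc * 1000;               /* widen before multiplying */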


Are you guys including libm.a and libc.a in your projects? If I'm not mistaken, one of these two has optimized floating point routines. I always include both of them in all my projects.

Felipe Maimon


Quote:

There is code in the project that assures their multiplication won't result in overflowing the cycles variable (go above 2^16-1).

You will get an "overflow"--rather, a jump to the negative--at 2**15 not 2**16. Thus, as I said, you lose half your range.

Lee


Just to demonstrate what avr-gcc will do, I compiled this test code with different lines commented.

text = 154 with neither scale macro.
text = 374 with the INTSCALE macro (does not round anyway).
text = 734 with the SCALE macro (looks simple, does rounding).

#include <avr/io.h>   /* header name eaten by the forum; avr-gcc header assumed */

#define INTSCALE(x, mul, div)	((int)((long)(x) * (mul) / (div)))
#define SCALE(x, f)	((int)((float)(x) * (f) + 0.5))

int main()
{
	DDRB = 0xFF;        //so we can see LEDs
//	PORTB = INTSCALE(PINA, 101, 177) ^ 0xFF;
	PORTB = SCALE(PINA, 0.57062) ^ 0xFF;
	while (1) ;
}

No doubt I could compile this with CodeVision or ImageCraft, and I would expect a similar code "bloat". But personally I find one macro more readable than the other.

I do not want a compiler war. They will all add 100-500 bytes.
I really would like to see an example program where I can achieve 2k "bloat" with a f-p cast, mult and add.

David.

#include <avr/io.h>   /* header name eaten by the forum; avr-gcc header assumed */

int main() {

   int moo;
   int meh=4;
   moo=2*meh;

   while (1) {
      TCNT0=moo;
   }
}

Try this. In my compiler (if optimization is set to -O0) it compiles to 104 bytes. If you change the 2 to 2.4 (for example) to force floating point, it compiles to 2058 bytes.

Doing the same thing with optimization flag -O1, the code size doesn't change (I believe it's because, as meh is known and initialized, the calculation is done at compile time). However, if I don't initialize meh, the code compiles to 2058 bytes again.


Look at your code and hand-trace it on the proverbial cigarette packet.

You know that moo will be 8 or (int)9.6 in advance. So does the compiler. (it has its own supply of cigarette packets)

Now using -O0 with avr-gcc is a bit of an academic exercise. It is a non-smoker.

You really need to produce an example with a realistic optimisation level, or alternatively use "volatile" variables in your calculations. All the SFRs are volatile, so my example should do real calculations. Although I must admit that I have not simulated it.

David.


My results (built for a mega16):

1) -O0, constant = 2

Program:     192 bytes (1.2% Full)

2) -O0, constant = 2.4

Program:    2294 bytes (14.0% Full)

3) -O0, constant = 2.4, libm.a added to link

Program:     820 bytes (5.0% Full)

Conclusion: you aren't using libm.a - this is a "Very Bad Idea(tm)"

BTW all C compilers will have a "cost" for using FP, but for any particular compiler it's generally a fixed cost (to make room for one copy of the necessary support routines), and once you've paid it you can use FP to your heart's content, though it may not be the most efficient (size or speed) solution, and it's particularly expensive in the smaller controllers.


Cliff,

The correct conclusion is that it is a compile-time known result for his example.

I have just tried my example without libm.a

It now does text=2692 with my -Os

So my conclusion is that avr-gcc should and does fetch its internal f-p primitive functions from libc.a

It is just that either the libc.a versions are crap, or they are all in one object module that sucks in unnecessary modules. I am too idle to look any further.

David.


Quote:
So my conclusion is that avr-gcc should and does fetch its internal f-p primitive functions from libc.a

It is just that either the libc.a versions are crap, or they are all in one object module that suck in unnecessary modules. I am too idle to look any further.


It's all of that - in fact some of the transcendental functions are just plain wrong in libc.a, as they are a first-pass attempt at a 32-bit implementation. Meanwhile libm.a is hand-crafted AVR code optimised for the job (and very recently completely re-written to do an even better job).

The failure here is that no "-lm" is being passed to the linker. If one uses Mfile to generate Makefiles then its default LDFLAGS has the -lm. Also if one uses a "modern" version of AVR Studio then that too defaults to including -lm (I got this fixed by Atmel a while back). The fact that the OP is clearly not using -lm suggests he's either got a "roll your own" Makefile or is using an outdated version of Studio (or at least an out-of-date .aps project file).

Cliff


The traditional libc.a should contain all the necessary routines for internal maths and casts.

libm.a should contain the functions from <math.h>.

I cannot think of any reason for an internal transcendental routine. Surely libc.a should not have any of these.

Since a platform build has conditional modules anyway, I should have thought that the correct internal routines in libm.a should be moved into libc.a

This would solve the problem forever. And stop Lee giggling.

I have used a compiler with a choice of 32-bit or 64-bit doubles. The internal routines had different names for the necessary functions. After all you need a float_to_uint16() and a double_to_uint16(). If doubles are the same as floats, the compiler uses the 32-bit version.

David.


David,

Try an avr-nm on (for example) C:\WinAVR-20080512\avr\lib\avr5\libc.a and you'll see what I mean.

This is a widely known problem with avr-gcc and as long as folks stick to -lm they need not worry about any of this. (If there was an error is was in AVR Studio's gcc-plugin not defaulting to including -lm in the link)

Cliff

EDIT just a small sample:

libc.a:sinh.o:         U __fp_powsodd
libc.a:sinh.o:         U __subsf3
libc.a:sinh.o:         U exp
libc.a:sinh.o:         U inverse
libc.a:sinh.o:         U ldexp
libc.a:sinh.o:00000000 T sinh
libc.a:sin.o:         U __fp_rempio2
libc.a:sin.o:         U __fp_sinus
libc.a:sin.o:00000000 T sin
libc.a:sqrt.o:         U __fp_mpack
libc.a:sqrt.o:         U __fp_nan
libc.a:sqrt.o:         U __fp_norm2
libc.a:sqrt.o:         U __fp_splitA
libc.a:sqrt.o:00000008 T sqrt
libc.a:square.o:         U __mulsf3
libc.a:square.o:00000000 T square
libc.a:tanh.o:         U __addsf3
libc.a:tanh.o:         U __divsf3
libc.a:tanh.o:         U __fp_powsodd
libc.a:tanh.o:         U __subsf3
libc.a:tanh.o:         U exp
libc.a:tanh.o:         U ldexp
libc.a:tanh.o:00000000 T tanh


Surely sin.o or sinh.o should not be in libc.a

libc.a will be avr-model specific

libm.a should be generic

I expect there may be some good reasons for these anomalies.

Meanwhile, a default -Os -lm is going to suit most people most of the time. And on that basis, avr-gcc is perfectly adequate for sensible use of C arithmetic.

David.


Quote:
Surely sin.o or sinh.o should not be in libc.a

Not what the manual says:

http://www.gnu.org/software/libt...

The comments in this are also interesting:

http://www.delorie.com/djgpp/doc...


Cliff,

I must confess that I am no expert with this. There may well be an official C99 ruling on where object modules should live.

When C first evolved, it was a small language which gained all its functionality from external libraries. It is unwise for me to assume that this would not change over time.

I do note that this is "gcc" documentation. "gcc" generally does "features" in its own way.

Even so, I am sure that there is no requirement for libc.a to have modules that force the linker to include unused functions.

David.


Quote:
I do note that this is "gcc" documentation.

I felt it was justified in this thread by the OP's use of <avr/io.h> (the usual sign that GCC is in use here)

(no doubt someone will be along in a minute to tell us to bugger off to the GCC forum, though there's wider principles being discussed in this same thread - so maybe not)


Quote:
(no doubt someone will be along in a minute to tell us to bugger off to the GCC forum, though there's wider principles being discussed in this same thread - so maybe not)

No, don't! I can't poke at posted examples there.

Quote:
And stop Lee giggling.

Te-he, te-he.

Anyway, I can't quite figure out from the responses above: if you take one of these test cases and introduce a simple FP operation or a few, as with the function that I tried with my compiler, and you properly configure your GCC with -Os and -lm, do you add a few hundred bytes to the size or a few thousand? In other words, does raising the FP flag always result in a 2k minimum hit in GCC?

Lee


The OP's test case above, when built for a mega16, leads to 628 bytes of additional code when 2.4 rather than 2 is used as the multiplying constant. The test was a little invalid because it WAS a constant multiplier, so if I'd built with anything but -O0 the whole thing would have been calculated at compile time and have no runtime overhead. However, the FP code that is pulled in even when I build the "test app" at -O0 will itself have been built with -Os (or at least SOME form of optimisation), so that 628-byte "cost" probably is representative for a floating point multiplication.

(but it sure ain't 2K!)


No Lee.

A C linker usually links like this:
crts.o modules.o... lib???.a... libc.a

Each library is just an archive of object.o ...
The linker scans the library for any items that it needs.
If one of these objects contains several unused functions, they are hauled in too. This is the standard procedure.

Now the f-p primitives exist in both libm.a and in libc.a but apparently the libc.a versions are crap.

If you do not ask for libm.a then you get libc.a versions. If you had asked for libm.a, then the linker would stop looking after it has loaded the libm.a versions.
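For example, a typical link line that requests libm.a ahead of the default libc.a might look like this (a sketch):

avr-gcc -mmcu=atmega8 -Os main.o -o main.elf -lm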

Personally I would only expect to use libm.a if I used <math.h> functions. I would expect the f-p primitive math to work without any special instruction.

However by magic, if you ask for libm.a then you get a sensible C compiler behaviour. Most avr-gcc users will get libm.a automatically from Studio4.

Of course CodeVision is so "stupid" that it removes any unused functions even if you have written them in your program yourself. (It is not unique in this --- many compilers use intelligent linkers.)

David.


Quote:
many compilers use intelligent linkers

Like GCC's -fwhole-program you mean ? ;)


Hello Neojag... You have observed that any invocation of any fp computation at run time incurs the penalty of linking in several K of subroutines. If you compute the 'bytes per operation' it is horrible! But if the fp libraries fit in the flash, you can go ahead and use a couple dozen more operations, and that 'bytes per operation' figure doesn't go up much at all. And if the fp routines are fast enough for your application... man in the loop turning a knob, for example: 10 or 20 updates a second is faster than he can read a display... you might as well use 'em. The decision point is whether you are making 10,000 of these gizmos, where working on the custom tweaked, scaled-integer, speed-and-size-optimized version lets you use a smaller AVR that is $1 cheaper: then you have saved $10,000. Hopefully for the boss, this is a saving over the week or so of extra programmer time.


Actually Neojag said it was a mega8, so with 8K to play with I'd have thought the 500..2,000 bytes for an FP lib might be a price worth paying. It's when you get down to the 2K critters that you may not be able to afford FP. 'Course then again, I guess it depends what other things you had planned for that 8K.


clawson wrote:
Quote:
many compilers use intelligent linkers

Like GCC's -fwhole-program you mean ? ;)

But gcc does not default to using this switch, does it.

I would expect a standard behaviour with the default settings. If I needed something special, I read the documentation, try the switch, check the behaviour.

This thread has all come from the OP invoking avr-gcc in a sensible way. ( at least in my opinion )

If I do not use say graphics functions, then I would not choose to link with the graphics library. Or so you might think !!!

Bob, please compile my trivial example. See your behaviour. I would guess you will not get a massive code increase.

David.

//file prenticefptest.c
//June 23 08 Bob G

//size 1506 with both
//size 1052 with scale only
//size 550 with intscale

#include <iom16v.h>   /* header name eaten by the forum; an ImageCraft AVR header is assumed */

#define INTSCALE(x, mul, div)   ((int)((long)(x) * (mul) / (div))) 
#define SCALE(x, f)   ((int)((float)(x) * (f) + 0.5)) 

void main(void){ 

   DDRB = 0xFF;        //so we can see LEDs 
   PORTB = INTSCALE(PINA, 101, 177) ^ 0xFF; 
//   PORTB = SCALE(PINA, 0.57062) ^ 0xFF; 
   while (1) ; 
} 


Quote:
I would expect a standard behaviour with the default settings

The default behaviour of:

$ avr-gcc test.c

is QUITE different from the way almost everyone uses it. In fact what sets the "default" is more likely the Makefile being used than the compiler. (And the latest Mfile delivery includes a whole-program template as well ;) )


OK, so you must use "-mmcu=atmega32" or whatever matches your AVR.

I generally use a very cut-down makefile, or Studio4.

I would expect a new user to use Studio4. I would expect her to change nothing from the default. Studio4 demands the AVR model.

It would make life easier for her if things were a little more consistent.

David.

p.s. I use -Os -mmcu= -Ipath -DF_CPU= -gdwarf-2

#include <mega88.h> // Mega88 target, speed, CodeVisionAVR 1.25.7a (header name eaten by the forum; CodeVision header assumed)
//file prenticefptest.c
//June 23 08 Lee T

//size 549 words (1098 bytes) with both
//size 430 words (860 bytes) with scale only
//size 213 words (423 bytes) with intscale only


#define INTSCALE(x, mul, div)   ((int)((long)(x) * (mul) / (div)))
#define SCALE(x, f)   ((int)((float)(x) * (f) + 0.5))

void main(void){

   DDRB = 0xFF;        //so we can see LEDs
   PORTB = INTSCALE(PIND, 101, 177) ^ 0xFF;
   PORTB = SCALE(PIND, 0.57062) ^ 0xFF;
   while (1) ;
} 

Null program is 84 words (168 bytes).

Lee


So ImageCraft, CodeVision and avr-gcc (with -Os and -lm ) all behave quite acceptably.

And with my Studio4, it defaults to no "-lm" so I get:
text=2970 for SCALE macro
text=374 for INTSCALE macro

David.


I suppose that "acceptable" is a rather subjective term.

According to the best-case scenarios presented, the additional code used by floating point ranged from 437 extra bytes to 502 extra bytes, with worst-case numbers much higher.

I've written a lot of simple, but useful programs that didn't use this much memory for the entire program.

Now, why would you use more memory than you have to ?
Why would you take the penalty in speed ?
Is it because 437 extra bytes doesn't seem too bad ?

If you have ever had to deal with a marketing department, you know what "feature creep" means. After the boards are laid out, and you are ready for production, the last minute addition of a software feature may put you over the edge. You might wish you had those 437 bytes back.

At some point, trying to figure out clever ways to eliminate a byte or two of code has diminishing returns. As always, use the best tools that you have available. Hopefully, your best tool will be your brain.


Quote:

the additional code used by floating point ranged from 437 extra bytes to 502 extra bytes, with worst-case numbers much higher

Well, this is a contrived example doing a couple of FP operations to ensure that a build can be made WITHOUT pulling in a "whole library".

Agreed, I've got very few apps out of very many with FP.

However---

1) If one does a number of operations of the same type in an app, the size of the FP primitives becomes less significant. In fact, though somewhat larger, they are on the same scale as the multi-byte integer equivalents.

2) I'm somewhat of a believer, in certain cases. I had a hand-crafted, packed app and needed to display pneumatic pressure in various units. The dynamic range among the common units is quite large. I had an elaborate fixed-point and scaling routine to pick the "best" 4 digits for the display and locate the decimal point. It was ugly and complex and usually worked.

I found that a single convert-to-FP of the internal normalized integer "counts" and a single FP mult by a conversion factor from a table was smaller and faster than my crafted solution.
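That approach might be sketched like this (the factor values are made up for illustration):

/* internal "counts" -> display units with one FP multiply per update */
const float unit_factor[] = { 0.0145f, 0.1f, 1.0f };   /* illustrative factors */

float to_display(unsigned int counts, unsigned char unit)
{
    return (float)counts * unit_factor[unit];   /* one convert + one FP mul */
}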

Quote:
Hopefully, your best tool will be your brain.

And that is what I feel that I did--abandoned a week or so of hard work for, essentially, a one-line solution.

Now, if the brain was used in tweaking 10 to 10.5 as mentioned earlier...

Lee
