_delay_ms results in huge code

Hello!

I am using an ATmega16, and I wonder whether there is a good reason for _delay_ms() resulting in huge code (over 3 KB extra), or whether this is a bug.

I have browsed around the code for a while, and I have seen floating point operations that might be the cause of this. I don't think there is any reason to use floats in a software loop (long longs would be more than enough).

_delay_us() does not cause this problem.

Cheers,

axos88


Read the second "Note:" on this page:

http://www.nongnu.org/avr-libc/u...

(_delay_ms() is a macro that DOES use floats but it relies on the floating point stuff being calculated and hard coded at compile time - this only happens (a) with optimisation enabled and (b) when the value is a constant that CAN be calculated at compile time)


Ah I see.

But I don't understand why there is a need for floating point operations here.

Wouldn't a simple long long counter suffice? I remember writing a software loop based on them, and they had a bigger range too, and better resolution even for greater delays.

Also, I think the huge size of the floating point library is a very big problem when programming smaller devices with code that doesn't otherwise use floating point operations.

Regards,

axos88


Yes, but if you use it properly (like it says in the manual) there is no RUNTIME floating point involved. It's all calculated and optimised out at compile time. It has to be this way, as F_CPU could be just about any value imaginable. If you look at the .lss once you have switched optimisation on and made sure the arguments are constants, you'll see that what is actually generated is a software counting loop using only integer values in machine registers.
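A common way to stay within that rule when the delay length is only known at run time is to loop over a constant 1 ms delay. The following is a minimal sketch, not something posted in this thread: `delay_ms_variable`, `DELAY_ONE_MS`, and the 8 MHz `F_CPU` are assumptions for illustration, and the non-AVR branch is only a stand-in so the logic can be checked on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#ifndef F_CPU
#define F_CPU 8000000UL  /* assumed clock; set this to your real frequency */
#endif
#include <util/delay.h>
/* Constant argument: with optimization enabled (e.g. -Os) the
   floating point math is folded away at compile time. */
#define DELAY_ONE_MS() _delay_ms(1)
#else
/* Host stand-in: just count the milliseconds that were requested. */
unsigned long host_ms_elapsed = 0;
#define DELAY_ONE_MS() ((void)host_ms_elapsed++)
#endif

/* Hypothetical helper: a run-time delay built from constant 1 ms steps. */
void delay_ms_variable(uint16_t ms)
{
    while (ms--) {
        DELAY_ONE_MS();
    }
}
```

The loop overhead makes the total slightly longer than requested, which is usually acceptable for millisecond-scale delays.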


Quote:
Wouldn't a simple long long counter suffice?

But that is basically what _delay_ms() is. It only uses floating point to calculate the integer needed for the loop itself, and when used properly the floating point is done by the compiler, not the final program. The purpose of using floating point is convenience for the user.
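To make that arithmetic concrete, here is a simplified model of the calculation, written as plain C so it can be run on a PC. It is a sketch under assumptions: the function name is made up, one busy-loop iteration is taken to cost 4 cycles, and long delays are simply clamped here rather than handled the way the library does.

```c
#include <stdint.h>

/* Simplified model: convert a delay in ms into a 16-bit loop count,
   assuming each loop iteration costs 4 CPU cycles. */
uint16_t loop_iterations_for_ms(double f_cpu_hz, double ms)
{
    /* (cycles per millisecond / 4 cycles per iteration) * milliseconds */
    double iterations = (f_cpu_hz / 4e3) * ms;

    if (iterations < 1.0) {
        return 1;       /* never fewer than one iteration */
    }
    if (iterations > 65535.0) {
        return 65535;   /* clamp: this sketch ignores longer delays */
    }
    return (uint16_t)iterations;  /* truncates the fractional part */
}
```

When both arguments are constants and the optimizer is on, the compiler can evaluate all of this itself, so only the resulting integer (for example 2500 for 10 ms at 1 MHz) ends up in the program.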

Regards,
Steve A.

The Board helps those that help themselves.


Ah ok, now I understand:)

Thanks!

axos88


It's due to a rather unfortunate implementation choice. Rather than define additional delay functions of smaller granularity that all take an integer parameter and use integer math, the designers decided to accept a floating point parameter to achieve delays that are shorter, or more precise, than the unit implied by the function name.
For example, instead of having a _delay_ns() function, you pass a floating point value to _delay_us() to get delays with better than microsecond precision.

Because of this implementation choice, floating point is used, but when the argument is a constant *and* the compiler optimizer is enabled, the floating point math is optimized away.

That said, for all the floating point usage, the delay routines can be very inaccurate especially when the delays are short/small due to the way the calculations do their cycle rounding.

If you need more accurate delays, see this thread for a drop-in, backward compatible replacement for the routines.

https://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=30242

If you want to do your own cycle calculations, there is a built in function in the newer gcc compilers that you can call:

extern void __builtin_avr_delay_cycles(unsigned long __n);

The parameter to this function is an unsigned long, which on AVR is an unsigned 32-bit number of cycles to delay. When you use this function in your application, GCC replaces the call with "do-nothing" assembly code that delays the specified number of cycles.

The alternate delay_x.h routines and __builtin_avr_delay_cycles() are accurate to within 1 cycle of the requested delay, which is not the case with the currently distributed routines.
Also, the newer delay_x routines guarantee that the delay is never shorter than the requested delay, which is not always the case with the existing routines.

--- bill


Quote:
For example, instead of having a _delay_ns() function

But even an xmega running at top speed could only give you at best ~31ns resolution, so you would have less precision than is implied by that function as well.

The functions are not designed to give you absolute precision, they are designed for convenience. If they are not accurate enough for your purpose, then use something else.

Regards,
Steve A.


Koshchi wrote:
Quote:
For example, instead of having a _delay_ns() function

But even an xmega running at top speed could only give you at best ~31ns resolution, so you would have less precision than is implied by that function as well.

The functions are not designed to give you absolute precision, they are designed for convenience. If they are not accurate enough for your purpose, then use something else.

The reason I mentioned a _delay_ns() function was not to propose a function with that much precision, but to show that if the delay function implementers had chosen to add functions with finer units, the entire need for floating point input parameters could go away. The input parameters could be integers instead of floating point values, and it would then be possible to write the delay functions using nothing but integer math. This would avoid floating point altogether, so that even when optimization was disabled, the floating point routines would not be silently and unexpectedly dragged into the user's code.

Unfortunately, at this point in time, to get rid of the floating point parameters and all the floating point math, would mean either losing backward compatibility or creating a set of new delay functions that take integer parameters.

=====

The real point I was trying to make is that, in my opinion, the existing delay implementation is not as accurate as it could and should be. A simple, drop-in, fully backward compatible replacement is readily available; while it doesn't solve the floating point issues, it simply works better than the existing implementation.

With the existing implementation, certain combinations of requested delays and CPU clock rates result in delays that are quite inaccurate, vs what is really possible.

The main problem with the existing implementation is that it doesn't manage the delay cycles in their entirety; it attempts to map the delay cycles into _delay_loop_1() and _delay_loop_2() sized "chunks" without properly handling the leftover cycles when the total isn't a multiple of what these routines consume. Because of this, there are cycle rounding errors. These rounding errors can cause the delay to be longer than what is really possible or, even worse, shorter than what was requested.

The alternate delay functions available in the thread mentioned above manage the delay clock cycles in their entirety and therefore don't suffer from any sort of cycle rounding errors.
Because of this, the alternate delay routines will offer delays that are accurate to within 1 cycle.
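The idea of accounting for every cycle can be sketched as follows. This is not the code from the alternate routines; it is a hypothetical planner that assumes a 4-cycle loop body, pads the remainder with single-cycle NOPs, and ignores loop setup overhead.

```c
#include <stdint.h>

/* Hypothetical plan for a delay of an exact number of cycles. */
struct delay_plan {
    uint32_t loop_iterations;  /* each iteration assumed to cost 4 cycles */
    uint8_t  nops;             /* each NOP costs 1 cycle */
};

/* Split the requested cycle count into whole loop iterations plus
   leftover single-cycle padding, so no cycle is rounded away. */
struct delay_plan plan_exact_delay(uint32_t cycles)
{
    struct delay_plan p;
    p.loop_iterations = cycles / 4u;
    p.nops = (uint8_t)(cycles % 4u);
    return p;
}

/* Total cycles the plan would consume (for checking the split). */
uint32_t planned_cycles(struct delay_plan p)
{
    return p.loop_iterations * 4u + p.nops;
}
```

Because the remainder is kept instead of being rounded, the planned total always equals the requested count; a real implementation also has to subtract the cycles spent setting up the loop.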

--- bill


> It's due to a rather unfortunate implementation choice.

Nope, it was simply chosen that way because I felt it was more natural.
Look at your crystal, it says something like "3.6864 MHz". Now, it seems
most natural to me that you can go on, and say

#define F_CPU 3.6864E6

(Unfortunately, the macros broke that though. I'd consider
this a bug.)

Our computers are powerful enough, so it should be the computer approaching
the way humans think, rather than the programmer already scaling their view
to the deficiencies of the computer (i.e. the requirement to perform calculations
in an integer range).

When used properly, there is no negative side effect of using FP math at compile time.
If you don't optimize, you don't need delay loops anyway. ;-)

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.


I guess I think differently. (My wife tells me I'm not "normal" all the time... :lol: )
But if I wanted a 140 nanosecond delay, it seems more natural to call _delay_ns(140) rather than _delay_us(0.140).

Floating point aside, the bigger issue with the existing routines is that sometimes they round up to a multiple of the basic delay function's cycle count and sometimes they round down (truncate).

This leads to inaccuracies and unpredictable delays.

Ok, it is predictable, but which way they round varies depending on CPU clock rate and the desired delay. They can even generate a delay that is shorter than what was requested.

What seems unfortunate to me is that the greatest inaccuracy occurs right at the point where it might be most critical: at the very short end of the delay requests, say when requesting nanosecond delays to satisfy hardware setup times.

So if you call _delay_us() with a delay that results in needing 0.66666 iterations of the 3 cycle _delay_loop_1, you get 1 full iteration or a 3 cycle delay instead of a 2 cycle delay.

On the other hand, if _delay_us() calculates 1.66666 iterations of the 3 cycle _delay_loop_1, you still get 1 full iteration because of the truncation. This means you get a delay of 3 cycles instead of 5 cycles.

Which might not be so good.
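The two cases above can be reproduced with a small model that runs on a PC. It is a sketch of the rounding as described in this post (3 cycles per loop iteration, fewer than one iteration bumped up to one, everything else truncated), and the function name is made up for illustration.

```c
#include <stdint.h>

/* Model of the rounding described above: returns the number of cycles
   actually spent in the loop for a given requested cycle count. */
uint32_t delivered_cycles(uint32_t requested_cycles)
{
    double iterations = requested_cycles / 3.0;  /* 3 cycles per iteration */
    uint32_t whole;

    if (iterations < 1.0) {
        whole = 1u;                    /* rounded up to one iteration */
    } else {
        whole = (uint32_t)iterations;  /* fractional part truncated */
    }
    return whole * 3u;
}
```

Requesting 2 cycles delivers 3 (too long), while requesting 5 cycles also delivers 3 (too short).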

--- bill


There's an inherent problem in these macros: their granularity is not
just a single clock, and there's no way to tell how much time the code
that the compiler introduces to set up the loop will take.
Hopefully, this will improve soon with the advent of the __delay_cycles()
intrinsic function.

I'm all open for improving the existing implementation as well, and I
certainly don't mind adding a _delay_ns() version, except that with the
current clock rates, and the issues outlined in the previous paragraph,
I think calling it _delay_ns() might leave the impression about an
accuracy that cannot be had this way.

Jörg Wunsch


How about we move further discussion of the delay implementation to the libc bug list, in the thread for bug #17216:

http://savannah.nongnu.org/bugs/...

I think this is probably a better place to discuss this.

--- bill


Yes, that's fine with me. Thanks for filing the bug report.

Jörg Wunsch
