Making _delay_ms work well with variables


The poor performance of _delay_ms() when called with a non-constant argument is an FAQ. So I propose a macro: if passed a constant or a floating-point argument it calls _delay_ms() directly; if passed a non-constant integer it loops around _delay_ms(1).

#include <util/delay.h> /* must be included first, so the real _delay_ms() is declared */

#define _delay_ms(__ms1) do { \
	typedef __typeof__(__ms1 + 0) __T; /* promoted type of the argument */ \
	__T __ms = (__ms1); \
	/* a floating-point type, or a small enough compile-time constant: call the real function */ \
	if ((__T)1 / (__T)4 != 0 || (__builtin_constant_p(__ms) && __ms < 6554)) \
		(_delay_ms)(__ms); /* parentheses suppress re-expansion of this macro */ \
	else \
		while (__ms--) (_delay_ms)(1); /* run-time integer: loop around 1 ms */ \
} while (0)
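
For illustration, here is how the macro behaves with constant and run-time arguments (a sketch; wait_tenths() is a made-up example function, and F_CPU is assumed to be defined):

#include <stdint.h>
#include <util/delay.h>	/* then define the macro above */

void wait_tenths(uint8_t n)
{
	_delay_ms(100);        /* constant: resolves to the stock _delay_ms() */
	uint16_t t = n * 100U; /* run-time integer: expands to the 1 ms loop */
	_delay_ms(t);
}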

Well, impressive, but why the hassle?

Anything but a few nops' worth of delay is good only for beginners' first attempts - and it's better for them to be kicked into #$%^ as soon as possible, IMHO.

JW


The way I see things is that if it is reasonable to do something with delay loops, then there is no reason to fire up a timer and poll in a loop for it to lapse.

Also, sometimes the required delays are so short that they would be less accurate, or impossible to achieve, if a timer had to be set up first.

Heck, even now I have a 36MHz ARM bit-banging a 100kHz I2C bus with delay loops. In a few projects I also made the main loop run at a 1kHz rate with a 1ms delay loop. And I showed a friend how to build a NEC IR code transmitter for his car radio (by wire, not IR) - you bet, with delay loops. So what if transmitting a code takes 120ms when there is nothing else to do?
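
That 1kHz main loop is roughly this pattern (a sketch; do_work() stands in for the real application step, and the rate is only approximate since the work time adds to the period):

#include <util/delay.h>

extern void do_work(void); /* hypothetical application step, well under 1 ms */

int main(void)
{
	for (;;) {
		do_work();
		_delay_ms(1); /* pace the loop at roughly 1 kHz */
	}
}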


Quote:
Also sometimes required delays are so short that it would be less accurate or impossible to perform if it needs to set up a timer.

(_delay_ms)(1)

This is not a short delay. Any setup for it would be in the microsecond range.

Regards,
Steve A.

The Board helps those that help themselves.


Well, there are dozens of ways to make a delay; burning cycles is one of them.

All I want to say is that, IMHO, it is a good thing that a pre-chewed delay function fails spectacularly now and then - it makes the newbies think, however unfashionable thinking is nowadays. And I would not make this easier for them. The experienced certainly don't need more than what's available now.

But I can't deny I don't know how to teach.

JW


wek wrote:
All I want to say is that, IMHO, it is a good thing that a pre-chewed delay function fails spectacularly now and then - it makes the newbies think, however unfashionable thinking is nowadays. And I would not make this easier for them. The experienced certainly don't need more than what's available now.

But I can't deny I don't know how to teach.

Ah, the old "Throw them in the water to teach them to swim" school, eh Jan? :wink: Perhaps they should be shark-infested waters as well. :twisted:

If it was my choice I would prefer a more guided approach to teaching. Some way of calling out to newbies, "Beware of the bear trap over there!" seems better to me than letting them walk through the woods without light and, when they get a leg caught in a bear trap, laugh and say, "Oh yeah, that bear trap is there for your own good." OTOH, I understand the reluctance to institute a poor delay mechanism, which would cost all programs, just to help the newbies avoid a common mistake.

Back in the day, Pascal was created to help newbies learn to program by instituting "strong typing" (which C++ later picked up). "Real Programmers" (tm) deplored the use of Pascal because they knew what they were doing. Or so they thought.

The real problem here is that we have newbies trying to learn AVR GCC coding (and embedded computing in general) by approaching the subject from practically every direction that can be tried. While we have some paths laid out, some newbies will tackle our beloved subject on their own, without reading the many tutorials and threads we have set out for them.

Perhaps Jan is right. Having them catch their leg in a bear trap lets us all know that they are off the beaten path and need to be led to the True Path Of Enlightenment. *sigh*. :?

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!


Regardless of how true the path is, I've just never comprehended the fascination with elaborate delays in an AVR8 environment. Yes, I'll use a delay at startup to let things settle--often flashing the LEDs and the like to signal "alive". Certainly not a critical timekeeping app. Yes, I use a delay of a few microseconds once in a while when I've got a slow transistor or the like. Again not critical, and usually can be replaced with some useful work such as calculating the next byte.

What a tempest in a teapot. Just as GoTo was "Considered Harmful", so should delay be. :twisted:

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


stu_san wrote:
Perhaps they should be shark-infested waters as well. :twisted:

Only if the sharks have lasers!

- Jani


stu_san wrote:
Ah, the old "Throw them in the water to teach them to swim" school, eh Jan? :wink:

I said: I dunno. My impression is that these kids see shiny, comfortable cars on the highway, and then they get one and try to drive it all around. They concentrate on the technicalities and blindly believe what the glossy, shiny ads say.

All I want is to stop them for a while. Open the eyes wide. Learn to appreciate the comfort the cars provide, and learn the price paid for it - including the portion one does not want to see (pollution, health problems from lack of exercise, etc.). Learn the variety of environments and the appropriateness of the choice - a wagon, a sports car, a truck... and, forgive me, sometimes even the old-fashioned walk.

Jan


stu_san wrote:
Ah, the old "Throw them in the water to teach them to swim" school, eh Jan? :wink: Perhaps they should be shark-infested waters as well. :twisted:

You do have to admit, that would REALLY give a person a good incentive to learn!


Clint


I am rather for an upgrade of the delay routines, but not because I feel there is anything inherently wrong with them. In fact, I have always felt that whatever inadequacies exist are minimal at best. However, anything that can be done to improve library code (that will, in fact, be used) is, as far as I am concerned, a good thing.

I personally avoid busy-wait delays for two reasons, one good and one bad. The good reason is that most of my projects are fairly tight control loops, and jitter in timing is noticeable in the final result; using a hardware timer has a positive effect on performance. On the other hand, I also have something of a personal distaste for busy-wait loops. There is just something about programming a uC to do nothing (with all its might, nonetheless) for what can be a substantial portion of its run time that simply rubs me the wrong way. Aside from those two genuinely idiosyncratic reasons, I do not feel I have grounds to argue against the use of busy-wait loops out of hand. That is not to say there are no arguments against them, only that they are not unequivocally bad.

For me there are really only two major questions: 1) will an update have a positive impact on the use of the library, and 2) is there any reason to expect it to be detrimental to programming practices. As to the first question, I think the answer is an unabashed yes. Removing quirks always makes a library easier to use and, assuming such improvements do not add processing load, nicer to use in general. Perhaps the best place for something like the delay macros is when a quick (as in programming effort) delay is needed and there are only minimal accuracy and load constraints.

As to question number two, it may be harder to argue that making things easier is a good thing, given the already-mentioned predilection so many have to eschew thinking. Nevertheless, I would still argue that the problem in most of the code that precipitates the delay FAQ complaints is not really a delay problem but a code-architecture one. In most cases the code is designed in such a way that it is almost doomed from the beginning: long delays in interrupts, inefficient use of RAM and peripherals, and lack of understanding of the problem are all par for the course. It seems unfair to blame the delay macros for all the problems these beginners have with their code. Certainly there is a place for sink-or-swim instruction; I find that projects at the limit of your abilities are the best teachers. Nevertheless, having refined tools never hurt anyone. Besides, there are much more impressive tools; I just cannot convince myself that even the nicest busy-wait delay is a shiny, comfortable car. Maybe an old pickup truck that refuses to die?

Perhaps, instead of worrying about the application of improved delay macros, it makes more sense to look at how to teach people when they are not appropriate. Ideas?

Martin Jay McKee

As with most things in engineering, the answer is an unabashed, "It depends."


Martin, I would disagree that the inadequacies in the existing delay.h are minimal.

Consider this:
- They can be too inaccurate at the very short end (the ns range), which is exactly where you most need a busy loop.
- They also don't ensure that you will get a delay at least as long as you requested.

These two are absolute deal-killers if you want to use the functions as a portable way of providing hardware setup-time delays, which require busy loops because they are far too short for any sort of timer. Depending on your CPU clock rate and the delay you ask for, you might get far more delay than the minimum possible or, worse, less delay than you asked for.

Hans-Juergen Heinrichs' delay_x.h replacement solves both of these issues in a fully backward-compatible way -- see his delay_x project.

That said, I think that a macro like this is a step in the right direction: to automatically allow the use of variables as an argument, albeit with reduced accuracy.

What hurts this macro is part of what I really don't like about the existing delay.h functions: the use of a floating-point parameter.

As useful as this new macro is, it can't support a floating-point argument in the variable case, which, to me, makes it a bit inconsistent and confusing: for constants it supports a floating-point parameter, but variables must be integers.

In my mind floating point was never needed for delay functions, and it is the cause of many complexities, limitations, and issues.

I mean, if you want 1.5ms you could call a microsecond delay function with 1500. Or if you want 200ns you could use a nanosecond delay with an integer parameter of 200 rather than asking for 0.200 microseconds.
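
In other words (the _delay_ns() name here comes from delay_x.h, not the stock delay.h):

_delay_us(1500); /* instead of _delay_ms(1.5) */
_delay_ns(200);  /* instead of _delay_us(0.200) */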

Eliminating the floating point arguments would also allow creating macros/functions that don't depend on optimization to function properly.

What is truly tragic about the existing delay.h functions is that, for all the floating-point calculations being done, they still don't ensure a delay at least as long as you asked for, because of rounding issues when the floating-point value is truncated back to an integer to call the basic delay functions.

--- bill


Quote:

(ns range)

Eh? Even at 16MHz a single cycle is 62.5ns, so how could you ever achieve granularity in the ns range? At 1MHz a cycle is 1000ns and there's no hope of ns delays at all - they'll all have to be of us or ms granularity.

Were you perhaps talking about accuracy at the 1,2,3.. us range?


The apparent reason the original macros take in floating point is that they calculate delay loop counts, which need to be 8-bit or 16-bit integers. The macros take in amounts of time, and the loop count depends on the crystal frequency: with an 8MHz clock your loop time is 375ns, so everything is calculated in units of 0.000000375 seconds. Someone might have a UART crystal like 7372800 Hz, which makes the timebase 135.633680555... ns. Please explain: if the timebase is not an integer, what harm does it do if the time is a float too, given that compile-time constants can be used as integers in the assembly code?

If you want a function that works with variable integer delays, you have to convert the given time into clock ticks at run time. This requires either a multiplication or a division, both of which are slow operations and may take more time than the delay you wanted.
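
A sketch of what that run-time conversion costs (delay_us_var() is a made-up name; F_CPU is assumed defined, and _delay_loop_2() is avr-libc's real 4-cycles-per-iteration basic loop):

#include <stdint.h>
#include <util/delay_basic.h>

static void delay_us_var(uint16_t us)
{
	/* this 32-bit multiply and divide is the run-time cost in
	   question; for very short delays it can exceed the delay itself */
	uint16_t loops = (uint16_t)(((uint32_t)us * (F_CPU / 1000000UL)) / 4);
	if (loops)
		_delay_loop_2(loops);
}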

For really accurate use, I bet the good old _delay_loop_1 and _delay_loop_2 functions are still provided, so you can calculate your own loop counts to get exactly the time you need, and can tune them if you want to get at least, or at most, the delays you want.

I agree that the delay macros are a hassle, but making universal delay macros that suit everyone is impossible. It must be decided what is good enough, and then we live with it.

Edit: Clawson, I think it is just that he does not like floats and would rather say delay_ns(1500) than delay_us(1.5), believing that avoids floats. But even delay_ns(1500) would not avoid floats in the implementation: the timings must be calculated at compile time and reduced to integers before use, so the floats never survive to run time anyway.

Maybe it is just the macro definition that confuses people, as it uses floats.

Edit2: and since it uses perfectly normal float-to-integer truncation, you can just add +0.5 there if you want rounding. See, there is always a solution.


clawson wrote:
Quote:

(ns range)

Eh? Even at 16MHz a single cycle is 62.5ns, so how could you ever achieve granularity in the ns range? At 1MHz a cycle is 1000ns and there's no hope of ns delays at all - they'll all have to be of us or ms granularity.

Were you perhaps talking about accuracy at the 1,2,3.. us range?

Nope, I really am talking about the ns range (think 50 to 300 ns, or even the low us range). I'm not talking about ns or us granularity.

What I'm talking about is getting as close to the amount of delay time you ask for without getting less, which can be critical for hardware setup times.

Obviously it is only possible to get a delay that is some multiple of the CPU clock's cycle time.

A good delay function (at least one that is used for hardware setup timing) will give you as close to the delay asked for as is possible without undershooting.

The way the delay macros are currently written, there are two issues: one causes a delay to be longer than it should be, and one causes the delay to be shorter than what was asked for.

The general problem is that the _delay_xx() functions attempt to map the application's requested delay time onto some number of iterations of a basic delay function. That's fine as far as it goes, but they don't properly deal with the fractional cycles that don't map evenly onto a basic delay loop.

In the specific case of ns-type delays, both of these errors relate to how _delay_us() maps the requested delay time onto iterations of the basic delay function _delay_loop_1().

_delay_us() calculates the needed number of _delay_loop_1() iterations - *NOT* the number of actual cycles. This calculation is done in floating point; if the result is less than 1, it uses 1, otherwise it uses the truncated integer value.

And that is the problem. This methodology results in errors.

The "too long" issue is when _delay_us() function rounds up to a minimum of 1 loop of the basic delay function _delay_loop1() which is 3 CPU cycles.
Which means the shortest delay achievable is
3 cycles, even though it is possible that 1 or 2 cycles would have satisfied the delay request.

The "too short" delay is because the _delay_us() function truncates the calculated
loop iterations to an integer when it is larger than 1.
This means that the delay can be shorter than what was requested.

So if say 1.6666 loop iterations were needed to satisfy the delay request, only 1 loop iteration would be done. Which means that instead of the needed 5 clocks of delay, only 3 clocks would be done.
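
The logic being described boils down to something like this (a simplified paraphrase of _delay_us(), not the verbatim avr-libc source; the wrapper name is made up):

#include <stdint.h>
#include <util/delay_basic.h>

static void emulate_delay_us(double us)
{
	double tmp = ((F_CPU) / 3e6) * us; /* iterations of the 3-cycle loop */
	uint8_t ticks;
	if (tmp < 1.0)
		ticks = 1;             /* minimum of one iteration: the overshoot case */
	else
		ticks = (uint8_t)tmp;  /* plain truncation: the undershoot case */
	_delay_loop_1(ticks);
}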

==================================

So some specific examples.

Overshoot:
----------

Suppose you have a 16 MHz clock and ask for a 100 ns delay. You end up with a 3-clock delay rather than a 2-clock delay: 187.5 ns vs. 125 ns.

Or take a 1 MHz clock and ask for a 1 us delay. You end up with a 3-clock delay, i.e. 3 us, instead of the 1-clock delay that would be exactly the 1 us requested.

Or take 20 MHz and ask for a 50 ns delay. You get 150 ns instead of 50 ns.

Undershoot:
-----------

Suppose you have a 16 MHz clock and ask for a 300 ns delay. This translates into 1.6 iterations of _delay_loop_1(). That gets truncated to 1, which means you get 3 clocks (187.5 ns) instead of the needed 5 clocks (312.5 ns).

Or take a 1 MHz clock and ask for a 5 us delay. This would be 1.666667 iterations, which truncates to 1, so you get 3 us instead of 5 us.

--------------------------------------------
So that is the problem I'm referring to.

I ran into this when trying to write a portable GLCD library that I want to work automagically for everyone, regardless of the clock speed they happen to be using.

Certain clock speeds fail because of the undershoot.

To do delays properly, the entire number of cycles has to be considered. Loops are fine, but the fractional part must also be accounted for.

Hans-Juergen Heinrichs' delay_x.h replacement for delay.h handles all the delay cycles properly and gives you the shortest possible delay that is not shorter than what you asked for. It is a direct drop-in replacement for delay.h.

===========================================

The reason I said floating point was an unfortunate choice is that, had the functions been defined in terms of integers, other options would open up, such as using simple integer math to calculate the number of cycles, which could then be handed to the new gcc built-in function:

extern void __builtin_avr_delay_cycles(unsigned long __n);

which generates inline code to delay exactly the specified number of cycles.

Also, things like the new macro definition/wrapper that kicked off this thread could sit on top of integer-based delay functions.
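
For example, an integer-only wrapper over that built-in might look like this (DELAY_US_INT is a hypothetical name; it rounds the cycle count up, so the delay is never shorter than requested):

#define DELAY_US_INT(us) \
	__builtin_avr_delay_cycles( \
		((unsigned long long)(us) * (F_CPU) + 999999UL) / 1000000UL)

/* e.g. with F_CPU = 16000000UL, DELAY_US_INT(5) delays exactly 80 cycles */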

--- bill


But if you offered _delay_ns(), a naive user might assume that meant they could use arbitrary values with it, like _delay_ns(10) or _delay_ns(100).

On 1MHz both would in fact give 1000ns (assuming round-up?)

On 16MHz I guess 10 might actually round down to 0 and offer no delay? And 100 would presumably round down to the nearest 62.5ns?

I just don't see the point in offering something that simply cannot be delivered (or only with HUGE inaccuracy compared to what was requested).

OTOH all AVRs should be able to do a _delay_us(1) (or at least something pretty close)


Jepael wrote:

For really accurate use, I bet the good old _delay_loop_1 and _delay_loop_2 functions are still provided, so you can calculate your own loop counts to get exactly the time you need, and can tune them if you want to get at least, or at most, the delays you want.

Using the basic delay loops is the entire cause of the overshoot/undershoot errors. To get accurate delays, you have to calculate the number of cycles in their entirety. And if you calculate the actual cycles yourself, you might as well call the built-in __builtin_avr_delay_cycles() function.

Jepael wrote:

I agree that the delay macros are a hassle, but making universal delay macros that suit everyone is impossible. It must be decided what is good enough, and then we live with it.

I don't think they are a hassle. I just want them to work properly, at least for constants. I think the delay_x.h macros would definitely qualify as "good enough": they actually generate proper delays for constants.

A newer/simpler version of delay.h could be written to use the __builtin_avr_delay_cycles() function and let GCC do all the code generation.

There wouldn't be any issues with variables, as the built-in function only accepts constants.

Jepael wrote:

Edit: Clawson, I think it is just that he does not like floats and would rather say delay_ns(1500) than delay_us(1.5), believing that avoids floats. But even delay_ns(1500) would not avoid floats in the implementation: the timings must be calculated at compile time and reduced to integers before use, so the floats never survive to run time anyway.

Maybe it is just the macro definition that confuses people, as it uses floats.

No, not really. My beef is that I believe you can calculate the necessary cycles using only integer calculations rather than floating point, so that, should the optimizer not be enabled, you won't unexpectedly drag in the floating-point library.

Jepael wrote:

Edit2: and since it uses perfectly normal float-to-integer truncation, you can just add +0.5 there if you want rounding. See, there is always a solution.

Not really. To ensure you don't undershoot, you can't use "normal" rounding: you must always round up to the next integer if there is any fractional component.

But using the existing simple logic with basic delay loops, and then simply always rounding the iteration count up, isn't the answer either.

The proper way is to consider the total cycles, and that is what delay_x.h does.

delay_x.h is nice because it is a drop-in and works on older versions of gcc. Alternatively, for newer GCC releases, we could rewrite the functions in delay.h to calculate the cycles and call the built-in GCC function, which might be better, as it totally precludes any attempt to use variables.

--- bill


> The apparent reason the original macros take in floating point is [...]

...that I found it most natural to be able to express things in a natural way. That includes, to me, the ability to tell your computer to "delay by 2.5 ms" (which to me is *not* the same as telling it to delay by 2500000 ns -- I'd have to start counting zeros in the latter), as well as to see that your crystal is labelled 7.3728 MHz, so you might want to write #define F_CPU 7.3728E6 rather than having to figure out how many zeros to append in order to specify it at a 1 Hz resolution (which is beyond its accuracy anyway).

_delay_us and _delay_ms are helper functions, so I wanted them to help as much as possible. Offloading stupid power-of-ten shifts from the developer to the computer was (in my book) part of that job. Anyone not interested in those helper functions is still free to pick the underlying inline functions from <util/delay_basic.h>. (These functions were there first, and a more elegant and more natural way of expressing a delay *time* became a frequently requested item.)

(Alas, the F_CPU part was later broken by the way the macros are implemented, which doesn't allow floating-point values in F_CPU.)

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.


clawson wrote:
But if you offered _delay_ns(), a naive user might assume that meant they could use arbitrary values with it, like _delay_ns(10) or _delay_ns(100).

On 1MHz both would in fact give 1000ns (assuming round-up?)

That is no different from what already exists today: a person can ask for 0.010 or 0.100 microseconds. So what is the difference?

Correct: at 1MHz you get a 1000ns delay. That is the shortest delay that comes as close as possible.

This is actually a documentation issue.

clawson wrote:

On 16MHz I guess 10 might actually round down to 0 and offer no delay? And 100 would presumably round down to the nearest 62.5ns?

Normally, delay functions always round up, so you get a delay *at least* as long as you asked for. This is critical for what I believe is the one time busy loops have to be used: hardware setup times. You always want and need a delay at least as long as you ask for, and those delays are always constant.

clawson wrote:

I just don't see the point in offering something that simply cannot be delivered (or only with HUGE inaccuracy compared to what was requested).

It may not be as blatantly obvious, but again, that already exists today.

This is more a matter of documentation.

clawson wrote:

OTOH all AVRs should be able to do a _delay_us(1) (or at least something pretty close)

Which is something that cannot be done today using the existing delay.h routines.

And guys, I'm not just a complainer. I am happy to work on a new version and submit the patches, including new documentation, to make it happen.

--- bill


In my mind the use of floating point isn't a real "issue". Yes, it might be nice to eliminate it, but I can see the value of using floating-point time arguments.

To me the big issue is fixing what the delay functions actually do.

They need to offer accurate, predictable delays when constants are used.

It would be very useful if those delay functions could be used for the short delays that are often needed for hardware setup times.

Right now the delay_x.h macros provide this functionality over the delay.h macros in a completely backward-compatible, drop-in way.

--- bill


So all 1000 values from _delay_ns(1) to _delay_ns(1000) on a 1MHz processor will actually delay 1000ns? And _delay_ns(1001) will actually delay 2000ns? And this is in some sense "accurate"? I just don't get how you can offer ns delays on processors that don't even come close to working in the ns range. ms and us, yes, because even the lowliest AVR should be able to get down to 1us - but ns? I don't think so.

And I don't think the fractional support in _ms and _us was there so that _us(0.001) could be used, but so that something like _us(1.7) might be (or at least something close to it).


bperrybap wrote:

> To me the big issue is fixing what the delay functions actually do.

There's some light at the end of *that* tunnel. There's a patch around, waiting for incorporation into the GCC source, that would implement an intrinsic function named __delay_cycles. As this function is resolved directly by the compiler, there is no additional overhead for calling it, nor for setting up the respective registers -- the compiler always knows what it is doing, so it can take that overhead into account.

What would still be needed is an accompanying preprocessor macro so the library
can tell whether the underlying compiler supports __delay_cycles or not. If
this functionality is present, the _delay_ms etc. wrappers could use it, otherwise
they have to emulate a similar behaviour (as close as possible) using the old
delay loops.
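
The dispatch would presumably look something like this, in sketch form (not compilable as-is; __HAS_DELAY_CYCLES is a made-up macro name, and ticks/loops stand for the computed counts - the missing macro is exactly the problem):

#ifdef __HAS_DELAY_CYCLES
	__delay_cycles(ticks);   /* compiler-resolved, cycle-exact */
#else
	_delay_loop_2(loops);    /* emulate with the old delay loops */
#endif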

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.


Rational numbers anyone?

#define delay_sec_rat(num, den) ....

Require num to be a nonnegative integral expression. Require den to be a constant positive whole number. Floating point would be allowed, but only if the value is a positive whole number: casting it to long shouldn't change its value. gcc has a built-in (__builtin_constant_p) that would allow delay_sec_rat to behave differently depending on whether num is a compile-time constant.
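
A minimal sketch of the idea, sitting on top of avr-libc's <util/delay.h> (the delay_sec_rat name and the fallback strategy are illustrative only; the variable branch assumes den divides 1000 evenly):

#include <util/delay.h>

#define delay_sec_rat(num, den) do { \
	if (__builtin_constant_p(num)) \
		/* constant: all the math folds away at compile time */ \
		_delay_ms((num) * (1000.0 / (den))); \
	else { \
		/* variable: loop whole milliseconds */ \
		unsigned long __n = (unsigned long)(num) * (1000UL / (den)); \
		while (__n--) \
			_delay_ms(1); \
	} \
} while (0)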

Moderation in all things. -- ancient proverb


clawson wrote:
So all 1000 values from _delay_ns(1) to _delay_ns(1000) on a 1MHz processor will actually delay 1000ns? And _delay_ns(1001) will actually delay 2000ns? And this is in some sense "accurate"? I just don't get how you can offer ns delays on processors that don't even come close to working in the ns range. ms and us, yes, because even the lowliest AVR should be able to get down to 1us - but ns? I don't think so.

And I don't think the fractional support in _ms and _us was there so that _us(0.001) could be used, but so that something like _us(1.7) might be (or at least something close to it).

I think you are somehow confusing resolution (or precision) with accuracy.

The accuracy of the actual delay is directly tied to the CPU clock frequency, i.e. the higher the CPU clock frequency, the closer the actual delay can get to the desired delay.

I'm not talking about offering delays with a precision of 1 ns resolution on hardware that obviously cannot support that. I'm not talking about offering delays with 1 ns resolution on any hardware.

What I'm talking about is having delay routines that ensure delays are always at least as long as the requested delay, and as close as possible to the desired delay within the limitations of the CPU clock frequency.

When talking to real hardware you often need to honor setup times.
Those setup times are usually defined in terms of minimums.

If you violate those setup times, things won't work correctly.

That is why you want delay functions to ensure that the actual delay is at least as long as what was requested.

So say you are talking to some hardware that needs 400 ns of setup time for the data to be ready. What do you do on a CPU with a 1 MHz clock? You can't wait zero time to read the data, because it wouldn't be ready yet, and even a single nop is 1 us. Well, you are stuck doing 1 nop, i.e. a 1 us delay, because that is the closest to 400 ns possible at that clock frequency.

The point of correcting the delay routines is not to offer any sort of better resolution or to appear to offer something that somehow violates the rules of physics.

The point is to define routines that guarantee a minimum delay regardless of the CPU clock rate: the delay needs to be at least as long as you requested.

So yes, getting 1000ns for all calls to _delay_ns() with values from 1 to 1000 is what you want. Obviously the actual delay is then not very accurate with respect to the requested delay when the CPU clock is 1MHz. But remember, the point is to offer the shortest delay possible without undershooting the requested delay.

If you have a 1MHz clock, you are stuck with actual delays that are increments of 1us.

The value of all of this is that it becomes possible to write portable library code that talks to hardware with setup-delay needs. Rather than having people screw around adjusting NOPs to make library code work in their environment, the library code can use delay calls to request a certain number of nanoseconds.

If the delay functions worked as I have outlined, the way the delay_x.h functions do, library code would automagically adjust to the shortest possible delay for any CPU speed.
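
For instance, a library could express a data-sheet minimum directly (a sketch: _delay_ns() is from delay_x.h, and the port pin and the 450 ns figure are made-up examples):

#include <avr/io.h>
#include "delay_x.h"

static void strobe(void)
{
	PORTB |= _BV(PB0);  /* raise the strobe */
	_delay_ns(450);     /* data-sheet minimum pulse width, never undershot */
	PORTB &= ~_BV(PB0); /* drop the strobe */
}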

While not strictly necessary, just as a call to _delay_ms(2.5) looks cleaner than a call to _delay_us(2500), a call to _delay_ns(200) looks cleaner than a call to _delay_us(0.200).

That is why I think the _delay_ns() function is useful.

But even if you ignore the new _delay_ns() function: as stated in previous posts, the existing delay routines are not good enough to use for hardware setup-time delays, as they don't ensure you always get the minimum delay you ask for.

And by the way, as also noted earlier, because of the rounding/truncation in the existing delay.h functions that iterate a basic delay function, on a 1MHz clock you don't get what you asked for even in increments of 1us, unless you ask for a delay of at least 3us that is also a multiple of 3us. Hardly what someone using this function is likely to expect. So the existing _delay_us() routine can't even do integer increments of 1us when the cycle time is exactly 1us.

In your specific example of 1.7us, I would argue that it should be 2us on a 1MHz clock, because you asked for more than 1us and 2us is the next closest delay that is possible. Rounding down would undershoot to 1us, and yet the existing _delay_us() function will delay 3us by running the basic delay function for 1 iteration.

delay_x.h has no such issues: it ensures no undershoot by providing the closest delay possible, within 1 CPU clock, that is not shorter than what was requested. It does all this in a fully backward-compatible and totally transparent way.

And this is why I currently use it instead of the provided delay.h.

=======================================

Consider the bigger picture.
So many folks say using busy loops is total BS and that other methods should be used. But there are cases where they are necessary, such as delaying for short hardware setup times. Yet, as shown, the very place where busy loops are necessary and the timing is critical is the very place the current delay routines tend to break down the most.

This is why I keep pushing so hard to get these functions updated to work better.
With respect to delay cycles, they need to work the way the delay_x.h functions work.

--- bill


dl8dtl wrote:
bperrybap wrote:

> To me the big issue is fixing what the delay functions actually do.

There's some light at the end of *that* tunnel. There's a patch around, waiting for incorporation into the GCC source, that would implement an intrinsic function named __delay_cycles. As this function is resolved directly by the compiler, there is no additional overhead for calling it, nor for setting up the respective registers -- the compiler always knows what it is doing, so it can take that overhead into account.

What would still be needed is an accompanying preprocessor macro so the library
can tell whether the underlying compiler supports __delay_cycles or not. If
this functionality is present, the _delay_ms etc. wrappers could use it, otherwise
they have to emulate a similar behaviour (as close as possible) using the old
delay loops.

The current WinAVR downloadable from Atmel already has the gcc built-in __builtin_avr_delay_cycles() function. I've already done some testing with it.

When the new built-in __builtin_avr_delay_cycles() function is used by the _delay_xxx() functions, they reduce to a very simple calculation and a call to the built-in delay function, rather than attempting some sort of iteration loop around a basic delay function.

The old delay loops simply don't work very well - not well enough to use for reliable hardware delays - so I don't think we want to emulate the existing behavior. A much better approach would be to fall back to what Hans did in his delay_x.h version of the delay routines if the newer built-in function is not available.

--- bill


> So I don't think we want to emulate the existing behavior.

We (i.e. the avr-libc vendors) do want to, because we do not want to remove an existing API. Things that don't appear to work very well in your book (and I understand the reasons) might still be perfectly suitable for the job for many, many others, including allowing things like minimum delays in an HD44780 control firmware (for just one example).

So there are just two options for us then: keep the existing API as is and invent a new one based on __delay_cycles (so those whose compiler does not implement it could still use the old API), or make an automatic decision between the two. For the latter, I'd need a preprocessor macro as a decision base. AFAICT, the current __delay_cycles patch does not offer that feature though.

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.


dl8dtl wrote:
> So I don't think we want to emulate the existing behavior.

We (i.e. the avr-libc vendors) do want to, because we do not want to remove an existing API.

How closely must the existing API be followed? The documentation doesn't promise very much. If the existing implementation came up two cycles short, would adding those cycles be allowed? If it ran five cycles long, would dropping four of them be allowed?

Moderation in all things. -- ancient proverb


> How closely must the existing API be followed?

As exactly as possible. When in doubt, I'd follow it in a way that makes the delay *at least as long* as requested. (Yes, I know, this isn't always the case right now due to rounding errors.)

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.


dl8dtl wrote:
> So I don't think we want to emulate the existing behavior.

We (i.e. the avr-libc vendors) do want to, because we do not want to remove an existing API. Things that don't appear to work very well in your book (and I understand the reasons) might still be perfectly suitable for the job for many, many others, including allowing things like minimum delays in an HD44780 control firmware (for just one example).

So there are just two options for us then: keep the existing API as is and invent a new one based on __delay_cycles (so those whose compiler does not implement it could still use the old API), or make an automatic decision between the two. For the latter, I'd need a preprocessor macro as a decision base. AFAICT, the current __delay_cycles patch does not offer that feature though.

I think you have misunderstood me. (I wish we could sit down face to face in a pub; heck, I'd even buy all the food and beer, because I'd love to pay you back on your beer licenses for the work of yours I've benefited from.)

I believe that an API, and how the functionality under that API is actually implemented, are two different things; i.e., the code may not properly or fully implement the functionality defined by the API.

Yes, there was some discussion about changing the actual API. And if a delay API using only integers were to be defined, I would not want to remove or alter the existing API to do it - backward compatibility is vital when it comes to system libraries. It would have to be something new.

When I said "we don't want to emulate the existing behavior", I meant that we don't want to emulate the exact method of providing delays that the current delay.h implements, which does not work properly in all situations - i.e., rounding delays to iterations of basic delay loops and truncating loop values such that some delays come out too short.

For example, it should be possible to get a delay of 1us, 2us, or 5us, and any other delay in 1us increments, when running on a 1MHz AVR. But today, with delay.h, you cannot. Today you get 3us if you ask for 1us, 2us, 3us, 4us, or even 5us, and you get 6us if you ask for 6us, 7us, or 8us.

There are better methods of providing delays that can be implemented underneath the existing API.

delay_x.h is a great example: it does not change the API, yet fixes the actual functionality by doing the delays in a better way.

To say "Things that don't appear to work very well in your book" seems a bit short sighted, given that I can show by example and mathematically, specific cases where the existing implementation breaks down and doesn't work properly.

The entire reason this came up for me was that I was implementing a ks0107 driver and the existing delay.h delay functions were giving improper delays. I want to publish a GLCD driver that works in anybody's environment, and that simply cannot be done with the existing delay.h implementation. It can, however, be done with the delay_x.h implementation.

(OK, it can be done with the existing delay.h functions, but I would have to bump the requested delays up needlessly to work around the bugs, which slows things down.)

So it isn't a valid statement to say that the existing implementation works suitably well for things like minimum delays in an HD44780 control firmware. It will depend on your CPU clock speed, your specific LCD, and how closely your code tries to minimize delays by reducing the requests to the minimum requirements the LCD specifies. If you have an "unlucky" combination, the existing delay.h routines may return a delay shorter than you asked for, and you violate the timing enough that it no longer works reliably.

Yes, you can fiddle with it until you get something that works, but I'm a firm believer that library functions provided with the development tools should "just work", and should be updated when implementation issues crop up.

So, to further the updating process and help it along, I'll start poking around and see if I can figure out how to automagically detect whether the built-in delay function exists. If there is a way to detect that it does not exist, the code needs to fall back to something like the code in delay_x.h rather than the existing delay.h.

--- bill