delay_basic.h clarification?


hi,

 

Why are the delay functions in delay_basic.h written in assembly and not C? For faster execution?

 

 void _delay_loop_1(uint8_t __count)
{
  __asm__ volatile (
  "1: dec %0" "\n\t"
  "brne 1b"
  : "=r" (__count)
  : "0" (__count)
  );
 }
 

Could someone translate this into C?

 

thanks!


For faster execution?

Why would a delay function need to run faster?

 

Most likely it's written in assembler for accuracy as you need to be sure of how many clock cycles something takes.

 

The above routine is decrementing count until it reaches zero.

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly


Most likely it's written in assembler for accuracy as you need to be sure of how many clock cycles something takes.

Exactly this.

 

Note that these days the routines in delay_basic.h are effectively superseded. They existed to support _delay_ms() and _delay_us() in <util/delay.h>, but (assuming the compiler supports it) those routines now make use of __builtin_avr_delay_cycles(), which is an intrinsic within the C compiler itself. So if you want to delay for 2345 cycles just use __builtin_avr_delay_cycles(2345); and that's exactly what will happen.

Could someone translate this into C?

Simply as an academic exercise it's effectively:

 void _delay_loop_1(uint8_t __count)
{
  while(--__count);
}

However, whether that really generates DEC and BRNE will depend on the version of the compiler, the optimization level and various other things. The reason it was written in ASM was to guarantee it would be a DEC and a BRNE, as it's known that DEC is 1 machine cycle and BRNE is 2 cycles each time it branches back and 1 cycle on the last occasion when it doesn't.
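As a rough sketch of that cycle accounting (the function name delay_loop_1_cycles is made up for illustration, and this is plain C you can run on a PC, not AVR code), the time spent inside the DEC/BRNE loop works out like this:

```c
#include <stdint.h>

/* Hypothetical helper: cycles spent inside the DEC/BRNE loop of
   _delay_loop_1() for a given 8-bit count, using the timings quoted
   above (DEC = 1, BRNE = 2 when taken, 1 when it falls through).
   Setup overhead before the loop is not included. */
static uint32_t delay_loop_1_cycles(uint8_t count)
{
    /* A count of 0 wraps to 255 on the first DEC, so it runs 256 times. */
    uint32_t iterations = (count == 0) ? 256u : count;

    /* Every pass costs 3 cycles except the last one, where BRNE is
       not taken and costs 1 cycle instead of 2. */
    return 3u * iterations - 1u;
}
```

So a count of 1 costs 2 cycles and a count of 0 (meaning 256) costs 767 cycles, which is why the manual can round it to "three CPU cycles per iteration".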

 

The user manual is here:

 

http://www.nongnu.org/avr-libc/u...

 

it says:

The loop executes three CPU cycles per iteration,

it can only make that guarantee because it knows that DEC/BRNE are going to be used.

 

As I say __builtin_avr_delay_cycles() is a much better solution. It does not limit you to a multiple of 3 cycles for example.
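To put a number on the "multiple of 3" limitation, here is a small sketch that runs on a PC. The 8 MHz F_CPU, the macro US_TO_CYCLES and the function loop1_granular_cycles are all assumptions made up for this example; they are not part of avr-libc.

```c
#ifndef F_CPU
#define F_CPU 8000000UL   /* assumed clock for this example: 8 MHz */
#endif

/* Hypothetical helper: CPU cycles needed for `us` microseconds. */
#define US_TO_CYCLES(us) ((unsigned long)((F_CPU / 1000000.0) * (us)))

/* Cycles a 3-cycles-per-iteration loop can deliver when asked for
   `cycles`, treating each iteration as 3 cycles as the manual does.
   The request is truncated down to a whole number of iterations. */
static unsigned long loop1_granular_cycles(unsigned long cycles)
{
    return (cycles / 3UL) * 3UL;
}
```

At 8 MHz a 10 us delay is 80 cycles, but the 3-cycle loop can only give 78 (or 81 with one more iteration). On an AVR build whose GCC provides the intrinsic, passing the constant 80 to __builtin_avr_delay_cycles() avoids that rounding.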


clawson wrote:
However, whether that really generates DEC and BRNE will depend on the version of the compiler, the optimization level and various other things. The reason it was written in ASM was to guarantee it would be a DEC and a BRNE, as it's known that DEC is 1 machine cycle and BRNE is 2 cycles each time it branches back and 1 cycle on the last occasion when it doesn't.

It is really important to understand this!!

 

http://www.8052.com/forum/read/1...

 

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

thanks guys for the links! 

 

you can NEVER predict the duration of a delay in 'C'.

interesting... off to reading

Last Edited: Tue. Mar 10, 2015 - 04:05 PM

Never say never. ;-)

 

If you write some C code, build it, study the Asm generated and count the cycles then as long as you stick with the exact same source, the same compiler, the same version of it and the same build options then it will generate the same sequence of opcodes next week as it did this week. In that sense it is predictable/deterministic and so "never" is too strong (but only just).

 

However if you had to recheck the cycle count every time you upgraded the compiler or changed one of the build options life could get very tedious indeed. You might even find that some extra lines added to the same function but not in the immediate vicinity of the delay loop changed things.

 

That's why C library authors go to the trouble of writing things like _delay_ms() (delay_ms() in some compilers) and C compiler authors deliver things like __builtin_avr_delay_cycles(). It gives the user something they can rely on to always execute the same cycles whatever changes they make in the compiler or their other code. To achieve this it has to be written in Asm.


Yes, of course - all generalisations are bad...

 

;-)


 

void _delay_loop_1(uint8_t __count)
{
	__asm__ volatile (
		"1: dec %0" "\n\t"	/* decrement the 8-bit counter: 1 cycle */
		"brne 1b"		/* branch back until zero: 2 cycles, 1 on exit */
		: "=r" (__count)
		: "0" (__count)
	);
}

_delay_loop_1

It is used by the _delay_us function. It is an 8-bit counter that can run up to 256 iterations (pass 0 to get 256), and each iteration takes 3 CPU clock cycles. The maximum is therefore 768 clock periods, which at 1 MHz is 768 microseconds.
 

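void _delay_loop_2(uint16_t __count)
{
	__asm__ volatile (
		"1: sbiw %0,1" "\n\t"	/* subtract 1 from the 16-bit register pair: 2 cycles */
		"brne 1b"		/* branch back until zero: 2 cycles, 1 on exit */
		: "=w" (__count)
		: "0" (__count)
	);
}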

_delay_loop_2

It is used by the _delay_ms function. It is a 16-bit counter that can run up to 65536 iterations (pass 0 to get 65536), and each iteration takes 4 clock cycles. The maximum is therefore 262144 clock periods, which at 1 MHz is about 262.1 milliseconds. For a faster F_CPU the maximum is shorter.

The maximal possible _delay_ms is 262.14 ms / F_CPU in MHz. (For 1 MHz: 262.14 / 1 = 262.14 ms)

The maximal possible _delay_us is 768 us / F_CPU in MHz. (For 1 MHz: 768 / 1 = 768 us)

 

So on an MCU running at 1 MHz, the maximum _delay_ms generated this way is 262 ms. At 4.8 MHz it is 54.6 ms, and at 9.6 MHz only 27.3 ms.
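If you want to check those limits for your own clock, the arithmetic can be sketched like this. It is plain C for a PC, and the two function names are invented for this example; the only inputs are the 256 x 3 and 65536 x 4 cycle figures from delay_basic.h.

```c
/* Hypothetical helpers: longest delay each basic loop can produce
   at a given CPU clock (f_cpu_hz in Hz), ignoring setup overhead. */

/* _delay_loop_1: at most 256 iterations of 3 cycles = 768 cycles. */
static double max_delay_loop_1_us(double f_cpu_hz)
{
    return 768.0 / (f_cpu_hz / 1e6);        /* result in microseconds */
}

/* _delay_loop_2: at most 65536 iterations of 4 cycles = 262144 cycles. */
static double max_delay_loop_2_ms(double f_cpu_hz)
{
    return 262144.0 * 1e3 / f_cpu_hz;       /* result in milliseconds */
}
```

For example, max_delay_loop_2_ms(4.8e6) comes out at roughly 54.6, matching the figure above.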

A _delay_ms longer than 262144 x (1/F_CPU) is automatically produced a different way: the function repeats a 0.1 ms delay, so the resolution drops to 0.1 ms. The maximum possible delay is then 6.5535 s, independent of F_CPU.
A _delay_us longer than 768 x (1/F_CPU) is produced by calling the _delay_ms function instead.
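The branch selection can be followed with a small model of the _delay_ms() code listed in delay.h further down. This is only a sketch for a PC: the struct delay_plan and the function plan_delay_ms are invented names, and F_CPU is passed in as a parameter instead of being a macro.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical result type: which path _delay_ms() would take. */
typedef struct {
    bool     low_res;  /* true: fallback path, counted in 0.1 ms steps */
    uint16_t ticks;    /* loop count, or number of 0.1 ms steps */
} delay_plan;

/* Mirrors the three branches of _delay_ms() in this version of delay.h.
   Assumption: the caller keeps ms at or below 6553.5, because a larger
   value does not fit the uint16_t tick counter. */
static delay_plan plan_delay_ms(double f_cpu_hz, double ms)
{
    delay_plan p = { false, 0 };
    double tmp = (f_cpu_hz / 4e3) * ms;   /* 4 cycles per loop iteration */

    if (tmp < 1.0) {
        p.ticks = 1;                      /* shortest possible delay */
    } else if (tmp > 65535) {
        p.low_res = true;                 /* too long for one loop call */
        p.ticks = (uint16_t)(ms * 10.0);  /* number of 0.1 ms steps */
    } else {
        p.ticks = (uint16_t)tmp;          /* single _delay_loop_2() call */
    }
    return p;
}
```

At 1 MHz, a request of 100 ms stays in the normal path with 25000 ticks, while 1000 ms switches to the low-resolution path with 10000 steps of 0.1 ms.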

So the maximum _delay_ms is about 6553 ms. A longer request does not fit the 16-bit tick counter, so you will not get the delay you asked for. For anything longer you must call _delay_ms repeatedly from a counted loop.

 

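//60 seconds delay
for (uint8_t i = 0; i < 60; i++)   // uint8_t from <stdint.h>; "byte" is Arduino-only
    {
    _delay_ms(1000);   // each call stays well below the 6.55 s limit
    }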

In the Arduino IDE it works differently, and delays longer than 6.55 s are possible.

 

If F_CPU is not defined, delay.h issues a warning and defines it as 1 MHz.

 

 

delay.h:
 

/* Copyright (c) 2002, Marek Michalkiewicz
   Copyright (c) 2004,2005,2007 Joerg Wunsch
   Copyright (c) 2007  Florin-Viorel Petrov
   All rights reserved.

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are met:

   * Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

   * Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in
     the documentation and/or other materials provided with the
     distribution.

   * Neither the name of the copyright holders nor the names of
     contributors may be used to endorse or promote products derived
     from this software without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  POSSIBILITY OF SUCH DAMAGE. */

/* $Id: delay.h,v 1.5.2.1 2009/02/25 10:14:03 joerg_wunsch Exp $ */

#ifndef _UTIL_DELAY_H_
#define _UTIL_DELAY_H_ 1

#include <inttypes.h>
#include <util/delay_basic.h>

/** \file */
/** \defgroup util_delay <util/delay.h>: Convenience functions for busy-wait delay loops
    \code
    #define F_CPU 1000000UL  // 1 MHz
    //#define F_CPU 14.7456E6
    #include <util/delay.h>
    \endcode

    \note As an alternative method, it is possible to pass the
    F_CPU macro down to the compiler from the Makefile.
    Obviously, in that case, no \c \#define statement should be
    used.

    The functions in this header file are wrappers around the basic
    busy-wait functions from <util/delay_basic.h>.  They are meant as
    convenience functions where actual time values can be specified
    rather than a number of cycles to wait for.  The idea behind is
    that compile-time constant expressions will be eliminated by
    compiler optimization so floating-point expressions can be used
    to calculate the number of delay cycles needed based on the CPU
    frequency passed by the macro F_CPU.

    \note In order for these functions to work as intended, compiler
    optimizations <em>must</em> be enabled, and the delay time
    <em>must</em> be an expression that is a known constant at
    compile-time.  If these requirements are not met, the resulting
    delay will be much longer (and basically unpredictable), and
    applications that otherwise do not use floating-point calculations
    will experience severe code bloat by the floating-point library
    routines linked into the application.

    The functions available allow the specification of microsecond, and
    millisecond delays directly, using the application-supplied macro
    F_CPU as the CPU clock frequency (in Hertz).

*/

#if !defined(__DOXYGEN__)
static inline void _delay_us(double __us) __attribute__((always_inline));
static inline void _delay_ms(double __ms) __attribute__((always_inline));
#endif

#ifndef F_CPU
/* prevent compiler error by supplying a default */
# warning "F_CPU not defined for <util/delay.h>"
# define F_CPU 1000000UL
#endif

#ifndef __OPTIMIZE__
# warning "Compiler optimizations disabled; functions from <util/delay.h> won't work as designed"
#endif

/**
   \ingroup util_delay

   Perform a delay of \c __ms milliseconds, using _delay_loop_2().

   The macro F_CPU is supposed to be defined to a
   constant defining the CPU clock frequency (in Hertz).

   The maximal possible delay is 262.14 ms / F_CPU in MHz.

   When the user request delay which exceed the maximum possible one,
   _delay_ms() provides a decreased resolution functionality. In this
   mode _delay_ms() will work with a resolution of 1/10 ms, providing
   delays up to 6.5535 seconds (independent from CPU frequency).  The
   user will not be informed about decreased resolution.
 */
void
_delay_ms(double __ms)
{
	uint16_t __ticks;
	double __tmp = ((F_CPU) / 4e3) * __ms;
	if (__tmp < 1.0)
		__ticks = 1;
	else if (__tmp > 65535)
	{
		//	__ticks = requested delay in 1/10 ms
		__ticks = (uint16_t) (__ms * 10.0);
		while(__ticks)
		{
			// wait 1/10 ms
			_delay_loop_2(((F_CPU) / 4e3) / 10);
			__ticks --;
		}
		return;
	}
	else
		__ticks = (uint16_t)__tmp;
	_delay_loop_2(__ticks);
}

/**
   \ingroup util_delay

   Perform a delay of \c __us microseconds, using _delay_loop_1().

   The macro F_CPU is supposed to be defined to a
   constant defining the CPU clock frequency (in Hertz).

   The maximal possible delay is 768 us / F_CPU in MHz.

   If the user requests a delay greater than the maximal possible one,
   _delay_us() will automatically call _delay_ms() instead.  The user
   will not be informed about this case.
 */
void
_delay_us(double __us)
{
	uint8_t __ticks;
	double __tmp = ((F_CPU) / 3e6) * __us;
	if (__tmp < 1.0)
		__ticks = 1;
	else if (__tmp > 255)
	{
		_delay_ms(__us / 1000.0);
		return;
	}
	else
		__ticks = (uint8_t)__tmp;
	_delay_loop_1(__ticks);
}


#endif /* _UTIL_DELAY_H_ */

delay_basic.h:
 

/* Copyright (c) 2002, Marek Michalkiewicz
   Copyright (c) 2007 Joerg Wunsch
   All rights reserved.

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are met:

   * Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

   * Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in
     the documentation and/or other materials provided with the
     distribution.

   * Neither the name of the copyright holders nor the names of
     contributors may be used to endorse or promote products derived
     from this software without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  POSSIBILITY OF SUCH DAMAGE. */

/* $Id: delay_basic.h,v 1.1 2007/05/13 21:23:20 joerg_wunsch Exp $ */

#ifndef _UTIL_DELAY_BASIC_H_
#define _UTIL_DELAY_BASIC_H_ 1

#include <inttypes.h>

/** \file */
/** \defgroup util_delay_basic <util/delay_basic.h>: Basic busy-wait delay loops
    \code
    #include <util/delay_basic.h>
    \endcode

    The functions in this header file implement simple delay loops
    that perform a busy-waiting.  They are typically used to
    facilitate short delays in the program execution.  They are
    implemented as count-down loops with a well-known CPU cycle
    count per loop iteration.  As such, no other processing can
    occur simultaneously.  It should be kept in mind that the
    functions described here do not disable interrupts.

    In general, for long delays, the use of hardware timers is
    much preferrable, as they free the CPU, and allow for
    concurrent processing of other events while the timer is
    running.  However, in particular for very short delays, the
    overhead of setting up a hardware timer is too much compared
    to the overall delay time.

    Two inline functions are provided for the actual delay algorithms.

*/

#if !defined(__DOXYGEN__)
static inline void _delay_loop_1(uint8_t __count) __attribute__((always_inline));
static inline void _delay_loop_2(uint16_t __count) __attribute__((always_inline));
#endif

/** \ingroup util_delay_basic

    Delay loop using an 8-bit counter \c __count, so up to 256
    iterations are possible.  (The value 256 would have to be passed
    as 0.)  The loop executes three CPU cycles per iteration, not
    including the overhead the compiler needs to setup the counter
    register.

    Thus, at a CPU speed of 1 MHz, delays of up to 768 microseconds
    can be achieved.
*/
void
_delay_loop_1(uint8_t __count)
{
	__asm__ volatile (
		"1: dec %0" "\n\t"
		"brne 1b"
		: "=r" (__count)
		: "0" (__count)
	);
}

/** \ingroup util_delay_basic

    Delay loop using a 16-bit counter \c __count, so up to 65536
    iterations are possible.  (The value 65536 would have to be
    passed as 0.)  The loop executes four CPU cycles per iteration,
    not including the overhead the compiler requires to setup the
    counter register pair.

    Thus, at a CPU speed of 1 MHz, delays of up to about 262.1
    milliseconds can be achieved.
 */
void
_delay_loop_2(uint16_t __count)
{
	__asm__ volatile (
		"1: sbiw %0,1" "\n\t"
		"brne 1b"
		: "=w" (__count)
		: "0" (__count)
	);
}

#endif /* _UTIL_DELAY_BASIC_H_ */

 

Last Edited: Tue. Mar 10, 2015 - 08:20 PM

clawson wrote:

However, whether that really generates DEC and BRNE will depend on the version of the compiler, the optimization level and various other things. The reason it was written in ASM was to guarantee it would be a DEC and a BRNE, as it's known that DEC is 1 machine cycle and BRNE is 2 cycles each time it branches back and 1 cycle on the last occasion when it doesn't.

 

 

Just for fun, below is a dummy program using CodeVision to implement 10000 repetitions of the delay loop.  CodeVision's optimizer isn't as good as GCC's for this example, and it stores the decremented loop counter back in each of the three incarnations.

 

Note from the ASM that there are different sequences generated depending on whether the loop variable is held in SRAM, is a register variable in low registers, or is a register variable in high registers.  In a real app, that may change (unless explicitly managed by the programmer) as local and global variables are added.

 

#include <io.h>

register unsigned int i;
unsigned int j;

void main(void)
{  
unsigned int k;

for(i = 10000; i; i--) {}
i = 10000;
while (--i){}

for(j = 10000; j; j--) {}
j = 10000;
while (--j){}      

for(k = 10000; k; k--) {}
k = 10000;
while (--k){}      
      
}  
 
   
; 0000 000A for(i = 10000; i; i--) {}
;	k -> R16,R17
    LDI  R30,LOW(10000)
    LDI  R31,HIGH(10000)
    MOVW R4,R30
_0x4:
    MOV  R0,R4
    OR   R0,R5
    BREQ _0x5
    MOVW R30,R4
    SBIW R30,1
    MOVW R4,R30
    RJMP _0x4
_0x5:
; 0000 000B i = 10000;
    LDI  R30,LOW(10000)
    LDI  R31,HIGH(10000)
    MOVW R4,R30
; 0000 000C while (--i){}
_0x6:
    MOVW R30,R4
    SBIW R30,1
    MOVW R4,R30
    BRNE _0x6
; 0000 000D 
; 0000 000E for(j = 10000; j; j--) {}
    LDI  R30,LOW(10000)
    LDI  R31,HIGH(10000)
    STS  _j,R30
    STS  _j+1,R31
_0xA:
    LDS  R30,_j
    LDS  R31,_j+1
    SBIW R30,0
    BREQ _0xB
    LDI  R26,LOW(_j)
    LDI  R27,HIGH(_j)
    LD   R30,X+
    LD   R31,X+
    SBIW R30,1
    ST   -X,R31
    ST   -X,R30
    RJMP _0xA
_0xB:
; 0000 000F j = 10000;
    LDI  R30,LOW(10000)
    LDI  R31,HIGH(10000)
    STS  _j,R30
    STS  _j+1,R31
; 0000 0010 while (--j){}
_0xC:
    LDI  R26,LOW(_j)
    LDI  R27,HIGH(_j)
    LD   R30,X+
    LD   R31,X+
    SBIW R30,1
    ST   -X,R31
    ST   -X,R30
    SBIW R30,0
    BRNE _0xC
; 0000 0011 
; 0000 0012 for(k = 10000; k; k--) {}
    __GETWRN 16,17,10000
        macro   +LDI R16 , LOW ( 10000 )
                +LDI R17 , HIGH ( 10000 )
_0x10:
    MOV  R0,R16
    OR   R0,R17
    BREQ _0x11
    __SUBWRN 16,17,1
        macro   +SUBI R16 , LOW ( 1 )
                +SBCI R17 , HIGH ( 1 )
    RJMP _0x10
_0x11:
; 0000 0013 k = 10000;
    __GETWRN 16,17,10000
; 0000 0014 while (--k){}
_0x12:
    MOVW R30,R16
    SBIW R30,1
    MOVW R16,R30
    BRNE _0x12
; 0000 0015 
; 0000 0016 }
_0x15:
    RJMP _0x15
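To see how much the placement of the loop variable matters, the per-iteration cost of the while (--x) loops in the listing can be added up. This is a sketch for a PC only. The cycle costs are assumptions taken from the classic AVR instruction timings (MOVW 1, LDI 1, SBIW 2, LD 2, ST 2, BRNE 2 when taken), and the names are invented for this example.

```c
/* Assumed classic-AVR cycle costs per instruction. */
enum {
    CYC_MOVW = 1,
    CYC_LDI  = 1,
    CYC_SBIW = 2,
    CYC_LD   = 2,
    CYC_ST   = 2,
    CYC_BRNE_TAKEN = 2
};

/* while (--i) with i in registers: MOVW, SBIW, MOVW, BRNE. */
static unsigned while_reg_cycles(void)
{
    return CYC_MOVW + CYC_SBIW + CYC_MOVW + CYC_BRNE_TAKEN;
}

/* while (--j) with j in SRAM:
   LDI, LDI, LD, LD, SBIW, ST, ST, SBIW, BRNE. */
static unsigned while_sram_cycles(void)
{
    return 2 * CYC_LDI + 2 * CYC_LD + CYC_SBIW
         + 2 * CYC_ST + CYC_SBIW + CYC_BRNE_TAKEN;
}

/* _delay_loop_2() from delay_basic.h: SBIW, BRNE. */
static unsigned lib_loop_2_cycles(void)
{
    return CYC_SBIW + CYC_BRNE_TAKEN;
}
```

Under those assumptions the same C source costs 6 cycles per iteration in registers and 16 in SRAM, against the fixed 4 of the library's asm loop.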

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Tue. Mar 10, 2015 - 08:58 PM