Documentation of delay driver


I'm tinkering and experimenting with the SAM D21 clock system, with no particular goal other than learning and getting a feel for "the terrain".

 

I thought I'd have a look at what Start/ASF offers when it comes to delay loops, and it has a delay driver. The API looks straightforward, e.g.

void delay_us(
    const uint16_t us
)

but before using that an init function must be called. This is documented like this:

delay_init

Initialize Delay driver.

void delay_init(
    void *const hw
)

Parameters

hw

Type: void *const

The pointer to hardware instance

Returns

Type: void

As far as I can see, this is all that is said about the init function throughout the complete ASF documentation.

 

So what is the hw parameter? The type gives no hint, since it's a void *.

 

To investigate I created a Start project to which I added the delay driver. I then dug into the code, starting with delay_init()

void delay_init(void *const hw)
{
	_delay_init(hardware = hw);
}

and from there to _delay_init()

void _delay_init(void *const hw)
{
	_system_time_init(hw);
}

and then to _system_time_init()

void _system_time_init(void *const hw)
{
	(void)hw;  // <-- What?!
	SysTick->LOAD = (0xFFFFFF << SysTick_LOAD_RELOAD_Pos);
	SysTick->CTRL = (1 << SysTick_CTRL_ENABLE_Pos) | (CONF_SYSTICK_TICKINT << SysTick_CTRL_TICKINT_Pos)
	                | (1 << SysTick_CTRL_CLKSOURCE_Pos);
}

There are two things I don't understand:

  1. the first line in the body of this function, marked with an arrow by me. What is this? A technique so that the compiler decides the variable is used? Why?
  2. If the parameter is essentially not used at all, then why is it there?

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]


JohanEkdahl wrote:
(void)hw; // <-- What?!
Well, that's a common technique to ensure that a parameter is "used" so that your lint/splint/cppcheck/Klocwork or whatever does not moan heavily about a parameter being declared that is not accessed.

JohanEkdahl wrote:
If the parameter is essentially not used at all, then why is it there?
I'm guessing it is for consistency. Presumably APIs for UART or timers or ADC or something all take a pointer to "hw" and this is just maintaining a consistent interface even if a delay routine doesn't need to access anything in it.
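
Both points can be illustrated with a small sketch. All names here (`struct fake_uart`, `uart_like_init`, `delay_like_init`, `init_driver`) are made up for illustration and are not part of ASF. Two init functions share one signature so that generic code can call either through the same pointer type, and the one that needs no hardware instance casts its parameter to void so that unused-parameter warnings stay quiet:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hardware block, for illustration only (not an ASF type). */
struct fake_uart {
    int enabled;
};

/* Common init signature shared by every driver in this sketch. */
typedef void (*init_fn)(void *const hw);

/* This driver really uses its hardware instance. */
void uart_like_init(void *const hw)
{
    assert(hw != NULL);
    ((struct fake_uart *)hw)->enabled = 1;
}

int delay_like_ready = 0;

/* This driver needs no instance; the cast marks hw as deliberately unused. */
void delay_like_init(void *const hw)
{
    (void)hw;
    delay_like_ready = 1;
}

/* Generic code can call either driver through the same pointer type. */
void init_driver(init_fn init, void *hw)
{
    init(hw);
}
```

Calling `init_driver(delay_like_init, NULL)` works just like `init_driver(uart_like_init, &some_uart)`, which is the kind of uniformity the hw parameter would buy.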


Want to see even more fun with the ASF4 delay functions?

 

First, delay_us() and delay_ms() ultimately call down to _delay_cycles() after translating the passed ms/us to processor cycles based on a compile-time processor speed.  This works as expected if you call them after setting up the clocks.  But if you call them in your startup code before main(), or more exactly before atmel_start_init(), the calculations are wrong, as the processor boots by default on the internal RC oscillator, often much slower than after you set up your system clocks.  OK, so don't use the library functions before initializing the library; seems reasonable.
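
To put rough numbers on that, here is a small sketch. The two frequencies and both helper names are assumptions chosen for illustration, and this is not the ASF conversion code itself:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: the frequency the code was built for
 * and the frequency the core runs at before the clocks are set up. */
#define CONFIGURED_HZ 48000000u
#define BOOT_HZ        1000000u

/* Cycles a delay routine would burn for `us` microseconds, computed from
 * the frequency it believes the core is running at. */
uint32_t us_to_cycles(uint32_t us, uint32_t hz)
{
    return (uint32_t)(((uint64_t)us * hz) / 1000000u);
}

/* Time those cycles really take when the core runs at `actual_hz`. */
uint32_t cycles_to_us(uint32_t cycles, uint32_t actual_hz)
{
    assert(actual_hz != 0);
    return (uint32_t)(((uint64_t)cycles * 1000000u) / actual_hz);
}
```

With these assumed numbers a request for 100 us becomes 4800 cycles, and 4800 cycles on a 1 MHz core take 4800 us, so the delay comes out 48 times too long.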

 

The bigger problem is with the compiler/linker and the _delay_cycles() routine.  If the function is linked to the wrong address it exposes a nasty timing bug.  The _delay_cycles() is a basic spin loop delay.  You are guaranteed the minimum time before returning plus overhead of interrupts that occur during the call.  Normal right?  But if the compiler happens to place the spin loop across the wrong address boundaries, the actual delay could be 2X or even 44X  longer than expected!!!  Yes that's 44X not 4X.  The fix is to force the function to be aligned to 8-byte boundaries.

 

Here's some test code for a Cortex-M7:

 

//////////////////////////////////////////////////////////////////////////
// SRD 2017-11-06 Found and fixed a nasty delay timing bug.  Be sure the _delay_cycles() function below is aligned to
// 8-byte boundaries otherwise there is a chance the linker can place the function in such a way to have the loop cross
// an address boundary that causes a 2X or 44X actual delay.  Checked both with the debugger cycle counter and with an
// oscilloscope.
//
// The _delay_cycles() function below is called from delay_ms() and delay_us() passing a calculated number of cycles
// based on the processor clock speed.  The calculated value guarantees the actual delay is at least the expected amount.
// For example, a call to delay_ms(10) calls down to _delay_cycles() with a value to ensure a 10ms delay by decrementing a
// counter in a tight loop.  Each iteration is expected to take two cycles so the calculation for the number of iterations
// takes this into account.  There is minor overhead entering and exiting the functions and calculating the loop counter.
// When interrupts are enabled the loop overhead increases too as expected.
//
// The bug is related to the Cortex-M7 processor pipeline and address boundaries.  Normally the loop takes two cycles per
// iteration.  But if the loop branch crosses an eight byte address alignment the delay is longer than two cycles.
// When crossing 0xXXXXXX06-08, 0xXXXXXX0E-10, or 0xXXXXXX16-18 boundaries the number of cycles per iteration doubles.
// Worse is if the loop crosses a 0xXXXXXX1E-20 boundary as the number of cycles per iteration is 44 times more (yes forty-four).
// This means the actual delay for a call to delay_ms(100) could be 100ms plus overhead, 200ms plus overhead, or 4400ms plus
// overhead all depending upon where the linker places the function in memory.  This failure pattern repeats for every 32-byte
// address boundary.  So the 0xXXXXX2E-30, 0xXXXXX4E-50 boundaries and so on fail at 44X as well.
//
// To reproduce these results, add NOP instructions in front of the loop pushing the loop across the bad boundaries.
// Check the number of cycles per iteration with the debugger cycle counter or by blipping a GPIO pin and using an
// oscilloscope to measure timings.
//////////////////////////////////////////////////////////////////////////
void _delay_cycles(void *const hw, uint32_t cycles) __attribute__ (( aligned(8) ));
//////////////////////////////////////////////////////////////////////////

void _delay_cycles(void *const hw, uint32_t cycles)
{
#ifndef _UNIT_TEST_
	(void)hw;
	(void)cycles;
#if defined __GNUC__
// uncomment these two NOP's to reproduce the timing error - forces the loop to the bad address alignment
//     __asm("nop");
//     __asm("nop");

// uncomment one to three of these sets of four NOP's as well to switch between the 2X and 44X timing error
//      __asm("nop");
//      __asm("nop");
//      __asm("nop");
//      __asm("nop");

//      __asm("nop");
//      __asm("nop");
//      __asm("nop");
//      __asm("nop");

//      __asm("nop");
//      __asm("nop");
//      __asm("nop");
//      __asm("nop");

	__asm("__delay:\n"
	      "subs r1, r1, #1\n"   // be sure these two lines do not cross an eight-byte address alignment otherwise
	      "bhi __delay\n");     // the number of actual cycles is 2X or 44X expected on a Cortex-M7 processor
#elif defined __CC_ARM
	__asm("__delay:\n"
	      "subs cycles, cycles, #1\n"
	      "bhi __delay\n");
#elif defined __ICCARM__
	__asm("__delay:\n"
	      "subs r1, r1, #1\n"
	      "bhi __delay\n");
#endif
#endif
}
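
For anyone who wants to check where a function actually ended up, a rough sketch follows. `spin` is a hypothetical stand-in for the delay loop and `is_aligned` is a made-up helper; the attribute syntax is the GCC/Clang one used above. Note that aligning the function start only pins down the loop's position as long as the code in front of the loop does not change size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a delay loop, forced to an 8-byte boundary. */
__attribute__((aligned(8))) void spin(uint32_t n)
{
    while (n--) {
        __asm__ volatile("" ::: "memory"); /* keeps the loop from being removed */
    }
}

/* Returns nonzero if addr is a multiple of `boundary` (a power of two).
 * On Thumb targets bit 0 of a function pointer is the Thumb flag, so it
 * is masked off before the check. */
int is_aligned(uintptr_t addr, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return ((addr & ~(uintptr_t)1) & (boundary - 1)) == 0;
}
```

Something like `is_aligned((uintptr_t)&spin, 8)` can then be checked in the debugger or in a startup self-test; the linker map file gives the same information without running any code.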

 

 


Hi ScottMN!

 

Interesting, or should I type "interesting" (with quotes).

 

Yes, having just looked into the inner workings of the delay routines I've understood that they won't work until system initialization is done.

 

Your description of how they work, and the bug you are talking about, make me think you're on an older version of ASF/Start. Or this basic functionality is coded differently for a Cortex-M7 SAM (that you use) and a SAM D21 (that I currently play with). The version I have been looking into (generated earlier today by Start) actually isn't a spin-loop using NOPs, but relies on the SysTick timer:

 

Basically, it

1) Translates the desired delay time to a number of cycles

2) Loads the SysTick timer with that number of cycles

3) Spins in a tight loop (no NOPs) until SysTick has counted down the loaded number of ticks. For long(ish) delays SysTick is loaded several times with portions of the total number of ticks, so it's actually a nested loop, followed by a final un-nested loop to consume the remainder of the ticks.
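
The chunking in step 3 can be tried on a PC with a sketch like the one below. It is not the Start-generated code, just one way of expressing the same idea: `wait_ticks` is a hypothetical stand-in for "load the timer and spin until it expires" and only records what would have been waited.

```c
#include <assert.h>
#include <stdint.h>

#define SYSTICK_MAX 0xFFFFFFu  /* SysTick is a 24-bit down-counter */

/* Bookkeeping for the simulation: total ticks waited and number of loads. */
uint64_t waited_ticks = 0;
uint32_t reload_count = 0;

/* Hypothetical stand-in for one timer load plus the spin on its flag. */
void wait_ticks(uint32_t ticks)
{
    assert(ticks <= SYSTICK_MAX);
    waited_ticks += ticks;
    reload_count++;
}

/* Wait for `cycles` ticks using a counter that holds at most 24 bits. */
void delay_cycles_chunked(uint32_t cycles)
{
    while (cycles > SYSTICK_MAX) {   /* full-length chunks */
        wait_ticks(SYSTICK_MAX);
        cycles -= SYSTICK_MAX;
    }
    if (cycles > 0) {                /* remainder */
        wait_ticks(cycles);
    }
}
```

A request for 0x2000005 cycles, for example, turns into two full 24-bit loads followed by a final load of 7.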

 

So it seems Atmochip has fixed the bug you've discovered.

 

Below is the code for the current low-level spin-loop function (as generated by Start, for a SAM D21J18, at the time of writing):

 

/**
 * \brief Delay loop to delay n number of cycles
 */
void _delay_cycles(void *const hw, uint32_t cycles)
{
	(void)hw;


	uint8_t  n   = cycles >> 24;
	uint32_t buf = cycles;

	while (n--) {
		SysTick->LOAD = 0xFFFFFF;
		SysTick->VAL  = 0xFFFFFF;
		while (!(SysTick->CTRL & SysTick_CTRL_COUNTFLAG_Msk))
			;
		buf -= 0xFFFFFF;
	}

	SysTick->LOAD = buf;
	SysTick->VAL  = buf;
	while (!(SysTick->CTRL & SysTick_CTRL_COUNTFLAG_Msk))
		;
}

 


 

My desire for a spin-loop delay was for experimenting with "the clock system" while trying to learn and understand it. For that an actual spin-loop-using-NOPs is to be preferred. I want something that actually reflects the execution of instructions.

 

Basically, I want to tinker with all aspects of the clock system using a spin-loop delay in combination with blinking an LED to confirm that the changes have the effects that I anticipate.

 

Thus, I have no high requirements on accuracy. As long as I can say "Yeah, that's about 1 Hz blinks - not 8 Hz blinks" that's good enough. Inspired by a post by Alex Taradov at the EEVblog forums I ended up with the code below, which possibly needs some minor adjustments. My tests suggest that the error is lower than 1% (I measured about 0.3% error over 10 minutes):

 

/* delay_ms_spinloop
 * A simple, non-exact, spin-loop delay to be used when delays are wanted
 * without engaging a timer of any sort. By experiments, error has been
 * estimated to be 0.3%.
 *
 * For this to work a symbol F_CPU_HZ needs to be #define'd.
 *
 * Parameters
 *  int milliseconds  The number of milliseconds to delay.
 *
 * Internals: For a delay of one millisecond the needed number of spinloops
 * is 201 per MHz of CPU frequency. The expression in the #define of
 * SPINLOOPS_PER_MS divides F_CPU_HZ by 10 to keep the multiplication by
 * 201 from overflowing. Consequently, the divisor for getting the final
 * SPINLOOPS_PER_MS for Hz is 100000 rather than 1000000.
 */

#ifndef F_CPU_HZ
#warning F_CPU_HZ not defined (needed by delay_ms_spinloop), will default to 1000000 Hz
#define F_CPU_HZ 1000000ul
#endif

#define SPINLOOPS_PER_MS ((201ul*(F_CPU_HZ/10))/100000ul)

void delay_ms_spinloop(int milliseconds)
{
  while (milliseconds--) {
    for (int i = 0; i < SPINLOOPS_PER_MS; i++) {
      asm("nop");
    }
  }
}
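
The arithmetic behind SPINLOOPS_PER_MS can be checked on a PC with the sketch below. The helper names are made up, and it assumes that unsigned long is 32 bits wide on the target, which is what the usual ARM toolchains give:

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if 201 * x does not fit in 32 bits. */
int product_overflows_32(uint32_t x)
{
    return (uint64_t)201u * x > UINT32_MAX;
}

/* Same formula as SPINLOOPS_PER_MS, with the frequency as a parameter. */
uint32_t spinloops_per_ms(uint32_t f_cpu_hz)
{
    uint32_t scaled = f_cpu_hz / 10u;       /* the divide-by-10 step */
    assert(!product_overflows_32(scaled));  /* the product must fit in 32 bits */
    return (201u * scaled) / 100000u;
}
```

For 8 MHz this gives (201 * 800000) / 100000 = 1608 loops per millisecond. Without the division by 10 the product 201 * F_CPU_HZ would still fit at 8 MHz (1608000000), but at 48 MHz it would be 9648000000, which no longer fits in 32 bits.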

The value 201 was arrived at by repeated experimentation. Main code:

 

// Expected/assumed CPU frequency
#define F_CPU_HZ 8000000ul

 .
 .
 .

// Keep a count of the number of blinks (i.e. seconds), so that
// when the program is halted this count can be compared with an
// external/separate stopwatch:
volatile int i = 0;

// Using Alex Taradov's macros for digital I/O ease of use
HAL_GPIO_PIN(LED,      B, 30)

int main(void)
{
  SystemInit();

  // Switch to 8MHz clock (disable prescaler)
  SYSCTRL->OSC8M.bit.PRESC = 0;

  HAL_GPIO_LED_out();
  HAL_GPIO_LED_clr();
    
  while (1)
  {
    delay_ms_spinloop(25);
    HAL_GPIO_LED_set();
    delay_ms_spinloop(975);
    HAL_GPIO_LED_clr();
    i++;
  }   
}

 

As a sidebar comment I found it disappointing that while debugging the SAM D20 on-chip there is no cycle counter or stopwatch available in Studio. (The grass is always greener on the other side: For AVR(8)s cycle counter and stopwatch are available - but there you get no call stack, which is available for the SAM D's.)


"the actual delay could be 2X or even 44X longer than expected" - maybe that is because you are running it from flash instead of RAM? The SAMD version is defined in ASF3 in common2/services/delay/sam0/cycle_counter.c. It is defined with a couple of following options:
OPTIMIZE_HIGH
RAMFUNC

which translate to the following in the GCC world, I believe:
__attribute__((optimize("s")))
__attribute__((section(".ramfunc")))


Edit: Actually I'm not sure what OPTIMIZE_HIGH translates to. But I think the most important thing is that it is placed into RAM.

Last Edited: Fri. Dec 15, 2017 - 12:30 AM

Above I wrote

My desire for a spin-loop delay was for experimenting with "the clock system" while trying to learn and understand it. For that an actual spin-loop-using-NOPs is to be preferred. I want something that actually reflects the execution of instructions.

This was based on faulty assumptions. Yeah, I know, but there's just so much to digest re the SAM Ds so some assumptions must be made while under way. I'll try to rectify:

 

I have now dug into the SysTick counter and since it is ARM generic rather than SAM specific it is documented by ARM. More or less the only mention in the SAM data sheet is that it is implemented (it is optional by ARM).

 

In the ARM document ARMv6-M Architecture Reference Manual, section B3.3.1 SysTick operation, it says

The timer is clocked by a reference clock. Whether the reference clock is the processor clock or an external
clock source is  IMPLEMENTATION DEFINED . If an implementation uses an external clock, it must document
the relationship between the processor clock and the external reference.

I cannot find any mention in the D21 data sheet of how its SysTick is clocked. I could assume that it is clocked by the CPU clock. Or I could do a test, since (again from the ARM document mentioned above) in section B3.3.3 SysTick Control and Status Register, SYST_CSR:

 

Bits TYPE Name       Function
[2]  RW   CLKSOURCE  Indicates the SysTick clock source:
                     0  SysTick uses the optional external reference clock.
                     1  SysTick uses the processor clock.
                     If no external clock is provided, this bit reads as one and ignores writes

So, writing a 0 to this bit in the SYST_CSR register and then reading it should give the answer. If the readout yields a 1 then there is no external clock. If it yields a zero then there is an external clock provision  (and the question then becomes what it is and how it is configured...)
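
The test described above could be sketched roughly as follows. The function name `systick_has_external_ref` is made up for illustration. The register is passed in as a pointer so the logic can be tried on a PC; on the target one would presumably pass `&SysTick->CTRL` from the CMSIS core header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSR_CLKSOURCE (1u << 2)  /* SYST_CSR bit 2, per the table above */

/* Try to select the external reference: clear CLKSOURCE, read it back,
 * then write the original register value back.
 * Returns 1 if the bit stayed 0 (an external reference exists), else 0.
 * Note that reading the real register also clears its COUNTFLAG bit. */
int systick_has_external_ref(volatile uint32_t *csr)
{
    assert(csr != NULL);
    uint32_t saved = *csr;
    *csr = saved & ~CSR_CLKSOURCE;
    int has_ref = ((*csr & CSR_CLKSOURCE) == 0);
    *csr = saved;
    return has_ref;
}
```

On a PC a plain variable accepts the write, so the function reports an external reference. Only the real register can show the "reads as one and ignores writes" behaviour.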

 

Does this seem to be a reasonable test to you all?

 


 

Anyway... Using the SysTick seems to be a "valid" way to count CPU clock ticks. I'll do some experiments...

 

Since my goal is to have a reasonable understanding of the complete clock system etc I'll have to dig into the SysTick eventually anyway..


SysTick is a part of the ARM core. SysTick is described (briefly):

http://infocenter.arm.com/help/i...

http://infocenter.arm.com/help/i...

 

The "SysTick Control and Status Register" controls the clock source:

http://infocenter.arm.com/help/i...

(for some reason, this keeps picking up the wrong document page...Oh well)

 

I have not explored how to utilize the "external reference clock"...

Although checking the "SysTick Calibration Value Register" should/would let you know if it is available
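
A rough sketch of that check, using the SYST_CALIB layout from the ARM architecture manual (NOREF in bit 31, SKEW in bit 30, TENMS in bits 23:0). The helper names are made up, and on the target the value would presumably be read from `SysTick->CALIB` in the CMSIS core header:

```c
#include <assert.h>
#include <stdint.h>

#define CALIB_NOREF (1u << 31)   /* 1 = no external reference clock */
#define CALIB_SKEW  (1u << 30)   /* 1 = the 10 ms value is inexact */
#define CALIB_TENMS 0x00FFFFFFu  /* reload value for 10 ms, 0 = not known */

/* Nonzero if the calibration register says a reference clock is implemented. */
int calib_has_external_ref(uint32_t calib)
{
    return (calib & CALIB_NOREF) == 0;
}

/* Reload value for a 10 ms period; only meaningful when the field is nonzero. */
uint32_t calib_tenms(uint32_t calib)
{
    uint32_t tenms = calib & CALIB_TENMS;
    assert(tenms != 0); /* zero means the calibration value is not known */
    return tenms;
}
```

Unlike the write-and-read-back test on CLKSOURCE, this only reads a register, so it does not disturb a SysTick that is already running.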

 

 

Edit: wrong document references...

Edit2: For some reason, the only reference that seems to go to the correct location is the SysTick description.

 

David (aka frog_jr)

Last Edited: Fri. Dec 15, 2017 - 02:52 PM

Yup, START has been updated.  I just created a dummy sample project with only the delay driver.  It now hijacks SysTick as you describe instead of the spin loop code it generated when my project was created.

 

Edit:  I take that back.  A dummy project WITHOUT the delay driver creates the following code in <project>\hpl\core\hpl_core_m7_base.c, notice the lack of forced alignment to 8-byte boundaries.  (START website, new project, SAME70Q20, no modules, <view code>)

/**
 * \brief Delay loop to delay n number of cycles
 *
 * \note In theory, a single loop runs take 2 cycles or more. But we find it
 * really only needs 1 cycle through debugging.
 *
 */
void _delay_cycles(void *const hw, uint32_t cycles)
{
#ifndef _UNIT_TEST_
	(void)hw;
	(void)cycles;
#if defined __GNUC__
	__asm("__delay:\n"
	      "subs r1, r1, #1\n"
	      "bhi __delay\n");
#elif defined __CC_ARM
	__asm("__delay:\n"
	      "subs cycles, cycles, #1\n"
	      "bhi __delay\n");
#elif defined __ICCARM__
	__asm("__delay:\n"
	      "subs r1, r1, #1\n"
	      "bhi __delay\n");
#endif
#endif
}

 

But after adding the delay module to the project, the _delay_cycles() routine is missing from <project>\hpl\core\hpl_core_m7_base.c and is instead included in <project>\hpl\systick\hpl_systick_ARMv7_base.c:

/**
 * \brief Delay loop to delay n number of cycles
 */
void _delay_cycles(void *const hw, uint32_t cycles)
{
	(void)hw;
	uint8_t  n   = cycles >> 24;
	uint32_t buf = cycles;

	while (n--) {
		SysTick->LOAD = 0xFFFFFF;
		SysTick->VAL  = 0xFFFFFF;
		while (!(SysTick->CTRL & SysTick_CTRL_COUNTFLAG_Msk))
			;
		buf -= 0xFFFFFF;
	}

	SysTick->LOAD = buf;
	SysTick->VAL  = buf;
	while (!(SysTick->CTRL & SysTick_CTRL_COUNTFLAG_Msk))
		;
}

 

So yeah, the default _delay_cycles() still has the nasty alignment bug unless you include the SysTick delay driver.

Last Edited: Fri. Dec 15, 2017 - 08:20 PM