Hunt for the missing clock cycle, where?


Hi, I wrote a macro for a delay procedure as follows:

;---------------------------------------------------------
;@0=COUNT (16-bit value to count to, each count takes 8 clock cycles)
.MACRO delay_counts
     ;load count into temp2:temp
     ldi temp2, high(@0)
     ldi temp, low(@0)
     DELAY_START:     
          ;decrement the counter pair
          subi	temp, 1            ;1 cycle
          sbci temp2, 0            ;1 cycle

          ;test the counter pair          
          clr  temp3		   ;1 cycle
          cpi  temp, 0             ;1 cycle
          cpc  temp2, temp3 	   ;1 cycle
          brne DELAY_START         ;2 cycles

.ENDMACRO

Say for example one wanted a 7ms delay, one would use it as follows:

;Example of a 7 ms delay
.equ fosc	= 14745600
.equ f_wait     = (fosc*0.007/8)-4      ;7ms, delay routine takes 8 clock cycles, and 4 cycles to initialise

delay_counts f_wait     
//do stuff after the delay

If you look at the delay routine, you will see each iteration takes 7 cycles. However when I scoped my routine I have found it actually takes 8!
Where is the extra clock cycle being introduced?
I have checked the cycle counts for each instruction 3 times already.
Does it perform a NOP after the branch?

Thanks for any help from you gurus!

Just a noob in this crazy world trying to get some electrons to obey me.


I would agree. The loop looks like 7 cycles to me.

Try:

	.MACRO	__DELAY16
.set	__DELAY = (@0*K_CPU)/4000 
	ldi	temp,LOW(__DELAY)
	ldi	temp1,HIGH(__DELAY)
__DELAY_USW_LOOP:
	sbiw	temp,1			;2
	brne	__DELAY_USW_LOOP	;2
	.ENDM

This has a 4-cycle loop with a 2-cycle initialisation.
Subtract 1 cycle for the final fall-through of the BRNE.

Quite honestly, you can just calculate the loop count as (F_CPU * @0)/4000000

By the look of my macro, the Atmel assembler can overflow an int32_t. You may prefer using floating point.
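
To put numbers on the loop-count formula above, here is a small Python sketch (not part of the original post). It assumes @0 is a delay in microseconds, which is what the divisor 4000000 implies, and it borrows the 14.7456 MHz clock from the first post. The function names are made up for illustration.

```python
F_CPU = 14_745_600  # Hz, the clock frequency used in the first post

def sbiw_loop_count(delay_us, f_cpu=F_CPU):
    """Loop count for a 4-cycle sbiw/brne loop (delay assumed to be in microseconds)."""
    return (f_cpu * delay_us) // 4_000_000

def sbiw_total_cycles(count):
    """2 cycles for the two ldi, 4 per pass, minus 1 for the final brne fall-through."""
    return 2 + 4 * count - 1

def fits_int32(value):
    """True if value is representable as a signed 32-bit integer."""
    return -2**31 <= value <= 2**31 - 1

count = sbiw_loop_count(7000)               # 7 ms expressed in microseconds
intermediate = F_CPU * 7000                 # the product formed before dividing
print(count, sbiw_total_cycles(count), fits_int32(intermediate))
```

For a 7 ms delay the loop count itself fits in 16 bits, but the intermediate product F_CPU * @0 does not fit in a signed 32-bit integer, which is the overflow concern mentioned above.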

David.


No interrupts on? Not using external ram (with a wait state?)

Imagecraft compiler user


What do you see when you run/single-step it in the AVR Studio Simulator and observe the stopwatch/cycle-counter?


"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]


The code

ldi r24,10
ldi r25,0
top2:
subi r24,1
sbci r25,0
clr r26
cpi r24,0
cpc r25,r26
brne top2

takes 7 clocks per loop iteration. I simulated it on a mega16.


Ok using the stop watch verifies it is 7 cycles.

After digging around a little, I was actually looking at my old delay procedure on the scope!! The name was like one character different.

They both used the same constant (f_wait), so changing it changed the timing of both routines.

Oh my I am disappointed in myself. Sorry guys! But I have learnt about the usage of the stopwatch now, that is very handy. Thank you for your great help!

Just a noob in this crazy world trying to get some electrons to obey me.


Your code obviously contains lots of bugs:
- comment says it takes 8 clocks per iteration. Serious bug.
- for arg=0 it takes about 2+65535*7+6 clocks
- for arg=1 it takes 2+6 clocks
- for arg=2 it takes 2+7+6 clocks
...
- for arg =0xFFFF it takes 8 clocks
- for negative arguments or over 0xFFFF hell knows what happens.

Summary:
- There is no such natural number N that your code executes N*7.
- only for N=1 the code executes in N*8 clocks.

You must understand this is a macro, not a function. You have no limitations on what code is included and no code/speed penalty on execution.

I only hope you think it over before hurting yourself badly.

No RSTDISBL, no fun!


Not a single one of those is a "bug". The comment is incorrect (comments aren't bugs no matter how bad they are), but the routine takes N * 7 + 1 cycles to run. The argument of 0 is really an argument of 0x10000 (this is simply a characteristic of this type of delay loop).

Quote:
- for arg =0xFFFF it takes 8 clocks
This is absolutely incorrect. It in fact takes 0xFFFF * 7 + 1 clocks just as it is supposed to.
Quote:
- for negative arguments or over 0xFFFF hell knows what happens.
I the hell know. If a number greater than 16 bits is entered, the upper bits are dropped. And if a negative number is entered it is interpreted as positive.
Quote:
You must understand this is a macro, not a function.
What difference would that make? Certainly you have to use a macro correctly to get the expected results, but doesn't that apply just as much to functions? If you had a function that takes a uint16_t and you send in 1000000, would you not get the same type of behavior as with the macro?
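
The N * 7 + 1 figure and the wrap-around cases can be checked with a small Python model of the macro. This is a sketch, not the poster's code: the function names are invented, and it assumes the assembler simply keeps the low 16 bits of the argument (two's complement for negative values), as described above.

```python
def effective_count(arg):
    """Low 16 bits of the argument; a count of 0 behaves as 0x10000."""
    n = arg & 0xFFFF
    return n if n != 0 else 0x10000

def delay_cycles(arg):
    """Step through the macro, adding up the cycle cost of each instruction."""
    counter = arg & 0xFFFF          # what high()/low() load (assumed truncation)
    cycles = 2                      # ldi temp2 / ldi temp
    while True:
        counter = (counter - 1) & 0xFFFF
        cycles += 5                 # subi, sbci, clr, cpi, cpc: 1 cycle each
        if counter != 0:
            cycles += 2             # brne taken
        else:
            cycles += 1             # brne not taken, loop exits
            return cycles

for arg in (1, 2, 10, 0xFFFF, 0, 0x10001, -1):
    print(arg, delay_cycles(arg), 7 * effective_count(arg) + 1)
```

In this model every argument gives 7 * N + 1 cycles, where N is the 16-bit value actually loaded and 0 counts as 0x10000.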

Regards,
Steve A.

The Board helps those that help themselves.


Quote:
The comment is incorrect (comments aren't bugs no matter how bad they are)

Every code executes something. You just need to find the name for it and a proper comment :)

Quote:
Quote:
- for arg =0xFFFF it takes 8 clocks

This is absolutely incorrect. It in fact takes 0xFFFF * 7 + 1 clocks just as it is supposed to.

Sure thing, wrong line.

Quote:
What difference would that make?

Passing this value to a function requires additional rjmp to avoid executing FFFF instead of 0 loops when 0 is passed (code is shorter, but you cannot have 0 iterations). With macro you are free to include any initialization code you like as it is calculated at compile time.
Quote:
If you had a function that takes a uint16_t and you send in 1000000

You mean 0x10000. It is not about the value you pass, but the fact macros contain code which is "executed" by compiler so you can easily write the code which executes in N*7 cycles and put a valid comment.
Anyway, this code executes in N*7+1 cycles except for N=0.
Why doesn't it use sbiw?

No RSTDISBL, no fun!


Thank you Brutte for your concerns, and thank you Koshchi for jumping to my defense ^^.
You both make good points.

I did realise I made some critical mistakes in the code (as you guys pointed out, it only works for an integer number of iterations).

And Brutte if I wanted a delay of zero or a negative delay, then a delay routine is not the tool for the job!

If I need delays longer than 0xffff x Fosc, then I would make the high byte overflow into another high register. Then I would have one delay routine to rule them all!

For the reference of anyone who wants a simple delay routine, this is what I finally ended up with:

;-------------------------------------------------------------------
;@0=COUNT (16-bit value to count to, each count takes 7 clock cycles)
.MACRO mtouch_delay_counts
     ;load count into temp2:temp
     ldi temp2, high(@0)
     ldi temp, low(@0)
     MTOUCH_DELAY_START:     
          ;decrement the counter pair
          subi	temp, 1            ;1 cycle
          sbci temp2, 0            ;1 cycle

          ;test the counter pair          
          clr  temp3		   ;1 cycle
          cpi  temp, 0             ;1 cycle
          cpc  temp2, temp3 	   ;1 cycle
          brne MTOUCH_DELAY_START  ;2 cycles

.ENDMACRO

; Clock oscillator frequency, Hz 
.equ fosc = 14745600				

.equ f_wait = (fosc*0.004/7)    
;4ms, delay routine takes 7 clock cycles

mtouch_delay_counts f_wait

I think there is nothing wrong with that, it scopes fine, you gurus are free to scrutinize it if you want.
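
As a rough cross-check of the constant above, this Python sketch works out the count for 4 ms and the delay it actually produces, using the N * 7 + 1 cycle total established earlier in the thread. It assumes the assembler drops the fractional part of f_wait; the helper names are hypothetical.

```python
FOSC = 14_745_600          # Hz, from the post above
CYCLES_PER_COUNT = 7       # cycles per loop iteration

def counts_for(delay_s, fosc=FOSC):
    """Count to pass to the macro (fraction assumed to be truncated)."""
    return int(fosc * delay_s / CYCLES_PER_COUNT)

def actual_delay_s(counts, fosc=FOSC):
    """Delay produced by 'counts' iterations: 7 cycles each plus 1 cycle overhead."""
    return (CYCLES_PER_COUNT * counts + 1) / fosc

f_wait = counts_for(0.004)              # the 4 ms case from the post
longest = actual_delay_s(0x10000)       # a count of 0 behaves as 0x10000
print(f_wait, actual_delay_s(f_wait), longest)
```

With these assumptions the 4 ms request lands within a fraction of a microsecond of the target, and the longest delay a single 16-bit count can give at this clock is about 31 ms.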

Just a noob in this crazy world trying to get some electrons to obey me.


Your routine behaves identically to:

;-------------------------------------------------------------------
;@0=COUNT (16-bit value to count to, each count takes 7 clock cycles)
.MACRO mtouch_delay_counts
     ;load count into temp2:temp
     ldi temp2, high(@0)
     ldi temp, low(@0)
     MTOUCH_DELAY_START:     
          ;decrement the counter pair
          subi   temp, 1            ;1 cycle
          sbci temp2, 0            ;1 cycle

          nop                      ;1 cycle
          nop                      ;1 cycle
          nop                      ;1 cycle
          brne MTOUCH_DELAY_START  ;2 cycles

.ENDMACRO

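The zero flag is already set correctly by the subtraction: SBCI leaves Z set only if its own result is zero and Z was already set, so after SUBI/SBCI it reflects the whole 16-bit value.
So the compare after the subtraction can be omitted.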
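
The claim can be checked exhaustively with a small Python model of the Z flag (a sketch with invented function names, modelling only the Z and borrow behaviour of the four instructions involved):

```python
def z_after_subi_sbci(value):
    """Z flag and new value after 'subi lo,1' then 'sbci hi,0' on a 16-bit counter."""
    lo, hi = value & 0xFF, (value >> 8) & 0xFF
    borrow = 1 if lo < 1 else 0         # subi borrows when the low byte was 0
    lo = (lo - 1) & 0xFF
    z = (lo == 0)                       # subi: Z set if the result is zero
    hi = (hi - borrow) & 0xFF
    z = z and (hi == 0)                 # sbci: Z stays set only if the result is zero
    return z, (hi << 8) | lo

def z_after_cpi_cpc(value):
    """Z flag after 'cpi lo,0' then 'cpc hi,zero_reg' on a 16-bit counter."""
    lo, hi = value & 0xFF, (value >> 8) & 0xFF
    z = (lo == 0)                       # cpi lo,0 never borrows
    z = z and (hi == 0)                 # cpc: Z stays set only if the result is zero
    return z

# Compare the flag left by the subtraction with the flag the explicit compare produces.
mismatches = [v for v in range(0x10000)
              if z_after_subi_sbci(v)[0] != z_after_cpi_cpc(z_after_subi_sbci(v)[1])]
print(len(mismatches))
```

In this model the two flags agree for every 16-bit starting value, so brne sees the same Z with or without the clr/cpi/cpc sequence.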

Peter


I am very interested in the negative delay concept. If you get this to work, please let us all know. :D

Four legs good, two legs bad, three legs stable.


Quote:

then a delay routine is not the tool for the job!

I'll bite: For what job(s) >>is<< a delay routine the right tool?

Then expand on the answer to focus in on those job(s) where cycle-accurate delay routines will make my life more enjoyable.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


It is a perfect tool for development/debugging. You can easily generate a worst case scenario with it, in terms of processing power required. It does not influence the rest of the code (ISRs) as much as other (delaying) methods.

One of the necessary uses is with watchdog debugging. Neither the simulator nor OCD has access to the state of this nasty asynchronous beast (hidden states that are a necessity for some freaks). Your code works perfectly until the temperature drops and WDRF gets set on some parts on some occasions. You cannot speed up the watchdog clock for debugging and in many cases you cannot slow down the CPU frequency, so a simple delay before "wdr" helps. If the code works with the delay, it will not trigger WDRF without it.

No RSTDISBL, no fun!


Quote:
I am very interested in the negative delay concept.

I think this is the idea of running device backwards.
If timer count is 1000 now, then after 100 ticks delay it has 1100 and after(before) delay=-100 it has 900.

There are some simulators which can run simulation back and forth. If they can do it, then I guess jumping +1 or +100 steps is the same as -1 or -100 steps for them.

No RSTDISBL, no fun!


Quote:
Quote:
then a delay routine is not the tool for the job!
I'll bite: For what job(s) >>is<< a delay routine the right tool?
You took the quote out of context. The first part of the statement was:
Quote:
And Brutte if I wanted a delay of zero or a negative delay,
For a delay of 0, you don't need a delay routine, and negative delays are impossible. So the answer to your question is: For jobs that require a positive delay.

Regards,
Steve A.

The Board helps those that help themselves.


Quote:

You took the quote out of context.

But I'm still asking the question, as creating a cycle-accurate delay routine seems to be important to OP as well as other posters over the years. It certainly isn't a critical part of any of my apps. I just can't see the justification for the amount of time and effort people dedicate to this.

Now, apps like the cool video stuff of AtomicZombie and others indeed need cycle-counting. But that's different than this general macro discussed here. In a full app I'm likely to have interrupts enabled. Isn't that going to upset these delays? I'll ask it again:

Quote:

For what job(s) >>is<< a delay routine the right tool?

Quote:

Then expand on the answer to focus in on those job(s) where cycle-accurate delay routines will make my life more enjoyable.

From time to time I indeed want to wait some cycles/microseconds. Setting a mux and waiting for things to settle, for example. A slow transistor that takes a bit to switch. Generally I can find something else to do, like put away a received byte or fetch the next for sending. Or do the check on supply V level or other unrelated "tasks".

Brutte gave an example of using it in dev. I don't think I cut my watchdog times that close, and in any case I wouldn't see the need for cycle-accurate for the purpose outlined. The CodeVision delay_ms() and delay_us() are good enough for me. (Ironically the delay_ms() does a WDR making it unsuitable for Brutte's work. lol )

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.