AVR delay routine using a parameter for the required delay


I wrote a routine (label) that makes a precise delay of 1us:

 

.equ wus=((16000000/1000000 - 4) / 4); used crystal 1us delay

rjmp WaitUs
sleep

WaitUs:
	ldi  zh, HIGH(wus)	; used crystal 1us delay (high Byte) 1cy
	ldi  zl, Low(wus)       ; used crystal 1us delay (Low Byte)  1cy
WAITUZ:	sbiw  zl, 1		; count down, 2 Clock cycles
        brne  WAITUZ		; Not zero: 2 Clock cycles, zero: 1 Clock cycle
	ret			; back: 4 Clock cycles     

 

 The code above works perfectly for a 1us delay: simulated in AVR-Simulator, I get 1.0us.

 

    

 

 I want a delay routine that accepts the delay time as a parameter, so that I don't need a separate routine for each delay (e.g. 1us, 2us ... 100us). So I wrote this code, which does not work as expected:

 

 

.equ wus=((16000000/1000000 - 4) / 4); used crystal 1us delay

ldi r16, 0x02             ; make 2us delay
rjmp WaitUs
sleep


WaitUs:
	ldi  zh, HIGH(wus)	; used crystal 1us delay (high Byte) 1cy
	ldi  zl, Low(wus)       ; used crystal 1us delay (Low Byte)  1cy
WAITUZ:	sbiw  zl, 1		; count down, 2 Clock cycles
	brne  WAITUZ		; Not zero: 2 Clock cycles, zero: 1 Clock cycle
        dec r16                 ; count down, 1 Clock cycle
        brne WAITUS             ; Not zero: 2 Clock cycles, zero: 1 Clock cycle
        ret			; back: 4 Clock cycles

 

 But I get this delay (instead of 2us):

 Could I get any help fixing the code so that it takes the delay in us as a parameter? The .equ wus constant calculates, from the crystal frequency (16MHz in my case), the loop count needed for 1us. The routine then repeats that loop as many times as r16 (a user-defined value in a register) says, to achieve a 2us delay.

So with ldi r16, 0x02 I need a 2us delay, with ldi r16, 0x08 I need an 8us delay, and so on.

 

Thanks

 


Why don't you use a timer?

 

 


Because I am writing code for a frequency meter, and T0 and T1 will be used for counting pulses. So I need a precise delay that takes a parameter in us.


What kind of frequencies are you looking at?

How would you deal with timer overflow? (An ISR will change the time of the delay.)


A frequency counter counting from 0.45Hz to 10MHz, but that is for another thread. For now I am focusing on the precision delay routine. Is there any code showing how to get a 1us delay using a parameter? I can use push and pop on the stack to improve the code above.


Then make a delay loop that always takes 16 clk!

And make sure that you also know how long it takes from when you clear the timers to when you read them (and whether that is also a whole multiple of 16).

And then you should have a timer ISR to tell you about overflow (and perhaps lose count).

Another way could be to read the timers 10 times, once every 1/10 of the time, and in that way make the timers "bigger".


Can you give example code showing how to do that? I made code without a parameter and it took 16 cycles for 1us, which works. But when I add a parameter it doesn't work.


You simply MUST count all instruction cycles used by EVERYTHING used in creating & calling the delay.  Are you doing that?

 

If you want it extremely exact, you also have to say what you count as part of the delay?  Are you including the call & the return?  What happens when an IRQ happens during the delay?

 

Account for your fixed & repetitive timings separately when calculating.  Doubling the repetitive & ignoring the fixed does not give double time!
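
As a worked illustration of keeping the fixed and the repetitive parts apart, here is a sketch of the single-delay routine from the first post with its cycle budget written out. It assumes a 16 MHz clock, a device with a 16-bit program counter (rcall = 3 cycles, ret = 4 cycles), interrupts disabled, the stack pointer already set up, and entry with rcall rather than rjmp. The names LOOPS, WaitFixed, wz_loop and done are made up for the example, and zh/zl are expected to come from the device include file.

```asm
; Cycle budget sketch (assumptions: 16 MHz, 16-bit PC, no interrupts).
;   fixed:      rcall 3 + ldi 1 + ldi 1 + ret 4        = 9
;   repetitive: 4 cycles per count, 3 on the last one  = 4*LOOPS - 1
;   total                                              = 4*LOOPS + 8
; LOOPS = 2 gives 16 cycles (1 us); LOOPS = 4 gives 24 cycles, not 32.
.equ LOOPS = 2                  ; hypothetical name, assembly-time constant

        rcall WaitFixed         ; 3 cycles, counted as part of the delay
done:   rjmp  done              ; reached 4*LOOPS + 8 cycles after the rcall began

WaitFixed:
        ldi   zh, HIGH(LOOPS)   ; 1 cycle
        ldi   zl, LOW(LOOPS)    ; 1 cycle
wz_loop:
        sbiw  zl, 1             ; 2 cycles
        brne  wz_loop           ; 2 cycles when taken, 1 on fall-through
        ret                     ; 4 cycles
```

Doubling LOOPS from 2 to 4 only doubles the repetitive part; the 8 fixed cycles stay the same, which is why the total becomes 24 and not 32.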

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Mon. Apr 13, 2020 - 01:55 PM

What will your microsecond timer be used for? Give an example, please.

Jim

 

 

FF = PI > S.E.T

 


I am finishing a low-level driver for a 16x2 LCD. There I need a 1us delay for toggling the E pin from high to low. I know it is not time critical (the code above works), but I want to write a time-precise routine so that I will have it for future projects.

As for the timer, I will rewrite the code from this meter and use a 16x2 LCD instead of the 7-segment displays. This removes the multiplexing, which is a problem when you put the board into an analog function generator. And most important, I will learn how it works: I simulate line by line to see what happens in the AVR core.
http://danyk.cz/avr_fmetr3_en.html

Later I will change the time base from 1 sec to 0.5 sec (to get a faster display update, e.g. 2 readings per second).


It looks like you're trying to solve something that is rarely needed in embedded programming and, as you have found, is very difficult to do with any accuracy. You don't know the overhead of calling or looping the function; if interrupts are enabled, they can throw off all your calculations; and the need to adjust for different CPU clock speeds makes a universal delay almost impossible!

Most apps that need a small microsecond delay seldom need a variable delay, so it is best to use the fixed delay provided by the toolchain.

Millisecond delays are fairly easy to do, as a few microseconds either way are not critical most of the time, and they lend themselves to h/w timer use in a system tick function.

 

Jim

 

 


robydream wrote:
to 10MHz
Impossible with a 16 MHz system clock.  Fastest you can get is less than Fclk/2.  Recommended is Fclk/2.5, or 6.4 MHz with a 16 MHz system clock.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


I know. 16MHz is what I have at home right now; it will be 20MHz, and since the limit is Fclk/2, 10MHz is the maximum. As it will be used as the frequency counter in an analog signal generator, it will count frequencies from 15Hz to 165kHz, so it is overkill. But that is not the subject of this topic; the delay is more important. After the stack is initialized, the program calls LcdInit, which uses the ms and us delay routines to initialize the display. No sei or other interrupts are used at the start of the program.


robydream wrote:
LcdInit that uses ms and us delay routing to initialize display, not sei or other interrupts are used at beginning of start program...

but fixed delays will work fine here.

 

 


I know. There is no real need for exactly 1.0us rather than 1.25us; both work on the LCD. But the point is that I want to learn, and to make WaitUs code that accepts a parameter for future projects (so that I don't need to write wait1us, wait2us and so on).


robydream wrote:
20MHz will be...as it is Fclk/2 so 10MHz is maximum
As I said, Fclk/2.5 is the highest recommended for reliable operation:

Each half period of the external clock applied must be longer than one system clock cycle to ensure correct sampling. The external clock must be guaranteed to have less than half the system clock frequency (f ExtClk < f clk_I/O /2) given a 50/50% duty cycle. Since the edge detector uses sampling, the maximum frequency of an external clock it can detect is half the sampling frequency (Nyquist sampling theorem). However, due to variation of the system clock frequency and duty cycle caused by Oscillator source (crystal, resonator, and capacitors) tolerances, it is recommended that maximum frequency of an external clock source is less than f clk_I/O /2.5.


Last Edited: Mon. Apr 13, 2020 - 04:23 PM

A busy wait that is not screwed up by interrupts pretty much requires a timer.

For a timer-based busy wait:

Read timer

Calculate end time

Wait for end time.

Without compare match, that is a 4- or 6-cycle loop.
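
A minimal sketch of such a timer-based busy wait, under loud assumptions: an ATmega328P-style Timer2 that no other task uses, clocked from the CPU clock with no prescaler, so one tick is one CPU cycle. It compares elapsed ticks instead of testing for one exact end value, because a multi-cycle poll loop could step over a single timer value. The names TimerInit, TimerWait and tw_poll, and the register choices, are made up for the example.

```asm
; Sketch only. Assumptions: ATmega328P-style Timer2, free for this use,
; running at clk/1. TCCR2B and TCNT2 sit in extended I/O on that part,
; hence lds/sts. r24 = ticks to wait, r18 and r25 are scratch.
.include "m328Pdef.inc"

TimerInit:
        ldi   r18, (1<<CS20)    ; clk/1: one timer tick per CPU cycle
        sts   TCCR2B, r18
        ret

TimerWait:                      ; r24 = number of timer ticks to wait
        lds   r25, TCNT2        ; read timer (start time)
tw_poll:
        lds   r18, TCNT2        ; current time
        sub   r18, r25          ; elapsed ticks, modulo 256
        cp    r18, r24          ; reached the requested count yet?
        brlo  tw_poll           ; not yet: poll again (6 cycles per pass)
        ret
```

With an 8-bit timer the requested count has to stay well below 256, and an interrupt that holds up the polling for longer than the remaining headroom lets the counter wrap and stretches the wait. The result is only accurate to within one pass of the poll loop.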

Moderation in all things. -- ancient proverb


skeeve wrote:

A busy wait that is not screwed up by interrupts pretty much requires a timer.

For a timer-based busy wait:

Read timer

Calculate end time

Wait for end time.

Without compare match, that is a 4- or 6-cycle loop.

But then what happens if an interrupt fires just before the end time?

(I'm assuming here you are polling timer to find out when the end time is reached).

 


MrKendo wrote:
But then what happens if an interrupt fires just before the end time ?

The effect is the same as with a software timer, although with a h/w timer you will know how far off you are (by reading the timer value, something a soft timer will never do).

 

Jim

 

 


robydream wrote:

 

.equ wus=((16000000/1000000 - 4) / 4); used crystal 1us delay

ldi r16, 0x02             ; make 2us delay
rjmp WaitUs
sleep


WaitUs:
	ldi  zh, HIGH(wus)	; used crystal 1us delay (high Byte) 1cy
	ldi  zl, Low(wus)       ; used crystal 1us delay (Low Byte)  1cy
WAITUZ:	sbiw  zl, 1		; count down, 2 Clock cycles
	brne  WAITUZ		; Not zero: 2 Clock cycles, zero: 1 Clock cycle
        dec r16                 ; count down, 1 Clock cycle
        brne WAITUS             ; Not zero: 2 Clock cycles, zero: 1 Clock cycle
        ret			; back: 4 Clock cycles

 

 But I get this delay (instead of 2us):

 

 

 

 

The cause of your problem is simple.  dec r16 and brne WAITUS both take time to execute.  They add 3 additional cycles of overhead each time through the outer loop, so you need to include those 3 cycles in each loop delay cycle count.  But it's even more involved than this, because there are other instructions that only execute once (ret and the fall-thru of brne WAITUS) so you need to account for those cycle times only once, not each time through the outer loop.  Since the fall-thru is actually 1 cycle faster than taking the branch, you have a total of 3 additional clock cycles that you need to account for one time.
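
One way to apply that accounting to the quoted routine, as a sketch rather than a definitive fix: keep every full outer pass at 16 cycles, and absorb the one-time overhead (call plus ret, minus the cheaper fall-through) by making the first pass shorter. It assumes a 16 MHz clock, a device with a 16-bit program counter (rcall = 3 cycles, ret = 4 cycles), interrupts disabled, the routine entered with rcall, and the delay in us (1..255) in r16. The name wus_first and the wu_ labels are made up for the example.

```asm
; Sketch only. Assumptions: 16 MHz, 16-bit PC, no interrupts,
; r16 = delay in us (1..255). Z and r16 are clobbered.
.equ wus       = (16000000/1000000 - 4) / 4  ; = 3: count for a full 16-cycle pass
.equ wus_first = wus - 2                     ; = 1: shortened first pass (hypothetical)

WaitUs:                             ; caller's rcall           3
        ldi   zh, HIGH(wus_first)   ;                          1
        ldi   zl, LOW(wus_first)    ;                          1
        rjmp  wu_inner              ; skip the reload          2
wu_outer:
        ldi   zh, HIGH(wus)         ;                          1
        ldi   zl, LOW(wus)          ;                          1
wu_inner:
        sbiw  zl, 1                 ;                          2
        brne  wu_inner              ; taken 2, fall-through 1
        dec   r16                   ;                          1
        brne  wu_outer              ; taken 2, fall-through 1
        ret                         ;                          4
```

Counting it through: for r16 = 1 the path is 3 + 2 + 2 + 3 + 1 + 1 + 4 = 16 cycles. For r16 = 2 the first pass takes 13 cycles, the second pass 15, and ret 4, which gives 32. Every further microsecond adds one full 16-cycle pass. The ldi that loads r16 in the caller is one more cycle and is not included.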


you have a total of 3 additional clock cycles that you need to account for one time.

Zactly...see my comment in #8 


I tried, but I am getting a different timing every time. Could someone take a look at the code above and rewrite it to make, for example, 1us, 5us and 12us delays?


I try but i im getting every time different timing

What do you mean by that? Isn't different timings what you want?  Go through your code & calculate exactly how many cycles each takes...it is that simple.

 

If you need 537 cycles...make sure your code adds up to exactly 537 cycles


sparrow2 wrote:
Why don't you use a timer ?
robydream wrote:
Because i im writing code for frequency meter and t0 and t1 will be used for counting pulses...so i need precision delay using parameter in us.

Ever think that you should be using a different chip such as a Cortex M0+ for this project instead?

 

More timers, faster clock, etc... means no having to fudge with a software delay or the Fclk/2.5 limitation at all!

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius

Last Edited: Mon. Apr 13, 2020 - 10:50 PM

Here is the code with which I try to achieve a 40us delay:

 



.equ wus=((16000000/1000000 - 15+1) / 1); used crystal 1us delay

ldi  r16, 40			; 1 cycle
rjmp WAITUS			; 2 cycles

WAITUS:
        push zh
	push zl			; 2cycle
        ldi zh, HIGH(wus)	; 1cycle
	ldi zl, LOW(wus)	; 1cycle
WAITUZ:
	sbiw zl, 1		; 2cycle
	brne WAITUZ		; non zero: 2cycle, zero: 1cycle
        pop zl
        pop zh			; 2cycle
	dec r16			; 1 cycle
	brne WAITUS		; non zero: 2cycle, zero: 1cycle
	ret			; back 4cycle   

 

 When I run the simulation I get a 40.3125us delay (instead of the desired 40.000us). When I try for a 10us delay I get 10.3125us, and when I try for a 1us delay I get 1.3125us.

At a 16MHz clock I get 5 clocks more than I need (that is the extra 0.3125us of delay). Can you look at the code and point out what I am doing wrong?

Last Edited: Mon. Apr 13, 2020 - 09:53 PM

can you see the code and make correction where i im doing wrong?

No, did you bother to count up how many cycles the code will take?  

Request 1 integer less than you do now (if requesting 157, request 156 instead)...add in enough nops to fill in the fractional remainder to give you the exact time.  Then  all of your requests will be correct. 

 

.equ wus=((16000000/1000000 - 15+1) / 1); used crystal 1us delay 

ldi  r16, 40			; 1 cycle
rjmp WAITUS			; 2 cycles

WAITUS:
        push zh                 ; 2cycles
	push zl			; 2cycles
        ldi zh, HIGH(wus)	; 1cycle
	ldi zl, LOW(wus)	; 1cycle
WAITUZ:
	sbiw zl, 1		; 2cycle
	brne WAITUZ		; non zero: 2cycle, zero: 1cycle
        pop zl                  ; 2cycle
        pop zh			; 2cycle
	dec r16			; 1 cycle
	brne WAITUS		; non zero: 2cycle, zero: 1cycle
	ret			; back 4cycle

 

 Cycle calculation:

 

    8: rjmp, push, push, ldi, ldi

    1: 1 clock cycle per count - 1

    9: 3 clock cycles for last count

    4: ret

    = 1 * Z + 8 - 1 + 9 + 4

    = 1 * Z + 20

    Z = (clock cycles - 20) / 1

 

  So I will define at the beginning of the program:

.equ wus=((16000000/1000000 - 20) / 1); used crystal 1us delay

  And I get an endless loop, because the result goes negative when I subtract more than 15 (remember, 16 cycles at 16MHz is 1us). When I put -15 I get a 1.3125us delay, so I need 5 cycles fewer to get to 1.00us, but this is not possible. Can you show me your code? I tried to count the cycles above, and the count is correct, but Z cannot be below 0 (a negative number causes the endless loop). So I don't know where to start digging to fix this problem.


Why in the world don't you simulate this? (In Studio 7 it would take less than 10 min to set up and run.)

 

 


Thanks for replying, but I am learning AVR asm delay functions, so I did this from the beginning:

 

.equ wus=((4*16000000/1000000 - 4)/4); used crystal 1us delay


rjmp WaitUs			; 2 cycles
sleep				; Stop here To see how many cycles we have

WaitUs:
	ldi  zh, HIGH(wus)	; used crystal 1us delay (upper nibble) - 1 cycle
	ldi  zl, Low(wus)       ; used crystal 1us delay (lower nibble) - 1 cycle
WAITUZ:	sbiw  zl, 1		; count down - 2 Clock cycles
	brne  WAITUZ		; Non zero: 2 Clock cycles, zero: 1 Clock cycle
	ret			; back: 4 Clock cycles

 

 This code works very well, without error. I tried 40us, 4us and 1us and got the right delay every time. I started again from the beginning and arrived at this code, and it works very nicely. The reason it takes me so long to set up in AVR Studio is that I have not set up the stack pointer correctly (push, pop); that is the next part of asm I need to study. Then I need to add the parameter. For now the parameter is in the formula above: as you can see, 4* means a 4us delay. Try it, but add stack pointer initialization at the beginning of the code and choose the right crystal (16MHz). It works, but when it returns through ret the simulator reports that the stack pointer is not initialized. So I am studying where I need to put that, and how to change the formula above to be compatible with push and pop.


rjmp then ret: that's not right, is it? Shouldn't it be call then ret?


Yes, you are right. I am reading the Atmel AVR instruction set PDF and see that it must be rcall, which takes 1 cycle more than rjmp (3 cycles vs 2 cycles), so the code above, called with rcall, must use:

.equ wus=((1*16000000/1000000 - 11+7)/4); used crystal 1us delay 

 I will keep you posted once I have the stack pointer set up.


robydream wrote:
the problem (sic) you have to have very long to setup un AVR IDE Studio is this way that i did not correctly put stack pointer (push, pop)

The whole point of programming in assembler is precisely that you are entirely responsible for everything!

 

If you consider it a "problem" that you have to spend your time setting up the stack pointer, managing pushes & pops, etc, etc - then that's exactly why we have languages like 'C' and C++

 


 

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

MrKendo wrote:
But then what happens if an interrupt fires just before the end time ?

(I'm assuming here you are polling timer to find out when the end time is reached).

Emphasis added.

In that case, the busy wait will end a little late.

Timers keep running during interrupt-handling:

Interrupts have to be strategically timed to mess up a timer-based busy wait.

Usually a timer-based busy wait need not interfere with other timer-using tasks.

All you need is a timer that ticks at CPU speed and to which no task assigns.

 


Make sure you set SPH/SPL so you can do rcall (or possibly call) & ret, they use the stack. This should be the first lines of code for any project!

 

Your delay routine does NOT need to use push and pop, unless you want to do so. Using push/pop would allow the delay to be called from anywhere at any time, without worrying about registers (variables) being affected.


OK, I finally have code that works using the stack pointer. It does not use push and pop, but that is fine; as you say, I don't need them in this application:

 

.equ wus=((1*16000000/1000000 - 8)/4); used crystal 1us delay

.def tmp1=r16					; temporary register

	.CSEG
.ORG	0x0000	

Reset:
	ldi		tmp1, Low(RAMEND)		; initialize stack pointer
	out		SPL, tmp1
	ldi		tmp1, High(RAMEND)
	out		SPH, tmp1

	rcall WaitUs					; 3 cycles

MAIN:
	sleep						; Stop here To see how many cycles we have
	rjmp MAIN

WaitUs:
	ldi  zh, HIGH(wus)		; used crystal 1us delay (upper nibble) - 1 cycle
	ldi  zl, Low(wus)       ; used crystal 1us delay (lower nibble) - 1 cycle
WAITUZ:	sbiw  zl, 1				; count down - 2 Clock cycles
	brne  WAITUZ			; Non zero: 2 Clock cycles, zero: 1 Clock cycle
	ret						; back: 4 Clock cycles

    Now the interesting thing I need to work out is how to make the wus value above accept an argument. Above, 1* means a 1us delay; for example, I would like to set it from asm code like this:

 

   

ldi tmp1, 40    ; set variable 40us delay
rcall WaitUs    ; call WaitUs routine

    So the wus value above would go from 1* to 40* and everything would be automated. Can I use an .equ variable to pass in the parameter? If yes, can I see some example code?

 

   Thanks


BTW a group of eight bits that a processor handles as a unit is generally called a byte, not a nibble.

Half such a byte is often called a nybble.

Though it's rare, a byte need not be eight bits.

That is why some standards refer to octets.


skeeve wrote:
Half such a byte is often called a nybble.

also often spelled "nibble"

 

https://en.wikipedia.org/wiki/Nibble

 

 

ldi tmp1, 40    ; set variable 40us delay

NO, you need to load ZH & ZL with your desired amount, then call your delay.

 

sbiw ZH:ZL   is preferred over sbiw ZL (in fact I'm not 100% sure your notation is proper, though it prob is....that is not the way the command is described by Atmel).  

 

from command help file:

Example:

sbiw r25:r24,1 ; Subtract 1 from r25:r24

sbiw YH:YL,63  ; Subtract 63 from the Y pointer(r29:r28)
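
A sketch of loading ZH:ZL with the desired amount before the call, for delays known at assembly time. An .equ is evaluated by the assembler, so it cannot read a register at run time; a macro argument can do the same arithmetic per call site instead. Assumptions: avrasm2 macro syntax (@0 is the first argument), a 16 MHz clock, a device with a 16-bit program counter (rcall = 3 cycles, ret = 4 cycles), interrupts disabled, and ZH/ZL defined by the device include file. DELAY_US, WaitZ and CYCLES_PER_US are made-up names.

```asm
; Sketch only. Assumptions: avrasm2 macros, 16 MHz, 16-bit PC, no interrupts.
; Budget: ldi 1 + ldi 1 + rcall 3 + loop (4*Z - 1) + ret 4 = 4*Z + 8 cycles.
.equ CYCLES_PER_US = 16000000 / 1000000

.macro DELAY_US                 ; @0 = delay in us, assembly-time constant
        ldi   ZH, HIGH((@0 * CYCLES_PER_US - 8) / 4)   ; 1 cycle
        ldi   ZL, LOW((@0 * CYCLES_PER_US - 8) / 4)    ; 1 cycle
        rcall WaitZ                                    ; 3 cycles
.endmacro

WaitZ:                          ; keep this with the other subroutines
        sbiw  ZH:ZL, 1          ; 2 cycles
        brne  WaitZ             ; taken 2, fall-through 1
        ret                     ; 4 cycles
```

For example, `DELAY_US 40` loads Z with (640 - 8) / 4 = 158, and 4 * 158 + 8 = 640 cycles, which is 40 us at 16 MHz. `DELAY_US 1` loads 2 and takes 16 cycles. The count has to fit in 16 bits and must not come out as zero.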

 


???

Studio 7 has the AVR assembler built in. Run the code directly from there, so you don't need all these questions about why the clk cycles don't match!

    

Last Edited: Tue. Apr 14, 2020 - 09:23 PM

Another option: just put what you want into a C compiler, using the values you need via the _delay_nn macros, copy the resulting few lines, and place them inline into your code.

 

You can use an online compiler to do the job-

https://godbolt.org/z/uPhw4B

 

When you need 40us, 37us, 150us, 1us, just plug in the F_CPU, delay time needed, copy the few lines and you are done. No call/return, no rcall vs call, no counting instructions, etc. 

 

These things are small enough so they can be placed inline without any trouble, and you typically will not need many of them per project.

 

 

.equ wus=((1*16000000/1000000 - 15)); used crystal 1us delay

.def tmp1=r16 ; temporary register

.CSEG
.ORG 0x0000

Reset:
         ldi tmp1, Low(RAMEND) ; initialize stack pointer
         out SPL, tmp1
         ldi tmp1, High(RAMEND)
         out SPH, tmp1

         ldi tmp1, 5 ; max 255us delay - i choose here 5us delay
         rcall WaitUs ; 3 cycles

MAIN:
         sleep ; Stop here To see how many cycles we have
         rjmp MAIN

WaitUs:
        adiw  ZH:ZL, wus+1
        WAITUZ: sbiw  zl, 1 ; count down - 2 Clock cycles
        brne  WAITUZ ; Non zero: 2 Clock cycles, zero: 1 Clock cycle
        dec   tmp1      ;  1 Clock cycle
        brne  WaitUs  ; Non zero: 2 Clock cycles, zero: 1 Clock cycle
        ret ; back: 4 Clock cycles

When I run a 1us delay I need to remove the +1 from wus in adiw, and when I run 2us to 255us I need to add +1 to wus in adiw. The problem is that I am missing cycles. Here are the calculated cycles for 16MHz against the simulated ones (error marked):

1us => 16 cycles, got 14     error: 2 cycles missing
2us => 32 cycles, got 30     error: 2 cycles missing
3us => 48 cycles, got 42     error: 6 cycles missing
4us => 64 cycles, got 54     error: 10 cycles missing
5us => 80 cycles, got 66     error: 14 cycles missing

As you can see, I am missing 4 more cycles for each microsecond when using 2us to 255us delays. So how do I fix this? I know that cycles are missing and how to count them, but I am not sure about the formula, because with the .equ wus formula above I am constantly 4 cycles short. Could someone rewrite the code to return the right delay? I have tried in the simulator but have no clue how to fix this.


What a mess!

Hint (I'm not writing your code, and I would still say use a timer, you have 3 of them): first make it take 16 clk for a value of 1, with no loop (breq forward), adding nops so that it takes exactly 16 clk.

Then make a loop that takes 16 clk.
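
A sketch of what that hint could look like, not a tested drop-in: a straight-through path of exactly 16 clk for a value of 1 (breq forward), plus a loop of exactly 16 clk per extra microsecond. Assumptions: a 16 MHz clock, a device with a 16-bit program counter (rcall = 3 cycles, ret = 4 cycles), interrupts disabled, the stack pointer set up, and the delay in us (1..255) in r16, which is clobbered. The wu_ labels are made up.

```asm
; Sketch only. Assumptions: 16 MHz, 16-bit PC, no interrupts,
; r16 = delay in us (1..255), clobbered. Uses 2 extra stack bytes.
WaitUs:                         ; caller's rcall                3
        dec   r16               ;                               1
        breq  wu_last           ; value was 1: taken 2, else 1
        rjmp  PC+1              ; 2-cycle filler                2
wu_loop:                        ; 16 clk per pass while the branch is taken
        rcall wu_ret            ; rcall 3 + ret 4               7
        rjmp  PC+1              ;                               2
        rjmp  PC+1              ;                               2
        rjmp  PC+1              ;                               2
        dec   r16               ;                               1
        brne  wu_loop           ; taken 2, fall-through 1
wu_last:
        rjmp  PC+1              ;                               2
        rjmp  PC+1              ;                               2
        rjmp  PC+1              ;                               2
wu_ret:
        ret                     ;                               4
```

For a value of 1 the path is 3 + 1 + 2 + 6 + 4 = 16 clk. For 2 it is 3 + 1 + 1 + 2 + 15 + 6 + 4 = 32 clk, and each further microsecond adds one 16 clk pass of the loop. The instruction that loads r16 in the caller is not counted.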


This seems to work.  Doesn't use any registers, other than the passed-in us count.  7 instructions (one more than your looping version.)
It'll need to be adjusted for other clock frequencies...

 

;;us delay: enter with us Count (1-255) in R24
;; the call and return time is included as part of the delay.
usdelay:
;; one us is 16 cycles at 16MHz
   ;; 3 cycles for call
   rcall cycdelay7  ;  10 cycles
   subi r24,1		;   11 cycles
   brne delayp6     ;    12 cycles if not taken
   ret              ;     16 cycles
delayp6:
   ;; Extra cycles for each loop that doesn't include call/return
   ;;   but does include an extra "successful jump" cycle.
   ;; this'll be at 10 cycles without the call.  
   rjmp PC+1		;11, 12		            
   rjmp PC+1		; 13, 14
   rjmp usdelay     ;  15, 16
  
;; a call to a routine containing only a ret takes 7 cycles.
;; (normally, stick this on any convenient "ret" instruction.
;;  separate out here, for clarity.)
cycdelay7: ret

 


The ldi or mov to load r24 costs you a cycle, so replace one of the rjmp PC+1 with nop.  Won't work for 1us, though.  To cover that case you'd need to replace the rcall cycdelay7 with 3 x rjmp PC+1 instead.


Here's how I did cycle-accurate delays in one project:

tmp  = 16	; Scratch register
.macro delay_c cycles								;
  ; Fail on negative.								;
  .if \cycles < 0								;
	.error "delay_c called with a negative cycle count"			;
  ; Fail on too many cycles.							;
  .elseif \cycles > (((3 * 256) + 7) + 2)					;
	.error "delay_c called with too many cycles"				;
  ; Use subroutine for long delays.						;
  .elseif \cycles >= 10								;
	; Value of 256 passed as 0.						;
	ldi	tmp, ((\cycles - 7) / 3) % 256					; 1
	rcall	delay_loop							; 3+
    ; Extra one or two cycles not possible with subroutine's 3-cycle loop.	;
    .if ((\cycles - 10) % 3) == 2						;
	rjmp	.+0								; 2
    .elseif ((\cycles - 10) % 3) == 1						;
	nop									; 1
    .endif									;
  ; Too short for subroutine, inline loop instead.				;
  .elseif \cycles >= 9								;
	ldi	tmp, \cycles / 3						; 1
0:	dec	tmp								; 1
	brne	0b								; 1/2
    ; Extra 1 or two cycles not possible with 3-cycle loop.			;
    .if (\cycles % 3) == 2							;
	rjmp	.+0								; 2
    .elseif (\cycles % 3) == 1							;
	nop									; 1
    .endif									;
  ; Short delays can be done in the same or fewer words, without clobbers.	;
  .else										;
    .if \cycles >= 8								;
	rjmp	.+0								; 2
    .endif									;
    .if \cycles >= 6								;
	rjmp	.+0								; 2
    .endif									;
    .if \cycles >= 4								;
	rjmp	.+0								; 2
    .endif									;
    .if \cycles >= 2								;
	rjmp	.+0								; 2
    .endif									;
    ; Odd cycle not possible with rjmp.						;
    .if (\cycles % 2) == 1							;
	nop									; 1
    .endif									;
  .endif									;
.endm										;
delay_loop:									;
	dec	tmp								; 1
	brne	delay_loop							; 1/2
	ret									; 4

Use like this:

	delay_c	46

I used it in an app running at 8.25 MHz, where 46 cycles works out to about 5.576 µs.

 

Most it can do is 777 cycles, which at 16 MHz is 48.5625 µs.

 

EDIT:  sp


Last Edited: Thu. Apr 16, 2020 - 03:17 PM

Here's a tweaked version of Bill's code (clobbers r0):

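dlyrpt:
        lpm                     ; 3 cycles, clobbers r0
        lpm                     ; 3 cycles
usdelay:
        rcall cycdelay7         ; rcall 3 + ret 4 = 7 cycles
        dec r24                 ; 1 cycle
        brne dlyrpt             ; taken 2, fall-through 1
cycdelay7:
        ret                     ; 4 cycles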

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


I hope that Z doesn't point at anything where a read has a side effect.

 


sparrow2 wrote:

I hope that Z don't point at anything where a read has a side effect.

 

LPM is Load Program Memory (flash), not LD, which reads from SRAM.  SRAM reads on traditional AVRs can have side-effects, but not flash.

 


Point taken; that is only a problem on the newer ones. (And we all assume that a read from undefined flash is just remapped.)

 

 

 


sparrow2 wrote:

Point take that is only a problem on the newer once. (and we all assume that a read from undefined flash just is remapped).

 

I don't assume that.  Not only have I read about other people's tests, I've tested flash address wraparound on several tiny/mega AVRs.  It's also well documented (search for " --pmem-wrap-around ").

 
