Skip "X" MCU cycles


Hello,

 

I want to write a function that, depending on a variable "X", skips from 1 to 22 MCU cycles, so that I can precisely control when an output toggles.

 

I suppose it is something like

 

PORTD_OUTCLR = 2;
asm("rjmp x");
asm("nop");asm("nop");asm("nop");asm("nop");asm("nop");asm("nop");asm("nop");    // 21 times
PORTD_OUTSET = 2;

 

Even when I try:

asm("rjmp 5");

I get error: "offset too large for rcall or rjmp".

I use XMEGA 256 A3U and Imagecraft.

 

Any ideas on how I should write this, and how I can make the relative jump depend on a variable or register rather than a constant?

This topic has a solution.
Last Edited: Fri. Feb 19, 2016 - 03:13 PM

If you tell us more about the need for this "interesting" requirement, I'd be more intrigued.

 

Duff's Device comes to mind, but your requirements make it tough...it would take a few cycles to set up.

 

pnv_Creator wrote:
I get error: "offset too large for rcall or rjmp".

So check your toolchain's documentation for the syntax and operands of RJMP.  I'm guessing it takes 5 as an absolute address. Perhaps it wants the .+ form?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


pnv_Creator wrote:

I want to make a function, that, depending on a variable "X" skips from 1...

 

Not possible.

 

You'll never achieve your lower limit of 1 cycle.

 

Why not use the hardware at your disposal? ie timers?

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#2 Hardware Problem? Read AVR042.

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

Last Edited: Thu. Feb 18, 2016 - 05:01 PM

 

set port bit
load z with 'table_end'
subtract function call parameter from z
IJMP
NOP
NOP
...
NOP
table_end: clear port bit

 

 

so z points to instruction to clear the port bit and has the incoming parameter subtracted from it. An input of 0 leaves Z alone and so the IJMP will go straight to the clear instruction. Any larger input will cause the IJMP to end up earlier in the sequence of NOPs.
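For readers who think in C, the same "enter a run of identical steps part-way through" idea is what a switch with fall-through does (the Duff's Device mentioned earlier). This is only a host-side sketch of the control flow: run_steps, one_step and MAX_STEPS are made-up names, and a C switch gives no cycle-exact timing, so the real thing still has to be written in assembler.

```c
#include <assert.h>

#define MAX_STEPS 4u                 /* illustrative; the thread needs about 22 */

static unsigned steps_executed;      /* counts the stand-in "NOPs" that ran */
static void one_step(void) { steps_executed++; }

/* Enter the run n steps before its end and fall through to the end. */
unsigned run_steps(unsigned n)
{
    assert(n <= MAX_STEPS);          /* the caller must range-check n */
    steps_executed = 0;
    switch (n) {
    case 4: one_step();              /* fall through */
    case 3: one_step();              /* fall through */
    case 2: one_step();              /* fall through */
    case 1: one_step();              /* fall through */
    default: break;                  /* n == 0: straight to the end */
    }
    return steps_executed;
}
```

run_steps(0) executes nothing and run_steps(3) executes three steps, mirroring an IJMP that lands three NOPs before table_end.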

 

However, your minimum width is going to be around 5 or 6 cycles.

 

As I said, use hardware.


Last Edited: Thu. Feb 18, 2016 - 05:12 PM

Sorry, this is on the XMEGA E5 MCU, but the problem is the same as described above.

 

According to the XMEGA documentation:

RJMP k

PC <- PC + k + 1

No flags, 2 mcu cycles
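As a rough illustration of that formula: in the RJMP opcode, k is a signed 12-bit word offset, so it can only span -2048..+2047 words, which is one way an assembler can end up complaining that an offset is too large. The sketch below decodes k and applies PC <- PC + k + 1 on a host PC. The function names are made up for this example, and the 16-bit PC is a simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the signed 12-bit word offset k from an RJMP opcode
 * (bit pattern 1100 kkkk kkkk kkkk). */
int16_t rjmp_offset(uint16_t opcode)
{
    assert((opcode & 0xF000u) == 0xC000u);   /* must be an RJMP */
    int16_t k = (int16_t)(opcode & 0x0FFFu);
    if (k & 0x0800)                          /* sign-extend bit 11 */
        k -= 0x1000;
    return k;
}

/* Word address reached by an RJMP located at word address pc. */
uint16_t rjmp_target(uint16_t pc, uint16_t opcode)
{
    return (uint16_t)(pc + rjmp_offset(opcode) + 1);
}
```

For example, opcode 0xCFFF carries k = -1, so the jump lands on itself: the classic endless loop.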

 

Basically, I need to periodically make a very fast and precise toggling of an output.

The duration depends on an ADC signal (so it is kind of a feedback pulse).

Usually, I would do that with output compare function of the timers, it was done so and it worked great.

Now I have to do it without any timers, because I already use the three timers to control 8 other signals in similar precise manner.

 

So, no timers, the task is toggling a pin on a precise number of cycles, which is dynamically changed by the other part of the program.

 

It is similar to:

while (x--);

but of course one loop iteration takes several MCU cycles, so I cannot step the delay one cycle at a time:

 

ldi R20,5              //this is x = 5;

L181:
L182:
    .dbline 28           //this is the while (x--)
    mov R2,R20
    subi R20,1
    tst R2
    brne L181
X3:
    .dbline 29
    ldi R24,2                  // this is the PORTD_OUTSET
    sts 1637,R24
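To put a number on that granularity, here is a small host-side model of the loop in the listing above. It assumes the usual AVR timings (MOV, SUBI and TST one cycle each, BRNE two cycles when taken and one when not) and ignores the initial LDI. The function name is made up for the example.

```c
#include <assert.h>

/* Count the cycles spent in the mov/subi/tst/brne loop for a start value x. */
unsigned while_loop_cycles(unsigned char x)
{
    unsigned cycles = 0;
    unsigned char r20 = x, r2;
    do {
        r2 = r20;                        /* mov  R2,R20 */
        r20 = (unsigned char)(r20 - 1);  /* subi R20,1  */
        cycles += 3;                     /* mov + subi + tst */
        cycles += (r2 != 0) ? 2 : 1;     /* brne: taken or not taken */
    } while (r2 != 0);
    assert(cycles == 5u * x + 4u);       /* closed form of the count above */
    return cycles;
}
```

Each extra count of x costs 5 cycles, so this loop alone can never give single-cycle resolution.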


Brian Fairchild wrote:

pnv_Creator wrote:

I want to make a function, that, depending on a variable "X" skips from 1...

 

Not possible.

 

You'll never achieve your lower limit of 1 cycle.

 

Why not use the hardware at your disposal? ie timers?

I am ok to have a minimum length of that. I know it takes some minimum instructions to do what I need.

Sorry, I should have explained from the start why I cannot use timers. All are needed for OC control, TCC4 controls 4 other outputs, TCC5 - 2 others, TCD5 - 2 others. It is a complex system, now we need to control 1 more output.

 

I will test your suggestion tomorrow, it seems logical.

Last Edited: Thu. Feb 18, 2016 - 05:25 PM

pnv_Creator wrote:
Sorry, this is on the XMEGA E5 MCU, but the problem is the same as described above. According to xmega docu: RJMP k ...

The Xmega doc has nothing to do with the syntax that >>your chosen toolchain<< expects.

 

pnv_Creator wrote:
Now I have to do it without any timers, because I already use the three timers to control 8 other signals in similar precise manner.

And how many timer channels are there on an E5?

(and there is also port DMA...)

 

And these n channels are simultaneous?


Last Edited: Thu. Feb 18, 2016 - 05:26 PM

Have you actually tried _delay_us() ? The parameter can be a float (but NOT variable) value so you can do things like _delay_us(0.1) and it should create a cycle accurate delay as close to your target as possible. Making this variable could take a little more thought though.


Ok, I understand about the ICC syntax, I will double check on it.

Right, there are 3 timers and I use them all to control 8 OC outputs. All run simultaneously, the program should be jitter free, and the timings should have 0 MCU cycles of error.

I really can't use them; I cannot stop and restart them for that part, and so on. The whole code runs in cycles that can be as short as 25 us, so I can barely afford the overhead of entering and exiting additional routines.

It is Hell.

 

I cannot figure out how to use the DMA in this case.


My guess is that what OP wants is a pulse from 1 to 22 cycles.

Not hard.

Iluvatar is the better part of Valar.


pnv_Creator wrote:
All are simultaneous,

Sounds like an ideal job for port DMA to me...  [not an Xmega person (and why not that forum???), but IIRC the DMA can be "clocked" at /1 and no timer need be involved]


clawson wrote:
_delay_us(0.1)

 

It should be something like: _delay_us(0.03125)

I suppose it will do something like a loop, which will not give the possibility for 1 cycle control.

Thanks, though, I will take a look.
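For reference, the arithmetic behind 0.03125 can be sketched like this. It assumes a 32 MHz CPU clock, which is what that figure implies but which is not stated explicitly in the thread. The function name is made up.

```c
#include <assert.h>

/* Convert a delay in microseconds to whole CPU cycles, rounded to nearest. */
unsigned long cycles_for_us(double us, unsigned long f_cpu_hz)
{
    assert(us >= 0.0);
    return (unsigned long)(us * ((double)f_cpu_hz / 1e6) + 0.5);
}
```

Under that assumption 0.03125 us is exactly one cycle, and the 25 us program cycle mentioned earlier is a budget of 800 cycles.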

 


theusch wrote:

pnv_Creator wrote:
All are simultaneous,

Sounds like an ideal job for port DMA to me...  [not an Xmega person (and why not that forum???), but IIRC the DMA can be "clocked" at /1 and no timer need be involved]

 

Sorry, wrong forum indeed, I apologise - if the admins can move it, it will be great.

 

All timers are running simultaneously and are synchronized, but not all signals change simultaneously. All signals are independently controlled within a cycle, and their polarity and timing may differ from cycle to cycle.

Port DMA, triggered by timer events, instead of OC, could be possible, I am not sure, though, and I cannot share the whole picture due to NDA and specifics and it will take tons of time, so it is a dead end for that conversation, I think.

I think it will be best if we stick to the "no timers available" part. Thanks for the great suggestion; I don't mean to discard it or be disrespectful, it is just not suitable for me.

Last Edited: Thu. Feb 18, 2016 - 05:54 PM

pnv_Creator wrote:
I am not sure, though, and I cannot share the whole picture due to NDA and specifics and it will take tons of time, so it is a dead end for that conversation, I think. I think it will be best if we stick to the "no timers available" part.

So let me get this straight:  You and yours have invented something truly wonderful, but can't tell us what it is.  (just like the free-energy generator/motor threads a bit back?)

 

Fair enough.  But then you come to the public forum and put us to an impossible task.

 

You want one-cycle resolution, yet you will explore a wrapper around delay_us?

 

In answer to my question, it was "simultaneous".  But now that was rejected.

 

A very sophisticated situation, yet no attempt to actually explore e.g. RJMP syntax?

 

And if n channels are now working great, the $1 solution of an Xmega with more timers hasn't even been thought of?

 

I guess I'm just in the midst of winter depression.  But the days are getting longer so perhaps soon I will no longer be the Surly Curmudgeon (tm).


pnv_Creator wrote:
I suppose it will do something like a loop, which will not give the possibility for 1 cycle control.

You are wrong - the GCC maintainers went to some length to implement __builtin_avr_delay_cycles(). In fact, now that I think about it, forget _delay_us() - which depends on float and an F_CPU calculation - and simply call __builtin_avr_delay_cycles(N), where N is the number of cycles you want to delay. (Like the _delay_us() argument, N has to be a compile-time constant.)

 

$ cat xmega.c
#include <avr/io.h>

int main(void) {
    __builtin_avr_delay_cycles(13);
    asm("nop");
    while (1) {
    }
}

$ avr-gcc -mmcu=atxmega128a1u -Os xmega.c -o xmega.elf
$ avr-objdump -S xmega.elf

xmega.elf:     file format elf32-avr


Disassembly of section .text:

00000000 <__vectors>:
   0:	0c 94 00 01 	jmp	0x200	; 0x200 <__ctors_end>
             [HUGE SNIP!]

00000200 <__ctors_end>:
 200:	11 24       	eor	r1, r1
 202:	1f be       	out	0x3f, r1	; 63
 204:	cf ef       	ldi	r28, 0xFF	; 255
 206:	df e3       	ldi	r29, 0x3F	; 63
 208:	de bf       	out	0x3e, r29	; 62
 20a:	cd bf       	out	0x3d, r28	; 61
 20c:	00 e0       	ldi	r16, 0x00	; 0
 20e:	0c bf       	out	0x3c, r16	; 60
 210:	18 be       	out	0x38, r1	; 56
 212:	19 be       	out	0x39, r1	; 57
 214:	1a be       	out	0x3a, r1	; 58
 216:	1b be       	out	0x3b, r1	; 59
 218:	0e 94 12 01 	call	0x224	; 0x224 <main>
 21c:	0c 94 18 01 	jmp	0x230	; 0x230 <_exit>

00000220 <__bad_interrupt>:
 220:	0c 94 00 00 	jmp	0	; 0x0 <__vectors>

00000224 <main>:
 224:	84 e0       	ldi	r24, 0x04	; 4
 226:	8a 95       	dec	r24
 228:	f1 f7       	brne	.-4      	; 0x226 <main+0x2>
 22a:	00 00       	nop
 22c:	00 00       	nop
 22e:	ff cf       	rjmp	.-2      	; 0x22e <main+0xa>

00000230 <_exit>:
 230:	f8 94       	cli

00000232 <__stop_program>:
 232:	ff cf       	rjmp	.-2      	; 0x232 <__stop_program>

I used asm("nop") as a marker for where the code ends but didn't realise that __builtin_avr_delay_cycles() would end with a 1 cycle balancing "nop" anyway.

 

Well anyway, I asked for 13 cycles. It gave me:

 

LDI - 1 cycle

DEC - 1 cycle (times 4)

BRNE - 1/2 cycles (2 cycles times 3 and 1 cycle times 1 = 7 cycles total)

NOP - 1 cycle

 

So in answer to my request for 13 cycles I got 1 + 4 + 7 + 1 = 13. I am happy!


Brian Fairchild wrote:

 

set port bit
load z with 'table_end'
subtract function call parameter from z
IJMP
NOP
NOP
...
NOP
table_end: clear port bit

 

 

so z points to instruction to clear the port bit and has the incoming parameter subtracted from it. An input of 0 leaves Z alone and so the IJMP will go straight to the clear instruction. Any larger input will cause the IJMP to end up earlier in the sequence of NOPs.

 

However, your minimum width is going to be around 5 or 6 cycles.

 

As I said, use hardware.

"Quote Selected" didn't work.

 

If you change the sequence to

 

Set up Z

Set port bit

IJMP

 

Then you are down to the two cycles for the IJMP, plus one more.  Not too bad.

 

All this assumes the port is mapped to the low I/O, right?


If flash isn't overly tight, then one could set up a subroutine for each length, and the prep work decides which to call.


Clawson:  OP said the toolchain is IMAGECRAFT, so gcc-specific solutions will not work for him, unless he's willing to change and has the time to port the app, learn new tools, etc.

(Possum Lodge oath) Quando omni flunkus, moritati.

"I thought growing old would take longer"

 


It sounds like a commercial product, so I guess overclocking the Xmega to get more clock cycles for the overhead, etc., isn't possible.

 

If, however, the device is running in an isothermal environment, and you are going to qualify each device built, you could still consider it.

Note that the EEPROM functions, etc., don't work well when overclocked.

The port I/O and T/C, etc., work fine, in my testing, up to 48 MHz, on "several" devices.

If you are using the ADC for input, you might consider using a non-overclocked uC for the ADC input, and feed the data digitally, serially, to the overclocked unit.

 

IIRC there are some name brand DSO's that significantly overclock the front end, but then again each unit is individually qualified.

 

I know lots of the regulars hate discussions involving multiple processors to do a "single" task.

But I've done several such projects and at the end of the day if you have a working system, in spec, the customer doesn't care how many chips are inside the box.

Cost may be an issue, but in my mind one has to have a working system before one starts to scrimp on parts.

It can be easier to select a bigger/faster single chip to get a job done, BUT one can view the little micros these days as nothing more than a smart peripheral, feeding the main micro.

 

JC


theusch wrote:

Then you are down to the two cycles for the IJMP, plus one more.  Not too bad.

 

Quick proof of concept using Codevision...

 

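void pulse(char width)
{
    #asm
    LDI R30,LOW(END)    ; Z = address of END
    LDI R31,HIGH(END)
    SUB R30,R26         ; this code takes width from R26: back up one NOP per count
    SBCI R31,0          ; propagate the borrow in case the low byte wrapped
    CBI 0x2,2           ; pulse starts: pin low
    IJMP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    END:
    SBI 0x2,2           ; pulse ends: pin high
    #endasm
}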

 

 

Calling it as pulse(0) gives a low pulse of 4 cycles; pulse(5) gives 9 cycles, as expected.


Should say this is on a mega1284P as I don't do xmega.


Brian Fairchild wrote:
Calling as pulse(0) give a low pulse of 4 cycles, pulse(5) gives 9 cycles as expected.

On AVR8, yes -- SBI is two cycles. (one cycle on Xmega.  But as mentioned, only if the port is mapped low (to VPORTx)).

 

Try this one:  ;)

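void pulse(char width)
{
    #asm
    LDI R30,LOW(END)    ; Z = address of END
    LDI R31,HIGH(END)
    SUB R30,R26         ; back up one NOP per count of width
    SBCI R31,0          ; propagate the borrow in case the low byte wrapped
    LDI R16, 0x02       ; bit mask for the pin
    OUT 0x00,R16        ; pulse starts (one-cycle OUT instead of CBI)
    IJMP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    END:
    OUT 0x00,R16        ; pulse ends
    #endasm
}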

 


Last Edited: Thu. Feb 18, 2016 - 07:47 PM

skeeve wrote:
My guess is that what OP wants is a pulse from 1 to 22 cycles

My guess is that the OP should have used an FPGA from the outset here.

Doing deterministic timing on a uP is always tough - especially if that uP is also using interrupts. He still hasn't told us what his requirements are. He's given us a small part of a spec, but a spec is not the same as a requirement!

 

SpiderKenny
@spiderelectron
www.spider-e.com

 

This reply has been marked as the solution. 

This will likely do what OP wants.

    ; void fred(uint8_t cyclesnum) ;
    ; Generate a 1 to 22 cycle positive pulse
    ; on assembled-in port pin as specified by cyclesnum.
    ; The time at which the pulse starts does not depend on cyclesnum.
    ; Port pin assumed to start low.

    ; An invalid cyclesnum will not trash other data.
    ; It might or might not cause a pulse.

    .global fred
    fred:
    LDI R30, lo8(pm(1f))
    LDI R31, hi8(pm(1f))
    ANDI R24, 0x1F  ; R24=cyclesnum
    SUB R30, R24
    SBC R31, R1  ; R1==0
    IJMP   ; accidentally deleted this during the massive edit
    .repeat 0x1F
      SBI --- ; 1 cycle on an xmega
    .endr
    1: ; IJMP here if cyclesnum is zero
    CBI ---
    RET

--- has been left as an exercise for the reader.


Last Edited: Mon. Feb 22, 2016 - 03:41 PM

As we are talking about the E5, there is an extra "hidden" timer in the XCL module. Perhaps that can be of use. Provided that the XCL hasn't been used for something else already.


Svuppe wrote:
As we are talking about the E5, there is an extra "hidden" timer in the XCL module. Perhaps that can be of use. Provided that the XCL hasn't been used for something else already.

 

The XCL module could definitely be useful here.

Note: Timer compare outputs are available only on PD2 and PD3.

 

Here are some of my findings after using the XCL: http://nickdademo.blogspot.sg/20...


theusch wrote:

Try this one:  ;)

 

Neat, gets the minimum pulse width down to 3 clocks.

 

The real problem with all of these is that whilst we can get the minimum pulse width down to a few cycles, the time between pulses is going to be quite long.


Brian Fairchild wrote:
The real problem with all of these is that whilst we can get the minimum pulse width down to a few cycles, the time between pulses is going to be quite long

 

Probably not any more overhead "per channel" than on the other "channels" calculating timer values.

 

Brian Fairchild wrote:
Neat, gets the minimum pulse width down to 3 clocks.

 

skeeve showed us how to get 0-n cycle pulse width, by using a string of one-cycle bit-set instructions, instead of the NOPs.  Gotta vote on that. ;)

 

In that vein, I need to apologize for my Surly Curmudgeon rant that chased OP away.  skeeve demonstrated that it is not an impossible task.


Last Edited: Fri. Feb 19, 2016 - 02:48 PM

Hello again,

 

Thanks all, your suggestions were incredibly interesting; I didn't expect so many ways to solve it.

 

Theusch, you didn't chase me away, you were actually helpful. I always share the whole picture when I can; I understand that it is lame to present a specific problem out of context.

I also know that I have a lot to learn and that I am a newbie among the gurus on this forum, though I have some experience.

 

Anyhow, I tried implementing the Brian/theusch solution (though the XCL timer is a very interesting idea). I will definitely test skeeve's; it seems perfect.

I have some difficulties with getting the address, it seems not to work even without the SUB instruction, I still can't figure why.

The SBI and CBI work like a charm, I can toggle the pin for 1 cycle - VPORTC is at 0x0014, and 0x0015 is its OUT register.

The error seems to be in the first two lines.

 

    LDI R30,< END
    LDI R31,> END
    SUB R30,R16

    SBI 0x15,5
    IJMP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    END:
    CBI 0x15,5
  ret

 

 

Last Edited: Fri. Feb 19, 2016 - 03:23 PM

pnv_Creator wrote:
, it seems not to work even without the SUB instruction, I still can't figure why.
Probably a factor of 2 on the label. As "END:" is a flash address it could either be that the symbol has a byte address but IJMP expects a word address in Z or conversely it could be that END: has a word address and IJMP wants a byte address. I can't remember which. So anyway try replacing "< END" etc with either "< (END * 2)" or "< (END / 2)" and see if things improve. The simulator is great for testing this kind of thing - look at Z, follow the IJMP, find out where you land!


IJMP does word addresses.

Whether a flash label is word or byte depends on the assembler.

For avr-as, 'tis byte.

For Atmel, 'tis word.

Don't know Imagecraft.

For avr-as, use pm(label) to get a word address.
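To make the byte/word distinction concrete, here is a host-side sketch of what a conversion like pm() has to produce for Z. The struct and function names are made up for this example, and it assumes a label that fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t lo; uint8_t hi; } z_pair;   /* R30 = lo, R31 = hi */

/* Turn a flash byte address into the word address IJMP expects in Z. */
z_pair z_from_byte_address(uint16_t byte_addr)
{
    assert((byte_addr & 1u) == 0);        /* instructions are word aligned */
    uint16_t word_addr = byte_addr >> 1;  /* one 16-bit shift, not two 8-bit ones */
    z_pair z = { (uint8_t)(word_addr & 0xFFu), (uint8_t)(word_addr >> 8) };
    return z;
}
```

Note that for byte address 0x0300 the low byte becomes 0x80: bit 0 of the high byte has to move into bit 7 of the low byte, which is the detail that is easy to lose when the shift is done at run time on two separate registers.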

 

My code relies on the avr-gnu ABI.

It has no need to save and restore registers.

It expects cyclesnum to be in R24.

It expects R1 to hold zero.

 

Note that after the NEG instruction, R30 holds 0x100-cyclesnum.

I will soon change comments on my code.


theusch wrote:
skeeve showed us how to get 0-n cycle pulse width, by using a string of one-cycle bit-set instructions, instead of the NOPs.  Gotta vote on that. ;)
Actually 1-n.  I've corrected my comments.

OUT instructions would also have worked, but required more set up.

0-n would not have been a problem.

I just got uselessly cute with my arithmetic.

 

Now that I think of it, cyclesnum=0 would cause an out-of-function IJMP.

Fix shortly.


Last Edited: Fri. Feb 19, 2016 - 06:05 PM

skeeve wrote:
Now that I think of it, cyclesnum=0 would cause an out-of-function IJMP.

It is up to OP what action to take with parameter of zero, and how important squeezing out the last cycle is.  But at some point, there should be a check for <1 and >n and action taken for robustness.  Zero could be to jump to RET, or force 1 into parameter, or similar.  Same with >n.  Can be done before the invocation of the "driver", or within.  I'd tend to vote for within, as that's where the NUM_CYCLES loop is unrolled to the SBI or OUT.  Then the test and the unrolling can use the same value so if one changes they all change.  Robustness again.

 

 


theusch wrote:
skeeve wrote:
Now that I think of it, cyclesnum=0 would cause an out-of-function IJMP.

It is up to OP what action to take with parameter of zero, and how important squeezing out the last cycle is.  But at some point, there should be a check for <1 and >n and action taken for robustness.  Zero could be to jump to RET, or force 1 into parameter, or similar.  Same with >n.  Can be done before the invocation of the "driver", or within.  I'd tend to vote for within, as that's where the NUM_CYCLES loop is unrolled to the SBI or OUT.  Then the test and the unrolling can use the same value so if one changes they all change.  Robustness again.

My criterion for invalid values is that they should not be disastrous.

Fixing my arithmetic took care of zero.

ANDI and a 0x20-word target took care of >22 .
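A quick host-side way to see why the mask and the length of the SBI run have to match: the sketch below computes where the IJMP lands relative to the start of the run. The function name is made up, and a negative result means the jump lands before the first SBI, i.e. in whatever code precedes it.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the landing point from the start of the run of SBIs.
 * Negative means the IJMP lands before the run. */
int landing_offset(uint8_t cyclesnum, uint8_t mask, int sled_len)
{
    assert(sled_len >= 0);
    return sled_len - (cyclesnum & mask);
}
```

With a 0x1F mask and 31 SBIs every possible input stays inside the run. With the same mask and only 22 SBIs, inputs 23..31 land before it.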


skeeve wrote:

This will likely do what OP wants.

    ; void fred(uint8_t cyclesnum) ;
    ; Generate a 1 to 22 cycle positive pulse
    ; on assembled-in port pin as specified by cyclesnum.
    ; The time at which the pulse starts does not depend on cyclesnum.
    ; Port pin assumed to start low.

    ; An invalid cyclesnum will not trash other data.
    ; It might or might not cause a pulse.

    .global fred
    fred:
    LDI R30, lo8(pm(1f))
    LDI R31, hi8(pm(1f))
    ANDI R24, 0x1F  ; R24=cyclesnum
    SUB R30, R24
    SBC R31, R1  ; R1==0
    .repeat 0x1F
      SBI --- ; 1 cycle on an xmega
    .endr
    1: ; IJMP here if cyclesnum is zero
    CBI ---
    RET

--- has been left as an exercise for the reader.

 

Skeeve, man, you are so awesome, so are the rest of the guys here with nice ideas.

I tested the code, it works, with two changes (thanks to clawson here for the more important one).

 

 1) you forgot the IJMP, you might edit that, since it is marked as solution and someone can get confused :)

 

Some imagecraft specifics if someone is interested:

 2) when I load the address, I need to shift it right by one bit - it seems that ImageCraft gives the label as a byte address while IJMP expects a word address. I just have to figure out how not to lose the carry without too much code, but optimization is not so important here, only precision.

I cannot simply divide in an assembler expression, it seems: I get the error "Absolute expression expected" when I try (JMPADDRS/2).

 

  -"<" and ">" are used to get lsb/msb of an address, as specified in the ICC help

 

 - First argument on function call is passed in R16, instead of R24, as specified in the ICC help

 

 - .repeat/.endr - I don't know how to do it, but it might be possible (with another syntax); Good old copy/paste gets that done... 

 

 - R1 seems to be zero, I couldn't find documentation regarding that, but it works; I tried to load it with 0 just in case, I can't, no access ("register is not valid")

 

 - R30, R31 "can be used in a function without being saved or restored", they are considered volatile registers for ICC as well, so no problems there

Last Edited: Mon. Feb 22, 2016 - 10:21 AM

pnv_Creator wrote:
- R1 seems to be zero, I couldn't find documentation regarding that, but it works; I tried to load it with 0 just in case, I can't, no access ("register is not valid")

Michael wrote the code under the mistaken impression that you were using GCC. The fact is that the GCC compiler always keeps 0 in R1 so it does not need CLR's or other ways to get a quick 0x00 - it just uses the value in R1.

 

(an unfortunate choice as it turned out as the later models of AVR then added the MUL instruction that uses R1 so now GCC code is peppered with stuff to reload the 0x00 into R1 when there's a chance it's not in that state - this is particularly tiresome in ISR()s where the ISR cannot know whether it interrupted a MUL so it starts by preserving R1 then putting the 0x00 back).


Ok, I use R24 now just in case the R1 idea is different.

The code now looks like this here (edited after the next remarks):

 

  .area text
_pulse::
    LDI R24, 0        ; zero for the SBC below (R1 could be used instead if it is always 0)
    LDI R30, <(JMPADDR)  ; lsb of jump address (byte address)
    LDI R31, >(JMPADDR)  ; msb of jump address (byte address)

    LSR R31           ; right-shift msb - divide by 2, bit 0 of msb goes to carry
    ROR R30           ; rotate lsb right through carry - the carry from msb becomes bit 7

    ANDI R16, 0x1f    ; R16=cyclesnum
    SUB R30, R16      ; subtract the parameter from the jump address
    SBC R31, R24      ; R24==0, subtract the borrow from msb address if needed
    IJMP

    SBI 0x15,5        ; setbit VPORTC_OUT, pin 5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5

    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5

    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5

    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5

    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5
    SBI 0x15,5

    JMPADDR:        ; IJMP here if cyclesnum is zero
    CBI 0x15,5      ; clearbit VPORTC_OUT, bit 5
    RET
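For anyone wondering why the SBC on R31 is there at all, this host-side sketch models the SUB/SBC pair as the 16-bit subtraction it is meant to be, next to what a lone SUB leaves in Z. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "SUB R30,n" followed by "SBC R31,zero": Z = Z - n with borrow. */
uint16_t z_minus(uint8_t r30, uint8_t r31, uint8_t n)
{
    uint16_t expected = (uint16_t)(((uint16_t)r31 << 8 | r30) - n);
    uint8_t borrow = (uint8_t)(n > r30);  /* carry flag after the SUB */
    r30 = (uint8_t)(r30 - n);
    r31 = (uint8_t)(r31 - borrow);        /* SBC with a zero register */
    uint16_t z = (uint16_t)((uint16_t)r31 << 8 | r30);
    assert(z == expected);                /* matches a plain 16-bit subtract */
    return z;
}

/* The same without the SBC: the high byte never sees the borrow. */
uint16_t z_minus_no_borrow(uint8_t r30, uint8_t r31, uint8_t n)
{
    r30 = (uint8_t)(r30 - n);
    return (uint16_t)((uint16_t)r31 << 8 | r30);
}
```

If the label sits just above a 256-word boundary, say Z = 0x0102 with a parameter of 5, the pair gives 0x00FD, while the lone SUB gives 0x01FD and the IJMP goes somewhere else entirely.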

 

Last Edited: Wed. Feb 24, 2016 - 01:46 PM

clawson wrote:
pnv_Creator wrote:
- R1 seems to be zero, I couldn't find documentation regarding that, but it works; I tried to load it with 0 just in case, I can't, no access ("register is not valid")

Michael wrote the code under the mistaken impression that you were using GCC. The fact is that the GCC compiler always keeps 0 in R1 so it does not need CLR's or other ways to get a quick 0x00 - it just uses the value in R1.

Actually, I used the AVR-GNU ABI because that is what I knew.

At the time of the massive edit, I was aware OP was using Imagecraft, so added ABI-related comments.

 

OP's current code works for valid values of cyclesnum,

but could be disastrous for values in the range 23-31, not to mention others.

A 0x10-word landing pad would fix that.  Note .repeat .

Also, there should be no need for run-time arithmetic to get the word address of the label.

avr-as uses pm for that.

Surely Imagecraft has something.


skeeve wrote:
Surely Imagecraft has something.

As I said above:

clawson wrote:
with either "< (END * 2)" or "< (END / 2)" and see if things improve

I would be surprised (nay, horrified) if it could not support an assemble time *2 or /2 ! Some folks may prefer the <<1 or >>1 syntax.


Just a final note:

 

Skeeve is right about everything. I've put back the ANDI 0x1f and the missing lines up to 0x1f, just in case I reuse or edit something in the future. The combination of ANDI 0x1f and only 22 lines can easily lead to an infinite loop or other problems with input between 23 and 31.

I still have to handle the >22 case outside the function, but one line inside it to make it more robust by itself will do no harm.

 

Regarding the ICC specifics: according to the support, which I find great btw, .repeat functionality (I am not talking about the syntax itself) is not supported, R1 is not reserved (so the LDI R24, 0 is necessary), and it seems that the address has to be divided by 2 manually, if there wasn't any miscommunication (I am sorry for the horrific experience for Clawson, he seems a nice guy). None of these are big problems after all; it just seems the additional operations I've added are necessary, and I mention them for the curious and for ICC users.

 

Thanks again!

Last Edited: Wed. Feb 24, 2016 - 01:41 PM

I don't suppose it matters as both are 1 word and 1 cycle but I think most folks would tend to use:

CLR R24

rather than:

LDI R24, 0

The "CLR" itself is just a mildly disguised:

EOR R24, R24

in fact it is exactly the same opcode bit pattern - so CLR is one of those "phantom" instructions that reduce Atmel's claimed "136 powerful RISC instructions" to a reality of about "70 odd powerful RISC instructions".
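The "same opcode" claim is easy to check against the disassembly earlier in the thread. The sketch below builds the 16-bit opcodes from the bit layouts in the AVR instruction set manual (EOR is 0010 01rd dddd rrrr, LDI is 1110 KKKK dddd KKKK). The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* EOR Rd,Rr : 0010 01rd dddd rrrr */
uint16_t encode_eor(unsigned d, unsigned r)
{
    assert(d < 32 && r < 32);
    return (uint16_t)(0x2400u | ((r & 0x10u) << 5) | (d << 4) | (r & 0x0Fu));
}

/* CLR Rd is EOR Rd,Rd under another name. */
uint16_t encode_clr(unsigned d)
{
    return encode_eor(d, d);
}

/* LDI Rd,K : 1110 KKKK dddd KKKK, only valid for d = 16..31 */
uint16_t encode_ldi(unsigned d, unsigned k)
{
    assert(d >= 16 && d < 32 && k < 256);
    return (uint16_t)(0xE000u | ((k & 0xF0u) << 4) | ((d - 16u) << 4) | (k & 0x0Fu));
}
```

encode_clr(1) gives 0x2411, which is the "11 24  eor r1, r1" at the top of the startup code in the listing, and encode_ldi(24, 4) gives 0xE084, the "84 e0  ldi r24, 0x04" at the start of main. CLR R24 and LDI R24, 0 come out as 0x2788 and 0xE080: different instructions that both leave zero in R24.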


When I can, I use LDI instead of CLR.

Not often, but sometimes it is useful to preserve SREG.

 

@OP: I recommend getting the assembler or linker to do your division.
