Delaying - but not what you think!

Hello,

This question has nothing to do with the normal questions based around _delay_us() and friends!

 

Instead, I am trying to use a mega168 to really delay a pulse train in the microsecond range (20-50us). Read "delay line"

 

Ideally, I want to work with an input signal of 3MHz, but an AVR can't cope, so I've divided down my input signal to 500kHz. Dividing down the input more to say 250kHz is not a problem.

 

My tactic was to poll for input changes, stick them in a buffer, then some time later bang the square wave out on another pin:

 

	// Put debugging symbols on Release mode to see why the NOP()S are where they are.
	// We want a duty-cycle of the same as the input at ALL times. All paths must take equal time.
        cli();

	while (1){

		uint8_t& myb = buf[i++];
		uint8_t in_state = PINC & IN_HIGH;

		if (in_state){
			myb = 1;
		}else{
			myb = 0;
		}

		if (buf[c++]){

			if (in_state){
				NOP();NOP();
			}
			PORTC |= OUT_HIGH;
		}else{
			NOP();
			PORTC &= ~OUT_HIGH;

		}

	};

 

This works. The NOPs are there to keep the timing consistent (in the Sim)

 

 

However, the output has jitter, even at 250kHz. I am surprised by this. Interrupts are off as you can see. I clearly see up to 2uS of jitter on the 'scope, and this won't do once the input is multiplied back up in frequency :(

 

Has anyone any tips on how to do this jitter free? I just want to exactly delay the pulse train: no artifacts!

 

Thanks!

 

 

 

Last Edited: Tue. Aug 2, 2016 - 03:35 PM

nobba wrote:
Has anyone any tips on how to do this jitter free?

I recall at least one extensive past thread on AVR "delay line".  Hmmm--this one didn't have a "resolution"

https://www.avrfreaks.net/forum/d...

...and "delay line" site search came up with too many hits.

 

If really important, then I might recommend investigating using an Xmega with port DMA  -- set the buffer(s) length to the needed delay time.  [I see I suggested looking into that in the thread above but without resolution]

 

 

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Yes the whole "delay" vs "delay-line" thing makes it hard to Google alone; let alone anything else.

 

Extra Info: External crystal osc, 20MHz, no prescaler.

 

I do have an Xmega + development breakout board here somewhere...where's that data sheet ...

 

Thanks!

A Google search for "digital "delay line" with microcontroller" gives some interesting results.

If you need a precise amount of delay,

probably you should use assembly.

If you just need jitter-free operation,

use a formula that does not require an if that depends on your data:

enum { DELAY=37 } ; // in while loop passes
enum { BUF_BITS=7 } ;
enum { BUF_SIZE=1<<BUF_BITS } ; // must be a power of two
enum { BUF_MASK=BUF_SIZE-1 } ;  // wraps an index into 0..BUF_SIZE-1
uint8_t buf[BUF_SIZE];
uint8_t jin=DELAY, jout=0;      // write index leads read index by DELAY passes
uint8_t delay_left=DELAY;       // passes before buf[jout] holds real samples

while(1) {
    buf[jin++ & BUF_MASK]=PINC;
    volatile uint8_t portv=
            shift(buf[jout++ & BUF_MASK] & IN_HIGH) | (PORTC & ~OUT_HIGH);
            // shift() is a placeholder: move the input bit to the output bit position
    uint8_t port3=portv;
    // getting the timing right on the following
    // if/else might be easier in in-line assembly
    // even slight compilation changes could make a difference otherwise
    if(delay_left) {
        --delay_left;
    } else {
        PORTC=port3;
    }
} // while

 

Iluvatar is the better part of Valar.

Reviewing the loop that I posted last year, and as a given that the whole AVR will be used during this "delay line" operation [it can be like e.g. miniDDS operation -- a hard loop with interrupts enabled with sources such as UART for "new command" or a STOP button to go back to command mode] then let's do some noodling...

 

Ideal operation from the initial post is a 3MHz signal.  That is under 7 AVR clocks per period at 20MHz (20/3 = 6.67), right?

 

My first guess is to use an entire AVR port for the input, and especially the output so that OUT can be used.  And also, fastest operation would be if [wastefully] a whole byte is used to store each sample.

 

Does Mr. Nyquist come into play here?  If you want to catch a single 1/3us pulse then do you need 6MHz sampling rate?  So what is the minimum high/low time to be reproduced?

 

You said "no jitter"...but with any kind of sampling you would not reproduce the input signal exactly, right?  For example, let's say we make this ideal delay line with the AVR, sampling the input every [for example] 1 microsecond.  The actual start/end of the incoming pulse could be up to 1 microsecond earlier.  It is true, isn't it, that the output pulse high/low times will be like +/- one sample period?
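To put numbers on the "+/- one sample period" point, here is a small host-side C sketch. It is only a model under stated assumptions: the helper names are made up for illustration, and the input is assumed to be looked at exactly once every `sample_ns`. It shows that an edge is only seen at the next sample instant, so the extra latency lands anywhere from 0 up to just under one sample period depending on where the edge falls.

```c
#include <stdint.h>

/* Time (in ns) at which a sampled delay line reproduces an input edge.
 * The edge arriving at edge_ns is first seen at the next sample instant,
 * then replayed delay_samples sample periods later.
 * Hypothetical helper for illustration only. */
static uint32_t output_edge_ns(uint32_t edge_ns, uint32_t sample_ns,
                               uint32_t delay_samples)
{
    /* Round the edge time up to the next sample instant. */
    uint32_t seen_ns = ((edge_ns + sample_ns - 1) / sample_ns) * sample_ns;
    return seen_ns + delay_samples * sample_ns;
}

/* Extra latency caused purely by sampling: always in [0, sample_ns). */
static uint32_t sampling_error_ns(uint32_t edge_ns, uint32_t sample_ns)
{
    return output_edge_ns(edge_ns, sample_ns, 0) - edge_ns;
}
```

With a 500 ns sample period, an edge at 1000 ns is reproduced with no extra error, while an edge at 1001 ns picks up 499 ns. That spread is what shows up as smear on a scope triggered on the input.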

 

Let's start with some way of a "programmable" or "settable" delay period, with the maximum time of 50us.  If you sample once a microsecond and use the "wasteful" one byte per sample, that is a 50 byte buffer for max delay.  Not too bad.

 

Expand on your specs for min and max delay and resolution needed.  For most efficient wrap of the buffer pointers a power of two would be most efficient.  And for least cycles in the loop I think 256 would work best.  OK, here goes:

 

-- 256 byte buffer on a mod 256 boundary.

-- Arbitrary output port and pin, but overwrite entire PORT register.

-- Input pin on a different port, and at the same bit position as the output pin.  E.g. PB0 as input, PD0 as output.

 

With the above, there should be no conditional logic or masking in the loop.

 

Init the buffer to zeros or ones for the initial delay front porch.

 

Set "offset" based on the desired delay and the sample time.  E.g. if the loop below takes 10 clocks and you are running at 20MHz, then "offset" is 2x the desired delay time in microseconds.
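The "offset" arithmetic can be sanity-checked on a PC. This is a minimal sketch, assuming a loop with a fixed cycle count and a CPU clock that is a whole number of MHz; `offset_for_delay` is a hypothetical name, not something from the code in this thread.

```c
#include <stdint.h>

/* Buffer offset (in samples) for a wanted delay.
 * One sample is taken per loop pass, and one pass takes loop_cycles clocks.
 * Assumes f_cpu_hz is a whole number of MHz (the fraction is truncated). */
static uint32_t offset_for_delay(uint32_t delay_us, uint32_t f_cpu_hz,
                                 uint32_t loop_cycles)
{
    /* CPU cycles that elapse during the wanted delay. */
    uint64_t cycles = (uint64_t)delay_us * (f_cpu_hz / 1000000u);
    return (uint32_t)(cycles / loop_cycles);
}
```

For a 10-clock loop at 20MHz this gives 40 for a 20us delay and 100 for 50us, i.e. 2x the delay in microseconds. With an 8-bit buffer index the result has to stay below 256.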

 

As mentioned, have interrupts on for stop; change parameters; etc.  If you care to, fuss with the stack so the RETI doesn't jump back into the loop but rather to the setup/start code.

 

Using the aforementioned PB0=>PD0 the ASM might look something like:

 

    .DSEG
    .ORG 0x400
BUFFER:
    .BYTE 0x100
...
; SETUP
    LDI ZH, HIGH(BUFFER)        ; OUTPUT POINTER
    LDI ZL, 0

    LDI XH, HIGH(BUFFER)        ; INPUT POINTER
    LDS XL, offset              ; CALCULATED ELSEWHERE

LOOP:
    IN  R16, PINB   ; 1 CYCLE
    STS X, R16      ; 2
    INC XL          ; 1
    LDS R16, Z      ; 2
    INC ZL          ; 1
    OUT PORTD, R16  ; 1
    RJMP LOOP       ; 2

If I counted correctly that is 10 cycles; 1/2 us at 20MHz.  2MHz sampling rate.  0-128us delay with 1/2 us resolution.
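For readers who don't speak AVR assembly, the data flow of that loop can be modelled in C on a PC. This is a behavioural sketch only: it says nothing about cycle counts on the AVR, and the type and function names are invented for illustration. The two `uint8_t` indices play the roles of XL and ZL and wrap from 255 to 0 on their own, which is what the 256-byte aligned buffer buys.

```c
#include <stdint.h>

typedef struct {
    uint8_t buf[256];  /* one byte per sample, like BUFFER in the asm */
    uint8_t in_idx;    /* plays the role of XL (input pointer) */
    uint8_t out_idx;   /* plays the role of ZL (output pointer) */
} delay_line;

static void delay_line_init(delay_line *d, uint8_t offset)
{
    for (int i = 0; i < 256; i++)
        d->buf[i] = 0;             /* front porch of zeros */
    d->in_idx = offset;            /* delay, in loop passes */
    d->out_idx = 0;
}

/* One pass of the loop: store the new sample, return the delayed one. */
static uint8_t delay_line_step(delay_line *d, uint8_t sample)
{
    d->buf[d->in_idx++] = sample;  /* uint8_t index wraps 255 -> 0 */
    return d->buf[d->out_idx++];
}
```

With an offset of 3, the first three calls return the zero front porch and the fourth call returns the first sample that was fed in.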

 

 

 

 

 

Last Edited: Tue. Aug 2, 2016 - 05:26 PM

@Skeeve:

 

Thanks for the code. I'll give it a try! Sadly, my C is OK, assembly = nil! Its all I could do to de-mangle the assembly output to figure out where to put the NOP()s!!

BTW, buf size is UINT8_MAX, so OK.

Last Edited: Tue. Aug 2, 2016 - 05:24 PM

@ theusch:

Thank you, btw this is the AVR's only job, so I'm not worried about anything else.

 

I'd like to be able to set a delay of up to 50uS, say in 1 or 2 uS intervals, in hard code, empirically, but when its done, its done!

 

Let me figure out how to put this into my C compiler. I guess I'd better crash course on inline assembly...

Last Edited: Tue. Aug 2, 2016 - 05:24 PM

nobba wrote:
Let me figure out how to put this into my C compiler.

I've often posted in "you can't do this in C" threads; [almost?] always on the C side.

 

But C has no concept of buffer pointer wrapping.  So my guess is that even with alignment and my other axioms above the C clocks would be double or more.

 

nobba wrote:
I'd like to be able to set a delay of up to 50uS, say in 1 or 2 uS intervals,
theusch wrote:
2MHz sampling rate. 0-128us delay with 1/2 us resolution.
nobba wrote:
in hard code, empirically, but when its done, its done!
I don't know what you are getting at.

theusch wrote:
nobba wrote: in hard code, empirically, but when its done, its done! I don't know what you are getting at.

 

I mean its set and forget once it comes off the bench! I just have to match the group delay in an audio low pass filter at rf: match it once (its around 20uS) and then its set forever.

 

So if I can't do it in C (for the reasons you suggest), I'll be finding out assembler 101, i.e. how to program my 168 with the code you posted. Mark you, I won't know what's happening if it doesn't work!

 

So, is the assembly you posted the complete solution? (Bearing in mind i will struggle to edit it?)

 

eg: offset is my "delay", set elsewhere -- yes, before the loop.

 

So what's the assembly version of

 

static const uint8_t offset = 20;

 

??

 

Thanks

Last Edited: Tue. Aug 2, 2016 - 05:43 PM

nobba wrote:

Sadly, my C is OK, assembly = nil!

 

Only a couple changes to make the above a complete program for Atmel assembler:

 

    .DSEG
    .ORG 0x400
BUFFER:
    .BYTE 0x100

    .CSEG
    .ORG 0
; SETUP
    LDI ZH, HIGH(BUFFER)        ; OUTPUT POINTER
    LDI ZL, 0
    
    LDI XH, HIGH(BUFFER)        ;  POINTER
;   LDS XL, offset              ; CALCULATED ELSEWHERE
    LDI XL, 123 ; DELAY IN MICROSECONDS*2

LOOP:
    IN  R16, PINB   ; 1 CYCLE
    STS X, R16      ; 2
    INC XL          ; 1
    LDS R16, Z      ; 2
    INC ZL          ; 1
    OUT PORTD, R16  ; 1
    RJMP LOOP       ; 2     

 

Well, copy n paste: I can try! Thank you.

nobba wrote:
LOOP:
    IN  R16, PINB   ; 1 CYCLE
    STS X, R16      ; 2
    INC XL          ; 1
    LDS R16, Z      ; 2
    INC ZL          ; 1
    OUT PORTD, R16  ; 1
    RJMP LOOP       ; 2

 

Compiler chokes at the line:

STS X, R16      ; 2

 

Should that be STS XL, R16

??

thanks

 

EDIT: clearly not, just tried it :(

 

UPDATE: It compiles with ST instead of STS and LD instead of LDS.

 

Is this OK?

 

Error: Invalid number

Last Edited: Tue. Aug 2, 2016 - 06:11 PM

Sorry.  [I didn't actually run the code, as you see. ;) ]

 

ST X, R16

 

and

 

LD R16, Z

 

 

Hello!

 

Here is the full program. It works but still looks a bit wobbly. Perhaps it's my scope! Will investigate shortly with 'scope #2 in case it's leading me up the garden path.

 

I will also double-check it's running on the crystal, coz that's the only reason I can see for any remaining jitter. Mind you, having said that, it has a hard time "keeping up" with an input beyond about 500kHz... :(

 

;
; delayass.asm
;
; Created: 02/08/2016 19:02:34
; Author : steve
;

; Replace with your application code
    .DSEG
    .ORG 0x400
BUFFER:
    .BYTE 0x100 ; 256 bytes (0x100 is hex; a bare 100 would be decimal)

    .CSEG
    .ORG 0
; SETUP

    LDI    R16,  0xFF       ; Load 0b11111111 in R16
    OUT    DDRC, R16        ; Configure PortC as an Output port

    LDI    R16,  0x00       ; Load 0b00000000 in R16
    OUT    DDRB, R16        ; Configure PortB as an Input port
	; This is because I am using PB1 as input and will take output on PC1

    LDI ZH, HIGH(BUFFER)        ; OUTPUT POINTER
    LDI ZL, 0

    LDI XH, HIGH(BUFFER)        ;  POINTER
    LDI XL, 0 ; DELAY IN MICROSECONDS*2

LOOP:
    IN  R16, PINB   ; 1 CYCLE
    ST X, R16      ; 2
    INC XL          ; 1
    LD R16, Z      ; 2
    INC ZL          ; 1
    OUT PORTC, R16  ; 1
    RJMP LOOP       ; 2 

Please let me know if there are any howlers in there - first asm build. Ever.

Just note that on the line:

LDI XL, 0 ; DELAY IN MICROSECONDS*2

You should put the required delay in. Zero is in there to measure just the propagation delay, and to prove it works without too much jitter.

 

Still worried that when I multiply the signal back up to 3Megs the 1u or so of jitter will smash the waveform up. Might need a "super AVR" (don't know what that is, just invented it)

Last Edited: Tue. Aug 2, 2016 - 08:11 PM

nobba wrote:
You should put the required delay in. Zero is in there to measure just the propagation delay, and to prove it works without too much jitter.

So, did you get about half a microsecond of propagation delay?

 

Describe what you mean by "jitter".  Remember what I outlined earlier:

theusch wrote:
You said "no jitter"...but with any kind of sampling you would not reproduce the input signal exactly, right? For example, let's say we make this ideal delay line with the AVR, sampling the input every [for example] 1 microsecond. The actual start/end of the incoming pulse could be up to 1 microsecond earlier. It is true, isn't it, that the output pulse high/low times will be like +/- one sample period?

 

Perhaps further describe your input signal.  An arbitrary pulse train?  Or a repeated "frequency"?   If so, is it 50% duty cycle?  And further, if so, are you really trying to do "phase shift"?

 

The above code should sample at 2Msps -- >>if<< your AVR is really running at 20MHz.  Have you proven that?  Nyquist would indicate that at 2Msps up to a 1MHz signal could be reproduced without losing any highs or lows.

 

Can you capture and post a 'scope trace that shows the jitter?

Last Edited: Tue. Aug 2, 2016 - 08:33 PM

And yes, I forgot to make the output pin high in my "complete program". ;)

Hello,

 

So I can't easily post a picture of the scope as it is analog.

 

To tell you a bit more: I set the fuse to give me clock output (no div) out on the pin, setting fuses like so:

-U lfuse:w:0x87:m -U hfuse:w:0xdf:m EDITED TO CORRECT VALUES

 

So I can scope the clock to see that it really is 20 Mhz.

 

The application is quite complex: put simply I must preserve any phase-modulated information on the 3 megs (or 4 Megs) signal, but I must delay it by x microseconds.

 

Now, since the AVR cannot handle fast freqs, I thought it best to divide the signal (using D flip flop) to a frequency it can cope with, perform the delay, then multiply the frequency back up to the operating frequency.

 

The scope, with the input on channel A (and triggered on channel A) shows input B (from the AVR output) as "smeary". If I trigger on the output (channel B) then "A", the input frequency, looks smeary.

 

The reason I absolutely must minimize jitter is that, say I have 1 uS jitter @ 500khz. I'm worried I will end up with 8 times that when I multiply back up: and thats a whole 'nother can of worms...
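One way to look at that worry is to express the jitter as a fraction of a period. The sketch below assumes the multiplier stage preserves the absolute edge timing error (an assumption; it depends on how the multiplication is done), in which case the same time error eats a proportionally bigger share of the shorter period. `jitter_fraction` is a made-up helper name.

```c
/* Timing jitter expressed as a fraction of one signal period.
 * period = 1 / freq, so jitter / period = jitter * freq. */
static double jitter_fraction(double jitter_s, double freq_hz)
{
    return jitter_s * freq_hz;
}
```

1 us of jitter is half a period at 500 kHz, but four whole periods at 4 MHz, which is the factor of 8 mentioned above.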

 

All my tests so far are KISS: testing with 1:1 square wave only from a waveform generator.

 

According to my 'scope, with the buffer read set to 0, I am seeing a (jittery) 0.5 - 1uS prop delay.

Last Edited: Tue. Aug 2, 2016 - 09:34 PM

nobba wrote:
So I can't easily post a picture of the scope as it is analog.

That's why I like my smartphone.  Or other digital camera...

nobba wrote:
The scope, with the input on channel A (and triggered on channel A) shows input B (from the AVR output) as "smeary". If I trigger on the output (channel B) then "A", the input frequency, looks smeary.

But does it "catch up"?  As I said, as far as I can see, the edges will always be +/- one sample period for >>any<< sampling-type setup.  Right?

 

So your 'scope has no one-shot or similar mode to show a few cycles across the screen?

 

 

Maybe you could use the USART in SPI mode to handle the signal? After all, it is buffered, so maybe with the proper coding you will not lose any input. Also, the peripheral will do most of the work, leaving the CPU breathing time to handle memory transactions and delay calculations. I think 3-4 MHz may be possible; of course I'm just speculating and didn't do any actual coding.

 

And yeah, this is one of those times, when assembly will be needed.

 

Edit: As theusch said, because of the Nyquist theorem, you have to sample the signal at no less than twice its frequency. So the USART clock should be set to the maximum possible value, I think it's system clock/2 (10 MHz). I don't actually know if this leaves enough CPU time to process the delay and handle the interface with the peripheral. Maybe tomorrow I'll write some code.

Last Edited: Tue. Aug 2, 2016 - 10:50 PM

nobba wrote:
Thanks for the code. I'll give it a try! Sadly, my C is OK, assembly = nil! Its all I could do to de-mangle the assembly output to figure out where to put the NOP()s!! BTW, buf size is UINT8_MAX, so OK.
My code requires BUF_SIZE to be a power of 2.

With BUF_SIZE=0x100 and uint8_t indices you would not need the index mask at all.
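The power-of-two requirement is what makes a branch-free wrap possible. A minimal sketch, reusing the BUF_BITS/BUF_SIZE names from the earlier snippet (`wrap_index` itself is a hypothetical helper): the mask that wraps an index is BUF_SIZE-1, which keeps the low BUF_BITS bits and has the same effect as `i % BUF_SIZE`.

```c
#include <stdint.h>

enum { BUF_BITS = 7, BUF_SIZE = 1 << BUF_BITS };

/* Wrap an index into [0, BUF_SIZE) without a branch or a division.
 * Only valid because BUF_SIZE is a power of two. */
static uint8_t wrap_index(uint8_t i)
{
    return (uint8_t)(i & (BUF_SIZE - 1));
}
```

Note that `-BUF_SIZE` is the complement of that mask in two's complement, so it keeps the high bits instead of the low ones. With a 256-byte buffer the mask is 0xFF and a `uint8_t` index wraps by itself.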

 

Reliable cycle-accurate timing pretty much requires assembly.

Subcycle-accurate timing is not available.

 

As another noted, there is a variable delay between the time a signal

changes on a pin and it is available for reading within the AVR.

How big a jitter are we discussing?

 

Edit: Is there a chance that you have been trying to read or write buffer[0x100]?

Last Edited: Wed. Aug 3, 2016 - 12:36 AM

skeeve wrote:
Edit: Is there a chance that you have been trying to read or write buffer[0x100]?

 

Good point. I did check for this at one point, I *think* it was reading from 0 to UINT8_MAX -1, with the array dimensioned to be ar[UINT8_MAX + 1] just for good measure when I doubted myself...

 

I was letting the counter (uint8_t) simply overflow.

 

 

El Tangas wrote:
Maybe tomorrow I'll write some code.

OK, thanks. Let me know what you find. Always ready to learn!

El Tangas wrote:
Maybe you could use the USART in SPI mode to handle the signal? After all, it is buffered, so maybe with the proper coding you will not lose any input.

Good idea.  I didn't think of that one--a fake SPI.  Input on RXD; output on TXD.  At say 8MHz SPI clock that gives 1us granularity.

 

El Tangas wrote:
I think its system clock/2 (10 MHz). I don't actually know if this leaves enough cpu time to process the delay and handle interface with the peripheral.

If one sets the SPI clock "properly", then [theoretically at least] one could cycle-count and not have to do any flag checking.

 

The loop would look very much like my bit-at-a-time, right?  Now, are the needed registers in reach of IN/OUT...

No, darn it.  Another two cycles wasted.  But doing 8 bits at a time should give good results.

 

Or does it...

 

El Tangas wrote:
So the USART clock should be set at maximum possible value, I think its system clock/2 (10 MHz).

But it appears that the max clock rate would be clk/16 -- 1.25MHz.

 

But maybe/probably U2X applies so then clk/8 2.5MHz.  8 clocks per bit means 64 clocks per byte, right?  Lots of time...

Simpler, clearly jitter-free:

enum { DELAY=137 } ; // in while loop passes
enum { BUF_BITS=8 } ;
enum { BUF_SIZE=1<<BUF_BITS } ; // must be a power of two
enum { BUF_MASK=BUF_SIZE-1 } ;  // wraps an index into 0..BUF_SIZE-1
uint8_t buf[BUF_SIZE];
uint8_t port1=PORTC;
memset(buf, port1, DELAY);  // DELAY passes of sameness (needs <string.h>)

uint8_t jin=DELAY, jout=0;
while(1) {
    buf[jin++ & BUF_MASK]=shift(PINC & IN_HIGH) | (port1 & ~OUT_HIGH);
            // shift() is a placeholder: move the input bit to the output bit position
    PORTC=buf[jout++ & BUF_MASK];
} // while

 

But it appears that the max clock rate would be clk/16 -- 1.25MHz.

Not for MSPI:
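For reference, in master SPI mode the USART clock follows the synchronous-master formula, f_sck = f_osc / (2 * (UBRR + 1)), so UBRR = 0 gives f_osc/2. A small host-side C sketch of that arithmetic (the function names are hypothetical, chosen for illustration):

```c
#include <stdint.h>

/* USART in master SPI mode: f_sck = f_osc / (2 * (UBRR + 1)). */
static uint32_t mspim_sck_hz(uint32_t f_osc_hz, uint16_t ubrr)
{
    return f_osc_hz / (2u * ((uint32_t)ubrr + 1u));
}

/* CPU cycles that pass while one 8-bit transfer is shifted in/out. */
static uint32_t cycles_per_byte(uint16_t ubrr)
{
    return 8u * 2u * ((uint32_t)ubrr + 1u);
}
```

At 20 MHz with UBRR = 0 that is a 10 MHz bit clock and 16 CPU cycles per byte, which is the budget a cycle-counted loop would have to hit exactly.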

 

 

This might do it:

#define __SFR_OFFSET 0
#include <avr/io.h>

; Must be a power-of-two no greater than half of the available SRAM
#define DL_SIZE_BYTES 64

; 2 <= DELAY_BYTES < DL_SIZE_BYTES
; Since two bytes go  out from the tail of the delay line  before any are added
; to the head, the delay line must be at least 2 bytes long, or 16 samples, for
; a minimum delay of 1.6 us.
#define DELAY_BYTES 25

; Index registers by name
#define XL r26
#define XH r27
#define YL r28
#define YH r29

; 10 MHz sample rate  will generate 10 samples per us.  50  us will require 500
; 1-bit samples.  512 samples would fit in  64 bytes.  Although the m168 has 1K
; of SRAM, it is mapped starting at 0x100.  In order to keep the buffer aligned
; to a  power-of-two equal  to its size,  it cannot be  larger than  512 bytes.
; That would permit a 4096 sample delay line.  At 10 MHz, that's 409.6 us.
.section  .bss
.balign   DL_SIZE_BYTES
.comm    dl, DL_SIZE_BYTES

.section .text

.global __do_clear_bss

.global main

main:

; configure SPI for F_OSC/2 = 10 MHz
        eor     r1,     r1
        sts     UBRR0H, r1
        sts     UBRR0L, r1
        sbi     DDRD,   4
        ldi     r16,    (1<<UMSEL01)|(1<<UMSEL00)
        sts     UCSR0C, r16
        ldi     r16,    (1<<RXEN0)|(1<<TXEN0)
        sts     UCSR0B, r16
        sts     UBRR0H, r1
        sts     UBRR0L, r1

; Since  it's implemented  as a  circular  buffer of  bytes, the  delay can  be
; configured with a granularity of 8 bits, or 0.8 us.

; X is  used to point to  the head of  the delay line, where  incoming  samples
; will be deposited
        ldi     XH,     hi8(dl+DELAY_BYTES)
        ldi     XL,     lo8(dl+DELAY_BYTES)
; Y is used to point to the tail of the delay line, where outgoing samples will
; be withdrawn
        ldi     YH,     hi8(dl)
        ldi     YL,     lo8(dl)


; Fill the MSPI TX buffer and wait for  the first RX byte to be ready, i.e. the
; first read must  occur at least 16  cycles after the first  write.  The third
; write must  occur no more  than 32  cycles after the  first write, or  the TX
; buffer will be empty, and there will be a gap.
        ld      r16,    Y+                        ;                    
        sts     UDR0,   r16                       ;         1st write
        ld      r16,    Y+                        ;                   2
        sts     UDR0,   r16                       ;         2nd write 2

; 3 cycles per pass, total 15 cycle wait.
        ldi     r16,    5                         ;
wait:
        dec     r16                               ;
        brne    wait                              ;                  15

; Run the delay line.  Loop must be exactly 16 cycles
loop:
        lds     r16,    UDR0                      ; 2       1st read  2 = 21
        st      X+,     r16                       ; 2                 2
        ld      r16,    Y+                        ; 2                 2
        sts     UDR0,   r16                       ; 2       3rd write 2 = 27
        andi    XL,     (DL_SIZE_BYTES-1) & 0xFF  ; 1
        andi    XH,     (DL_SIZE_BYTES-1) >> 8    ; 1
        andi    YL,     (DL_SIZE_BYTES-1) & 0xFF  ; 1
        andi    YH,     (DL_SIZE_BYTES-1) >> 8    ; 1
        rjmp    .+0                               ; 2
        rjmp    loop                              ; 2
                                                  ; = 16

Compiles, and looks like it will work, but COMPLETELY UNTESTED.
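One point that may be worth checking when testing: the `andi` instructions mask the pointer registers with `DL_SIZE_BYTES-1`, which keeps only the offset within the buffer. Since SRAM on the m168 starts at 0x100, the buffer's base address has bits set above that mask, and a plain mask would clear them. The host-side C sketch below (hypothetical helper names, not part of the posted code) shows the difference between masking alone and masking while keeping the base of a buffer aligned to its own size.

```c
#include <stdint.h>

/* Naive wrap: keeps only the offset bits, so the base address is lost. */
static uint16_t wrap_mask_only(uint16_t addr, uint16_t size)
{
    return addr & (uint16_t)(size - 1u);
}

/* Wrap that stays inside a buffer aligned to its own power-of-two size:
 * mask to get the offset, then OR the base address back in. */
static uint16_t wrap_keep_base(uint16_t addr, uint16_t base, uint16_t size)
{
    return (uint16_t)(base | (addr & (uint16_t)(size - 1u)));
}
```

For a 64-byte buffer at 0x0100, address 0x011A masks down to 0x001A, which is outside the buffer, while the base-preserving version leaves it at 0x011A and wraps 0x0140 back to 0x0100.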

 

At 20 MHz system clock, and a 10 MHz sample clock, a 3 MHz input will show a fair amount of jitter since 10 isn't an integer multiple of 3.  Nyquist tells us that the sample frequency must be > twice the signal frequency, so we need >>more<< than two sample per period of the input signal.  In order to minimise jitter we'd want an even integral number of samples per period, so a minimum of 4 samples per period total.  That would be a 12 MHz sample frequency.  Not achievable with a 20 MHz system clock.  You could overclock to 24 MHz, or live with the jitter, or reduce the frequency of your input signal.
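The effect of a non-integer number of samples per period can be seen with a quick host-side simulation. This is a sketch under simplifying assumptions: an ideal 50% square wave that starts a period at sample 0, and made-up function names. It counts how many consecutive samples each high or low lasts.

```c
#include <stdint.h>

/* Level of an ideal 50% square wave of frequency fin_hz at sample n,
 * sampled at fs_hz: 1 during the first half of each period, else 0. */
static int sampled_level(uint64_t n, uint64_t fs_hz, uint64_t fin_hz)
{
    return ((2u * n * fin_hz) / fs_hz) % 2u == 0;
}

/* Shortest and longest run of identical samples over n_samples samples,
 * ignoring the final, possibly truncated run. */
static void run_length_range(uint64_t fs_hz, uint64_t fin_hz,
                             uint32_t n_samples,
                             uint32_t *min_run, uint32_t *max_run)
{
    uint32_t run = 1;
    *min_run = UINT32_MAX;
    *max_run = 0;
    for (uint64_t n = 1; n < n_samples; n++) {
        if (sampled_level(n, fs_hz, fin_hz) ==
            sampled_level(n - 1, fs_hz, fin_hz)) {
            run++;
        } else {
            if (run < *min_run) *min_run = run;
            if (run > *max_run) *max_run = run;
            run = 1;
        }
    }
}
```

Sampling 3 MHz at 10 MHz gives highs and lows that are sometimes 1 and sometimes 2 samples long, so edges wander by a full sample. At 12 MHz every run is exactly 2 samples.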

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

Last Edited: Wed. Aug 3, 2016 - 06:54 AM

Nice coding, yeah, that's what I had in mind. Let's see if nobba tests it and it works in practice. If there are any timing errors, it will be quite hard to debug, but if you are working at the limits of the MCU, that's just the way it is.

joeymorin wrote:
Not for MSPI:

Indeed, it was late when I went through that datasheet section and rolled right by the actual equations.

 

I was wondering whether one would want to start with a "seed" write to start filling the double-buffers.  But perhaps while the very first iteration may have a bit of a gap things would then "catch up" after cycle counting?

 

 

 

 

It's very likely I've missed something.  If I have time later today I will try to test it.

 

Oh, and the way to compute the number of bytes you'll need for a given delay is simple.  May as well add that to the code.

 

Note that floating point arithmetic in preprocessor directives isn't supported by GCC for assembler sources, thus the delay is specified in nanoseconds instead of microseconds.
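The integer-only arithmetic can be tried out on a PC first. A minimal sketch, assuming one sample per SPI bit at F_CPU/2 and 8 samples per buffer byte; the function names are invented for illustration and only mirror the idea of working in nanoseconds to avoid floating point.

```c
#include <stdint.h>

/* One sample per SPI bit; at F_CPU / 2 that is F_CPU / 2e6 bits per us. */
static uint32_t bits_per_us(uint32_t f_cpu_hz)
{
    return f_cpu_hz / 2000000u;
}

/* Whole buffer bytes for a wanted delay: ns * bits/us / (8 bits * 1000 ns/us). */
static uint32_t delay_bytes(uint32_t delay_ns, uint32_t f_cpu_hz)
{
    return (uint32_t)(((uint64_t)delay_ns * bits_per_us(f_cpu_hz)) / 8000u);
}

/* Delay actually obtained after rounding down to whole bytes. */
static uint32_t real_delay_ns(uint32_t delay_ns, uint32_t f_cpu_hz)
{
    return delay_bytes(delay_ns, f_cpu_hz) * 8000u / bits_per_us(f_cpu_hz);
}
```

At 20 MHz a 20000 ns request maps to 25 bytes exactly, while 21000 ns rounds down to 26 bytes, i.e. a real delay of 20800 ns, reflecting the 0.8 us granularity.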

 

And I suppose there's no reason to limit the buffer to 64 bytes.  Might as well just go for the maximum size.

 

Also, note that the delay you specify does not include the 0.5-1.5 cpu cycles synchronisation delay imposed by hardware, nor the half-bit (i.e. 1 cpu cycle) phase delay between TX and RX.  The result is likely to be a -0.5 to +0.5 cycle jitter.  A scope or LA would be required to characterise the actual delay offset on real hardware.

 

Further, it should be possible to change the length of the delay line at runtime by communicating a new value over SPI (or TWI, or another USART, if your AVR has one).  With interrupt-driven SPI slave code, an ISR could compute a new value for DELAY_BYTES and set new values for the head and tail registers [X and Y], and then resynchronise MSPI.  Here's the new code with changes made.  I've omitted the code required to receive runtime changes to the length of the delay line, that is left as an exercise for the reader ;-)

 

#define F_CPU 20000000

#define __SFR_OFFSET 0
#include <avr/io.h>

#define DELAY_NS 20000

; Must be a power-of-two no greater than half of the available SRAM
#define BUF_SZ_BYTES (((RAMEND + 1) - RAMSTART) / 2)

; 2 <= DELAY_BYTES < BUF_SZ_BYTES
; Since two bytes go  out from the tail of the delay line  before any are added
; to the head, the delay line must be at least 2 bytes long, or 16 samples, for
; a minimum delay of 1.6 us.
#define BITS_PER_US (F_CPU / 2000000)
#define DELAY_BYTES ((DELAY_NS * BITS_PER_US) / 8000)
#if (DELAY_BYTES >= BUF_SZ_BYTES)
  #warning DELAY_NS is too long
  #undef DELAY_BYTES
  #define DELAY_BYTES (BUF_SZ_BYTES - 1)
#endif
#if (DELAY_BYTES < 2)
  #warning DELAY_NS is too short
  #undef DELAY_BYTES
  #define DELAY_BYTES 2
#endif

; Create a symbol reflecting the real delay in ns.  Examine it with
; avr-objdump -t or similar.
.equ real_delay, (DELAY_BYTES * 8000) / BITS_PER_US

; Index registers by name
#define XL r26
#define XH r27
#define YL r28
#define YH r29

; 10 MHz sample rate  will generate 10 samples per us.  50  us will require 500
; 1-bit samples.  512 samples would fit in  64 bytes.  Although the m168 has 1K
; of SRAM, it is mapped starting at 0x100.  In order to keep the buffer aligned
; to a  power-of-two equal  to its size,  it cannot be  larger than  512 bytes.
; That would permit a 4096 sample delay line.  At 10 MHz, that's 409.6 us.
.section  .bss
.balign   BUF_SZ_BYTES
.comm    dl, BUF_SZ_BYTES

.section .text

.global __do_clear_bss

.global main

main:

; configure SPI for F_OSC/2 = 10 MHz
        eor     r1,     r1
        sts     UBRR0H, r1
        sts     UBRR0L, r1
        sbi     DDRD,   4
        ldi     r16,    (1<<UMSEL01)|(1<<UMSEL00)
        sts     UCSR0C, r16
        ldi     r16,    (1<<RXEN0)|(1<<TXEN0)
        sts     UCSR0B, r16
        sts     UBRR0H, r1
        sts     UBRR0L, r1

; Since  it's implemented  as a  circular  buffer of  bytes, the  delay can  be
; configured with a granularity of 8 bits, or 0.8 us.

; X is  used to point to  the head of  the delay line, where  incoming  samples
; will be deposited
        ldi     XH,     hi8(dl+DELAY_BYTES)
        ldi     XL,     lo8(dl+DELAY_BYTES)
; Y is used to point to the tail of the delay line, where outgoing samples will
; be withdrawn
        ldi     YH,     hi8(dl)
        ldi     YL,     lo8(dl)

; Fill the MSPI TX buffer and wait for  the first RX byte to be ready, i.e. the
; first read must  occur at least 16  cycles after the first  write.  The third
; write must  occur no more  than 32  cycles after the  first write, or  the TX
; buffer will be empty, and there will be a gap.
        ld      r16,    Y+                        ;
        sts     UDR0,   r16                       ;         1st write
        ld      r16,    Y+                        ;                   2
        sts     UDR0,   r16                       ;         2nd write 2

; 3 cycles per pass, total 15 cycle wait.
        ldi     r16,    5                         ;
wait:
        dec     r16                               ;
        brne    wait                              ;                  15

; Run the delay line.  Loop must be exactly 16 cycles
loop:
        lds     r16,    UDR0                      ; 2       1st read  2 = 21
        st      X+,     r16                       ; 2                 2
        ld      r16,    Y+                        ; 2                 2
        sts     UDR0,   r16                       ; 2       3rd write 2 = 27
        andi    XL,     (BUF_SZ_BYTES-1) & 0xFF   ; 1
        andi    XH,     (BUF_SZ_BYTES-1) >> 8     ; 1
        andi    YL,     (BUF_SZ_BYTES-1) & 0xFF   ; 1
        andi    YH,     (BUF_SZ_BYTES-1) >> 8     ; 1
        rjmp    .+0                               ; 2
        rjmp    loop                              ; 2
                                                  ; = 16
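
For anyone who wants to check the head/tail arithmetic on a PC before flashing, here is a minimal host-side C sketch of the same circular-buffer idea. It is only a model under stated assumptions: the names (delay_line, dl_init, dl_step) are made up for illustration, the buffer is fixed at 64 bytes, and it ignores the USART's own latency (the two priming writes and the shift registers), so it says nothing about the exact delay on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_SZ_BYTES 64u            /* power of two, as in the assembly */

/* Hypothetical host-side model of the delay line; not part of the AVR code. */
typedef struct {
    uint8_t buf[BUF_SZ_BYTES];      /* circular buffer */
    size_t  head;                   /* where incoming bytes are stored (X) */
    size_t  tail;                   /* where outgoing bytes are fetched (Y) */
} delay_line;

/* Clear the buffer and place the head delay_bytes ahead of the tail. */
static void dl_init(delay_line *d, size_t delay_bytes)
{
    assert(delay_bytes >= 1 && delay_bytes < BUF_SZ_BYTES);
    for (size_t i = 0; i < BUF_SZ_BYTES; i++)
        d->buf[i] = 0;
    d->head = delay_bytes;
    d->tail = 0;
}

/* One pass of the loop: store the received byte, return the byte to send. */
static uint8_t dl_step(delay_line *d, uint8_t rx)
{
    d->buf[d->head] = rx;
    uint8_t tx = d->buf[d->tail];
    d->head = (d->head + 1) & (BUF_SZ_BYTES - 1);   /* wrap with a mask */
    d->tail = (d->tail + 1) & (BUF_SZ_BYTES - 1);
    return tx;
}
```

After dl_init(&d, 25), feeding bytes 1, 2, 3, ... through dl_step() returns zeros for the first 25 calls and then the input sequence, 25 bytes late.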

 

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

Last Edited: Wed. Aug 3, 2016 - 01:45 PM

joeymorin wrote:

It's very likely I've missed something.  If I have time later today I will try to test it.

 

Oh, and the way to compute the number of bytes you'll need for a given delay is simple.  May as well add that to the code.

 

Note that floating point arithmetic in preprocessor directives isn't supported by GCC for assembler sources, thus the delay is specified in nanoseconds instead of microseconds.

 

And I suppose there's no reason to limit the buffer to 64 bytes.  Might as well just go for the maximum size.

....

 

Hello guys and thanks for your sterling efforts.

 

Ok, baby steps here for me please.

 

I am totally willing to try to compile the code and take a look on the scope (btw, it's of 1988 vintage, much like my 'tronics knowledge, hence baby steps).

 

So, baby steps in mind: using this code, I have no clue which pin to put the input signal on, or on which pin the output will appear. I will pore over the datasheet to see if I can figure it out.

 

 

BTW, to the guy with the C code: shift(...) doesn't exist on my compiler, so I have no idea what you mean by that line. I did paste it in, and it fails to compile. I guess you mean left shift, but it's just a guess.

 

Last Edited: Wed. Aug 3, 2016 - 02:49 PM

Input on RXD, output on TXD. For the m168 and friends, that's PD0 and PD1.


joeymorin wrote:
Input on RXD, output on TXD. For the m168 and friends, that's PD0 and PD1.

 

Ok, I see that on pins 2 & 3. Is the configuration of these pins required in addition to the code? (DDRD)


Ok, continuing on the baby steps theme.

 

It doesn't compile for me in AtmelStudio at all.

 

"Cannot find include file avr/io.h"

"Invalid directive: section"

 

I think this is not for the same compiler I have installed.

 

 


nobba wrote:

joeymorin wrote:
Input on RXD, output on TXD. For the m168 and friends, that's PD0 and PD1.

 

Ok, I see that on pins 2 & 3. Is the configuration of these pins required in addition to the code? (DDRD)

 

I think it's automatic for RXD and TXD; some peripherals just take over the pins. However, you can see in the initialization code that PD4 was set as an output. This is the clock pin (XCK). It is not needed here, but maybe you can use it for debugging: it will generate a 10 MHz square wave synchronized with the output.

 

 

edit: yeah, it's a different assembler. To build it in Atmel Studio, some minor changes need to be made.

 

edit #2: it so happens I had converted the original version (hope there are no mistakes):

 



 #define __SFR_OFFSET 0

; Must be a power-of-two no greater than half of the available SRAM
#define DL_SIZE_BYTES 64

; 2 <= DELAY_BYTES < DL_SIZE_BYTES
; Since two bytes go  out from the tail of the delay line  before any are added
; to the head, the delay line must be at least 2 bytes long, or 16 samples, for
; a minimum delay of 1.6 us.
#define DELAY_BYTES 25

; Index registers by name
#define XL r26
#define XH r27
#define YL r28
#define YH r29

; 10 MHz sample rate  will generate 10 samples per us.  50  us will require 500
; 1-bit samples.  512 samples would fit in  64 bytes.  Although the m168 has 1K
; of SRAM, it is mapped starting at 0x100.  In order to keep the buffer aligned
; to a  power-of-two equal  to its size,  it cannot be  larger than  512 bytes.
; That would permit a 4096 sample delay line.  At 10 MHz, that's 409.6 us.
.DSEG
.IF	(DL_SIZE_BYTES > SRAM_START)
	.ORG	DL_SIZE_BYTES
.ELSE
	.ORG	SRAM_START
.ENDIF
dl:	.BYTE	DL_SIZE_BYTES

.CSEG

.ORG 0

; configure SPI for F_OSC/2 = 10 MHz
        eor     r1,     r1
        sts     UBRR0H, r1
        sts     UBRR0L, r1
        sbi     DDRD,   4
        ldi     r16,    (1<<UMSEL01)|(1<<UMSEL00)
        sts     UCSR0C, r16
        ldi     r16,    (1<<RXEN0)|(1<<TXEN0)
        sts     UCSR0B, r16
        sts     UBRR0H, r1
        sts     UBRR0L, r1

; Since  it's implemented  as a  circular  buffer of  bytes, the  delay can  be
; configured with a granularity of 8 bits, or 0.8 us.

; X is  used to point to  the head of  the delay line, where  incoming samples
; will be deposited
        ldi     XH,     HIGH(dl+DELAY_BYTES)
        ldi     XL,     LOW(dl+DELAY_BYTES)
; Y is used to point to the tail of the delay line, where outgoing samples will
; be withdrawn
        ldi     YH,     HIGH(dl)
        ldi     YL,     LOW(dl)

; Fill the MSPI TX buffer and wait for  the first RX byte to be ready, i.e. the
; first read must  occur at least 16  cycles after the first  write.  The third
; write must  occur no more  than 32  cycles after the  first write, or  the TX
; buffer will be empty, and there will be a gap.
        ld      r16,    Y+                        ;
        sts     UDR0,   r16                       ;         1st write
        ld      r16,    Y+                        ;                   2
        sts     UDR0,   r16                       ;         2nd write 2

; 3 cycles per pass, total 15 cycle wait.
        ldi     r16,    5                         ;
wait:
        dec     r16                               ;
        brne    wait                              ;                  15

; Run the delay line.  Loop must be exactly 16 cycles
loop:
        lds     r16,    UDR0                      ; 2       1st read  2 = 21
        st      X+,     r16                       ; 2                 2
        ld      r16,    Y+                        ; 2                 2
        sts     UDR0,   r16                       ; 2       3rd write 2 = 27
        andi    XL,     (DL_SIZE_BYTES-1) & 0xFF  ; 1
        andi    XH,     (DL_SIZE_BYTES-1) >> 8    ; 1
        andi    YL,     (DL_SIZE_BYTES-1) & 0xFF  ; 1
        andi    YH,     (DL_SIZE_BYTES-1) >> 8    ; 1
        rjmp    go                                ; 2
	go:
        rjmp    loop                              ; 2
                                                  ; = 16

 

Last Edited: Wed. Aug 3, 2016 - 03:14 PM

El Tangas wrote:

nobba wrote:

joeymorin wrote:
Input on RXD, output on TXD. For the m168 and friends, that's PD0 and PD1.

 

Ok, I see that on pins 2 & 3. Is the configuration of these pins required in addition to the code? (DDRD)

 

I think it's automatic for RXD and TXD; some peripherals just take over the pins. However, you can see in the initialization code that PD4 was set as an output. This is the clock pin (XCK). It is not needed here, but maybe you can use it for debugging: it will generate a 10 MHz square wave synchronized with the output.

 

 

edit: yeah, it's a different assembler. To build it in Atmel Studio, some minor changes need to be made.

 

To get it to compile: I made a gcc project, emptied the main.c file, and added your file as a .s (little s) file to the project.

 

About to try it


Hello,

 

Not really sure what I am seeing here. It looks like some other pulse modulated by the input signal.

 

In fact, with no input signal supplied (pin floating), I see what look like 1 or 2 us pulses every now and then.

 

Grounding the input pin just leaves a train of these spurious pulses.

 

I don't know if that might mean anything to you?

 

EDIT: I also noticed that on PD4 there are bursts of 10 MHz square wave: on for about a second, off for about a second (just timed it in my head, nothing accurate).

Last Edited: Wed. Aug 3, 2016 - 03:32 PM

nobba wrote:
BTW, to the guy with the C code: shift(...) doesn't exist on my compiler, so I have no idea what you mean by that line. I did paste it in, and it fails to compile. I guess you mean left shift, but it's just a guess.
I did not know the bit positions, so I did not even know whether to shift left or right.

I'd intended the OP to replace it with the appropriate << or >>.

Perhaps I should have added the comment shift(IN_HIGH)==OUT_HIGH.
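
To make that concrete, here is a minimal sketch of what the placeholder could expand to. The bit positions are assumptions for illustration only (input on bit 0, output on bit 1), and in_to_out is a made-up helper name; with different pins the shift amount and direction change accordingly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only: input on bit 0, output on bit 1. */
#define IN_HIGH   (1u << 0)
#define OUT_HIGH  (1u << 1)

/* The whole point of shift(): it must map IN_HIGH onto OUT_HIGH. */
static_assert((IN_HIGH << 1) == OUT_HIGH, "shift must map IN_HIGH onto OUT_HIGH");

/* Move the sampled input bit to the output bit position without branching. */
static inline uint8_t in_to_out(uint8_t pin_value)
{
    return (uint8_t)((pin_value & IN_HIGH) << 1);
}
```

Because there is no branch, every pass takes the same time regardless of the input level, which is what the NOP padding in the original loop was trying to achieve.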

Iluvatar is the better part of Valar.


nobba wrote:
To get it to compile: I made a gcc project, emptied the main.c file, and added your file as a .s (little s) file to the project. About to try it

Are you talking about running the code as a result of this? It's not going to work without the use of -nostartfiles if you are really building it in an avr-gcc project.


El Tangas wrote:


edit #2: it so happens I had converted the original version (hope there are no mistakes):

 




 

 

Aaaannnddd... the compiler says:

 

Severity    Code    Description    Project    File    Line
Warning        .org 0x40 in .dseg is below start of RAM at 0x100    delay-advanced   delay-advanced\main.asm    25

 


clawson wrote:

nobba wrote:
To get it to compile: I made a gcc project, emptied the main.c file, and added your file as a .s (little s) file to the project. About to try it

Are you talking about running the code as a result of this? It's not going to work without the use of -nostartfiles if you are really building it in an avr-gcc project.

 

Thank you, I enabled that flag and now I see what I described: the output square wave has a lot of unwanted junk along with it. I have no clue what it is. It's just a blur on the scope, so I'm guessing it's 10 MHz spikes.


Try the attached .hex file.  Built with avr-gcc.  I still haven't tested it myself, but I will try this afternoon.

 

 

 

 



joeymorin wrote:

Try the attached .hex file.  Built with avr-gcc.  I still haven't tested it myself, but I will try this afternoon.

 

 

 

 

Ok, thanks.

 

For me, I don't see much. Difficult for the scope to sync. Even the input is jumping around.

 


I ought to stay out of this...

 

But that said:

 

1) I think you need the right tools for the job.

A decent dual channel digital O'scope with a robust trigger module would sure make life easier.

It will be very difficult to measure your system performance with an analog scope; you might end up resorting to single-impulse test measurements to do this.

 

2) I generally only tinker with electronics, so I have much less cross platform experience than many of the Forum regulars.

That said, I started years ago with the Basic Stamp.

When I needed interrupts and more memory, I moved to Pic's.

When I found AVR's I moved to AVR's and haven't used a Pic since then, (except I guess now I am using MicroChip parts, again...)

When I wanted a faster micro and priority interrupts I switched to the Xmega line, and most of my projects have used an Xmega since then.

.

So, what's the point of this story?

Occasionally when one is trying to push the limits on one's current technology, it is time to make a paradigm shift to something totally new.

It seems to me that this project might be trivial on a several MHz - GHz ARM chip, except for the obvious learning curve of new hardware and new development platform.

 

3) Not ready for the switch to an ARM?

One might consider switching to an Xmega series AVR.

It will run, in spec, at 32 MHz, which might give one considerably less jitter in the signal processing chain.

As it appears that this won't use the analog functions or EEPROM, it is probably reasonable to overclock it to 48 MHz, giving even better performance.

Atomic Zombie routinely overclocks them higher, I started to get into trouble above 48 MHz when I built a custom "logic analyzer" to snoop a signal once.

I've not used the "virtual ports" feature of the Xmega, but it might also be worth reading that section of the manual to see if the port functions can be sped up any through their usage.

If you look at the DMA read to buffer or write from buffer capability remember that the DMA doesn't actually run in parallel with the uC core, if I remember correctly, so don't think of it as a parallel processor.

 

4) Obviously, as you are doing, start at a lower frequency where it is easier to see and measure the system performance, and debug the concept and the software, then start pushing the limits, (calculate your expected limit, and then measure it).

 

5) Some projects benefit from developing custom testbed hardware first, to help develop and debug your primary project.

For example, I built an EKG signal simulator to aid in the development of an EKG monitor.

You might consider a simple uC signal generator project, also.

Push a switch and it generates a single pulse, and a scope trigger, used to watch the pulse propagate through your project.

Push another switch and get a pulse of twice the period.

Push another switch and get a burst of 5 pulses.

Push another switch and get a continuous test stream, (along with your synchronizing O'scope trigger pulse).

etc.

 

6)  I've heard of PIC's DSP series micros, for digital signal processing.

I've never read their data sheets, or worked with one.

I don't know if they bring anything useful to the project or not.

 

7) At 48 MHz you might get away with coding your project entirely in "C", or at least minimizing your need for ASM, as you work towards getting a working low freq prototype up and running.

(No spurious 10 MHz noise spikes, etc.).

Then work on optimizing the system's performance.

 

Sounds like an interesting project!

 

JC

 

 

 


Appreciate your comments, JC.

 

If I wanted an easy life, I would do it all in analog: heterodyne the 3 MHz signal +/- LO down to 200 kHz, run it through an all-pass filter, then het it back up to 3 MHz using the same LO.

 

It's just the component count putting me off.

 

I do have an XMEGA here (and a dev board), but I figured that 32 MHz vs 20 MHz isn't *that* much faster.

 

Now, if I could do the whole shebang (3 MHz --> IF --> delay --> back up to 3 MHz) in one chip, then I would want *that* chip. So maybe ARM. But then the complexity might outweigh the heterodyne approach.
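
For reference, the heterodyne route above can be written out as a short worked derivation. This is a sketch under assumptions not stated in the post: a low-side LO (for example 2.8 MHz for a 200 kHz IF), ideal mixers and filters, amplitude factors dropped, and the sum product kept on the way back up. Here \varphi(t) is the phase modulation and \tau is the delay applied at IF.

```latex
\begin{aligned}
s_{\mathrm{RF}}(t)      &= \cos\bigl(2\pi f_{\mathrm{RF}} t + \varphi(t)\bigr) \\
s_{\mathrm{IF}}(t)      &= \cos\bigl(2\pi f_{\mathrm{IF}} t + \varphi(t)\bigr),
                           \qquad f_{\mathrm{IF}} = f_{\mathrm{RF}} - f_{\mathrm{LO}} \\
s_{\mathrm{IF}}(t-\tau) &= \cos\bigl(2\pi f_{\mathrm{IF}} t - 2\pi f_{\mathrm{IF}}\tau + \varphi(t-\tau)\bigr) \\
s_{\mathrm{out}}(t)     &= \cos\bigl(2\pi f_{\mathrm{RF}} t - 2\pi f_{\mathrm{IF}}\tau + \varphi(t-\tau)\bigr)
\end{aligned}
```

So the phase modulation comes out delayed by \tau, and the carrier only picks up a fixed offset of 2\pi f_{\mathrm{IF}}\tau. With a 200 kHz IF and a 20 us delay that offset is exactly 4 whole cycles.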


If I wanted an easy life, I would do it all in analog

Right.  Easy.  Weekend Project!  Or not!

 

Anyway, I don't believe you described the BW of the base signal.

 

JC 


It's a single-sideband (SSB) voice signal, so approx. 150 Hz - 3 kHz.

 

But it's only the phase component I am concerned with in this part of the project. The amplitude part is done elsewhere, in a PWM modulator that does not use an AVR at all (just a triangle generator and a comparator, then some huge FETs and a LPF). That path delays the amplitude by 20-50 us, hence the need to match the delays so as to keep the phase relationships.

Last Edited: Wed. Aug 3, 2016 - 05:35 PM

Rookie mistake :(

 

When I was applying the mask to X and Y in the main loop, I was clearing the higher bit representing the base address of the buffer.  The result is that both X and Y were pointing at the GP register file.  I was clobbering state.  The machine was running amok!
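
The effect is easy to reproduce on a PC. Below is a minimal sketch, assuming a 64-byte buffer sitting at 0x0100 (values borrowed from the Atmel Studio conversion earlier in the thread); wrap_buggy and wrap_fixed are made-up names, and the model works on a whole 16-bit pointer rather than on the separate low and high bytes the AVR code uses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: 64-byte buffer placed at 0x0100. */
#define BUF_BASE  0x0100u
#define BUF_SIZE  64u

/* The OR trick only works if the base is aligned to the buffer size. */
static_assert((BUF_BASE & (BUF_SIZE - 1)) == 0, "base must be size-aligned");

/* Original wrap: masks the whole pointer, which also clears the base bits. */
static uint16_t wrap_buggy(uint16_t ptr)
{
    return (uint16_t)(ptr & (BUF_SIZE - 1));
}

/* Corrected wrap: keep the offset bits, then put the base address back. */
static uint16_t wrap_fixed(uint16_t ptr)
{
    return (uint16_t)((ptr & (BUF_SIZE - 1)) | BUF_BASE);
}
```

With these numbers wrap_buggy(0x0119) gives 0x0019, which on an AVR is inside the register file, while wrap_fixed(0x0140) wraps back to 0x0100 as intended.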

 

New code below.  Tested, and 'works', although I haven't confirmed if the delay is correct.  But the input is in fact now duplicated on the output.

 

#ifndef F_CPU
  #define F_CPU 20000000
#endif

#define __SFR_OFFSET 0
#include <avr/io.h>

; Set to desired delay
#define DELAY_NS 20000

; Must be a power-of-two no greater than half of the available SRAM
#define BUF_SZ_BYTES (((RAMEND + 1) - RAMSTART) / 2)

; 2 <= DELAY_BYTES < BUF_SZ_BYTES
; Since two bytes go  out from the tail of the delay line  before any are added
; to the head, the delay line must be at least 2 bytes long, or 16 samples, for
; a minimum delay of 1.6 us.
#define BITS_PER_US (F_CPU / 2000000)
#define DELAY_BYTES ((DELAY_NS * BITS_PER_US) / 8000)
#if (DELAY_BYTES >= BUF_SZ_BYTES)
  #warning DELAY_NS is too long
  #undef DELAY_BYTES
  #define DELAY_BYTES (BUF_SZ_BYTES - 1)
#endif
#if (DELAY_BYTES < 2)
  #warning DELAY_NS is too short
  #undef DELAY_BYTES
  #define DELAY_BYTES 2
#endif

; Create a symbol reflecting the real delay.  Examine it with avr-objdump -t
; or similar.
.equ real_delay_ns, (DELAY_BYTES * 8000) / BITS_PER_US

; Index registers by name
#define XL r26
#define XH r27
#define YL r28
#define YH r29

; 10 MHz sample rate  will generate 10 samples per us.  50  us will require 500
; 1-bit samples.  512 samples would fit in  64 bytes.  Although the m168 has 1K
; of SRAM, it is mapped starting at 0x100.  In order to keep the buffer aligned
; to a  power-of-two equal  to its size,  it cannot be  larger than  512 bytes.
; That would permit a 4096 sample delay line.  At 10 MHz, that's 409.6 us.
.section  .bss
.balign   BUF_SZ_BYTES
.comm    dl, BUF_SZ_BYTES

; Determine  address  linker  will  select  for  buffer  (used  for conditional
; compilation below).   Can't think  of a  way to extract  this from  the .comm
; declaration of dl above.   I expect it can't be done.   Rather, would need to
; specify a custom section and use --section-start= when building.  Meh.
#if BUF_SZ_BYTES > RAMSTART
  #define DL BUF_SZ_BYTES
#else
  #define DL RAMSTART
#endif

.section .text

.global __vector_default
        rjmp    reset
.global __vector_default

reset:

.global __do_clear_bss

.global main

main:

; configure SPI for F_OSC/2 = 10 MHz
        eor     r1,     r1
        sts     UBRR0H, r1
        sts     UBRR0L, r1
        sbi     DDRD,   4
        ldi     r16,    (1<<UMSEL01)|(1<<UMSEL00)
        sts     UCSR0C, r16
        ldi     r16,    (1<<RXEN0)|(1<<TXEN0)
        sts     UCSR0B, r16
        sts     UBRR0H, r1
        sts     UBRR0L, r1

; Since  it's implemented  as a  circular  buffer of  bytes, the  delay can  be
; configured with a granularity of 8 bits, or 0.8 us.

; X is  used to point to  the head of  the delay line, where  incoming samples
; will be deposited
        ldi     XH,     hi8(dl+DELAY_BYTES)
        ldi     XL,     lo8(dl+DELAY_BYTES)
; Y is used to point to the tail of the delay line, where outgoing samples will
; be withdrawn
        ldi     YH,     hi8(dl)
        ldi     YL,     lo8(dl)

; Fill the MSPI TX buffer and wait for  the first RX byte to be ready, i.e. the
; first read must  occur at least 16  cycles after the first  write.  The third
; write must  occur no more  than 32  cycles after the  first write, or  the TX
; buffer will be empty, and there will be a gap.
        ld      r16,    Y+                        ;
        sts     UDR0,   r16                       ;         1st write
        ld      r16,    Y+                        ;                   2
        sts     UDR0,   r16                       ;         2nd write 2

; 3 cycles per pass, total 15 cycle wait.
        ldi     r16,    5                         ;
wait:
        dec     r16                               ;
        brne    wait                              ;                  15

; Run the delay line.  Loop must be exactly 16 cycles
loop:
        lds     r16,    UDR0                      ; 2       1st read  2 = 21
        st      X+,     r16                       ; 2                 2
        ld      r16,    Y+                        ; 2                 2
        sts     UDR0,   r16                       ; 2       3rd write 2 = 27
        andi    XL,     lo8(BUF_SZ_BYTES-1)       ; 1
        andi    XH,     hi8(BUF_SZ_BYTES-1)       ; 1
        andi    YL,     lo8(BUF_SZ_BYTES-1)       ; 1
        andi    YH,     hi8(BUF_SZ_BYTES-1)       ; 1
#if DL < 0x100
        ori     XL,     lo8(dl)                   ; 1
        ori     YL,     lo8(dl)                   ; 1
#else
        ori     XH,     hi8(dl)                   ; 1
        ori     YH,     hi8(dl)                   ; 1
#endif
        rjmp    loop                              ; 2
                                                  ; = 16

; Catch-all
__vector_default:
        reti

 

New .hex file for m168 attached.  Built for 20 MHz, and a 20 us delay.  Also, built without using -nostartfiles, so the full CRT is linked.  This clears the buffer to zero since it is in .bss.  It's not necessary, but ensures no spurious output at the beginning with the random contents of SRAM after a power-up.  I've built and tested it both with and without the CRT and it works either way.

 

EDIT:  Whoops, had the wrong hex file attached.  It was built for a 1 ms delay.  Replaced with new file built for 20 us.
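
For anyone changing DELAY_NS, the preprocessor arithmetic in the listing can be mirrored on a PC to see what delay actually results. This is a minimal sketch, assuming an m168 at 20 MHz (so the buffer is 512 bytes); delay_bytes and real_delay_ns are hypothetical helper names that follow the macros of the same meaning.

```c
#include <assert.h>

/* Values mirror the macros in the listing; the helper names are made up. */
#define F_CPU         20000000L
#define BITS_PER_US   (F_CPU / 2000000L)   /* 10 samples per microsecond */
#define BUF_SZ_BYTES  512L                 /* (RAMEND + 1 - RAMSTART) / 2 on an m168 */

/* Requested delay in ns -> whole bytes of buffer, clamped like the #if blocks. */
static long delay_bytes(long delay_ns)
{
    assert(delay_ns >= 0);
    long bytes = (delay_ns * BITS_PER_US) / 8000L;
    if (bytes >= BUF_SZ_BYTES)
        bytes = BUF_SZ_BYTES - 1;          /* "too long" case */
    if (bytes < 2)
        bytes = 2;                         /* "too short" case */
    return bytes;
}

/* Delay actually obtained, in ns (what the real_delay_ns symbol holds). */
static long real_delay_ns(long delay_ns)
{
    return (delay_bytes(delay_ns) * 8000L) / BITS_PER_US;
}
```

A 20 us request comes out as 25 bytes and exactly 20000 ns. A 50 us request truncates to 62 bytes, i.e. 49600 ns, and a 1 ms request is clamped to 511 bytes, i.e. 408800 ns.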


 

Last Edited: Wed. Aug 3, 2016 - 07:29 PM

joeymorin wrote:
I was clearing the higher bit representing the base address of the buffer.

I read past that also.

 

I thought I was clever in my non-SPI version to stipulate the 256 byte buffer and then only increment the _L register.  But that loop is intended to be minimum cycles; this one exactly 16.

 

This thread has been fun, in that it allows us to make the AVR do tricks.  "The AVR's UART doesn't work right" gets old after a while.

 

 

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Wed. Aug 3, 2016 - 07:40 PM

Luckily there were 2 clocks to spare; now there are none.
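
The budget is easy to tally. A rough sketch, using the per-instruction cycle counts annotated in the two listings (the array and function names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Cycle counts as annotated in the listings: lds/st/ld/sts/rjmp = 2, andi/ori = 1. */
static const int old_loop[] = { 2, 2, 2, 2,  1, 1, 1, 1,  2,  2 };     /* rjmp .+0 pads 2 cycles */
static const int new_loop[] = { 2, 2, 2, 2,  1, 1, 1, 1,  1, 1,  2 };  /* two ori replace the pad */

/* Sum the cycle counts of one loop body. */
static int total_cycles(const int *cycles, size_t count)
{
    assert(cycles != NULL);
    int sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += cycles[i];
    return sum;
}

/* One byte is 8 bits shifted at F_CPU/2, i.e. 2 CPU cycles per bit. */
enum { CYCLES_PER_BYTE = 8 * 2 };
```

Both loops add up to 16 cycles, the time one byte takes to shift out; the two padding cycles of the old loop are what the two ori instructions now use.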

 

theusch wrote:

 

I thought I was clever in my non-SPI version to stipulate the 256 byte buffer and then only increment the _L register.  But that loop is intended to be minimum cycles; this one exactly 16.

 

This thread has been fun, in that it allows us to make the AVR do tricks.  "The AVR's UART doesn't work right" gets old after a while.

 

 

If there had been no slack cycles, it would have had to be done like that, but this way is quite beautiful, right at the limit of the MCU.

Last Edited: Wed. Aug 3, 2016 - 07:52 PM

Whew!

Saved by Joey.

 

I was tempted, very tempted actually, to grab my XmegaE and O'scope and tinker a bit, to see what kind of programmably variable digital signal delay I could come up with, perhaps reading a bit, or the analog comparator, +/- the DMA, to a variable sized buffer. 

 

Good thing Joey solved this as I really have other, (less fun), things to be working on at the moment.

 

I did Google around for some old fashioned Bit-Bucket-Brigade chips that have 1024 or 2048 FlipFlops and used to be used in audio reverb or telephone echo cancelation circuits, but I didn't find anything that stood out as immediately helpful with 20 - 50 uSec delays. 

 

JC

 

Edit Typo

Last Edited: Wed. Aug 3, 2016 - 08:53 PM

DocJC wrote:
+/- the DMA,

I'd also have to dig into it, but I think that port DMA would work well.  Probably [as I remember my reading] with sacrificing entire ports as in my polling example on AVR8.

 

 
