Delay a stream of pulses by a set amount? best coding approach


I have a digital ("square" wave, rise/fall time ~500 ns) pulse stream. Each pulse is close to 50% duty, and the frequency of those pulses varies from around 500 Hz to 5 kHz, i.e. from as little as 200 us to as much as 2000 us between rising edges. (The pulse stream does include significant jitter.)

 

I want to input that pulse stream and delay it by a variable amount of time, approximately between 100 us and 10 ms.  The output stream should match the input stream as closely as possible in the timing of its edges, just delayed by the appropriate amount.  (There will be glitching when the delay is changed, but that doesn't matter.)

 

So, i could use the ICP to capture and timestamp the incoming stream's edges, using a suitable ICP clock.  There would be a fixed minimum latency due to interrupt context switching etc., but that should always be less than the minimum delay (100 us).  At the longer delays, a new edge will have arrived before the last edge has been toggled out, so i will need a suitably long buffer.  The question is probably how to drive the output stream: with h/w or s/w?  The processor will be doing this task, and this task alone, and the delay amount could easily be set rapidly using one port as a binary selector (i.e. 256 steps of delay, which will be plenty).

 

Ideas anyone?  Have i missed some really simple way of doing this using all h/w resources?

 

 


What resolution can you take? ie what error, in usec, is allowed when both measuring the input pulse width and outputting the output signal.

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#2 Hardware Problem? Read AVR042.

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."


My immediate thought was something like a software version of a bucket - brigade delay line.


I envisage this as a single bit circular buffer into which the state of the input pin is written and then read out to the output pin a fixed number of sample times later.
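As a rough illustration of that idea, here is a minimal C sketch of such a circular buffer. It assumes one byte per sample for simplicity (not the packed single-bit storage), and the names `delay_line_t`, `delay_line_step` and the 1024-sample capacity are made up for this example. On a real AVR the step function would be called at a fixed sample rate with the pin state as input.

```c
#include <stdint.h>
#include <stddef.h>

#define DELAY_LINE_LEN 1024u  /* assumed capacity in samples, one byte per sample */

typedef struct {
    uint8_t  samples[DELAY_LINE_LEN];
    uint16_t write_pos;  /* slot where the next input sample is stored */
    uint16_t delay;      /* delay in sample periods, 1..DELAY_LINE_LEN-1 */
} delay_line_t;

static void delay_line_init(delay_line_t *dl, uint16_t delay)
{
    for (size_t i = 0; i < DELAY_LINE_LEN; i++) {
        dl->samples[i] = 0;
    }
    dl->write_pos = 0;
    dl->delay = delay;
}

/* Call once per sample period. Stores the current input level and
 * returns the level that was stored `delay` sample periods earlier. */
static uint8_t delay_line_step(delay_line_t *dl, uint8_t input_level)
{
    uint16_t read_pos =
        (uint16_t)((dl->write_pos + DELAY_LINE_LEN - dl->delay) % DELAY_LINE_LEN);
    uint8_t out = dl->samples[read_pos];

    dl->samples[dl->write_pos] = input_level ? 1u : 0u;
    dl->write_pos = (uint16_t)((dl->write_pos + 1u) % DELAY_LINE_LEN);
    return out;
}
```

Changing `delay` between calls just moves the read position, so the delay can be varied while running, at the cost of a glitch at the moment of change.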


Brian Fairchild wrote:
a software version of a bucket - brigade delay line

https://en.wikipedia.org/wiki/Bucket-brigade_device

 

that would be a Shift Register ... ?

 

 

EDIT

 

https://en.wikipedia.org/wiki/Digital_delay_line

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
Last Edited: Fri. May 11, 2018 - 04:20 PM

(sorry for multiple posts, editing a stream of thoughts into a post using a phone is impossible.)

In assembler this feels like a couple of dozen instructions, so about 3 us accuracy/error with a 20 MHz chip.


There have been at least a few threads over the years on this very subject.  Try to search them out.  I'm trying to think of good search terms...

e.g. https://www.avrfreaks.net/commen...

 

I mentioned there Xmega with DMA...

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Fri. May 11, 2018 - 05:16 PM

I've been googling, but the tricky bit is indeed finding the right thing to google!

 

In terms of accuracy, a few us here or there, maybe up to 20 us, is fine. It's more important that the delay is fixed and doesn't vary much over time, i.e. the delay doesn't drift long or short.


e.g. https://www.avrfreaks.net/commen...

Working code in that thread:

https://www.avrfreaks.net/comment/1950351#comment-1950351

 

Will be difficult to achieve a 10 ms delay.  The code above can get you to 255 us on a device with 1K SRAM.  a 4K device could get you over 1 ms, but 10 ms is out of reach unless the granularity is changed.  Currently the granularity is 500 ns.  To reach 10 ms on a 4K device you'd need a granularity of 5 us.

 

Plenty of room for improvement in the code, though.  See the thread.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


Seems like you could use something like a 10,000-bit shift register for a 10 ms delay:

read your pin every 1 us and put it into a bit buffer

store the bit at the beginning pointer (do not shift the data, too slow)

move the ending pointer & output the bit from the ending location

 

That seems doable with a handful of asm code.

 

A shorter buffer/wraparound will give less delay.

Is 1 us fast enough "scope" resolution?  That's 1 us out of 200 us... not great, not bad.

You might even be able to sample faster, if the code is shorter.

 

Converting the bit number to a byte location (and the reverse) is the bottleneck... store, say, 16384 bits in 2048 bytes, with the smallest fuss.

 

Shift the bit pointer by 3 spots (divide by 8) to get the byte number.  The 3 chopped bits give the bit number.

 

SaveBit:movw ZH:ZL, XH:XL  ;XH:XL has buffer pointer value (a bit index)
			
		mov YL, ZL
		andi YL, 0x07   ;low 3 bits = bit number within the byte

		lsr ZH          ;shift right 3 times (divide by 8) = byte number
		ror ZL
		
		lsr ZH
		ror ZL
		
		lsr ZH
		ror ZL

		;note: Z is only a valid SRAM address here if the pointer in X
		;already includes the buffer base (base*8 + bit index);
		;otherwise add the buffer base address to Z before the ld
		ld temp, Z ;get ram byte

		bst myBit, 0   ;copy the read bit (here, lsb) into T-reg

		cpi YL, 0 ;find which bit needs update
		breq bitnum0
		cpi YL, 1
		breq bitnum1
		cpi YL, 2
		breq bitnum2
		cpi YL, 3
		breq bitnum3
		cpi YL, 4
		breq bitnum4
		cpi YL, 5
		breq bitnum5
		cpi YL, 6
		breq bitnum6
		cpi YL, 7
		breq bitnum7


bitnum0:bld temp, 0  ;update proper bit (each label must be unique)
		rjmp did_bit
bitnum1:bld temp, 1
		rjmp did_bit
bitnum2:bld temp, 2
		rjmp did_bit
bitnum3:bld temp, 3
		rjmp did_bit
bitnum4:bld temp, 4
		rjmp did_bit
bitnum5:bld temp, 5
		rjmp did_bit
bitnum6:bld temp, 6
		rjmp did_bit
bitnum7:bld temp, 7  ;falls through to did_bit

did_bit:st Z, temp  ;save result ram byte
		ret
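
For comparison, the same bit-index arithmetic can be sketched in C. This is only an illustration of the shift-by-3 / low-3-bits split described above, not a timing-accurate replacement for the assembly. The names `bit_buf`, `save_bit`, `load_bit` and `next_bit_index` are invented here, and the 2048-byte size is taken from the suggestion above.

```c
#include <stdint.h>

#define BIT_BUF_BYTES 2048u                  /* 16384 bits in 2048 bytes */
#define BIT_BUF_BITS  (BIT_BUF_BYTES * 8u)   /* power of two, so wrap is a mask */

static uint8_t bit_buf[BIT_BUF_BYTES];

/* Store one sample at bit_index (caller keeps bit_index < BIT_BUF_BITS). */
static void save_bit(uint16_t bit_index, uint8_t level)
{
    uint16_t byte_index = bit_index >> 3;                  /* divide by 8 */
    uint8_t  mask = (uint8_t)(1u << (bit_index & 7u));     /* the 3 chopped bits */

    if (level) {
        bit_buf[byte_index] |= mask;
    } else {
        bit_buf[byte_index] &= (uint8_t)~mask;
    }
}

/* Read back the sample stored at bit_index. */
static uint8_t load_bit(uint16_t bit_index)
{
    return (uint8_t)((bit_buf[bit_index >> 3] >> (bit_index & 7u)) & 1u);
}

/* Advance a pointer by one sample, wrapping at the end of the buffer. */
static uint16_t next_bit_index(uint16_t bit_index)
{
    return (uint16_t)((bit_index + 1u) & (BIT_BUF_BITS - 1u));
}
```

Keeping the buffer length a power of two means the wraparound is a single AND rather than a compare and reset.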

 

  

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Fri. May 11, 2018 - 07:32 PM

You could use one timer to capture timestamps of transitions and put those timestamps in a circular buffer.

Use another timer to toggle an output based on the values from the circular buffer.

This can be done in hardware and thus with clock cycle accuracy.

Then use 2 interrupts to handle the input and output events.

Or use an interrupt for one of these, and do the other with polling.

If you only have a single interrupt in the whole program, then interrupt latency is pretty constant. (There is a bit of jitter because AVR instructions take different numbers of cycles, and the current instruction has to finish before the interrupt is serviced.)

 

Doing magic with a USD 7 Logic Analyser: https://www.avrfreaks.net/comment/2421756#comment-2421756

Bunch of old projects with AVR's: http://www.hoevendesign.com


With 20us accuracy you only need 500 bits of shift register. As the AVR is dedicated to this task you can use 500 bytes which will save you some processing time.

Also 20us gives you 320 cycles of processing with a 16MHz clock which is plenty of time to do a shift register.
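
The arithmetic behind those two numbers can be written down as a tiny C sketch. The helper names `samples_needed` and `cycles_per_sample` are made up for this example, and it assumes a whole number of microseconds per sample.

```c
#include <stdint.h>

/* Number of buffer slots needed to cover max_delay_us at a given
 * sample period, rounded up. */
static uint32_t samples_needed(uint32_t max_delay_us, uint32_t sample_period_us)
{
    return (max_delay_us + sample_period_us - 1u) / sample_period_us;
}

/* CPU cycles available between two samples. */
static uint32_t cycles_per_sample(uint32_t f_cpu_hz, uint32_t sample_period_us)
{
    return (uint32_t)(((uint64_t)f_cpu_hz * sample_period_us) / 1000000u);
}
```

For example, `samples_needed(10000, 20)` gives 500 slots and `cycles_per_sample(16000000, 20)` gives 320 cycles, matching the figures above. Tightening the resolution to 10 us doubles the buffer to 1000 slots and halves the budget to 160 cycles.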


joeymorin wrote:
a 4K device could get you over 1 ms, but 10 ms is out of reach unless the granularity is changed.

Use external SRAM?  A refresher pass of your code indicates extra cycles for a minimal number of wait states. 

 

Now that interest has again waxed, I did a Google search "delay line with microcontroller".

 

Patented by Microchip, 2017:   https://patents.google.com/paten...

Mega32 version on Hackaday: https://hackaday.com/2012/05/25/...

Analog in/out with Cortex M3:  http://www.homebuilthardware.com...

AT90S1200 version, circa 2000, referencing another patent: http://www.schmitzbits.de/ddl.html


dsPIC project claiming up to 4 seconds: https://electricdruid.net/diy-di...

 

...and many more


It might help to know the source of this mystery signal. If we knew the format of the data, then instead of treating it like a random bit stream you could capture the data, delay/buffer it and resend it.

 

Jim

 

 

(Possum Lodge oath) Quando omni flunkus, moritati.

"I thought growing old would take longer"

 


I'm certainly not an Assembly wizard, but i'll have a play around in C, and see what i can get working, and then maybe try and tweak the assy for a bit of speed ;-)


max_torque_2008 wrote:
I'm certainly not an Assembly wizard, but i'll have a play around in C, and see what i can get working, and then maybe try and tweak the assy for a bit of speed ;-)

??? Have you bothered to use the links that we dug out?  Have you looked at the C approaches and the discussion of timing limitations?  Have you worked through the refinements of joeymorin's code?

 

Surely in the nearly 100 posts in the previous discussion there will be pearls of wisdom.

 

Those who cannot remember the past are condemned to repeat it.

https://en.wikiquote.org/wiki/Ge...

 

* George's URL repaired. *

 


Last Edited: Sun. May 13, 2018 - 01:24 PM

I have read and digested the links!  But as i said, i'm not going to just "copy" those links because a) i don't really understand how they work and b) i wouldn't learn anything!

 

So, i'm going to play around in C, at a "low" speed, and get a feel for what i have to do to robustly achieve my goals. Then i can look at that assy, and of course compare it to the posted assy, and between them a) i'll learn something and b) i'll be better at Assy as a result  ;-)

 

IME, the first step of any new thing is the biggest / hardest. By making a first step, even if proven later to be not optimum, much is learnt, generally meaning the second attempt is much more successful.


Yesterday i wrote some code to produce a simulated pulse train, with some random (ish) jitter and the sorts of pulse frequencies i am expecting to see from the real device.  I can now use that to write some "delay" code to try to do what i need to do to process that pulse stream !


Trying to think through an approach that captures the state change, rather than the actual bits themselves, as every low->high is necessarily followed by a high->low of course (and vice versa).

 

Could have a free-running counter, incrementing at some frequency (let's say every 10 us for the sake of argument), with the loop looking for an edge on each count.  When an edge occurs, the corresponding counter value plus the offset (delay) trim value is stored into an array.  Each time the counter update function runs, we check whether the counter matches the value in the next array location, and toggle the output port if it does.

 

Doing this without interrupts in the main loop means no context switching overhead, so it could be faster than the hardware ICP (which would be faster to capture the state change, but slower to decode?).

 

This way, the buffer could be very long indeed (in terms of max edges) because we are only tracking the state change and not the signal itself?

 

The buffer management is very simple as well, with just a "load" and an "unload" pointer.  Every state change in the incoming signal loads its timestamp+offset into the buffer at the load pointer. If the unload pointer is behind the load pointer, we just wait for the next buffered clock value to occur, toggle the output port, and then move the unload pointer up one place.
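
A minimal C sketch of that load/unload scheme, under some assumptions of mine: a 16-bit tick counter, a 128-entry queue (at 5 kHz there is an edge roughly every 100 us, so about 100 edges can be pending across a 10 ms delay), and made-up names `edge_delay_t` and `edge_delay_tick`. It compares with "due time reached or passed" rather than an exact match, so a missed tick or a counter wrap does not stall the queue. There is no overflow check on the queue.

```c
#include <stdint.h>

#define EDGE_QUEUE_LEN 128u  /* assumed size; must be a power of two */

typedef struct {
    uint16_t due[EDGE_QUEUE_LEN];  /* tick at which each queued edge is replayed */
    uint8_t  load;                 /* next slot to fill on an input edge */
    uint8_t  unload;               /* next slot to replay on the output */
    uint8_t  last_input;           /* input level seen on the previous tick */
    uint8_t  output;               /* current delayed output level */
} edge_delay_t;

static void edge_delay_init(edge_delay_t *ed, uint8_t initial_level)
{
    ed->load = 0;
    ed->unload = 0;
    ed->last_input = initial_level ? 1u : 0u;
    ed->output = ed->last_input;
}

/* Call once per counter tick. Returns the delayed output level. */
static uint8_t edge_delay_tick(edge_delay_t *ed, uint16_t now,
                               uint8_t input_level, uint16_t delay_ticks)
{
    input_level = input_level ? 1u : 0u;

    /* Input edge: queue the tick at which it should appear on the output. */
    if (input_level != ed->last_input) {
        ed->last_input = input_level;
        ed->due[ed->load] = (uint16_t)(now + delay_ticks);
        ed->load = (uint8_t)((ed->load + 1u) & (EDGE_QUEUE_LEN - 1u));
    }

    /* Output: toggle when the oldest queued edge is due (wrap-safe compare). */
    if (ed->unload != ed->load &&
        (int16_t)(now - ed->due[ed->unload]) >= 0) {
        ed->output ^= 1u;
        ed->unload = (uint8_t)((ed->unload + 1u) & (EDGE_QUEUE_LEN - 1u));
    }
    return ed->output;
}
```

Note that the delay is added at capture time, so a change in `delay_ticks` only affects edges captured after the change.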

 

 

I'd like to load the delay (offset) from a parallel input of say 10bits, which would be pretty easy to do quickly as well  (a separate processor is determining the offset value in near real time, from the same input signal (it's looking for a certain pattern in that signal), and will put out the necessary offset out of a matching parallel port)

 

 

hmm, choices choices!

 

(Btw, i suspect that 10 us sampling will actually be enough, 5 us would definitely be enough, and 2 us would be overkill.)


Use input capture mode (ICP1) to capture the timer value at the exact instant of each edge.  The capture triggers on one edge polarity at a time, so you would select the falling trigger upon getting a rising edge notification & vice versa.

Then store the time difference from the previous edge.  If this difference always fits in 15 bits, you could use the msb to note whether the signal is going high or low (to ensure no loss of sync).  Upon a timer IRQ, the output pin is set high or low as needed, and the next time difference is read from the storage buffer & used to set when the next IRQ happens.
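
The 15-bit-difference-plus-level-flag encoding might look like the following C sketch. The function names are invented for illustration, and it assumes a 16-bit free-running timer whose tick is fine enough that edge-to-edge times stay below 32768 ticks (with a 1 us tick, the 100 us to 1000 us half periods mentioned in the first post fit easily).

```c
#include <stdint.h>

#define LEVEL_FLAG 0x8000u  /* msb: 1 = signal went high, 0 = went low */
#define DELTA_MASK 0x7FFFu  /* low 15 bits: ticks since the previous edge */

/* Pack one captured edge into a 16-bit buffer entry. */
static uint16_t pack_edge(uint16_t capture, uint16_t prev_capture, uint8_t new_level)
{
    /* Unsigned subtraction gives the right answer across a timer wrap. */
    uint16_t delta = (uint16_t)(capture - prev_capture);
    return (uint16_t)((delta & DELTA_MASK) | (new_level ? LEVEL_FLAG : 0u));
}

/* Ticks to wait before replaying this edge on the output. */
static uint16_t edge_delta(uint16_t packed)
{
    return (uint16_t)(packed & DELTA_MASK);
}

/* Level to drive on the output when this edge is replayed. */
static uint8_t edge_level(uint16_t packed)
{
    return (packed & LEVEL_FLAG) ? 1u : 0u;
}
```

Storing the level explicitly means a single missed edge shows up as one wrong interval instead of inverting the output from then on.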

 


max_torque_2008 wrote:

I'd like to load the delay (offset) from a parallel input of say 10bits, which would be pretty easy to do quickly as well  (a separate processor is determining the offset value in near real time, from the same input signal (it's looking for a certain pattern in that signal), and will put out the necessary offset out of a matching parallel port)

 

If you want to vary that 10b offset 'live', i.e. during an active pulse session, that makes the edge-capture approach very tricky.

You can easily adjust a ring-buffer playback pointer 'live', but edge capture is queued in a dT manner, so there is no identifiable time related offset.

 

If you need 10ms delay and 10us sampling, that's just 1000 binary points, which is sounding modest 

 

if you do need to pack into bytes, there would be 2 possible ways : adjacent times in adjacent bits within a byte, or adjacent times in same-bits-next-byte. 

In the second case, you scan the array 8 times, shifting a one-bit mask along by 1 position after each pass.
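
Here is how i read that second layout, as a small C sketch (fixed delay only, to keep it short). The names and the 125-byte size are my own assumptions: 125 bytes times 8 passes gives 1000 samples, i.e. 10 ms at 10 us sampling. The inner work per sample is one AND/OR with a mask that only changes once per pass, with no per-sample shifting.

```c
#include <stdint.h>

#define PLANE_BYTES 125u  /* assumed: 125 bytes x 8 bit positions = 1000 samples */

typedef struct {
    uint8_t buf[PLANE_BYTES];
    uint8_t index;  /* byte position within the current pass */
    uint8_t mask;   /* bit position used for the whole current pass */
} plane_line_t;

static void plane_line_init(plane_line_t *pl)
{
    for (uint8_t i = 0; i < PLANE_BYTES; i++) {
        pl->buf[i] = 0;
    }
    pl->index = 0;
    pl->mask = 1u;
}

/* Call once per sample period. Stores the new sample in the current
 * slot and returns the sample it replaces, which is PLANE_BYTES * 8
 * sample periods old. */
static uint8_t plane_line_step(plane_line_t *pl, uint8_t input_level)
{
    uint8_t *p = &pl->buf[pl->index];
    uint8_t old = (*p & pl->mask) ? 1u : 0u;

    if (input_level) {
        *p |= pl->mask;
    } else {
        *p &= (uint8_t)~pl->mask;
    }

    /* Adjacent samples go to the same bit of the next byte; after a
     * full pass over the array, move on to the next bit position. */
    if (++pl->index == PLANE_BYTES) {
        pl->index = 0;
        pl->mask = (uint8_t)(pl->mask << 1);
        if (pl->mask == 0) {
            pl->mask = 1u;
        }
    }
    return old;
}
```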

 


theusch wrote:

...

Now that interest has again waxed, I did a Google search "delay line with microcontroller".

...and many more

 

And some 2018 devices that might be suitable ...

I see ISSI now have 512k/1M/2M/4Mbit QuadSPI SRAM in SO8, up to 45MHz.

 

In SQI modes, those can read 8b in 12 clocks and 16b in 14 clks, and write 8b in 10 clks and 16b in 12 clks, etc.

With a 10 us sampling rate, 16 samples span 160 us, so one 16b read plus one 16b write is needed every 160 us on average; at 5 us sampling that becomes every 80 us, etc.

Allow say 30 clk slots for CS+RD+WR, and even at 5 us sampling that is an average budget of about 2.66 us per clock to 'keep up', which should be quite doable...

The on-chip buffering here is quite small, 1-2 or a few bytes.