[Solved] ATMEGA128 timing problem


Hi freaks,

 

I have an interesting problem:

I'm using an ATmega128 to drive the single-wire protocol of WS2812B LED strips.

It uses a simple pulse-length protocol that fits perfectly into 20 clock cycles @ 16MHz.

I have two routines to program the LEDs:

1. A RAM-based buffer is sent to the strip with an inline assembler routine: works perfectly!

2. A program-flash-based buffer is sent to the strip with lpm Rxx, Z+: this one has timing jitter problems!?

   I designed the assembler routine to fit into 20 cycles, assuming that lpm always takes 3 cycles. Sometimes the transmitted patterns have disturbed timing!?

   Copying the data from flash into RAM and sending it via the RAM routine also works perfectly.

 

Conclusion: To my surprise, the timing of the lpm instruction does not seem to be constant (three cycles)...sometimes it changes...

This is very bad for sending data directly from flash...I have to copy it to RAM and send it from there, which unfortunately costs time that I'd like to save!?

 

Questions:

Is it really true that reading the program flash does not have RISC-style constant timing???

...I have some doubts about this explanation of my problem...but it is the only one I see at the moment!

 

Any other idea to explain the strange behavior is very welcome.

 

...have fun...

Michi

This topic has a solution.
Last Edited: Wed. Oct 1, 2014 - 08:00 AM

According to the datasheet, all the LPM instructions take 3 cycles.

Before concluding that sometimes they do not, I'd write some code specifically to test that notion.

If you cannot get LPM to work, you might try LDI.

I can think of ways that might make it convenient at the source level.

ICALL, LDI, RET is 8 cycles and four bytes of storage per byte of data.

Don't know how fast you have to load data.

International Theophysical Year seems to have been forgotten..
Anyone remember the song Jukebox Band?


Thx for your response:

lpm works fine; my only trouble is that I have a sequence generator which depends on exact timing.

The almost identical generator works perfectly when accessing RAM using "ld":

  asm volatile(
	       "bitLoop:"              "\n\t" // Clk Pseudocode     (T = 0)
///////////////////////////////////////////////////////////////////////////////
	       "out %[port], %[hi]"    "\n\t" // 1   High           (T = 1)
	       "rjmp .+0"              "\n\t" // 2   nop nop        (T = 3)
	       "rjmp .+0"              "\n\t" // 2   nop nop        (T = 5)
	       "nop"                   "\n\t" // 1   nop            (T = 6)
///////////////////////////////////////////////////////////////////////////////
	       "out %[port], %[data]"  "\n\t" // 1   Data           (T = 7)
	       "rjmp .+0"              "\n\t" // 2   nop nop        (T = 9)
	       "rjmp .+0"              "\n\t" // 2   nop nop        (T = 11)
	       "rjmp .+0"              "\n\t" // 2   nop nop        (T = 13)
///////////////////////////////////////////////////////////////////////////////
	       "out %[port], %[lo]"    "\n\t" // 1   Low            (T = 14)
	       "nop"                   "\n\t" // 1   nop            (T = 15)
	       "ld %[data] , %a[ptr]+" "\n\t" // 2   b = *ptr++     (T = 16)
	       "sbiw %[count], 1"      "\n\t" // 2   n--            (T = 18)
	       "brne bitLoop"          "\n"   // 2   if(i != 0) ->  (next data)
///////////////////////////////////////////////////////////////////////////////
	       : [data]  "+r" (data),
		 [count] "+w" (n)
	       : [port]  "I"  (port),
		 [ptr]   "e"  (ptr),
		 [hi]    "r"  (hi),
		 [lo]    "r"  (lo)
	       );

When accessing flash using lpm it works too, but in some cases it fails reproducibly:

  asm volatile(
	       "bitLoopP:"               "\n\t" //                    (T = 0)
///////////////////////////////////////////////////////////////////////////////
	       "out %[port], %[hi]"      "\n\t" // 1   High           (T = 1)
	       "rjmp .+0"                "\n\t" // 2   nop nop        (T = 3)
	       "rjmp .+0"                "\n\t" // 2   nop nop        (T = 5)
	       "nop"                     "\n\t" // 1   nop            (T = 6)
///////////////////////////////////////////////////////////////////////////////
	       "out %[port], %[data]"    "\n\t" // 1   Data           (T = 7)
	       "rjmp .+0"                "\n\t" // 2   nop nop        (T = 9)
	       "rjmp .+0"                "\n\t" // 2   nop nop        (T = 11)
	       "rjmp .+0"                "\n\t" // 2   nop nop        (T = 13)
///////////////////////////////////////////////////////////////////////////////
	       "out %[port], %[lo]"      "\n\t" // 1   Low            (T = 14)
	       "lpm %[data] , %a[ptr]+"  "\n\t" // 3   b = *ptr++     (T = 15)
	       "sbiw %[count], 1"        "\n\t" // 2   n--            (T = 18)
	       "brne bitLoopP"           "\n"   // 2   if(i != 0) ->  (next data)
///////////////////////////////////////////////////////////////////////////////
	       : [data]   "+r" (data),
		 [count]  "+w" (n)
	       : [port]   "I"  (port),
		 [ptr]    "e"  (ptr),
		 [hi]     "r"  (hi),
		 [lo]     "r"  (lo)
	       );

The code is almost identical, except for replacing

nop
ld %[data] , %a[ptr]+

by

lpm %[data] , %a[ptr]+

I removed one delay "nop" because "lpm" takes 3 cycles instead of the 2 cycles of "ld".

Mostly everything is fine, except that in some rare cases the communication is reproducibly disturbed, which makes me believe that "lpm" has variable timing!?

 

...have fun...

Michi

Last Edited: Mon. Sep 29, 2014 - 03:31 PM

I'm not familiar with the magic incantations of that toolchain.  Are you sure that "ptr" is valid in all cases?

 

Is the data always located in the lower half of flash?  (I.e., no ELPM considerations?)

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Yes ptr is definitely valid...the bits following the problematic ones are working correctly!

It is always a defined part of a long bit sequence which fails. Fortunately it is pretty deterministic :)

%a[ptr]+ is correctly replaced with "Z+" by the inline assembler.

I will make some timing measurements with a logic analyzer later...

 

...have fun...

 

Michi

Last Edited: Mon. Sep 29, 2014 - 03:56 PM

ptr should be an input-output operand, because the post-increment in the loop modifies it.

You might have gotten lucky or it might have disrupted something later.

For LPM, the Z register is the only pointer register pair that works.

You seem to have gotten lucky in that regard.

Also, your cycle-counting is wrong.

The LPM version is 21 cycles.

14+3 != 15

15+2 != 18
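
The tally can be checked mechanically. Here is a minimal sketch in plain C (meant to run on a PC, not on the AVR), using the cycle counts from the AVR instruction set manual: out 1, nop 1, rjmp 2, ld with post-increment 2, lpm 3, sbiw 2, taken brne 2. The names loop_cycles, ram_loop, flash_loop and period_ns are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Cycle cost of each instruction in one pass of the loop, in program order.
 * brne is counted as taken (2 cycles), which is the case for every pass
 * except the last one. */
static const unsigned ram_loop[] = {
    1, 2, 2, 1,   /* out hi, rjmp, rjmp, nop   */
    1, 2, 2, 2,   /* out data, rjmp, rjmp, rjmp */
    1, 1, 2,      /* out lo, nop, ld ptr+      */
    2, 2          /* sbiw, brne                */
};

static const unsigned flash_loop[] = {
    1, 2, 2, 1,   /* out hi, rjmp, rjmp, nop   */
    1, 2, 2, 2,   /* out data, rjmp, rjmp, rjmp */
    1, 3,         /* out lo, lpm Z+            */
    2, 2          /* sbiw, brne                */
};

/* Sum the per-instruction costs of one loop pass. */
static unsigned loop_cycles(const unsigned *costs, size_t n)
{
    assert(costs != NULL);
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += costs[i];
    return total;
}

static unsigned ram_loop_cycles(void)
{
    return loop_cycles(ram_loop, COUNT(ram_loop));
}

static unsigned flash_loop_cycles(void)
{
    return loop_cycles(flash_loop, COUNT(flash_loop));
}

/* Duration of one loop pass in nanoseconds at a CPU clock given in MHz. */
static double period_ns(unsigned cycles, double f_mhz)
{
    return cycles * 1000.0 / f_mhz;
}
```

Both loops come out at 21 cycles, which is 1312.5 ns per pass at 16 MHz instead of the intended 1250 ns. Removing one cycle of padding (a nop, or turning one rjmp .+0 into a nop) would bring either loop back to 20.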

 

International Theophysical Year seems to have been forgotten..
Anyone remember the song Jukebox Band?


Hi skeeve,

 

you are right on both points:

1. LPM and Z: the inline assembler did use the correct Z register...I'm not very familiar with avr-gcc inline assembly...what do you propose, %z[ptr]+?

2. You are absolutely right, I miscalculated the timing of !both! versions...I will fix that and give it a second try.

 

...have fun...

Michi

 

 


I think the constraint should be "+z" in the output section of the constraints.

Its reference can remain the same.

The a in %a[ptr]+ makes the operand print as the name of the pointer register (X, Y, or Z), which is the form ld and lpm expect; %A and %B, by contrast, select the low and high byte of a register group.

I'm not sure why you are using inline assembler.

A subroutine would seem adequate.

International Theophysical Year seems to have been forgotten..
Anyone remember the song Jukebox Band?

Last Edited: Mon. Sep 29, 2014 - 06:31 PM
This reply has been marked as the solution. 

I've found the problem!

It is not the LPM timing...it works exactly as specified.

 

I pre-initialized the data variable in C, but I didn't use the proper pgm_read_byte() macro, so the first byte was read from RAM instead of flash...dumbass!

One short look at the logic analyzer output clearly showed a wrong leading bit, caused by the faulty pre-initialization in C.
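
Why a bad pre-initialization only shows up at the very start follows from the loop's data flow: each pass outputs the byte already held in data and only then fetches the next one. Here is a minimal host-side model in plain C, assuming the caller preloads the first byte and hands over the rest of the pattern. send_model and its parameters are hypothetical names, not the actual routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of the send loop's data flow (no timing, no port I/O).
 * preload : value placed in `data` before the loop; on the AVR this has to
 *           come from pgm_read_byte(), not from a plain pointer dereference
 * rest    : bytes 1..n-1 of the pattern, fetched inside the loop
 * n       : total number of bytes to send, including the preloaded one
 * out     : receives the n values that would be written to the port */
static void send_model(uint8_t preload, const uint8_t *rest, size_t n,
                       uint8_t *out)
{
    assert(rest != NULL && out != NULL);
    uint8_t data = preload;
    for (size_t i = 0; i < n; i++) {
        out[i] = data;          /* corresponds to: out %[port], %[data] */
        if (i + 1 < n)
            data = rest[i];     /* corresponds to: lpm %[data], Z+      */
    }
}
```

With a correct preload the output equals the pattern. With a wrong one (for example whatever happens to sit in RAM at the same numeric address) only the first slot differs, which fits a disturbance at the leading position while all following data is fine.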

 

The problem is always sitting in front of the monitor.

 

Everything is now working as expected!

Thx for your support...

especially @skeeve.

    I will try to find the correct tag for the Z register now. 

    Maybe you are right that switching over to pure assembler would be better...I will think about that later...meanwhile the inline assembler works.

 

...have fun...

Michi


but i didn't use the proper pgm_read_byte() macro...dumbass!

Maybe time to look at ditching the old PROGMEM and switching to the new __flash? (it doesn't need pgm_read_byte() any more which makes writing code more "natural").


Yes...sure...

...but I solved the problem by rewriting the complete function(s) in pure assembler, to even get rid of that ugly inline assembler stuff.

 

Thx for your help!

 

...have fun...

Michi

 


After a peek at the datasheet of the device, indeed this is a case for cycle-counting.

 

As this is a one-bit-wide serial stream, it appears you are burning a complete I/O port, as well as only using one bit of each "data" byte.  So for one value transfer, you need to burn 24 bytes of storage?  Interesting.

 

The venerable '128 doesn't have the bit-toggle feature, which could be used to advantage here.  It still would appear to me that there are enough NOPs in the 20 cycles to do some kind of bit shift, and/or constant-time conditional logic.  All three bytes could be "prepared" during the RES time (assuming only one target device).
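
One way to read the shift suggestion, as a sketch only: keep the real colour bytes in memory and derive the port value for each bit with a shift and a mask, instead of storing one byte per bit. The plain C below models the branch-free selection, assuming most-significant-bit-first order. expand_byte, idle and pin_mask are assumed names, and fitting the equivalent instructions into the 20-cycle budget is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand one colour byte into the eight port values for the data phase,
 * most significant bit first, without a data-dependent branch.
 * idle     : port value with the strip pin low
 * pin_mask : bit of the port that drives the strip */
static void expand_byte(uint8_t byte, uint8_t idle, uint8_t pin_mask,
                        uint8_t out[8])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        /* 0xFF when the top bit is set, 0x00 otherwise */
        uint8_t select = (uint8_t)-(uint8_t)(byte >> 7);
        out[i] = (uint8_t)(idle | (select & pin_mask));
        byte = (uint8_t)(byte << 1);
    }
}
```

Every pass does the same work regardless of the bit value, which is what keeps the timing constant.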

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


You are a good reviewer ;)

 

I use 4 or 8 strips for quadcopter or octocopter light effects (each strip with 10 or 11 LEDs).

There are two options:

1. Sending all the data serially, 4/8 riggers (arms) with 10/11 LEDs each. Serial mode: 8 * 11 LEDs * 3 bytes per LED = 264 bytes or 2112 bits = 2112 cycles @ 1.25us

2. Sending the data in parallel on up to eight channels, one per microcopter rigger. Parallel mode: 8 * 11 LEDs * 3 bytes per LED = 264 bytes, one port byte per cycle = 264 cycles @ 1.25us
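
The two totals work out as follows. This is a small sketch in plain C, taking the 1.25 us slot and the 8 x 11 LED layout from the list above; the function names are made up for the example.

```c
#include <assert.h>

#define SLOT_NS 1250u   /* one bit slot: 20 cycles at 16 MHz */

/* Serial mode: every bit of every LED on every strip needs its own slot. */
static unsigned long serial_slots(unsigned strips, unsigned leds)
{
    assert(strips > 0 && leds > 0);
    return (unsigned long)strips * leds * 3u * 8u;
}

/* Parallel mode: one port write carries one bit for each strip at once,
 * so the slot count depends only on the length of a single strip. */
static unsigned long parallel_slots(unsigned leds)
{
    assert(leds > 0);
    return (unsigned long)leds * 3u * 8u;
}

/* Convert a slot count to microseconds of transmission time. */
static unsigned long slots_to_us(unsigned long slots)
{
    return slots * SLOT_NS / 1000u;
}
```

For 8 strips of 11 LEDs this gives 2112 slots (2640 us) in serial mode against 264 slots (330 us) in parallel mode, a factor of eight.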

 

To reduce the protocol overhead I transmit 4 or 8 bits in parallel over the output port byte, serving up to eight channels (riggers). The data is preprocessed into this format and is either stored statically in memory, generated dynamically, or a combination of both.

Doing that, I reduce the transmission time so that I have a lot of time left to process the data...transmission takes about a 3% timeslot, which would increase to 24% for 8 channels in pure serial mode.

It seems a bit strange, but we calculated several processing schemes and found no advantage in organizing the data differently, only a big penalty from communication overhead.
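
The preprocessing into the parallel format amounts to a bit transposition. Here is a minimal sketch in plain C, assuming most-significant-bit-first transmission and that channel c is wired to bit c of the port. transpose8 and the buffer layout are hypothetical, not the format actually used in the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNELS 8

/* Convert per-channel byte streams into port bytes.
 * in  : in[c][k] is byte k of channel c (n bytes per channel)
 * out : 8*n port bytes; bit c of out[8*k + b] is bit (7-b) of in[c][k] */
static void transpose8(const uint8_t *const *in, size_t n, uint8_t *out)
{
    assert(in != NULL && out != NULL);
    for (size_t k = 0; k < n; k++) {
        for (int b = 0; b < 8; b++) {
            uint8_t port = 0;
            for (int c = 0; c < CHANNELS; c++) {
                if (in[c][k] & (uint8_t)(0x80u >> b))
                    port |= (uint8_t)(1u << c);
            }
            out[8 * k + (size_t)b] = port;
        }
    }
}
```

With 11 LEDs per strip, n is 33 and the result is 264 port bytes, one per 1.25 us slot.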

 

...have fun...

Michi

 


Have you had a look at this?

http://www.avrfreaks.net/forum/q...

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"Read a lot.  Write a lot."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


 

Not yet...thx for the link.

Meanwhile my configuration is running perfectly and I can take care of the next open topics:

1. PWM decoder as control input from a servo signal

2. Pattern generator to easily configure fancy light effects.

 

...have fun...

Michi

Michi01 wrote:

1. PWM decoder as control input from a servo signal

Configure a timer (probably a 16-bit timer, but an 8-bit is also possible) for input capture mode, with interrupt on capture.

Configure IC edge to rising (assuming normal RC servo signal polarity).

On the rising edge, the timer value will be captured, and an interrupt generated.

In the ISR, save off the captured value and switch the IC edge to falling.

On the falling edge, timer value will be captured and an interrupt generated.

In the ISR, save off the captured value and switch IC edge to rising.  Your pulse width is 2nd_val - 1st_val (in timer tick units).

 

This works as long as your timer period (natural count, 256 or 65536, times prescale, divided by input clock) is longer than the longest servo pulse, typically a little over 2ms.
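
The two-edge scheme can be modelled as a small state machine. The sketch below is plain C so it can be tried on a PC. On the ATmega128 the body of on_capture would sit in the Timer1 capture ISR, take its value from ICR1, and flip the ICES1 bit in TCCR1B where the comments say so. The struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     waiting_for_fall; /* false: next capture is the rising edge  */
    uint16_t rise;             /* timer value captured at the rising edge */
    uint16_t width;            /* last measured pulse width in timer ticks */
} servo_capture;

/* Feed one captured timer value; returns true when a new width is ready. */
static bool on_capture(servo_capture *s, uint16_t captured)
{
    assert(s != NULL);
    if (!s->waiting_for_fall) {
        s->rise = captured;
        s->waiting_for_fall = true;   /* AVR: select the falling edge here */
        return false;
    }
    /* Unsigned 16-bit subtraction stays correct across one timer overflow. */
    s->width = (uint16_t)(captured - s->rise);
    s->waiting_for_fall = false;      /* AVR: select the rising edge here */
    return true;
}

/* Convert timer ticks to microseconds for a given CPU clock and prescaler. */
static uint32_t ticks_to_us(uint16_t ticks, uint32_t f_cpu_hz,
                            uint16_t prescale)
{
    assert(f_cpu_hz > 0);
    return (uint32_t)(((uint64_t)ticks * prescale * 1000000u) / f_cpu_hz);
}
```

With a 16 MHz clock and a prescaler of 8, one tick is 0.5 us and the 16-bit timer wraps every 32.768 ms, comfortably longer than a 2 ms servo pulse.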

 

 


Thx for your advice:

You described exactly what I have in mind:

The input servo signal is connected to the IC1 pin and I plan to use the 16-bit timer1 for this purpose.

Well ... my ideas were a little different from your proposal: as far as I understand the manual, there is a pulse width capture mode which automatically stores the width in a timer register...!?

As soon as I have the time, I will do some studies and experiments on that...

...but we are a bit off topic with regard to the title of this thread...maybe I should close this one, and as soon as I run into problems (hopefully not :) I will open a new one...

 

...have fun...

Michi

Last Edited: Wed. Oct 1, 2014 - 07:59 AM