atmega1284p UART Rx glitch

I have been struggling with UART comm problems on an ATmega1284p (5V, 20 MHz crystal, date code 0805). The symptom looks like code runaway; control will jump into my bootloader at random times. The bootloader always displays the cause for entry by analyzing MCUSR; when the problem appears, MCUSR flag bits are all cleared.
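For readers who have not done this kind of reset-cause check, here is a rough sketch of the idea. It is not my actual bootloader code: the helper name `reset_cause` is made up for illustration, and the bit positions are the ones the ATmega1284P data sheet lists for MCUSR. On the target you would pass in a copy of MCUSR taken early in startup, then clear the register.

```c
#include <stdint.h>

/* MCUSR bit positions as listed in the ATmega1284P data sheet. */
#define BIT_PORF   0    /* power-on reset  */
#define BIT_EXTRF  1    /* external reset  */
#define BIT_BORF   2    /* brown-out reset */
#define BIT_WDRF   3    /* watchdog reset  */
#define BIT_JTRF   4    /* JTAG reset      */

/* Hypothetical helper: describe a saved copy of MCUSR.
   If several flags are set, only the first match below is reported. */
const char *reset_cause(uint8_t mcusr)
{
	if (mcusr & (1u << BIT_PORF))   return "power-on";
	if (mcusr & (1u << BIT_EXTRF))  return "external";
	if (mcusr & (1u << BIT_BORF))   return "brown-out";
	if (mcusr & (1u << BIT_WDRF))   return "watchdog";
	if (mcusr & (1u << BIT_JTRF))   return "JTAG";
	return "none";    /* no reset source recorded */
}
```

A value of 0 means no hardware reset source has been latched since the flags were last cleared, which is why an all-zero MCUSR points toward the code running into the bootloader rather than a genuine reset.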

I can force the problem to occur by:

1. Setting up UART0 (57600, 8N1), Tx ONLY is enabled (Rx and UART interrupts are disabled).

2. Writing a known pattern to a large array in RAM.

3. Disabling interrupts [cli();].

4. Sitting in a loop that tests all cells of the array for the correct value and displays any errors detected. The number of times through the loop is displayed on the UART (using Hyperterm), so I see a constantly incrementing count to 0xffff, then a wrap back to 0.
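The fill-and-verify part of steps 2 and 4 can be sketched roughly as below. This is only an illustration under assumptions: the array size, the pattern, and the names `fill_pattern` and `count_errors` are all made up here, and the UART setup and printing are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE  4096u    /* assumed size; the test only needs a "large" array */

/* volatile so every check really reads the cell back from RAM */
static volatile uint8_t  test_array[ARRAY_SIZE];

/* Assumed pattern: low byte of the index XORed with 0xA5. */
static uint8_t  pattern_at(size_t i)
{
	return (uint8_t)(i ^ 0xA5u);
}

/* Step 2: write the known pattern to every cell. */
void  fill_pattern(void)
{
	for (size_t i = 0; i < ARRAY_SIZE; i++)
		test_array[i] = pattern_at(i);
}

/* Step 4: one pass over the array, returning the number of bad cells. */
uint16_t  count_errors(void)
{
	uint16_t  errors = 0;

	for (size_t i = 0; i < ARRAY_SIZE; i++)
		if (test_array[i] != pattern_at(i))
			errors++;
	return errors;
}
```

The main loop would call `fill_pattern()` once, then call `count_errors()` repeatedly and print the pass count and the error count after each pass.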

Here's the problem: If I let the above program run with no interaction on the UART, the program works perfectly. The loop counter increments, then rolls, forever (or so), and the error counter is always 0.

However, if I hold down a key on the keyboard and force Hyperterm to send a stream of chars to the UART, the program will consistently crash back to the bootloader, which reports MCUSR as 0. The timing of the crash is not predictable, but it always happens, generally within about 30 seconds.

This behavior happens on two different '1284p devices, both with date codes of 0805.

The only reference to something like this that I can find on the web is here: http://uzebox.org/forums/viewtop...

Has anyone else seen this? If so, how did you fix it?

klaxon44

Quote:

However, if I hold down a key on the keyboard and force Hyperterm to send a stream of chars to the UART, the program will consistently crash back to the bootloader, which reports MCUSR as 0. The timing of the crash is not predictable, but it always happens, generally within about 30 seconds.


Strange, indeed, since RX isn't enabled. If it were, and the data were being buffered somewhere, then the buffer would overflow at some point and weird things could happen.

C program, or ASM? A routine that fusses with the stack and leaves it unbalanced...

Try this: dump your stack pointer(s) each pass, along with the UCSR0A.
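Something along these lines would do for the dump. It is a sketch only: `format_status` is a made-up name, it assumes avr-libc (where `SP` and `UCSR0A` come from `<avr/io.h>`), and the host branch just substitutes fixed stand-in values so the formatting can be tried off-target.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#ifdef __AVR__
#include <avr/io.h>
#define READ_SP()      ((uint16_t)SP)
#define READ_UCSR0A()  ((uint8_t)UCSR0A)
#else
/* Host stand-ins (assumed values) so the formatter can be tested on a PC. */
#define READ_SP()      ((uint16_t)0x40FFu)
#define READ_UCSR0A()  ((uint8_t)0x20u)
#endif

/* Hypothetical helper: build one status line per pass of the test loop.
   Returns the number of characters snprintf wanted to write. */
int  format_status(char *buf, size_t len, uint16_t pass)
{
	return snprintf(buf, len, "pass=%04X SP=%04X UCSR0A=%02X",
	                (unsigned)pass,
	                (unsigned)READ_SP(),
	                (unsigned)READ_UCSR0A());
}
```

If SP drifts from pass to pass, the stack is unbalanced; if it stays put right up to the crash, look elsewhere.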

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

It seems the PC's TX is causing problems somehow. Maybe it is somehow getting on another pin. I assume you have a level converter on board. Maybe that is causing problems. Maybe your power supply is too weak to handle the level converter.

RX and TX are easy to get confused, especially if you use a Maxim converter or one of its clones.

Here's another datum...

I changed the crystal to 16 MHz and rebuilt the test code for that frequency. The problem has not recurred after 35K+ loops, which is at least 5x farther than it ever made it before.

I will continue testing, but it looks like the Uzebox group might be correct; there appears to be a flaw in the 2008 silicon for the '1284p that introduces glitches on the internal data bus when there is activity on the UART Rx pin *AND* the device is "overclocked" (where overclocked = 20 MHz in this case).

Note that the bootloader seems to behave properly when downloading, though the files are very small. Even so, I have seen very rare verification failures when downloading and was never able to reproduce them, so those failures may well have been an artifact of the above flaw.

klaxon44

Try making another, shorter loop and test whether that also crashes.

At 20 MHz and 57600 baud you get a -1.4% baud-rate error, and at 16 MHz a +2.1% error. (It's all there in the data sheet.)

Try an 18.4320 MHz crystal.
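If you want to check those figures yourself, the arithmetic looks roughly like this. It is a sketch assuming normal-speed asynchronous mode (U2X = 0), where the data sheet gives baud = f_osc / (16 * (UBRR + 1)); the function names are made up for illustration.

```c
#include <stdint.h>

/* Nearest UBRR value for normal-speed async mode (U2X = 0):
   UBRR = f_osc / (16 * baud) - 1, rounded to the nearest integer. */
uint32_t  ubrr_for(uint32_t f_osc, uint32_t baud)
{
	return (f_osc + 8UL * baud) / (16UL * baud) - 1UL;
}

/* Baud rate the USART actually produces for a given UBRR value. */
double  actual_baud(uint32_t f_osc, uint32_t ubrr)
{
	return (double)f_osc / (16.0 * (double)(ubrr + 1UL));
}

/* Error in percent relative to the requested baud rate. */
double  baud_error_pct(uint32_t f_osc, uint32_t baud)
{
	double  actual = actual_baud(f_osc, ubrr_for(f_osc, baud));

	return (actual / (double)baud - 1.0) * 100.0;
}
```

For 57600 baud this gives UBRR = 21 at 20 MHz (about -1.4%), UBRR = 16 at 16 MHz (about +2.1%), and UBRR = 19 at 18.432 MHz, which divides exactly and so has no error.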

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

I've finished my testing and have confirmed to my satisfaction that there is a flaw in the 05/2008 silicon. Here is a program that demonstrates the flaw:

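#include <avr/io.h>
#include <stdint.h>


#define  TESTPIN		0		/* PB0 drives the 'scope probe or LED */


void			toggle(void);


int  main(void)
{
	volatile  uint16_t			n;		/* volatile, so the delay counter is kept in RAM */

	DDRB = DDRB | (1<<TESTPIN);			/* make PB0 an output */

	while (1)
	{
		for (n=0; n<20000; n++)  ;		/* crude software delay */
		toggle();
	}
}


/* Flip the test pin; a clean run gives a steady square wave on PB0. */
void  toggle(void)
{
	PORTB = PORTB ^ (1<<TESTPIN);
}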

While the program is running, monitor PB0 with an oscilloscope or hook up an LED and watch it blink (the 'scope gives a better picture). :-)

If I run this program with a 16 MHz crystal, the trace is perfect, whether I inject serial data on RX0 or not.

If I run this program with a 20 MHz crystal, the trace is perfect so long as I do not inject serial data on RX0. When I inject serial data on RX0, the trace shows immediate jitter and the high and low durations become randomly irregular. This effect is immediate and is clearly caused solely by presence of serial data on RX0.

If anyone can run this test on their own '1284p hardware and report back, I would like to hear from you. Please include the date code on your chip.

Tomorrow, I will contact the local Atmel FAE and see if Atmel can shed any light on this.

In the meantime, I'm running the '1284p at 16 MHz and looking forward to new silicon and a 20 MHz crystal. :-)

klaxon44

You don't say what your clock fuses are set to, but if they are not set for the full-swing crystal oscillator at 20 MHz, you will get weird problems on the USART.

Could you make a simple test for me: add a series resistor of 1k in front of the RX pin, and add a small capacitor (100 pF or so) between the RX pin and GND (so you build a low-pass filter in front of the RX pin). Does that change the behavior?

ossi,

Brilliant! I tried your suggestion (using a 220 pF cap, as I had no 100 pF in my junkbox). The jitter on the pulse output in the test program above disappeared, as near as I can tell with my 'scope. (It's a very old analog 'scope, no math functions, no storage.)

I then reloaded my bootloader and ran some download tests, to ensure the RC circuit didn't muck up the exchange of RS-232 data. Downloads at 57.6K appear to work as usual, with no verification errors.
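A quick back-of-the-envelope check shows why the filter should not hurt the data at this rate. The sketch below assumes the suggested 1k series resistor together with the 220 pF cap, and treats the network as an ideal first-order low-pass (source impedance and pin capacitance ignored); the function names are made up for illustration.

```c
#define PI_APPROX  3.14159265358979323846

/* Time constant of an RC low-pass in seconds: tau = R * C. */
double  rc_tau(double r_ohm, double c_farad)
{
	return r_ohm * c_farad;
}

/* -3 dB corner frequency in Hz: f_c = 1 / (2 * pi * R * C). */
double  rc_cutoff_hz(double r_ohm, double c_farad)
{
	return 1.0 / (2.0 * PI_APPROX * r_ohm * c_farad);
}

/* Duration of one UART bit in seconds. */
double  bit_time(double baud)
{
	return 1.0 / baud;
}
```

With 1k and 220 pF the time constant is 220 ns and the corner is roughly 723 kHz, while one bit at 57600 baud lasts about 17.4 us. The edges are slowed by a small fraction of a bit time, which fits with the downloads still working.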

I then downloaded the loop test, described in an earlier post. I put a weight on the keyboard's space bar so Hyperterm would flood the RS-232 port with ASCII spaces. So far, the program has run about a million loops without failing, a result that previously I could only get at 16 MHz.

Thank you for your suggestion! I'm basically a firmware guy and turned to the freaks hoping someone could provide a simple fix using electronics.

I will continue testing and report back if I spot anything out of the ordinary.

Thanks again!

klaxon44

I have bad news for you: when reading your thread it took some time, but then I remembered that I had observed something similar. The 100 pF trick seems to help only sometimes; in other cases it doesn't. I will try to find the thread I opened at the time. It was about a Tiny2313. The behaviour was really crazy. The crashes seemed to appear and disappear depending on many causes. I got no real clue.

http://www.avrfreaks.net/index.p...

I read through the thread you posted, and I'm not so sure the two cases are related.

In my case, the crash back to the bootloader never appeared as a BOD; MCUSR was always reported by the bootloader as 0 (no known cause). The bootloader always zeroes MCUSR before handing off to the application; if the bootloader restarts later for any reason, it always reports the cause.

Secondly, when I was running the pulse test and injecting data on the RX0 serial line, the high and low pulse lengths were shortened, but I could not tell that the device reset itself. (This last point is a bit tenuous, as I had nothing in the code that would lock on reset, so I don't know for sure that the device wasn't resetting.)

My hypothesis here is that this is a picopower issue, similar to that described in the Uzebox thread I mentioned above. In that case, transitions on the RX0 data line seem to induce glitches on the internal RAM data lines, causing bad fetches from RAM. This in turn would cause the pulses I was seeing to be misshapen (the timing variable was misread when fetched from RAM).

This would also cause the program to reenter the bootloader (a return address is misread when pulled from the stack and the CPU takes off into the tall weeds, eventually hitting the bootloader entry point at 0xf000).

I finally halted my previous test after literally millions of loops without a single failure. I have added your RC network to both RX0 and RX1 and am now on to the next phase of the design.

If further testing turns up any problems, I'll get back to you. I will also let you know if I turn up anything with Atmel tomorrow morning. (Does anyone from Atmel read this list? Sure seems like they should...)

klaxon44

I will be back after a four-day trip.