ATmega1284 Misbehaves at Low Temperature


We have a simple Controller built around an ATmega1284 microcontroller. We have been building these for years, and never had any problem (with the micros, at least).

Last week we encountered two boards that failed the routine production test. Both boards showed odd misbehaviour by the software. Sometimes the main-line code seemed to hang while some interrupt-driven stuff kept going, but there were also times when the main-line user interface was still working but the interrupt driven communication code was not working.

After experimenting with the two boards for a while, the problem got harder to replicate, and then disappeared. On a whim, I put the boards outside (it's Winter) and let them cool off for a while. When I brought them back in, the problem was back immediately!

I have spent most of today looking into clock fuse settings, the ceramic resonator, the reset line, the power supply, and stray conductance on the PCB, and none of those appear to be the cause of the problem. That seems to leave only the microcontroller itself.

Has anyone experienced this kind of temperature-dependent failure of the micro software to execute properly?

Any advice would be most welcome.

Bert Menkveld
bert@greentronics.com


One of the other things that can happen is that when you bring a cold board inside a warm place, moisture will condense on the board. Then, as the board warms, it will slowly evaporate. Solder rosin residue can be quite conductive when moist from condensation.

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net


I'm sure others can give you some real advice, but I'll share a thought or two.

Is the uC in spec, temperature-wise? There are different grades for the processors, with automotive typically having a broader temp range, and industrial (or some such name) yet another temp range. A failure of the chip when it is being used out of spec isn't really a "failure", but I assume this isn't the case.

It would be nice to confirm that it is the uC that is malfunctioning and not another component, or a bad solder joint, crack in a trace, etc., that fails when the temp falls.

An option, assuming you don't have a temperature chamber available to you, is to cool the board down outside, and then use a wall wart to heat a power resistor, and hold the power resistor against the body of the uC, to heat it up, while leaving the rest of the PCB at the cold ambient temperature.

In theory you could use an aerosol can of "freeze it" or similar to just cool the uC while the rest of the PCB is at room temperature, but the cans run empty too quickly, and the above might work better for you, with well localized heating.

JC


There have been sporadic accounts of strange behavior with USART0 when the '1284 is clocked at a high rate. Despite several attempts, I was not able to replicate this problem, but then I ran my experiments at room temperature.

Just wondering if the two might be related. Let us know what you find out.


It sounds like something that would happen if the clock is running above 8MHz and not in full swing mode.

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly


js wrote:
It sounds like something that would happen if the clock is running above 8MHz and not in full swing mode.

Sounds along the lines of what I'd suggest. Another thing I ran into is that if you are using the internal RTC clock, the frequency varies with temperature.

A test I would do, if you are using the internal clock, is to rework one of the boards with an external 8MHz oscillator and see if the 'temp problem' goes away.

Also, as the guy above mentioned, condensation can play havoc with electronics.

AKA put unit in the fridge, stops working... then after a while starts working... take it out... stops working... then starts working... I have seen many engineers dazed and confused by this over the years.


If condensation is a problem then the board needs conformal coating.

It's always a good idea anyway for outdoor stuff, but that may have nothing to do with the problem of course as it seems that other boards work well.

John Samperi



What's the temp spec of the resonator? I'd try it with a crystal. If the problem disappears, Bob's Yer Uncle.

Imagecraft compiler user


And, of course, you might have 2 wonky chips. You never know.

The largest known prime number: 2^82589933 - 1

It's easy to stop breaking the 10th commandment! Break the 8th instead. 


gibbon wrote:
js wrote:
It sounds like something that would happen if the clock is running above 8MHz and not in full swing mode.

Sounds along the lines of what I'd suggest. Another thing I ran into is that if you are using the internal RTC clock, the frequency varies with temperature.

A test I would do, if you are using the internal clock, is to rework one of the boards with an external 8MHz oscillator and see if the 'temp problem' goes away.

Also, as the guy above mentioned, condensation can play havoc with electronics.

AKA put unit in the fridge, stops working... then after a while starts working... take it out... stops working... then starts working... I have seen many engineers dazed and confused by this over the years.

Thank you all for your responses. It's great to have some other people to make sure I'm thinking clearly.

I do not believe the problem is related to condensation. The problem first showed up during our room-temperature production testing. And our boards are normally inside a sealed case, which I've been using for most of the outdoor cool-down and subsequent indoor testing.

The temperature is only just below freezing outside, so I'm not pushing temperature extremes. The ATmega1294-PU we're using is rated down to -40C.

The CPU is clocked at 20MHz, the maximum clock frequency. Power supply is 5.0V from a simple linear regulator. It's using the internal oscillator using a ceramic resonator with built-in 15pF caps. Fuses are set for full swing oscillator: CKSEL3-0=0110, SUT1-0=11.

The scope shows a nice big sine wave (1V to 4V, I think it was) on the XTAL1 pin.

I have replaced the ceramic resonator with a device from a different manufacturer, with no change in the observed software failure at low temperature.

Could I have a bad power supply bypass capacitor? Guess I could simply replace them and see if it makes a difference.

I'd like to switch out the micro, but it's very hard to desolder a 40-pin DIP without damaging the PCB.

Maybe I should try cobbling up an external 20MHz clock source, configure the micro for external clock input, and see if that fixes the problem.

Thanks for thinking through this with me. Any further insights most welcome.

--
Bert


Dry joint somewhere? Temp fault/condensation shouldn't be a problem (like yours) unless you have very high input impedance paths. Some of my stuff runs off 2M2 pullups and is sensitive to damp and flux. Those boards have to be clean and conformally coated... You can also 'isolate' the fault by using freezer spray, applied to various parts of the circuit. If something is abnormally temperature sensitive, you will find it quite quickly.
Removing a 40 pin DIL... (so long as it's a cheap chip!) cut the legs by the side of the package and then pull out the pins that are left sticking out of the PCB...

Last Edited: Tue. Feb 26, 2013 - 03:45 PM

You should also set the brownout reset fuse.

Without brownout reset, the CPU can do crazy things on low voltage.

Especially at high frequency (20MHz), the effect is critical.
The highest brownout level should be selected for 20MHz.
And also the longest reset time.

Peter


Quote:

Sometimes the main-line code seemed to hang while some interrupt-driven stuff kept going, but there were also times when the main-line user interface was still working but the interrupt driven communication code was not working.

Weird things like that can happen with noise spikes. Does the test setup involve real hardware, like firing contactors/solenoids/motors? I'd look for missing/faulty suppression components. Also, does the board still fail if it just runs in a "quiet" environment?

Side note on the symptoms: Enhance your watchdog so it is an "AND" condition. See https://www.avrfreaks.net/index.p... https://www.avrfreaks.net/index.p...

If it actually resets, then it is a little easier. Count your resets (you >>might<< perhaps be seeing continual resets). Examine the reset cause in MCUSR (or whatever it is for that model).
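A minimal sketch of the reset counting idea, in case it helps. It assumes the MCUSR bit layout I remember for the ATmega164/324/644/1284 family (PORF, EXTRF, BORF, WDRF, JTRF in bits 0 to 4); check it against the datasheet. The `reset_counts_t` struct and `count_reset_causes()` are made-up names for illustration, not anything from the OP's code.

```c
#include <stdint.h>

/* MCUSR flag positions (assumed from the ATmega1284 datasheet). */
#define RESET_PORF  (1u << 0)  /* power-on reset  */
#define RESET_EXTRF (1u << 1)  /* external reset  */
#define RESET_BORF  (1u << 2)  /* brown-out reset */
#define RESET_WDRF  (1u << 3)  /* watchdog reset  */
#define RESET_JTRF  (1u << 4)  /* JTAG reset      */

/* Hypothetical per-cause counters. On target these would live in a
   .noinit section or in EEPROM so they survive the reset itself. */
typedef struct {
    uint16_t power_on;
    uint16_t external;
    uint16_t brown_out;
    uint16_t watchdog;
    uint16_t jtag;
} reset_counts_t;

/* Update the counters from a saved copy of MCUSR.
   More than one flag can be set, so each bit is tested separately. */
void count_reset_causes(uint8_t mcusr, reset_counts_t *c)
{
    if (mcusr & RESET_PORF)  c->power_on++;
    if (mcusr & RESET_EXTRF) c->external++;
    if (mcusr & RESET_BORF)  c->brown_out++;
    if (mcusr & RESET_WDRF)  c->watchdog++;
    if (mcusr & RESET_JTRF)  c->jtag++;
}
```

On the real part you would copy MCUSR into a variable very early in startup, write MCUSR = 0 so the next reset can be told apart, and then pass the saved copy to the function. Dumping the counters over the debug port shows at a glance whether brown-out or watchdog resets are happening behind your back.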

Is BOD used? That would pick up e.g. AVcc not connected.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Some random ideas...

If you turn an ordinary "air" duster can upside down, it becomes a can of freeze spray. :) It probably doesn't get as cold as real official freeze spray, but it's cheaper and can be bought at 8 PM on Sunday at the general store. The effect lasts for several seconds, depending on how warm the part you have sprayed gets in normal operation.

How is the ESD control at the production line? Maybe somebody forgot to plug in his or her wrist strap first thing in the morning, and didn't mention this to you. This is more likely to show up in the winter than in the summer.

One way I have isolated power supply trouble in the past is just to use a plain old battery, either alkaline or lead-acid, wired into the circuit. You *know* a battery is not noisy and is isolated from the line and everything else. It's even adjustable in 1.5 volt increments. :) Use a 1N4001 in series to knock off a volt, and a Schottky to knock off half a volt. (Okay, it's probably not a good idea to float your average AA cell 200 volts above ground, but for most microcontroller-y things it works pretty well.)

I hope this helps!


More good ideas. Thanks!

I do have brown-out enabled, with the voltage set to 4.3V.

Biggest resistor value on the board is 100K, used for pull-up on the RESET line.

The test environment is not at all electrically noisy. But I will put a scope on the 5V power supply to see if anything unexpected is going on.

My software does enable the watchdog timer, but I discovered that it's simply reset by the timer tick interrupt handler, so software failures elsewhere won't produce a watchdog reset. That definitely needs to be improved, but does not address what's causing my current problems.
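For what it's worth, here is a rough sketch of the "AND" style watchdog suggested earlier in the thread: every part of the system that must stay alive sets its own flag, and the watchdog is only kicked when all flags have been seen. The flag names and the three-way split (tick ISR, main loop, comms) are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per activity that must keep running (hypothetical split). */
#define ALIVE_TICK_ISR  (1u << 0)
#define ALIVE_MAIN_LOOP (1u << 1)
#define ALIVE_COMMS     (1u << 2)
#define ALIVE_ALL       (ALIVE_TICK_ISR | ALIVE_MAIN_LOOP | ALIVE_COMMS)

static volatile uint8_t alive_flags;

/* Called by each activity when it has made real progress.
   On AVR this read-modify-write is not atomic, so calls made outside
   an ISR should be wrapped with interrupts disabled. */
void watchdog_checkin(uint8_t who)
{
    alive_flags |= who;
}

/* Returns true only when every activity has checked in since the last
   kick, and clears the flags so each one has to check in again. */
bool watchdog_should_kick(void)
{
    if ((alive_flags & ALIVE_ALL) != ALIVE_ALL) {
        return false;
    }
    alive_flags = 0;
    return true;
}
```

On target the one place that touches the hardware would then be something like `if (watchdog_should_kick()) wdt_reset();` using `wdt_reset()` from avr-libc's `<avr/wdt.h>`. A hung main line with a healthy timer tick no longer keeps the dog fed.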

There are definitely no continuous resets going on, as that's easy to see when the LCD display is re-initialized.

I've been doing more testing, and have found more boards with the same problem. Some even show the problem at room temperature, though the problem seems to show up more readily at lower temperatures. Only some boards show the problem.

Weirdly, most boards will only show the problem once. When I cycle the power and try again, the problem refuses to re-appear. Is it possible I actually have a software bug that depends on the initial values of some RAM locations? Does anyone have any ideas whether the power-up value of uninitialized RAM might vary with temperature? Pretty far out, but I'm grasping at straws....

Could ESD damage produce these symptoms? The software misbehaviour is remarkably consistent. You'd think ESD damage would destroy a pin, but there's no evidence of anything like that.

???

--
Bert


You said internal oscillator, but you said 20MHz. To me, this means you have saved a new value in the Oscillator Calibrate register to goose up (too technical?) the 8MHz internal oscillator to WAY UP. I'd recommend using an external crystal, NOT an external resonator, NOT the internal oscillator. An external oscillator is OK... that is the rectangular can with 4 pins: 5V, Gnd, Clk Out, and another gnd.



As far as gcc is concerned, a reset will do the same processing as a power on. You might try the maximum startup delay to rule out some slow rise in Vcc.

But other devices might be in various states after mcu restart, e.g. a powered-up SPI device might need a different initialization than one that has been put to sleep.

When the expansion coefficients differ, temperature change causes strain in the solder pads, and then a hairline fracture could intermittently open. That condition will show up with the cooling spray test. Then resolder the joint; heating the entire board will sometimes fix this too, even if only to 80-90C.


There ARE ceramic caps that have a very wild temperature coefficient. Y5V is one. At -30C, it will LOSE 60%+ of its room temperature capacitance. That can be a killer for LDOs and for some switch mode supplies.

That said, the fact that you have this problem on only 2 boards does not make this a likely cause.

By the way, do you mean Mega1284? I cannot find a 1294 and that seems like an improbable Atmel P/N. If so, hook up a JTAG debugger and find out WHY it is hanging in the places you indicate. It is really hard to provide much help with just "It hangs in an ISR". And anything that you come up with will be little more than a guess unless you really find out what stops working.

Jim



Clarifying some things I have apparently confused:

It's an ATmega1284 -- that 9 was a typo.

I'm using the full swing crystal oscillator, not the internal RC oscillator. Using an external 20MHz ceramic resonator with built-in caps.


You could always grab a few other production boards and toss them outside, cool them down, and then bring them in and test them. Do they have the same failures? If not, then you probably have 2 bad boards. If other boards fail then you need to dig deeper.


Quote:
I'm using the full swing crystal oscillator....external 20MHz ceramic resonator
Note that there are DIFFERENT FUSE SETTINGS for external CRYSTAL and external RESONATOR for the full swing mode.

John Samperi



This sounds a lot like the problems I was having last week when it turned out to be static, see https://www.avrfreaks.net/index.p.... Try using an earth wrist strap while testing and see if this helps.

What's the difference between temperature and static? It's all good fun!


Rfrost2 wrote:
This sounds a lot like the problems I was having last week ...

How appropriate ... a response to "misbehaves at low temperature" by Mr. Frost! ;)



This sounds like the trouble I was having the other day, which turned out to be static (ESD). Try using an earthed wrist strap while testing the boards and see if this makes any difference. Static can send the uC into the unknown, and a power cycle can be the only way to recover. If the design has worked previously then it has to be a new source of components or something daft like static. See my comments regarding strange behavior with Tiny828 last week.


This sounds like the problem I was having last week that turned out to be static (ESD). When testing the boards, make sure an earthed wrist band is used so that the test technician is discharged before touching them. A short static spike can send the uC into the unknown.

Moving about with the board will only increase the chance of static build up and the temperature might be a red herring. See, https://www.avrfreaks.net/index.p... for previous posts re my problems.


@Rfrost2 and OP maybe... high impedance circuits with pin change ints (inputs) are very touch-sensitive and can easily give the impression that they are static sensitive.
Doesn't sound to me like the OP's problem is a static thing. Mind you, it is Canada and it is winter...


Thanks for the latest ideas.

Here are my latest results in this sadly on-going saga.

I tried doing the same test while wearing a (grounded) ESD wrist strap. Makes no difference -- same failures as without ESD strap.

I removed the ceramic resonator, and replaced it with an external clock source from a 20MHz oscillator can. Changed the micro fuse settings for CKSEL=0000 and SUT=00. The micro runs as usual, but also fails exactly the same as with the ceramic resonator and its internal oscillator. So the clock is not the problem.

We have quite a number of boards that show this problem, so it's not just one board or one part.

One of the curious things is that the software failure is quite consistent (when it happens). Most of the time the software misbehaves in exactly the same way: the communication code sends false data (this is what first alerted us to the problem). If the problem were in the hardware, I would expect random problems, and usually total lockup and/or reset.

So what if the problem is actually due to some obscure software bug? Why is that problem only exhibited on some boards, and why does it show up more readily at low temperature?

I used to think I was pretty smart....

--
Bert


Well, I'll defer to the experts, but a few more thoughts:

If you would:
Confirm that good PCBs remain good, at both room temp and cold temps.
Confirm that bad PCBs remain bad. Run the same test on a bad PCB tomorrow and it should still show up as bad.

I don't think you have yet determined if this is a HW or a SW issue.

If the error rate is truly higher when the PCBs are cold, and you can take a bad PCB and reproducibly tell the difference between running it at room temp and at cold temps, then do what I suggested above. Run the PCB in the cold but locally heat the uC with a power resistor taped/glued to its back. This lets the PCB run in the cold, while the uC is warm. Does the failure rate match that of the PCB when it is cold, or warm?

You can still put a good frequency counter or O'scope on the USART and measure its exact baud rate when cold, or warm. Still in spec? (Test software generating "UUUU...".)
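To know what "in spec" should look like before measuring, the nominal baud rate error can be worked out from the clock and the UBRR divider. A small sketch, assuming normal-speed asynchronous mode (U2X = 0), where UBRR = f_osc / (16 * baud) - 1. The function names are made up, and the baud rates in the note are only examples since the OP's actual rate isn't given.

```c
#include <stdint.h>

/* UBRR value for normal-speed async mode, rounded to the nearest integer. */
uint16_t ubrr_for(uint32_t f_osc, uint32_t baud)
{
    return (uint16_t)((f_osc + 8UL * baud) / (16UL * baud) - 1UL);
}

/* Baud rate the USART actually generates for a given UBRR value. */
uint32_t actual_baud(uint32_t f_osc, uint16_t ubrr)
{
    return f_osc / (16UL * ((uint32_t)ubrr + 1UL));
}

/* Nominal error in tenths of a percent (signed, truncated toward zero). */
int32_t baud_error_permille(uint32_t f_osc, uint32_t baud)
{
    int32_t actual = (int32_t)actual_baud(f_osc, ubrr_for(f_osc, baud));
    return (actual - (int32_t)baud) * 1000 / (int32_t)baud;
}
```

With a 20MHz clock, 9600 baud comes out as UBRR = 129 and about 9615 baud actual, while 115200 baud is already roughly 1.3% slow before any temperature drift is added. As for the "UUUU..." trick: 'U' is 0x55, so with 8N1 framing the line toggles on every bit and a frequency counter on TX reads half the baud rate.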

With a good O'scope look at the power supply when it is warm and cold. As Jim already mentioned, the power supply may be getting noisy as a function of temperature, worse when cold, generating more failures when cold. HW problem with the power supply and not with the uC or the remainder of the PCB.

I assume you have a By-Pass cap across EACH V+/Gnd pin pair, and from AVcc to Gnd? One can certainly have erratic operations if this is not the case.

One could also power the PCB with batteries, for a perhaps cleaner supply for testing. (Keep the batteries warm, though, while testing in the cold). Break the Regulator Vout lines and insert your battery supply line (s), unless your regulators don't mind Vout > Vin, (many do mind!).

The fact that the board still runs but the USART data becomes scrambled makes me think about SW issues, also. Is the stack getting overwritten, trampling on a few variables? Do you have memory to increase the various stacks for a quick test, (faster than trying to measure the stack usage under real conditions)?
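If measuring stack usage sounds like too much work, "painting" the stack is a cheap way to do it. A minimal sketch, assuming each stack is a plain byte array that grows downward (as AVR stacks do); the fill value and function names are arbitrary choices of mine.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xC5u  /* arbitrary pattern unlikely to appear by chance */

/* Fill the whole stack area with the pattern before the task starts. */
void stack_paint(uint8_t *stack, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Count bytes that still hold the pattern, starting from the lowest
   address. With a downward-growing stack these are the bytes that were
   never used; a result near zero means the stack is too small. */
size_t stack_unused(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Paint at startup, let the board run through its worst case for a while, then print `stack_unused()` for each stack. It only gives a high-water mark, and it won't catch a wild pointer that skips over the painted region, but it quickly tells you whether a stack has been running into its neighbour.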

Do you have ISRs that are taking too long during the USART routines? Are you using the USART HW module or a SW USART?

Are there variables modified by ISRs that aren't being read/written atomically? Do you need to disable and then re-enable interrupts around any special parts of the code? (You could change the USART TxData routine to always output "M" instead of real data to see if the "M" shows up on a terminal dump, or if it, also, is being trashed.) (Pointing to bad data getting to the USART driver routine.)

I still think the first step is sorting out HW vs SW as the primary problem, and the temp variability sure points towards a HW factor. If it was a SW factor alone then one would not expect to see the big difference based upon temp.

For testing HW issues it is difficult to take your work bench outdoors! Much easier to use a small mini-fridge/freezer, (such as those used by college students), to have a cold environment inside, on your bench; and much cheaper than a real environmental testing chamber.

You can make it get quite cold if you pull the cover on the internal thermostat and tweak it such that the compressor won't turn off (just make sure it doesn't overheat the compressor).

I feel your pain.

JC


When I have had timing problems that caused a uC lockup in the past, they have turned out to be due to a change in component type or manufacturer. Only the other week someone told me they had chilled an SRAM chip to reproduce a problem, but the real culprit was the adjacent processor chip: an earlier revision of the chip had been fitted to the board by mistake and would not run properly at the oscillator frequency being used. As the days were cold and the equipment was outside, that created a bus timing problem.

If you have an old build of the design, try comparing it with the new build units, you might get a surprise.


I think, even more, that it is time to analyze with JTAG. Rather than guessing, really find out. The OP is the only one who can look into that chip with the debugger.

When it happens, BREAK. See where it is within the code. Then, you may find some more concrete things to test. Maybe some specific statements to put breakpoints on, and check for actual values compared to expected ones.

You really will not find out what is going on until you start analyzing in such a way.

Jim



Are you able to see all the soldered joints on the boards? I've also seen boards behave well one minute and badly the next due to poor soldering or, to be exact, no soldering, e.g. a gull wing pin just leaning against the PCB track can cause havoc. If you have any chips in carriers you might want to check these, as I have seen hairline cracks in the carrier frames cause poor connections leading to failure. I would like to see a photo of the board, any chance? Ray.frost@btinternet.com


If you have been making the boards for years, maybe you are using old component stock that has not been stored correctly, i.e. the parts have absorbed moisture from the air, and then the soldering process caused chip damage (the popcorn effect, etc.). If in doubt, make some more boards with new component stock and see if things improve.


Another thing that has caused grief in the past is bad PCBs. Delamination causes tracks to go high resistance, almost to open circuit, especially if you have tracks close to the edges of the PCB. If your PCBs have been broken out of a step-and-repeat panel along scored lines, roughly broken-off boards can get damaged along the edges. Worth a thought.
The bad tracks/vias can cause intermittent functional behavior.


Thanks for all the good ideas, long shots included.

I decided to pursue the software bug idea a little, and tried expanding the size of all the stacks. That didn't fix the problem, but did change the symptom.

I then encountered an occurrence of the problem on a very old board I've been using for development for a long time, and at room temperature.

So I'm going to proceed on the assumption that it really is a software bug of some sort. Why it would show up more readily at low temperature is beyond me, but maybe I'll have some insight when I find it.

Now to hunt that bug....

--
Bert


You will all be pleased to know I have finally gotten to the bottom of this problem. And if you ask if it turned out to be a hardware or software problem, the answer is "Yes". :) (Sorry, I'm feeling mightily relieved!)

Having spent just about an entire week chasing this problem, and production units sitting around waiting to be shipped, I was getting pretty desperate. The problem was not the CPU clock, the RESET line, or the power supply. So it looked like a software problem. And yet the problem was much more pronounced at lower temperatures. That really did not compute.

This morning I woke up with a brand new idea of how the different things I observed might fit together. I won't burden you with the long story, but the upshot is that I found a brand new way that an unconnected input pin can bite you.

My board implements both the UART's in the m1284. However, most applications only use one of them. The parts for the RS-232 interface on the second UART are not installed on those boards. That leaves the RX pin unconnected. While that's not recommended, the only reason I've ever seen given for not leaving floating input pins is to avoid excess power supply current. Since I don't care about a few mA, it never occurred to me to worry about this thing. And if the line should occasionally float high or low and inject a false 0x00 or 0xFF byte into the UART, that wouldn't matter either.

HOWEVER!

Both UART's run at the same baud rate. And the other UART is very active, continuously transmitting and receiving stuff.

What appears to happen is that the unconnected RX line sometimes floats to a voltage near the high/low transition, and that some small copy of the other UART's tx or rx signal (I don't know which) couples onto that RX line, creating just enough change in the voltage for the UART to see valid 1's and 0's. This results in the unconnected UART receiving not just 0x00 or 0xFF, but all sorts of random values.

That still wouldn't be so bad, except that I use that second serial port for debug purposes. I have a task that handles that serial port, providing a simple command line debug interface. It's just debug code, so it's all just hastily written and not carefully tested. And so it happened to contain a nasty bug. When one particular value is received by the unconnected UART, it triggers that bug, which overwrites a chunk of RAM with essentially random data.
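The post doesn't say what the bug actually was, so the following is only a guess at the general class of problem: a command-line handler that trusts whatever arrives on the wire. A minimal sketch of a defensive line collector that can never write outside its buffer, whatever bytes a floating RX pin throws at it. All names and the buffer size are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEBUG_LINE_MAX 32u  /* assumed size, including the terminating NUL */

typedef struct {
    char   buf[DEBUG_LINE_MAX];
    size_t len;
} debug_line_t;

/* Feed one received byte. Returns true when a complete, non-empty line
   is ready in buf (NUL terminated). */
bool debug_line_feed(debug_line_t *l, uint8_t byte)
{
    if (byte == '\r' || byte == '\n') {
        bool ready = l->len > 0;
        l->buf[l->len] = '\0';   /* len never exceeds DEBUG_LINE_MAX - 1 */
        l->len = 0;
        return ready;
    }
    if (byte < 0x20 || byte > 0x7E) {
        l->len = 0;              /* non-printable noise: discard the line */
        return false;
    }
    if (l->len >= DEBUG_LINE_MAX - 1) {
        l->len = 0;              /* too long: drop it rather than overflow */
        return false;
    }
    l->buf[l->len++] = (char)byte;
    return false;
}
```

The point is simply that every index is checked against the buffer size and anything that doesn't look like a typed command is thrown away. Random bytes may still occasionally form a valid line, so the command parser behind this needs the same paranoia.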

And there's the explanation for my misbehaving software, only sometimes, only on some boards, and mostly only at lower temperatures. A floating input pin, completely unexpected coupling of data to that pin, and a latent software bug.

I thank you all for your advice and encouragement in chasing this problem down. Although nobody guessed the exact nature of the problem, many suggestions forced me to stand back and re-examine my thinking.

Now to see if I can enable the pull-up resistor on that UART RX line....
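A sketch of what enabling that pull-up might look like. On AVR a pin configured as an input (DDxn = 0) gets its internal pull-up when the matching PORTxn bit is set. The helper takes register pointers so it is not tied to one pin; the name is made up.

```c
#include <stdint.h>

/* Configure one pin as an input with its internal pull-up enabled.
   ddr and port are the data direction and port registers for the pin,
   bit is the pin number within the port (0..7). */
void enable_input_pullup(volatile uint8_t *ddr, volatile uint8_t *port, uint8_t bit)
{
    *ddr  &= (uint8_t)~(1u << bit);  /* DDxn = 0: input              */
    *port |= (uint8_t)(1u << bit);   /* PORTxn = 1: pull-up switched in */
}
```

Assuming the unpopulated port is USART1, whose RXD1 is PD2 on the ATmega1284, the call would be `enable_input_pullup(&DDRD, &PORTD, 2);`. As I read the datasheet, the PORT bit still controls the pull-up while the USART receiver owns the pin, but that is worth verifying on a real board, along with making sure the PUD bit in MCUCR is not set (it disables all pull-ups).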

Enjoy the weekend -- I certainly will!

Bert

P.S. I hope you won't mind my mentioning this, but last night I asked my Bible study group to pray for me in dealing with this problem. This morning I woke up with the solution. The credit is definitely not mine!


Good digging. Any ideas on why the situation seems to be more pronounced on the new batch of boards, and the temperature correlation?



Glad the answer 'floated' into your mind!


Thank you for that terrible pun!

To me floating CMOS inputs are essentially magical -- anything can happen. It's not hard to imagine that leakage currents will be slightly different from one chip to the next, and they will certainly be different at different temperatures.

How the signal from one UART couples into the other is beyond me. The TX pin of the other UART is right beside the affected RX pin, but on my PCB the two signals are routed in different directions. Maybe the coupling happens internal to the chip?


Glad you got it solved!

Of course the follow up question is what do you do with the units that are already in the field?

JC


Hmmm, yes, units already out there do pose a question.

We have had no reports of this problem in the field to date. It seems the problem is so rare it's either not happening at all in the field, or happening so infrequently that it gets passed over.

We'll have to keep alert to see if anyone encounters this bug.

--
Bert


Do you leave the buggy debugging code in the shipped product?


bmenkveld wrote:
You will all be pleased to know I have finally gotten to the bottom of this problem.
Glad to hear you got it figured out, and thanks for posting what the answer was!

Note: you have independently reinvented the "screaming TTY". :) http://www.catb.org/jargon/html/...


RickB wrote:
Do you leave the buggy debugging code in the shipped product?

It was a simple bug (once I saw it!), so I fixed that. That debug code is awfully handy, so I didn't want to remove it.