ATtiny2313 issues.


At the recent AVR Seminar in Melbourne, during the networking time, I was speaking with a user of tiny2323's who has had problems with some tiny2313's: some would just crash after some time in operation, whereas others operate flawlessly.
He has some hundreds of units in operation and has collected quite a number of "faulty" units, although these units work quite OK in other application.
In fact he has dealt with ATMEL, who have tested some of the "faulty" units and verified that they were within spec, despite the fact that they consistently fail in service.
He is going to send me some of the MCU's to have a "play with".

I suspect that this will be a conundrum.
a) Too many working units for the hardware & software to be suspect.
b) Too many Tiny2313's which when replaced make the units reliable.
c) The suspect Tiny2313's are entirely within spec.

So whilst I have some ideas that I will explore, does anyone have any ideas on the three premises?

The units are used in some sort of 3 phase to 1 phase conversion unit.

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?


Given the application, it could be a problem with transients. Do the units have any protection against them?

Leon Heller G1HSM


It was one of my thoughts too, Leon. I haven't seen the circuit diagram, but I was told the I/O does not directly interface with the mains; it controls some relays & other logic.
Once again, some MCU's work perfectly ?? (He is fairly cluey(all those with ham tickets are of course :) ))

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?


Some of them might be more susceptible to such nasties, and still be in spec.

Leon Heller G1HSM


Note that using a relay will not stop these problems from happening.
When a relay opens and closes, a lot of transients are also generated on its contacts. These can then be picked up by the enormous coil of the relay and thus be fed into the controller...


LDEVRIES wrote:
(He is fairly cluey(all those with ham tickets are of course :) ))

:? :D

I agree with Leon. If the MCU is directly tied to the relays that is most likely where the problems are stemming from. Opto-isolation, if not already used, would be the best way to protect the MCU from transients.

Lee, do you know what type of environment these are used in?

leon_heller wrote:
Some of them might be more susceptible to such nasties, and still be in spec.

I suspect that, given more time, if the design is flawed more units will fail.

W7PAN

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius


Quote:
and has collected quite a number of "faulty" units, although these units work quite OK in other application.

Lee, does this mean that the units will also work again, if re-powered up in the faulty units/ installations?

Do they all fail in the first week, or run for variable lengths of time and then fail?

I would look closely at the relay snubber circuits. Diodes hold out for a while and then one starts getting shoot through to the uC as the diode begins to fail.
The diode may or may not test bad with a low voltage voltmeter type test...
Redoing the snubber with a heavier diode is usually easy, as would be putting two diodes in parallel on the snubbers, on a series of test boards, to compare against the performance (failure rate) of the original designs.

Bad solder joints also come to mind. Marginal joints that fail after a series of thermal/humidity/vibration cycles.

Consumer-grade chips used in an industrial environment (thermally)? The extended temperature range chips are spec'd that way for a reason.

Power supply issue? Is the power supply on the PCB, or stand-alone unit? When a failed board is replaced is the power supply also replaced?
When a failed unit is replaced, does that particular installation fail again? Repeatedly?

When the "faulty" chips work in another circuit, and Atmel gives them a clean bill of health, it sure points the finger back to a marginal hardware design, or a software issue, and not a failed uC issue.

I remember (still remember!) spending months tracking down an occasional, "random" crash in a GPS/GLCD/etc. project. I'd never been so frustrated before! Finally bought a DSO just to debug that project!
It crashed on a rather rare combination of specific hardware states and ISR triggerings. Clueless ham error. :oops:

I remember another project, 4 or more large white protoboards, plus some attached PCBs, plus the panel, ..., 1000's of connections, and an intermittent, rare error. Tore it all down and rebuilt it from the start several times. As several of the protoboards were "old", I even broke down and bought several new boards, as I was sure it was an intermittent contact problem...

Then I found the bug in my software... :oops:

In my experience, tough-to-find failures have usually been traced back to hardware/software design errors, not to faulty components.
That said, I had an intermittent XBee module that gave me great headaches once, and failed Xtal's a time or two; but these were single-item failures, not a bunch of failures of the same device.

I am interested in hearing what you find.

JC


Very interesting report. Give us more details!

Quote:
I was speaking with a user of tiny2323
that is a typo I suspect?

If these chips work fine in another application, perhaps that application does not use the same resources that failed in the original one? The problem can be rather trivial, like a dead clamping diode or EEPROM wear-out, but I guess Atmel would pinpoint that easily.

More cunning damage that leaves no permanent traces, like running the chip out of specification for extended periods (100 °C, 6 V, 250 mA for example), is harder to find, but Atmel could still manage to find it (if they wanted to).

I own one wounded m16 with a crashed port A and two pins of port B stuck at GND. All the rest works perfectly.
Another veteran is an m162. It can be programmed via JTAG, but its OCD is dead.

Some time ago I was thinking about writing a set of libraries of tests which self-test all documented/expected features of the chips. Like reset values of IO registers and hidden states, crosstalks and shorts between IOs, operation of timers, ADC, just any process which can be performed in several seconds to indicate the condition/health of the chip.
Ideas never brought to life, as it is cheaper to buy a new one..

No RSTDISBL, no fun!


Brutte wrote:
Some time ago I was thinking about writing a set of libraries of tests which self-test all documented/expected features of the chips...Ideas never brought to life, as it is cheaper to buy a new one..

So very true. I can only imagine how many man hours would be spent in such an endeavour. If you ever do write the libraries please share them.

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius


There are a few things that can differ between units and make one work while another does not, even if all are within spec.
The obvious ones are the internal clocks, the ADC reference (though not with the Tiny2313) and the input logic levels. But there are a few more, like initial RAM content, output strength, and susceptibility to noise or the voltage limits. The pin-out (DIP and SOIC) of the 2313 is sensitive to noise since GND and VCC are at opposite ends.
If the design relies on properties other than those in the specs, such behavior is to be expected.


Quote:
and the input logic levels.

You are right, the cause of these malfunctions could be an ordinary bug (hard or soft). It is enough to wdr right at the timeout boundary, and the dog will bite only on certain occasions, at certain temperatures, on certain chips (as the WDT is not calibrated and varies with just about every parameter). The same goes for IO hysteresis, drive, RC, BOD, EEPROM/FLASH timings...

What is worse, such analog bugs can be hard to catch. But if it is possible to reproduce them away from the environment where the chips had worked, I would be glad to hear about it.

No RSTDISBL, no fun!


Hi Guys,
There are some real interesting comments there.
Yes, a typo, they are 2313's (OK, I have been waiting to say it...My bad!..... grrr that hurt!:lol: )
I was told that some of the "failed" units came from one manufacturing batch/all with same date code.
Several batches have been purchased and every batch has units that fail.
Units that "fail" consistently fail even in the same application in other hardware units.
I don't believe EEPROM is used but in any case the failure occurs very early on.

I am not sure which tests ATMEL performed on the chips they examined.

I picked up some good leads from your comments, which I can't comment on yet as I don't have all the details. I will have to do some more research to answer them.

I am mindful of Doc's experiences that most really tough faults are usually hardware related, but sometimes they are real sneaky bugs which were assumed not to be there.

Kleinstein mentioned the initial RAM content, which had not yet crossed my mind, but I have been caught out by it in my previous life, before I became a re-born C programmer. In fact, I coined a new programming law, "Lee's Law", which I often quoted to my students: "Despite wishful thinking, variables won't initialize themselves."

My "gut" feeling is that it is some "incomplete coding" issue, and a very possible one, which I am sure we have all had, is the uninitialized variable bug. This is entirely likely in this case, because the code was written in .asm, whereas...
I have offered to write a state machine driven version of the code, which should be more robust than a spaghetti algorithm written in .ASM . In any case, it should help create a paradigm shift in where to look.
The shot would be, when the fault occurs again, to replace the code in the same micro & external hardware with the alternative code (which would have to be pretested & known to work, of course).
You can rest assured that I will report back a solution if & when, I/we get on top of it.

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?


Quote:
a spaghetti algorithm written in .ASM .
Is that the fault of the language or the programmer? Have you told your mate that he is a bad programmer and should be shot? :wink:

...maybe he won't be friends any longer after that...

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly


Another of my canons: "Any code written in .ASM and not deploying FSM's always degenerates into chaos & spaghetti (or capellini) code."
On the other hand FSM's written in C give you Penne code! Aww....now I fear losing you as a friend! :wink:

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?


If the pasta is al dente, all will be well! :wink:

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius


Just as well I have no idea what FSM means... :lol:

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly


Finite state machine.
If the debugger was not used during development (initialization is a classical bug, easy to pinpoint with debugger), then there are far more bugs than this one.
In most cases it is enough to fill SRAM/registers with random values at startup to find uninitialized variables... If that was not done, then...

No RSTDISBL, no fun!


An explicit FSM can be much easier to develop and understand than a multitude of little flags. The latter can escalate into incomprehensible 9-level nested if statements.

And don't set the state from anywhere but within the FSM itself, so no forced state changes from an ISR, for example. Little flags are useful for this, then.

You can code an FSM with a big switch, if/else if, or, for speed, function pointers. In OOP you can use some OOP magic. You can also nest FSMs. Or, more advanced, you can add an ability much like GOSUB, where you push the state onto a stack, pop it off later, and then jump to it. Add a little more and you suddenly have something akin to a CPU.