How Reliable Are MCUs?

39 posts / 0 new
#1
Total votes: 0

Hello everybody,

I have some doubts about the reliability of microcontrollers.

I know that the engineer has to produce a good design in both hardware and software, taking into consideration things like electrical noise, varistors and electrical fuses (not the fuses we set in software), among other things.

But my questions are:

Why do people use the ARM architecture for embedded design and not others? What is special about ARM Cortex?

Why are aeronautic-grade microcontrollers more reliable (I guess)?

Are automotive and other kinds of microcontrollers reliable?

How often would they fail? (For example, 1 in 100,000,000.)

It's not necessary to answer all of these, or to answer them exactly, but if you can give me some idea of the topic I would be thankful :) Regards

 

 

Last Edited: Fri. Feb 1, 2019 - 12:07 AM
Total votes: 2

Microcontrollers are basically unreliable!

 

There are a number of sources of unreliability. The obvious one is the silicon: mil-spec devices have been tested more rigorously to root out marginal parts. Another source is cosmic rays, other high-energy particles and EMF; this is countered by special design techniques and ECC (error-correcting codes) on the memory, CPU and peripherals. TI's Hercules and NXP's Qorivva are examples of this. Then there are software defects: validated compilers and quality-control techniques are used to minimise these. And then there's redundancy: hardware that checks hardware.

 

As for failure rate, if my memory serves me correctly, a safety system's failure rate is expected to be better than 1 in 10^27 hours. That is only a predicted value and could be much worse in practice. The reality is that your device should be able to detect an error and fail safe.

 

Note - this is not a trivial subject!

 

 

Charpie wrote:
Why do people use the ARM architecture for embedded design and not others? What is special about ARM Cortex?

Nothing special, apart from popularity. Cortex-R is aimed specifically at high-reliability applications.

 

Last Edited: Fri. Feb 1, 2019 - 12:32 AM
Total votes: 0

Aeronautical- and automotive-grade microcontrollers are often NOT unique designs; they are simply tested over a wider temperature range, with a lower minimum and a higher maximum than standard industrial grade.

 

There are other factors, such as vibration, that ARE package-design decisions.

 

Other things, such as surge voltages and EMI, are the responsibility of the circuit designer. We know a lot more about surge and EMI than we did even 10 years ago, and it is more and more common to design for these things from the very beginning.

 

Failure rates are very difficult to get. Most manufacturers do not release such numbers, even if they do track them, except perhaps for MIL-SPEC requirements. Even product manufacturers do not keep track of such things unless, perhaps, they make the product in the millions. And they certainly will not tell you; that would be a "trade secret".

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net

Last Edited: Fri. Feb 1, 2019 - 02:20 AM
Total votes: 0

I would give you both something like a "Facebook like", but I don't know how to use this forum platform very well.

Thank you very much; both answers have been really helpful.

And a last question is...

What norms of electronic design (more on the hardware side) would you recommend I read?

I am an electromechanical engineer, but I'm entering the world of electronics and embedded systems.

Total votes: 1

Charpie wrote:
What norms of electronic design (more on the hardware side) would you recommend I read?

 

By 'norms' I take you to mean 'standards'. There are the European CE standards. If you attend a university or are employed by a large company, you usually have access to these standards; otherwise they can get very expensive, and more so if you don't quite know which one you want. Which standards apply depends on the type of equipment, where it's used and so on. It's all rather complex, I'm afraid.

 

Back to your question of reliability: if you just mean how long you could expect a standard microcontroller to last without failure, I'd say over 20 years continuous. I have thousands of my designs still ticking over after 20+ years in the field, in some hostile places on earth; these frequently see 70+°C temperatures. They all talk back to server computers, so statistics like watchdog reboots and comms errors are logged. Yes, there are unexplained watchdog restarts. If lightning was nearby, you can expect devices local to it to restart and you can see this in the logs, but some just restart at random. Same with the communications: errors do happen, and that is why we have error checking. As a percentage, the number of errors versus good transfers is very, very small.

 

The microcontroller is only a part of a system, so the rest of the system affects reliability as well. I had a bad batch of relay-driver ICs once: the boards passed the manufacturing test but started failing in the field after six months. Luckily the design was safety-critical and did self-checking, so it identified the failure, shut the board down and reported the error.

Total votes: 1

Once you get to the "high reliability" stuff, you get into special considerations that set these parts apart from the consumer market.

Electronics for airplanes and (I think also) critical medical equipment is not only exempt from RoHS; it is even forbidden to use lead-free solder there.

 

A big part is also designing things in such a way that if a single component fails it does not take the whole system down.

System reliability must be considered for the whole system, not just a uC.

 

About longevity, you could start with reading some on:

https://duckduckgo.com/html?q=longevity+microcontroller

 

And/or look into retro computing. There are plenty of Z80s from the '80s still running fine, even though they were probably built cheaply with a life expectancy of a few years.

Electrolytic capacitors are the single worst component in electronics. Everything else is at least 10x better.

(Carbon-composite resistors from the early days of electronics are also notorious.)

 

Maybe you'd like Mr Carlson's Lab on YouTube. He's into restoring vintage equipment (oscilloscopes on wheels that are bigger than a washing machine).

 

Charpie wrote:
I would give both you something as a "facebook like"
In the grey area above each post there are some grey stars which you can turn yellow if you like a post, but nobody here at AVRfreaks uses them much.

Doing magic with a USD 7 Logic Analyser: https://www.avrfreaks.net/comment/2421756#comment-2421756

Bunch of old projects with AVR's: http://www.hoevendesign.com

Total votes: 0

Charpie wrote:
Why do people use the ARM architecture for embedded design and not others? What is special about ARM Cortex?

Same reason people drive Fords.

Charpie wrote:
Are automotive and other kinds of microcontrollers reliable?

When a company makes silicon chips, on the whole all examples of one model are created equal. During production, however, various factory tests are run, and if some parts are found to operate over a wider temperature range than others they may be qualified as "automotive", but it's probably the very same silicon really.

Charpie wrote:
How often would they fail? (For example, 1 in 100,000,000.)

Most manufacturers will give an MTBF (Mean Time Between Failures) which will usually suggest that the chance of the silicon failing is less likely than being killed when a donkey drops out of a tree onto your head! Silicon just does not fail. The packaging might. Certainly the soldering to the PCB might. The PCB itself might get hairline fractures in tracks, and some "more analog" components may fail (such as overheated capacitors having the electrolyte dry out - don't ask me how I know about this one!!). The likelihood of anything failing on the actual silicon die (as long as it's operated within published specs) is astronomically small. In 40+ years, being involved in hundreds of products and literally millions of units, the only failure I've ever known of was a batch of DRAM chips from Micron that had a manufacturing fault because something was not done right one day in their fabrication plant.

Total votes: 0

clawson wrote:
Same reason people drive Fords.

 

Classic!

 

So, what sort of application are you planning where the silicon is likely to fail? Realize that your coding is likely to be the weakest link, followed by your engineering. Are you planning a mission to Proxima Centauri?

The largest known prime number: 2^82589933 - 1

It's easy to stop breaking the 10th commandment! Break the 8th instead. 

Total votes: 0

On "aeronautical and automotive grade":

I will add that the "plastic" the package is made of is often cheaper for consumer chips than for automotive chips (you could rip the internal bond connections apart if a chip rated for 70°C reaches 125°C).

And aeronautical chips often have a shielded package.

Total votes: 1

From MAPS, the 64KB, 150°C 8-bit MCUs:

  • ATmega64C1-AUTOMOTIVE
  • ATmega64M1-AUTOMOTIVE
  • PIC18F2680
  • PIC18F4680

16-bit MCU : PIC24 and dsPIC - many (popular for CubeSat)

Radiation tolerance is directly proportional to maximum ambient temperature.

 

MAPS - MCUs & MPUs page

ATMEGA64M1-AUTOMOTIVE - 8-bit AVR Microcontrollers - Microcontrollers and Processors - Microcontrollers and Processors

PIC18F4680 - Microcontrollers and Processors - Microcontrollers and Processors - Microcontrollers and Processors

 

edit : strikethrus

ATMEGA64C1-AUTOMOTIVE - 8-bit AVR Microcontrollers

 

"Dare to be naïve." - Buckminster Fuller

Last Edited: Fri. Mar 29, 2019 - 05:03 PM
Total votes: 1

Torby wrote:
So, what sort of application are you figuring on where the silicon is likely to fail?
Relative to bulk CMOS: avionics and LEO. There are CMOS processes that produce avionics-ready megaAVRs:

Radiation Tolerant | Aerospace and Defense | Microchip Technology

Increased GCR will result in more SEUs in bulk CMOS, all the way down to Earth's surface (one GCR impact in the upper atmosphere produces a spray of particles).

 

Microchip's high-reliability MCUs are a mega128, a mega64M1 and one dsPIC:

High Reliability | Microchip Technology

 


LEO - Low Earth Orbit

GCR - Galactic Cosmic Ray (2030 is the beginning of the next grand solar minimum, IIRC an estimated half-century in duration, on a 400-year period)

SEU - Single Event Upset

A brief review of Total Ionizing Dose (TID) effects and Single Event Effects

 

"Dare to be naïve." - Buckminster Fuller

Total votes: 1

Before going too far into 'space-rated' stuff, it would behoove you to mention cost.  It can get stupidly expensive really, really fast...

 

For example, an Actel CPLD that I was prototyping with at about USD $2.50 for 'on-earth' work had a manned-spaceflight variant (silicon-on-sapphire, I believe) that sold for about USD $5,000.00 each.  And the space version was one-time programmable.  Actel "strongly recommended" that we just send them the VHDL and let them program the chip.

 

Microchip's being coy.  "Call for pricing".  Yeah, and be sitting down and have a stiff drink handy when you do.  S.

 

Edited to add PS:

PS - in my life working with these things, I don't think I've ever seen an MCU failure that could be proven the MCU's fault.  Oh, they quit working quite often, but every time I've found that something else horribly out-of-spec was happening to them.  Static-zapping being part of it.  You can break them, and I've heard of some dead from the factory, but again, every time I've proved what happened, it was something external to the chip doing horrible things to it.  S.

Last Edited: Sat. Feb 2, 2019 - 09:09 AM
Total votes: 0

Scroungre wrote:
Microchip's being coy.  "Call for pricing".  Yeah, and be sitting down and have a stiff drink handy when you do.  S.
For a competitor, a case of beer (share the painkiller with your fellow co-workers)

New VORAGO Technologies Products | Mouser

That thin PCBA shows it's for geo-electronics (down-hole, another high reliability application)

The CubeSat board's about an order of magnitude more expensive than one with a COTS MCU.

 

http://www.pumpkininc.com/content/doc/forms/pricelist.pdf (CPU PCBA start on page 19)

via CubeSat Kit - Purchase Information

 

"Dare to be naïve." - Buckminster Fuller

Total votes: 0

BAE makes a hardened SBC based on the PowerPC 750 (the same CPU used in the first few iMac lines).  $200K.

In 2010 it was reported that there were over 150 RAD750s in use in a variety of spacecraft.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

Total votes: 0

Scale Space Applications with COTS-to-Radiation-Tolerant and Radiation-Hardened Arm® Core MCUs | Microchip Technology

Devices enable designers to begin development with a commercial device before moving to different levels of radiation-qualified versions, reducing development time and costs

Chandler, Arizona

March 28, 2019

...

Based on the automotive-qualified SAMV71, the SAMV71Q21RT radiation-tolerant and SAMRH71 radiation-hardened MCUs implement the widely deployed Arm® Cortex®-M7 System on Chip (SoC), enabling more integration, cost reduction and higher performance in space systems.

...

While the SAMV71Q21RT’s radiation performance is ideal for NewSpace applications such as Low Earth Orbit (LEO) satellite constellations and robotics, the SAMRH71 offers the radiation performance suited for more critical sub-systems like gyroscopes and star tracker equipment. 

...

To protect against the effects of radiation and manage system mitigation, the architecture of the devices includes fault management and data integrity features such as Error-correcting Code (ECC) memory, Integrity Check Monitor (ICM) and Memory Protection Unit (MPU). 

...

 

Development Tools

To ease the design process and accelerate time to market, developers can use the ATSAMV71-XULT evaluation board. The devices are supported by Atmel Studio Integrated Development Environment (IDE) for developing, debugging and software libraries. Both devices will also be supported in MPLAB® Harmony version 3.0 by mid-2019.  

 

...

SAM V71 Xplained Ultra Evaluation Kit

http://packs.download.atmel.com/#collapse-Atmel-SAMV71-DFP-pdsc

Atmel Studio 7 | Microchip Technology

MPLAB Harmony v3| Embedded Software Development Framework | Microchip Technology

 

"Dare to be naïve." - Buckminster Fuller

Total votes: 0

Maybe an add-on is worth while, here.

 

What are the ways that an MCU can fail? The silicon itself (that is, the ordinary logic) seems not to wear out, for all practical purposes. So, what can go wrong?

 

1) EEPROM can wear out. If you write too often, it WILL go.

 

2) FLASH can wear out. If you write too often, it WILL go. This is relatively low probability unless you try to write run-time data to the flash memory. Otherwise, it only happens if you reprogram it too many times, and that threshold is pretty high.

 

3) I/O transient-protection diodes can burn out from an over-voltage transient. This will leave a port pin stuck high or low. If that happens to be an output whose normal state is the opposite of the failed diode's, the resulting device current can be relatively high while the device still appears to work.

 

4) Cosmic ray or other high energy EM "particle" can damage a transistor, anywhere in the device. 

 

5) Excessively high temperatures can allow internal diffusions to shift location. This can cause shorted transistors or shifted gate thresholds. There tends to be a (time x temperature) product involved. Fortunately, it's not that easy for an AVR MCU to generate its own high temperatures internally, so this is usually due to nearby heat sources such as load drivers, voltage regulators and the like.

 

6) Low temperatures, unless REALLY low (like liquid-nitrogen temperatures), generally produce only temporary failures; the most prominent has to do with operation of the EEPROM. Normally, if you warm the part up (slowly) it will start working again. If you change temperature too rapidly, though, the epoxy package can crack.

 

7) Excessive power-supply voltage. Every MCU has a "Do Not Exceed" limit, AVRs included. Exceed it at your peril.

 

8) Negative supply voltage will kill it, dead.

 

9) Electrostatic discharge into a port pin can kill it. The failure is usually one of the protection diodes; see (3).

 

10) Bad soldering. This usually does not kill the MCU but can result in loss of signal or power. Also, solder splashes can make shorts.

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net

Total votes: 0

In fact, what is reliable enough?  You have to consider the application.  For example:

 

In 1968, the assistant to the secretary of defense for atomic energy, Dr. Carl Walske, codified the safety standard for nuclear weapons:

The probability of a premature nuclear detonation of a bomb due to bomb [or warhead] component malfunctions, in the absence of any input except for specified signals (e.g. monitoring and control) shall not exceed:

(1) Prior to receipt of the pre-arm signal, for normal storage and operational environments described in the STS (Stockpile-to-Target Sequence): one in one billion per bomb [or warhead] lifetime.

(2) Prior to receipt of the pre-arm signal, for the abnormal environments described in the STS: one in one million per bomb [or warhead] exposure or accident.

 

The following are safety criteria design requirements for all U.S. nuclear weapons:
Normal environment—Prior to receipt of the enabling input signals and the arming signal, the probability of a premature nuclear detonation must not exceed one in a billion per nuclear weapon lifetime.
Abnormal environment—Prior to receipt of the enabling input signals, the probability of a premature nuclear detonation must not exceed one in a million per credible nuclear weapon accident or exposure to abnormal environments. 

 

 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Total votes: 0

Waaay back then, integrated-circuit manufacturing wasn't as developed as it is now. Contamination and "purple plague" were problems that caused failures long after parts were in service. Fifty years later, things are better: the computers in my 1999 Ford Windstar are still working after 20 years in service.

Total votes: 0

Recently I read some articles and watched some videos about the early days of semiconductors.

Making reliable transistors was a real problem, and for some years vacuum tubes were more reliable than transistors.

10 to 15 years later there were real horror stories about whole semiconductor factories producing 100% garbage.

Yep: a yield of 0% for several months in a row for an entire semiconductor factory...

I believe this was eventually traced down to the agricultural spraying of phosphorus-containing pesticide compounds, which interfered with the exposed semiconductors during production.

We've come a long way since then, but to be fair, I still don't really comprehend that modern electronics work at all.

Some of the largest chips have 19 billion transistors on them. Think about the error rate required to make something like that work...

Doing magic with a USD 7 Logic Analyser: https://www.avrfreaks.net/comment/2421756#comment-2421756

Bunch of old projects with AVR's: http://www.hoevendesign.com

Total votes: 0

Indeed, Paulvdh points at an important "fact": the failure rate of an individual transistor, even in a modern MCU, has to be extraordinarily low. Then think of a CPU chip with more than an order of magnitude more transistors. It really is quite incredible.

 

As a side note, I worked at Tektronix in the 1970-1977 time frame. There was a big presentation to Engineering (it must have been Intel, but I'm not sure) on the 4004. At that time, a few products were using bit-slice devices to implement a complete ALU. After this grand presentation about how the 4004 was going to revolutionize electronics, I asked one of the presenters what the odds were of an improper output from a logical operation. The response was, in effect, "You dummy, why would you expect ANY errors?"

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net

Total votes: 0

Scroungre wrote:

PS - in my life working with these things, I don't think I've ever seen an MCU failure that could be proven the MCU's fault.

 

Not quite an MCU, but I have an 8155 here that 'forgot' bits of the code blown into it. And likewise, some bipolar PROMs used as address decoders whose contents are no longer correct.

 

Both are, I suspect, related to the re-growth of fuses.

#1 This forum helps those that help themselves

#2 All grounds are not created equal

#3 How have you proved that your chip is running at xxMHz?

#4 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand." - Heater's ex-boss

Last Edited: Fri. Mar 29, 2019 - 06:59 PM
Total votes: 1

Kartman wrote:

Microcontrollers are basically unreliable!

No!
My home-control ATmega128, along with some other electronics, has been working reliably for more than 10 years now.

Total votes: 0

Unfortunately, that's just what you perceive. By unreliable, I'm referring to 'soft' failures. For example, in my car there are a number of microcontrollers; many of them have redundant circuitry to ensure that the device behaves as it should or fails safe.
Microprocessors are complex machines, and it can take only one bit flipping the wrong way to upset them. Considering that you can get devices with special hardware features to enhance reliability suggests that the average device's reliability is, errr, average.

Total votes: 0

Also, a sample of one does not "prove" anything positive. It can prove failure but it cannot prove success. After all, it just might have its first failure an hour from now. Before that hour has passed, you would never know that it is going to fail.

 

That is one of the reasons for calling it "Mean Time Between Failures" (MTBF). Some will fail early, most will fail in a range near that mean time, but some will last almost "forever". Think "bell curve".

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net

Last Edited: Sat. Mar 30, 2019 - 02:41 AM
Total votes: 0

Think "bell curve".

Or perhaps the more popular bathtub curve for semiconductors. In any case, what happens after the eventual failure is important.  For example, traffic lights have (had?) designs such that even if the micro went wacko and ordered all the lights to turn green, it still wouldn't happen, thanks to failsafes beyond the micro.   So assume that failure will occur and prepare to mitigate "to the extent needed".   Determining that extent is often a difficult tradeoff.

 

A vending machine that gives an extra candy bar is one thing; a flamethrower that won't shut off is another. 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Total votes: 0

The bell curve represents well the odds of a failure in a given time interval (age).

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net

Last Edited: Sat. Mar 30, 2019 - 05:46 AM
Total votes: 0

Remember that billions upon billions of controllers around us work very reliably every day. They can do this because the underlying laws of nature are highly reliable. A residual risk remains, because people sometimes do not master the complexity. But the good thing about technology is that it is able not only to solve problems but also to monitor and correct itself.

Total votes: 0

GermanFranz wrote:
They can do this because the underlying laws of nature are highly reliable.

 

Choke! It's the laws of nature that make them unreliable! Like cosmic rays, ESD and lightning! Most appear reliable only because the reliability is not measured. Once you start measuring it, you'd be amazed at what really happens. For example, the average server computer has ECC (error-correcting code) on its memory; the system can monitor memory errors and keep a count. Run the average server for one week and see how many memory errors have occurred. Then consider that your average computer has no ECC on its memory. Why do computers crash?

Total votes: 0

Kartman wrote:
Why do computers crash?

I cannot remember such a (Win8/10) case.
Perhaps the technology has continued to improve?

Kartman wrote:
cosmic rays, ESD and lightning

... are part of the world, but are becoming ever more manageable. Electronic technology can, in principle, be built with sufficient reliability.

Last Edited: Sat. Mar 30, 2019 - 11:03 AM
Total votes: 0

Kartman wrote:
By unreliable, i’m referring to ‘soft’ failures.
Jack Ganssle's "Reason #4" on why embedded software projects run into trouble | AVR Freaks

 

"Dare to be naïve." - Buckminster Fuller

Total votes: 0

I remember back in 1980 a friend had a 6502 that didn't work in a UK101, but was fine in a PET 2001.

After a lot of tests, it came down to just one instruction where, in some cases, a flag got the wrong value.

Total votes: 0

Here is some info on the bathtub curve... I had to slog it out with some failure calculations back in '84. I wonder if those chips are still running?

 

https://www.nxp.com/about/about-nxp/about-nxp/quality/product-qualification:QUALITY__QUALIF

 

Semiconductor failure rates (as described in the standards above) follow a bathtub curve.

There is an initial decreasing failure rate followed by a long, low-level failure rate and then eventual wear out.

For each part of the bathtub curve, NXP has methods in place to guarantee low failure rates:

  • Test and burn-in are used to screen for early product failures (called infant mortality). This reduces the early failure rate
  • Robustness during useful life is obtained by design and checked by electrical and mechanical robustness tests covering electrostatic discharge, latch-up events, soft errors and drop or shock events
  • Built-in reliability is used to delay the onset of wear-out. Extended life tests are performed for verification

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Total votes: 0

I've been maintaining a system that has been working 24x7 since 1992. There are 22 motor controllers, based on a Z180, connected via RS422 to a PC. Many PCs, power supplies and hard drives have been replaced over that time. Even the operating system had a year-2012 problem and was horribly obsolete, so the code was rewritten to run on something more modern. The motor controllers have had various problems over the years: bridge rectifiers would die, two transformers died, connectors failed, fuses failed. With the silicon, I've not had a failure of the Z180s, the EPROMs or the RAMs (although the battery would die after around 15 years). I observed wear-out on a particular brand of CMOS IC (there were two brands used) and on the RS485 transceivers. Once I noticed a common failure on a couple of units, I would replace them all. Overall, I'd say the silicon is pretty reliable. As for the micro doing what it is supposed to every time: not so reliable. I have observed that you get individual random scattered errors and clusters of errors. The clusters are probably due to lightning (the equipment is at the top of a 10-storey building). Apart from the watchdog, the system has no redundancy. If I had more modern micros with ECC etc., it would be interesting to see the statistics.

Total votes: 0

OK, including infant mortality, then "bathtub".

 

Jim

 

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net

Total votes: 0

Hi,

Just a few thoughts,

 

Infant mortality is way down from the 'old days'. Process controls are tighter, and I think they've solved some of the issues, such as whiskers that would grow and short things out.

I'm not sure anybody does burn-in anymore. It's a terribly expensive process.

If you get MTBF numbers, they will typically be for nominal temperature and voltage. I forget the figures, but failure rate increases fairly dramatically with rising temperature.

Reliability-prediction tests that I have seen were statistical, much like the inspection process. A number of units would be run under conditions that accelerate failure. The idea was to get an early indication of failure rate by running enough units that the early failures would predict the failure distribution. This has a weakness that showed up on one project: failures can be just frequent enough not to be definitive, i.e. you couldn't say the failure rate was worse than the spec, but you couldn't say it was better either. That means testing continues, something that nobody likes.

 

hj

Total votes: 0

Cost is relative: is it a $5 toy, a satellite, or a medical instrument where failure can cost a life?

Total votes: 0

To the list of failures I gave in #16, we might add noise.

 

Now, this requires a bit of dancing. IF the MCU is properly designed (that is, the various N- and P-channel FETs have the proper thresholds, etc.), AND the power distribution system (in the chip) is properly done, AND the external power remains within the specified voltage range (which implies proper power-pin bypassing), then the MCU itself will NOT fail from internal noise. This is pretty much bread-and-butter technology these days. Any device that violates these design rules does not deserve to be on the market, IMHO.

 

BUT external noise, especially on MCU inputs and the external oscillator, CAN cause the SYSTEM to fail. This is not the same as an MCU failure, of course, though it tends to have very similar consequences. This is the kind of thing that the voltage-spike, ESD and similar tests for CE certification are designed to detect. They are necessary because hardware design technology has not yet reached the point where noise immunity is "bread-and-butter stuff".

 

So, for this particular "failure" mode, it is not, strictly speaking, the MCU. But, that said, the system really does not know the difference.

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net

Total votes: 0

Kartman wrote:
For example - the average server computer has ECC (error correcting code) on its memory.
Likewise for some EEPROM and NOR flash; MRAM has built-in ECC.

 

Improving reliability of non-volatile memory systems | Embedded

SEMPER NOR Flash - Cypress Semiconductor

16Mb MRAM - Parallel Interface | Everspin

 

"Dare to be naïve." - Buckminster Fuller

Total votes: 0

Product Change Notification - SYST-09HVXW011 - 10 Sep 2019 - Data Sheet - SAMV71Q21RT Radiation-Tolerant 32-bit Arm Cortex-M7 MCU

...

 

Description of Change:
1) Extends minimum operating temperature to -55°C for all package types throughout the data sheet.

2) Removes previous temperature range for Flash programming total Write/Erase specifications.

3) Eliminates USB option for SAM-BA boot programming.

4) Moves Drive Level specification (PON) for 32.768 kHz crystal oscillator from Table 58-20 to Table 58-19.

5) Moves Drive Level specification (PON) for 3 to 20 MHz crystal oscillator from Table 58-23 to Table 58-22.

6) Other minor typographic corrections throughout the document.

 

...

 

Attachment(s):

SAMV71Q21RT Radiation-Tolerant 32-bit Arm Cortex-M7 MCU

 

...

 

"Dare to be naïve." - Buckminster Fuller