When I started my current job, the team I joined had one engineer doing circuit card design/development work. There were no development processes of any kind, documented or otherwise. This lack of processes, combined with the engineer's lack of rigor and attention to detail, led to very poor designs and significant rework, and eventually to that engineer being reassigned to a position where they could do less damage. When that engineer was reassigned, another engineer and I were tasked with putting some quality control processes in place as we took over. So far we have instituted the following:
- Hardware design description documents are developed prior to / in parallel with the schematic to capture functional intent
- Parts libraries, schematics, and board layouts are under configuration control (using git)
- Schematics are no longer illegible messes
- Reviews are performed before parts library parts are considered ready for use and before prototypes are ordered
- CAD automated checks (ERC and DRC) are used
- Parts lists are generated from the EAGLE project instead of being maintained by hand
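As a rough illustration of what generating the parts list from the design data involves, here is a minimal Python sketch. It assumes a hypothetical CSV export with one row per placed part and columns named Part, Value, and Package (this is not EAGLE's actual export format), and it groups identical value/package combinations into BOM lines with a quantity and reference designator list.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat export from the CAD tool: one row per placed part.
EXPORT = """Part,Value,Package
R1,10k,0603
R2,10k,0603
C1,100n,0402
R3,4k7,0603
C2,100n,0402
"""

def build_bom(text):
    """Group parts by (value, package) into (value, package, qty, refs) lines."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[(row["Value"], row["Package"])].append(row["Part"])
    # Sort lines and designators so the output is stable between runs,
    # which keeps diffs of the generated BOM meaningful under git.
    return sorted(
        (value, package, len(refs), sorted(refs))
        for (value, package), refs in groups.items()
    )

if __name__ == "__main__":
    for value, package, qty, refs in build_bom(EXPORT):
        print(f"{qty:3d}  {value:6s} {package:6s} {', '.join(refs)}")
```

Because the BOM is derived from the design every time, it cannot drift out of sync with the schematic the way a hand-maintained list can.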
There is still lots of room for improvement. Some things I am particularly interested in (but am unsure how feasible they are) are:
- Where should simulations and manual circuit analysis be captured to aid reviewers and future engineers? Is the hardware design description document a good place for this? A project journal? Is there a more CAD-centric approach that would be better?
- Is it possible to have automated part selection guidance (e.g. getting resistor power dissipation requirements from the simulation of a notional circuit design)?
- Would it be possible to use a CAD model to validate part selection (e.g. ensuring power and voltage ratings of selected parts are sufficient)?
- Is it possible to automate power consumption calculations?
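To make the last three questions more concrete, here is a minimal Python sketch of the kind of check I have in mind. The part data, the worst-case voltages (which would ideally come from a simulation of the notional circuit), the 50% power derating policy, the per-rail load currents, and the function names are all hypothetical placeholders. It flags resistors whose worst-case dissipation (P = V^2 / R) exceeds the derated power rating or whose voltage exceeds the rated working voltage, and it sums P = V * I over all loads for a simple power budget.

```python
import math
from dataclasses import dataclass

@dataclass
class Resistor:
    ref: str            # reference designator
    ohms: float         # resistance
    rated_watts: float  # manufacturer power rating
    rated_volts: float  # manufacturer working voltage rating

# Hypothetical parts list entries and worst-case voltage across each part
# (ideally extracted from a simulation of the notional circuit).
PARTS = [
    Resistor("R1", 100.0, 1.0, 200.0),
    Resistor("R2", 470.0, 0.125, 50.0),
    Resistor("R3", 10_000.0, 0.063, 50.0),
]
WORST_CASE_VOLTS = {"R1": 5.0, "R2": 12.0, "R3": 3.3}

POWER_DERATING = 0.5  # assumed policy: use at most 50% of rated power

def check_resistors(parts, volts, derating=POWER_DERATING):
    """Return (ref, watts) for each resistor that violates its derated ratings."""
    failures = []
    for part in parts:
        v = volts[part.ref]
        watts = v * v / part.ohms  # P = V^2 / R
        if watts > derating * part.rated_watts or v > part.rated_volts:
            failures.append((part.ref, watts))
    return failures

# Hypothetical per-rail load currents in amps, keyed by rail voltage
# (e.g. taken from datasheet maximums).
RAIL_LOADS = {
    3.3: [0.120, 0.045],
    5.0: [0.200],
}

def total_power(rail_loads):
    """Sum P = V * I over every load on every rail, in watts."""
    return math.fsum(v * i for v, loads in rail_loads.items() for i in loads)

if __name__ == "__main__":
    for ref, watts in check_resistors(PARTS, WORST_CASE_VOLTS):
        print(f"{ref}: {watts:.3f} W exceeds derated rating")
    print(f"Total load power: {total_power(RAIL_LOADS):.3f} W")
```

What I don't know is whether the CAD tools can feed this kind of check directly (part attributes from the library, voltages from simulation) instead of it being maintained by hand.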
We are currently using Cadsoft EAGLE for circuit card development work but will likely be switching to KiCad. Any insight regarding the above questions would be greatly appreciated, as would any general advice on hardware development tools, processes, and best practices. The rest of this post is just a description/rant of what I am dealing with that may help you tailor anything you have to offer.
Personal Background
- General exposure to electronics when growing up through assisting with home electrical projects and ham radio.
- B.S. in mechanical engineering specializing in mechatronics with a minor in math. Got just enough education/experience in electrical engineering and bare metal software development to be dangerous.
- 5.5 years of work experience, all of it at my current job, which I started a few months after graduating. Nearly all the work I have done has been electrical engineering and bare metal software development.
- Since graduating the majority of my electrical engineering and software development knowledge has been gained through reading and self-teaching I have done on my own time. While I did have a mentor at work (the engineer who ended up getting reassigned), the vast majority of what this engineer was able to provide was a demonstration of the multitude of ways things shouldn't be done.
Organization/Team
The team I am a member of provides engineering, logistics, and user support services for a control system used by another component of the wider organization. The control system we support is low volume (there are about 60 currently operational installations) and long lived (some installations have been in use for more than 25 years). Reliability and safety are major requirements. There have been 5 major generations of this control system:
- The first generation was designed in the mid 1970s, installed in the late 1970s - early 1990s, and retired in the mid/late 1990s.
- The second generation was designed in the late 1980s, installed in the mid/late 1990s, and is still operational. This generation is currently receiving a form/fit/function replacement that addresses obsolescence and supportability issues, and will need to remain operational for another 15 years.
- The third generation was designed in the late 1990s, installed in the early 2000s, and is still operational.
- The fourth generation was designed in the early 2000s, began to be installed in the mid 2000s (new installations are still occurring), and is still operational. This generation is receiving regular upgrades that address obsolescence and deliver improved capability.
- The fifth generation's design was recently completed, and preproduction validation is currently underway. This generation is a form/fit replacement of the third and fourth generations, will be used for future new installations, and will receive regular upgrades just like the fourth generation.
The team's tasking includes:
- Design and development of both hardware and software
- Acceptance testing of newly delivered hardware
- System installations and upgrades
- Local/remote system troubleshooting
- Component repair
- End user training and tech support
- Spare part procurement and forecasting
- Obsolescence management
My primary tasking is design and development of both hardware and bare metal software for some of the system's circuit cards that perform real-time control functions. There is currently one other engineer on the team who works with me on this, and another engineer (the one who was reassigned) who used to. I am occasionally involved in acceptance testing, system installations/upgrades, and troubleshooting.
Hardware Development History
Second Generation Form/Fit/Function Replacement Development
The first project I was involved in after being hired was the form/fit/function replacement of the second generation system. I was hired when the project had reached the prototyping phase.
The second generation control system consists of a single control panel that provides the user interface (electroluminescent display with infrared beam touch detection plus discrete indicators and switch inputs) and houses the system electronics. The system electronics (DIP and through hole parts), excluding the touch screen display, are distributed across approximately 30 small circuit cards (each approximately 2.5 inches by 2 inches) that are connected via a wire wrap backplane. A Motorola 6809 is the brains of the system. Software was written in a combination of assembly and Pascal. A Mac Classic and another computer running Microware OS-9 are required to compile the software, split the resulting hex file into multiple hex files for burning, and burn the hex files to the EPROMs. While everything in the system is obsolete, burn-in issues with the electroluminescent displays and software supportability concerns were the major drivers of the form/fit/function replacement. The system wasn't simply replaced with one of the newer generations for two reasons: footprint differences would drive up installation costs, and the newer generation systems cost more than the replacement (including its development costs). The touch screen displays were not procurable and the original manufacturers were unwilling to support repair efforts, so it was decided to reverse engineer the display's interface to develop a replacement. This was done by another team within the organization while the team I am a member of took on the software supportability issue.
The original plan for addressing the software supportability issue was to just replace the microcontroller and EPROM circuit cards with a modern microcontroller. The engineer doing the electronics work selected an Intel 8051 based microcontroller (Atmel AT89C51) as the replacement because "the 8051 is an 8-bit microcontroller with a 16-bit external address space just like the 6809". This effort quickly stalled due to a lack of documentation for a variety of ICs used in the system and their tight coupling to the 6809. At this point it was decided that the entire wire wrap backplane and associated circuit cards would be replaced by a single monolithic modern circuit card (approximately 14 inches by 8 inches and 16 layers) but the microcontroller selection was not reevaluated. No other engineers on the team at the time these decisions were made had the background to meaningfully evaluate the merit of these decisions.
Because the standard version of EAGLE did not support layout of a circuit card of the required size, and the engineer on the team had no board layout experience, the schematic was developed in house (using the engineer's personal copy of OrCAD instead of EAGLE) and PDFs were passed on to a support contractor to do the layout in a different tool. Any design decisions, calculations, and simulations that went into the development of the schematic were not captured, and the schematic was illegible. No review of the layout was performed before the first prototype was ordered.
The first prototype was delivered about 2 weeks after I started, and I immediately became involved in its testing alongside the engineer who designed it (my "mentor"). This initial prototype only had the digital section populated because the design required the digital portion to be working in order to test the other sections. Testing uncovered numerous issues, including mismatches between the schematic and board layout, and a general lack of decoupling that prevented even the digital section from being fully tested. It ended up taking 6 prototypes and 1 calendar year of effort to get the circuit card to a (barely) functional state. Multiple multi-day manual reviews were required before each prototype was ordered just to ensure the schematic and board layout matched. At one point it was discovered that the engineer had lost configuration control of the schematic, and the schematic had to be reverse engineered from the board layout. After installation had started, a hardware race condition was discovered that had to be corrected. Any time something goes wrong with one of these circuit cards, it is next to impossible to diagnose.
On the software side of things, the engineer insisted on the use of assembly, which I went along with at the time since I didn't know any better. Additionally, the engineer insisted on restricting the project to a single source file so that we could fall back on a legacy toolchain they had if necessary. The single source file restriction was eventually abandoned, but I failed to convince management that we should switch to C for the project. The source code ended up being approximately 55,000 lines (including comments and whitespace), the vast majority of which was written by me, and I ended up with a much greater appreciation for compilers.
Fifth Generation Design/Development
Design/development of circuit cards for the fifth generation system began while testing of the second generation form/fit/function replacement was still in progress. My involvement in this project was limited until development of the second generation form/fit/function replacement was complete.
Going into this project, it was decided to develop both the schematic and board layout for the custom 3U CompactPCI cards in house using Cadsoft EAGLE. I pushed for automating parts list generation and using git for configuration control, but this was ignored. Design decisions, calculations, and simulations that went into the development of the schematic were once again not captured, and the schematic was once again illegible. Parts list issues were encountered during ordering of the first prototype. Issues with the circuit card layout, such as silk on copper, were identified by the circuit card manufacturer. Various issues were encountered during testing, including multiple drivers attempting to drive the same signal at power on, but nothing was anywhere near as bad as on the other project. These kinds of issues continued when the second prototype was ordered, and a lack of configuration control made it impossible to know exactly what was ordered.
When it came time to order a third prototype of this card and the first prototype of another, parts list issues led to multiple ordering delays, which resulted in management having another engineer perform a review. The engineer performing the review was familiar with EAGLE and started by running DRC, which identified more than 2000 errors. The vast majority of these errors were caused by the original engineer not knowing how to set a pad size in EAGLE (one of the first things covered in any EAGLE tutorial), so they instead just added copper around the pad, which generates overlap errors. One severe issue was found buried in all this noise: a short between a power rail and ground. When these issues were reported to management, the original engineer was reassigned. The engineer who performed the review and I then took over.