Lessons learned and a couple of questions


Gents,

I recently fought a few hardware/software battles and actually won! Don't ask me how... But I learned a few things (that I KNOW most of you 1000+ post counters already know!) that I want to share. First, a simple description of the project:

1) ATmega328 controller for my aquarium.
2) Drives 3 external high voltage (62V) LED drivers via PWM. 180W of LED power!
3) DS1307 RTC onboard and communicating via I2C
4) External 20x4 LCD display that communicates via I2C
5) Several Dallas 1-wire DS18B20 temperature sensors (water, free air, hood air, pump temperature, etc)
6) LED drivers on the back on the left, controller on the back on the right and the LCD display on the front on the left.

The first problem was found because of the second problem:

The watchdog timer was NOT resetting the AVR even though it had "locked up".

Cause of problem: the watchdog reset was inside an interrupt, not in the main loop (or another critical loop)! An interrupt can keep firing, and keep resetting the watchdog, even when the main loop has stalled. Moving the watchdog reset inside a fast and critical function caused the AVR to actually reset when the second issue occurred!

Watchdogs are nice, but know what you are doing with them. I didn't and it definitely could have been a problem on the fully implemented controller (heater anyone?).

The root cause of the "lockup" took way longer to figure out than I want to admit to. Especially because all the hardware worked PERFECTLY on the bench for over a month before I mounted the lighting system. Which was when these lockups started.

Cause of the problem: The cable carrying the I2C signal to the LCD had to traverse the length and width of the hood. To minimize flexing of the cable when the hood was opened and closed, I ran the cable right along the hinge joint for nearly the width of the hood. That put the cable alongside nearly the full length of the LED arrays, right between two of them.

When the LEDs approached 90% or more PWM (I used a sine wave for dusk-to-dark lighting), the arrays generated enough noise on the I2C line to cause the RTC (on the same I2C bus as the LCD) to lock up. The RTC generated a 1 second square wave output that was used to set several different flags that told the main loop when to do things... No flags, nothing for the AVR to do, so it just sat there doing, well, nothing. Which looks a whole lot like a "lockup"!

Solution: Reroute the LCD cable to keep it as far from the LCD arrays as possible. My attempt to keep the LCD cable from being a long term problem (breaking wires) led to a huge short term problem.

This does lead to one question I have: Does anyone have any recommendations for how to filter the signal and power lines on something like this to minimize noise issues? Would a simple RC filter where everything comes into the board work in "most/many" cases? Better ideas?

Does anyone have any recommended online resources for dealing with these types of noise issues?

Thanks for listening! I really had to get that off my chest for some reason!

 

Clint


5V/.02A is 250 ohms. Put 250 ohm pullups on the I2C lines. Low impedance is harder to induce noise on?

Imagecraft compiler user


I2C is prone to lockups, especially when you run a length of wire. Implement a timeout so that if you don't get a response in X time, you reset the I2C state and retry. You could keep a fail count in EEPROM. Similarly with the RTC interrupt: put a timeout on it as well. You could have an LED flash out a code sequence to inform you of the failure.
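
The timeout-and-retry idea could be sketched roughly as follows in C. This is only a sketch under assumptions: `i2c_wait_with_retry`, the `poll`/`reset` hooks, `i2c_stats_t` and the `fake_bus_t` test double are hypothetical names made up for illustration, not part of any AVR library. On a real ATmega328 the hooks would wrap whatever TWI driver is in use; here a simulated bus stands in so the logic can be tried on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks supplied by the application's own I2C driver. */
typedef bool (*i2c_poll_fn)(void *ctx);   /* true once the transfer completed */
typedef void (*i2c_reset_fn)(void *ctx);  /* re-initialise the bus state      */

typedef struct {
    uint16_t timeouts;  /* attempts that ran out of polls            */
    uint16_t failures;  /* transfers abandoned after all retries     */
} i2c_stats_t;

/* Poll for completion at most max_polls times per attempt. On timeout,
 * reset the bus and try again, up to max_retries extra attempts.
 * Returns true on success, false if every attempt timed out. */
bool i2c_wait_with_retry(i2c_poll_fn poll, i2c_reset_fn reset, void *ctx,
                         uint16_t max_polls, uint8_t max_retries,
                         i2c_stats_t *stats)
{
    for (uint8_t attempt = 0; attempt <= max_retries; attempt++) {
        for (uint16_t n = 0; n < max_polls; n++) {
            if (poll(ctx)) {
                return true;
            }
        }
        stats->timeouts++;
        reset(ctx);  /* leave the bus in a known state before retrying */
    }
    stats->failures++;
    return false;
}

/* Simulated bus for host-side testing: it stays stuck until it has been
 * reset resets_needed times, then completes on the first poll. */
typedef struct {
    uint8_t resets_needed;
    uint8_t resets_seen;
} fake_bus_t;

bool fake_poll(void *ctx)
{
    const fake_bus_t *bus = ctx;
    return bus->resets_seen >= bus->resets_needed;
}

void fake_reset(void *ctx)
{
    fake_bus_t *bus = ctx;
    bus->resets_seen++;
}
```

The counters in `i2c_stats_t` are what you would copy to EEPROM or show on the LCD. The poll limit is a crude stand-in for "X time"; a timer-based deadline would be the more usual choice on real hardware.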


Well, I2C is really not designed for that kind of service. How does one know this? The pull-up resistors are fairly large (in the general region of 1K-4.7K, IIRC). I2C uses open drain transistors to pull the line to ground. Open drain architecture is an invitation for noise. Much better when a line is actively pulled low or high.

Much better suited is RS422/485. First it is differential which tends to be less sensitive to noise. Second, it is terminated in about 100 ohms. This lower resistance makes it more difficult to induce noise.

There are a couple of things you could do to help your current situation. One is to make each line (clock and data) a twisted pair. This means two ground wires but that is OK. This may increase line capacitance, which MIGHT mean slowing down your clock.

It is quite probable that the coupling mechanism is magnetic rather than electric. Magnetic coupling happens when there are open loops and the twisting will help this. Shielding rarely helps magnetic but will help electric. Have you tried shielded wires (twisted pairs)?

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net


Quote:
Solution: Reroute the LCD cable to keep it as far from the LCD arrays as possible. My attempt to keep the LCD cable from being a long term problem (breaking wires) led to a huge short term problem.

I assume you are referring to re-routing the I2C cable and keeping it away from the LED cable??

Twisting the I2C cable with a fairly fine twist will improve things. Using shielded cable would improve the situation too!

Where did the 20 mA come from Uncle Bob? I assume that there would have been 2.2K (or so) pull ups on the bus already for it to work. 2200//250 might be a little low?
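
To put numbers on that concern, here is a small back-of-the-envelope sketch in C. The function names are made up for illustration, and it assumes the worst case where the device pulls the line all the way to 0 V.

```c
/* Equivalent resistance of two pull-ups in parallel, in ohms. */
double parallel_ohms(double r1, double r2)
{
    return (r1 * r2) / (r1 + r2);
}

/* Current a device must sink to hold the line low, in mA.
 * Assumes the low level is 0 V, which slightly overestimates. */
double sink_current_ma(double vdd_volts, double pullup_ohms)
{
    return vdd_volts / pullup_ohms * 1000.0;
}
```

With an existing 2.2K pull-up and an added 250 ohm one, `parallel_ohms(2200.0, 250.0)` comes out near 224 ohms, and `sink_current_ma(5.0, 224.5)` is a little over 22 mA per line.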

Edit. Double post with Jim!
I actually took the assumption that it is more likely to be electric field induction Jim. Hence the shielding suggestion.

PS.
A few years ago, I was asked to troubleshoot a large pipe organ, which had been modified to operate with electronics driving solenoids. There were 100's of solenoids. Every once in a while the system would crash when stops were being changed.
Communication with the keyboard was over 10-wire IDC cables about 5 meters long. After checking lots of things, I finally decided to twist these cables with a twist every inch or so. Never played up again! It was actually not fun working inside the organ!

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?


RRRoamer!
Congrats on identifying and solving the problems. Makes you feel great don't it?

Quote:
But I learned a few things (that I KNOW most of you 1000+ post counters already know!) ......

Since when does 1000+ mean you already know....WHAT?

Everyone here goofs up from time to time over the simplest things.

Great to hear you got that gizmo going. How about some pictures?

@Bob: What is up with that quip? :?

I would rather attempt something great and fail, than attempt nothing and succeed - Fortune Cookie

 

"The critical shortage here is not stuff, but time." - Johan Ekdahl

 

"Step N is required before you can do step N+1!" - ka7ehk

 

"If you want a career with a known path - become an undertaker. Dead people don't sue!" - Kartman

"Why is there a "Highway to Hell" and only a "Stairway to Heaven"? A prediction of the expected traffic load?"  - Lee "theusch"

 

Speak sweetly. It makes your words easier to digest when at a later date you have to eat them ;-)  - Source Unknown

Please Read: Code-of-Conduct

Atmel Studio6.2/AS7, DipTrace, Quartus, MPLAB, RSLogix user


Quote:

No flags, nothing for the AVR to do, so it just sat there doing, well, nothing. Which looks a whole lot like a "lockup"!

https://www.avrfreaks.net/index.p...
I posted about watchdog and noise in that post. But indeed, most of my apps have a "simple" WDR: done when a single "good" thing happens.

More robust is to code so that the WDR is only done when all critical things are occurring--an AND condition. E.g. The main loop is being traversed AND the timer tick is firing AND ADC interrupts are firing AND ... [rest depends on app] .

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Thanks for all the feedback and the very helpful links! I definitely plan on implementing a bit more robust WDT in the near future.

Thankfully, the system has been 100% reliable for the last 5 days since I moved that I2C cable away from the LED arrays. I also plan on putting a smaller (2.2k versus 4.7k) pullup on the I2C data line to make it just a little bit more robust. It probably won't make TOO big a difference, but it would make me feel a bit better!

I do like the idea of using shielded cable as well. Of course, I don't have any on hand, so I will need to pick some up soon.

 

Clint


I'm not sure if it is still "soon", but I DID finally install shielded cable. I had been running on two LED arrays, and when I brought the third array back online, the lock ups started again. Not as bad as before, but any lockup is unacceptable.

I installed shielded cable going to the LCD display and did my best to route the thing as far away from the arrays as possible. No lockups since. On the bright side, the changes I made to the watchdog timer did appear to do their job, rebooting things when the RTC locked up.

 

Clint


Clint, if possible, you might want to keep a count in EEPROM of how many watchdog reboots you have. Watchdogs are good at covering up problems which really should be resolved. I2C is notorious for lockups, so you would want to have a timeout on the I2C operations. The lockups are generally due to interference: a glitch at the wrong time upsets it. Running I2C on a cable is asking for trouble. Like many things, 'working' and 'robust' are two different words. With async serial comms, we expect errors so we add error checking mechanisms. With any sort of error strategy, the first question is: can we detect it? If you can detect it, then you can take steps to mitigate the problem. E.g. with your RTC, if you don't get a response in the required time, then it has failed. You can hit the watchdog or you can take steps in your software to recover the situation without having to reboot. You can even keep a count of the failures and put a message on the LCD to tell you there is a problem. You can then collect statistics: how many failures per day? If you make a change, you can then see if it is better or worse based on the numbers rather than guessing.
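
A rough sketch of the reboot counter, written so the logic can be tried on a PC. Assumptions: `record_reset_cause` and `fake_eeprom_count` are hypothetical names, and the EEPROM is simulated by a plain variable. On the real ATmega328 the value passed in would be a copy of `MCUSR` taken early at startup, and the read and write would go through avr-libc's `eeprom_read_word()` and `eeprom_update_word()`.

```c
#include <stdint.h>

#define WDRF_BIT 3u  /* watchdog reset flag position in MCUSR (ATmega328) */

/* Stand-in for one EEPROM word; erased EEPROM reads back as 0xFFFF. */
static uint16_t fake_eeprom_count = 0xFFFF;

/* Call once at startup with the saved reset-cause register.
 * Returns the total number of watchdog reboots recorded so far. */
uint16_t record_reset_cause(uint8_t mcusr_value)
{
    uint16_t count = fake_eeprom_count;  /* AVR: eeprom_read_word(...) */

    if (count == 0xFFFF) {
        count = 0;                       /* first boot after an erase */
    }
    if ((mcusr_value & (1u << WDRF_BIT)) != 0 && count < 0xFFFE) {
        count++;                         /* this boot was a watchdog reset */
    }

    fake_eeprom_count = count;           /* AVR: eeprom_update_word(...) */
    return count;
}
```

Showing the returned count on the LCD gives the "numbers rather than guessing" comparison before and after a wiring change.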


It appears the noise was causing the DS1307 to lock up, not the AVR. But the RTC was generating a 1Hz signal that was used to set the task flags for the AVR (nothing going on that had to be done faster than once per second).
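
One way to notice a missing 1Hz signal, following the earlier suggestion to put a timeout on the RTC interrupt, is to time it against an independent time base such as one of the AVR's own timers. A minimal sketch, with hypothetical names and an assumed 1.5 second limit:

```c
#include <stdbool.h>
#include <stdint.h>

#define TICK_TIMEOUT_MS 1500u  /* assumed limit: 1.5x the nominal 1 s period */

typedef struct {
    uint16_t ms_since_tick;  /* time since the last 1 Hz edge was seen   */
    uint16_t missed_count;   /* how many times the tick has gone missing */
    bool     fault;          /* true while the tick is overdue           */
} tick_monitor_t;

/* Call from the code that sees the RTC square-wave edge. */
void tick_monitor_on_rtc_edge(tick_monitor_t *m)
{
    m->ms_since_tick = 0;
    m->fault = false;
}

/* Call from an independent time base with the milliseconds elapsed. */
void tick_monitor_on_elapsed(tick_monitor_t *m, uint16_t elapsed_ms)
{
    if (m->fault) {
        return;  /* already reported; wait for the next edge */
    }
    uint32_t total = (uint32_t)m->ms_since_tick + elapsed_ms;
    if (total >= TICK_TIMEOUT_MS) {
        m->ms_since_tick = TICK_TIMEOUT_MS;
        m->fault = true;
        m->missed_count++;
    } else {
        m->ms_since_tick = (uint16_t)total;
    }
}
```

When `fault` goes true the main loop could try to re-initialise the DS1307, show a message, or simply stop servicing the watchdog.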

Unfortunately, the LCD display driver I am using is I2C as well and the display needs to be mounted almost 6 feet from the controller (originally, they were going to be much closer, but plans changed...). The length plus the fact that the cable has to run near very high power LED lighting (180 watts worth) leads to a lot of noise on a bus not designed for it. Apparently, it is enough noise to really piss off the DS1307!

I was thinking about it last night and I think my longer term plan is to build my own LCD driver from an AVR and use RS485 instead of I2C. That should directly address the root cause of the problem (I2C noise) by using a more robust comms for the LCD AND by getting the LCD comms off the I2C bus and away from the RTC.

Plus, it gives me a reason to work with RS485 as well as processor to processor communications.

 

Clint


Well, you can certainly lower the I2C resistors, both of them. By specification, I2C chips should be able to sink 3mA, which makes the minimum pull-up about 1.67 kohms in a 5V system. I use 1.8k resistors daily with several meters of (shielded) cable, and it just works.

It is hard to say if wire capacitance is a big problem for you, because that would be seen on an oscilloscope by looking at the rising edge. If the wire is too long and adds too much capacitance, you must change to stronger pull-ups or slow down the clock frequency.

The capacitance is a problem, if the rising edge is too slow. I guess you use I2C speeds below 100kHz, so 1us rise time should suffice as per the specification, but some devices can still require the 300ns rise time.
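
As a rough worked example, the two limits on the pull-up can be computed like this. The formulas are the usual ones from the I2C-bus specification: the minimum comes from the sink current at the maximum low-level output voltage (0.4 V at 3mA, which gives a slightly lower figure than simply dividing 5V by 3mA), and the maximum comes from the 30% to 70% rise time of an RC charge, t = 0.8473 x R x C. The function names and the example bus capacitance are assumptions for illustration only.

```c
/* Smallest pull-up the devices can drive, in ohms:
 * Rp(min) = (VDD - VOL_max) / IOL */
double pullup_min_ohms(double vdd_volts, double vol_max_volts, double iol_amps)
{
    return (vdd_volts - vol_max_volts) / iol_amps;
}

/* Largest pull-up that still meets the rise time, in ohms:
 * Rp(max) = t_rise / (0.8473 * C_bus), where 0.8473 = ln(0.7 / 0.3) */
double pullup_max_ohms(double rise_time_s, double bus_capacitance_f)
{
    return rise_time_s / (0.8473 * bus_capacitance_f);
}
```

For a 5V bus, `pullup_min_ohms(5.0, 0.4, 0.003)` is about 1.53k. With a 1us rise time and an assumed 400pF of total bus capacitance, `pullup_max_ohms(1e-6, 400e-12)` is about 2.95k, so the usable window is not very wide on a long cable.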

But if you closely check the electrical specs of each I2C chip and can make the bus conditions work for all of them, it does not matter if the bus is not done strictly as per the I2C specification.

To fight off reflections, you can use series resistors from chips to wire, instead of connecting chips directly to the bus. Depending on bus pull-up value and chip IO pin strength, the resistor could be between 33R to 100R. It must still provide fast enough signal fall time with the existing capacitance.

Once I had to work with an I2C chip that was very sensitive to noise, and the chip datasheet suggested adding small capacitors directly at the chip's I2C pins, along with the series resistors I mentioned above. So basically, the chip sees an RC filtered version of the bus. I forgot if they were 22pF or 27pF, and if the resistors were 100R or something else, but something in that ballpark. The datasheet also suggested optimum pull-up resistors, which were surprisingly large, near 3k in a 3.3V system, allowing only near 1mA current.

The RTC might go haywire anyway because of EMI, not just because of I2C. The crystal circuitry is very sensitive.


theusch wrote:
Quote:

No flags, nothing for the AVR to do, so it just sat there doing, well, nothing. Which looks a whole lot like a "lockup"!

https://www.avrfreaks.net/index.p...
I posted about watchdog and noise in that post. But indeed, most of my apps have a "simple" WDR: done when a single "good" thing happens.

More robust is to code so that the WDR is only done when all critical things are occurring--an AND condition. E.g. The main loop is being traversed AND the timer tick is firing AND ADC interrupts are firing AND ... [rest depends on app] .


The way I do it (probably very common) is to set a byte or word with 1s in bit positions corresponding to important WDT-resetting actions. Each action clears its associated bit and only when the entire byte/word is 0 will the WD get reset (and the bits set for the next pass).
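
A minimal sketch of that bit-mask scheme, written so it runs on a PC. The task names, `task_checkin`, `watchdog_service` and the `wdt_kicks` counter are made up for illustration; on the AVR the marked line would call `wdt_reset()` from `<avr/wdt.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per activity that must be seen before the watchdog is reset.
 * The particular tasks are assumptions for this example. */
#define TASK_MAIN_LOOP  (1u << 0)
#define TASK_RTC_TICK   (1u << 1)
#define TASK_TEMP_READ  (1u << 2)
#define ALL_TASKS       (TASK_MAIN_LOOP | TASK_RTC_TICK | TASK_TEMP_READ)

static volatile uint8_t pending_tasks = ALL_TASKS;
static uint16_t wdt_kicks;  /* host-side stand-in for counting wdt_reset() */

/* Each critical activity clears its own bit when it has run.
 * Note: on the AVR, if this is called from both interrupts and the main
 * loop, the read-modify-write needs to be protected from interruption. */
void task_checkin(uint8_t task_bit)
{
    pending_tasks &= (uint8_t)~task_bit;
}

/* Call from the main loop. The watchdog is only reset once every bit has
 * been cleared; the mask is then re-armed for the next pass. */
bool watchdog_service(void)
{
    if (pending_tasks != 0) {
        return false;  /* something has not checked in yet */
    }
    wdt_kicks++;       /* AVR: wdt_reset(); */
    pending_tasks = ALL_TASKS;
    return true;
}
```

With this arrangement a stalled RTC tick leaves its bit set, `watchdog_service()` keeps returning false, and the hardware watchdog eventually fires even though the main loop is still running.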