Difference between mega88 and mega48 causing problem?

We have a product that runs a DC motor using PWM for speed control, and keeps track of motor position using a magnet/hall sensor encoder. The product was developed using mega88, but was ported recently to mega48.

The AVR is configured to run at 8 MHz on the internal RC oscillator with OSCCAL set to the value calibrated using the AVR053 app note procedure. The hall sensor inputs trigger a pin change interrupt to start a position "reading", but the I/O pins are debounced before their state is read, and the pin change interrupt is masked during the debounce period. Timer0 is used for PWM, and Timer2 is used with its overflow interrupt for a heartbeat. The analog comparator is polled (its interrupt is disabled).

During the transition to mega48 we were seeing the position drift between motor runs in cycle testing, and we believe we've narrowed it down in the most recent test to PCBs that have the exact same hardware except for the mega48/mega88 change. We tested about 16 boards, about half with mega48 and half with mega88, and burned the exact same code except for 1) using the appropriate include file for the m48 or m88, 2) eliminating an EEARH reference in the mega48 revision, and 3) changing the location where OSCCAL is loaded from. The PCBs with mega48 drift significantly and the PCBs with mega88 do not.

Does anybody know of any differences (documented or otherwise) between mega48 and mega88 that could account for this difference? I've read through the sticky thread on undocumented errata and nothing sticks out glaringly.

Thanks,
Mark

Are you using C for programming?

Nothing as silly as running out of RAM with the smaller chip?

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

Quote:
Nothing as silly as running out of RAM with the smaller chip?

Usually running out of RAM means program crash.

Is the M48's flash almost full? If so, perhaps the compiler is optimizing for code size to fit the smaller part, and some time-sensitive loops take longer as a result.
George.

angelu wrote:
Quote:
Nothing as silly as running out of RAM with the smaller chip?

Usually running out of RAM means program crash.

And exactly what form would that crash take?

All one can say is that running out of RAM can cause the program's execution to become erratic. Symptoms may be as mild as corrupted data, or as severe as returning to random locations in flash due to a corrupted stack, resulting in an apparent reset.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

Exactly this form:

Quote:
All one can say is that running out of RAM can cause the program's execution to become erratic. Symptoms may be as mild as corrupted data, or as severe as returning to random locations in flash due to a corrupted stack, resulting in an apparent reset.

I just want to add that if the uC is driving power semiconductors, the effects can be disastrous.
For example, if the uC is driving the motor, the damage happens so fast that power transistors, drivers and the uC itself can go up in smoke, and it is very hard to work out the order of events.
But a mere drift from the exact position is not a symptom of running out of RAM.
George.

But my point, which you seem to have missed, is that the "crash" could still be a fully operating program, with the only apparent error being in the data (position in this case). The program need not crash in a disastrous manner. So I am saying that yes, a memory overflow could result in exactly this type of error.

Having said that, we would need to have much more detail to be able to answer more thoroughly.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

I stopped using the internal RC oscillator on the mega48 because of its drift with temperature. I can't say that it's any different from the mega88, but if timing is critical, stay away from the RC.

official AVR Consultant
www.veruslogic.com

^^ absolutely. I have had several cases of unusual program behaviour that occurred because stack was overwriting a data array on a small device. The effects were relatively minor but the chip doesn't "crash" as such, it simply continues executing with garbage data and as a consequence the decisions it makes are incorrect.

Micros don't give you a bluescreen when they crash. They either carry on executing despite garbage data or, if things go really wrong, the program counter goes somewhere interesting, eventually wraps around to the reset vector, and your micro restarts.

I actually wrote the code the OP is talking about. It is entirely in assembly language and has been tested a lot on boards using the mega88, so I am sure there are no memory overflows. The processor runs a DC motor using a P+N MOSFET H-bridge, with the N FETs being PWM'd to keep a constant speed derived from the quadrature hall sensors. That is all working well. The position is also determined from the hall sensors. The pin change interrupt is used to start a debounce timer for the hall sensors, the interrupt is turned off during that debounce time, then the encoder is read to determine direction.
The RC oscillator can be off a few percent and not really hurt anything.

The product cycle test is: move to a position and stop (apply brake), then power is removed. Apply power again, which causes a move the other way, back to the start position, brake, off. Over and over.
The problem occurred when we changed to the mega48: the stop positions slowly drift in the direction of the motor torque.
We are hoping someone out there can say that the mega48 handles something differently in timing or latency, or anything else that might affect an interrupt and cause this error.
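
For readers following along, the read-after-debounce step described above can be sketched in C. This is not the code from the product (that is in assembly and is not shown in the thread); the table, the `quad_decoder` struct and the function names are assumptions made for illustration, and it assumes every quadrature edge is counted. What it illustrates: if an edge arrives while the pin change interrupt is masked, the next read sees both hall bits changed, the direction of that step cannot be recovered, and the count no longer matches the shaft.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step table indexed by (previous_state << 2) | new_state, where a state is
 * the 2-bit value (A << 1) | B. +1 and -1 are valid single steps, 0 means
 * no change, and 2 marks a double step: both bits changed, so an edge was
 * missed and the direction is unknown. */
static const int8_t QUAD_STEP[16] = {
     0, +1, -1,  2,
    -1,  0,  2, +1,
    +1,  2,  0, -1,
     2, -1, +1,  0
};

typedef struct {
    uint8_t  state;     /* last debounced 2-bit hall state */
    int32_t  position;  /* accumulated counts */
    uint32_t missed;    /* double steps seen (direction unknown) */
} quad_decoder;

void quad_init(quad_decoder *d, uint8_t initial_state)
{
    assert(d != NULL);
    d->state = initial_state & 3u;
    d->position = 0;
    d->missed = 0;
}

/* Call once per debounced reading of the two hall inputs. */
void quad_update(quad_decoder *d, uint8_t new_state)
{
    assert(d != NULL);
    new_state &= 3u;
    int8_t step = QUAD_STEP[(d->state << 2) | new_state];
    if (step == 2) {
        d->missed++;            /* cannot tell +2 from -2: count it as an error */
    } else {
        d->position += step;
    }
    d->state = new_state;
}
```

Logging something like `missed` on both chips during the cycle test might show whether the mega48 boards are skipping edges during the debounce window.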

Why try to guess what's happening inside the "black box" when the 48/88 have debugWIRE? Just connect a Dragon or JTAGICE mkII and observe what is causing the problem.

Quote:
has been tested a lot on boards using the mega88 so I am sure there are no memory overflows.
But the point we are trying to make is that the M48 has only HALF the amount of RAM of the M88.

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

John,

Since asm programmers generally don't dynamically create stack-based variable frames (like C compilers sometimes do), presumably the only danger is that the CALL/RET stack at RAMEND might be growing down and hitting the last allocated variable. I'd have thought it should be fairly easy in a 4K application to determine what the deepest CALL/RET nesting is and hence what the stack requirement would be.

Again, if a dW interface were used a useful technique would be to flood SRAM with a known value, run for a while and then determine that there are still some bytes holding that value beneath the stack.
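
A sketch of that flood-and-check technique, written in C against a plain byte array standing in for SRAM so it can be tried on a PC. The pattern value `0xA5` and the function names are assumptions, not anything from the original code; on the real part the fill would be done at startup and the memory inspected through debugWIRE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAINT 0xA5u  /* assumed fill value; any byte unlikely to be pushed works */

/* Flood the whole RAM image with the known pattern before the app runs. */
void sram_paint(uint8_t *ram, size_t len)
{
    assert(ram != NULL);
    memset(ram, PAINT, len);
}

/* Count the untouched bytes between the end of the variables (first_free)
 * and the deepest point the stack reached. The stack grows down from the
 * top of RAM, so the untouched bytes sit just above the variables. */
size_t stack_headroom(const uint8_t *ram, size_t first_free, size_t len)
{
    assert(ram != NULL && first_free <= len);
    size_t untouched = 0;
    for (size_t i = first_free; i < len && ram[i] == PAINT; i++) {
        untouched++;
    }
    return untouched;
}
```

A result of zero would mean the stack has reached the variables. A stack byte that happens to equal the pattern can make the headroom look slightly larger than it is, so a small safety margin is sensible.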

Agree, but we just don't know how the code is written. So the method you describe above would work to make sure that nothing is amiss there.

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

The code uses 5 RAM variables using a grand total of 5 bytes, so we're definitely not running out of space there. All other variables are in registers. There is a circular buffer that stores position in EEPROM, and that buffer is mirrored in RAM. It's conceivable there's some issue in the position buffer code, but that is highly unlikely given 1) the buffer code is used in many of our products and is one of the most thoroughly tested pieces of code we use, and 2) the symptom is a gradual drift rather than a drastic displacement.

Regarding the possibility of the stack overwriting system variables, the app leaves 256 bytes available to the stack on the mega48. As clawson pointed out, only call/return balancing and nesting is really a potential issue here, as well as a handful of push/pop pairs for preserving registers in subroutines or ISRs.

FYI the size of the code when assembled is actually sub 2K.
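
The arithmetic behind the RAM figures can be written down as a quick check. The SRAM sizes are the datasheet figures (512 bytes on the mega48, 1024 on the mega88); the 5 bytes of variables and the 256-byte stack reserve are the numbers quoted above, and the function name is hypothetical. Whatever is left over is the most the RAM mirror of the position buffer can occupy.

```c
#include <assert.h>

#define MEGA48_SRAM_BYTES 512
#define MEGA88_SRAM_BYTES 1024

/* Bytes left for buffers once the fixed variables and a stack reserve are
 * set aside. Returns -1 if the budget is already exceeded. */
int sram_left(int sram_bytes, int variable_bytes, int stack_reserve_bytes)
{
    assert(sram_bytes > 0 && variable_bytes >= 0 && stack_reserve_bytes >= 0);
    int left = sram_bytes - variable_bytes - stack_reserve_bytes;
    return left < 0 ? -1 : left;
}
```

With these figures, `sram_left(MEGA48_SRAM_BYTES, 5, 256)` gives 251 bytes on the mega48 and `sram_left(MEGA88_SRAM_BYTES, 5, 256)` gives 763 on the mega88.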

Quote:
EEAR8 is an unused bit in ATmega48 and must always be written to zero.

Quote:

2) eliminating an EEARH reference in the mega48 revision,

Just on a whim, you might try putting EEARH setting back in, in case EEAR8 "floats".

(let's see how CV does it...) Never mind--CodeVision also leaves it untouched in a Mega48 and I've moved several/many apps '48<=>'88 without problems.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.