TWI oddities on the AT90CAN128

I seem to have an odd problem with the TWI bus on the AT90CAN128 which I haven't seen with the same program running (with register address changes) on the Mega128.

I'm using the TWI bus for a very simple purpose: to write a setting to a Digipot chip. This requires sending only 3 bytes: address, function code and new setting - taking a couple of hundred microseconds at most. The whole TWI process happens with interrupts turned on, and it doesn't seem to matter whether I use the TWI interrupt to wake up the CPU or I simply loop until complete.

I have a debug program attached to USART 0, streaming out data at 115200 baud, and also two software UARTS at 9600 baud using timers 1 and 3. These all work fine until I use the TWI interface to send the three bytes mentioned above, at which point data gets lost on USART 0 and/or either of the two software UARTs, and data flow seems to stop, sometimes for a second or more at a time, exactly as if the processor had interrupts turned off for that time. Then the program picks up as if nothing had happened, except for some occasions when the watchdog times out and there is a reset - and of course the program gives an error if any serial data has been lost during the outage.

I would assume I was doing something silly, but the TWI routines have worked fine in several other projects, transferring a lot more data than I am attempting at the moment, and as I said the same program seems to run fine with the Mega128. Unfortunately the Mega128 is not a long-term option as at some point I will need the CAN bus (I'm not using it yet).

Has anyone else seen any funnies which seem to have been introduced into the TWI in the CAN128?


Check out page 210, section 18.5.2 (Bit Rate Generator Unit) of the current AT90CAN32/64/128 data sheet. It has this note:

Quote:
Note: TWBR should be 10 or higher if the TWI operates in Master mode. If TWBR is lower than 10, the master may produce an incorrect output on SDA and SCL for the reminder of the byte. The problem occurs when operating the TWI in Master mode, sending Start + SLA + R/W to a slave (a slave does not need to be connected to the bus for the condition to happen).

I did not find this in the ATmega128 data sheet. So far, why this TWBR note only appears in a few AVR data sheets has been an unanswered mystery. As far as I know, no one has asked ATMEL (or at least reported back any information to AVRfreaks).

Still, this does not sound like a probable source of your problem, but it is a documented TWI difference.

An unanswered TWINT flag in the TWCR register will stretch SCL, and this flag is not automatically cleared by hardware when the interrupt routine executes. If the TWIE bit is set, an interrupt will be generated continuously for as long as the TWINT flag is set. If your TWI does not need interrupts, make sure you never set the TWIE bit. In any case the USART and timers have a higher interrupt priority than the TWI, so an out of control constant TWI interrupt should not be able to block these interrupts. However, after the interrupt return (reti instruction) one instruction of the main program is executed before the next pending interrupt is served, which means the non-interrupt code only gets one instruction executed for every rogue repeated interrupt response. Even though TWI cannot block the USART or timer interrupts, it can force the program code to run so slowly that the code cannot get anything done after a USART or timer interrupt. Off the top of my head, an out of control TWI interrupt is the only thing that I can think of that might do this.


Interesting: that same quote appears in my mega128 datasheet (admittedly it's an old copy), so it appears to have mysteriously disappeared over time. (The note was added between revisions C and D of the datasheet according to the change log; the new datasheet's change log does not go back that far, nor does it show the note being removed.) Either this requirement has changed (doubtful), or it was accidentally deleted in a revision, and has since been copied into other datasheets.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


Mike B wrote:
Check out page 210, section 18.5.2 (Bit Rate Generator Unit) of the current AT90CAN32/64/128 data sheet. It has this note:
Quote:
Note: TWBR should be 10 or higher if the TWI operates in Master mode. If TWBR is lower than 10, the master may produce an incorrect output on SDA and SCL for the reminder of the byte. The problem occurs when operating the TWI in Master mode, sending Start + SLA + R/W to a slave (a slave does not need to be connected to the bus for the condition to happen).

I did not find this in the ATmega128 data sheet. So far, why this TWBR note only appears in a few AVR data sheets has been an unanswered mystery. As far as I know, no one has asked ATMEL (or at least reported back any information to AVRfreaks).

TWBR is set to 11. The module was originally written for the Mega168, and several Mega128 modules seem to have been copied to the CAN128. This could explain the different behaviour from the Mega128, but not why my code works fine for the Mega168 and not the CAN128. Setting TWBR too low might result in bad timing and so incorrect response from the slave device, but this is not happening. The TWI transaction is working just fine - it is the rest of the interfaces which seem to be going mad.
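
For reference, the SCL rate that a given TWBR value produces follows the datasheet formula f_SCL = f_CPU / (16 + 2 * TWBR * 4^TWPS). The following is only a sketch: the function name is made up for illustration, and the prescaler setting is not stated in the post, so TWPS = 0 is an assumption. The 14.7456MHz clock is the crystal frequency mentioned later in the thread.

```c
#include <stdint.h>

/* SCL frequency in Hz for a given CPU clock, TWBR value and TWPS (0..3).
 * Formula from the AVR TWI bit rate generator description:
 *   f_SCL = f_CPU / (16 + 2 * TWBR * 4^TWPS)
 * Integer division, so the result is rounded down. */
static uint32_t twi_scl_hz(uint32_t f_cpu, uint8_t twbr, uint8_t twps)
{
    uint32_t prescale = 1UL << (2 * (twps & 3));   /* 4^TWPS: 1, 4, 16, 64 */
    return f_cpu / (16UL + 2UL * twbr * prescale);
}
```

Under those assumptions, twi_scl_hz(14745600UL, 11, 0) comes out at about 388 kHz, so a three-byte write (27 clock periods plus start and stop) should occupy the bus for well under 100 microseconds.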

Quote:
Still, this does not sound like a probable source of your problem, but it is a documented TWI difference.

An unanswered TWINT flag in the TWCR register will stretch SCL, and this flag is not automatically cleared by hardware when the interrupt routine executes. If the TWIE bit is set, an interrupt will be generated continuously for as long as the TWINT flag is set. If your TWI does not need interrupts, make sure you never set the TWIE bit. In any case the USART and timers have a higher interrupt priority than the TWI, so an out of control constant TWI interrupt should not be able to block these interrupts. However, after the interrupt return (reti instruction) one instruction of the main program is executed before the next pending interrupt is served, which means the non-interrupt code only gets one instruction executed for every rogue repeated interrupt response. Even though TWI cannot block the USART or timer interrupts, it can force the program code to run so slowly that the code cannot get anything done after a USART or timer interrupt. Off the top of my head, an out of control TWI interrupt is the only thing that I can think of that might do this.


I like the idea of the continuous interrupt slowing everything down, but the fact is I'm not enabling the TWI interrupt at all. If I was, then presumably the same fault would happen on the Mega128, but it doesn't.

Clock stretching is not happening - both SCL and SDA are high between transmissions. In any case, I don't see how this could hold up everything else.

I've now had exactly the same program (apart from the different register addresses) running for several hours on the Mega128 with a test loop doing a TWI send at least twice a second, with no problems. On the CAN128, even transmitting once a second was normally enough to make everything else slow to a crawl.

Very odd. Since the TWI transmission I need is very simple, I may disable the TWI controller and bang out the bits in software, but I would like to know what is happening all the same.


timbierman wrote:
Clock stretching is not happening - both SCL and SDA are high between transmissions. In any case, I don't see how this could hold up everything else.
I didn't mean it did contribute, I was just parroting part of the data sheet. The failure to automatically clear the interrupt flag was the interesting part. When you get around to using it, you will also find all the CAN IT interrupts are not automatically cleared.

Could you try a simplified test case, maybe sending only a single byte on the TWI? It will not control the POT correctly, but playing with the overall TWI execution timing and evaluating any crawl responses might yield some more information.

There is an ATmega128 to AT90CAN128 migration document (AVR096). It is longer than most, but if you take the time to go through it you might discover an important difference. Maybe something like a register bit name change that was overlooked or a side effect from some unexpected/overlooked change.
http://www.atmel.com/dyn/resourc...

A way to see the SREG register global interrupt I bit and USART/timer interrupt flag values during the TWI execution would be useful, even if it was only a few samples stored in some registers or SRAM and viewed later. Do you have any JTAGICE or ICE50 capability?


Mike B wrote:
timbierman wrote:
Clock stretching is not happening - both SCL and SDA are high between transmissions. In any case, I don't see how this could hold up everything else.
I didn't mean it did contribute, I was just parroting part of the data sheet. The failure to automatically clear the interrupt flag was the interesting part. When you get around to using it, you will also find all the CAN IT interrupts are not automatically cleared.

Thanks for the tip about the CAN interrupts. I can see there's a hell of a lot I'll need to learn about that stuff.

Quote:
Could you try a simplified test case, maybe sending only a single byte on the TWI? It will not control the POT correctly, but playing with the overall TWI execution timing and evaluating any crawl responses might yield some more information.

In a sense I've done that already. The code drives three pot chips but the circuit currently only has the first of these. If I address the second or third chip it should give up after the address byte when it doesn't get an ACK. I think the problem happens a bit less often when I do this, but it's hard to be sure.

I'm wondering whether the problem happens if the TWI is transmitting while one of the other devices is ready to interrupt, and is somehow causing an interrupt to be lost. It would be odd if this caused the interrupts of two timers and one USART all to fail. The appearance of a complete hangup could arise if it corrupts the data stream going out to my debug program, causing a bad CRC, which would make the display freeze until it got some good data. Still seems like a long shot that something like this could go unnoticed at Atmel, though.

Quote:
There is an ATmega128 to AT90CAN128 migration document (AVR096). It is longer than most, but if you take the time to go through it you might discover an important difference. Maybe something like a register bit name change that was overlooked or a side effect from some unexpected/overlooked change.
http://www.atmel.com/dyn/resourc...

Yes, read it, thanks. The only mention of TWI is that the register formats have not changed (though the addresses and interrupt vector have).

Quote:
A way to see the SREG register global interrupt I bit and USART/timer interrupt flag values during the TWI execution would be useful, even if it was only a few samples stored in some registers or SRAM and viewed later. Do you have any JTAGICE or ICE50 capability?

That may be worth a try. I'm not using JTAG or ICE, but as I said I have my own debug system. This works by repeatedly streaming the whole of memory out of USART 0 plus framing and CRC, whenever the CPU has nothing better to do. A PC program displays it all as hex and also interpreted memory locations based on the symbol table from the assembler. The result is I can view and patch memory almost in real time (it refreshes about once a second at 115200 baud), so if I want to trace a bit of code I make it store stuff in a spare bit of RAM where I can see it. That way the program is running more or less exactly as it would in real life - if anything it is stressed a bit extra by the debug routine, which makes anything that is going to go wrong more likely to do so during debugging. I've gone through the code looking for places where the interrupt enable in SREG could get turned off, but I haven't found anything. And there is still the mystery of how the problem does not occur at all on the Mega128.
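
The exact framing and CRC used by this debug stream are not given in the thread. Purely as a sketch of the idea, assuming a hypothetical frame layout (sync byte, 16-bit start address, length byte, payload, CRC) and CRC-16/CCITT-FALSE, neither of which is necessarily what the author used:

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no final XOR. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

#define FRAME_SYNC 0x7E   /* hypothetical sync marker */

/* Build one frame: SYNC, addr_hi, addr_lo, len, payload..., crc_hi, crc_lo.
 * The CRC covers address, length and payload (not the sync byte).
 * Returns the number of bytes written to out, or 0 if out is too small. */
static size_t build_frame(uint16_t addr, const uint8_t *mem, uint8_t len,
                          uint8_t *out, size_t out_size)
{
    size_t total = 4u + len + 2u;
    if (out_size < total)
        return 0;
    out[0] = FRAME_SYNC;
    out[1] = (uint8_t)(addr >> 8);
    out[2] = (uint8_t)addr;
    out[3] = len;
    for (uint8_t i = 0; i < len; i++)
        out[4 + i] = mem[i];
    uint16_t crc = crc16_ccitt(&out[1], 3u + len);
    out[4 + len] = (uint8_t)(crc >> 8);
    out[5 + len] = (uint8_t)crc;
    return total;
}
```

With this CRC variant the receiver can run crc16_ccitt over everything after the sync byte, CRC bytes included, and accept the frame only if the result is 0. A single corrupted byte on the wire then causes the whole frame to be dropped, which fits the "display freezes until it gets some good data" behaviour described earlier.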

One idea just occurred to me ... the debug routine streams out the I/O registers as part of the memory space. I deliberately avoid those registers for which reading has a side effect: UDR0, UDR1, SPDR and possibly TWDR. I wonder whether the new TWI interface has side effects from reading the other TWxx registers as well. It's a long shot, and doesn't explain why the same does not happen on the Mega128 or Mega168 (which seems to be the same TWI interface as the CAN128). Next time I try the CAN128 I'll stop it reading any of those registers and see if it helps.
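
Skipping registers during the dump could look something like the sketch below. The addresses are assumptions: they are data-space addresses for the AT90CAN128 as I read the register summary, and should be checked against the datasheet for the actual part. The function names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Data-space addresses the dump must not read because reading has a side
 * effect. ASSUMED values for the AT90CAN128; verify before relying on them. */
static const uint16_t skip_list[] = {
    0x004E, /* SPDR */
    0x00BB, /* TWDR */
    0x00C6, /* UDR0 */
    0x00CE, /* UDR1 */
};

static bool must_skip(uint16_t addr)
{
    for (size_t i = 0; i < sizeof skip_list / sizeof skip_list[0]; i++)
        if (skip_list[i] == addr)
            return true;
    return false;
}

/* Copy len bytes starting at data-space address start into out, writing a
 * filler byte instead of reading any address on the skip list.
 * base points at data-space address 0 (a host-side array when testing). */
static void dump_range(const volatile uint8_t *base, uint16_t start,
                       uint16_t len, uint8_t *out)
{
    for (uint16_t i = 0; i < len; i++) {
        uint16_t addr = (uint16_t)(start + i);
        out[i] = must_skip(addr) ? 0x00 : base[addr];
    }
}
```

To test the idea that other TWxx registers are sensitive to reads, the remaining TWI register addresses would simply be added to skip_list for one run.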

I'd like to spend ages tracking this down, but I suspect I'll have to use software TWI/I2C for now - the customer isn't paying me to tinker!


Lots of ATmega128 registers were moved to different places in the AT90CAN128, some of them across the 0x1F I/O address boundary. Not much chance, but maybe one of the new register placements makes it sensitive to being read all the time?

The USART interrupt flags RXCn and UDREn are strictly controlled by the USARTn state. The only bad thing that can happen with RXCn is that the USARTn data buffer overflows before UDRn is read. An unanswered UDREn would be bad, as the interrupt response usually disables this sooner or later (still, it could only go unanswered if higher priority interrupts were occurring or global interrupts were disabled anyway). The TXCn interrupt flag is more conventional, being set by a USARTn Tx complete and automatically cleared by executing the interrupt vector.

The timer counter interrupt flags are also more conventional in nature. The key is that setting an interrupt flag multiple times only guarantees an eventual single interrupt response to any previous interrupt event or events. These do not really lose interrupts, in the sense that the guaranteed single interrupt response is always provided; the responses can just get delayed while multiple interrupt events keep occurring. Can you tell I'm mostly rambling here :)?

The only other thing that comes to mind is a bug in a higher priority interrupt routine. This may include something like a formerly used external interrupt that is not connected to anything in the current hardware configuration and is getting stray noise based triggers. It might not be enough all by itself to cause the crawl, but it could soak up enough interrupt response availability to be the straw that helped break the camel's back.

I don't see how it would make any difference to trigger the crawl, but have you taken the new CLKPR into account and cleared the CKDIV8 fuse? If you are using external SRAM there was an old AT90CAN128 errata about not placing the stack there. BTW, I expect someone at your obvious level of competence to have read through all the AT90CAN128 data sheet errata. How is your AT90CAN128 chip marked (it might be an early engineering/evaluation chip with maximum errata including some errata that only exists on the original AT90CAN128 data sheet)?
https://www.avrfreaks.net/index.p...

Still, the observed connection between TWI activity and the crawl, with TWI interrupts not even in use, is a puzzler. I would say there was an accidental cli, except that the code runs on the ATmega128. Have you checked whether the assembler/compiler/whatever is the same version for the ATmega128 code and the AT90CAN128 code? It's a real long shot, but you may have been bitten by a compiler bug/difference or an "upgrade" of a library routine. If they use the same tool chain, try doing a fresh ATmega128 build (assuming your ATmega128 code might have been compiled a long time ago?) and quickly testing it for problems.

Please PM me and provide an e-mail address if you want a copy of my CAN on the AVR article. It is around 118 kb, which is larger than I want to try and stuff through my PM out-box.


Thanks for all the suggestions, Mike, but I can't see anything in them that could apply here. I think I'll have to do the software TWI for now, and come back to the problem when I have more time.

I was just hoping someone else had found a hardware problem with the CAN128 TWI, but it doesn't look like it. Either I have a subtle bug, or there is a hardware problem waiting to be discovered. Either way I'll report back here if I find it.

Compiler/assembler changes are not an issue. I use my own assembler for the code that needs to run fast, and my own compiler - a sort of hybrid of C and BCPL - to generate virtual machine code for the slower, user interface stuff. The only thing that changes between Mega128 and CAN128 assemblies is the file of register definitions. Unfortunately I can't do a straight byte comparison of the two binaries since the move of I/O registers into the region that needs LDS/STS instead of IN/OUT opcodes has made the CAN128 binary about 100 bytes longer (out of a total of around 50,000).
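
The size difference follows from the instruction encodings: IN and OUT are one-word (2-byte) instructions that only reach I/O addresses 0x00..0x3F (data space 0x20..0x5F), while LDS and STS are two-word (4-byte) instructions. A small sketch of the arithmetic, with invented function names, and using UDR0 (data address 0x2C on the Mega128, assumed 0xC6 on the CAN128) as the example of a register that moved:

```c
#include <stdint.h>

/* Code bytes needed to load or store one I/O or extended I/O register,
 * given its data-space address. */
static unsigned access_bytes(uint16_t data_addr)
{
    /* IN/OUT reach I/O addresses 0x00..0x3F, i.e. data space 0x20..0x5F. */
    if (data_addr >= 0x20 && data_addr <= 0x5F)
        return 2;   /* IN/OUT: one 16-bit word */
    return 4;       /* LDS/STS: two 16-bit words */
}

/* Total change in code size when n register accesses move from
 * old_addr[i] to new_addr[i]. */
static long moved_access_cost(const uint16_t *old_addr,
                              const uint16_t *new_addr, unsigned n)
{
    long delta = 0;
    for (unsigned i = 0; i < n; i++)
        delta += (long)access_bytes(new_addr[i]) - (long)access_bytes(old_addr[i]);
    return delta;
}
```

Each access that moves out of IN/OUT range costs 2 extra bytes, so a binary that grew by about 100 bytes would correspond to roughly 50 such accesses.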

Thanks for the offer of your CAN article. Any guidelines to get me started are welcome. I'll PM you an email address.


OK, I think I've got it working, but the solution is even more confusing than the original problem ...

First I disabled the TWI hardware and wrote a software loop to control SCL and SDA instead, at a modest 250 kHz clock speed. This worked fine with the Digipot on the Mega128, without causing the program to crash. Next I went back to the CAN128, expecting everything to be fine now, but no ... same problem as before! Then I noticed that the TWI sequence on the scope was taking about 350 microseconds, where it should only take about 140. Could the CPU somehow be running 2.5 times slower than the crystal speed? Why 2.5 and not, say, 2 or 4 times slower? Very odd!
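
The author's software loop is not shown (it was written in his own assembler). As a rough illustration only, a bit-banged three-byte master write could be structured as below. The pins are simulated with plain variables so the logic can be run on a PC; on the AVR, "release" would mean making the pin an input (external pull-up takes the line high) and "drive low" would mean making it an output with the PORT bit at 0. All names and the slave behaviour are assumptions, and clock stretching is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* --- Simulated open-drain pins (host-side stand-in for DDR/PIN access) --- */
static bool scl_level = true, sda_level = true;   /* true = released/high */
static bool sim_slave_acks = true;                /* pretend slave pulls SDA low on ACK */
static unsigned scl_rising_edges;

static void scl_set(bool released)
{
    if (released && !scl_level)
        scl_rising_edges++;
    scl_level = released;
}
static void sda_set(bool released) { sda_level = released; }
static bool sda_get(void) { return sim_slave_acks ? false : sda_level; }
static void i2c_delay(void) { /* on target: busy-wait about 2 us for 250 kHz */ }

static void i2c_start(void)
{
    sda_set(true);  scl_set(true);  i2c_delay();
    sda_set(false); i2c_delay();            /* SDA falls while SCL is high */
    scl_set(false); i2c_delay();
}

static void i2c_stop(void)
{
    sda_set(false); i2c_delay();
    scl_set(true);  i2c_delay();
    sda_set(true);  i2c_delay();            /* SDA rises while SCL is high */
}

/* Send one byte, MSB first. Returns true if the slave acknowledged. */
static bool i2c_write_byte(uint8_t b)
{
    for (int i = 0; i < 8; i++) {
        sda_set((b & 0x80) != 0);
        b = (uint8_t)(b << 1);
        i2c_delay();
        scl_set(true);  i2c_delay();
        scl_set(false);
    }
    sda_set(true);                          /* release SDA for the ACK bit */
    i2c_delay();
    scl_set(true);  i2c_delay();
    bool ack = !sda_get();
    scl_set(false);
    return ack;
}

/* Write address, function code and setting. Gives up after the first byte
 * that is not acknowledged. Returns the number of bytes ACKed (0..3). */
static int digipot_write(uint8_t addr7, uint8_t command, uint8_t value)
{
    uint8_t frame[3] = { (uint8_t)(addr7 << 1), command, value };
    int acked = 0;
    i2c_start();
    for (int i = 0; i < 3; i++) {
        if (!i2c_write_byte(frame[i]))
            break;
        acked++;
    }
    i2c_stop();
    return acked;
}
```

A complete write produces 28 SCL rising edges (27 data/ACK clocks plus the one in the stop condition). At 4 microseconds per clock period that is 112 microseconds, the same order as the 140 microseconds expected above once loop overhead is added.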

Still stranger, as long as I wasn't driving the TWI stuff, the baud rate and timer clocks seemed to be running at the right crystal speed (14.7456MHz) - data in and out of USART 1 at 115200 baud was working fine, the software UARTS on timers 1 and 3 (9600 baud) and two piezo sounders on timers 0 and 2 all seemed fine. How could the CPU be running at a different speed from the peripherals?

Anyway, clock speed was a clue, so I looked through the data sheets and discovered that the fuse bits selecting crystal type had changed: I had CKSEL 3..0 set to 1011, as for the Mega128, when it should have been 1111 for the crystal speed I was using. Changed this and - yeah - it works!
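
To make the fuse difference concrete, here is a small decoder sketch. It assumes that CKSEL3..0 is the low nibble of the fuse low byte, and that on the AT90CAN128 the crystal oscillator ranges are selected by CKSEL3..1 as in the table coded below. That table is my reading of the datasheet, not something stated in this thread, so verify it before relying on it. Function names are invented.

```c
#include <stdint.h>

/* CKSEL3..0: low nibble of the fuse low byte (bit pattern as written in
 * the datasheet tables, e.g. 1011 binary = 0xB). */
static uint8_t cksel(uint8_t fuse_low)
{
    return fuse_low & 0x0F;
}

/* Upper limit, in kHz, of the crystal frequency range selected by
 * CKSEL3..1 on the AT90CAN128. ASSUMED table; returns 0 if the setting
 * does not select the crystal oscillator. */
static uint16_t can128_xtal_max_khz(uint8_t cksel_bits)
{
    switch ((cksel_bits & 0x0F) >> 1) {
    case 0x4: return 900;     /* 0.4 - 0.9 MHz  */
    case 0x5: return 3000;    /* 0.9 - 3.0 MHz  */
    case 0x6: return 8000;    /* 3.0 - 8.0 MHz  */
    case 0x7: return 16000;   /* 8.0 - 16.0 MHz */
    default:  return 0;
    }
}
```

If that table is right, CKSEL = 1011 selects the range meant for crystals up to 3 MHz, while 1111 selects the 8 to 16 MHz range, which would be consistent with a 14.7456MHz crystal being driven too weakly under the old setting.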

But how can this happen? I was about to write that I hadn't a clue, but I just had a flash of inspiration. When the fuses were set wrongly, the crystal was oscillating very weakly. Even attaching a 10x scope probe to either lead caused it to stop. So, what signal pins are on either side of the crystal? One side is ground - no problem - but the other side is SCL!

So, I think that is the solution. Driving the crystal too weakly was working OK as long as there weren't any changing signals close to the crystal pins, but as soon as I started driving the TWI bus the crystal oscillator turned flaky. This caused all of the baud rates and timers to go temporarily haywire, not to mention clocking the CPU with a bad waveform (which could explain the watchdog resets).

That was a really weird one. It was fun to find the answer, but I could have done without the hassle!

(Note to Atmel: The application note describing differences between the Mega128 and CAN128 (AVR096) says that CKSEL3..0 are the same in the two chips. Maybe it should be revised to show the differences.)


Congratulations, you found a tough one and your analysis sounds completely plausible to me.

I agree, the AVR096 CKSEL bits should not have been labeled "Idem" in table 8 Fuse Low Byte. Although they are still clock select bits, their setting values did change.

I think this AVR096 documentation problem sounds worthy of posting in the AVR errata sticky (you could also let ATMEL know):
https://www.avrfreaks.net/index.p...

There is no telling how many other people might get caught. Others, having trusted the ATMEL application note, may already be running weak AT90CAN clocks and just not have noticed because they don't use TWI. Even without using TWI, I suspect this could also become a problem when these chips are exposed to real world noise/interference or to variations between AT90CAN32/64/128 production runs.


I think I may have spoken too soon. The new fuse settings definitely make the CPU a lot less flaky than it was, but the crystal still seems to be being driven at very low power. The waveform on the pin adjacent to SCL has serious noise on it whenever the TWI interface is in use. This may be because I have the CPU chip in a TQFP to DIL adaptor on a breadboard, so the Xtal and SCL tracks probably run close to each other for several centimetres, but it is still worrying that the CAN128 seems to drive the crystal at a level which is seriously sensitive to noise pickup - particularly as the chip is intended for automotive applications (maybe the automotive version has a higher crystal voltage - I don't know).

The CPU will not work reliably at all in the breadboard if I use the full 16MHz crystal, and seems to be a lot more prone to falling over than the Mega128. I think I may have to abandon the on-chip crystal oscillator altogether and use an external oscillator.


There is no question about finding the problem in the migration document AVR096: Migrating from ATmega128 to AT90CAN128 (for the wider audience out there).

I would double check the crystal specifications and the oscillator safety factor. This ATMEL application note is a good reference: AVR042: AVR Hardware Design Considerations (although its information is generally more useful for finished PCB designs).
http://www.atmel.com/dyn/resourc...

ATMEL seems to always be fine tuning the oscillator designs, so I would not necessarily expect the mega128 to be a valid comparison for loading capacitor values.

There is always the possibility the crystal itself is bad (as in out of specification). Try another crystal.

Breadboards are notorious for clock noise and clock interference (a clean shielded PC board layout is not possible). Breadboard designs like the ATMEL STK500 use an external oscillator with the crystal socket, to avoid the problems of using an external crystal directly on the AVR in a breadboard wiring environment (as you are thinking of doing).

The act of probing the crystal leads itself unbalances the crystal loading, so measurements made on these pins are distorted and affect the oscillator operation (even with high quality test instruments).

Fellow AVRfreak KKP reported having the MLF version of an AT90CAN128 in a product for acceptance testing. If I remember correctly the testing facility tried out their upgraded RF amplifier and subjected the product to very strong external RF interference using an RF source of over a kilowatt. The AVR did just fine. I haven't seen/heard anything to suggest the AT90CAN128 external crystal operation is substandard, quite the opposite in fact.


The design guidelines look useful. I'll pass them on to the person doing the PCB layout, though he already has a fair bit of experience laying out AVR circuits.

I'm using 22pF loading capacitors as suggested. If I remove them at 16MHz the oscillator stops working reliably, but at 14.7456MHz it still seems OK. I've moved the crystal as close to the chip as I can, shortening the connection by 3-4cm, but that still leaves 5-6cm to the actual chip in its ZIF socket.

It's usually a good idea to make a breadboard as sloppy as possible, then the final board is more or less guaranteed to be more reliable. Of course it's no help if the breadboard design fails to run at all.

It's encouraging that you know somebody who has the CAN128 working reliably, so maybe the final board will be fine. Just in case of problems, I'm allowing an option to use an external oscillator, and I'm moving the software-emulated TWI to pins well away from the crystal. I'll make sure the pin adjacent to the crystal is one that does not change state regularly, such as maybe the one that holds on the power while the program is running. I'm also going to reduce the frequency to 12MHz provided that the program can keep up with all it has to do - it's a frequency that divides down well to both standard baud rates and CAN clock speeds.
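
How well a clock "divides down" to a baud rate can be checked with the standard AVR USART formula, baud = f_CPU / (N * (UBRR + 1)), where N is 16 in normal mode and 8 in double-speed (U2X) mode. A minimal sketch with invented function names; it assumes f_cpu is comfortably larger than N * baud:

```c
#include <stdint.h>

/* UBRR value giving the closest baud rate (divisor rounded to nearest). */
static uint32_t ubrr_for(uint32_t f_cpu, uint32_t baud, int u2x)
{
    uint32_t div = (u2x ? 8UL : 16UL) * baud;
    return (f_cpu + div / 2) / div - 1;
}

/* Baud rate error in percent for that UBRR value (positive = too fast). */
static double baud_error_pct(uint32_t f_cpu, uint32_t baud, int u2x)
{
    uint32_t ubrr = ubrr_for(f_cpu, baud, u2x);
    double actual = (double)f_cpu / ((u2x ? 8.0 : 16.0) * (double)(ubrr + 1));
    return (actual - (double)baud) * 100.0 / (double)baud;
}
```

By this calculation 14.7456MHz gives 115200 baud exactly (UBRR = 7), and 16MHz is about 3.5% slow. 12MHz gives 9600 baud within about 0.2% (UBRR = 77), but for 115200 it appears to need double-speed mode (UBRR = 12, about 0.2% fast), since normal mode would be roughly 7% off.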


We have the same problem with the TWI interface on the AT90CAN128.
We verified this problem with several samples of the AT90CAN128 (0718) and several ATmega128s,
some running on the internal oscillator at 8MHz and others with a 6MHz external crystal oscillator.

As you can see, there is no acknowledge from the slave with the AT90CAN128.

We use a Philips PCF8582AP EEPROM, and other I²C (TWI) devices do not work either.

Attachment(s): 


gottfried wrote:
We have the same problem with the TWI interface on AT90CAN128

I was confused by your images. It seems SCL was inverted.

Are you sure that there is no short circuit from SDA to another output pin?

Or is the pull-up resistor wrong?
The pull-up current should not exceed 3mA (>= 1.8k).
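
The 3mA figure is the sink current an I2C device has to handle while holding the line at a valid low level (0.4V maximum in the I2C specification for standard and fast mode). A quick sketch of the arithmetic, assuming a 5V supply (the supply voltage is not stated in the thread) and using invented function names:

```c
/* Smallest pull-up resistor, in ohms, that keeps the sink current at or
 * below i_sink_ma while the line is held at v_ol volts. */
static double pullup_min_ohms(double vcc, double v_ol, double i_sink_ma)
{
    return (vcc - v_ol) / (i_sink_ma / 1000.0);
}

/* Worst-case pull-up current in mA with the line pulled all the way to 0V. */
static double pullup_current_ma(double vcc, double r_ohms)
{
    return vcc / r_ohms * 1000.0;
}
```

At 5V this gives a minimum of about 1.5k, so 1.8k is the next common value above it (about 2.8mA), and a 4.7k pull-up draws only about 1.1mA.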

It seems so, because you can see the ACK from the slave, but it is only able to drive the line down to 3V.
So there must be something on the SDA line which drives it strongly high.

Try removing the pull-ups from the SDA line to see if the I2C master function drives the line strongly high (push-pull instead of open drain).

Also try writing 0 to the SDA pin in the PORT register, even if it is overridden by the SDA function.

Peter


Is your O-scope inverting SCL or is that what the signal actually looks like?
You wouldn't get an ACK if that's what the SCL signal is.


danni wrote:
gottfried wrote:
We have the same problem with the TWI interface on AT90CAN128

I was confused by your images. It seems SCL was inverted.

Are you sure that there is no short circuit from SDA to another output pin?

We used an AVR STK500 with an STK501 and plugged an AVR JTAGICE into the board. We use the same software, compiled for the AT90CAN128 or the ATmega128, and swapped the CPU in the socket.

danni wrote:
Or is the pull-up resistor wrong?
The pull-up current should not exceed 3mA (>= 1.8k).

We have a 4.7k pull-up resistor.

danni wrote:
It seems so, because you can see the ACK from the slave, but it is only able to drive the line down to 3V.
So there must be something on the SDA line which drives it strongly high.

Every I²C (TWI) address only drives down to 3V, including addresses where no device is plugged into the TWI.

danni wrote:
Try removing the pull-ups from the SDA line to see if the I2C master function drives the line strongly high (push-pull instead of open drain).

Also try writing 0 to the SDA pin in the PORT register, even if it is overridden by the SDA function.

Peter

We set the SDA pin to output and wrote 0 to it in the PORT register, and we also set PUD to 1 (pull-up disable), but it had no effect.
The JTAG fuses are written and all the AT90CAN128 and ATmega128 chips run on the internal 8MHz clock. The result is the same with an external crystal oscillator.

Maybe there is a problem with the TWI on the AT90CAN128, or there are new features we do not know about?


We built a software TWI master and now it works :)


Setting the port pins to tri-state while the TWI is enabled makes the hardware TWI interface work now!