AT90CAN128 - How to handle CANSTMOB after CAN error

I am developing a CAN-driver for my AT90CAN128 driven project.

Right now, I am facing problems handling CAN-error-interrupts.

The testing environment is simply unplugging all CAN-nodes, then trying to send a CAN-Message.

After sending, I get an interrupt from the CAN-engine which is correct.

At first, CANGIT contains 0xA0 (CANIT and OVRTIM).
I don't want OVRTIM interrupts, so I didn't set the corresponding enable bit, but evidently the OVRTIM flag gets set even when its interrupt is not enabled.

CANIT is read-only, so I don't have to handle it.

I am surprised that AERG is NOT set in this situation (see below).

In CANGIE all bits are set, except for ENBX and ENOVRT.

CANTEC is 0x86 and stays there, meaning it does not count any higher.

Remember - the CAN message is blown into space, because all bus members are unplugged.

CANSTMOB contains 0x01 after setting CANPAGE to the MOb which triggered the interrupt.

So this means "Acknowledgment error", which must be the result of unplugging the other nodes: no one received the message.

Q1: Will only ONE bit ever be set in CANSTMOB at a time (e.g. TXOK, or RXOK, or one of the error bits), or can there be a combination?
Q2: I am seeing AERR, but why only in the corresponding MOb and not as a general error (AERG)? (see above)

Q3: The datasheet says: "• Bit 0 – AERR: Acknowledgment Error
This flag can generate an interrupt. It must be cleared using a read-modify-write software routine
on the whole CANSTMOB register."
I do a "CANSTMOB=0", which does not change the status of AERR.
So I get the interrupt again and again until I replug a CAN node; then the AERR disappears without any manipulation.
How must I do a "read-modify-write"?

Thank you for your help

I program like a man:
COPY CON: > firmware.hex


Usually interrupt flags are cleared by writing a one to them:

CANSTMOB|=(1<<TXOK);

will clear TXOK to zero.

(No idea if the registername is right, just for illustration)


The Message Object buffer (MOb) registers are different from the normal AVR I/O registers. After reset they have no initialized value, and the CANSTMOB interrupt flags are not cleared the usual way by writing a 1. According to data sheet section 19.11.1 CAN MOb Status Register - CANSTMOB, these bits are cleared with a read-modify-write software routine. For example:

CANSTMOB &= ~(1 << TXOK);

This clears the TXOK interrupt flag, assuming there is no interference from any compiler optimization. Keep in mind the CANGIT register is a normal AVR I/O register and not a MOb register, so its interrupt flag bits must be written with a 1 in order to clear them.
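To put the two clearing styles side by side, here is a minimal sketch that also compiles on a PC. The helper names and the bit-number constants are my own, made up for illustration; on the AVR you would normally take the bit names from <avr/io.h> and pass &CANSTMOB (after setting CANPAGE) or &CANGIT. Writing the CANGIT mask with a plain assignment instead of |= is my choice, not something from the data sheet text quoted above.

```c
#include <stdint.h>

/* Bit numbers as I read them from the data sheet. The names are
   hypothetical; normally these come from <avr/io.h>. */
#define MOB_TXOK_BIT   6u  /* CANSTMOB */
#define MOB_AERR_BIT   0u  /* CANSTMOB */
#define GIT_OVRTIM_BIT 5u  /* CANGIT */

/* MOb status style: read the register, clear one bit in the copy,
   then write the whole value back (read-modify-write). */
void clear_mob_flag(volatile uint8_t *canstmob, uint8_t bit)
{
    uint8_t status = *canstmob;
    status &= (uint8_t)~(1u << bit);
    *canstmob = status;
}

/* General interrupt style: write a 1 only at the flag to be cleared.
   There is no read first, so no other set flag is written back as a 1. */
void clear_general_flag(volatile uint8_t *cangit, uint8_t bit)
{
    *cangit = (uint8_t)(1u << bit);
}
```

On a PC the second helper just stores the mask in the variable; only the real CANGIT hardware turns that write into a cleared flag.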

drnicolas wrote:
I don't want OVRTIM interrupts, so I didn't set the corresponding enable bit, but evidently the OVRTIM flag gets set even when its interrupt is not enabled.
With a few exceptions, typically the interrupt flag is what remembers an interrupt occurred inside the AVR hardware. Often interrupt flags are still able to be set, even when the associated interrupt enable is not set. If the interrupt is not enabled, then it is not responded to when its flag is set. In fact with some interrupts you can just poll the interrupt flag without ever enabling the interrupt. This type of interrupt flag behavior is why an interrupt flag is usually manually cleared before enabling the interrupt, to prevent an unwanted interrupt from some old event that set the flag.

drnicolas wrote:
CANIT is read-only, so I don't have to handle it.
You do not handle it directly. You handle it by clearing any enabled CANSTMOB or CANGIT register flags.

drnicolas wrote:
In CANGIE all bits are set, except for ENBX and ENOVRT.
Did you remember to also set the CANIE1 and CANIE2 bits for any enabled MOb(s)?

In the past when I did what you described, I also got the expected CANSTMOB register AERR interrupt flag response. When a Tx CAN node gets no acknowledge, the automatic retries just keep sending the same repeated Tx message over and over.

Bosch wrote:
If during start-up only 1 node is online, and if this node transmits some message, it will get no acknowledgment, detect an error and repeat the message. It can become ’error passive’ but not ’bus off’ due to this reason.
This type of Tx error is a special exception to the normal case, which is why CANTEC does not keep incrementing past 128 for only an AERR (at least I remember this is what it did for me). Your 0x86 (134 decimal) count indicates there must have been some other Tx error other than AERR to increment the counter beyond 128 (this other error or errors are probably a one time event). If you look at the Bosch specification there are complicated rules about how much an error counter is incremented for a particular type of error or decremented when an operation completes without any error.

The AERG bit was not set because an AVR CAN Tx may only be initiated through a MOb, so the CAN hardware always knows any Tx error belongs to the Tx MOb that sent it. Specifically, the CANGIT error bits that duplicate CANSTMOB error bits are for Rx only. This is why BERR only exists in the CANSTMOB register and there is no duplicate bit for it in CANGIT. Looking at the CAN frame Rx data stream, the Rx error rule is this: from after the 6th bit of the End Of Frame (EOF) field up until the start of the 4-bit Data Length Code (DLC), any Rx errors are sent to the CANGIT register. By the time the DLC starts, the CAN Rx acceptance filter has either found a MOb match or it has not. If a matching MOb was found, then that MOb's CANSTMOB register gets any errors. If a matching MOb was not found, then the CANGIT register continues to get any errors. However, an active Tx only sets its own CANSTMOB error flag bits.

I would expect the AERG bit to be set when a CAN node (that is not sending) receives something like an active error flag that destroys the fixed form of an acknowledge field (i.e. forces the otherwise always recessive ACK Delimiter bit into a dominant state). I didn't double check the Bosch specification on that, so I could be wrong.

One behavior of CANSTMOB is that only a Reset, forced Standby mode, Abort, RXOK or TXOK will disable an active MOb. The CAN hardware may write to an active MOb any time it needs to. This means the CANSTMOB read-modify-write software may not always work as expected, especially if your program is slow in responding to a changed CANSTMOB interrupt flag. Since a global interrupt disable has no effect on the CAN hardware writing to an active CANSTMOB register, for normal operation there is no way to prevent the CAN hardware from writing to it. Since the RXOK or TXOK flag means the MOb is now disabled (not active), these are the only CANSTMOB interrupt flags that cannot unexpectedly change value after they are set. Keep in mind the RXOK or TXOK clears the CANSTMOB interrupt error flag bits.

The one thing to watch out for when you enable the CANGIE register ENERR bit is that you must complete the CANSTMOB software read-modify-write before the next possible RXOK or TXOK. If you are slow in responding to a CANSTMOB error flag bit, the worst possible case is this: you read CANSTMOB, then the CAN hardware writes a RXOK or TXOK to CANSTMOB, then you modify what you read (which does not include the RXOK or TXOK) and write it back to the CANSTMOB register. This final write wipes out the RXOK or TXOK flag before you were ever able to notice it had been set. Take note this requires an exactly timed sequence with a very narrow timing margin, so it is only an unlikely possibility. However, if you handle the CANSTMOB error bits quickly enough, this potential problem cannot occur.
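The worst-case sequence is easier to see when it is played through step by step. This is only a PC-side simulation with an ordinary variable standing in for CANSTMOB; the function name and the mask names are hypothetical, and the model is simplified (it does not try to reproduce the hardware clearing the error bits on RXOK or TXOK).

```c
#include <stdint.h>

#define MOB_TXOK (1u << 6)  /* CANSTMOB bit 6 */
#define MOB_AERR (1u << 0)  /* CANSTMOB bit 0 */

/* Plays the slow read-modify-write: software reads, the CAN hardware sets
   a flag in between, software writes its stale copy back.
   Returns what the register holds afterwards. */
uint8_t simulate_late_clear(uint8_t status_at_read, uint8_t hw_sets_between)
{
    uint8_t reg = status_at_read;   /* stand-in for CANSTMOB */
    uint8_t copy = reg;             /* 1. software reads the register */
    reg |= hw_sets_between;         /* 2. hardware sets a flag meanwhile */
    copy &= (uint8_t)~MOB_AERR;     /* 3. software clears AERR in its copy */
    reg = copy;                     /* 4. write-back overwrites step 2 */
    return reg;
}
```

Calling simulate_late_clear(MOB_AERR, MOB_TXOK) leaves no TXOK bit in the result, which is the lost flag described above. If TXOK was already set at the time of the read, it survives the write-back.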

The information on when a TXOK or RXOK may occur is in the data sheet section 19.11.1 CAN MOb Status Register – CANSTMOB, under the TXOK and RXOK bit descriptions. You may estimate a minimum response time before another possible RXOK or TXOK after a CANSTMOB error by counting all the CAN field bits until the RXOK or TXOK trigger in the smallest CAN message you use and adding 3 CAN bits for the intermission time. Calculate how many AVR instruction cycles are in a CAN bit time and multiply that by the number of CAN bits in your minimum response time. Since CAN uses NRZ coding, simply dividing 1 by the baud rate (1 / CAN baud) gives the time each CAN bit takes. Remember to take any CLKPR division (including the CKDIV8 fuse) into account when determining the AVR instruction cycle time.
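As a rough sketch of that arithmetic (the function names are my own, and the frame bit count is something you have to count yourself for your smallest message; it is not a number from the data sheet):

```c
#include <stdint.h>

/* AVR clock cycles in one CAN bit time. f_cpu_hz must be the effective
   clock, i.e. after any CLKPR / CKDIV8 division. Integer division, so
   the result is rounded down. */
uint32_t cycles_per_can_bit(uint32_t f_cpu_hz, uint32_t can_baud)
{
    return f_cpu_hz / can_baud;
}

/* Cycle budget before the next RXOK/TXOK could arrive:
   (bits counted up to the RXOK/TXOK trigger + 3 intermission bits)
   times the cycles per CAN bit. */
uint32_t response_budget_cycles(uint32_t f_cpu_hz, uint32_t can_baud,
                                uint32_t frame_bits_until_ok)
{
    return cycles_per_can_bit(f_cpu_hz, can_baud) * (frame_bits_until_ok + 3u);
}
```

For example, with a 16 MHz clock and 500 kbps there are 32 cycles per CAN bit. If you counted, say, 40 bits up to the trigger (a made-up number for illustration), the budget would be 43 * 32 = 1376 cycles, and your error handling has to finish the CANSTMOB read-modify-write within that.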

If ENERR is enabled and you cannot guarantee a response within the minimum response time, then you probably should not enable ENERR. If you do not, then either general polling or checking CANTEC, CANREC or the CANGSTA register ERRP bit from the CANIT interrupt response code may be used as an error level detection alternative.
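A small sketch of what such an error level check could look like. The enum and function names are my own invention; the bit positions of BOFF and ERRP in CANGSTA are as I read them from the data sheet, so double check them. You would pass in the values read from CANGSTA, CANTEC and CANREC.

```c
#include <stdint.h>

#define GSTA_BOFF (1u << 1)  /* CANGSTA bit 1: bus off */
#define GSTA_ERRP (1u << 0)  /* CANGSTA bit 0: error passive */

typedef enum {
    CAN_ERROR_ACTIVE,
    CAN_ERROR_PASSIVE,
    CAN_BUS_OFF
} can_error_level_t;

/* Classify the node from register snapshots. A counter above 127 means
   error passive in the CAN rules; ERRP should report the same thing,
   so either one is accepted here. */
can_error_level_t can_error_level(uint8_t cangsta, uint8_t cantec, uint8_t canrec)
{
    if (cangsta & GSTA_BOFF) {
        return CAN_BUS_OFF;
    }
    if ((cangsta & GSTA_ERRP) || cantec > 127u || canrec > 127u) {
        return CAN_ERROR_PASSIVE;
    }
    return CAN_ERROR_ACTIVE;
}
```

With the CANTEC value of 0x86 from the first post this reports CAN_ERROR_PASSIVE, which matches the Bosch remark that a lone node becomes error passive but not bus off.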

Of course the CANSTMOB error flag bits may be directly polled anytime. Since polling does not require clearing the CANSTMOB error flag bits (in contrast, the interrupt response code requires clearing these error flag bits in order to return from the interrupt), polling does not need a minimum service time limit. However, the transitory nature of the CANSTMOB interrupt error flag bits makes detecting these with polling unreliable.

2 typos (CANGIE to CANGIT) were corrected

Last Edited: Sun. Oct 7, 2007 - 07:47 PM

You got me thinking about this and I suppose you could try this. Enable ENERR, and if you ever get a CANSTMOB error interrupt flag set, go ahead and do the read-modify-write to clear the CANSTMOB interrupt error flag. After servicing the CANSTMOB interrupt error flag bit(s), go check the CANEN1 or CANEN2 register MOb enable bit while you are still in the CANIT interrupt response. Reasonably assuming you have not reset the CAN controller, forced a Standby state or applied an Abort, you should always find the MOb is still enabled (the RXOK or TXOK should be the only other way to disable the MOb). If you ever find the MOb is disabled after an error flag read-modify-write, then you could safely assume it was because of a missed/lost RXOK or TXOK (whichever one you were expecting for that MOb) that was overwritten.

This is a special case for using the CANEN1/CANEN2 register behavior to detect a RXOK or TXOK after handling a CANSTMOB error.

It is a little complicated, but this only applies to processing CAN errors which should not be the normal case. It should free you from having any minimum response time limit before a RXOK or TXOK after a CANSTMOB error.
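Here is how I would sketch the CANEN1/CANEN2 check, written against plain values so it can be tried on a PC. The function names are hypothetical. I am assuming CANEN2 holds the enable bits for MOb 0 to 7 and CANEN1 those for MOb 8 to 14, and that you pass in the CANSTMOB value you read before clearing it.

```c
#include <stdint.h>

#define MOB_TXOK (1u << 6)  /* CANSTMOB bit 6 */
#define MOB_RXOK (1u << 5)  /* CANSTMOB bit 5 */

/* Nonzero if the enable bit for MOb number `mob` (0..14) is set. */
int mob_is_enabled(uint8_t canen1, uint8_t canen2, uint8_t mob)
{
    uint16_t enabled = (uint16_t)(((uint16_t)canen1 << 8) | canen2);
    return (int)((enabled >> mob) & 1u);
}

/* Call after the error flag read-modify-write. status_read is the
   CANSTMOB value seen before clearing. Returns 1 if the MOb is now
   disabled although no RXOK/TXOK was seen, i.e. one was overwritten. */
int completion_was_lost(uint8_t status_read, uint8_t canen1,
                        uint8_t canen2, uint8_t mob)
{
    if (status_read & (MOB_TXOK | MOB_RXOK)) {
        return 0;  /* completion was seen, nothing was lost */
    }
    return !mob_is_enabled(canen1, canen2, mob);
}
```

This only holds under the assumptions in the post: no controller reset, no forced Standby and no Abort in between.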

However, if you do not respond to an error generated CANIT interrupt quickly enough, a RXOK or TXOK could come along and wipe out any CANSTMOB interrupt error flag bits. All this means is you could still miss some CANSTMOB errors, so you cannot rely on this alone for all your error detection. This means you still need to check the CANTEC, CANREC or the CANGSTA register ERRP bit to be aware of your error levels.

BTW, if you power up your AVR CAN node with the CAN bus already unplugged, you will probably only get a CANTEC error count up to 0x80 (128 decimal). If you unplug the CAN bus from an active CAN network, you will probably get additional CAN bus errors during the connection loss which could account for getting a CANTEC higher than 0x80.


drnicolas wrote:
I do a "CANSTMOB=0" which does not change the status of AERR. So I get the interrupt again and again until I replug a CAN-node; then the AERR disappears without any manipulation. How must I do a "read-modify-write"?
Sorry, I just noticed I overlooked this one.

Assuming you set CANPAGE correctly, the "CANSTMOB=0" does work. The problem is that after an AERR the failed CAN Tx is automatically retried over and over again. Each failed retry sets the AERR flag in CANSTMOB again. So, you clear CANSTMOB, then the CAN hardware sets it again and again until the CAN Tx finally gets an acknowledge or until you manually disable the Tx MOb and stop the automatic retries. If you clear CANSTMOB and then sample AERR before the next CANIT interrupt occurs, CANSTMOB should be zero up until that interrupt. The CANTEC behavior is that it stops incrementing for AERR alone once it gets above 127 decimal (this is the special case I mentioned earlier, otherwise CANTEC would keep going up to 255 decimal for any other repeated Tx error).


Thank you for your very interesting information about the CAN engine. As I am not a native English speaker, it took me some time to understand.

As far as I understand now, AERG only comes into question for a RECEIVED message which was not correctly acknowledged - ok.

As I understand you, my AERR problem can only be solved by correctly sending messages or stopping the retries.
My experience with the current CAN code tells me that I only get into problems if other nodes are not reachable (e.g. cable unplugged).
The TEC will count to 0x80, the bus is error passive and AERR will be raised again and again.

In my case, TEC counts up to 0x86, but I didn't detect other errors, and TEC counts no further than 0x86.
Also, my code becomes trapped inside the CAN interrupt.
This situation resolves itself after re-plugging a CAN node - the message is simply transmitted correctly.

So, if I understand you right, there would be no other way than disabling the MOb from sending if an AERR occurs. The information will then definitely be lost (which would be no real problem in my case).

By the way, another CAN question, about the correct settings for the BT1-3 registers ...
In another setup working with an MCP2515 CAN controller, I tried to find the correct bus parameters for existing CAN nodes with a kind of auto-baud procedure by trying all kinds of parameter combinations.
Doing this, I found several working combinations - all had the same baud rate, but different settings for all the other parameters.

Is it okay to accept the first working parameter set, or can this procedure lead to problems?



drnicolas wrote:
As I understand you, my AERR-problem can only be solved by correctly sending messages or stopping the retry.
Yes.

drnicolas wrote:
Also, my code becomes trapped inside the CAN-interrupt.
This is a serious problem/bug. Your code should never be able to get trapped inside the CAN interrupt. Typically this bug happens when you do not clear all the CANSTMOB interrupt flag bits and all the CANGIT interrupt flags before you exit the CANIT interrupt. Your program must manually clear all of these interrupt flags before returning from the interrupt or you will get trapped inside the interrupt. If an interrupt flag is set and you did not detect it in your program code, maybe this is why your CANTEC is going beyond 0x80?

If you use BXOK with the CANCDMOB register CONMOB1:0 bits set to 11 - enable frame buffer reception, then you have to follow special rules (see the data sheet) in order to be able to clear BXOK. Unless you follow the rules after BXOK is set, you will get stuck inside the interrupt. Do not enable BXOK (CANGIE register ENBX) unless you are going to use frame buffer reception.

drnicolas wrote:
So, if I understand you right, there would be no other way than disabling the MOb from sending if an AERR occurs. The information will then definitely be lost (which would be no real problem in my case).
Yes, but only if the AERR keeps happening over and over. Set up an AERR retry counter and only disable the MOb if you exceed the retry count. If you write your software so that all the MOb setup information is kept until the final RXOK or TXOK, you could set up the same MOb again after the CAN starts working and not lose anything.
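A minimal sketch of such a retry counter, assuming you call it from the CANIT handler with the CANSTMOB value you just read for the Tx MOb. The struct and function names are made up for illustration, and actually disabling the MOb (clearing its CONMOB bits in CANCDMOB, as far as I understand the data sheet) is left out.

```c
#include <stdint.h>

#define MOB_TXOK (1u << 6)  /* CANSTMOB bit 6 */
#define MOB_AERR (1u << 0)  /* CANSTMOB bit 0 */

typedef struct {
    uint8_t aerr_count;  /* AERR events since the last TXOK */
    uint8_t limit;       /* how many AERR events are tolerated */
} aerr_retry_t;

/* Feed in each CANSTMOB snapshot. Returns 1 once more than `limit`
   acknowledgment errors were seen without a TXOK in between, which is
   the point where the caller would disable the Tx MOb. */
int aerr_retry_exceeded(aerr_retry_t *retry, uint8_t canstmob)
{
    if (canstmob & MOB_TXOK) {
        retry->aerr_count = 0u;  /* message got through, start over */
        return 0;
    }
    if ((canstmob & MOB_AERR) && retry->aerr_count < UINT8_MAX) {
        retry->aerr_count++;
    }
    return retry->aerr_count > retry->limit;
}
```

If the MOb setup data (ID, DLC, payload) is kept somewhere until the TXOK, the same message can be queued again once the bus is back.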

Understanding CAN bit timing is covered in these two Bosch CAN documents (specifically in the second document):

http://www.semiconductors.bosch....

http://www.semiconductors.bosch....

Sorry, there is no simple answer to CAN baud.

As explained in the Bosch CAN bit time document chapter 7 Conclusion, you start with the CANBT2 register PRS2:0 Propagation Time Segment. This must be large enough to compensate for the propagation delay caused by the physical length of your CAN bus and the propagation delay of your CAN hardware. Bosch expects you to actually measure this in the manner described. The measured delay is converted into time quanta (rounded up to the nearest TQ value). A small CANBT2 register PRS2:0 Propagation Time Segment will limit the length of your CAN bus, and if you make it too small for the actual delay in your CAN bus, you will get CAN bus errors (the CAN message identifier priority arbitration will really have trouble if the propagation delay is too small).

The Bosch document explains the next steps. The settings interact with each other, which makes this hard to understand.

If you look at the ATMEL data sheet Table 19-2 Examples of CAN Baud Rate Settings for Commonly Frequencies, you will see that CANBT2 and CANBT3 are always identical for the same Tbit value. Only the CANBT1 register changes depending on the AVR CLKio frequency.
Tbit = 20 always means CANBT2 = 0x0E and CANBT3 = 0x4B
Tbit = 16 always means CANBT2 = 0x0C and CANBT3 = 0x37
Tbit = 12 always means CANBT2 = 0x08 and CANBT3 = 0x25
Tbit = 8 always means CANBT2 = 0x04 and CANBT3 = 0x13

If we isolate the CANBT2 register PRS2:0 Propagation Time Segment we get (see the data sheet for the reason a 1 was added):
For Tbit = 20 and PRS2:0 = 0x7 + 1 = 8 (Tprs = 8)
For Tbit = 16 and PRS2:0 = 0x6 + 1 = 7 (Tprs = 7)
For Tbit = 12 and PRS2:0 = 0x4 + 1 = 5 (Tprs = 5)
For Tbit = 8 and PRS2:0 = 0x2 + 1 = 3 (Tprs = 3)

However, each CAN bit time is divided into Tbit equal sections (time quanta). For example: a Tbit of 8 divides the CAN bit time into 8 parts, while a Tbit of 16 divides the CAN bit time into 16 parts. The PRS2:0 Propagation Time Segment value is an integer number of these parts. So, this is true:
Tbit = 20 with Tprs = 8, which is 8/20 (.4) of a CAN bit time
Tbit = 16 with Tprs = 7, which is 7/16 (.4375) of a CAN bit time
Tbit = 12 with Tprs = 5, which is 5/12 (.41666) of a CAN bit time
Tbit = 8 with Tprs = 3, which is 3/8 (.375) of a CAN bit time

The ATMEL example baud rates were set up so that, for any fixed baud rate, Tbit = 8 favors a shorter CAN bus (Tprs is only .375 of the CAN bit time) and Tbit = 16 favors a longer CAN bus (Tprs is .4375 of the CAN bit time). This is only because ATMEL set up their examples this way. Now for the hard part: if you just change the PRS2:0 Propagation Time Segment value, then it changes the Tbit value.

Tbit = Tsyns + Tprs + Tphs1 + Tphs2

Tbit must be from 8 to 25.

Tsyns = 1 (this synchronization segment is always 1TQ long)
Tphs1 and Tphs2 are from CANBT3 with a 1 added to each value.

There are also rules about limits for Tphs1 and Tphs2 (see the data sheet). These rules may limit the Tprs value. You cannot set Tprs to any high value if that value is so high that it interferes with the Tphs1 and Tphs2 values (remember they all have to add up to your Tbit value).

Your CAN baud rate will be this (Tbit is the number of TQ time units in one CAN bit):
CAN baud = 1 / (TQ * Tbit)

CAN baud cannot be any larger than 1 Mbps.
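The formulas above can be checked against the table values with a few lines of C. This is only a sketch: the function names are mine, the field positions (PRS2:0 in CANBT2 bits 3..1, PHS1 in CANBT3 bits 3..1, PHS2 in CANBT3 bits 6..4, BRP in CANBT1 bits 6..1) are as I read them from the data sheet, and TQ = (BRP + 1) / clkIO is likewise taken from the data sheet rather than from this thread.

```c
#include <stdint.h>

/* Propagation segment length in TQ: PRS2:0 field plus 1. */
unsigned can_tprs(uint8_t canbt2)
{
    return ((canbt2 >> 1) & 0x07u) + 1u;
}

/* Tbit = Tsyns + Tprs + Tphs1 + Tphs2, with Tsyns always 1 TQ. */
unsigned can_tbit(uint8_t canbt2, uint8_t canbt3)
{
    unsigned tphs1 = ((canbt3 >> 1) & 0x07u) + 1u;
    unsigned tphs2 = ((canbt3 >> 4) & 0x07u) + 1u;
    return 1u + can_tprs(canbt2) + tphs1 + tphs2;
}

/* CAN baud = 1 / (TQ * Tbit), with TQ = (BRP + 1) / f_clkio,
   rearranged so it stays in integer arithmetic. */
uint32_t can_baud(uint32_t f_clkio_hz, uint8_t canbt1,
                  uint8_t canbt2, uint8_t canbt3)
{
    uint32_t brp_plus_1 = (((uint32_t)canbt1 >> 1) & 0x3Fu) + 1u;
    return f_clkio_hz / (brp_plus_1 * can_tbit(canbt2, canbt3));
}
```

With CANBT2 = 0x0E and CANBT3 = 0x4B, can_tbit gives 20, and the other three pairs listed earlier give 16, 12 and 8. Assuming a 16 MHz clkIO, CANBT1 = 0x02 with the Tbit = 8 pair works out to 1 Mbps.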

In conclusion, the “best” CANBT1, CANBT2 and CANBT3 settings depend on your desired CAN baud rate, on your AVR clock frequency, on the physical length of your CAN bus and on the propagation delay of your CAN hardware. If any of these change, the “best” CANBT1, CANBT2 and CANBT3 settings will also change.

CAN baud is really complicated, which makes autobaud really hard to do. Since all the possible autobaud rates you may use depend on the AVR clock frequency, it is usually better to do autobaud for only a fixed number of baud rates. Then you use an already calculated table of CANBT1, CANBT2 and CANBT3 settings for your autobaud.
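Such a table could look like the sketch below. I am assuming a 16 MHz clkIO here and reusing the Tbit = 16 pair (CANBT2 = 0x0C, CANBT3 = 0x37) for every entry so the propagation segment stays fixed; the CANBT1 values are my own calculation from CAN baud = 1 / (TQ * Tbit) with TQ = (BRP + 1) / clkIO, so check them against the data sheet table for your clock. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t baud;    /* nominal bit rate in bit/s */
    uint8_t canbt1;   /* BRP field, shifted into bits 6..1 */
    uint8_t canbt2;   /* PRS field (fixed for all entries) */
    uint8_t canbt3;   /* PHS1, PHS2 and SMP fields */
} can_timing_t;

/* Candidate settings for a 16 MHz clkIO (assumption), all with Tbit = 16. */
const can_timing_t can_timing_16mhz[] = {
    { 1000000u, 0x00u, 0x0Cu, 0x37u },
    {  500000u, 0x02u, 0x0Cu, 0x37u },
    {  250000u, 0x06u, 0x0Cu, 0x37u },
    {  125000u, 0x0Eu, 0x0Cu, 0x37u },
};

const size_t can_timing_16mhz_count =
    sizeof can_timing_16mhz / sizeof can_timing_16mhz[0];

/* Returns the table entry at `index`, or NULL past the end, so an
   autobaud loop can simply step through until it gets NULL. */
const can_timing_t *can_timing_at(size_t index)
{
    return index < can_timing_16mhz_count ? &can_timing_16mhz[index] : NULL;
}
```

An autobaud routine would load each entry in turn, listen for an error free reception, and move on to the next entry otherwise. That part touches the hardware and is not shown.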

drnicolas wrote:
Is it okay to accept the first working parameter set or can this procedure lead into problems?
If the CANBT2 propagation delay is too short it will cause really big problems. Your physical CAN bus will determine exactly what too short really is.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thank you again.
I have to check whether there is an unresolved error source. This could explain the value of 0x86 for TEC.

Having read your explanation of CAN baud, I think I will stay with standard baud rates or do trial and error.
As I understand you, an auto-baud routine could work by trying different parameters until error-free messages are received. Then, if sending with these parameters works, it should be o.k.



drnicolas wrote:
As I understand you, an auto-baud routine could work by trying different parameters until error-free messages are received. Then, if sending with these parameters works, it should be o.k.
Only if the error free receive came from the CAN node farthest away (at the longest distance in the CAN wiring). If you accept an error free reception from a CAN node that is close to you and set your autobaud propagation delay based on that short propagation delay, any CAN bus reception from nodes with a longer propagation delay (farther away) will fail.

You should set your autobaud with a predetermined maximum Tprs (propagation delay) that will handle anything your CAN bus will ever be physically wired for. This may mean you have to specify how long your maximum CAN bus wiring length may be (based on your maximum Tprs), even if your specification is smaller than the normal maximum CAN wiring length. If you set the propagation delay so it is not a variable, then your autobaud will work upon an error free reception.


Hello,

I know it has been a long time since the initial posts, but hopefully somebody from the initial participants is still active and can give some advice?

I am having the same problem as the one treated in this topic: basically my CAN driver gets blocked in an endless loop if I disconnect the CAN.

I am using a while loop in which CANSTMOB is checked for errors or a correct transmission, but when I am sending, CANSTMOB is always 0, CANREC (!) is 128 but CANTEC is 0, so the driver treats this as a message which should be resent and it loops forever.

Now the strange thing is that this happens only for big IDs, like 7E0. If I send small IDs like 15 with the same driver, then CANSTMOB returns 1, which means an error, and then the while loop is exited and the rest of the program runs.

The main question is how to correctly handle sending while the CAN is disconnected, and which registers to read to see that the CAN is actually disconnected, as obviously CANSTMOB is behaving very unpredictably.


Hello, I could solve the reset by taking into account some bits from the CANGSTA and CANGIT registers and then releasing the MOb and exiting the while loop that was blocking my uC.

More precisely, I am also evaluating the BOFF, BOFFIT and SERG bits to see that the CAN is disconnected.

I don't know if this is the proper way to solve the problem, but it works for me, and when I reconnect, the communication resumes ok, so for me it is ok.

Also, I tried the same code on two different boards with different transceivers and there was no reset, so either the transceiver that I use is doing something strange on the RX/TX lines or something else is happening, but it is definitely related to the transceiver.
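For anyone finding this later, the check described above could be sketched roughly like this. The function names are made up, the bit positions (BOFF in CANGSTA bit 1, BOFFIT in CANGIT bit 6, SERG in CANGIT bit 3) are as I read them from the data sheet, and the poll limit is an extra safety net that I added, not part of the original fix.

```c
#include <stdint.h>

#define GSTA_BOFF  (1u << 1)  /* CANGSTA: bus off state */
#define GIT_BOFFIT (1u << 6)  /* CANGIT: bus off interrupt flag */
#define GIT_SERG   (1u << 3)  /* CANGIT: general stuff error */

/* Nonzero if the register snapshots suggest the bus is disconnected. */
int can_bus_looks_down(uint8_t cangsta, uint8_t cangit)
{
    return (cangsta & GSTA_BOFF) != 0u ||
           (cangit & (GIT_BOFFIT | GIT_SERG)) != 0u;
}

/* Decide whether a "wait for TXOK" loop should give up: either the bus
   looks down, or the loop has already polled max_polls times. */
int tx_wait_should_abort(uint8_t cangsta, uint8_t cangit,
                         uint16_t polls_done, uint16_t max_polls)
{
    return can_bus_looks_down(cangsta, cangit) || polls_done >= max_polls;
}
```

Inside the send loop you would read CANGSTA and CANGIT on every pass, call tx_wait_should_abort, and release the MOb when it returns nonzero.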