AT90CAN128 - CAN problems

Go To Last Post
18 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm having a problem using CAN Bus on AT90CAN128. I have three nodes connected via CAN Bus, a master, a slave and a CAN sniffer. The master sends a RF message, the slave receives the message, and send it to master (and sniffer) using CAN. The message needs three CAN packets (24 bytes) to be complete. The sniffer receives all packets, but the master lost the second one, so the master has an incomplete message (he receives first and last packets). HW doesn't detect bit, ack, frame, etc. errors; CANREC and CANTEC are zero; packets ids are fine. The speed is 500kbps, the oscillator is 8mhz. Any ideas about what happens?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just to be sure that I understand, the "master" sends out a message to the "slave". Then the slave transmitts a message to both the "master" and the "sniffer", but the master looses one packet ?

Hmm.. sounds strange... never tried that controller, just used the CAN controller mcp2515 from microchip and never had that type of problem. are you sure that the master's receive buffers arn't full ? so it ignores the second byte due to processing and by a change it is finished when the third one arrives ?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm not sure that the buffer isn't full. However, since the master and the sniffer uses similar software (the only difference is the rf communication), and the CAN Bus receive routines are short, there is a low probability that the buffer is full when a packet comes.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm confused by your master/slave terminology and how you apply these to individual CAN nodes. In a fully functional active CAN bus system, every CAN node may send (Tx). In fact every CAN node may send at the same time and the priority arbitration will determine which single CAN node completes sending while all the loosing CAN nodes stop sending and become receivers. Except for TTCAN, the loosing CAN nodes automatically keep trying again until they win priority arbitration and send their complete CAN frame. The closest fit to the master description is when any CAN node is sending. CAN is called multi-master because any CAN node may send, however no particular CAN node is ever in total control like a traditional master node. When you call a particular CAN node the “master” node and another the “slave” node it get confusing. Your master node will receive messages just like a slave when it has nothing to send, and your slave node will send messages just like a master when it has something to send. It would make more sense to simply call them node A and node B or anything that is not confusing.

Your sniffer is probably a special case of a receive only CAN node. It may not transmit anything which means it does not participate in reporting errors to the CAN bus. This means the receive only sniffer must accept any errors in the CAN frame because it has no way to signal the CAN bus sending node to try again.

giusepe wrote:
The master sends a RF message...
What does RF mean? Are you saying you have a Radio Frequency wireless CAN bus?

giusepe wrote:
The master sends a RF message, the slave receives the message, and send it to master (and sniffer) using CAN.
If the so called master sends a CAN frame to the so called slave, why does the slave send it back to the master? The master already knows what it sent. The active CAN error handling means the slave will automatically tell the master if there was any error in the CAN frame. There is no need to send the exact same thing back to a CAN node that sent it for error checking, because error checking/handshaking is built into each CAN frame and its acknowledge cycle.

Are you saying you have a master RF node (that is not CAN) attached to (or part of) what you called your slave CAN node? In this case the slave CAN node would be a RF-to-CAN gateway sending the separate RF data to the CAN bus nodes.

If you are using the CAN bus correctly, there are methods to ensure 100 % of all CAN frames will be received by a CAN node, but there are timing and MOb setup/use issues involved. However, right now I'm not sure if this is your problem or not.

Please clarify your description of what is going on. Until I understand what you are trying to do, its not possible for me to tell if you are even using CAN correctly or if you are expecting CAN to do something weird.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

agreed, was kinda wierd to me to :) Just had exam about CAN buss protocol, so just fire away how you use the protocol :)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I write a bad English, and speak worst, OK?

Mike B wrote:
I'm confused by your master/slave terminology and how you apply these to individual CAN nodes. In a fully functional active CAN bus system, every CAN node may send (Tx)... It would make more sense to simply call them node A and node B or anything that is not confusing.

I say master, slave and sniffer to distinguish three different sofwares. You also can say A, B and C.

Node A, node B and node C uses AT90CAN128 mcu, a 8Mhz oscillator and they are connected to the same CAN Bus at 500kbps. The three form a CAN Bus "network". Also, all the nodes, has a CC1000 to send and receive radio frequency data.
The node C has four mobs ready to receive four different CAN packets ID, and another mob used to send. The data received in those MObs are sent to a normal desktop computer, with the UART (serial port) at 57600bps.
The nodes A and B, has three mobs ready to receive three different CAN packets ID, and uses another one to send. The three nodes uses mobs from 0 to 5.
Node A sends using CC1000, a radio frequency bits string known as "message", that is received by node B through radio frequency. This message is NOT send through CAN Bus by node A. Node C doesn't use its radio frequency capabilities.
When node B receives a radio frequency message, it sends the message via CAN bus. The message is 24 bytes long, so it needs three 8-bytes CAN bus packets to be complete. B sends the packets one at time. He sends the first one, when finish the first one sends the second one, when finish the second one sends the last one. All the packets carry the same CAN bus ID. Nodes A and C has one mob ready to receive packets with the ID that the node B send them. Reception and transmission are interrupt driven.
When B sends to CAN bus the recently received radio frequency message, the first 8 bytes CAN packet is received via CAN by A and C. The second 8 bytes CAN packet is received via CAN only by C (and not by A). The last 8 bytes CAN packet is received via CAN Bus by A and C. If the node C is off, the second packet is not received by A anyway.

How do I know that A loses the second packet?
When A receives a complete message from radio frequency or CAN bus, it sends the message via CAN, to an ID that is accepted by C. After some time (20 miliseconds) that the message is not complete, the node A sends what he has as an incomplete message to C (using CAN bus again), so I can see what was the failure. The failure was the missing of second packet, all the 8 bytes. However, C receives that packet, so the packet exists.
All the CAN Bus error flags in nodes A, B and C are clear during each packet reception, and CANREC and CANTEC has values equal to zero.

Why is relevant that the radio frequency message is sent by A?
If the same radio frequency message received by B comes from another node, different from A (a node not connected to same CAN Bus that B), the CAN bus transmission is normal and A and C receives all the packets sent by B.

I know all about CAN protocol, and using CAN bus in AT90CAN128, don't explain me that. My suspicion is that the second receive interrupt isn't not attend before the last packet comes, and this one overwrites the second one. How can I check that?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thank you for helping me understand the problem.

First, lets focus on the topic of a possible lost CAN Rx interrupt. In CAN specifically on the AT90CAN series chips, there are 13 interrupt sources associated with any individual CAN frame. The CANGIT register has 6 of these interrupt sources. These CANGIT interrupts are always used after the 6th bit of the End Of Frame (EOF) field at the end of a CAN frame up until the first bit of the Data Length Code (DLC) field of the next CAN frame. If no matching Rx MOb is found, the CANGIT register continues to be the only active interrupt source for the remainder of the CAN frame (beyond the DLC field). None of these CANGIT interrupts have anything to do with “enable reception” Rx CAN frame completion and the CANGIT register interrupt flag bits are cleared the traditional way by writing a one to the flag or flags you want to clear. If a matching enabled Rx MOb is found then the MOb CANSTMOB register has 7 interrupt sources (with each individual MOb selected by CANPAGE). If a Rx MOb match is found, the CANSTMOB register interrupt error flags are used instead of the CANGIT register interrupt error flags. In the case of an “enable reception” Rx CAN frame, the RXOK is the critical interrupt that signals an error free CAN frame has been received. The CANSTMOB register interrupt flag bits are only cleared by you with a read-modify-write operation to write a zero to the interrupt flag or flags you want to clear.

BTW, if you are using “enable frame buffer reception” Rx MObs, then the behavior is more complex and the CANGIT register is involved in CAN frame reception.

The basic “enable reception” Rx MOb behavior is as follows:

1) The CANEN1 / CANEN2 register bits show a disabled MOb.
2) The Rx MOb is completely setup except for the CANCDMOB register configuration bits.
3) An “enable reception” is set in the CANCDMOB configuration bits (for a Rx MOb).
4) The CANEN1 / CANEN2 register bits now show an enabled MOb.

At this time any number of CANSTMOB register error interrupt flag bits may be set if any errors are detected. However, the MOb is not disabled (the CANEN1 / CANEN2 register bits still show an enabled MOb) by any interrupt error flags. This means the AT90CAN hardware may still write to CANSTMOB if it detects another error or if RXOK is set.

Notice that any CANSTMOB register error interrupt flag bits are cleared by the final RXOK. So, the life span of any CANSTMOB error interrupt flag bit may be very short and may be lost if not responded to very quickly.

5) When CANSTMOB RXOK is set, the CANEN1 / CANEN2 register bits now show a disabled MOb.
6) The CANCDMOB register Configuration bits are not changed.
7) The Rx MOb information is in the MOb waiting to be read.

Because the Rx MOb is now disabled, it will not receive any more CAN frames. If this is the only CAN node that is listening for a particular message identifier, then that CAN node will not receive any more CAN frames with that particular message identifier. You must either enable a different MOb for that particular message identifier or re-enable the original MOb for that particular message identifier. To have 100 % CAN node Rx coverage for a particular message identifier after the RXOK, you have from the time the RXOK was raised (after the 6th bit of the EOF field) until the first bit of the DLC field to have an enabled MOb initialized for that particular message identifier. If there is no enabled MOb for a particular message identifier, the CAN frame on the CAN bus with that particular message identifier will be ignored by any CAN mode without an enabled matching Rx MOb. Its the job of the acceptance filter that finds the matches for the MObs to ignore any unwanted/unmatched message identifiers. This reduces the burden on the CAN node microprocessor.

This is not a lost interrupt, in fact its an interrupt that never happened because at the critical moment there was no enabled matching MOb in the CAN node. After the first RXOK, the MOb will not receive any more CAN frames until it is enabled again by writing to the CANCDMOB register configuration bits.

The timing to provide an enabled or a re-enable a Rx MOb after the 6th bit where RXOK is raised, is covered in the data sheet errata “CAN transmission after 3-bit intermission”. For part A 11 bit identifiers you have 19 CAN bit times and for part B 29 bit identifiers you have 39 CAN bit times. Since you have an 8 MHz clock with a 500 kbps CAN bit time, this is 16 AVR instruction cycles per CAN bit time. For part A you have 16 * 19 = 304 AVR instruction cycles to have an enabled Rx MOb after the RXOK. For part B you have 16 * 39 = 624 AVR instruction cycles. Keep in mind if you have lots of other interrupts loading your system or inefficient interrupt code that takes way too long to execute, you may not have many usable AVR instruction cycles left or maybe not enough of them. If you are short on AVR instruction cycles or are just not sure, you may enable a second Rx MOb for a particular message identifier. The AT90CAN hardware acceptance filter will only find the first matching enabled Rx MOb (starting with the lowest numbered MObs) and leave any other enabled RX MObs alone (even if the higher numbered MOb has the exact same message identifier meaning they also match). When the RXOK happens in the first Rx MOb, the second still enabled Rx MOb will keep receiving that particular message identifier for the next CAN frame. This gives you allot more time to re-enable the first MOb (probably after you copy the important information after the RXOK).

Since you are using the same message identifier to send each 8 byte CAN data frame, you will need to make sure you always have an enabled Rx MOb ready. You could just enable 3 separate Rx MObs with the same message identifier, but then getting them reassembled in the same sequence may not always work (other higher priority CAN bus frames, error responses, re-enable MOb times, etc. could cause problems). If you sent the first 8 byte, second 8 byte and third 8 byte packets with three different CAN message identifiers, the sequence you send them in would not matter and they could always be reassembled in the correct order.

Keep in mind that only one single Tx CAN node is ever allowed to send the same message identifier and RTR bit value. If two or more CAN nodes Tx the exact same message identifier and RTR bit value together, the CAN frame arbitration field will fail and allow two CAN nodes to keep sending beyond the arbitration field. This will probably cause a BERR in the Tx MOb CANSTMOB register, cause bus wide error flag and increment the CAN error counter Only one single CAN node should ever be allowed to still be sending after the CAN frame arbitration field. BTW, any CAN node may Rx any message identifier (there is no restriction on Rx CAN nodes).

Because you said you are checking the CANSTMOB register error interrupt flag bits, be aware when the CANGIE register ENERR bit is set and you answer an error interrupt, that the AT90CAN hardware may still write to the CANSTMOB register even while you are also doing a CANSTMOB register read-modify-write. If you are slow in answering the error interrupt flag event, it is remotely possible that your software read-modify-write could wipe out the RXOK bit (or a TXOK bit as well). This would be a case of a lost interrupt. The best solution I have come up with is only when ENERR is enabled as an interrupt source, after doing the CANSTMOB software read-modify-write, go check the CANEN1 / CANEN2 register bits for that MOb. If the CANEN1 / CANEN2 register bits unexpectedly show the MOb as disabled after processing the error then you missed a RXOK or TXOK interrupt. Of course if you apply CANGCON register standby mode or abort request, then the CANEN1 / CANEN2 register bits are expected to show all the MObs as disabled.

To summarize:

1) Node A could be missing the entire 8 byte second packet because no Rx MOb is enabled before that CAN frame DLC field starts. Maybe something like an infinite interrupt reentry loop in node A is slowing down the MOb setup so it only catches the first packet and last packet???? Maybe its a more conventional case of just not getting a MOb re-enabled in time after the first packet RXOK?

2) Since you said you have no CANSTMOB errors, it is not a lost RXOK interrupt caused by a software read-modify-write conflict with the enabled MOb CAN hardware write. However, if you enable ENERR for interrupts, you want your code to detect this conflict.

3) Only because I can't see exactly what is happening, it may be possible you are creating a CAN frame conflict by having two or more CAN nodes Tx the exact same identifier + RTR value. If the Tx CAN data frames are completely identical you will not see any errors (in this case there will not be any BERR condition). It does not sound like this is a problem, I'm only mentioning it just in case.

4) An even less likely failure scenario is you might be failing to clear all the CANGIT and CANSTMOB interrupt flag bits. Because the interrupt RETI instruction always executes the next AVR instruction before re-enabling global interrupts, the main AVR program will still run (very slowly) even if you are locked in a constant unending CAN interrupt reentry loop. An easy check for this is using software to blink a LED (initially set the software delay loop blink timing value with global interrupts disabled) with the AT90CAN. If it suddenly stops blinking or becomes very slow, you probably have found an interrupt reentry loop. Unlike lots of other AVR interrupts, the CAN hardware does not automatically clear the interrupt flags when it executes the interrupt response. This is because there is more than one single interrupt flag for the single CANIT interrupt vector. If the hardware automatically cleared the interrupt flags for CANIT, the interrupt response program code would not know what to do without the interrupt flag values.

5) As an improvement I would suggest using different CAN message identifiers for each of the three 8 byte packets. If your RF module receives the complete 24 bytes before transferring the data to the CAN bus, the order you initialize and send the CAN node Tx MObs would not matter. Actually, if more than one Tx MOb is setup when the AT90CAN hardware is ready to send a CAN frame, it scans all the Tx MObs to find the one with the lowest value message identifier and sends it first. This would simplify the handling of the RF 24 bytes by not having to wait for each 8 byte packet to be sent on the CAN bus. It would also simplify detecting which 8 byte packet failed. However, if your CAN nodes ever go into error passive mode (a value of 128 or larger in CANTEC or CANREC), the potential 100 % loading of the CAN bus Tx with sequential CAN frames could possibly block detection of a CAN bus error passive flag. In this case your software might want to treat error passive as a failure state (you should already be looking for CANTEC/CANREC values of 96 or higher as an indication of a heavily disturbed CAN bus).

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

¡Thanks! I'll use all these knowledge to diagnose the problem and find a solution; and then I'll put the info here. Thanks again.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You were right Mike B. The MOb wasn't ready to receive the second packet. I solve the problem using two MObs to receive an specific ID, and is working just fine.
¡Thank you!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hello, I am posting here but if you think this is not ok, then this could be moved to a new topic. (AT90CAN128 resets when sending with the connector disconnected)

 

I am sending 3 messages at 500ms and also blinking a led at 1000ms.

 

I am having a very strange problem related to CAN, basically when the CAN is connected then everything works ok, but if I disconnect the CAN connector, then it seems the AT90CAN128 resets after trying to send some CAN IDs.

 

The function for sending a message is part of a library.

 

If I send low IDs, for example hex 15,16,17, then the microcontroller works even when I disconnect the CAN (led keeps on blinking)

I I send high IDs, for example hex 7E0,7E1,7E2 it seems that the microcontroller resets after calling the send function at 500ms, and then the led never blinks.

 

I reached this conclusion by debugging the sw counter for ms and setting a breakpoint after the send function is called:

-with the CAN connected the counter has values multiple of 500: 500, 1000, 150 etc

-with the CAN disconnected the counter has always the value 500, and as this counter is initialized only when the uC starts I reached the conclusion that the uC resets.

 

My question:

1.Is there any mention in that the uC can reset in case the CAN lines are open? Has anybody had this problem before? What could cause it?

2.What could case this reset for certain IDs and not for other IDs? At a first view it would look like the longer IDs would cause some overflow in the CAN registers, or just sending a bigger identifier would crash something

in the CAN HW compared to the lower identifier.

3.How can I detect further what the problem is? Unfortunately I can't do any source code debug for now, only assembler debug.

4.Is there any way to check what kind of error is happening in the CAN module, I can see with the Debugger all the CAN registers, theoretically I could set a breakpoint before the function call and then check instruction by instruction the values

of all the CAN registers and see what is wrong.

Last Edited: Wed. Jul 17, 2019 - 12:28 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

bbogdanmircea wrote:
but if I disconnect the CAN connector, then it seems the AT90CAN128 resets after trying
Sounds a lot like a watchdog reset (CAN waiting too long for a non-existent thing to happen). Do you have WDTON or do you enable the WDT in code?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I enable it in the code, before the main 1ms loop, the it is reseted each 1ms.

 

So the code is as follows.

TImer0 set for 1ms overflow, I have another counter then based on this 1ms that goes to a high value and then back to 0.

I have a loop at 1ms, then inside based on the big 1ms counter I send the messages at various rates.

 

As my big counter reaches the value 500 until the function is called and then goes back to 0, doesn't this mean that the uC resets?

I am debugging now the CAN registers in ATMEL Studio, and after the call almost all the registers are 0.

I will also check them with the small IDs, but as the uC keeps on working with the small IDs I guess they would be ok?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The point is this that if you start the WDT and then the code goes into some while(status_not_achieved) loop that, when CAN is connected and active usually completes in 3ms (say) but when CAN is not there it waits 150ms (or perhaps indefinitely) them if the loop is not performing regular WDRs and say the WDT limit is set at 64ms then when it's been "stuck" for 64ms the WDT will then reset the chip.

 

I guess one check of all this is .. ."do the resets you are observing go away if WDT is not enabled?"

 

(of course without the WDT to reset things then maybe the indefinite loop will wait indefinitely in which case the code will appear to "freeze"/"lock up" anyway).

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

From what I see, the watchdog is enable with the maximum (7) prescaler setting, so it should reset each 2.1s which is a long time, but of course I can disable it and give a try no problem.

I don't know if it is clear enough, but if I send small IDs with an unconnected CAN, then the uC keeps on going, so this is also depending on the IDs. (or is this just a coincidence???)

I noticed then after the breakpoint for the send function, in case of a disconnected CAN, the PC jumps at a low address, more precisely 006E, while when the CAN is connected the address is much higher and probably corresponding to the instruction after the function call.

Would this point to a watchdog reset? Or could this be some interrupt generated by CAN that has no handling routine maybe? How can I find what resides at this address?

 

Of more info, I modified the IDs to smaller values like 15,17,18 and of course it works with the CAN disconnected.

The CAN registers look all ok, although I would have expected to see some BOFF error with the CAN disconnected.

Could this depend just on the value of the IDs and how would this even work?

What is the normal behaviour when sending with  a disconnected CAN?

Last Edited: Wed. Jul 17, 2019 - 03:47 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson I have to give it to you, indeed the problem is a while loop and then the wdt resets the thing and so on.

Unfortunately I don't know how to solve the problem, as I am not totally sure about how the CAN works, but from what I see this is what happens:

There is a while loop which evaluates CANSTMOB to see if the transmission what completed successfully.

If I send small ids, then CANSTMOB has a value of 24 which means there were errors and TX was not ok, so this exits the while loop.

But in case of big idsm then CANSTMOB has the value 0, which basically means that the transmission is not completed, so it waits and waits, but of course this will never be ok as the CAN is disconnected.

Now why this behaves differently for small ids and for bigger ids is a mistery to me, but is seems that for some reason for bigger ids it doesn't detect correctly that a problem is happening.

If you have any advice, I will continue debugging and trying to understand how this CAN is supposed to be working.

 

3 ideas that came in my mind related to debugging:

1.small IDs have higher priority, could this be one of the causes???

2.longer ids take longer to transmit, so maybe the while loop checks STMOB too fast, as it doesn't have time to set properly

3.this is some strange error given by the combination of the at90can128 and the tranceiver that I use, as from what I understood with the same library but other tranceiver this problem doesn't appear

Last Edited: Thu. Jul 18, 2019 - 04:46 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Some other info that I gathered during debugging:

I am sending all my messages with MOB4, I have 4 messages, 1st message ID is 15 the rest are 7E0, 7E1, 7E2

During sending ID 15 with disconnected CAN:

CANREC = 0, CANTEC = 0, CANGSTA = 20

 

During sending ID 7E0 with disconnected CAN:

CANREC = 135, CANTEC = 0, CANGSTA = 13

 

Why is the controller going into ERRP for RX when I am trying to send?

Is this some strange behaviour because of the disconnected CAN line and why is this happening just for longer IDs?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

We don't get that many CAN users here - I'm afraid you now need to hope that someone who understands the Atmel CAN peripheral reads this.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Oh no, this thing is a total nightmare ... it behaves totally randomly for different IDs, well hopefully somebody leans in and provides support, probably I will delay solving this issue as it already took me a week, at least

I learned how to debug in the process.