ATmega1284P, unknown behaviour


Hello, I will try to explain this as well as I can. I am developing a project consisting of a few sensors and an HM-10 copy. As the code grew, I encountered a problem. The code works fine when the board is powered up, but after about 4 minutes the behaviour changes. Here is what I observed:

 

I start sending data from the sensor continuously. The program works without any problem for 3 to 4 minutes; after this time the MCU does not respond to any command sent through bluetooth and it stops sending data. The MCU does not restart, it only hangs.

 

What is interesting is that no interrupt works anymore (ex: an interrupt is generated when the phone is connected to the bluetooth module). The program is made as a state machine, and the MCU is still able to switch to the other state. For example, I have 2 states, ACTIVE and SLEEP; if no BLE connection is up for 1 minute, it enters sleep mode. After hanging, the MCU enters sleep mode, but no interrupt can wake it up anymore. Usually when it is working as soon as I connect the phone to the module, it exits sleep mode.


We're going to need to see code, schematics and preferably the smallest cut down code that still exhibits the same fault. In preparing the latter you will likely find the issue anyway!

 

If the code is modular and each module has unit tests then the points of failure should already be identified anyway.


nikel1992 wrote:
after this time the MCU does not respond to any command sent through bluetooth

Are you certain that the BT is still working?

 

What if you take BT out, and just use a wired connection?

 ex: an interrupt is generated when the phone is connected to the bluetooth module

 

Is it?

 

Again, how have you verified that it's not the BT module that's stopped generating interrupts?

 

The program is made as a state machine

So instrument the state machine to allow you to see what state it's in.

 

Usually when it is working as soon as I connect the phone to the module, it exits sleep mode.

But sometimes it doesn't?

 

You need to take a methodical approach to find where the problem occurs - this is the essence of debugging.

 

http://www.avrfreaks.net/comment...

 


awneil wrote:
You need to take a methodical approach to find where the problem occurs

 

I am monitoring the incoming signal from the Bluetooth module with a serial converter. The module is connected to the MCU through the UART. The RX pin of the serial converter is also wired to the RX pin of the MCU, and there I receive the text sent from the phone.

ex: an interrupt is generated when the phone is connected to the bluetooth module

Yes, it is. Before the MCU hangs I can check that the interrupt is generated. The Bluetooth module also has an LED that indicates a connection.

 

So instrument the state machine to allow you to see what state it's in. 

I am using an LED for this, so I can see when the state changes, and it is able to change state.

Usually when it is working as soon as I connect the phone to the module, it exits sleep mode.

I said "usually" by mistake; every time I connect the phone before this error occurs, it works.

 

We're going to need to see code, schematics and preferably the smallest cut down code that still exhibits the same fault. In preparing the latter you will likely find the issue anyway!

 

 

I was thinking that this is the next step, but I am not able to post any code or schematic here. I am looking for clues about where to search. I also suspect some memory corruption. Is there any way to check the memory state, or the memory usage, from time to time? I am thinking of badly handled pointers or a stack overflow, but I am not sure. I have lost a lot of time trying to identify the problem. If anyone wants to see the code, we can use a remote control program.


two things come to mind, for projects that work for awhile then suddenly stop working.

Stack overflow, corrupting memory,  can you place "guard bytes" below the stack to track how deep it goes before the problem happens?

or improper sleep mode setup, i.e. no wake up source setup before going to sleep mode.

 

For the first one, can you substitute a larger family member part, i.e. tiny85 instead of tiny25, mega 328 instead of mega 48, etc.... so you have more head room to work with? 

..... ok, scratch that, I see you're using a large member part already!

Jim

 

 

edit: family part correction...

Last Edited: Wed. Sep 27, 2017 - 03:20 PM

It may not be your problem but the 1284P is known for having problems when using an external crystal and running the oscillator not in full swing mode. Especially when using the USART.

'This forum helps those who help themselves.'

 

pragmatic  adjective dealing with things sensibly and realistically in a way that is based on practical rather than theoretical consideration.


ki0bk wrote:
Stack overflow, corrupting memory,  can you place "guard bytes" below the stack to track how deep it goes before the problem happens?

I don't know how to place such guard bytes; if you can tell me, that would be perfect.

It may not be your problem but the 1284P is known for having problems when using an external crystal and running the oscillator not in full swing mode. Especially when using the USART.

I am using the oscillator in full swing mode. I have a crystal attached to XTAL1 and XTAL2.


For stack checking, one of the GCC freaks will need to guide you on how to do it with that toolchain.

But for an overview see, https://en.wikipedia.org/wiki/Bu...

 

Jim

 


ki0bk wrote:
Stack overflow, corrupting memory,  can you place "guard bytes" below the stack to track how deep it goes before the problem happens?

Atmel Studio

Stack Overflow Detection Using Data Breakpoint

http://www.atmel.com/webdoc/GUID-ECD8A826-B1DA-44FC-BE0B-5A53418A47BD/index.html?GUID-A4FC8DB5-6B28-4893-93BA-7A4406698E5D

 

"Dare to be naïve." - Buckminster Fuller


gchapman wrote:
Atmel Studio

Stack Overflow Detection Using Data Breakpoint

I need to mention that I do not have access to debug. I am programming through ISP.


I have used a modification of the technique described here when developing code:

http://www.avrfreaks.net/forum/soft-c-avrgcc-monitoring-stack-usage

David (aka frog_jr)


nikel1992 wrote:
I need to mention that I do not have access to debug. I am programming through ISP.

 

Then test for a change in your main() loop and light an LED (or provide some indication) when detected.
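For a target without a debugger, that check could look roughly like the sketch below: a few guard bytes are filled with a known pattern at startup and tested on every pass through main(). The names `guard_zone`, `GUARD_PATTERN`, `guard_init` and `guard_intact` are made up for illustration, and placing the array directly below the stack is toolchain-specific and not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_SIZE    8u
#define GUARD_PATTERN 0xA5u

/* Hypothetical guard zone. Where the linker places it (ideally right
 * below the stack) is an assumption and not handled in this sketch. */
static volatile uint8_t guard_zone[GUARD_SIZE];

/* Fill the guard zone with the known pattern once at startup. */
void guard_init(void)
{
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        guard_zone[i] = GUARD_PATTERN;
    }
}

/* Return true while every guard byte still holds the pattern. */
bool guard_intact(void)
{
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (guard_zone[i] != GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}
```

In the main loop, something like `if (!guard_intact()) { /* light the LED */ }` would then flag the first pass on which the guard bytes were overwritten.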

 

Jim

 


I am already doing the LED thing, but as I said, I cannot get any specific clues from it.

 

frog_jr wrote:

I have used a modification of the technique described here when developing code:

http://www.avrfreaks.net/forum/soft-c-avrgcc-monitoring-stack-usage

 

I will try this right now.

 


Is the error mode reproducible?

 

i.e. Will it always crash after 3 - 4 minutes?

 

Will it crash if it just sits still for 4 minutes without the BT connection going active?

 

Does increasing the number of BT connection / disconnection episodes shorten the time to crash?

 

How do you in your programming language set the stack?

(i.e. in Basic I set three stack sizes in the program's header)

 

If the crash is reproducible, can you simply significantly increase your stack sizes and see if that changes the crash behavior?

 

JC


DocJC wrote:
Is the error mode reproducible?

i.e. Will it always crash after 3 - 4 minutes?


While testing just now, I saw the device stop responding again. The state machine hangs in the sending-data state and does not respond to any command. What I tested today was whether the amount of data is what corrupts the MCU. I tried sending more data over serial by decreasing the delays; my states had delays to slow down data transmission. I removed all the delays, and the MCU still hangs after the same amount of time as before.


frog_jr wrote:

I have used a modification of the technique described here when developing code:

http://www.avrfreaks.net/forum/soft-c-avrgcc-monitoring-stack-usage


 

I downloaded the files, included them in my project, and called only the function StackCount. I get the value 0 no matter what I do, even if I call the function right after port initialization, before calling any other function. What am I doing wrong?


Did you add the StackPaint to your program (in the .init1 section)?

And if using gcc, you use the following (depending on what you want to measure):

#include <stdint.h>

extern uint8_t _etext;        // End of program (actually points to next not used location)
extern uint8_t __data_start;  // Start of initialized variables (bottom of SRAM)
extern uint8_t __data_end;    // End of initialized variables (== __bss_start)
extern uint8_t __bss_start;   // Start of uninitialized variables
extern uint8_t __bss_end;     // End of uninitialized variables (== __heap_start,  == _end)
extern uint8_t _end;          // Same address as __bss_end
extern uint8_t __heap_start;  // Start of dynamically allocated memory (malloc)
extern uint8_t __stack;       // Top of Stack == Top of SRAM
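
To show how those symbols are typically used, here is a minimal sketch of the paint-and-count idea written against an arbitrary byte range, so it can be tried on a PC first. It is not the code from the linked thread; `stack_paint`, `stack_unused` and `STACK_CANARY` are hypothetical names. On the AVR the range would presumably be `&_end` up to `&__stack`, and the painting has to happen before the stack is in use (hence the .init section), otherwise it would overwrite live stack contents.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY 0xC5u

/* Fill [lo, hi) with the canary value. On AVR this would run very early,
 * for example from an .init section, with lo = &_end and hi = &__stack
 * (both assumptions, not shown here). */
void stack_paint(volatile uint8_t *lo, volatile uint8_t *hi)
{
    for (volatile uint8_t *p = lo; p < hi; p++) {
        *p = STACK_CANARY;
    }
}

/* Count canary bytes that are still intact, scanning upward from lo.
 * The stack grows downward from hi, so the result is the number of
 * bytes at the bottom of the range that were never written. */
size_t stack_unused(const volatile uint8_t *lo, const volatile uint8_t *hi)
{
    size_t n = 0;
    for (const volatile uint8_t *p = lo; p < hi && *p == STACK_CANARY; p++) {
        n++;
    }
    return n;
}
```

With a counter written this way, a result of 0 means the very first byte of the range does not hold the canary, which is what one would see if the paint routine never ran.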

 

 

David (aka frog_jr)


Ok. I added the files to my project, included them, and did the following:

 

#include "string.h"
#include <avr/wdt.h>
#include "stackmon.h"


void StackPaint(void) __attribute__ ((naked)) __attribute__ ((section (".init3")));

That is the first part.

memorySpace = StackCount();
printf("Valoare %d\r\n", memorySpace);

That is the second part.


Now the MCU restarts continuously, always after the same amount of time. I think I could even time the restart period :))). It is as if a counter were doing this stupid thing. I really don't know how to solve this.


Sounds like the Watchdog ...


I am checking the MCUSR register. No watchdog reset is there.


I am trying to implement StackCount, but I have had no success so far. I don't know how to do it.


The MCU restarts after around 7 minutes. Once it restarted at exactly 7 minutes.


What does mcusr say >>is<< the restart cause? It may be time to go get yourself a debugger. In lieu of that, you could insert some code into .init0 which could test mcusr and send captured state to the usart in the event it traps a restart with no cause.
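
As a rough sketch of the decoding step only (the .init0 placement and the USART output are left out), the function below maps a captured MCUSR value to a cause. The bit positions are the ones given for MCUSR in the ATmega1284P datasheet; `reset_cause` and the `RST_*` names are hypothetical.

```c
#include <stdint.h>

/* MCUSR bit positions as listed in the ATmega1284P datasheet. */
#define RST_PORF  (1u << 0)  /* power-on reset */
#define RST_EXTRF (1u << 1)  /* external reset pin */
#define RST_BORF  (1u << 2)  /* brown-out reset */
#define RST_WDRF  (1u << 3)  /* watchdog reset */
#define RST_JTRF  (1u << 4)  /* JTAG reset */

/* Translate a captured MCUSR value into a short description.
 * Power-on is tested first because other flags may also be set
 * after a power-on reset. */
const char *reset_cause(uint8_t mcusr)
{
    if (mcusr & RST_PORF)  return "power-on";
    if (mcusr & RST_BORF)  return "brown-out";
    if (mcusr & RST_WDRF)  return "watchdog";
    if (mcusr & RST_EXTRF) return "external";
    if (mcusr & RST_JTRF)  return "jtag";
    return "none";
}
```

On the target, the value would be captured once at startup and the register cleared afterwards, so that each restart reports only its own cause. If the result is "none" even though the program has clearly started over, one possibility is that execution reached address 0 without a hardware reset, for example through an enabled interrupt with no handler, a corrupted return address, or a null function pointer.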

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"Read a lot.  Write a lot."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


joeymorin wrote:
time to go get yourself a debugger

+999999999999999999999999999999999999999999999999

 

Time for that quote again:

 

js wrote:
[not having a debugger is] like a mechanic not having any spanners.

 

See: http://www.avrfreaks.net/comment...

 

And: http://www.avrfreaks.net/comment...


joeymorin wrote:
send captured state to the usart in the event it traps a restart with no cause.

It does not show any restart source. The problem is that I designed the whole program for not having a debugger. On the pins where a debugger is connected I have other things.


The problem is that i designed the whole program for not having debugger. On the pins that a debugger is connected i have other things.

Seems like you've painted yourself into a corner.

 

Looks like you'll have to:

insert some code into .init0 which could test mcusr and send captured state to the usart in the event it traps a restart with no cause.

... or some other interface in lieu of the usart.

 

Ask if you need help with that.

nikel1992 wrote:
On the pins that a debugger is connected i have other things.

So not only do you have no spanners yourself, but you've also ensured that spanners cannot be used at all!

 


awneil wrote:
So not only do you have no spanners yourself, but you've also ensured that spanners cannot be used at all!

Kinda! I made some changes today (it took the entire day, to be exact) and connected the JTAG (Dragon). Now the problem is that I don't know how to look for the problem, or which memory to check. On the Memory tab of the debugger I have:

 

progFlash

prog BOOT_SECTION_1

prog BOOT_SECTION_2

prog BOOT_SECTION_3

prog BOOT_SECTION_4

data registers

data MAPPED_IO

data EEPROM

data IRAM

osccal osccal

 

What should I do to track the problem? Is there any chance that one of you could connect with me on Skype to help me a little more? I want to find this error quickly :(


nikel1992 wrote:
What should I do to track the problem.
If you're fortunate, a lint tool will identify the actual defect.

Otherwise, sprinkle the source code with asserts.

If you have data, information or intuition pointing to a likely stack overflow, an assert can test whether SP has entered the stack guard region.

Other asserts can check for buffer overflows, divide by zero, range constraints, etc. (an overflow can result in an infinite loop that's terminated by the watchdog).

The problem with AVR libc's assert macro is that it calls abort(); when SP is out of bounds, abort() behaves inconsistently, so exit() may not be invoked (otherwise: breakpoint in exit(), then use the debugger to read the data).

You could create an application-specific assert macro that does what you want, such as writing a byte in .noinit that's specific to that instance of the assert, or writing that byte to a spare port to be captured by a logic analyzer.
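
One possible shape for such a macro, as a sketch only: every check gets its own code, and the first code that fails is recorded. `APP_ASSERT`, `app_assert_fail`, `app_assert_first_failure` and `first_failed_check` are hypothetical names. On the AVR the variable would presumably go into .noinit, which also means it has to be cleared explicitly after a power-on reset because its contents are undefined then.

```c
#include <stdint.h>

/* Hypothetical record of the first failed check; 0 means "none".
 * On AVR this would live in .noinit so that it survives a restart. */
static volatile uint8_t first_failed_check;

/* Remember the first failing check and ignore later ones. */
void app_assert_fail(uint8_t code)
{
    if (first_failed_check == 0u) {
        first_failed_check = code;
    }
    /* On target: also write `code` to a spare port here, then halt. */
}

/* Read back the code of the first failed check (0 if none failed). */
uint8_t app_assert_first_failure(void)
{
    return first_failed_check;
}

/* Each use site passes its own unique, non-zero code. */
#define APP_ASSERT(cond, code) \
    do { if (!(cond)) { app_assert_fail((uint8_t)(code)); } } while (0)
```

A use site would then read something like `APP_ASSERT(index < sizeof buffer, 7);`, with the code looked up in a table kept alongside the source.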

 

An alternative is a circular buffer in .noinit that contains "breadcrumbs"; this is akin to the in-RAM instruction trace buffer of AVR32 UC3 and ARM Cortex-M.
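
A breadcrumb buffer of that kind might look like the sketch below. The names (`crumbs`, `crumb_drop`, `crumb_peek`, `NOINIT`) are invented for the example, and the `.noinit` placement only applies when building with avr-gcc; on a PC build the buffer is ordinary storage so the logic can be tested.

```c
#include <stdint.h>

#define CRUMB_COUNT 16u  /* power of two so the index can be masked */

#ifdef __AVR__
#define NOINIT __attribute__((section(".noinit")))
#else
#define NOINIT  /* host build: ordinary zero-initialised storage */
#endif

/* Hypothetical breadcrumb ring; in .noinit it keeps its contents
 * across a restart that does not remove power. */
static volatile uint8_t crumbs[CRUMB_COUNT] NOINIT;
static volatile uint8_t crumb_head NOINIT;

/* Record one breadcrumb, overwriting the oldest entry when full. */
void crumb_drop(uint8_t id)
{
    /* Mask on read so a garbage index after power-on stays in range. */
    uint8_t head = crumb_head & (CRUMB_COUNT - 1u);
    crumbs[head] = id;
    crumb_head = (uint8_t)((head + 1u) & (CRUMB_COUNT - 1u));
}

/* Return the breadcrumb dropped `age` calls ago (0 = most recent).
 * `age` is assumed to be smaller than CRUMB_COUNT. */
uint8_t crumb_peek(uint8_t age)
{
    uint8_t idx = (uint8_t)((crumb_head - 1u - age) & (CRUMB_COUNT - 1u));
    return crumbs[idx];
}
```

The idea would be to call `crumb_drop()` with a unique number at each state change or ISR entry, and to dump the buffer over the USART at the very start of the program, before new breadcrumbs overwrite the trail left by the previous run.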

Another is, if there's a spare SPI or a spare port, to write a unique byte for a source code line; this is the instrumented trace of MPLAB X with REAL ICE's logic analyzer.

 


Automatically Debugging Firmware, by Jack Ganssle (The Ganssle Group; major rewrite: May, 2014)

http://www.ganssle.com/dbc.htm

http://www.nongnu.org/avr-libc/user-manual/group__avr__assert.html

http://www.nongnu.org/avr-libc/user-manual/mem_sections.html

While I was debugging I discovered something. I was in debug mode, paused. Even while I was paused at an instruction, the MCU restarted itself and the code was re-executed.

 

 

Last Edited: Sun. Oct 1, 2017 - 03:41 PM

nikel1992 wrote:
Even If I was pausing at a instruction the MCU restarted itself and the code was reexecuted.
IIRC, there's a debugger setting that disables ISRs and/or timers during a break.

Otherwise, an oscillating Vdd could trigger the BOD, or the reset signal could be inadvertently active.


This is another photo of the stack pointer, and it seems to keep its value. How can I see what is writing to that portion of memory?


To me it does not look so good. I think I know the function that writes that part of memory.


I've not read the whole thread but if you have rogue writes why can't you use a data breakpoint to catch the culprit?