Software freeze on an ATmega328P

Total votes: 0

Hi!

 

This issue is a bit complicated (at least for me): I've been trying to find its root cause for months now, without success.

 

I have a project (called a "clicker") that consists of a board, a 16x2 LCD-screen, five push-buttons and a CC41 bluetooth module (the HM-10 copy). It's powered by a 3.3V 2500mAh battery.

 

The code is written by me and is far from perfect, and I suspect that the issue is somewhere in there. Also, as general information, I work as a nightclub bouncer in Sweden and therefore the unit is subjected to temperatures of -15 °C (that's around 5 °F) for up to 5 hours when we're working the door. I mention this because I guess it could have an effect on the run-time stability of the code. I'll upload the code, but first, the issue.

 

I've been using this "clicker" at work for almost a year now and most of the time it works without issue. However, I've had three major bugs that have shown up more than once:

 

1. I can't see the Bluetooth module with my BLE connector app on my iPhone. It just stops showing up. This can happen hours after I started using the "clicker", during which time it worked fine. If I remove the back cover on the unit then the BLE module is still blinking, so charge *is* reaching it.

 

2. The unit freezes. The screen still shows the same information but the buttons do nothing. If I take off the back cover and touch the pins of the 328 with a screwdriver, it starts working again. I'm guessing that it's waiting for some specific input and therefore locks all other communication. But I can't for the life of me find what it is.

 

3. The unit reboots by itself. I'm guessing that this is a power fluctuation for some reason or other and that this clears the runtime memory of the 328. I'm going to put a capacitor between VCC and GND to see if this alleviates the issue. But maybe you guys have some other idea?

 

I know that this is a big request for help and I don't really expect anything. Any help I get will be taken most gratefully.

 

All the best, Christian

 


 

#define F_CPU 1000000
#include <stdlib.h>											// itoa()
#include <avr/io.h>
#include <util/delay.h>
#include <avr/interrupt.h>
#include "lcd.h"
#include "uart.h"

void timer1_init()
{
    TCCR1B |= (1 << WGM12)|(1 << CS11)|(1 << CS10);    		// set up timer with prescaler = 64 and CTC mode
    TCNT1 = 0;    											// initialize counter
	OCR1A = 15624;       									// initialize compare value
    TIMSK1 |= (1 << OCIE1A);   	 							// enable compare interrupt
}

/*-----VARIABLES FOR INTERRUPT-----*/
int arraycounter = 0;
int arrayshowcount = 0;
int arrayshowcounttext[200];
int secondcounter = 0;
int minutesince = 0;
char minutesincetext[20];
int writetimer = 0;
int arrayin[200] = {0};
char arrayintext[20];
int arraysumma[200] = {0};
char arraysummatext[20];
char arraytext[20];

/*-----VARIABLES FOR COUNTING AND PRINTING*/
int in;	
int ut;													
int summa;

int c;
int stop = 0;
int fivemin = 0;

char intext[20];
char uttext[20];
char summatext[20];

char btintext[20];
char btsummatext[20];


ISR(TIMER1_COMPA_vect){
	secondcounter++;
	writetimer++;
	
	if(secondcounter >= 60){
		minutesince++;
		secondcounter = 0;		
	}
	if(writetimer >= 300){
		arrayin[arraycounter] = in;
		arraysumma[arraycounter] = summa;
		
		arraycounter++;
		writetimer = 0;
	}
}


void init (void){											// collect hardware initialization here, as well as initialization of screen 
	/*--------------BOOT-SCREEN---------------*/
	lcd_init(LCD_DISP_ON); 									// initialize LCD-screen
	lcd_gotoxy( 0, 0);
	lcd_puts("DOORMEISTER 2000");							// print turn-on text (lcd_puts takes a string, lcd_putc a single char)
	lcd_gotoxy( 0, 1);
	lcd_puts("");

	_delay_ms(500);											// wait 0.5 seconds
	lcd_clrscr();											// print run-time screen
}

void mainScreen(void){
	lcd_clrscr();
	lcd_gotoxy(0,0);
	lcd_puts("Antal inne: ");
	lcd_gotoxy(0,1);
	lcd_puts("In:    Ut:");
}

void updateNumbers(void){
	lcd_gotoxy(11, 0);
	itoa(summa, summatext, 10);
	lcd_puts(summatext);
	lcd_gotoxy(3, 1);
	itoa(in, intext, 10);
	lcd_puts(intext);
	lcd_gotoxy(10, 1);
	itoa(ut, uttext, 10);
	lcd_puts(uttext);
	_delay_ms(100);
}

int main (void)
{	
	timer1_init();
		
	/*-------------DECLARING PINS-------------*/
	
	PORTC |= (1 << PC1); 									// enable pull-up on button input PC1
	PORTC |= (1 << PC2); 									// enable pull-up on button input PC2
	PORTC |= (1 << PC3); 									// enable pull-up on button input PC3
	PORTC |= (1 << PC4); 									// enable pull-up on button input PC4
															// note: PC5 is also polled as a button below, but no pull-up is enabled for it
	
	DDRC  |= 0b00000001;									// make PC0 an output
	DDRC  |= 0b00010000;									// make PC4 an output (although PC4 is polled as a button below)
	PORTC |= 0b00000001;									// drive PC0 high
	
	PORTB |= (1 << PB2); 									// enable pull-up on input PB2
	
	DDRB  |= 0b00000001;									// make PB0 an output
	DDRB  |= 0b00010000;									// make PB4 an output
	PORTB |= 0b00000011;									// drive PB0 high and enable the pull-up on PB1
	
	init();													// running the initializing procedure
	uart_init(UART_BAUD_SELECT_DOUBLE_SPEED(9600, F_CPU));
	
		
	
	while(1){
	/*---------------CLOCKSETUP---------------*/
		
	int run = 1;
	int hour = 20;											// set up integers for hours, and minutes
	int minute = 0;
	int second = 0;
	int stop = 0;											// stop integer for time setting while-loop
	int checktime = 0;										// binary integer for the timechecking while-loop
	long currenttime = 0;									// integer for storing current time in seconds
	long boottime = 0;										// variable for storing the user-defined time in seconds
	long holder = 0;
	char hourtext[20];										// chars for converting integers to ascii
	char minutetext[20];
	char currenttimetext[30];
	
	
	lcd_clrscr();
	lcd_gotoxy(0,0);
	lcd_puts("Stall in klockan");
	lcd_gotoxy(0,1);
	lcd_puts("     20:00   ");
	
		/*-----------HOUR SELECTION-----------*/
	
		while(stop == 0){
	
			if(bit_is_clear(PINC, PC1)){
			
				if(hour < 9){
					hour++;
					lcd_gotoxy(6,1);
					itoa(hour, hourtext, 10);
					lcd_puts(hourtext);
					_delay_ms(200);
				}
			
				else if(hour >= 9 && hour < 23){
					hour++;
					lcd_gotoxy(5,1);
					itoa(hour, hourtext, 10);
					lcd_puts(hourtext);
					_delay_ms(200);
				}
				
				else if(hour == 23){
					hour = 0;
					lcd_gotoxy(5,1);
					lcd_puts("0");
					itoa(hour, hourtext, 10);
					lcd_puts(hourtext);
					_delay_ms(200);
				}
			
			}
		
		
		/*----------MINUTE SELECTION----------*/
		
			else if(bit_is_clear(PINC, PC2)){
		
				if(minute < 5){
					minute = minute + 5;
					lcd_gotoxy(9,1);
					itoa(minute, minutetext, 10);
					lcd_puts(minutetext);
					_delay_ms(150);
				}
			
				else if(minute >= 5 && minute < 55){
					minute = minute + 5;
					lcd_gotoxy(8,1);
					itoa(minute, minutetext, 10);
					lcd_puts(minutetext);
					_delay_ms(150);
				}
				
				else if(minute == 55){
					minute = 0;
					lcd_gotoxy(8,1);
					lcd_puts("0");
					itoa(minute, minutetext, 10);
					lcd_puts(minutetext);
					_delay_ms(150);
				}
			
			}
			
			else if(bit_is_clear(PINC, PC3)){
				stop =1;
				_delay_ms(100);
			}
			
		}								//Bootup time setting
		
		stop = 0;
		boottime = (hour*60) + (minute);
		
	
		
	/*------------------COUNTER------------------*/			//Beginning of actual program
		
		
		
		mainScreen();
		
		sei();
		
		while(run == 1){
	
			_delay_ms(50);
			
			if(bit_is_clear(PINC, PC4)){							// check if the in-button is pressed
				in++;												// add one to integer "in"
				summa = in - ut;									// refresh the sum
				itoa(summa, summatext, 10);
				itoa(in, intext, 10);
				itoa(ut, uttext, 10);
				
				while(bit_is_clear(PINC, PC4)){							// wait (blocking) until the in-button is released
				}
				
				mainScreen();
				updateNumbers();
			
				_delay_ms(25);	
			}
		
			if(bit_is_clear(PINC, PC5)){
			
				_delay_ms(50);
				
				if(bit_is_clear(PINC, PC5)){
			
				if(summa > 0){
					ut++;
					summa = in - ut;
					itoa(summa, summatext, 10);
					itoa(in, intext, 10);
					itoa(ut, uttext, 10);
					
					while(bit_is_clear(PINC, PC5)){							// wait (blocking) until the out-button is released
					}
					
					mainScreen();
					updateNumbers();
				
					_delay_ms(25);
				}
				
				}
			}
			
			
			if(bit_is_clear(PINC, PC2)){
				
				lcd_clrscr();
				_delay_ms(50);
				
				lcd_gotoxy(0,0);
				lcd_puts("Starta om enhet?");
				lcd_gotoxy(0,1);
				lcd_puts("  JA       NEJ  ");
				
				stop = 0;
				
				while(stop == 0){
					if(bit_is_clear(PINC, PC1)){
						_delay_ms(150);
						lcd_gotoxy(0,0);
						lcd_puts("Startar om enhet");
						lcd_gotoxy(0,1);
						lcd_puts("                  ");
						cli();
						
						in = 0;
						ut = 0;
						summa = 0;
						
						arraycounter = 0;
						arrayshowcount = 0;
						secondcounter = 0;
						minutesince = 0;
						writetimer = 0;
						
						_delay_ms(1000);
						run = 0;
						_delay_ms(150);
						stop = 1;
						_delay_ms(150);
					}
					
					else if(bit_is_clear(PINC, PC3)){
						_delay_ms(150);
						
						mainScreen();
						updateNumbers();
						
						stop = 1;
						_delay_ms(150);
					}
				}
				
				_delay_ms(150);
							
				stop = 0;
			}
						
		
			if(bit_is_clear(PINC, PC3)){
				
				_delay_ms(100);
					
				if(bit_is_clear(PINC, PC3)){
					
					_delay_ms(100);
					
					stop = 0;
					lcd_clrscr();
				
					lcd_gotoxy(0,0);
					lcd_puts("Skicka data?");
					lcd_gotoxy(0,1);
					lcd_puts("  JA       NEJ  ");
				
					while(stop == 0){
					
						if(bit_is_clear(PINC, PC1)){
						lcd_clrscr();
						uart_puts("\nShift started at ");
						
				
						holder = boottime;							// Use another variable to hold the start time (in minutes)
						minute = holder % 60;							// Use modulus to extract the minutes
						itoa(minute, minutetext, 10);					// Convert minutes to ASCII
						holder = holder / 60;							// Divide by 60 (60 minutes per hour)
						hour = holder % 24;								// Same as above, to extract the hours
						itoa(hour, hourtext, 10);						// Convert hours to ASCII
		
							if(hour <= 9){								// Conditional as with minutes to make sure that it's presented correctly
								uart_puts("0");
								uart_puts(hourtext);
							}
		
							else if(hour >= 10 && hour <= 23){
								uart_puts(hourtext);
							}
					
							uart_puts(":");
					
							if(minute <= 9){							// if the number of minutes are smaller than 10 then present it as :09 not :9
								uart_puts("0");
								uart_puts(minutetext);
							}
			
							else if(minute >= 10 && minute <= 59){        // if the number of minutes is larger than 9, just present it normally
								uart_puts(minutetext);
							}
					
					
							for(c = 0; c < arraycounter; c++){
								itoa(arrayin[c], btintext, 10);
								itoa(arraysumma[c], btsummatext,10);
								uart_puts("\nTotal: ");
								uart_puts(btintext);
								uart_puts(", Inne: ");
								uart_puts(btsummatext);
							}	
				
						_delay_ms(500);
						mainScreen();
						updateNumbers();
					
						stop = 1;
					}
				
					if(bit_is_clear(PINC, PC3)){
						stop = 1;
						_delay_ms(150);
					
						mainScreen();
						updateNumbers();
					}
				
					}
				}
			}
		
		
			/*
			if(bit_is_clear(PINC, PC4)){
				lcd_clrscr();														// Clear screen and output the new interface
				lcd_gotoxy(0,0);
				lcd_puts("Vid 00:");
				lcd_gotoxy(0,1);
				lcd_puts("Inne:   Tot:");
				itoa(arrayin[arrayshowcount], arrayshowcounttext, 10);				// Arrayshowcount starts at 0, which means the program looks for the first entry in the array
				lcd_gotoxy(12,1);													// Put out the number of people clicked in in the position in the array
				lcd_puts(arrayshowcounttext);
				itoa(arraysumma[arrayshowcount], arrayshowcounttext, 10);			
				lcd_gotoxy(5,1);													// Put out the number of people clicked in in the position in the array	
				lcd_puts(arrayshowcounttext);
	
				holder = boottime;													// Use holder to hold the boottime
				minute = holder % 60;												// Pick out the minutes with modulus
				holder = holder/60;													// Divide boottime by 60 to be able to pick out hours
				hour = holder % 24;													// pick out hours with modulus 24
				holder = boottime;													// Put holder back to boottime, otherwise the next use of holder would be with the divided number
				itoa(hour, hourtext, 10);											// Convert hours to ascii
				itoa(minute, minutetext, 10);										// Convert minutes to ascii
	
													
	
				if(hour <= 9){														// Conditional to make sure that the minutes are presented correctly	
					lcd_gotoxy(4,0);
					lcd_puts("00");
					lcd_gotoxy(5,0);
					lcd_puts(hourtext);
				}
	
				else if(hour >= 10){
					lcd_gotoxy(4,0);
					lcd_puts(hourtext);
				}
	
				if(minute <= 9){													// Conditional to make sure that the minutes are presented correctly
					lcd_gotoxy(7,0);
					lcd_puts("00");
					lcd_gotoxy(8,0);
					lcd_puts(minutetext);
				}
	
				else if(minute >= 10){
					lcd_gotoxy(7,0);
					lcd_puts(minutetext);
				}
	
				_delay_ms(150);
	
	
				while(stop == 0){													// While loop to show the rest of the positions in the array
	
					if(bit_is_clear(PINC, PC3)){
			
						itoa(arrayin[arrayshowcount], arrayshowcounttext, 10);
			
						lcd_gotoxy(12,1);
						lcd_puts("   ");
						lcd_gotoxy(12,1);
						lcd_puts(arrayshowcounttext);
			
						itoa(arraysumma[arrayshowcount], arrayshowcounttext, 10);
			
						lcd_gotoxy(5,1);
						lcd_puts("   ");			
						lcd_gotoxy(5,1);													// Put out the number of people clicked in in the position in the array	
						lcd_puts(arrayshowcounttext);
			
						fivemin = fivemin + 5;										// Add five more minutes to integer fivemin. Fivemin starts at 0.
						arrayshowcount++;											// Add one to the arrayshowcount, so we move to the next position in the array
						holder = boottime;											// Put holder back to boottime after every loop
						holder = holder + fivemin;									// Add the five minutes to holder
			
						minute = holder % 60;										// Pick out the minutes with modulus 60
						holder = holder/60;											// Divide by 60 to pick out hours
						hour = holder % 24;											// Pick out the hours with modulus 24
						itoa(hour, hourtext, 10);									// Convert hours to ascii
						itoa(minute, minutetext, 10);								// Converts minutes to ascii
			
			
						if(hour <=9){												// Conditional to make sure hours are presented correctly
							lcd_gotoxy(4,0);
							lcd_puts("00");	
							lcd_gotoxy(5,0);
							lcd_puts(hourtext);
						}
			
						else if(hour >=10){
							lcd_gotoxy(4,0);
							lcd_puts(hourtext);
						}
	
						if(minute <= 9){											// Conditional to make sure that minutes are presented correctly
							lcd_gotoxy(7,0);
							lcd_puts("00");
							lcd_gotoxy(8,0);
							lcd_puts(minutetext);
						}
	
						else if(minute >= 10){
							lcd_gotoxy(7,0);
							lcd_puts(minutetext);
						}
			
						_delay_ms(200);												// Delay to remove bouncing between 1 and 0
			
					}												//MENU CHOICE: PRESENT ARRAY
		
		
					if(bit_is_clear(PINC, PC4)){
						stop = 1;													
					}										//Conditional that returns to main program if button 3 is pressed
	
				}

				stop = 0;																// Before going back to main program, set stop,
				arrayshowcount = 0;														// arraycount and fivemin to zero. This to make
				fivemin = 0;															// it possible to review the array again.
				mainScreen();
				updateNumbers();
		
			}
			*/
		
		
		
		}
		
		
		
		}
}	

 

Total votes: 1

A distinct lack of volatile.

 

PS you probably want to Google "cyclomatic complexity" too.
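To make the "volatile" remark concrete, here is a minimal sketch, not the poster's code. It assumes that `timer_tick()` stands in for `ISR(TIMER1_COMPA_vect)`; `timer_tick` and `read_minutes` are hypothetical names chosen so the fragment also builds on a PC. The point is only that every variable written in the interrupt and read in the main loop carries the `volatile` qualifier.

```c
#include <stdint.h>

/* Shared between the (simulated) interrupt and the main loop, so volatile:
   the compiler must access them in RAM every time instead of caching them. */
static volatile uint8_t  secondcounter = 0;
static volatile uint16_t minutesince   = 0;

/* Stand-in for ISR(TIMER1_COMPA_vect), called once per second (hypothetical name). */
void timer_tick(void)
{
    uint8_t s = (uint8_t)(secondcounter + 1u);
    if (s >= 60u) {
        s = 0;
        minutesince++;
    }
    secondcounter = s;
}

/* Main-loop side: reads the shared minute counter. */
uint16_t read_minutes(void)
{
    return minutesince;
}
```

Note that `volatile` does not make a 16-bit read atomic on an 8-bit AVR; on the target, `read_minutes()` would additionally need interrupts briefly disabled around the read.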

Last Edited: Sat. Feb 24, 2018 - 10:20 PM
Total votes: 0

I am not an expert, but why do you set the PORT direction twice? The second line will override the previous line.

 

DDRC |= 0b00000001

DDRC |= 0b00010000     

 

And this is also done with PORTB

 

 

 

 

Total votes: 0

mikedb wrote:
twice the second line will override the previous line.
No it won't. He's using |= so each line "adds bits in".

 

It does seem to be someone who is far too fond of |= when they should be using = though. The sequence:

DDRC |= 0b00000001
DDRC |= 0b00010000  

ends up with DDRC set to:

???1???1

the other 6 bits could be anything. A far better idea would be:

DDRC = 0b00010001;

in which case the state of all 8 bits is known exactly. If you want to make that line more "documentary" then perhaps:

DDRC = (1 << 4) | (1 << 0);

so the reader does not have to count the 1's and 0's to see it's bits 4 and 0.
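The difference can be tried on a PC with a plain byte standing in for the register. This is only an illustration; `ddr_with_or` and `ddr_with_assign` are hypothetical names, and `previous` models whatever the register held before (after a power-on reset the datasheet gives 0x00 for the DDRx registers, but a bootloader or earlier code may have changed that).

```c
#include <stdint.h>

#define OUT_MASK ((uint8_t)((1u << 4) | (1u << 0)))  /* bits 4 and 0 */

/* What the two "|=" lines do: force bits 4 and 0 on, keep all other bits. */
uint8_t ddr_with_or(uint8_t previous)
{
    uint8_t ddr = previous;
    ddr |= (uint8_t)(1u << 0);
    ddr |= (uint8_t)(1u << 4);
    return ddr;
}

/* What a single "=" does: all eight bits are defined, whatever came before. */
uint8_t ddr_with_assign(uint8_t previous)
{
    (void)previous;   /* the old contents do not matter */
    return OUT_MASK;
}
```

With a previous value of 0xA4, the first function gives 0xB5 and the second gives 0x11.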

Total votes: 1

Naccache wrote:
I'm guessing ... 

Don't just guess!

 

Examine the evidence; form a hypothesis; test your hypothesis:

 

http://www.8052mcu.com/faqs/120313

 

... that this is a power fluctuation

An oscilloscope on the power line(s) should be able to confirm or deny that.

 

The ATMega328P has a brown-out detector - are you using it?

 

The ATMega328P has on-chip debug - are you using it?

 

 

More on debugging in general: https://www.avrfreaks.net/commen...

 

 

EDIT 2

 

Changed 8052.com link to 8052mcu.com - see http://www.8052mcu.com/forum/rea... for details

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
Last Edited: Tue. Mar 20, 2018 - 11:28 PM
Total votes: 0

A photo of the PCB would be a big help with the power issues, i.e. to check for proper use of bypass caps (100 nF).

Which fuse settings are used (is BOD set correctly)?

 

Over all a very nice project!

 

Jim

 

Click Link: Get Free Stock: Retire early!

share.robinhood.com/jamesc3274

 

 

 

Total votes: 0

awneil wrote:
... that this is a power fluctuation An oscilloscope on the power line(s) should be able to confirm or deny that.
I doubt that. Power supply disturbances are often very short and intermittent glitches. You'll have to put your scope in single trigger mode and may be waiting for days before a trigger event happens.

 

It's probably easier to test the quality of the power supply itself directly. A not very scientific but nonetheless useful test is to switch some nearby appliances on and off repeatedly. Stuff with coils in it is often best for this: old-fashioned fluorescent lights, halogen lights with transformers.

Making your circuit emit an audible beep after reset can also help pinpoint this, or it could log reset (and other) data to a serial interface or logic analyser.

 

In my experience a uC power supply always needs an inductor or choke of some kind to combat conducted EMI.

Sometimes the transformer inside the power supply is good enough, sometimes extra filters are needed.

Good PCB design (GND planes, routing) is also important here.

Paul van der Hoeven.
Bunch of old projects with AVR's:
http://www.hoevendesign.com

Total votes: 0

awneil wrote:

 An oscilloscope on the power line(s) should be able to confirm or deny that.

Paulvdh wrote:
I doubt that.

Maybe - maybe not.

 

This guy has certainly managed to catch a power supply disturbance on a scope:

https://community.st.com/message...

 

 

Power supply disturbances are often very short and intermittent glitches.

Depends.

If it is related to some specific event - such as the BLE TX, or screen powering up, or suchlike - then it could be quite "repeatable"

 

 

You'll have to put your scope in single trigger mode

Yes - of course.

 

 

and may be waiting for days before a trigger event happens.

Yes - depends on how often these "freezes" occur.

 

Making your circuit emit an audible beep after reset can also help

Indeed - you could make your scope trigger on this, and have it with the trigger point at the end of the trace ...

 

 

Last Edited: Wed. Mar 14, 2018 - 03:21 PM
Total votes: 0

Naccache wrote:
The code is written by me and is far from perfect, ...
Bluntly, so what? You are a creator, and code is rarely perfect (it's an iterative process).

Naccache wrote:
I'll upload the code, but first, the issue.
First, work through the compiler and linker warnings, then run lint and work through its warnings.

Atmel Studio has some lint extensions.

Visual Studio Community has a good zero price lint (/analyze)

Naccache wrote:
If I remove the back cover on the unit then the BLE module is still blinking, so charge *is* reaching it.
Blinking does not mean fully operational (the blinking task may be functional while another task is not).

After solving the other bugs, add a communications test of the BLE module (BLE-to and from-mega328P) with a hardware reset of the BLE module if that test fails.

Naccache wrote:
But I can't for the life of me find what it is.
Consider:

  1. adding C asserts for underflow and overflow, though you will have to decide how to get the data out ("somehow" to the LCD, or "breadcrumbs" in RAM followed "somehow" by a memory dump) and then how to turn that data into information (data with meaning) and into knowledge (defect -> fault -> failure)
  2. a reboot may be better than a freeze. Enable the mega328P's watchdog, or add a windowed external watchdog to reset the mega328P.
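Once a watchdog or the brown-out detector can reset the chip, it helps to know which one did it. The sketch below decodes a copy of the ATmega328P's MCUSR register into a label short enough for the LCD. The bit positions are taken from the datasheet; `reset_cause` is a hypothetical helper, and the function takes the register value as a parameter so it can be tried on a PC. On the target the idea would be to copy `MCUSR` into a variable at the very start of `main()` and then write 0 to it, so that flags from earlier resets do not accumulate.

```c
#include <stdint.h>

/* MCUSR flag positions on the ATmega328P. */
#define PORF_BIT  0   /* power-on reset  */
#define EXTRF_BIT 1   /* external reset  */
#define BORF_BIT  2   /* brown-out reset */
#define WDRF_BIT  3   /* watchdog reset  */

/* Returns a short label for the reset source.
   Power-on is checked first because other flags may also be set at power-up. */
const char *reset_cause(uint8_t mcusr)
{
    if (mcusr & (1u << PORF_BIT))  return "POWER-ON";
    if (mcusr & (1u << WDRF_BIT))  return "WATCHDOG";
    if (mcusr & (1u << BORF_BIT))  return "BROWN-OUT";
    if (mcusr & (1u << EXTRF_BIT)) return "EXT RESET";
    return "UNKNOWN";
}
```

Showing this label on the boot screen would separate the "reboots by itself" cases: a brown-out points at the supply, a watchdog reset at code that stopped running. For the internal watchdog, avr-libc provides `wdt_enable()` and `wdt_reset()` in `<avr/wdt.h>`; the blocking menu loops in the posted code would then also need to call `wdt_reset()`.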

Naccache wrote:
But maybe you guys have some other idea?
plural and numerous

need information to create recommendations

 

P.S.

Naccache wrote:
Also, as general information, I work as a nightclub bouncer in Sweden and therefore the unit is subjected to temperatures of -15C (that's around 5F) for up to 5 hours when we're working the door.

...

All the best, Christian

I imagine you're a bruiser though colds and flu can make you miss work.

Consider adding a daily dose of vitamin C, powder by body weight, and somehow get enough vitamin D.

Liposomal vitamin C can knock down any infection so you shouldn't miss more than one day of work (one dose, wait 12h, evaluate, if need be then one more dose should kick it)

 

P.P.S.

To the women bouncers in the audience, it's equal opportunity.

The Athena-class women (approx 70kg) can more than hold their own (think Gina Carano)

 


https://gallery.microchip.com/

https://www.visualstudio.com/vs/community/

https://docs.microsoft.com/en-us/cpp/build/reference/analyze-code-analysis

http://www.microchip.com/webdoc/AVRLibcReferenceManual/group__avr__assert.html

https://www.avrfreaks.net/forum/led-indicator-software-debugging#comment-2422671

 

Edit: +4 URL

 

"Dare to be naïve." - Buckminster Fuller

Last Edited: Thu. Mar 15, 2018 - 06:46 AM
Total votes: 0

First of all, thanks for all the replies. For some reason I seem to have shut off email-notifications for replies to my own threads, hence the late reply from my side. I still haven't resolved the issue and I have a hard time imagining how to debug it. Some nights the unit works without issue (around five hours) and other nights some bug shows up immediately. I haven't been able to reproduce the issue at home yet, but I'll try some more.

 

clawson wrote:

[SNIP] 

the other 6 bits could be anything. A far better idea would be: 

[SNIP]

 

Thanks, I'll change to your solution for readability.

 

ki0bk wrote:

A photo of the pcb would big help with the power issues, i.e. proper use of bypass caps (100nf)

Fuses settings used (is BOD set correctly)

 

Over all a very nice project!

 

Jim

 

I've attached a picture of the PCB board in this message. Thanks for the kind words!

 

awneil wrote:

Examine the evidence; form a hypothesis; test your hypothesis:

 

http://www.8052mcu.com/faqs/120313

 

... that this is a power fluctuation

An oscilloscope on the power line(s) should be able to confirm or deny that.

 

The ATMega328P has a brown-out detector - are you using it?

 

The ATMega328P has on-chip debug - are you using it?

 

 

 

Yeah, you're absolutely right. I have my hypothesis, but so far I haven't been able to reproduce the issue consistently, nor do I have access to a digital oscilloscope able to hold the capture when a fluctuation occurs.

 

Regarding the brown-out detector and on-chip debug, the answer is no (I didn't know about them either). That gives me some reading for the day (along with your other links).

 

Quote:

Depends.

If it is related to some specific event - such as the BLE TX, or screen powering up, or suchlike - then it could be quite "repeatable"

 

BLE TX is only on request from the user, screen powers up as soon as power supply is connected. I've really been trying to think about the software and what happens when it's "idling" in the main screen but it's just a while-loop waiting for input from the buttons. I would however like to stay at this issue just a bit more, as I think it's key to one of the bugs.

 

For some reason the unit locks up and doesn't understand input from the push-buttons. The screen is on and working but I'm as of now unable to know if it's refreshing (there's one idea for bug testing, some kind of continuous counter). However, if I mechanically short the push-button legs on the atmega then it starts working again. It's almost like there's an issue between the push-button and the atmega, as if the signal doesn't continue all the way?
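The "continuous counter" idea could look roughly like the sketch below: a small spinner that advances once per pass through the main loop, so a frozen loop shows up as a frozen character. `heartbeat_next` is a hypothetical helper, and the way it would be drawn is an assumption: something like `lcd_gotoxy(15, 0); lcd_putc(heartbeat_next(&phase));` at the top of the loop, assuming `lcd_putc()` takes a single character as in the common Peter Fleury LCD library.

```c
#include <stdint.h>

/* Returns the next spinner character and advances the phase (0..3). */
char heartbeat_next(uint8_t *phase)
{
    static const char frames[4] = { '.', 'o', 'O', 'o' };
    char c = frames[*phase & 3u];
    *phase = (uint8_t)((*phase + 1u) & 3u);
    return c;
}
```

If the spinner keeps moving while the buttons are dead, the main loop is alive and the problem is on the input side; if it stops, the code is stuck somewhere, for example in one of the blocking wait loops.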

 

Again, thanks to everyone and I'll keep working on this until I'm done :-)

 

All the best, Christian

Last Edited: Tue. Apr 3, 2018 - 06:52 AM
Total votes: 0

Naccache wrote:
For some reason I seem to have shut off email-notifications for replies to my own threads, ...
An AVR Freaks issue :

https://www.avrfreaks.net/forum/functionality-notifications

Naccache wrote:
... and I have a hard time imagining how to debug it.
Paulvdh and his how to debug with a logic analyzer post :

https://www.avrfreaks.net/forum/led-indicator-software-debugging#comment-2421756

 


Total votes: 0

I just had a short peek at the code.

It seems that the bottom half of it is commented out, and not doing anything.

 

Then I noticed:

/*-----VARIABLES FOR INTERRUPT-----*/
int arraycounter = 0;
int arrayshowcount = 0;
int arrayshowcounttext[200];
int secondcounter = 0;
int minutesince = 0;
char minutesincetext[20];
int writetimer = 0;
int arrayin[200] = {0};
char arrayintext[20];
int arraysumma[200] = {0};
char arraysummatext[20];
char arraytext[20];

Then I upvoted clawson's #2.

Short but powerful answer.

It was the first answer, too.

clawson wrote:
A distinct lack of volatile. PS you probably want to Google "cyclomatic complexity" too.

Paul van der Hoeven.

Total votes: 0

Here's a thought....
Take your multimeter, set it for reading amps and measure the draw of your circuit at full load, meaning when the LCD backlight is on and it is transmitting via BLE. Do this using a mains-powered DC source, not the battery. Make sure your power supply can supply at least 1500 mA of current at your required voltage.

When the unit freezes is it usually after it has been on for about 5 hours?

The reason I ask is that I am wondering whether the circuit draws enough current to drain the battery within 5 hours, to the point where the AVR is operating out of spec.

A schematic would help in this theory as voltage regulators do funny things when input voltages get too low.

Just my two pence.
Jim

Edit...Damned autocorrect!

If you want a career with a known path - become an undertaker. Dead people don't sue! - Kartman

Please Read: Code-of-Conduct

Atmel Studio6.2/AS7, DipTrace, Quartus, MPLAB user

Last Edited: Tue. Apr 3, 2018 - 01:26 PM
Total votes: 0

jgmdesign wrote:
Take your multimeter and set it for reading maps
   Gotta love auto-correct!

 

Jim

 


 

 

 

Total votes: 0

What does your build environment report as static RAM usage?

I see about 1400 bytes for variables and lots of string literals.

And in addition to that static usage there is a fair amount of local data in main().
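The "about 1400 bytes" figure can be reproduced by counting the globals in the posted code. The sketch below does the arithmetic; it assumes avr-gcc's 16-bit `int` (modelled here with `int16_t` so the result is the same on a PC), and `static_ram_estimate` is a hypothetical name. String literals and the stack come on top of this, and the ATmega328P has 2048 bytes of SRAM in total.

```c
#include <stddef.h>
#include <stdint.h>

/* On avr-gcc an int is 16 bits, so int16_t models it on any host. */
typedef int16_t avr_int;

#define N_INT_ARRAYS   3u    /* arrayshowcounttext, arrayin, arraysumma */
#define ARRAY_LEN      200u
#define N_TEXT_BUFFERS 9u    /* the global char[20] conversion buffers */
#define TEXT_LEN       20u
#define N_INT_SCALARS  11u   /* arraycounter ... fivemin */

/* Bytes of SRAM taken by the global variables alone. */
size_t static_ram_estimate(void)
{
    return N_INT_ARRAYS * ARRAY_LEN * sizeof(avr_int)     /* 1200 */
         + N_TEXT_BUFFERS * TEXT_LEN * sizeof(char)       /*  180 */
         + N_INT_SCALARS * sizeof(avr_int);               /*   22 */
}
```

That comes to 1402 bytes, of which 400 belong to `arrayshowcounttext`, an `int[200]` that is only referenced from the commented-out section.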

Stefan Ernst

Total votes: 0

Several hints have already been posted (volatile qualifiers missing, watchdog should be enabled)

I want to add that you should critically check your array accessing/functions:

 

int arraysumma[200] = {0};

...

ISR(TIMER1_COMPA_vect){
	secondcounter++;
	writetimer++;

	if(secondcounter >= 60){
		minutesince++;
		secondcounter = 0;
	}
	if(writetimer >= 300){
		arrayin[arraycounter] = in;
		arraysumma[arraycounter] = summa;

		arraycounter++;
		writetimer = 0;
	}
}

The arraycounter index may exceed the index range of 200!
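A minimal sketch of a bounded write, assuming the policy is simply to drop samples once the log is full (the post does not say what should happen then). The array names are reused from the posted code; `log_sample` and `LOG_SLOTS` are hypothetical, and the function stands in for the body of the `writetimer >= 300` branch in the ISR.

```c
#include <stdint.h>

#define LOG_SLOTS 200u

static volatile int16_t arrayin[LOG_SLOTS];
static volatile int16_t arraysumma[LOG_SLOTS];
static volatile uint8_t arraycounter = 0;   /* 0..LOG_SLOTS fits in 8 bits */

/* Stores one sample; returns 1 if stored, 0 if the log is already full. */
uint8_t log_sample(int16_t in, int16_t summa)
{
    uint8_t idx = arraycounter;
    if (idx >= LOG_SLOTS) {
        return 0;               /* full: drop the sample, never write past the end */
    }
    arrayin[idx]    = in;
    arraysumma[idx] = summa;
    arraycounter    = (uint8_t)(idx + 1u);
    return 1;
}
```

Making the index a single byte has a side benefit on an 8-bit AVR: the main loop can read it in one instruction, so it can never see a half-updated value.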

 

Furthermore, look at what happens if an interrupt is triggered while this loop is being processed:

 

for(c = 0; c < arraycounter; c++){
								itoa(arrayin[c], btintext, 10);
								itoa(arraysumma[c], btsummatext,10);
								uart_puts("\nTotal: ");
								uart_puts(btintext);
								uart_puts(", Inne: ");
								uart_puts(btsummatext);
							}	

The variables arraycounter, arrayin and arraysumma may change due to interrupts that occur while the loop is executing.
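One way to handle this without keeping interrupts off for the whole transfer is to take a short atomic snapshot of the counter and loop up to the snapshot. The sketch below assumes what the posted ISR appears to do: it only writes the slot at `arraycounter` and then increments, so slots below the snapshot are not written again. On the AVR it uses `ATOMIC_BLOCK` from avr-libc's `<util/atomic.h>`; the `#else` branch is a stand-in added here so the fragment also builds on a PC. `snapshot_count` and `sum_logged` are hypothetical names, and the sum stands in for the `uart_puts()` calls.

```c
#include <stdint.h>

#ifdef __AVR__
#include <util/atomic.h>
#else
/* Host build only: run the guarded block once, without touching interrupts. */
#define ATOMIC_RESTORESTATE 0
#define ATOMIC_BLOCK(mode) for (int once_ = 1; once_; once_ = 0)
#endif

#define LOG_SLOTS 200u

static volatile int16_t arrayin[LOG_SLOTS];
static volatile int16_t arraycounter = 0;   /* written by the timer interrupt */

/* Copy the 16-bit counter with interrupts briefly disabled. */
int16_t snapshot_count(void)
{
    int16_t n = 0;
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        n = arraycounter;
    }
    return n;
}

/* Walk the entries logged so far; the sum stands in for the UART output. */
int32_t sum_logged(void)
{
    int16_t n = snapshot_count();
    int32_t total = 0;
    for (int16_t c = 0; c < n; c++) {
        total += arrayin[c];    /* slots below n are not rewritten by the ISR */
    }
    return total;
}
```

Wrapping the whole print loop in `cli()`/`sei()` would also be consistent, but at 9600 baud each character takes about a millisecond, so a long log could hold interrupts off for seconds and lose timer ticks. If `uart.h` is an interrupt-driven driver (an assumption; the post does not say), `uart_puts()` would also need interrupts enabled to drain its transmit buffer.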

 

Flo1991

Last Edited: Tue. Apr 3, 2018 - 07:01 PM
Total votes: 0

Hello again everyone, thanks again for the replies. I've read up some on volatiles and I'll be spending this morning doing some examples. I've also included the picture of the PCB-board this time, as I missed it last time.

 

Paulvdh wrote:

I just had a short peek at the code.

It seems that the bottom half of it is commented out, and not doing anything.

 

Then I noticed:

/*-----VARIABLES FOR INTERRUPT-----*/
int arraycounter = 0;
int arrayshowcount = 0;
int arrayshowcounttext[200];
int secondcounter = 0;
int minutesince = 0;
char minutesincetext[20];
int writetimer = 0;
int arrayin[200] = {0};
char arrayintext[20];
int arraysumma[200] = {0};
char arraysummatext[20];
char arraytext[20];

Then I upvoded Clawson #2.

Short but powerfull answer.

First to answer to.

 

clawson wrote:

A distinct lack of volatile. PS you probably want to Google "cyclomatic complexity" too.

 

 

I should have removed the commented-out section. It's code from an earlier prototype that had four (instead of the current three) face buttons. The commented-out code is just for the fourth, no longer existing, button.

 

 

jgmdesign wrote:

Here's a thought....
Take your multimeter and set it for reading amps and measure the draw of your circuit at full capacity....meaning when the LCD backlight is on and it is transmitting via the BLE. Do this using a mains powered dc source not the battery. Make sure your power supply can supply at least 1500ma of current at your reqired voltage.

When the unit freezes is it usually after it has been on for about 5 hours?

The reason I ask is that I am wondering whether the circuit draws enough power to drain the battery within 5 hours, to the point where the AVR is operating out of spec.

A schematic would help in this theory as voltage regulators do funny things when input voltages get too low.

Just my two pence.
Jim

Edit...Damned autocorrect!

 

I'll get my multimeter from the garage and try that out. I've included the schematic. To answer your question, the freeze-ups have not been consistent in terms of running time. The battery is also way oversized for the application: I have a current draw of about 40mA tops and the battery is a 3.3V 2400mAh.

 

sternst wrote:

What does your build environment report as static RAM usage?

I see about 1400 bytes for variables and lots of string literals.

And in addition to that static usage there is a fair amount of local data in main().

 

I'm on a Mac and I just use TextMate for writing code and the terminal for the makefile. I'll check if static RAM is displayed when running the make.

 

Flo1991 wrote:

(SNIP)

The arraycounter index may exceed the index range of 200!

 

Furthermore, look what happens if an interrupt is triggered while this loop is being processed:

 

for (c = 0; c < arraycounter; c++) {
    itoa(arrayin[c], btintext, 10);
    itoa(arraysumma[c], btsummatext, 10);
    uart_puts("\nTotal: ");
    uart_puts(btintext);
    uart_puts(", Inne: ");
    uart_puts(btsummatext);
}

The variables arraycounter, arrayin and arraysumma may change while the loop is running, because an interrupt can fire at any point during its execution.

 

Flo1991

 

Yeah, the over-indexing is an issue I've overlooked because of the larger issue at hand (the bugs). Shifts are 5 hours long, and 200 entries at 5 minutes each gives me more than 16 hours of logging. However, I just realised that there's no harm in simply giving the array counter a max value.
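
A minimal sketch of such a cap, assuming a hypothetical helper log_append() that is not in the posted code. The array names and the size of 200 are taken from the posted declarations; everything else is illustrative.

```c
#define LOG_SIZE 200  /* matches the 200-entry arrays in the posted code */

/* Shared with the interrupt, hence volatile. */
volatile int arraycounter = 0;
volatile int arrayin[LOG_SIZE];
volatile int arraysumma[LOG_SIZE];

/* Hypothetical helper: store one sample, or drop it when the log is full.
 * Returns 1 if the sample was stored, 0 if it was dropped.
 * Intended to be called from wherever the arrays are filled today. */
int log_append(int in, int summa)
{
    if (arraycounter >= LOG_SIZE) {
        return 0;                 /* full: never write past the arrays */
    }
    arrayin[arraycounter] = in;
    arraysumma[arraycounter] = summa;
    arraycounter++;
    return 1;
}
```

Dropping new samples once the log is full is only one possible policy; wrapping around and overwriting the oldest entry would work too, at the cost of a slightly more involved read-out loop.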

 

For the other part of your answer, would it be smart to use cli() before the for loop and sei() after?

 

All the best and thanks again everyone!

 

Attachment(s): 


Looking at some of your while()'s in main(), you have code that is enabled by run = 1 and that at one point sets run = 0; but nowhere in the code, other than in the init, is run returned to a value of 1.

 

_delay_ms(1000);
run = 0;
_delay_ms(150);
stop = 1;
_delay_ms(150);

So when this happens, it looks like you stop scanning your button inputs.

Resetting the chip would restore operational scanning.

 

Jim


The lock up symptom could be caused by the XC61CN3002 holding nRESET low once Vin (Vbat) drops to below the threshold voltage.

Greg Muth

Portland, OR, US

Xplained/Pro/Mini Boards mostly

 

Make Xmega Great Again!

 


ki0bk wrote:

Looking at some of your while()'s in main(), you have code that is enabled by run = 1 and that at one point sets run = 0; but nowhere in the code, other than in the init, is run returned to a value of 1.

 

_delay_ms(1000);
run = 0;
_delay_ms(150);
stop = 1;
_delay_ms(150);

So when this happens, it looks like you stop scanning your button inputs.

Resetting the chip would restore operational scanning.

 

Jim

 

That's for when I reset the unit: setting run to 0 stops the main while loop and takes us back to the top of the code, where run is set to 1 again. Maybe I misunderstood you? Or maybe this is bad practice?

 



	/*---------------CLOCKSETUP---------------*/
		
	int run = 1;
	int hour = 20;			

Naccache wrote:
setting run to 0 stops the main while loop and takes us back to the top of the code where run is set to 1 again

No, it does not return to the top. It falls off the end of main(), where GCC will turn off interrupts and enter a while(1) endless loop of its own!

Thus it hangs, and you have to reset it or power cycle it.
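
One way to avoid falling off the end of main() is to re-initialize the state in place instead of leaving the loop. A minimal sketch, assuming hypothetical names (clicker_state_t, clicker_init(), clicker_step()) that are not in the posted code; only the start value hour = 20 comes from the posted clock setup.

```c
/* Hypothetical state bundle; the posted code uses separate locals in main(). */
typedef struct {
    int count;   /* people counted so far */
    int hour;    /* clock start value, 20 in the posted code */
} clicker_state_t;

void clicker_init(clicker_state_t *s)
{
    s->count = 0;
    s->hour = 20;
}

/* One pass of the main loop. A reset request re-initializes the state
 * in place instead of ending the loop. */
void clicker_step(clicker_state_t *s, int add_pressed, int reset_pressed)
{
    if (reset_pressed) {
        clicker_init(s);
        return;
    }
    if (add_pressed) {
        s->count++;
    }
}

/* On the target, main() would then never return:
 *
 *     clicker_state_t s;
 *     clicker_init(&s);
 *     for (;;) {
 *         clicker_step(&s, add_button(), reset_button());
 *     }
 *
 * add_button() and reset_button() are placeholders for the button reads.
 */
```

With this shape there is no run flag that can end the loop, so the freeze described above cannot be triggered by the reset path.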

 

Jim

 

Naccache wrote:
For the other part of your answer, would it be smart to use cli() before the for loop and sei() after?

For best stability, all variables affected by the interrupt should be volatile. To use these variables safely in the main function, you would have to declare a local copy of the variables (they should also be volatile so that they are not optimized away). Then you should copy the global variables to the local variables while interrupts are disabled (between cli() and sei()). Any other implementation risks failing occasionally. This may be very rare, but it can happen; for example, in your application you may miss an interrupt if you block for the complete for loop, because itoa and puts take some time.

 

You can keep this simpler if you expand your testing (e.g. if you can guarantee never to miss an interrupt, you can also wrap the complete loop in cli() and sei()).
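
The copy-with-interrupts-disabled idea can be sketched roughly as follows. This is only an illustration under assumptions: read_log_entry() is a hypothetical helper, it copies one entry at a time (to keep RAM use and the interrupts-off window small) rather than the whole arrays, and it assumes interrupts were enabled when it is called. The #else branch exists only so the sketch also compiles off-target.

```c
#ifdef __AVR__
#include <avr/interrupt.h>
#else
/* Host-side stand-ins so the sketch also compiles off-target. */
#define cli() ((void)0)
#define sei() ((void)0)
#endif

#define LOG_SIZE 200

/* Shared with the interrupt, hence volatile. */
volatile int arraycounter = 0;
volatile int arrayin[LOG_SIZE];
volatile int arraysumma[LOG_SIZE];

/* Hypothetical helper: copy one log entry while interrupts are off.
 * Returns 1 and fills *in and *summa if the index is valid, else 0.
 * Assumes interrupts were enabled on entry, since it ends with sei(). */
int read_log_entry(int index, int *in, int *summa)
{
    int ok = 0;

    cli();
    if (index >= 0 && index < arraycounter && index < LOG_SIZE) {
        *in = arrayin[index];
        *summa = arraysumma[index];
        ok = 1;
    }
    sei();

    return ok;
}

/* The read-out loop could then become:
 *
 *     for (c = 0; read_log_entry(c, &in, &summa); c++) {
 *         itoa(in, btintext, 10);
 *         itoa(summa, btsummatext, 10);
 *         ...uart_puts() calls as before...
 *     }
 */
```

The slow itoa() and uart_puts() calls then run with interrupts enabled, so only the short copy is protected.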

 

Hope this helps.

 

Flo1991

 


Nice project!

 

I'll let others comment on the code, as I don't read C.

I presume, however, code-wise, that you corrected all of the "volatile" issues first mentioned in comment #2.

You mentioned you are using an HM-10.  The data sheet I have mentions that its operating temperature range is -5 °C to 65 °C.

 

You mentioned that the temperature could be -15 °C, so it is not surprising to me that the BT link fails sometimes.

 

Can you put the device in your pocket to warm it up now and then, or is it always exposed to the outside temperature?

 

I assume your code properly de-bounces the push button switches, and you don't have a stack overflow problem from multiple bounces.
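
For reference, a counter-based debounce can be sketched like this. It is not taken from the posted code: debounce_t, debounce_update() and the DEBOUNCE_TICKS value are all assumptions, and the function is meant to be called at a fixed rate (for example every few milliseconds) with the raw pin state.

```c
#include <stdint.h>

#define DEBOUNCE_TICKS 4   /* assumption: consecutive samples needed to accept a change */

typedef struct {
    uint8_t stable;   /* last accepted state: 1 = pressed, 0 = released */
    uint8_t count;    /* consecutive samples that disagree with 'stable' */
} debounce_t;

/* Feed one raw sample. Returns 1 exactly once per accepted press, else 0. */
int debounce_update(debounce_t *d, uint8_t raw_pressed)
{
    raw_pressed = raw_pressed ? 1 : 0;

    if (raw_pressed == d->stable) {
        d->count = 0;             /* no change pending, or a bounce died out */
        return 0;
    }
    if (++d->count < DEBOUNCE_TICKS) {
        return 0;                 /* change not yet confirmed */
    }
    d->stable = raw_pressed;      /* change confirmed */
    d->count = 0;
    return raw_pressed;           /* report press edges only */
}
```

Because a press is reported only once per confirmed change, contact bounce cannot register as several clicks.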

 

Back on operating temperature again, you will have to check the data sheet for the exact M328 you have to see its in-spec temperature range, particularly as you are relying on its internal RC clock oscillator.

 

If you had a spare pin and had routed it to some spare pads, (always a good idea, I think), you could have a small LED flash from within your Main Loop.  Then when the device appears to lock up you would have an idea if the uC was still "running", or if the clock had stopped.
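
A heartbeat like that could look roughly like the sketch below. The pin choice (PB0) and the divider are assumptions, as are the function names; the #else branch only provides stand-in variables so the sketch can be compiled off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#define HEARTBEAT_DDR  DDRB
#define HEARTBEAT_PORT PORTB
#else
/* Host-side stand-ins so the sketch compiles off-target. */
static volatile uint8_t fake_ddr, fake_port;
#define HEARTBEAT_DDR  fake_ddr
#define HEARTBEAT_PORT fake_port
#endif

#define HEARTBEAT_BIT     0      /* assumption: PB0 is the spare pin */
#define HEARTBEAT_DIVIDER 1000u  /* main-loop passes per toggle, tune to taste */

/* Configure the spare pin as an output. */
void heartbeat_init(void)
{
    HEARTBEAT_DDR |= (uint8_t)(1u << HEARTBEAT_BIT);
}

/* Call once per pass of the main loop; toggles the LED every
 * HEARTBEAT_DIVIDER passes. */
void heartbeat_tick(void)
{
    static uint16_t passes = 0;

    if (++passes >= HEARTBEAT_DIVIDER) {
        passes = 0;
        HEARTBEAT_PORT ^= (uint8_t)(1u << HEARTBEAT_BIT);
    }
}
```

If the LED stops blinking during a freeze, the main loop is no longer being executed; if it keeps blinking, the loop is alive and the problem is more likely in the button handling.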

 

How many PCBs do you have?  Can you build one without the socket for the uC?  It may or may not be an issue, but clearly your device is moved about and undergoes thermal cycling, and perhaps one of the connections is not as solid as one would like it to be.

 

The fact that the LCD still shows some data "but the buttons do nothing" just tells you that the uC isn't running your code.  The LCD will continue to display the last data shown until some new data is correctly clocked into the display.

 

Touching pins on the micro to jump-start it again is not an approved technique!  Are you shorting pins with the screwdriver?  (That might cause a Vcc drop and a power on reset when the screwdriver is removed.  It can also damage the uC and/or other components!)

 

"The unit reboots by itself".  There were a couple of old Threads, which I can't find, that discussed this and its numerous causes.  I always think of a software induced stack overflow first, but there are actually several reasons errant code can reboot. 
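
One way to narrow down the reboot causes is to look at the reset flags the ATmega328P keeps in its MCUSR register. A minimal sketch, assuming a hypothetical helper reset_cause() that takes the register value as an argument so it can also be exercised off-target; the bit positions are the ones listed in the ATmega328P datasheet.

```c
#include <stdint.h>

/* MCUSR flag positions on the ATmega328P. */
#define RESET_POWER_ON  (1u << 0)  /* PORF  */
#define RESET_EXTERNAL  (1u << 1)  /* EXTRF */
#define RESET_BROWN_OUT (1u << 2)  /* BORF  */
#define RESET_WATCHDOG  (1u << 3)  /* WDRF  */

/* Hypothetical helper: turn a saved MCUSR value into a short label.
 * Power-on is checked first because a power-up can also set the
 * brown-out flag. */
const char *reset_cause(uint8_t mcusr)
{
    if (mcusr & RESET_POWER_ON)  return "power-on";
    if (mcusr & RESET_BROWN_OUT) return "brown-out";
    if (mcusr & RESET_WATCHDOG)  return "watchdog";
    if (mcusr & RESET_EXTERNAL)  return "external";
    return "none";   /* no hardware reset flagged */
}

/* On the target, early in main():
 *
 *     uint8_t cause = MCUSR;   // read the flags
 *     MCUSR = 0;               // clear them for the next reset
 *     ...show reset_cause(cause) on the LCD or send it over the UART...
 */
```

A result of "none" after an apparent reboot would suggest the code restarted without a hardware reset, which points towards errant software rather than the power supply.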

 

Assuming you fixed the "volatile" issue, you are still stuck sorting out whether the next problem is hardware or software based.  How many boards do you have?  One might use another micro, perhaps a $2 USD Arduino Nano clone, and a couple of NFets across the switches, and have the external micro simulate an evening's work, firing the push button switches for you.

 

You can run this test setup on the bench, where the temperature is both warm and stable, and depending upon how complex you wish to make the setup you can see if it works properly forever, or fails.  You might need to (slightly) modify your code so that it toggles a spare pin high and low on a system reset, so that the external micro can count how many times the system resets (1 being the expected number, multiple resets being a problem).

 

You could also, if you have the spare pins, toggle a pin whenever the count increases or decreases.  Again, the point being you could see if the external tester maintained the same count as the device under test, or not.

 

If you have another spare pin, the Main Loop could toggle it now and then, and the external tester could see if the micro's clock stopped, or if it is running errant code (still toggling, but not at the expected rate).

 

Done making some tests in the nice, warm, bench top environment?

 

Everything working correctly?

 

Then test the device in the cold and see if it repeatedly fails.

 

Clearly, sorting out HW vs SW as a source of the problem is one of the first steps in tracking this bug(s) down.  And if the problem appears to be a HW problem, then sorting out if it is a temperature issue or not would be an important step.

 

Finally, your board looks great.  Did you hand assemble it?  I like a good challenge, but I can't begin to solder such small components!

 

JC