off in the weeds


I can't figure out what's causing my program to hang. It seems to happen about 30 seconds into execution. Not that that helps much, but it means things are clipping along fine, then everything stops. I have the watchdog turned off while I try to sort things out. It doesn't reset, it just hangs. I read and reset MCUCSR every time, but it just shows a power-on reset. I threw in the catch-all ISR, just in case, with a counter.

ISR(BADISR_vect){
if(++bad_count > 0x39){bad_count = 0x30;}
}

I spit the bad_count out along with some other stuff in an idle time slice.

I added bounds checking to everything that indexes an array.

Anything global that is accessed inside an ISR is declared 'volatile'.

Memory use looks OK:

Size after:
AVR Memory Usage
----------------
Device: atmega8

Program: 1998 bytes (24.4% Full)
(.text + .data + .bootloader)

Data: 211 bytes (20.6% Full)
(.data + .bss + .noinit)

EEPROM: 404 bytes (78.9% Full)
(.eeprom)

What else do I look for? Someone tell me what I am overlooking!


You know what? My crystal ball is occupied full time at the moment trying to work out what my wife wants ... so I guess you will just have to show us your code and schematic if you expect any sort of sensible guess ...

Ross McKenzie ValuSoft Melbourne Australia


Yep, I know what you mean, but I didn't post the code because it's really long, spread over multiple files, and none of it does anything fancy. I was hoping someone could suggest things to check that I did not have listed. I've got the interrupts covered, even a catch-all, I bounds-check arrays, and I declare globals accessed in both main and ISRs as volatile.


Well can you comment out a section at a time to try to isolate where it hangs? Do you have a spare serial port that you could use to output a loop counter or the condition of a suspect variable(s)? Did it ever work correctly in an earlier incarnation? What did you add or change since that version? (sound of white cane tapping at my place)

Ross McKenzie ValuSoft Melbourne Australia


Yep, I use the idle time in my scheduler to output the status of variables on the serial port. Everything did work correctly in a previous incarnation; what changed was the addition of an FTDI serial-to-USB chip. The USB traces are poorly laid out, and my original thought was garbage on the serial line, but I have not seen any yet. That's all that changed. Without a debugger, I guess you're right: comment out code and spit things out on the serial port.


In your idle time slice, also print the SP regularly to see whether the stack keeps growing (a leak of stack memory).

Uninitialised entries in table(s) of pointers to functions ?

Using a 'library' routine where the function prototype has the wrong number of parameters ?

Not having a timeout when waiting for something to occur, e.g. an input to go high or a device to do something.
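
On the timeout point, a minimal sketch of a bounded wait. The predicate type, the helper name and the poll budget are all hypothetical and not taken from the poster's code; on a real target the predicate would test a pin or a status flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate type: returns true once the awaited event has
 * happened (e.g. a pin went high or a status flag was set). */
typedef bool (*ready_fn)(void);

/* Poll ready() at most max_polls times instead of spinning forever.
 * Returns true if the event arrived, false if the budget ran out. */
bool wait_with_timeout(ready_fn ready, uint16_t max_polls)
{
    while (max_polls-- > 0) {
        if (ready()) {
            return true;
        }
    }
    return false; /* caller decides how to recover or report */
}

/* Two stand-in predicates for trying the helper off-target. */
bool never_ready(void) { return false; }

uint8_t calls;
bool ready_on_third_call(void) { return ++calls >= 3; }
```

A caller would replace something like `while (!ready()) {}` with `if (!wait_with_timeout(ready, 50000)) { /* report and recover */ }`, so a device that never answers shows up as an error message rather than a hang.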

Yeah, my crystal ball is also somewhat blurry, and the six-pack of beer hasn't helped :)


Ah, the six-pack, that's what I'm missing. See, your crystal ball was spot on. I'm gonna drink a beer and take another crack at it.

Also, I am not outputting the stack pointer. I'll add that too.

Quote:

I didn't post the code because it's really long,

Post main and any ISRs you are using.
Use the Code button to make the code look like code.

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?


Unless your code is very time critical, you could assign a number to every section and store it in a variable on entry.
If that variable always contains the same number when the program crashes, you can narrow down your search by numbering subsections.
This strategy has saved me a couple of times when I was clueless about where to look.
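
A minimal sketch of that numbering idea. The section functions and the numbers are hypothetical stand-ins, not the poster's code.

```c
#include <stdint.h>

/* Last checkpoint reached. volatile so the store is never optimized away
 * and the status printer always sees the current value. */
volatile uint8_t checkpoint;

#define CHECKPOINT(n) (checkpoint = (uint8_t)(n))

/* Hypothetical work functions standing in for real program sections. */
void read_inputs(void)   { CHECKPOINT(1); /* ... section body ... */ }
void process(void)       { CHECKPOINT(2); /* ... section body ... */ }
void write_outputs(void) { CHECKPOINT(3); /* ... section body ... */ }

void main_loop_once(void)
{
    read_inputs();
    process();
    write_outputs();
    CHECKPOINT(0); /* 0 = back in the idle part of the loop */
}
```

If the hang is total and nothing can be printed afterwards, one option (an assumption about the toolchain, avr-gcc here) is to place the variable in the `.noinit` section so it survives a watchdog reset and can be printed at the next boot. Mirroring it onto spare port pins is another.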


You used the word scheduler. Better show us that. Somewhere you are loading a 2-byte variable without turning off interrupts, and after the interrupt has messed up the read right in the middle, you are trying to run something 0xFFFF times by accident.
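
A minimal sketch of reading a 2-byte variable safely on an 8-bit AVR, assuming avr-gcc and avr-libc. The variable name is hypothetical, and the `#else` branch is only a do-nothing stand-in so the sketch also compiles off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
#define IRQ_STATE()     SREG
#define IRQ_DISABLE()   cli()
#define IRQ_RESTORE(s)  (SREG = (s))
#else
/* Host stand-ins so the sketch compiles off-target; they do nothing. */
#define IRQ_STATE()     0
#define IRQ_DISABLE()   ((void)0)
#define IRQ_RESTORE(s)  ((void)(s))
#endif

/* Hypothetical 16-bit value written by an ISR (e.g. a tick counter). */
volatile uint16_t shared_ticks;

/* Copy the two bytes with interrupts masked so an ISR cannot update the
 * variable between the low-byte and high-byte loads. */
uint16_t read_shared_ticks(void)
{
    uint8_t saved = IRQ_STATE(); /* remember whether interrupts were on */
    IRQ_DISABLE();
    uint16_t copy = shared_ticks;
    IRQ_RESTORE(saved);          /* re-enable only if they were enabled */
    return copy;
}
```

avr-libc also offers `ATOMIC_BLOCK(ATOMIC_RESTORESTATE)` in `<util/atomic.h>`, which wraps the same save, disable and restore sequence.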

Imagecraft compiler user


If "scheduler" means what it usually does, then I suppose you are doing your own "context switches", i.e. messing around with the stack and registers etc. in a timer ISR. If you don't get your PUSHes and POPs right, or have interrupts enabled while switching context, or ..., then you will get into the FUBAR state.

Is this a scheduler that you've written yourself, or is it one of the ubiquitous RTOSes for AVRs? If the latter, which one?

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]


Johan's point is well taken, and close to what I was going to suggest without thinking "scheduler". Stack overflow is a very common reason for "hang after operating a while". With mismatched pushes and pops, you get return addresses (from both ISRs and regular functions) mixed up with data, and you get a return to a place that the call didn't come from.

The other thing that happens is that the stack continues to grow into data space. Then you have stack over-writing data and data over-writing stack. Consequences are similar or worse.

It will take some major digging to discover where it is. But, you have little choice but to do it.

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net


It is easy to write a little stack checker. Declare a var at the end of the global vars in RAM, i.e. at the end of the BlockStorageSection. Init it to 0x55 at the top of main. In the main loop, check to see if the 0x55 is still there. If it's not, the stack has grown down over it into your vars.

Imagecraft compiler user


Hey, thanks for all of the replies. I sort of misled you guys with my use of the term scheduler. I have a "scheduler" that handles trivial stuff once the critical stuff has been handled. It's more of a state machine than a scheduler. However, the trivial stuff is scheduled by timer0. There is a long time after the critical areas to perform trivial stuff. When the critical stuff begins, we enter:

SIGNAL(SIG_INTERRUPT0)

Inside SIG_INTERRUPT0, TCNT1 is set to 0, and OCR1A and OCR1B are loaded based on a value pulled from a LUT. The value in OCR1B is always 50 greater than the value in OCR1A. After the OCR1B ISR, I set a flag indicating the critical area is over, and the timer0-based scheduler goes back to scheduling non-critical stuff. Here is what I think might be killing the stack. I have a global:

int16_t MAP[101]

I copy a LUT from EEPROM into MAP. MAP is read in SIGNAL(SIG_INTERRUPT0) to load the OCR1A and OCR1B values. I assume that is eating up a lot of SRAM? I tried to fill the SRAM with known values and then check whether they were overwritten, but I never could get that to work. I also tried using the stackmon code found in another thread. No luck with that yet either.


To be more specific on the stackmon: I added stackmon.c and stackmon.h, and everything builds, but when I call StackPaint in main, everything freezes. I call it first thing; is that correct? Does anyone have some experience with the stackmon code? I would love to get it working and have it as a new tool. Or any stack checker routine. I'm out of my lane in this area, so a specific example with code would really help.


Quote:

when I call StackPaint in main, everything freezes.

Sounds like it is misconfigured, and that it actually paints over something significant. I have no experience with stackmon in particular so this is a general sweeping speculation.
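
To illustrate what "painting over something significant" could mean, here is a generic stack-painting sketch. It is not the stackmon code, and the function names and fill pattern are made up for illustration. The important assumption is the painted range: it should cover only the free gap between the end of static data and the live stack, because painting at or above the current stack pointer overwrites the painter's own return address.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xC5u /* arbitrary fill pattern */

/* Fill the free gap [lo, hi) between the end of static data and the live
 * stack. hi must stay BELOW the current stack pointer, otherwise the
 * painter overwrites its own return address. */
void stack_paint(volatile uint8_t *lo, volatile uint8_t *hi)
{
    while (lo < hi) {
        *lo++ = STACK_PAINT;
    }
}

/* The stack grows downward from hi toward lo, so the bytes still holding
 * the pattern, counted upward from lo, are the ones never touched. */
size_t stack_unused(const volatile uint8_t *lo, const volatile uint8_t *hi)
{
    size_t n = 0;
    while (lo < hi && *lo == STACK_PAINT) {
        ++lo;
        ++n;
    }
    return n;
}
```

On a target, `lo` might be the address of a linker symbol marking the end of static data and `hi` the current SP minus a safety margin. Both choices are assumptions that depend on the toolchain and memory layout. Printing `stack_unused()` in the idle slice then shows how close the stack has come to the variables.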



How about my suggestion? Put a 0x55 at the highest address in the bss and check it once in a while?

Imagecraft compiler user


Because I don't know how. Shed some light.

char cat;
char rat;
int dog;
int pig;
char lastvar;
.
.
.
void main(void){
  //call init subs here
  lastvar=0x55;
  while(1){
    //call inputs
    //call process
    //call outputs
    if(lastvar != 0x55){
      printf("stack overflowed boss!");
      while(1){}; //hang up
    }
  }
}

Imagecraft compiler user


Haha! Wow, I was cooking up something waaaay more complicated that was introducing more questions than answers. Thanks for the reality check. I'll give it a whirl and post back.


Bob: How will that work if an optimizer decides to rearrange the variables' order of allocation as compared with the source code, e.g. because of some hardware-specific alignment constraints?



I appeal to some knowledgable gcc users to tell someguy how to declare a variable that will be schtuck at the end of bss. In the imagecraft compiler, the var at the highest address in the map is the first one in the source file. The gnu dudes might do it different. Put firstvar at the beginning and lastvar at the end and see which one has the highest address in the map?

Imagecraft compiler user


bobgardner wrote:
I appeal to some knowledgable gcc users to tell someguy how to declare a variable that will be schtuck at the end of bss.
Simply use the first byte after BSS for that check by using the symbol __bss_end provided by the linker.

extern uint8_t __bss_end;

__bss_end = 0x55;
...
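
A slightly fuller sketch of the same check, assuming avr-gcc. The helper names are hypothetical, and the `#else` branch is only a host stand-in so the sketch compiles off-target. It also assumes nothing lives in `.noinit` and malloc is not used; otherwise the byte at `__bss_end` is not free to use as a canary.

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef __AVR__
extern uint8_t __bss_end;          /* provided by the avr-gcc linker script */
#define CANARY (*(volatile uint8_t *)&__bss_end)
#else
uint8_t host_canary;               /* host stand-in, illustration only */
#define CANARY (*(volatile uint8_t *)&host_canary)
#endif

#define CANARY_VALUE 0x55u

/* Call once at the top of main(), before the main loop starts. */
void canary_arm(void)
{
    CANARY = CANARY_VALUE;
}

/* Call regularly from the main loop; false means something (most likely
 * the stack) has written over the first byte after .bss. */
bool canary_intact(void)
{
    return CANARY == CANARY_VALUE;
}
```

Because the address comes from the linker rather than from declaration order, this does not depend on how the compiler arranges the variables.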

Stefan Ernst


I seem to have something wrong with my timer1 compare...

void TIMER0_init(){
	//	Initialize the 8-bit Timer0 to clock at 10.8 kHz @ XTAL of 11.0592 MHz
	TCCR0 = 0x05; 							// Prescale Timer0 @ 1024
	TIMSK |= 1<<TOIE0;						// Enable Timer0 interrupt
}

void TIMER1_init(){
	//	Initialize the 16-bit Timer1 to clock at 172.8 kHz @ XTAL of 11.0592 MHz
	TCCR1A = 0x00;							// Normal port operation, no PWM
	TCCR1B = 0x03;							// Prescale Timer1 @ 64
	TIFR&=~(1<<TOV1);						// Clear overflow flag
	TIMSK = (1<<OCIE1A) | (1<<TOIE1); 		// Enable Timer1 compare A interrupt
											// and Timer1 overflow interrupt
	//TIMSK|=(1<<OCIE1B); 				   	// Enable timer1B compare
	//TIMSK|=(1<<TOIE1); 				   	// start timer
}

Then the ISR is:

ISR(TIMER1_COMPA_vect)
{

	if (!first_cycle){

	PORTD |= _BV(Q1_BASE);	//  HIGH
	PORTD |= _BV(TRIGGER);	//  HIGH
	}
		
}

The variable first_cycle is defined as :

volatile unsigned char first_cycle;

and I use it as a flag to signal the first cycle of a sequence. If I take out the enable for the timer1 compare A interrupt, the code runs fine; if I enable it, the watchdog keeps resetting.


this_guy wrote:
TIMSK = (1<<OCIE1A) | (1<<TOIE1);
And where is your ISR(TIMER1_OVF_vect)?

ISR(TIMER0_OVF_vect)
{
	// Interrupt after 108 counts at 100 Hz, so preload 256 - 108, or 148
	// (10.8 kHz / 100) = 108
	TCNT0 = 148;
	Idle = FALSE;			// Clear the idle bit flag
}		// SIGNAL(SIG_OVERFLOW0)

ISR(TIMER1_OVF_vect)
{
	TCNT1 = 0x00;
	first_cycle = 1;
	PORTD &= ~(_BV(TRIGGER));		//  LOW 
	PORTD &= ~(_BV(Q1_BASE));		//	LOW
}		// SIGNAL(SIG_OVERFLOW1)

Do you set OCR1A anywhere?


Yep, I set OCR1A inside the ISR for ext int0. Normally I pull the OCR1A value from a LUT, but I have it hardcoded right now to rule out problems with that.


And what triggers "ext int0"? Are you sure that it really gets triggered (and thus OCR1A gets set) before you start getting TIMER1 compare match interrupts?


Yeah, it's really getting triggered. I have serial output in every function now to see where I'm at, and everything is occurring in the right sequence.


OK, let's start all over. You say that if you comment out the line:

TIMSK = (1<<OCIE1A) | (1<<TOIE1);

Then everything works. Right?
- Have you tried enabling just one of the two interrupts (i.e., not both OCIE1A and TOIE1)?
- When you enable it, does the program work for 30 seconds, or does it reset right away?
- Have you tried commenting out everything from the interrupt routines?
- Have you tried putting a non zero OCR1A value in the TIMER1_init function?


I have commented out everything in the compare ISR. It's like this: with the watchdog enabled, if I enable OCIE1A then it resets over and over; if I disable OCIE1A, there is no watchdog reset. If I disable the watchdog, I can enable OCIE1A and it'll run for a while, then freeze.


Have you tried setting OCR1A before enabling OCIE1A? Something like "OCR1A = 1000", before the TIMSK.


Nope, I'll give that a try. Seems like that shouldn't matter. Not arguing, just sayin'.


If that does not help, and you really believe that it is OCIE1A causing this, then you should be able to come up with a complete short example that exhibits the problem and post the source code.


True. I'll work on it some more. Sorry my replies are so short; when I type in the 'reply' box the cursor keeps jumping around. Driving me nuts. Just one of those days, I guess.


Update. I got things playing nice.

Originally, I would enable OCIE1A and OCIE1B at startup, and use a flag inside their ISRs to determine if action was required. So the sequence went EXT_INT0_ISR ---> TIMER1_COMPA_ISR ---> TIMER1_COMPB_ISR, with the values for the timer1 compares being loaded inside EXT_INT0_ISR. Now I start with the timer1 compares disabled, enable them inside EXT_INT0_ISR, and then disable them again at the end of their own ISRs.
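
A rough sketch of that arrangement, for reference. The function names are hypothetical, the `+ 50` offset is the one described earlier in the thread, and the `#else` branch only provides host stand-ins (with the ATmega8 bit positions) so the sketch compiles off-target. This is not the actual code.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#else
/* Host stand-ins with the ATmega8 bit positions, illustration only. */
uint8_t TIMSK, TIFR;
uint16_t TCNT1, OCR1A, OCR1B;
enum { TOIE0 = 0, OCIE1B = 3, OCIE1A = 4, OCF1B = 3, OCF1A = 4 };
#endif

/* Called from the external-interrupt handler: load the compare values,
 * drop any stale compare flags, then unmask the two compare interrupts. */
void compares_arm(uint16_t delay)
{
    TCNT1 = 0;
    OCR1A = delay;
    OCR1B = (uint16_t)(delay + 50);          /* B fires 50 counts after A */
    TIFR  = (1 << OCF1A) | (1 << OCF1B);     /* flags clear by writing 1 */
    TIMSK |= (1 << OCIE1A) | (1 << OCIE1B);  /* |= keeps TOIE0 etc. set */
}

/* Called at the end of the compare B handler: mask both again. */
void compares_disarm(void)
{
    TIMSK &= (uint8_t)~((1 << OCIE1A) | (1 << OCIE1B));
}
```

Two details in the init code posted earlier may be worth a look, though neither is confirmed as the cause. `TIMSK = (1<<OCIE1A) | (1<<TOIE1);` replaces the whole register, so it would clear TOIE0 if TIMER0_init() runs first. And on the ATmega8, timer flags are cleared by writing a one to them, so `TIFR&=~(1<<TOV1);` leaves TOV1 alone and instead clears any other flag that happens to be set.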

So, everything seems to work. But I still don't understand or see why my original code would fail so horribly. I really need to up my game on these AVRs. I've been using them and abusing them for a long time now, and they always seemed to work, or I could sort out problems easily enough. This time I was really at a loss. I'm going to start digging around for debugger tools, reading related threads on here, etc. If someone wants to voice their opinion on the debugger subject, feel free.

Also, if I can figure out why having the compares enabled all the time was causing problems, I will post back. I would still like to know myself.

Thanks again for all of the replies.


FYI the AVR ONE is currently on sale from Atmel for $199+ shipping.