Help from PRO’s wanted: Interrupt causes stack crash?


Runs on: 1284p-16MHz
Memory used: 6%
Program memory: 15%
I'm struggling to find the reason why my program suddenly, sometimes after hours, stops working and resets.
I finally saw something that I don't understand:
I execute the following code every 5 seconds:

uint8_t getGpsData()
{

char caReceiveBuf[STRINGBUFFERSIZE]="";
	
if (sendFIXATcmdWithRespons(caATCGPSSTATUS, caFIX, 1, 4, NORMAL) == 1)
{ // GPS has a fix
    ATOMIC_BLOCK(ATOMIC_FORCEON)
    {
        gpsled = 2; // show GPS led fix
    }
    if (getGpsInfo(caReceiveBuf))
    { // GPS position is in GPS receive array
        if ((decodeGPS(caReceiveBuf) == 1) && (lLatitude > 0) && (lLongitude > 0))
        {
            return 1;
        }
    }
}
else
{ // GPS has no fix
    ATOMIC_BLOCK(ATOMIC_FORCEON)
    {
        gpsled = 1; // show GPS led no fix
    }
}
    return 0;
}

Inside the function getGpsInfo, before returning 1, I do some extra checks to make sure that caReceiveBuf is correct, and I print it on a serial output for debugging:

uint8_t getGpsInfo(char *receive)
{
    char caReceiveBuf[STRINGBUFFERSIZE]="";
…
// complete cleaned-out string is in array
	if ((pos > 0) && (pos < STRINGBUFFERSIZE-1))
	{
		caReceiveBuf[pos] = '\0';
		// print the received string for debugging purposes
		debugPutc('>');
		debugPrintStr(caReceiveBuf);
		debugPrintStrln("<");
		p = strstr_P(caReceiveBuf, caOK);
		if (p == NULL)
		{ // OK not found
			debugPrintStrln("gpslnok");
			return 0;
		}
		else
		{ // filter-out OK
			*p = '\0';
		}
		len = strlen(caReceiveBuf);
		// surely no GPS INFO if length < MINLENGTHGPSINFO
		if (len < MINLENGTHGPSINFO)
		{
			debugPrintStrln("gpslen");
			return 0;
		}
		// could be GPS INFO -> check further
		// extra checks to make sure
		if ((caReceiveBuf[1] != ',') || (caReceiveBuf[12] != ',') || (caReceiveBuf[24] != ','))
		{
			debugPrintStrln("gpscomma");
			return 0;
		}
		// looks ok:  01234567890123456789012345678901234567890123456789012345678901234567890123456
		// 0,340.123455,5005.123456,53.312218,20130411174152.000,0,0,0.000000,0.000000OK
		strcpy(receive, caReceiveBuf);
		return 1;
	}

So via debugPrintStr I saw that the string was still OK, and the length was also checked and OK.
However, the call to decodeGPS(caReceiveBuf) that comes DIRECTLY after returning from getGpsInfo behaved strangely:
The function starts with:

uint8_t decodeGPS(char *caReceiveBuf)
{
	// returns following values:
	// 0 no valid GPS data
	// 1 valid GPS data
	//
	// we are getting everything out of CGPSINF=0 data (these are 9 fields)
	// 0   Message ID         0              header
	// 1 * Longitude          123.123456          123.123456
	// 2 * Latitude           5105.123456         1234.123456
	// 3   Altitude           1234.123456         1234.123456
	// 4 * UTC Time           20130313193055.000  yyyyMMddHHmmSS.sss
	// 5   TTFF               23                  xxxx
	// 6 * Number of sats     12                  xx
	// 7 * Speed              123.123456          yyy.xxxxxx
	// 8   Course             320.123456          yyy.xxxxxx
	// looks ok:  
	//            0,340.123455,5005.123456,53.312218,20130411174152.000,0,0,0.000000,0.000000

	int iVeld = 0;
	char *pch=NULL;
	char *pch2=NULL;
	uint16_t len = 0;
	char *p;

	len = strlen(caReceiveBuf);
	if (len < MINLENGTHGPSINFO)
	{
		debugPrintStrln("gpsdecodeminlen");
		return 0;
	}

Via my terminal window I could see that the length was not OK. Why? It was OK just before.
There are no other calls in between these 2 functions that could corrupt the character array!
The terminal showed 'gpsdecodeminle' without the last 'n' and then the CPU reset.
The question is: how can the length now be wrong?
What happened to the variable in the meantime?
I can only suspect that it has to do with a 100 ms timer interrupt that causes the stack to get corrupted…
That interrupt, however, does nothing special: just some LED indications and generating some pulses.

Can somebody PLEASE explain to me why this happened?


Hi Korstiaan

Although I have no concrete idea of what is happening, the problem "smells" of stack growing downwards and crashing into any static data that you might have.

If you have code that runs every 5 seconds, might I propose that you print out the contents of the stack pointer at each call time? As the variable "len" is on the stack, printing out its address would be close enough.

As a specific point, though, I am puzzled by this code fragment.

uint8_t getGpsInfo(char *receive)
{
    char caReceiveBuf[STRINGBUFFERSIZE]="";
…
// complete cleaned-out string is in array
   if ((pos > 0) && (pos < STRINGBUFFERSIZE-1))
   {
      caReceiveBuf[pos] = '\0';
      // print the received string for debugging purposes
 

What do you expect this to do?

/A

If we are not supposed to eat animals, why are they made out of meat?


Hi,

Thanks for looking and helping!

Sorry, I left out the first part because I thought it had nothing to do with the problem.
In fact the complete code of this function is:

uint8_t getGpsInfo(char *receive)
{
	char caSendBuf[STRINGBUFFERSIZE]="";
	char caReceiveBuf[STRINGBUFFERSIZE]="";
	
	uint16_t timeout = RESPONSETIMEOUT;
	uint16_t x=0;
	uint8_t nrOfCRLF=0;
	uint16_t len = 0;
	uint8_t pos =0;
	char *p = NULL;
	char leeschar;
	uint8_t retry = 0;

	// clear whatever is waiting before sending a command
	while (uartGsmCharWaiting())
	{
		uartGsmGetChar();
	}
	// send command and wait to see that characters are coming
	while (retry < 3)
	{
		// send command to SIM908 module
		strcpy_P(caSendBuf, caATCGPSINF0);
		uartGsmPutString(caSendBuf);
		// output for debugging purposes
		debugPutc('*');
		debugPrintStr(caSendBuf);
		debugPutc('*');
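		for(x=0; x < timeout; x++) // loop header garbled in the post; bound assumed to be timeout
		{
			if (uartGsmCharWaiting() > 0)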
			{
				retry = 99;
				break;
			}
			_delay_ms(5);
		}
		retry++;
	}
	if (retry == 3)
	{ // no response after x-retries -> timeout, go back
		debugPrintStrln("*TO*");
		return 0;
	}
	retry = 0;
	// there is data waiting, read complete
	while (retry < DELAYWHILERECEIVING)
	{ // time from start of receiving to get to the end...
		while (uartGsmCharWaiting() > 0)
		{ // there is data waiting, read complete
			leeschar = uartGsmGetChar();
			if (leeschar == 0x0A)
			{
				nrOfCRLF++;
			}
			else if ((pos < STRINGBUFFERSIZE-2) && (leeschar != 0x0D) && (leeschar != 0x0A) && (leeschar != ' ') && (leeschar != '/') && (leeschar != ':'))
			{
				caReceiveBuf[pos] = leeschar;
				pos++;
			}
		}
		_delay_ms(1);
		retry++;
	}
	// GPS = 3x CRLF
	if (nrOfCRLF != 3)
	{
		return 0;
	}
	// complete cleaned-out string is in array
	if ((pos > 0) && (pos < STRINGBUFFERSIZE-1))
	{
		caReceiveBuf[pos] = '\0';
		// print the received string for debugging purposes
		debugPutc('>');
		debugPrintStr(caReceiveBuf);
		debugPrintStrln("<");
		p = strstr_P(caReceiveBuf, caOK);
		if (p == NULL)
		{ // OK not found
			debugPrintStrln("gpslnok");
			return 0;
		}
		else
		{ // filter-out OK
			*p = '\0';
		}
		len = strlen(caReceiveBuf);
		// surely no GPS INFO if length < MINLENGTHGPSINFO
		if (len < MINLENGTHGPSINFO)
		{
			debugPrintStrln("gpslen");
			return 0;
		}
		// could be GPS INFO -> check further
		// extra checks to make sure
		if ((caReceiveBuf[1] != ',') || (caReceiveBuf[12] != ',') || (caReceiveBuf[24] != ','))
		{
			debugPrintStrln("gpscomma");
			return 0;
		}
		// looks ok:  01234567890123456789012345678901234567890123456789012345678901234567890123456
		//            0,350.123905,5115.123651,53.312218,20130411174152.000,0,0,0.000000,0.000000OK
		strcpy(receive, caReceiveBuf);
		return 1;
	}
	else
	{
		return 0;
	}
}

It gets called every 5 seconds and talks to the gsm/gps module to get the gps string.
It sends the request, and then waits for the answer.
Afterwards I do some checks (minimum length, commas) to make sure that the string is OK before decoding it.

I have 16 KB of SRAM and I use less than 1024 bytes, but indeed, it seems like the stack is corrupt. Why? Is there a way to set the stack size?
Or is it because the code was interrupted at a bad moment and the registers, ... were not saved/restored properly?
The last time, the crash occurred after 3 hours...

Korstiaan


OK. Then I refer back to my first suggestion.

Write out the address of (say) "len" at every call. As this is on the stack, we will see if your stack is "creeping" downwards.

As for setting stack size, not to my knowledge. I expect that the stack pointer will start at top of RAM and work downwards. I expect that your static variables will start at the bottom of RAM and work upwards. I am guessing that you are not using a heap.

Maybe consider what the address is of your final static variable.

To be honest, though, I am as much in the dark as you are at this stage.

/A


Quote:
Write out the address of (say) "len" at every call. As this is on the stack, we will see if your stack is "creeping" downwards.

Ok. I will try it this evening to get a picture of what happens with the stack...

Korstiaan


Depending on how many units you have (allowing parallel testing), I would also be inclined to make the large (?) character buffers static.

I accept that it SHOULD make no difference, but something in me does not like having large arrays like that on the stack.

Good luck :)

/A


Hi,

I did some testing and added the following code in the getGpsInfo code:

debugPrintIntln((uint16_t)&len);
debugPrintIntln(StackCount());

The first one always gives 16186, BUT the second one suddenly, after some minutes, jumped from 13089 to 138 !!!!
Why???
What made my stackpointer corrupt?

HELP.

I used https://www.avrfreaks.net/index.php?name=PNphpBB2&file=printview&t=52249&start=0 for the StackCount function.
Korstiaan


Korstiaan wrote:
... the second one suddenly, after some minutes, jumped from 13089 to 138 !!!!
Why???
What made my stackpointer corrupt?
This doesn't necessarily imply that the stack pointer is corrupt.

The first figure of 13089 implies that there were that many bytes between the last of your variables in bss, and the bottom of the stack. The second figure implies that SRAM was modified, 138 bytes past the end of bss. It doesn't necessarily mean that the stack was responsible for the change, that the stack pointer is corrupted, nor that the stack has grown by almost 13,000 bytes. It doesn't even imply that any other bytes in SRAM have been 'unpainted'.

It is more likely the result of an undefined pointer.

I suggest adding a modified version of the StackCount function:

uint16_t StackCountAll(void)
{
    const uint8_t *p = &_end;
    uint16_t       c = 0;

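    while(p <= &__stack)
    {
        // test the current byte first, then advance, so that the byte at _end
        // is included and nothing past __stack is read
        if (*p == STACK_CANARY)
          c++;
        p++;
    }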

    return c;
}

This version will report the count of all bytes of SRAM that 'appear' untouched since StackPaint() was invoked in init1. I say 'appear' because there is no way to determine if a byte of SRAM is legitimately meant to contain the value of STACK_CANARY.
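
To see why a big drop in StackCount() need not mean that the stack grew, here is a small host-side sketch in plain C (not AVR code). It simulates the painted region with an array. The names sram, paint, use_stack, count_contiguous and count_all are made up for illustration, and the canary value and region size are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_CANARY 0xC5  /* assumed paint value */
#define FREE_BYTES   64    /* assumed size of the simulated free region */

/* Simulated SRAM between the end of .bss (index 0) and the top of RAM. */
static uint8_t sram[FREE_BYTES];

/* Paint the whole free region, as a StackPaint()-style routine would at startup. */
static void paint(void)
{
    memset(sram, STACK_CANARY, sizeof sram);
}

/* Simulate the stack having used the top n bytes of the region. */
static void use_stack(size_t n)
{
    assert(n <= FREE_BYTES);
    memset(&sram[FREE_BYTES - n], 0x00, n);
}

/* StackCount() style: count canaries upwards and stop at the first changed byte. */
static uint16_t count_contiguous(void)
{
    uint16_t c = 0;
    while (c < FREE_BYTES && sram[c] == STACK_CANARY)
        c++;
    return c;
}

/* StackCountAll() style: count every byte that still holds the canary. */
static uint16_t count_all(void)
{
    uint16_t c = 0;
    for (size_t i = 0; i < FREE_BYTES; i++) {
        if (sram[i] == STACK_CANARY)
            c++;
    }
    return c;
}
```

If you paint, let the "stack" use 10 bytes and then write two stray bytes at offsets 20 and 21, count_contiguous() falls from 54 to 20 while count_all() only falls from 54 to 52. A large drop in the first count together with a tiny drop in the second points at a stray write, not at stack growth.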

Now change your debug code to:

debugPrintIntln((uint16_t)&len);
debugPrintIntln(StackCount()); 
debugPrintIntln(StackCountAll());

And report back here.

Quote:
HELP.
Don't shout.

JJ

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


Hi,

Quote:
Don't shout

Sorry...

I added the function.
I also found out that it always happens in the same place... (for the few times I tested)
It always happens between gpsinfo3(b) and gpsinfo4.

This is the part of the code that I suspect is faulty:

// send command and wait to see that characters are coming
debugPrintStr("gpsinfo3-");
debugPrintInt(StackCount());
debugPrintStr("-");
debugPrintIntln(StackCountAll());
while (retry < 3)
{
    // send command to SIM908 module
    strcpy_P(caSendBuf, caATCGPSINF0);
    uartGsmPutString(caSendBuf);
    // output for debugging purposes
    debugPutc('*');
    debugPrintStr(caSendBuf);
    debugPutc('*');
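    for(x=0; x < timeout; x++) // loop header garbled in the post; bound assumed to be timeout
    {
         if (uartGsmCharWaiting() > 0)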
         {
             debugPrintStr("gpsinfo3b-");
             debugPrintInt(StackCount());
             debugPrintStr("-");
             debugPrintIntln(StackCountAll());
             retry = 99;
             break;
         }
     _delay_ms(5);
     }
retry++;
}
debugPrintStr("gpsinfo4-");
debugPrintInt(StackCount());
debugPrintStr("-");
debugPrintIntln(StackCountAll());

The output generated at that moment it went wrong:

Quote:

gpsinfo1-13820
gpsinfo2-13820
gpsinfo3-13820-13832
*AT+CGPSINF=0
*gpsinfo3b-13820-13832
gpsinfo4-2710-13830

So it means that the problem begins as soon as the characters arrive, via the UART, in the ring buffer.
I'll be honest, I copied some code from an AVR app.note (don't remember which one) for this part... I don't really understand it 100%. So maybe that causes a wrong pointer/corruption, ...?

The code for the UART receiving characters is in attachment.

Hopefully somebody sees something that's wrong...

Korstiaan


Korstiaan wrote:
Quote:
Don't shout
Sorry...
I was just kidding (forgot the smiley).
Quote:
The output generated at that moment it went wrong:
Quote:

gpsinfo1-13820
gpsinfo2-13820
gpsinfo3-13820-13832
*AT+CGPSINF=0
*gpsinfo3b-13820-13832
gpsinfo4-2710-13830
Notice the difference between the output of StackCountAll() in the last two debug output lines is only 2 bytes. This suggests that a pointer to a 16-bit type is being dereferenced for a single write. Or, a pointer to an 8-bit type is being dereferenced for two adjacent writes.

The output of StackCount() in the last debug output line is 2710. Is this always the case?

Try modifying the original code for StackCount():

uint16_t StackCount(uint8_t **ptr)
{
    const uint8_t *p = &_end;
    uint16_t       c = 0;

    while(*p == STACK_CANARY && p <= &__stack)
    {
        p++;
        c++;
    }

    *ptr = p;
    
    return c;
}

Add a new global:

uint8_t *rogue;

Then change your calls:

debugPrintIntln((uint16_t)&len);
debugPrintIntln(StackCount(&rogue));
debugPrintIntln(*(rogue+0));
debugPrintIntln(*(rogue+1));
debugPrintIntln(StackCountAll()); 

NOTE: I may have messed up the pointer-to-a-pointer syntax (I'm hungry). If I got it wrong, another freak will surely point it out.

The intent is to print out the value of the two bytes that are being changed in the painted part of SRAM. It might reveal something.

Also note that you only need to use:

debugPrintIntln(*(rogue+0));
debugPrintIntln(*(rogue+1));

... in your last section beginning with:

debugPrintStr("gpsinfo4-");
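
The out-parameter idea can be sketched in isolation on a PC. This is a minimal sketch and not the code from the linked thread: the helper name count_and_locate is hypothetical and the canary value is an assumption. The key point is that the caller owns an ordinary pointer variable and passes its address, so the function has a valid place to store the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY 0xC5  /* assumed paint value */

/*
 * Count painted bytes from start up to and including end.
 * The address where the painted run stops is returned through first_changed.
 * If the whole region is still painted, *first_changed ends up at end + 1,
 * so the caller must compare it against end before dereferencing it.
 */
static uint16_t count_and_locate(const uint8_t *start, const uint8_t *end,
                                 const uint8_t **first_changed)
{
    assert(start != NULL && end != NULL && first_changed != NULL);

    const uint8_t *p = start;
    uint16_t c = 0;

    /* Check the bound before dereferencing. */
    while (p <= end && *p == STACK_CANARY) {
        p++;
        c++;
    }

    *first_changed = p;  /* writes into the caller's own pointer variable */
    return c;
}
```

A caller would declare `const uint8_t *rogue;` and pass `&rogue`. Passing an uninitialised pointer-to-pointer instead would make the function write its result to an arbitrary address, which is exactly the kind of bug being hunted here.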

Quote:
I'll be honest, I copied some code from an AVR app.note (don't remember which one) for this part... I don't really understand it 100%. So maybe that causes a wrong pointer/corruption, ...?
That's fairly important information that should have been included with your original post.

I don't have time to review it for errors right now. Perhaps someone will beat me to it.

JJ

I will look into it tonight.

Quote:
The output of StackCount() in the last debug output line is 2710. Is this always the case?

No. It is a random number.

Korstiaan


IMHO, the fact that the address of "len" remains constant means that stack "creep" is not an issue, then. (I am not familiar with the StackCount() function.)

My next approach would be two-fold.

1/ Make the large (?) buffers static. (I know this should make no difference, but I am just suspicious.)

2/ Start to "binary chop" sections of your code out, to see if we can narrow down on what area of your code is causing the crash.

Laborious, I know. What would we give for an ICE, eh?

/Andy


1/ "Binary Chop" was probably not the correct phrase. I mean commenting out most of the code in the potentially offending function, seeing if the code runs "for ever", then adding code back piecemeal to see what breaks it.

2/ Do you also want to let us have a look at your ISR Code too?


Hi,

Quote:
Make the large (?) buffers static

I already changed the large receive buffer to a pointer I receive via an argument in the getGpsInfo function. (char caReceiveBuf[STRINGBUFFERSIZE]=""; is gone)

Quote:
Start to "binary chop" sections of your code out

As I mentioned before, when it occurs (after some minutes now, not hours: I changed Timer1 from 100 ms to 10 ms and now it occurs faster), the change in the stack count always occurs at the same place, between 3b and 4 (= when the receiving of characters begins -> filling the ring buffer).
So it must be something with the ISR of the UART receive and/or the combination with my Timer1 10(0) ms timer.
At some point in time the one influences the other, I think.

I don't know.


@joeymorin:

Sorry, I tried the code but I get trash output...


Korstiaan wrote:
@joeymorin:

Sorry, I tried the code but I get trash output...

Post it anyway.

And post all of your ISR code, as well as the declarations of any volatiles/globals they use.

JJ

I reduced the code to:

	while (1)
	{
		getGpsData(caReceiveBuf);
	}

And it still happened after a while.
Then I disabled my Timer1 (= another interrupt every 10 ms) and for now it seems to stay stable.
So maybe I'm on the wrong track and the fault is inside my Timer ISR (see attachment; the other ISR was already posted before)?

I don't do anything with it for the moment, just start it. (Just before the while loop I call initTimer1().)

Still waiting to see if it stays stable (the stack sizes).


Update:

It seems like I don't get the stack reduction when I:

- Don't use the Timer1 interrupt
- Use it but with very little code inside

Or maybe it is still a problem but only after a very long time. Too soon to know. The fact is that when I leave the complete code inside, the problem comes faster.

Could it be a problem when the UART interrupt has to wait too long because the Timer interrupt is still busy?

Korstiaan


What about:

joeymorin wrote:
And post ... the declarations of any volatiles/globals they use.
... ?

Let's see the declarations for these:

DELAYBEFORECHANGE
gpsled
statusled
bContactOnChanged
bContactOn
b5secPulse
bTrackingPulse
bTrackingRefresh
bUpdateWhileSleepPulse
lSleepRefresh

... and let's see all the code that references them.

	OCR1A = 625;             // 6250 compare match register 16MHz/256/->10Hz

TOP is zero-based, so this should be:

	OCR1A = 624;
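
As a worked check of the zero-based rule: in CTC mode the counter runs from 0 up to and including OCR1A, so one period is OCR1A + 1 timer ticks. The following is a host-side sketch of the arithmetic only; the helper name ctc_top is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compare value for a 16-bit timer in CTC mode.
 * The counter counts 0..TOP inclusive, so TOP = ticks_per_period - 1.
 */
static uint16_t ctc_top(uint32_t f_cpu_hz, uint16_t prescaler, uint32_t rate_hz)
{
    assert(prescaler != 0 && rate_hz != 0);

    uint32_t ticks = f_cpu_hz / ((uint32_t)prescaler * rate_hz);

    /* The result must fit the 16-bit compare register. */
    assert(ticks >= 1 && ticks <= 65536UL);

    return (uint16_t)(ticks - 1);
}
```

With a 16 MHz clock and a prescaler of 256, a 100 Hz (10 ms) interrupt needs 625 ticks and therefore a compare value of 624, and a 10 Hz (100 ms) interrupt needs 6249.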

What other ISR code do you have? That includes any ISR code used by libraries.

JJ

//Global variables used in interrupt routine
volatile uint8_t bContactOn;
volatile uint8_t bContactOnChanged = 1;
volatile uint8_t b5secPulse;
volatile uint8_t bTrackingPulse;
volatile uint8_t bUpdateWhileSleepPulse;
volatile uint8_t gpsled;     // 0=gps off, 1=no fix, 2=fix
volatile uint8_t statusled;  // 0=status off, 1=no communication to server, 2=communication to server
//Variables used inside the interrupt
	volatile uint8_t bStatusIndicator         = 0;
	volatile uint8_t bContactOffDelay         = 0;
	volatile uint8_t bContactOnDelay          = 0;
	volatile uint8_t bMsec5SecPulse           = 0;
	volatile uint16_t iMsecTrackingUpdPulse   = 0;
	volatile uint32_t lMsecUpdWhileSleepPulse = 0;
	volatile uint16_t iTrackingRefresh        = 0;
	volatile uint8_t bContactPin				= 0;

Quote:
and let's see all the code that references them

Doesn't matter anymore. In the meantime I've commented them out and only used the first part (setting the LEDs), and it still occurred with all the other code commented out and some extra %32 to generate some extra time.
Now (for 6 minutes so far, too soon to tell) I'm testing without the %32 because that seems to be a time-consuming part...
It is now:

ISR(TIMER1_COMPA_vect)
{
	// GPS led
	if(gpsled == 1) // gps led on
	{
		PORTD |= (1<<PIND6);
	}
	else if(gpsled == 2) // fast blinking of gps led
	{
		PIND = (1<<PIND6); // toggle output !
	}
	else // gps led off
	{
		PORTD &= ~(1<<PIND6);
	}
	// status LED
	if(statusled == 1) // slow blinking of statusled
	{
		bStatusIndicator++;
		if (bStatusIndicator > 128)
		{
			PORTD &= ~(1<<PIND7);
		}
		else
		{
			PORTD |= (1<<PIND7);
		}
	}
	else if(statusled == 2) // fast blinking of statusled
	{
		PIND = (1<<PIND7); // toggle output !
	}
	else // statusled on
	{
		PORTD |= (1<<PIND7);
	}
}

I don't think it is the code itself (unless it is the %32); it is the time that the interrupt spends on it.

Korstiaan


Korstiaan wrote:
Could it be a problem when the UART interrupt has to wait too long because the Timer interrupt is still busy?
Possibly, but your timer ISR isn't that fat.

What baud rate are you using? Unless you're using a very high baud rate, I would think it unlikely that your USART interrupt is being postponed enough to drop an incoming byte. Remember that the USART receive hardware is double buffered, plus there's effectively a third level of buffering with the receiver's shift register. That's effectively 30 bits of buffering when you count start/stop bits. Even at 115200 baud, that's 260 µs. At 16 MHz, that's 4,166 cycles. I seriously doubt your timer ISR compiles to more than about 500 cycles, probably much less.

In any case, none of that explains the problem you are trying to track down. At worst, a dropped USART byte would result in an erroneous string from your GPS, not SRAM corruption.
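
The margin above is simple arithmetic, sketched here as host-side C so that other baud rates can be plugged in. The helper names are made up for illustration, and the 30 buffered bits are the estimate from this post (three 10-bit frames), not a datasheet figure.

```c
#include <assert.h>
#include <stdint.h>

/* Time, in microseconds, that it takes to receive buffered_bits at the given baud rate. */
static uint32_t usart_slack_us(uint32_t baud, uint32_t buffered_bits)
{
    assert(baud != 0);
    /* 64-bit intermediate avoids overflow of bits * 1000000. */
    return (uint32_t)(((uint64_t)buffered_bits * 1000000u) / baud);
}

/* The same margin expressed in CPU cycles. */
static uint32_t usart_slack_cycles(uint32_t f_cpu_hz, uint32_t baud, uint32_t buffered_bits)
{
    assert(baud != 0);
    return (uint32_t)(((uint64_t)buffered_bits * f_cpu_hz) / baud);
}
```

For 30 bits at 115200 baud this gives 260 µs, or 4166 cycles at 16 MHz. At 9600 baud the margin grows to 3125 µs, or 50000 cycles.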

Mind you, I think that just about everything you have in your timer ISR shouldn't be there. You should use the timer ISR just to track time. All of the other stuff (flashing LEDs, etc.) should be done in the main thread. You started this project on Arduino? Remember millis() and micros()? You should implement one or both of those, and check the time in the main loop in order to schedule events like LED flashing etc. At most, create handler functions that do the work for each of these tasks. In this way you ensure the slimmest ISR possible:

#include <util/atomic.h>

#define TIMER_RESOLUTION_MILLISECONDS 100

volatile uint32_t milliseconds = 0;

void initTimer1()
{
	ATOMIC_BLOCK(ATOMIC_FORCEON) {
	  TCCR1A = 0;
	  TCCR1B = 0;
	  TCNT1  = 0;
	  OCR1A = (uint16_t)(((F_CPU / 256UL) * TIMER_RESOLUTION_MILLISECONDS) / 1000UL) - 1; // one compare match per TIMER_RESOLUTION_MILLISECONDS
	  TCCR1B |= (1 << WGM12);   // CTC mode
	  TCCR1B |= (1 << CS12);    // 256 prescaler
	  TIMSK1 |= (1 << OCIE1A);  // enable timer compare interrupt
  }
}

uint32_t millis() {
  uint32_t local_milliseconds;
  ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
    local_milliseconds = milliseconds;
  }
  return local_milliseconds;
}

ISR(TIMER1_COMPA_vect){
  milliseconds += TIMER_RESOLUTION_MILLISECONDS;
}

#define GPS_LED_FAST_BLINK_INTERVAL 200

void GpsLedHandler() {
	// GPS led
	if (gpsled == 1) // gps led on
	{
		PORTD |= (1<<PIND6);
	}
	else if (gpsled == 2) // fast blinking of gps led
	{
		if ((millis() % GPS_LED_FAST_BLINK_INTERVAL) < (GPS_LED_FAST_BLINK_INTERVAL / 2))
		{
			PORTD |= (1<<PIND6);
		}
		else
		{
			PORTD &= ~(1<<PIND6);
		}
	}
	else // gps led off
	{
		PORTD &= ~(1<<PIND6);
	}
}

... for example. Note that GpsLedHandler() should be called as part of the main thread loop. It is not optimal code, but illustrates the point of how to handle this sort of thing outside of an ISR.

JJ

Korstiaan wrote:
I don't think it is the code itself (unless it is the %32); it is the time that the interrupt spends on it.
Modulo 32 is a very fast operation, it should compile as the equivalent of:
bStatusIndicator & 0b00011111
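
A quick way to convince yourself of the equivalence is to compare both forms for every possible uint8_t value. This is a host-side sketch with made-up helper names. Note that the equivalence holds for unsigned operands; for negative signed values the % operator gives a different result from the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder after division by 32, written with the modulo operator. */
static uint8_t mod32(uint8_t x)
{
    return (uint8_t)(x % 32);
}

/* The same result written as a bit mask that keeps the low five bits. */
static uint8_t mask32(uint8_t x)
{
    return (uint8_t)(x & 0x1F);
}

/* Exhaustively check that both forms agree for all 256 uint8_t values. */
static void check_mod32_equals_mask(void)
{
    for (uint16_t x = 0; x <= 255; x++) {
        assert(mod32((uint8_t)x) == mask32((uint8_t)x));
    }
}
```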

Looking at your OP, it looks like you posted incomplete code for decodeGPS(). It declares a number of pointers, but we don't get to see how the rest of the code uses them.

Whatever the mechanism, the fact that only two bytes appear to be getting written to in the middle of unallocated SRAM strongly suggests a pointer problem. How your timer ISR is involved is as yet unknown, if in fact it is. Its presence may simply expose another flaw.

Next step here would be to post the full code for decodeGPS(), and the full code for any other ISRs you use (including library code).

But before that, let's take a look at the assembler output for your timer ISR. Compile with -g, then run avr-objdump -S, look at the output with a text editor and find the assembler code for the timer ISR and post it here. That way we should be able to put to rest the notion that the ISR is somehow causing the problem.

JJ

Hi,

Thanks for the tips. I agree!
In the meantime, after quite some time, the stack was suddenly gone again.
To be honest, I don't know what to think.
I'm tired...
I've been looking for weeks now and have found nothing...
I'm really starting to think about a libc/gcc/compiler bug.
2 interrupts just don't work...

I'll sleep on it (it's midnight here now) and decide what to do... (stop completely, throw everything away before my wife throws me away)

Good night.

Thanks again!

Korstiaan

PS: I'll let it run overnight without Timer1 and check tomorrow morning before going to work...


Hi,

I changed decodeGPS already to return 1 (no code inside).

Quote:
Note that GpsLedHandler() should be called as part of the main thread loop

That would make steady blinking very difficult. The main loop is full of loops, delays, ... The LEDs would not blink at a steady rate. Some functions are in loops with timeouts of up to 20 s.
And also: if it works, I will never know why it failed before, and will it be gone forever?
I want to know why: faulty code, faulty library, ...

K


Korstiaan wrote:
That would make steady blinking very difficult. The main loop is full of loops, delays, ... The LEDs would not blink at a steady rate. Some functions are in loops with timeouts of up to 20 s.
Those functions should also be handled asynchronously, instead of spinning in a delay loop. Your main loop should run as fast as possible. Usually it's possible to get it to run dozens or hundreds of times per second or faster. It can look something like:
int main(void) {
  while(1) {
    ledHandler();
    buttonHandler();
    gpsCommandHandler();
    gpsResponseHandler();
    etcHandler();
    .
    .
  }
}

Each handler checks the time with millis() and compares it with local time variables, taking action as appropriate. Usually each handler is implemented as a finite state machine. This lets it perform one step at a time, then return. It's a kind of poor-man's co-operative multitasking.

Often I will dump the time into a global once per loop:

uint32_t now;
int main(void) {
  while(1) {
    now = millis();
    ledHandler();
    buttonHandler();
    gpsCommandHandler();
    gpsResponseHandler();
    etcHandler();
    .
    .
  }
}

... and then each handler can check the time using the global. This avoids disabling interrupts more often than is necessary. The downside is you may not always know the exact time, but depending on how long each handler lingers in any given step or state, and what your timing requirements are, this can be acceptable.
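
As a minimal sketch of what one such handler could look like, here is a non-blocking blink handler written as plain host-side C. The type blink_t and the function names are made up for illustration, and the LED state is returned instead of being written to a port so that the logic can be tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State kept between calls; one instance per LED. */
typedef struct {
    uint32_t last_toggle_ms;  /* time of the most recent toggle */
    uint32_t interval_ms;     /* half-period of the blink */
    uint8_t  on;              /* current LED state: 0 or 1 */
} blink_t;

static void blink_init(blink_t *b, uint32_t now_ms, uint32_t interval_ms)
{
    assert(b != NULL && interval_ms > 0);
    b->last_toggle_ms = now_ms;
    b->interval_ms = interval_ms;
    b->on = 0;
}

/*
 * Call once per pass of the main loop with the current time.
 * It never waits: it toggles only when the interval has elapsed, then returns.
 */
static uint8_t blink_handler(blink_t *b, uint32_t now_ms)
{
    assert(b != NULL);

    /* Unsigned subtraction stays correct when the 32-bit counter wraps. */
    if ((uint32_t)(now_ms - b->last_toggle_ms) >= b->interval_ms) {
        b->on = (uint8_t)!b->on;
        b->last_toggle_ms = now_ms;
    }
    return b->on;
}
```

Comparing elapsed time (now minus last toggle) instead of comparing against an absolute deadline is a design choice that keeps the handler correct across counter wraparound. The blink is only as steady as the main loop is fast, which is why the other handlers must not block either.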

Quote:
And also: if it works, I will never know why it failed before, and will it be gone forever?
I want to know why: faulty code, faulty library, ...
As I said, it won't solve the problem. It is nevertheless potentially useful advice for this and future projects.

I've seen no evidence to suggest that the timer ISR is at fault. Its presence may be revealing a flaw elsewhere in your code that is at fault. I still suspect the problem is with a broken pointer in code you haven't posted.

What about:

joeymorin wrote:
Next step here would be to post the full code for decodeGPS(), and the full code for any other ISRs you use (including library code).

But before that, let's take a look at the assembler output for your timer ISR. Compile with -g, then run avr-objdump -S, look at the output with a text editor and find the assembler code for the timer ISR and post it here. That way we should be able to put to rest the notion that the ISR is somehow causing the problem.

If you don't want to post more code, may I suggest that your debugging efforts start with looking at all of your pointers. The function decodeGPS() has three local pointers, but we haven't seen the code that handles them.

JJ

Update:

The free stack space is as follows:

Several minutes 15019-15034
then for several minutes 15015-15020
then minutes 15012-15017
and then for example 1612-15016

K

************************************************************************************
15019-15034 <--- Bottom of stack has gotten as low as 15020 bytes beyond end of bss.
                 In addition, at moment of capture there were 15 bytes on the
                 current stack (and old stack) that happened to match STACK_CANARY
15015-15020 <--- Maximum stack size since reset has increased by 4 bytes (normal)
15012-15017 <--- Another 3 bytes (normal)
 1612-15016 <--- A single byte located 1613 bytes past end of bss has been changed
                 The change from 15017 to 15016 suggests that the stack is otherwise
                 completely unaffected.

This still strongly suggests an undefined or corrupted pointer.

JJ


Hi,

But where?
There is almost nothing left.

You're sure that it is not a compiler/library problem?
Is it my problem?

Can it be because the JTAG port is enabled? Or I2C? Even if I don't use them during the loop?
Or can such things happen because of electrical interference picked up via power or uart lines?

K


Korstiaan wrote:
But where?
There is almost nothing left.
There is decodeGPS(), and every other function you haven't posted that contains pointers.
Quote:
You're sure that it is not a compiler/library problem?
No, I'm not sure. But I'd lay the odds against a compiler bug at 1000000:1. You can count on the fingers of one hand the number of code generation bugs that have been identified in the history of these forums. As for library code, that is much more likely. The likelihood drops to near zero when you use known, tested, and trusted libraries. Are you?
Quote:
Can it be because the JTAG port is enabled?
JTAG can interfere with normal port operation, but shouldn't have any impact on registers or SRAM. But for giggles you can try running a test with JTAG disabled and your programmer disconnected. Crank up your timer ISR to speed up the test. I'd bet you'll still see the problem.
Quote:
Or I2C?
Can only affect TWI registers in I/O, unless you're using TWI and the code for it is the source of the pointer problem.
Quote:
Even if I don't use them during the loop?
Do you use TWI? Are you using library code, or your own?
Quote:
Or can such things happen because of electrical interference picked up via power or uart lines?
Of course, if you don't take adequate precautions. However, the behaviour is too consistent. Are you near sources of switching high power/current/voltage? Probably not. And, if the problem was due to external switching noise, then why does it go away when you disable your timer ISR?

Keep looking.

JJ


Hi joeymorin:

-> decodeGPS() has no code inside for the moment. It directly returns 1.
-> last night it 'crashed' even without using Timer1.

Are we sure that the StackCount functions are working OK?
When I say crash, I mean a sudden change of StackCount.

K


Korstiaan wrote:
Are we sure that the StackCount functions are working OK?
When I say crash, I mean a sudden change of StackCount.
Should be OK. All it does is count the number of bytes between bss and the first 'unpainted' byte. That's why I suggested StackCountAll to count right up to the end of SRAM. In your other reports it was clear that only 1 or 2 bytes in SRAM were being changed in this way, because while the number returned by StackCount changed drastically (indicating the location of the changed byte), the number returned by StackCountAll only changed by 1 or 2 (indicating the number of 'unpainted' bytes since the last report).
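
To make the difference between the two counts concrete, here is a minimal host-side simulation of the painting idea. It is not the StackPaint/StackCount code used in this thread: the array `sim_free`, its size, the canary value, and all `sim_*` names are assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_CANARY   0xC5  /* assumed paint value */
#define SIM_FREE_BYTES 64    /* hypothetical size of the region above .bss */

/* Stand-in for SRAM from the end of .bss (index 0) up to the top of the stack. */
static uint8_t sim_free[SIM_FREE_BYTES];

/* Fill the whole region with the canary, as a paint routine would at startup. */
void sim_paint(void)
{
    memset(sim_free, STACK_CANARY, sizeof sim_free);
}

/* Count painted bytes upward from the end of .bss, stopping at the first changed byte. */
size_t sim_stack_count(void)
{
    size_t n = 0;
    while (n < SIM_FREE_BYTES && sim_free[n] == STACK_CANARY) {
        n++;
    }
    return n;
}

/* Count every byte in the region that still holds the canary, wherever it is. */
size_t sim_stack_count_all(void)
{
    size_t n = 0;
    for (size_t i = 0; i < SIM_FREE_BYTES; i++) {
        if (sim_free[i] == STACK_CANARY) {
            n++;
        }
    }
    return n;
}

/* Normal stack growth: the top `depth` bytes of the region get overwritten. */
void sim_use_stack(size_t depth)
{
    if (depth > SIM_FREE_BYTES) {
        depth = SIM_FREE_BYTES;
    }
    memset(sim_free + SIM_FREE_BYTES - depth, 0x00, depth);
}

/* A stray write through a bad pointer: one byte somewhere in the middle. */
void sim_stray_write(size_t offset)
{
    if (offset < SIM_FREE_BYTES) {
        sim_free[offset] = 0x00;
    }
}
```

With a 64-byte region and 10 bytes of real stack use, both counts read 54. A stray write at offset 20 then drops the first count to 20 while the second only falls to 53, which is the same pattern as the 1612-15016 report.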

What happens right after StackCount sees a sudden change? Does your program continue to run? Or does it reset or otherwise get wedged?

Why don't you run a new test:

int main(void) {
  while (1) {
    debugPrintInt(StackCount());
    debugPrintStr("-");
    debugPrintIntln(StackCountAll());
  }
}

... with nothing else except StackPaint in .init1 and your serial debug code.

If the problem persists, then that leaves:

    - the stackcount code
    - the debug code
    - the serial code
    - compiler bug
    - hardware issue

And if it does persist, you can eliminate the debug code and serial code with a new test:

int main(void) {
  while (1) {
    if (StackCount() < 14000) {
      // light an LED
      while (1);
    }
  }
}

Curiouser and curiouser...

JJ


Hi,

Good ideas!
Will try this evening. (I'm from Belgium, GMT+1. And you??)

PS: Most of the time the code keeps running. Sometimes it resets (very rarely). Because I use PROGMEM variables the effect is less severe. When I use SRAM variables my program gets stuck, because it then sends garbage strings to my GSM module.


Korstiaan wrote:
Hi,

Good ideas!
Will try this evening. (I'm from Belgium, GMT+1. And you??)

Canada, GMT-5 (-4 this time of year)

JJ


Just an extra question (sorry):

Can anybody explain to me why I see 17 out of 61 const char PROGMEM variables described inside the RAM? Why these 17? I can't see anything special about them compared to the others.

I used avr-nm SkorTrackGL.elf -n
Extract of the listing:
The PROGMEM variables are the ones starting with ca...

...
0000003f a __SREG__
0000003f a __SREG__
0000008c T caCMGD
00000099 T caCIPSPRT1
000000a7 T caCIPSHUT
000000b3 T caATCGATT1
000000bf T caATCGPSINF0
000000cd T caATCGPSSTATUS
000000dd T caATCGPSOUT0
000000eb T caATCGPSRST1
000000f9 T caATCGPSPWR1
00000107 T caATCNMI0
0000011a T caATCMGF1
00000125 T caATCSCLK0
00000131 T caATE0
00000137 T caAT
0000013b T caSHUTOK
00000142 T caFIX
00000147 T caOK
0000014a t __c.2499
0000015d t __c.2501
...

Quote:

PROGMEM variables described inside the RAM?

What do you mean "RAM"?

The avr-nm output shows " T ". That is the Text section - they are in flash not RAM.

If you define 61 but only see 17 listed in the nm output then are you using -fdata-sections and -gc-sections by any chance? Anything unreferenced will be removed in the link which could be why only 17 remain in the image.


Hi,

Sorry but I'm new to Atmel Studio 6.1 and, because of a problem, I read http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_ramoverlap
So I did that and there I saw these 17 PROGMEM variables. And you're correct, for the moment I only use these 17. The others are unreferenced.

So everything looks normal I suppose...

Sorry.


Quote:

So everything looks normal I suppose...

Yup. In AS6.1 in the Project properties there are sections for both AVR/GNU Compiler and AVR/GNU Linker. In each there is an "Optimization" section. For the compiler you'll see that Atmel have -ffunction-sections and -fdata-sections ticked by default. This causes each function and each global variable to be built into a separately named section. The linker options have -gc-sections ticked. That is "garbage collect sections". What this does is keep track of whether each function or variable is referenced. At the end of the link, if any function or data section still has a reference count of 0, the linker realises that nothing has actually made use of that function or variable, and it is simply dropped from the .elf file that is generated.

It helps to ensure that the flash image isn't stuffed full of things that are never used, but if you want everything in there then switch off either the -ffunction-sections/-fdata-sections options or the overall -gc-sections option for the linker. Then everything in the source will end up in the image whether it's used or not.

If you want to over-ride this behaviour for any single variable or function you can use __attribute__((used)) which tells the system to treat it as if something has made a reference even though nothing did.


Thank you for the explanation !

And this evening I can continue to search my problem...
I thought I found something.

K


Hi,

I found a way to reproduce it quickly!
If I leave the reading loop before the GSM module has really finished sending data, the program crashes very quickly.
This means I continue with a strstr (saw a crash on that line) and snprintf functions (for the debug output) while in the meantime characters come in (= a lot of interrupts).

Is this a breakthrough?

K


Korstiaan wrote:
Is this a breakthrough?
Possibly. It suggests a problem with the code (including interrupt code) that handles USART0 receive. Probably the buffer pointer is being updated without ensuring atomicity. I'll have a look at your Serial.c again.
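
As a sketch of what "ensuring atomicity" means here: on an 8-bit AVR a 16-bit variable is read one byte at a time, so an interrupt can change it between the two reads. The variable and function names below are hypothetical and not taken from Serial.c. `ATOMIC_BLOCK` and `ATOMIC_RESTORESTATE` come from avr-libc's `<util/atomic.h>`; the `#else` branch is only a stand-in so that the snippet also compiles off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <util/atomic.h>
#else
/* Off-target stand-in (assumption): no interrupts on a host build,
 * so the macro expands to nothing and a plain block remains. */
#define ATOMIC_BLOCK(mode)
#endif

/* Hypothetical 16-bit offset that a receive ISR would advance. */
volatile uint16_t rx_write_offset;

/* Take a consistent snapshot of the offset from main-line code. */
uint16_t rx_write_offset_snapshot(void)
{
    uint16_t copy;
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
    {
        /* On the AVR, both bytes are read with interrupts held off. */
        copy = rx_write_offset;
    }
    return copy;
}
```

`ATOMIC_RESTORESTATE` is used rather than `ATOMIC_FORCEON` so the helper leaves the interrupt flag as it found it, which matters if it is ever called with interrupts already disabled.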

JJ


What is BUFFER_SIZE defined as?

JJ


Korstiaan wrote:
I found a way to reproduce it quickly!
Before digging any further, try another test. Simply enable serial comms to the GPS and for debugging. Main loop should simply issue a command string to the GPS that will elicit a response string, only don't ever read it. Instead print debug information:
int main(void) {
  >>>enable serial to GPS and for debugging<<<
  while (1) {
    >>>issue command to GPS<<<
    debugPrintInt(StackCount());
    debugPrintStr("-");
    debugPrintIntln(StackCountAll());
  }
}

JJ


I see a few problems with your USART code. The UDRE interrupt calls ring_buffer_put() without checking if the buffer is full. Also uartGsmGetChar() doesn't check if the buffer is empty first by calling uartGsmCharWaiting(). While I imagine you are doing the latter in your own code, it would be more appropriate if it was done in uartGsmGetChar().

Neither of these should result in an incorrect pointer, however. At first glance it looks like the ring buffer is otherwise correctly implemented. The pointer element to the buffer itself is never modified, rather an offset element is maintained and used as an index when dereferencing the buffer pointer as an array.
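
For reference, a minimal sketch of a ring buffer that does its own full and empty checks, using offsets into a fixed array as described above. This is not the code from Serial.c or the Atmel Studio example: the struct layout, the `rb_*` names, and the capacity of 8 are assumptions for illustration. On the real target, any field touched by both an ISR and main-line code would also need `volatile` and interrupt protection around the update.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 8  /* hypothetical capacity; the thread later mentions BUFFER_SIZE = 100 */

struct ring_buffer {
    uint8_t data[RB_SIZE];
    uint8_t write_offset;  /* index of the next free slot */
    uint8_t read_offset;   /* index of the oldest stored byte */
    uint8_t count;         /* number of bytes currently stored */
};

/* Store one byte; returns false instead of overwriting when the buffer is full. */
bool rb_put(struct ring_buffer *rb, uint8_t byte)
{
    if (rb->count == RB_SIZE) {
        return false;
    }
    rb->data[rb->write_offset] = byte;
    rb->write_offset = (uint8_t)((rb->write_offset + 1u) % RB_SIZE);
    rb->count++;
    return true;
}

/* Fetch the oldest byte; returns false instead of reading stale data when empty. */
bool rb_get(struct ring_buffer *rb, uint8_t *out)
{
    if (rb->count == 0) {
        return false;
    }
    *out = rb->data[rb->read_offset];
    rb->read_offset = (uint8_t)((rb->read_offset + 1u) % RB_SIZE);
    rb->count--;
    return true;
}
```

Because the checks live inside `rb_put()` and `rb_get()`, a caller that forgets to test for data waiting gets a `false` return rather than a byte that was never received.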

Perhaps someone else will see a problem that I can't.

JJ


Hi JJ,

BUFFER_SIZE = 100
Should be a ring buffer...

I will do what you asked.

Yesterday I also changed the serial code to use another sending method... (see attachment)
It is all based on the Atmel Studio 6.1 examples for 1284p. File -> new Example project.
I just changed all these inline functions to not inline, and removed structs, ... for readability to simplify debugging.

PS: In debugPrintInt I changed snprintf to itoa, and it now takes longer to crash...

K

Attachment(s): 


Quote:
Before digging any further, try another test

Just sending and printing debug output:

Also crash:

fixAT1-15569
fixAT2-15569
AT1-15569
*AT+CGPSINF=0*15569-15674
fixAT1-15569
fixAT2-15569
AT1-15569
*AT+CGPSINF=0*7776-15672
fixAT1-7776
fixAT2-7776

Korstiaan wrote:
Quote:
Before digging any further, try another test
Just sending and printing debug output:

Also crash:

fixAT1-15569
fixAT2-15569
AT1-15569
*AT+CGPSINF=0*15569-15674
fixAT1-15569
fixAT2-15569
AT1-15569
*AT+CGPSINF=0*7776-15672
fixAT1-7776
fixAT2-7776

Did it crash almost immediately? That would suggest a flaw in the serial receive code.

I'm assuming that you have stripped your test code of ALL other code and libraries. Make it absolutely as slim and minimal as possible to reproduce the problem and narrow down the possibilities. Remove all #includes that aren't needed for the test, like #include "SkorTrackGL.h". If you need macros from an included file, instead copy only those macros directly into your test code so as not to pollute the test with other items from those includes.

If you have already done this and are still experiencing crashes, especially if the above test results were from a test that crashed almost immediately, this strongly implicates the serial receive code.

JJ


Sorry,

Previous test was via 2 functions and some strcpy_P calls.
Now I'm really just sending :

Until now, OK... (too soon...)

	while (1)
	{
		uartGsmPutString("AT+CGPSINF=0\r");
		debugPrintInt(StackCount());
		debugPrintStr("-");
		debugPrintIntln(StackCountAll());
	}

Update:

Just crashed.
I had to add _delay_ms(100); otherwise the module does not answer properly, only with "ERROR" every 2 seconds. With the delay, long strings now come in.

(Monitored the serial rx incoming line not via the program but via hardware...)

I now deleted all the other code like you asked... i2cinit()...

And also crash.

And now?
_delay_ms safe?
itoa safe?


Korstiaan wrote:
_delay_ms safe?
Absolutely
Quote:
itoa safe?
The way you are using it, yes. Your string buffer is 25 bytes long, but the longest string it can generate is 7 bytes ("-32768\0"). By the way, you should use utoa(), since you're passing an unsigned int.
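
The buffer arithmetic can be checked off-target. The sketch below uses standard `snprintf` instead of avr-libc's `itoa()`/`utoa()` so that it runs on any host; the function and macro names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Longest decimal renderings of 16-bit values, including the terminating NUL. */
#define INT16_DEC_BUF  7  /* "-32768" plus '\0' */
#define UINT16_DEC_BUF 6  /* "65535" plus '\0' */

/* Format a signed 16-bit value; returns the number of characters written (without the NUL). */
int format_i16(int16_t value, char *buf, size_t size)
{
    return snprintf(buf, size, "%d", (int)value);
}

/* Format an unsigned 16-bit value; returns the number of characters written (without the NUL). */
int format_u16(uint16_t value, char *buf, size_t size)
{
    return snprintf(buf, size, "%u", (unsigned)value);
}
```

A 25-byte buffer is therefore far larger than either conversion needs. The unsigned variant is the better match for an unsigned count, since values above 32767 have no positive representation in the signed one.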

Try changing initUartGsm() to not enable the receive interrupt:

	UCSR0B = (1 << RXEN0) | (1 << TXEN0);

... and run the minimal test again.

JJ


Ok.

In the mean time I changed everything so that there are no ring buffers and still crash.

ISR(USART0_RX_vect)
{
	in_buffer[0] = UDR0; //in_buffer[write_offset_in] = UDR0;

}

and sending just:

void uartGsmPutChar(uint8_t data)
{	
	loop_until_bit_is_set(UCSR0A, UDRE0); // wait for transmit buffer to be empty
	UDR0 = data;
}

void uartGsmPutString(char *string)
{
		while (*string)
		{
			uartGsmPutChar(*string);
			string++;
		}
}

Korstiaan wrote:
In the mean time I changed everything so that there are no ring buffers and still crash.
It's time to post the output of avr-objdump -S. Make sure you compile with -g

JJ


Also without interrupt sudden change:

Quote:
16248-16247
16237-16246


I hope I did it right.
I never did that...
In the first version I still saw some code from Timer1, even though I don't call it and the interrupt is not enabled.
So I then commented everything out...

In the mean time testing again...

Attachment(s): 


In the meantime I'm looking at the hardware now.
Changed the power supply and added another 100nF capacitor...

No crash yet (even with interrupt)... wait and see...


Update:

Changing the power supply (to a good, stable, non-switching power supply) didn't help.
It seems to take longer, but the stack is still suddenly lost.

Only hardware problem left: air wires from breadboard?
I don't really believe that this causes stack corruption.


I can find no problem with the assembler output. Looks just fine to me. You should quite simply not be seeing these issues.

Beginning to suspect a hardware issue related to power.

I'm amazed, but reviewing the thread it looks like no-one (even me!) asked about your circuit, whether you have bypass caps on the mcu, what your power supply is...

Ah, while I was reviewing the objdump you made another post:

Korstiaan wrote:
In the meantime I'm looking at the hardware now.
Changed the power supply and added another 100nF capacitor...

No crash yet (even with interrupt)... wait and see...

... [sigh] ...

This might have been a much shorter thread. I know you'd asked whether electrical interference could cause such things. My answer was yes, but that the behaviour seemed too consistent to be explained by EMI. EMI can cause SRAM corruption, register corruption, and all manner of hardware failures. I would have expected to see the mcu simply lock up altogether on some tests, without corroboration from StackCount().

With the new information w.r.t. failures even with your completely stripped down code, plus the results with a new PS, that seems to have been the wrong conclusion.

So... mcu bypass cap? Power supply? Please describe your circuit in detail. Post a schematic if you have it. Even if adding a 100 nF cap to a new power supply seems to fix the issue, you'll need to be very certain the circuit design is robust, especially since this will be deployed in the noisy environment of a vehicle.

JJ


Quote:
Beginning to suspect a hardware issue related to power.

See the answer just before your last one.
It was a switching power supply, but after eliminating it still occurs but it seems to take longer...(could be coincidence)
I also removed the air wires for the jtag.

Tomorrow or beginning next week I will receive the prototype PCB.
I will do further testing on that.

K


Korstiaan wrote:
It was a switching power supply, but after eliminating it still occurs but it seems to take longer...(could be coincidence)
Yes, but:
Quote:
Even if adding a 100 nF cap to a new power supply seems to fix the issue, you'll need to be very certain the circuit design is robust, especially since this will be deployed in the noisy environment of a vehicle.
... so:
Quote:
mcu bypass cap? Power supply? Please describe your circuit in detail. Post a schematic if you have it.
JJ


Ok,

I added a cap.
Forget the VR1 on the schematic
I now use a laboratory power supply.

K

Attachment(s): 


Korstiaan wrote:
I added a cap.
Where?
Quote:
Forget the VR1 on the schematic
Are you not using it yet? Is your 5V coming from:
Quote:
laboratory power supply.
... ?

As for VR1, note that the datasheet states:

Quote:
For input voltage higher than 32 VDC an input capacitor 22 μF / 50 V is required.
I assume your input voltage will actually be between 10 and 14.5 volts if you're running off the vehicle's electrical system. I only mention it because your schematic specifies an input voltage of 8-36 VDC.

I'm guessing C9 is the bypass cap for the 1284P? Is it physically as close as it can be to the chip?

Same question for Q1/C1/C2.

Interesting use of a voltage regulator for IC4. Even though you are using it as a level translator, you should have a Ci = 0.33 uF and a Co = 0.1 uF. Same advice for IC5. Without Co, the regulator can oscillate. Although you are not using them for power, an oscillation will contribute to the noise present in the overall circuit.

At what frequency is IC4 driven by "contact in"? Is it merely a switch contact of some kind?

From http://www.reyax.com/Module/GSM/SIM908/SIM908_Hardware_Design_V1.00.pdf:

Quote:
The power supply range of SIM908 is from 3.2V to 4.8V. The transmitting burst will cause voltage drop and the power supply must be able to provide sufficient current up to 2A.
Of course, this may not be an issue if you're not using burst mode.

C8 should also be a low ESR type, and should be as close to the module as possible (same with C3).

I know you're running from a bench power supply at the moment, but when you switch to the TSR1-2450 you may run into trouble with the 2A peak currents pulled by the module. Consider powering the SIM908 from a separate DC-DC regulated supply rated for 2A or more.

I have to say, I'm now out of ideas ;)

JJ


Quote:
I have to say, I'm now out of ideas

Me too!

Power comes from a good, stable laboratory supply. So for the moment we must exclude the TSR2450 as a possible cause.

Quote:
I assume your input voltage will actually be between 10 and 14.5 volts

No. It has to work in cars (12-14VDC) and trucks (24-27VDC)

Things left:

- PCB instead of breadboard.
- Removing the voltage regulators (so that they can't influence the circuitry)
- Try with another Atmel chip (644p). I already tried with another 1284p but also not good.
- Let it run for a longer time without communication, because so far I haven't seen the problem in that case.
- Say goodbye to Atmel and start all over with some other processor...

K


No use blaming the processor! The whole power side of things looks decidedly dodgy. For operation in cars and trucks you would want an automotive grade regulator. The car's power system has some nasties that will kill a standard regulator. You also want to add some protection - reverse polarity and overvoltage.
You've also tied AREF to VCC - normally one would just use a 100nF cap to 0V.


Quote:
The whole power side of things looks decidedly dodgy

Can be, but is NOT the issue for the moment because it is not used now AND the problem still exists.


I wasn't only talking about the regulators. The placement of bypass capacitors is critical. Using a good quality power source is not a get out of jail card.


I agree! Therefore they are in place and as close as possible to the CPU.
And as I said before, I will also get rid of all the parts that are not needed for the moment so that they can't influence the circuitry.
Therefore I will wait until I have a PCB so that I can get rid of the other long air wires.

K


Don't know what compiler you're using or what version, but over in the GCC forum there's a thread on the latest version emitting erroneous entry/exit code for functions. Popping different registers than Pushed.

https://www.avrfreaks.net/index.p...

Maybe an issue, maybe not.

Greg


Hi Greg,

Thanks for bringing this to my attention.
I'm using AS6.1 so also 4.7.2.
The latest reply however suggests it's no bug. (for now)

Wait and see what the discussion says...

I don't understand the assembler code so it is not so easy to follow.

K


Korstiaan wrote:
Quote:
I assume your input voltage will actually be between 10 and 14.5 volts
No. It has to work in cars (12-14VDC) and trucks (24-27VDC)
Be aware that the voltage on a car's electrical bus can drop below 10 volts during normal engine start. I don't know about diesels in trucks, but I imagine you'll see a similar drop to around 20 volts. Note also that charging voltage on a 12 volt system while engine is running is 14.5-15 volts. It is not unusual to see prolonged spikes above 16 volts. Expect a similar situation on a 24 volt system... charging voltage would be around 29-30 volts, with spikes as high as 32 volts.

You should also keep in mind what will happen to your device if there's an electrical systems failure. I just had to replace the alternator on my own vehicle because the voltage regulator inside it was flaky and producing long periods of up to 20 volts. Imagine a 24 volt system having a similar failure and pushing out 40 volts for seconds or minutes at a time. Don't kid yourself, these failures happen more often than you'd imagine. Do you want to have to replace your GPS logging device at the same time?

Consider including overvoltage protection. Although this TI application note is for a lower power circuit than would be suitable for your needs, have a look.

Also, the raw electrical bus is a very noisy place. See the attached trace from a voltage logger I placed on my vehicle after the new alternator was installed. The logging session was 10 hours long. The first view is of the whole session, and the others are zoom-ins of parts of that session. These are screen captures from an audio editor. I convert the log file to a wav file for easy display in my favourite audio editor. The scale at the bottom is in minutes:seconds; the scale on the right is a bit odd because it's an audio file. The zero line at centre corresponds to 10 volts, 0.1 corresponds to 11 volts, 0.2 to 12 volts, etc.


Note the extreme spike at engine startup, dropping below 10 volts, and rising above 15 before settling. Note also the spike at engine shutdown, the stepping as accessories are turned on and off, and the overall noise.
Korstiaan wrote:
The latest reply however suggests it's no bug. (for now)
It is absolutely not a bug.

JJ

Attachment(s): 


Hi JJ,

Thanks for sharing the info.
However I want to find the problem now before thinking about the hardware.
I wish I was already there...

Korstiaan


Try the following:

#include <avr/io.h>

#define SRAM_CANARY 0xC5

void __attribute__ ((naked)) __attribute__ ((section (".init0"))) sram_test(void) {

  __asm volatile (
  // disable WDT
    "    cli\n"
    "    eor __zero_reg__,__zero_reg__ \n"
    "    ldi r24,%[change]             \n"
    "    out  %[mcusr], __zero_reg__   \n"
    "    sts %[wdtcsr],r24             \n"
    "    sts %[wdtcsr],__zero_reg__    \n"
  // paint all of SRAM
    "    ldi r24,%[canary]             \n"
    "    ldi r25,hi8(__stack)          \n"
    "    ldi r30,lo8(__data_start)     \n"
    "    ldi r31,hi8(__data_start)     \n"
    "    rjmp .cmp_paint               \n"
    ".loop_paint:                      \n"
    "    st Z+,r24                     \n"
    ".cmp_paint:                       \n"
    "    cpi r30,lo8(__stack)          \n"
    "    cpc r31,r25                   \n"
    "    brlo .loop_paint              \n"
    "    breq .loop_paint              \n"
  // test all of SRAM forever
    "    sbi %[ddrd],6                 \n"
    "    sbi %[ddrd],7                 \n"
    "    cbi %[portd],6                \n"
    ".repeat_test:                     \n"
    "    ldi r30,lo8(__data_start)     \n"
    "    ldi r31,hi8(__data_start)     \n"
    "    rjmp .cmp_test                \n"
    ".loop_test:                       \n"
  // load into r27 so the pass counter in r26 is not overwritten
    "    ld r27,Z+                     \n"
    "    cp r24, r27                   \n"
    "    brne .fail_test               \n"
    ".cmp_test:                        \n"
    "    cpi r30,lo8(__stack)          \n"
    "    cpc r31,r25                   \n"
    "    brlo .loop_test               \n"
    "    breq .loop_test               \n"
    "    dec r26                       \n"
    "    brne .repeat_test             \n"
  // toggle status LED every 256 passes (about every 2 seconds at 16 MHz)
    "    sbi %[pind],7                 \n"
    "    rjmp .repeat_test             \n"
  // light GPS LED and halt on test failure
    ".fail_test:                       \n"
    "    sbi %[portd],6                \n"
    ".halt:                            \n"
    "    rjmp .halt                    \n"
  :
  : [canary] "M" (SRAM_CANARY),
    [mcusr]  "I" (_SFR_IO_ADDR(MCUSR)),
    [change] "M" ((1<<WDCE)|(1<<WDE)),
    [wdtcsr] "M" (_SFR_MEM_ADDR(WDTCSR)),
    [ddrd]   "I" (_SFR_IO_ADDR(DDRD)),
    [pind]   "I" (_SFR_IO_ADDR(PIND)),
    [portd]  "I" (_SFR_IO_ADDR(PORTD))
    );

}


int __attribute__ ((__OS_main__)) main(void) {
  while(1);
}

Start a new empty project, and paste the above code, and upload.

It will continuously test SRAM. The status LED on your board will flash (every 2 seconds at 16 MHz) to indicate test in progress. If the test ever fails, the GPS LED will light and the test will halt.

The program does absolutely nothing else. No USART, no TWI, no interrupts of any kind, no external activity whatsoever apart from the LEDs. If it fails, the problem lies almost certainly in your hardware.

Here's the hex file if you have trouble building:

:100000000C9446000C948B000C948B000C948B0089
:100010000C948B000C948B000C948B000C948B0034
:100020000C948B000C948B000C948B000C948B0024
:100030000C948B000C948B000C948B000C948B0014
:100040000C948B000C948B000C948B000C948B0004
:100050000C948B000C948B000C948B000C948B00F4
:100060000C948B000C948B000C948B000C948B00E4
:100070000C948B000C948B000C948B000C948B00D4
:100080000C948B000C948B000C948B00F89411242E
:1000900088E114BE809360001092600085EC90E4CB
:1000A000E0E0F1E001C08193EF3FF907E0F3D9F31D
:1000B000569A579A5E98E0E0F1E003C0B1918B1731
:1000C00041F4EF3FF907D0F3C9F3AA95A1F74F9A8E
:1000D000F2CF5E9AFFCF11241FBECFEFD0E4DEBF78
:1000E000CDBF11E0A0E0B1E0E0E2F1E000E00BBF45
:1000F00002C007900D92A030B107D9F71BBE11E0E6
:10010000A0E0B1E001C01D92A030B107E1F70E946C
:100110008D000C948E000C940000FFCFF894FFCF5C
:00000001FF

JJ

JJ,

Thanks already to everyone, but especially to you, JJ.
They should give you a medal for all your work and your support!
Thanks to you I didn't give up.
Incredible!

In the mean time I was also doing almost the same thing, because YOU directed me that way.

Starting from an empty project:

I started without the GSM/GPS module, sending and receiving data via UART0 (by connecting TX to RX) while debugging the stack via UART1.
Result: no problems (not tested for very long, but it seems OK).

Then I connected the GSM/GPS module again and just sent garbage to it. As a result, the module only sent "ERROR" every x seconds (so not much was received).
I let it run for some time and it also seemed to work.

Then I send the correct string "AT+CGPSINF=0\r" continuously and then large strings came back from the module and a crash occurred (sometimes) very fast. Since we looked so much to the software I ignored the software.

So I suspected some electrical interference coming from the RX input (TX from the module).
Again, on your good advice, I added a 100nF capacitor on the voltage regulator in the RX circuit.
Same result -> no improvement, but it is a must in the final design.

Then I remembered something from an old Arduino project: there I had to connect 100nF capacitors between the RX/TX lines and ground. If I didn't do that, it just didn't work. (Between an atmega328p and a MAX485.)

So I placed a capacitor of 100nF close to the atmega1284p between the RX and ground, and .... problem gone!
NO MORE STACK CRASH! (for the moment anyway)
Now I have to let this run for at least 12-16hrs. to be sure.

But it is a fact that when I remove the capacitor the stack changes almost immediately and gradually goes to zero (crash). So I hope I found it.

I never thought that this would give such a strange behavior. And still don't understand how the processor can react so strangely on this bad input signal. This is something for the classroom.

I will update this tomorrow (or sooner if it crashes again) to give the result.

Let 's hope. It's about time.

K


Korstiaan wrote:
Thanks already to everyone, but especially to you, JJ.
They should give you a medal for all your work and your support!
Your opinion places you in the minority, but thanks ;)
Quote:
Then I send the correct string "AT+CGPSINF=0\r" continuously and then large strings came back from the module and a crash occurred (sometimes) very fast. Since we looked so much to the software I ignored the software.

So I suspected some electrical interference coming from the RX input (TX from the module).

It may be as you say, or it may be the GPS module is placing a large load on the power supply. Despite the fact that you are using a bench supply, and have a bypass cap on the mcu and a (very large, I hope) bypass cap on the module, if power distribution in your breadboard isn't very clean with heavy wires, the 2A peak draw from the module can wreak havoc with the mcu.

I'd be interested to know what the string you are sending to the GPS module gets it to do. I would suspect that the problem isn't the length of the response, but rather the internal activity in the module. Are you asking it to connect to the cell network? If so, that will incur heavy peak loads during the transmission phases of the negotiation. Try sending commands that elicit long response strings from the module, but which don't do anything else.

Quote:
So I placed a capacitor of 100nF close to the atmega1284p between the RX and ground, and .... problem gone!
NO MORE STACK CRASH! (for the moment anyway)
Now I have to let this run for at least 12-16hrs. to be sure.
Even with this evidence, you should still investigate the power bus. Put a scope on the RX line, for sure, but do the same on Vcc of the mcu and the GPS module.

If indeed it proves to be simply noise on the RX line and not a power/bypass issue, that begs an answer to the question: "Why is the GPS module generating so much noise on its TX line?"

I must admit, I am not nearly an expert on line-level conversion circuits, but I am confused about your use of IC5 as the low-voltage reference. It provides a 3.3V reference for the low side of the level converter, but your GPS module will be pushing out a voltage of between 4 and 4.4 volts, depending on the voltage drop on D1, which will depend upon the current pulled by the module.

I don't know what will happen to that simple mosfet-based level converter when the input is higher than the low reference voltage, but I can't imagine it will be good.

So, why use IC5 at all? Why not simply use the cathode of D1 as the low voltage reference to the level converter?

Incidentally, D1 is rated only for 1A continuous. Even though peak currents of 2A will not be continuous, it would be a good idea to overrate that component. Better yet would be to replace it with a proper regulated source capable of delivering 2A.

Quote:
But it is a fact that when I remove the capacitor the stack changes almost immediately and gradually goes to zero (crash). So I hope I found it.
Just to be clear, if the figure returned by StackCountAll() is still above 10,000, you are not seeing the stack grow down to the bottom of free SRAM. You are seeing SRAM get corrupted. If the count from StackCount() continues to drop, that is evidence of multiple, probably randomly distributed SRAM corruptions. In this scenario, it's only a matter of time before an important variable or a stack item is clobbered, leading to a 'crash'.

If the stack pointer itself were getting corrupted, you could probably expect to see little if any debug output following the very first change in StackCount(), and it would mean that the next return from a function call would almost certainly cause an immediate crash.

Quote:
I never thought that this would give such a strange behavior. And still don't understand how the processor can react so strangely on this bad input signal. This is something for the classroom.
If indeed the RX line is delivering significant noise to the mcu, this is not strange at all. All pins must adhere to the Absolute Maximum Ratings for the device. The voltage on any pin must be > -0.5V and < Vcc+0.5V. Operation outside of these limits can cause any number of problems.

If adding a cap to the RX line at the mcu solves the problem, I'd speculate that it is smoothing out high positive or negative spikes on the line, keeping the signal within specs.

I think the next thing to try is eliminating IC5 and feeding the low reference input of the level converter with the cathode of D1, then repeat the long-response-string-from-module test.

One thing I hadn't considered, and can't find any data for, is if the GPS module has an internal 3.3 volt regulator for the 20-pin header signals you are tying into. If that is the case, my analysis above is moot.

However, in both cases, I wonder why you're using level conversion on the GPS-TX to mcu-RX line anyway? If indeed it is a 3.3V TX, the 5V mcu will have no trouble reading that as logic HIGH without a level converter.

Excellent detective work, by the way :)

JJ

JJ,

The interface circuitry is based on a module from Open Electronics and others on the net:
http://store.open-electronics.org/Arduino/Shield/GSM_GPRS_GPS_SHIELD
My SIM908 modules also come from them (breakout board - 7100-FT971)

Also:

- The module sends only 2,8V out on its TX line. This is not enough for the atmega.
- The 2A (theoretical max.) is only a burst when GPRS is active and is covered by the 470uF cap, and it lasts only 577 microseconds every 4,6 milliseconds while GPRS is sending data. So the diode is more than sufficient; the cap after the diode fills in.
- All the testing was without GPRS communication and in the future the communication lasts max. a few 100ms every 1-3 minutes. Otherwise the complete board consumes about 100mA when GPS is active and less than 15mA when not using GPS. The fraction of a second I send some bytes to my server I don't see any difference in the power supply (the 470uF takes care of that). But I repeat, for the moment GPRS is not even active.
- I have a lightweight version already in my car for several months now without any problem.

But I also think I can remove the 3,3v regulator IC5 and use the 5v directly to the mosfet?

Anyway tomorrow I will try with my 30yr old oscilloscope and see if I find something.

PS: It was stable for more than 3hrs. Then I took the RX 100nF capacitor away and 2 seconds later the stack dropped...

K


For what it's worth, the greatest stack usage usually happens during an interrupt, because that sooner or later occurs on top of everything else (unless interrupts are disabled during the call to the subroutine that uses the most stack).

So saving the SP in a global during the interrupt and printing it (in foreground) every time it decreases will usually show if it drops below the end of .bss.


Korstiaan wrote:
But I also think I can remove the 3,3v regulator IC5 and use the 5v directly to the mosfet?
Not if:
Quote:
- The module sends only 2,8V out on its TX line.
If it's really 2.8V, then a 2.8V reference would be best. You can create one with IC5 wired up as an adjustable regulator. See figure 16 in the datasheet, or get a 2.8V regulator.
Quote:
Anyway tomorrow I will try with my 30yr old oscilloscope and see if I find something.
That's all I have! :)

JJ

dak664 wrote:
For what it's worth, the greatest stack usage usually happens during an interrupt, because that sooner or later occurs on top of everything else (unless interrupts are disabled during the call to the subroutine that uses the most stack).

So saving the SP in a global during the interrupt and printing it (in foreground) every time it decreases will usually show if it drops below the end of .bss.

Good idea.

JJ

Oops,

I thought I found it:

By placing the capacitor on the rx input at the 1284p the signal was not high enough so that the 1284p could 'see' the data.
So: no data, no interrupt, no corruption.

(I monitored the line with an external terminal and I saw the data coming from the gsm/gps module but the voltage was too low for the 1284p to see it.)

Now I've placed the 100nF on the TX line from the module before the mosfet.

Now the 'dirty' data gets filtered before it gets amplified.
It does have an effect, because when I remove the cap it crashes after some time.

And running again... and waiting...

K


Seems to keep working...
Now back to load the complete program and see what it does overnight.

K


Korstiaan wrote:
By placing the capacitor on the rx input at the 1284p the signal was not high enough so that the 1284p could 'see' the data.
So: no data, no interrupt, no corruption.

(I monitored the line with an external terminal and I saw the data coming from the gsm/gps module but the voltage was too low for the 1284p to see it.)

Now I've placed the 100nF on the TX line from the module before the mosfet.

Now the 'dirty' data gets filtered before it gets amplified.

Some alternatives:
    - put the cap across RX and Vcc instead of RX and GND
    - put 2 caps: 1 across RX and Vcc and 1 across RX and GND
    - use a smaller cap, say 10 nF or even 1 nF
    - some combination of above
JJ

Hi JJ,

Tried some alternatives but the best result was the 100nF on the TX of the module.
My complete program runs fine (all night ok) and I can continue.
Now I have to go back to the drawing board and make some changes/additions.
The prototype PCB I ordered cannot and will not be used.
It is going in the trashcan.

Thanks for everything and whenever you come to Belgium, let me know!

Korstiaan


Korstiaan wrote:
Thanks for everything and whenever you come to Belgium, let me know!
It may be some time before I make it to the continent, but I will keep you in mind, thanks!

Best of luck,
JJ

Korstiaan wrote:
Tried some alternatives but the best result was the 100nF on the TX of the module.
This might be relevant.

Oddly nobody (me included) asked you what your fuse settings were. Were you by chance using the Low-power crystal oscillator?

JJ

Hi Joey,

My fuse is set in Atmel Studio on:
EXTXOSC_8MHZ_XX_16KCK_65MS
This means low fuse = 0xFF.
Must it be ?:
FSOSC_16KCK_65MS_XOSC_SLOWPWR (fuse 0xF7)?

Korstiaan


Korstiaan wrote:
My fuse is set in Atmel Studio on:
EXTXOSC_8MHZ_XX_16KCK_65MS
This means low fuse = 0xFF.
This selects the Low Power Crystal Oscillator.
Quote:
Must it be ?:
FSOSC_16KCK_65MS_XOSC_SLOWPWR (fuse 0xF7)?
This would select the Full Swing Crystal Oscillator.

It sounds like your problem might go away if you simply change the fuse to select Full Swing. It's worth a try. If you're re-designing your PCB anyway, it probably won't hurt to also use the capacitor as you have been, but I recommend first reading the material I linked in my last post. The solution found by others was to use an RC filter with a 10K resistor in series, and a 100 pF capacitor across to ground. This filter should be placed between the 1284P and the level converter T1.

Since a high slew rate on RXD1 doesn't seem to cause interference with XTAL1, another option might be to use USART0 for debugging on JP1 and USART1 for comms with the SIM908, but that would require a new PCB layout anyway. As a paranoid, I might use all three techniques to avoid any potential problems, especially since your device will be in a pretty harsh environment. I would, however, evaluate the effectiveness of each separately.

Even if the 1284P didn't have this flaw, and even if the SIM908 didn't have such a high slew rate on TXD, the Full Swing Crystal Oscillator fuse setting is a good idea anyway due to the noisy vehicle environment.

JJ

Quote:
Korstiaan wrote:
...
This selects the Low Power Crystal Oscillator.
...
This would select the Full Swing Crystal Oscillator.

Yes, I found it also in the datasheets.
Quote:

the Full Swing Crystal Oscillator fuse setting is a good idea anyway due to the noisy vehicle environment

Totally agree, as described in the datasheet. I may have read this a long time ago but no longer remembered it. It is certainly needed in my environment.

Quote:
As a paranoid...

Like me !

The prototype PCB solved the problem already a lot but it was not 100% gone if I didn't use the 100nF cap. Since the capacitor was installed I haven't had any problems.
I will change the fuse, apply some of the other recommendations, AND add watchdog functionality as a last line of defence in case something still goes wrong.

Korstiaan

Thanks.

PS: FYI, see a part of my c# program. The final results of the data sent from the atmega1284p to the server...


I don't know how much power the crystal needs to handle (max. drive) in full swing mode. Due to space restrictions, I need to use a low-profile HC49, which is rated for max 0.1mW (100uW). Is that sufficient? It works for now, but for how long?

And what about using the internal 8MHz oscillator? 3 components less and total consumption dropped during idle from 9 to 6mA.

Korstiaan


Korstiaan wrote:
I don't know how much power the crystal needs to handle (max. drive) in full swing mode. Due to space restrictions, I need to use a low-profile HC49, which is rated for max 0.1mW (100uW). Is that sufficient? It works for now, but for how long?
That's a good point, and one I'd never considered.

The only relevant thread I've found is this one. It goes into some detail, but there are no conclusive findings.

@theusch suggests:

Quote:
Short answer: Fuggedaboudit.
There is some evidence (theoretical, really) that Full Swing will run a 100uW crystal out-of-spec, but there has been no evidence of any failures of crystals that may have been subjected to these conditions.

If you're worried, read the thread thoroughly, and the application notes it refers to. The formula for calculating the drive power is in the thread, but beware of the 'wrong' formula first posted. Read the whole thread. You'll need all the specs of your crystal (ESR, etc.).

Note that the PDF from ST mentioned in the OP is no longer available at the posted link. It is ST Application Note AN2867 and can be found here. This is a later revision of the appnote that corrects for the formula error mentioned in the thread.

If you can find an HC49 crystal with a higher max drive (many can take as much as 1 mW), you should definitely be safe... but you'll probably be safe anyway.

The two Atmel appnotes mentioned in the thread are AVR042 (AVR Hardware Design Considerations) and AVR186 (Best Practices for the PCB layout of Oscillators). Both are important reads for anyone planning to use a crystal oscillator.

Quote:
And what about using the internal 8MHz oscillator? 3 components less and total consumption dropped during idle from 9 to 6mA.
The difference of 3 mA you saw does not represent the power being delivered to the crystal; rather, the biggest component of that difference is due to the change in core speed.

However, switching to the internal oscillator should eliminate the problem with RXD0 and XTAL1.

You may need to use OSCCAL to calibrate the clock speed. The specs say +/-10%, but in practice you'll find most devices are within 2%. Depending on your baud rate and other parameters, the permissible error in the USART speed may be no more than +/-1.5%. It should be possible to calibrate to within 1%, but both the factory calibration and any calibration you perform will drift with Vcc and with temperature. Vcc should be stable, but temperature is likely to see large swings in a vehicular application.

You would need an accurate time base to periodically re-calibrate. While your GPS module can give you that, you have to talk to it over a USART, so that's a chicken/egg problem. Probably the best way would be to use the asynchronous mode of Timer/Counter2 with a 32.768 kHz watch crystal on TOSC1/TOSC2.

While on the 1284P those are separate pins from XTAL1/XTAL2, it's unclear whether a watch crystal on TOSC1/TOSC2 would be subject to the same disruption that a high RXD0 slew rate causes to a system clock crystal on XTAL1/XTAL2. It is at least possible that it would be. However, any disruption to the watch crystal should not result in register corruption. At worst, I expect inaccuracies in Timer/Counter2. If you only rely upon Timer/Counter2 (at least in asynchronous mode) for calibration, and only calibrate while the GPS TXD is silent, this shouldn't pose a problem.

So I think it's workable, but given the hoops you'll have to jump through to use the 8 MHz internal oscillator, I'd say it would be better to resolve the issues with the 16 MHz crystal instead.

Quote:
PS: FYI, see a part of my c# program. The final results of the data sent from the atmega1284p to the server...
Sweet!

JJ

Quote:

total consumption dropped during idle from 9 to 6mA.

As the datasheet indicates 1mA at 8MHz and 2mA at 16MHz in Idle, the AVR's draw is only part of the situation.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


theusch wrote:
Quote:
total consumption dropped during idle from 9 to 6mA.
As the datasheet indicates 1mA at 8MHz and 2mA at 16MHz in Idle, the AVR's draw is only part of the situation.
Perhaps the OP had some peripherals enabled during the test...?... although that doesn't account for the full 3 mA...
Korstiaan wrote:
The prototype PCB solved the problem already a lot but it was not 100% gone if I didn't use the 100nF cap.
Consider using the RC filter with 10K and 100 pF instead. This filter was selected because its cut-off frequency should allow baud rates of up to 115.2 kbps while blocking the higher frequency components that trigger the problem on the 1284P. Simply using a capacitor as you have been may work, but the real cut-off frequency is not known since you don't know the output impedance of the level shifter. The cut-off frequency is likely to be much higher than is necessary (assuming a low output impedance) for the baud rates you are using, and that puts you that much closer to the frequency components that might trigger the problem.

It works fine on the bench now, but wait until you put it into a noisy vehicle, perhaps one with a bad electrical system, maybe an inverter... maybe the truck driver has a TV that he runs off that inverter. Maybe a vacuum cleaner. Any number of sources of additional noise might push the 1284P over the edge. Be conservative.

But then, since you're considering implementing several lines of defence:

    - Full Swing
    - WDT
    - RXDn Filter (C only, or RC)
    - Switching to RXD1
... I might just be paranoid ;)

JJ