Small modifications in source code, big differences in the final result


I know this question is somewhat vague, but I've been stuck on this problem all day and I don't know what is happening.

 

I created a full program that works well on ATmega328. It uses the UART for a half-duplex master-slave RS485 bus (it's the master on the bus), some LEDS, some digital inputs, some analog inputs, some digital outputs...

 

Now I noticed I have some problems when the bus configuration changes. When more slaves are added to the bus, it seems the MCU doesn't communicate with one slave anymore.

 

I started investigating and noticed that very small modifications to the source code give very different results. For example, I can comment out a function call in the main task and the problem on the bus doesn't appear anymore. But the function isn't directly related to the bus stuff. Even if I disable compiler optimization, the problem disappears.

 

I know this kind of problem can be related to stack overflow, but the RAM usage is about 70%, so I don't think that is the cause.

Interrupts? I disabled every interrupt, leaving only timer interrupt that I need for timing operations. Moreover, interrupt related problems should appear at random times, but in my case the behaviour is always the same.

 

I'm trying to reduce the code to a small test program that shows the problem, but it's very difficult: if I comment something out, the problem disappears, even though it doesn't depend on the code just removed.

 

I tried step-by-step debug, but in this case I can't reproduce the problem. Could the problem be related to timings on the bus?

 


Even if I disable compiler optimization, the problem disappears.

My initial guess (which is all any of us can do since you have not shown us a single line of code) is that you have a variable that is accessed both inside and outside of an ISR that is not declared "volatile".
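
To illustrate the kind of bug meant here, a minimal sketch. The names rx_done, uart_rx_isr_body and rx_take are hypothetical and not taken from the original program; on a real AVR the first function would be the body of an ISR(...) handler.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical flag shared between an interrupt handler and main code.
 * Without 'volatile' the optimizer may keep the flag in a register and
 * never re-read it inside a polling loop. At -O0 it is re-read every
 * time, which is why such bugs can vanish when optimization is off. */
static volatile bool rx_done = false;

/* Stand-in for the body of an interrupt handler (assumption). */
void uart_rx_isr_body(void)
{
    rx_done = true;
}

/* Called from main code: reports a pending event and clears the flag. */
bool rx_take(void)
{
    if (!rx_done) {
        return false;
    }
    rx_done = false;
    return true;
}
```

If the flag were a plain bool, a loop such as while (!rx_take()) { } could legally be compiled into an endless loop once the function is inlined.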

Regards,
Steve A.

The Board helps those that help themselves.


but the RAM usage is about 70%, so I don't think that is the cause.

The often-suggested 70% "limit" is only a guideline. It depends on your ratio of globals/bss variables to automatics. If you are a heavy user of automatics with things like:

void foo(int a, long b) {
    char bufA[64];
    int bufB[40];
    ...
}

then you can easily "eat" that last 30% that isn't already allocated.

 

If you do suspect memory starvation then the key thing is whether SP ever descends below the end of .bss at the __bss_end label. There are various ways to check this. One would be a timer interrupt that keeps sampling SP and notes its position on entry. If it is lower than last time, remember the new "low water mark" figure. If you ever get anywhere close to __bss_end you may have a problem.
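
A rough sketch of that low water mark idea, assuming avr-gcc and avr-libc. The names sp_low_water, sp_sample, sp_headroom and sp_sample_now are made up for illustration; the bookkeeping is plain C so it can be exercised off-target, and only the small part inside #ifdef __AVR__ touches the real SP register.

```c
#include <stdint.h>

/* Lowest stack pointer value seen so far (the stack grows downwards on AVR).
 * Starts at the maximum so the first sample always replaces it. */
static volatile uint16_t sp_low_water = UINT16_MAX;

/* Record one stack pointer sample and keep it if it is a new minimum. */
void sp_sample(uint16_t sp)
{
    if (sp < sp_low_water) {
        sp_low_water = sp;
    }
}

/* Bytes between 'data_end' (e.g. the address of __bss_end) and the lowest
 * SP seen. Returns 0 if the stack has already reached or passed it. */
uint16_t sp_headroom(uint16_t data_end)
{
    uint16_t low = sp_low_water;
    return (low > data_end) ? (uint16_t)(low - data_end) : 0;
}

#ifdef __AVR__
#include <avr/io.h>

/* Call this first thing inside the existing timer ISR (assumption: the
 * program already has one). SP is the stack pointer from <avr/io.h>. */
static inline void sp_sample_now(void)
{
    sp_sample(SP);
}
#endif
```

The sample is taken inside the ISR, so it includes the interrupt's own stack usage and is slightly pessimistic. It only sees the stack depth at the moments the timer fires, so a short, deep call chain between ticks can be missed.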

 

(sorry I just realised I am making an assumption that you are using avr-gcc but the "70%" figure sounds a lot like the report from Studio 6 and if so there's a strong chance you do mean avr-gcc).

 

Another way to check for low SP is to paint the whole of RAM with some known value before the program starts, let it run for a while, then break in a debugger and ensure there are still some bytes of that known value above bss and below the low SP point.

 

Yet another way to do this is to arrange to put one variable at the very end of BSS. Stick a known value like 0xDEADBEEF or 0xBABEFACE in it. Periodically check the variable and ensure it still holds that value. If it's changed then the chances are the stack bumped into BSS.
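
A minimal sketch of such a canary, under the assumption that the default avr-libc linker script is used, where .noinit is placed directly after .bss. The names stack_canary, stack_canary_arm and stack_canary_ok are hypothetical. Check the .map file to confirm where the variable actually lands in your build.

```c
#include <stdint.h>
#include <stdbool.h>

#define STACK_CANARY_VALUE 0xDEADBEEFUL

/* Assumption: with the default avr-libc linker script, .noinit follows
 * .bss, so this variable sits between .bss and the descending stack.
 * Other .noinit variables or a heap may still be placed above it. */
#ifdef __AVR__
static volatile uint32_t stack_canary __attribute__((section(".noinit")));
#else
static volatile uint32_t stack_canary;  /* host build: placement not meaningful */
#endif

/* .noinit is not initialised by the startup code, so arm it explicitly
 * at the start of main(). */
void stack_canary_arm(void)
{
    stack_canary = STACK_CANARY_VALUE;
}

/* Returns true while the canary still holds its known value. */
bool stack_canary_ok(void)
{
    return stack_canary == STACK_CANARY_VALUE;
}
```

Calling stack_canary_ok() from the main loop and lighting an LED or halting when it fails gives a cheap runtime check, though a stray pointer write could also trip it.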

 

EDIT: having read Steve's reply - that seems far the more likely.

Last Edited: Mon. Mar 16, 2015 - 03:54 PM

My initial guess (which is all any of us can do since you have not shown us a single line of code) is that you have a variable that is accessed both inside and outside of an ISR that is not declared "volatile".

The program is big and, as I wrote, I can't reduce it to a small test program that shows the problem.

I'm using only a timer interrupt with a global unsigned long variable (ticks) that is incremented in the ISR, which fires every 1 ms. I use the ticks variable for timing. ticks is declared volatile and I use the following code to retrieve its value:

typedef unsigned long ticks_t;
extern volatile ticks_t ticks;

static inline ticks_t
ticks_now(void)
{
	ticks_t t;
	ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
		t = ticks;
	}
	return t;
}
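
For timings built on a counter like this, one wrap-safe way to test a timeout is to subtract in unsigned arithmetic. A minimal sketch; ticks_expired is a hypothetical helper and not part of the original program.

```c
#include <stdbool.h>

typedef unsigned long ticks_t;

/* True once at least 'timeout' ticks have passed between 'start' and 'now'.
 * Unsigned subtraction wraps modulo 2^N, so the result stays correct even
 * if the counter rolled over between the two readings. */
bool ticks_expired(ticks_t start, ticks_t now, ticks_t timeout)
{
    return (ticks_t)(now - start) >= timeout;
}
```

A comparison such as now >= start + timeout looks equivalent but gives wrong answers around the rollover, which can show up as timing faults on a bus.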

Another way to check for low SP is to paint the whole of RAM with some known value before the program starts, let it run for a while, then break in a debugger and ensure there are still some bytes of that known value above bss and below the low SP point.

I already tried this technique:

extern uint8_t _end;		/* linker symbol */
extern uint8_t __stack;		/* linker symbol */

void StackPaint(void) __attribute__ ((naked)) __attribute__ ((section (".init1")));

void StackPaint(void)
{
	__asm volatile ("    ldi r30,lo8(_end)\n"
	"    ldi r31,hi8(_end)\n"
	"    ldi r24,lo8(0xc5)\n"
	"    ldi r25,hi8(__stack)\n"
	"    rjmp .cmp\n"
	".loop:\n"
	"    st Z+,r24\n"
	".cmp:\n"
	"    cpi r30,lo8(__stack)\n"
	"    cpc r31,r25\n"
	"    brlo .loop\n"
	"    breq .loop"::);
}

I fill the stack with 0xC5. When I stop the debugger, I see many many 0xC5 bytes available...
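
Instead of eyeballing memory in the debugger, the untouched bytes can also be counted at run time. A sketch under the same assumptions as StackPaint() above (avr-gcc, fill value 0xC5, linker symbols _end and __stack); stack_unused and stack_unused_now are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_PAINT 0xC5   /* same fill value as StackPaint() */

/* Count consecutive bytes, starting at 'low' and moving upwards, that still
 * hold the paint value. The stack grows downwards from the top of RAM, so
 * this run is the part of the gap the stack has never reached. */
size_t stack_unused(const uint8_t *low, const uint8_t *high)
{
    const uint8_t *p = low;
    while (p <= high && *p == STACK_PAINT) {
        p++;
    }
    return (size_t)(p - low);
}

#ifdef __AVR__
extern uint8_t _end;      /* linker symbol: end of static data */
extern uint8_t __stack;   /* linker symbol: initial stack pointer */

/* Unused bytes between the end of static data and the deepest stack use. */
size_t stack_unused_now(void)
{
    return stack_unused(&_end, &__stack);
}
#endif
```

A stack byte that happens to equal 0xC5 right at the boundary can make the count slightly optimistic, so treat the result as an estimate.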


Personally I would just write from 0x0060 (or 0x0100 depending on your AVR) to RAMEND. I wouldn't bother with "niceties" like _end and __stack. In .init1 (or .init3 which is where I'd personally put this) there is no bss yet. .data is copied and .bss is cleared later (in .init4), so you might as well just paint the whole RAM.

 

But if you see 0xC5's then presumably that is not the issue.

 

I guess this is why God invented JTAG debuggers ;-)
 


Could one set a data breakpoint for the "variable" at __bss_end+1?

Moderation in all things. -- ancient proverb


pozzugno wrote:
I tried step-by-step debug, but in this case I can't reproduce the problem.

Does suggest timing.

 

Of course, it could be timing and volatile and stack.

 

And we could throw in buffer overruns and bad pointers for good measure...

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

Could one set a data breakpoint for the "variable" at __bss_end+1?

Yes :) I use similar tricks with HEAP (or whatever the symbol is these days) to break when my stack smashes the heap :)

 

The data breakpoints in Studio can also take raw addresses if you want to get dirty, or any 'evaluatable symbol' that has an address.

:: Morten

 

(yes, I work for Microchip, yes, I do this in my spare time, now stop sending PMs)

 

The postings on this site are my own and do not represent Microchip’s positions, strategies, or opinions.


Don't forget buffer overflow problems.

Keith Vasilakes

Firmware engineer

Minnesota


I think I have found the problem :-)

It was an uninitialized pointer in an automatic variable (so allocated on the stack).

 

Look at the following example code:

struct foo {
    int type;
    void *ptr;
};

void g(struct foo *s);  /* forward declaration so f() can call g() */

void f(void) {
    struct foo s;
    s.type = 1;
    g(&s);              /* Call g() without setting ptr member!!! */
}

void g(struct foo *s) {
    if (s->type == 1) {
        *(unsigned char *)s->ptr = 0x00;        /* !!! */
    }
}

This is a very difficult bug to detect and debug. s.ptr holds an indeterminate value (whatever was last written to that stack location) and can point to any address in RAM.
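
One possible way to guard against this, sketched with hypothetical names (f_fixed and g_checked are not from the real program): initialise the whole struct at the point of definition, so a forgotten member is zero rather than leftover stack contents, and have the callee reject a null pointer.

```c
#include <stddef.h>

struct foo {
    int type;
    void *ptr;
};

/* Returns 0 on success, -1 if the request cannot be honoured. */
int g_checked(struct foo *s)
{
    if (s == NULL || s->type != 1 || s->ptr == NULL) {
        return -1;          /* refuse to write through a missing pointer */
    }
    *(unsigned char *)s->ptr = 0x00;
    return 0;
}

int f_fixed(unsigned char *target)
{
    /* Designated initialiser: members not named here are zeroed, so a
     * forgotten 'ptr' would be NULL instead of leftover stack contents. */
    struct foo s = { .type = 1, .ptr = target };
    return g_checked(&s);
}
```

With this pattern a missing assignment fails the same way on every run instead of depending on what the stack happened to contain.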


Does GCC have an option to warn about that?

 

It can (does!) certainly warn about scalar variables being used before having a value assigned ("uninitialised") - dunno if it can detect that part of a structure has not been assigned...?


Do you mean runtime or compile time? As f() and g() in the above could easily be in different compilation units I don't see how it could do it at compile time. I suppose if the pointer were created then written through without assignment in the same function it ought to be able to spot something like that but I think this is where lint/splint/cppcheck/etc. come into their own.