Inconsistently reproducible crash on function return on Tiny167

Go To Last Post
23 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi, I'm pulling my hair out over a crash that is reproducible only in my production code and not in a test harness. Here's where things are going wrong:

 

// (In a library file)
bool arraysAreEqual(byte* arr1, byte* arr2, byte len) {
    for (uint8_t i = 0; i < len; i++) {
        if (arr1[i] != arr2[i]) {
            return false; }
    }
// (some inactive #ifdef left out here)
    return true;
}

// (In same library file, function definition truncated)
byte test_function() {
    // byte arrayX[] defined w/ size < DEFINED_CONST_LENGTH, based on a #define
    // byte arrayY[] defined w/ size = DEFINED_CONST_LENGTH, based on a #define
    // byte ok = 0  used to accumulate other types of errors
    //...

	if (!arraysAreEqual(arrayX, arrayY, ARRAYX_LEN)) {
		I2C_Print("bad\n",0);
		return 4;
	}
	//...
	return (ok ? 0 : 0xFF);
}

// (In a function within main.c)

//...
LED_Panic(test_function());
//...

// LED_Panic() displays variable contents on a PORT and traps execution

Ok, so:

 

in my test harness, data is set up such that test_function() should return 0, and it does (test harness replaces the LED_Panic() with a different call context, but should have a functionally equivalent set-up of the data structures test_function() tests).

 

in my production code, based on the fact that I receive "bad" on my I2C monitor but the LED_Panic() call is never made, the CPU locks up at the function return. A previous variation on the added troubleshooting code resulted in the CPU rebooting after an observable delay. Note that the expected behavior is for test_function() to return 0; we are already in unexplained behavior when the arraysAreEqual() returns false.

 

I am fairly confident I'm not using the watchdog timer. I haven't yet narrowed in on what causes reset vs. hang, or tried to time the delay between last LED function call and reset in the reset case. 

 

The reported static memory usage for the production code is 167 of 512 available bytes.

 

Thanks for any suggestions on diagnosing & eliminating this behavior!

 

This topic has a solution.
Last Edited: Thu. Sep 14, 2017 - 05:54 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Have you ruled out hardware? You seem to use some pins to drive LEDs, your test harness maybe OK but the other boards may not be.

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Should've been more specific: "test harness" refers to a software-only unit testing frame work in this case. Both of the cases I described, including production code misbehaving, are running on the same hardware in exactly the same hardware 'test harness' which is just an I2C and ISP connection to the same PCB. That is, the only change between the two scenarios I describe is what code & EEPROM image was loaded onto the same hardware. 

 

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Have you try your code in simulator? To see where the code flows cause the problem.
.
MG

I don't know why I'm still doing this hobby

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Unfortunately, the simulator does not appear to be a good option for 2 reasons:

  1. The ATtiny167 is not supported by the simulator shipped with AS 7. I've been advised that it is supported by the earlier simulator shipped with AS 4.18.
  2. The "data setup" that I refer to involves I2C write transactions to a peripheral storage device, and test_function() is retrieving and then comparing parts of that data to reference values.

 

The questions that may help me plan best next debugging steps include:

  • Are there modes of memory corruption other than a stack overflow that could be going on here? Could master-mode I2C transactions (no HW UART usage) create problems?
  • What is actually going on, or could be going on, when the ATtiny167 hangs upon a function return?
  • What might it mean that with a change to code that generates debugging output surrounding the misbehaving code, the behavior changes from hanging to resetting immediately?

 

Current planned next debugging steps:

  • Reproduce more of the specific code sequence in main.c in the unit test fixture, to try to reproduce the bad behavior. This is problematic because my text fixture uses significant memory and I've already run into putative stack overflows because of it.
  • I will likely also try using my I2C console to print out the arrays being compared that I expect to be identical but are evaluated as not.
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Not sure about your most recent questions, but I'm curious so I have a few (simple) questions:

 

How do you know LED_Panic() isn't called?
Have you tested LED_Panic() in isolation?
Are the output pins used by LED_Panic() set to outputs?
Why is test_function() expected to return 0? It looks like it can only return 0 if ok != 0. Is there some code in the ... section to do this?
I presume ARRAYX_LEN < DEFINED_CONST_LENGTH?

 

Some things to try would be doing two I2C_Prints in a row to make sure it isn't getting stuck in there, and using LED_Panic(1) at various places in the code to see where it gets to and where it doesn't. You could also temporarily declare all your larger arrays and data structures static to get a better idea of actual ram usage.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 1

eecharlie wrote:
The ATtiny167 is not supported by the simulator shipped with AS 7.
tiny167 has debugWIRE; connect it to an Atmel-ICE.

debugWIRE might not have data breakpoints so may have to add source code to test data then a breakpoint on data failure code.

fyi, tinyAVR 1-series (tiny1616) has two hardware breakpoints (address comparators on the unified memory space) and run-time output of the SP.

eecharlie wrote:
Are there modes of memory corruption other than a stack overflow that could be going on here?
Yes one of which is a buffer overflow; apparently arraysAreEqual is being mis-invoked (or earlier so unwind the stack on a failure)

eecharlie wrote:
Current planned next debugging steps:
Before instrument and test and become a debugger operator consider lint and C assert.

If possible, run the source code through Microsoft Visual Studio -analyze (a zero price lint) else "a" lint.

Sprinkle asserts (i.e. arraysAreEqual); will increase code size a bit.

eecharlie wrote:
Reproduce more of the specific code sequence in main.c in the unit test fixture, to try to reproduce the bad behavior.
If you think or intuit (better) that a source code snippet has a defect then can run that snippet through on-line PC-lint (but will need to copy source code from headers)

 


http://www.microchip.com/wwwproducts/en/attiny167

http://www.microchip.com/wwwproducts/en/attiny1616

Microsoft

Microsoft

https://docs.microsoft.com/en-us/visualstudio/code-quality/c6029

warning C6029: possible buffer overrun in call to <function>: use of unchecked value

...

via https://docs.microsoft.com/en-us/visualstudio/code-quality/code-analysis-for-c-cpp-warnings

from https://docs.microsoft.com/en-us/cpp/build/reference/analyze-code-analysis

The Ganssle Group logo

The Ganssle Group

Automatically Debugging Firmware

By Jack Ganssle

Major rewrite: May, 2014

http://www.ganssle.com/dbc.htm

...

... (you do use Lint, don't you? It's an essential part of any development environment) ...

...

http://www.gimpel-online.com/OnlineTesting.html (DIY C near the bottom)

via http://www.gimpel.com/html/index.htm (Interactive Demo)

 

"Dare to be naïve." - Buckminster Fuller

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

LED_Panic() blinks an LED at about 1Hz forever, and I am not observing that. Yes, I've tested it in isolation, and the pins it uses are dedicated for LEDs, so I'm confident it does what it's supposed to. In particular I've put that function call just before the misbehaving test_function() and it behaves as expected.  I can also trade it out for an I2C print (haven't specifically tried that yet).

 

I misstated the setup of ok; it's actually e.g.

bool ok = true;
ok &= test_function();

and it's used to catch I2C bus errors as distinct from data errors. So the nominal successful case wraps up with ok = true, so the return value is 0.

 

Correct that ARRAYX_LEN < DEFINED_CONST_LENGTH

 

Responding to gchapman:

 

I've tried using debugWIRE but am unable to set breakpoints or single step at all, typically after enabling code that enables an external crystal. I may try temporarily disabling the crystal code to see if I can regain breakpoint functionality.

 

You've given me some helpful homework on code quality. I didn't realize one could use sizeof() on C arrays which is embarrassing. Before diving into size checking and lint though, in this particular case arraysAreEqual() only reads data so if I specify a too-long length it should read otherwise-allocated memory and possibly erroneously return false, correct? But it shouldn't be able to cause a corruption of the calling function's execution beyond the bad return value, in contrast to the calling function itself causing a hang upon return, right?

 

You've also given me a good idea to write an assert() macro that pushes things to my I2C monitor and can probably even be left in production code where no one will be listening.

This reply has been marked as the solution. 
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The precursor to me answering your questions is me doing more than a cursory code review.

eecharlie wrote:
You've also given me a good idea to write an assert() macro that pushes things to my I2C monitor ...
TWI interfacing may cause a side effect unless it's simple (no calls, register writes and reads only especially in assembly language, exit (i.e. not a return))

An assertion is a terminal condition such that the AVR spins after spitting data waiting for the operator to press the reset button (operator records the data before the reset)

A diagnostic method writes breadcrumbs to a spare AVR port, akin for TWI, that are recorded via a logic analyzer (AVR port, standard I/O UART, error output UART, TWI, SPI, etc)

Similar to the tracing feature in MPLAB X.

eecharlie wrote:
... and can probably even be left in production code where no one will be listening.
Yes as doesn't add much to the program space.

After a customer or operator reports an issue then would attach the logic analyzer to record assert output and attempt to repeat the issue.

 


embedded.com

Troubleshooting real-time software issues using a logic analyzer

David B. Stewart, PhD, InHand Electronics, Inc.

February 27, 2012

http://www.embedded.com/design/debug-and-optimization/4236800/Troubleshooting-real-time-software-issues-using-a-logic-analyzer

Using MPLAB® REAL ICE™ In-Circuit Emulator for MPLAB X IDE (Poster)

(bottom right, I/O Port Trace)

via http://www.microchip.com/mplab/mplab-x-ide

 

"Dare to be naïve." - Buckminster Fuller

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

eecharlie wrote:
I've tried using debugWIRE but am unable to set breakpoints or single step at all, typically after enabling code that enables an external crystal.
tiny167 has a crystal oscillator errata.

eecharlie wrote:
I may try temporarily disabling the crystal code to see if I can regain breakpoint functionality.
Could change to external clock and feed it from an external oscillator or a signal generator; that way maintains clock accuracy and the clock appears before debugWIRE is functional (instead of debugWIRE then crystal)

An AVR is very limited on clock jumps (XMEGA less so) though I do see the tiny167 dynamic clock feature; that may be why debugWIRE drops.

Usually an AVR crystal oscillator is running before wake completes; therefore, crystal before debugWIRE.

 

"Dare to be naïve." - Buckminster Fuller

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

gchapman wrote:

eecharlie wrote:
I've tried using debugWIRE but am unable to set breakpoints or single step at all, typically after enabling code that enables an external crystal.
tiny167 has a crystal oscillator errata.

Good to know!

 

Quick update: I threw in assertions for the two array lengths going into arraysAreEqual() and they are failing when the production code is running but not in my unit test framework. I expect to close in on the problem soon. gchapman your linked article including the example assert() macro definition was very helpful. assert() is looking like a good compromise on utility vs eating memory (I am of course counting how many characters are in the expression passed to assert since every call will cost that much data memory...)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 2

So of course using sizeof() inside of arraysAreEqual() doesn't work as intended since C passes arrays as pointers to the first element...

 

Ok, so I learned about the comma, #, and ## operators this morning, and I have resolved my issue by discovering that I was incorrect in my assurance that ARRAYX_LEN < DEFINED_CONST_LENGTH. I was reading past the end of one of my arrays in a call to arraysAreEqual(). For everyone's reference, here is the macro and declaration I used to capture bad calls to arraysAreEqual():

 

// Compares two arrays, returns true if equal.
#define arraysAreEqual(arr1, arr2, len) ((bool) (ASSERT(sizeof(arr1)>=len), ASSERT(sizeof(arr2)>=len), _arraysAreEqual(arr1, arr2, len)))
bool _arraysAreEqual(byte arr1[], byte arr2[], byte len);

 

But more importantly than reading past the end of an array, I was in fact writing past the end of that array in another place. So that is consistent with my expectation that reading can only result in junk data, but writing can corrupt program flow.

Last Edited: Thu. Sep 14, 2017 - 05:49 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

gchapman, further thoughts on best practices for debugging & deployment:

 

This system is a simple and mass-produced product used by non tech-savvy users. One inevitable source of multiple failure modes is the PCB getting wet.

 

Right now, for final debugging, I like the idea of asserts trapping execution and waiting for an external console to be connected. But once these ship, my thinking is that failed asserts should result in a device reset, to allow the most likely and most graceful recovery to usability. Maybe I can put breadcrumbs in nonvolatile memory that's not otherwise touched.

 

I hear you on avoiding side effects of complex communication engaged by assert. There is an entire class of errors that that will obfuscate (such as caused by an invalid bus state), but having already hammered out the communication code fairly well, it's high-level program logic that seems the dominant source of my remaining bugs.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

eecharlie wrote:
... and they are failing when the production code is running but not in my unit test framework.
If debugWIRE was functional then you could unwind the stack to the interim cause.

unit test framework - ideally can create an integration test that will reproduce the problem (without the entire application); useful for code review, lint, static analysis, more test cases for regression testing with CI.

eecharlie wrote:
assert() is looking like a good compromise on utility vs eating memory
Ideally most of the source code can be unit and integration tested on :

  • a PC or Mac
  • tiny167 in AVR Studio 4's simulator
  • tiny167

Swap the generic assert macro with an embedded assert macro when on tiny167.

An embedded assert macro will take some thought (data copying and/or transmission, exit is replaced)

eecharlie wrote:
(I am of course counting how many characters are in the expression passed to assert since every call will cost that much data memory...)
assert is a macro instead of a function so it's relative to the current stack frame.

assert's impact is on program space and side effects.

Ganssle's assert macro makes apparent any side effects via lint (calls in assert, assignment operators instead of relational operators)

 

May you have victory!

 

"Dare to be naïve." - Buckminster Fuller

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

lint may have detected the two defects.

Relative to your effort, PC-lint is inexpensive wink

Microsoft Visual Studio has a zero price lint that can the invoked from the shell :

Microsoft

Microsoft

Code Analysis for C/C++ Overview

2016-11-4

https://docs.microsoft.com/en-us/visualstudio/code-quality/code-analysis-for-c-cpp-overview

...

Command-line support

In addition to the full integration within the development environment, developers can also use the analysis tool from the command line, as shown in the following example:

C:\>cl /analyze Sample.cpp

Some CI have lint plug-ins that are zero price :

http://hudson-ci.org/PluginCentral/

...

checkstyle

...

codescanner

...

coverity (not zero price other than for agreed upon FOSS)

cppcheck

...

 https://plugins.jenkins.io/

(Build Management, Build Reports radio button)

(Checkstyle, CppCheck, may be more)

 

"Dare to be naïve." - Buckminster Fuller

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

eecharlie wrote:
One inevitable source of multiple failure modes is the PCB getting wet.
Water resistant coatings inside mobile phones are popular.

Such may be at the local hardware store; can trial it on a test PCBA.

Where that may come up short is for an operator replaceable battery.

eecharlie wrote:
... it's high-level program logic that seems the dominant source of my remaining bugs.
Should be able to locate some of those bugs while on a PC or Mac via lint then via unit and integration testing.

 


NeverWet

Commercial & Industrial

http://www.neverwet.com/commercial-industrial/index.php

"Dare to be naïve." - Buckminster Fuller

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Great.

 

I also have a unit testing and mocked hardware environment that builds on x86_64. The AVR test environment is only for things that I can't easily mock or don't produce the same errors in the mock environment.

 

PC-lint looks good but at ~$400 I have to at least check into other options. What about command-line splint applied to my x86_64 unit test project? And are there others I should consider that have already been integrated into AS 7?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

eecharlie wrote:
What about command-line splint applied to my x86_64 unit test project?
splint has been in at least a few threads here; cppcheck may be more popular here.

For non-FOSS there's Microsoft Visual Studio Community and "cl.exe /analyze <filename.c>"

eecharlie wrote:
And are there others I should consider that have already been integrated into AS 7?
In the past day, I went through Atmel Gallery and couldn't locate the lint and static analysis tools that were once there.

IIRC, those Atmel Studio extensions were for Atmel Studio 6.

IIRC : Cppcheck, Goanna

Goanna is an extension in Microsoft Visual Studio :

Microsoft

Visual Studio

Marketplace

Goanna Studio - Static Analysis for C/C++

https://marketplace.visualstudio.com/items?itemName=RedLizardSoftware.GoannaStudio-StaticAnalysisforCC

...

Works with

Visual Studio 2005, 2008, 2010, 2012

...

Red Lizard Software is the first company to combine the technologies of static analysis and model checking to create a unique static analysis solution. The analysis is performed quickly, often in a matter of seconds, does not require test cases or even fully developed code, reports bugs precisely and has one unique goal: to bring higher quality software to market faster.

but Red Lizard Software may be dead (inactive URL) :

http://redlizards.com/

Oh ... Synopsys ... that's not a surprise as the demand for C and C++ static analysis has greatly increased.

 


http://splint.org/

http://cppcheck.sourceforge.net/

https://www.visualstudio.com/vs/community/

https://gallery.atmel.com/

https://en.wikipedia.org/wiki/Red_Lizard_Software

 

Edits : Wikipedia, non-FOSS

 

"Dare to be naïve." - Buckminster Fuller

Last Edited: Thu. Sep 14, 2017 - 10:41 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thanks. I just discovered in a fairly old makefile that I had been using scan-build / clang. I liked this one because I only had to go through the suffering of creating a working Makefile once (for my cmocka unit test build), and then scan-build could use it to find all the included libraries instead of me learning how to convey all that same information to a code analyzer. However I had it set up is now broken, but Ubuntu has clang as a standard package.

 

Splint falls on its face out of the box without knowing where include directories are for avr-glibc etc.

 

Current roadblock is that scan-build executable is a Python script that appears to expect 2.7 but my environment default is 3...

 

Hmm Synopsys (SpyGlass Lint?) is a little daunting in their offerings.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

eecharlie wrote:
... clang.
Thanks for I had forgotten about the Clang Static Analyzer lint in LLVM.

fyi, AVR is in LLVM 5.

eecharlie wrote:
Splint falls on its face out of the box without knowing where include directories are for avr-glibc etc.
Ideally that's a post config setup issue; splint should be good to go as it went through Debian quality control.

eecharlie wrote:
Current roadblock is that scan-build executable is a Python script that appears to expect 2.7 but my environment default is 3...
Another Python 2.7 environment is PlatformIO.

PlatformIO has tiny167 with forthcoming AVR GDB client and AVaRICE (AVR GDB server, Atmel-ICE, AVR Dragon, AVRJTAGICE mkII, etc)

PlatformIO Plus has static analysis (lint?) "soon".

 


http://releases.llvm.org/5.0.0/docs/ReleaseNotes.html#changes-to-the-avr-target

https://packages.debian.org/search?keywords=splint&searchon=names&suite=all&section=all

https://packages.debian.org/search?keywords=cppcheck&searchon=names&suite=all&section=all

http://platformio.org/boards?filter%5Bplatform%5D=atmelavr (ATtiny167 into MCU)

http://platformio.org/pricing#solution-static-project-analysis

 

"Dare to be naïve." - Buckminster Fuller

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Got scan-build and scan-view working. Easily installed in linux (Ubuntu) via apt-get install clang

 

There is a bug in the ubuntu package for scan-view but it's not hard to do a command-line workaround.

 

For anyone following this, in my existing Makefile for an x86-based unit testing framework, I have

scan-build: clean
    scan-build -V -o . make

This will break when it tries to call scan-view, at which point you can use the above workaround.

 

Having gotten all that working, this tool isn't catching much at all, so I'm not sure how to interpret that. Maybe I'll just sleep a little better :)  But I'm definitely going to add more asserts to check array lengths and probably make some nested calls more cumbersome to ensure I'm checking array lengths at each point...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

eecharlie wrote:
Hmm Synopsys (SpyGlass Lint?) is a little daunting in their offerings.
SpyGlass Lint is for design of programmable digital logic in hardware description languages (VHDL, Verilog, etc)

https://www.synopsys.com/verification/static-and-formal-verification/spyglass/spyglass-lint.html

Synopsys aquired Coverity so that may be the way in.

http://www.coverity.com/

 


Synopsys

Static Application Security Testing (SAST)

Manage risk, costs, and compliance by building better software.

https://www.synopsys.com/software-integrity/security-testing/static-analysis-sast.html

...

Enterprise-scale speed, accuracy and agility with Synopsys Static Analysis (Coverity)

...

 

Edit : typo

 

"Dare to be naïve." - Buckminster Fuller

Last Edited: Fri. Sep 15, 2017 - 12:19 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

eecharlie wrote:
But I'm definitely going to add more asserts to ...
lint plus assert is probably enough.

A static analyzer in addition to lint would reduce risk at a price (purchase order) and cost (training, mentoring, practicing, setup to reduce false positives, etc)

PC-lint is an order of magnitude less expensive than a static analyzer.

PC-lint can be a bit difficult to setup; could start with the PC-lint setup file from a code review book then modify to fit.

 

Gamasutra: The Art & Business of Making Games

Gamasutra

In-Depth: Static Code Analysis

by John Carmack

December 27, 2011

https://www.gamasutra.com/view/news/128836/InDepth_Static_Code_Analysis.php

...

(bottom)

The takeaway action should be: If your version of Visual Studio has /analyze available, turn it on and give it a try. If I had to pick one tool, I would choose the Microsoft option. Everyone else working in Visual Studio, at least give the PVS-Studio demo a try. If you are developing commercial software, buying static analysis tools is money well spent.

A final parting comment from Twitter:

Dave Revell: @dave_revell The more I push code through static analysis, the more I'm amazed that computers boot at all.

 

"Dare to be naïve." - Buckminster Fuller