Complex nested function returns incorrect values, but only when run in context


Morning all,

I've got an odd problem, and before posting lots of detail about it I'm wondering whether there are some general 'gotchas' to look into.

In essence, I've got an application split across ~15 source files with everything driven by a co-operative scheduler. The scheduler itself 'ticks' on the overflow interrupt of a hardware timer; the 'tick' ISR then counts in software to derive the different task frequencies and set task flags. The scheduler dispatches the tasks in main by checking for set task flags.
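To make the pattern described above concrete, a minimal sketch of a tick-driven co-operative scheduler might look like the following. It is an illustration under stated assumptions, not the OP's code: the task names, periods, `tick_isr()` and `dispatch_once()` are all hypothetical, and the ISR body is written as a plain function so the sketch can run on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task periods, in timer ticks (the fast task runs every tick). */
#define PERIOD_MEDIUM 10u
#define PERIOD_SLOW   100u

/* Flags are written by the ISR and read by main, so they must be volatile.
   Each is a single byte, so reads and writes are atomic on an 8-bit AVR. */
volatile bool flag_fast, flag_medium, flag_slow;

/* Run counters, used only to demonstrate the dispatch rates. */
uint16_t runs_fast, runs_medium, runs_slow;

/* Body of the timer-overflow ISR. On the target this would sit inside
   ISR(TIMERx_OVF_vect) { ... }; here it is a plain function. */
void tick_isr(void)
{
    static uint8_t count_medium, count_slow;

    flag_fast = true;
    if (++count_medium >= PERIOD_MEDIUM) { count_medium = 0; flag_medium = true; }
    if (++count_slow   >= PERIOD_SLOW)   { count_slow   = 0; flag_slow   = true; }
}

static void task_fast(void)   { runs_fast++; }
static void task_medium(void) { runs_medium++; }
static void task_slow(void)   { runs_slow++; }

/* One pass of the dispatcher that main() would repeat forever. Each flag
   is cleared before its task runs, so a tick arriving while the task is
   executing is not lost. Checking in a fixed order gives simple priority. */
void dispatch_once(void)
{
    if (flag_fast)   { flag_fast = false;   task_fast(); }
    if (flag_medium) { flag_medium = false; task_medium(); }
    if (flag_slow)   { flag_slow = false;   task_slow(); }
}
```

Calling `tick_isr()` followed by `dispatch_once()` 100 times runs the fast task 100 times, the medium task 10 times and the slow task once.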

 

My application is behaving as designed across the 50 or so tasks that it is performing, bar one little niggle at the moment. I'm reading 8 different sensors; most have their engineering values calculated by simple linear formulae, but one requires a lookup table due to non-linearity.

The nested set of functions that deal with this sensor in effect reads the raw ADC value, compares it to the lookup table to determine which pair of entries it falls between, and then interpolates it before storing the result. The issue is that my interpolate function returns the wrong value... BUT only when run in the full application. If I strip the code segments out and run them in isolation, it calculates exactly the value I'd expect, and the formulae themselves are correct. I originally had integer overflows that caused this problem even in isolation, but those are resolved.
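The interpolation step described above could be sketched as follows. This is not the OP's function (which wasn't posted): the name `interp_linear()`, the 10-bit ADC input range and the `int16_t` output scaling are assumptions. It shows one way to avoid the kind of integer overflow the OP mentions, by widening the intermediate product to 32 bits.

```c
#include <stdint.h>

/* Linear interpolation of y at x between two calibration points
   (x0, y0) and (x1, y1), using integer arithmetic only.

   Assumptions (hypothetical): x values are raw 10-bit ADC counts
   (0..1023), y values are engineering units scaled to fit in int16_t,
   and x0 < x1.

   The product (x - x0) * (y1 - y0) can reach 1023 * 65535, which
   overflows a 16-bit int (int is 16 bits on avr-gcc), so the
   intermediate values are computed in int32_t. */
int16_t interp_linear(uint16_t x, uint16_t x0, uint16_t x1,
                      int16_t y0, int16_t y1)
{
    int32_t dx   = (int32_t)x  - (int32_t)x0;
    int32_t span = (int32_t)x1 - (int32_t)x0;
    int32_t dy   = (int32_t)y1 - (int32_t)y0;

    return (int16_t)(y0 + (dx * dy) / span);
}
```

For example, `interp_linear(1023, 0, 1023, -32000, 32000)` needs an intermediate of 65,472,000 and still returns 32000, which a 16-bit intermediate could not hold.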

 

So, before I pull out all the various bits of code and post them here, some general questions:

  • What are the typical 'gotchas' to look for in co-operative scheduling that could cause code to behave differently from running it in isolation?
  • Since this one function is the only one that takes longer to execute than my timer overflow period, I thought the ISR might be interrupting the task (though there are no shared variables), so I wrapped the task in cli(); and sei();, to no avail.
  • Equally, if the above is literally of no help at all without code, shout and I'll post it!

Thanks as always

This topic has a solution.
Last Edited: Mon. Dec 30, 2019 - 01:14 AM

Are you using volatile as needed?  Are you handling any atomic accesses properly?

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!


Any variable accessed by any ISR is volatile, yes.

As far as I'm aware, I don't have any accesses that need to be atomic (is there a list anywhere, rather than mentions spread throughout the datasheets?). I've disabled interrupts during execution of all pending tasks as a pretty nuclear option just to check, and nothing changes: the problem still exists.


I take it at this stage you're just looking for "What to check next?" type comments:

  • 50 tasks seems a bit excessive; I don't even think FreeRTOS could cope with that number. Have you tried running your scheduler with just one extra task?
  • I guess there is a high memory load with that many tasks, so check that your stack isn't eating your variables.
  • Write some checking code to trap the fault and, using your debugger, set a breakpoint so you can examine the variables in your function.

 


You might want to see how much stack you are using. Is it possible to replicate the problem in the simulator?
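One common way to answer "how much stack am I using?" on an AVR is stack painting. The following is a hedged sketch rather than a drop-in: the function names and the 0xC5 pattern are arbitrary choices, and on real hardware you would paint the region at start-up (before main) between the end of your static data/heap and the stack, then read the result later with the debugger.

```c
#include <stddef.h>
#include <stdint.h>

/* Stack painting: fill the unused RAM between the end of static data
   (or the heap) and the stack with a known pattern at start-up, then
   later count how many bytes still hold that pattern. The stack grows
   downwards, so untouched bytes at the low end show the worst-case
   headroom seen so far.

   On an AVR the region would typically start at a linker symbol such
   as _end or __heap_start; here a plain array stands in for it so the
   idea can be tested on a host. A stack byte that happens to equal the
   pattern makes the estimate slightly optimistic. */

#define PAINT_BYTE 0xC5u

void stack_paint(uint8_t *lo, uint8_t *hi)
{
    for (uint8_t *p = lo; p < hi; p++)
        *p = PAINT_BYTE;
}

/* Returns the number of bytes, counted up from 'lo', that were never
   overwritten: the minimum free stack observed so far. */
size_t stack_unused(const uint8_t *lo, const uint8_t *hi)
{
    const uint8_t *p = lo;
    while (p < hi && *p == PAINT_BYTE)
        p++;
    return (size_t)(p - lo);
}
```

If the reported headroom ever approaches zero, the stack is overwriting the top of your static data.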


Is the "overflow interrupt of a hardware timer" that initiates the tasks being cleared at the start of the ISR, or held until the end? If the former, it may well be re-interrupting itself before the tasks have completed. Are some sensors read through TWI? There can be internal interrupts used in processing bytes on the TWI bus that can get mixed up with your own interrupts. Are you doing an ADC acquisition for every sensor? Perhaps there isn't enough acquisition time between tasks?

Just guessing here... of course.


The overflow flag is cleared by hardware on entry to the ISR, as is the global interrupt enable flag, so no re-entry is possible.


N.Winterbottom wrote:
I take it at this stage you're just looking for "What to check next ?"  type comments

Some more of those:

  • buffer overruns
  • uninitialised pointers
  • "dangling" pointers
  • other pointer faux pas ...
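Since a lookup table is involved, buffer overruns are worth checking in the search step. A hedged sketch of a bracket search that cannot index past either end of the table might look like this; the function name and table layout are hypothetical, and as written it reads the table from RAM.

```c
#include <stdint.h>

/* Finds the segment i such that table[i] <= x <= table[i + 1] in an
   ascending table of n breakpoints, without ever reading outside the
   array. Out-of-range inputs are clamped to the first or last segment,
   so the caller can always safely access table[i] and table[i + 1].

   Assumes n >= 2 and that the table is sorted in ascending order. */
uint8_t find_segment(const uint16_t *table, uint8_t n, uint16_t x)
{
    if (x <= table[0])
        return 0;
    if (x >= table[n - 1])
        return (uint8_t)(n - 2);

    uint8_t i = 0;
    /* The loop stops at n - 2 at the latest, so table[i + 1] is valid. */
    while (i < n - 2 && x > table[i + 1])
        i++;
    return i;
}
```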

 

EDIT - and ...

 

jtw_11 wrote:
Any variable accessed by any ISR is volatile, yes

Remember that 'volatile' doesn't solve atomicity issues ... 
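As a rule of thumb, any multi-byte variable shared between an ISR and main, and any read-modify-write of a shared variable in main, needs protecting, even if it is volatile. A minimal sketch of taking a consistent copy follows; the variable and function names are hypothetical, and the interrupt macros are host stand-ins (an assumption) for what you would use on the target.

```c
#include <stdint.h>

/* On an 8-bit AVR a 16- or 32-bit variable is read one byte at a time,
   so an ISR can update it halfway through a read in main(). 'volatile'
   only stops the compiler caching the value; it does not make the read
   atomic. The usual fix is to copy the value with interrupts disabled.

   On the target these macros would typically be replaced by
   ATOMIC_BLOCK(ATOMIC_RESTORESTATE) from <util/atomic.h>, or by saving
   SREG, calling cli() and then restoring SREG. Here they are no-ops so
   the sketch compiles on a host. */
#define IRQ_DISABLE() ((void)0)
#define IRQ_RESTORE() ((void)0)

volatile uint32_t adc_accumulator;   /* hypothetical: updated by an ISR */

/* Hypothetical ISR body. ISRs run with interrupts disabled on AVR (no
   nesting by default), so this update needs no extra protection. */
void adc_isr(uint16_t sample)
{
    adc_accumulator += sample;
}

/* Returns a consistent copy of the shared value for use in main(). */
uint32_t read_accumulator(void)
{
    uint32_t copy;
    IRQ_DISABLE();
    copy = adc_accumulator;
    IRQ_RESTORE();
    return copy;
}
```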

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
Last Edited: Sun. Dec 29, 2019 - 11:41 AM

>one however requires a lookup table

In RAM or flash (or EEPROM)? If flash, then that's one less potential corruption point to check off the list.

>reads the raw ADC value

Replace that part of the function with a simulated ADC value (or sequence through them all). If the problem persists, it sounds like a stack issue. If it goes away, then maybe tasks 0-49 are changing the ADC in some way that upsets task 50 (like switching between 8- and 10-bit resolution).
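The suggestion above can be turned into a small test harness. In this sketch the table values and function names are hypothetical: the conversion takes the raw reading as a parameter, so a real ADC read and a simulated value are interchangeable, and a sweep pushes all 1024 possible 10-bit values through it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 5-point calibration table (ADC counts -> units). */
static const uint16_t cal_x[] = {0, 200, 500, 800, 1023};
static const int16_t  cal_y[] = {0, 150, 600, 900, 1000};
#define CAL_N (sizeof cal_x / sizeof cal_x[0])

/* Conversion that takes the raw ADC value as a parameter instead of
   reading the ADC itself. */
int16_t sensor_convert(uint16_t raw)
{
    uint8_t i = 0;
    if (raw >= cal_x[CAL_N - 1])
        return cal_y[CAL_N - 1];
    while (raw > cal_x[i + 1])   /* raw < last breakpoint, so this stops */
        i++;
    int32_t num = ((int32_t)raw - cal_x[i]) * ((int32_t)cal_y[i + 1] - cal_y[i]);
    return (int16_t)(cal_y[i] + num / ((int32_t)cal_x[i + 1] - cal_x[i]));
}

/* Feeds every possible 10-bit value through the conversion and checks
   that the output stays within the table's range and never decreases
   (this table is monotonic). Returns true if all values pass. */
bool sweep_all_adc_values(void)
{
    int16_t prev = sensor_convert(0);
    for (uint16_t raw = 0; raw <= 1023; raw++) {
        int16_t y = sensor_convert(raw);
        if (y < cal_y[0] || y > cal_y[CAL_N - 1] || y < prev)
            return false;
        prev = y;
    }
    return true;
}
```

On the target, the same sweep can be run from a task inside the full application; if it passes in isolation but fails there, the environment (stack, memory placement, ADC configuration) is the suspect rather than the arithmetic.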

 

It also sounds like either you know the ADC input value and the expected lookup value, and can see both to know you have a problem, or you are assuming the ADC is supposed to be a certain value and expecting a certain lookup value, but do not actually know the ADC value used for the lookup. If the latter, you also cannot know whether the lookup value is correct, since you do not know what is being looked up.

Last Edited: Sun. Dec 29, 2019 - 12:00 PM

50 tasks!!!

Which AVR is this running on?

I would guess you're running into a stack or heap issue.

jim

 


Making a lot of guesses here: given that the cooperative scheduler swaps on a timer overflow interrupt, I guess you are assuming that all tasks complete by the time that interrupt hits. I also assume your scheduler spins waiting for some flag from the interrupt handler; that flag must be declared volatile. Do you have a way to make sure the tasks complete in time? You could add a check in the interrupt handler to see whether the scheduler is ready and, if not, turn on a red LED or something.
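The overrun check suggested above might be sketched like this. The flag names and `scheduler_poll()` are hypothetical; on the target, `tick_isr()` would be the body of the timer-overflow ISR and a latched `overrun` could drive the red LED.

```c
#include <stdbool.h>
#include <stdint.h>

/* If the ISR finds that the previous tick has not been picked up yet,
   main() is running late: some task took longer than a tick. */
volatile bool tick_pending;      /* set by the ISR, cleared by main()  */
volatile bool overrun;           /* latched once any tick is late      */
volatile uint8_t overrun_count;  /* saturating count of late ticks     */

void tick_isr(void)
{
    if (tick_pending) {          /* previous tick still not handled */
        overrun = true;
        if (overrun_count < UINT8_MAX)
            overrun_count++;
    }
    tick_pending = true;
}

/* Called from the main loop; returns true if a tick was consumed. */
bool scheduler_poll(void)
{
    if (!tick_pending)
        return false;
    tick_pending = false;
    /* ... run the tasks due on this tick ... */
    return true;
}
```

Reading `overrun_count` in the debugger after a long run gives a quick answer to "do the tasks really complete in time?".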



jtw_11 wrote:
what are the typical 'gotchas' to look for in co-operative type scheduling that could cause execution to behave differently to running it in isolation?
Misuse of synchronization operations (OS, RTOS, framework, primitives): mutexes and semaphores must be acquired in the correct order (sequence).

Results: deadlock, task starvation, races

Solutions: swap the order, or use higher-level synchronization (protected queues, protected pools, mailboxes)

Priority inversion

Results: rates are no longer monotonic, jitter, watchdog bites

Solution: correct the OS/RTOS/framework, or correct the application (enable detection of priority inversion)

jtw_11 wrote:
... if the above is literally of no help at all without code, shout and i'll throw it up!
Before that, pipe the source code through one or more linters.

Some C coding standards identify concurrency issues; some linters flag concurrency defects (as true positives, as "hints", or as false positives).

Most linters analyze a single file rather than the whole program; those can still be of use.

Assertions will reduce the defect density, though it takes significant effort to sprinkle them throughout the files.

Ideally, a source code snippet that reproduces the issue will ease the review and knowledge transfer.
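As an illustration of the assertion advice, here is a hedged sketch of an interpolation routine guarded by pre- and postconditions (the name is hypothetical). The postcondition, that the result must lie between the two bracketing table values, is the kind of check that flags a random value coming out for a 0 input. avr-libc also provides `<assert.h>`, but on a board without a console a custom check that lights an LED may be more practical.

```c
#include <assert.h>
#include <stdint.h>

/* Interpolation with assertions on its pre- and postconditions.
   Assertions compile away when NDEBUG is defined. */
int16_t interp_checked(uint16_t x, uint16_t x0, uint16_t x1,
                       int16_t y0, int16_t y1)
{
    assert(x0 < x1);             /* precondition: valid segment    */
    assert(x0 <= x && x <= x1);  /* precondition: x inside segment */

    int32_t num = ((int32_t)x - x0) * ((int32_t)y1 - y0);
    int16_t y = (int16_t)(y0 + num / ((int32_t)x1 - x0));

    /* postcondition: the result lies between y0 and y1 */
    assert((y0 <= y1) ? (y0 <= y && y <= y1) : (y1 <= y && y <= y0));
    return y;
}
```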

 


Modern Embedded Programming: Beyond the RTOS « State Space

by Miro Samek

[second figure]

Rule 14. Concurrency (CON) - SEI CERT C Coding Standard - Confluence

 

"Dare to be naïve." - Buckminster Fuller


N.Winterbottom wrote:
50 tasks seems a bit excessive, I don't even think FreeRTOS could cope with that number.
FreeRTOS tasks' stacks consume heap.

AN2751 Process Scheduling on an 8-bit Microcontroller

[PDF page 8]

4. Benefits of Using FreeRTOS™ on AVR® Microcontrollers

[second paragraph]

...

Adding one additional task consumes additionally 22 bytes of flash, and no additional RAM as each task gets its stack allocated on the heap.

...

More:

[TUT][SOFT][CODE]FreeRTOS for ATmega2560/1 | AVR Freaks

[post #1]

...

The problems that crop up regularly are:

1 - Insufficient stack space allocated to a task. ...

...

The OP said cooperative scheduler, so I assume there is only one stack and tasks run to completion. Given that, together with initiation based on a timer interrupt, I suspect a task overrun.

OP, if this is not the case, it would be helpful to let us all know.

This reply has been marked as the solution. 

Wow, the above shows why avrfreaks is such an invaluable resource: the quantity and quality of replies in no time at all is great.

So, to summarize:

  • Several years ago my AVR Dragon failed (with the infamous USB vreg issue), and in the depths of my mind I thought I'd never replaced it. I dug it out this evening to find I must have done so some time ago, fired it up and started a JTAG debug session with breakpoints at the various points where I'd previously 'debugged' using LED flashes.
  • Working through all of the suggestions above, I still couldn't find the issue at all, so I decided once again to compare my test code/application, which runs fine in the simulator, with the real code that runs (but not correctly) on the target board.
  • I had previously isolated the problem to my interpolation function just by flashing an LED if certain criteria were or were not met. For example, I forced the sensor output to 0V and set the function to turn an LED on if the result was >0, which it shouldn't have been with a 0V input. To my surprise, it would turn on despite sensor readings of 0. Using the Dragon tonight, what really surprised me was to see 0 as an input to the interp function and some seemingly random long int come back out, which was totally wrong. My test code did not exhibit this behavior: 0 in, 0 out, as expected.
  • There was one difference: the lookup array in the test code did not use PROGMEM. (I removed it when I pulled the snippets out of my application into the test code, to avoid linking in more lib files and keep the test code as small as possible. Lesson learned there.) The real application puts the lookup table into flash using PROGMEM, but was accessing the data without using pgm_read_xxx, and was thus feeding total garbage into my interpolation function.
    • Removing PROGMEM from my lookup table makes the application work entirely as designed.
    • Now I just need to rewrite my interp function to use the correct read functions. I feel totally stupid now, but as always a methodical approach to discussing the problem found the solution!
      • One thing I do need to investigate is why reading without pgm_read returns different values each time.
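A sketch of the fix described above, with the table name and values as hypothetical placeholders. The AVR is a Harvard architecture: flash and SRAM are separate address spaces, so a plain `lookup[i]` on a PROGMEM table uses the table's flash address as an SRAM (data-space) address. That would plausibly also explain the changing values: the data space at that address holds registers, I/O or ordinary RAM (variables, stack), whose contents change as the program runs. `pgm_read_word()` from avr-libc's `<avr/pgmspace.h>` reads program memory instead. The `#else` branch is an assumption added only so the sketch builds on a PC, where both functions return the same value, mirroring why the isolated test code looked fine.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host build: there is no separate flash space, so emulate the macros. */
#define PROGMEM
#define pgm_read_word(addr) (*(const uint16_t *)(addr))
#endif

/* Hypothetical lookup table, stored in flash on the AVR. */
static const uint16_t lookup[] PROGMEM = {0, 120, 380, 700, 1023};

/* WRONG on AVR: with a run-time index this compiles to a data-space
   load from the table's flash address, returning whatever is there. */
uint16_t lookup_get_wrong(uint8_t i)
{
    return lookup[i];
}

/* Correct: pgm_read_word() reads the word from program memory. */
uint16_t lookup_get(uint8_t i)
{
    return pgm_read_word(&lookup[i]);
}
```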

 

General replies to all the comments below, as all feedback is greatly appreciated.

 

N.Winterbottom wrote:

I take it at this stage you're just looking for "What to check next ?"  type comments:

 

  • 50 tasks seems a bit excessive, I don't even think FreeRTOS could cope with that number. Have you tried running your scheduler with just one extra task ?
  • I guess there is a high memory load with that many tasks, check that your stack pointer isn't eating your variables.
  • Write some checking code to trap the fault and using your debugger, set a breakpoint to allow examining the variables in your function.

 

 

Ah, when I say 50, there are only ~10 'tasks' per se, but to execute them they collectively call around 50 functions. This will increase to ~100 or so when the full application is complete. The checking code was my first major step towards finding the problem: it fired off LEDs whenever the application met certain thresholds that I knew it shouldn't have met.

The tasks themselves (other than the interpolation function in question) are mainly very fast to execute, with low memory overhead. SRAM usage from function calls is very low, albeit total RAM usage is very high due to sensor read buffers.

 

Kartman wrote:
You might want to see how much stack you are using. Is it possible to replicate the problem in the simulator?

 

I really need to learn how to do this; I've only ever done really basic debugging using the Dragon (i.e. checking variable contents etc). The problem itself CAN now be replicated in the simulator if I put the lookup table in PROGMEM first, without rewriting the interpolation function to use pgm_read.

 

curtvm wrote:

>one however requires a lookup table

in ram or flash (or eeprom) ?- if flash then one less potential corruption point can be marked off the list

 

>reads the raw ADC value

Replace that part of the function with just simulating a specific adc value (or sequence through them all). If the problem persists, then it would sound like a stack thing. If it goes away, then maybe task 0-49 is changing the adc some way that upsets task 50 (like switching the 8/10 bit res).

 

It also sounds like you either know the adc input value and the expected lookup value and can see both to know you have a problem, or you are assuming the adc is supposed to be a certain value and are expecting a certain lookup value but really do not know the actual adc value used for the lookup. If the latter, then you also cannot know if the lookup value is correct or not since you do not know what is being looked up.

 

This is the main way I was able to isolate the problem to the interpolation function in the first place: by cutting out the ADC read, inputting a known extreme value (1023 or 0, for example) into the interpolation function, and receiving a totally wrong answer. I followed this up by bringing the ADC reads back and driving the sensors to their min/max values, and got exactly the same garbage output from the interp function.

 

ki0bk wrote:

50 tasks!!!

Which AVR is this running on?

i would guess your running into a stack or heap issue.

jim

 

 

ATmega1284P, a personal favorite wherever large dynamic buffers are needed, due to its SRAM size!

 

MattRW wrote:

Making a lot of guesses here:  Given that the cooperative schedule swaps on a timer overflow interrupt, I guess you are assuming that all tasks complete by the time that interrupt hits.   Also, I assume your scheduler spins waiting for some flag from the interrupt handler.  That flag must be declared volatile.  Do you have a way to make sure the tasks complete in time?  You could add a flag in the interrupt handler to check if the scheduler is ready and if not, fire on a red led or something.

 

Correct, the scheduler is designed to be as lightweight as possible and relies on the guarantee that all tasks complete before they're next due (which they all do, confirmed by cycle count); otherwise they will simply be missed. The entire scheduler is implemented in fewer than 100 lines, including prioritization, MCU/timer initialization etc.

Perhaps one day soon I'll write it up as a tutorial-style guide, given how many people seem to be looking for really lightweight schedulers.

MattRW wrote:

The OP said cooperative scheduler, so I assume only one stack, tasks run to completion.  That together with initiation based on timer interrupt I suspect a task overrun.

 

OP, if this is not the case, it would be helpful to let us all know.

 

To summarize the above: no task overrun, just the PROGMEM issue that I'm kicking myself for.

 

gchapman wrote:

 

Before, pipe that source code through a linter(s)

Some C coding standards identify concurrency issues; some linters flag concurrency defects (true positive, a "hint", false positive)

Most linters are whole file instead of whole program; those can still be of use.

Assertions will reduce the defect density though a significant effort to sprinkle these throughout the files.

Ideally, a source code snippet that repeats the issue will ease the review and knowledge transfer.

 

 

I had never heard of linters before tonight; I'll read up on them!

 

----------------------

 

Although this was a major breakthrough, there was one downside this evening: while prodding around on my board with jumpers to feed known rail voltages into the ADC inputs on the unprotected side, I managed to put 12 VDC into an ADC input and damaged the ADC, which will now only read up to ~2/3 of VCC. The only replacement MCUs I have are in the wrong package, so new ones are on order!

Thanks again all for your input; many of the replies have given me new things to brush up on, which will no doubt come in useful in the future.

 


Well done!

jtw_11 wrote:
The real application puts the lookup into flash using PROGMEM

Time to learn about the AVR-specific keyword __flash. The IAR compiler has supported it since the beginning of time, but avr-gcc has caught up. Using it, you can dispense with the awful pgm_read_byte() and similar.

 

The documentation is here: avr8-gnu-toolchain-x.x.x/share/doc/gcc/Named-Address-Spaces.html#AVR Named Address Spaces
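A minimal sketch of the same hypothetical lookup table using `__flash` instead of PROGMEM. In avr-gcc (GNU C, not C++; added around GCC 4.7) `__flash` is a named address space: objects in it must be `const`, and ordinary reads through them are compiled as program-memory loads. Defining `__flash` away on non-AVR builds is an assumption made only so the sketch compiles on a PC, where the table simply lives in RAM.

```c
#include <stdint.h>

#ifndef __AVR__
#define __flash   /* host build: no named address spaces */
#endif

/* Hypothetical lookup table placed in flash. */
static const __flash uint16_t lookup[] = {0, 120, 380, 700, 1023};

/* No pgm_read_word() needed: the qualifier tells the compiler which
   address space to read from. */
uint16_t lookup_get(uint8_t i)
{
    return lookup[i];
}

/* Pointers to the table carry the qualifier as well, so dereferencing
   them also reads from flash. */
uint16_t sum_table(const __flash uint16_t *t, uint8_t n)
{
    uint16_t s = 0;
    for (uint8_t i = 0; i < n; i++)
        s += t[i];
    return s;
}
```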

 


The same applies to CodeVisionAVR V2 and later.

CodeVisionAVR V2 Revision History (very bottom)