Variables sharing memory?! very strange.. to me at least!

Hello fellow Freaks!

I have recently been baffled by a section of my own code that does not behave the way it appears it should.

Basically, I have an if > 0 statement followed by a switch statement. To get into the if statement the variable must be > 0, but in the switch statement I find that the variable is zero.

I have attached the code below. It should monitor a timer that measures the time between encoder pulses on a motor-driven axis. The timer is reset to zero at every pulse, and it is held at zero while the motor current is off. Otherwise, if the current has been on for a long time without an encoder pulse being sensed, an error is flagged.

So, the code is below. I believe that MotorControlAxisError really is zero the whole time, because if it were set to anything else I would get an additional message (MotorControlAxisError can only be set to 1, 2, 4 or 8 at the top of this function, and I should then get a message). So I think that entering the if > 0 statement is incorrect.

BUT I then went and added the "What am I doing here?!" which tells me sure enough that the MotorControlAxisError is 1 or 3 or 9 or something!

Okay I think, maybe I did set this variable somewhere. Maybe I went outside a table boundary and trashed some memory...

But then, when I go to the switch statement, it checks the variable and tells me that no, MotorControlAxisError is zero.

So, apart from printing out lots of UART messages, I also installed GCC 4.4.2, searched everywhere to see if MotorControlAxisError is accessed by another function and did a lot of swearing.

In summary, it appears that the actual value of MotorControlAxisError is incorrectly changed to > 0, then somehow magically changes back before the switch statement.

The PCMsg and PCMsgNum functions should not have any effect, other than to take a lot of time.

The error that I mentioned here is one of those nice ones that occur a few times a day - making it especially easy to track down.

I do have a heap of interrupt-driven processes in the background, but what the f*@k, in hardware or software, takes a variable, changes it, then changes it back again?

I'm going insane here and would be grateful for any ideas!

static void MotorControlEnsureMovement(void)
{

	if (MotorControlAxisError==0)	//If there is not already a pending error...
	{	
		//Check the condition of the primary axis
		if (!(MotorPrimHL==MOTOR_PRIMARY_ENCSTATE)) 	//Clear timer if encoder changes high-low or low-high
		{						//MOTOR_PRIMARY_ENCSTATE =>  (PINE&(1<<4))
			MotorPrimEncTimer = 0;		//If this timer reaches limit => a motor error.
			MotorPrimHL = MOTOR_PRIMARY_ENCSTATE;	
		}
		else if (MOTOR_PRIM_CURRENT_OFF) MotorPrimEncTimer = 0; //Keep the timer zeroed so long as there is no current to motor.
		else if (MotorPrimEncTimer>ENCODER_PRIMARY_TIMEOUT)	//There has been current to the motor for a long time without a 
		{								// response from the encoder, so flag error.
			MotorControlAxisError=1;
			PCMsg("Setting MCAE 1");		//PCMsg() => #define PCMsg(debugString) PCMsgFn(PSTR(debugString))
		}							// prints a string stored in flash


		// ----- And so on for another motor  and a backup encoder for each motor ----------------

		//Check the condition of the secondary axis
		if (!(MotorSecHL==MOTOR_SECONDARY_ENCSTATE))
		//.....
		else if (MotorSecEncTimer>ENCODER_SECONDARY_TIMEOUT)
		{
			MotorControlAxisError=2;
			PCMsg("Setting MCAE 2");
		}

		//Check the condition of the primary axis backup encoder
		if (!(MotorPrimCkHL==MOTOR_PRIMARY_CHECK_ENCSTATE))
		{
		//...
		else if (MotorPrimCkEncTimer>ENCODER_PRIMARY_CHECK_TIMEOUT)
		{
			MotorControlAxisError=4;
			PCMsg("Setting MCAE 4");
		}

		//Check the condition of the secondary axis backup encoder
		if (!(MotorSecCkHL==MOTOR_SECONDARY_CHECK_ENCSTATE))
		//...
		else if (MotorSecCkEncTimer>ENCODER_SECONDARY_CHECK_TIMEOUT)
		{
			MotorControlAxisError=8;
			PCMsg("Setting MCAE 8");
		}



		//------------------- Now comes the fun bit ------------
	
		//Note that the top if can only be entered  if (MotorControlAxisError==0)
		//so if MotorControlAxisError has changed, I should have one of the messages, 
		// e.g. "Setting MCAE 8" -> but I don't

		if (MotorControlAxisError>0)	//First loop with the new error!
		{
			PCMsgNum("What am I doing here?!",MotorControlAxisError);
				//This prints the message and the number value of MotorControlAxisError.
				//Sure enough, it is > 0, but how?!

			PCMsg("ERROR! MOTOR ENCODER ERROR HAS OCCURRED!"); 

			//Now find out which of the errors this is (the user needs to know!)
			PCMsg("Source of the error is: ");
			switch(MotorControlAxisError)
			{	
				case 1:
					PCMsg("PRIMARY AXIS - MAIN ENCODER");
					break;
				case 2:
					PCMsg("SECONDARY AXIS - MAIN ENCODER");				
					break;
				case 4:
					PCMsg("PRIMARY AXIS - BACKUP ENCODER");				
					break;
				case 8:
					PCMsg("SECONDARY AXIS - BACKUP ENCODER");				
					break;
				default:
				//Theoretically, I will never get this because MotorControlAxisError
				// is ONLY set to 1, 2, 4 or 8 above, but lo and behold, I get
				// MotorControlAxisError = 0
					PCMsgNum("UNKNOWN! - code ",MotorControlAxisError);	
					break;
			}
		}
	}
	//.......
}

//--------- Output example: -------------

14:28:35 What am I doing here?! 3
14:28:35 ERROR! MOTOR ENCODER ERROR HAS OCCURRED!
14:28:35 Source of the error is: 
14:28:35 UNKNOWN! - code  0

//------------------------------------

ISR register corruption? There are some versions of avr-gcc that don't properly save all the registers in the ISR. It causes all kinds of weird errors. Make sure you have a proper toolset.

--- bill

Quote:

In summary, it appears that the actual value of MotorControlAxisError is incorrectly changed to > 0, then somehow magically changes back before the switch statement.

Is there anything obvious in the .lss? How about when you step the code in the simulator while using the C+Asm view?

Any "normal" cause for a memory corruption is also possible:
- stack grows into data
- out of bounds array access
- rogue pointer
...
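
To illustrate the out-of-bounds case from the list above, here is a minimal sketch that runs on a PC. It assumes a made-up memory layout (a 4-entry table with a 1-byte error flag placed directly after it) and models RAM as one byte buffer so the overrun stays well-defined C. All names (ram, table_write_unchecked, table_write_checked, error_flag) are hypothetical and not from the posted program.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: a 4-entry table followed
   directly by a 1-byte error flag, modelled as offsets into one buffer. */
#define RAM_SIZE     8u
#define TABLE_OFFSET 0u
#define TABLE_LEN    4u
#define ERROR_OFFSET 4u   /* the byte right after the table */

uint8_t ram[RAM_SIZE];

/* Unchecked write: an index equal to TABLE_LEN lands on the error flag. */
void table_write_unchecked(size_t index, uint8_t value)
{
    ram[TABLE_OFFSET + index] = value;
}

/* Checked write: rejects out-of-range indices and reports the failure. */
int table_write_checked(size_t index, uint8_t value)
{
    if (index >= TABLE_LEN) {
        return -1;
    }
    ram[TABLE_OFFSET + index] = value;
    return 0;
}

/* Reads the modelled error flag. */
uint8_t error_flag(void)
{
    return ram[ERROR_OFFSET];
}
```

Writing index 4 through the unchecked function changes the error flag without any code ever naming it, which is the kind of silent change a one-past-the-end table access can cause. It would not change the value back afterwards, though, so this alone does not explain a value that appears to revert.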

Stefan Ernst

Thanks Bill,

I did update my compiler from the really old one that was in the Ubuntu repositories to the latest GCC 4.4.2, Binutils 2.20, LibC 1.6.7. I'm also using the Kontrollerlab IDE with its integrated 'makefile', in case that makes any difference.

The fact that I changed my whole toolset and still have the problem seems to rule out the toolset itself.

I would guess it has something to do with the ISRs because I don't think any other code could come between my different printouts of the variable in question.

Is ISR register corruption really something that can stuff up a variable temporarily and then somehow make it okay again ?!

The variable that is affected is in no way present in the ISRs.

Am I missing the point about how the variable MotorControlAxisError is stored and retrieved?

For example: the variable is read from memory location yyy into register xxx, then along comes an interrupt and makes a mess of register xxx. The program doesn't realise this and reads register xxx, thinking that the variable is correct and the value is 3. Then comes the switch statement, and the variable is loaded again from yyy into xxx, so that the correct value 0 is now in xxx. Does anyone get what I mean?

Is this type of scenario really possible?

btw. It is declared globally as a uint8_t and used in one or two other files for reference purposes (only read, not written). The variable is only written to in this function.

I am also using globally declared variables in a statically declared function, but I have not read anything to say that this is a problem.

@Clawson,
this is probably the first place I should look, but I have not yet familiarized myself with the simulator (on my to-do list) and don't appear to have a .lss file (Linux + Kontrollerlab).

I am just surprised that between reading a variable twice, it can be changed and changed back again. Should I really expect extra code to be 'inserted' by the compiler that could change the variable value like this, or change the behaviour of the program?

@Stefan
I am 99% sure I do not have an out of bounds array access, but I will check again. Would this not have the effect of just trashing everything, rather than changing a variable and changing it back again?

What would be the typical cause of the other two errors (stack grows into data, rogue pointer?) and could they have the symptoms described (variable is changed and then changed back again with no write to that variable)?

Last Edited: Fri. Dec 4, 2009 - 05:02 PM

Quote:

I am just surprised that between reading a variable twice, it can be changed and changed back again.

Before you jump to that conclusion, actually output the variable value from your default: block. AVR compilers are pretty mature and stable, so my guess is that your value is not 0, but also not 1, 2, 4 or 8. For example, what if there is a "double error" and more than one bit is set?

(I'd perhaps skip the if() test and have a case 0:, but that is style and also depends on the app, how long the case list is, and how I feel on a given day.)
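
A minimal sketch of the "double error" idea, assuming the value is treated as a set of bit flags rather than one of four exact codes. The example output in the first post shows 3, which under that reading would be bits 1 and 2 together and would fall through to default: in the posted switch. The names decode_axis_errors and error_names are made up for this illustration; the message strings are the ones from the posted code.

```c
#include <stddef.h>
#include <stdint.h>

/* One message per error bit, in bit order (1, 2, 4, 8). */
const char *const error_names[4] = {
    "PRIMARY AXIS - MAIN ENCODER",
    "SECONDARY AXIS - MAIN ENCODER",
    "PRIMARY AXIS - BACKUP ENCODER",
    "SECONDARY AXIS - BACKUP ENCODER",
};

/* Hypothetical helper: stores in `out` the message for every error bit
   set in `snapshot` and returns how many were stored. Any bits outside
   the low four are handed back through *unknown_bits. */
size_t decode_axis_errors(uint8_t snapshot, const char *out[4],
                          uint8_t *unknown_bits)
{
    size_t count = 0;

    for (uint8_t bit = 0; bit < 4; bit++) {
        if (snapshot & (uint8_t)(1u << bit)) {
            out[count++] = error_names[bit];
        }
    }
    *unknown_bits = (uint8_t)(snapshot & 0xF0u);
    return count;
}
```

The caller copies the global into a local once and passes that copy in, so the test, the printed number and the decoded messages all refer to the same value. A return of 0 with no unknown bits plays the role of the case 0: mentioned above.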

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Thanks Lee,

You are right about using the if() and then a switch, this could be simplified.

I actually do print out the value of MotorControlAxisError in the default: block, and it appears to be zero. (I have an example output at the end of my code section).

Also, the only parts of the code that write a number to the variable do not appear to be activated (they would output a message).

Finally, there is also another part of code further on, that stops the motors if the MotorControlAxisError is not zero, and this is not activated, also suggesting that the variable gets changed temporarily, then changed back to zero before it has any effect.

I am a bit concerned about how many of my other variables might be changing back and forth.

Quote:

I am a bit concerned about how many of my other variables might be changing back and forth.

Variables don't change back-and-forth.

It might be worthwhile to post the generated code listing, maybe as an attachment, as the whole function would be most useful for analysis. A compiler bug is possible, but usually improbable.

Your symptoms could occur if e.g. a frequently-run ISR clears that variable.
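
A rough PC-side sketch of that scenario, with the interrupt replaced by an ordinary function that is called by hand between two reads. Everything here (shared_error, fake_isr_clear and the two reader functions) is invented for illustration and is not taken from the posted program.

```c
#include <stdint.h>

/* Stand-in for a flag shared between main code and an interrupt. */
volatile uint8_t shared_error;

/* Hypothetical stand-in for an ISR; on a PC it is simply called by hand. */
void fake_isr_clear(void)
{
    shared_error = 0;
}

/* Reads the flag twice with the "interrupt" firing in between.
   Returns 1 if the two reads disagree. */
int two_reads_disagree(void)
{
    uint8_t first = shared_error;
    fake_isr_clear();
    uint8_t second = shared_error;
    return first != second;
}

/* Takes one snapshot and returns it, so every later decision can use
   the same value even though the flag itself has since been cleared. */
uint8_t snapshot_then_interrupt(void)
{
    uint8_t snapshot = shared_error;
    fake_isr_clear();
    return snapshot;
}
```

With the flag set to 3 beforehand, the first function sees 3 and then 0, much like the posted output, while the second keeps working with 3. On an AVR a single uint8_t read is one instruction; for wider variables shared with an ISR, avr-libc provides ATOMIC_BLOCK in <util/atomic.h> for taking such a snapshot with interrupts held off.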


What are PCMsg and PCMsgNum?
Functions? Macros?

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles

It would also be very useful to see all the declarations of
MotorControlAxisError: the actual definition and the "extern" declarations.

It could be that the value is not changing and not "zero".
It could be your function/macro PCMsgNum() is not interpreting it correctly.

Perhaps it is something goofy like a -0 vs +0 value on a signed int/char.

Funny things can happen with 8 bit values
when not using unsigned chars. (uint8_t)
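
A small sketch of the kind of 8-bit surprise meant here, assuming one file declares the variable as signed and another as unsigned. The function names are made up. Converting an out-of-range value to int8_t is implementation-defined in C; GCC documents that it reduces the value modulo 2^N, so 200 becomes -56.

```c
#include <stdint.h>

/* The same byte interpreted as a signed 8-bit value.
   The conversion result is implementation-defined; GCC wraps it. */
int is_positive_as_signed(uint8_t raw)
{
    int8_t value = (int8_t)raw;
    return value > 0;
}

/* The same byte interpreted as an unsigned 8-bit value. */
int is_positive_as_unsigned(uint8_t raw)
{
    uint8_t value = raw;
    return value > 0;
}
```

For a byte holding 200, the unsigned version passes a > 0 test and the signed version does not, so mismatched declarations can make an if and a print routine disagree about the same memory. For small values such as 1, 2, 4, 8 or 3, both versions agree.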

--- bill

Thank you all for your suggestions!

Lee, the variable in question is not involved in any ISRs, but perhaps the ISRs are affecting this variable (and possibly others) indirectly. I agree that a compiler error is unlikely (especially seeing as I have updated to the latest compiler).

Bill, Skeeve,

#define PCMsgNum(debugString, uin16_t) PCMsgFnNum(PSTR(debugString), uin16_t)

so no signs in sight. I am really not expecting an error here, because I have been using these functions for a long time in a lot of places and have not experienced any unusual errors.

//------------------------------------

In new news, I have focused my attention now on the hardware. I have set up two other systems with practically the same program (some minor differences in mechanical equations- but all the main bits identical) and surprise - surprise - no errors as yet (fingers crossed!!!)

With the dodgy system, I have increased the diagnostics and noticed that some of the other variables are being fragged. Also, the RTC chip seems to be getting hit and registering inappropriate errors.

I think some nasty stuff is coming along the power-supply line or in on the sensor cables... but this has nothing to do with the GCC forum.

I will do some more tests and post an update.

Why are you resistant to posting the generated code for the routine in question? Surely that would slice through much of the speculation.

Lee


Now we see what PCMsgNum() is, but it simply references PCMsgFnNum(), which is either another macro or a two-argument function. We still have no idea what the expected types of those 2 arguments are or how they are used.
uin16_t is simply a typeless macro parameter.
Also, still no clue what the declaration of MotorControlAxisError is.
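
To make the "typeless macro parameter" point concrete, here is a sketch with an invented receiver function. It is NOT the poster's PCMsgFnNum(), whose prototype has not been shown; report_num, REPORT_NUM and last_number_seen are hypothetical names, and the 8-bit parameter is chosen only to show what the macro would hide.

```c
#include <stdint.h>

/* Records what the hypothetical receiver actually got. */
uint8_t last_number_seen;

/* Hypothetical receiver: it takes an 8-bit number, which nothing in
   the macro below reveals to the caller. */
void report_num(const char *label, uint8_t number)
{
    (void)label;
    last_number_seen = number;
}

/* The second parameter is spelled like a type name, but to the
   preprocessor it is only a placeholder for the caller's text. */
#define REPORT_NUM(label, uin16_t) report_num((label), (uin16_t))
```

Passing a 16-bit variable holding 256 through REPORT_NUM stores 0 in last_number_seen, because the function prototype, not the macro parameter name, decides the conversion. That is why the real prototype and the variable's declaration are needed before the printed "0" can be trusted.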

How about providing the generated code as was already asked for or give us something that we can compile ourselves?

Without additional information it is a guessing game that folks will soon start to walk away from.

--- bill

Lee, Bill, thank you both very much for your time,

In summary,

Quote:
What...in hardware or software, takes a variable, changes it, then changes it back again?

Answer:

Quote:
Variables don't change back-and-forth.

This and the ISR comments really gave me enough to go on.

Given that I have observed some other 'corruption-like' errors, I would prefer to find out for myself where it all starts and ends.

With any luck the hardest part will be reproducing the errors. Since changing the power supply yesterday, everything has been fine.