MEGA128RFR2, LwMesh, Callbacks causing reset


Per subject line: I am running LwMesh on an ATMEGA128RFR2.

I tried to base my code on the peer2peer demo.

My init function (gets called once at startup):

void BR_meshInit(void) {
	SYS_Init();					//LwMesh API initialization (must be called before any other API calls)

	NWK_SetAddr(opSettings.selfID);
	NWK_SetPanId(MESH_PANID);
	PHY_SetChannel(MESH_CHANNEL);
	PHY_SetRxState(true);
	NWK_OpenEndpoint(MESH_ENDPOINT, meshDataInd);
#ifdef NWK_ENABLE_SECURITY
		NWK_SetSecurityKey(opSettings.securityKey);
#endif
}

In my main function:

SYS_TaskHandler();	//Call to LwMesh Stack
if(BR_RxFlag) {
     BR_handleMsg();
}

//Other stuff (which takes time...)

callback:

bool meshDataInd(NWK_DataInd_t *ind) {
	uint8_t i;

	if(BR_RxFlag) {
		fprintf_P(&comExt_str, PSTR("Warning - meshDataInd(): New wireless message being ignored.\n"));
	}
	else {
		BR_BufCnt = 0;
		for(i = 0; i < ind->size; i++) {
			BR_RxBuf[i] = ind->data[i];
			BR_BufCnt++;
		}

		BR_RxFlag = 1;
	}
	return(true);
}

Note: BR_RxFlag gets reset to 0 in BR_handleMsg().

In my application, the remote source transmits repetitive messages fairly rapidly (at 10 ms or 20 ms intervals). This local device sleeps for periods of time, wakes up and listens for a message, and then acts on it if one is received (i.e., sends a response). It doesn't matter if it misses one (or many) messages, because the remote source will continue transmitting until it gets a response.

I am pretty new to callback functions. But as I understand it, they are called automatically when certain events happen (i.e., when executing SYS_TaskHandler). As such, I created the callback function meshDataInd to not move any data into BR_RxBuf until all previous data has been "handled". I didn't care if I dropped incoming data on the floor - it would be repeated by the other source.

My intent was to just use the printf as a debug statement to allow me to know when the local device was dropping packets. However, pretty regularly (though not EVERY time), the local device will print the warning ("ignored") three or four times and then the processor will reset. It looks like some sort of memory leakage issue, but I'm not sure how to begin tracking it down.

Does the stack have a problem if data isn't transferred out of the incoming message buffers?

Science is not consensus. Science is numbers.

Last Edited: Fri. Oct 16, 2015 - 01:45 PM

No, there is absolutely no requirement to do anything with incoming data.

Your problem is not in this code, but somewhere in BR_handleMsg(). Look for reasons the flag might not get cleared.

BTW, you can just do "BR_BufCnt = ind->size;", no need to increment it up to a known value.
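The drop-if-busy callback from the first post is essentially a single-slot mailbox, and with the size tip applied the copy becomes a single memcpy(). A host-compilable sketch under stated assumptions: `ind_t` is a hypothetical stand-in for the two `NWK_DataInd_t` fields the callback uses, the function and variable names are made up, and no interrupt locking is shown because, as noted further down the thread, LwMesh calls indication callbacks from SYS_TaskHandler() in the main loop.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RX_BUF_SZ 255u   /* matches the 255-byte BR_RxBuf mentioned in the thread */

/* Hypothetical stand-in for the two NWK_DataInd_t fields the callback reads. */
typedef struct {
    const uint8_t *data;
    uint8_t size;
} ind_t;

static uint8_t rx_buf[RX_BUF_SZ];
static uint8_t rx_cnt;
static bool rx_full;          /* plays the role of BR_RxFlag */
static uint16_t rx_dropped;   /* counts ignored frames instead of printing */

/* Indication callback: copy the frame only if the slot is empty. */
bool on_data_ind(const ind_t *ind)
{
    if (rx_full) {
        rx_dropped++;         /* the sender repeats, so dropping is harmless */
    } else {
        rx_cnt = ind->size;   /* assign the size directly, no counting loop */
        memcpy(rx_buf, ind->data, rx_cnt);
        rx_full = true;
    }
    return true;
}

/* Main-loop consumer: copies out the pending message and frees the slot.
 * Returns the message length, or 0 if nothing is pending. */
uint8_t take_message(uint8_t *out)
{
    if (!rx_full)
        return 0;
    uint8_t n = rx_cnt;
    memcpy(out, rx_buf, n);
    rx_full = false;
    return n;
}
```

Counting drops instead of calling fprintf_P() inside the callback keeps the callback short; the counter can be printed later from the main loop.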

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


Thanks for the tip regarding the size.

However, even if the flag isn't getting cleared, it shouldn't cause the chip to reset -- it should just print that message...

The BR_handleMsg() function:

Note: BR_RxBuf is 255 bytes long, and the longest messages (below) are on the order of 10-15 bytes.

void BR_handleMsg(void) {
	uint8_t i;
	
	if(!strncmp((char *)BR_RxBuf, ARM_MSG, strlen(ARM_MSG)) && (BR_BufCnt == sizeof(ARM_MSG))) {
		//Respond with an ack
		BR_sendPacket((uint8_t *)ARM_ACK_MSG, sizeof(ARM_ACK_MSG));
		currentSysStatus.msgsRcvd.armRcvd = 1;
	}
	else if(!strncmp((char *)BR_RxBuf, ARM_CLOSED_MSG, strlen(ARM_CLOSED_MSG)) && (BR_BufCnt == sizeof(ARM_CLOSED_MSG))) {
		//Respond with an ack
		BR_sendPacket((uint8_t *)ARM_CLOSED_ACK_MSG, sizeof(ARM_CLOSED_ACK_MSG));
		currentSysStatus.msgsRcvd.armClosedRcvd = 1;
	}
	else if(!strncmp((char *)BR_RxBuf, ALERT_MSG, strlen(ALERT_MSG)) && (BR_BufCnt == sizeof(ALERT_MSG))) {
		//Respond with an ack
		BR_sendPacket((uint8_t *)ALERT_ACK_MSG, sizeof(ALERT_ACK_MSG));
		currentSysStatus.msgsRcvd.alertRcvd = 1;
	}
	else if(!strncmp((char *)BR_RxBuf, DISARM_MSG, strlen(DISARM_MSG)) && (BR_BufCnt == sizeof(DISARM_MSG))) {
		//Respond with an ack
		BR_sendPacket((uint8_t *)DISARM_ACK_MSG, sizeof(DISARM_ACK_MSG));
		currentSysStatus.msgsRcvd.disarmRcvd = 1;
	}

	//Print to terminal, if in debug mode
	if(SYS_DEBUG) {
		fprintf(&comExt_str, "BR_BufCnt: %d\n", BR_BufCnt);
		fprintf_P(&comExt_str, PSTR("Received Message: "));
		for(i = 0; i < BR_BufCnt; i++) {
			fprintf(&comExt_str, "%c", BR_RxBuf[i]);
		}
		fprintf_P(&comExt_str, PSTR("\n\n"));
	}

	//Reset flag for next message
	BR_RxFlag = 0;
}

With subfunctions:

uint8_t BR_sendPacket(uint8_t *buf, uint8_t len) {
	uint8_t ret = 0;
	
	if(meshDataReqBusy || len == 0) {
		ret = 1;
	}
	else if(len > WLAN_BUF_SZ) {
		ret = 2;
	}
	else {
		memcpy(meshPacketBuf, buf, len);
		meshPacketBufPtr = len;
		meshSendData();
	}
	
	return(ret);
}

void meshSendData(void) {
	if(meshDataReqBusy || (meshPacketBufPtr == 0)) {
		return;
	}

	memcpy(meshDataReqBuf, meshPacketBuf, meshPacketBufPtr);

	meshDataReq.dstAddr = opSettings.targetID;
	meshDataReq.dstEndpoint = MESH_ENDPOINT;
	meshDataReq.srcEndpoint = MESH_ENDPOINT;
	meshDataReq.options = NWK_OPT_ENABLE_SECURITY;
	meshDataReq.data = meshDataReqBuf;
	meshDataReq.size = meshPacketBufPtr;
	meshDataReq.confirm = meshDataConf;
	NWK_DataReq(&meshDataReq);

	//Reset for next packet
	meshPacketBufPtr = 0;
	meshDataReqBusy = true;
}

Science is not consensus. Science is numbers.


I can't tell what might be the problem, but here are some suggestions:
1. Why not use binary messages? It will be just one byte with no hassle of parsing this stuff.
2. Do "BR_RxBuf[BR_BufCnt] = 0" and use "if (!strcmp((char *)BR_RxBuf, ARM_MSG)) ..."
3. Why double copy data first to meshPacketBuf and then to meshDataReqBuf?
4. What happens if you put debug message some place earlier in the function?

LE: Suggestion 2 might require stripping '\r' and '\n' characters if you are using human terminal input.
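Suggestions 1 and 2 might look like this. The command strings and one-byte codes are hypothetical placeholders, since the thread never shows the actual values of ARM_MSG and friends. Trailing '\0' bytes are trimmed along with CR/LF, because the sender transmits sizeof(...) bytes, which includes the string terminator.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical command codes shared by both parsers. */
enum { CMD_NONE = 0, CMD_ARM, CMD_ARM_CLOSED, CMD_ALERT, CMD_DISARM };

/* Suggestion 2: terminate the received bytes, then strip trailing CR, LF and NUL.
 * buf must have room for len + 1 bytes. Returns the trimmed length. */
size_t terminate_and_trim(char *buf, size_t len)
{
    buf[len] = '\0';
    while (len > 0 && (buf[len - 1] == '\r' || buf[len - 1] == '\n' || buf[len - 1] == '\0'))
        buf[--len] = '\0';
    return len;
}

/* Exact-match text commands with strcmp() (placeholder strings). */
int parse_text_cmd(char *buf, size_t len)
{
    terminate_and_trim(buf, len);
    if (!strcmp(buf, "ARM"))        return CMD_ARM;
    if (!strcmp(buf, "ARM_CLOSED")) return CMD_ARM_CLOSED;
    if (!strcmp(buf, "ALERT"))      return CMD_ALERT;
    if (!strcmp(buf, "DISARM"))     return CMD_DISARM;
    return CMD_NONE;
}

/* Suggestion 1: a binary protocol where the whole message is one command byte. */
int parse_binary_cmd(const uint8_t *buf, size_t len)
{
    if (len != 1)
        return CMD_NONE;
    switch (buf[0]) {
    case CMD_ARM:
    case CMD_ARM_CLOSED:
    case CMD_ALERT:
    case CMD_DISARM:
        return buf[0];
    default:
        return CMD_NONE;
    }
}
```

With the binary form there is no string handling at all: the receiver checks the length and switches on one byte.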

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


Good points Re: double copying. That is left over from when I first started playing w/ LwMesh. I have cleaned it out.

I have been studying the peer2peer demo project a little more. Could you explain what the appDataConf function does? It looks like it is assigned as the .confirm callback in appSendData. It also calls appSendData() - which could conceivably create an infinite loop. This may be a case of me following the example too closely without grasping everything it is doing.

I think I do not need to have my conf function call the equivalent of appSendData() again. Did you do this to keep sending as long as the user was entering text via the UART?

Science is not consensus. Science is numbers.


NWK_DataReq() puts the request in a queue and returns, so an infinite loop is impossible. Remember, all callbacks are called from SYS_TaskHandler() only, so there is no way for appDataConf() to be called from within NWK_DataReq().

The confirmation handler issues the request again to send data that may have accumulated while the previous data was being sent. You don't need to follow this in your application. It was also changed in the new version (to be released this week) to better handle transmission of large files.
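A toy model (not LwMesh source) of why a confirm handler that issues a new request cannot recurse: `data_req()` only appends to a queue, and `task_handler()` is the only place a confirm callback runs. All names are hypothetical, and each "transmission" is simulated as completing on the next task-handler call.

```c
#include <stddef.h>

typedef struct req req_t;
struct req {
    req_t *next;
    void (*confirm)(req_t *);
};

static req_t *queue_head;
static int depth, max_depth;   /* tracks how deeply confirm callbacks nest */

/* Like NWK_DataReq(): append to the queue and return; never calls confirm. */
void data_req(req_t *r)
{
    r->next = NULL;
    req_t **p = &queue_head;
    while (*p)
        p = &(*p)->next;
    *p = r;
}

/* Like SYS_TaskHandler(): complete at most one queued request per call. */
void task_handler(void)
{
    req_t *r = queue_head;
    if (r == NULL)
        return;
    queue_head = r->next;
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    r->confirm(r);             /* may call data_req() again; that only re-queues */
    depth--;
}

static int sends_left = 3;
static int confirms;

/* A confirm handler that, like appDataConf(), immediately sends again. */
void resend_conf(req_t *r)
{
    confirms++;
    if (--sends_left > 0)
        data_req(r);
}
```

Because the re-request only lands in the queue, nesting depth never exceeds one; each follow-up send waits for the next task-handler pass.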

Attachment(s): 

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


When is the confirmation handler called?

I think when my conf function was called, it re-called the send data function, which did not fall out (like the peer2peer demo) because the data was still sitting in the [improperly double buffered] array.

I removed the call to the sendData function from the conf function, and am not seeing the behavior I saw before as frequently (i.e., hanging, followed by reset - which is odd, because the watchdog is set up for interrupt only...) - but it still happens occasionally. Obviously, I still have a problem elsewhere in my code, but I think this was at least part of it....

Science is not consensus. Science is numbers.


It is called when the data has been sent. In your case you should at least set "meshDataReqBusy = false;" there; otherwise nothing will work, because BR_sendPacket() will refuse every request after the first.
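A minimal sketch of such a confirm handler. `req_t`, `STATUS_SUCCESS` and `STATUS_ERROR` are hypothetical stand-ins for the LwMesh request structure and status codes, so the snippet builds on a host; `busy` plays the role of meshDataReqBusy. Clearing the flag whatever the status assumes the confirm callback also runs for failed transmissions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { STATUS_SUCCESS = 0, STATUS_ERROR = 1 };   /* hypothetical status codes */

typedef struct req req_t;
struct req {
    uint8_t status;               /* filled in before confirm is called */
    void (*confirm)(req_t *);
};

static bool busy;                 /* plays the role of meshDataReqBusy */
static uint16_t tx_ok, tx_fail;

/* Confirm callback: record the outcome and always release the request. */
void data_conf(req_t *req)
{
    if (req->status == STATUS_SUCCESS)
        tx_ok++;
    else
        tx_fail++;                /* the peer keeps transmitting, so just retry later */
    busy = false;                 /* without this, every later send is refused */
}

/* Returns false if a previous request is still in flight. */
bool try_send(req_t *req)
{
    if (busy)
        return false;
    req->confirm = data_conf;
    busy = true;
    /* ...the real code would call NWK_DataReq() here... */
    return true;
}
```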

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


So somehow my call to SYS_TaskHandler() is getting stuck in an infinite loop. I inserted a series of debug print statements, including a pair bracketing the call to SYS_TaskHandler():

	while(1) {
		//Feed the watchdog
		// At first, use wdt to wake up and listen.
		// Once tripped, use wdt to trip out of Irid loop
		wdt_reset();

		if(debugFlag) {
			fprintf(&comExt_str, "mainTest\n");
		}

		SYS_TaskHandler();		//Call to LwMesh Stack

		if(debugFlag) {
			fprintf(&comExt_str, "mainTest1\n");
		}

		appHandler();

		if(debugFlag) {
			fprintf(&comExt_str, "mainTest2\n");
		}
	}

	return(0);
}

The messages received are in the attached picture. Because "mainTest" printed but "mainTest1" did not, I believe the program is hanging inside the call to SYS_TaskHandler(). Note that the message from meshDataInd() is printed twice during a single call to SYS_TaskHandler(). Is this because the remote source is transmitting messages very rapidly (i.e., at a 20 ms interval)?

Note: the "Warning - meshDataInd()..." message above is printed by the callback function called from SYS_TaskHandler(). Also, the "Watchdog" message is printed from the WDT ISR (just for debugging).

Based on your suggestions above (and your posted example in the new peer2peer), my "app" functions for the stack interface are now:

void BR_meshInit(void) {
	SYS_Init();					//LwMesh API initialization (must be called before any other API calls)

	NWK_SetAddr(opSettings.selfID);
	NWK_SetPanId(MESH_PANID);
	PHY_SetChannel(MESH_CHANNEL);
	PHY_SetRxState(true);
	NWK_OpenEndpoint(MESH_ENDPOINT, meshDataInd);
#ifdef NWK_ENABLE_SECURITY
		NWK_SetSecurityKey(opSettings.securityKey);
#endif
}
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint8_t BR_sendPacket(uint8_t *buf, uint8_t len) {
	uint8_t ret = 0;
	
	if(meshDataReqBusy || len == 0) {
		ret = 1;
	}
	else if(len > WLAN_BUF_SZ) {
		ret = 2;
	}
	else {
		memcpy(meshDataReqBuf, buf, len);

		meshDataReq.dstAddr = opSettings.targetID;
		meshDataReq.dstEndpoint = MESH_ENDPOINT;
		meshDataReq.srcEndpoint = MESH_ENDPOINT;
		meshDataReq.options = NWK_OPT_ENABLE_SECURITY;
		meshDataReq.data = meshDataReqBuf;
		meshDataReq.size = len;
		meshDataReq.confirm = meshDataConf;
		NWK_DataReq(&meshDataReq);

		//Reset for next packet
		meshDataReqBusy = true;
	}
	
	return(ret);
}
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bool meshDataInd(NWK_DataInd_t *ind) {
	uint8_t i;
	
	if(BR_RxFlag) {
		fprintf_P(&comExt_str, PSTR("Warning - meshDataInd(): New wireless message being ignored.\n"));
	}
	else {
		BR_BufCnt = ind->size;
		for(i = 0; i < BR_BufCnt; i++) {
			BR_RxBuf[i] = ind->data[i];
		}

		BR_RxFlag = 1;
	}

	return(true);
}

How is the return value of the ind callback function used? I made it always true, again, based on the peer2peer example. Is this a misunderstanding on my part?

Attachment(s): 

Science is not consensus. Science is numbers.


I'd start adding debug prints inside the stack, beginning with SYS_TaskHandler(). There is one well-known reason for the stack to hang like this: calling NWK_DataReq() twice on the same structure, but meshDataReqBusy protects you from that. Any other reason I can think of involves badly corrupted stack memory, which is impossible to guess; only debugging will help.
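One way to picture the double-request hang: if pending requests sit in an intrusive singly linked list (an assumption of this toy model, not a quote of the LwMesh source), pushing a node that is already at the head makes it point to itself, so any walk over the list never ends. The sketch uses Floyd's cycle check to show this without hanging, plus a busy flag to show the guard; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node node_t;
struct node { node_t *next; };

static node_t *head;

/* Push to the front of an intrusive list, as a request queue might. */
void push(node_t *n)
{
    n->next = head;
    head = n;
}

/* Floyd's cycle check; a plain "while (p) p = p->next;" would spin forever. */
bool list_has_cycle(void)
{
    node_t *slow = head, *fast = head;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}

static bool busy;   /* plays the role of meshDataReqBusy */

/* Guarded push: refuses to queue the same request while one is pending. */
bool guarded_push(node_t *n)
{
    if (busy)
        return false;
    busy = true;
    push(n);
    return true;
}
```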

Quote:
How is the return value of the ind callback function used?

It can be used to suppress the acknowledgment (if false is returned). Returning true all the time is fine, especially given that you don't request an ack anyway.

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


For the record, the program hangs (and sometimes causes the chip to reset) after the 2nd call to SYS_TaskHandler() once the remote source starts transmitting (or at least the second call after it receives the first message from the remote source).

I have narrowed the problem down to the function void nwkRxTaskHandler(void) in nwkRx.c

Specifically, in that function, there is a while loop:

while (NULL != (frame = nwkFrameNext(frame))) {...}

Working theory: if messages come in too quickly, they will continue to queue up, causing this loop to never end.

To test this, I started a 16-bit timer and captured the TCNT value at the beginning and end of the loop (for now, not accounting for multiple rollovers).

I mentioned above that it hangs the SECOND time (extremely repeatable) this function is called.

The first iteration takes on average 66 clock cycles (unless there was a rollover event). The second iteration never exits the while loop. My clock is 16MHz -- 66 clock cycles is an eyeblink. I am not sure how a 20ms interval message rate could cause this loop to hang like this.
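For the TCNT measurement above, unsigned 16-bit subtraction gives the right answer across a single rollover. A host-side sketch, assuming the timer is clocked at the full 16 MHz (prescaler 1); the helper names are hypothetical, and on the target the two samples would be reads of the 16-bit TCNTn register.

```c
#include <stdint.h>

#define F_TIMER_HZ 16000000UL   /* assumed: timer clocked at F_CPU, prescaler 1 */

/* Elapsed ticks between two 16-bit samples; correct for at most one wrap. */
uint16_t ticks_elapsed(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}

/* Convert ticks to nanoseconds (62.5 ns per tick at 16 MHz), rounding down. */
uint32_t ticks_to_ns(uint16_t ticks)
{
    return (uint32_t)((uint64_t)ticks * 1000000000ULL / F_TIMER_HZ);
}
```

At 62.5 ns per tick, the 66-cycle loop takes about 4.1 µs, while a 20 ms message interval is 320,000 ticks, roughly 4.9 wraps of a 16-bit counter, so timing anything that long also needs an overflow count.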

Can you provide any additional insight?

Science is not consensus. Science is numbers.


Only one message can be received per task-scheduling cycle; for each new message, PHY_TaskHandler() must be called again. The same principle applies: PHY_DataInd() is called only from PHY_TaskHandler(), and the PHY can buffer only one frame at a time.

What is your setting for NWK_BUFFERS_AMOUNT? It looks like there is a problem with nwkFrameNext() - if the last frame ever gets allocated, the end will never be detected. I'll need to have a closer look at this, but try to increase NWK_BUFFERS_AMOUNT.

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


And, if possible, don't tear down your setup for a while. I'll try to come up with a fixed version shortly, and I want to see if it works for you.

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


I currently have NWK_BUFFERS_AMOUNT set to 3.

When I change it to 6, the function gets called 5 times before the hang. For fun, I upped it to 8 - the function gets called 7 times.

My setup isn't going anywhere! I will happily be a guinea pig.

Edit: one more factoid (in case it affects your planned alterations): This function keeps getting called even when new messages aren't coming in. I know this because the remote source is showing that it is receiving the expected [manual] ack from this local source. Once it receives the manual ack, it stops transmitting. After the remote source stops transmitting, the local source keeps churning through imaginary messages.

Science is not consensus. Science is numbers.

Last Edited: Tue. Aug 20, 2013 - 09:27 PM

OK, so the reason it hangs is that once the last buffer is allocated, the loop runs past the end of the frame array. But the fact that frames are not being freed is now the real problem.

Try to find out the value of frame->state for each of those frames.

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


So here is a fixed implementation:

NwkFrame_t *nwkFrameNext(NwkFrame_t *frame)
{
  if (NULL == frame)
    frame = nwkFrameFrames;
  else
    frame++;

  for (; frame < &nwkFrameFrames[NWK_BUFFERS_AMOUNT]; frame++)
  {
    if (NWK_FRAME_STATE_FREE != frame->state)
      return frame;
  }

  return NULL;
}

With this you should no longer see hang-ups, but your buffers will likely fill up, and we'll have to figure out why.
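To exercise the fixed iteration on a host, the sketch below wraps the posted nwkFrameNext() in stand-in types: `NwkFrame_t` is reduced to its `state` field, `NWK_FRAME_STATE_FREE` is assumed to be 0, and `count_busy_frames()` is a hypothetical helper using the same loop shape as nwkRxTaskHandler(). The loop now ends even when every frame is allocated.

```c
#include <stddef.h>
#include <stdint.h>

#define NWK_BUFFERS_AMOUNT    3
#define NWK_FRAME_STATE_FREE  0x00   /* assumed value for this host model */

typedef struct { uint8_t state; } NwkFrame_t;   /* reduced stand-in */

NwkFrame_t nwkFrameFrames[NWK_BUFFERS_AMOUNT];

/* The fixed implementation from the post: iteration is bounded by the array end. */
NwkFrame_t *nwkFrameNext(NwkFrame_t *frame)
{
  if (NULL == frame)
    frame = nwkFrameFrames;
  else
    frame++;

  for (; frame < &nwkFrameFrames[NWK_BUFFERS_AMOUNT]; frame++)
  {
    if (NWK_FRAME_STATE_FREE != frame->state)
      return frame;
  }

  return NULL;
}

/* Same loop shape as in nwkRxTaskHandler(); counts frames that are in use. */
int count_busy_frames(void)
{
  NwkFrame_t *frame = NULL;
  int n = 0;

  while (NULL != (frame = nwkFrameNext(frame)))
    n++;
  return n;
}
```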

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


hobbss wrote:
This function keeps getting called even when new messages aren't coming in.

This function just goes through all frames, so yes, it is called all the time. But it only has problems when the last buffer is allocated, which in your case happens with the 3rd, 6th and 8th frame, because previously allocated frames are never released.

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


I will try the fix you posted.

As requested, here is the output for the frame->state in the function. Note, the last "Start!" (23 lines in) is the beginning of the hang.

Attachment(s): 

Science is not consensus. Science is numbers.


States 48-52 (0x30-0x34) are security encryption/decryption states; 32 and 33 (0x20 and 0x21) are NWK_RX_STATE_RECEIVED and NWK_RX_STATE_DECRYPT. The last states are just junk: the while loop is reading ordinary memory past the frame array as if it were frame memory.

So something is wrong with security. The first frame is sitting in NWK_SECURITY_STATE_WAIT, waiting for the decryption to complete, but it never does. Make sure that you have SYS_SECURITY_MODE set to 0 or 1.
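Decoding the logged values is just hex arithmetic against the ranges above: 51 is 0x33 and 49 is 0x31, both in the security range. A small debug helper along those lines; only the two RX state names come from this thread, and the other labels are made-up descriptions.

```c
#include <stdint.h>

/* Classify a logged frame->state value using the ranges quoted in the thread. */
const char *frame_state_class(uint8_t state)
{
    if (state == 0x20)
        return "NWK_RX_STATE_RECEIVED";
    if (state == 0x21)
        return "NWK_RX_STATE_DECRYPT";
    if (state >= 0x30 && state <= 0x34)
        return "security (encrypt/decrypt)";
    return "other / possibly junk";
}
```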

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


I tested the code you posted, and you are right -- it did not hang. I am not sure if the buffers are filling or not.

Science is not consensus. Science is numbers.


I'm sure they have the same states 51, 49, 49 ...

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


alexru wrote:
I'm sure they have the same states 51, 49, 49 ...

I'm sorry -- I don't understand your statement above.

I have SYS_SECURITY_MODE set to 0.

Full config.h below:


#ifndef CONFIG_H
#define CONFIG_H

//Application Specific Parameters~~~~~~~~~~~~~~~~~~~~~~
//This should be unique for each device on the same network
//Make this an opSetting (set in EEPROM)
//#define MESH_ADDR				0x0001						//Node Network Address (must be unique in a network)
															//0x0000 			--> coordinator
															//0x0001 to 0x7FFF	--> Router
															//0x8000 to 0xFFFE	--> End Device

#define NWK_MAX_SECURED_PAYLOAD_SIZE  (NWK_MAX_PAYLOAD_SIZE - NWK_SECURITY_MIC_SIZE)

//These should be the same for all devices on the same network
#define MESH_CHANNEL			0x0F						//Radio Transceiver Channel, Valid range: 0x0B to 0x1A
#define MESH_PANID				0x1234						//Network Identifier
#define MESH_SENDING_INTERVAL	2000						//Coordinator: interval between sending data to the UART
															//Router: interval between reporting sensor values to the coordinator
															//End Device: Sleep interval
#define MESH_ENDPOINT				1						//Application main data communication endpoint
#define DEFAULT_MESH_SECURITY_KEY	"123456789abcdefg"		//Security encryption key

//Generic System Parameters~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#define NWK_BUFFERS_AMOUNT					6
#define NWK_MAX_ENDPOINTS_AMOUNT			3
#define NWK_DUPLICATE_REJECTION_TABLE_SIZE	10
#define NWK_DUPLICATE_REJECTION_TTL			3000			//ms
#define NWK_ROUTE_TABLE_SIZE				100
#define NWK_ROUTE_DEFAULT_SCORE				3
#define NWK_ACK_WAIT_TIME					1000			//ms
//#define NWK_ENABLE_ROUTING
#define NWK_ENABLE_SECURITY
#define SYS_SECURITY_MODE					0


#endif

Science is not consensus. Science is numbers.


In your log there are lines:

Quote:
start!
frame->state: 51
frame->state: 49
frame->state: 49
frame->state: 49

This happens when SYS_EncryptReq() was called but SYS_EncryptConf() never followed. In this case (SYS_SECURITY_MODE = 0) it all boils down to PHY_EncryptReq() and PHY_EncryptConf().

Put print statements there and see why it does not want to encrypt.
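Besides print statements, a cheap check is to count requests and confirms and flag a request that stays unanswered too long. A sketch with hypothetical hook names; on the target, `trace_encrypt_req()` and `trace_encrypt_conf()` would be called from SYS_EncryptReq() and PHY_EncryptConf(), and `now` would come from a system timer (also an assumption).

```c
#include <stdbool.h>
#include <stdint.h>

static uint16_t enc_reqs, enc_confs;
static uint32_t enc_req_time;     /* tick of the most recent request */

/* Hypothetical hook: call where an encryption request is started. */
void trace_encrypt_req(uint32_t now)
{
    enc_req_time = now;
    enc_reqs++;
}

/* Hypothetical hook: call where the encryption confirm arrives. */
void trace_encrypt_conf(void)
{
    enc_confs++;
}

/* True if a request is outstanding and older than timeout ticks. */
bool encrypt_stuck(uint32_t now, uint32_t timeout)
{
    return enc_reqs != enc_confs && (uint32_t)(now - enc_req_time) > timeout;
}
```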

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


With your modified code (from above), the req/conf functions get called in pairs.

When I put the original code back in, it fails again.

void SYS_EncryptReq(uint8_t *text, uint8_t *key)
{
#if SYS_SECURITY_MODE == 0
if(debugFlag) {
	fprintf(&comExt_str, "sysEncryptReqTest1\n");	
}
  PHY_EncryptReq(text, key);
#elif SYS_SECURITY_MODE == 1
  swEncryptReq((uint32_t *)text, (uint32_t *)key);
#endif
}

#if SYS_SECURITY_MODE == 0
/*************************************************************************//**
*****************************************************************************/
void PHY_EncryptConf(void)
{
if(debugFlag) {
	fprintf(&comExt_str, "sysEncryptReqConf1\n");
}
  SYS_EncryptConf();
}
#endif

Results:
With your new code:

sysEncryptReqTest1
sysEncryptReqConf1
sysEncryptReqTest1
sysEncryptReqConf1
sysEncryptReqTest1
sysEncryptReqConf1
sysEncryptReqTest1
sysEncryptReqConf1

With the old ("released" lwMesh code):
here
sysEncryptReqTest1
here
here
here

("here" is just another message elsewhere in the code to show the main state machine is functioning.) Note that the conf function did not get called at all...

The weird thing about this is that there are four wireless interactions between the local and remote sources (separated by long periods of time). The first two never have a problem. The third one sometimes does; the fourth one always does. This seems to imply something external to the stack, but how could that be?

Science is not consensus. Science is numbers.


Does it fail on the first attempt, or only once it is already stuck in the infinite loop? I think I've lost track of what is going on.

What does not work with the modified code?

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


Good question. I will re-run and post results.

Science is not consensus. Science is numbers.


This is bizarre. I have removed all debug print statements (both from the lwMesh stack and from the main application code). I put the original code back in for NwkFrame_t *nwkFrameNext(NwkFrame_t *frame){}

I am no longer getting any errors - no hang, no reset, etc.

Science is not consensus. Science is numbers.


I am running out of hair to pull out. I am using a "vanilla" LwMesh, i.e., a fresh checkout. My hardware is identical to yesterday's (and earlier today's) tests. My software is the same. I have been working on a local copy for the last 36 hours or so while trying to sort this out, so it would be non-trivial to revert to the last copy in my repository.

I cannot get the error I was receiving yesterday (or this morning)...

Science is not consensus. Science is numbers.


Well, this is OK. Just put the modified code back, modification is correct no matter what :)

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


I will put the modification back. Could you clarify what you meant about buffers filling?

Thanks again for your help.

At this point, until the problem manifests itself again, I can only hope that somehow the problem was not in the stack, and I fixed the error inadvertently without realizing it.... Though I am not sure how that can happen w/ simply putting in and removing fprintf statements.

Science is not consensus. Science is numbers.


hobbss wrote:
I will put the modification back. Could you clarify what you meant about buffers filling?

The effect you saw previously, where a frame's state stays non-zero for a long period of time. Frames are supposed to be allocated temporarily and released as soon as they are no longer needed. In your case a frame buffer was allocated but never released, because security processing never finished.

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


I get it now. With the new code that you provided, is this not an issue? I misunderstood your earlier comment.

Science is not consensus. Science is numbers.


This new code does not fix the issue you were having, because I have no idea what this issue was.

This code fixes a bug that occurs when all buffers are allocated (the MCU reset). Normally this is not a problem, because all buffers are rarely allocated at once, but if something else goes wrong and buffers are never released (as in your case), then at some point they will all be allocated, and the reset bug triggers.

Now, if they are all allocated, nothing will work and frames will be dropped, but there will be no reset; the code will just wait until they are released.
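The behavior described above, dropping frames instead of resetting once the pool is full, can be sketched with a tiny fixed pool. This is a generic illustration, not LwMesh's allocator; `pool_alloc()`, `pool_free()` and the pool size are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 3

typedef struct {
    uint8_t state;        /* 0 = free, non-zero = in use */
    uint8_t payload[8];
} frame_t;

static frame_t pool[POOL_SIZE];
static uint16_t dropped;

/* Returns a free frame, or NULL when every frame is still in use. */
frame_t *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].state == 0) {
            pool[i].state = 1;
            return &pool[i];
        }
    }
    dropped++;            /* caller drops the incoming frame and carries on */
    return NULL;
}

/* Must be called for every allocated frame, or the pool eventually fills. */
void pool_free(frame_t *f)
{
    f->state = 0;
}
```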

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


Ok. I understand completely now. I am still at a loss as to why the error (which was eminently repeatable) suddenly stopped occurring. In my opinion, it strongly points to the problem being outside the stack. I must have somehow inadvertently changed something while shuffling around my debug print statements. That seems like a really lame explanation, but the best I can come up with right now.

Science is not consensus. Science is numbers.


Could someone please tell me whether this bug affects LwMesh 1.0.0? I can't find nwkFrameNext() in the 1.0.0 version I'm using, and right now we don't have time to migrate to 1.2.0.

Also, are there any other major bugs in 1.0.0 that could cause a hang or crash that I should fix? The release notes say only minor bug fixes were made from 1.0.0 to 1.1.0, which should not affect the system much.


At this point I personally don't remember; it was a long time ago. But problems were fixed in v1.2.0 that could cause trouble on heavily loaded networks. They can be fixed by replacing the PHY, but it is easier to replace the whole stack.

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


Many thanks Alex, we're going to upgrade to 1.2.0 soon.