Network reconnect when sending custom payload on WSNDemo

Hello all,

Summary:
I am trying to make something very similar to the peer2peer demo using the WSNDemo as a starting framework.
I've made custom data structures to be passed as appCommand_t payloads, but when I send my custom packets, I get disconnected from the coordinator and then reconnected.

The details:
The chips are Atmel 1280s, or ATZB-24-A2.
I am working with BitCloud SDK version 1.14.
I have modified the router and coordinator to have the following additional payloads to AppCommands:
maximStats_t
maximCmd_t

I made these modifications in the appropriate headers
and added callbacks and such. I think I have set these up properly.

Once I have data to send to the coordinator, I build a packet and add it to the outbound queue with
appCreateTxFrame()

Using the debugger, I see appMsgSender() call
APS_DataReq(&txFrame->msgParams);
four times,
and then I am reconnecting to the network.

Please let me know which files/data would be helpful in solving this problem.

I am slowly tracing the problem back through the calls.
It looks like appLeaveNetwork is being called, but I'm not sure why yet.

Apparently:

txState = APP_MSG_TX_SENDING_FAILED_STATE;

After the 4th packet goes out.

Getting confInfo.Status of -86

Traced it as far back as I could:

static void msgSenderApsDataConf(APS_DataConf_t *confInfo)
gets confInfo of -86.

Converting that to binary:

86 is: 0101 0110
Flipping the sign bit:
1101 0110
In hex:
1101=1+4+8=13 0110=2+4=6
Error C6.

According to APSCommon.h that means:

APS_NWK_SYNC_FAILURE_STATUS            = 0xC6

The help documentation says:

APS_NWK_SYNC_FAILURE_STATUS  An NLME-SYNC.request has failed at the MAC layer.

NLME is the Network Layer Management Entity.

I found this on google:

http://people.ece.cornell.edu/land/courses/ece4760/FinalProjects/s2011/kjb79_ajm232/pmeter/ZigBee%20Specification.pdf

From the ZigBee spec:

3.2.2.22.2  When Generated
This primitive is generated whenever the next higher layer wishes to achieve
synchronization or check for pending data at its ZigBee coordinator or router.

So is it possible I am aiming my packet at a bad address?
Or perhaps the src address is invalid?

	jCommand = (AppCommand_t *)malloc(sizeof(AppCommand_t));
	jCommand->id = APP_MAXIM_STATS_ID;
	reply = (MaximStats_t *)malloc(sizeof(MaximStats_t));
	strncpy(reply->payload, inputBuffer, inputBufferSize); // Set reply payload
	reply->srcAddress = 0002; // Set reply source address, !! needs to be set from memory
	memcpy(&((*jCommand).payload), reply, sizeof(MaximStats_t));
	// Copies the pCommand to a field, updates the pointer, and makes an entry in the TX queue.
	// pointertrouble2
	if (appCreateTxFrame(&pMsgParams, &jCommand, NULL)) // and then pMsgParams is defined
	{
		memset(pMsgParams, 0, sizeof(APS_DataReq_t));

		pMsgParams->profileId               = CCPU_TO_LE16(WSNDEMO_PROFILE_ID);
		pMsgParams->dstAddrMode             = APS_EXT_ADDRESS;
		pMsgParams->dstAddress.extAddress   = 0x000321LL; // (*jCommand).payload.identify.dstAddress;
		pMsgParams->dstEndpoint             = 1;
		pMsgParams->clusterId               = CPU_TO_LE16(1);
		pMsgParams->srcEndpoint             = WSNDEMO_ENDPOINT;
		pMsgParams->asduLength              = sizeof(MaximCommand_t) + sizeof((*jCommand).id);
		pMsgParams->txOptions.acknowledgedTransmission = 1;
#ifdef _APS_FRAGMENTATION_
		pMsgParams->txOptions.fragmentationPermitted = 1;
#endif
#ifdef _LINK_SECURITY_
		pMsgParams->txOptions.securityEnabledTransmission = 1;
#endif
		pMsgParams->radius                  = 0x0;
	}
	
Last Edited: Fri. Oct 16, 2015 - 02:18 PM

Your conversion is incorrect. To convert from two's complement, you need to invert the entire number (bit by bit) and then add one:

86  == 0101 0110
inv == 1010 1001
+1  == 1010 1010 == 0xaa

And 0xaa == APS_NOT_SUPPORTED_STATUS

How big is your asduLength?
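
The same conversion can be done in code instead of by hand. This is only a sketch, assuming the debugger is showing the one-byte status as a signed 8-bit value; the function names are made up for illustration and are not part of BitCloud:

```c
#include <stdint.h>

/* Hypothetical helper: reinterpret a status that the debugger shows as a
 * signed 8-bit value as the unsigned code used in the status enums.
 * Conversion to uint8_t is defined as reduction modulo 256. */
uint8_t status_to_hex(int8_t shown)
{
    return (uint8_t)shown;
}

/* The manual method: invert every bit of the magnitude, then add one. */
uint8_t negate_twos_complement(uint8_t magnitude)
{
    return (uint8_t)(~magnitude + 1u);
}
```

Both `status_to_hex(-86)` and `negate_twos_complement(86)` give 0xAA, which matches the manual result.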

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.


Thanks Alex,

My asduLength was 138, I've since reduced it to 18 by splitting my data into individual payloads of length 8 with a sequence field.
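
For anyone wanting to do the same, here is a rough sketch of that kind of splitting. The chunk layout and the names `Chunk_t`, `CHUNK_DATA_LEN` and `split_into_chunks` are assumptions for illustration only; they are not the actual MaximStats_t layout:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_DATA_LEN 8u  /* bytes of data carried by each chunk */

/* Hypothetical chunk layout: sequence number, used length, data. */
typedef struct {
    uint8_t seq;                   /* position of this chunk in the message */
    uint8_t len;                   /* number of valid bytes in data[] */
    uint8_t data[CHUNK_DATA_LEN];
} Chunk_t;

/* Split src[0..srcLen) into chunks of at most CHUNK_DATA_LEN bytes.
 * Returns the number of chunks written; 0 means the input was empty or
 * out[] was too small to hold the whole message. */
size_t split_into_chunks(const uint8_t *src, size_t srcLen,
                         Chunk_t *out, size_t maxChunks)
{
    size_t needed = (srcLen + CHUNK_DATA_LEN - 1u) / CHUNK_DATA_LEN;
    if (needed > maxChunks || needed > 256u)  /* seq is only 8 bits wide */
        return 0;

    for (size_t i = 0; i < needed; i++) {
        size_t offset = i * CHUNK_DATA_LEN;
        size_t n = srcLen - offset;
        if (n > CHUNK_DATA_LEN)
            n = CHUNK_DATA_LEN;

        memset(&out[i], 0, sizeof out[i]);
        out[i].seq = (uint8_t)i;
        out[i].len = (uint8_t)n;
        memcpy(out[i].data, src + offset, n);
    }
    return needed;
}
```

The receiver can use `seq` to put the chunks back in order and `len` to know how much of the last chunk is real data.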

Now I am able to send messages without any visible error, but on the coordinator side I am not sure if I am receiving them.
The ID I set for my maximStats was "11", but on the coordinator, when I cast the incoming packet payload to appCommand_t, the ID inside is decimal 128, which I don't think corresponds to anything in the enum.

I also see the regular network updates (id is set to 1).

I'll have more information for you later today, and once again thank you for your help.


Maximum possible unfragmented ASDU size is 95. And according to this message fragmentation is not supported in this SDK.


Hello Alex,

I changed the code to break my messages into a smaller ASDU size and now the result of sending the messages is
APS_SUCCESS_STATUS

On the coordinator side I have a breakpoint on
static void appApsDataIndHandler(APS_DataInd_t *ind)

and my packets are arriving.


Here are the problems I am working on now:
1. Solved.
2. Solved. Redid the code for the ASDULength and turned on fragmentation.

And here is the 3rd and lowest priority problem:

	3. Get string representations of network parameters:
		I'd like to print some data from my custom packets:
		Source address
			16 bit little endian integer
		Sequence
			8 bit integer
		Payload
			char *, length stored in the packet.
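
A possible starting point for item 3, as a sketch only. The struct below is a guess at a packet layout made up for this example (the real one is not shown here), and `format_packet` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical view of a received packet. */
typedef struct {
    uint8_t srcAddress[2];  /* 16-bit address, low byte first */
    uint8_t sequence;
    uint8_t payloadLen;     /* payload is not NUL-terminated */
    char    payload[8];
} PacketView_t;

/* Writes "src=0x.... seq=N payload=..." into buf.
 * Returns snprintf's result: the length needed, excluding the NUL. */
int format_packet(char *buf, size_t bufSize, const PacketView_t *p)
{
    /* Assemble the address byte by byte so the result does not depend
     * on the byte order of the CPU running this code. */
    unsigned src = (unsigned)p->srcAddress[0] |
                   ((unsigned)p->srcAddress[1] << 8);
    int len = p->payloadLen;
    if (len > (int)sizeof p->payload)
        len = (int)sizeof p->payload;

    /* "%.*s" prints at most len characters, so no terminator is needed. */
    return snprintf(buf, bufSize, "src=0x%04X seq=%u payload=%.*s",
                    src, (unsigned)p->sequence, len, p->payload);
}
```

The resulting string can then be pushed out over whatever debug channel is available.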

Possible new problem:
4. Coordinator sometimes cannot get messages to router.
The error I see is this:
APS_NWK_ROUTE_DISCOVERY_FAILED_STATUS = 0xD0
And after a few retries:
APS_MAC_NO_ACK_STATUS = 0xE9,
or SUCCESS, but the router in the other room does not see that (false) SUCCESS.

I thought I had solved problem 2, the router jumping off the network, as it ran fine for about 30 minutes on my desk. I'll bring it back in here and run through the debugger.

Update:
I am watching the router attempt to join the network here on my desk, with the debugger.
This is the error code from AppZDOStartNetworkConf
-22, which is:
APS_MAC_NO_BEACON_STATUS = 0xEA

The devices are less than 20cm apart, so I don't think the range is an issue.

I reset both and the router hopped on for a bit, sent a few packets across and then got a failed transmit error (-88), and left the network.
When it tries to rejoin, it repeatedly gets the error "-128", which is 0x80 in hex and is not in the enum.

Looking at the router's memory in the debugger,
it seems like the packets coming from the coordinator are malformed.
They have an ID of 0, which is not what I intended to send.
After this packet is sent, the router disconnects, and the Coordinator crashes.

It is 1AM here, time to take a break.
Night all, wish me luck Alex.


Quote:
Set CS_UNIQ_ID to 1
There is no such parameter in BitCloud.

Quote:
Set shortaddr in config file
Set proper Little Endian destination on coordinator.
Can you be more specific here?

Can you share your application?


I am using the WSNDemo application.
The unique-ID define in configuration.h was
#define CS_NWK_UNIQUE_ADDR 1
Sorry for the confusion.

For the short destination address, I had originally set it to
pMsgParams->dstAddress.shortAddress = 0x0001;
But after that forum post I changed it to:
pMsgParams->dstAddress.shortAddress = CPU_TO_LE16(0x0001);

I will PM you with my router and coordinator projects.


CPU_TO_LE16() is only needed on AVR32.
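
The reason is byte order: the 8-bit AVR toolchain already stores multi-byte integers low byte first, while AVR32 is big endian. As a sketch of what such a conversion has to do (this is not BitCloud's actual macro, and `to_le16_sketch` is a made-up name):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a CPU-to-little-endian conversion.
 * Returns a value whose bytes in memory are low byte first, whatever
 * the byte order of the CPU. On a little-endian CPU this returns the
 * input unchanged. */
uint16_t to_le16_sketch(uint16_t value)
{
    uint8_t bytes[2];
    uint16_t out;

    bytes[0] = (uint8_t)(value & 0xFFu);  /* low byte first */
    bytes[1] = (uint8_t)(value >> 8);     /* high byte second */
    memcpy(&out, bytes, sizeof out);
    return out;
}
```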


Was the above APS_MAC_NO_BEACON_STATUS ever resolved?

Code that has run on many versions of BitCloud up to 1.12 now gives this error on 1.14. I see no obvious reason for it to occur.

Same circumstances. Code runs perfectly well as a Co-Ordinator but gives this error when configured as Router. Configuration is via 'sliders' and code is compiled with 'All' libs.


Do you set CS_UID for each device? Can you get a sniffer log?


CS_UID is set for each device.

It is read via CS_ReadParameter(CS_UID_ID, &extAddr); and then printed to debug output via USART. The serial number chip is being read correctly, i.e. extAddr contains the expected value.
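
For reference, a sketch of the kind of formatting used for that debug print. It assumes the extended address fits in a uint64_t, and `format_ext_addr` is a hypothetical helper name, not a BitCloud function:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit extended address as 16 hex digits.
 * buf must hold at least 17 bytes. The address is split into two
 * 32-bit halves because some embedded printf implementations do not
 * support 64-bit conversions. */
int format_ext_addr(char *buf, size_t bufSize, uint64_t extAddr)
{
    unsigned long hi = (unsigned long)(extAddr >> 32);
    unsigned long lo = (unsigned long)(extAddr & 0xFFFFFFFFul);
    return snprintf(buf, bufSize, "%08lX%08lX", hi, lo);
}
```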

Sniffer log will be too busy, there's a lot of traffic.


FWIW the task 'states' go through 'initing' then 'starting' with:

APP_EVENT_NETWORK_STARTED but ZDO_StartNetworkConf_t.status == 0xea


dalew1 wrote:
FWIW the task 'states' go through 'initing' then 'starting' with:

APP_EVENT_NETWORK_STARTED but ZDO_StartNetworkConf_t.status == 0xea

This just can't happen according to the application logic, at least in the original application.

So this happens only in heavily loaded networks?


No, this only happens when linked against BitCloud 14.

BitCloud 12 works under identical circumstances and with identical code.

I've checked and rechecked configuration.h files from both and they match. No security is being used.

Last Edited: Tue. Jan 29, 2013 - 05:55 AM

My guess would be that some CS_* buffer sizes changed the way they define size and configuration.h no longer defines necessary values. I'd go over csDefaults.h and check if there are some new CS_* parameters that were not in BC 1.12.


Will investigate CS parameters. I've inspected pretty much everything else.


Nothing obvious with CS parameters.

It's not clear what the resolution was to the original poster's problem. Is this not similar or related?


I don't think it was solved, at least not publicly.

But at the same time I don't think there is anything wrong with BitCloud either. So I'd try to reproduce it on a smaller network where you can actually get a meaningful log.


The test network is not large (6 nodes) but there is a fairly large amount of data being moved around.

The facts are still that BitCloud 12 works.

Also that BitCloud 14 Co-Ord works but Router doesn't. Selecting between them is via CS_WriteParameter(CS_DEVICE_TYPE_ID, &deviceType); where deviceType is either DEVICE_TYPE_ROUTER, DEVICE_TYPE_COORDINATOR.

I'm now thinking there's something left over in EEPROM that is not compatible from 12 to 14.

I can clear EEPROM by programming with ISP but that won't help in the field where the upgrade is via a second CPU and custom boot-loader in the ZigBit.

Is there a function call which will clear old parms?


Again, can you try it with just two devices? One C and one R? There won't be any data if the device can't join.

BitCloud does not store anything in the EEPROM unless you explicitly use the Commissioning and Power Failure features or the user area. The EEPROM format is different between BC 1.12 and BC 1.14, so even if there is something stored there, BitCloud will erase it. I don't think it is a problem in this case.

I still think it is a buffer size problem, but the only way to find out is to debug.


I'll look into buffer sizes.

My confidence in BC 14 took a knock when WSNDemo wouldn't compile under Linux via 'make'. The incorrect makefile is specified and the dependencies are poor meaning code doesn't always compile. I found these out when I used the 'sample' makefiles from WSNDemo as a starting point for my code.

In addition the define to exclude BC usart support was apparently not fully tested because it left unresolved functions on linking.

I've reported all these (with fixes) via Atmel support.

BTW the reason I wrote my own USART handling is because the code paths for USART IRQs in BC for this are ridiculously long. There are also function calls from inside the IRQ which is a very inefficient and unnecessary way to go about it. I had long discussions on this with BC developers at the time but they seemed reluctant to change things.


I don't want to argue here, but the only reason BitCloud ISR handling is so complicated is to make the ISRs short. All functions used in ISRs are inlined and mostly only set flags. This is done so as not to interfere with radio functionality, which is time-sensitive. With custom UART code, you are on your own. By design, an ISR should not take more than 100 us.
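
To illustrate the general "ISR only sets a flag" pattern (a generic sketch, not BitCloud's code; all names here are made up, and the ISR body is written as a plain function so it can be exercised on a PC):

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the interrupt handler and the main loop, hence volatile. */
static volatile bool    rxPending = false;
static volatile uint8_t rxByte    = 0;

/* Body of a hypothetical receive ISR: store the byte, set a flag, return.
 * On a real target this would be called from the interrupt vector. */
static inline void uart_rx_isr_body(uint8_t received)
{
    rxByte = received;
    rxPending = true;
}

/* Called from the main loop ("task" context), where slow work belongs.
 * Returns true if a byte was consumed into *out. */
bool uart_task_poll(uint8_t *out)
{
    if (!rxPending)
        return false;
    /* A real target would disable interrupts around these two lines so a
     * new byte cannot arrive between the read and the flag clear. */
    *out = rxByte;
    rxPending = false;
    return true;
}
```

The interrupt path stays a couple of stores long, and everything else happens when the main loop gets around to polling.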


Excluding security, which isn't used, these are the changes in csDefaults.h

26c26
< #define CS_MAC_FRAME_RX_BUFFER_SIZE            132
---
> #define CS_MAC_FRAME_RX_BUFFER_SIZE            133
177a178,180
> #if !defined CS_NWK_PASSIVE_ACK_AMOUNT
>   #define CS_NWK_PASSIVE_ACK_AMOUNT           8
> #endif // CS_NWK_PASSIVE_ACK_AMOUNT
292a296,298
> #ifndef CS_ZCL_BUFFER_SIZE
>   #define CS_ZCL_BUFFER_SIZE (APS_MAX_ASDU_SIZE)
> #endif
298c304,310
<   #define CS_MAX_NEIGHBOR_ROUTE_COST       8U
---
>   #define CS_MAX_NEIGHBOR_ROUTE_COST       5U
> #endif
> #ifndef CS_MAX_LINK_ROUTE_COST
>   #define CS_MAX_LINK_ROUTE_COST           8U
> #endif
> #ifndef CS_NWK_LEAVE_REQ_ALLOWED
>   #define CS_NWK_LEAVE_REQ_ALLOWED         true

I'm sort of surprised that there are not more changes.

I'm out of ideas without a sniffer log.

It is possible that BC 1.14 ignores Beacon frames from BC 1.12 because it thinks they are malformed, but the only way to verify it is to have a sniffer log.


I won't argue over ISR code either :-) The following is just my opinion and result of a lot of experience.

There isn't as much inlining as there should be, and it's possible to write the ISRs much more efficiently and to defer some work to 'task' time execution.

In fact, the radio chip ISR was also guilty of similar problems, making calls out rather than inlining. On one version of the stack I shortened it by re-organising the code.

Realistically, the reason the code is structured as it is, is to facilitate a HAL and the various chipsets it supports. It's more convenient to set up #ifdef and switch statements to support different radio chips than it is to duplicate IRQs for each.

So, my conclusion was that it's a convenience for maintenance, and performance takes a hit, however large or small.


Can you point to at least one function that is called from an ISR, especially the radio ISR? There are even special inline versions of SPI access functions specifically for that reason.

I don't like BitCloud complexity either, but my experience shows that it pays off.


I will set up a sniffer. The drawback is that it requires a (rarely used) Windows machine and all development here is under Linux.

The beacon frames may be a red-herring. I had all nodes running with BC 14: Rs and C. That combo doesn't work either.


Can't be specific on ISRs since I have source under NDA, etc. but look around for the function phyDispatcheRfInterrupt().

Am waiting for BC 14 source which may help to debug this problem as well.


I've just compiled the sample application under Linux. The wrong case in the main Makefile is there because developers who use Linux don't actually use that main file. Not to mention that Linux is not officially supported.

phyDispatcheRfInterrupt() calls HAL_SelectRfSpi()/HAL_DeselectRfSpi(); no NDA violation, it is obvious from avr-objdump output. That was an unfortunate result of opening the HAL in an otherwise closed stack. That was not part of the original design. ISRs are still under 100 us, as ensured by the BEGIN_MEASURE / END_MEASURE macros.


From sniffer output:

Co-Ord using BC14, Router BC12:
Router sends Beacon Request, Co-Ord responds with Beacon.
Router stops requesting beacons.

Co-Ord using BC12, Router BC14:
Router sends Beacon Request, Co-Ord responds with Beacon.
Router repeatedly requests beacons.
Co-Ord responds each time.

In both cases the contents of Beacon Request and response are identical.

It appears that the BC14 Router stack is rejecting the Beacon Response because it keeps requesting at approx 1500ms intervals.


Ugh! Think I've been chasing a non-problem.

It seems that WSNDemo has changed from using RF230 to RF230B ! I used the makefiles from there as a quick route to getting up and running.

More than likely this is causing a problem.

edit: which means that the current BC14 won't run properly on older Meshbean boards.

Last Edited: Tue. Jan 29, 2013 - 10:05 AM

This is really frustrating. Devices now work with RF230 library, as opposed to RF230B. There are numerous errata for RF230 that have work-arounds. These are most likely causing the problem.

There is no reason that the library cannot determine this dynamically. Register 0x1D on the RF230 chip gives its revision.

The library should either fail gracefully or be combined to handle both chip revisions.
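
Something along these lines is what I have in mind, as a rough sketch. The caller is expected to read register 0x1D with its own SPI access routine. The revision values 1 and 2 below are an ASSUMPTION that should be checked against the datasheet, and all names are hypothetical:

```c
#include <stdint.h>

#define RF230_REG_VERSION_NUM 0x1Du  /* revision register mentioned above */

typedef enum {
    RADIO_REV_A,
    RADIO_REV_B,
    RADIO_REV_UNKNOWN
} RadioRev_t;

/* Classify the value read from RF230_REG_VERSION_NUM.
 * ASSUMPTION: revision A reports 1 and revision B reports 2. */
RadioRev_t classify_radio_revision(uint8_t versionNum)
{
    switch (versionNum) {
    case 1:  return RADIO_REV_A;
    case 2:  return RADIO_REV_B;
    default: return RADIO_REV_UNKNOWN;
    }
}
```

A library built for one revision could call this once at start-up and refuse to start, with a clear error, when it finds the other.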

Do we assume ZigBit modules only ever use RF230 and not the B revision? What happens to field upgrades in this case? It's a potential minefield.


Yes, at some point the 230B was used as the default radio chip for ZigBits. I guess the assumption is that all ZigBits with a 230A in them (Meshnetics ZigBits) are already in products that do not need an upgrade, since they are working and have been working for a while now.

The way to solve this is to get the new source code and compile it yourself, or to explain the situation to avr@atmel.com and ask for a library with 230A support.

Do you really need some features from BC14 on older products?


There are significant fixes in BC14 which are not radio chip specific, those are useful. New features are not required.

The RF230A libraries are available in BC14, no need to build them. Although I hear now from support that they will be discontinued.

What is most annoying is that it's trivial for either the RF230 or RF230B lib code to protect itself by checking the revision-id register. IMO this is an obvious runtime, once-off check.


A lot of things are trivial, but there are much higher priority tasks. Library is built this way for historical reasons.
