Searching for a string in a buffer


Hi All

 

I noticed that the following statement has failed a few times when searching for this particular string in the buffer. I know that the buffer is getting written to.

 

    if (strstr(RX_COM, "2040808") != NULL)
    {
        IOT_STATE = IOT_SET;
    }

 

Is there any better way to perform such a search on an XMEGA?

This topic has a solution.

Thanks

Regards

DJ

Last Edited: Thu. Jan 28, 2021 - 09:16 AM

I've always been a great fan of strstr() for things like this. It always works if used right. Could it be your buffer is not really a "string" (for example, could there be hidden, embedded 0x00 bytes, which would cause things like strstr() to fail)?

 

As always you will probably want to get out your JTAG and have a look at what is actually in RX_COM.

 

Actually "RX_COM" is a strange name for a char array - being in all upper case it suggests a macro - are you sure that resolves to something that is actually a character array? A very common error in programming is the classic:

char * rx_buff;
rx_string(rx_buff);

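which fails for the obvious reason: rx_buff is an uninitialised pointer, so rx_string() writes through it into memory that was never set aside for it!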
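For contrast, here is a minimal sketch of the working form. rx_string() is the hypothetical receive routine from the snippet above; the body given here is only a stand-in that copies a canned reply, and the size parameter is an assumption added so that the routine cannot overrun the array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a receive routine: copies a canned reply into
 * dest, never writes more than size bytes, and always NUL-terminates. */
static void rx_string(char *dest, size_t size)
{
    static const char reply[] = "OK 2040808";
    if (size == 0) {
        return;
    }
    strncpy(dest, reply, size - 1);
    dest[size - 1] = '\0';
}

/* Real storage: an array, not an uninitialised pointer. */
static char rx_buff[32];

/* Returns 1 if the marker is present in the received text, 0 otherwise. */
static int receive_and_check(void)
{
    rx_string(rx_buff, sizeof rx_buff);
    return strstr(rx_buff, "2040808") != NULL;
}
```

The point is only that rx_buff names actual storage, and that the callee is told how big that storage is.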


clawson wrote:
get out your JTAG and have a look at what is actually in RX_COM

+99

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

The buffer is declared as 

 

char RX_COM[MAX_GSMUART_RX_BUFFER];

I will check if the buffer has a 0x00 before the bytes that need to be read.

Thanks

Regards

DJ


The key thing with strstr() or any of the C lib "str" functions is that they need to see "C strings". So that is going to be one or more characters always ending in 0x00. If that's not the case - either no 0x00 at the end or some "hidden" characters of noise in the middle before a last 0x00 then things will behave "oddly".
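To make the "oddly" concrete, here is a rough sketch of a length-bounded search that does not care about embedded 0x00 bytes. The name find_in_bytes() and the sample data are made up for illustration; the sketch assumes you know how many bytes have been received.

```c
#include <stddef.h>
#include <string.h>

/* Search the first len bytes of buf for the text in needle.
 * Unlike strstr(), this does not stop at a 0x00 byte in buf.
 * Returns a pointer to the first match, or NULL if there is none. */
static const char *find_in_bytes(const char *buf, size_t len, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0) {
        return buf;
    }
    if (nlen > len) {
        return NULL;
    }
    for (size_t i = 0; i + nlen <= len; i++) {
        if (memcmp(buf + i, needle, nlen) == 0) {
            return buf + i;
        }
    }
    return NULL;
}

/* Example data: two stray NUL bytes ahead of the text of interest. */
static const char sample[] = {
    0x00, 0x00, 'I', 'D', '2', '0', '4', '0', '8', '0', '8', '\r', '\n'
};
```

With this data strstr(sample, "2040808") sees an empty string and returns NULL, while find_in_bytes(sample, sizeof sample, "2040808") finds the text at offset 4. That said, a search like this only hides the symptom; it is still worth finding out where the stray zeros come from.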


So why the ALL-CAPS name?

 

You also need to check that the buffer does actually contain what you thought it should, and that it is properly terminated with a NUL.

 

EDIT

 

clawson beat me to it

Last Edited: Wed. Jan 27, 2021 - 04:24 PM

The wonder of open-source software is that you can look at the source. Library functions don't need to be black box. You can get answers to questions like "I wonder how x works", "why doesn't x work, in this case ?", or "how can I reuse/rehash/rework x to do something slightly different". 

 

You could reimplement it in your app with some debug/instrumentation to see why it unexpectedly bails out.

 

You could compare various implementations to see if one is 'better' than another, for certain cases (and for whatever definition of 'better').

 

The only problem with this advice is that you probably need to be able to read assembler :(

 

There are also multiple google search results for strstr that link back to this site.

 


obdevel wrote:
The only problem with this advice is that you probably need to be able to read assembler :(
For the curious, strstr() is here:

 

http://svn.savannah.gnu.org/view...

 

(and yes, assembler).


obdevel wrote:
The wonder of open-source software is that you can look at the source.

Absolutely.

 

You can get answers to questions like ... "why doesn't x work, in this case ?"

In this case, the answer most likely lies in what is being supplied - no need to open the black Pandora's box ...


Thanks

 

I will wait for it to fail again to see what is actually in the buffer. 

 

Does it make a difference if CAPS are used, or is it simply good practice not to use them?

 

 

Thanks

Regards

DJ


If you know your string is bounded by a known delimiter, consider strtok() as a way to find your strings.

Jim

https://www.tutorialspoint.com/a...

 

(Possum Lodge oath) Quando omni flunkus, moritati.

"I thought growing old would take longer"

 

Last Edited: Wed. Jan 27, 2021 - 04:54 PM

OK, it seems that the buffer gets 0x00, 0x00 before the characters I am trying to read. I need to work out what is causing this.

 

 

Thanks

Regards

DJ


So am I correct in understanding that with strtok() 0x00 has no effect?

Thanks

Regards

DJ


djoshi wrote:
am I correct in understanding that with strtok() 0x00 has no effect?

No - you are not correct

in #5, clawson wrote:
The key thing with strstr() or any of the C lib "str" functions is that they need to see "C strings". So that is going to be one or more characters always ending in 0x00.

strtok() is one of the C lib "str" functions.

 

This is standard 'C' stuff - you can look it up in any 'C' reference; eg,

 

http://www.cplusplus.com/reference/cstring/strtok/
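A small sketch may make this easier to see. split_tokens() is a made-up helper, not something from the thread; it just wraps the standard strtok() loop. Since strtok() works on a C string, it stops at the first 0x00 exactly as strstr() does.

```c
#include <stddef.h>
#include <string.h>

/* Split text in place on any character in delims and store up to max
 * token pointers in out. Returns the number of tokens stored.
 * Note: strtok() writes NULs into text and keeps hidden static state. */
static size_t split_tokens(char *text, const char *delims, char *out[], size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(text, delims);
         tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        out[n++] = tok;
    }
    return n;
}
```

Given the writable string "ID,2040808,OK" and the delimiter ",", this yields three tokens. Given the bytes 'A' ',' 0x00 'B' ',' 'C' it yields only "A", because everything after the embedded 0x00 is never looked at.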

 


djoshi wrote:
Does it make a difference if CAPS are used, or is it simply good practice not to use them?

It is a near-universal convention in 'C' programming that ALL CAPS is reserved for preprocessor macros.

 

In other words, don't use ALL CAPS for variables, function names, etc.


If you have control over the source system, consider delimiting the messages, e.g. <xxxxxxxxx>.

 

You can then discard 'noise' until you reach the start-of-message, and then read chars until you reach the end-of-message. You can also bail out if the end-of-message delimiter isn't encountered in a reasonable period of time, or it would overrun your buffer. Clearly, you need to choose delimiter chars that wouldn't appear in the message body.
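As a rough sketch of that idea, assuming '<' and '>' as the delimiters and a 32-character maximum payload (both assumptions, as are all the names below), a character-at-a-time receiver could look like this. The timeout mentioned above is left out, since it depends on whatever timer the application has.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_MAX 32u   /* assumed maximum payload length */

typedef struct {
    char    data[FRAME_MAX + 1]; /* payload plus terminating NUL */
    uint8_t len;
    bool    in_frame;
} frame_rx_t;

/* Feed one received character. Returns true when a complete <...> frame
 * has just been received; the payload is then in rx->data as a C string. */
static bool frame_rx_feed(frame_rx_t *rx, char c)
{
    if (c == '<') {                 /* start (or restart) of a message */
        rx->in_frame = true;
        rx->len = 0;
        return false;
    }
    if (!rx->in_frame) {
        return false;               /* noise between messages: discard */
    }
    if (c == '>') {                 /* end of message */
        rx->data[rx->len] = '\0';
        rx->in_frame = false;
        return true;
    }
    if (rx->len >= FRAME_MAX) {     /* would overrun: abandon this frame */
        rx->in_frame = false;
        return false;
    }
    rx->data[rx->len++] = c;
    return false;
}
```

Because the payload is only terminated and handed over once the closing delimiter arrives, the searching code never sees a half-received message.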

 


Could the device that is sending the data that is going into the RX_COM buffer be sending numeric data as binary numbers instead of ASCII digits? For example, could the string "2040808" be arriving as 0x02 0x00 0x04 0x00 0x08 0x00 0x08 instead of (the correct ASCII form of) 0x32 0x30 0x34 0x30 0x38 0x30 0x38?

If this data were binary numbers, then there would be 0x00 values in the RX_COM buffer that would act as string terminators and end the strstr() search prematurely.


or perhaps the characters are being sent "backwards", or are being loaded into the buffer "backwards" - so that   "2040808" is actually stored as "8080402" ...

This reply has been marked as the solution. 

Simonetta wrote:

Could the device that is sending the data that is going into the RX_COM buffer be sending numeric data as binary numbers instead of ASCII digits? For example, could the string "2040808" be arriving as 0x02 0x00 0x04 0x00 0x08 0x00 0x08 instead of (the correct ASCII form of) 0x32 0x30 0x34 0x30 0x38 0x30 0x38?

If this data were binary numbers, then there would be 0x00 values in the RX_COM buffer that would act as string terminators and end the strstr() search prematurely.

 

No, it's ASCII for sure.

 

I have managed to find out what was wrong.

 

RX_COM is the buffer for the bytes I receive on the UART.

 

It seems that in a certain case (not every time), while processing another message received before the "2040808", I clear the buffer (setting every byte to 0) before the entire message has arrived, because I am only interested in the start of that message. This means the next write location has already advanced (e.g. to 5) while the start of the buffer holds 0, 0, so the search for "2040808" fails.

 

I am now waiting for the previous message to fully arrive, before progressing and it seems to have solved the issue.

Thanks

Regards

DJ


djoshi wrote:
I clear the buffer (setting every byte to 0) before the entire message has arrived, because I am only interested in the start of that message. This means the next write location has already advanced (e.g. to 5) while the start of the buffer holds 0, 0, so the search for "2040808" fails.
But surely when you "clear" the buffer you reset the write index to the first element?

 

It sounds an awful lot like what you really require is a ring buffer. Maybe see:

 

code: https://github.com/abcminiuser/l...

docs: http://www.fourwalledcubicle.com...


clawson wrote:
surely when you "clear" the buffer you reset the write index to the first element?

You'd have thought.

 

Sounds like there's bits of the software all over the place hacking directly with the buffer - rather than having a proper, well-defined, modular interface.

 

It sounds an awful lot like what you really require is a ring buffer.

Indeed.

 

Worked on a project recently where they had a bizarre buffering arrangement that would "pull" a byte from the start of the buffer (array), and then shift the entire contents of the rest of the array down by one!

 


awneil wrote:
and then shift the entire contents of the rest of the array down by one!
The thing is that when you do computer science in college you are taught about things like ring buffers so there's no real excuse for not knowing about them and using them. Perhaps the issue is that people can't see when it would be relevant to use one?


If it needs to be fast and robust, keep a state counter in the RX routine that is zero when not in sync. Store "2040808" in an array that the counter indexes: when a '2' is received, advance the counter and look for '0', and so on. Clear the counter when a wrong character is received, and set a sync flag when the last '8' is received.
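A minimal sketch of that state counter follows; the names are made up for illustration. One detail goes beyond a plain "clear on wrong char": if the wrong character is itself a '2', it may be the start of a new match, so the counter restarts at 1 rather than 0. That simple rule is enough here only because '2' appears nowhere else in "2040808"; a pattern with repeated prefixes would need a proper fallback table.

```c
#include <stdbool.h>
#include <stdint.h>

static const char    pattern[] = "2040808";
static const uint8_t pattern_len = sizeof pattern - 1;

static uint8_t match_state;   /* number of pattern characters matched so far */

/* Call once per received character, e.g. from the RX interrupt handler.
 * Returns true on the character that completes the pattern. */
static bool match_feed(char c)
{
    if (c == pattern[match_state]) {
        match_state++;
        if (match_state == pattern_len) {
            match_state = 0;
            return true;        /* caller can set its sync flag here */
        }
        return false;
    }
    /* Mismatch: restart, but a '2' may itself begin a new match. */
    match_state = (c == pattern[0]) ? 1 : 0;
    return false;
}
```

Since nothing is ever treated as a C string, stray 0x00 bytes in the stream are just characters that do not match.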


clawson wrote:
Perhaps the issue is that people can't see when it would be relevant to use [ring buffers] 

Now you mention it, they even called it "the circular buffer" - despite the fact that it clearly wasn't!

 


I use the terms "ring buffer" and "circular buffer" interchangeably myself. A ring is a circle is a ring is a circle....

 

It clearly is a ring/circle as there is no end.


clawson wrote:
I use the terms "ring buffer" and "circular buffer" interchangeably myself.

likewise


You don't have to use 0x00 as the value that clears the buffer.  You could fill the buffer with 0xff to indicate that it is holding no data.


I must be missing something. To clear a buffer all you have to do is reset the write index. Why would it matter what the unused bytes held whether it be 0x00, 0xFF or whatever? There's no point wasting space and time to write data into "unused" locations anyway.
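As a sketch of what that looks like for a simple linear receive buffer (rx_buf, rx_len and the 64-byte size are assumptions for illustration, not the OP's code):

```c
#include <stddef.h>
#include <string.h>

#define RX_SIZE 64u   /* assumed buffer size */

static char   rx_buf[RX_SIZE];
static size_t rx_len;          /* write index: number of bytes stored */

/* Append one received byte, keeping rx_buf a valid C string.
 * Returns 0 on success, -1 if the buffer is full. */
static int rx_put(char c)
{
    if (rx_len >= RX_SIZE - 1) {
        return -1;
    }
    rx_buf[rx_len++] = c;
    rx_buf[rx_len] = '\0';
    return 0;
}

/* "Clear" the buffer: reset the write index and terminate at the start.
 * There is no need to fill the remaining bytes with 0x00 or 0xFF. */
static void rx_clear(void)
{
    rx_len = 0;
    rx_buf[0] = '\0';
}
```

If rx_put() runs in an interrupt handler and rx_clear() in the main loop, the index would also need to be volatile and the clear protected against the interrupt; that is left out here to keep the sketch short.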


As Cliff said.

 

My ring buffers have a read and a write pointer; if those two are the same then the buffer is empty. If data comes in faster than it can be handled, eventually the write pointer will catch up with the read pointer and a buffer full of data will be unseen (it's there, but it's invisible). This means I have to size the buffer for the relative speeds of reading and writing, as my applications don't have the possibility of flow control.

 

In practice, most of my serial data is handled on a character-by-character basis anyway, so there's rarely more than one or two outstanding characters in the buffer. Marking buffer locations as empty by writing a special value to them feels broken, though I suppose there may be a need in some circumstances. I don't see them though; the relative values of the read and write pointers tell you all you need to know, with the possible exception of 'oops, you're overwriting data'.

 

Neil


barnacle wrote:
If data comes in faster than it can be handled, eventually the write pointer will catch up with the read pointer and a buffer full of data will be unseen

seems an odd approach?

 

more common:

  • When the buffer is full, stop accepting[1] new data - all the existing data remains visible & available
  • When the buffer is full, the oldest data is discarded as new data is added - so the most recent buffer-full remains visible & available

 

 

[1] either assert some sort of flow control, or just discard new data.
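The two policies above can be sketched side by side. This is only an illustration with made-up names and an assumed capacity of 8; it keeps a count alongside the indices so that full and empty are easy to tell apart, and it ignores interrupt safety.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 8u   /* assumed capacity */

typedef struct {
    uint8_t data[RB_SIZE];
    uint8_t head;    /* next write position */
    uint8_t tail;    /* next read position */
    uint8_t count;   /* bytes currently stored */
} ring_t;

/* Policy 1: when full, reject the new byte (the newest data is lost). */
static bool ring_put_drop_new(ring_t *rb, uint8_t b)
{
    if (rb->count == RB_SIZE) {
        return false;
    }
    rb->data[rb->head] = b;
    rb->head = (uint8_t)((rb->head + 1u) % RB_SIZE);
    rb->count++;
    return true;
}

/* Policy 2: when full, discard the oldest byte to make room. */
static void ring_put_overwrite(ring_t *rb, uint8_t b)
{
    if (rb->count == RB_SIZE) {
        rb->tail = (uint8_t)((rb->tail + 1u) % RB_SIZE);
        rb->count--;
    }
    rb->data[rb->head] = b;
    rb->head = (uint8_t)((rb->head + 1u) % RB_SIZE);
    rb->count++;
}

/* Remove the oldest byte into *out. Returns false if the ring is empty. */
static bool ring_get(ring_t *rb, uint8_t *out)
{
    if (rb->count == 0) {
        return false;
    }
    *out = rb->data[rb->tail];
    rb->tail = (uint8_t)((rb->tail + 1u) % RB_SIZE);
    rb->count--;
    return true;
}
```

Pushing the values 0 to 9 into an empty ring leaves 0..7 in it under the first policy and 2..9 under the second.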


It's not clear to me whether the OP has a continuous stream of bytes, or whether there is some message 'structure' implied, even if it's just lines terminated with cr/lf.

 

If the latter, clearly the buffer must be sized to cope with at least the largest message length. You might even have a ring buffer where the items are lines or messages.

 

I use ring buffers all the time for my CAN-based systems. The messages are placed in a buffer by the ISR so I can cope with a flurry if the code is busy elsewhere. I tend to drop excess messages rather than overwrite, but I do maintain a high-water-mark variable so I can see if the buffer has been sized adequately: (hwm = max(hwm, (head - tail))).

 

I accept that there are times where it just can't cope with 'drinking from the firehose'. There's only so much memory available for buffers and this isn't safety-critical stuff.

 

 

 


barnacle wrote:
My ring buffers have a read and a write pointer; if those two are the same then the buffer is empty. If data comes in faster than it can be handled, eventually the write pointer will catch up with the read pointer and a buffer full of data will be unseen (it's there, but it's invisible).
After reading the first sentence I was going to say that write=read can mean either "empty" or "full", but you go on to recognize this in what follows. To be honest you should do what Dean does in:

As well as the read/write indices/pointers (your choice) also keep a count. That will allow you to distinguish between full/empty when read=write. Count will either be 0 (empty) or "buffer_size" (full).

 

That's what allows Dean to also offer:

static inline bool RingBuffer_IsEmpty(RingBuffer_t* const Buffer)
{
    return (RingBuffer_GetCount(Buffer) == 0);
}

static inline bool RingBuffer_IsFull(RingBuffer_t* const Buffer)
{
    return (RingBuffer_GetCount(Buffer) == Buffer->Size);
}

I note that in his RingBuffer_Insert() he's not checking count/overflow, so he's presumably expecting the user to do a check (IsFull?) prior to the insert if concerned about possible overflow. That will always raise the question of exactly how to handle overrun. One approach is to "remove()" one (the oldest) before inserting. The other is simply not to insert() if isFull==true, so it's the newest, not the oldest, that is lost.

 

EDIT: Yup the documentation for isFull:

 

http://www.fourwalledcubicle.com...

 

does say it should be used to check before insertion.

Last Edited: Fri. Jan 29, 2021 - 10:30 AM

clawson wrote:
As well as the read/write indices/pointers (your choice) also keep a count

 

Possible downside to that is, if you're writing into the buf in an ISR and reading out in non-interrupt code (or vice versa), adding a count means you then need interrupt blocking in the non-interrupt code where you didn't before.

The downside to using the condition read==write to mean empty is that you have to leave one space empty, i.e. the buf is full when you have BUFSIZE-1 entries; otherwise you can't tell whether read==write means empty or full.

(Or you can use a slightly different method and let the indices wrap at uint8_t size, i.e. modulo 256, and only apply modulo bufsize to the indices when accessing buf. Then you can have it completely full, but for this to work you really are restricted to buf sizes that are a power of 2, i.e. 256 has to be an exact multiple of the bufsize.)
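A sketch of that second method, with made-up names and an assumed size of 16: the indices are never reduced when stored, only masked when used to access the array. With uint8_t indices the size has to be a power of two no larger than 128, so that "full" (count == BUF_SIZE) is still distinguishable from "empty" (count == 0).

```c
#include <stdbool.h>
#include <stdint.h>

#define BUF_SIZE 16u                  /* must be a power of two, at most 128 */
#define BUF_MASK (BUF_SIZE - 1u)

static uint8_t buf[BUF_SIZE];
static volatile uint8_t wr;   /* free-running, written only by the producer */
static volatile uint8_t rd;   /* free-running, written only by the consumer */

static uint8_t buf_count(void)
{
    return (uint8_t)(wr - rd);        /* wraps correctly modulo 256 */
}

static bool buf_put(uint8_t b)        /* producer side, e.g. the ISR */
{
    if (buf_count() == BUF_SIZE) {
        return false;                 /* full: all BUF_SIZE slots in use */
    }
    buf[wr & BUF_MASK] = b;
    wr++;
    return true;
}

static bool buf_get(uint8_t *out)     /* consumer side, e.g. the main loop */
{
    if (buf_count() == 0) {
        return false;                 /* empty */
    }
    *out = buf[rd & BUF_MASK];
    rd++;
    return true;
}
```

Each index has exactly one writer and is a single byte, which is what lets this arrangement get by without a shared count on an 8-bit AVR; treat that as an assumption to verify for your own ISR/main-loop split rather than a guarantee.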


MrKendo wrote:
Possible downside to that is, if you're writing into the buf in an ISR and reading out in non-interrupt code (or vice versa), adding a count means you then need interrupt blocking in the non-interrupt code where you didn't before.
Again I think Dean's implementation handles this well.


I've used all those methods, Cliff, but for unstructured data when it doesn't matter if stuff gets lost, it's fine and less complex.

 

Most of my input is in response to my outputs and in short chunks.

 

Neil


Simonetta wrote:

You don't have to use 0x00 as the value that clears the buffer.  You could fill the buffer with 0xff to indicate that it is holding no data.

 

I think I will change this to 0xFF, to avoid any issue like the one I have had.

 

The reason why I cleared the buffer was to avoid any process reading a message that has already been read.

Thanks

Regards

DJ


djoshi wrote:
The reason why I cleared the buffer was to avoid any process reading a message that has already been read.
If you think that could happen there is something seriously FUBAR in your software design!

 

It's a bit like papering over the cracks!


clawson wrote:
there is something seriously FUBAR in your software design!

Or a lack of any considered design in the first place?

 

See #21
