finding numeric position of a substring

Go To Last Post
8 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hello, using GCC, the following test code seems to be working fine (reports fleas found at 14)...was wondering if this is the most efficient way of doing this...perhaps there is a direct numerical returned value available in some function (I think I looked at all of the GCC string functions).

 

 char search[]="fleas";  //looking for fleas
 char list[]="cat/mouse/dog/fleas/house/yugo/toyota";
 char *pos;
 pos=strstr(list,search);   // search list for matching string position, returns a pointer
 bin2ascii(pos-&list[0]);   //spit out position # (0....x)  to ASCII (need to add range check for mismatch)

spits out "14", as expected, a mismatch spits out a number not in the range of the list length.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Wed. Aug 5, 2015 - 08:16 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

was wondering if this is the most efficient way of doing this

Well it's exactly the same way I would have chosen to do this. BTW "list" is the same as "&list[0]" so your subtraction can be a lot simpler. In fact I'd probably just re-order things to be:

pos = strstr(list,search) - list;   // search list for matching string position
bin2ascii(pos);   //spit out position # (0....x)  to ASCII (need to add range check for mismatch)

So that "pos" really is an index (even if it is held in a char *).

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Clawson: Good idea, although I got 2 warnings from it:

pos=strstr(list,search)-list;......assignment makes pointer from integer without a cast 

bin2ascii(pos)......argument 1 of bin2ascii makes integer from pointer without a cast 

 

so I simply changed the original definition of pos to:  int pos; and all is well, no warnings. 

So it seems any subtraction of 2 pointers generates an integer, rather than a pointer...makes some sense

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Now, your straightforward algorithm doesn't take into account the separators.  If you had "dogma" in the list ahead of "dog", you would get a false hit.

 

Obviously the posted example is just that.  For real work, one would need to know how many "records are in the database".  Any real dictionary would quickly overflow AVR8 SRAM, so in practice the dictionary might be in flash--and thus more-or-less fixed at run time.  If fixed, then the extent is known and it could be pre-sorted and even pre-indexed.

 

-- For a list as shown, I'd think a bullet-proof approach would go from separator to separator extracting the substrings, and then doing a comparison. (see strtok() )

 

-- For real work and a non-trivial dictionary, lexical analysis will end up to be fastest and smallest.  The FLEX variant of LEX may well be the most common tool.

 

-- For a dynamic dictionary with variable-length words, this is one time I might consider using heap in AVR8 for the words, and a pointer array.  Limits would need to be set as AVR8 SRAM would be quickly exhausted.

 

-- Other database techniques can be applied such as hash functions.

 

-- For a limited dictionary such as the keywords in an interactive app, a simple state machine could be constructed that would follow the LEX/FLEX approach:  Examine character 1.  A finite number of branches.  Each of those examines the next character.

 

http://dinosaur.compilertools.net/

https://www.gnu.org/software/fle...

http://flex.sourceforge.net/

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Wed. Aug 5, 2015 - 03:49 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Edit: Comment already answered above.

Regards,
Steve A.

The Board helps those that help themselves.

Last Edited: Wed. Aug 5, 2015 - 04:02 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch:

 

All great points...this is just putting my toes in the water, much more to come.

I like:  I'd think a bullet-proof approach would go from separator to separator extracting the substrings, and then doing a comparison 

And yes, the dictionary will be fixed & kept in flash.

 

I was thinking that doing the search in  "one swoop" would be faster, since calling the string function repeatedly requires its re-setup of memory access.  Once the strstr returns, its internal indexes are lost.  Would be nice if it could be used in some sort of progressive fashion where nothing is forgotten between calls.

I suppose that's an advantage of just creating your own search function, keeping the indexing variables you need active until the search is 100% finished.

 

I'll investigate strtok...unfortunately, the lib doc does not give a "clear-cut" explanation of what it returns..."strtok parses the string s into tokens"..if I fiddle with it, I'll certainly find out.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

If you are actually going to do a linear search, use patterns that begin and end with slash.

The dictionary will need to begin and end with slash.

If speed is important and the dictionary is large, sort the dictionary and use binary search.

How to implement binary search with a long string dictionary is left as an exercise for the reader.

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles

Last Edited: Wed. Aug 5, 2015 - 05:08 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

If you are actually going to do a linear search, use patterns that begin and end with slash.

The dictionary will need to begin and end with slash.

Excellent!!

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!