Arduino "String" class vs C++ "std::string" class: which is faster?

Hi, good day to the experts here. I have a function to read everything that is in the Arduino serial buffer:

std::string Serialread() {
    std::string retval = "";
    char c;
    while (Serial.available () > 0) {
        c = (char)Serial.read();
        retval += c;
        delay(1);
    }

    return retval;
}

Whenever I use this code, I usually get the result string "retval" fragmented, but if I change delay(1) to delay(10), it works.

The same delay(1) works very well if I use Arduino's String class, but I don't know why it doesn't work reliably with std::string:

String Serialread() {
    String retval = "";
    char c;
    while (Serial.available () > 0) {
        c = (char)Serial.read();
        retval += c;
        delay(1);
    }

    return retval;
}

My thinking now is that Arduino's String class is faster than C++'s std::string. Is this assumption correct?

This topic has a solution.

Small boy...

Last Edited: Sat. Oct 28, 2017 - 10:39 AM

Probably more important: what is your baud rate? I'd bet that, for modest baud rates, serial string handlers are far faster than the transmission, so it does not make much difference which one is "faster".

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net

The baud rate I'm using is 9600kbits/s

Small boy...

I don't understand. avr-g++ does not have STL, so how is there even std::string support at all? In fact, isn't the whole point of Arduino implementing their own String to make up for the lack of STL?

 

Have you done something to try to add STL to the installation?

 

BTW while checking this I found:

 

https://hackingmajenkoblog.wordp...

 

Those look like wise words.

I used the StandardCplusplus library.

Small boy...

You mean this?:

 

https://github.com/maniacbug/Sta...

 

So if the Arduino String version works but the std::string from that does not then does that not suggest to you that the problem lies in the latter?

I know the StandardCplusplus library is the source of the problem. That's why I want to know whether it's slower...

Small boy...

My implication was: if you know it's some library code that is "faulty", then why don't you contact the author of that library? Or at least discuss it on a forum where other users of that library congregate? The readme for that Arduino lib says it is a straight port of:

 

https://cxx.uclibc.org/contact.html

 

So maybe start there if you can show a test case that illustrates a fault?

 

Having said that I see the original has (c)2004 and even this Arduino lib has last activity "4 years ago". So maybe they will have lost interest at this stage?

Well, what are you using, really? Inside the loop, where timing is more critical, your code uses operator+=, so let's look at the source code:

 

Arduino:

const String & String::operator+=( const String &other )
{
  _length += other._length;
  if ( _length > _capacity )
  {
    char *temp = (char *)realloc(_buffer, _length + 1);
    if ( temp != NULL ) {
      _buffer = temp;
      _capacity = _length;
    } else {
      _length -= other._length;
      return *this;
    }
  }
  strcat( _buffer, other._buffer );
  return *this;
}

So, this does some bounds checks then uses strcat.

 

The code from the standard library is a lot more complicated: it calls append, which uses a vector, etc., etc...

So much indirection has costs, so I would guess the Arduino strings are indeed faster.

 

Other thing, why don't you keep the value returned by Serial.available, so you know how many characters to read, then read them to a buffer with Serial.readBytes()? Maybe it's faster. (I have no idea)
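A host-testable sketch of that idea, under some assumptions: `readAvailable` and `FakeSerial` are hypothetical names, and `FakeSerial` only imitates the shape of Arduino's `Stream::available()` and `Stream::readBytes(char*, size_t)` so the logic can run on a PC. It asks for no more than the number of bytes already buffered, clamps that to the destination size, and writes the terminating '\0' itself, since readBytes() does not add one. C headers are used so it should also build with avr-g++.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for Arduino's Serial so the logic can run on a PC.
// It only mimics the shape of Stream::available() and Stream::readBytes().
struct FakeSerial {
    const char* data;
    size_t len;
    size_t pos;
    explicit FakeSerial(const char* d) : data(d), len(strlen(d)), pos(0) {}
    int available() const { return static_cast<int>(len - pos); }
    size_t readBytes(char* buf, size_t n) {
        size_t k = (n < len - pos) ? n : len - pos;
        memcpy(buf, data + pos, k);
        pos += k;
        return k;
    }
};

// Copy whatever is already buffered into `out` (capacity `cap`) in one call
// and add the terminating '\0' that readBytes() itself does not write.
// Returns the number of characters stored.
template <typename Port>
size_t readAvailable(Port& port, char* out, size_t cap) {
    assert(cap >= 1);                       // need room for at least the '\0'
    int avail = port.available();
    size_t want = (avail > 0) ? static_cast<size_t>(avail) : 0;
    if (want > cap - 1) want = cap - 1;     // never overrun `out`
    size_t got = (want > 0) ? port.readBytes(out, want) : 0;
    out[got] = '\0';
    return got;
}
```

On the Arduino it would be called as `readAvailable(Serial, buf, sizeof buf)`. Because it requests no more than available() reported, readBytes() should not sit waiting for its timeout. Note that this still only returns what has arrived so far, so on its own it does not cure the fragmentation.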

El Tangas wrote:
So, this does some bounds checks then uses strcat.
More importantly it does:

(char *)realloc(_buffer, _length + 1);

which brings me back to:

 

clawson wrote:
BTW while checking this I found:

 

https://hackingmajenkoblog.wordp...

 

Those look like wise words.

malloc/realloc really have no place on a "micro" with 1..2K of RAM!

 

A far better approach is to pre-allocate (probably a fixed allocation in .bss) a char[] buffer then just keep an index/pointer to it (and wrap at the end).
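A minimal sketch of such a preallocated buffer with a wrapping index (a ring buffer), assuming a fixed capacity `N` chosen at compile time; `RingBuffer` is a hypothetical name. This version refuses new bytes when it is full rather than overwriting the oldest ones; that policy is a choice made here, not something stated above.

```cpp
#include <assert.h>
#include <stddef.h>

// Fixed-capacity byte queue: storage is part of the object, no malloc/realloc.
template <size_t N>
struct RingBuffer {
    static_assert(N > 0, "capacity must be positive");
    char buf[N];
    size_t head = 0;   // next slot to write
    size_t tail = 0;   // next slot to read
    size_t count = 0;  // bytes currently stored

    // Store one byte; returns false (byte dropped) if the buffer is full.
    bool push(char c) {
        if (count == N) return false;
        buf[head] = c;
        head = (head + 1) % N;   // wrap at the end
        ++count;
        return true;
    }

    // Remove the oldest byte into `c`; returns false if empty.
    bool pop(char& c) {
        if (count == 0) return false;
        c = buf[tail];
        tail = (tail + 1) % N;   // wrap at the end
        --count;
        assert(count < N);
        return true;
    }
};
```

Declared at file scope, e.g. `static RingBuffer<64> rx;`, its storage is allocated statically. If push() were called from the UART receive interrupt and pop() from the main loop, the shared indices would also need `volatile` and interrupt-safe access, which this sketch omits.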

Well, thank you very much for responding. I didn't know that the library is a port of another C++ library. I thought it was the common C++ library (the one everybody on any platform is used to); that's why I asked the question here.

Small boy...

El Tangas wrote:

Other thing, why don't you keep the value returned by Serial.available, so you know how many characters to read, then read them to a buffer with Serial.readBytes()? Maybe it's faster. (I have no idea)

Thank you for this idea; it's more efficient than the code I posted. I'll implement this with Serial.readBytes() very soon.

Small boy...

Just be careful with the terminating \0 in the string buffer, I think Serial.readBytes() will not put one there automatically.

 

edit: But check post #10, maybe you should avoid any strings other than C strings.

Last Edited: Wed. Oct 25, 2017 - 10:35 AM

I didn't know that the library is a port of c++ library. I thought it is the common c++ library( that everybody on any platform is used to),

No, the point is that avr-g++ (the GNU C++ compiler for AVR) does not come with any "standard" form of STL. Some folks, such as Andy Brown or the authors of that uClibc++, have attempted to create minimal implementations of STL for use on small micros, but most of these "extendible structures" like std::string, std::vector etc. (and, as it turns out, even "String" in Arduino) rely on heap allocation using malloc()/realloc()/etc. Those things are NEVER a good idea on a limited-resource micro, because of issues like fragmentation and simply because many/most micros only have 1K or 2K or possibly even less (sometimes a lot less) to work with, so a few std::string or std::vector objects can soon use up all your RAM. There is quite a strong possibility that in this:

    char *temp = (char *)realloc(_buffer, _length + 1);
    if ( temp != NULL ) {
      _buffer = temp;
      _capacity = _length;
    } else {
      _length -= other._length;
      return *this;
    }

that the else{} case may be taken. So the += simply won't do anything and the string will remain unchanged! Do users of std::string (or String) in small memory environments even realise this might happen?

 

As I said above, in a limited resource environment it's far better to consider "how big could this string ever get?" and then just preallocate:

char stringBuff[N];

at compile time. Then just add characters into this with an incrementing index. Ring alarm bells if the index ever reaches N.
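A sketch of that approach, assuming `N` is the worst-case length you settled on; `FixedString` is a hypothetical name. Compared with the realloc() code above, a full buffer is not silently ignored: append() returns false and sets a flag you can check (the "alarm bells").

```cpp
#include <assert.h>
#include <stddef.h>

// A message buffer whose size N is fixed at compile time (the answer to
// "how big could this string ever get?"). At file scope it lives in .bss.
template <size_t N>
struct FixedString {
    static_assert(N >= 2, "need room for one character plus the terminator");
    char data[N] = {};         // always NUL-terminated
    size_t index = 0;          // next free position
    bool overflowed = false;   // the "alarm bell"

    // Append one character; a full buffer is reported, not silently ignored.
    bool append(char c) {
        assert(index < N);
        if (index == N - 1) {  // only the '\0' slot is left
            overflowed = true;
            return false;
        }
        data[index++] = c;
        data[index] = '\0';
        return true;
    }

    void clear() { index = 0; data[0] = '\0'; overflowed = false; }
};
```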

 

I guess this is part of the downside of Arduino: it abstracts the user so far from the hardware that considerations like "how many bytes of RAM are left?" are never really considered. Those programming AVRs in raw C/C++ are probably much more focused on every last byte they may be using.

Last Edited: Wed. Oct 25, 2017 - 10:33 AM

I'm also aware of the memory problems, and I've run the code several times without running out of memory. The only problem is the speed at which the library can respond.

Small boy...

Last Edited: Wed. Oct 25, 2017 - 10:49 AM

Are you really using 9.6 megabits/s or just 9600 baud? I gather the latter. At 1 ms per char, I can't see speed being a problem with string handling.

This reply has been marked as the solution. 

Damilare wrote:
Whenever I use this code, I usually get the result string "retval" fragmented, but if I change delay(1) to delay(10), it works. The same delay(1) do work very well if I use the Arduino's String class, but I don't know why it doesn't work reliably, when using it with std::string.

But when you get a fragmented result, couldn't that be simply because the whole string has not been received yet?

 

Let's say you transmit the string "Hello" to the MCU and your Serialread function is called immediately after the first "H" has been transmitted. What will it return? It might be only "H", but it could also be "He" if an additional character is received while the function is executing.

 

Transmitting each character takes around (1/9600)*10 = 1.04 ms if running at 9600 baud 8N1. If you insert a 10 ms delay for each character then it is quite possible that you receive at least one additional character for each character that is appended. For the same reason, the result will depend very much on how fast the code executes, but in this case slower code would mean that the result is less fragmented i.e. "better".
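The arithmetic behind those numbers as a small sketch (function names are hypothetical; the framing defaults to 8N1 unless other values are passed). At 9600 baud one character takes about 1.04 ms, so the five characters of "Hello" need roughly 5.2 ms to arrive, and a loop that empties the receive buffer much faster than that will often find it empty before the whole message is in.

```cpp
#include <assert.h>

// Time one UART character occupies on the wire: start bit + data bits +
// parity bits + stop bits, divided by the baud rate. 8N1 is 1+8+0+1 = 10 bits.
double charTimeMs(double baud, int dataBits = 8, int parityBits = 0, int stopBits = 1) {
    assert(baud > 0);
    return (1 + dataBits + parityBits + stopBits) * 1000.0 / baud;
}

// Time for a message of `nChars` characters sent back to back (8N1).
double messageTimeMs(int nChars, double baud) {
    return nChars * charTimeMs(baud);
}
```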

 

But this method is very wrong.

 

You should rather buffer the characters until the zero-termination character is received, at which point you process the result. This could be done with both blocking and non-blocking code (and non-blocking is usually preferred).

 

How about trying this:

std::string Serialread()
{
    std::string retval = "";
    char c;

    // Loop forever; quit if zero-termination received
    while (1)
    {
        if (Serial.available() > 0)
        {
            c = (char)Serial.read();

            // Check for zero-termination before appending, so the
            // terminator itself is not stored in the string
            if (c == '\0')
                break;

            retval += c;
        }
    }
    return retval;
}

 

/Jakob Selbing

Last Edited: Wed. Oct 25, 2017 - 11:54 AM

jaksel wrote:
You should rather buffer the characters until the zero-termination character is received
It may depend on the data source but often '\n', '\r' or maybe even just ',' is a "better" terminator as it's easy to generate from a terminal. (Not so easy to generate/send 0x00)
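A non-blocking sketch of terminator-based reading along those lines, assuming lines end with '\r', '\n' or both; `LineAssembler` is a hypothetical name and the buffer size `N` is up to you. Each received byte is fed in as it arrives, so loop() never waits, and a line is reported only once its terminator has been seen.

```cpp
#include <assert.h>
#include <stddef.h>

// Collects bytes until '\r' or '\n' arrives; never blocks.
template <size_t N>
struct LineAssembler {
    static_assert(N >= 2, "need room for one character plus the terminator");
    char line[N] = {};   // NUL-terminated once a line is complete
    size_t len = 0;
    bool ready = false;  // true right after a complete line was reported

    // Feed one received byte; returns true when `line` holds a complete line.
    bool feed(char c) {
        if (ready) { len = 0; ready = false; }   // start a new line
        if (c == '\r' || c == '\n') {
            if (len == 0) return false;          // ignore empty lines (e.g. LF after CR)
            line[len] = '\0';
            ready = true;
            return true;
        }
        if (len < N - 1) line[len++] = c;        // too-long lines are truncated
        assert(len < N);
        return false;
    }
};
```

In a sketch it might be driven from loop() like `while (Serial.available() > 0) { if (rx.feed((char)Serial.read())) { /* use rx.line */ } }`. Lines longer than N - 1 characters are truncated here; flagging them instead would be a reasonable alternative.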

jaksel wrote:

Transmitting each character takes around (1/9600)*10 = 1.04 ms if running at 9600 baud 8N1. If you insert a 10 ms delay for each character then it is quite possible that you receive at least one additional character for each character that is appended. For the same reason, the result will depend very much on how fast the code executes, but in this case slower code would mean that the result is less fragmented i.e. "better".

 

I second this.

 

It's very possible that the StandardCplusplus std::string methods are faster than the Arduino String methods. The Arduino String methods reallocate the buffer for each character that is added to the string (which is a silly way to resize a buffer), while the StandardCplusplus methods may resize the buffer larger than needed so that it doesn't have to reallocate it for each character. Fewer calls to realloc() means the code can run faster.
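To make the difference concrete, here is a sketch that only counts how many reallocations each growth policy would need when appending characters one at a time. It does not call realloc(), and whether uClibc++'s std::string actually grows geometrically is an assumption to check in its source; the exact-fit policy matches the String::operator+= code quoted above. `countReallocs` and `Growth` are hypothetical names.

```cpp
#include <assert.h>
#include <stddef.h>

enum class Growth { ExactFit, Doubling };

// Count the reallocations needed to append `n` characters one at a time to
// an initially empty buffer. Only the growth decisions are modelled.
size_t countReallocs(size_t n, Growth policy) {
    size_t capacity = 0;
    size_t reallocs = 0;
    for (size_t length = 1; length <= n; ++length) {
        if (length > capacity) {
            if (policy == Growth::ExactFit)
                capacity = length;                              // grow to fit exactly
            else
                capacity = (capacity == 0) ? 1 : capacity * 2;  // grow geometrically
            ++reallocs;
        }
        assert(length <= capacity);
    }
    return reallocs;
}
```

For 64 appended characters that is 64 reallocations with exact fit versus 7 with doubling. The price of doubling is slack: the buffer can end up nearly twice as large as the string, which matters on a part with 2K of RAM.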

 

clawson wrote:
... most of these "extendible structures" like std::string, std::vector etc. (and, as it turns out, even "String" in Arduino) rely on heap allocation using malloc()/realloc()/etc. Those things are NEVER a good idea on a limited-resource micro

Note that those things are available in mbed - for ARM Cortex-M devices.

 

But then ARM Cortex-M devices tend to have an order of magnitude more memory than AVR class devices.

 

As ever, it's horses for courses - you need to cut your coat according to your cloth ...

Yeah, I made a mistake, it is 9600 baud

Small boy...

That means I can't simply transmit 0x0 again, because it will mean end of transmission.

Small boy...

But whatever character is used can no longer be sent as data, as it will always signal the end of transmission.

Small boy...

Damilare wrote:

That means I can't simply transmit 0x0 again, because it will mean end of transmission.

I don't get your point there?

 

Damilare wrote:

But whatever character is used can no longer be sent as data, as it will always signal the end of transmission.

Not necessarily.

 

eg, Cliff mentioned CR and LF - they are quite often used both to "signal the end of transmission" and for their output-formatting function...

 

And there are well-known & widely-used techniques for "escaping" control characters so that they can be used as "data" ...

Yes. That is the dilemma.

 

  • You need to be able to transmit all possible characters as "actual payload".
  • You need some way to signal the end of a "transmission unit".

 

The solution is to encode the transmission in some way, e.g. use an escape character like '\'. If it is encountered, then the next byte is to be interpreted as part of the message. Now you can send the value zero either as part of the payload (prepend it with '\') or as a "transmission unit terminator" (just send the zero). A literal '\' in the payload must then be escaped as well, otherwise the receiver cannot tell it from an escape.
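A sketch of that escaping scheme, under stated assumptions: 0x00 terminates a frame, '\' is the escape byte, the payload fits in a fixed 32-byte buffer, and `encodeFrame`/`FrameDecoder` are hypothetical names. The encoder escapes both the terminator and the escape byte itself.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

const uint8_t kEnd = 0x00;  // ends a frame (assumed)
const uint8_t kEsc = '\\';  // next byte is payload, whatever its value

// Encode `len` payload bytes into `out` (capacity `cap`) followed by kEnd.
// Returns the number of bytes written, or 0 if `out` is too small.
size_t encodeFrame(const uint8_t* payload, size_t len, uint8_t* out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        uint8_t b = payload[i];
        if (b == kEnd || b == kEsc) {
            if (n + 1 >= cap) return 0;  // need room for escape + byte
            out[n++] = kEsc;
        }
        if (n >= cap) return 0;
        out[n++] = b;
    }
    if (n >= cap) return 0;
    out[n++] = kEnd;
    return n;
}

// Streaming decoder: feed() one received byte at a time.
struct FrameDecoder {
    uint8_t buf[32];        // assumed maximum payload size
    size_t len = 0;
    bool escaped = false;
    bool overflow = false;  // payload was longer than buf

    // Returns true when a complete frame is in buf[0..len).
    bool feed(uint8_t b) {
        if (!escaped && b == kEsc) { escaped = true; return false; }
        if (!escaped && b == kEnd) return true;
        escaped = false;  // an escaped byte is stored as-is
        store(b);
        return false;
    }

    void reset() { len = 0; escaped = false; overflow = false; }

private:
    void store(uint8_t b) {
        if (len < sizeof(buf)) buf[len++] = b;
        else overflow = true;
        assert(len <= sizeof(buf));
    }
};
```

The decoder is fed one byte at a time, so it fits the non-blocking style; after feed() returns true the caller reads buf[0..len) and calls reset() before the next frame.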

 

Obviously, this means coding - at both ends - for such a scheme. Not hard on the AVR side. Might be tough on the other end depending on the control you have of that software.

 

If the payload is only ever human-readable text, then the problem does not occur (the zero terminator is not a human-readable character).

 

If the payload is numerical values, e.g. measurement data from sensors, then you could convert them to text before sending (using sprintf(...)) and then after receiving convert them back to numerical values (e.g. using sscanf(...)).
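A sketch of that text round trip. It uses integers rather than floats because, as far as I know, avr-libc's default printf/scanf builds leave out floating-point conversions (a value like 23.5 could be sent as 235 tenths). The "S<id>=<value>" record format and the function names are hypothetical.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Format one reading as "S<id>=<value>\n" into `out` (capacity `cap`).
// Returns the number of characters written, or -1 if it did not fit.
int formatReading(char* out, size_t cap, int id, int value) {
    assert(out != nullptr && cap > 0);
    int n = snprintf(out, cap, "S%d=%d\n", id, value);
    return (n < 0 || static_cast<size_t>(n) >= cap) ? -1 : n;
}

// Parse a line produced by formatReading(); returns false if malformed.
bool parseReading(const char* line, int* id, int* value) {
    return sscanf(line, "S%d=%d", id, value) == 2;
}
```

Because each record ends in '\n', it also pairs naturally with a newline-terminated reader on the receiving side.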

 


If you can't possibly implement such a scheme, then ample delays seem like the only other option. But this will lower the performance of the system.

 


 

If you tell us more about the nature of the data you're transmitting we can probably give more concrete advice.

"He used to carry his guitar in a gunny sack, or sit beneath the tree by the railroad track. Oh the engineers would see him sitting in the shade, Strumming with the rhythm that the drivers made. People passing by, they would stop and say, "Oh, my, what that little country boy could play!" [Chuck Berry]

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]

JohanEkdahl wrote:
If the payload is only ever human readable text, then the problem does not occur (the zero-terminator is not a "human readable".

Which is, of course, why 'C' uses the NUL terminator for strings.

 

 

If you can't possibly implement such a scheme, then ample delays seem like the only other option. But this will lower the performance of the system.

Another option might be some sort of "out-of-band" signalling - such as a separate control line.

Like the 'Slave-Select' in SPI ...

 

Well, RS232 has plenty of control lines and most USB to UART chips support several of these. So if the OP wants to interface to a modern PC there isn't much of a problem.

Agree that the availability of hardware lines from a PC shouldn't be a problem.

 

Getting PC software to read and/or control them may be more of a challenge, though ...

What I'm actually doing is this: I have two Arduinos, Arduino1 and Arduino2.
Arduino1 is used as a BASIC interpreter and programmer; Arduino2 is used as the LCD and keyboard controller.
So Arduino1 always outputs the processed data of the BASIC program, which Arduino2 sends to the LCD, while Arduino2 always outputs data from the keyboard. It's now working properly. Thank you, jaksel, for the suggestion.
I also thank you all for your responses.
This is the working function:

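std::string Serialread(){
	std::string retval = "";
	char c;
	const unsigned long TIMEOUT_MS = 1000;  // hypothetical value: the posted check had no limit
	unsigned long started = millis();
	while (true) {
		// Stop waiting if no complete line arrives in time
		if (millis() - started > TIMEOUT_MS) break;
		if (Serial.available() > 0) {
			c = (char)Serial.read();
			if (c == '\n') break;   // newline terminates the message
			else retval += c;
		}
	}
	return retval;
}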

Small boy...

Where do you plan to store the BASIC programs the user types in? 

I'm currently storing it in Arduino1's EEPROM. I used an Arduino Pro Mini with 1 KiB of EEPROM, so I divided the memory cells into groups of 3 (ROM[0] = opcode; ROM[1] = operand 1; ROM[2] = operand 2). 1024 ÷ 3 = 341 lines of BASIC program, and the last address (i.e. 1023) is used to save the total number of lines whenever I save the program after editing...
It's just to keep my hands busy whenever I'm not at the PC (^_^)
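The address arithmetic of that layout as a compile-time sketch (constant and function names are hypothetical). It also shows one thing worth checking: if the line count at address 1023 is stored as a single byte, it can only go up to 255, not 341.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Layout described above for a 1 KiB EEPROM: 3 bytes per BASIC line
// (opcode, operand 1, operand 2); the last byte holds the line count.
constexpr size_t kEepromSize   = 1024;
constexpr size_t kBytesPerLine = 3;
constexpr size_t kCountAddr    = kEepromSize - 1;             // 1023
constexpr size_t kMaxLines     = kCountAddr / kBytesPerLine;  // 341

constexpr size_t opcodeAddr(size_t line)   { return line * kBytesPerLine; }
constexpr size_t operand1Addr(size_t line) { return opcodeAddr(line) + 1; }
constexpr size_t operand2Addr(size_t line) { return opcodeAddr(line) + 2; }

// The last line's operand 2 (address 1022) stays clear of the count byte.
static_assert(operand2Addr(kMaxLines - 1) < kCountAddr, "layout overlaps count byte");

// A single byte holds at most 255, so a one-byte count cannot reach 341.
constexpr bool kCountFitsInOneByte = kMaxLines <= UINT8_MAX;
```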

Small boy...

But I have one last question for you guys. By manipulating strings with nothing other than the C++ <cstring> functions, is it possible to predict/calculate the maximum total memory (RAM space) my sketch will consume during runtime?

Small boy...

Last Edited: Sat. Oct 28, 2017 - 12:36 PM

Eventually, yes, but to do so you'd have to run the entire thing in simulation with all the input activity simulated in time. The problem with heap usage is fragmentation, which happens when the pattern of allocation and deallocation is not linear, but to understand the interactions you need to take into account all the external stimuli and when they occur. Not easy. The best approach is probably to run real code on a real chip with real stimulus and then analyse the heap from time to time.
(or do the sensible thing and preallocate buffers)
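To connect this with the <cstring> question: functions such as strcpy(), strcat() and strlen() do not allocate; they work in buffers you supply. If those buffers are preallocated at file scope, their total size is a compile-time constant the compiler can check, as in this sketch (all sizes and the budget are hypothetical). The Arduino IDE's "Global variables use ... bytes" report after compiling covers this static part; the stack (local variables, call depth, interrupts) still has to be estimated or measured at run time.

```cpp
#include <assert.h>
#include <stddef.h>

// Buffers preallocated at file scope (they end up in .bss). All sizes here
// are hypothetical; pick them from the worst case your protocol allows.
constexpr size_t kRxBufSize    = 64;   // raw serial input
constexpr size_t kLineBufSize  = 32;   // one BASIC line being edited
constexpr size_t kBufferBudget = 512;  // share of SRAM allowed for buffers

static char rxBuf[kRxBufSize];
static char lineBuf[kLineBufSize];

// The total is a compile-time constant, so the compiler enforces the budget.
constexpr size_t kBuffersTotal = sizeof(rxBuf) + sizeof(lineBuf);
static_assert(kBuffersTotal <= kBufferBudget, "buffers exceed the RAM budget");
```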

clawson wrote:

(or do the sensible thing and preallocate buffers)

Well, I thought C strings always work on preallocated memory buffers, and that's the main reason I want to fall back to C strings in my application: to optimize memory usage.

Small boy...

Last Edited: Sat. Oct 28, 2017 - 02:11 PM