Serial port data corrupted when sending a specific pattern of bytes

16 posts / 0 new
#1

I have a Python script that sends a stream of data to an embedded Linux computer, which receives it with a program written in C++. Most of the time it works; however, I am seeing corrupted data when I send specific patterns of bytes. I have been struggling with this for a while and don't know how to solve it.

 

Python script (sender):

 

import serial
import struct

ser = serial.Serial("COM2", 115200, timeout=5)  # note: don't shadow the serial module
all_bytes = [0x63,0x20,0x72,0x69,0x67,0x68,0x74,0x73,0x20,0x61,0x6e,0x64,0x20,0x72,0x65,0x73,0x74,0x72,0x69,0x63,0x74,0x69,0x6f,0x6e,0x73,0x20,0x69,0x6e,0x0a,0x68,0x6f,0x77,0xff,0x20,0xf0,0x8b]

fmt = "B" * len(all_bytes)

byte_array = struct.pack(fmt, *all_bytes)

ser.write(byte_array)

 

C++ code (receiver):

#include <termios.h>   // termios, tcgetattr, tcsetattr
#include <unistd.h>    // read, close
#include <iostream>
#include <cstdint>
#include <vector>

typedef std::vector<uint8_t> ustring; // ustring = vector containing a bunch of uint8_t elements

// configure the port
int UART::configure_port()      
{
    struct termios port_settings;      // structure to store the port settings in

    cfsetispeed(&port_settings, B115200);    // set baud rates
    cfsetospeed(&port_settings, B115200);

    port_settings.c_cflag &= ~PARENB;    // set no parity, stop bits, data bits
    port_settings.c_cflag &= ~CSTOPB;
    port_settings.c_cflag &= ~CSIZE;
    port_settings.c_cflag |= CS8;
    port_settings.c_cflag |=  CREAD | CLOCAL;     // turn on READ & ignore ctrl lines

    port_settings.c_cc[VTIME]     =   10;                  // 1.0 s read timeout (VTIME is in tenths of a second)
    //port_settings.c_cc[VMIN]      =   0;     // blocking read until 1 character arrives 
    port_settings.c_iflag     &=  ~(IXON | IXOFF | IXANY); // turn off s/w flow ctrl
    port_settings.c_lflag     &=  ~(ICANON | ECHO | ECHOE | ISIG); // make raw
    port_settings.c_oflag     &=  ~OPOST;              // make raw

    tcsetattr(fd, TCSANOW, &port_settings);    // apply the settings to the port
    return(fd);
} 

int UART::uart_read(ustring *data,int buffer_size)
{
    // Buffer
    uint8_t * buf = new uint8_t[buffer_size];

    // Flush contents of the serial port
    //tcflush(fd, TCIOFLUSH);
    //usleep(1000);

    ustring data_received;
    // Read
    int n_bytes = 0;

    while (n_bytes < buffer_size)
    {
        int n = read( fd, buf , buffer_size - n_bytes );   // never request more than is still expected

        // Some bytes were read!
        if (n > 0)
        {
             n_bytes+=n;

             // Add to buffer new data!
             for( int i=0; i<n; i++ )
             {
                data_received.push_back(buf[i]);
             }
        }


    }


    // String received
    *data = data_received;
    cout << "Data received..." << endl;
    print_ustring(data_received);




    delete[] buf;

    return read_valid;

}


int main()
{ 
    UART uart_connection;

    vector<uint8_t> data;
    vector<uint8_t> *data_ptr = &data;
    int status = uart_connection.uart_read(data_ptr,36);


    return 0;
} 

This is what's happening:

If I send the following bytes (from python):

0x632072696768747320616e64207265737472696374696f6e7320696e0a686f77ff20f08b

This is what I am receiving (in C++ program):

0x632072696768747320616e64207265737472696374696f6e7320696e0a686f77ffff20f0

As you can see, a few bytes at the end (the CRC) are changed; the rest looks fine. But it doesn't always happen; it only happens when sending certain patterns of bytes.

Let's say I send the following for instance (some other pattern):

0x6868686868686868686868686868686868686868686868686868686868686868b18cf5b2

I get exactly what I am sending in the above pattern!

At first I thought it was PySerial changing my unsigned bytes to ASCII... but now I know for sure that PySerial is not the problem: I tried replacing it with the Node.js serial-port library, and the same problem persists. I also noticed the only way to "fix it" is to close the port and open it again. Once the port is open, the serial data is either always corrupted or always right, depending on whether the first transfer was received OK or corrupted. I have no clue what's going on; I have been struggling with this for days!

#2

If you set the sender to use two stop bits during transmission, there will be an additional bit time between characters, and that may sort it all out. I think you have run into one of the hazards of async serial: it IS possible to mistake a bit that is NOT the start bit for the start bit, scrambling everything that follows. Another possible strategy is to pause a whole character time every so often; this forces the receiver to resynchronize on a proper start bit. Making that change will not make the message itself any longer!

 

Frankly, I would convert those hex values to ASCII characters at the sending end and back to numeric values at the receiving end. Some of those numeric values end up as control characters, and you do not have much control over how the system might interpret those on its own.

 

Jim

 

 

Until Black Lives Matter, we do not have "All Lives Matter"!

 

 

Last Edited: Thu. May 17, 2018 - 09:27 PM
#3

Certainly strange. I'd suggest some small tests:

* Increase the RX buffer size so you can see whether an extra 0xFF really appears.

* Try removing leading characters and see if the trailing issue moves, or if it needs N characters first.

How accurate are your clocks?

Can you try 2 stop bits on your TX?

Problems are possible when you have poorly matched clocks and continual bytes (no gaps).

#4

Ok, so I increased the RX buffer size by 1, and there is indeed an extra 0xFF:

632072696768747320616e64207265737472696374696f6e7320696e0a686f77ffff207b8b

I also tried setting the stop bits to two in the sender, but the same thing happens... Nothing works. Apparently the receiver is getting an extra 0xFF byte after the original 0xFF is read, so I am getting a total of 37 bytes instead of the expected 36. Why am I getting that extra byte? I have no clue!

 

 

#5

You're not getting sign extension somewhere, are you ... ?

 

EDIT

 

I missed the fact that it's Python!

 

I have just been having exactly the same problem! It's because Python/PySerial expects characters to be plain 7-bit ASCII; values above 0x7F get encoded as Unicode...

 

https://stackoverflow.com/questions/14454957/pyserial-formatting-bytes-over-127-return-as-2-bytes-rather-then-one

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
Last Edited: Thu. May 17, 2018 - 10:14 PM
#6

awneil wrote:
sign extension

 

No... well, I don't think so. I am getting an extra 0xFF at the end, but I am not doing sign extension anywhere in my code; I was careful to use only unsigned chars.

#7

Ok, so I figured something out... There is a problem with the byte 0xFF. It doesn't always happen, but if I send the following:

0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff19995859

I get double the expected number of data bytes on the serial port (68 bytes in total). Apparently each 0xFF is being interpreted as two bytes!!! Why??

 

This happens at least 90% of the time; I get the expected correct values maybe 10% of the time. But I keep seeing these 0xFF => 0xFFFF conversions.

 

So at:

 

int n = read( fd, buf , rd_bufsize );

 

n returns 68, and when reading buf it is basically a bunch of duplicated 0xff bytes.

Last Edited: Thu. May 17, 2018 - 10:30 PM
#8

See my addendum to #5 - it's bytes that don't fit 7-bit ASCII getting encoded as Unicode ..

#9

awneil wrote:

See my addendum to #5 - it's bytes that don't fit 7-bit ASCII getting encoded as Unicode ..

 

Well, that didn't really solve my problem... Here is my new code:

 

ser = serial.Serial("COM2", 9600, timeout=5)
# all_bytes = [0x63,0x20,0x72,0x69,0x67,0x68,0x74,0x73,0x20,0x61,0x6e,0x64,0x20,0x72,0x65,0x73,0x74,0x72,0x69,0x63,0x74,0x69,0x6f,0x6e,0x73,0x20,0x69,0x6e,0x0a,0x68,0x6f,0x77,0xff,0x20,0xf0,0x8b]
all_bytes = [0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0x19,0x99,0x01,0x59]

 

write_string = ""
for byte in all_bytes:
    write_string += unichr(byte).encode('latin_1')

print "sending 0x" + write_string.encode('hex') + "..."

ser.write(write_string)

 

The problem persists. Sometimes I get the right 36 characters, but most of the time I get double the number of data bytes plus the CRC (32*2 + 4 = 68 bytes).

 

I even tried with Node.js; same problem... The problem seems to be in the C++ code.

Last Edited: Thu. May 17, 2018 - 10:57 PM
#10

lcruz007 wrote:
The problem persists. Sometimes I would get the right 36 characters, but most of the times I get double the number of bytes + CRC (32*2 + 4 =68 bytes).

 I even tried with nodejs, same problem... The problem seems to be in the C++ code. 

 

Well, yes, #5 above says you would get 68 bytes....

As for nodejs, perhaps it does the same 'trying to be too clever' stuff too.

 

I can sort of see what they are trying to do, but it makes it harder for embedded control....

 

You could also look through something like this:

https://github.com/grigorig/stcgal

which uses Python with embedded controllers, so they might have hit the same issue?

#11

Do you calculate your CRC on the fly?

I do not see a call to a CRC routine; it might be hidden in ser.write().

Your AVR might run out of processing power because it can't calculate the CRC as fast as the UART is sending.

The CRC calculation time may depend on the bytes sent.

Doing magic with a USD 7 Logic Analyser: https://www.avrfreaks.net/comment/2421756#comment-2421756

Bunch of old projects with AVR's: http://www.hoevendesign.com

#12

I fixed the problem... Apparently it was because I was not initializing struct termios with tcgetattr(fd, &port_settings) right after declaring it and BEFORE modifying the port settings. That fixed it.

#13

lcruz007 wrote:
I fixed the problem.

Excellent!

 

Now please mark the solution - see Tip #5

#14

awneil wrote:

Now please mark the solution - see Tip #5

lcruz007 wrote:

I fixed the problem... Apparently it was because I was not initializing struct termios with tcgetattr(fd, &port_settings); RIGHT AFTER initializing it and BEFORE opening the port. That fixed it. 

?

 

It sounds like the fix was applied in the C++ code, not the Python, so it seems hard to explain? Unless more than one thing changed here?

#15

lcruz007 wrote:

I fixed the problem... Apparently it was because I was not initializing struct termios with tcgetattr(fd, &port_settings); RIGHT AFTER initializing it and BEFORE opening the port. That fixed it. 

Or initialize port_settings to 0 before using bit masking to configure your particular setup. C++ does not automatically zero-initialize a local struct, which fits your description.

 

#16

I am not fully convinced.

A change in the CRC contents can point to a less-than-perfect CRC algorithm, or to the hardware chip receiving the data.

I have already seen bugs in both areas, and they only appear with some specific data pattern being received.

One of the most difficult bit combinations for any communication system is 0xFF and 0x00.

In your case I would first isolate where the problem occurs.

You say "data sent:" and "data received:", but you are not sure whether the "data received" diverges from the "data sent" at the sender or at the receiver.

You simply don't have that information, and it is very important to know.

I would use a second AVR environment to receive what the first one sent, and check whether the data frame is consistent with what was "sent" or with what the Linux box told you it received.

Another way is to use a "protocol analyzer"; that is the main reason they exist. You install the analyzer on the physical cable, between the sender and receiver, and the analyzer tells you exactly what flows in the cable. That way you can see whether the data on the cable is intact or was already changed by the sender. That is crucial.

Even though a protocol analyzer can cost thousands of dollars (used ones go for hundreds on eBay), you can build your own: it is fun and a very useful tool.

Just use a second AVR with any means of displaying the data it captures: a 2x20 alphanumeric LCD, a small serial thermal printer, or, as I did, a color graphic LCD ($12 on eBay) that can show almost a hundred bytes at once (small, yes, but nice, and in color). The graphic display obviously requires learning how to program it, so for a fast result nothing beats a 2x20 alphanumeric LCD with a few buttons to scroll the captured text in the AVR's SRAM up and down.

Once you have determined precisely whether the "data corruption" happened on the AVR side or the Linux side, you will have much better focus on the solution and can attack it with fewer doubts.

As another "protocol analyzer" option there is the Saleae unit: a small board that plugs into your PC's USB port, with very good software and 8 capture channels. You can intercept most serial communication between two machines and see the captured data on the PC display. I have one; it is very good.

 

 

 

 

 

Wagner Lipnharski
Orlando Florida USA