Strange (resync?) bug while listening to UART output on PC


Hello, community!

 

I'm trying to set up communication between my AVRs and a PC. I followed this awesome tutorial on UART, and a very basic test (sending the letter 'A') "works" with the following code:
 

#include <avr/io.h>
#include <util/delay.h>

#define USART_BAUDRATE 9600
#define BAUD_PRESCALE (((F_CPU / (USART_BAUDRATE * 16UL))) - 1)

int main(void) {
    UCSRB = (1 << RXEN) | (1 << TXEN); // enable UART
    UCSRC = (1 << UCSZ1) | (1 << UCSZ0); // 8-bit data frames
    UBRRH = (BAUD_PRESCALE >> 8); // set baud rate
    UBRRL = BAUD_PRESCALE & 0xFF;

    while (1) {
        while ((UCSRA & (1 << UDRE)) == 0) {};
        UDR = 'A';
    }
}

Serial settings: 9600 baud, no parity bit, 1 stop bit. I use cutecom 0.50 (to get ahead of the question: minicom, screen and other tools, including my own C programs, give the same result) on Arch Linux with the latest stable kernel available. But I experience the following problem: sometimes when I open the serial port in my terminal I see the right letter 'A' (0x41 hex) and sometimes I see total garbage. Reconnecting quickly is one possible way to resync (?), and after a few failed tries I get the proper letter in the terminal.

 

I tried this many times, and these are the byte values I observed:

HEX BIN
0x05 00000101
0x15 00010101
0x2a 00101010
0x41 01000001
0x50 01010000
0x54 01010100
0xa8 10101000

Since I send the letter 'A' over UART with the settings listed above, I expect one frame of data to look like this: 0 01000001 1 (according to the RS-232 spec). But as you can see from the table, patterns of 10101 often occur, shifted by some number of zero bits, and such a pattern can't be the direct result of mistaking another bit of the frame for the start bit. The problem can be mitigated, but not completely fixed, by adding a delay (e.g. 100 ms) after each transfer. Setting 2 stop bits for the transfer also slightly improves the results.

 

I use a MAX232CPE+ as the TTL to RS-232 level converter, with 10 uF electrolytic caps (the schematic is attached). I also use an 11.0592 MHz crystal oscillator for the chip, so there should be no problem with baud rate error. As for the AVRs: the problem persists on both ATmega8 and ATtiny2313 chips.

 

I wonder why this problem arises even at such low baud rates (it can be observed at 1200 too). I can't find any successful solution for it, so any advice would be great.

 

Thank you


Viktor Drobot
A.N. Belozersky Research Institute Of Physico-Chemical Biology MSU

Last Edited: Tue. Oct 9, 2018 - 08:55 PM

The schematic shows a crystal, and indeed 11.0592 is UART-friendly.  Is your processor really running off of that crystal?  Perhaps post the .LST for that sample program, and we can divine the F_CPU and UBRR values.

 

It certainly is possible, and would explain your symptoms, if the connections are noisy or ratty.  Check at the receiving end with a 'scope.  If you use a 'U' character then you get a nice square wave.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


dviktor wrote:
UCSRC = (1 << UCSZ1) | (1 << UCSZ0); // 8-bit data frames

Which AVR model are you working with?

Is the receiver end set up to match the transmitter end?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


I forgot to mention: I use avr-gcc, if that matters.
 

@theusch

Now I'm experimenting with an ATtiny2313A (fuses are: L:0xFF, H:0x99, E:0xFF). The LST file is attached (I changed the extension because of the forum engine). The Rx and Tx lines are crossed properly. I've built the test circuit on a breadboard, if that matters too. I'll test the receiver with a scope a bit later.


Viktor Drobot
A.N. Belozersky Research Institute Of Physico-Chemical Biology MSU

  15 0000 88E1      		ldi r24,lo8(24)
  16 0002 8AB9      		out 0xa,r24
  17 0004 86E0      		ldi r24,lo8(6)
  18 0006 83B9      		out 0x3,r24
  19 0008 12B8      		out 0x2,__zero_reg__
  20 000a 87E4      		ldi r24,lo8(71)
  21 000c 89B9      		out 0x9,r24
  22 000e 81E4      		ldi r24,lo8(65)
  23               	.L2:
  24 0010 5D9B      		sbis 0xb,5
  25 0012 00C0      		rjmp .L2
  26 0014 8CB9      		out 0xc,r24
  27 0016 00C0      		rjmp .L2

On Tiny2313A, UBRRL is at 0x09 and 71 sounds good for 9600 @ 11.0592.

 

Code looks OK to me.
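
The UBRR value in the listing can be cross-checked by hand. What follows is a minimal sketch in plain C, meant to run on a PC rather than on the AVR. It applies the same truncating formula as the BAUD_PRESCALE macro from the first post and reports the resulting baud rate error. The function names are made up for illustration and are not part of avr-libc, and the 8 MHz figure mentioned afterwards is only an assumed example of a chip that is not actually clocked from the 11.0592 MHz crystal.

```c
/* Same formula as BAUD_PRESCALE: F_CPU / (16 * baud) - 1, truncating. */
unsigned long ubrr_for(unsigned long f_cpu, unsigned long baud)
{
    return f_cpu / (baud * 16UL) - 1UL;
}

/* Baud rate the UART actually produces for a given UBRR (normal speed mode). */
double actual_baud(unsigned long f_cpu, unsigned long ubrr)
{
    return (double)f_cpu / (16.0 * (double)(ubrr + 1UL));
}

/* Relative error of the produced baud rate, in percent. */
double baud_error_percent(unsigned long f_cpu, unsigned long baud)
{
    double actual = actual_baud(f_cpu, ubrr_for(f_cpu, baud));
    return 100.0 * (actual - (double)baud) / (double)baud;
}
```

With f_cpu = 11059200 and 9600 baud, ubrr_for returns 71 with zero error, which matches the `ldi r24,lo8(71)` in the listing. With an assumed 8 MHz clock the same formula gives 51 and an error of roughly +0.16%.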

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


There is no inherent synchronisation in serial data / RS232.

There are start and stop bits, but these look just like any other bits.

So when you have a continuous stream of serial data it is not always easy to resynchronise.

 

If there is a lot of variation in your data, you will get occasional framing errors, and the UART hardware / software will very likely eventually lock onto the right start bits so that there are no more framing errors, but there is no guarantee.

 

The easiest way to force a resynchronisation is to have an occasional pause on the RS232 wires which is longer than a whole frame.
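
The effect of such a pause can be checked with a small PC-side simulation. This is only a sketch under simplifying assumptions: 8N1 framing with data bits sent LSB first, one sample per bit, a receiver that treats the line as idle at the moment the port is opened and afterwards needs a falling edge to accept a start bit, and a stream that repeats a single byte. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Wire level at bit-time t for repeated 8N1 frames of `byte`,
   each followed by `gap` idle (high) bit-times. */
int stream_bit(uint8_t byte, size_t gap, size_t t)
{
    size_t pos = t % (10 + gap);
    if (pos == 0)
        return 0;                    /* start bit */
    if (pos >= 9)
        return 1;                    /* stop bit and idle gap */
    return (byte >> (pos - 1)) & 1;  /* data bits, LSB first */
}

/* Number of misframed bytes a receiver opened at bit-time `offset` takes in
   before it locks onto a real start bit, or -1 if that never happens
   within `limit` bit-times. */
int bad_bytes_before_sync(uint8_t byte, size_t gap, size_t offset, size_t limit)
{
    size_t period = 10 + gap;
    size_t t = offset;
    int prev = 1;  /* assume the line looked idle when the port was opened */
    int bad = 0;

    while (t + 10 <= limit) {
        int level = stream_bit(byte, gap, t);
        if (!(prev == 1 && level == 0)) {  /* no falling edge yet, keep hunting */
            prev = level;
            t++;
            continue;
        }
        if (t % period == 0)
            return bad;                        /* aligned with a real start bit */
        bad++;                                 /* consumed a misframed byte */
        prev = stream_bit(byte, gap, t + 9);   /* level seen in the stop-bit slot */
        t += 10;
    }
    return -1;
}
```

In this model a gap-free stream of 'A' (gap = 0) opened at bit offset 2 or 8 never aligns, while with a gap of 10 bit-times every offset from 0 to 19 aligns after at most one misframed byte.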

 

Also:

When writing RS232 software for Linux I noticed that the kernel is not happy with framing errors (parity errors / etc) on a serial port.

It seems to need an exorbitant amount of time to process those errors and this might lead to buffer overflows and data getting thrown away.

I got the impression that handling of (lots of) RS232 errors was never properly debugged.

Doing magic with a USD 7 Logic Analyser: https://www.avrfreaks.net/comment/2421756#comment-2421756

Bunch of old projects with AVR's: http://www.hoevendesign.com

Last Edited: Tue. Oct 9, 2018 - 10:02 PM

I know that only the start/stop/parity bits are available for resync, but I tried to parse the incoming frames manually to find possible valid alignments. There were no meaningful results, though: in the original correct frame 0 01000001 1 two '1' bits come together, while among the observed results there are bytes without two consecutive 1s... So I think the only way to debug this is to analyze the physical output with an oscilloscope (thanks for the suggestion with the letter 'U', theusch!). I'll try it as soon as possible.

 

The easiest way to force a resynchronisation is to have an occasional pause on the RS232 wires which is longer than a whole frame.

I'd like to know how the community usually handles continuous transmission of data over UART/RS-232. Are there any pitfalls, such as the intentional delay described above, or anything of that sort?

 

I would also like to clarify: what was the kernel version (or distro, maybe) of your Linux? It seems that in my case I don't lose any data; the frame start is just shifted...

Viktor Drobot
A.N. Belozersky Research Institute Of Physico-Chemical Biology MSU

Last Edited: Tue. Oct 9, 2018 - 11:24 PM

Start, stop, and parity bits do not, alone, guarantee "sync". For example, if your PC happens to turn on its UART in the middle of a character, it will take one of the character bits for the start bit and give you an invalid character. Ultimately it will sync, but that may take many characters.

 

Synchronization will be faster if the sender sends with two stop bits, but it may still be slow. Better yet is to occasionally insert a gap between transmitted characters that is longer, in time, than the duration of one character. You will have almost instantaneous sync if you put a gap that large between every character. You may get an initial bad character, but the following ones should all be correct.

 

This is the inherent nature of async serial. 

 

One way around this is to always start a message with a "preamble" that is a sequence of known characters. Space each of these preamble characters more than one character time apart. Have the PC respond with some unique character when it has synced to the preamble. Then the micro can start with the real message and forget about those additional between-character spaces. There are other strategies. One is to have the micro wait for a "connect request" character from the host, after which the micro responds with a message. Lots of alternatives. It's just that the one you are trying is P-A poor.

 

Jim

 

Until Black Lives Matter, we do not have "All Lives Matter"!

 

 


The preamble method reminds me of my experiments with a byte stuffing algorithm... But it also suffers from loss of sync, just to a lesser extent, doesn't it?

 

Thanks for your suggestion about the small time gap! I think that inserting a pause of duration T = 2 * 10 / BAUDRATE before transmitting a chunk of data will be enough: the 8N1 format needs 10 bits per frame, and skipping about 2 frames should guarantee a proper resync...

 

I've thought about making the uC a slave which just waits for an incoming connection from the PC, but this method would suffer from the same resync problem if the device is connected in the middle of a "ping" transmission from the PC... Again, we would have to include some extra pauses between frames...

 

Anyway, thanks for the ideas. I'll try them and see which resolves the issue best.

 

EDIT: I've fixed the source a bit. Since my test system sends the letter continuously in a loop, I decided to add a small time gap after each successful transmission. Now it seems to work (almost) properly: in the worst case only one character is corrupted, not only at 9600 baud but all the way up to 115.2 kbaud!

 

#include <avr/io.h>
#include <util/delay.h>

#define USART_BAUDRATE      9600
#define BAUD_PRESCALE       (((F_CPU / (USART_BAUDRATE * 16UL))) - 1)
#define USART_MISSFRAMES    2
#define USART_FRAMELENGTH   10 // 8N1

int main(void) {
    UCSRB = (1 << RXEN) | (1 << TXEN); // enable UART
    UCSRC = (1 << UCSZ1) | (1 << UCSZ0); // 8-bit data frames
    UBRRH = (BAUD_PRESCALE >> 8); // set baud rate
    UBRRL = BAUD_PRESCALE & 0xFF;

    while (1) {
        // send byte
        while ((UCSRA & (1 << UDRE)) == 0) {};
        UDR = 'A';

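        // Pause for USART_MISSFRAMES frame times. The delay starts when the byte
        // is written to UDR, so one frame time of it is spent shifting the byte
        // out; the line then idles for about (USART_MISSFRAMES - 1) frame times.
        _delay_us(USART_MISSFRAMES * USART_FRAMELENGTH * 1000000UL / USART_BAUDRATE);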
    }
}

Now I think that organizing the data into COBS-encoded packets, with a CRC added at the end of each packet, will solve the problem of resyncing the transmission to the beginning of a packet.
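
For reference, here is a minimal sketch of the COBS part of that idea in portable C. It is not the poster's code: the function names are illustrative, the CRC step is left out, and the caller is assumed to send a 0x00 byte after each encoded packet as the delimiter. Because the encoded packet itself never contains 0x00, a receiver that joins mid-stream can discard bytes until the next 0x00 and start decoding from there.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode len bytes from in to out; out needs room for len + len/254 + 2 bytes.
   Returns the encoded length. The output never contains 0x00. */
size_t cobs_encode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t code_idx = 0;  /* where the current block's code byte goes */
    size_t w = 1;         /* next write position */
    uint8_t code = 1;     /* distance to the next zero in this block */

    for (size_t i = 0; i < len; i++) {
        if (in[i] != 0) {
            out[w++] = in[i];
            code++;
        }
        if (in[i] == 0 || code == 0xFF) {  /* block ends at a zero or when full */
            out[code_idx] = code;
            code = 1;
            code_idx = w;
            if (in[i] == 0 || i + 1 < len)
                w++;
        }
    }
    out[code_idx] = code;
    return w;
}

/* Decode one packet (without the trailing 0x00 delimiter).
   Returns the decoded length, or (size_t)-1 if the packet is malformed. */
size_t cobs_decode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t r = 0, w = 0;

    while (r < len) {
        uint8_t code = in[r++];
        if (code == 0 || r + (size_t)(code - 1) > len)
            return (size_t)-1;
        for (uint8_t k = 1; k < code; k++) {
            if (in[r] == 0)
                return (size_t)-1;
            out[w++] = in[r++];
        }
        if (code != 0xFF && r < len)
            out[w++] = 0;  /* the zero that the code byte replaced */
    }
    return w;
}
```

For example, the payload 11 22 00 33 (hex) encodes to 03 11 22 02 33, which would go on the wire followed by the 00 delimiter.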

Viktor Drobot
A.N. Belozersky Research Institute Of Physico-Chemical Biology MSU

Last Edited: Wed. Oct 10, 2018 - 12:29 AM

I have never experienced an error or the like, even when continuously transferring 1 Mbyte of data at 230.4 kbps, on appropriately designed hardware.
If the baud rate error is within about 2%, resynchronization is performed at every start bit.
If the problem is resynchronization due to a baud rate error, another option is to use 2 stop bits. The second stop bit provides a suitable interval and makes synchronizing on the start bit easier.
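
A rough way to see why a small baud rate error is harmless: the receiver re-times itself on every start edge, so the error only accumulates over one frame. The sketch below uses an idealized model that is an assumption of this example, not something from the thread: the receiver samples exactly in the middle of each bit and there is no other timing error. Real UARTs oversample with finite resolution, so practical limits are tighter than what this model allows. The function names are illustrative.

```c
/* Drift of the sampling point, in bit-times, by the time the last bit of a
   frame (the stop bit) is sampled. rel_error is the relative mismatch
   between transmitter and receiver bit rates, e.g. 0.02 for 2%. */
double stop_bit_drift(double rel_error, int frame_bits)
{
    double e = rel_error < 0 ? -rel_error : rel_error;
    /* the last bit is sampled (frame_bits - 0.5) bit-times after the start edge */
    return ((double)frame_bits - 0.5) * e;
}

/* In this ideal model the frame is read correctly as long as the sampling
   point stays within half a bit-time of the bit centre. */
int frame_is_readable(double rel_error, int frame_bits)
{
    return stop_bit_drift(rel_error, frame_bits) < 0.5;
}
```

For a 10-bit 8N1 frame, a 2% mismatch moves the last sample by 0.19 bit-times, well inside the bit, whereas an assumed 8.5% mismatch moves it by about 0.81 bit-times and the frame is misread.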

 

However, that will not solve the fundamental problem. I cannot predict the cause for now.
First of all, you need to accurately measure the transmitted and received bit widths with an oscilloscope.
 


I wonder if 10 uF caps are OK for the MAX232CPE+ chip? That was the first schematic I found on the web and built in hardware, but now I can't find it =\

The other schematics (including the reference one from the datasheet) say that the caps should be 1 uF.

 

Does anyone have experience with caps that differ from the 1 uF reference design value?

Viktor Drobot
A.N. Belozersky Research Institute Of Physico-Chemical Biology MSU


Early-generation chips used 10 uF; later ones used 1 uF. As to possible effects from using larger caps, I don’t know.


Finally I found some time to test the output with a scope. In my case this is the simplest Hantek 6022BL.

I've tested both the Tx pin on the AVR and the corresponding Tx pin on the MAX232CPE+ chip. As you can see from the attached graphs, the USART is working fine, so it's definitely a problem with the software used, caused by poor start bit resync.


Viktor Drobot
A.N. Belozersky Research Institute Of Physico-Chemical Biology MSU


Just one comment :

 

Use handshake signals if it has to sync correctly from the first char!


Do you mean RTS/CTS/DTR lines?

Viktor Drobot
A.N. Belozersky Research Institute Of Physico-Chemical Biology MSU


Have you noticed that (almost) all of your errors have the sequence 10101 in them?

This is a strong indication that you simply miss the start bit.

And after that RS232 can not sync reliably on a continuous data stream.

Never ever (in a reliable way). RS232 is now 58 years old. Back then they used the "BREAK" condition (which lasts longer than a frame) to overcome such resync problems.

Have you tried resetting the AVR when those errors occur?

This will create a pause on the serial data stream and gives the PC an opportunity to sync.

 

But for reliable communication:

Do not send continuous data streams over RS232.

I'm serious: Don't do it.

At the very least, create a pause which lasts longer than a single frame every now and then.

 

dviktor wrote:
Do you mean RTS/CTS/DTR lines?
I'm not sure what this will do if you do not stop the RS232 data stream. And if you stop the RS232 data stream that is already enough to re-synchronise.

 

About the caps for the Maxim chip.

What does the datasheet say for your particular MAX232 variant?

Also, you say you have a 6022BL. You should be able to measure the voltage ripple on the MAX232 pins with that.

 

Just curious, what software do you use with the 6022BL? OpenHANTEK, or does the hantek software work with linux / wine ?

Have you used it with Sigrok / Pulseview?

I have both a Rigol scope and some of those FX2lafw LA's, but I'm thinking of adding the 6022BL to my toolbox to measure analog voltages with my PC.

 

Why MAX232?

You probably know of chips like CH340 and PL2303

https://www.aliexpress.com/wholesale?SearchText=pl2303

These dongles work very nicely with Linux. No need to install drivers.

Small dongles including a USB connector cost USD 0.48 including shipping. Absolutely ridiculous, but (still) true.

The USB cables have a built in PCB.

The connector housing is (often?) made from hard plastic, which is good enough for daily use, but it can still be easily split at the seam if you want to look inside or have to replace a worn-out cable.

Doing magic with a USD 7 Logic Analyser: https://www.avrfreaks.net/comment/2421756#comment-2421756

Bunch of old projects with AVR's: http://www.hoevendesign.com


Yes, I've noticed this pattern and also tried to find the cause of the mess, but the observed data does not obey simple reasoning: I send the byte which corresponds to the letter 'A'. With 1 stop bit and 8 data bits per frame we get the following pattern: 0 01000001 1. So you can see that two bits with the value '1' follow each other. But you can see from the table in my first post that there are no patterns with two '1's together. It seems like my terminal program inserts an extra '0' between the last data bit and the stop bit...

 

If I reset the AVR, there is a high probability that the data stream becomes good; I tested this a lot. Also, including a pause of 2 frame lengths (T = 2 * 10 / BAUDRATE) means at most one byte is lost if sync fails.

 

In my device I need long data streams, so I can't drop that idea. I think that inserting these extra pauses before a long transmission (and probably in the middle too) will let me avoid this nasty resync bug.

 

For the MAX232CPE+, 1 uF caps are recommended. I'll try to measure the ripple on the caps a bit later. As for the 6022BL: I have been using OpenHantek, but its maintenance and development are now very slow. I tried the official app under wine but with no luck; it seems that wine cannot access USB devices directly... As far as I know, sigrok supports it only as a logic analyzer, not in scope mode. But I want to try it soon. So for now I have VirtualBox with Windows installed and use the official app inside the virtual environment. Works perfectly.

 

My device won't be used with a PC every day, so I decided to put a solid DB9F connector on my PCB and use classic serial comms over it. To me, using RS-232-to-USB dongles seems a bit awkward in such an application. But thank you for pointing out the different chips!

Viktor Drobot
A.N. Belozersky Research Institute Of Physico-Chemical Biology MSU

Last Edited: Mon. Oct 22, 2018 - 09:02 AM

dviktor wrote:
But you can see from the table in my first post that there's no patterns with two '1' together. Seems like my terminal program inserts extra '0' between the last data bit and stop bit...
The problem here is that you make assumptions about faulty data, and those assumptions may well also be false.

If sync is lost, a single bit may be sampled twice, or not at all. (Depends on noise, signal quality, small baud rate variations etc).

It is a trap that I also fall into far too often.

Finding faults in faulty electronic circuits is a very difficult and annoyingly slow process for me, because I make such assumptions all the time.

 

https://sigrok.org/wiki/Hantek_6022BL

The device can either be used as oscilloscope or as logic analyzer, but not both at the same time. I.e., it is not a mixed-signal-oscilloscope (MSO).

Currently only the 8-channel logic analyzer mode is supported.

I've used generic CY7C68013A development boards with up to 16 channels (briefly) in Pulseview.

As the hardware of that chip only supports the USB streaming mode over a single port I'm almost certain that the 6022BL uses the same pins.

Which means probably only a VID/PID change for Pulseview to make all 16 channels work.

But just for the LA, the small "24Mhz 8ch" boxes from Ali / Ebay / China are more convenient to use.

And if you let the magic smoke out it's only USD 5 to replace (I have a bunch of them in a drawer somewhere).

 

I have never ever tried to use any analog stuff with Sigrok yet. No DMM, Power supply, Oscilloscope or any of the other 100's of devices, but I am kind of curious about it though.

 

Glad to hear that OpenHantek works for you.

How complete is it? Is it "almost completely working" or does it just have some basic functionality and leaves much to wish for improvements?

 

If you have a DB9 on your pc you might as well use it :)

Doing magic with a USD 7 Logic Analyser: https://www.avrfreaks.net/comment/2421756#comment-2421756

Bunch of old projects with AVR's: http://www.hoevendesign.com


Oh, I didn't know about double sampling and other pitfalls, so I tried to analyze the problem ab initio.

 

To make the 6022BL work with sigrok you also need firmware for it. Sigrok then auto-uploads it to the SRAM of the LA and you can use it with sigrok out of the box (the H/P button should be depressed, state "P").

 

There are a lot of improvements possible in OpenHantek: auto zero compensation (for example, with the 1 kHz test signal I get a shift of about 40 mV, while in the official app this zero shift is auto-compensated) and different measurement modes (as far as I know you can only measure peak-to-peak and frequency right now). Also, yesterday I hit another bug: while sampling the 1 kHz calibration signal from the scope itself works well enough, measuring an LED blinking with a period of 40 ms on my breadboard is impossible, because the trigger just doesn't work! I should do some more research... For now, using the official app is more convenient than OpenHantek.

Viktor Drobot
A.N. Belozersky Research Institute Of Physico-Chemical Biology MSU


Recently I switched from Mint to Debian Buster.

To get my LA working again I had to apt install sigrok-firmware-fx2lafw as a separate package.

The 6022BL probably needs the same.

 

The Pulseview I have now seems less buggy than the previous version.

Sigrok / Pulseview is moving. Bugs get fixed, stuff gets added on almost a monthly basis.

 

Just yesterday I was experimenting with STM32F103, with a USB device program.

Just for fun I attached the logic analyser to the USB, and I was able to capture a lot of packets from the USB bus, but also a bunch of errors.

I expected this, because the maximum sample rate of my hardware is 24Msps, while the USB bitrate is 12Mbps.

Most of the bits got sampled twice, but when a bit gets sampled only once Sigrok gets confused about the timing.

For decent sampling with a Logic Analyser you would want to sample at least 4x the signalling rate.

 

I also like knobs on my Oscilloscope.

My Rigol is adequate, but getting a bit old.

There is a chance I'm going to do some serious development on a 3-phase motor controller.

If that goes through then I'm going to buy a Siglent SDS1104X-E.

https://www.batronix.com/shop/oscilloscopes/Siglent-SDS1104X-E.html

Not the cheapest scope, but also not the most expensive, and it delivers a lot of scope for that price.

All scopes I've seen under USD 300 are not much more than (possibly useful) toys.  And you get so much extra for those extra EUS 200.

It also has Ethernet and serves as a web server: you can browse to it with a web browser to push virtual knobs and easily transfer screen dumps.

This is still pretty buggy though, but still a nice feature.

 

That scope is also usable as a 4 channel LA and has protocol decoders for I2C, serial, and maybe some more.

And it does that with its 4 analog channels, so you can zoom in on the RC times and bus conflicts of your I2C bits.

Doing magic with a USD 7 Logic Analyser: https://www.avrfreaks.net/comment/2421756#comment-2421756

Bunch of old projects with AVR's: http://www.hoevendesign.com


dviktor wrote:
(...)  I expect that one frame of data should look like this: 0 01000001 1 (according to the RS-232 spec).

 

No, RS-232 has "LSB first" order. The pattern is 0 1000 0010 1; that's why you never observe two sequential ones.

Last Edited: Mon. Oct 22, 2018 - 02:05 PM
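
With LSB-first ordering taken into account, the table from the first post can be reproduced on a PC. The sketch below is a simplified model with hypothetical function names: it assumes a gap-free stream of one repeated byte, 8N1 framing, and a receiver that simply takes a given low bit as the start bit and samples once per bit.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_BITS 10  /* 8N1: start + 8 data + stop */

/* Level on the wire at bit-time t for an endless, gap-free stream of `byte`. */
int wire_bit(uint8_t byte, size_t t)
{
    size_t pos = t % FRAME_BITS;
    if (pos == 0)
        return 0;                    /* start bit */
    if (pos == FRAME_BITS - 1)
        return 1;                    /* stop bit */
    return (byte >> (pos - 1)) & 1;  /* data bits, LSB first */
}

/* Byte an 8N1 receiver assembles if it takes bit-time `start` for the start
   bit. Returns -1 if the line is high there, so it cannot be a start bit.
   *stop_ok is set to 1 if the bit in the stop position is high, else 0. */
int receive_at(uint8_t sent, size_t start, int *stop_ok)
{
    if (wire_bit(sent, start) != 0)
        return -1;

    uint8_t value = 0;
    for (int i = 0; i < 8; i++)
        value |= (uint8_t)(wire_bit(sent, start + 1 + i) << i);

    *stop_ok = wire_bit(sent, start + 9);
    return value;
}
```

For a stream of 'A', start offsets 0, 2, 3, 4, 5, 6 and 8 give 0x41, 0x50, 0xa8, 0x54, 0x2a, 0x15 and 0x05, which are exactly the seven values in the table. Offsets 2 and 8 also see a valid stop bit, so in this model a receiver locked there keeps reading 0x50 or 0x05 without any framing error.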