fastest ever AVR bit-bang UART: 2 cycles/bit

I'm not sure what it's good for, but I wrote a bit-bang UART that runs at half the AVR clock rate (2 cycles per bit). I tested it with an ATtiny running at 4 MHz, and had no receive errors at 2 Mbps. Practically speaking, the fastest clock speed you can use would be 12 MHz, with a PL2303 USB-TTL adapter running at 6 Mbps. It's the only cheap USB-TTL adapter I know of that works at 6 Mbps.
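Every bit takes exactly 2 CPU cycles, so the bit rate is simply F_CPU/2. That is where the 4 MHz to 2 Mbps and 12 MHz to 6 Mbps figures come from. Here is a minimal host-side C sketch of that arithmetic. The function names are hypothetical and do not come from the post:

```c
#include <assert.h>
#include <stdint.h>

/* A bit-banged UART that spends a fixed number of CPU cycles on every bit
 * runs at F_CPU / cycles_per_bit. */
static uint32_t uart_bitrate(uint32_t f_cpu_hz, uint32_t cycles_per_bit)
{
    assert(cycles_per_bit > 0);
    return f_cpu_hz / cycles_per_bit;
}

/* The other way round: the CPU clock needed for a target bit rate. */
static uint32_t required_f_cpu(uint32_t bitrate, uint32_t cycles_per_bit)
{
    return bitrate * cycles_per_bit;
}

/* Line time of one 8N1 character (start + 8 data + stop = 10 bits), in cycles. */
static uint32_t frame_cycles(uint32_t cycles_per_bit)
{
    return 10u * cycles_per_bit;
}
```

At 2 cycles per bit, one 8N1 character occupies only 20 cycles of line time. The routine's own setup and call overhead come on top of that.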

 

; Ralph Doncaster 2020 open source MIT license
; 8N1 UART, 2 cycles per bit

#define __SFR_OFFSET 0
#include <avr/io.h>

#define TX_GPIO 0

; 25 instructions including ret
; r24 = char to tx, clobbers r0, r25 and SREG flags
; needs an AVR where writing 1 to a PINx bit toggles PORTx
ttx:
    ldi r25, 1<<TX_GPIO
    mov r0, r24
    ror r0
    eor r0, r24             ; now 1=toggle, 0 = n/c
    cbi PORTB, TX_GPIO      ; disable pullup
    sbi DDRB, TX_GPIO       ; start bit
    brcc 1f                 ; no toggle
    out PINB, r25           ; bit 0
1:  sbrc r0, 0
    out PINB, r25           ; bit 1
    sbrc r0, 1
    out PINB, r25           ; bit 2
    sbrc r0, 2
    out PINB, r25           ; bit 3
    sbrc r0, 3
    out PINB, r25           ; bit 4
    sbrc r0, 4
    out PINB, r25           ; bit 5
    sbrc r0, 5
    out PINB, r25           ; bit 6
    sbrc r0, 6
    out PINB, r25           ; bit 7
    sbi PORTB, TX_GPIO      ; stop bit
    cbi DDRB, TX_GPIO       ; pullup mode
    ret

.global main
main:
    ldi r18, ' '            ; space = ASCII 0x20, first printable char
2:  ; write 1 line of ASCII
    mov r24, r18
    rcall ttx
    inc r18
    cpi r18, 0x7f
    brne 2b
    ldi r24, '\n'
    rcall ttx
    ldi r18, 38
3:  sbiw r26, 1             ; X wraps, so 65536 passes (from X = 0)
    brne 3b                 ; of 4 cycles: ~26ms at 10MHz, ~65ms at 4MHz
    dec r18
    brne 3b                 ; 38 passes: ~1s at 10MHz, ~2.5s at 4MHz
    rjmp main               ; forever
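The demo's delay comments depend on the clock. Here is a rough host-side C sketch of the cycle count. It assumes X (r27:r26) starts at 0, so each inner pass runs the full 65536 iterations. After reset X is actually undefined, so the very first pass can be shorter.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles spent in:
 *   3:  sbiw r26, 1   ; 2 cycles
 *       brne 3b       ; 2 cycles taken, 1 when it falls through
 *       dec  r18      ; 1 cycle
 *       brne 3b       ; 2 cycles taken, 1 when it falls through
 * assuming X starts at 0, so the inner loop runs 65536 times per pass. */
static uint64_t delay_cycles(uint32_t outer)
{
    assert(outer >= 1);
    const uint64_t inner = 65536u * 4u - 1u;   /* last brne falls through */
    /* each pass: inner loop + dec; every outer brne but the last is taken */
    return outer * (inner + 1u) + (outer - 1u) * 2u + 1u;
}

static uint64_t cycles_to_ms(uint64_t cycles, uint32_t f_cpu_hz)
{
    return cycles * 1000u / f_cpu_hz;
}
```

With `ldi r18, 38` this comes to about 9.96 million cycles. The "26ms" and "1s" comments therefore hold for a clock near 10 MHz. At the 4 MHz used for testing, the pause is closer to 2.5 s.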

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


ralphd wrote:
...Practically speaking, the fastest clock speed you can use would be 12Mhz, with a Pl2303 USB-TTL adapter running at 6mbps.  It's the only cheap USB-TTL adapter I know of that works at 6mbps.
Thanks for pointing out this part.  Yes, I have been living in a black box.

 

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius

Last Edited: Wed. Apr 8, 2020 - 08:58 PM

Note that Prolific PL2303 USB-UARTs are known to be more "problematic" than other options like the ("Rolls Royce") FTDI FT232R, CH340G, or CP2102.


Blasting bits out as fast as possible is nice, I guess, but what I usually want from a software UART implementation is for it to implement “common” bit rates at a variety of different CPU clocks, maybe with some cleverness added so that the functions don’t block for an entire character time for each byte sent or (worse) waiting to be received.

 

(It’s possible that I should reconsider, given the existence of very short connections to very flexible USB/UART converters. Hmm...)

 


Hmm, can you make a fast, small output that would be self-clocking to the receiver in some way, and also fast, small code to receive it?


You may as well go all the way. There is probably only a small handful of clocks saved here even though the bit rate is doubled, but if you are after some record...

 

; UNUT license (U will Never Use This)

; UART 8N<many>, 1 cycle per bit

.set TXPORT, 0x05 //PORTB
.set TXPIN, 5

 

.macro TXbit v,reg,bp
    .if \v
    sbi \reg,\bp
    .else
    cbi \reg,\bp
    .endif
.endm

.macro TXbyte v,reg,bp
    .set b,1
    cbi \reg,\bp
    .rept 8
    TXbit \v & b,\reg,\bp
    .set b,b+b
    .endr
    sbi \reg,\bp
    ret
.endm

.macro TXtable reg,bit
    .set n,0
    .rept 256
    TXbyte n,\reg,\bit
    .set n,n+1
    .endr
.endm

 

txTable:
TXtable TXPORT,TXPIN

 

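txByte:                     ; r24 = char; clobbers r0, r25, r30, r31
    ldi r25,11              ; each table entry is 11 words
    ldi r30,pm_lo8(txTable)
    ldi r31,pm_hi8(txTable)
    mul r24,r25             ; r1:r0 = r24 * 11 (needs a core with MUL)
    add r30,r0
    adc r31,r1
    clr r1                  ; restore avr-gcc's zero register
    ijmp                    ; the entry's ret returns to our caller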

 

.global txTest
txTest:
    sbi TXPORT,TXPIN
    sbi TXPORT-1,TXPIN //DDRn
    ldi r24,' '-1
    1:
    subi r24,-1
    rcall txByte
    cpi r24,0x7f
    brne 1b
    rjmp txTest
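Each TXbyte expansion is a fixed-length, fully unrolled sequence, and that is what lets txByte index the table with a multiply. Here is a rough host-side C sketch of what one table entry contains and how its address is found. The names are hypothetical. The 1-cycle-per-bit figure holds only on cores where sbi/cbi execute in one cycle; classic AVRs take 2 cycles for each.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ENTRY_WORDS = 11, TABLE_WORDS = 256 * ENTRY_WORDS };

/* Fill out[] with the mnemonics TXbyte emits for byte v: cbi (start bit),
 * eight sbi/cbi data bits LSB first, sbi (stop bit), ret. Every entry is
 * 11 one-word instructions. Returns the instruction count. */
static int tx_entry(uint8_t v, const char *out[ENTRY_WORDS])
{
    int n = 0;
    out[n++] = "cbi";                          /* start bit: drive low */
    for (int bit = 0; bit < 8; bit++)          /* data bits, LSB first */
        out[n++] = ((v >> bit) & 1u) ? "sbi" : "cbi";
    out[n++] = "sbi";                          /* stop bit: drive high */
    out[n++] = "ret";
    assert(n == ENTRY_WORDS);
    return n;
}

/* Word address of a byte's entry, as computed by mul / add / adc. */
static uint16_t entry_word_addr(uint16_t table_base, uint8_t v)
{
    return (uint16_t)(table_base + v * ENTRY_WORDS);
}
```

The full table is 256 × 11 = 2816 words (5632 bytes) of flash. That is the price of skipping any per-bit decision at run time.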

 


westfw wrote:

Hmm  can you make a fast,small output that woud be self-clocking in some way to the receiver,  and also fast/small code to receive it ?

Manchester code? There's also 4-in-6 coding, like that used by the VirtualWire/RadioHead library, which has slightly higher throughput.
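Manchester coding is self-clocking because every bit cell has a transition in its middle, and the receiver can lock onto those transitions. Here is a minimal C sketch using the IEEE 802.3 convention: 0 is high then low, 1 is low then high. The LSB-first order and the packing into a 16-bit word are assumptions for illustration, not VirtualWire/RadioHead's format.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a byte LSB first; data bit i becomes the 2-bit symbol pair at
 * bits 2i+1 (first half) and 2i (second half): 1 -> 01, 0 -> 10. */
static uint16_t manchester_encode(uint8_t data)
{
    uint16_t out = 0;
    for (int i = 0; i < 8; i++) {
        unsigned pair = ((data >> i) & 1u) ? 0x1u : 0x2u;
        out |= (uint16_t)(pair << (2 * i));
    }
    return out;
}

/* Decode; returns -1 if any pair is 00 or 11 (no mid-bit transition). */
static int manchester_decode(uint16_t symbols)
{
    int data = 0;
    for (int i = 0; i < 8; i++) {
        unsigned pair = (symbols >> (2 * i)) & 0x3u;
        if (pair == 0x1u)
            data |= 1 << i;
        else if (pair != 0x2u)
            return -1;
    }
    return data;
}
```

The cost is two line symbols per data bit (50% efficiency). This is why a 4-to-6 style code gets more payload through at the same symbol rate.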

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


A "real" UART samples the incoming data multiple times during each bit and, from what I recall, uses a majority vote to determine the state of that bit. In noisy environments, this can materially improve receive reliability. With only one sample per bit, as proposed here, you lose all that. It is one of those engineering trade-offs: speed vs. reliability. But then, this is transmit only. So it's a "Soft UAT" and not a UART, after all.
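A 2-of-3 majority vote is cheap to compute. Here is a minimal C sketch. It assumes 16 samples per bit and votes on three samples around the middle of the bit; the exact sample positions vary between UART designs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 2-of-3 majority vote: 1 if at least two of the samples are 1. */
static unsigned majority3(unsigned a, unsigned b, unsigned c)
{
    return (a & b) | (a & c) | (b & c);
}

/* Decide one bit from 16 samples spread across the bit period by voting
 * on the three samples around the middle (indices 7, 8, 9 here). A single
 * noise spike on one of them is outvoted by the other two. */
static unsigned vote_bit(const uint8_t samples[16])
{
    assert(samples != NULL);
    return majority3(samples[7] & 1u, samples[8] & 1u, samples[9] & 1u);
}
```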

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net


clawson wrote:

Note the Prologic USB-UART PL2303 are known to be more "problematic" than other options like the ("Rolls Royce") FTDI FT232R, CH340G, CP2102

 

I wouldn't waste my money on an FT232R, even if it can do 6 Mbps. As for the CH340G and CP2102, have you been able to get them to work at over 3 Mbps?

 

I've had no trouble getting my cheap PL2303 adapters (paid <$1 ea) working at 6 Mbps under Linux and even under Windoze.

http://nerdralph.blogspot.com/20...

 


westfw wrote:

Blasting bits out as fast as possible is nice, I guess, but What I usually want from a software UART implementation is for it to implement “common” bit rates at a variety of different cpu clocks, maybe with some cleverness added so that the functions don’t block for an entire character time for each byte sent or (worse) waiting to be received.

 

(it’s possible that I should reconsider, given the existence of very short connections to very flexible usb/UART converters.  Hmm...)

 

 

I wrote it primarily to see if I could do it, not to satisfy a particular need.  About the only possible practical application I can think of would be debugging an ISR where you want to spend as few cycles as possible transmitting the data.

 

I have done some work on a timer-based soft UART, but it still needs some work (including receive code) before it's release-ready.

https://github.com/nerdralph/ner...

 

Despite having written multiple bit-bang UARTs, my preference for getting data from tiny AVRs is to log to EEPROM. That leaves all the I/O pins free, and it's easy to read the EEPROM through ICSP or debugWIRE.

 


westfw wrote:

Hmm  can you make a fast,small output that woud be self-clocking in some way to the receiver,  and also fast/small code to receive it ?

 

I did something like that with debugWIRE, using a Python program to detect the baud rate.

I've also had an idea to write a baud rate translator that runs on a separate AVR, but it hasn't made it to the top of the project list.

 


curtvm wrote:

You may as well go all the way. There is probably only a small handful of clocks saved here even though the bit rate is doubled, but if you are after some record...

 

It seems you know just enough asm to be dangerous, but not enough to know that sbi & cbi take 2 cycles.

 


ka7ehk wrote:

A "real" UART samples the incoming data multiple times during each bit and, from what I recall, uses "majority" to determine the state of that bit....

 

Later UART peripheral chips for CPUs added triple sampling and majority voting. The first UARTs, which appeared before microprocessors became popular, were Digital Equipment's discrete-logic ur-UART condensed into an integrated circuit. They sampled the start bit only twice, and all the other bits just once.

- John


ka7ehk wrote:

A "real" UART samples the incoming data multiple times during each bit and, from what I recall, uses "majority" to determine the state of that bit. In noisy environments, this can materially improve receive reliability. With only one sample per bit, as proposed here, you loose all that. It is one of those engineering trade-offs - speed vs reliability. But, then, this is transmit, only. So, its a "Soft UAT" and not a UART, after all.

 

I've thought about adding majority voting to a version of my soft UART. It would roughly double the code size, and the maximum speed would be about 3x lower.

 

 


What does the serial receive code look like, the R in your UART? It would have the triple-sampling/majority-voting, if you are game for that.

- John


My point was a rather picky one. No receive? Then it's a UAT, not a UART (Universal Asynchronous Receiver-Transmitter).

 

Jim


I came that close to calling his code a UAT, but I wanted to give him a chance to produce his R.

- John


ralphd wrote:
sbi & cbi take 2 cycles.
Not all AVRs are created alike. Reduced-core tinyAVRs have 1-cycle cbi/sbi.

 

That's the t4/5/9/10, and the t20/40.

 

Also, every XMEGA.

 

Also, every newly-introduced so-called Xtiny (e.g. t1614 et al).
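The cycle counts decide what the sbi/cbi table version actually achieves. Here is a small C sketch of that. The per-core SBI/CBI timings are as listed in the AVR instruction set manual. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* AVR core families as named in the AVR instruction set manual. */
typedef enum {
    CORE_AVRE,   /* classic AVR/AVRe/AVRe+: ATtiny85, ATmega328P, ... */
    CORE_AVRXM,  /* XMEGA */
    CORE_AVRXT,  /* tinyAVR 0/1-series, megaAVR 0-series (e.g. ATmega4809) */
    CORE_AVRRC   /* reduced core: ATtiny4/5/9/10/20/40 */
} avr_core;

/* Clock cycles for one SBI or CBI instruction on each core family. */
static unsigned sbi_cbi_cycles(avr_core core)
{
    assert(core <= CORE_AVRRC);
    return core == CORE_AVRE ? 2u : 1u;
}

/* Bit rate of a fully unrolled transmitter that spends one SBI/CBI per bit. */
static uint32_t sbi_table_bitrate(uint32_t f_cpu_hz, avr_core core)
{
    return f_cpu_hz / sbi_cbi_cycles(core);
}
```

The reduced core also lacks the MUL instruction, so the table version's txByte would need a different index calculation there.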

 
Last Edited: Thu. Apr 9, 2020 - 09:24 PM

>It seems you know just enough asm to be dangerous, but not enough to know that sbi & cbi take 2 cycles.

 

I used to be less dangerous, but that time has passed.

 

Quite ironic: I'm looking at the instruction set in a 4809 datasheet, and the MCU has 4 nice USARTs in it. I guess it's also dangerous to think all instruction sets are created equal, which is why I will let a compiler do the dirty work, and let hardware take care of USART duty. Start getting into the various ARM Cortex-M instruction sets, and you will have a full-time job figuring out what you can and cannot do with each variation (or just treat them all like an M0).

 

I guess I'll have to modify it a little:

 

.set TXPORT, 0x05 //PORTB
.set TXPIN, TXPORT-2 //PINn
.set TXDDR, TXPORT-1 //DDRn
.set TXPINn, 5
.set TXPINREG, 16

.set pv,1
 

.macro TXbit bv
    .if \bv == pv
    nop
    .else
    out TXPIN,TXPINREG
    .set pv,pv^1
    .endif
.endm
.macro TXbyte v
    TXbit 0
    .set bv,\v
    .rept 8
    TXbit bv & 1
    .set bv,bv>>1
    .endr
    TXbit 1
    ret
.endm

 

txTable:
    .set n,0
    .rept 256
    TXbyte n
    .set n,n+1
    .endr

 

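txByte:                     ; r24 = char; clobbers r0, r25, r30, r31
    ldi r25,11              ; each table entry is 11 words
    ldi r30,pm_lo8(txTable)
    ldi r31,pm_hi8(txTable)
    mul r24,r25             ; r1:r0 = r24 * 11 (needs a core with MUL)
    add r30,r0
    adc r31,r1
    clr r1                  ; restore avr-gcc's zero register
    ijmp                    ; the entry's ret returns to our caller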

 

.global txTest
txTest:
    sbi TXPORT,TXPINn
    sbi TXDDR,TXPINn //DDRn
    ldi r24,' '-1
    ldi TXPINREG,1<<TXPINn  ; PINx toggle mask, not the bit number
    1:
    subi r24,-1
    rcall txByte
    cpi r24,0x7f
    brne 1b
    rjmp txTest
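With the `.set pv` bookkeeping, only a level change costs an `out` to the PIN register. An unchanged bit gets a `nop`, so every bit still takes exactly one cycle. Here is a host-side C sketch of the decision the macros make for one table entry. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { FRAME_BITS = 10 };   /* start + 8 data (LSB first) + stop */

/* Mirrors the TXbit/TXbyte macros for one byte: ops[i] = 1 means
 * "out TXPIN,TXPINREG" (toggle), 0 means "nop". Both take one cycle, so
 * every bit lasts one cycle whether or not the level changes. Returns the
 * number of toggles. */
static int tx_toggle_entry(uint8_t v, uint8_t ops[FRAME_BITS])
{
    unsigned prev = 1;          /* .set pv,1: the idle line is high */
    int toggles = 0;
    for (int i = 0; i < FRAME_BITS; i++) {
        unsigned level;
        if (i == 0)
            level = 0;                      /* start bit */
        else if (i == FRAME_BITS - 1)
            level = 1;                      /* stop bit */
        else
            level = (v >> (i - 1)) & 1u;    /* data bit i-1 */
        ops[i] = (uint8_t)(level != prev);
        toggles += ops[i];
        prev = level;
    }
    assert(prev == 1);          /* every entry ends idle, so pv stays valid */
    return toggles;
}
```

Every entry ends at the idle level. That is why a single assembler-time `pv` can be carried from one table entry to the next.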

    sbrc r0, 2
    out PINB, r25           ; bit 3
    sbrc r0, 3
    out PINB, r25           ; bit 4
    sbrc r0, 4
    out PINB, r25           ; bit 5

Wait - doesn't the decision to toggle the output bit need to depend on the previous bit as well as the current bit?

 


That is why this is done:

    mov r0, r24
    ror r0
    eor r0, r24             ; now 1=toggle, 0 = n/c
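Those three instructions turn the character into a per-bit toggle mask. Here is a host-side C model of ttx. It assumes the pin is low during the start bit, and it shows that replaying the toggles reproduces the original data bits for all 256 characters.

```c
#include <assert.h>
#include <stdint.h>

/* mov r0, r24 ; ror r0 ; eor r0, r24
 * After ror, bit i of r0 holds data bit i+1, so after eor bit i is 1
 * exactly when data bit i+1 differs from data bit i. (ror shifts the old
 * carry into bit 7; that bit of the mask is never used.) */
static uint8_t toggle_mask(uint8_t c)
{
    return (uint8_t)((c >> 1) ^ c);
}

/* Replay the toggles ttx issues and collect the pin level during each
 * data bit, packed LSB first like the original character. */
static uint8_t replay_levels(uint8_t c)
{
    uint8_t mask = toggle_mask(c);
    unsigned level = 0;               /* start bit: pin driven low */
    uint8_t seen = 0;
    if (c & 1u)                       /* carry from ror = data bit 0: */
        level ^= 1u;                  /* brcc 1f / out PINB, r25      */
    seen |= (uint8_t)level;
    for (int i = 1; i < 8; i++) {     /* sbrc r0, i-1 / out PINB, r25 */
        if ((mask >> (i - 1)) & 1u)
            level ^= 1u;
        seen |= (uint8_t)(level << i);
    }
    return seen;
}
```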

 


Ah. Missed that. Thanks.

Cute trick. Might it be useful elsewhere? Were you (Ralph) already using it in your fast SPI code?

 


Some SW quadrature decoders use this "trick".
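A common form of the trick in a quadrature decoder works in two steps. First, XOR the old and new A/B states to see what changed. Then XOR one channel's new value with the other channel's old value to get the direction. Here is a minimal C sketch. It is not necessarily the formulation the poster had in mind, and which direction counts as +1 is an arbitrary convention.

```c
#include <assert.h>
#include <stdint.h>

/* state = (A << 1) | B. A valid step changes exactly one channel; then
 * A_new XOR B_old gives the direction. Returns +1, -1, or 0 for "no
 * change" or an invalid double transition (both channels changed). */
static int quad_step(unsigned old_state, unsigned new_state)
{
    unsigned changed = (old_state ^ new_state) & 3u;
    if (changed == 0u || changed == 3u)
        return 0;
    unsigned a_new = (new_state >> 1) & 1u;
    unsigned b_old = old_state & 1u;
    return (a_new ^ b_old) ? -1 : +1;
}
```

With this convention, the Gray sequence 0, 1, 3, 2 counts up and 0, 2, 3, 1 counts down.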

 

  


Most of my serial use these days is at logic levels over 30 cm of cable. I doubt multiple sampling of received levels would add anything.

I have, in the distant past, used the trick of unequal bit durations to achieve an overall standard bit rate with an unfriendly system clock frequency.
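One way to schedule unequal bit durations is to round the ideal bit edges i·F_CPU/baud down to whole cycles, Bresenham-style. Each bit then lasts either the floor or the ceiling of F_CPU/baud cycles, and the rounding error never accumulates across the frame. Here is a C sketch under that assumption; the post does not say which method was used. The example is 8 MHz at 115200 baud, where the ideal is about 69.44 cycles per bit.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles for bit i of a frame when the ideal bit edges i*F/baud are
 * rounded down to whole cycles: consecutive bits differ by at most one
 * cycle and the rounding error never builds up across the frame. */
static uint32_t bit_cycles(uint32_t f_cpu_hz, uint32_t baud, unsigned i)
{
    assert(baud > 0);
    uint64_t end   = (uint64_t)(i + 1u) * f_cpu_hz / baud;
    uint64_t start = (uint64_t)i * f_cpu_hz / baud;
    return (uint32_t)(end - start);
}
```

For 8 MHz and 115200 baud, the ten bits of an 8N1 frame come out as a mix of 69- and 70-cycle bits totalling 694 cycles. A real bit-bang loop would subtract its own per-bit overhead from each count.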

Four legs good, two legs bad, three legs stable.


westfw wrote:

Ah.   Missed that.  Thanks.

cute trick - might it be useful elsewhere?  Were you (Ralph) already using it in your fast SPI code?

 

No.  It uses bst/bld and then an out instruction to set the clock low at the same time as outputting the data.  One more out to the PIN register toggles the clock high.
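Here is a host-side C model of that sequence. It assumes (hypothetically) SCK on port bit 0, MOSI on port bit 1, MSB first, and a receiver that samples on the rising clock edge (SPI mode 0). It checks that the single `out` to PORT lowers the clock and presents the data in the same write.

```c
#include <assert.h>
#include <stdint.h>

enum { SCK_BIT = 0, MOSI_BIT = 1 };   /* hypothetical pin assignment */

/* Models, per bit (unrolled in the real code, MSB first):
 *   bst  data, n          ; T = bit n of the byte
 *   bld  image, MOSI_BIT  ; copy T into the MOSI bit of the port image
 *   out  PORT, image      ; one write: SCK low and new MOSI level together
 *   out  PIN, sck_mask    ; writing 1 to PINx toggles SCK high
 * The modelled slave samples MOSI on each rising SCK edge. */
static uint8_t spi_loopback(uint8_t data)
{
    uint8_t port = 0, image = 0, received = 0;   /* image keeps SCK = 0 */
    for (int n = 7; n >= 0; n--) {
        unsigned t = (data >> n) & 1u;                                    /* bst */
        image = (uint8_t)((image & ~(1u << MOSI_BIT)) | (t << MOSI_BIT)); /* bld */
        port = image;                                                     /* out PORT */
        assert(!(port & (1u << SCK_BIT)));                                /* SCK low */
        port ^= (uint8_t)(1u << SCK_BIT);                                 /* out PIN */
        received = (uint8_t)((received << 1) | ((port >> MOSI_BIT) & 1u));
    }
    return received;
}
```

That is 4 single-cycle instructions per bit, which matches the 4 cycles per bit mentioned below.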

https://github.com/nerdralph/ner...

 

I can't think of any way of encoding the data that would get the time per bit down to 3 cycles from 4.

 


I can't think of any way of encoding the data that would get the time per bit down to 3 cycles from 4.

In many cases this 3-clock structure would work (where the other 7 bits of the port are known):

bst  (copy the bit to send into T)

bld  (copy T into the matching bit of a register holding the port image)

out PORT

And the 100% ugly 2-clock version, where the output has to be the low bit of the port (and the values of the rest of the port bits don't matter):

ror

out

 

And a side comment that doesn't really matter anymore: the first AVRs don't toggle the output when writing to PIN (I think the mega8 was the first; a chip like the mega161 doesn't).


ralphd wrote:

clawson wrote:

Note the Prologic USB-UART PL2303 are known to be more "problematic" than other options like the ("Rolls Royce") FTDI FT232R, CH340G, CP2102

 

I wouldn't waste my money on a FT232R, even if it can do 6mbps.  As for the CH340G and CP2102, have you been able to get them to work at over 3mbps?

 

I've had no trouble getting my cheap PL2303 adapters (paid <$1 ea) working at 6mbps under Linux and even under Windoze.

http://nerdralph.blogspot.com/20...

 

 

A single banner number is only partial information about any UART, and especially a USB one.

The next test/question is what sustained baud rate they can support. (For that, I send large files of 0x55 and connect a frequency counter.)

And what duplex rate can they support? Usually, this is significantly lower than the one-way speed.

The EXAR UARTs can go to 12 MBd, but of course they cannot sustain that as an average over FS-USB; they can only do it in short bursts, using their buffers.

If speed really matters to you, you will just buy an FT232H or FT2232H, and those can sustain 12 MBd in both directions.
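The 0x55 test works because of how 8N1 framing sends that byte. With LSB first, every bit of the frame differs from its neighbour, including the stop bit to the next start bit when frames are sent back to back. The line is then a square wave at half the baud rate, and any idle gaps between frames show up as a lower reading. Here is a small C check of that claim:

```c
#include <assert.h>
#include <stdint.h>

/* One 8N1 frame as line levels: bit k of the result is the level during
 * bit time k (k = 0 start bit, 1..8 data LSB first, 9 stop bit). */
static uint16_t frame_8n1(uint8_t c)
{
    return (uint16_t)(((unsigned)c << 1) | (1u << 9));   /* start bit = 0 */
}

/* Level changes per frame when frames are sent back to back with no idle
 * time: the 9 internal bit boundaries plus the stop -> next start edge. */
static int edges_per_frame(uint8_t c)
{
    uint16_t f = frame_8n1(c);
    int edges = 0;
    for (int k = 1; k < 10; k++)
        edges += (int)(((f >> k) ^ (f >> (k - 1))) & 1u);
    edges += 1;                 /* stop bit (1) -> next start bit (0) */
    return edges;
}

/* With an edge at every bit boundary the line is a square wave whose
 * period is two bit times. */
static uint32_t square_wave_hz(uint32_t baud)
{
    return baud / 2u;
}
```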