Plenty of serials


I'm facing a new project that requires a lot of serial ports: at least 12, better 14. All of them are asynchronous, and they'll end up as RS232 or RS485. I wrote my own library based on the official application note that uses interrupts and internal buffers to minimize the time spent in the ISR. This time I'm worried about the number of serial lines! Some other info:

 

  • all serials have a maximum baudrate of 115200 bps, some of them are slower (19200 bps)
  • half of them exchange data (in both direction) every 50 ms with a payload of about 200 bytes
  • the MCU(s) doesn't do much more than "redirect" data among the serials 

 

It's quite easy to define the buffer sizes to avoid overflows, but I'm not sure if the MCU @ 32 MHz can handle all those interrupts.
I'm going to use two xmega*A1 that have 8 USARTs each: one USART on each is used to connect the two units, which leaves 14 USARTs for the application. I know I might use SPI for this link, but it's not as easy as asynchronous serial (where packets have different lengths and can arrive at any moment).

 

Do you see any critical issue in this design? 

 


iw2nzm wrote:
... based upon the official application note that uses interrupts and internal buffer to minimize the time spent in the ISR.
AVR1307?

AVR1307 has one small section that mentions DMA and AVR1304.

 

Microchip Technology Inc

Microchip Technology

Application Notes

AN_8049 AVR1307: Using the XMEGA USART

http://www.microchip.com//wwwAppNotes/AppNotes.aspx?appnote=en592034

via https://www.microchip.com/wwwproducts/en/ATxmega128A1U

 

"Dare to be naïve." - Buckminster Fuller


Yes it was AVR1307. The DMA section refers to AVR1304 (Using the XMEGA DMA Controller).

But I know very little about DMA - I'm going to search for some examples of how to use it to receive/send data from USARTs.


Is latency a problem?

Can you buffer whole packets before resending them, or do you want to start sending after you collected a few bytes?

I have run into trouble with this. If the UART baud rate is a bit off, your buffer can run empty halfway through a packet if you start too soon.

But if you buffer a whole packet, send it to the other Xmega, which also buffers it, you have a delay of twice the packet length.

 

A packet of 200 bytes every 50ms is 4kBytes/s, almost half the bandwidth of a 115k2 uart.

It looks like you will definitely need the help of DMA to get your throughput without overloading the CPU itself.

How many DMA channels does the Xmega you want to use have? Is it enough?

Does a single 115k2 UART channel between your 2 CPUs have enough bandwidth?

SPI can have much higher bitrates, that could be important for you.

 

A long time ago I did some brainstorming for a similar program, but I never finished it.

My idea back then was to use a bunch of AVR's, each with a single UART and connect them all together via a "high speed" SPI bus.

On regular AVR's it should be possible to put several AVR's on a single SPI bus and put them all in "Master" mode.

The moment one of the AVRs pulls Slave Select low, all the others drop into slave mode and start listening.

That project however never got beyond the initial planning stage.

Paul van der Hoeven.
Bunch of old projects with AVR's:
http://www.hoevendesign.com

Last Edited: Sun. Apr 15, 2018 - 02:44 PM

Paulvdh wrote:
On regular AVR's it should be possible to put several AVR's on a single SPI bus and put them all in "Master" mode.
megaAVR and XMEGA AVR with EBI can also communicate via shared memory.

Dual-port SRAM is fast though there are semaphore signals in addition to the usual SRAM signals.

An FPGA can connect three or more AVR to one shared SRAM, DRAM, MRAM, FRAM, etc.

 

"Dare to be naïve." - Buckminster Fuller


Hi, thanks for your contribution. Delay or latency is definitely not a problem. Right now I already buffer the packets twice and all works as expected. This has several advantages: if needed I can decode a packet before sending it on, or do some error checking and so on.

In which way should I use DMA? With the current library I have a circular buffer: the received chars are queued in it, and in the main loop I dequeue them until I detect the end of a packet. Is this possible with DMA?

 

About SPI network: it's very interesting. If I understand correctly you would use it in one-way only. I mean, the device that pulls down the line becomes a master and sends on the MOSI line the data. All the others receive it, without sending back anything. I can make some thoughts on it. Perhaps using some xmega even with only 4 serial ports might simplify a bit the stuff. 

 

 


The interrupt routines shouldn't be a problem.  There should be a separate IRQ for each UART port, called each time a single new byte arrives in that UART.  The IRQ does nothing more than read the new value from the UART's data register and put it into a circular ring buffer.  It then increments the buffer's input offset and compares the new value to the buffer's MAX value: if it is less, the IRQ exits; otherwise it first resets the input offset to zero.  All told, maybe ten microseconds to handle each IRQ for any newly arriving byte on the UART set.  The interrupts are prioritized so that when one completes, the next lower-priority IRQ begins (after a single main-code instruction has executed).
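
The receive path described above could be sketched in C roughly as follows. This is only a sketch under assumptions: the names `ring_t`, `ring_put_from_isr` and `ring_get` and the 200-byte size are made up for illustration, the real ISR would read the byte from the USART data register, and there is no overflow check. The indices are 8-bit so that the main loop can read them atomically on an 8-bit AVR.

```c
#include <stdint.h>
#include <stdbool.h>

#define RX_BUF_SIZE 200u   /* assumed size, matching the 200-byte worst case */

typedef struct {
    volatile uint8_t data[RX_BUF_SIZE];
    volatile uint8_t head;   /* input offset, written only by the ISR */
    volatile uint8_t tail;   /* output offset, written only by the main loop */
} ring_t;

/* Body of a receive ISR: store one byte, then advance and wrap the input offset.
   No overflow detection: unread data is overwritten if the main loop falls behind. */
static inline void ring_put_from_isr(ring_t *rb, uint8_t byte)
{
    uint8_t head = rb->head;
    rb->data[head] = byte;
    head++;
    if (head >= RX_BUF_SIZE)
        head = 0;
    rb->head = head;
}

/* Main-loop side: fetch one byte if any is waiting. */
static inline bool ring_get(ring_t *rb, uint8_t *out)
{
    uint8_t tail = rb->tail;
    if (tail == rb->head)
        return false;            /* buffer empty */
    *out = rb->data[tail];
    tail++;
    if (tail >= RX_BUF_SIZE)
        tail = 0;
    rb->tail = tail;
    return true;
}
```

Each USART would get its own `ring_t`, and each ISR would call `ring_put_from_isr` with the byte it has just read.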

 

   It's the main code that examines each UART's input buffer for new data and packet completion.  Each UART will have a 200 byte input buffer and a 200 byte output buffer: about 5200 bytes buffer space.  When a complete packet arrives, the main code will copy the data to the output buffer of the correct UART and start the TX interrupt-based transmission, which runs in the background.  Copying 200 bytes takes about 10-20 microseconds.

 

  A problem may be when all the input UARTs want to send new packets to the same output UART.

 

 Direct Memory Access (DMA) will take each new data byte on each UART and stuff it into a memory location (the UART's buffer in this case), automatically incrementing the memory address for the next arriving byte, rather than having the CPU move the byte through a register.  This doesn't have any advantage when the data is arriving at a relatively slow rate of 11,520 bytes per second (115.2 kbaud).


Hi Simonetta, thank you. Right now my UART handling works exactly how you described. Before I said: "it's quite easy to define the buffer sizes". This is because, I know enough about my system to guess where serials want to write. And here I described the worst case. 

Bottom line, thanks to you all I think I can rely on my current code, and try to optimize it more.


If the processor isn't doing much useful with the data, maybe you should use some mux chips to simply connect the desired signals (as directed by processor), or use an FPGA to make up a nice mux..

When in the dark remember-the future looks brighter than ever.


Nope. It does little... but useful!


What is the total datarate which has to be shoved around?

115k2 is about 11kB/s. In the unlikely event of 14 simultaneous full-duplex USARTs that's around 300kB/s.

Maybe still doable, but getting near the danger zone for small uC's.

A system running on several hundred MHz seems to be a safer bet to get going.

 

I think the idea Simonetta suggested, of fixed buffers for all the USARTs and copying data between the buffers, is common, but maybe not the best way.

A "pool" of buffers seems to be smarter. Every UART (or DMA channel, whatever) gets a buffer assigned for incoming data only, and once a complete packet has been read into a buffer, that UART gets a new pointer to an empty buffer from the pool. The data in the filled buffer can then be sent further on its way by UART / SPI / DMA, maybe even a parallel 8-bit interconnect bus. Once the packet has been sent, that buffer is added back to the pool of empty buffers for re-use. This may be a bit tricky to debug (beware of race conditions and such), but it avoids all the copying of data packets between buffers, which may be an important factor in a resource-constrained system.
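
A minimal sketch of such a pool in C, assuming a fixed number of 200-byte buffers kept on a singly linked free list. The names (`packet_buf_t`, `pool_init`, `pool_alloc`, `pool_release`) and the buffer count are invented for illustration. On the real target, `pool_alloc` and `pool_release` would have to run with interrupts disabled (or only from one context) to avoid the race conditions mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BUFFERS 8u     /* assumed number of buffers in the pool */
#define PACKET_MAX   200u   /* worst-case packet length from the thread */

typedef struct packet_buf {
    struct packet_buf *next;      /* link used only while the buffer is free */
    uint16_t len;                 /* number of valid bytes in data[] */
    uint8_t  data[PACKET_MAX];
} packet_buf_t;

static packet_buf_t  pool_storage[POOL_BUFFERS];
static packet_buf_t *pool_free_list;

/* Put every buffer on the free list. */
void pool_init(void)
{
    pool_free_list = NULL;
    for (size_t i = 0; i < POOL_BUFFERS; i++) {
        pool_storage[i].len  = 0;
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

/* Take an empty buffer from the pool, or NULL if none is left. */
packet_buf_t *pool_alloc(void)
{
    packet_buf_t *b = pool_free_list;
    if (b != NULL) {
        pool_free_list = b->next;
        b->next = NULL;
        b->len  = 0;
    }
    return b;
}

/* Return a buffer to the pool once its packet has been sent. */
void pool_release(packet_buf_t *b)
{
    b->next = pool_free_list;
    pool_free_list = b;
}
```

A receiver fills the buffer it currently owns, hands the pointer to the transmitter once the packet is complete, and immediately asks the pool for a fresh one, so the payload itself is never copied.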

 

Is the total cost of the system an important factor?

A small PC class machine with a bunch of UART <==> USB devices might be an option.

 

gchapman suggested an FPGA. I think a decent FPGA is capable of handling all the UARTs directly with plenty of bandwidth to spare. I have never worked with FPGAs myself, but as far as I know they are excellent at serializing data and shoving data around at high speed.

Paul van der Hoeven.
Bunch of old projects with AVR's:
http://www.hoevendesign.com


What an interesting project.

 

I'm guessing that none of the serial comm's channels use either hardware or software flow control?

 

If any of the channels used flow control then clearly your board doesn't control the delay at the other end, and your board could easily overflow any buffer size...

 

I'm also guessing that the system is such that you won't have two incoming packets at the same time with the same destination?  If so, then the task is much more complex, as you need to make sure you don't "mix" / interleave data from the two incoming sources into the single outgoing channel's buffer.

 

I wonder about the need for obtaining / verifying a full packet before sending it on.  If this is necessary then so be it.  If the end receiving device is going to do its own incoming data packet analysis and end of packet determination, then your board in the middle may not need to do so. 

 

The issue here is that, from a data throughput perspective, if you hold a packet until it is full before initiating its transmission then you are impeding your overall system's throughput for two reasons.

 

First you are spending processing time continually testing for a full packet condition.

 

Next, the USART isn't being allowed to use any of the time while the packet is being collected to send data.  i.e. the USART transmit side has a lot of dead time where it isn't doing any data transmission.  Then, when there is a full packet, it suddenly has to send 200 Bytes, (or whatever), all at once.  The system is essentially "batching" the data transmission, so the USART transmitter is either doing nothing, or it is very busy.  With sixteen USARTs running simultaneously, it might be better to have them all running continuously, instead of potentially have them all get very busy simultaneously.

 

Next question, is it just the Received data that is ISR driven, or both TxData and RxData.  I would certainly expect the RxData to be interrupt driven, but one might envision a system where the Main Loop, as it looks at the various buffer/USART channels, reads that USART's TxData busy bit and then stuffs the next data Byte to send.  That would (likely) be more efficient from an overall system perspective than using interrupts for the transmission side as well.  Also, given the number of data channels for the Main Loop to Round-Robin process, the Tx Busy bit is likely never to be busy when read.

 

Two final comments:  Is there enough processor power remaining to monitor a "watermark" for each buffer to see if they ever exceed 80% (or whatever) full?  If so, one can turn on a Red warning LED...  (A project can never have too many LED's!)  The real purpose, however, is not to have more flashing lights, but to be able to verify the buffer safety margin while the system is in real time usage, and perhaps be able to set the buffers to different sizes if it turns out that certain buffers tend to need more space than others.
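
One possible shape for such a watermark check, as a sketch only: the struct and function names are hypothetical, and the main loop is assumed to know how many bytes are currently queued in each buffer.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint16_t capacity;     /* total size of the buffer in bytes */
    uint16_t high_water;   /* largest fill level seen so far */
} watermark_t;

/* Record the current fill level; call whenever the buffer has grown. */
static inline void watermark_update(watermark_t *w, uint16_t used)
{
    if (used > w->high_water)
        w->high_water = used;
}

/* True if the buffer has ever been more than `percent` full. */
static inline bool watermark_exceeds(const watermark_t *w, uint8_t percent)
{
    return (uint32_t)w->high_water * 100u > (uint32_t)w->capacity * percent;
}
```

The main loop could call `watermark_exceeds(&w, 80)` for each buffer and drive the warning LED from the result; reading `high_water` over a debug port would also give the data needed to resize individual buffers.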

 

Finally, it sounds like this might be a "mission critical" application, so evaluate this last comment carefully.  The Xmega's, if one isn't using the EEPROM or the analog modules, (ADC, DAC, AC), can likely run well while being a bit over-clocked.  Atomic Zombie says they run well at 64 MHz (IIRC).  I've run a couple at 48 MHz without any problems, (room temperature).  Getting an extra 16,000,000 clock cycles / second by changing the Xmega clock PLL value might be an option if you need more processing power. 

 

Remember that the chip's 32 MHz spec is what the manufacturer is willing to guarantee over the full production run, and temperature and voltage range.  There are many, many PC's and digital O'scopes, etc., that run just fine being over-clocked, and validated by the designer/manufacturer, albeit "out of spec".  Engineering is all about tradeoffs, and OC'ing isn't an inherently bad thing to do as long as one goes about it in a reasonable manner.   But it sounds like your system is already up and running well without this being required.

 

JC

 

 

 


Paulvdh wrote:
A "pool" of buffers seems to be smarter.
 

 

Got it. I think it should be easy enough for my application and would be faster than playing with all those circular buffers.

 

Quote:
Is the total cost of the system an important factor? A small PC class machine with a bunch of UART <==> USB devices might be an option.

 

Cost is not an important factor, time is. And we need to keep all this stuff in hardware only for several reasons. I'm trying to do it with xmega because I know them quite well, and the deadline is too short to learn something new like FPGA or such.

 


DocJC wrote:

I'm guessing that none of the serial comm's channels use either hardware or software flow control?

 

Good guess :)

Anyway, I know the maximum length of each packet. Most of them are shorter than the 200-byte constraint above - it's just a "worst case scenario".

 

Quote:

I'm also guessing that the system is such that you won't have two incoming packets at the same time with the same destination?  If so, then the task is much more complex, as you need to make sure you don't "mix" / interleave data from the two incoming sources into the single outgoing channel's buffer.

 

Well, actually several serials will have the same destination. To better understand the situation think of a remote device (one long RS485 line) that collects data from various sensors (other RS232 lines). All of these sensors will go into the remote device. The good news is their data is quite short.

 

Quote:

I wonder about the need for obtaining / verifying a full packet before sending it on.  If this is necessary then so be it.  If the end receiving device is going to do its own incoming data packet analysis and end of packet determination, then your board in the middle may not need to do so. 

 

I need to do so on some packets because the payload is useful for both boards: the "local" one (the one we're talking about) and the remote one. Furthermore, some information has to be forwarded to multiple channels. For example, I have a PC running some software and a small serial display that share some of the data. Because I have a CRC to validate the packets (very fast, just an XOR of the bytes), I have to wait for the whole packet before I know what it contains and what to do with it.
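
For illustration, an XOR check of this kind could look like the sketch below. The frame layout (payload followed by a single check byte) and the function names are assumptions, since the actual packet format is not given in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* XOR of all bytes in data[0..len-1]. */
uint8_t xor_checksum(const uint8_t *data, size_t len)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc ^= data[i];
    return acc;
}

/* Assumed layout: payload bytes followed by one check byte.
   The frame is valid if the XOR of the payload equals the last byte. */
bool packet_is_valid(const uint8_t *frame, size_t len)
{
    if (len < 2)
        return false;   /* need at least one payload byte plus the check byte */
    return xor_checksum(frame, len - 1) == frame[len - 1];
}
```

Because the check byte is the last one in this layout, the verdict is only available once the whole packet has arrived, which is why the packet has to be buffered completely before it is routed.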

 

Quote:
Next, the USART isn't being allowed to use any of the time while the packet is being collected to send data.  i.e. the USART transmit side has a lot of dead time where it isn't doing any data transmission.

 

This is not true - I guess. I have two different circular buffers (or pool of buffers) for both rx and tx. And any data received from one usart won't be retransmitted to the same port. Hence, even when receiving a packet on a port I can send other data on the same port. I already do this, though with fewer serials.

 

Quote:

Next question, is it just the Received data that is ISR driven, or both TxData and RxData.

 

I use both tx and rx interrupts (and dre as well). The main loop will (de)queue bytes from the circular buffers, all the sending/receiving is interrupt driven. If the buffer is large enough, you can easily append multiple packets to the same port. 

 

Quote:
Two final comments:  Is there enough processor power remaining to monitor a "watermark" for each buffer to see if they ever exceed 80% (or whatever) full?  If so, one can turn on a Red warning LED...  (A project can never have too many LED's!)  The real purpose, however, is not to have more flashing lights, but to be able to verify the buffer safety margin while the system is in real time usage, and perhaps be able to set the buffers to different sizes if it turns out that certain buffers tend to need more space than others.

 

This is a very good suggestion! Thanks.

 

Quote:

Finally, it sounds like this might be a "mission critical" application, so evaluate this last comment carefully.  The Xmega's, if one isn't using the EEPROM or the analog modules, (ADC, DAC, AC), can likely run well while being a bit over-clocked.  Atomic Zombie says they run well at 64 MHz (IIRC).  I've run a couple at 48 MHz without any problems, (room temperature).  Getting an extra 16,000,000 clock cycles / second by changing the Xmega clock PLL value might be an option if you need more processing power. 

 

I didn't think about it! I bet this can be evaluated after some tests with your "red light". I mean, if I notice I'm at the edge of the MCU's resources I can try to speed it up a bit. On this board I won't use the EEPROM or the analog modules. Of course, if I discover I need double the resources... well, it's better to move to another platform!

 


iw2nzm wrote:
In which way should I use DMA? With the current library I have a circular buffer: the received chars are queued in it, and in the main loop I dequeue them until I detect the end of a packet. Is this possible with DMA?
Yes though end-of-packet would be detected in a function or after end-of-buffer.

A circular buffer is possible with DMA by the DMAC resetting the destination address at the end of a block (end of block is end of buffer)

An alternative is double buffering (XMEGA AU manual, section 5.7)

iw2nzm wrote:
If I understand correctly you would use it in one-way only.
fyi, DMA exists for XMEGA slave SPI; master SPI has XMEGA USART in SPI mode (XMEGA AU manual, section 22.6)

iw2nzm wrote:
Perhaps using some xmega even with only 4 serial ports might simplify a bit the stuff.
If dual XMEGA, evaluate the throughput and latency of each serial line then place the bridge XMEGA on the slower lines.

XMEGA A1U has 8 USART so 2 XMEGA A1U get the goal of 14; 8KB of local SRAM in XMEGA128A1U might not be enough so could consider 2-port EBI LPC SRAM or 3-port EBI SRAM.

 


https://www.microchip.com/wwwproducts/en/ATxmega128A1U

 

"Dare to be naïve." - Buckminster Fuller


 and the deadline is too short to learn something new like FPGA or such.

I'll venture to say that is a somewhat ridiculous statement...6 months from now you'll realize that if you started using the fpga's 6 months ago (today) you'd have a working project, rather than trying to fix one more "issue" on an overloaded processor.

 

We don't have time to re-layout our PCB, since we need to be in production in 2 months.  We'll have to modify & hack the present board to get it to work & pass the QA & customer tests. Four months later, wish we would have started laying out that board!    

When in the dark remember-the future looks brighter than ever.


iw2nzm wrote:
Delay or latency is definitely not a problem.
That's odd, as there must be a latency requirement; otherwise all the serial streams could go into a terminal server for processing in a web application.

Latency can be important for processing a limit switch or when the operator's scared after the process is off the rails and presses the BFR button.

Given Paul's small PC idea, there's the SAMA5D2 SoM where more UART can be added via USB and/or SPI; advantage XMEGA as SAMA5D2 SoM consume current an order of magnitude greater.

Another order of magnitude increase in current results in an embedded PC (UART via USB, LPC, and/or PCIe)

 

GitHub - chilipeppr/serial-port-json-server: Serial Port JSON Server is a websocket server for your serial devices. It compiles to a binary for Windows, Mac, Linux, Raspberry Pi, or BeagleBone Black that lets you communicate with your serial port from a web application. This enables web apps to be written that can communicate with your local serial device such as an Arduino, CNC controller, or any device that communicates over the serial port.

https://github.com/chilipeppr/serial-port-json-server

http://www.microchip.com/design-centers/32-bit-mpus/sip-and-som/system-on-module-(som)

PC Engines Home

PC Engines

http://pcengines.ch/

 

"Dare to be naïve." - Buckminster Fuller


I'm not sure if the MCU @ 32 MHz can handle all those interrupts.

I used to work with network "Terminal Servers."   7 ports at 115200bps full duplex is indeed challenging.   That's 161000+ interrupts/s - less than 200 cycles per interrupt.  Faster CPUs (ARM) might not help much, because interrupt overhead tends to go up as well.  In the terminal server world, the issue was solved with smarter UARTs.  Deep FIFO, Hardware flow control done by hardware, DMA, PPP and SLIP engines IN THE UART, and stuff like that.  The XMEGA UART only has two bytes of FIFO on the receiver, so...
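
The arithmetic behind those figures can be written out as a small C helper. This is a sketch that assumes 10 bits per character (8N1 framing) and one interrupt per character in each direction; the function names are made up.

```c
#include <stdint.h>

/* Interrupts per second for `ports` UARTs, each running `directions`
   (1 = RX only, 2 = full duplex) at `baud`, with one interrupt per character. */
uint32_t interrupts_per_second(uint32_t ports, uint32_t baud,
                               uint32_t bits_per_char, uint32_t directions)
{
    return ports * directions * (baud / bits_per_char);
}

/* Average CPU cycles available per interrupt, including ISR entry and exit. */
uint32_t cycles_per_interrupt(uint32_t f_cpu_hz, uint32_t irq_per_s)
{
    return f_cpu_hz / irq_per_s;
}
```

With 7 ports, 115200 baud, 10 bits per character and 2 directions this gives 161,280 interrupts per second, and at 32 MHz that leaves about 198 cycles per interrupt on average.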

There are a couple of tricks you can look at:

  1. don't forget the AVR-GCC "penalty" for calling functions from an ISR.  Arrange your code so that each ISR service routine is one function.
  2. Interrupt on TX Complete instead of TX Data Register Empty.   Halves the number of TX interrupts, at a possible cost of small gaps in the TX stream.
  3. Or use DMA.  DMA for transmit is relatively easy to implement.  RX is more of a problem, because of the whole "asynchronous"  thing.  If your data doesn't have "end" conditions that the DMA controller can recognize, it may not be worthwhile.
  4. If your messages have a recognizable structure, run state machines in the ISR instead of circular buffers with message recognition in the main code.
  5. Arrange to pass messages from rx on one uart to tx on another without needing to copy all the data.
  6. I put a bipolar LED tweak in my ISR code.  Red when in the ISR, Green when you exit.  A really quick indicator of how bad things were...
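
As an illustration of trick 4, a per-byte state machine might look like the sketch below. The frame format here (start byte 0x02, one length byte, payload, XOR check byte) is purely hypothetical, as are all the names; the real protocol in this project is not described in the thread.

```c
#include <stdint.h>
#include <stdbool.h>

#define FRAME_START 0x02u   /* hypothetical start-of-frame marker */
#define PAYLOAD_MAX 200u

typedef enum { WAIT_START, WAIT_LEN, IN_PAYLOAD, WAIT_SUM } rx_state_t;

typedef struct {
    rx_state_t state;
    uint8_t len;                    /* payload length announced by the sender */
    uint8_t count;                  /* payload bytes received so far */
    uint8_t sum;                    /* running XOR of the payload */
    uint8_t payload[PAYLOAD_MAX];
} rx_parser_t;

/* Feed one received byte (e.g. from the RX ISR).
   Returns true when a complete frame with a matching check byte is ready. */
bool rx_parser_feed(rx_parser_t *p, uint8_t byte)
{
    switch (p->state) {
    case WAIT_START:
        if (byte == FRAME_START)
            p->state = WAIT_LEN;
        break;
    case WAIT_LEN:
        if (byte == 0 || byte > PAYLOAD_MAX) {
            p->state = WAIT_START;      /* implausible length: resynchronize */
            break;
        }
        p->len = byte;
        p->count = 0;
        p->sum = 0;
        p->state = IN_PAYLOAD;
        break;
    case IN_PAYLOAD:
        p->payload[p->count++] = byte;
        p->sum ^= byte;
        if (p->count == p->len)
            p->state = WAIT_SUM;
        break;
    case WAIT_SUM:
        p->state = WAIT_START;
        return byte == p->sum;
    }
    return false;
}
```

With one `rx_parser_t` per UART, the main loop only has to react when the function returns true instead of scanning a circular buffer for packet boundaries.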

 


westfw wrote:
In the terminal server world, the issue was solved with smarter UARTs.  Deep FIFO, Hardware flow control done by hardware, DMA, PPP and SLIP engines IN THE UART, and stuff like that.
There's a post here stating Exar USB UART are the fastest they've evaluated; Exar USB UART have a deep FIFO, hardware flow control, and likely DMA via the driver (Linux, macOS, Windows, Android)

Exar's Windows USB UART driver is hot-plug capable (automatic attach and re-attach) for single channel USB UART.

westfw wrote:
don't forget the AVR-GCC "penalty" for calling functions from an ISR.
fyi, FSF AVR GCC 8 is reducing some of the ISR overhead.

 


Exar | Interface | USB UARTs

https://www.exar.com/products/interface/uarts/usb-uarts

Mouser Electronics

Exar Videos

https://www.mouser.com/exar/videos/

Exar XRUSB1 Software Driver

Free Software Foundation (FSF)

GNU Project

GCC 8 Release Series — Changes, New Features, and Fixes

https://gcc.gnu.org/gcc-8/changes.html

...

The compiler now generates efficient interrupt service routine (ISR) prologues and epilogues. This is achieved by using the new AVR pseudo instruction __gcc_isr which is supported and resolved by the GNU assembler.

...

 

"Dare to be naïve." - Buckminster Fuller


FSF AVR GCC 8 is reducing some of the ISR overhead.

I don't think I understand how it could get rid of pushes of the Call-used registers (r18-r27, r30-r31); it's sorta required by the ABI definition.

 


I hope that you can recall about it.

https://gcc.gnu.org/bugzilla/show_activity.cgi?id=20296

...

 

westfw

2012-07-29

 

...

via

20296 – Speeding up small interrupts on avr

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=20296

(History)

 

"Dare to be naïve." - Buckminster Fuller


There are plenty of dual/quad and even octal UARTs for an 8-bit parallel bus. The Xmega A1 has an 8-bit parallel bus, so it should be easy.

https://www.nxp.com/products/ana...

 

Maxim Quad UART, SPI/I2c: MAX14830ETM+

Mouser in stock.

 

EXAR Dual Uart, SPI/I2C: XR20M1172IG28-F

Mouser in stock

 

I would recommend the SC16IS750/52/60/62 I2C/SPI uarts family from NXP. They also have 8 GPIOs and are quite cheap in fact.

https://www.nxp.com/docs/en/broc...

 

Of course you need two dual uarts for your project, but only one XMEGA. (8+2+2 = 12 uarts)

Trust me, dual cpu firmware developing is a nightmare :(

 

I have a project which needs 9 uarts. 8 uarts are plain RS232 full-duplex ports, baud range 9600-38400. One uart is half-duplex RS485 at 38400.

The thing is I also needed a SPI for interfacing a SPI ethernet controller.

At first I made the board with an XMEGA. The RS485 port was the software one; at least it was half-duplex. The problem with the XMEGA was that using all 8 UARTs "consumed" the SPI ports as well, so I was forced to bit-bang the SPI, which took quite a good deal of the cpu processing bandwidth (this was for an SPI ethernet controller doing tcp/ip !).

 

The final design uses an STM32F091 chip. 8 UARTs, RS485 still in software, but hardware SPI and more bandwidth overall. I did a "torture" test to see what bandwidth the board supports on all serials concurrently. I remember the STM32 was able to deliver about 30-40% more data over the ports.

 

 

 


Sticker shock on some of those octal uarts!  ($30-40!)

 

 

using all 8 UARTs "consumed" the SPI ports also

definitely a thing to watch out for.

 

Don't forget that even with a deep-FIFO external UART, you still have to talk to it to extract data, mux and demux the bytes, and so on.  An XMega that has separate ISRs for each direction on each UART might simplify things a great deal.

 

 

Trust me, dual cpu firmware developing is a nightmare

 Perhaps you can make one of them into a relatively "dumb" multiple-UART peripheral chip.

 


Octal UARTs are ancient. I wouldn't use them.

And LOL, I just forgot the unavailability of the SPI/I2C ports when using all UARTs, in the same post :)

So maybe a better system would be an XMEGA A1 with an SC16C554/654/754 quad uart over external 8bit parallel bus...


Fascinating project.

 

One really useful trick I've used often is to use a timer to detect the end of a packet. Say the end of the packet is detectable by a gap. No activity on the serial bus for some period of time. You can detect that by setting up a timer to hit a compare interrupt if not reset, and have it reset by an event triggered by the RX pin.

 

That way you can DMA the data into a buffer, and you get one interrupt at the end of the packet to indicate you need to process it. Massively reduces overhead.
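
The gap-detection logic can be modelled in plain C as below. This is a software model only: on the XMEGA the restart would be done by an event triggered from the RX pin and the timeout by a timer compare interrupt, as described above, and the names and tick-based timing here are assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint16_t gap_ticks;    /* idle time, in timer ticks, that marks end of packet */
    uint16_t idle_ticks;   /* ticks elapsed since the last received byte */
    bool     receiving;    /* true once at least one byte of a packet has arrived */
} gap_detector_t;

/* Call when a byte arrives (in hardware: the RX event restarts the timer). */
static inline void gap_on_byte(gap_detector_t *g)
{
    g->idle_ticks = 0;
    g->receiving = true;
}

/* Call once per timer tick. Returns true exactly once per packet,
   when the line has been idle for gap_ticks after the last byte. */
static inline bool gap_on_tick(gap_detector_t *g)
{
    if (!g->receiving)
        return false;
    if (++g->idle_ticks < g->gap_ticks)
        return false;
    g->receiving = false;   /* packet finished: wait for the next first byte */
    return true;
}
```

This only works when packets are separated by a quiet period longer than `gap_ticks`; back-to-back packets would be reported as one.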

 

 

I think this should be fine at 32MHz. 115200 baud is 8.6uS/character, so 8 ports gives you say 1uS/character if they are all going at the same time. 1uS at 32MHz is 32 cycles, which is more than enough for an efficient interrupt handler. It will probably be fine in C, but a little bit of assembler would be super efficient and you could use some tricks to save cycles such as having 256 byte buffers.

 


32 cycles, which is more than enough for an efficient interrupt handler.

I don't think so.  My back-of envelope calcs say ~10 cycles for ISR entry and reti (not including any context save.)  Minimum 5 cycles to read/inc/write the buffer pointer, 4 cycles to read/write the uart data.  That'll use at least 3 registers and the PSR, so add 12 cycles for push/pop and 2 more for reading/writing PSR.  That's 10+5+4+12+2 = 33 cycles, without checking for buffer wraps, or UART status, or keeping a byte count, or other things that are typical for a UART ISR.

 

Fortunately, you miscalculated - 115200bps is 8.6us per BIT, so characters arrive at only 1/10th of that rate (about 87us each).  OTOH, you also forgot "full duplex."


mojo-chan wrote:

One really useful trick I've used often is to use a timer to detect the end of a packet. Say the end of the packet is detectable by a gap. No activity on the serial bus for some period of time.

 

Maybe useful in some situations, not here unfortunately. I often receive a couple of packets without interruptions. Of course most of the UARTs may work in this way, but I really don't want to handle them differently.


westfw wrote:

32 cycles, which is more than enough for an efficient interrupt handler.

I don't think so.  My back-of envelope calcs say ~10 cycles for ISR entry and reti (not including any context save.)  Minimum 5 cycles to read/inc/write the buffer pointer, 4 cycles to read/write the uart data.  That'll use at least 3 registers and the PSR, so add 12 cycles for push/pop and 2 more for reading/writing PSR.  That's 10+5+4+12+2 = 33 cycles, without checking for buffer wraps, or UART status, or keeping a byte count, or other things that are typical for a UART ISR.

 

Fortunately, you miscalculated - 115200bps is 8.6us per BIT, so characters arrive at only 1/10th of that rate (about 87us each).  OTOH, you also forgot "full duplex."

 

Ah yes, I clearly had not drunk enough coffee before writing that this morning.

 

But 32 cycles is definitely enough.  Consider this:

 

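	; None of these instructions changes SREG, so it is not saved.
	push	r18			; scratch register for the received byte
	push	r30			; Z low
	push	r31			; Z high

	lds		r18, 0x08A0	; USARTC0_DATA: read the received byte
	in		r30, 0x00	; GPIO0: buffer pointer, low byte
	in		r31, 0x01	; GPIO1: buffer pointer, high byte
	st		Z+, r18		; store the byte and advance Z
	out		0x00, r30	; GPIO0: write back the low byte only

	pop		r31
	pop		r30
	pop		r18
	reti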

21 cycles, plus worst case 8 cycles to enter the interrupt (3 to finish executing current instruction, 2 to save PC, 3 for JMP) which gives us 29 cycles.

 

I only store the lower byte of the buffer pointer so that I have a circular 256 byte buffer. New bytes are detected by the main loop simply caching the old pointer value and comparing. Could potentially have issues with buffer overflow.
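
The same trick can be expressed in C with 8-bit indices, which wrap from 255 to 0 on their own. This is a sketch with invented names, not a translation of the assembler above: it keeps an index instead of a pointer in the GPIO registers, and like the original it does not detect overflow.

```c
#include <stdint.h>

typedef struct {
    volatile uint8_t data[256];
    volatile uint8_t head;   /* advanced by the ISR */
    uint8_t tail;            /* cached by the main loop */
} ring256_t;

/* ISR side: store one byte; the 8-bit index wraps without any compare. */
static inline void ring256_put(ring256_t *rb, uint8_t byte)
{
    uint8_t head = rb->head;
    rb->data[head] = byte;
    rb->head = (uint8_t)(head + 1u);
}

/* Main-loop side: number of unread bytes (modulo 256). */
static inline uint8_t ring256_pending(const ring256_t *rb)
{
    return (uint8_t)(rb->head - rb->tail);
}

/* Main-loop side: read the next byte; call only if ring256_pending() != 0. */
static inline uint8_t ring256_get(ring256_t *rb)
{
    return rb->data[rb->tail++];
}
```

If exactly 256 bytes arrive before the main loop reads any, `ring256_pending` reports 0 and the whole buffer is silently lost, which is the overflow issue mentioned above.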

 

But this is quite interesting, because it puts the theoretical limit at around 8 x 1Mbaud USARTs at 32MHz. Thus this application should be pretty straight forward, performance wise.

 

Edit: I wonder if you need to use r18, or if you can ld into r31...

Last Edited: Mon. Apr 16, 2018 - 10:20 AM

Thanks for the external UART idea.

rammon wrote:
Trust me, dual cpu firmware developing is a nightmare :(
Doesn't have to be especially when there's a multi-processor bus connecting the CPUs (common in microcomputers and minicomputers)

OpenCores

Wishbone :: OpenCores

https://opencores.org/howto/wishbone

 

"Dare to be naïve." - Buckminster Fuller


gchapman wrote:

Thanks for the external UART idea.

rammon wrote:
Trust me, dual cpu firmware developing is a nightmare :(
Doesn't have to be especially when there's a multi-processor bus connecting the CPUs (common in microcomputers and minicomputers)

OpenCores

Wishbone :: OpenCores

https://opencores.org/howto/wishbone

 

I may have put it a bit ambiguously: I meant two distinct microcontrollers (same type or not, it doesn't matter), each with its own firmware project, compilation, binary download (two dongles?), testing, etc.

Not to mention the debugging when something doesn't work... Is it micro #1? Or micro #2?

The solution is of course a very clear separation of the firmware, ideally developing one firmware first, completely, and testing it thoroughly. Even then, thoroughly tested, you are not 100% sure that version is "stable". There is a good chance it will change while firmware #2 is being developed...


rammon wrote:
So maybe a better system would be an XMEGA A1 with an SC16C554/654/754 quad uart over external 8bit parallel bus...
plus some RAM on EBI as I doubt 8KB of internal RAM in XMEGA128A1U will be enough.

 

A competitor to the Microchip XMEGA is the Texas Instruments MSP430; some parts have 4 UART, 8 SPI, 32KB RAM, and 3 DMA channels.

I mention the MSP430 because of FSF GCC support, a 20-bit pointer type, and the MSP Debug Stack (MSPDS).

https://gcc.gnu.org/gcc-6/changes.html

http://gnutoolchains.com/msp430/ (Windows)

https://visualgdb.com/tutorials/msp430/

 

Edits: MSP430, strikethru

 


Last Edited: Tue. Apr 17, 2018 - 06:20 PM

rammon wrote:
The final design is ..
Conversely, some PIC32MZ EF have 6 UART, 6 SPI, and 15 DMA channels.

http://www.microchip.com/maps/Microcontroller.aspx

CPU = MIPS MCU
DMA = 12 ch

UART = 6 ch

 

arm Cortex-M7 SAM have 8 UART, 5 SPI, and 24 DMA channels.

 

Edit: SAM

 


Last Edited: Tue. Apr 17, 2018 - 04:37 AM

gchapman wrote:

rammon wrote:
The final design is ..
Conversely, some PIC32MZ EF have 6 UART, 6 SPI, and 15 DMA channels.

http://www.microchip.com/maps/Microcontroller.aspx

CPU = MIPS MCU
DMA = 12 ch

UART = 6 ch

 

arm Cortex-M7 SAM have 8 UART, 5 SPI, and 24 DMA channels.

 

Edit: SAM

 

When I first designed the board there were no chips with 8 UARTs except the XMEGA, which was pretty new too.

The STM32F091 appeared later, but even now its price point is unbeatable. Also, you get those 8 UARTs in a 64-pin package, not 100-pin :)

The STM32 family even has some variants with 10 UARTs lately, but they are big and expensive chips.

 


The total maximum combined baud rate is 14 x 115,200 = 1.6Mbaud, so a 2Mbaud SPI link would be able to move all the data received between any number of chips. So you could, in theory, use a total of 14 AVRs linked together in a circular SPI ring to make a virtual 14 USART chip.

#1 This forum helps those that help themselves

#2 All grounds are not created equal

#3 How have you proved that your chip is running at xxMHz?

#4 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand." - Heater's ex-boss


An XMOS XS1 would easily handle all the UARTs in one chip.


I'd suggest using SPI for inter-MCU communication too. It's fast, robust, easy to use with DMA... EBI will work but is more error prone, and harder to debug.


xmos-xs1

Interesting device!

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"Read a lot.  Write a lot."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


fyi, Cypress Semiconductor is apparently leaving the dual-port SRAM competition.

Of three Cypress 3-volt dual-port SRAM, all three are NRND.

Mouser shows a lot of Cypress dual-port SRAM is EOL.

Not surprising, as one can implement dual-port SRAM with a CPLD and a jellybean SRAM.

One can also implement a large crossbar switch for multi-processor systems with an FPGA.

 

http://www.cypress.com/products/mobl-dual-ports

https://www.mouser.com/Cypress-Semiconductor/Semiconductors/Memory-ICs/SRAM/_/N-4bzpt?P=1z0jmr8&Keyword=dual

Cypress Semiconductor

AN70118 - Understanding Asynchronous Dual-Port RAMs

http://www.cypress.com/documentation/application-notes/an70118-understanding-asynchronous-dual-port-rams

(PDF, page 7, Semaphore)

 
