Best practice: buffer implementation

Go To Last Post
21 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hello,

 

what kind of buffer implementation do you recommend for embedded systems, if you face user inputs or received data, of which you do not know the exact length? The class template std::vector in C++ is not available, since AVR does not support the Standard Template libraries. Declaring an oversized array isn't efficient regarding the use of memory. Or do you think, it is a bad software design in general, if data length is variable?

 

The root problem is a SPI communication, I want to design as generic as possible.

 

Kind regards

 

Jimmy

This topic has a solution.
Last Edited: Fri. Dec 18, 2020 - 11:10 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

 I want to design as generic as possible.

You have entered the world of big mistakes.  Take your foot out before it is too late.  you will achieve  only chaos & inefficiency. 

 

If you mean only variable length, then that is fine.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Fri. Dec 18, 2020 - 07:06 PM
This reply has been marked as the solution. 
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Google "Andy Brown STL" .

But really, don't design embedded systems with indeterminate data sizes! 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Set a requirement for the other end to transfer a maximum of N bytes in one transfer. If more comes, throw away the excess and scold the sender. 

 

You simply do not have the resources for indeterminate-size buffer. 

 

Jim

 

Until Black Lives Matter, we do not have "All Lives Matter"!

 

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You could possibly keep an eye on the free heap and increase buffer sizes at runtime, but you're gonna hit a limit sooner or later. Either define a fixed buffer and drop new data once you've filled it, or a ring buffer and overwrite the oldest items, whichever is less important. 

 

Or choose a chip with lots of memory, or a faster chip so that you can process the data faster than it arrives. But there is no chip with infinite memory or infinite speed.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Ok, clear answers. I avoid variable-sized arrays, spend less time for generic applications and implement some kind of feedback, if data transfer has not yet been completed. Thank you!

Last Edited: Fri. Dec 18, 2020 - 11:11 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I looked at a few open-source ring buffer implementations for something that is fast and efficient on 8-bit systems like AVR, and everything I could find was either slow, bloated, or both.  So I wrote my own.  If there's any way of improving either code size or speed, I welcome any suggestions.

 

https://github.com/nerdralph/Rin...

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Is http://www.fourwalledcubicle.com... slow or bloated? I've seen many people say it's a good implementation.

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Dean's is implementation very tight, at least at the C level. I've used it and I would consider it neither slow nor bloated.

 

Jim

 

Until Black Lives Matter, we do not have "All Lives Matter"!

 

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

>slow or bloated? I've seen many people say it's a good implementation.

 

Nothing wrong with that- 'slow/bloated' has a different meaning to ralph. What I would change in that version (and in ralph's) is to make the add/remove functions return true or false so the user side doesn't have to create the buffer checking code for full/empty before using. That means the remove version will take in a pointer to use for the data return, and it also means you still check for add/remove success but now you just check the return for true/false, and you will not be allowed to add/remove buffer data that overflows/underflows.

 

 

In C++, you can make a generic buffer in a specific size (generic until you create one)-

https://godbolt.org/z/c8zoY4

probably not the most efficient since these are static (each buffer is its own self-contained world) and allow size_t sizes (mostly unnecessary), but its easy to create and use and you are dealing with fixed sizes.

 

I would be worried less about size, and more about whether its correct, whether it is easy to use, whether atomic issues have been all thought out, etc. Sacrificing size for any of the desired properties is a good trade and no one will notice unless you are into byte counting.

 

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

ralph,

 

Since you solicited feedback, I'll nitpick.

 

First, on terminology, I think you are using head & tail backwards. I normally think of the check-out line at a store. Your usage of put in the head and get from the tail evokes scatological jokes.

 

Second, I haven't seen an implementation before that allows the indexes to range beyond the size of the buffer. I get that it allows an inclusive size of 0 to RINGBUFSIZE, but also results in a penalty of a mask operation when the buffer array is accessed. I've always been fine with my buffers being 'full' at RINGBUFSIZE-1.

 

Third, I see the appeal of (head==tail) as a test for fullness since it's quick, and RingPut is a likely candidate for an ISR. But I'd argue in favor of skipping the test. A well designed system should be able to consume data faster than it fills the buffer. I usually want the freshest data, so I allow my writes to wrap and automagically flush the buffer.

 

Finally, If you are going to discard data in RingPut when full, by all means don't make it void! Return an error code!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Jimmy11040 wrote:
I want to design as generic as possible

balisong42 wrote:
I usually want the freshest data

"What to do when the buffer is full" is a classic examples of a fundamental question which makes it hard to have a "generic" solution!

 

 

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

balisong42 wrote:
I think you are using head & tail backwards.

Indeed. Perhaps Ralph doesn't like waiting in line at the Bank or Post Office and is one of those characters that always tries to "push in" by joining a queue at it's head instead of at the tail.

 

 

BTW: No offence meant Ralph. I'm sure you do follow correct queuing procedure at the Post Office.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

ralphd wrote:

I looked at a few open-source ring buffer implementations for something that is fast and efficient on 8-bit systems like AVR, and everything I could find was either slow, bloated, or both.  So I wrote my own.  If there's any way of improving either code size or speed, I welcome any suggestions.

Since you are using the approach of let the indices run over a higher range that is some multiple of the BUfSIZE (2 * BUFSIZE),  and mask off when acessing the buf, it might be worth letting the indices wrap at the natural uint8_t size, which is also a multiple of the BUFSIZE assuming power of 2 BUFSIZE (ie. uint8_t is modulo 256)

Then the index increment just becomes head++ or tail++.

Of course you still need to mask when accessing the buf.

The RingCount I think then becomes just uint8_t count = head - tail;

You would have to have slightly different check for buffer full on the RingPut

eg. if (RingCount() == BUFSIZE)

Also head and tail could both start out at 0. (EDIT not could, but would)

 

Whether this would reduce overall code, I haven't tried.

Last Edited: Sat. Dec 19, 2020 - 10:56 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

christop wrote:

Is http://www.fourwalledcubicle.com... slow or bloated? I've seen many people say it's a good implementation.

 

 

It's slow when compared to mine.  It's not big enough that I'd call it bloated, though it didn't appear that a lot of effort was put into size optimization.

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Next up, let's compare sizes and speeds to a native assembler ring buffer... devil  S.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

balisong42 wrote:

First, on terminology, I think you are using head & tail backwards. I normally think of the check-out line at a store. Your usage of put in the head and get from the tail evokes scatological jokes.

We think differently.  I think of animals like a snake or worm.  The first thing in the head is the first thing out the tail.

 

balisong42 wrote:

Second, I haven't seen an implementation before that allows the indexes to range beyond the size of the buffer. I get that it allows an inclusive size of 0 to RINGBUFSIZE, but also results in a penalty of a mask operation when the buffer array is accessed. I've always been fine with my buffers being 'full' at RINGBUFSIZE-1.

((head - tail) % BUFSIZE) == 1 as a full condition takes at least one more operation than the method I used, so it was a trade-off.  But combined with your third idea, it may be better.

 

balisong42 wrote:

Third, I see the appeal of (head==tail) as a test for fullness since it's quick, and RingPut is a likely candidate for an ISR. But I'd argue in favor of skipping the test. A well designed system should be able to consume data faster than it fills the buffer. I usually want the freshest data, so I allow my writes to wrap and automagically flush the buffer.

 

This idea I like.

 

One of the reasons I wrote RingBuffer was to improve the CH55x USB CDC example from WCH.  There's a bug in their ringbuffer code that causes intermittent dropped characters during bursts at 1 & 2mbps.  I'm pretty sure the same bug is in the CH340 and CH330 USB-TTL UART chips.  In addition to fixing the bug, my other goal is improving the max throughput, as it can't sustain continuous 2mbps throughput.  I've also had requests for a receive FIFO for use with my AVR bit-bang UART libraries, and want to support the highest speeds possible.

 

By dropping the check for full in RingPut (and therefore writing over old data when full), the code will be faster and therefore be able to sustain a slightly higher throughput.

 

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

MrKendo wrote:

ralphd wrote:

I looked at a few open-source ring buffer implementations for something that is fast and efficient on 8-bit systems like AVR, and everything I could find was either slow, bloated, or both.  So I wrote my own.  If there's any way of improving either code size or speed, I welcome any suggestions.

Since you are using the approach of let the indices run over a higher range that is some multiple of the BUfSIZE (2 * BUFSIZE),  and mask off when acessing the buf, it might be worth letting the indices wrap at the natural uint8_t size, which is also a multiple of the BUFSIZE assuming power of 2 BUFSIZE (ie. uint8_t is modulo 256)

Then the index increment just becomes head++ or tail++.

Of course you still need to mask when accessing the buf.

The RingCount I think then becomes just uint8_t count = head - tail;

You would have to have slightly different check for buffer full on the RingPut

eg. if (RingCount() == BUFSIZE)

Also head and tail could both start out at 0. (EDIT not could, but would)

 

Whether this would reduce overall code, I haven't tried.

 

That's an intriguing idea.  Thanks.

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Scroungre wrote:
Next up, let's compare sizes and speeds to a native assembler ring buffer...

I always thought that would be fairly straightforward:

 

-- locate on a mod 256 boundary

-- size is a power of 2 up to 256.  [that gives a wide variety of smallish useful sizes; not a restriction IMO]

 

With the above, the "page address" in the high H index register stays the same.  Increment of the low byte needs no check, rather a mask operation.  There are the usual choices of overflow handling and content count.

 

If one is really cycle-counting for e.g. a high-speed burst, then assume the burst ends properly, wrap the buffer, and keep the needed items in dedicated registers for the duration of the burst.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

In embedded systems, there are two approaches and basic problems to the buffer issue.   The first is how to arrange it in the hardware, and the second is how to code it in a high level language.

 

To do a buffer, you need memory.  If the anticipated memory is greater that what is available with the current hardware, then you need to add more hardware. 

 

There's two kinds of memory hardware: ICs that store data when the program is running, and ICs that keep data when the power is removed from the embedded system.   The program-running memory chip is often the 23K256 8-pin serial SRAM 32768 bytes with SPI interface.  It costs about a dollar and in SOIC (0.05" pins) size. 

 

What is the host CPU of your embedded system?  Is it an 30-cent ATtiny13 or ATtiny10 with 1K flash and only a handful of memory locations?   Is it a mid-range AVR like the ATmega328P?  Is it a 32-bit Cortex M3 like the STM32 Blue Pill module with 20K RAM?

 

The next issue is where is this memory going to be located in the CPU's address range?  And how do you put it there in C and C++ high level languages?   That's my weakest point, as I learned C++/C through Arduino and they simply use standardize SPI libaries to do the lowest-level interfacing.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Re #14

This is the kind of thing I had in mind, not well tested but I think this woks.

Compared to the traditional 'indices are stored modulo buffer size so have to leave one place in the buffer empty otherwise you wouldn't be able to distinguish buffer empty and buffer full', it means you can use the entire buffer, but other than that it's debatable whether it offers any benefit.

/* BUFSIZE must be power of 2 */
#define BUFSIZE 16
static uint8_t head;
static uint8_t tail;
static uint8_t buf[BUFSIZE];

static uint8_t buf_count(void)
{
    /* Assumes only put when it isn't full and only get when it
     * isn't empty, so head can only ever be at most BUFSIZE
     * 'in advance' of tail. */
    return head - tail;
}

static bool buf_put(uint8_t val)
{
    /* check if full */
    if (buf_count() == BUFSIZE)
    {
        return false;
    }
    buf[head & (BUFSIZE - 1)] = val;
    head++;
    return true;
}

static bool buf_get(uint8_t *val)
{
    /* check if empty */
    if (buf_count() == 0)
    {
        *val = 0; // optional, could ignore
        return false;
    }
    *val = buf[tail & (BUFSIZE - 1)];
    tail++;
    return true;
}

int main(void)
{
    code to test it ...