## faster & better crc/hash than crc-8?


I'm wondering if anyone has written a fast 8-bit hash function that is as good or better than CRC-8 for short (6-8 byte) messages.

Here's the CRC-8 code I found:

```
  byte tempI;
  for (tempI = 8; tempI; tempI--) {
      byte sum = (hash ^ data) & 0x01;  /* feedback bit */
      hash >>= 1;
      if (sum) {
          hash ^= 0x8C;  /* reflected polynomial (Dallas/Maxim 1-Wire CRC-8) */
      }
      data >>= 1;
  }
```

I wrote a small test program to create 256 different 6-byte messages with random data using rand().

An eor of each byte had 96 collisions (and at 1 cycle per byte, eor is the fastest possible hash on an AVR).

The best I could come up with, after trying about a dozen different shift, add, subtract, and eor combinations, was the following, which can be done in 3 cycles per byte on an AVR and had 85 collisions:

```
  char msb = data & 0x80;  /* top bit of the data byte */
  hash <<= 1;              /* shift left */
  hash -= data;
  if (msb) hash++;         /* fold the data's top bit back in */
```
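A collision test along these lines can be sketched in plain C (hash8() here wraps the byte update above; the fixed seed and message parameters are illustrative, not necessarily what my test program used):

```c
#include <stdint.h>
#include <stdlib.h>

/* Candidate hash: the shift-left/subtract variant above, applied
   over a whole message. */
static uint8_t hash8(const uint8_t *msg, size_t len)
{
    uint8_t hash = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t msb = msg[i] & 0x80;   /* top bit of the data byte */
        hash = (uint8_t)(hash << 1);
        hash -= msg[i];
        if (msb) hash++;
    }
    return hash;
}

/* Hash nmsgs random messages of len bytes each and count how many
   land in an already-occupied bucket. */
static unsigned count_collisions(unsigned nmsgs, size_t len)
{
    unsigned counts[256] = { 0 };
    unsigned collisions = 0;
    for (unsigned m = 0; m < nmsgs; m++) {
        uint8_t msg[16];
        for (size_t i = 0; i < len && i < sizeof msg; i++)
            msg[i] = (uint8_t)rand();
        if (counts[hash8(msg, len)]++)
            collisions++;
    }
    return collisions;
}
```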

Any suggestions?

I have no special talents.  I am only passionately curious. - Albert Einstein

If you're using the hash as a means of error detection, the main interest is the likelihood of an error generating a valid hash. There are statistical means of determining this which are beyond my scope of knowledge.

In terms of optimization of code size and execution time, looking at just the code itself is a narrow view. You still have to read the data and manage a loop. For example: if you had to bit-bang the data via SPI, you could incorporate the hashing function within that loop, thus amortizing the load and loop overhead. Also, with optimization - don't be penny wise but pound foolish.

CRCs are designed to detect burst errors - multi bit errors where the errors are in consecutive bits. If that's not the type of error that you expect, then you can do just as well as a CRC with a simple 8-bit or 16-bit one's complement checksum.
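For reference, a minimal 8-bit one's-complement checksum looks something like this (conventions vary on whether and where the final complement is taken):

```c
#include <stdint.h>
#include <stddef.h>

/* 8-bit one's-complement checksum: add with end-around carry, then
   invert.  Summing the data plus the check byte the same way yields
   0xFF before the final invert, so verification is one compare. */
static uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    while (len--) {
        sum += *data++;
        sum = (sum & 0xFF) + (sum >> 8);   /* fold the carry back in */
    }
    return (uint8_t)~sum;
}
```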

- S

Not a hash, but an 8-bit CRC can be run-time-quick if you use a precomputed 256-byte table, iterating through the data with

`crc = crc_8541_table[ crc ^ data ];`
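As a sketch of the idea (the table contents depend on the polynomial; this builds one for the reflected 0x8C polynomial from the code earlier in the thread, which may not match whatever crc_8541_table was generated from):

```c
#include <stdint.h>
#include <stddef.h>

static uint8_t crc8_table[256];

/* Build the 256-entry table for the reflected 0x8C polynomial (the
   Dallas/Maxim 1-Wire CRC-8 used in the bit-at-a-time code above). */
static void crc8_init_table(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t crc = (uint8_t)i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (uint8_t)((crc >> 1) ^ 0x8C)
                            : (uint8_t)(crc >> 1);
        crc8_table[i] = crc;
    }
}

static uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    while (len--)
        crc = crc8_table[crc ^ *data++];   /* one lookup per input byte */
    return crc;
}
```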

I know it exists, but not where:
I have seen a C program on one of the big library pages that can make a CRC table of any size, with any polynomial.
I have worked at a place (I don't have the code) that used a 4-bit lookup; I think that is the size you want (256 entries is overkill).

For CRC I'm not sure I'd bother reinventing the wheel. If using avr-gcc I'd simply use one of these:

http://www.nongnu.org/avr-libc/u...

The manual shows the algorithms in C but the actual implementations used are in hand crafted Asm.

And for this:

Quote:
fast 8-bit hash function that is as good or better than CRC-8 for short (6-8 byte) messages.

Whole books have been written about that. In short it boils down to:
The probability of an error.
The probability of an error if the last bit/byte/sector had an error (burst error).
How safe the whole thing needs to be.
Whether an error should just be reported (please send again), or whether it needs to be recoverable.

If you take an (approximately) error-free CD and drill a 3 mm hole in it (without cracking anything), it will still play with 100% recovery.

sparrow2 wrote:
I know it exists, but not where:
I have seen a C program on one of the big library pages that can make a CRC table of any size, with any polynomial.
I have worked at a place (I don't have the code) that used a 4-bit lookup; I think that is the size you want (256 entries is overkill).

I'm trying to keep the code size down, so if I used a lookup table, 4-bit would be the way I'd go.
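A 4-bit (nibble) lookup version of the same reflected-0x8C CRC can be sketched like this; the 16-byte table replaces the 256-byte one at the cost of two lookups per byte:

```c
#include <stdint.h>
#include <stddef.h>

static uint8_t crc8_nib[16];

/* 16-entry table: each entry is four bit-rounds of the reflected
   0x8C polynomial applied to one nibble value. */
static void crc8_nib_init(void)
{
    for (uint8_t n = 0; n < 16; n++) {
        uint8_t t = n;
        for (int b = 0; b < 4; b++)
            t = (t & 1) ? (uint8_t)((t >> 1) ^ 0x8C) : (uint8_t)(t >> 1);
        crc8_nib[n] = t;
    }
}

static uint8_t crc8_4bit(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    while (len--) {
        crc ^= *data++;
        crc = (uint8_t)((crc >> 4) ^ crc8_nib[crc & 0x0F]);  /* low nibble */
        crc = (uint8_t)((crc >> 4) ^ crc8_nib[crc & 0x0F]);  /* high nibble */
    }
    return crc;
}
```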

I have no special talents.  I am only passionately curious. - Albert Einstein

ralphd wrote:
I'm wondering if anyone has written a fast 8-bit hash function that is as good or better than CRC-8 for short (6-8 byte) messages.

What I'm looking at using it for is a smaller and improved version of the micronucleus USB bootloader. The current version doesn't check the CRC on the data packets. In addition to re-writing the guts of v-usb to reduce the code size, I wanted to add some basic error checking without doing a full CRC-16 check; i.e. the v-usb code would continue to ignore the USB CRC, but I'd include a check byte within the 8 bytes of data in each frame.

The timing is very tight - no more than 5 us after a packet is received to send an ACK if it was OK or a NAK if it was not. That's nowhere near enough time to do a CRC-8 on 8 bytes of data, let alone CRC-16.

I have no special talents.  I am only passionately curious. - Albert Einstein

Quote:
An eor of each byte had 96 collisions (and at 1 cycle per byte, eor is the fastest possible hash on an AVR).

A simple checksum, or a "one's complement" checksum should be about as fast as the xor.

I don't think most CRCs are designed for packets as short as 8 bytes. You might want to do some research on short-packet error checks (from ATM, perhaps?)

Quote:
The timing is very tight - no more than 5 us after a packet is received to send an ACK

Is there any time between bytes (or bits) during reception? Does the VUSB code have any side-effects that can be used to compute a check-byte? (presumably it's shifting things around to make bytes out of bits; that's CRC-like right there. Or: count the ones or zeros...)

westfw wrote:
Quote:
An eor of each byte had 96 collisions (and at 1 cycle per byte, eor is the fastest possible hash on an AVR).

A simple checksum, or a "one's complement" checksum should be about as fast as the xor.

They are just as fast (single add or sub instruction), but are no better.
westfw wrote:

I don't think most CRCs are designed for packets as short as 8 bytes. You might want to do some research on short-packet error checks (from ATM, perhaps?)

Looked it up, and the ATM header uses CRC-8.
westfw wrote:

Quote:
The timing is very tight - no more than 5 us after a packet is received to send an ACK

Is there any time between bytes (or bits) during reception? Does the VUSB code have any side-effects that can be used to compute a check-byte? (presumably it's shifting things around to make bytes out of bits; that's CRC-like right there. Or: count the ones or zeros...)

Running on an 8 MHz AVR, you have 5 1/3 cycles/bit. You need at least 3 cycles to read and store each bit (an in and 2 shifts). There are 2 more cycles to store each byte (st or push), plus 2 for the loop branch. I've used a few more instructions for timing (clock skew) adjustment. So in the current prototype code, I have no "free" instructions.

Your suggestion is a good one, because in a previous version of the code I had a couple nops which could have been used for checksum. If a later iteration of the code has an unused cycle, then I'll keep that in mind.

I have no special talents.  I am only passionately curious. - Albert Einstein

Ah. Does it have to run at 8MHz? I thought the VUSB firmware was running 12 to 20MHz (or are you trying to add the ability to use the 8MHz internal oscillator on some chips?)

Quote:
The timing is very tight - no more than 5 us after a packet is received to send an ACK if it was OK or a NAK if it was not. That's nowhere near enough time to do a CRC-8 on 8 bytes of data, let alone CRC-16.

But for this you should run the CRC in the RX routine for each byte, not on the finished buffer.

And if the CRC is the last byte, then you can set up the routine so that checking the last byte is a simple compare (CRC == last byte).

And if your RX is a SW version why not build it into the RX bit routine! (you sit in a dummy loop anyway)
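The per-byte approach, including the cheap final compare, can be sketched like this (with this reflected-0x8C CRC and a zero init, running the received check byte through the same update drives the register to 0, which is one common way to set up a single-compare check):

```c
#include <stdint.h>
#include <stddef.h>

/* One-byte update of the reflected-0x8C CRC from earlier in the
   thread, with a zero initial value. */
static uint8_t crc8_update(uint8_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int b = 0; b < 8; b++)
        crc = (crc & 1) ? (uint8_t)((crc >> 1) ^ 0x8C) : (uint8_t)(crc >> 1);
    return crc;
}

/* Receive side: the update would live inside the per-byte RX loop.
   Feeding the trailing check byte through the same update leaves the
   register at 0 on an intact frame, so the final test is a single
   compare against zero. */
static int frame_ok(const uint8_t *frame, size_t len_with_crc)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len_with_crc; i++)
        crc = crc8_update(crc, frame[i]);
    return crc == 0;
}
```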

sparrow2 wrote:
And if your RX is a SW version why not build it into the RX bit routine! (you sit in a dummy loop anyway)

That's what Bill was suggesting, and as I said:

Quote:

... the current prototype code, I have no "free" instructions.

I have no special talents.  I am only passionately curious. - Albert Einstein

westfw wrote:
Ah. Does it have to run at 8MHz? I thought the VUSB firmware was running 12 to 20MHz (or are you trying to add the ability to use the 8MHz internal oscillator on some chips?)

One of the chips I'm trying to support is the ATtiny88. It doesn't support an external crystal, so the internal RC oscillator is how I run it. Its footprint is compatible with the ATmega8 (with a couple of extra I/O pins instead of duplicate VCC/GND pins).

When I get the 8 MHz code working, I'll build a USBAsp with it. With the ATtiny88-MU way cheaper than the ATmega8A-AU that is usually used on Chinese USBAsps, and no need for a crystal, it could drop the cost of a USBAsp in small quantities from ~\$2.50 to close to \$1.50.

Getting it to work at 8 MHz not only helps by significantly expanding the chips it can run on, but has the side effect of smaller code (I can only do 5 1/3 instructions/bit at 8 MHz vs. 8 instructions/bit at 12 MHz).

I have no special talents.  I am only passionately curious. - Albert Einstein

How are you going to calibrate that 8 MHz ±10%?

The 1.5 Mbit/s low-speed mode is ±1.5% tolerant, so it's not a big problem, but yes, some kind of calibration is needed.

v-usb has OSCCAL tuning based on the USB 1 ms frame pulses, which will bring the device timing to within ±1% of the host (or even ±0.5% for some AVRs). Tim recently re-wrote it in assembly:
https://github.com/micronucleus/...

To read a full frame you need to read at least 72 bits, and more when there are stuffed bits. A timing shift of more than half a bit would lead to errors, so even ±0.5% timing on the receive side leaves only a small margin for error.

On the transmit side, timing is more relaxed, since the host has to align its sampling intervals with the sync byte starting each transmitted USB frame.

I have no special talents.  I am only passionately curious. - Albert Einstein

Does it always work?
It's out of spec for many of the AVRs (if not all);
the datasheet says changes need to be smaller than 2%,
and the first steps will be bigger than that (at least 10%)
(and the starting point is 128, not the default value for the chip).

sparrow2 wrote:
The datasheet says changes need to be smaller than 2%.
I've wondered about that. The datasheet only warns in the case of an external clock:
Quote:
When applying an external clock, it is required to avoid sudden changes in the applied clock frequency to ensure stable operation of the MCU. A variation in frequency of more than 2% from one clock cycle to the next can lead to unpredictable behavior. If changes of more than 2% is required, ensure that the MCU is kept in Reset during the changes.
There's no mention of a restriction when changing OSCCAL, or CLKPR for that matter.

Many devices have a 'range' bit in the OSCCAL register, allowing for 2 overlapping ranges of 128 values. Changing ranges runs the risk of making a more-than-2-% change in the clock, and no mathematical relationship between the ranges is published.

I expect that changes to OSCCAL and CLKPR are inherently synchronous and are managed by internal timing logic. Changes to an external clock are not, or cannot be, thus the restriction (although there are devices which allow changing between clock sources under software control).

 "Experience is what enables you to recognise a mistake the second time you make it." "Good judgement comes from experience.  Experience comes from bad judgement." "Wisdom is always wont to arrive late, and to be a little approximate on first possession." "When you hear hoofbeats, think horses, not unicorns." "Fast.  Cheap.  Good.  Pick two." "We see a lot of arses on handlebars around here." - [J Ekdahl]

Noted, thanks for the info.
And about those with two ranges: you will need to decide whether to start at 64 or 192 anyway (and you need a range check!!).

I would still use the originally programmed value as the base (it's already using it :) )

sparrow2 wrote:
you will need to decide whether to start at 64 or 192 anyway (and you need a range check!!).
Of course.

However it's likely that the overlapping ranges will interleave and that few if any values in the upper range will exactly match any values in the lower range, so a search of both ranges is best.

Attached are graphed results of a single test with a single ATtiny85, so by no means are these authoritative results. Test conditions weren't especially well controlled, nor was lab-quality measurement equipment used.

However a few things are evident.
- There is an overall linear relationship between OSCCAL and the oscillator period.
- The top of the lower range interleaves with the bottom of the upper range.
- There is a fair amount of 'jitter' across both ranges, i.e. the 7 bits of each range do not have precisely binary weights. This is expected due to process variations.

Just in this one test we see that there is very nearly an inversion that takes place across the boundary defined by bit 3, i.e. its weight is less than it should be. Bit 2 has the same problem to a lesser degree. I expect some devices would actually exhibit an inversion on one or more bits. Again this is due to process variation.

Given the possibility of such an inversion, a binary search for the best OSCCAL value wouldn't be sufficient. Also, given the interleaved overlap of the two ranges, both should be tested. In effect a full test of all OSCCAL values is required to ensure optimal calibration.

If optimal calibration isn't required, a single range can be tested and/or a binary search algorithm can be used.

An intermediate approach might be to perform a partial binary search for the most significant bits in a given range (say the top 3 bits), while the sub-range defined by the remaining bits can be fully tested. While such a combined approach would be faster than a full test of all OSCCAL values, it would require more code than either approach alone, and gives no guarantee of optimal calibration in the event of an inversion in one of the most significant bits handled by the binary search.
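Separating measurement from selection keeps the full-scan logic itself trivial. A sketch, assuming err[] holds one timing-error measurement per OSCCAL value (how that measurement is taken, e.g. counting cycles between USB 1 ms frame marks, is hardware-specific and not shown here):

```c
#include <stdint.h>

/* Full scan: given one timing-error measurement per OSCCAL value,
   pick the value with the smallest error.  Because every value is
   tested, bit-weight inversions like the ones in the graphs cannot
   mislead it the way they can mislead a binary search. */
static uint8_t best_osccal(const unsigned err[256])
{
    uint8_t best = 0;
    unsigned best_err = err[0];
    for (int v = 1; v < 256; v++) {
        if (err[v] < best_err) {
            best_err = err[v];
            best = (uint8_t)v;
        }
    }
    return best;
}
```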

EDIT: I took another look at the raw data, and there actually are 2 inversions:
- between OSCCAL==23 and OSCCAL==24
- between OSCCAL==151 and OSCCAL==152 (visible in the last graph)

Quote:
I would still use the org. programmed value as the base.
A straight linear search in both directions around that would likely be sufficient, but there's no way to know where you should start testing in the other range.

JJ

## Attachment(s):

 "Experience is what enables you to recognise a mistake the second time you make it." "Good judgement comes from experience.  Experience comes from bad judgement." "Wisdom is always wont to arrive late, and to be a little approximate on first possession." "When you hear hoofbeats, think horses, not unicorns." "Fast.  Cheap.  Good.  Pick two." "We see a lot of arses on handlebars around here." - [J Ekdahl]

sparrow2 wrote:
Noted, thanks for the info.
And about those with two ranges: you will need to decide whether to start at 64 or 192 anyway (and you need a range check!!).

I would still use the originally programmed value as the base (it's already using it :) )

You could send your code improvements to Tim (cpldcpu at gmail).

I have no special talents.  I am only passionately curious. - Albert Einstein

Just for giggles, I ran a test whereby I saved the default value of OSCCAL and then continuously updated OSCCAL to a random value as fast as possible in a tight loop (using a 32-bit LFSR as a PRNG). Every second I restored the default value and pushed out a status line via serial debug, then resumed the random loop.

The status line reported the number of changes to OSCCAL thus far, and the number of seconds since the start of the test.

I was getting about 55,000 changes to OSCCAL per second. At that rate I cycled the 32-bit LFSR in a little under a day (4 billion + changes).

While not a definitive test (i.e. I didn't fully test all chip/peripheral functions after each change to OSCCAL) I would have expected a lockup or restart at some point if changing OSCCAL to an arbitrary value posed any risk of breaking the internal logic sequencing.
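For illustration, a 32-bit Galois LFSR of the kind described could look like this (the tap mask 0x80200003, i.e. taps 32/22/2/1, is one commonly cited maximal-length choice and is an assumption here, not necessarily the one used in the actual test):

```c
#include <stdint.h>

/* One step of a 32-bit Galois LFSR.  Tap mask 0x80200003 (taps
   32, 22, 2, 1) gives a maximal-length sequence, so the non-zero
   state walks through all 2^32 - 1 values.  In the stress test, a
   byte of this state would be written to OSCCAL each iteration. */
static uint32_t lfsr_step(uint32_t s)
{
    uint32_t lsb = s & 1u;
    s >>= 1;
    if (lsb)
        s ^= 0x80200003u;
    return s;
}
```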

 "Experience is what enables you to recognise a mistake the second time you make it." "Good judgement comes from experience.  Experience comes from bad judgement." "Wisdom is always wont to arrive late, and to be a little approximate on first possession." "When you hear hoofbeats, think horses, not unicorns." "Fast.  Cheap.  Good.  Pick two." "We see a lot of arses on handlebars around here." - [J Ekdahl]