eXtreme compact CRC-calculator procedure

Go To Last Post
12 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

a) CRC16, used recently in USB transactions

;*********************************************************
;	eXtreme compact CRC16 procedure (c) Gleb Daniloff
;	08/01/2007 harley@hotbox.ru
;	registry usage: Z,r0,polylo,polyhi,crclo,crchi,bitcnt
;	input:	Z - pointer to string, buflen - buffer length
;	output:	crclo,crchi - CRC16
;*********************************************************

crc_go:
		ldi		polyhi,0xA0		; polynom inilialization - once at all
		ldi		polylo,1
		clr		crchi			; crc initialization - once per call
		clr		crclo			
crc16:
		lpm			
		adiw		ZL,1
		ldi		bitcnt,8
crc01:
		eor		crclo,r0
		bst		crclo,0
		eor		crclo,r0
		lsr		r0
		lsr		crchi
		ror		crclo
		brtc		crc02
		eor		crchi,polyhi
		eor		crclo,polylo
crc02:
		dec		bitcnt
		brne		crc01
		dec		buflen
		brne		crc16
		ret

b) CRC8, used in Dallas iButton products:

;*********************************************************
;	eXtreme compact CRC8 procedure (c) Gleb Daniloff
;	08/01/2007 harley@hotbox.ru
;	registry usage: Z,r0,poly8,crc8,bitcnt
;	input:	Z - pointer to string, buflen - buffer length
;	output:	crc8 - CRC8
;*********************************************************

crc_go:
		ldi		poly8,0x8C		; polynom inilialization - once at all
		clr		crc8			; crc initialization - once per call
crc8:
		lpm			
		adiw		ZL,1
		ldi		bitcnt,8
crc01:
		eor		crc8,r0
		bst		crc8,0
		eor		crc8,r0
		lsr		r0
		lsr		crc8
		brtc		crc02
		eor		crc8,poly
crc02:
		dec		bitcnt
		brne		crc01
		dec		buflen
		brne		crc8
		ret
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Have you looked at the inline assembler, avr-libc CRC functions in ?

The crc_ccitt_update(uint16_t crc, uint8_t data) function, for example, generates code that's almost as short but much faster.

    crc = crc_ccitt_update(crc, data_byte)
 25c:   48 27           eor     r20, r24
 25e:   04 2e           mov     r0, r20
 260:   42 95           swap    r20
 262:   40 7f           andi    r20, 0xF0       ; 240
 264:   40 25           eor     r20, r0
 266:   05 2e           mov     r0, r21
 268:   54 2f           mov     r21, r20
 26a:   42 95           swap    r20
 26c:   4f 70           andi    r20, 0x0F       ; 15
 26e:   04 26           eor     r0, r20
 270:   46 95           lsr     r20
 272:   54 27           eor     r21, r20
 274:   45 27           eor     r20, r21
 276:   44 0f           add     r20, r20
 278:   44 0f           add     r20, r20
 27a:   44 0f           add     r20, r20
 27c:   40 25           eor     r20, r0

- John

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Comparission is not correct: your example contain only CRC-generation for 1 byte, initialization and definitions not included. My code for this case looks like:

      ldi      bitcnt,8
crc01:
      eor      crclo,r0
      bst      crclo,0
      eor      crclo,r0
      lsr      r0
      lsr      crchi
      ror      crclo
      brtc      crc02
      eor      crchi,polyhi
      eor      crclo,polylo
crc02:
      dec      bitcnt
      brne      crc01 

=12 words only!
But your code is faster, its right!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I count 14 words to process a single byte (you still need to load the CRC polynomial) vs. 17 words using the avr-libc library. Most days that is almost as short to me. More importantly, I didn't have to write such excellent code: I just included the library.

You realize that you can move the xor-ing of the data byte with the CRC out of the 8-bit loop? I am not sure how that affects the code size.

- John

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

There is another compact code:

crc_go2:
		ldi	polyhi,0xA0
		ldi	polylo,1
		clr	crchi
		clr	crclo

loc1:	lpm
		adiw	ZL,1

		ldi	bitcnt,8
		eor	crclo,r0
loc2:	lsr	crchi
		ror	crclo
		brcc	loc3
		eor	crchi,polyhi
		eor	crclo,polylo
loc3:	dec	bitcnt
		brne	loc2

		dec	buflen
		brne	loc1
		ret

CRC16 for byte array - 18 words only!
for 1 byte - 11 words (include poly loading)...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Looking good.

You can also translate this to "C" with an inline "brcc .+4" around the double "eor". If you don't like that hack and can tolerate an extra instruction, you can copy the pre-rotated, remainder low byte and later bit test it. I used both approaches before some kind soul wrote and included the inline avr-libc crc functions.

ADDENDUM -- I found some old

#define crc_update(crc, n) \
   do { \
     crc.low_byte ^= n; \
      for (n = 8; n > 0; n--) { \
       crc.word >>= 1; \
       asm("brcc .+4"); \
       crc.word ^= 0x8408; \
     } \
   } while (0)
#endif

- John

Last Edited: Thu. Jan 18, 2007 - 10:20 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

What it for the compiler which allows to do inserts on the assembler? And why such polynom 0x8408 is used?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The avr-gcc compiler allows you to insert assembler statements inside C-code with the statement "asm volatile (text)". This allows you to quickly write most of your code in C and then add bits of assembler where needed to save time or space.

0x8408 is the generator polynomial for the CRC16-CCITT checksum in the section "V.41 - Code Independent Error-control System" from the Blue Book, Vol. VIII.1, ITU, Geneva, 1964. I use it to check 4kbytes of program flash.

- John

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
why such polynom 0x8408 is used?

(edit: dyslexia corrected) 0x8408 is the CCITT polynomial X16 + X12 + X5 + 1 (0x11021) bit-reversed, since CCITT is a least-sig-bit-first algorithm. The X16 term, as usual, has fallen off the end and isn't included.

These routines might be compact but they're awfully s-l-o-w, over 70 cycles a byte. Let's see you optimize that compact routine down to, say, 40 cycles.

Last Edited: Mon. Jan 22, 2007 - 04:48 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

for a comparission, very large code from app.note AVR318:

unsigned int OWI_ComputeCRC16(unsigned char inData, unsigned int seed)
{
    unsigned char bitsLeft;
    unsigned char temp;

    for (bitsLeft = 8; bitsLeft > 0; bitsLeft--)
    {
        temp = ((seed ^ inData) & 0x01);
        if (temp == 0)
        {
            seed >>= 1;
        }
        else
        {
            seed ^= 0x4002;
            seed >>= 1;
            seed |= 0x8000;
        }
        inData >>= 1;
    }
    return seed;    
}
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

for a comparission, very large code from app.note AVR318

Well, OK--

For comparison, how large is "very large"?

It appears that this C code might use 3 registers plus maybe a working pair. Your "extreme compact" example uses 7/8, so they would need to be saved/restored in the general case, rendering you solution not quite as compact.

Is your intent the "fastest" routine, or the "smallest" routine? If you are looking for the smallest, does a few words one way or the other in a routine matter to the size of the whole application?

If you are looking for the fastest, then a faster example has been shown.

At the expense of a couple more working registers or register scratch pad variables, the C example given can be much more "compact". As always there is a tradeoff on resources used and size/speed.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I submit a shorter and possibly quicker C routine. It gains some speed by taking the incoming data a nibble (4 bits) at a time.

unsigned calc_crc(unsigned char *data, unsigned n, unsigned start) {
    unsigned i, k, q, c, crcval;
    crcval=start;
    for (i=0; i>4)^(q*0x1081);
        q=(crcval^(c>>4)) & 0x0F;
        crcval=(crcval>>4)^(q*0x1081);
    }
    return crcval;
}

That (q*0x1081) could be written (q<<12 + q<<7 + 1). Possibly the shifts could be faster than the multiplications, though it depends on the compiler and CPU. In any case, the compiler output should be inspected and optimized for best results.

"Where did that 0x1081 come from?" I hear you ask. Well, it's the same CCITT constant, 0x11021, bit reversed and shifted four places right.