Cheesiest manchester-decoding function ever..

I've been using this to decode manchester-encoded nibbles. It's not elegant. But it's simple. And it rejects any illegal bytes (errors), by not changing the value.

Of course, a table lookup from a 256-byte array would be super fast, but too large.

I'm certain that assembly with bit operations would be much more efficient. But it works well enough currently.

Short of squeezing just a few more cycles out, anyone have any thoughts about improving this in 'C'? (I should mention that the highest legal input value is only 11, so unless there's a garbled byte, it generally doesn't do 16 iterations.)

OK, it's not really a function, just code:

// the decoding data
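// (entry i is the manchester code for nibble i: each data bit becomes a 01 pair for 0
//  or a 10 pair for 1, so index 0 -> 01010101 = 85 and index 15 -> 10101010 = 170)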
uint8_t manch_arr[] = {85, 86, 89, 90, 101, 102, 105, 106, 149, 150, 153, 154, 165, 166, 169, 170 };

// ...   ...

// the code
for (i=0; i<16; i++)
{
	if (inchr == manch_arr[i])
	{
		val = i;
		break;
	}
}

(Edit)
Actually, 'break' isn't going to break out of the loop, is it?
(Reedit)
Checked a reference, and yes, 'break;' should work here.


You might be able to shave a cycle here or there, but I think your code is probably simple enough for most requirements. A little explanation of the code would help though.


Quote:

You might be able to shave a cycle here or there, but I think your code is probably simple enough for most requirements.

Simple & straightforward, certainly. If no further optimization is warranted--if the app is "fast enough"--then we are done.

But more than a few cycles can be shaved, and with a very straightforward approach.

More flash space is burnt, but where that tradeoff falls depends on the app. The OP seemed reluctant to burn 256 bytes to get down to ~10 cycles or so.

I unrolled the loop using a switch(). The same results would be obtained (with my compiler) with a nested if().

#include 

#define uint8_t unsigned char

// the decoding data
uint8_t manch_arr[] = {85, 86, 89, 90, 101, 102, 105, 106, 149, 150, 153, 154, 165, 166, 169, 170 };

// ...   ...



uint8_t origfunc(uint8_t inchr)
{
// ...   ...
uint8_t i;

// the code
for (i=0; i<16; i++)
        {
        if (inchr == manch_arr[i])
          {
          return (i);
          }
        }
return inchr;
}


flash uint8_t flash_arr[] = {85, 86, 89, 90, 101, 102, 105, 106, 149, 150, 153, 154, 165, 166, 169, 170 };
uint8_t flashfunc(uint8_t inchr)
{
// ...   ...
uint8_t i;

// the code
for (i=0; i<16; i++)
        {
        if (inchr == flash_arr[i])
          {
          return (i);
          }
        }
return inchr;
}

uint8_t switchfunc(uint8_t inchr)
{
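	/* note: these cases return 1..16; the original loop (and origfunc above) return the 0..15 table index */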
	switch (inchr)
		{
		case 85:
			return 1;
		case 86:
			return 2;
		case 89:
			return 3;
		case 90:
			return 4;
		case 101:
			return 5;
		case 102:
			return 6;
		case 105:
			return 7;
		case 106:
			return 8;
		case 149:
			return 9;
		case 150:
			return 10;
		case 153:
			return 11;
		case 154:
			return 12;
		case 165:
			return 13;
		case 166:
			return 14;
		case 169:
			return 15;
		case 170:
			return 16;
		}
	return (inchr);
}

register unsigned char junk;

void main (void)
{
junk = origfunc (106);      // middle
junk = origfunc (170);      // end
junk = origfunc (1);      // not found

junk = flashfunc (106);      // middle
junk = flashfunc (170);      // end
junk = flashfunc (1);      // not found

junk = switchfunc (106);      // middle
junk = switchfunc (170);      // end
junk = switchfunc (1);      // not found

}

More deterministic numbers, with fewer tests per "hit", could be had with a binary search--and probably faster on the average--but the tree would take more tests and be larger. So how much flash can be used, and/or how fast do you want it? :)
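For what it's worth, here is a rough sketch of that binary-search variant over the (sorted) manch_arr[] table--my own illustration, not something I measured, and the function name is made up:

uint8_t binfunc(uint8_t inchr)
{
uint8_t lo = 0;
uint8_t hi = 15;
uint8_t mid;

while (lo <= hi)
        {
        mid = (lo + hi) >> 1;
        if (inchr == manch_arr[mid])
          {
          return (mid);                /* found: 0..15, like origfunc() */
          }
        if (inchr < manch_arr[mid])
          {
          if (mid == 0)                /* don't wrap the unsigned index */
            break;
          hi = mid - 1;
          }
        else
          {
          lo = mid + 1;
          }
        }
return inchr;                          /* not found: leave the value alone */
}

At most five probes of the table instead of up to sixteen, but each iteration now does two comparisons plus the loop and index arithmetic, so whether it actually beats the unrolled switch() would need measuring--it is not among the versions timed below.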

Numbers below are flash words, then cycles for inputs 149, 170, and 1. Cycle counts include call/return overhead. AVRStudio simulator. "Promote char to int" false.

         flash   cycles  cycles  cycles
         words   (149)   (170)   (1)
ORIG       22     145     273     281
FLASH      22     153     289     295    (same flash size, but 16 bytes less SRAM and a hair slower)
SWITCH     76      41      65      65

So I'd call a 75% savings more than "shaving a cycle here and there". And the switch() is quite straightforward, especially compared to the binary-type nested if().

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


The code is for an RF transmitter/receiver pair in a hand-held controller and a separate receiver; specifically, this is for the receiver end. The T/R pair are the common Laipac modules (4800 bps), interfaced with AVRs with full USARTs (ATmegas).

theusch wrote:
So how much flash can be used, and/or how fast do you want it? :)

That's enough, thanks :). It worked OK before, but that's a significant improvement.

As a rule I've avoided switch() statements when the input values are sequential--just always assumed that a loop with one conditional would be optimized to a state nearly as fast as consecutive instructions.

How wrong that assumption appears.... Since I'm not using Windows, the AVRStudio simulator isn't available. Time to try 'simulavr'.

Thanks, all.


Quote:

just always assumed that a loop with one conditional would be optimized to a state nearly as fast as consecutive instructions.

Well, it probably is--except that you have the overhead of the loop-variable increment (at least one cycle per iteration, more like five if the 8-bit counter lives in SRAM) plus the loop-variable comparison for the end condition. In addition, your method fetches a value from a table, which means building the subscript and then doing the fetch.

My compiler doesn't build jump tables for switches, but if you are going to do that you might as well burn the 256 bytes for the lookup table and be done with it. Each comparison is 4 words, after ensuring that the switch variable is in a register:

                 ;      70 		case 153:
                 _0x18:
0000a5 39e9      	CPI  R30,LOW(0x99)
0000a6 f411      	BRNE _0x19
                 ;      71 			return 11;
0000a7 e0eb      	LDI  R30,LOW(11)
0000a8 c015      	RJMP _0x20

so there is essentially no loop overhead. Not only is the loop unrolled, but the table values are folded into the comparison so no fetch is needed.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Wouldn't a lookup table stored in flash memory be practical? I know the OP said that a 256 byte table was undesired, but I wonder if that was in RAM?
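Something like this, with avr-gcc's PROGMEM (just a sketch--the table name and the "value + 1" trick are my own, not anything from this thread):

#include <avr/pgmspace.h>
#include <stdint.h>

/* 256-entry decode table in flash. Entries hold "nibble value + 1" so
   that 0 can mean "illegal byte"; all unlisted entries default to 0.  */
static const uint8_t decode_lut[256] PROGMEM = {
    [ 85] = 1,  [ 86] = 2,  [ 89] = 3,  [ 90] = 4,
    [101] = 5,  [102] = 6,  [105] = 7,  [106] = 8,
    [149] = 9,  [150] = 10, [153] = 11, [154] = 12,
    [165] = 13, [166] = 14, [169] = 15, [170] = 16,
};

/* Returns 0..15 for a legal code, or the input unchanged otherwise,
   the same convention as the original loop.                          */
uint8_t lut_decode(uint8_t inchr)
{
    uint8_t v = pgm_read_byte(&decode_lut[inchr]);
    return v ? (uint8_t)(v - 1) : inchr;
}

One LPM plus a test and a subtract per byte, so the time is small and constant; the cost is the 256 bytes of flash the OP wanted to avoid.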


dmonn wrote:
OK, it's not really a function, just code:

// the decoding data
uint8_t manch_arr[] = {85, 86, 89, 90, 101, 102, 105, 106, 149, 150, 153, 154, 165, 166, 169, 170 };

// ...   ...

// the code
for (i=0; i<16; i++)
{
	if (inchr == manch_arr[i])
	{
		val = i;
		break;
	}
}

Don't know about C, but does this really detect a (the first few) low edge(s), and then synchronize to the baudrate, and then sample the data at 3/4th of the bittime?

RES


Thanks, guys. I'll test size/speed for my compiler, for both the switch() and a lookup table in PROGMEM.

RES wrote:

Don't know about C, but does this really detect a (the first few) low edge(s), and then synchronize to the baudrate, and then sample the data at 3/4th of the bittime?

I didn't post the code for syncing. When using these modules with a USART, a series of 'legal' bytes can be sent initially (for instance the bits 01010101, five times) to sync the receiver. Following that, data should be sent continuously or a resync is needed.

Since it's a real-time controller the stream is constant, which is wasteful but appropriate in this case. Data sent is very simple and error-tolerant, so no checksums.

If you bit-banged everything it would be more complicated, but also probably more robust.
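To make the preamble concrete, a rough transmit-side sketch (not my actual code; uart_putc() here stands in for whatever blocking USART transmit routine is used, and manch_arr[] doubles as the encode table since entry i is the code for nibble i):

extern void uart_putc(uint8_t c);   /* placeholder for the blocking USART transmit routine */

#define SYNC_BYTE   85              /* 01010101: a legal code with an edge on every bit */
#define SYNC_COUNT  5

void send_frame(const uint8_t *nibbles, uint8_t len)
{
	uint8_t i;

	for (i = 0; i < SYNC_COUNT; i++)
	{
		uart_putc(SYNC_BYTE);                     /* preamble to sync the receiver */
	}
	for (i = 0; i < len; i++)
	{
		uart_putc(manch_arr[nibbles[i] & 0x0F]);  /* encode each nibble and send it */
	}
}

On the receive side the decode loop (or the switch) then maps each byte back to a nibble and treats anything illegal as noise.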


What's the use of sending manchester-encoded data via the USART? The main reason for using manchester encoding is clock recovery--this is negated when sent via the USART!


Kartman wrote:
What's the use of sending manchester-encoded data via the USART? The main reason for using manchester encoding is clock recovery--this is negated when sent via the USART!

The receiver's output is very noisy when nothing is being transmitted, and the encoding prevents long empty bitfields (like 00000000) from becoming garbled.

I didn't invent this approach. Search for TLP434A & RLP434A modules--encoding with a USART is gonna come up, either through software or hardware encoders...

Some folks combat this with a comparator tied to the linear output of the receiver, which is reportedly more reliable than the digital output.


Why not do it the normal way?
This code will do it.
The value is in R16:

1=MOV r17,r16
2=ANDI r17,170 (10101010)
3=ANDI r16,85 (01010101)
4=ADD r16,r16
5=EOR r16,r17
6=

1 copy r16
2 mask the high manchester bits
3 mask the low manchester bits
4 multiply by 2 (to align the high and low bits)
5 XOR: check that each two-bit pair holds a 1 and a 0
6 if r16 == 10101010 (170) it's legal, else not

Have fun
Jens


sparrow2 wrote:
Why not do it the normal way ?
This code will do it

Well, it is the "Cheesiest manchester-decoding function ever." :)

But seriously, that algorithm looks doable in 'C', too (I've never completely rejected assembly for this, just was looking to improve in 'C' on a functional but admittedly nasty approach--if the size could be kept low.)
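For instance, the legality test alone might look like this in C (just a sketch of those five instructions--the function name is made up, and it only says whether the byte is a valid code, it doesn't extract the nibble):

/* a byte is a legal manchester code exactly when every 2-bit pair is
   01 or 10, i.e. when the two halves of each pair differ */
uint8_t is_legal(uint8_t b)
{
	uint8_t hi = b & 0xAA;                  /* high bit of each pair */
	uint8_t lo = (uint8_t)(b << 1) & 0xAA;  /* low bit of each pair, shifted up */

	return (uint8_t)(hi ^ lo) == 0xAA;      /* all four pairs differ -> legal */
}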


I like a challenge. sparrow2's post got me thinking: how to calculate the result, i.e. without using a lookup table at all.

I came up with this piece of code:

uint8_t decode(uint8_t b)
{
    uint8_t b2;

    b2  = b << 1;
    b2 &= 0xAA;
    b  &= 0xAA;
    b2 ^= b;
    if (b2 != 0xAA)
    {
        return 0x10;
    }
    b += b & 0x2A;
    b += b & 0x14;
    b += b & 0x08;
    return b >> 4;
}

The trick is in using the & operation and the addition to shift the individual bits within the byte.
Compiled with GCC this code results in a function of 23 words.
Execution time (including call/return overhead) is fixed: 16 cycles for invalid input values, and 27 cycles for valid input values. How's that for "shaving a cycle here and there" ?
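To see the bit-gathering at work, a worked trace (mine, not part of the measurements) for the legal code 153 (0x99), which should decode to 10:

b              = 0x99    1001 1001: pairs 10 01 10 01, so the nibble is 1010
b &= 0xAA;    -> 0x88    keep the high bit of each pair (positions 7,5,3,1)
b += b & 0x2A; -> 0x90    set bits in positions 5,3,1 move up one place
b += b & 0x14; -> 0xA0    set bits in positions 4,2 move up one place
b += b & 0x08; -> 0xA0    bit 3 is clear here, so nothing moves
return b >> 4; -> 0x0A    = 10, which matches manch_arr[10] == 153 in the first post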

René


geraetsr wrote:
I like a challenge. sparrow2's post got me thinking: how to calculate the result, so without using a lookup table.
... ...
Execution time (including call/return overhead) is fixed: 16 cycles for invalid input values, and 27 cycles for valid input values. How's that for "shaving a cycle here and there" ?

René


Impressive. Didn't even get a chance to take a crack at it myself....

avrfreaks.net -- the ultimate resource for lazy programmers :wink: