Bitbang bitshift optimization

Go To Last Post
27 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

In order to read a magnetic encoder that has its own 18-bit message protocol (not SPI, I2C, etc...), I am using a bitshift as show below.

My problem is that the below code is very inefficient when compiled, even with -O3. The bit =... is very quick (two operations), but the assignment is very slow (30+ operations). Any suggestions on how to improve this? It doesn't seem to me that addressing individual bits should be that slow, but perhaps this is what happens with messages that are so large they have to be stored in long ints.

long int data=0;

for (int i=17; i>=0; i--) 
{ 
	//Read pin. Note that this gives a 1 or 0 at the pin bit, not at the first bit.
	long int bit;
	bit = PORTA.IN & (1<<PIN);

	//Here we subtract the PIN.
	data |= (bit << (i-PIN)); 
}

int reading = data >> 6;
char status = data & 0b00111111;

I've thought about bitfields, but as I understand it, there's no predicting what order the bitfields will be stored in memory, so it would be dangerous to simply assign individual bits and then cast the entire memory location into a proper variable.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

It is not only inefficient, I don't even believe that it will work at all. If PIN is not zero then (i-PIN) becomes negative on the lower bits. I have no idea what the standard says about shifting a negative amount of bits, but I doubt that it will reverse the shift direction. So if you want to read 18 bits into a long you can do it like this:

for (uint8_t i = 0; i < 18; i++) {
    data |= PORTA.IN & (1<<PIN) ? 1 : 0;
    data <<= 1;
}

BTW: I don't see any synchronization with the sender, so how should that work?

Stefan Ernst

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Any suggestions on how to improve this? It doesn't seem to me that addressing individual bits should be that slow, but perhaps this is what happens with messages that are so large they have to be stored in long ints.

As this is an encoder, I'll assume time is of the essence. Thus I'd be tempted to unroll the loop into convenient chunks, probably bytes. (Then put them together, or use a union.)

You appear to just be looking at one pin--there is no clock? Strange...

No, you don't want to do a 32-bit shift every repetition. If anything shift a mask.

There is a fairly recent thread on this sort of thing. I claimed conditional would be fastest--but as is often the case the 'Freaks came up with something better. Lessee if I can search it out...

Hmmm--it was pertinent, but more simulating a shift register on output...
https://www.avrfreaks.net/index.p...
https://www.avrfreaks.net/index.p...

From what you described so far, I'd unroll the whole thing into 100 instructions or so, loading up 3 register variables, and then put them together at the end.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Mon. Jun 14, 2010 - 08:24 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You may try something like this:

typedef union
{
    uint32_t dword;
    uint16_t word[2];
    uint8_t byte[4];
}
dword_t;

dword_t data;

data.byte[3] = 0;
data.byte[2] = read_bits (2);
data.byte[1] = read_bits (8);
data.byte[0] = read_bits (8);



// bit_count Must Be In The Range Of 1 To 8
uint8_t read_bits (uint8_t bit_count)
{
    uint8_t byte = 0;
    
    do
    {
        byte *= 2;
        
        if (bit_is_set (PINA, PA3))
        {
            byte |= 1;
        }
    }
    while (--bit_count > 0);
    
    return byte;
}

Regards
Sebastian

EDIT: Corrected the stupid mistake of reading PORTA instead of PINA

Last Edited: Tue. Jun 15, 2010 - 06:17 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

sternst wrote:
It is not only inefficient, I don't even believe that it will work at all. If PIN is not zero then (i-PIN) becomes negative on the lower bits. I have no idea what the standard says about shifting a negative amount of bits, but I doubt that it will reverse the shift direction. So if you want to read 18 bits into a long you can do it like this:

for (uint8_t i = 0; i < 18; i++) {
    data |= PORTA.IN & (1<<PIN) ? 1 : 0;
    data <<= 1;
}

BTW: I don't see any synchronization with the sender, so how should that work?

There's no problem, it works just fine. The real code has all the necessary timing parts, etc. It's inefficient is all and I'm looking for a way to smooth things out.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

kubark42 wrote:
There's no problem, it works just fine.
Most likely only because you throw away the lower bits (int reading = data >> 6).

Stefan Ernst

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

how should that work?

Quote:

it works just fine.

??? I don't think so, Tim.

Quote:

//Read pin.

Quote:

PORTA.IN &

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

sternst wrote:
kubark42 wrote:
There's no problem, it works just fine.
Most likely only because you throw away the lower bits (int reading = data >> 6).

You're not reading the code very carefully. I don't throw away the last six bits, I put them into status. And, yes, it works just fine. I've got the code and the actuators working that prove it. There's nothing wrong with a negative bit shift on an AVR.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

kubark42 wrote:
There's nothing wrong with a negative bit shift on an AVR.
The standard (which I consulted in the meantime) has a different opinion:
Quote:
If the value of the right operand is negative or ..., the behavior is undefined.
If it really works for you it is merely luck, or a gcc extension. I would not rely on it anyway.

Stefan Ernst

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

And, yes, it works just fine.

Reading PORTA?!?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
Quote:

And, yes, it works just fine.

Reading PORTA?!?

Alright, we're getting WAY too lost in the details. Technically, that's PORTF.IN, not PORTA.IN, although that makes no difference whatsoever to my problem. This is pseudo-code, not actual code. It also won't compile without a main(), header files, etc... so there's no point in harping on non-essentials.

I'd like to concentrate on the essentials, which is not whether or not WIN-AVR defines behavior for negative bitshifts, or discussing ATMEGA vs. XMEGA syntax. What I'd like to find is the most efficient way to bitshift in 18 bits. There have been some ideas here for how better to do this, but that's been hijacked by erroneous skepticism.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0
long data=1L<<18;
while(!(data&1)) {
    data>>=1;
    // not really avr-gcc:
    if(PORTA.IN & (1<<PIN)) data|=(1L<<18);
}
data>>=1;

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
I'd like to concentrate on the essentials, which is not whether or not WIN-AVR defines behavior for negative bitshifts,
Whether your code works by definition or merely by luck is not essential for you?

Quote:
hijacked by erroneous skepticism.
Sorry that I tried to help you writing better code (more readable, more efficient, and last but not least complying to the C standard). Bye.

Stefan Ernst

Last Edited: Mon. Jun 14, 2010 - 09:11 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

kubark42 wrote:
sternst wrote:
kubark42 wrote:
There's no problem, it works just fine.
Most likely only because you throw away the lower bits (int reading = data >> 6).

You're not reading the code very carefully. I don't throw away the last six bits, I put them into status. And, yes, it works just fine. I've got the code and the actuators working that prove it. There's nothing wrong with a negative bit shift on an AVR.

There aren't any positive OR negative multibit shifts on an AVR (except maybe for the SWAP instruction); compilers synthesize them out of sequences of the underlying machine instructions. And in the case of the GCC compiler (which I assume you're using on account of this being the AVR->GCC forum), there most certainly IS 'something wrong' with negative shifts. At least there is in the almost-up-to-date version of GCC I've got installed here. Here's a snip of test code:
#include 
int main(void)
{
    return (PINA << -5);
}

, which produces this output in AStudio when it's built:

Build started 14.6.2010 at 13:48:17
avr-gcc  -mmcu=atmega128 -Wall -gdwarf-2 -Os -std=gnu99 -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -MD -MP -MT Zoink.o -MF dep/Zoink.o.d  -c  ../Zoink.c
../Zoink.c: In function 'main':
../Zoink.c:5: warning: left shift count is negative
avr-gcc -mmcu=atmega128 -Wl,-Map=Tkeys.map Zoink.o     -o Tkeys.elf
avr-objcopy -O ihex -R .eeprom -R .fuse -R .lock -R .signature  Tkeys.elf Tkeys.hex
avr-objcopy -j .eeprom --set-section-flags=.eeprom="alloc,load" --change-section-lma .eeprom=0 --no-change-warnings -O ihex Tkeys.elf Tkeys.eep || exit 0
avr-objdump -h -S Tkeys.elf > Tkeys.lss

, and generate this assembler output:

int main(void)
{
    return (PINA << -5);
  be:	29 b3       	in	r18, 0x19	; 25
  c0:	30 e0       	ldi	r19, 0x00	; 0
}
  c2:	c9 01       	movw	r24, r18
  c4:	08 95       	ret

As you can see, the bits read from the PINA register are simply returned with no shifting whatsoever. If one removes the minus sign from the shift operand and rebuilds, the warning disappears, and the generated code includes a nice little loop to left-shift the value 5 times.

If your system is working, it must be because it doesn't care what value comes back from your read_bits() function. So, good news: you can make it considerably more efficient by just having it return a constant 0x00 or 0xff or however it turns out to behave as currently written.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

sternst wrote:
Quote:
I'd like to concentrate on the essentials, which is not whether or not WIN-AVR defines behavior for negative bitshifts,
Whether your code works by definition or merely by luck is not essential for you?

Quote:
hijacked by erroneous skepticism.
Sorry that I tried to help you writing better code (more readable, more efficient, and last but not least complying to the C standard). Bye.

My, oh, my, people are touchy tonight. The difference between pointing out that code isn't right and hijacking the thread is exactly the difference between one post and two. You made your point, thank you, I personally don't like coding in undefined areas-- lucky unlucky, or undefinedly lucky-- so in the future I will make sure that I try to avoid that kind of code. Which already went without saying, although I seem to have bruised your ego by not saying it.

All of which is incidental to trying to understand better how to code 18-bit bitshifts on an 8-bit AVR. Doubly so because I've followed the advice of posters in this thread and the issue has become moot as the offending bitshifts have been taken out.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

And in the case of the GCC compiler (which I assume you're using on account of this being the AVR->GCC forum), there most certainly IS 'something wrong' with negative shifts.
No, the C standard says the behavior for negative left-shift is undefined. GCC Is free to return whatever value it likes.

But all this is moot, as we have learned the OP presented pseudo-code, so a pseudo-result should do.

Stealing Proteus doesn't make you an engineer.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

kubark42, post your exact code, and I can have a think about ways to optimize it. While main() and the supporting code isn't important, it *is* important that we can see your exact code so that we can offer ways to optimize it. Giving pseudo code and asking for optimization will only give you optimizations for the pseudo code, which may or may not carry over to your actual implementation.

I'll be happy to help you out (I've got some experience fine-tuning AVR-GCC output via tweaking the source) once we can see what we're really working with.

- Dean :twisted:

Make Atmel Studio better with my free extensions. Open source and feedback welcome!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

abcminiuser wrote:
kubark42, post your exact code, and I can have a think about ways to optimize it. While main() and the supporting code isn't important, it *is* important that we can see your exact code so that we can offer ways to optimize it. Giving pseudo code and asking for optimization will only give you optimizations for the pseudo code, which may or may not carry over to your actual implementation.

Actually, the pseudo-code optimizations were exactly what I was looking for, but that doesn't mean I'll turn down an offer to see if there's even more to be done!

long int data_out[3]; //Normally a function parameter, but put here for completeness.

//Set pins to low
ACT_SELECT_0_PORT.OUT   &= ~(1<<ACT_SELECT_0_PIN);  ACT_SELECT_1_PORT.OUT   &= ~(1<<ACT_SELECT_1_PIN);
ACT_SELECT_2_PORT.OUT   &= ~(1<<ACT_SELECT_2_PIN);

TCC1.CNT=0; //Timer runs at 32MHz
MAG_ENC_CLK_PORT.OUT   &= ~(1<<MAG_ENC_CLK_PIN);  //Set pin to low

//Reset data_out to 0
memset(data_out, 0, 3*sizeof(*data_out));

for (int i=17; i>=0; i--) 
{ 
	//Wait 1 microsecond
	while (TCC1.CNT < 32){
	}
	TCC1.CNT=0; //Reset timer

	MAG_ENC_CLK_PORT.OUT |= (1<<MAG_ENC_CLK_PIN);  //Set pin to high

	//Wait 1 microsecond
	while (TCC1.CNT < 32){
	}

	TCC1.CNT = 0; //Reset timer
	MAG_ENC_CLK_PORT.OUT   &= ~(1<<MAG_ENC_CLK_PIN);  //Set pin to low

	//Read pin. Note that this gives a 1 or 0 at the pin bit, not at the first bit.
	char bit[3];
	bit[0] = MAG_ENC_0_MISO_PORT.IN & (1<<MAG_ENC_0_MISO_PIN) ? 1 : 0;
	bit[1] = MAG_ENC_1_MISO_PORT.IN & (1<<MAG_ENC_1_MISO_PIN) ? 1 : 0;
	bit[2] = MAG_ENC_2_MISO_PORT.IN & (1<<MAG_ENC_2_MISO_PIN) ? 1 : 0;

	if (i >= 16){
		char *p_data[3];
		for (int j=0; j < 3; j++){
			p_data[j]=(char *)&data_out[j]+2;
			*p_data[j] |= (bit[j] << (i-16));
		}
	}
	else if (i >=8){
		char *p_data[3];
		for (int j=0; j < 3; j++){
			p_data[j]=(char *)&data_out[j]+1;
		}

	  *p_data[0] |= (bit[0] << (i-8));
	  *p_data[1] |= (bit[1] << (i-8)); 
	  *p_data[2] |= (bit[2] << (i-8));
	}
	else{
		char *p_data[3];
		for (int j=0; j < 3; j++){
			p_data[j]=(char *)&data_out[j];
		}

    	*p_data[0] |= (bit[0] << i);
    	*p_data[1] |= (bit[1] << i); 
    	*p_data[2] |= (bit[2] << i);
	}
} 

//Set pins to high
ACT_SELECT_0_PORT.OUT |= (1<<ACT_SELECT_0_PIN); ACT_SELECT_1_PORT.OUT |= (1<<ACT_SELECT_1_PIN); ACT_SELECT_2_PORT.OUT |= (1<<ACT_SELECT_2_PIN);

Note that there is some strangeness in that the most efficient code variant I found through a bit of trial-and-error was to have the first `if` use a for loop for both pointer and value assignment, and then in the others to use a loop only for pointer assignment. For some reason, if I do all three `if` with the same approach the output code balloons by 100 bytes. Maybe GCC optimization skills might come in handy here!

P.S. I can also try to improve the loop by a bit by using an interrupt to trigger driving the pins high and low, but that's for later when there's nothing else left to improve. It's also not sure if the additional instructions added for the ISR function won't cost more than they save.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I can't quite grok your logic, but it looks like you're just trying to load in three track's worth of 18 bits into three longs. In that case, a shift-and-mask approach will yield very tight code. Try something like this instead:

//Set pins to low
ACT_SELECT_0_PORT.OUT   &= ~(1<<ACT_SELECT_0_PIN);  ACT_SELECT_1_PORT.OUT   &= ~(1<<ACT_SELECT_1_PIN);
ACT_SELECT_2_PORT.OUT   &= ~(1<<ACT_SELECT_2_PIN);

TCC1.CNT=0; //Timer runs at 32MHz
MAG_ENC_CLK_PORT.OUT   &= ~(1<<MAG_ENC_CLK_PIN);  //Set pin to low

//Reset data_out to 0
memset(data_out, 0, 3*sizeof(*data_out));

unsigned long bitmask = (1 << 17);

// Repeat until all bits have been shifted in
while (bitmask)
{
   //Wait 1 microsecond
   while (TCC1.CNT < 32) {};
   TCC1.CNT = 0; //Reset timer

   MAG_ENC_CLK_PORT.OUT |= (1<<MAG_ENC_CLK_PIN);  //Set pin to high

   //Wait 1 microsecond
   while (TCC1.CNT < 32) {};
   TCC1.CNT = 0; //Reset timer

   MAG_ENC_CLK_PORT.OUT   &= ~(1<<MAG_ENC_CLK_PIN);  //Set pin to low

   // For each track, if the next bit is a 1, OR the output data with the current bitmask
   data_out[0] |= (MAG_ENC_0_MISO_PORT.IN & (1<<MAG_ENC_0_MISO_PIN)) ? bitmask : 0;
   data_out[1] |= (MAG_ENC_1_MISO_PORT.IN & (1<<MAG_ENC_1_MISO_PIN)) ? bitmask : 0;
   data_out[2] |= (MAG_ENC_2_MISO_PORT.IN & (1<<MAG_ENC_2_MISO_PIN)) ? bitmask : 0;
   
   // Shift the bitmask once for each bit that is read in, until the mask is empty (all bits read)
   bitmask >>= 1;
}

//Set pins to high
ACT_SELECT_0_PORT.OUT |= (1<<ACT_SELECT_0_PIN); ACT_SELECT_1_PORT.OUT |= (1<<ACT_SELECT_1_PIN); ACT_SELECT_2_PORT.OUT |= (1<<ACT_SELECT_2_PIN);

Which uses a single bit mask variable to hold the mask of the current bit being read for all three tracks. Each time a bit is read from the track inputs, the output values have the current bit-mask bit set if the track's input is a 1. The bit mask is then shifted right once per iteration so that eventually the mask will become zero after all 18 bits are read, and the loop terminates.

- Dean :twisted:

Make Atmel Studio better with my free extensions. Open source and feedback welcome!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

abcminiuser wrote:
I can't quite grok your logic, but it looks like you're just trying to load in three track's worth of 18 bits into three longs.

Yup, that was exactly it. I was following the advice of an earlier poster to split the 18 bits into three distinct bytes.

Your code is definitely tighter. Just one problem(and I'm guessing this might be a gcc bug):

I had to turn

	unsigned long bitmask = (1 << 17);

into

	unsigned long int bitmask = (1 << 9); 
	bitmask <<= 8;

because otherwise GCC said that the shift was bigger than the variable type (it's obviously not) and then proceeded to optimize the whole shebang out of existence. By splitting the 17-bit shift into two parts, I apparently fooled the compiler and it works just fine. Around 200 cycles faster than the previous code for the entire loop.

Thanks a lot for the advice. I've never seen a bitmask used like this before, but I'll definitely remember it now.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

kubark42 wrote:
because otherwise GCC said that the shift was bigger than the variable type (it's obviously not)
It is. The 1 is only an int.

Stefan Ernst

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

because otherwise GCC said that the shift was bigger than the variable type (it's obviously not) and then proceeded to optimize the whole shebang out of existence.

Oops - who let me near a keyboard?

Constants like that one are treated as an int by the compiler unless directed otherwise, which means that it is only 16 bits on an AVR, so the compiler is correct. The way to fix that is to either add a regular cast before the shift:

unsigned long bitmask = ((unsigned long)1 << 17); 

Or, better, to use the C shorthand for an unsigned long constant:

unsigned long bitmask = (1UL << 17); 

- Dean :twisted:

Make Atmel Studio better with my free extensions. Open source and feedback welcome!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

sternst wrote:
kubark42 wrote:
because otherwise GCC said that the shift was bigger than the variable type (it's obviously not)
It is. The 1 is only an int.

Good point. I had interpreted GCC's complaint to refer to the long, and not the 1. It's been fixed.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
I was following the advice of an earlier poster to split the 18 bits into three distinct bytes.
Your solution seems to be way off Lee's and mine suggestion. I had the following in mind:
typedef union
{
    uint32_t dword;
    uint16_t word[2];
    uint8_t byte[4];
}
dword_t;


dword_t *soft_spi (void)
{
    static dword_t data_in[3];
    dword_t data_bits;
    
    ACT_SELECT_0_PORT.OUT &= ~(1 << ACT_SELECT_0_PIN);  //Set pin to low
    ACT_SELECT_1_PORT.OUT &= ~(1 << ACT_SELECT_1_PIN);  //Set pin to low
    ACT_SELECT_2_PORT.OUT &= ~(1 << ACT_SELECT_2_PIN);  //Set pin to low

    MAG_ENC_CLK_PORT.OUT &= ~(1 << MAG_ENC_CLK_PIN);    //Set pin to low
    
    TCC1.CNT = 0; //Reset timer

    // The MSB Of data_in Is Always Zero (Byte Is Unused)
    data_in[0].byte[3] = 0;
    data_in[1].byte[3] = 0;
    data_in[2].byte[3] = 0;
    
    // Get Data Bits 17...16
    data_bits.dword = read_bits (2);
    data_in[0].byte[2] = data_bits.byte[0];
    data_in[1].byte[2] = data_bits.byte[1];
    data_in[2].byte[2] = data_bits.byte[2];
    
    // Get Data Bits 15...8
    data_bits.dword = read_bits (8);
    data_in[0].byte[1] = data_bits.byte[0];
    data_in[1].byte[1] = data_bits.byte[1];
    data_in[2].byte[1] = data_bits.byte[2];
    
    // Get Data Bits 7...0
    data_bits.dword = read_bits (8);
    data_in[0].byte[0] = data_bits.byte[0];
    data_in[1].byte[0] = data_bits.byte[1];
    data_in[2].byte[0] = data_bits.byte[2];

    ACT_SELECT_0_PORT.OUT |= (1<<ACT_SELECT_0_PIN);   //Set pin to high
    ACT_SELECT_1_PORT.OUT |= (1<<ACT_SELECT_1_PIN);   //Set pin to high
    ACT_SELECT_2_PORT.OUT |= (1<<ACT_SELECT_2_PIN);   //Set pin to high
    
    return data_in;
}



// Parameter:
// bit_count Must Be In The Range Of 1 To 8
//
// Return:
// data_in.byte[0] contains Data From MAG_ENC_0_MISO_PIN
// data_in.byte[1] contains Data From MAG_ENC_1_MISO_PIN
// data_in.byte[2] contains Data From MAG_ENC_2_MISO_PIN
uint32_t read_bits (uint8_t bit_count)
{
    dword_t data_in = {.dword = 0};
   
    do
    {
        //Wait 1 microsecond
        while (! (TCC1.CNTL & 0x20));
        
        MAG_ENC_CLK_PORT.OUT |= (1 << MAG_ENC_CLK_PIN);  //Set pin to high
        
        //Wait 1 microsecond
        while (! (TCC1.CNTL & 0x20));
        
        MAG_ENC_CLK_PORT.OUT &= ~(1 << MAG_ENC_CLK_PIN);  //Set pin to low
        
        // Read MAG_ENC_0_MISO_PIN
        data_in.byte[0] *= 2;
       
        if (MAG_ENC_0_MISO_PORT.IN & (1 << MAG_ENC_0_MISO_PIN))
        {
            data_in.byte[0] |= 1;
        }
        
        // Read MAG_ENC_1_MISO_PIN
        data_in.byte[1] *= 2;
       
        if (MAG_ENC_1_MISO_PORT.IN & (1 << MAG_ENC_1_MISO_PIN))
        {
            data_in.byte[1] |= 1;
        }
        
        // Read MAG_ENC_2_MISO_PIN
        data_in.byte[2] *= 2;
       
        if (MAG_ENC_2_MISO_PORT.IN & (1 << MAG_ENC_2_MISO_PIN))
        {
            data_in.byte[2] |= 1;
        }
    }
    while (--bit_count > 0);
   
    return data_in.dword;
}

Your syntax looks like you're using a xmega. Note that you can speed up the the algorithm if you make use of the Virtual Port Registers, so the algorithm could use the SBIS, SBIC, IN and OUT instructions.

Regards
Sebastian

EDIT:
Dean, I don't want to push you with putting out another soulution. I just had almost finished the code a few hours ago but then I've been distracted. I thought it was still worth to post it here.

Last Edited: Thu. Jun 17, 2010 - 03:52 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

S-Sohn wrote:
Quote:
I was following the advice of an earlier poster to split the 18 bits into three distinct bytes.
Your solution seems to be way off Lee's and mine suggestion. I had the following in mind:
...

What I took from your idea was to split up the problem into bytes instead of longs, and to use a structure. Only I decided that pointer-fu would be a bit easier to read. So I guess I can say inspired by, but not strictly followed.

Quote:

Your syntax looks like you're using a xmega. Note that you can speed up the the algorithm if you make use of the Virtual Port Registers, so the algorithm could use the SBIS, SBIC, IN and OUT instructions.

I didn't know that. In this particular case, I wonder if that would speed up the loop since the bottleneck is most of all in the bit shifting. Still, every instruction helps.

Well, actually, it doesn't so much anymore. Dean's code gave me the necessary bump to always have plenty of free time in my loop. It's for a controls system so everything is completely deterministic. Once I'm fast enough, there is no (immediate) advantage to going faster.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Dean's code gave me the necessary bump to always have plenty of free time in my loop.
Fine to read that it works for you now.

Regards
Sebastian

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

kubark42 wrote:
abcminiuser wrote:
I can't quite grok your logic, but it looks like you're just trying to load in three track's worth of 18 bits into three longs.

Yup, that was exactly it. I was following the advice of an earlier poster to split the 18 bits into three distinct bytes.

Your code is definitely tighter. Just one problem(and I'm guessing this might be a gcc bug):

I had to turn

	unsigned long bitmask = (1 << 17);

into

	unsigned long int bitmask = (1 << 9); 
	bitmask <<= 8;

because otherwise GCC said that the shift was bigger than the variable type (it's obviously not) and then proceeded to optimize the whole shebang out of existence. By splitting the 17-bit shift into two parts, I apparently fooled the compiler and it works just fine. Around 200 cycles faster than the previous code for the entire loop.

(1<<17) is 0 because 1 is an int and ints have only 16 bits.
(1L<<17) or (1UL<<17) would have done the trick.

Edit: attribute the redundancy to a fit of blindness.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods