256 bit shift register required

Go To Last Post
62 posts / 0 new

Pages

Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I require a 256 bit shift register for left shifting. I could do this in asm, but if I can set up a suitable typedef, I would prefer to do this in C. Not getting anywhere though. Any ideas?

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You could do an arbitrary shift just using an array and a for loop through it's indices. Start at the low byte and take a note of bit 7 (&0x80) then shift it left. For all subsequent bytes do the same but also OR a 1 into bit 0 if the previous bit 7 was set.

The longest type supported by the compiler is uint64_t. I guess you could do what I just described in 64 bit chunks. Testing bit 63 each time and passing the "carry" onto the next shift?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thanks Cliff,

Quote:
The longest type supported by the compiler is uint64_t
. I thought that might have been the case.
I have the following implemented, which seems to do the trick.

  if(PORTC&02>>0)  //test data
     data=1;
  else
     data=0;
  //shift into 256 bit shift register			
  carry_out=data;		
  for(uint8_t count=0;count<32;count++) 
  {
      carry_in=carry_out;
      if(shift_register[count]>127)
        carry_out=1;
      else
      	carry_out=0;
      shift_register[count]*=2;
      shift_register[count]+=carry_in;			
   }
   data^=carry_out;   //xor with current data with carry_out
   if(data==1)
      PORTB|=0x01;
   else
      PORTB&=0xfe;

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

LDEVRIES wrote:
I require a 256 bit shift register for left shifting.
Do you really need it to be a shift register? Maybe a circular bit buffer instead?

Eugene

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi Eugene,
I am trying to envisage how a circular buffer might be shorter or more efficient. However, the answer is, that it needs to be a shift register where a new bit is inserted in into b0 and we extract one from b255.
For interest sake, I am building a "huff & puff" oscillator stabilizer, using a tinyAT rather than a PIC. http://downloads.hanssummers.com/ttjun07.pdf

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

LDEVRIES wrote:

For interest sake, I am building a "huff & puff" oscillator stabilizer, using a tinyAT rather than a PIC. http://downloads.hanssummers.com/ttjun07.pdf

Note that the link you posted cannot be accessed directly, but can be accessed by going to http://www.hanssummers.com/radio... and clicking on "An Alternative PIC stabiliser".

I gather that you're trying to implement a software-based PLL without performing the division externally, as proposed by the standard design. Why not just skip the division entirely, and, in a tight loop, check if after an edge, it's the positive phase (<180deg) or the negative phase (>180deg) and output correction appropriately.

This can be done quite effectively with the interrupt on level change capabilities of the capture-compare units. Simply set it to trigger on one or both edges, and read off the timer value for the phase difference, noting that the read will always be executed on your AVR's clock edge.

I may have missed a few important details here though. :D

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Well, I just thought that a shift register did not scale well. Time to insert a bit into a shift register increases proportionally with the size of the register. Time to insert a value into a circular buffer does not depend on the size of the buffer. Extracting b255 will obviously always be faster from the shift register. Code-size-wise the shift registers implementation is probably smaller.

Eugene

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

To get it to work in C should not be a problem. But the best C code will depend of the compiler here is some things to look into.
The sign of a signed char is MSB
becaus some compilers use libs for 32 bit but inline for 16bit, try them both
And try 8 bit bacause bacause the AVR is a 8 bitter
perhaps not correct if there is a "carry" but make two different paths for the code.
Try a fixed mask in a reg. to avoid loads.

Have fun

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Something like this may be a little less expensive computationally.

unsigned char big_shift(unsigned char in_bit)
{
static unsigned char bit_buff[32];
static unsigned char index;
unsigned char out_bit;

// put bit in, get bit out

if (in_bit)
   {
   bit_buff[index >>3)] |= (1<<(index & 0x07)); //set the bit
   }
else
   {
   bit_buff[index >>3)] &= ~(1<<(index & 0x07)); //clear the bit
   }
index++;
out_bit = bit_buff[index >>3] & 1<<(index & 0x07);

return out_bit;
}

I did a similar 'trick' for some dsp I was doing on an ARM7 where you have to shift samples along when doing filtering. The circular method was significantly faster then shuffling words along - I can't recall how much but it was at least twice as fast.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I am totally amazed by the efficiency of avr-gcc!

I simulated the code for both the ASM function and the C function for a 16MHz mega128. The actual CALL takes 17us for the ASM and 20us for the uint32_t C version.

extern uint8_t g_buf[32]; 

uint8_t shift256bit_32(uint8_t carry_in)
{
  uint32_t *shift_register = (uint8_t *)g_buf;
  uint8_t carry_out;
  //shift into 256 bit shift register         
  carry_out=carry_in;      
  for(uint8_t count=0;count< 256/32;count++)
  {
      carry_in=carry_out;
      if(shift_register[count] & (1uL<<31))
        carry_out=1;
      else
         carry_out=0;
      shift_register[count]*=2;
      shift_register[count]+=carry_in;         
   }
   return carry_out;   //
}
	.file	"shift256asm.S"

	.text
	.extern	g_buf

	.global	shift256bit_8
shift256bit_8:
	ldi	r30,lo8(g_buf)
	ldi	r31,hi8(g_buf)

	clc
	tst	r24
	breq	.LM3
	sec
.LM3:
	ldi	r25,32
.LM4:
	ld	r24,Z
	rol	r24
	st	Z+,r24
	dec	r25
	brne	.LM4

	ldi	r24,0
	brcc	.LM7
	ldi	r24,1
.LM7:
	ret

You can see that the uint32_t C code is fairly generic. Using uint8_t, uint16_t versions take a little longer. I have not timed the uint64_t version because it crashes the compiler. (20081205)

As has been suggested earlier, improving your algorithm is more effective than hand-optimising the 256 bit shift register. But are you not impressed by the efficiency of the C compiler?

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes, I am also most impressed. I haven't timed the code using 32 uint8_t's, but it was just as compact as the PIC version and will no doubt run faster. Will post my results later.

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You really can't complain about the compiler here! I'm very surprised it picked up on being able to use the carry.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I had no really noticed what it was doing. I just wrote the C function (from Lee's original) and a test display function.

The Simulator does the timing. I just thought the result was impressive. Perhaps I should look at the generated code.

I timed the uint64_t version. 142us. So not wise to use this at all. I also discovered what the bug in my 20081205 compiler is. You can use a literal 0x8000000000000000LL but you cannot do a (1LL << 63) expression.

Looking at the ASM for 32bit and 64bit, I can see what it is doing. I will never be rude about avr-gcc again!

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

It looks like you only need a FIFO bit queue sized 256 bits, and not a true shift register. If you can "waste" an extra byte a lot of shift operations can be avoided. A pure C solution could be something like this:

#include 

uint8_t queue_bit (uint8_t bit)
{
    static uint8_t fifoQ[33];   /* 256/8 + 1 bytes */
    static uint8_t wrap;
    uint8_t aux;

    /* push bit into last byte */

    aux = fifoQ[32];
    aux >>= 1;
    if (bit) aux |= 0x80;
    fifoQ[32] = aux;

    /* pop bit from first byte */

    bit = fifoQ[0];
    bit &= 0x01;

    /* shift by copying every 8 bits */

    ++wrap;
    wrap &= 7;
    if (wrap == 0) {
        for (aux = 0; aux < 32; ++aux) {
            fifoQ[aux] = fifoQ[aux+1];
        }
    }

    return bit;
}

Execution time should be generally faster, even when a bit more time is required every 8 bits. Internal state is also not usable as a shift register.

Regards. Carlos.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Carlos,

I cannot see where I am going wrong. The test function works fine with 'shift256bit_32()' but not with your 'queue_bit()'

char shiftchar(char c)
{
   static uint8_t i, inp, out;
   inp = c;
   out = 0;
   for (i = 0; i < 8; i++) {
	   out <<= 1;
//       out |= queue_bit((inp & 0x80) != 0);
       out |= shift256bit_32((inp & 0x80) != 0);
       inp <<= 1;
   }
   return out;
}

void main(void)
{
   uint8_t c;
   char *p;
   initstdio();
   printf("\r\nHello shift256bit\r\n");
   for (p = "We will put some dummy data in the shift register"; *p; )
       shiftchar(*p++);
   while (1) {
       c = getchar();
       putchar(shiftchar(c));
   }
}

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0
unsigned char bits[32]; 
unsigned char hi_mask=0x80, hi_index=31;
// hi bit is bits[hi_index] & hi_mask
// a shift will change one bit and
// alter the labels of all of them

unsigned char shift(unsigned char lo_bit)
{
unsigned char retval= bits[hi_index] & hi_mask;
bits[hi_index]&= ~hi_mask;
if(lo_bit) bits[hi_index]|=hi_mask;
hi_mask>>=1;  // shift direction corrected
if(0==hi_mask) {
    hi_mask=0x80;
    hi_index--;
    hi_index &= 31;
}
return retval;
}

Moderation in all things. -- ancient proverb

Last Edited: Mon. May 3, 2010 - 05:48 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I took a look a the ASM code for the C version and if you use -O3 then the compiler unroll the loop so the function gets 484 bytes ! so it has a cost to optimize!

Edit it's david's code I talk about

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

So an unrollede ASM version is 5*32 clk =160 clk or 10us + what you deside to do at the ends. (the clk is for a "normal" AVR)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Am I losing the plot?

The 'shift256bits()' function should take an input bit, and return the 256'th bit shifted out the other end. e.g. 1 or 0 (true or false)

I tried skeeve's function. it returns 0x80 or 0. But it does not seem to be 256 bits later.

I wrote shift256bit_8() in ASM
Lee wrote shift256bit_32() et al ... ok I adapted them to be generic
Carlos wrote queue_bit()
Skeeve wrote shift_skeeve() ... ok he called it shift()

Personally, I am more interested in the algorithm than going to extremes with unrolling. But it would be nice to have all the functions perform the same task.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

David, you are right. Last cosmetic changes left out the shift code in the 'pop bit' section. The code should say:

#include 


uint8_t queue_bit (uint8_t bit)
{
    static uint8_t fifoQ[33];   /* 256/8 + 1 bytes */
    static uint8_t wrap;
    uint8_t aux;

    /* push bit into last byte */

    aux = fifoQ[32];
    aux >>= 1;
    if (bit) aux |= 0x80;
    fifoQ[32] = aux;

    /* pop bit from first byte */

    aux = bit = fifoQ[0];
    bit &= 0x01;
    aux >>= 1;
    fifoQ[0] = aux;

    /* shift by copying every 8 bits */

    ++wrap;
    wrap &= 7;
    if (wrap == 0) {
        for (aux = 0; aux < 32; ++aux) {
            fifoQ[aux] = fifoQ[aux+1];
        }
    }

    return bit;
}

Sorry for the inconvenience. Carlos.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

ok then I wasn't sure of the rules sorry.
But if thats all we need the ASM code can be made in about 25 clk (or 1.6 us on a 16 MHz)
the structure will be this:
A low REG holdes the bit we are looking at. A inc will get to the next bit.
the 3 MSB holds the bit number in the byte and get converted to a bitmask (7 clk).
The low 5 bit are a addr of the of the byte and gets added to start addr of the array (from 2 clk normally about 5 clk).
Do the load (2 clk)
AND the value with the mask (2 clk we need a copy)
the result is the return bit (0 or non zero)(2 clk)
take the bit to replace it with (from where ? using the mask)(5 clk)
store the result (2 clk)
inc bitpointer (1 clk)

done
I will like to see a C version even close ;)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

david.prentice wrote:
I tried skeeve's function. it returns 0x80 or 0. But it does not seem to be 256 bits later.
That is because I had the shift direction on hi_mask wrong.
In case it is not yet bug-free,
I have added comments to assist the do it yourselfers.

Moderation in all things. -- ancient proverb

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I think I must have fixed both Carlos' and skeeve's versions by myself.

Yes. We all end up with working 256 bit shift registers. Since we have algorithms that vary with the bit #, I have timed the 'shiftchar()' function which effectively shifts eight times.

shift_skeeve()    36us    (4.5us)
queue_bit()       66us    (8.2us)
shift256bit_32   174us   (21.8us)

I think this illustrates the power of algorithm over compiler intelligence. My ASM version is blown out of the water by skeeve. And sparrow's suggestion would beat everything.

However, I am too idle to implement sparrow(). And I suspect that the skeeve() function is both in a HLL and fast enough.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
I would prefer to do this in C
Does using inline assembler still count as C?
void shift_256bit_reg (void)
{
    static uint8_t shift_reg[32];
    uint8_t *shift_reg_ptr = &shift_reg[0];
    uint8_t temp;
    uint8_t index;
    
    // Clear Carry Flag
    ASM_CLC();
    
    // Skip Setting Carry Flag If PB0 Is Clear
    ASM_SBRC (PORTB, PB1);
    // Set Carry Flag
    ASM_SEC();
    
    // Shift Left Loop
    ASM_LDI (index, 32);
    ASM_LABEL (shift_reg_loop:);
    
        ASM_LD (temp, shift_reg_ptr);
        ASM_ADC (temp, temp);
        ASM_STinc (shift_reg_ptr, temp);
    
    ASM_DEC (index);
    ASM_BRNE (shift_reg_loop);
    
    // Branch if Carry Cleared (Goto Label clear_PB0)
    ASM_BRCC (clear_PB0);
    PORTB |= (1 << PB0);
    ASM_RJMP (skip_clear_PB0);
    
    ASM_LABEL (clear_PB0:);
    PORTB &= ~(1 << PB0);
    
    ASM_LABEL (skip_clear_PB0:);
}

Yes, I know. It looks hair-raising to most of you. BUT gcc compiles this code to this:

  3c:	e0 e6       	ldi	r30, 0x60	; 96
  3e:	f0 e0       	ldi	r31, 0x00	; 0
    uint8_t *shift_reg_ptr = &shift_reg[0];
    uint8_t temp;
    uint8_t index;
    
    // Clear Carry Flag
    ASM_CLC();
  40:	88 94       	clc
    
    // Skip Setting Carry Flag If PB0 Is Clear
    ASM_SBRC (PORTB, PB1);
  42:	88 b3       	in	r24, 0x18	; 24
  44:	81 fd       	sbrc	r24, 1
    // Set Carry Flag
    ASM_SEC();
  46:	08 94       	sec
    
    // Shift Left Loop
    ASM_LDI (index, 32);
  48:	90 e2       	ldi	r25, 0x20	; 32

0000004a :
    ASM_LABEL (shift_reg_loop:);
    
        ASM_LD (temp, shift_reg_ptr);
  4a:	80 81       	ld	r24, Z
        ASM_ADC (temp, temp);
  4c:	88 1f       	adc	r24, r24
        ASM_STinc (shift_reg_ptr, temp);
  4e:	81 93       	st	Z+, r24
    
    ASM_DEC (index);
  50:	9a 95       	dec	r25
    ASM_BRNE (shift_reg_loop);
  52:	d9 f7       	brne	.-10     	; 0x4a 
    
    // Branch if Carry Cleared (Goto Label clear_PB0)
    ASM_BRCC (clear_PB0);
  54:	10 f4       	brcc	.+4      	; 0x5a 
    PORTB |= (1 << PB0);
  56:	c0 9a       	sbi	0x18, 0	; 24
    ASM_RJMP (skip_clear_PB0);
  58:	01 c0       	rjmp	.+2      	; 0x5c 

0000005a :
    
    ASM_LABEL (clear_PB0:);
    PORTB &= ~(1 << PB0);
  5a:	c0 98       	cbi	0x18, 0	; 24

0000005c :

Well, I think this is some kind of "job-saving" code. It'll make your boss :shock: :evil:

Regards
Sebastian

EDIT: Somehow I can't post the assembler macros here. So I've uploaded them as an attachment.

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

But in C my ASM way should be fast aswell, but the pointer byte should be with the mask bit as the low 3 bits and use a lookup table for the mask. So
mask[bitpointer&0x07] will get the mask and
byte_ar[bitpointer>>3] will get the byte.

But if we only need this as a shift register it can be done even faster if we keep the mask in one char and addr in an other!
The description I gave for ASM was just a copy from a bitlib I made a while back because I needed some 8051 code (the 8051 can use the carry as a bit ACC) to be converted.(but I can't find it).

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Does using inline assembler still count as C?

When I read this, my eyes glazed over. I am pathologically terrified of GCC gobbledygook. But later, I thought: "This almost looks readable"

I still prefer proper ASM syntax and separate .S file.

Meanwhile, Carlos, skeeve and sparrow have offered algorithms that are infinitely superior to any ASM that has a loop!

Using a static byte array as a circular bit buffer is O(1) efficiency. You can also access any of the bits at any time. Unfortunately the AVR does not have a barrel shifter, so a random access involves calculating the bit mask. Looping a ROL can be a little slower than looking up the bit mask from a table.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
barrel shifter

All maga AVR's has a 8 bit barrel shifter, use the high byte after a mul ;)

Edit or low byte depending of what you need

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I have never really used ASM on an AVR.

A C compiler seems to do multiple shifts with a LSL or ASR in a loop.

I suppose that LSL by 3 is the same as a MUL by 8.
How would you do a ASR by 3 ?

I think I answered the calculation of a mask. For a start you are only shifting a 1 by a variable # of bits. You get an O(1) result by indexing into an 8 byte table.

An ARM or a 68K can do all the shifts in one cycle. I do not see how you can convert a register that contains 3 into a MUL by 8.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
How would you do a ASR by 3 ?


Make a signed mul by 32 and use the high byte.

And yes it's not a real barrel shifter it works only for fixed numbers.

And just to make it clear a 68K (lower than 68020) do not have a barrel shifter (but only the instruction and then it make an intern loop)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes. While I was walking the dog, I remembered that a multiple shift took extra cycles. Mind you, it was faster than in a programmed loop.

So there is no clever sequence to evaluate (1 << var).

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Even simpler, pick LSB, then shift and insert into the MSB:

#include 

uint8_t queue_bit_2 (uint8_t bit)
{
    static uint8_t fifoQ[32];
    static uint8_t index;
    uint8_t aux1, aux2;

    aux1 = aux2 = fifoQ
; aux1 &= 0x01; aux2 >>= 1; if (bit) aux2 |= 0x80; fifoQ[index++] = aux2; index &= 0x1f; return aux1; }

Regards. Carlos.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

@Carlos: I'm pretty sure yours is faster than mine.
I had to think a bit to realize it worked.

@David: From 3-bits in one register to a
mask in another can be done in 9 cycles
without involving other registers,
6 cycles if one is willing to use an index
register and keep the lookup table in SRAM.

Moderation in all things. -- ancient proverb

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
So there is no clever sequence to evaluate (1 << var)

ldi r17,1
sbrc r3,1
ldi r17,4
sbrc r3,0
add r17,r17
sbrc r3,2
swap r17

That is the fastest I know of

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

sparrow2 wrote:
Quote:
So there is no clever sequence to evaluate (1 << var)

ldi r17,1
sbrc r3,1
ldi r17,4
sbrc r3,0
add r17,r17
sbrc r3,2
swap r17

That is the fastest I know of

hmmm, that code looks strangely familiar ;)

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
strangely familiar

Just to make it clear I have used that code for more than 10 years , and before it was showed in this forum

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi. This is my fifo bitshift buffer, written in asm.
It runs on ATMega128, size in an example below is 8192 bits (1024 bytes used), but scalable (at compile time) from 2^3=8 to 2^18=262144. Optimized for speed, execution time is independent of buffer size and always takes 812ns (13clks@16MHz) (+ call and ret if not inlined).

.def	bitfield			=r0
.def	bitfield_cpy		=r1
.def	exp2_mask		=r16 ;static
.def	byte_pointer_L	=r26 ;static
.def	byte_pointer_H	=r27 ;static

.equ	BITSHIFT_BUFFER_SIZE=EXP2(13)	;1024*8
.equ	BITSHIFT_BUFFER_SIZE_B=BITSHIFT_BUFFER_SIZE/8
.equ	BITSHIFT_BUFFER_START	=BITSHIFT_BUFFER_SIZE_B<<1 ;start somewhere
.equ	BITSHIFT_BUFFER_END		=BITSHIFT_BUFFER_START+BITSHIFT_BUFFER_SIZE_B-1
.equ	BITSHIFT_MODULO_MASK	=0b11111011

;code requires .byte(BITSHIFT_BUFFER_SIZE_B) SRAM allocation, pointer and exp2_mask initialization. 

;Argument to the function can be passed in various (compile time defined) ways to conditionally skip (****) line:
; - SREG_X {T,I} ->brbx SREG_X,...
; - IO_bit {0x20:0x3F}_{0:7} ->sbix, (used sbis PINA,6 in here)
; - reg_bit{0x00:0x1F}_{0:7} ->sbrx
; - comparing any two registers (cpse x,x)
;just skip (****) line with appropriate command
;return value: SREG_Z is set and  bitfield_cpy is zero, when pushed out bit was clear. Opposite, when it was set.

;------------------------------------------------------------------------------
Bitshift_f:
	cpi	exp2_mask,2 ;sets SREG_C only when exp2_mask==1
	sbrc	byte_pointer_H,2	;check if bit was set/cleared
		ror	exp2_mask			;modify if necesseary		

	andi		byte_pointer_H,BITSHIFT_MODULO_MASK	;perform modulo addressing

	ld	bitfield,X
		mov	bitfield_cpy,bitfield
		or		bitfield	,exp2_mask	;default - bit is set
		sbis	PINA,6					;check argument
			eor	bitfield,exp2_mask		;(****)
	st	X+,bitfield
	and		bitfield_cpy,exp2_mask
ret
;------------------------------------------------------------------------------

Could someone please rewrite it in C to compare GCC efficiency in terms of speed(13clks) and size(13w)?

No RSTDISBL, no fun!

Last Edited: Sun. May 30, 2010 - 06:11 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

And here is a modified version, which is even faster (750ns, 12 clks), but requires different initialization and returns pushed out fifo bit in a peculiar way - two static variables are equal if pushed out bit is set, nequal when is cleared. So typically caller can conditionally:

    -skip some code (jmp or other) using "cpse bitfield,bitfield_cpy" or -perform test using "cp bitfield,bitfield_cpy"

;------------------------------------------------------------------------------
Bitshift_f:
		sbic	PINA,6					;check argument
			eor	bitfield,exp2_mask		;when needed - toggle to 0
	st	X+,bitfield

		cpi		exp2_mask,2		;sets SREG_C only when exp2_mask =1
		sbrc	byte_pointer_H,2	;check if bit was set/cleared
			ror	exp2_mask			;modify if necesseary		

		andi		byte_pointer_H,BITSHIFT_MODULO_MASK	;modulo addressing

	ld	bitfield,X
		mov	bitfield_cpy	,bitfield
		or	bitfield		,exp2_mask	;default - bit is set
ret
;------------------------------------------------------------------------------

No RSTDISBL, no fun!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
andi byte_pointer_H,BITSHIFT_MODULO_MASK ;perform modulo addressing


I took a quick look at the code and don't get this.
For me it only seems to work if your data starts in addr 0 !!! and not normal RAM.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

My data starts at BITSHIFT_BUFFER_START and ends at BITSHIFT_BUFFER_END.

Try this loop and you will find out what modulo means:

clr temp
back:
 inc temp
 andi temp,0b11110111
 rjmp back

No RSTDISBL, no fun!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

That is a bad way to show how modulo work for you!
Because the value go will wrap to 0.
What I mean is if when you have 1024 byte they have to be placed either 0-1023 or 2048-3071 or !!! not many places to put them.
like your sample code with different init values will do 0-7 16-23 32-39 etc. the number you would like 8-15 can't be done that way.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The short example above does not loop through [8:F] but through [0:7] and that was what I intended to do to explain modulo addressing, but now I see you are a more advanced freak. If you want the code to loop through different range, simply initialize temp with different value:

ldi temp,0x80
...

This one loops through [0x80:0x87]. What is wrong with it?

Quote:
the number you would like 8-15 can't be done that way.

With bitshift it completely does not matter if it loops through [0x8:0xF] or [0xF:0x8]:

ldi temp,0x0F
Back:
  dec temp
  ori temp,0b00001000
  rjmp Back

Quote:
What I mean is if when you have 1024 byte they have to be placed either 0-1023 or 2048-3071 or !!! not many places to put them.

I always assumed it is up to me where to put my data :). When I want to compare two solutions (GCC with asm) in terms of code efficiency, I do not force the compiler to put data at nasty spaces, but I leave it up to the compiler where to put it. When I write in asm, I do not pick a random place, but the most favorable one. Is that wrong?

With mentioned ranges, I am afraid you have made a mistake. You cannot place the data at [0x0000:0x03FF] range on AVRs because the address overlaps IO space. But you can do this at any other range of modulo aligned memory, [0x0400<->0x7FF] which you skipped, and [0x0800<->0x0BFF] or [0x0C00<->0xFFF].

Not many places to put them? ATMega128 has only 4kB of RAM onboard, so what is the reason of putting them in many places? With external SRAM there are up to 63 places to put 1kB modulo alligned buffer. Is that many?

No RSTDISBL, no fun!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This one is faster on average run(500ns,8clks), still inlineable. The called version could be slightly faster with uncommented code at the end, but since the code is so short I would rather inline it in most cases.

Bitshift_f:
	sbic	PINA,6					;check argument
		eor	bitfield,exp2_mask		;(****)

	cpi		exp2_mask,2				;rotate mask
	ror		exp2_mask

	brbs	SREG_C,reload_data		;7/8
continue_with_new_byte:
		mov	bitfield_cpy,bitfield
		or	bitfield	,exp2_mask
ret
reload_data:						;1/8
		st	X+,bitfield
		andi	byte_pointer_H,BITSHIFT_MODULO_MASK
		ld	bitfield,X
		rjmp	continue_with_new_byte
;Faster, but uninlineable
;		mov	bitfield	,bitfield_cpy	
;		or	bitfield	,exp2_mask
;ret
;(7*7+1*(6+9))/8=(49+15)/8=64/8=8clks

No RSTDISBL, no fun!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
With mentioned ranges, I am afraid you have made a mistake. You cannot place the data at [0x0000:0x03FF] range on AVRs because the address overlaps IO space.

NO that was my point in the first plase you can't .

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0
ldi temp,0x0F 
Back: 
  dec temp 
  ori temp,0b00001000 
  rjmp Back 

What do you try to show with this?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

With the example above I wanted to show looping through [0xF:0x8] of course.

Quote:
like your sample code with different init values will do 0-7 16-23 32-39 etc. the number you would like 8-15 can't be done that way.

It CAN be done that way, and that is how:

ldi temp,0x0F
Back:
  dec temp
  ori temp,0b00001000
  rjmp Back

And the other post:

Quote:
For me it only seems to work if your data starts in addr 0 !!! and not normal RAM.

Quote:

What I mean is if when you have 1024 byte they have to be placed either 0-1023 or (...)

My answer was:

Quote:
I am afraid you have made a mistake. You cannot place the data at [0x0000:0x03FF] range on AVRs

You can't and you MUST NOT place this 1kB data at 0-1023 range! That is why

Quote:
My data starts at BITSHIFT_BUFFER_START and ends at BITSHIFT_BUFFER_END.

BITSHIFT_BUFFER_START=BITSHIFT_BUFFER_SIZE_B<<1=[0x0400]

And in your opinion, what range does this code loop through?

	ldi byte_pointer_L,LOW(0x0400)
	ldi byte_pointer_H,HIGH(0x0400)
Back:
	adiw	byte_pointer_L,1
	andi	byte_pointer_H,0b11111101
	rjmp	Back

No RSTDISBL, no fun!

Last Edited: Tue. Jun 1, 2010 - 08:15 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm corrected the or code work sorry.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

And the code is not a 8192 bit buffer, but a 4096 bit one - mask is from my previous project. That is my mistake this time.

No RSTDISBL, no fun!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Anyway, this is the fourth version. It runs 445ns (7.125 clks) on average, uses only two low registers (except of pointer of course ) and can pass the argument in any of SREG flags additionally, which is very useful in many cases.

If called, it can pass the argument implicitly, calling Bitshift_clear_f to clear current bit, Bitshift_set_f to set current bit, or as usual, calling Bitshift_f to pass arg explicitly.

Bitshift_f:
	sbic	PINA,6					;check argument
Bitshift_clear_f:
		eor	bitfield,exp2_mask		;(****)
Bitshift_set_f:
	lsr		exp2_mask
	brbs	SREG_C,reload_data		;7/8
continue_with_new_byte:
		mov	bitfield_cpy,bitfield
		or	bitfield	,exp2_mask
ret
reload_data:						;1/8
		ror	exp2_mask
		st	X+,bitfield
		andi	byte_pointer_H,BITSHIFT_MODULO_MASK
		ld	bitfield,X
		rjmp	continue_with_new_byte
;Faster, but uninlineable
;		mov	bitfield	,bitfield_cpy	
;		or	bitfield	,exp2_mask
;ret
;(7*6 +1*(5+10))/8=(42+15)/8=57/8=7.125 clks

No RSTDISBL, no fun!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

if it's speed you need just do it the normal (old) way

ADD   bitbyte,bitbyte ;shift
SBIC  PINA,6
INC   bitbyte         ;set LSB
BRCS  do_store        ;carry set if byte done
RET
do_store:
...
do store and leave bitbyte=1
...
RET
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Where is the returned value in your code? The shift buffer should return bit state saved exp2(n) calls ago. It should have the same effect as ror or rol, shifting in argument (SREG_C in ror) and shifting out returned value (also SREG_C in ror).

No RSTDISBL, no fun!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

ok added

ADD   bitbyte,bitbyte ;shift 
SBIC  PINA,6 
INC   bitbyte         ;set LSB 
BRCS  do_store        ;carry set if byte done 
ADD bitbyte_old,bitbyte_old
RET                   ;carry holds the return value
do_store: 
... 
do store 
do load
and update pointers
leave bitbyte=1 
... 
RET 

Pages