Help with assembly

Go To Last Post
13 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm working with some inline assembly taken from an AVR library created by cpldcpu to control WS2912 LEDs. It's small and works great. However I'm trying to use a new type of LED that requires the same bit timing, but uses 12-bit packets instead of 8-bit. 

The uint8_t *data parameter will need to be uint16_t but the code should only send out the least significant 12 bits of the 16 bit value.

I've never used Assembly before. I'm hoping to find someone here both good at asm and charitable.

Thanks in advance.

#define ws2812_port B     // Data port 
#define ws2812_pin  1     // Data out pin

#define CONCAT(a, b)            a ## b
#define CONCAT_EXP(a, b)   CONCAT(a, b)

#define ws2812_PORTREG  CONCAT_EXP(PORT,ws2812_port)
#define ws2812_DDRREG   CONCAT_EXP(DDR,ws2812_port)

void inline ws2812_sendarray_mask(uint8_t *data,uint16_t datlen,uint8_t maskhi)
{
  uint8_t curbyte,ctr,masklo;
  uint8_t sreg_prev;
  
  masklo	=~maskhi&ws2812_PORTREG;
  maskhi |=        ws2812_PORTREG;
  sreg_prev=SREG;
  cli();  

  while (datlen--) {
    curbyte=*data++;
    
    asm volatile(
    "       ldi   %0,8  \n\t"
    "loop%=:            \n\t"
    "       out   %2,%3 \n\t"    //  '1' [01] '0' [01] - re
    
    "nop      \n\t" // This will be a number of nops to set timing and is built by compile time variables
    
    "       sbrs  %1,7  \n\t"    //  '1' [03] '0' [02]
    "       out   %2,%4 \n\t"    //  '1' [--] '0' [03] - fe-low
    "       lsl   %1    \n\t"    //  '1' [04] '0' [04]
    
    "nop      \n\t" // This will be a number of nops to set timing and is built by compile time variables
    
    "       out   %2,%4 \n\t"    //  '1' [+1] '0' [+1] - fe-high
    
    "nop      \n\t" // This will be a number of nops to set timing and is built by compile time variables
    
    "       dec   %0    \n\t"    //  '1' [+2] '0' [+2]
    "       brne  loop%=\n\t"    //  '1' [+3] '0' [+4]
    :	"=&d" (ctr)
    :	"r" (curbyte), "I" (_SFR_IO_ADDR(ws2812_PORTREG)), "r" (maskhi), "r" (masklo)
    );
  }
  
  SREG=sreg_prev;
}

 

Jim M., Rank amateur AVR guy.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Probably easiest to just do two of those. At present it gets curbyte as *data++ and then the

    "       ldi   %0,8  \n\t"
    "loop%=:            \n\t"

loops over all 8 bits of that byte.

 

I'd just add an almost identical second set of all this but fir that one just "ldi %0,4" to do the additional 4 bits.

 

If you try and hack this existing loop about to do all 12 you are always going to hit the boundary where you have done 8 from one and then need to do 4 from the next

 

Also as it stands it looks like the sbrs %1,7 and then the lsl means that it is testing the top bit of the input so it's moving from left to right across the bits. So if you have another 4 bits in a second loop I guess you want them in the top 4 of it. So you effecively have:

 

01234567 89ABxxxx

 

as the output order. I suppose it might be more natural to use:

 

xxxx0123 456789AB

 

in which case you would want to start by picking up the first byte, do a <<4 on it to move 0123 into the top 4 bits then just do the 4 count on that first byte and then the 8 count on the second byte.

 

(hope all that makes sense!!)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

shift

rot shift with carry

branch

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

Last Edited: Fri. Apr 13, 2018 - 09:23 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'll have to try it out and do some study of your reply before it makes sense, syntax-wise. But I get what you're saying. And it makes perfect sense. The extra 4 bits, since it's a drive current setting, can be sent in as a separate parameter and have it handled by the 4bit loop.

The 12-bit packet is 8bits grayscale + 4bits Constant current drive setting

 

Pseudo code

void inline ws2812_sendarray_mask(uint8_t *data,uint16_t datlen,uint8_t maskhi,uint8_t ccdrive)
{ 
  Use existing code to loop through 8 bits of curbyte=*data
  
  New code to loop through 4 bits of ccdrive
  
  }

Am I making sense?

Jim M., Rank amateur AVR guy.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson,

I've tried to cobble this together. Not sure if I have to declare the asm volatile twice (1 for the 4 bit and 1 for the 8 bit)

Is this even remotely correct?

 

void inline sj1221_sendarray_mask(uint8_t *data,uint16_t datlen,uint8_t maskhi, uint8_t ccval)
{
  uint8_t curbyte,ctr,masklo;
  uint8_t sreg_prev;
  
  
  sj1221_DDRREG |= maskhi; // Enable output
  
  masklo	=~maskhi&sj1221_PORTREG;
  maskhi |=        sj1221_PORTREG;
  
  sreg_prev=SREG;
  cli();  
  
 
  
  while (datlen--) {
    curbyte=*data++;
    /*
	
	*/
	
	asm volatile(//new 4-bit begin
	
    "       ldi   %0,4  \n\t" // changed 8 to 4
    "loop%=:            \n\t"
    "       out   %2,%3 \n\t"    //  '1' [01] '0' [01] - re

	"nop      \n\t" // This will be a number of nops to set timing and is built by compile time variables
	
    "       sbrs  %1,7  \n\t"    //  '1' [03] '0' [02]
    "       out   %2,%4 \n\t"    //  '1' [--] '0' [03] - fe-low
    "       lsl   %1    \n\t"    //  '1' [04] '0' [04]

	"nop      \n\t" // This will be a number of nops to set timing and is built by compile time variables
	
    "       out   %2,%4 \n\t"    //  '1' [+1] '0' [+1] - fe-high

	"nop      \n\t" // This will be a number of nops to set timing and is built by compile time variables
	
    "       dec   %0    \n\t"    //  '1' [+2] '0' [+2]
    "       brne  loop%=\n\t"    //  '1' [+3] '0' [+4]
    :	"=&d" (ctr)
    :	"r" (ccval), "I" (_SFR_IO_ADDR(sj1221_PORTREG)), "r" (maskhi), "r" (masklo) // ccval substituted for curbyte
    );//new 4-bit end
	
    asm volatile(//existing 8-bit begin
	
    "       ldi   %0,8  \n\t"
    "loop%=:            \n\t"
    "       out   %2,%3 \n\t"    //  '1' [01] '0' [01] - re

	"nop      \n\t" // This will be a number of nops to set timing and is built by compile time variables
	
    "       sbrs  %1,7  \n\t"    //  '1' [03] '0' [02]
    "       out   %2,%4 \n\t"    //  '1' [--] '0' [03] - fe-low
    "       lsl   %1    \n\t"    //  '1' [04] '0' [04]

	"nop      \n\t" // This will be a number of nops to set timing and is built by compile time variables
	
    "       out   %2,%4 \n\t"    //  '1' [+1] '0' [+1] - fe-high

	"nop      \n\t" // This will be a number of nops to set timing and is built by compile time variables
	
    "       dec   %0    \n\t"    //  '1' [+2] '0' [+2]
    "       brne  loop%=\n\t"    //  '1' [+3] '0' [+4]
    :	"=&d" (ctr)
    :	"r" (curbyte), "I" (_SFR_IO_ADDR(sj1221_PORTREG)), "r" (maskhi), "r" (masklo)
    );//existing 8-bit end
	

  }
  
  SREG=sreg_prev;
}

 

Jim M., Rank amateur AVR guy.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You have to come out of the asm() after the first collection of bits to do another curbyte=*data++ anyway (which is currently missing by the way) because you can only do up to 8 bits in each loop.

 

Like I say I guess you could put it all in one ask but as there's going to be some kind of 4/8 split at some point you'd end up passing in "data" then dereferencing (LD X+,etc or similar)anyway. So making a neat division between the 4 and the 8 just seems "cleaner" to me anyway even if it does require a handful more opcodes (effectively "unrolling")

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Is a 2-byte shift and a branch too slow?

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

skeeve, I'm familiar with shifting bits and bytes, but branching (the term) is unfamiliar to me. I'd need to research that. Total asm boob.

 

clawson, I'm going to do some more work with the code I sent based on your feedback before I come back with more questions. I have to understand this better.

 

Jim M., Rank amateur AVR guy.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

What Michael was suggesting is that instead of doing the 4 and 8 separately you DO hold a 16 bit variable (but just 12 bits used) and then instead of a single shift such as:

 "       lsl   %1    \n\t"    //  '1' [04] '0' [04]

you actually make this a double operation where you use one shift on the lower byte that moves its top bit into the carry then another shift on the other half that brings that carry bit into bit 0.

 

LSL already moves the top bit into the carry but the second shift can't be just another LSL as that pulls 0 in to the lowest bit. So the second opcode would have to be ROL which pulls the C bit into bit 0.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thanks. That explanation makes sense. I've still got actual code testing to do and will comeback after I've had a chance to test. I don't want to keep making blind stabs at what I think might work without actual testing.

For economy of data, I think passing the extra 4 bits as a parameter instead of doubling the size of my *data array by going from 8 to 16 bits would be more "efficient". That extra 4 bits isn't going to change on a 12-bit packet basis. Maybe in the future, There might be a desire for per LED current drive adjustment. But I'm just trying to get the darn things working.

 

I really appreciate your help. I'm hesitant to waste your time.

Jim M., Rank amateur AVR guy.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 1

JimmyM wrote:
I'm hesitant to waste your time.
Don't be - wasting time is what Freaks is all about! cheeky

 

(don't know about anyone else but I tend to read/reply when I'm waiting for compilations to finish - as I work on large (not AVR!) projects it's not unusual for an initial compilation of a code tree to take 30 minutes or more).

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

One of the advantages of the 2-byte shift is that there is less to debug.

Only one loop.

No inter-loop timing issues.

 

Code for correctness and maintenance.

When on a bug-hunt, small simple code takes less time

to examine and is more likely to be right in the first place.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

In these situations I try to avoid the counter variable.

So if the data is xxxx dddd:dddd dddd (x = don't care, d = valid data) I would do:

 

1) 16 bit shift 4 and set a flag marking the end of data: dddd dddd:dddd 1000

2) 16 bit shift and send data until low bit is zero dddd ddd1:0000 0000

3) 8 bit shift until high bit is zero, terminate before sending the flag

 

 

edit: Another option, if you are not critical on flash space, is to completely unroll the loop (actually, better do a macro to make it easier) and use the T flag:

 

1) put data bit in T flag with bst instruction

2) put T flag in output register with bld instruction

3) output data

... repeat instruction sequence 12 times for each data bit ...

Last Edited: Tue. Apr 17, 2018 - 11:50 PM