Most efficient way to set a particular bit?

Say var is either 1 or 0 and I want to copy its value to PB3 in DDRB. Also, I don't know or want to change any of the other bits in DDRB.

What's the best way of doing this?

Some of the ways I've come up with are:

1)

if(!(DDRB & (var << PB3)))
    DDRB |= (1 << PB3);

2)

// Clear then set to var
DDRB &= ~(1 << PB3);
DDRB |= (var << PB3);

3)

DDRB = (DDRB&~(1 << PB3))|(var << PB3);

Method 1) uses a conditional and requires reading the current value of DDRB.
Method 2) doesn't need to read DDRB, but it does a redundant masking.
Method 3) reads DDRB and is a bit tricky to understand.

Any ideas?
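One way to compare the three candidates without hardware is to run them against a plain variable standing in for DDRB. The sketch below is a host-side model, not AVR code: `reg` is an ordinary `uint8_t`, `BIT` is assumed to be 3 (the usual value of PB3), and the function names are made up for illustration. It suggests that methods 2 and 3 copy `var` into the bit, while method 1 as written can only set the bit and never clears it.

```c
#include <stdint.h>

#define BIT 3  /* assumed position of PB3 */

/* Method 1 as posted: may set the bit, but has no path that clears it. */
uint8_t method1(uint8_t reg, uint8_t var)
{
    if (!(reg & (var << BIT)))
        reg |= (1 << BIT);
    return reg;
}

/* Method 2: clear the bit, then OR in var (var assumed to be 0 or 1). */
uint8_t method2(uint8_t reg, uint8_t var)
{
    reg &= (uint8_t)~(1 << BIT);
    reg |= (uint8_t)(var << BIT);
    return reg;
}

/* Method 3: the same result as method 2 in a single expression. */
uint8_t method3(uint8_t reg, uint8_t var)
{
    return (uint8_t)((reg & ~(1 << BIT)) | (var << BIT));
}
```

Calling each function with `reg = 0xFF` and `var = 0` is the interesting case: methods 2 and 3 return a value with bit 3 cleared, method 1 leaves it set.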


Assuming that DDRB is in the low I/O address range (0x00 to 0x1F), your compiler can efficiently clear and set a bit in the register with the CBI and SBI instructions, without reading the entire byte. I wouldn't use var in the bit shift operation, because the compiler can only optimize the bit operation if it knows the mask at compile time. I'd use something like:

if (var)
  DDRB |= (1<<PB3);
else
  DDRB &= ~(1<<PB3);

Edit: You may find it instructive to look at the assembly language output of all of these possibilities to get a better understanding of how your compiler handles them.


As an example of the assembly language output of the code I supplied, IAR generates the following code (with comments added by me):

TST     R16  ;; tests the value of var against 0
BREQ    ??L1
SBI     0x17, 0x03 ;; sets bit 3
RJMP    ??L2
??L1:
CBI     0x17, 0x03  ;; clears bit 3
??L2:

To examine which is most efficient, just write them all down, one by one, and look at the disassembly. Fewer instructions == more efficient.


kmr wrote:
As an example of the assembly language output of the code I supplied, IAR generates the following code (with comments added by me):
TST     R16  ;; tests the value of var against 0
BREQ    ??L1
SBI     0x17, 0x03 ;; sets bit 3
RJMP    ??L2
??L1:
CBI     0x17, 0x03  ;; clears bit 3
??L2:

5 instructions seems a bit excessive for a rather simple operation. I'll have to check the assembly code for the other examples, but I thought there'd be an accepted optimal way for doing this.

(I'm using WinAVR BTW)


Remember,
What is written in C is eventually compiled into assembler. kmr posted exactly what I would have done. BTW, is 5 instructions that big a deal?

Jim

If you want a career with a known path - become an undertaker. Dead people don't sue! - Kartman

Why is there a "Highway to Hell" and only a "Stairway to Heaven"? A prediction of the expected traffic load?  - Lee "theusch"

Please Read: Code-of-Conduct

Atmel Studio6.2/AS7, DipTrace, Quartus, MPLAB user


It isn't a big deal for me right now, but I can imagine cases where it might be.

What if you had data in var(8 bits say) that you needed to send serially over PB3 as fast as possible?

If you didn't want to preserve the other port values then you could probably do it a lot faster than if you did.


theduder wrote:
5 instructions seems a bit excessive for a rather simple operation. I'll have to check the assembly code for the other examples
Rather than checking the assembly output of other C constructs, if you are trying to micro-optimize the results, you should first write what would be the optimal assembly, then develop the C code that emits it. As for the 5 assembly instructions, if you check the assembly output of your 3 proposed methods, I believe you'll find that all of them generate more assembly code than my initial suggestion.

But for true micro-optimization the issues are: 1) are you trying to optimize for minimal CPU cycles or minimal size; 2) if you are trying for minimal CPU cycles, do you know whether var will more likely be 0 or 1? If you know that, you can micro-optimize the code to use the fewest CPU cycles for the most common value of var.

As a quick improvement over my initial code:

CBI DDRB,3 ;; clear the bit
TST R16    ;; test var, which is stored in R16
BREQ ??L1  ;; branch to L1 if var is 0
SBI DDRB,3 ;; set bit 3 for non-zero var
??L1:

This is one instruction shorter. In general, it can be faster to initialize a simple variable and then change it if it has the wrong value than to have two branches that each set it to a different value. However, programmers often choose the latter to make it clearer how the variable (or in this case a bit in an I/O register) will be set.

You should find it simple to convert the above assembly language into C. If you have trouble, though, we can help.

Edit: Also, keep in mind that the more optimized code in this example will first clear the DDRB bit and then set it again if var is not zero. This may cause trouble for your application, especially if the I/O register is not a direction register but an output port register: you'd be changing the bit twice, which may cause trouble for whatever that output line is connected to.
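The double write can be made visible on a host machine by logging every value stored to a stand-in register. This is only a sketch under stated assumptions: `fake_reg_t`, `reg_write`, and both `copy_bit_*` functions are hypothetical names, the bit position 3 is taken from the thread, and nothing here touches real hardware.

```c
#include <stdint.h>

#define BIT     3   /* assumed bit position (PB3) */
#define MAX_LOG 8

/* Hypothetical stand-in for an I/O register that records every write. */
typedef struct {
    uint8_t value;
    uint8_t log[MAX_LOG];
    int     writes;
} fake_reg_t;

void reg_write(fake_reg_t *r, uint8_t v)
{
    r->value = v;
    if (r->writes < MAX_LOG)
        r->log[r->writes] = v;
    r->writes++;
}

/* if/else form: exactly one write, the bit goes straight to its final state. */
void copy_bit_branch(fake_reg_t *r, uint8_t var)
{
    if (var)
        reg_write(r, (uint8_t)(r->value | (1 << BIT)));
    else
        reg_write(r, (uint8_t)(r->value & ~(1 << BIT)));
}

/* Preset form: always clears first, then sets again when var is non-zero. */
void copy_bit_preset(fake_reg_t *r, uint8_t var)
{
    reg_write(r, (uint8_t)(r->value & ~(1 << BIT)));
    if (var)
        reg_write(r, (uint8_t)(r->value | (1 << BIT)));
}
```

Starting from a register whose bit 3 is already set and copying a non-zero `var`, the branch form logs a single write while the preset form logs two, the first of which briefly has the bit cleared.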


Quote:
5 instructions seems a bit excessive for a rather simple operation.

How could you do it in much less? Even in assembly I can't think of a way to do it in fewer than 4 instructions:

CBI DDRB, 3
TST R16
BREQ L1
SBI DDRB, 3
L1:

or

IN R17, DDRB
BST R16, 0
BLD R17, 3
OUT DDRB, R17

Regards,
Steve A.

The Board helps those that help themselves.


Quote:

What if you had data in var(8 bits say) that you needed to send serially over PB3 as fast as possible?

You will spend more bytes trying to encode one bit rather than just sending the whole 8.

Jim



I did a couple tests to see which code produces the smallest program size (with -Os):

if(!(PORTB & (x << 3)))
    PORTB |= (1 << 3); 

=> 170 bytes

BF(PORTB).b3 = x; // using a bitfield macro

=> 160 bytes

PORTB &= ~(1 << 3);
PORTB |= (x << 3);

=> 158 bytes

PORTB = (PORTB & ~(1 << 3)) | (x << 3);

=> 158 bytes

if(x)
    PORTB |= (1 << 3);
else
    PORTB &= ~(1 << 3);

=> 154 bytes (6 instructions)

Looks like the LSL instruction spoils the fun since it can only shift one position at a time (and you need 3 for some methods).
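If the variable shift is what costs the extra instructions, one workaround might be to choose between two constant masks rather than shifting `x`. A minimal sketch with hypothetical helper names follows; the two forms agree only when `x` is 0 or 1, and whether a given compiler actually emits shorter code for the second form would need checking in the listing.

```c
#include <stdint.h>

/* Mask built by shifting the variable, as in the shift-based methods. */
uint8_t mask_by_shift(uint8_t x)
{
    return (uint8_t)(x << 3);
}

/* Mask chosen from two compile-time constants, no variable shift needed. */
uint8_t mask_by_select(uint8_t x)
{
    return (uint8_t)(x ? (1u << 3) : 0u);
}
```

Note that the second helper treats any non-zero `x` as 1, so the two are not interchangeable if `x` can hold other values.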

Last Edited: Fri. Dec 28, 2007 - 09:56 PM

jgmdesign wrote:
Quote:

What if you had data in var(8 bits say) that you needed to send serially over PB3 as fast as possible?
You will spend more bytes trying to encode one bit rather than just sending the whole 8.
I wasn't sure about the point of this question, either, Jim. Was he asking for an efficient serial algorithm, would he be happy with your parallel approach, or was it in reference to understanding how best to solve his original question?


kmr wrote:
jgmdesign wrote:
Quote:

What if you had data in var(8 bits say) that you needed to send serially over PB3 as fast as possible?
You will spend more bytes trying to encode one bit rather than just sending the whole 8.
I wasn't sure about the point of this question, either, Jim. Was he asking for an efficient serial algorithm, would he be happy with your parallel approach, or was it in reference to understanding how best to solve his original question?

It was just an example, so yes, it excludes sending 8 bits in parallel.

Incidentally, if I was asking for an efficient serial algorithm, could one send 8 bits in fewer than ~8*4 cycles?


Assuming you don't mind perhaps flipping the pin more often than necessary, you can try the C version of the further optimization I gave you in assembly.

  PORTB |= (1<<PB3);
  if (x) PORTB &~ ~(1<<PB3);

Steve's second code example was smart (using BST and BLD), but I doubt the C compiler would be smart enough to generate that optimized code.


kmr wrote:
Assuming you don't mind perhaps flipping the pin more often than necessary, you can try the C version of the further optimization I gave you in assembly.
  PORTB |= (1<<PB3);
  if (x) PORTB &~ ~(1<<PB3);

Steve's second code example was smart (using BST and BLD), but I doubt the C compiler would be smart enough to generate that optimized code.

Nice job, that gives me a code size of
=> 150 bytes (4 instructions)

Yeah, Steve's code seems optimal and it's pretty straightforward. Is there anything that can be done to make GCC recognize that optimization (perhaps a patch or something)?

EDIT:
I think you meant

PORTB &= ~(1<<PB3)

theduder wrote:
Yeah, Steve's code seems optimal and it's pretty straightforward. Is there anything that can be done to make GCC recognize that optimization (perhaps a patch or something)?
Maybe, but nothing that I'm currently aware of. Some of the AVR-GCC hackers may be able to answer that question.
Quote:

I think you meant
PORTB &= ~(1<<PB3)

The code had an additional error. What I meant was:
  PORTB &= ~(1<<PB3);
  if (x) PORTB |= (1<<PB3);

Assuming that "var" changes between 1 and 0, or if you use another bit in "var" to decide how to deal with DDRB, you can also do it like this:

sbrs   R16,0   ;skip next if bit 0 is set
cbi   DDRB,3   ;else clear bit 3
sbrc   R16,0   ;skip next if bit 0 is clear
sbi   DDRB,3   ;else set bit 3

If you can accept the earlier suggested method of "presetting" DDRB, you can do it in 3 instructions:

cbi   DDRB,3   ;clear bit 3
sbrc   R16,0   ;skip next if bit 0 is clear
sbi   DDRB,3   ;else set bit 3

Nice assembly work, Lennart! It shows my limitation of learning AVR assembly from reading compiler output. Compilers don't often choose such optimized output. Mostly I see sbrc/sbrs as being followed by an rjmp.


Quote:
Mostly I see sbrc/sbrs as being followed by an rjmp.

Yes, this can provide a way to reach a destination that is too distant for BREQ and similar instructions.
I like SBRS and SBRC because they give the code a good flow; you don't need to clutter it with a lot of labels like "L1" and "L2".
The fact that any instruction can be skipped makes interesting scenarios possible. Also, in the example I gave, the number of cycles will be equal in both cases, as opposed to conditional branching (as in the code generated by your compiler). Sometimes this is very important if you have some sort of time-critical code.
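The timing claim can be checked by adding up cycle counts per path. The figures below are assumptions based on the classic AVR instruction timings (SBI/CBI 2 cycles, RJMP 2, TST 1, BREQ 1 when not taken or 2 when taken, SBRC/SBRS 1 without a skip or 2 when skipping a one-word instruction); other cores can differ, so treat this as a back-of-the-envelope sketch with made-up function names rather than a measurement.

```c
/* Assumed classic-AVR cycle counts; check the instruction set manual for your core. */
enum {
    CYC_TST = 1, CYC_SBI = 2, CYC_CBI = 2, CYC_RJMP = 2,
    CYC_BREQ_TAKEN = 2, CYC_BREQ_NOT_TAKEN = 1,
    CYC_SKIP_TAKEN = 2, CYC_SKIP_NOT_TAKEN = 1
};

/* TST / BREQ / SBI / RJMP / CBI sequence from the compiler output. */
int cycles_branch(int var)
{
    if (var)
        return CYC_TST + CYC_BREQ_NOT_TAKEN + CYC_SBI + CYC_RJMP;
    return CYC_TST + CYC_BREQ_TAKEN + CYC_CBI;
}

/* SBRS / CBI / SBRC / SBI sequence. */
int cycles_skip(int bit0)
{
    if (bit0)   /* SBRS skips CBI, SBRC falls through to SBI */
        return CYC_SKIP_TAKEN + CYC_SKIP_NOT_TAKEN + CYC_SBI;
    /* SBRS falls through to CBI, SBRC skips SBI */
    return CYC_SKIP_NOT_TAKEN + CYC_CBI + CYC_SKIP_TAKEN;
}
```

Under these assumptions the branching sequence takes a different number of cycles depending on var, while the skip-based sequence comes out the same on both paths.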


Good point about the time constancy.


Quote:

What if you had data in var(8 bits say) that you needed to send serially over PB3 as fast as possible?

Interesting discussion, all-in-all. The AVR instruction set just doesn't lend itself nicely to certain things. If you think this is ugly then try toggling a bit sometime. It is so ugly that Atmel gave us a new construct in new AVRs.

As with the toggle, I believe that you will find the if-else to end up the best. A non-branch solution (using T maybe?) may end up to be longer 'cause of the I/O register destination.

Judging from the question being posed using DDRB, is this an externally-pulled-up link like I2C? If so, then 1) Use I2C lol; 2) Look at bit-banged I2C implementations. Usually these are low-speed anyway, so a microsecond per bit ain't a big deal.

Now, let's say you REALLY want to do this in anger. Your restriction on the rest of DDRB is too tight--if it is your app, of COURSE you will know what is going on there, and most apps never change DDRx after initial setup. So you want fast? YOU CAN'T HANDLE FAST! Oops, sorry, wrong movie. Keep DDRB shadow in a register. Pick your bit from the source register with BST. Place into the shadow with BLD. OUT the register.

Now, this is still 3 cycles. And you still have to shift the source register for the next bit, although you could unroll the loop for a byte (gasp! 24 or a few more instructions! The horror!).
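In C terms, the shadow-register idea might look roughly like the sketch below. It is a host-side model only: `fake_port_t` and `send_byte` are hypothetical names, the output bit is assumed to be 3, MSB-first order is an arbitrary choice, and on a real AVR the store into `trace` would be a write to DDRB or PORTB. There is no guarantee that a compiler turns this into the BST/BLD/OUT sequence.

```c
#include <stdint.h>

#define BIT 3  /* assumed output bit (PB3) */

/* Hypothetical stand-in for the port: records the value written per bit. */
typedef struct {
    uint8_t trace[8];
    int     count;
} fake_port_t;

/* Shift out `data` MSB first on BIT, keeping all other bits from `shadow`.
 * Returns the updated shadow so the caller can keep it for the next byte. */
uint8_t send_byte(fake_port_t *port, uint8_t shadow, uint8_t data)
{
    port->count = 0;
    for (int i = 7; i >= 0; i--) {
        uint8_t bit = (uint8_t)((data >> i) & 1u);      /* pick the bit (like BST)   */
        shadow = (uint8_t)((shadow & ~(1u << BIT))      /* place it in the shadow    */
                           | (unsigned)(bit << BIT));   /* copy (like BLD)           */
        port->trace[port->count++] = shadow;            /* write whole byte (like OUT) */
    }
    return shadow;
}
```

Because the whole byte is written from the shadow copy, the port itself is never read back, which is what makes the per-bit cost so low in the assembly version.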

Anyway, dude, if you can come up with a faster machine code sequence we'd like to hear about it, and see if we can get our compilers to generate it.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


theusch wrote:
...Anyway, dude...
Good points, Lee, especially the shadowing and loop unrolling. I was guessing the OP's name to be "Mr. Duder". But, perhaps he goes by "dude" informally.