In my constant effort to speed things up and save memory, I ran into a simple but uncomfortable situation.
I needed the ABS value between two 16-bit numbers, already sitting in four low AVR registers, and the easy, automatic way was like this:
R7:R6 = first 16-bit number (high:low)
R9:R8 = second 16-bit number (high:low)
R1:R0 = resulting ABSolute value (high:low)
Just a reminder for those a little lost about the ABS value: it means the "positive result" of the difference between the two numbers. For example, the ABS between 8 and 5 is 3, and the ABS between 5 and 8 is also 3. Subtracting 8-5 results in 3, but 5-8 results in -3; removing the sign leaves a pure 3. The problem is that when the AVR subtracts 5-8 (an 8-bit operation) it gets 0xFD (-3) with the cy bit on, and the ABS needs to be 0x03, so a simple NEG operation does the trick. For 16 bits it is a little trickier.
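To make the 8-bit case concrete, here is a minimal sketch in C that models what SUB and NEG do to a single byte. The function name `abs_diff8` is made up for illustration; it is a model of the arithmetic, not AVR code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 8-bit case: SUB sets the carry on a borrow, NEG repairs the result. */
static uint8_t abs_diff8(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a - b);   /* SUB: 5 - 8 leaves 0xFD in the register */
    int carry = a < b;              /* SUB sets C when a borrow was needed    */
    if (carry)
        r = (uint8_t)(0u - r);      /* NEG: 0x00 - 0xFD = 0x03                */
    return r;
}
```

Calling `abs_diff8(5, 8)` and `abs_diff8(8, 5)` both give 3, which is the behaviour described above.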
Okay, at first I wrote:
Mov R0,R6 Mov R1,R7
Sub R0,R8 Sbc R1,R9 Brcc PC+5
Mov R0,R8 Mov R1,R9
Sub R0,R6 Sbc R1,R7
and at this point I'd have the ABS value in R0:R1.
Oh, yeah! You've never seen more than one assembly instruction on the same line? Hmmm. Let me tell you, it works, and in most cases it makes the flow easier to understand.
See, the first line (Mov R0,R6 Mov R1,R7) holds two instructions that together move 16 bits. They work together, they are a pack, so why not group them on the same line?
The second line does the subtraction, and the Brcc at the end decides, based on the carry bit, whether we already have the ABS value or not. Whether it jumps over the next 4 instructions depends on the result of the subtraction on that same line. It is another pack tied together, so why not make it another group of operation and decision on just one line? I don't know how you see it, but it is easier for me.
Okay, the 4 lines of code above just subtract one pair of registers from the other. If the result comes with the cy bit on, that means a borrow, a negative number, so not the ABS value yet. Then the subtraction is done again with the values reversed, and now the result is the ABSolute value.
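The flow of those four lines can be sketched in C as a byte-by-byte model. This assumes the low byte is subtracted first with SUB and the high byte second with SBC; `sub8` and `abs_diff16_twice` are hypothetical helper names, not anything from the AVR toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* SUB / SBC model: *r = a - b - cin, returns the carry (borrow) out. */
static int sub8(uint8_t *r, uint8_t a, uint8_t b, int cin)
{
    int full = (int)a - (int)b - cin;
    *r = (uint8_t)full;
    return full < 0;
}

/* Subtract, and if the borrow is set, subtract again the other way round. */
static uint16_t abs_diff16_twice(uint16_t a, uint16_t b)
{
    uint8_t lo, hi;
    int c = sub8(&lo, (uint8_t)(a & 0xFF), (uint8_t)(b & 0xFF), 0); /* Sub, low bytes  */
    c = sub8(&hi, (uint8_t)(a >> 8), (uint8_t)(b >> 8), c);         /* Sbc, high bytes */
    if (c) {                                 /* Brcc not taken: result was negative */
        c = sub8(&lo, (uint8_t)(b & 0xFF), (uint8_t)(a & 0xFF), 0);
        (void)sub8(&hi, (uint8_t)(b >> 8), (uint8_t)(a >> 8), c);
    }
    return (uint16_t)((hi << 8) | lo);
}
```

The second subtraction throws away everything the first one computed, which is the part that kept bothering me.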
Well, that pestered me for days, and finally I was able to dedicate a few minutes to making that decision smarter.
Okay, if the result is negative (cy=1), it would just be a matter of subtracting that negative number from zero, right? Yes. Right. Riiiiight. It would end up in worse, uglier code.
Mov R0,R6 Mov R1,R7
Sub R0,R8 Sbc R1,R9 Brcc PC+11
Push R8 Push R9
Clr R8 Clr R9
Sub R8,R0 Sbc R9,R1
Mov R0,R8 Mov R1,R9
Pop R9 Pop R8
But wait, there are the NEG and COM instructions, right? Riiiiight, again!
I made a few attempts and, to make sure, I ran some simulations with R6, R7, R8 and R9 counting together from 0x00000000 to 0xFFFFFFFF.
AVR Studio is a very fat old lady desperately in need of cutting donuts and carbohydrates when it comes to simulation speed. It would take a few days to complete the almost 4.3 billion loops in the test.
Solution? Dump it into an AVR chip and test. The ATmega128 running at 16MHz takes only 3 hours!!!!
And no! The devil lives in the little and innocent details, mostly around 0x01 and 0xFF, so I needed to turn over every single rock in the way.
To my despair, I couldn't find an easier solution, but I found the name of what I was looking for: 16 bits Negate. Easier. Googhelp!!
Found Atmel's suggestion for "16 bits Negate" in application document Doc0937 http://www.atmel.com/Images/doc0937.pdf as
Com xl Com xh
Subi xl,$ff Sbci xh,$ff
So, my little piece of code would end up as:
Mov R0,R6 Mov R1,R7
Sub R0,R8 Sbc R1,R9 Brcc PC+5
Com R0 Com R1
Subi R0,$FF Sbci R1,$FF
Whaaaat? It seems life is miserable. The problem is that Subi and Sbci don't work with the lower registers (R0 to R15).
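For what it's worth, the Atmel sequence is two's complement done the long way: invert every bit, then add one (subtracting $FF from a byte is the same as adding 1 to it, modulo 256). A rough C model, with made-up names `sub_imm8`, `neg16_com` and `neg16_com_mismatches`, checks it against plain 16-bit negation for every input:

```c
#include <assert.h>
#include <stdint.h>

/* SUBI / SBCI model: *r = *r - k - cin, returns the carry (borrow) out. */
static int sub_imm8(uint8_t *r, uint8_t k, int cin)
{
    int full = (int)*r - (int)k - cin;
    *r = (uint8_t)full;
    return full < 0;
}

/* Com xl, Com xh, Subi xl,$FF, Sbci xh,$FF */
static uint16_t neg16_com(uint16_t x)
{
    uint8_t lo = (uint8_t)~(x & 0xFF);   /* Com xl */
    uint8_t hi = (uint8_t)~(x >> 8);     /* Com xh */
    int c = sub_imm8(&lo, 0xFF, 0);      /* Subi xl,$FF : low byte + 1          */
    (void)sub_imm8(&hi, 0xFF, c);        /* Sbci xh,$FF : high byte + 1 - carry */
    return (uint16_t)((hi << 8) | lo);
}

/* Count the inputs where the model disagrees with 0 - x. */
static long neg16_com_mismatches(void)
{
    long bad = 0;
    for (uint32_t x = 0; x <= 0xFFFF; x++)
        if (neg16_com((uint16_t)x) != (uint16_t)(0u - x))
            bad++;
    return bad;
}
```

The model says nothing about which registers the real instructions accept, and that restriction is exactly the catch.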
Then, after a few more minutes (I wish) of thinking, I came up with:
Neg R1 Neg R0
Brcc PC+2 Dec R1
So, my final code would be:
Mov R0,R6 Mov R1,R7
Sub R0,R8 Sbc R1,R9 Brcc PC+5
Neg R1 Neg R0
Brcc PC+2 Dec R1
Wow. Same instruction count as before, same CPU cycles, no gain at all. Well, but it's different :)
The advantage over Atmel's suggestion is that it runs on all 32 registers.
But no, I was not happy with that Brcc PC+2: a branch costs 1 cycle when it falls through and 2 cycles when it is taken, so the timing has to be counted path by path. Humpf!
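The NEG-based negate can be modelled in C the same way. This sketch assumes the low byte is negated last, so that the carry tested by the Brcc belongs to it, and that the borrow is then taken from the high byte; `neg8`, `neg16_neg` and `neg16_neg_mismatches` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* NEG model: *r = 0 - *r, returns C, which NEG sets unless the result is 0x00. */
static int neg8(uint8_t *r)
{
    *r = (uint8_t)(0u - *r);
    return *r != 0;
}

/* Neg high, Neg low, then borrow one from the high byte if the low Neg set C. */
static uint16_t neg16_neg(uint16_t x)
{
    uint8_t lo = (uint8_t)(x & 0xFF);
    uint8_t hi = (uint8_t)(x >> 8);
    (void)neg8(&hi);            /* Neg high byte                      */
    int c = neg8(&lo);          /* Neg low byte, C = (low byte != 0)  */
    if (c)                      /* Brcc PC+2 skips the Dec when C = 0 */
        hi = (uint8_t)(hi - 1); /* Dec high byte                      */
    return (uint16_t)((hi << 8) | lo);
}

/* Count the inputs where the model disagrees with 0 - x. */
static long neg16_neg_mismatches(void)
{
    long bad = 0;
    for (uint32_t x = 0; x <= 0xFFFF; x++)
        if (neg16_neg((uint16_t)x) != (uint16_t)(0u - x))
            bad++;
    return bad;
}
```

The interesting inputs are the ones with a zero low byte, such as 0x0100, where the carry stays clear and the Dec must be skipped.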
See, the only subtract instructions the AVR has for the lower registers are SUB and SBC, and they need an extra register as the second operand. The "Brcc PC+2, Dec" pair could be done by a single Sbci Rd,0x00, but that instruction doesn't exist for the lower registers.
The solution comes from a trick. Subtract any other register from yours and add it back again, and the net change is zero, right? RIGHT! Do the subtraction with carry and you are a winner: only the carry stays subtracted.
Sbc R20,R31
Add R20,R31 ; is the same as Sbci R20,0x00
; since you subtracted and added R31
This always takes the same 2 CPU cycles, with no branch, and works for the lower and the higher registers.
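The trick is easy to check exhaustively on a PC. The following C sketch (function names are mine, purely illustrative) models Sbc followed by Add on one byte and compares the register value with "Ra minus carry" for every Ra, every Rb and both carry states. It only checks the value left in the register, not the status flags, which come from the final Add.

```c
#include <assert.h>
#include <stdint.h>

/* Sbc Ra,Rb followed by Add Ra,Rb, modelled on plain bytes. */
static uint8_t sbc_then_add(uint8_t ra, uint8_t rb, int carry)
{
    ra = (uint8_t)(ra - rb - carry);   /* Sbc Ra,Rb */
    ra = (uint8_t)(ra + rb);           /* Add Ra,Rb */
    return ra;
}

/* Count the cases where the pair differs from Ra - carry. */
static long sbc_add_mismatches(void)
{
    long bad = 0;
    for (int ra = 0; ra < 256; ra++)
        for (int rb = 0; rb < 256; rb++)
            for (int c = 0; c < 2; c++)
                if (sbc_then_add((uint8_t)ra, (uint8_t)rb, c) != (uint8_t)(ra - c))
                    bad++;
    return bad;
}
```

Note that the model passes Ra and Rb as separate values, so it does not cover the forbidden case where both operands are the same register.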
So, my code ended up as:
Mov R0,R6 Mov R1,R7
Sub R0,R8 Sbc R1,R9 Brcc PC+5
Neg R1 Neg R0
Sbc R1,R31 Add R1,R31
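Putting the pieces together, here is a hedged C model of the whole routine: subtract, and on a borrow negate the 16-bit result, taking the borrow from the high byte with the Sbc/Add pair. All names are invented for this sketch, and `scratch` stands in for whatever happens to be in the extra register. It is only a spot check around the troublesome values, not the full 4.3 billion loop test.

```c
#include <assert.h>
#include <stdint.h>

/* SUB / SBC model: *r = a - b - cin, returns the carry (borrow) out. */
static int sub8(uint8_t *r, uint8_t a, uint8_t b, int cin)
{
    int full = (int)a - (int)b - cin;
    *r = (uint8_t)full;
    return full < 0;
}

/* NEG model: *r = 0 - *r, returns C, which NEG sets unless the result is 0x00. */
static int neg8(uint8_t *r)
{
    *r = (uint8_t)(0u - *r);
    return *r != 0;
}

static uint16_t abs_diff16(uint16_t a, uint16_t b, uint8_t scratch)
{
    uint8_t lo, hi;
    int c = sub8(&lo, (uint8_t)(a & 0xFF), (uint8_t)(b & 0xFF), 0); /* Sub */
    c = sub8(&hi, (uint8_t)(a >> 8), (uint8_t)(b >> 8), c);         /* Sbc */
    if (c) {                               /* Brcc PC+5 not taken        */
        (void)neg8(&hi);                   /* Neg high byte              */
        c = neg8(&lo);                     /* Neg low byte               */
        (void)sub8(&hi, hi, scratch, c);   /* Sbc high byte with scratch */
        hi = (uint8_t)(hi + scratch);      /* Add scratch back           */
    }
    return (uint16_t)((hi << 8) | lo);
}

/* Compare against a plain C reference on edge values near 0x00, 0x01 and 0xFF. */
static long abs_diff16_mismatches(void)
{
    static const uint16_t edge[] = { 0x0000, 0x0001, 0x0002, 0x00FE, 0x00FF, 0x0100,
                                     0x0101, 0x7FFF, 0x8000, 0x8001, 0xFFFE, 0xFFFF };
    static const uint8_t scratch[] = { 0x00, 0x01, 0xFF };
    const int n = (int)(sizeof edge / sizeof edge[0]);
    long bad = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int s = 0; s < 3; s++) {
                uint16_t a = edge[i], b = edge[j];
                uint16_t want = (uint16_t)(a > b ? a - b : b - a);
                if (abs_diff16(a, b, scratch[s]) != want)
                    bad++;
            }
    return bad;
}
```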
But wait, there are still four instructions after the Brcc PC+5, exactly how I started. But I found the tricky thing:
SBC Ra, Rb
ADD Ra, Rb
where Rb can be any register except Ra (and praying it will not be modified by an interrupt between the two instructions), is the same as
SBCI Ra, 0x00
for all 32 registers.
We don't always gain, but we learn.