Summer is burning here in Florida. Lots of rain, and electrical storms.
Some applications need to rotate a block of data bits (usually 8 bytes, 64 bits) by 90 degrees.
One example is driving eight 7-segment LED displays with the MAX7219 LED driver chip when the displays are common anode, instead of the common cathode type the chip was designed for.
With common cathode, the MAX7219 scans cathode by cathode (pulling each to 0 V) while changing the levels on the row pins that drive the segments. Easy.
With common anode, the story changes completely. The MAX7219 will always scan its "cathode" pins to ground, so you need to connect the segment pins of the common anode displays to the MAX7219 cathode pins.
This way, the MAX scans segments instead of cathodes. The same segment of all displays is selected at once, and the MAX then has to send its "segment" rows, which are now connected to the displays' anode pins.
It works upside down. The problem is that the bit pattern needed to light the segments is completely transposed.
The MAX doesn't know that. It thinks it has selected the common cathode of display #1, when in reality it has selected segment "A" of all displays. Now it is your turn to choose which displays shall light that segment, and that choice goes to the MAX as its "segments", which in reality drive the anodes of the displays. Instead of scanning display by display, with each display lighting exactly its ON segments, with this transposition all displays that need segment "A" lit will light it at the same time.
I have done this in the past, at an extra cost in clock cycles; the AVR gets almost red hot... ;)
Here is the trick:
Suppose you have this sequence of bytes: 0xF1, 0x39, 0x27, 0x0E, 0x7B, 0x16, 0x4B, 0x0F.
Let me print them side by side as vertical columns, MSB on top; the leftmost column is 0xF1.
1 0 0 0 0 0 0 0
1 0 0 0 1 0 1 0
1 1 1 0 1 0 0 0
1 1 0 0 1 1 0 0
0 1 0 1 1 0 1 1
0 0 1 1 0 1 0 1
0 0 1 1 1 1 1 1
1 1 1 0 1 0 1 1
How nice it would be to have a uC with a special register where you pump in 8 bytes, then read back 8 bytes rotated by 90 degrees...
The trick here is to read the 8 bytes at 90 degrees, horizontally, row by row from the top, resulting in: 0x80, 0x8A, 0xE8, 0xCC, 0x5B, 0x35, 0x3F, 0xEB.
The fastest way I have found to do that is:
Suppose the 8 input bytes, 0xF1, 0x39, 0x27, 0x0E, 0x7B, 0x16, 0x4B, 0x0F, are in registers R1-R8, and the resulting transposed bytes will end up in R11-R18.
    LSL R1      ; MSB of R1 goes into carry
    ROL R11     ; carry enters bit 0 of R11
    LSL R1
    ROL R12
    LSL R1
    ROL R13
    LSL R1
    ROL R14
    LSL R1
    ROL R15
    LSL R1
    ROL R16
    LSL R1
    ROL R17
    LSL R1
    ROL R18
The above sequence works on R1; it must be repeated 7 more times, for R2-R8.
That is a total of 128 lines of code (256 bytes of flash) and 128 clock cycles; an AVR running at 8 MHz will take 16 microseconds to do it.
In some cases, that is too much.
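To check the expected bytes on a PC before burning anything, the LSL/ROL sequence can be modeled in C. This is only a sketch under my own assumptions: the function name rotate8x8 is a placeholder, in[0..7] stands in for R1-R8 and out[0..7] for R11-R18.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder model of the unrolled sequence (not AVR code).
   in[0..7] stand for R1-R8, out[0..7] for R11-R18. */
void rotate8x8(uint8_t in[8], uint8_t out[8])
{
    assert(in != NULL && out != NULL);

    for (int src = 0; src < 8; src++) {        /* one 16-instruction group per input register */
        for (int dst = 0; dst < 8; dst++) {
            unsigned carry = in[src] >> 7;     /* LSL: MSB of the input goes to carry */
            in[src] = (uint8_t)(in[src] << 1);
            /* ROL: carry enters bit 0, the old bit 7 is pushed out */
            out[dst] = (uint8_t)((out[dst] << 1) | carry);
        }
    }
}
```

Fed with the example bytes, it gives the rows of the matrix above read left to right (0x80 for the top row, 0xEB for the bottom row). As in the assembly, the input ends up shifted empty, and whatever was in the output before is pushed out after 8 rotations, so the outputs do not need to be cleared first.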
Indirect addressing through the X, Y and Z registers can reach the regular registers R0-R31 (on the classic AVRs the register file is mapped into the data address space), but it takes more cycles, and unfortunately there is no LSL or ROL with indirect addressing.
So, to make things shorter, one could do something like this:
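    CLR ZH
    LDI ZL,1        ; Z points at R1
    LDI R18,1       ; marker bit for the loop exit
LP1: LD R0,Z        ; fetch the next input byte
    INC ZL
    LSL R0
    ROL R11
    LSL R0
    ROL R12
    LSL R0
    ROL R13
    LSL R0
    ROL R14
    LSL R0
    ROL R15
    LSL R0
    ROL R16
    LSL R0
    ROL R17
    LSL R0
    ROL R18         ; on the 8th pass the marker bit lands in carry
    BRCC LP1
    RET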
The trick in the code above: R18 starts at 0x01, and on the 8th pass through the loop the ROL R18 at the bottom of LP1 pushes this "on" bit out into the carry flag, so the BRCC falls through to the RET.
The loop version uses fewer flash bytes, at the cost of extra clock cycles for the RCALL and RET, for the LD R0,Z that takes 2 cycles, and for the loop control instructions.
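The marker bit in R18 doing double duty as loop counter can be modeled in C as well. Again only a sketch: rotate8x8_sentinel is a placeholder name, in[] stands in for R1-R8, out[] for R11-R18, and out[7] plays the role of R18.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder model of the LP1 loop (not AVR code).
   Returns how many passes ran before the marker bit reached carry. */
int rotate8x8_sentinel(const uint8_t in[8], uint8_t out[8])
{
    assert(in != NULL && out != NULL);

    int passes = 0;
    unsigned carry = 0;

    out[7] = 0x01;                          /* LDI R18,1 */
    do {
        uint8_t r0 = in[passes++];          /* LD R0,Z then INC ZL */
        for (int dst = 0; dst < 8; dst++) {
            unsigned bit = r0 >> 7;         /* LSL R0: MSB goes to carry */
            r0 = (uint8_t)(r0 << 1);
            carry = out[dst] >> 7;          /* ROL: old bit 7 leaves through carry */
            out[dst] = (uint8_t)((out[dst] << 1) | bit);
        }
        /* only the carry from the last ROL (out[7], i.e. R18) survives */
    } while (carry == 0);                   /* BRCC LP1 */

    return passes;
}
```

With the example bytes it runs exactly 8 passes and leaves the same result as the unrolled version, without a separate counter register. Since the input is copied into a scratch byte first, the source bytes survive, which matches the LD R0,Z in the assembly.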
Below, a much shorter version (in flash bytes) uses two indirect pointers, XH:XL and ZH:ZL, to address the registers. It is small, but the 64 passes through LP2 are hungry for clock cycles.
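LP0: CLR ZH
    LDI ZL,1        ; Z points at the input, R1-R8
    CLR XH
LP1: LD R0,Z        ; fetch the next input byte
    LDI XL,11       ; X back to R11 for every input byte
LP2: LD R10,X       ; R10 is scratch for the output byte
    LSL R0
    ROL R10
    ST X,R10
    INC XL
    CPI XL,19
    BRCS LP2        ; loop while XL < 19 (R11-R18)
    INC ZL
    CPI ZL,9
    BRCS LP1        ; loop while ZL < 9 (R1-R8)
    RET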
Does anybody have other ideas? Other than using external parallel/serial logic registers... :)
Another nasty procedure is reversing the bits in a byte... ABCD EFGH becoming HGFE DCBA... uhggg!! I hate bitbanging...
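For the record, one common way to do the ABCD EFGH to HGFE DCBA flip is three swap steps: nibbles, then bit pairs, then neighbouring bits. A minimal C sketch, with reverse_bits as a placeholder name of my own (this is a generic technique, not something specific to the AVR):

```c
#include <stdint.h>

/* Placeholder helper: reverse the bit order of one byte,
   ABCD EFGH -> HGFE DCBA, in three swap steps. */
uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)((b >> 4) | (b << 4));                    /* swap nibbles:    EFGH ABCD */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));  /* swap bit pairs:  GHEF CDAB */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));  /* swap neighbours: HGFE DCBA */
    return b;
}
```

For example, 0xF1 (1111 0001) comes back as 0x8F (1000 1111). On the AVR the first step maps to the SWAP instruction; the other two still cost a few ANDI/shift/OR instructions each.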
Orlando Florida USA