Now, before anyone claims this is a setup, you can check the Uzebox forums: this code was posted on 27th Jan, two days before I first posted in the Lazarus troll thread.
Also, I HAVE asked here several times before for help writing better C so that the compiler knows how to optimize what I am doing, so this is in keeping with what I have asked in the past.
The troll thread DID prompt me to ask this question, though. I am curious whether GCC can get close, and whether the way I am writing the expression is suboptimal.
Again, I do write C code. I like to write C code. I was formally taught to write C code in a university environment. All that said, I know I am a pretty average C coder.
void SetRamTile(uint8_t x, uint8_t y, uint8_t tile)
{
    x = x & 0x1f;
    y = y & 0x1f;
    vram[((y>>3)*256)+x<<3+(y&7)] = tile;
}
becomes
226:      {
+000060BB:   716F   ANDI   R22,0x1F      Logical AND with immediate
229:          vram[((y>>3)*256)+x<<3+(y&7)] = tile;
+000060BC:   2FF6   MOV    R31,R22       Copy register
+000060BD:   95F6   LSR    R31           Logical shift right
+000060BE:   95F6   LSR    R31           Logical shift right
+000060BF:   95F6   LSR    R31           Logical shift right
+000060C0:   E0E0   LDI    R30,0x00      Load immediate
+000060C1:   718F   ANDI   R24,0x1F      Logical AND with immediate
+000060C2:   0FE8   ADD    R30,R24       Add without carry
+000060C3:   1DF1   ADC    R31,R1        Add with carry
+000060C4:   E070   LDI    R23,0x00      Load immediate
+000060C5:   7067   ANDI   R22,0x07      Logical AND with immediate
+000060C6:   7070   ANDI   R23,0x00      Logical AND with immediate
+000060C7:   5F6D   SUBI   R22,0xFD      Subtract immediate
+000060C8:   4F7F   SBCI   R23,0xFF      Subtract immediate with carry
+000060C9:   C002   RJMP   PC+0x0003     Relative jump
+000060CA:   0FEE   LSL    R30           Logical Shift Left
+000060CB:   1FFF   ROL    R31           Rotate Left Through Carry
+000060CC:   956A   DEC    R22           Decrement
+000060CD:   F7E2   BRPL   PC-0x03       Branch if plus
+000060CE:   5EE0   SUBI   R30,0xE0      Subtract immediate
+000060CF:   4FFE   SBCI   R31,0xFE      Subtract immediate with carry
+000060D0:   8340   STD    Z+0,R20       Store indirect with displacement
230:      }
+000060D1:   9508   RET                  Subroutine return
Now I have tried replacing the
(y>>3)*256
part with several other variations, including a struct/union in which two uint8_t members share storage with a uint16_t. None of them ended up better, only different.
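To make the struct/union idea concrete, here is a minimal sketch of that kind of variation. The names (index16_t, tile_index_union, SetRamTileUnion) and the 1024-byte vram array are made up for illustration, and the member order assumes a little-endian target such as AVR. It builds the index of the original formula one byte at a time instead of multiplying by 256.

```c
#include <stdint.h>

/* Hypothetical stand-in for the real video RAM: 4 pages of 256 bytes. */
static uint8_t vram[4 * 256];

/* Two bytes overlaid on a 16-bit index. The member order assumes a
   little-endian target (true for AVR), so 'lo' is the low byte. */
typedef union {
    uint16_t word;
    struct {
        uint8_t lo;
        uint8_t hi;
    } byte;
} index16_t;

/* Builds ((y>>3)*256) + 8*x + (y&7) without a 16-bit multiply or shift. */
static uint16_t tile_index_union(uint8_t x, uint8_t y)
{
    index16_t idx;

    x &= 0x1f;
    y &= 0x1f;
    idx.byte.hi = (uint8_t)(y >> 3);              /* page: 0..3            */
    idx.byte.lo = (uint8_t)((x << 3) | (y & 7));  /* 8*x and y&7 never overlap */
    return idx.word;
}

void SetRamTileUnion(uint8_t x, uint8_t y, uint8_t tile)
{
    vram[tile_index_union(x, y)] = tile;
}
```

Whether avr-gcc turns this into anything tighter is exactly the open question; the sketch only pins down what is meant by the union variant.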
In fact the original version was
vram[((y>>3)*256)+8*x+(y&7)]
where I changed (8*x) to (x<<3), and that did yield an improvement.
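One thing worth checking about that change: in C, + binds tighter than <<, so without extra parentheses ((y>>3)*256)+x<<3+(y&7) is grouped as (((y>>3)*256)+x) << (3+(y&7)). That is a different index from the 8*x form, and the variable shift count matches the LSL/ROL/DEC/BRPL loop in the listing. A small sketch to compare the forms on a PC follows; the function names are made up for illustration, and index_as_parsed spells out the grouping explicitly rather than relying on precedence.

```c
#include <stdint.h>

/* The expression as the compiler groups it when written as
   ((y>>3)*256)+x<<3+(y&7): everything left of << is shifted
   by 3+(y&7). 32-bit maths is used so nothing is truncated. */
static uint32_t index_as_parsed(uint8_t x, uint8_t y)
{
    x &= 0x1f;
    y &= 0x1f;
    return ((uint32_t)((y >> 3) * 256) + x) << (3 + (y & 7));
}

/* The same expression with the shift parenthesised. */
static uint32_t index_parenthesised(uint8_t x, uint8_t y)
{
    x &= 0x1f;
    y &= 0x1f;
    return (uint32_t)(((y >> 3) * 256) + (x << 3) + (y & 7));
}

/* The original multiply form. */
static uint32_t index_multiply(uint8_t x, uint8_t y)
{
    x &= 0x1f;
    y &= 0x1f;
    return (uint32_t)(((y >> 3) * 256) + 8 * x + (y & 7));
}
```

For x = 1, y = 1 the parenthesised and multiply forms both give 9, while the grouping the compiler actually uses gives 16, so any comparison of generated code should use the parenthesised shift.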
Anyway, the hand-written ASM that computes this index is
;***********************************
; SET TILE 8bit mode
; C-callable
; r24=X pos (8 bit)
; r22=Y pos (8 bit)
; r20=Tile No (8 bit)
;************************************
.section .text.SetTile
SetTile:
#if SCROLLING == 1
    ;index formula is vram[((y>>3)*256)+8x+(y&7)]
    ;                 r22              r24              XH               XL               T
    ;                 y7y6y5y4y3y2y1y0 x7x6x5x4x3x2x1x0 - - - - - - - -  - - - - - - - -
    lsl r24         ; y7y6y5y4y3y2y1y0 x6x5x4x3x2x1x0 0
    lsl r24         ; y7y6y5y4y3y2y1y0 x5x4x3x2x1x0 0 0
    lsl r24         ; y7y6y5y4y3y2y1y0 x4x3x2x1x0 0 0 0
    mov XH,r22      ;                                   y7y6y5y4y3y2y1y0
    lsl XH          ;                                   y6y5y4y3y2y1y0 0
    swap XH         ;                                   y2y1y0 0y6y5y4y3
    andi XH,3       ;                                   0 0 0 0 0 0 y4y3
    andi r22, 7     ; 0 0 0 0 0 y2y1y0
    or r24, r22     ;                  x4x3x2x1x0y2y1y0
    mov XL, r24     ;                                   0 0 0 0 0 0 y4y3 x4x3x2x1x0y2y1y0
    subi XL,lo8(-(vram))
    sbci XH,hi8(-(vram))
    st X,r20
    ret
#else
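To check that the bit shuffling above really matches the index formula in its comment, the routine can be modelled step by step in C and compared against the formula for every 8-bit x and y. This is only a sketch for running on a PC: the helper names are made up, and the final subi/sbci that adds the vram base address is left out.

```c
#include <stdint.h>

/* Model of the AVR SWAP instruction: exchange the two nibbles. */
static uint8_t swap_nibbles(uint8_t v)
{
    return (uint8_t)((v << 4) | (v >> 4));
}

/* Step-by-step model of the index computed by SetTile. */
static uint16_t asm_model_index(uint8_t x, uint8_t y)
{
    uint8_t r24 = (uint8_t)(x << 3);   /* lsl r24 (three times): x4..x0 0 0 0 */
    uint8_t xh  = (uint8_t)(y << 1);   /* mov XH,r22 ; lsl XH                 */
    xh = swap_nibbles(xh);             /* swap XH: low nibble is y6..y3       */
    xh &= 3;                           /* andi XH,3: keeps y4 y3              */
    uint8_t r22 = (uint8_t)(y & 7);    /* andi r22,7: keeps y2..y0            */
    uint8_t xl  = (uint8_t)(r24 | r22);/* or r24,r22 ; mov XL,r24             */
    return (uint16_t)(((uint16_t)xh << 8) | xl);
}

/* Index formula from the comment, with x and y masked to 5 bits. */
static uint16_t formula_index(uint8_t x, uint8_t y)
{
    x &= 0x1f;
    y &= 0x1f;
    return (uint16_t)(((y >> 3) * 256) + 8 * x + (y & 7));
}

/* Returns 1 if both agree for every 8-bit x and y, 0 otherwise. */
static int models_agree(void)
{
    for (unsigned y = 0; y < 256; y++)
        for (unsigned x = 0; x < 256; x++)
            if (asm_model_index((uint8_t)x, (uint8_t)y) !=
                formula_index((uint8_t)x, (uint8_t)y))
                return 0;
    return 1;
}
```

The model shows why no explicit AND with 0x1f is needed in the assembly: the three left shifts drop x7..x5, and the final andi XH,3 drops y7..y5.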
I would be surprised if any compiler came up with the LSL/SWAP trick, since it only saves time here because bits 5..7 of y are don't-cares.
But is there a way I can hint to the compiler so that it gets closer than it does now?