I am going to try my luck at coding an 8080 emulator in assembly. I have one in C, and we have tried dedicating some registers to improve performance, but this is the type of thing that I think C isn't as well suited for as assembly. In assembly, for example, it is easy to copy the processor flags directly after an operation. My C code was emulating at the equivalent of about 700 kHz with an 18.432 MHz clock, so about 26 times slower than the host clock.
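To make the flag point concrete, here is a minimal sketch of what portable C has to do by hand for an 8080 ADD. The names (add8_flags, parity_even, the FLAG_* masks) are made up for illustration and are not from my actual emulator. On the AVR, the C, Z, N and H bits of SREG line up with the 8080's CY, Z, S and AC for an 8-bit add, so assembly could read them after a single ADD instead of recomputing them; parity has no AVR equivalent and would still need separate handling (a lookup table, probably).

```c
#include <stdint.h>

/* 8080 flag register bit masks. */
#define FLAG_CY 0x01  /* carry */
#define FLAG_P  0x04  /* parity (set when the result has even parity) */
#define FLAG_AC 0x10  /* auxiliary carry out of bit 3 */
#define FLAG_Z  0x40  /* zero */
#define FLAG_S  0x80  /* sign */

/* Returns 1 when v has an even number of set bits, else 0. */
static uint8_t parity_even(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (uint8_t)(~v & 1);
}

/* Hypothetical helper: adds a and b, stores the 8-bit sum in *result,
 * and returns the 8080 flag byte, every bit rebuilt by hand. */
uint8_t add8_flags(uint8_t a, uint8_t b, uint8_t *result)
{
    uint16_t sum = (uint16_t)a + b;
    uint8_t r = (uint8_t)sum;
    uint8_t f = 0x02;  /* bit 1 of the 8080 flag byte always reads as 1 */

    if (sum & 0x100)                      f |= FLAG_CY;
    if (((a & 0x0F) + (b & 0x0F)) & 0x10) f |= FLAG_AC;
    if (r == 0)                           f |= FLAG_Z;
    if (r & 0x80)                         f |= FLAG_S;
    if (parity_even(r))                   f |= FLAG_P;

    *result = r;
    return f;
}
```

Every one of those tests is extra work per emulated instruction in C, which is the overhead I am hoping to avoid.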
I've read Atmel application note AT1886, "Mixing Assembly and C with AVRGCC". It explains which registers are call-used and which are call-saved, and this makes me wonder what a good strategy for register usage is. My goal is to code only the speed-critical part in assembly, which is the bulk of it. For example, before calling the ASM function to execute, the C function will check whether any breakpoints are enabled and, if so, pass a parameter to the assembly function that tells it whether to check breakpoints. That is a single test then. I was hoping the ASM function would not have to call back into C, but it needs the in() and out() functions, so it will have to.
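Roughly what I mean on the C side, as a sketch only. All names here (breakpoint_t, any_breakpoint_enabled, emulator_run, cpu_run_stub) are hypothetical, and cpu_run_stub is a plain C stand-in for the assembly routine so the example compiles on its own.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_BREAKPOINTS 4

typedef struct {
    uint16_t addr;     /* 8080 address to stop at */
    bool     enabled;
} breakpoint_t;

static breakpoint_t breakpoints[MAX_BREAKPOINTS];

/* Records the flag the run loop was started with (for illustration). */
static uint8_t last_check_flag = 0xFF;

/* Stand-in for the assembly run loop. Under the avr-gcc calling
 * convention a single 8-bit argument like this arrives in r24. */
static void cpu_run_stub(uint8_t check_breakpoints)
{
    last_check_flag = check_breakpoints;
}

/* Scan the table once, before entering the run loop. */
static bool any_breakpoint_enabled(void)
{
    for (uint8_t i = 0; i < MAX_BREAKPOINTS; i++) {
        if (breakpoints[i].enabled)
            return true;
    }
    return false;
}

/* C entry point: decide once whether the run loop must check
 * breakpoints, then hand over to the (assembly) run loop. */
void emulator_run(void)
{
    cpu_run_stub(any_breakpoint_enabled() ? 1 : 0);
}
```

The table scan happens once per entry, so the inner loop only ever tests one flag.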
Given that it is both called from C and has to call a couple of C functions, what is the best strategy for choosing which registers to use? I know that some registers may be the best choice because they are part of the X, Y, or Z pointer pairs, but if you have a choice between a call-used and a call-saved register, do you start with one bank and then move to the other? Getting in and out of the ASM function isn't a huge deal, because it is only the running speed I want to optimize. A bunch of pushes and pops surrounding it doesn't affect performance, because entering or exiting means going to or from the non-running state. It is the inner loop, where it is processing instructions, that needs to be as fast as it can be.
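If I use call-saved registers, then I need to push/pop them on entry to and exit from the ASM function, but there are no worries when calling in()/out().
If I use call-used registers, then it is the inverse: no push/pop on entry and exit, but I have to save and restore them around every call to in()/out().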
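To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes classic megaAVR timing, where PUSH and POP each take 2 cycles, and it ignores whatever in()/out() themselves save internally. The function names are made up for illustration. For reference, avr-gcc treats r2-r17 and r28-r29 as call-saved, and r18-r27 and r30-r31 as call-used.

```c
#include <stdint.h>

/* Assumption: classic megaAVR core, PUSH = 2 cycles, POP = 2 cycles. */
#define CYCLES_PER_SAVE_RESTORE 4u

/* Call-saved registers: saved once on entry and restored once on exit,
 * no matter how many times in()/out() are called while running. */
uint32_t overhead_call_saved(uint8_t nregs)
{
    return (uint32_t)nregs * CYCLES_PER_SAVE_RESTORE;
}

/* Call-used registers: nothing on entry/exit, but every live register
 * must be saved and restored around each call to in()/out(). */
uint32_t overhead_call_used(uint8_t nregs, uint32_t io_calls)
{
    return (uint32_t)nregs * CYCLES_PER_SAVE_RESTORE * io_calls;
}
```

With 8 registers held live, the call-saved choice costs 32 cycles per run, while the call-used choice costs 32 cycles for every emulated IN or OUT, so under these assumptions the two break even at a single I/O call per run.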