Compiler optimization between different CPU widths...

I am working on making my first emulator of something, the Intel 8080.  I am just in the beginning stages and am doing this just for entertainment and educational use right now.  I know there is a lot of source code that does this already, but would rather learn and work through my own to see what I can make it do.

 

Here is a question I've run into, and it is more of a C question.  There are different ways I can write this, and below are two ways it could be done.  The source processor is an 8-bit one, but I could compile this code for an AVR, for 16-bit DOS, or for 32-bit WIN32.  What I don't want to do is code it so that it deals with 8-bit values one at a time (unless there is no choice) when a wider host (16-bit, 32-bit) could handle all 16 bits at once.  I know that if I code it to work 16 bits at a time, the compiler will break it down into 8-bit operations for the AVR, but it won't "combine it up" on a wider host if it is coded to work 8 bits at a time (or at least I wouldn't think it would).

 

#define PUSH_PC_ON_STACK()     { memory[--cpu.pairs.x.sp]=cpu.pairs.h.pch; memory[--cpu.pairs.x.sp]=cpu.pairs.h.pcl; }
#define JUMP(AAddr)            { cpu.pairs.x.pc=AAddr; }
#define RST(AVector)           { PUSH_PC_ON_STACK(); JUMP(AVector*8); }

#define CALL(AAddr)            { PUSH_PC_ON_STACK(); JUMP(AAddr); }

#define FETCH()                memory[cpu.pairs.x.pc++]
#define FETCH2()               (uint16_t)(memory[cpu.pairs.x.pc++] | (memory[cpu.pairs.x.pc++]<<8))


        case OPCODE_CALL_A16:   /*0xCD*/  l=FETCH(); h=FETCH(); CALL((h<<8) | l);
                                          //CALL(FETCH2());

 

The first version fetches the 2nd and 3rd instruction bytes one at a time and then combines them for the call.  I don't think this is ideal.  CALL(FETCH2()) isn't much better the way it is now.  I could fetch the two bytes directly as a uint16_t with a cast, but perhaps that brings endianness issues (not that they will affect me much).
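One thing worth noting about FETCH2() as posted: it modifies the program counter twice in a single expression with nothing sequencing the two increments, so C does not define which byte is read first. A minimal sketch of one way to pin the order down, assuming a simplified stand-alone `pc` variable and a 64 KiB `memory` array (hypothetical names, not the `cpu.pairs.x.pc` layout used above):

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the emulator state above. */
static uint8_t  memory[65536];
static uint16_t pc;

/* Fetch one byte at pc, then advance pc. */
static inline uint8_t fetch8(void)
{
    return memory[pc++];
}

/* Fetch a 16-bit operand in 8080 order (low byte first).
   The two reads are separate statements, so their order is defined. */
static inline uint16_t fetch16(void)
{
    uint16_t lo = fetch8();
    uint16_t hi = fetch8();
    return (uint16_t)(lo | (hi << 8));
}
```

With something like this, the opcode case could read `CALL(fetch16());`. Whether a given compiler turns the two byte reads into one word read is up to its optimizer, so it is worth checking the listing.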

 

I guess since there are some 16-bit operations in the 8080, I would like to take advantage of a 16-bit CPU that can handle them together.

Last Edited: Sun. Apr 26, 2020 - 05:42 PM

Don't know how you posted it but something has gone quite wrong in that code section with HTML tags. Did you use [ code ] ? That's no longer supported in the latest forum software.

Let me try to edit it and use the icon.

Last Edited: Sun. Apr 26, 2020 - 05:42 PM

Are you saying you want to simulate an 8080 processor, your 8/16-bit target, and run it on an 8-, 16-, 32-, or 64-bit host?

I usually use unions to deal with 16 bit broken down into hi and lo 8-bit quantities.   Here is a snippet from my AVR simulator:

typedef union {
  uint16_t w;                           /* wide (16 bit) value, unsigned */
  int16_t ws;                           /* wide (16 bit) value, signed */
  struct {
#if __BYTE_ORDER == __LITTLE_ENDIAN
    uint8_t lo;
    uint8_t hi;
#else
    uint8_t hi;
    uint8_t lo;
#endif
  };
} wreg_t;

Note that to make it host portable, you need to think about endianness.

I advise against using struct bit fields.  You can get into trouble with them.

Yes - what I'm looking for is to write code that compiles well whether it is targeted at an 8-bit, 16-bit, or 32-bit device.  I don't want to code it for an 8-bit processor in such a way that a 16-bit host ends up juggling two 16-bit variables, one holding the high byte and one holding the low byte, if that makes sense.

Processors above 8 bits (16- and 32-bit) usually have so much "grunt" that I wouldn't have thought you'd need to worry much about optimization efficiency. I'd target your 8-bit build(s).

You make a good point clawson, it isn't like I need to set any performance records, but I'd like to do it in a way that is optimal.

Marat Fayzullin in his Z80 emulator does this:

/** Structured Datatypes *************************************/
/** NOTICE: #define LSB_FIRST for machines where least      **/
/**         significant byte goes first.                    **/
/*************************************************************/
typedef union
{
#ifdef LSB_FIRST
  struct { byte l,h; } B;
#else
  struct { byte h,l; } B;
#endif
  uint16_t W;
} pair;

typedef struct
{
  pair AF,BC,DE,HL,IX,IY,PC,SP;       /* Main registers      */
  pair AF1,BC1,DE1,HL1;               /* Shadow registers    */
  byte IFF,I;                         /* Interrupt registers */
  byte R;                             /* Refresh register    */

  int IPeriod,ICount; /* Set IPeriod to number of CPU cycles */
                      /* between calls to LoopZ80()          */
  int IBackup;        /* Private, don't touch                */
  uint16_t IRequest;      /* Set to address of pending IRQ       */
  byte IAutoReset;    /* Set to 1 to autom. reset IRequest   */
  byte TrapBadOps;    /* Set to 1 to warn of illegal opcodes */
  uint16_t Trap;          /* Set Trap to address to trace from   */
  byte Trace;         /* Set Trace=1 to start tracing        */
  void *User;         /* Arbitrary user data (ID,RAM*,etc.)  */
} Z80;

 

I am using a union so I can reference it as a byte or word.  I guess maybe I just need to think about handling it as a word every chance I can.
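To make the "handle it as a word every chance I can" idea concrete, here is a rough sketch of a register pair where pair-wide instructions touch only the 16-bit member. The type and function names are made up for illustration, the byte-order test assumes the GCC/Clang predefined `__BYTE_ORDER__` macros (anything else falls back to little-endian), and flag handling is left out entirely:

```c
#include <stdint.h>

/* Assumption: GCC/Clang predefined byte-order macros; other compilers
   fall through to the little-endian layout. */
typedef union {
    uint16_t w;                 /* whole pair, e.g. HL */
    struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        uint8_t hi, lo;
#else
        uint8_t lo, hi;
#endif
    } b;
} regpair_t;

/* INX-style increment of the whole pair: one 16-bit operation in the source. */
static inline void inx(regpair_t *rp)
{
    rp->w++;
}

/* INR-style increment of the low register only; no carry into the high half. */
static inline void inr_lo(regpair_t *rp)
{
    rp->b.lo++;
}
```

On an 8-bit host the compiler splits `rp->w++` into byte operations with carry; on a 16-bit or wider host it can be a single add.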

#define FETCH()                memory[CPU_PC++]
#define FETCH2()               *(uint16_t*)&memory[CPU_PC+=(uint16_t)2];

Here is an issue I've run into, and maybe I'm just not doing this right.  FETCH works as expected: it returns memory[CPU_PC] and then increments CPU_PC.

 

FETCH2 is an attempt at doing the same operation as a uint16_t.  CPU_PC is incremented by two, but the word returned is read from the new CPU_PC, not from the CPU_PC before the change.

 

I wondered if I could do something with a comma like first do this, then do that, but I had odd results.  i1=1,3 sets i1=1, but i1=(1,3) sets i1=3 for some reason.
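The comma results come down to precedence: assignment binds tighter than the comma operator, so `i1=1,3` parses as `(i1=1),3`, while `(1,3)` is a comma expression whose value is its right operand. The comma also sequences its left side before its right, which is why it can order side effects inside a macro. A small sketch, using a made-up `next()` helper so both operands have a visible effect:

```c
/* Hypothetical helper: bumps a counter and returns the new value. */
static int next(int *counter)
{
    return ++*counter;
}

/* i = a, b  parses as  (i = a), b  because = binds tighter than the comma. */
static int assign_then_comma(void)
{
    int c = 0, i;
    i = next(&c), next(&c);
    return i;               /* value of the first call */
}

/* i = (a, b)  evaluates a, discards it, and assigns the value of b. */
static int assign_comma_result(void)
{
    int c = 0, i;
    i = (next(&c), next(&c));
    return i;               /* value of the second call */
}
```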

 

More trying, this does work, but I don't love that I have to change CPU_PC first and then reference it later by having to subtract 2 from it:

#define FETCH2()               ( CPU_PC+=(uint16_t)2, *(uint16_t*)&memory[CPU_PC-2] )

 

Last Edited: Mon. Apr 27, 2020 - 02:16 AM

FETCH2 is endian specific - the result will differ if you run on a PC vs an AVR.
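To illustrate what "endian specific" means here, the sketch below builds a word from the same two bytes in two ways. The union overlay gives whatever the host's byte order produces, while the shift version always gives 8080 order. The type and function names are invented for this example:

```c
#include <stdint.h>

/* Overlay two bytes on one 16-bit word, as the unions in this thread do. */
typedef union {
    uint16_t w;
    uint8_t  b[2];
} word_bytes_t;

/* Host-order view: which byte ends up as the low half depends on the host. */
static uint16_t word_host_order(uint8_t first, uint8_t second)
{
    word_bytes_t u;
    u.b[0] = first;
    u.b[1] = second;
    return u.w;
}

/* 8080 order (low byte first), built with shifts: same on every host. */
static uint16_t word_8080_order(uint8_t first, uint8_t second)
{
    return (uint16_t)(first | ((uint16_t)second << 8));
}
```

For the bytes 0x34, 0x12 the shift version yields 0x1234 everywhere; the overlay yields 0x1234 on a little-endian host and 0x3412 on a big-endian one.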

 

Don't fall into the trap of thinking less C code = less generated code.

Kartman - aren't the AVR and PC both little endian?

 

More testing - DOS 16 bit compiler:

 

A byte access approach

#define FETCH2()               ( (uint16_t)(memory[CPU_PC++] | (memory[CPU_PC++]<<8)) )

Result:

	sub	word ptr DGROUP:_cpu+8,2
	mov	bx,word ptr DGROUP:_cpu+8
	xor	cx,cx
	mov	dx,seg _memory
	mov	ax,offset _memory
	call	near ptr N_PADD@
	mov	dx,word ptr DGROUP:_cpu+10
	mov	bx,ax
	mov	word ptr [bx],dx
	mov	bx,word ptr DGROUP:_cpu+10
	xor	cx,cx
	mov	dx,seg _memory
	mov	ax,offset _memory
	call	near ptr N_PADD@
	mov	bx,ax
	mov	es,dx
	mov	al,byte ptr es:[bx]
	mov	ah,0
	mov	bx,word ptr DGROUP:_cpu+10
	xor	cx,cx
	mov	dx,seg _memory
	push	ax
	mov	ax,offset _memory
	call	near ptr N_PADD@
	mov	bx,ax
	mov	es,dx
	mov	al,byte ptr es:[bx]
	mov	ah,0
	mov	cl,8
	shl	ax,cl
	pop	dx
	or	dx,ax
	mov	word ptr DGROUP:_cpu+10,dx
	inc	word ptr DGROUP:_cpu+10
	inc	word ptr DGROUP:_cpu+10
	jmp	@3@7450

A word access approach - it uses a temporary variable which I don't love:

uint16_t zzz;
#define FETCH2()               ( zzz=CPU_PC, CPU_PC+=2, *(uint16_t*)&memory[zzz] )

Shorter result:

	sub	word ptr DGROUP:_cpu+8,2
	mov	bx,word ptr DGROUP:_cpu+8
	xor	cx,cx
	mov	dx,seg _memory
	mov	ax,offset _memory
	call	near ptr N_PADD@
	mov	dx,word ptr DGROUP:_cpu+10
	mov	bx,ax
	mov	word ptr [bx],dx
	mov	ax,word ptr DGROUP:_cpu+10
	mov	word ptr DGROUP:_zzz,ax
	add	word ptr DGROUP:_cpu+10,2
	mov	bx,word ptr DGROUP:_zzz
	xor	cx,cx
	mov	dx,seg _memory
	mov	ax,offset _memory
	call	near ptr N_PADD@
	mov	bx,ax
	mov	ax,word ptr [bx]
	mov	word ptr DGROUP:_cpu+10,ax
	jmp	@3@7450

This sort of shows what I was thinking: if I can deal with something as a word, it will be handled better by a compiler whose target has a native width of a word or larger.

I think the correct way to do this in C would be with a union, like in #4.

 

If you have something like

uint16_t x = *(uint16_t *)&mem[n];

where mem is array of uint8_t

uint8_t mem[]

 

then, at least for C99 onwards, by my reading of the C standard this is undefined behaviour.

If compiled with gcc with optimisation, you get a warning like

main.c:368: warning: dereferencing type-punned pointer will break strict-aliasing rules

 

That doesn't mean it won't work; it might still work if

1) the address is correctly aligned for a 16 bit read

2) you might, for gcc as example, need compiler option -fno-strict-aliasing  (edited to try and spell it right)

 

but the union method is guaranteed to be OK.
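Since memory here is a plain uint8_t array rather than a union, another commonly used option is to copy the two bytes into a uint16_t with memcpy, which avoids both the pointer cast and the alignment question. A minimal sketch, with `load16` being a made-up name and the caller assumed to keep `addr + 1` inside the array:

```c
#include <stdint.h>
#include <string.h>

/* Load a host-order 16-bit value from any byte offset without
   casting a uint8_t* to a uint16_t*.
   Assumption: addr + 1 is still inside mem. */
static inline uint16_t load16(const uint8_t *mem, uint16_t addr)
{
    uint16_t v;
    memcpy(&v, &mem[addr], sizeof v);
    return v;
}
```

The result is still in host byte order, so the endianness caveat applies just as it does to the cast. Optimizing compilers often reduce a fixed two-byte memcpy like this to a single load, but that is worth confirming in the generated code for each target.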

If the endianness is correct for the cases you care about, maybe you don't need to worry about the other endianness; just add a run-time check at the start of the program to test the endianness and bail out if it is wrong.
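A run-time check of that kind could look roughly like the following. The function name is made up; it inspects the first byte of a known 16-bit value through an unsigned char pointer, which C permits for any object:

```c
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    /* Reading an object's bytes through unsigned char is allowed. */
    return *(const unsigned char *)&probe == 1;
}
```

At the top of main() the emulator could call this once and exit with an error message if the host is not the byte order the word-access macros assume.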

 

Or maybe this is all premature optimisation: just handle it all as 8 bits, combining two 8-bit values where needed (so endianness doesn't matter), and see how it runs.

 

Last Edited: Mon. Apr 27, 2020 - 11:30 AM

Mrkendo - I've seen that warning before about the type-punned pointer.  I did a quick test in AS7 using this, but didn't get the warning.

 

#include <avr/io.h>

uint8_t memory[24576];

int main(void)
{
	volatile uint16_t ui1;

	ui1=*(uint16_t*)&memory[5];
    /* Replace with your application code */
    while (1)
    {

    }
}

 

alank2 wrote:
but didn't get the warning.
Really? Is your AS7 out of date or something...

 

 

Note however I also have:

 

 

If you don't have those ticked I would suggest now is probably the time to do so !

Re #14.

Interestingly, with an older gcc (4.4.7) I only get the warning for &memory[0]; &memory[5] or seemingly any other index doesn't give the warning.

With a newer version (5.4.0) I get it for any index.

You also need to have optimisation enabled: -O2, -Os, or -O3.

It has been a while since I updated...

 

 

I had it in debug; changing to release does show the warning:

 

Severity    Code    Description    Project    File    Line
Warning        dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]    GccApplication1    c:\1\GccApplication1\main.c    17