Am I doing something wrong to not let the compiler work this

Go To Last Post
16 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

(ignore the PORTC bit - it is just to give it something to do)

#include 
#include 

typedef const struct {
	uint16_t X1;
	uint16_t Y1;
	uint16_t X2;
	uint16_t Y2;
} PROGMEM ObjectDataStruct;

ObjectDataStruct ObjectDataArray[] =
{

// Giant Cross
{0,   0,  0,319,239},
{0,   0,239,319,  0}

};

void bresh_line(uint16_t x1,uint16_t y1,uint16_t x2,uint16_t y2)
{
	PORTC = x1;
	PORTC = y1;
	PORTC = x2;
	PORTC = y2;
}

int main(void)
{
		uint8_t  i;
		uint16_t g_x1;
		uint16_t g_y1;
		uint16_t g_x2;
		uint16_t g_y2;

		for (i = 0; i < (sizeof(ObjectDataArray)/sizeof(ObjectDataStruct)); i++)
		{
			g_x1 = pgm_read_word(&ObjectDataArray[i].X1);
			g_y1 = pgm_read_word(&ObjectDataArray[i].Y1);
			g_x2 = pgm_read_word(&ObjectDataArray[i].X2);
			g_y2 = pgm_read_word(&ObjectDataArray[i].Y2);
			bresh_line(g_x1, g_y1, g_x2, g_y2);
		}

}

Results in

g_x1 = pgm_read_word(&ObjectDataArray[i].X1);

LDI R30,0x54		Load immediate 
LDI R31,0x00		Load immediate 
LPM R22,Z+		Load program memory and postincrement 
LPM R23,Z		Load program memory 

g_y1 = pgm_read_word(&ObjectDataArray[i].Y1);

LDI R30,0x56		Load immediate 
LDI R31,0x00		Load immediate 
LPM R20,Z+		Load program memory and postincrement 
LPM R21,Z		Load program memory 

g_x2 = pgm_read_word(&ObjectDataArray[i].X2);

LDI R30,0x58		Load immediate 
LDI R31,0x00		Load immediate 
LPM R18,Z+		Load program memory and postincrement 
LPM R19,Z		Load program memory 

g_y2 = pgm_read_word(&ObjectDataArray[i].Y2);

LDI R30,0x5A		Load immediate 
LDI R31,0x00		Load immediate 
LPM R24,Z+		Load program memory and postincrement 
LPM R25,Z		Load program memory 

Now if you make the table longer than n lines it does not unroll the loop, but it still does not do the obv. thing of just z+-ing all the time.

Now I have tried to make the for loop

for (i = 0; i < sizeof(ObjectDataArray); i++)

And then having a case to make four states inside the loop. This gets the compiler to make slightly quicker ASM code when it optimises away the state-machine. But it makes the C confusing.

Is there something I am not doing correct in the first case to make the compiler know that

array[i+1].x1

Is straight after

array[i].y2

In memory.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

pgm_read_word is basically just a set of macros that invoke inline assembly. There have been improvements in the latest avr-gcc (that I believe comes with AVR Studio 6.1 beta) that might make the situation better. It certainly makes the use of data in flash much easier.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thanks,

Already in AS6.1 so I guess next level up from "line draw" is going to ASM as well.

Nothing up from here needs to be optimised so I should be able to leave everything higher level than this in C. If it's only getting called once per video frame I don't care how many cycles it wastes :)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The new implementation that I am referring to uses _flash rather than PROGMEM. You'll have to look around in the docs or here on the forums for the details. I haven't had a chance to try it out yet. (Or maybe I'm mistaken and it isn't being distributed with AS6 yet.)

Edit: Here is the relevant page in the docs:
http://gcc.gnu.org/onlinedocs/gc...

And I did see Cliff mentioning that it is in AS6.1beta.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

OK - I will give that a try.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

And I did see Cliff mentioning that it is in AS6.1beta.

Indeed it is. SprinterSB added __flash to avr-gcc at 4.7 and AS6.1Beta contains 4.7.2

By the way the use of PROGMEM on struct typedef is just something that accidentally works so its use is deprecated in 4.7.x onwards (you will see some of the typedef's previously in pgmspace.h have gone). So I just tried your code with __flash and for:

#include 
#include 

typedef const struct {
	uint16_t X1;
	uint16_t Y1;
	uint16_t X2;
	uint16_t Y2;
} ObjectDataStruct;

__flash ObjectDataStruct ObjectDataArray[] =
{

	// Giant Cross
	{ 0,  0,  319, 239},
	{ 0, 239, 319,  0}

};

void bresh_line(uint16_t x1,uint16_t y1,uint16_t x2,uint16_t y2)
{
	PORTC = x1;
	PORTC = y1;
	PORTC = x2;
	PORTC = y2;
}

int main(void)
{
	uint8_t  i;
	uint16_t g_x1;
	uint16_t g_y1;
	uint16_t g_x2;
	uint16_t g_y2;

	for (i = 0; i < (sizeof(ObjectDataArray)/sizeof(ObjectDataStruct)); i++)
	{
		g_x1 = ObjectDataArray[i].X1;
		g_y1 = ObjectDataArray[i].Y1;
		g_x2 = ObjectDataArray[i].X2;
		g_y2 = ObjectDataArray[i].Y2;
		bresh_line(g_x1, g_y1, g_x2, g_y2);
	}
}

(I changed the initialisers to have four not five entries each). The code produced was:

.global	main
	.type	main, @function
main:
//==> {
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
//==> 	PORTC = x1;
	out 0x15,__zero_reg__	 ;  MEM[(volatile uint8_t *)53B],
//==> 	PORTC = y1;
	out 0x15,__zero_reg__	 ;  MEM[(volatile uint8_t *)53B],
//==> 	PORTC = x2;
	ldi r24,lo8(63)	 ;  tmp46,
	out 0x15,r24	 ;  MEM[(volatile uint8_t *)53B], tmp46
//==> 	PORTC = y2;
	ldi r25,lo8(-17)	 ;  tmp48,
	out 0x15,r25	 ;  MEM[(volatile uint8_t *)53B], tmp48
//==> 	PORTC = x1;
	out 0x15,__zero_reg__	 ;  MEM[(volatile uint8_t *)53B],
//==> 	PORTC = y1;
	out 0x15,r25	 ;  MEM[(volatile uint8_t *)53B], tmp48
//==> 	PORTC = x2;
	out 0x15,r24	 ;  MEM[(volatile uint8_t *)53B], tmp46
//==> 	PORTC = y2;
	out 0x15,__zero_reg__	 ;  MEM[(volatile uint8_t *)53B],
//==> }
	ldi r24,0	 ; 
	ldi r25,0	 ; 
	ret
	.size	main, .-main

So, yes the use of Z and LPM has gone but I'm afraid the optimisation was too good and it's just hard coded the values to be output. I guess if I make the array index from a random volatile source (PINB?) then I may be able to force it to generate some indexing code:

	i = PINB;
	g_x1 = ObjectDataArray[i].X1;
	g_y1 = ObjectDataArray[i].Y1;
	g_x2 = ObjectDataArray[i].X2;
	g_y2 = ObjectDataArray[i].Y2;
	bresh_line(g_x1, g_y1, g_x2, g_y2);

that generates:

//==> 	i = PINB;
	in r24,0x16	 ;  i, MEM[(volatile uint8_t *)54B]
//==> 	g_y1 = ObjectDataArray[i].Y1;
	ldi r18,lo8(8)	 ; ,
	mul r24,r18	 ;  i,
	movw r24,r0	 ;  tmp56
	clr __zero_reg__
	subi r24,lo8(-(ObjectDataArray))	 ;  tmp56,
	sbci r25,hi8(-(ObjectDataArray))	 ;  tmp56,
	movw r30,r24	 ;  tmp57, tmp56
	adiw r30,2	 ;  tmp57,
	lpm r20,Z+	 ; ,,
	lpm r21,Z	 ; ,,
//==> 	g_x2 = ObjectDataArray[i].X2;
	movw r30,r24	 ;  tmp61, tmp56
	adiw r30,4	 ;  tmp61,
	lpm r18,Z+	 ; ,,
	lpm r19,Z	 ; ,,
//==> 	g_y2 = ObjectDataArray[i].Y2;
	movw r30,r24	 ;  tmp65, tmp56
	adiw r30,6	 ;  tmp65,
	lpm r22,Z+	 ; ,,
	lpm r23,Z	 ; ,,
//==> 	PORTC = x1;
	movw r30,r24	 ; , tmp56
	lpm r24,Z	 ;  D.1552,
	out 0x15,r24	 ;  MEM[(volatile uint8_t *)53B], D.1552
//==> 	PORTC = y1;
	out 0x15,r20	 ;  MEM[(volatile uint8_t *)53B], g_y1
//==> 	PORTC = x2;
	out 0x15,r18	 ;  MEM[(volatile uint8_t *)53B], g_x2
//==> 	PORTC = y2;
	out 0x15,r22	 ;  MEM[(volatile uint8_t *)53B], g_y2

I'm afraid that does not achieve your goal of Z+'ing all the time either but the code generator (and I would be astonished if this were true of any code generator) does not go back to analyze that X1,Y1.X2,Y2 are consecutive but just treats each access as a separate entity. You might want to download the Kickstart edition of IAR and just try the similar code there. If any compiler was going to do that kind of deep analysis during code generation it would be IAR!

BTW for comparison I changed your original code to do the i = PINB thing and compiled with 4.3.3 in WinAVR and for comparison got:

.LM8:
	in r24,54-32	 ;  i,,
.LVL1:
.LBB8:
.LM9:
	ldi r25,lo8(0)	 ;  D.1364,
	movw r22,r24	 ;  tmp57, D.1364
.LVL2:
	ldi r21,3	 ; ,
1:	lsl r22	 ;  tmp57
	rol r23	 ;  tmp57
	dec r21	 ; 
	brne 1b
	ldi r18,lo8(ObjectDataArray)	 ;  tmp58,
	ldi r19,hi8(ObjectDataArray)	 ;  tmp58,
	movw r30,r18	 ;  __addr16, tmp58
.LVL3:
	add r30,r22	 ;  __addr16, tmp57
	adc r31,r23	 ;  __addr16, tmp57
/* #APP */
 ;  36 "test.c" 1
	lpm r26, Z+	 ;  __result
	lpm r27, Z	 ;  __result
	
 ;  0 "" 2
.LVL4:
/* #NOAPP */
.LBE8:
.LBB9:
.LM10:
	movw r24,r22	 ;  D.1364, tmp57
	adiw r24,2	 ;  D.1364,
	movw r20,r18	 ;  __addr16, tmp58
.LVL5:
	add r20,r24	 ;  __addr16, D.1364
	adc r21,r25	 ;  __addr16, D.1364
	sbiw r24,2	 ;  __addr16,
.LVL6:
	movw r30,r20	 ; , __addr16
.LVL7:
/* #APP */
 ;  37 "test.c" 1
	lpm r28, Z+	 ;  __result
	lpm r29, Z	 ;  __result
	
 ;  0 "" 2
.LVL8:
/* #NOAPP */
.LBE9:
.LBB10:
.LM11:
.LVL9:
	adiw r24,4	 ;  __addr16,
	add r24,r18	 ;  __addr16, tmp58
	adc r25,r19	 ;  __addr16, tmp58
	movw r30,r24	 ; , __addr16
/* #APP */
 ;  38 "test.c" 1
	lpm r20, Z+	 ;  __result
	lpm r21, Z	 ;  __result
	
 ;  0 "" 2
.LVL10:
/* #NOAPP */
.LBE10:
.LBB11:
.LM12:
	subi r22,lo8(-(6))	 ;  __addr16,
	sbci r23,hi8(-(6))	 ;  __addr16,
.LVL11:
	add r22,r18	 ;  __addr16, tmp58
	adc r23,r19	 ;  __addr16, tmp58
	movw r30,r22	 ; , __addr16
/* #APP */
 ;  39 "test.c" 1
	lpm r24, Z+	 ;  __result
	lpm r25, Z	 ;  __result
	
 ;  0 "" 2
.LVL12:
/* #NOAPP */
.LBE11:
.LBB12:
.LBB13:
.LM13:
	out 53-32,r26	 ; ,, __result
.LM14:
	out 53-32,r28	 ; ,, __result
.LM15:
	out 53-32,r20	 ; ,, __result
.LM16:
	out 53-32,r24	 ; ,, __result
.LBE13:
.LBE12:
.LM17:

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Cliff,

Thanks for investigating the new __flash thing there.

This is my naive ASM version.

	push	r17			; I will trash these so need to save them
	push	r28
	push	r29

	ldi	r17, 110		; there are 110 lines in the test scene


	ldi	r30, lo8(ObjectArray)	; Get the address of array in Z
	ldi	r31, hi8(ObjectArray)

DrawLineLoop:				; for(r17=110; r17 != 0; r17--) 
	lpm	r24, Z+
	lpm	r25, Z+
	lpm	r22, Z+
	lpm	r23, Z+
	lpm	r20, Z+
	lpm	r21, Z+
	lpm	r19, Z+
	lpm	r19, Z+
		
	movw	R28, R30		; calling C Routine may trash Z so needs to be saved
	rcall	bresh_line
	movw	R30, R28		; restore Z

	dec	r17			
	brne	DrawLineLoop

	pop	r29			; Restore the 3 regs I trashed
	pop	r28
	pop	r17
	ret

One thing I don't like is I am not smart enough to work out how to get the size of the array in ASM.

I just had to hard code "110" on the fourth line of code.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just put a dummy label at the end of ObjArray:

ObjArray: .db foo, bar, ...
ObjArray_end:

The ASM can simply use (ObjArray_end - ObjArray) to evaluate the size.

I am guessing that you are doing moving pictures on a TFT display.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
This is my naive ASM version.

Dare one ask why you don't simply use that? You can easily mix .S and .c files with GCC. As long as you stick to the ABI you can freely call in either direction, share global data and so on.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

David, the "struct" and the data are all defined in a C file not an .S (ASM) file.

The LCD display is all being draw as lines. No bitmaps. Static parts of scenes will be stored in flash. The interactive parts stored in structs that reside in RAM.

The RAM parts are contained in a few cross pollinated linked lists. I need to sort things in X to redraw ahead of the "beam". It would be lovely if the IL9238 could do an 8 bit more and double buffer.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
Quote:
This is my naive ASM version.

Dare one ask why you don't simply use that?

Cliff,

I called it naive as it was just what I typed on the fly to match what I have learned about mixing .c and .S

I just have not had time to sit down and think about it carefully.

Looking at it a bit closer the only thing I can think of changing is loosing the save/restore on Z. I won't need to do that _if_ I am only going to call my own ASM bresh_line routine that does not trash Z. That will save 10 ticks (as I won't have to push/pop Y either)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You can inspect your .s files to see where ObjectArray starts and ends. Either you add a global label at the end or you spot how GCC marks the end.

Personally, I would simply write the optimised function in a .S file. Then call it as a C function with the address and size of your struct array.

GCC is really quite pleasant for adding .S files to. The overhead of calling as a C function is trivial. Inside your .S function, you can be as efficient as you like.

In fact, I would write the C function. Retrieve the generated ASM and hand optimise it in a .S file.

You have two equivalent working functions. You can test one against the other. Both for speed and compliance to spec.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Personally, I would simply write the optimised function in a .S file. Then call it as a C function with the address and size of your struct array.

Ah - I didn't think of that. Let C work it out with a

(sizeof(ObjectDataArray)/sizeof(ObjectDataStruct))

And pass that to the ASM in r24.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Not only that, but C will use all the correct ASM directives to ensure your .S function links properly.

Note that you can strip out a whole load of garbage too. But don't do that until you are comfortable with gcc-as.

The same applies to most other toolchains. However GCC is probably the easiest to 'optimise'.

Personally, I have no intention of learning the complete ARM / Renesas / AVR instruction set. However a human can spot the obvious, and just look up the docs on Z, Z+, LPM, ...
Make the changes. Then link with both the C and ASM version alternately to test compliance.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This is probably too late and not sufficiently on-topic, but it might serve the OP better to make more use of the data structures he set up. Instead of explicitly unpackaging all the elements of an ObjectDataStruct for passing into the bresh_line() function, for example, pass the structure itself. Or, better still, pass a reference to the ObjectDataStruct. Here's a paraphrase of the OP's code:

#include  
#include  

// this should be a part of stdint.h....
//
#define NELEMENTS(array) (sizeof(array)/sizeof(array[0]))


struct ObjectDataStruct{ 
   uint16_t X1; 
   uint16_t Y1; 
   uint16_t X2; 
   uint16_t Y2; 
}; 

const ObjectDataStruct ObjectDataArray[] PROGMEM  = 
{ 
// Giant Cross 
    {0,  0,319,239}, 
    {0,239,319,  0} 
}; 

void bresh_line(ObjectDataStruct& point)
{ 
   PORTC = point.X1; 
   PORTC = point.Y1; 
   PORTC = point.X2; 
   PORTC = point.Y2; 
} 



int main(void) 
{ 
      ObjectDataStruct sramObject;
      for (uint8_t index = 0; index < (NELEMENTS(ObjectDataArray)); index++) 
      { 
          memcpy_P(&sramObject, ObjectDataArray + index, sizeof(ObjectDataStruct));
          bresh_line(sramObject); 
      } 

} 


And here's some of the interesting pieces from the disassembly (as produced by WinAVR2010/AStudio 4):

void bresh_line(ObjectDataStruct& point)
  a4:	fc 01       	movw	r30, r24
{ 
   PORTC = point.X1; 
  a6:	80 81       	ld	r24, Z
  a8:	88 b9       	out	0x08, r24	; 8
   PORTC = point.Y1; 
  aa:	82 81       	ldd	r24, Z+2	; 0x02
  ac:	88 b9       	out	0x08, r24	; 8
   PORTC = point.X2; 
  ae:	84 81       	ldd	r24, Z+4	; 0x04
  b0:	88 b9       	out	0x08, r24	; 8
   PORTC = point.Y2; 
  b2:	86 81       	ldd	r24, Z+6	; 0x06
  b4:	88 b9       	out	0x08, r24	; 8
} 
  b6:	08 95       	ret

I compiled it as a C++ program (I hate having to use "typedef")

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Levenkay,

Thanks for the input. I can see that passing the address pointer has advantages.

The bresh_line routine does not always get its data from the same caller though. Some of the other callers don't use the struct so that I guess would require packing it a struct prior to call.

Excellent idea for this simple case shown.

It's interesting none of the other compiled ASM version use LPM Z

. It was designed just for that job. LPM Z
is def. what I would use if sequential structs where not being accessed.