320x240 TFT and Uno performance?


I'm getting about 58K pixels per second drawing lines and rectangles to this display, which means a 1.3 s full-screen fill. Does anyone else have a TFT display shield running on an Uno in C (not C++) who can post draw speeds/times? I'd like to speed it up a little more if possible. Thanks.

Imagecraft compiler user


Quote:

I'd like to speed it up a little more if possible.

Have you looked at the generated Asm at the heart of the pixel-writing code? Is it optimal? If not, I assume your compiler supports inline Asm?


Bob,

Explain your TFT interface connections, and which shield/display you are using.

I have an ILI9320 shield (8-bit) that runs at 3.3V (no good for UNO) and uses PB0..PB1 and PD2..PD7
And an SSD1289 shield (8-bit) with level-shifters that uses PD0..PD7

The SSD1289 certainly goes fast. The ILI9320 requires multiple IN, AND, OUT instructions, so it runs at about a third of the SSD1289's speed.

If you are using a UNO, you need level shifters and a lot of masks for a 16-bit interface.
Of course a MEGA has entire PORTs available, so you don't need to move bits around.

I can dig out those shields and post you some times.

David.


It's the Seeed Studio 2.8" TFT v1.0, which uses an ST7781 controller. It uses CS, RS, RD, WR and 8 data lines, but 6 data bits are on PORTD and two are on PORTB, so every byte write is split into two port writes. For example, the putData function calls a function called allpinslo that does PORTD &= ~0xfc; and PORTB &= ~0x03; then it writes the byte with PORTD |= (data<<2); PORTB |= (data>>6);

Here is their version

//---------------------------------------
void sendCommand(unsigned int index){
//send cmd byte

  CS_LOW;
  RS_LOW;  //cmd
  RD_HIGH; //wr
//  WR_HIGH;

  WR_LOW;
  putData(0); //index hi
  WR_HIGH;
	
  WR_LOW;
  putData(index & 0xff); //index lo
  WR_HIGH;

  CS_HIGH;
}

//-------------------------------------
void sendData(unsigned int data){
//send data bytes    16 bit rgb 565

  CS_LOW;
  RS_HIGH; //data
  RD_HIGH; //wr

  WR_LOW;
  putData(data >> 8); //hi byte
  WR_HIGH;

  WR_LOW;
  putData(data & 0xff); //lo byte
  WR_HIGH;

  CS_HIGH;
}

//and here is my inlined version
//---------------------------------------
void sendCommand(unsigned int index){
//called subs inlined
unsigned char dh,dl;

  dh=(index & 0xff00) >> 8;
  dl=(index & 0xff);
  CS_LOW;
  RS_LOW; //command
  RD_HIGH;
  WR_HIGH;

  WR_LOW;
  PORTD &= ~0xfc;     //allpinlow inlined
  PORTB &= ~0x03;
  PORTD |= (dh << 2); //putdata hi
  PORTB |= (dh >> 6);
  WR_HIGH;

  WR_LOW;
  PORTD &= ~0xfc;     //allpinlow inlined
  PORTB &= ~0x03;
  PORTD |= (dl << 2); //putdata lo
  PORTB |= (dl >> 6);
  WR_HIGH;

  CS_HIGH;
}

//-------------------------------------
void sendData(unsigned int data){
//with called subs inlined
unsigned char dh,dl;

  dh=(data & 0xff00) >> 8; //hi byte
  dl=(data & 0xff);        //lo byte
  CS_LOW;
  RS_HIGH;  //data
  RD_HIGH;

  WR_LOW;
  PORTD &= ~0xfc;     //allpinlow inlined
  PORTB &= ~0x03;
  PORTD |= (dh << 2); //
  PORTB |= (dh >> 6);
  WR_HIGH;

  WR_LOW;
  PORTD &= ~0xfc;     //allpinlow inlined
  PORTB &= ~0x03;
  PORTD |= (dl << 2); //
  PORTB |= (dl >> 6);
  WR_HIGH;

  CS_HIGH;
}

I bet I could leave CS low the whole time, for example?

(0182) //#if 0  //inlined version
(0183) //---------------------------------------
(0184) void sendCommand(unsigned int index){
(0185) //called subs inlined
(0186) unsigned char dh,dl;
(0187) 
(0188)   dh=(index & 0xff00) >> 8;
    00EE1 01CA      MOVW	R24,R20
    00EE2 7080      ANDI	R24,0
    00EE3 016C      MOVW	R12,R24
    00EE4 2CCD      MOV	R12,R13
    00EE5 24DD      CLR	R13
(0189)   dl=(index & 0xff);
    00EE6 01CA      MOVW	R24,R20
    00EE7 7090      ANDI	R25,0
    00EE8 2EA8      MOV	R10,R24
(0190)   CS_LOW;
    00EE9 982A      CBI	0x05,2
(0191)   RS_LOW; //command
    00EEA 982B      CBI	0x05,3
(0192)   RD_HIGH;
    00EEB 9A2D      SBI	0x05,5
(0193)   WR_HIGH;
    00EEC 9A2C      SBI	0x05,4
(0194) 
(0195)   WR_LOW;
    00EED 982C      CBI	0x05,4
(0196)   PORTD &= ~0xfc;     //allpinlow inlined
    00EEE B18B      IN	R24,0x0B
    00EEF 7083      ANDI	R24,3
    00EF0 B98B      OUT	0x0B,R24
(0197)   PORTB &= ~0x03;
    00EF1 B185      IN	R24,0x05
    00EF2 7F8C      ANDI	R24,0xFC
    00EF3 B985      OUT	0x05,R24
(0198)   PORTD |= (dh << 2); //putdata hi
    00EF4 2C2C      MOV	R2,R12
    00EF5 0C22      LSL	R2
    00EF6 0C22      LSL	R2
    00EF7 B03B      IN	R3,0x0B
    00EF8 2832      OR	R3,R2
    00EF9 B83B      OUT	0x0B,R3
(0199)   PORTB |= (dh >> 6);
    00EFA E026      LDI	R18,6
    00EFB E030      LDI	R19,0
    00EFC 2D0C      MOV	R16,R12
    00EFD 2711      CLR	R17
    00EFE 940E 25FF CALL	asr16
    00F00 B025      IN	R2,0x05
    00F01 2433      CLR	R3
    00F02 2A20      OR	R2,R16
    00F03 2A31      OR	R3,R17
    00F04 B825      OUT	0x05,R2
(0200)   WR_HIGH;
    00F05 9A2C      SBI	0x05,4
(0201) 
(0202)   WR_LOW;
    00F06 982C      CBI	0x05,4
(0203)   PORTD &= ~0xfc;     //allpinlow inlined
    00F07 B18B      IN	R24,0x0B
    00F08 7083      ANDI	R24,3
    00F09 B98B      OUT	0x0B,R24
(0204)   PORTB &= ~0x03;
    00F0A B185      IN	R24,0x05
    00F0B 7F8C      ANDI	R24,0xFC
    00F0C B985      OUT	0x05,R24
(0205)   PORTD |= (dl << 2); //putdata lo
    00F0D 2C2A      MOV	R2,R10
    00F0E 0C22      LSL	R2
    00F0F 0C22      LSL	R2
    00F10 B03B      IN	R3,0x0B
    00F11 2832      OR	R3,R2
    00F12 B83B      OUT	0x0B,R3
(0206)   PORTB |= (dl >> 6);
    00F13 E026      LDI	R18,6
    00F14 E030      LDI	R19,0
    00F15 2D0A      MOV	R16,R10
    00F16 2711      CLR	R17
    00F17 940E 25FF CALL	asr16
    00F19 B025      IN	R2,0x05
    00F1A 2433      CLR	R3
    00F1B 2A20      OR	R2,R16
    00F1C 2A31      OR	R3,R17
    00F1D B825      OUT	0x05,R2
(0207)   WR_HIGH;
    00F1E 9A2C      SBI	0x05,4
(0208) 
(0209)   CS_HIGH;
    00F1F 9A2A      SBI	0x05,2
    00F20 940C 270E JMP	pop_xgset303C
_sendData:
  dl                   --> R10
  dh                   --> R12
  data                 --> R20
    00F22 940E 2707 CALL	push_xgset303C
    00F24 01A8      MOVW	R20,R16
(0210) }
(0211) 
(0212) //-------------------------------------
(0213) void sendData(unsigned int data){
(0214) //with called subs inlined
(0215) unsigned char dh,dl;
(0216) 
(0217)   dh=(data & 0xff00) >> 8; //hi byte
    00F25 01CA      MOVW	R24,R20
    00F26 7080      ANDI	R24,0
    00F27 016C      MOVW	R12,R24
    00F28 2CCD      MOV	R12,R13
    00F29 24DD      CLR	R13
(0218) 	dl=(data & 0xff);        //lo byte
    00F2A 01CA      MOVW	R24,R20
    00F2B 7090      ANDI	R25,0
    00F2C 2EA8      MOV	R10,R24
(0219)   CS_LOW;
    00F2D 982A      CBI	0x05,2
(0220)   RS_HIGH;  //data
    00F2E 9A2B      SBI	0x05,3
(0221)   RD_HIGH;
    00F2F 9A2D      SBI	0x05,5
(0222) 
(0223)   WR_LOW;
    00F30 982C      CBI	0x05,4
(0224)   PORTD &= ~0xfc;     //allpinlow inlined
    00F31 B18B      IN	R24,0x0B
    00F32 7083      ANDI	R24,3
    00F33 B98B      OUT	0x0B,R24
(0225)   PORTB &= ~0x03;
    00F34 B185      IN	R24,0x05
    00F35 7F8C      ANDI	R24,0xFC
    00F36 B985      OUT	0x05,R24
(0226)   PORTD |= (dh << 2); //
    00F37 2C2C      MOV	R2,R12
    00F38 0C22      LSL	R2
    00F39 0C22      LSL	R2
    00F3A B03B      IN	R3,0x0B
    00F3B 2832      OR	R3,R2
    00F3C B83B      OUT	0x0B,R3
(0227)   PORTB |= (dh >> 6);
    00F3D E026      LDI	R18,6
    00F3E E030      LDI	R19,0
    00F3F 2D0C      MOV	R16,R12
    00F40 2711      CLR	R17
    00F41 940E 25FF CALL	asr16
    00F43 B025      IN	R2,0x05
    00F44 2433      CLR	R3
    00F45 2A20      OR	R2,R16
    00F46 2A31      OR	R3,R17
    00F47 B825      OUT	0x05,R2
(0228)   WR_HIGH;
    00F48 9A2C      SBI	0x05,4
(0229) 
(0230)   WR_LOW;
    00F49 982C      CBI	0x05,4
(0231)   PORTD &= ~0xfc;     //allpinlow inlined
    00F4A B18B      IN	R24,0x0B
    00F4B 7083      ANDI	R24,3
    00F4C B98B      OUT	0x0B,R24
(0232)   PORTB &= ~0x03;
    00F4D B185      IN	R24,0x05
    00F4E 7F8C      ANDI	R24,0xFC
    00F4F B985      OUT	0x05,R24
(0233)   PORTD |= (dl << 2); //
    00F50 2C2A      MOV	R2,R10
    00F51 0C22      LSL	R2
    00F52 0C22      LSL	R2
    00F53 B03B      IN	R3,0x0B
    00F54 2832      OR	R3,R2
    00F55 B83B      OUT	0x0B,R3
(0234)   PORTB |= (dl >> 6);
    00F56 E026      LDI	R18,6
    00F57 E030      LDI	R19,0
    00F58 2D0A      MOV	R16,R10
    00F59 2711      CLR	R17
    00F5A 940E 25FF CALL	asr16
    00F5C B025      IN	R2,0x05
    00F5D 2433      CLR	R3
    00F5E 2A20      OR	R2,R16
    00F5F 2A31      OR	R3,R17
    00F60 B825      OUT	0x05,R2
(0235)   WR_HIGH;
    00F61 9A2C      SBI	0x05,4
(0236) 
(0237)   CS_HIGH;
    00F62 9A2A      SBI	0x05,2
    00F63 940C 270E JMP	pop_xgset303C
_readRegister:
  data                 --> R10
  index                --> R12
    00F65 940E 26E0 CALL	push_xgset003C
    00F67 0168      MOVW	R12,R16
(0238) }
(0239) //#endif

I have to admit the code generation stinks. See where, for each >>6, it loads a shift count of 6 and calls the 16-bit shift routine asr16 instead of just emitting 6 LSR instructions? That really stinks, Richard!

Imagecraft compiler user


That seems crazy. Not only do you have to do masks, you have multiple shifts too.

Look at what your compiler generates in ASM.

The Adafruit Shield and my Chinese 3.3V Shield use PD2..PD7 for D2..D7 and PB0..PB1 for D0..D1

So it is simply AND, IN, AND, OR, OUT for each PORT. No, I don't think that I ever looked at the ASM.

I am a great believer in avoiding the soldering iron. Someone at SeeedStudio must have got hold of your cigarettes!

I will have a look at the SeeedStudio site.

No, you don't need to toggle CS for every operation.
It looks as if your macros will be pretty efficient.
It is unfortunate that you have got to shift everything. The ARM has a barrel shifter so it does not matter how far you do the shifting. The AVR needs one shift at a time.

David.


Now we are getting somewhere. I give you a byte and tell you to put 2 bits here and 6 bits there. What's the best way to do that? This was the Seeed Studio code, and it had a bunch of #ifdefs for Uno or Mega that I left out. Give me the asm instructions to do that and I'll make a macro out of it?

Imagecraft compiler user

Last Edited: Tue. Dec 31, 2013 - 09:04 PM
(0226)   PORTD |= (dh << 2); //
    00F37 2C2C      MOV   R2,R12
    00F38 0C22      LSL   R2
    00F39 0C22      LSL   R2
    00F3A B03B      IN   R3,0x0B
    00F3B 2832      OR   R3,R2
    00F3C B83B      OUT   0x0B,R3
(0227)   PORTB |= (dh >> 6);
    00F3D E026      LDI   R18,6
    00F3E E030      LDI   R19,0
    00F3F 2D0C      MOV   R16,R12
    00F40 2711      CLR   R17
    00F41 940E 25FF CALL   asr16
    00F43 B025      IN   R2,0x05
    00F44 2433      CLR   R3
    00F45 2A20      OR   R2,R16
    00F46 2A31      OR   R3,R17
    00F47 B825      OUT   0x05,R2

The <<2 is no problem.
To do >>6 you could mask off the low nibble, then do a SWAP and >>2.

And you could write the code better in the first place. e.g.

PORTD = (PORTD & 0x03) | (dh << 2);
PORTB = (PORTB & 0xFC) | (dh >> 6);

I have not looked, but I would not be surprised to see a compiler do an AND, SWAP, LSR, LSR
or even an AND, ROL, ROL
or even a BLD, BST, BLD, BST

Untested. Anything is better than calling a subroutine to shift one bit right!

David.


OK cool. I'll try those tricks! Thanks.

I replaced that crazy asr16 call with 6 LSR instructions in inline asm and it doubled the speed from 60K to 120K pixels per second.

Imagecraft compiler user

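#include "portab_mcu.h"
#include <stdio.h>
extern void initstdio(void);

/* Time a code sequence with Timer2 running at clk/1 (no prescaler).
   The count may include a cycle or two of timer start/stop overhead. */
#define COUNTCYCLES(name, sequence) {TCNT2=0;TCCR2B=1;{sequence};TCCR2B=0;\
    printf("\"%s\" took %u cycles\r\n", name, TCNT2);}

void main(void)
{
    volatile uint8_t dh = OSCCAL;   /* run-time value, so nothing is constant-folded */
    initstdio();
    printf("Hello time Bob\r\n");
    COUNTCYCLES("orig", {           /* Seeed code: clear, then OR in shifted bits */
                PORTD &= ~0xfc; //allpinlow inlined
                PORTB &= ~0x03;
                PORTD |= (dh << 2);     //putdata hi
                PORTB |= (dh >> 6);
                });
    COUNTCYCLES("shift", {          /* one read-modify-write per port */
                PORTD = (PORTD & 0x03)|(dh << 2);
                PORTB = (PORTB & 0xFC)|(dh >> 6);
                });
    COUNTCYCLES("mask", {           /* D2..D7 on PD2..PD7, D0..D1 on PB0..PB1: no shifts */
                PORTD = (PORTD & 0x03)|(dh & 0xFC);
                PORTB = (PORTB & 0xFC)|(dh & 0x03);
                });
    COUNTCYCLES("whole", {          /* all 8 data bits on one port */
                PORTD = dh;
                });
    while (1);
}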

I compiled it with CodeVisionAVR (CV), ImageCraft (ICC) and AVR Studio 4 (GCC). Cycles per sequence:

Compiler  orig  shift  mask  whole
CV          27     23    19      6
ICC         27     23    16      6
GCC         24     20    16      5

So actually, there was little difference between compilers or sequences.
In fact, ICC did not call an external function to do the 6 right shifts; it just emitted 6 LSRs.
Both CV and GCC used AND, SWAP, LSR, LSR.

The important point about the Arduino wiring is that spreading the data bits over two PORTs is expensive.

David.