AVR Freaks Forum Index

bearing
PostPosted: Feb 04, 2012 - 01:27 AM
Newbie


Joined: Mar 14, 2005
Posts: 6


I'm writing some code that needs to execute fast. The code is heavily inspired by AVR447, but I'm writing it for avr-gcc, mostly from the ground up.

Code:
  const uint8_t sineTable192x3[192 * 3] PROGMEM = {...};
...
  uint8_t phaseA, phaseB, phaseC;
  uint8_t angle;
  uint16_t temp;
...
  temp = (uint16_t)angle*3;
  phaseA = pgm_read_byte(&sineTable192x3[temp++]);
  phaseB = pgm_read_byte(&sineTable192x3[temp++]);
  phaseC = pgm_read_byte(&sineTable192x3[temp]);

My thought was that the last three lines would generate two LPM Rxx, Z+ and one LPM Rxx, Z. Instead, it produces this:


Code:
243:        temp = (uint16_t)angle*3;
+000001BA:   E093        LDI       R25,0x03       Load immediate
+000001BB:   9F89        MUL       R24,R25        Multiply unsigned
+000001BC:   01F0        MOVW      R30,R0         Copy register pair
+000001BD:   2411        CLR       R1             Clear Register
244:        phaseA = pgm_read_byte(&sineTable192x3[temp++]);
+000001BE:   01CF        MOVW      R24,R30        Copy register pair
+000001BF:   9601        ADIW      R24,0x01       Add immediate to word
+000001C0:   59E8        SUBI      R30,0x98       Subtract immediate
+000001C1:   4FFF        SBCI      R31,0xFF       Subtract immediate with carry
+000001C2:   9124        LPM       R18,Z          Load program memory
245:        phaseB = pgm_read_byte(&sineTable192x3[temp++]);
+000001C3:   01FC        MOVW      R30,R24        Copy register pair
+000001C4:   59E8        SUBI      R30,0x98       Subtract immediate
+000001C5:   4FFF        SBCI      R31,0xFF       Subtract immediate with carry
+000001C6:   9134        LPM       R19,Z          Load program memory
246:        phaseC = pgm_read_byte(&sineTable192x3[temp]);
+000001C7:   5987        SUBI      R24,0x97       Subtract immediate
+000001C8:   4F9F        SBCI      R25,0xFF       Subtract immediate with carry
+000001C9:   01FC        MOVW      R30,R24        Copy register pair
+000001CA:   9184        LPM       R24,Z          Load program memory


Which seems like a lot of unnecessary stuff.

I also tried with a pointer, which was a little bit better, but still no Z+.

Code:
  sineTable = sineTable192x3;
  sineTable += (uint16_t)angle*3;


  phaseA = pgm_read_byte(sineTable++);
  phaseB = pgm_read_byte(sineTable++);
  phaseC = pgm_read_byte(sineTable);

Code:
235:        sineTable += (uint16_t)angle*3;
+000001BA:   E093        LDI       R25,0x03       Load immediate
+000001BB:   9F89        MUL       R24,R25        Multiply unsigned
+000001BC:   01F0        MOVW      R30,R0         Copy register pair
+000001BD:   2411        CLR       R1             Clear Register
+000001BE:   59E8        SUBI      R30,0x98       Subtract immediate
+000001BF:   4FFF        SBCI      R31,0xFF       Subtract immediate with carry
238:        phaseA = pgm_read_byte(sineTable++);
+000001C0:   01CF        MOVW      R24,R30        Copy register pair
+000001C1:   9601        ADIW      R24,0x01       Add immediate to word
+000001C2:   9124        LPM       R18,Z          Load program memory
239:        phaseB = pgm_read_byte(sineTable++);
+000001C3:   01FC        MOVW      R30,R24        Copy register pair
+000001C4:   9134        LPM       R19,Z          Load program memory
240:        phaseC = pgm_read_byte(sineTable);
+000001C5:   9601        ADIW      R24,0x01       Add immediate to word
+000001C6:   01FC        MOVW      R30,R24        Copy register pair
+000001C7:   9184        LPM       R24,Z          Load program memory


Is there a way to make the compiler generate better assembly?
 
bearing
PostPosted: Feb 04, 2012 - 01:41 AM
Newbie


Joined: Mar 14, 2005
Posts: 6


Now it hits me that I should check what pgm_read_byte() actually does. It is defined to use this macro:

Code:


#define __LPM_enhanced__(addr)  \
(__extension__({                \
    uint16_t __addr16 = (uint16_t)(addr); \
    uint8_t __result;           \
    __asm__                     \
    (                           \
        "lpm % 0, Z" "\n\t"      \
        : "=r" (__result)       \
        : "z" (__addr16)        \
    );                          \
    __result;                   \
}))



There is no Z+ in this macro, only a Z. Is it possible for the compiler's optimizer to generate an LPM Rxx, Z+ from this? Or is my only option to write my own pgm_read_byte() macro?
Judging from AVR447, the IAR compiler can generate the LPM Z+ when reading a lookup table located in FLASH. It seems you actually get some really useful features by paying for the compiler.

(By the way, there seems to be a bug in the posting system of this forum. You can't make a post that contains the character % followed by the character 0: "Bad Request. Your browser sent a request that this server could not understand." It took me a while to find that problem, I can say. I added a space between those characters in the code above to work around it.)
 
SprinterSB
PostPosted: Feb 04, 2012 - 01:50 AM
Posting Freak


Joined: Dec 21, 2006
Posts: 1545
Location: Saar-Lor-Lux

Look at pgmspace.h. It all maps to inline assembly, so you will never get Z+ with pgm_read_byte from post-incrementing the address in C.

Your options are: write the code completely in assembler (fastest code; I use that for sin/cos lookup), use pgm_read_dword (which reads one byte too many, and you have to extract the bytes yourself), or write your own pgm_read_byte_inc inline assembler macro (I use that, too).
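
A minimal sketch of the last option, assuming an enhanced core that has LPM Rd, Z+ (the same assumption __LPM_enhanced__ makes). The macro body below is an illustration, not the macro actually used by anyone in this thread; demoTable4x3, phases_t and read_phases are hypothetical names, and the small table stands in for sineTable192x3. The "+z" constraint makes the pointer an in/out operand tied to the Z register pair, so the compiler knows the asm has advanced it. The #else branch is only a host fallback so the indexing logic can be checked off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
/* Hypothetical helper: read one byte from flash and post-increment the
   pointer. ptr must be a modifiable pointer variable (an lvalue). */
#define pgm_read_byte_inc(ptr)                  \
(__extension__({                                \
    uint8_t __result;                           \
    __asm__ (                                   \
        "lpm %0, Z+"                            \
        : "=r" (__result), "+z" (ptr)           \
    );                                          \
    __result;                                   \
}))
#else
/* Host fallback (assumption, for off-target checking only). */
#define PROGMEM
#define pgm_read_byte_inc(ptr) (*(ptr)++)
#endif

/* Stand-in table: 4 angles x 3 phases (the real table is 192 x 3). */
static const uint8_t demoTable4x3[4 * 3] PROGMEM = {
    0, 10, 20,
    1, 11, 21,
    2, 12, 22,
    3, 13, 23
};

typedef struct { uint8_t a, b, c; } phases_t;

/* Read the three consecutive phase values for one angle. */
static phases_t read_phases(uint8_t angle)
{
    const uint8_t *p = demoTable4x3 + (uint16_t)angle * 3;
    phases_t out;
    out.a = pgm_read_byte_inc(p);
    out.b = pgm_read_byte_inc(p);
    out.c = pgm_read_byte_inc(p);
    return out;
}
```

The third read also increments the pointer, which costs nothing extra here because LPM Rd, Z and LPM Rd, Z+ take the same number of cycles. Check the generated assembly for your own compiler version before relying on it.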
 
bearing
PostPosted: Feb 04, 2012 - 02:09 AM
Newbie


Joined: Mar 14, 2005
Posts: 6


Thank you!

It's a shame the compiler doesn't have built in support for reading tables in FLASH. It seems the inline assembly really screws with the optimization algorithms in the compiler. I usually don't code for AVR, but I've written some code lately, and been studying how the compiler optimizes the code, and I'm impressed.

Since I'm not familiar with the whole instruction set of this CPU, I'm hesitant to start writing inline assembly, but I think I'll give it a try now.
 
wek
PostPosted: Feb 04, 2012 - 07:20 AM
Raving lunatic


Joined: Dec 16, 2005
Posts: 3094
Location: Bratislava, Slovakia

Actually, it's SprinterSB who is working on more native handling of flash-located variables, but IMHO it's still a couple of months away from wider use.

You might get a sneak preview through the binary package he posted in this thread http://www.avrfreaks.net/index.php?name ... ourceforge (mind the title ;-) ).

All this said, I don't understand your hesitation to use assembler where you want tighter control over the resulting code (FLASH or RAM size, speed, whatever). If you are afraid of side effects, post it for discussion here; you can hardly get a more competent consultant in this field than SprinterSB. And if you think it decreases readability, just encapsulate it and comment extensively.

JW
 
SprinterSB
PostPosted: Feb 04, 2012 - 04:45 PM
Posting Freak


Joined: Dec 21, 2006
Posts: 1545
Location: Saar-Lor-Lux

bearing wrote:
It's a shame the compiler doesn't have built in support for reading tables in FLASH.
avr-gcc offers you a benefit/cost ratio of oo.

Thus, native flash support won't add to that infinite benefit/cost ratio.

And it's a shame that so few people contribute to a great project like GCC.

avr-gcc has native flash support as of 4.7. The tentative release date is around April, but you are invited to test the tools before the 4.7.0 release and report bugs so that they can be fixed and won't slip into official releases.

However, even the native flash support won't come up with assembler-like code; the feature aims at cleaner C source, not at squeezing out the last bit of performance.
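
For illustration, a minimal sketch of what such cleaner source can look like, assuming the __flash named address space qualifier that avr-gcc provides from 4.7 on in GNU C mode (for example -std=gnu99). The names flashDemo4x3, phases_t and read_phases_flash are hypothetical, and the small table stands in for the real 192 x 3 one. The #ifndef branch is only an assumption to let the sketch build on a compiler without that address space; it relies on avr-gcc predefining __FLASH when the qualifier is supported.

```c
#include <stdint.h>

#ifndef __FLASH
/* No named address space support here: make the qualifier a no-op so
   the sketch still builds on a host compiler. */
#define __flash
#endif

/* Stand-in table located in flash: 4 angles x 3 phases. */
static const __flash uint8_t flashDemo4x3[4 * 3] = {
    0, 10, 20,
    1, 11, 21,
    2, 12, 22,
    3, 13, 23
};

typedef struct { uint8_t a, b, c; } phases_t;

/* Plain C array access; no pgm_read_byte() calls in the source. */
static phases_t read_phases_flash(uint8_t angle)
{
    const __flash uint8_t *p = &flashDemo4x3[(uint16_t)angle * 3];
    phases_t out;
    out.a = p[0];
    out.b = p[1];
    out.c = p[2];
    return out;
}
```

Whether this yields LPM Rxx, Z+ still depends on the optimizer, so the hand-written route remains the way to guarantee a particular instruction sequence.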

GCC is *really* bad at pre- and post-modify optimizations. The reason is that a great deal of the optimizations take place on SSA, and by its very nature SSA cannot represent pre-/post-modify.

SSA representation in GCC ends around pass 150/230 where the machine specific phase is entered.

There is an auto-inc-dec pass around pass 180/230, but it works badly, and it's known that it has to be rewritten to yield better performance. IMO such a pass should be located after SSA but prior to the SSA → machine lowering, so that the machine part has the opportunity to emit pre-/post-modify in the first place.

For other machines or address spaces, good side-effect addressing is not as vital for optimal code as it is for AVR's flash, where no other addressing modes are available.

Compilers and human brains work considerably differently: where the code-crunching compiler is based on SSA and works on the level of electrons, neutrons and protons, the human brain works on the level of molecules and crystals or even more complex patterns.

I don't know if there are compilers that try to follow the brain approach, or if there is even research on finding new paradigms for compilers beyond the code-crunching approach.
 
All times are GMT + 1 Hour