What would be new, reasonable features for avr-gcc¹?
Features that
- Are useful and will actually be used
- Don't conflict with the standard
What would be new, reasonable features for avr-gcc¹?
Features that
ahhh. not c++. I was going to raise the old point about vtables in flash....
I am sure that there are some improvements but I think it is extraordinarily good already.
I don't think he's asking for specific recommendations; if I understand correctly, the question is basically whether it is OK to implement useful new features that may conflict with the standard.
Alex
EDIT:
I misinterpreted the question: he is asking for ideas for new features that don't conflict with the standard and are useful.
It's for recommendations and "what's missing".
More of Embedded C.
IIRC, avr-gcc 4.8 will (has) restore(d) fixed point; is that correct?
Atomic operations.
Perhaps the ability to squish up the vector table a little - an option to discard any sections of the vector table that aren't used so that normal program data can occupy them, which would reduce XMEGA application footprint. Or perhaps an option to use all unused entries in the vector table for normal data, so that you still save space even if you hook the first and last vector entries.
- Dean :twisted:
I would like to see a lot more warnings. We get away with way too much.
The simplest example I can think of is
int i; char c = i;
64bit double perhaps?
I don't say this because it's a feature I'm desperate to use on an 8bit micro but simply because it's a tick in the box on the specification list and has previously been cited as a reason why avr-gcc isn't as good as some competing compilers.
The implementation wouldn't have to be that efficient (just accurate) because let's face it if you use float / double you probably aren't that fussed about efficiency. ;-)
Perhaps a mechanism for marking in-line assembly so that it will not be reordered with respect to volatile accesses.
Perhaps by putting volatile in the clobber list.
memory clobber is a bit of a bludgeon.
For that matter, it might be good to be able to "pass" a pointer into in-line assembly and tell the compiler that only its target might be clobbered.
It would be wonderful if values of variables could be inspected during a debugging session. In my experience, this works less than half the time. I don't know whether AStudio's debugger or gcc's debug output is to blame, but from the fact that this situation didn't seem to improve going from AStudio4 to AStudio6, I tend to suspect that the compiler output files don't contain enough supplemental debugging information to let a debugger work.
And, yes, I'm talking about debugging optimized code (you know; the kind you MUST build if you want _delayXs() and perhaps other features to work). I know this is pretty arrogant of me, especially as I haven't the time to construct my own tools, but it seems to me that if *I* can figure out what's in the 16-bit variable gleemFleeble by "simply" opening a Disassembler window and reverse-engineering the code to deduce which register pair got assigned, then maybe the compiler that decided which registers to use should have mentioned that to the debugger.
I can imagine that better symbolic debug hints are probably hugely difficult to generate, but I think they'd be hugely appreciated, too.
64bit double perhaps?
If it hasn't been fixed already, I would like the ternary operator to generate the same code as if/else where applicable.
We had a long discussion about that a couple of months ago - one method turned out better than the other in all the cases mentioned in that discussion, but I have seen the opposite as well. So there is room for improvement of both.
I would also like to see improved nested inlining. I find myself duplicating code because nested inlining generates bigger code than doing the same thing in two inlined functions.
I am willing to spend time on coming up with cases that demonstrate the problems I describe in this thread - but I would like to know that they are likely to be addressed if I do. (That is, I don't want to spend time on it if nobody is going to do anything about it anyway.)
I need better optimisation with LTO in the following situation:
typedef struct {
    struct {
        volatile uint8_t* const pUDR;
        volatile uint8_t* const pUBRRL;
        volatile uint8_t* const pUBRRH;
        volatile uint8_t* const pUCSRA;
        volatile uint8_t* const pUCSRB;
        volatile uint8_t* const pUCSRC;
    } io;
    struct {
        uint8_t cbr;
        uint8_t byte_len;
        uint8_t parity;
        uint8_t stop_bits;
    } config;
    struct {
        fifo_t tx;
        fifo_t rx;
    } fifo;
    struct {
        on_txc_callback_t  on_txc;
        on_rxc_callback_t  on_rxc;
        on_udre_callback_t on_udre;
    } callback;
    owner_t owner;
    ...
} rs485_handle_t;

//=====================================================

static void rs485_txc_isr(rs485_handle_t* const rs)
{
    rs485_txen_off(rs);
    *rs->io.pUCSRB &= (uint8_t)~((1U<<TXEN) | (1U<<TXCIE) | (1U<<UDRIE));
    if (rs->callback.on_txc)
        rs->callback.on_txc(rs->owner);
}
It would be nice to have full unrolling of struct-field references. Often I have only one USART in use on an MCU... And I still (avr-gcc 4.7.2 + all new optimization options) see ugly Z->SFR sequences after compilation...
It would be nice to have ENDIAN attribute:
uint32_t BIG_ENDIAN x;
uint32_t y = x;   // swap bytes on the fly
x++;              // perform right math
if (x > y) {...}
It would be nice to enable the effect of pack and align pragmas and attributes.
Solve situation with
const __flash char* const __flash names[] = {"AAA","BBB","CCC"};
and put it all together in flash (like IAR does).
Perhaps the ability to squish up the vector table a little
- Dean :twisted:
I had a hack for the linker that does something like that - time to dust it off and send it out to the mailing list, I guess. See https://www.avrfreaks.net/index.p...
Not sure what the standards say about this but here's an area that could be improved.
a.cpp:
int a_value;

b.cpp:
extern double a_value;
void atest() { printf("%f\n", a_value); }
This compiles without error despite the variable a_value having been defined with two different types. I presume then GCC C++ doesn't put type information in the object files and therefore the linker has no way of detecting the issue. I tested this on another popular (non-AVR) compiler and it refused to link.
This is a job for lint (or similar static source analysis). You can also try -fno-common or LTO (-flto). With LTO, you'll have global type information.
This is a job for lint
cc1plus.exe: error: unrecognized command line option "-flto"
I have GNU C++ (WinAVR 20100110) version 4.3.3 (avr)
Try a current version of the compiler, for example 4.7.2. You can read the GCC release notes to see when what feature went in.
Oooh - I've always, always wanted a pragma/attribute to unroll a loop that has a constant number of iterations. See the code here:
https://github.com/abcminiuser/l...
I have to duplicate the code, so that I don't incur the loop overhead. It would be nice to write this as:
for (uint8_t i = 0; i < 16; i++) __attribute__((unroll(1))) Dataflash_SendByte(Endpoint_Read_8());
And have it unroll into a smaller number of effectively copy-pasted versions of the loop body, where the inner repetition could be specified (for example, "unroll these 64 iterations into 32 loops of the body repeated twice").
This way the same code would also work on other compilers as a normal loop with reduced efficiency.
- Dean :twisted:
You have a test case?
A #pragma or something to automatically unroll loops would be nice. E.g.
void foo(char i);

void bar()
{
#pragma unroll
    for(char i = 0; i < 8; i++) {
        foo(i);
    }
}
would generate code similar to
void foo(char i);

void bar()
{
    foo(0); foo(1); foo(2); foo(3);
    foo(4); foo(5); foo(6); foo(7);
}
If the unrolling used compile-time constants it would be even better - that would allow iteration over different template classes in C++.
Test case for what? Basically, I would want this:
void BodyFunction(void)
{
    volatile uint8_t* a = (volatile uint8_t*)0x1234;
    (*a)++;
}

void MyFunction(void)
{
    // Unroll 64 iterations into groups of 2
    for (uint8_t i = 0; i < 64; i++) __attribute__((unrolled(2)))
    {
        BodyFunction();
    }

    // Unroll 64 iterations into groups of 4
    for (uint8_t j = 0; j < 64; j++) __attribute__((unrolled(4)))
    {
        BodyFunction();
    }

    // Unroll 64 iterations into groups of 8
    for (uint8_t k = 0; k < 64; k++) __attribute__((unrolled(8)))
    {
        BodyFunction();
    }

    // Unroll 64 iterations into groups of 10
    for (uint8_t l = 0; l < 64; l++) __attribute__((unrolled(10)))
    {
        BodyFunction();
    }
}
To transform into the equivalent of this:
void BodyFunction(void)
{
    volatile uint8_t* a = (volatile uint8_t*)0x1234;
    (*a)++;
}

void MyFunction(void)
{
    // 64 iterations in groups of 2
    for (uint8_t i = 0; i < 32; i++)
    {
        BodyFunction(); BodyFunction();
    }

    // 64 iterations in groups of 4
    for (uint8_t j = 0; j < 16; j++)
    {
        BodyFunction(); BodyFunction(); BodyFunction(); BodyFunction();
    }

    // 64 iterations in groups of 8
    for (uint8_t k = 0; k < 8; k++)
    {
        BodyFunction(); BodyFunction(); BodyFunction(); BodyFunction();
        BodyFunction(); BodyFunction(); BodyFunction(); BodyFunction();
    }

    // 64 iterations in groups of 10, plus a remainder loop
    for (uint8_t l = 0; l < 6; l++)
    {
        BodyFunction(); BodyFunction(); BodyFunction(); BodyFunction();
        BodyFunction(); BodyFunction(); BodyFunction(); BodyFunction();
        BodyFunction(); BodyFunction();
    }
    for (uint8_t m = 0; m < 4; m++)
    {
        BodyFunction();
    }
}
Note that the last test case can be tricky - but requiring that the unroll amount evenly divide the loop count would be reasonable.
- Dean :twisted:
Dean,
I'll be damned. You posted that while I was working out my post - which includes a test case.
Snap! I got in before you, last post of the previous page :). Seems I'm not crazy, and this isn't such a bad idea after all.
Actually, I'm fairly certain other architectures would love to have this -- especially for architecture dependent speedups since you could adjust the unroll parameter based on the pipeline and branch cost of a particular target.
- Dean :twisted:
Are you sure you read about all the optimization options and --param knobs the GCC loop optimizers provide? You might get close to the code quality you want to achieve, without the need for language extensions or explicitly cluttering the source with directives. You may want to try the current development version (the future 4.8). New optimizations won't go into stable bugfix-only releases like 4.7.
> A #pragma or something to automatically unroll loops would be nice. E.g.
void foo(char i);

#pragma GCC push_options
#pragma GCC optimize "-O3"

void bar(void)
{
    for(char i = 0; i < 8; i++) {
        foo(i);
    }
}

#pragma GCC pop_options
You might get close to the code quality you want to achieve; without the need for language extensions or explicitly cluttering the source with directives. You may want to try the current development version (future 4.8). New optimizations won't go into stable bugfix-only releases like 4.7.
I want to use -Os for the whole project, but unroll some loops for speed. (And, in the case of C++ templates, this could make a source code loop possible.)
> A #pragma or something to automatically unroll loops would be nice. E.g.

void foo(char i);

#pragma GCC push_options
#pragma GCC optimize "-O3"

void bar(void)
{
    for(char i = 0; i < 8; i++) {
        foo(i);
    }
}

#pragma GCC pop_options
For the case at hand it seems a bit heavy, though. Having the option to only unroll the loop would be better. Changing the optimisation level may have side effects.
You can use #pragma GCC reset_options to switch back. It's not actually like push / pop, but it should be enough to implement what you want.
I mean avr-gcc from gcc.gnu.org
Shoot; I'm so far behind I don't even know what's IN the current version.
Is there any way to access the carry bit value? That's one of the things that I see relatively frequently in my asm vs C contemplations; asm can pretty much do 9-bit math, while C can only do 8bit. This is especially annoying when you try to do something that involves rotate (CRC-like calcs, for instance.)
@Jörg: ever tried the optimize function attribute?
> @Jörg: ever tried the optimize function attribute?
No, but I didn't have much need for optimization switching inside
a compilation unit so far. What I did use though in the past was
to disable just a single warning for the next function (like,
"unused arguments") through similar pragmas.
The entire point of my posting above was to demonstrate that the
functionality that was requested is already available right now.
FYI, you can also accomplish that without attributes:
void f (char c) { (void) c; }
#define unroll1(count, x)   if((count) & 1) { x }
#define unroll3(count, x)   unroll1((count) & 1, x) unroll1((count)/2, x x)
#define unroll7(count, x)   unroll1((count) & 1, x) unroll3((count)/2, x x)
#define unrollF(count, x)   unroll1((count) & 1, x) unroll7((count)/2, x x)
#define unroll1F(count, x)  unroll1((count) & 1, x) unrollF((count)/2, x x)
#define unroll3F(count, x)  unroll1((count) & 1, x) unroll1F((count)/2, x x)
#define unroll7F(count, x)  unroll1((count) & 1, x) unroll3F((count)/2, x x)
#define unrollFF(count, x)  unroll1((count) & 1, x) unroll7F((count)/2, x x)

#define unroll2(count, x)   unroll1((count)/2, x x) unroll1((count) & 1, x)
#define unroll5(count, x)   unroll2((count)/2, x x) unroll1((count) & 1, x)
#define unroll10(count, x)  unroll5((count)/2, x x) unroll1((count) & 1, x)
#define unroll20(count, x)  unroll10((count)/2, x x) unroll1((count) & 1, x)
#define unroll50(count, x)  unroll10((count)/5, x x x x x) \
                            unroll2((count) % 5 / 2, x x) \
                            unroll1((count) % 5 % 2, x)
#define unroll100(count, x) unroll50((count)/2, x x) unroll1((count) & 1, x)
#define unroll200(count, x) unroll100((count)/2, x x) unroll1((count) & 1, x)
#define unroll255(count, x) unrollFF(count, x)
The first parameter is the number of times the second will be executed.
The macros are named after the maximum allowed value of the first parameter.
One doesn't want too many extraneous copies of the second parameter.
Optimization should take care of the ifs.
Of course, optimization, especially -Os might turn them back into loops.
Hmm. How about 64-bit floating point? Can I claim that that's gcc rather than avr-libc if I'm willing to settle for the standard gcc soft floating point code? (And just how bad IS that, on AVRs?) The tiny avr libm code is swell, but as AVR flash space gets bigger (including XMega), being limited to 32bit precision is getting a bit embarrassing.
If a 64bit FP library remains elusive, how about putting in the infrastructure and defining the ABIs at the avr-gcc level, so that it would be easier to experiment with partial libraries?
Cliff suggested 64-bit double further up in this thread. I think it is a good idea if (and only if) it can be an option. I assume that would require a lot of work.
One problem with just implementing double as a 64-bit type is promotion rules and library interfaces. Existing code will incur data size and performance penalties. Sometimes, you probably want that because of the improved accuracy, but other times probably not - if the code already works, why "improve" it ?
One problem with just implementing double as a 64-bit type is promotion rules and library interfaces. Existing code will incur data size and performance penalties. Sometimes, you probably want that because of the improved accuracy, but other times probably not - if the code already works, why "improve" it ?
Note that the last test case can be tricky - but requiring that the unroll amount evenly divide the loop count would be reasonable.
#define justonce ...

for(unsigned char j = 64/10; j; --j) {
    unroll10(10, justonce);
}
unroll5(64 % 10, justonce)

#define chunker(msize, mrem, count, size, x) \
    for(unsigned char j = (count)/(size); j; --j) { msize(size, x); } \
    mrem((count) % (size), x)

chunker(unroll10, unroll5, 64, 10, justonce)
The standard math functions have float as well as double versions.
Existing code should have used them.
Functions that use variable arguments lists will have to deal with the promotion to double.
Wouldn't this loop-unrolling stuff be fairly easy to add as an m4 macro, by bringing back the ability to specify m4 pre-preprocessing (embarrassment in advance if the gcc script already takes an obscure flag for doing that)? All you need is a more powerful preprocessor that can do conditionals & loops...
The standard math functions have float as well as double versions.
Existing code should have used them.
Nope. Existing code *does* use "double", but they are limited to 32
bits. Still, it's completely OK for the code to use the default
(double) functions rather than their "f" counterparts.
If double is optionally implemented as 64 bits (something I'd also
like to see), it requires two sets of respective libraries which
have to be chosen by the compiler at link-time.
skeeve wrote:The standard math functions have float as well as double versions.
Existing code should have used them.
Functions that use variable arguments lists will have to deal with the promotion to double.
But the promotion rules apply to expressions as well ?
ChaunceyGardiner wrote: In the OT days, but not now.
skeeve wrote: The standard math functions have float as well as double versions.
Existing code should have used them.
Functions that use variable arguments lists will have to deal with the promotion to double.
But the promotion rules apply to expressions as well ?
int i; int j = i / 1.234f;
I'm not sure if this really belongs to this thread ... I tried IAR today for the first time. I tried an existing project, for which I normally use WinAVR. I ended up with a little bit smaller code. I looked at the code and noticed that in many cases several identical sequences of code get replaced with a subroutine and an rcall. In the past I used to use CodeVision, and I know that it does that too (well, at least v1 did). In the IAR case, for example, if there are several assignments to the same variable, which normally would get translated to a 4 byte STS, IAR would instead generate a subroutine with an STS, and several 2 byte RCALLs.
Does GCC do anything like that? I see that there is a -ftree-tail-merge, but I guess that is only for "tails". Is there something for identical sequences in the "middle"? If not, can something like that be added?
Regarding new features in general - I'm really quite happy with avr-gcc the way it is. Better optimization for size is the only thing that I would want (so that we beat those other guys).
There is cfo-branch (code factoring optimization), which was abandoned several years ago and removed from mainline because it turned out not to work well. The command line option was -frtl-abstract-sequences.
I think you are aware that this is work of several man-years (not counting the study of the theoretical background, testing, etc...)
Compiler-generated functions seems like a very bad idea to me. At the very least, it would have to be off by default even when -Os is used.
Breaking out functions can reduce the code size a lot. But I don't think that well written programs will gain much code density.
This optimization can help with spaghetti code from auto-generators that are agnostic to the mess they produce. I saw parts of auto-generated C-code from a customer with functions with 96 (yes, close to 100) arguments!
I am not sure if it would work well as a mini-optimization with load / store: you'd have to fix the involved GPRs, which may lead to additional MOVs to adjust the registers. It would work well if there were no sophisticated register allocation (as in BASCOM), but with GCC you must be very, very careful or the effect is completely detrimental.
Besides that, the complexity of such a work is far beyond anything I can do in my spare time.
I'm really quite happy with avr-gcc the way it is. Better optimization for size is the only thing that I would want
skeeve wrote:
ChaunceyGardiner wrote: In the OT days, but not now.
skeeve wrote: The standard math functions have float as well as double versions.
Existing code should have used them.
Functions that use variable arguments lists will have to deal with the promotion to double.
But the promotion rules apply to expressions as well ?
Are you saying that the following code will not use double ?
int i; int j = i / 1.234f;
ChaunceyGardiner wrote: Yes.
skeeve wrote:
ChaunceyGardiner wrote: In the OT days, but not now.
skeeve wrote: The standard math functions have float as well as double versions.
Existing code should have used them.
Functions that use variable arguments lists will have to deal with the promotion to double.
But the promotion rules apply to expressions as well ?
Are you saying that the following code will not use double ?
int i; int j = i / 1.234f;
There is something to this, though - making double a 64-bit data type will most likely carry a performance penalty for some expressions, such as:
int i; int j = i / 1.234;
There is something to this, though - making double a 64-bit data type will most likely carry a performance penalty for some expressions, such as:
int i; int j = i / 1.234;
ChaunceyGardiner wrote: No, it won't if it is properly implemented :-)
There is something to this, though - making double a 64-bit data type will most likely carry a performance penalty for some expressions, such as:
int i; int j = i / 1.234;
Are you saying that a constant in the form N.N is not a double, or that the compiler will treat that expression as a float division even if the constant is a double?
How about this:
float f; f = f / 1.234;
There must be a penalty somewhere in this neighborhood. Maybe it has to involve a constant that can't be properly dealt with as a float ?
The 32-bit doubles are hard-coded in avr.h and ignore -fno-short-double. Therefore, double support would just need

#define DOUBLE_TYPE_SIZE      (flag_short_double ? 32 : 64)
#define LONG_DOUBLE_TYPE_SIZE (flag_short_double ? 32 : 64)

together with setting -fshort-double as the default.
64-bit doubles would be an ABI change and therefore the default for sizeof (double) should still be 4.
Users who need 64-bit would use -fno-short-double.
SprinterSB,
You just confirmed that I am right about the performance penalty associated with 64-bit double, including expressions like one or two of those I posted above.
I am glad to see that 64-bit double will be optional. That was my main concern:
Cliff suggested 64-bit double further up in this thread. I think it is a good idea if (and only if) it can be an option.
One problem with just implementing double as a 64-bit type is promotion rules and library interfaces.
One related question remains, and that is how the standard libraries will deal with this option.
-fno-short-double must be promoted to a multilib option (just like -mmcu= or -msp8).
Headers might need adjustment per __SIZEOF_DOUBLE__ or __builtin_types_compatible_p, and of course the standard libraries must adopt libgcc's multilib layout.
avr-libc does not automatically adopt the multilib layout from gcc, and even if it did, it would not help, because the underlying routines (printf, sin, atanh, etc.) are missing.
The conclusion is to use newlib as your LibC of choice: newlib supports -fshort-double as a multilib option and its headers contain proper prototypes (avr-libc contains stuff like #define sinf sin).
The primary focus is not performance, it's ABI compliance and stability.
The performance, however, will be so bad that nobody will use it.
If people really need higher precision, a way out would have been long long accum (1 sign bit, 16 integral bits and 47 fractional bits). But as I just learned, nobody will use fixed-point either (long long accum is a GCC extension and is not even mentioned in the non-standard TR 18037).
The next conclusion is to switch to a reasonable platform for your project: ARM, PowerPC VLE, Renesas V850, Infineon XC2000, TI MSP430, Atmel AVR32, Infineon TriCore, Microchip PIC32, embedded MIPS, National CompactRISC, Renesas H8, ...
There are so many choices that are so much more reasonable if double is inevitable in your design...
Back to AVR. Even if only some lines have to be changed in avr-gcc, this won't go into 4.8. It's too intrusive for the current stage.
Skimming the proposals, I get the following summary:
Issues that are too com(P)lex for me, (N)ot AVR-specific, the realm of Lib(C) / (C)rt0 / (B)inutils, already e(X)isting, (Q)uestionable, not (F)easible, or in conflict with the (S)tandard and needing a non-compliant GCC extension
Some remarks on the above list:
From time to time I contribute to the AVR backend of GCC in my spare time. The AVR backend is a vanishingly small part of the GCC code base (currently 32000 LOC in 37 files), far less than 1% of the whole. Almost any part of GCC is complex, highly non-trivial, riddled with technical debt and legacy structures, etc.
Therefore, I focus on the AVR part, which has no impact or side effects on other architectures, the front ends, the optimization passes, the general structure of GCC, etc. And I am focusing on avr-gcc (not Binutils, not AVR-Libc, not avr-g++).
This limits the number and complexity of features I am considering as appropriate to be handled in my spare time. And what to add to the compiler is also a matter of preference and personal interest. (Anyone who is unhappy with these preferences can contribute to the project and put more emphasis on his / her personal preferences.)
The eXisting marker above means that the feature, or a very similar one, is already present in GCC.
For example, adding a #pragma to force / inhibit a particular transformation has almost zero chance of going into GCC: it would result in hundreds of #pragmas, one for every imaginable optimization / transformation.
GCC's approach is to keep the code clean and provide command line switches and parameters to fine-tune the optimizers. Have you really read and tried all the command options and params GCC provides for the unroll tuning you are trying to achieve? You won't find your exact requirement, but very likely you'll find one that's very close.
For features like warnings, where you think they can be improved, I'd propose that you file a problem report in GCC's bugzilla or ask on the gcc-help@ mailing list. If you are really interested in the feature, there is no way around getting in touch with the GCC developers. Maybe you found a bug or shortcoming, maybe you misunderstood something, maybe it is for historical reasons, maybe it is too complex or nobody has his / her focus on the matter.
In any case, make a proper problem report: say what compiler version you are using, what command line switches, what source code, and what diagnostics it produces or is intended (not) to produce. Stuff like "AVR Studio" means nothing to the general GCC developer, and so does #include
Make sure you are using a supported version of GCC (at least 4.6; in 1 or 2 months it will be at least 4.7). If the issue is not AVR-specific, you can try to reproduce it on a primary (very important) platform like x86, PowerPC or ARM. Problems with tertiary (what the fuck is...?) platforms like AVR are automatically low priority and get less attention.
Attitudes like
Coming back to the feature list, the following points are remaining:
Fixed-point Support
I see the point that it is non-standard. Most applications that need fixed-point have their own, home-brew, more flexible, standard-compliant fixed-point implementations.
At least a single user would find it helpful, but nobody uses the fixed-point support that is available in the widespread avr-gcc 4.6 from Atmel Studio.
So either remove the scrap and clean up the compiler, or bring it to a reasonable implementation. You made me really unsure...
The open-coded C arithmetic in libgcc will have a bloat factor of (estimated) 3 to 10 compared to an assembler implementation, maybe even higher.
Will AVR-LibC implement double? Or will you use newlib?
Try a current version of the compiler, maybe it's already fixed. Some known code size problems are hard to fix. If the problem is not already reported, file a report (see above) so that the issue is known to the developers, won't be forgotten and there is a test case that can produce the artifact. Set "target" to "avr".
For some basic arithmetic like shifts, I thought about a faster implementation even at -Os, but that might cost some more instructions per shift. Shift loops are very slow, but shifts are very basic in programming and not uncommon.
I don't know if it is wise to be more catholic than the pope with code size vs. speed at -Os. If saving one or two instructions costs you several hundred cycles, what should be done? Be overly strict, or use a reasonable implementation at the expense of 1 or 2 more instructions?
This is also hard because there are no benchmarks — neither for code size nor for speed — for avr-gcc. Official benchmarks like Autobench, Coremark, Spec2000 etc. obviously make absolutely no sense. Same for the programs in the GCC test suite.
Finally, I am astonished about some missing points like "more device support". So it turns out that the current compiler covers a reasonable subset (~180 or so) of the hundreds of AVR derivatives.
The AVR backend is a vanishingly small part of the GCC code base (currently 32000 LOC in 37 files)
"more device support"
64-Bit double
Some answers and thoughts are above. I see the "Standard C" point and the precision argument. But I cannot imagine anybody will really use this. Will this really be preferred over a fixed-point implementation? Who will provide a reasonable implementation of the base arithmetic?
The open coded C arithmetic in libgcc will have a bloat factor of (estimated) 3 to 10, compared to an assembler implementation maybe even higher.
Will AVR-LibC implement double? Or will you use newlib?
On processors without hardware multiply, one might as well use 4 and 8 bytes.
Using base 0x100 won't offer much, if any, speed up.
Would that distinction cause a problem?
Ah. Would you be willing to explain, in a way that might be understandable to someone who flunked their compiler class 30+ years ago, what kind of functionality is located in the "back end" and how it is interfaced to?
I think the back end refers to the last two.
Others can be more specific about the "one or more"s.
Quote:
Ah. Would you be willing to explain, in a way that might be understandable to someone who flunked their compiler class 30+ years ago, what kind of functionality is located in the "back end" and how it is interfaced to?
The AVR backend is a vanishingly small part of the GCC code base (currently 32000 LOC in 37 files)
• gcc/config/avr/
• libgcc/config/avr/
There are a few more parts (not mentioned above) that deal with making the AVR BE known to configure, e.g. adding configure --target=avr support, documentation, and AVR-specific tests.
Compiling a program like
The first, important intermediate representation (IR) is tree-SSA. It's target independent but the optimizers may take into account --param parameters and -f or -O command options to scale the passes that deal with tree-SSA.
The first ~150 of ~230 passes deal with this IR. After that, the code is lowered to RTL (register transfer language) which is basically an algebraic, lisp-like representation of target instructions.
The remaining ~80 passes deal with RTL. When the IR is lowered to RTL, the program has to be represented by the given building blocks: the insns written in RTL. This step is the analogue of an assembler programmer constructing an algorithm out of a given set of building blocks (the assembler instructions).
The smaller the pass number, the more the code looks like the original C code (the i-file); the higher the pass number, the more it looks like the produced assembler code (the s-file).
GCC is a retargetable compiler and you can write a new backend to support your favorite architecture. You can support your own FPGA instruction set design if you like and have all of GCC's option passing and optimization and library framework available.
The target specific bits are described in the BE:
An example of a simple instruction in the avr.md machine description is 16-bit subtraction. The RTL optimizers just use the meta information and the RTL representation; the instruction output for the 16-bit subtract is irrelevant to them and effectively an agnostic printf to the s-file. It may take advantage of mini-optimizations, of course.
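Schematically (this is a simplified sketch, not a verbatim quote of avr.md; the real pattern has more operand alternatives), such a define_insn couples the RTL pattern the optimizers reason about with the assembler template that gets printed to the s-file:

```
(define_insn "subhi3"
  [(set (match_operand:HI 0 "register_operand" "=r")
        (minus:HI (match_operand:HI 1 "register_operand" "0")
                  (match_operand:HI 2 "register_operand" "r")))]
  ""
  "sub %A0,%A2\;sbc %B0,%B2")
```

The optimizers only see the (minus:HI ...) pattern; the two-instruction sub/sbc sequence is just the text emitted at the very end.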
This structure makes it easy to add a new target architecture to GCC: for a first, sketchy implementation for an architecture, an experienced GCC guy would need around 2-3 months. The "easy" is compared to writing a complete compiler from scratch.
However, this structure also limits what a backend can achieve. Leaving the backend's sandbox, changes are no longer target specific and will affect all other backends, too.
Wow.
Quote:However, this structure also limits what a backend can achieve. Leaving the backend's sandbox, changes are no longer target specific and will affect all other backends, too.
With LLVM it's the same: if you change the inline machinery (not only costs or heuristics), every target is affected. Similar for the C front end etc. You are limited to the hooks or methods you can override. Correct me if I'm wrong.
It's also possible in GCC to write complete, target-specific passes. But it is costly w.r.t. maintenance and development. Nobody has done this so far.
Yet another way to hook in is plugins like DragonEgg. You don't even need to change the compiler sources for that.
Quote:I wouldn't mind seeing a more "modular" method of specifying a target; sort of like the 68000 series where you dig out appropriate values for -mno-bit-fields and similar instead of having a separate name for every possible chip. It's a bit silly for avr-gcc to have separate switches for m168, m168p, m328, m328p, AT THE COMPILER LEVEL when they're pretty much the same except for memory size, isn't it?"more device support"
But I was told that the "68000 approach" is too complicated for our users, that they won't understand it, won't use it, and that only 1-click solutions will be used.
Nonetheless I wrote some lines in the Wiki and filed 2 extension requests.
The "modular" approach means that Binutils have to learn the instruction set: PR15043
The avr tools currently support around 200 devices and there are more, unsupported devices. It makes hardly any sense to add a -mmcu=device switch for each and every target: for binutils, it's enough to know the instruction set architecture (ISA) to assemble for.
Thus, command line switches like -mdes and -matomic could greatly reduce the time until new devices are supported, because the compiler need not wait until respective support in binutils is available:
The compiler could just call avr-as with -mmcu=core -mdes if it knows that the device supports the DES instruction and with -mmcu=core -mno-des, otherwise.
The options shall work as follows:
With -mdes, the assembler will accept and assemble the DES instruction. With -mno-des, the assembler will issue an "unknown instruction" error.
If a device is specified with -mmcu= then the assembler knows the right setting for -m[no-]des and uses it, provided it is not explicitly overridden by -m[no-]des. Alternatively, -m[no-]des could be ignored in that case.
If a core like avr4 definitely does not support the DES instruction, the option is ignored and the behavior is the same as with -mno-des.
Similar for -m[no-]atomics and the instructions XCH, LAC, LAS, LAT.
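The proposed driver behavior, sketched as command lines (the -mdes/-mno-des options are this thread's proposal, not existing avr-as flags; the core names avrxmega7 and avr5 are the real avr-as architecture names for these devices):

```
# XMEGA device with DES support: the driver knows the feature set
avr-gcc -mmcu=atxmega128a1 -c foo.c
#   driver calls: avr-as -mmcu=avrxmega7 -mdes foo.s

# classic device without DES: DES instructions would be rejected
avr-gcc -mmcu=atmega328p -c foo.c
#   driver calls: avr-as -mmcu=avr5 -mno-des foo.s
```

Only the driver has to learn about a new device; the assembler only needs the ISA plus the feature flags.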
There was the proposal to supply an external file that GCC would read and evaluate. This is a bad idea, IMO, because all the compiler driver (avr-gcc) does is call the tools with specific options: compiler proper, assembler, linker.
The conclusion is to call the compiler with these options, e.g. provide a Makefile snippet and use $(MY_MMCU) instead of -mmcu=my_mmcu.
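A sketch of such a Makefile snippet; the variable name MY_MMCU and the option values are placeholders the user would adapt:

```make
# For a device the tools already support:
MY_MMCU = -mmcu=atmega328p
# For an unsupported device, explicit core/feature options instead
# (using the -m[no-]des option proposed above):
# MY_MMCU = -mmcu=avr5 -mno-des

main.elf: main.c
	avr-gcc $(MY_MMCU) -Os -o $@ $<
```

New-device support then becomes a one-line change in the project Makefile instead of a toolchain rebuild.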
The assembler source code of the startup code ./crt1/gcrt1.S could be added to the installation. This makes it easier for the user to adjust the startup code and / or compile it for devices that are not yet supported by avr-gcc, as outlined in avr-gcc Wiki: Supporting "unsupported" Devices.
The only dependency should be #include
. Other dependencies like macros.inc and sectionname.h can easily be avoided altogether, which makes using gcrt1.S more straightforward.
avr-gcc and avr-as support the -mmcu=device command line option to generate code for a specific device. Currently (2012), there are more than 200 known AVR devices and the hardware vendor keeps releasing new devices. If you need support for such a device and don't want to rebuild the tools, you can:
- Sit and wait until support for your -mmcu=device is added to the tools.
- Use appropriate command line options to compile for your favorite device.
Approach 1 is comfortable but slow. Lazy developers that don't care about time-to-market will use it.
Ideas:
Add an __eeprom keyword similar to __flash, or ideally the ability to add custom sections with user-defined read and write functions. For example __eeprom_i2c or __flash_at45 sections.
Quote:Add an __eeprom keyword similar to __flash, or ideally the ability to add custom sections with user-defined read and write functions. For example __eeprom_i2c or __flash_at45 sections.
Embedded C knows two different kinds of address space qualifiers:
The way the eeprom must be read / written is too complicated to be appropriate for an intrinsic space. There is too much hardware and SFR stuff involved that I would not integrate into the compiler.
I already decided that I'll give the fixed-point support a roundup. It makes no sense to remove it because it might not be used frequently. It makes the backend more complicated, but if it destabilized the backend in an inappropriate way, I would not have added / ported it in the first place.
Almost all of the features mentioned in this thread are far beyond anything I will consider. Rounding up fixed-point is one reasonable feature (for its complexity and my time frame, AVR-specific, not already implemented or inappropriate, ...)
This forum was a really bad place to ask; next time I will use the mailing lists.
Quote:The way the eeprom must be read / written is too complicated to be appropriate for an intrinsic space. There is too much hardware and SFR stuff involved that I would not integrate into the compiler.
eeprom int foo = 37;
int bar;

bar = foo;
and it does indeed generate EEAR/EEDR code to read foo.
Anybody who feels inclined to add this to GCC is invited to propose a working patch. I won't, it's insane to hack more than 500 SFR definitions into the compiler.
Anybody who wants to implement application-defined address spaces in the compiler can do so if he likes. I won't; it's beyond my time frame.
Besides that, such a feature is diametrically opposed to making the tools more independent and providing easier device support, as outlined by Bill in https://www.avrfreaks.net/index.p...
Quote:Anybody who feels inclined to add this to GCC is invited to propose a working patch. I won't, it's insane to hack more than 500 SFR definitions into the compiler.
No doubt I'm oversimplifying things, but couldn't this just map the reads and writes to a known function name that avr-libc could then map to? The EEPROM code is already in avr-libc, it just needs to be automatically wired up.
- Dean :twisted:
Read eeprom.h to see why this cannot work.
How is it with Joerg's patches now?
Are they needed now ?
Could they be built in so no patching ?
What did they do anyway ?
John
What are "Jörg's patches"?
Quote:What are "Jörg's patches"?
They hang out around here.
http://www.freebsd.org/cgi/cvsweb.cgi/~checkout~/ports/devel/avr-gcc/files/
Bingos scripts, the sticky at the top of this section, explain how to use them.
John
These patches are for 4.5.x and completely outdated.
If you need them for 4.8, you'll have to throw them away and reimplement them.
Some stuff like PR46779 and PR18145 is my work and already integrated; for example, Jörg backported PR46779 to 4.5 before it was upstream.
Fixed-point support is based on Sean's work but quite different now w.r.t. its details.
Nobody ever proposed Tiny10 support, and IMO it makes no sense for 4.8, which will be released before very long (~ 1-2 months or so). The typical approach is that work is duplicated, i.e. added to the Atmel port and then reimplemented again for official support.
The device support is a mess. I lost track over the > 200 devices, I gave up on that.
OS_main and OS_task are upstream since 4.7, after I found documentation.
NVM attribute is not supported because there is no documentation. Adding undocumented stuff is pointless.
Another feature request:
Add warning or error in case of:
extern const char* p;
extern const char __flash* pf;
extern const char x;
extern const char __flash xf;

p = pf;   // error
pf = p;   // error
pf = &x;  // error
p = &xf;  // error
like IAR does.
I think, this is very useful...
Quote:
extern const char* p;
extern const char __flash* pf;
extern const char x;
extern const char __flash xf;

p = pf;   // error
pf = p;   // error
pf = &x;  // error
p = &xf;  // error
:6:1: warning: type defaults to 'int' in declaration of 'p' [enabled by default]
 p = pf; // error
 ^
:6:1: error: conflicting types for 'p'
:1:20: note: previous declaration of 'p' was here
 extern const char* p;
                    ^
:6:5: warning: initialization makes integer from pointer without a cast [enabled by default]
 p = pf; // error
     ^
:6:1: error: initializer element is not constant
 p = pf; // error
 ^
:7:1: warning: data definition has no type or storage class [enabled by default]
 pf = p; // error
 ^
:7:1: warning: type defaults to 'int' in declaration of 'pf' [enabled by default]
:7:1: error: conflicting types for 'pf'
:2:28: note: previous declaration of 'pf' was here
 extern const char __flash* pf;
                            ^
:7:1: error: initializer element is not constant
 pf = p; // error
 ^
:8:1: warning: data definition has no type or storage class [enabled by default]
 pf = &x; // error
 ^
:8:1: warning: type defaults to 'int' in declaration of 'pf' [enabled by default]
:8:1: error: conflicting types for 'pf'
:2:28: note: previous declaration of 'pf' was here
 extern const char __flash* pf;
                            ^
:8:6: warning: initialization makes integer from pointer without a cast [enabled by default]
 pf = &x; // error
      ^
:9:1: warning: data definition has no type or storage class [enabled by default]
 p = &xf; // error
 ^
:9:1: warning: type defaults to 'int' in declaration of 'p' [enabled by default]
:9:1: error: conflicting types for 'p'
:1:20: note: previous declaration of 'p' was here
 extern const char* p;
                    ^
:9:5: warning: initialization makes integer from pointer without a cast [enabled by default]
 p = &xf; // error
That throws plenty of errors and warnings, because those assignments are at file scope. Inside a function:
int main(void)
{
    static const char __flash strf[] = "const flash str";
    static const char str[] = "const ram str";

    const char* p = strf;          // error
    const char __flash* pf = str;  // error

    p = pf;   // error
    pf = p;   // error

    return 0;
}
avr-gcc-4.7.2 -S -mmcu=atmega8 -Os -pedantic -Wall -Wextra -std=gnu99 main.c
And we have no warnings and no errors...
http://gcc.gnu.org/bugzilla/show...
Thx to Georg-Johann Lay!
Another feature request, based on http://gcc.gnu.org/ml/gcc-help/2007-07/msg00342.html:
Add
I don't see the need for a header like
__FOO, however, starts with __ and is thus a matter of the implementation, not of the application. This means avr-gcc may implement __BIG_ENDIAN, __PDP_ENDIAN, __IEEE_BIG_ENDIAN and similar however it likes.
Thank You for your answer!
I know that
And I see all endian-related gcc symbols...
echo "" | avr-gcc -E -dM -x c - | sort

#define __ORDER_LITTLE_ENDIAN__ 1234
#define __ORDER_BIG_ENDIAN__ 4321
#define __ORDER_PDP_ENDIAN__ 3412
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
But it is annoying to handle all this stuff for a big number of compilers (gcc, armcc, iar, keil, intel, ...).
IMHO it's time to standardize
Well, then just propose patches to all these great compilers. Posting here is pointless as you know...
In my project, which uses 160kB flash (>128kB of code and >30kB program data), I suffer from two things which I think should belong to gcc.
1. The possibility to get the return address, or even better to collect a call trace.
- This is important for tracing rare errors: at runtime I can detect in a function that some sporadic internal error happened, but it is impossible for me to check the call chain. I can only check debug logs, and often the information needed for a specific scenario is missing or not verbose enough at that moment.
2. The possibility to somehow initialize a far pointer at load time, e.g.:
const char pstr_can[] __attribute__((__section__(".progmem_far"))) = "Txt";
uint_farptr_t name = (uint_farptr_t)pstr_can;
throws the error: error: initializer element is not constant
regards
You can use avr-gcc 4.7, binutils 2.23 and the 24-bit pointers from the __memx address space.
You can get the return address from __builtin_return_address(0). 0 is the only supported level. Higher levels would require an ABI change and a new frame layout.
About __memx: I updated to binutils 2.23 (gcc was 4.7.2) and initialization seems to work now, but I am not sure if this is usable for me, because by default __memx variables are placed into .progmem.data, which is located at the beginning of flash.
Moving .progmem.data to the end of flash is not an option, because it will move other variables too.
Code below is not working correctly I think:
const __memx char pstr_tst[] __attribute__((__section__("._progmem_far"))) = "Tst";
The progmem_far section attribute is ignored and the variable is placed in .progmem.data; is this ok or a bug?
Is it possible somehow to change __memx object placement from .progmem.data to something different, or to distinguish __memx objects in the linker script?
About 64-bit floating point.
What about making float 32-bit, double 32-bit and long double 64-bit?
This would not change existing code, since the promotion rule for functions with a variable number of arguments is from float to double.
But it would make it possible to use 64-bit floating point when it is really necessary.
There are some 64-bit floating point libraries on the internet; since people wrote them, they were evidently needed.
Maybe I am writing in the wrong topic, but I hope that Joerg reads this message.
With the advent of the 24-bit integer data type, it would be logical to implement in stdlib:
a type div24_t
and a function div24 (similar to div and ldiv).
Thx!