In the hope this will help at least a bit.
As for resorting to inline assembler, I agree that would let me express the operation succinctly. The mechanisms that gcc's inline assembler requires you to use in order to reference the C variables, though, are more complex and (to me) obscure than the most bizarre C++ feature ever thought of being on its worst day. In the gcc docs, the "asm constraints" are explained by statements like, "When making an adjunctive covariant reference to an interstitially reflexive variable, prefix a treble-clef character to the intermediate term's constraint". I do use inline asm for cases where access to the carry or T flag(s) is key. But not being able to understand the documentation (which I suspect requires more knowledge about the compiler internals than I'm ever likely to have) makes each use fraught with worry. I try to be sparing. I have to remain in awe of folk like you who at least *seem* comfortable with it.
A few conclusions:
- in my opinion, in cases where tighter control over mcu resources (mainly execution time, but also memory) is required, assembler is to be deployed.
There are two forms of it: inline or not.
- complete functions in assembler are preferred because of the issues described below. These can be conveniently written in "normal" syntax in a separate ".S" file and subsequently linked with the rest.
The avr-gcc ABI is to be observed for the "rules" such as parameter passing, register usage etc.
The function can be written first in C, then compiled only (using e.g. the -S command-line switch to avr-gcc), and the resulting assembler file can then be used as a template or starting point for modifications.
- Inline assembler is preferred for short snippets to be inlined into C functions, possibly replacing a poorly compiling portion, where the overhead of a function call is intolerable and/or intensive use of variables already present in registers is a bonus. In rare cases, inline assembler can be used to implement features unimplemented by the C compiler, e.g. Carlos Lamas' GET_FAR_ADDRESS() macro.
- Generally, there is not enough quality documentation on the inline assembler's details. Basic reading is the "cookbook", but it lacks a couple of details needed for real insight. The following text attempts to explain these bits and should be read in conjunction with the "cookbook" (ideally, parts of this text could be merged into it in the future).
The reference source is supposed to be the relevant portions of gcc manual (or, better than the online version, the offline version installed alongside with the compiler at [WinAVR]/doc/gcc/HTML/gcc-[version]/gcc/Extended-Asm.htm and [same path]/Constraints.htm); however, that is notoriously poorly organized and sparse.
- The exact effect of the "volatile" keyword in conjunction with the "__asm__" keyword may differ from what some expect: it does not provide a "code barrier" preventing reordering of code. It merely tells the compiler (optimizer) not to remove the code even if it appears to do "nothing", as it may have side effects (similarly to variables tagged volatile and seemingly redundant accesses to them). An __asm__() statement with no output operands is treated as volatile implicitly. The compiler can still remove such code if it proves that it is never reachable.
The same holds for the effect of clobbers (including "memory") on code reordering. While these ensure correct ordering of variable accesses (and thus correct execution of the abstract machine as per the C language specification), they won't prevent reordering of apparently unrelated statements, which may have an impact on timing (which is irrelevant for the language, but of utmost importance for many microcontroller applications). The canonical example is reordering of lengthy operations across __asm__("cli")/__asm__("sei"), resulting in interrupts being disabled significantly longer than intended (with a detrimental effect on interrupt latency).
Although dramatic code reordering across __asm__() statements, or even reordering of the __asm__() statements themselves, is rarely observed in the wild, the result of compilation should be checked for correctness (the brave may also pester the highly esteemed gcc developers for a remedy).
- The inline assembler syntax is ugly, too verbose (i.e. a lot of "extra" characters (doublequotes, backslash-escaped sequences) to be typed in every line, in contrast to "normal" assembler) and hard to edit.
For some recommendations to alleviate this problem (e.g. using $ as a line separator) see discussion below.
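To illustrate the cli/sei point from the conclusions above, here is a minimal sketch, not a definitive recipe. The names counter16, IRQ_OFF(), IRQ_ON() and read_counter_atomic() are made up for this example, and the non-AVR branch is only a stand-in so that the snippet also compiles on a PC. The "memory" clobber keeps memory accesses from being moved across the asm; it does not promise that unrelated register-only computation stays outside the window, so the generated code should still be inspected.

```c
#include <stdint.h>

/* Hypothetical 16-bit counter, assumed to be updated by an interrupt routine. */
volatile uint16_t counter16 = 0x1234;

#if defined(__AVR__)
/* "memory" tells the compiler that memory may be read or written here,
   so it must not move memory accesses across these statements. */
#define IRQ_OFF() __asm__ __volatile__ ("cli" ::: "memory")
#define IRQ_ON()  __asm__ __volatile__ ("sei" ::: "memory")
#else
/* Stand-in for non-AVR builds (assumption, for illustration only):
   an empty asm that acts purely as a compiler-level memory barrier. */
#define IRQ_OFF() __asm__ __volatile__ ("" ::: "memory")
#define IRQ_ON()  __asm__ __volatile__ ("" ::: "memory")
#endif

/* Reads both bytes of the counter with interrupts disabled. */
uint16_t read_counter_atomic(void)
{
    uint16_t copy;

    IRQ_OFF();
    copy = counter16;   /* keep only the essential access inside */
    IRQ_ON();           /* note: unconditionally re-enables interrupts */
    return copy;
}
```

Whether anything lengthy has crept in between the cli and the sei can be checked in the .s file produced with --save-temps.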
For better understanding of the inline assembler, the developer must understand all the steps of a C program translation. (It is informative to have a look at the results of all steps of translation of an inline-assembler piece, produced e.g. using the --save-temps command-line switch to avr-gcc).
- Preprocessing.
Generally, the assembler snippet itself, i.e. the first "parameter" of the __asm__() statement, is a single string, with all the consequences of this fact:
- The usual syntax for asm is to write each statement on a separate line. A string literal cannot span several source lines, so the following is incorrect:
__asm__(
  "
  ldi r16, 3
  ldi r17, 5
  "
);
Fortunately, the compiler in the first steps of compilation concatenates strings with only whitespace between them (which includes end-of-line characters), so the following is correct (from the point of view of preprocessor and compiler, not assembler, see below):
__asm__(
  " ldi r16, 3 "
  " ldi r17, 5 "
);
We now fast forward to the point where the compiler takes the string, strips the quote characters, concatenates it, replaces the operands (see below) and passes the processed code to assembler. This string above would result in the following output in .s file:
ldi r16, 3 ldi r17, 5
But the avr-as assembler requires a single instruction per line, so whatever goes after the "ldi r16, 3" it sees as "garbage at end of line" (this is the expression used in the actual error message). As the compiler also converts the escape sequences just before concatenating the strings, we insert escaped end-of-line characters into the strings, which results in the familiar ugly syntax (\n would be sufficient; \t is added for a tidy look in the output listfile):
__asm__(
  " ldi r16, 3 \n\t"
  " ldi r17, 5 \n\t"
);
- the preprocessor does not perform macro expansion within strings. This is why you can't write e.g.
__asm__("in r16, PINB");
Constants defined through macros must therefore be passed through operands/constraints (see below).
Sometimes, if the number of available asm operands is exhausted, numeric constants can be stringified and then concatenated with the asm statement's string, e.g.
#define STRINGIFY_(a) #a
#define STRINGIFY(a) STRINGIFY_(a)
#define THREE 3
__asm__(
  "ldi r20, " STRINGIFY(THREE) " \n\t"
  "ldi r21, 5 \n\t"
);
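To see what string actually reaches the compiler in the stringification example above, the same pieces can be put into an ordinary C string and compared. This is only a sketch that runs on any host; asm_text, asm_text_is_expected() and one_level_gives_name() are names invented here. It also shows why the two-level macro is needed: a single level stringifies the macro's name rather than its value.

```c
#include <string.h>

#define STRINGIFY_(a) #a
#define STRINGIFY(a)  STRINGIFY_(a)
#define THREE 3

/* The same adjacent string literals as in the __asm__() example;
   the compiler concatenates them into one string. */
static const char asm_text[] =
    "ldi r20, " STRINGIFY(THREE) " \n\t"
    "ldi r21, 5 \n\t";

/* Returns 1 if the concatenated text is what the assembler should see. */
int asm_text_is_expected(void)
{
    return strcmp(asm_text, "ldi r20, 3 \n\tldi r21, 5 \n\t") == 0;
}

/* Returns 1 if one-level stringification yields the macro name, not its value. */
int one_level_gives_name(void)
{
    return strcmp(STRINGIFY_(THREE), "THREE") == 0;
}
```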
- Compiler (after string de-escaping and concatenation) parses the string for % characters (and subsequent "tokens") and attempts to replace them with the respective operands, either by order or by name. This is described sufficiently in the "cookbook" together with their syntax.
What should be discussed more is the meaning of the "constraints" associated with the operands. To begin with, the word "constraints" is misleading: while the compiler indeed performs certain "sanity checks" on the operands, the constraints mainly prescribe how the compiler shall treat the operand before substituting it for the %-token. This will be dealt with in more detail below.
Also, there is a table in the "cookbook" which seems to prescribe which constraint is to be used with which instruction; this is completely misleading, as there are almost always multiple constraints leading to correct parameters for an instruction.
The ugliness of the syntax of constraints can be partially alleviated by grouping them to a "vertical structure", e.g.
__asm__(
  "[... some asm statements ...]"
  : [p]        "=&e" (__tmpptr)
  : [d]        "r"   ((uint8_t)(__data))
  , [offsetTx] "M"   (offsetof(TRs485Buffers, uartTx))
  , [mask]     "M"   (RS485_TX_BUFFER_SIZE - 1)
);
but of course that is upon one's individual preference.
There is also a limit on the number of operands in a single __asm__() statement (30 in total, according to the gcc manual), so in more complex snippets you can run into trouble; the first to give up in this situation are the constant input constraints (see below).
- Let's start with "input" operands, as they are simpler. There are basically two types of constraints: "constants" and "registers".
The first type results simply in a constant known at compile (or link) time, e.g. the value of an expression, the offset of a struct member and similar. Typical "constant" constraints are "M" (an 8-bit constant, 0 to 255) and "I" (a constant from 0 to 63). "X" is also an interesting and useful constraint, meaning "any operand whatsoever ;-)".
The second type of constraints instructs the compiler to supply a register (or a set of consecutive registers, see below) containing the *current value* of variable or variable expression. So, the compiler sets aside the required number of registers for the use in __asm__() statement, fills them with the value of expression, and replaces the respective "%-tokens" in the assembler string by the appropriate register names.
The number of registers set aside is given by the effective type of the "assigned" variable or expression. The "higher" registers of the range are then accessed using the "%A", "%B" etc. mechanism as described in the cookbook.
Typical "register" constraints are "r", "d", "w", "x" to "z".
- output operands - these say to the compiler, "supply a register name (or register range) for the given variable; the __asm__() will fill those registers, and you then store them to the respective variable". Thus, only "register" type constraints are meaningful, together with direct variable names (no expressions). The constraint string must also start with a modifier character: "+" if the same variable is to be read before and written after the __asm__() (i.e. if its previous value is essential to the asm() construct), "=" otherwise (write-only). Either of them can be followed by "&" (e.g. "=&r") if a unique register shall be used; the compiler has a nasty habit of assigning the same register to unrelated input and output operands unless explicitly told by this character not to do so.
- the clobbers are described in the "cookbook" in sufficient detail
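A small sketch of the operand types discussed above, under the assumption of avr-gcc; the function names add16() and add_u8() are invented here, and the non-AVR branches are plain C stand-ins so the snippet also compiles on a PC. add16() uses a read-write ("+r") operand spanning a register pair, accessed through %A and %B. add_u8() writes its output before it has finished reading its inputs, so it needs "=&r"; without the "&" the compiler would be free to put the sum into the same register as b.

```c
#include <stdint.h>

/* 16-bit addition: "a" lives in a register pair, %A is the low byte,
   %B the high byte. "+r" = read before and written by the asm. */
uint16_t add16(uint16_t a, uint16_t b)
{
#if defined(__AVR__)
    __asm__ (
        "add %A[a], %A[b] \n\t"
        "adc %B[a], %B[b] \n\t"
        : [a] "+r" (a)
        : [b] "r"  (b)
        : "cc"                      /* flags are modified */
    );
#else
    a = (uint16_t)(a + b);          /* stand-in for non-AVR builds */
#endif
    return a;
}

/* 8-bit addition into a separate output. The output is written by the
   first instruction while input b is still needed, hence "=&r". */
uint8_t add_u8(uint8_t a, uint8_t b)
{
    uint8_t sum;
#if defined(__AVR__)
    __asm__ (
        "mov %[s], %[a] \n\t"
        "add %[s], %[b] \n\t"
        : [s] "=&r" (sum)
        : [a] "r" (a), [b] "r" (b)
        : "cc"
    );
#else
    sum = (uint8_t)(a + b);         /* stand-in for non-AVR builds */
#endif
    return sum;
}
```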
Sometimes the algorithm in the inline asm snippet needs an extra register or two beyond those allocated by the "register" type of operands. While these can be explicitly named as "r16" etc. and then listed in the clobbers, this is a "bad thing" to do, as it prevents the compiler from choosing an optimal register (e.g. one unused at the moment by the surrounding code). The "proper" way to do this is to define a local C variable (as local as possible, i.e. create a block just around the __asm__() statement and define it there), never to be used in C itself, and then assign it through an (output) operand, e.g.
unsigned char reg1;
__asm__(
  "lds %[r1], 55 \n\t"
  : [r1] "=r" (reg1)
);
Don't worry about memory usage: the optimizer finds out that such variable is not used and will not allocate any memory for it.
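A slightly fuller sketch of the scratch-register idea, again assuming avr-gcc; exchange_demo() and the variable tmp are names made up for this example, and the non-AVR branch is a plain C stand-in. The scratch variable is declared in a block of its own and marked "=&r" so that it cannot share a register with anything the asm still has to read.

```c
#include <stdint.h>

/* Exchanges a and b through a compiler-chosen scratch register and
   returns them packed as (a << 8) | b, so the result can be checked. */
uint16_t exchange_demo(uint8_t a, uint8_t b)
{
#if defined(__AVR__)
    {
        uint8_t tmp;    /* scratch only, never used in C itself */
        __asm__ (
            "mov %[t], %[a] \n\t"
            "mov %[a], %[b] \n\t"
            "mov %[b], %[t] \n\t"
            : [t] "=&r" (tmp), [a] "+r" (a), [b] "+r" (b)
        );
    }
#else
    {
        uint8_t tmp = a;    /* stand-in for non-AVR builds */
        a = b;
        b = tmp;
    }
#endif
    return (uint16_t)(((uint16_t)a << 8) | b);
}
```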
One of the often needed operations is to access real C variables. To be referenced by name, the variable has to have statically allocated memory (it has to be global or local static). (There may be a way to determine the offset of a local variable in the stack frame through an operand - I have not tried this so far.) There are several ways to achieve that:
- use the variable name directly in asm to load/store its value
unsigned char blah;
__asm(
  "lds r16, blah \n\t"
  :
  :
  : "r16"
);
The name will be resolved by the linker, so the variable does not need to be local to the file where the __asm__() snippet is. Expressions involving constant offsets (which are resolved by the linker, too) are also allowed. Note that clobbering a register directly is a "bad thing", as explained by the "cookbook" and above; apart from that, this method saves an operand, if that is desired.
- use the direct address to load a pointer (which is subsequently used in the assembler) (in case of "function pointers", the pm() and gs() assembler operators should be used appropriately).
unsigned char array[4];
__asm__(
  "ldi r30, lo8(array) \n\t"
  "ldi r31, hi8(array) \n\t"
  "ld r16, z"
  :
  :
  : "r16", "r30", "r31"
);
This is very similar to the previous, and useful to process arrays or structs in the __asm__().
- perform any of the two methods above, except instead of direct entering of variable name supply it through a "constant" constraint operand, e.g.
unsigned char array[4];
__asm__(
  "ldi r30, lo8(%[array_]) \n\t"
  "ldi r31, hi8(%[array_]) \n\t"
  "ld r16, z \n\t"
  :
  : [array_] "X" (&array)
  : "r16", "r30", "r31"
);
(there is little benefit in this method, unless the address is computed compile-time, e.g. member of a struct with an offset)
- instruct the compiler to set aside a register pointer pair and fill it by the value of pointer to the variable:
unsigned char array[4];
__asm__(
  "ld r16, %a[arr] \n\t"
  :
  : [arr] "e" (array)
  : "r16"
);
This is useful in case the __asm__() snippet is going to perform some operation over an array or a struct; this is also useful for local variables (the compiler knows how to calculate the address of variable on the stack frame into a register pair)
- instruct the compiler to set aside a register (registers) and load the variable into it -- this is the purpose of the "register" type of constraints (input or output or both, depending how the __asm__() is going to use the variable), and this is also the preferred method allowing to the compiler to perform optimizations at its will
unsigned char array[4];
__asm__(
  " ; variable from array[2] already loaded into %[r1] \n\t"
  :
  : [r1] "r" (array[2])
);
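The pointer-register method from the list above can be combined with a proper output operand instead of a hard-coded r16. The following is a sketch assuming avr-gcc; demo_array and load_via_pointer() are names invented for the example and the non-AVR branch is a plain C stand-in. The "memory" clobber is there because the compiler cannot see that the asm reads memory through the pointer.

```c
#include <stdint.h>

/* Hypothetical data, only for trying the function out. */
uint8_t demo_array[4] = { 7, 9, 11, 13 };

/* Loads one byte through a pointer register pair chosen by the compiler.
   "e" asks for one of the pointer pairs (X, Y or Z); %a[ptr] prints its name. */
uint8_t load_via_pointer(const uint8_t *p)
{
    uint8_t value;
#if defined(__AVR__)
    __asm__ __volatile__ (
        "ld %[v], %a[ptr] \n\t"
        : [v] "=r" (value)
        : [ptr] "e" (p)
        : "memory"      /* the asm reads memory the compiler cannot see */
    );
#else
    value = *p;         /* stand-in for non-AVR builds */
#endif
    return value;
}
```

Since the compiler computes the address itself, the same function should work for data on the stack frame as well as for globals.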
- assembler
Except for the one-instruction-per-line problem and the replacement of "%-tokens", both described in the previous text, the assembler syntax and semantics follow the same rules as in avr-as - naturally, as the outcome IS assembled by avr-as. Unlike the standalone ".S" file solution, there is no preprocessing of the ".s" coming out of compilation, so any macro one desires to use has to be treated using the features mentioned above. Also, the usual gotcha applies when attempting to use labels to be fixed by the linker (e.g. names of variables or functions) in expressions - only constant additions are allowed, plus the hi8/lo8-pm-gs kind of operators.
It is worth noting that all avr-as directives can be used within inline assembler, of course with proper caution, so as not to "spoil" anything assumed by the compiler (i.e. current section is .text, addresses aligned to even, etc.). Although the inline assembler can be used only within C functions, i.e. it is placed into the ".text" section, this does not exclude changing sections - provided .text and the alignment are restored by the end of the __asm__().
Locality of labels of all kinds should also be observed, and globality can be enforced. Jumps between inline asm snippets are not excluded, but sound like shooting oneself straight in the leg.
There is also a gotcha in the form of IO register names (as explained above, these have to be used through the operands). These are defined with their "memory" address in mind, so if you intend to use them as IO (as a parameter of the instructions in, out, sbi/cbi, sbis/sbic), 0x20 has to be subtracted, best by using a macro already defined for this purpose:
__asm__(
  "sbic %[_PIND], 6 \n\t"
  :
  : [_PIND] "I" (_SFR_IO_ADDR(PIND))
);
Inline assembler snippets are often defined as macros and intended for repeated use; this is sufficiently covered in the "cookbook". Maybe one gotcha: as macros have to be single-line and we are used to writing multi-line asm, the backslash line-merging mechanism has to be employed in those macros, which in turn gets in the way of comments.
Note that the compiler does not "see" accesses to a variable in asm, especially if they are performed through a pointer. (This is quite similar to the compiler not "seeing" accesses to a variable in an interrupt service routine.) So the optimizer sometimes completely removes other accesses to this variable if it finds them unnecessary (e.g. write accesses in C to variables which are read only in asm), and in the case of local variables it can completely remove the variable from memory (and keep it only in registers, if it sees this as sufficient). This is a good thing in the case of the temporary variables created only to allocate registers (as mentioned above), but a bad thing in the case of "real" variables. The "volatile" keyword helps in these cases to "materialize" the variables as needed.
The cookbook also mentions a special "%-token", namely "%=" (the "%" character got lost here and there in the cookbook, unfortunately). This is replaced by the compiler by a number which is unique for every __asm__() statement. The typical use is for labels local to the __asm__() statement, even if it is defined as a macro and used repeatedly in the same file. Beware: as the replacement is a number, if it is combined with more digits, repeated labels can occur inadvertently (i.e. surround the "%=" by non-digit characters). Beware also that the compiler is quirky and won't perform this replacement unless at least the first colon, hinting that there are some operands, is present (no actual operands are needed). Of course, for jumps, the *nix-style "forward/back" local labels can be used, too.
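A sketch of the "%=" mechanism in a macro, assuming avr-gcc; SPIN() and spin_twice() are names invented for the example, and the non-AVR branch is just a plain C loop so the snippet also compiles on a PC. The label has non-digit characters on both sides of "%=" ("L_spin" before, ":" or whitespace after), and the operand list (hence at least one colon) is present, so the replacement takes place. Without "%=", the two expansions in spin_twice() would define the same label twice and the assembler would complain.

```c
#include <stdint.h>

#if defined(__AVR__)
/* Busy loop with a label that is unique per expansion thanks to %=.
   Comments are kept outside the macro body because of the backslashes. */
#define SPIN(n)                                 \
    do {                                        \
        uint8_t cnt_ = (n);                     \
        __asm__ __volatile__ (                  \
            "L_spin%=:       \n\t"              \
            "dec %[c]        \n\t"              \
            "brne L_spin%=   \n\t"              \
            : [c] "+r" (cnt_)                   \
            :                                   \
            : "cc"                              \
        );                                      \
    } while (0)
#else
/* Stand-in for non-AVR builds (assumption, for illustration only). */
#define SPIN(n)                                 \
    do {                                        \
        volatile uint8_t cnt_ = (n);            \
        while (--cnt_) { }                      \
    } while (0)
#endif

/* Uses the macro twice in one function; returns the number of expansions. */
uint8_t spin_twice(void)
{
    SPIN(10);
    SPIN(20);
    return 2;
}
```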
Comments, please.
Jan Waclawek
[edited twice, "sources" of previous versions and current are in the attached zipfile]