Code size from 8K to 16K parts.

#1

This is for the C people.
How well do the C compilers handle the change from using RJMP and RCALL to JMP and CALL when a part goes from an 8K AVR to a 16K AVR?

Jens

#2

No problem at all. The compiler inserts the appropriate instruction where necessary. (It may still use RJMPs on the bigger part, for speed, if the target is still reachable.)

This is completely transparent, and not something you have to worry about.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

#3

It's not my problem, but someone in another thread had a completely full mega8. So I was just wondering: if you have 8000 bytes of code on a mega8, how big will it be on a mega16, now that (say) 50% of the RJMPs and RCALLs have to be changed?
Jens

#4

glitch wrote:
No problem at all. The compiler inserts the appropriate instruction where necessary. (It may still use RJMPs on the bigger part, for speed, if the target is still reachable.)

This is completely transparent, and not something you have to worry about.

Yes, the compiler handles this for you automatically.

Some C compilers offer additional flexibility to place data and functions in different address ranges, to take advantage of smaller pointer sizes and cheaper pointer arithmetic. IAR, for example, lets you specify:

Attribute     Pointer size   Memory space
__tiny        1 byte         Data  0-0xFF
__near        2 bytes        Data  0-0xFFFF
__far         3 bytes        Data  0-0xFFFFFF (max object 32K)
__huge        3 bytes        Data  0-0xFFFFFF (max object 8M)
__tinyflash   1 byte         Code  0-0xFF
__flash       2 bytes        Code  0-0xFFFF
__farflash    3 bytes        Code  0-0xFFFFFF (max code 32K)
__hugeflash   3 bytes        Code  0-0xFFFFFF (max object 8M)
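
For illustration, a rough sketch of how these attributes might appear in source (the keyword names are IAR's, but this particular snippet is mine, so check the EWAVR manual for the exact syntax):

__flash const char banner[] = "baro v1.0";   /* constant data kept in code (flash) space          */
__tiny  unsigned char flags;                 /* lives in 0x00-0xFF, reachable via a 1-byte pointer */
__near  unsigned int  samples[16];           /* ordinary data space, 2-byte pointer                */

void print_flash(const char __flash *s);     /* the pointer type carries the memory attribute      */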
#5

Sorry if I'm not being clear (I do know that the compiler will handle it).
But will a "normal" 8K program for a mega8 come out at 8010 bytes on a mega16, or at 9000 bytes?
Jens

#6

sparrow2 wrote:
It's not my problem, but someone in another thread had a completely full mega8. So I was just wondering: if you have 8000 bytes of code on a mega8, how big will it be on a mega16, now that (say) 50% of the RJMPs and RCALLs have to be changed?
It depends on how many long jumps you have and how efficient the compiler is at optimizing them. You can test this quite quickly with your compiler: just compile for the 8K device and then for the 16K device and compare the code sizes.
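
For instance, with the GCC toolchain the comparison could look something like this (a minimal sketch; main.c and the option set are placeholders, but -mmcu, -Os and avr-size are the standard pieces):

avr-gcc -mmcu=atmega8  -Os -o test_m8.elf  main.c
avr-gcc -mmcu=atmega16 -Os -o test_m16.elf main.c
avr-size test_m8.elf test_m16.elf    # compare text + data (flash use) for the two builds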

#7

Problem:
I don't have any code that makes 8K on a mega8.

#8

All the code I have generates less than 4K, so the code would be the same.

#9

sparrow2 wrote:
All the code I have generates less than 4K, so the code would be the same.
Write more code! ;-)

#10

Now we're back where I started.

Quote:
This is for the C people.

I'm an ASM person!!! At least on the AVR.

Jens

#11

But you have C code generating 4K now, right? It shouldn't be hard to expand it. The code doesn't have to be AVR-specific: you could add some generic C functions, like ones from a string library (strlen, strcpy, etc.), or link in a few open-source AVR library modules, like those from the Procyon avrlib ( http://hubbard.engr.scu.edu/embe... ). You should be able to get from 4K to 8K of code rather quickly. Heck, you could even put in a floating-point operation to pull in the floating-point library, though the FP library probably wouldn't resemble your own C code closely enough for that measurement to answer the question. As mentioned, the answer is specific to your code, your compiler, and the optimization level.

#12

You don't have to expand it, do you? If it's just a test to find the ratio of size increase, then take the 4K of mega8 code, build it for the mega16, and see whether it becomes 4.1K or so.

However, with programs of this size the most significant size change is probably not going to be the RCALLs/CALLs embedded in your code but the growth of the interrupt vector table, where the entries go from 2 to 4 bytes each (and larger devices tend to have more interrupt sources, too). A quick worked example follows this post.

My guess is that a good optimiser should ensure that the same code doesn't actually grow at all (apart from the vector table, which is a necessity), and if you use GCC you might particularly want to search here for the word "relax", which can have a very calming effect on the linker.

Cliff

EDIT: the "relax" thread I was thinking of:

https://www.avrfreaks.net/index.p...
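
As a rough worked example of the vector-table effect (vector counts quoted from memory, so treat them as an assumption to verify against the datasheets):

mega8:  19 vectors x 2 bytes (RJMP) = 38 bytes
mega16: 21 vectors x 4 bytes (JMP)  = 84 bytes

That is 46 bytes of growth before any of your own code has changed at all.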

#13

Quote:

If it's just a test to find the ratio of size increase, then take the 4K of mega8 code, build it for the mega16, and see whether it becomes 4.1K or so.

lol, that's what I was just about to do with an app.

You can't really compare the Mega8 ==> Mega16 migration directly, as there are two chances that the app will port unmodified: slim and none. If nothing else, the vector tables will be different sizes.

I had that situation a few years back, with an '8535 app going to a Mega16. The vector tables are different there as well, but close enough: the app should build with just a re-target and perhaps a few register name changes. IIRC I "lost" several hundred words to CALL/JMP. If it comes up again it will be interesting to see the results, since my compiler now has "smart" jump/call handling depending on the destination distance.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

#14

When you move from the 8K part to the 16K part because your code no longer fits in 8K, you'll have an extra 8K right away; it's not all going to be used up by switching some RJMPs/RCALLs to JMPs/CALLs. So the time to worry is when you get toward the end of the 16K part (or the end of larger parts). Then it's probably time to see how smart the compiler is, or to make it smarter with some command-line switches.

You can't do anything different on the 8K part, since it's already using only RJMP/RCALL, so the move from an 8K to a 16K part will (probably) happen because it's actually needed, whereas the jump from 16K to 32K 'may' be avoidable through smart use of JMP/RJMP/CALL/RCALL.

I just tested a small app: it compiles to 1216 bytes for a mega88 and 1310 bytes for a mega168. I'm not using the latest version of gcc, so that could be improved a little with the option Cliff mentioned. The local jumps stay as RJMPs, the calls to functions change from RCALL to CALL, and I suspect you won't see many JMPs no matter what (at least in C, if you are 'going' to code that far away, a call will more than likely get you there, I think).

#15

curtvm wrote:
(at least in C, if you are 'going' to code that far away, a call will more than likely get you there, I think).

Until you make a 'goto' to an error handler across a huge switch {} statement?

You might also get long jumps if you embed early 'return's in a very long function, since they need to jump to the function epilogue. (A small sketch of both shapes follows this post.)

Cliff
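
A contrived sketch of the kind of code shape Cliff means (the function and its contents are invented for illustration; whether the compiler actually has to emit JMP here depends on how much code ends up between a branch and its target):

int handle_packet(const unsigned char *buf, int len)
{
    if (len < 4)
        return -1;              /* early return: has to reach the epilogue at the far end */

    switch (buf[0]) {
    case 0x01:
        /* ...imagine a few kilobytes of case handlers here... */
        if (len < 16)
            goto bad_packet;    /* jumps across the whole switch to the error handler */
        break;
    /* ...many more large cases... */
    default:
        goto bad_packet;
    }
    return 0;

bad_packet:
    return -2;                  /* error handler and epilogue, possibly out of RJMP range */
}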

#16

Another related point of interest is that in some cases (for example when you're moving from an ATmega8 to an ATmega168), some of the I/O registers you frequently manipulate may move out of the low (bit-addressable) I/O space and into the standard (only byte-addressable) space. Or they may even leave the I/O space entirely and enter the SRAM space. That may force the compiler, for example, to replace some SBI instructions with IN/OR/OUT combinations, or to replace IN/OUT with LDS/STS.

Quote:
at least in C, if you are 'going' to code that far away, a call will more than likely get you there, I think

The CALL instruction has the exact same "reach" as the JMP instruction: theoretically, it could reach any point within an 8 MB address space.

Of course, as Cliff says, there are other reasons why JMPs might be required.
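
A small sketch of that kind of access (avr-gcc register names from <avr/io.h>; which instruction the compiler actually picks depends on where the register sits on the particular device):

#include <avr/io.h>

/* Enabling the UART receiver.  The register is also renamed between parts
   (UCSRB on a mega8, UCSR0B on a mega48/88/168). */
void uart_rx_on(void)
{
#if defined(UCSR0B)
    UCSR0B |= (1 << RXEN0);   /* memory-mapped on the mega168, so this may become LDS/ORI/STS */
#else
    UCSRB  |= (1 << RXEN);    /* in the low I/O space on the mega8, so a single SBI can do it */
#endif
}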

#17

But the big question is what space-optimization strategies are available to C programmers. I'd say make sure that data that doesn't change, like strings, lives in flash only; normally strings are 'initialized data' and get copied to RAM. ImageCraft has an option to do this in the IDE. It also has an optimizer that makes little subroutines out of one- and two-line sequences duplicated here and there. I always look at the assembler output. And if a variable never goes outside the 8-bit range, don't put it in an int. Others will add more tips... (a GCC-flavoured sketch of the strings-in-flash idea follows this post).

Imagecraft compiler user
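
For the avr-gcc users reading along, the equivalent of keeping constant strings out of RAM looks roughly like this (a sketch using <avr/pgmspace.h>; uart_putc is a hypothetical output routine, and ImageCraft's own mechanism differs):

#include <avr/pgmspace.h>

extern void uart_putc(char c);         /* hypothetical byte-output routine */

/* Kept in flash only; without PROGMEM the initializer would also be copied
   into SRAM at startup. */
static const char banner[] PROGMEM = "recording barometer\r\n";

void send_banner(void)
{
    const char *p = banner;
    char c;
    while ((c = pgm_read_byte(p++)) != '\0')   /* explicit read from flash */
        uart_putc(c);
}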

#18

bobgardner wrote:
Others will add more tips....
As I mentioned earlier in the thread, IAR's support for multiple memory models within one application gives the programmer extra flexibility to optimize the output.

#19

Quote:
(at least in C, if you are 'going' to code that far away, a call will more than likely get you there, I think).
What I was trying to say (that 'going' didn't help) was that you will probably see more RCALLs replaced by CALLs than RJMPs replaced by JMPs. And it seems that in the gcc version I'm using (without --relax), all RCALLs are changed to CALLs no matter what (at least without -mshort-calls), whereas the RJMPs stay unchanged. Mostly. I think. But I could be wrong.

I'm still thinking about the gotos and the large functions Cliff mentioned. Since gotos are limited to the function they are in, it seems to boil down to having a function larger than 4KB? I will have to ponder that some more.

The other thing about JMPs is that in C you seem to have to 'jump' through a lot of hoops even to produce one. Every other person wants a jmp to the bootloader, yet the best anybody comes up with in C is the function-pointer thing, which ends up as a call, and the goto *0 trick, which ends up as an ijmp, so everyone ends up without a jmp unless they use asm. (A sketch of the function-pointer version follows this post.)

I sure don't have the big apps to find out with, but I would be curious how many jmps end up in a large app (excluding the vector table).
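
For reference, the 'function pointer thing' usually looks something like this (a sketch only: 0x1C00 is a made-up placeholder for the bootloader start, and whether the constant is a byte or a word address depends on the toolchain's function-pointer convention, so check before copying):

#include <avr/interrupt.h>

#define BOOT_START ((void (*)(void))0x1C00)   /* placeholder bootloader entry */

void enter_bootloader(void)
{
    cli();           /* usually you'd also shut down peripherals first */
    BOOT_START();    /* ends up as a call (or ICALL), not a JMP -- the point above */
}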

#20

Quote:

I sure don't have the big apps to find out with, but I would be curious how many jmps end up in a large app (excluding the vector table).

ATmega169 instruction use summary:
...
ijmp  :   0 in    :  18 inc   :  29 jmp   :  76 ld    : 119 ldd   : 181 
ldi   :1007 lds   : 424 lpm   :  17 lsl   : 124 lsr   :   4 mov   : 274 
movw  : 120 mul   :  17 muls  :   0 mulsu :   0 neg   :   2 nop   :   1 
or    :  20 ori   :  43 out   :  37 pop   :  44 push  :  44 rcall :  62 
ret   :  66 reti  :   5 rjmp  : 263 rol   : 129 ror   :   4 sbc   :  18 
...
Instructions used: 73 out of 111 (65.8%)

ATmega169 memory use summary [bytes]:
Segment   Begin    End      Code   Data   Used    Size   Use%
---------------------------------------------------------------
[.cseg] 0x000000 0x003c2a  15172    230  15402   16384  94.0%

Now, my compiler may not be as good as yours on folding-down to RJMPs. I have a huge main() and the JMPs are there between sections (excluding the vector table).

ATmega64 instruction use summary: ...
ijmp  :   0 in    :  48 inc   :  11 jmp   : 178 ld    : 140 ldd   : 257 
ldi   :3000 lds   :3234 lpm   :  17 lsl   :  65 lsr   :   5 mov   : 455 
movw  : 139 mul   :  53 muls  :   0 mulsu :   0 neg   :   0 nop   :   0 
or    :  50 ori   :  61 out   :  81 pop   :  77 push  :  77 rcall : 138 
ret   : 116 reti  :   6 rjmp  : 805 rol   :  72 ror   :   4 sbc   :  34 

ATmega64 memory use summary [bytes]:
Segment   Begin    End      Code   Data   Used    Size   Use%
---------------------------------------------------------------
[.cseg] 0x000000 0x00d09e  46910   6496  53406   65536  81.5%

[oops. CALL: 427 and 1314. It will be interesting to see the results when I port these projects to CV's next generation]

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

#21

So it would seem that CALL is going to be the major cause of the code-size increase when switching from 8K to 16K (not many RJMPs in those app summaries; your compiler must be smart enough to make BRxx work when possible, or something).

The CALL -> RCALL optimization, where possible, seems to be where the biggest gains could be made.

But how is the compiler going to know where the linker will put things? I suppose if a compiler needs to produce code for a function call, it has to use a CALL, since it has no way of knowing where the linker will end up placing the called function.

I think I'll quit thinking about it, since I'm not even close to 16K yet.

#22

curtvm wrote:
But how is the compiler going to know where the linker will put things? I suppose if a compiler needs to produce code for a function call, it has to use a CALL, since it has no way of knowing where the linker will end up placing the called function.
That's right; the compiler has to decide whether to use a wide or a narrow address. Some people place all their functions in a single source file so that the compiler knows how long each function is and can then use the narrowest address for the JMPs and CALLs. The source code can also help the compiler place code and data objects close together for narrower references. For example, IAR allows a programmer to create named segments and then declare that certain data and code objects be placed in those segments; the linker is in charge of ensuring that the differences in addresses don't overflow the width of the reference. (A rough sketch of the segment syntax follows this post.)
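
Something like this, as far as I recall the IAR syntax (the segment names and objects are invented for illustration; #pragma location and the @ operator are IAR placement extensions, but double-check the EWAVR manual for the exact rules):

#pragma location = "MYCODE"                       /* put this function in a named code segment */
void fast_path(void)
{
    /* time-critical code kept near its callers via the linker file */
}

const unsigned char small_table[4] @ "MYCONST" = { 1, 2, 3, 4 };   /* data placed with @ */

The named segments then have to be placed in the linker configuration file, which is where you actually control what ends up next to what.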

#23

With current releases of the avr-gcc toolchain, the compiler is capable of determining on its own whether to use RCALL/RJMP or CALL/JMP for any references that are internal to a single compilation unit (object file). For any jumps or subroutine calls that cross from one compilation unit to an external reference, I think it assumes a long CALL/JMP will be necessary and inserts a placeholder big enough to hold the larger instruction.

Then, at the linking stage, an optional mechanism called relaxation can be used to identify places in the preliminary linked binary where those externally referenced CALLs/JMPs can safely be replaced with RCALLs/RJMPs.

I'm not sure about two things:
1) I don't know whether the linker goes out of its way to lay out the object files so that the most interdependent modules end up close together, to maximize the real benefit from relaxation.
2) I don't know whether relaxation is done in a single pass, or whether it iterates, re-running every time a relaxation succeeds to see whether that has created any further candidates.
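
For reference, relaxation is requested by passing --relax through to the linker; with avr-gcc as the driver that can look like this (the object file names are placeholders):

avr-gcc -mmcu=atmega16 -Os -Wl,--relax -o app.elf main.o uart.o adc.o
avr-size app.elf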

#24

Thanks for all the inputs.

All I can add is this: besides the small increase from JMP and CALL themselves, extra code gets added when branches can no longer reach their targets.

I found a program (here at AVR Freaks: a recording barometer).
The code can only be compiled with -Os for the mega8:
The code for a mega8 is 6846 bytes.
For a mega16 the same code comes to 7164 bytes.
That is an increase of 4.6%.

Jens

#25

Well, you can forget worrying about the code-size increase, because I found you some savings (assuming I found the same barometer program).

I don't know if it still works, but simply adding the libm.a library helps a lot (Merry Christmas).

Using WinAVR 20060421

---------------------------------------
mega8 -Os
---------------------------------------
Program:    6624 bytes (80.9% Full)
129 rcall
155 rjmp

---------------------------------------
mega8 -Os -lm  (libm.a library)
---------------------------------------
Program:    4444 bytes (54.2% Full)
124 rcall
69 rjmp

---------------------------------------
mega16 -Os
---------------------------------------
Program:    6956 bytes (42.5% Full)
4 rcall
118 rjmp
125 call
39 jmp-
    21 in vector table (asm produced)
    2 in c startup code (asm produced)
    16 from libgcc (asm produced)
1 ijmp (asm produced)

---------------------------------------
mega16 -Os  -lm (libm.a library)
---------------------------------------
Program:    4712 bytes (28.8% Full)
15 rcall
48 rjmp
109 call
23 jmp-
    21 in vector table (asm produced)
    2 in c startup code (asm produced)

and I still can't find any C-produced jmps. The jmps in the above code are all produced by asm (libgcc, gcrt1, etc.). I'm starting to wonder whether C (as opposed to asm) even produces a jmp at all, except for functions larger than 4KB. The big saving to seek out, when needed, is the possible call -> rcall conversion (after all the other details are taken care of, like using libm.a).
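
For anyone repeating the experiment, pulling in libm.a is just a matter of adding -lm at the link step (a sketch; the file names are placeholders, and in a stock WinAVR Makefile the same thing is done by adding -lm to the libraries passed to the linker):

avr-gcc -mmcu=atmega8 -Os -o baro.elf baro.o -lm
avr-size baro.elf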