How do YOU cut down on program size?


For the first time in a while I've got a project that calls for an inexpensive and uncomplicated micro, controlling 5 small motors. Long story short, I used an ATTiny13V and things were going great until, all of a sudden, the thing wouldn't move after a write. After poking around, I found I had added just a few too many functions and was going over the 1K of flash on the Tiny13; trimming out the added functions and getting the size under 1K makes things work... but I want all my functionality! I've cleaned out and simplified my code as best I can, but with full functionality I'm still at about 1.6K. I'm ready to sit down and put everything into ASM, but I started to wonder whether people here use any quick tricks to reduce program size. So, any tricks/hacks/kludges?


Off the top of my head: the tiny13 has a cousin, the tiny25, which has 2K.
You'd need to halve your code... that is quite a lot.
Are you using C or ASM?

Try to reduce the number of functions; this should save some bytes.

regards


Start with the .lss and see which functions are generating any "bloat" and whether they can be optimized to reduce size.

Also use "avr-nm --size-sort project.elf" which lists the functions ("T") in increasing size order. You are more likely to be able to find some "slack" in the big ones at the end.

Make sure you are using all the space savers like --relax to the linker.

But I agree with Meslomp - pick a 2K micro. In 1K you might be able to squeeze in a 1.1K program but if it reached 1.6K then I doubt you'll ever recover 600 bytes!


Kagetsuki wrote:
I'm ready to sit down and put everything into ASM, but I started to wonder if there were any quick tricks to reduce program size people here employ. So, any tricks/hacks/kludges?

Sure. Sit down and put everything into ASM.

JW


I'd upgrade to a t25, which also provides an upgrade path to 4K and 8K. At Farnell the t25 is just as expensive as the t13A and cheaper in higher quantities...


Here, going for a tiny25 or even a tiny45 is the easy way out.

The other point is to avoid a few things that produce long code:
floats
mixing signed/unsigned numbers
calculations with volatile variables
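To illustrate the volatile point, here is a minimal sketch assuming a hypothetical tick counter g_ticks that a timer ISR updates. The compiler must perform every access to a volatile object, so each mention of g_ticks costs its own load or store; copying it into a local once avoids that. The two versions give the same result only if no interrupt changes g_ticks between the accesses, which is exactly the assumption the cheaper version makes (a multi-byte volatile would additionally need interrupts disabled around the copy).

```c
#include <stdint.h>

/* Hypothetical counter shared with an ISR; volatile forces real memory accesses. */
volatile uint8_t g_ticks;

/* Naive version: every mention of g_ticks is a separate memory access. */
uint8_t advance_slow(void) {
    uint8_t r;
    g_ticks = g_ticks + 1;        /* load + store */
    r = g_ticks;                  /* load again */
    r = (uint8_t)(r * 2);
    r = (uint8_t)(r + g_ticks);   /* and again */
    return r;
}

/* Cheaper version: read once into a local (a register), work on the copy,
   write back once. */
uint8_t advance_fast(void) {
    uint8_t t = g_ticks;          /* one load */
    t++;
    g_ticks = t;                  /* one store */
    return (uint8_t)(t * 3);
}
```

Both functions compute 3 * (old value + 1) when nothing else touches the counter in between.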


jayjay1974 wrote:
I'd upgrade to a t25, which also provides an upgrade path to 4K and 8K. At Farnell the t25 is just as expensive as the t13A and cheaper in higher quantities...
Having crunched a fair amount of ATtiny C code, that would be my thought as well unless the 'tiny13V has some cost advantage that becomes decisive because of really large quantities.

- John

Last Edited: Tue. Feb 9, 2010 - 11:11 AM

Yes, an easy way out that comes at no price. In a commercial setting I guess the big boss would prefer the bigger uC instead of paying an engineer another week (or longer) to rewrite the software in assembly, with all its related issues and no room for future bug fixes and upgrades.


Quote:
unless the 'tiny13V has some cost advantage that becomes decisive because of really large quantities.

At Farnell at least, it does not. The t25 is 9 eurocents cheaper at +100 quantities and can run at 20MHz too instead of 10.

I don't know the pricing when buying directly from Atmel in 1 million quantities :)


Kagetsuki wrote:
I've cleaned out and simplified my code as best I can, but with full functionality I'm still running at about 1.6K.

Why not post the complete code?

Then it would be easy to see whether it can be squeezed enough.
Maybe some compiler switches will already be sufficient.

Peter


jayjay1974 wrote:
Quote:
unless the 'tiny13V has some cost advantage that becomes decisive because of really large quantities.

At Farnell at least, it does not. The t25 is 9 eurocents cheaper at +100 quantities


Farnell is not exactly a price reference. I have different prices at sk.farnell.com than you do at nl.farnell.com, for example. And you can buy an ATTiny25-20 in SOIC8 14 cents cheaper than ATTiny25-20 in SOIC8 ;-)

JW


Just to clear up some things:

*I'm currently using C, gcc from the package in Ubuntu 9.10 - version 4.3.3 (yes I know gcc4 produces bulkier binaries).

*I had the 13V's lying around, which is why I used it. I have no qualm with upgrading, and my current price target per board would allow that just fine, so I'll consider picking up some Tiny25's or something a bit beefier if I can find them here.

*I can't show you the code now because of licensing issues, but it is both trivial and contains no real industry secrets. Also, I'm quite used to doing compact code and finding ways to slim code down and have already done so as far as I can (I shaved over 600 bytes off the .hex). At this point I'm still considering getting GCC to spit out the ASM for me and then shaving that down with hacks and kludges. To me that kind of thing is like playing Shogi with code. I should note though as much as I enjoy it I rarely win at Shogi.

*clawson is like some sort of AVR god. Those are some extremely good tips and already I found a few points I think I can shave off. Thank you!

*No floats, everything is unsigned 8 bit or 16 bit. More than anything it is code flow and logic that is taking up space. A large switch statement seems to be taking up a massive chunk of the binary footprint and I'm trying to contemplate how to alter or get rid of it.


I found that inlining small functions sometimes helps.


ossi wrote:
I found that inlining small functions sometimes helps.

I found the opposite works better:

-fno-inline-small-functions
-Wl,--relax
--combine
-fwhole-program
-ffunction-sections
-Wl,--gc-sections

Peter


I suspect I would have to see at least an avr-objdump -h -S ... listing to make any useful suggestions. With sufficient crunching you should be able to get the memory overhead of your writing in C vs. assembler below 1%, assuming you can't resort to more exotic and intensive measures in assembler, such as threaded code, coroutines and functions with multiple-entry points. Hence, my belonging to the use-a-bigger-part crowd.

- John

Last Edited: Tue. Feb 9, 2010 - 02:39 PM

I often use the same ones as Peter (danni). Plus -fshort-enums. Plus --param inline-call-cost=n, where n is some number. What number? Well, I just try different ones: 0, 1, 2, 3, 5, 10, 100, 200, etc.
Also, my main is usually:

__attribute__((OS_main)) int main(void)

Quote:

Also, my main is usually:

Are you doing that to prevent the:

LDI R24, 0
LDI R25, 0
RET

generated at the end of main() in order to return an int and the preserving PUSH/POP's? If you look at code output from recent issues of the compiler that have an infinite loop in main() you'll see they are "tight" anyway:

#include <avr/io.h>

int main(void) {
	while(1) {
	}
}

generates:

0000006e <main>:
#include <avr/io.h>

int main(void) {
  6e:	ff cf       	rjmp	.-2      	; 0x6e <main>

Adding the attribute makes no difference.


Have you looked at the size of any library code you are using? You can write your own interrupt table / start up code section in 22 words, for example, if you can require that all your statics start at zero.

- John


Yes, my main definition was to avoid PUSHes and POPs. I am using rather old release - WinAVR-20080610. Just tried WinAVR-20100110. I have an infinite loop in main, but it still generates the PUSHes.

Eugene

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

I'm ready to sit down and put everything into ASM,

If you can shave from 1.6 to 1.0 by going to ASM, then you are a machine-language guru or you didn't write your C very well. "Any program can be made one storage cell smaller", true enough. But 60%?

How on earth did your ISP system let you put 160% of code into the chip, and not tell you about it?

Quote:

I had the 13V's lying around, which is why I used it.

I might have some small screwdrivers on hand. That doesn't mean they will give satisfactory performance when doing farm machinery repair.

I guess I'm spoiled in the US, with distributors carrying mainstream AVR models and normally in-stock. I'd find it hard to believe that there is no source for a Tiny25-family chip as they have been in full production for some years.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


theusch wrote:

How on earth did your ISP system let you put 160% of code into the chip, and not tell you about it?

How on earth did the compiler even compile it?

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


Quote:

How on earth did the compiler even compile it?

Well if something was going to complain I guess it'd be the linker not the compiler but...

#include <avr/io.h>
#include <avr/pgmspace.h>

const uint32_t big_array[5000] PROGMEM = { 1,2,3};

int main(void) {
	while(1) {
	}
}

leads to:

AVR Memory Usage
----------------
Device: atmega16

Program:   20116 bytes (122.8% Full)
(.text + .data + .bootloader)

Data:          0 bytes (0.0% Full)
(.data + .bss + .noinit)

Build succeeded with 0 Warnings...

Nothing worried about the 122.8% - "0 Warnings"

Actually I keep meaning to suggest something (though this relates to SRAM, not flash): avr-size should be patched so that if the "Data:" figure it outputs is ever more than 100%, it prints something like "SRAM overflow, consider using PROGMEM for constant data/strings", which would pre-empt many, many threads here! (Apart from those people who then don't understand that text.) I guess avr-size could also complain when "Program:" exceeds 100% too.

Cliff


If you don't need high speed, and are sure that the program will run with the amount of RAM in a tiny13, I don't see why it should be a big problem to rewrite it in ASM and fit it into 1K. It's just 500 lines of code, so it should not be too big a problem.
From the code you have (the 160% in size), how big is the startup code?
How many LDS and STS instructions are there in the program? If you can lock Y or Z to a fixed base with the high byte as zero, you can save a word for each access (use LDD and STD).


Quote:

how big is the startup code?

Easy to check. The vector table is:

00000000 <__vectors>:
   0:	09 c0       	rjmp	.+18     	; 0x14 <__ctors_end>
   2:	0e c0       	rjmp	.+28     	; 0x20 <__bad_interrupt>
   4:	0d c0       	rjmp	.+26     	; 0x20 <__bad_interrupt>
   6:	0c c0       	rjmp	.+24     	; 0x20 <__bad_interrupt>
   8:	0b c0       	rjmp	.+22     	; 0x20 <__bad_interrupt>
   a:	0a c0       	rjmp	.+20     	; 0x20 <__bad_interrupt>
   c:	09 c0       	rjmp	.+18     	; 0x20 <__bad_interrupt>
   e:	08 c0       	rjmp	.+16     	; 0x20 <__bad_interrupt>
  10:	07 c0       	rjmp	.+14     	; 0x20 <__bad_interrupt>
  12:	06 c0       	rjmp	.+12     	; 0x20 <__bad_interrupt>

The stack/R1 stuff is:

00000014 <__ctors_end>:
  14:	11 24       	eor	r1, r1
  16:	1f be       	out	0x3f, r1	; 63
  18:	cf e9       	ldi	r28, 0x9F	; 159
  1a:	cd bf       	out	0x3d, r28	; 61

The .data loop (but only if any .data) is:

0000001c <__do_copy_data>:
  1c:	10 e0       	ldi	r17, 0x00	; 0
  1e:	a0 e6       	ldi	r26, 0x60	; 96
  20:	b0 e0       	ldi	r27, 0x00	; 0
  22:	ee e4       	ldi	r30, 0x4E	; 78
  24:	f0 e0       	ldi	r31, 0x00	; 0
  26:	02 c0       	rjmp	.+4      	; 0x2c <.do_copy_data_start>

00000028 <.do_copy_data_loop>:
  28:	05 90       	lpm	r0, Z+
  2a:	0d 92       	st	X+, r0

0000002c <.do_copy_data_start>:
  2c:	a6 36       	cpi	r26, 0x66	; 102
  2e:	b1 07       	cpc	r27, r17
  30:	d9 f7       	brne	.-10     	; 0x28 <.do_copy_data_loop>

The .bss loop (but only if any .bss) is:

00000032 <__do_clear_bss>:
  32:	10 e0       	ldi	r17, 0x00	; 0
  34:	a6 e6       	ldi	r26, 0x66	; 102
  36:	b0 e0       	ldi	r27, 0x00	; 0
  38:	01 c0       	rjmp	.+2      	; 0x3c <.do_clear_bss_start>

0000003a <.do_clear_bss_loop>:
  3a:	1d 92       	st	X+, r1

0000003c <.do_clear_bss_start>:
  3c:	a4 37       	cpi	r26, 0x74	; 116
  3e:	b1 07       	cpc	r27, r17
  40:	e1 f7       	brne	.-8      	; 0x3a <.do_clear_bss_loop>

The call to and return from main() handler is:

  42:	02 d0       	rcall	.+4      	; 0x48 <main>
  44:	02 c0       	rjmp	.+4      	; 0x4a <_exit>
...
0000004a <_exit>:
  4a:	f8 94       	cli

0000004c <__stop_program>:
  4c:	ff cf       	rjmp	.-2      	; 0x4c <__stop_program>

And the bad interrupt capture all is:

00000046 <__bad_interrupt>:
  46:	dc cf       	rjmp	.-72     	; 0x0 <__vectors>

By avoiding some/all globals it'd be possible to ditch the .data and/or .bss loops but this would need to be traded against the stack frame creation code.

Cliff


clawson wrote:
glitch wrote:
theusch wrote:

How on earth did your ISP system let you put 160% of code into the chip, and not tell you about it?

How on earth did the compiler even compile it?

Well if something was going to complain I guess it'd be the linker not the compiler but...[...]
Nothing worried about the 122.8% - "0 Warnings"

Actually I keep meaning to suggest something (though this relates to SRAM, not flash): avr-size should be patched so that if the "Data:" figure it outputs is ever more than 100%, it prints something like "SRAM overflow, consider using PROGMEM for constant data/strings", which would pre-empt many, many threads here! (Apart from those people who then don't understand that text.) I guess avr-size could also complain when "Program:" exceeds 100% too.

The reason the linker is so "lax" in checking code memory usage is that there are fewer linker scripts than chips: each script is written for a group of chips (avr1 through avr6, although now there are various "and-a-half" ones, too). Therefore the FLASH limit is set to that of the biggest chip in the group (try it: look into the appropriate linker script and make the array big enough to go over the value there).

The RAM is a similar story, plus some of the Megas do have external data memory bus so the linker cannot restrict that.

I have two homebrew solutions:

  1. a mapfile analyzer tool, with multiple purpose, discussed for example here and here
  2. a modified analyzer for the "standard" make's output, see here (the current version is attached to the last post I've just made)

Enjoy ;-)

JW


theusch wrote:
Quote:

I'm ready to sit down and put everything into ASM,

If you can shave from 1.6 to 1.0 by going to ASM, then you are a machine-language guru or you didn't write your C very well. "Any program can be made one storage cell smaller", true enough. But 60%?

You know, you are right. I have no idea what I was thinking, but putting the code into ASM and trying to shave it down is obviously not reasonable. For some reason I had the impression (probably from working on ARM targets with an ancient gcc some 7 or 8 years ago) that gcc likes to put out big, chunky, unoptimized code. This, however, is very much no longer the case, and taking one look at the ASM this morning made me realize how foolish my assumption was.
Quote:

How on earth did your ISP system let you put 160% of code into the chip, and not tell you about it?

You know I still wonder about that, especially since I used to at least get warnings I think. I'm currently using an AVRISP-MKII with avrdude in Ubuntu Linux 9.10. Perhaps a bug? Once I go over the size limit the read back does fail though.
Quote:

Quote:

I had the 13V's lying around, which is why I used it.

I might have some small screwdrivers on hand. That doesn't mean they will give satisfactory performance when doing farm machinery repair.

I guess I'm spoiled in the US, with distributors carrying mainstream AVR models and normally in-stock. I'd find it hard to believe that there is no source for a Tiny25-family chip as they have been in full production for some years.


Oh no, there are sources, but perhaps AVRs don't move so much here, because when the Yen is high the AVR prices never seem to change. Currently I can get an H8/30xy series part for a comparable price. At one distributor, for example, the ATTiny25 is something like 280Y, while a lower-end H8 is just a touch over 300Y. That means I can get something like 16k of flash, 50+ pins, and a uC that can run TRON with threads and a serial terminal for just a bit more (all things I don't need for this project, mind you). The same supplier must have moved their ATTiny2313s and obtained more since the Yen went up, because those are 90Y.

Just a note: we have Digikey here, but they send half their stuff from the US, so shipping is outrageous. They charge way more for a lot of parts, and the last time I used them they did the customs paperwork wrong and refused to send in the proper paperwork to get the parts out of holding.


I'm bookmarking this thread...

There are a few tricks here I could use myself, especially the extra linker options.

-- Damien


Other thoughts:

Avoid switch(){} constructs generally

Aggressively hold the size of variables to a minimum (and keep bludgeoning the compiler with casts as necessary to make it understand your wishes). For example, don't do this, even though "int" is easy to type:

for (int index = 10;  0 <= --index; )
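The post stops before showing the alternative. A minimal sketch, assuming the loop only has to count 9 down to 0: on avr-gcc an int is 16 bits and occupies a register pair, whereas an 8-bit counter fits in one register. The counter has to stay signed (int8_t) to keep the 0 <= --index test; with a uint8_t that condition is always true and the loop never ends, so the unsigned variant tests before decrementing instead. The summing is only there to make the loops observable.

```c
#include <stdint.h>

/* Same countdown (9, 8, ..., 0) with an 8-bit signed counter. */
uint8_t sum_countdown_s8(void) {
    uint8_t sum = 0;
    for (int8_t index = 10; 0 <= --index; ) {
        sum = (uint8_t)(sum + (uint8_t)index);
    }
    return sum;
}

/* Unsigned alternative: test the old value, then decrement; stops after 0. */
uint8_t sum_countdown_u8(void) {
    uint8_t sum = 0;
    for (uint8_t index = 10; index-- != 0; ) {
        sum = (uint8_t)(sum + index);
    }
    return sum;
}
```

Both loops visit 9 down to 0, so both functions return 45.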

Levenkay wrote:
Other thoughts:

Avoid switch(){} constructs generally

Aggressively hold the size of variables to a minimum (and keep bludgeoning the compiler with casts as necessary to make it understand your wishes). For example, don't do this, even though "int" is easy to type:

for (int index = 10;  0 <= --index; )

It's precisely the switches which are clogging up my code the most. They are much larger than I thought they would be. As for keeping things small I have that covered. In this particular program there are actually only a few variables, most routines are pre-defined but they are complex (essentially timing sequences). As for generally saving space, I'm quite used to heavy bit packing from ARM.

Your comment on int being short brings up a question. I have a standard type definition file I have adapted to AVR, which seems to be used very often in a variety of industries here. It defines, say, an unsigned char as u8 and a signed 32-bit int as s32, etc. I'm not quite sure where this naming scheme originates, but I see it often here in Japan and use it myself in almost any C or C++ code. Does anyone else go by this naming system? I know gcc/*nix usually has things like uint16_t and that's nice, but u16 would seem just as reasonable if everyone who cared to know knew what it meant.

EDIT: Oh and by the way I got rid of a few functions and combined a few and am now under 1K. Functionality is reduced for now but as soon as I get a beefier uC I'll re-add it. For now I can proceed with other aspects of the product development which I'm quite happy about.


Kagetsuki wrote:

Your comment on int being short brings up a question. I have a standard type definition file I have adapted to AVR, which seems to be used very often in a variety of industries here. It defines say an unsigned char as u8, and a signed 32bit int as s32 etc. I'm not quite sure where this naming scheme originates, but I see it often here in Japan and use it myself in almost any C or C++ code. Does anyone else go by this naming system? I know gcc/*nix usually has things like uint_16t and that's nice but u16 would seem just as reasonable if everyone who cared to know knew what it meant.


// Defined in C99
#include <stdint.h>

// Which defines:
// uint8_t 
// uint16_t
// uint32_t
// uint64_t
// int16_t
// ... etc

I use them exclusively, in preference to the standard C types (e.g. int), on deeply embedded systems unless there is a good reason not to.
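The two conventions in this exchange can coexist. A minimal sketch: define the short u8/s32-style names as aliases of the C99 stdint.h types, so both spellings name exactly the same types. The _Static_assert checks are C11; on an older compiler such as the gcc 4.3.3 mentioned earlier in the thread they would have to be dropped.

```c
#include <stdint.h>

/* Short aliases defined on top of the C99 exact-width types. */
typedef uint8_t  u8;
typedef int8_t   s8;
typedef uint16_t u16;
typedef int16_t  s16;
typedef uint32_t u32;
typedef int32_t  s32;

/* Compile-time checks that the widths are what the names promise. */
_Static_assert(sizeof(u8)  == 1, "u8 must be 8 bits");
_Static_assert(sizeof(u16) == 2, "u16 must be 16 bits");
_Static_assert(sizeof(s32) == 4, "s32 must be 32 bits");
```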

-- Damien


Yet another way to cut down the main():

int main(void) __attribute__((naked));

But be careful. This isn't a common solution.
........................................
Further suitable compiler options:
-mint8 (don't forget about old libc compatibility problems with this option),
-mtiny-stack
.........................................
Avoid static and volatile variables where possible.
.........................................
But... there is no right way to reduce the number of LDS/STS instructions. :(


Kagetsuki wrote:
I know gcc/*nix usually has things like uint16_t and that's nice, but u16 would seem just as reasonable if everyone who cared to know knew what it meant.
That's not gcc/Unix specific, that's from the C99 standard, supposed to be in stdint.h

Stealing Proteus doesn't make you an engineer.


damien_d wrote:
I'm bookmarking this thread...

There are a few tricks here I could use myself, especially the extra linker options.

-- Damien

For compiler and linker options I have these 2 threads bookmarked:
https://www.avrfreaks.net/index.p...
https://www.avrfreaks.net/index.p...

I've also bookmarked these URLs:
http://www.mail-archive.com/avr-...
http://www.mail-archive.com/avr-...
http://www.tty1.net/blog/2008-04...

Worth reading.

Felipe Maimon


Quote:
But... there is no right way to reduce a number of LDS/STS instructions.
Nonsense.
A tiny13 has 64 bytes of RAM, and LDD and STD have a displacement range of 64 (0 to 63), so if you lock ZL or YL to the start of RAM (and zero ZH or YH) for the whole program, you can reach all of RAM and save a word over each LDS and STS.
I don't think any of the C compilers can do it, but that's not a problem in ASM ;)

Jens

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You can generally get avr-gcc to use LDD/STD, but sometimes you have to trick it:

   // BUG: "Retreat" to the saved state to
   // encourage avr-gcc to load it using the
   // pointer. (Avr-gcc 3.4.6 still won't.)
  svp = &saved_state + 1;
  svp--;                 
  clock          = svp->clock;
  flashes_left   = svp->flashes_left;

(Here, I am pretty sure I could have convinced avr-gcc 3.4.6 with a bit more work.)
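A related, more general sketch of the same idea, with hypothetical names throughout: gather the statics into one struct and access them through a pointer, so that every field sits at a small fixed displacement from one base address. On the AVR that is the pattern LDD/STD (one word each) can serve, while separate globals need LDS/STS (two words each). Whether a given avr-gcc version actually keeps the base in Y or Z rather than reloading it is not guaranteed, so check the .lss listing.

```c
#include <stdint.h>

/* Hypothetical application state: all statics in one struct, so each field
   is at a constant offset from &g_state. */
struct app_state {
    uint8_t  mode;
    uint8_t  speed;
    uint16_t ticks;
};

static struct app_state g_state;

/* Taking the base address once gives the compiler the chance to keep it in a
   pointer register and reach the fields with base + displacement accesses. */
void advance(uint8_t step) {
    struct app_state *s = &g_state;
    s->ticks = (uint16_t)(s->ticks + step);
    if (s->ticks >= 600) {            /* hypothetical period */
        s->ticks = 0;
        s->mode = (uint8_t)((s->mode + 1) & 0x03);
    }
}
```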

When I get a chance, I should build a current version of avr-gcc and report some code regressions.

- John


One thing is to get the compiler to generate LDD and STD, but how well does it keep the pointer? Nothing is gained if Z (or Y) is loaded/changed all the time.

I will ask the OP: do you need any real speed? If not, I will still say that if C can do it in 1600 bytes, you can make an ASM program that does it in 1K.
And if you need speed, it's normally only a small part of the program that needs to be fast.
On a small AVR (where a pointer only needs to be 8 bits wide) the code can be compressed a lot compared to the model the C compiler uses.

Jens


Imagecraft does use the Y register to access local variables in a stack frame. If you know your compiler you can massage your code to generate the least amount of instructions. But it will be compiler specific.


Quote:

if C can do it in 1600 bytes, you can make an ASM program that does it in 1K.

Now that's fighting talk! If only the OP was at liberty to show the code it'd be an interesting contest for the Asm fans to back up that kind of claim. I'll give you 200 bytes maybe, perhaps 300 but do you really think you can take 600 out of 1,600 output from a C compiler? I'm tempted to make an arbitrary collection of C routines totalling a 1,600 byte program - but it's not really a valid test - it wouldn't be a real world application and I could just write reams and reams of "PORTA |= (1<<3)" type stuff knowing that you couldn't code SBI's any better than the C compiler can (or at least cherry pick routines I know would be "close to the hardware")


sparrow2 wrote:
One thing is to get the compiler to generate LDD and STD, but how well does it keep the pointer? Nothing is gained if Z (or Y) is loaded/changed all the time.
In practice, somewhat less may be gained but still enough to be worth the effort.

But nothing stops you from reducing the register load overhead before you send the compiler output to the assembler. I'm just looking at an ATMega48 image where I do that. It uses 38 bytes out of 3060 to load Y+Z, adding 1.3% to the code size. (15 functions use Y or Z to access static variables and two modify Z because of LPMs.) With smarter pre-assembler-processing, I am pretty sure I could reduce that to 16 bytes and add only 0.5% to the code size (16-12 = 4 bytes and 0.1% more than using assembler).

Addendum: With no post-processing, the compiler would use 66 bytes out of 3100 to load Y+Z, an extra 1.8% compared to assembler. (At one point, with a smaller part, I didn't have that 1.8%.)

- John

Last Edited: Wed. Feb 10, 2010 - 01:25 PM

Quote:
Now that's fighting talk!
Yes, and I take it :)
But that's why I asked the OP about the need for speed.
If there is no real need, the ASM program can be very small.
If you give me a lot of SBIs and that kind of stuff, I would make an 8-bit model for that.
But yes, you could make a program like an array of 1200 const random bytes ;).


sparrow2 wrote:
One thing is to get the compiler to generate LDD and STD, but how well does it keep the pointer? Nothing is gained if Z (or Y) is loaded/changed all the time.

I will ask the OP: do you need any real speed? If not, I will still say that if C can do it in 1600 bytes, you can make an ASM program that does it in 1K.
And if you need speed, it's normally only a small part of the program that needs to be fast.
On a small AVR (where a pointer only needs to be 8 bits wide) the code can be compressed a lot compared to the model the C compiler uses.

Jens

For this application no real speed is needed at all; in general, state transitions occur about 6 times a second at most, and the rest is just setting registers and then delay loops. There are 4 motors that need to be turned on or off or pulsed for speed control at certain points. If you count reset I have 2 open pins, but I'm using reset to change modes (reading from EEPROM, then incrementing the mode variable and writing that to EEPROM on start-up), and it's nice to have a button instead of flicking a switch, so one pin is still free. As for memory, I never have more than 72 bytes allocated by my program at any point in time. The whole bulk of the thing is the "scripted" movement. That said, I couldn't find anything immediately in the ASM that would help me shave off some space. I was actually kind of hoping there would be some silly thing gcc was throwing in there that I could get rid of, like some sort of register read and clear after a jump or something (aah, CASL II). Unfortunately, whoever did the AVR target of GCC is no fool and did it pretty cleanly from what I can see.


Especially for such a small program, there is a good chance of getting under the 1000-byte limit. It depends on the program, of course.

One big thing could be keeping more variables in registers instead of RAM; this can save a lot of LDS/STS instructions.

The use of the Y or Z pointer can help, if it's combined with auto-increment or similar.

One thing GCC does not handle well is mixed variable sizes, e.g. comparing or adding a 16-bit and an 8-bit number. Things get even worse with multiplications.
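A small sketch of the width point, with hypothetical function names: when one operand is 16-bit, the 8-bit one is widened and avr-gcc has to compare a register pair (CP followed by CPC). If the counter's range is known to stay below 256, declaring both operands uint8_t usually lets the compiler use a single 8-bit compare. On paper C still promotes both operands to int; it is the known 8-bit range that permits the narrower code, so check the listing for your compiler version.

```c
#include <stdint.h>

/* Mixed widths: limit is zero-extended to 16 bits before the compare. */
uint8_t reached_mixed(uint16_t elapsed, uint8_t limit) {
    return elapsed >= limit;
}

/* Matched widths: both operands fit in one register each. */
uint8_t reached_u8(uint8_t elapsed, uint8_t limit) {
    return elapsed >= limit;
}
```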

Edit:
Another easy saving is leaving out all the unused interrupt vectors at the end, and the last one. Some ISRs even fit inside the vector space.

Last Edited: Wed. Feb 10, 2010 - 02:02 PM

clawson wrote:
Quote:

if C can do it in 1600 bytes, you can make an ASM program that does it in 1K.

Now that's fighting talk! If only the OP was at liberty to show the code it'd be an interesting contest for the Asm fans to back up that kind of claim. I'll give you 200 bytes maybe, perhaps 300 but do you really think you can take 600 out of 1,600 output from a C compiler? I'm tempted to make an arbitrary collection of C routines totalling a 1,600 byte program - but it's not really a valid test - it wouldn't be a real world application and I could just write reams and reams of "PORTA |= (1<<3)" type stuff knowing that you couldn't code SBI's any better than the C compiler can (or at least cherry pick routines I know would be "close to the hardware")

I could probably rig up a similar program for you, it's preposterously simple though. There are 4 outputs, and a mode. Each mode has a detailed pulse pattern for the outputs, and each of those patterns is held in a function. To decide what function is executed there is a switch statement that takes in the mode as an argument. Also, each mode has 4 speeds, defined by 4 #defines and within the switch statement it looks like:
....
case 2:
ModeA(SPEED_MID);
break;
case 3:
ModeA(SPEED_FAST);
break;
...

And that is basically the whole thing. You could make up a similar program extremely easily. There is literally no real register setup other than setting a full port for output, and I'm using _delay_loop_2 (yes, absolute time accuracy is no issue, only the pulse times between the motors).
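Since each case only pairs a pattern function with a speed, one way to shrink the switch is to encode the mode so that tables supply both parts. A minimal sketch with its assumptions labelled: the mode encoding (pattern in the upper bits, speed index in the low two bits), SPEED_SLOW, SPEED_MAX, all speed values, and ModeB are hypothetical; ModeA, SPEED_MID and SPEED_FAST follow the post, and the numbering need not match the original case labels. On the AVR the speed table lives in flash via PROGMEM and pgm_read_byte() from avr/pgmspace.h; the #else branch only lets the same file compile on a PC. The function-pointer table is left in RAM here (2 bytes per entry on the AVR), which matters on a part with 64 bytes of SRAM.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host build: no separate flash space, so read the table directly. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

/* Hypothetical speed values standing in for the SPEED_* #defines. */
#define SPEED_SLOW 40
#define SPEED_MID  25
#define SPEED_FAST 12
#define SPEED_MAX   6

typedef void (*pattern_fn)(uint8_t speed);

/* Hypothetical pattern routines; the real ones would drive the motors. */
static uint8_t last_pattern, last_speed;
static void ModeA(uint8_t speed) { last_pattern = 'A'; last_speed = speed; }
static void ModeB(uint8_t speed) { last_pattern = 'B'; last_speed = speed; }

static const uint8_t speed_table[4] PROGMEM =
    { SPEED_SLOW, SPEED_MID, SPEED_FAST, SPEED_MAX };
static const pattern_fn pattern_table[] = { ModeA, ModeB };
#define PATTERN_COUNT (sizeof pattern_table / sizeof pattern_table[0])

/* Replaces the switch: one table lookup for the speed, one indirect call. */
void run_mode(uint8_t mode) {
    uint8_t pattern = (uint8_t)(mode >> 2);
    uint8_t speed = pgm_read_byte(&speed_table[mode & 0x03]);
    if (pattern < PATTERN_COUNT)
        pattern_table[pattern](speed);
}
```

For example, mode 1 runs ModeA at SPEED_MID and mode 7 runs ModeB at SPEED_MAX; adding a pattern costs one table entry instead of four case blocks.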


Maybe you could implement some kind of interpreter that executes simple instructions that are, for example, only 4 bits wide. In one AVR instruction word you could have 4 motor instructions. Say the interpreter takes 512 bytes; you're left with 512 bytes, which hold 1024 motor instructions.
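A minimal sketch of such a nibble interpreter, under an assumed encoding invented for illustration (it is not from the thread): 0x0 ends the script, 0x1..0x7 wait n ticks, 0x8 takes the next nibble as the new 4-motor output mask (an escape, as suggested later in the thread), and 0x9..0xC toggle one motor. set_outputs and wait_ticks are hypothetical host stand-ins; on the AVR they would write the port and run a delay loop, and the script would sit in flash and be fetched with pgm_read_byte().

```c
#include <stdint.h>

/* Host stand-ins for the hardware hooks (hypothetical). */
static uint8_t  port_image;
static uint16_t ticks_waited;
static void set_outputs(uint8_t mask) { port_image = mask; }
static void wait_ticks(uint8_t n)     { ticks_waited = (uint16_t)(ticks_waited + n); }

/* Fetch the next 4-bit opcode; two per byte, high nibble first. */
static uint8_t next_nibble(const uint8_t *script, uint16_t *pos) {
    uint8_t b = script[*pos >> 1];
    uint8_t n = (*pos & 1) ? (uint8_t)(b & 0x0F) : (uint8_t)(b >> 4);
    (*pos)++;
    return n;
}

/* Run a packed script and return the final output mask. */
uint8_t run_script(const uint8_t *script) {
    uint16_t pos = 0;
    uint8_t outputs = 0;
    for (;;) {
        uint8_t op = next_nibble(script, &pos);
        if (op == 0x0) {
            break;                               /* end of script */
        } else if (op <= 0x7) {
            wait_ticks(op);                      /* wait op ticks */
        } else if (op == 0x8) {
            outputs = next_nibble(script, &pos); /* escape: full mask */
            set_outputs(outputs);
        } else if (op <= 0xC) {
            outputs ^= (uint8_t)(1u << (op - 0x9)); /* toggle one motor */
            set_outputs(outputs);
        }
        /* 0xD..0xF: reserved, ignored in this sketch */
    }
    return outputs;
}
```

For example, the three bytes { 0x85, 0x39, 0x20 } mean: set outputs to 0101, wait 3, toggle motor 0, wait 2, end; run_script returns 0x4 after 5 ticks of waiting.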


As case statements supposedly use a lot of space, it may be a good idea to use a constant array, even if it's in RAM.


Quote:

use a constant array, even if its in RAM.

But a const array in RAM has to start life in flash anyway so why would you have it in both? Or are you saying that LD/LDS access code is "tighter" than LPM? Otherwise why not just PROGMEM it?


Quote:
Maybe you could implement some kind of interpreter that executes simple instructions that are, for example, only 4 bits wide. In one AVR instruction word you could have 4 motor instructions. Say the interpreter takes 512 bytes; you're left with 512 bytes, which hold 1024 motor instructions.

When the problem is that simple I think you are right.
Another (faster) way would be to use 8-bit tokens but only use 4-5 bits of them, and use the rest for some of the logic. That would make the interpreter code smaller and the code run faster, but the token code would be a tad bigger.

Edit: because your 4-bit version reads two instructions per LPM, it could be faster than an 8-bit one.


You could reserve one token as an escape token to gain another 16 possible instructions. Or use two escape tokens for a total of 14+16+16=46 instructions.

But without knowing the exact application and requirements, it's difficult to judge whether it's a feasible solution :)


Having mentioned threaded code (the other, older alternative to interpretation and compilation), I wonder if constructing a compact interpreter for the AVR might be a bit of a chore. My first inclination would be to write some table-driven code and create, in the spirit of another discussion, a sort of random-statement-based processor rather than a microcoded one.

- John


Some time ago I was playing with an AVR emulator running on an AVR. That made it possible to run AVR code from an EEPROM (or RAM), and it proved to me how easy it is to make a very small interpreter on an AVR.

That being said, the structure of the AVR, without an accumulator, is bad for an interpreter (at least for speed); we just have to treat R25:R24 as an ACC or something like that.