Can you spare a few bytes, sir?


I'm trying to recompile an old C program, which I originally wrote for a 4K PIC processor, for a Tiny2313 with 2K of memory. The problem is that with all the optimizations and all the packing options turned on, it's still 2162 bytes.

Anyone have any ideas to make the code smaller?


Building my dreams!


Might help if you identified your C compiler. But from the looks of the code it's probably GCC so I'll move this to that forum.

I assume you are doing the obvious -Os and, if any math is used, that you link with libm.a. Also, I guess your search already found the suggestions about whole program optimization and things like the -relax option given to the linker? (I can only guess, as you haven't shown the all-important Makefile.)

Cliff


The compiler is AVR-GCC as you already guessed.

I haven't heard of the '-relax' option. Could you elaborate?

I'm unable to attach the makefile for some reason.

I'm thinking of removing the leap-year support to get it to fit into 2048 bytes.


I removed leap-year support and it's still 2078 bytes :(


The delay function in minilcd.c is unpredictable and will most probably be optimized away. Instead, read about util/delay.h in avr-libc documentation or put

asm volatile("nop");

where necessary.


A search for "relax" in "GCC forum" currently hits 30 threads besides this one - wonder if any of them might tell you a bit more about it ;)

While there try a search for 'gc-sections' and notice how both words appear in some common threads (14 of them) - they probably have a fairly full description of your options.


Tuomas wrote:
The delay function in minilcd.c is unpredictable and will most probably be optimized away. Instead, read about util/delay.h in avr-libc documentation or put
asm volatile("nop");

where necessary.

Your comment did give me the idea to replace the delay() function in minilcd.c with the one from util/delay.h, but that resulted in a program that took up more than 2500 bytes, for some reason that I don't understand. I would have guessed that only the call would add overhead, since the code is already linked in: it's also used in testoptimizer.c.


clawson wrote:
A search for "relax" in "GCC forum" currently hits 30 threads besides this one - wonder if any of them might tell you a bit more about it ;)

While there try a search for 'gc-sections' and notice how both words appear in some common threads (14 of them) - they probably have a fairly full description of your options.

I tried adding relax and gc-sections:

avr-gcc.exe -mmcu=attiny2313 --relax  --gc-sections  TestOptimizer.o minilcd.o     -o TestOptimizer.elf

but it didn't make one byte of difference.


You should use functions for things that are done often.

For example, you output a two-digit number many times, and blanks many times.
Using separate functions for these, I get 144 bytes less:

void putblanks( unsigned char i )
{
  do{
    putchar( ' ' );
  }while( -- i );
}


void put2dig( unsigned char val )
{
    putchar( '0'+ val / 10 );
    putchar( '0'+ val % 10 );
}

void DisplayTime()
{
  gotoxy( 0, 0 );

  if ( mode == DISPLAY_SET_TIME_HOUR  && blink )
  {
    put2dig( timedate.hour );
  }
  else
  {
    putblanks( 2 );
  }
...

You must also use the compiler switch:

-fno-inline-small-functions

to avoid expanding the code size by inlining.

Peter


Quote:

A search for "relax" in "GCC forum" currently hits 30 threads besides this one - wonder if any of them might tell you a bit more about it

That is true, Cliff. Still... I have not dug so deeply into the new options for the avr-gcc/AVRLibc toolchain, because I haven't needed them. This thread made me curious about what the documentation had to say about it. So I sifted through the GCC documentation ( http://gcc.gnu.org/onlinedocs/gc... ). Nothing there. Hmmm.. Over to the binutils ld documentation ( http://sourceware.org/binutils/d... ). This is what it says about --relax :
Quote:
An option with machine dependent effects. This option is only supported on a few targets. See ld and the H8/300. See ld and the Intel 960 family. See ld and Xtensa Processors. See ld and the 68HC11 and 68HC12. See ld and PowerPC 32-bit ELF Support.
On some platforms, the `--relax' option performs global optimizations that become possible when the linker resolves addressing in the program, such as relaxing address modes and synthesizing new instructions in the output object file.

On some platforms these link time global optimizations may make symbolic debugging of the resulting executable impossible. This is known to be the case for the Matsushita MN10200 and MN10300 family of processors.

On platforms where this is not supported, `--relax' is accepted, but ignored.

So my question becomes: Where is the definitive or "canonical" documentation of the --relax option for the AVR target?

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]


JohanEkdahl wrote:

So my question becomes: Where is the definitive or "canonical" documentation of the --relax option for the AVR target?

On AVR it tries to replace jmp/call with rjmp/rcall.
So it only has an effect on AVRs with >=16kB.

Peter


*sigh*

Use -mrelax on the compile command line. It sends the correct flag to the linker.

To remove unused functions, add this to every compile command line:
-ffunction-sections
And add this to your linker command line:
-Wl,--gc-sections

I'm fairly sure that linker relaxation, as it is used on the AVR, is probably not documented. Patches welcome.
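
For anyone collecting the options mentioned so far in this thread, they might be combined roughly as in the fragment below. This is only a sketch: the variable names MCU, CFLAGS and LDFLAGS are assumed from a typical Mfile-style Makefile, and this is not the makefile attached later in the thread. Whether each flag actually saves space depends on the compiler version and the code, so compare sizes before and after.

```make
# Hypothetical Makefile fragment (variable names assumed, not taken from the
# original poster's project).
MCU     = attiny2313

# Compile: optimise for size, put each function in its own section,
# and stop GCC from inlining small functions.
CFLAGS  = -mmcu=$(MCU) -Os -mrelax -ffunction-sections -fno-inline-small-functions

# Link: let the linker discard unreferenced sections; -mrelax on the
# avr-gcc command line is passed on to the linker as relaxation.
LDFLAGS = -mmcu=$(MCU) -mrelax -Wl,--gc-sections
```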


danni wrote:

On AVR it tries to replace jmp/call with rjmp/rcall.
So it only has an effect on AVRs with >=16kB.

That is not the only thing that linker relaxation does.


Just to be clear. The suggestion to search for "relax" or "gc-sections" that I made wasn't saying that those were the only options or even that they'd be useful here - simply that if one follows a search for those things you'll hit the threads where this has been discussed many times before (14 apparently) and in THOSE threads you might learn of some of the various techniques that have been proposed previously for size optimisation.

By the way, if you used util/delay.h and the size grew to 2500 bytes that suggests either:

1) you aren't building -Os

2) you are passing variable amounts to the delay routines. Instead of:

N=PINA;
_delay_ms(N);

use

N=PINA;
for (int i=0;i<N;i++)
  _delay_ms(1);

If you start passing non-constant values to either _delay_us() or _delay_ms() then the function loops cannot be optimised at compile time and instead a large part of the f.p. support lib gets dragged into the link.

While on the subject of optimisation another useful thing to do is:

avr-nm --size-sort final.elf

The "T" entries are the global functions in the code and the increasing numbers are their size in bytes. Start at the big one (as it may have most potential to shave something off through hand optimisation) and see if you can spot any way to get the job done in a more efficient way.

Cliff


Cliff, I compiled his code and, sure enough, he's at about 2100 bytes. There is nothing obviously fubar in the code, although there are lots of calls to

lcd_putc(' ');

Those calls could probably be improved, but the added space of new routines may not (does not - tried it) offset the space used in the multiple calls.

The comment I would make is why are you centered on the tiny2313? There are tiny44, tiny45, and tiny461 available and they all have 4K of flash. There's even the tiny44A which is one of the new PicoPower processors.

You're trying to move code that fit in a 4K PIC into a 2K AVR. Why not try to move that code into a 4K AVR?

One final note: Even if you got the code to fit, the code won't work as written. Access to the PSTR constants must use a pgm_read_* or one of the *_P routines defined in pgmspace.h. As written, the strings you've defined won't be read.
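
To illustrate the point about flash access, here is a minimal sketch of reading a packed table of 3-letter names through pgm_read_byte(). The table day_names and the helper copy_day_name() are hypothetical names made up for this example, not taken from the attached source, and the #else branch is only a host fallback (an assumption for illustration) so that the same code can be compiled and tested off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallback so the sketch also compiles off-target (illustration only). */
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

/* Hypothetical table: seven 3-letter names packed back to back. */
static const char day_names[] PROGMEM = "SunMonTueWedThuFriSat";

/* Copy the 3-letter name for day 0..6 into out[4], NUL-terminated.
   Every byte is fetched with pgm_read_byte() because the table lives in flash. */
static void copy_day_name(uint8_t day, char out[4])
{
    const char *p = &day_names[(uint8_t)(day * 3)];

    out[0] = (char)pgm_read_byte(p++);
    out[1] = (char)pgm_read_byte(p++);
    out[2] = (char)pgm_read_byte(p);
    out[3] = '\0';
}
```

On the real target the three bytes could equally be passed straight to the LCD output routine instead of being copied into a buffer.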

Try reading the following for more help:

[TUT] [C] GCC and the PROGMEM Attribute

One other possibility: rewrite using assembly. I do not often recommend this, but in this case it may help.

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!


Stu,

HOW did you compile his code? I would have done so, but without a Makefile to show which optimisations are already being tried it's difficult to get a stake in the ground as a starting point.

I guess you just used Mfile then added his source files to that?


Quote:

I'm fairly sure that linker relaxation, as it is used on the AVR, is probably not documented. Patches welcome.

I get the hint, Eric. (I will answer your request for an email from me, just need to contemplate how to position myself and my time first).

Problem is this: now someone needs to dig into the details of what was done with --relax for the AVR target. It seems to me that this would include interviewing the people that coded it in the first place. Or are there bug reports / tickets for this work? Would they be in the gcc bug/ticket system or in the avr-libc buglist? (A search in both for "relax" gave me nil, but I'm sure I did something wrong.)

Could the people that actually fixed --relax for the AVR sketch some documentation and then hand it over to some foot soldier (me, maybe?) that takes care of the tidying up and formatting?


Unfortunately, there's another saying that applies in this situation:
"Use the source, Luke!" ;)

The best thing to do is to start looking at the comments in the source code. This will be in GNU Binutils.


You can force pointer use whenever you are using lots of struct members-

#define FIX_POINTER(_ptr) __asm__ __volatile__("" : "=b" (_ptr) : "0" (_ptr)) //carlo lamas idea

Then in the local function where needed-

//inside function
volatile struct _timedate* td_ptr=&timedate;
FIX_POINTER(td_ptr);

then use 'td_ptr->' instead of 'timedate.' to access the members of the struct (main, timer isr, DisplayTime seem to be best candidates).

It may buy you some space when accessing members of a struct many times (st/ld take 1 word, sts/lds take 2 words). You can set up the code to always use pointers, then add or comment out the line where the pointer is 'fixed'. With no pointer 'fixed', it will mostly revert back to sts/lds. You can then see where it is a code size advantage and where it is not.

Last Edited: Wed. Dec 17, 2008 - 05:31 PM

stu_san wrote:
Cliff, I compiled his code and, sure enough, he's at about 2100 bytes. There is nothing obviously fubar in the code, although there are lots of calls to
lcd_putc(' ');

That's exactly the same as I stated above.
Together with my other suggestion, the code shrinks below 2048 bytes, so I stopped making further improvements.

Peter


By the way, this program can't have previously worked can it? It declares months[] and daysofweek[] as PSTR() but then does nothing about using pgm_read_byte() to access the data. Also there's a copy/paste error in that it continues to attempt to use daysofweek[] when it means months[]. I was looking at that stuff specifically because all those *3's in there generated heaps of code. A better technique is something like:

  {
    const char * ptr = &daysofweek[ ( dayofweek * 3 )];
    putchar( pgm_read_byte(ptr++) );
    putchar( pgm_read_byte(ptr++) );
    putchar( pgm_read_byte(ptr) );
  }

Also there's a problem with the initialisers for:

struct _timedate
{
  unsigned char hour;
  unsigned char minute;
  unsigned char second;
  unsigned char date;
  unsigned char dayofweek;
  unsigned char month;
  unsigned int year;
} volatile timedate = {12,0,0,1,1,2000};

there's an element missing (I bet you didn't intend to assign 2000 to .month!)

So did this code ever work anyway? I started to look at what I could do to reduce the size but once I added the necessary pgm_read_byte() stuff it actually started getting bigger, rather than smaller.

One of the key things (as you'll see in the .lss and .map) is the fact that some library code for multiplies and divides has been dragged in. If one could optimise those (shifts and adds perhaps?) it might be possible to reduce things a bit, but I guess that on a CPU without MUL this is probably unavoidable.
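
As a rough sketch of the "shifts and adds" idea: the two-digit output can be done with repeated subtraction instead of / and %, and the *3 table index with a shift and an add. The helpers split2dig() and times3() are hypothetical names for illustration, not functions from the attached source. Whether this ends up smaller than the library routines depends on how many other places still need them, so check the .map file and the size after each change.

```c
#include <stdint.h>

/* Split a value 0..99 into two ASCII digits using repeated subtraction
   instead of '/' and '%', so this function needs no division helper. */
static void split2dig(uint8_t val, char out[2])
{
    char tens = '0';

    while (val >= 10) {
        val -= 10;
        tens++;
    }
    out[0] = tens;
    out[1] = (char)('0' + val);
}

/* Offset of entry n in a table of 3-character names:
   n * 3 written as a shift and an add. */
static uint8_t times3(uint8_t n)
{
    return (uint8_t)((n << 1) + n);
}
```

The compiler may already turn a constant *3 into something like this by itself, in which case only the division part would be expected to help.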

Cliff


Most of the functions in minilcd.c could be static inline.
1 % X == 1 for integral X>=2.
Even if the +'s were done before the %'s,
the X's are mostly wrong.
I'd recommend using :?, as was done for -.
It might be more compact to manipulate
the decimal digits directly.
If the dj's are ascii digits:
d3 d2 d1 d0 is divisible by 4 iff !(((d1<<1)+d0)&3)
d3 d2 d1 d0 is divisible by 400 iff
d0=='0' && d1=='0' && !(((d3<<1)+d2)&3).
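
The digit trick above can be sketched as follows, assuming the year is kept as four ASCII digits with the most significant digit first (that storage layout is an assumption; the attached code keeps the year as an unsigned int). The names div4_ascii() and is_leap_ascii() are made up for this example. It works because 10*a + b and 2*a + b leave the same remainder mod 4, and the '0' offsets add up to a multiple of 4.

```c
#include <stdbool.h>

/* True if the two-digit number formed by ASCII digits (tens, ones)
   is divisible by 4. No division or modulo is needed. */
static bool div4_ascii(char tens, char ones)
{
    return (((tens << 1) + ones) & 3) == 0;
}

/* Gregorian leap-year test on a year stored as four ASCII digits,
   y[0] being the thousands digit (d3) and y[3] the ones digit (d0). */
static bool is_leap_ascii(const char y[4])
{
    if (y[2] == '0' && y[3] == '0') {
        /* Century year: leap only if divisible by 400,
           i.e. the two leading digits are divisible by 4. */
        return div4_ascii(y[0], y[1]);
    }
    /* Otherwise: leap if the last two digits are divisible by 4. */
    return div4_ascii(y[2], y[3]);
}
```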

Moderation in all things. -- ancient proverb


Quote:

The best thing to do is to start looking at the comments in the source code.

:shock:
Checked in without a ticket number of any sort?
No revision number known?
Change not traceable in CVS at all?


JohanEkdahl wrote:
Quote:

The best thing to do is to start looking at the comments in the source code.

:shock:
Checked in without a ticket number of any sort?
No revision number known?
Change not traceable in CVS at all?

GNU Binutils here:
http://www.gnu.org/software/binu...

CVSWeb interface here:
http://sourceware.org/cgi-bin/cv...

Linker relaxation is done in BFD, in the elf32-avr.c file here:
http://sourceware.org/cgi-bin/cv...

Look at comment before function:
elf32_avr_relax_section


Thanks everyone. I've got it down to 1960 bytes now.

@Stu: I'll read up on the program pointer stuff and try to rewrite it.

@Clawson: I found those bugs myself and fixed them.


Tuomas wrote:
The delay function in minilcd.c is unpredictable and will most probably be optimized away. Instead, read about util/delay.h in avr-libc documentation or put
asm volatile("nop");

where necessary.

There's no need for a NOP there.

asm volatile("");

will work just as well.
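
For illustration, a busy-wait loop kept alive that way could look like the sketch below. The function name delay_ticks() is hypothetical and this is not the delay routine from minilcd.c. The empty asm statement emits no instruction, but because it is volatile the compiler has to keep it, and with it the loop around it. The resulting delay is uncalibrated and changes with clock speed and optimisation level, which is why util/delay.h is usually the better choice when a specific time is needed.

```c
/* Busy-wait for n loop iterations. The empty volatile asm generates no
   code of its own, but the compiler may not delete it, so the loop
   around it is not optimised away either. */
static void delay_ticks(unsigned int n)
{
    while (n--) {
        __asm__ __volatile__("");
    }
}
```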


Quote:

asm volatile("");

will work just as well.


Actually, what I meant was to forget about the original tick delay function entirely and to replace it with something completely different. I was not suggesting any kind of fix to that function.


bigpilot wrote:
I'm trying to recompile an old C program, which I originally wrote for a 4K PIC processor, for a Tiny2313 with 2K of memory. The problem is that with all the optimizations and all the packing options turned on, it's still 2162 bytes.

Anyone have any ideas to make the code smaller?

I used CodeVision AVR in the Tiny Memory model mode which uses 8 bit pointers (since RAM in the '2313 is/was just 256 bytes). Versus GCC, the code was a great deal smaller, like 30%.


I used the latest WinAVR build with this makefile and I got 1874 bytes, although there were several warnings.

    Attachment(s): 


    Halakatevakis wrote:
    I used the latest WinAVR build with this makefile and I got 1874 bytes, although there were several warnings.

      That's strange. I can't reproduce your results. I get 1952 bytes with the latest WinAVR (20081205). Are you using the Release Candidate?


      Quote:
      That's strange. I can't reproduce your results. I get 1952 bytes with the latest WinAVR (20081205).

      Are you sure that you are using the makefile that I attached in the previous post?
      I get 1952 bytes only with your makefile.


      Must have done something wrong. I tried it with your Makefile which I put in the 'default' directory and then got the error:

      Quote:
      Build started 18.12.2008 at 11:31:49

      -------- begin --------
      avr-gcc (WinAVR 20081205) 4.3.2
      Copyright (C) 2008 Free Software Foundation, Inc.
      This is free software; see the source for copying conditions. There is NO
      warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

      make: *** No rule to make target `TestOptimizer.elf', needed by `elf'. Stop.
      Build failed with 1 errors and 0 warnings...

      Do I need to add something to the PATH?


        Just put TestOptimizer.c, minilcd.c and minilcd.h in the same directory as the makefile.


          Ah yes, I get 1930 bytes with your Makefile. That is quite an improvement. Can it be duplicated through the normal AVR Studio settings?


          Quote:

          Can it be duplicated through the normal AVR Studio settings?

          Yes, just copy all the -options from the CFLAGS and LDFLAGS to the appropriate place in the Studio project configuration.