Compression algorithm

25 posts

Hello friends,

Any ideas for a data compression algorithm on an ATmega? This is for a re-programming application that transmits modified Intel HEX records (address and data in hex format). On a 2561 it takes a long time to transmit all 256K, especially at the low serial speeds imposed by customers. So we are thinking about some data compression.

The algorithm can be arbitrarily complex on the compression side, since we have plenty of power on the PC. On the ATmega side, decompression must be reasonably fast.

Appreciate any thoughts,


Why do you have to be stuck with low serial speeds?

You can normally read the existing connection speed, increase the baudrate, and return to the original speed.

What really matters is the reliability and data integrity of the serial connection. Hard-wired PC to AVR is 100% good. A telephone or radio link needs an error correcting protocol.

You can use all the regular compression algorithms on your HEX file. Or if you are lazy, just use ZIP and see what zip has chosen for you.

A 2561 has just about enough SRAM to decompress most formats. It certainly has enough FLASH for the program code.

Sending a ZIP file through a noisy telephone is fatal.

David.


Quote:
For a re-programming application,

If your code is optimized for size, do not expect any reasonable compression ratio, as it is already compressed. If there were a repeating pattern within the code, IMHO the compiler would already have factored it out as a called function.

No RSTDISBL, no fun!


Brutte wrote:
Quote:
For a re-programming application,

If your code is optimized for size, do not expect any reasonable compression ratio, as it is already compressed. If there were a repeating pattern within the code, IMHO the compiler would already have factored it out as a called function.

Hmm... quite right. And yes, the code is indeed compiled with -Os.

I don't think *any* compression can be implemented on-chip, at least not in practice. Our program is far larger than the internal SRAM; we actually use an external 512K module, and the program has over 10K lines of code.

About speed... it is NOT our choice, like I said: our customers use only 4800 :) and very often over (old) telephone lines :wink:

We thought about compression because a very trivial test with WinRAR on the original Intel HEX showed compression to about 1/4 of the size, i.e. a 4x improvement. We actually transmit only the relevant Intel HEX payload, in non-ASCII form, so even a 50% improvement would be great.

I have seen a lot of theory about LZW and LZ77, but I wonder if anyone has actually tested them on an MCU.

Appreciate your advice,


I do not know if you are old enough to remember 300 baud modems, or even 1200/75 baud modems. There were frequent duff bits, and simple protocols like XMODEM were used to re-transmit packets.

Error-correcting modems appeared in the 1980s. There were still errors on the telephone lines, but the modems handled the errors transparently.

The user just saw perfect results, although sometimes the data rate slowed down.

In much the same way, downloading files on the internet appears to be error-free. It is not. The low-level software looks after the error correction.

I suspect that your punters are using post-1990 modems. So you can safely transfer files with the modems correcting errors. Choose LZW or LZ77 or whatever takes your fancy. Most of these algorithms will compress a HEX file to about 25% of its original size, so you will get a 4x speed improvement.

The decompression will take milliseconds. It will however add some complication to your application.

Doing some sums: a 256K binary becomes roughly a 640K Intel HEX file, which will take about 22 minutes at 4800 baud with no compression.

Note that your regular error-correcting modem will probably also do on-the-fly compression. So you might just as well feed it Intel HEX. The modems will compress and decompress by themselves. Hence you will get HEX out of the other end !!

As you have probably discovered for yourself, zipping a ZIP file does not achieve any extra compression.

Regarding Brutte's comment: first off, you need to compile size-efficient code (avr-gcc -Os). A smaller binary produces a smaller HEX file.

David.


Brutte wrote:
Quote:
For a re-programming application,

If your code is optimized for size, do not expect any reasonable compression ratio, as it is already compressed. If there were a repeating pattern within the code, IMHO the compiler would already have factored it out as a called function.
Does a typical compiler really look for binary patterns to optimise out (like LZW etc), or does it look for common code to optimise out?

I just winzip'ped a random .hex file and got 60% compression :)


The compiler will only "compress" similar code, e.g. by turning similar blocks of instructions into one (asm) routine, depending on what the code actually does. That is one technique... but it will never strip out a const lookup table initialized in ROM. And the original Intel HEX is ASCII-encoded and carries the address of each record, so the actual code data is less than half the Intel HEX size (about 30-40% of it).

Like I said, we already transmit pure machine code for re-flashing, so only lossless compression can help.


AFAIK, avr-gcc just does regular optimisations, e.g. moving code out of loops, removing unused code, etc.

CodeVision and ImageCraft do binary pattern compression to reduce size. They are not as clever at code-analysis types of optimisation.

Quite honestly, I would treat an application as an application. Use hardware (modems) to handle the transfer.

Regarding compression. Look at Linux distributions. They get compressed before saving as an ISO image on a CDROM or DVD. Originally, you used to fit them on a 1.44MB boot floppy. They uncompress themselves as part of the boot process.

David.


If you have Intel HEX, it consists of rather few different characters; therefore, by simply encoding these as 5 bits instead of 8, you get some reduction.


I assume that the data is AVR code. Since instructions are 16-bit, none of the normal compressors will be optimal. Try making a table of the 32 or 64 (or ???) most common instructions (use Huffman coding to see).


Quote:
If you used a code optimized for size, do not expect to find any reasonable compression algorithm as it is already compressed.
Optimization has absolutely nothing to do with compression.

Regards,
Steve A.

The Board helps those that help themselves.


Some time ago I had to improve firmware download time, and the cold truth was that transmitting the full 512 kilobytes of code memory in binary with XMODEM was still faster than transmitting ~200 kilobytes of code in Intel HEX format. So only the used part of the memory is transmitted; the full 512 Kbytes would take 1.4 megabytes to transmit as HEX.

Many other factors matter as well: the protocol overhead (how big the blocks of data you transmit are), how many bytes you program to flash in a single function call, the baud rate of course...

Edit: good point about what data the HEX file contains - 16 hex digits (0-9, A-F), the colon (:), and the CR/LF pair (13, 10). 19 different characters?


Of course, depending on your application, you could also consider using two, or more, channels at the same time.

JC


aPack is very old, but it was effective and its decompression code was small.
http://www.ibsensoftware.com/dow...


Quote:
Optimization has absolutely nothing to do with compression

Of course it has. Especially if it is compiled with -Os. The compiler looks for similar patterns in the code and replaces them with loops or calls, or even eliminates irrelevant data. Any time the output is smaller than the source, it is compression.
I agree that most compilers do not search for binary patterns in binary files, but not because they are not allowed to; rather, they are not clever enough. The only requirement on a compiler is to meet the standards. If -Os code works and is small, then that is better than code that works and is big.

So if I create a table of

const uint8_t table[] PROGMEM = {5, 5, 5, 5, 5};

I understand the GCC compiler does not optimize it away with -Os. But I really doubt that ANSI C requires this table to be placed in memory in full, without any optimizations applied.


Probably 50%, more or less, can be saved by switching from an Intel HEX file to a binary image.

By the way, optimization is not really compression. It can affect the total code size, but that is the size of the actual executing code.

Jim

 

Until Black Lives Matter, we do not have "All Lives Matter"!

How often you need to do this will play a role: if it's once a day, it's worth some effort; if it's once a year, you can afford to let it take a little longer.

One draconian solution, if you had to do this on a very regular basis, would be to restructure the program into a run time library executing P-code, essentially an interpreter. The feasibility of this approach would depend on the application, but if it were adaptable to this method you would only have to reload the (relatively small) P-code, not the supporting functions themselves. You could have a mechanism which allowed updating all or part of the run time code. Worst case would be a complete reload (what you're stuck with now), while best case might be a couple of kilobytes, again depending on the nature of the application.

Anyway, there are lots of different ways to think about it, many obviously not worth the time it would take to implement.

Chuck Baird

"I wish I were dumber so I could be more certain about my opinions. It looks fun." -- Scott Adams

http://www.cbaird.org


This is one of the few references where I have seen "code-compression".

http://spectrum.ieee.org/semicon...


Quote:
Compiler looks for the similar patterns in code and replaces it with loops or calls.
Optimization != compression. Just because they may share common aspects does not make them equal.

Regards,
Steve A.


Concur.

For one thing, compression is definitively and always about making things smaller. Optimization might be about making things smaller, but not necessarily. Some optimizations might actually make the code larger.

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]


Quote:

This is one of the few references where I have
seen "code-compression".


Not an ImageCraft fan, then, eh?
http://www.imagecraft.com/devtoo...
Quote:
ADVANCED: STD + Code Compressor™. Decrease program size from 5-15%. Support for 128K bytes and 256K bytes MegaAVRs.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Quote:

For one thing, compression is definitively and always about making things smaller.

Definitely and always?

What if you want to make something warmer? (First Law of Thermodynamics, IIRC)


Quote:

Definitely and always?

What if you want to make something warmer?


:D
Then it is still about making the thing smaller, though this is not the primary objective of the exercise.


You get a 2:1 speedup by sending binary instead of the ASCII representation of each byte as two ASCII characters. I assume this is a 'field upgrade' of the program? You have a bootloader that accepts the HEX file now. I assume you will replace it with a bootloader that accepts a 'packet' with a flag, address, byte count, 16 or so data bytes, and a checksum. This new, fast bootloader might as well attempt to autobaud at a nice fast rate like 115200 or 230 kbps. This will have no effect on the baud rate of the operational program. Win-win: the download is faster, and the baud rate is still Real Slow during normal operation.

Imagecraft compiler user


This will probably help someone later:
http://excamera.com/sphinx/artic...