How to compress the double float string


What methods are there?


You've already mentioned RLE and Huffman. There are also arithmetic coding and deflate (which relies heavily on Huffman coding) among the most commonly used algorithms.

 

But you are really asking a lot. The only way to compress data is to take advantage of the structure of the data itself. For example, if your numbers are expected to be close to each other, then you can just send the differences, and so on.
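To make the "send the differences" idea concrete, here is a minimal delta-coding sketch in C (the 16-bit sample type and the function names are my assumptions, not anything from this thread):

#include <stdint.h>

/* Replace each sample with its difference from the previous one.
   If neighbouring samples are close, the differences are small
   and pack or compress much better than the raw values. */
void delta_encode(int16_t *buf, uint16_t len)
{
    int16_t prev = 0;
    for (uint16_t i = 0; i < len; i++) {
        int16_t cur = buf[i];
        buf[i] = (int16_t)(cur - prev); /* store the difference in place */
        prev = cur;
    }
}

/* Undo the encoding: a running sum restores the original samples. */
void delta_decode(int16_t *buf, uint16_t len)
{
    int16_t prev = 0;
    for (uint16_t i = 0; i < len; i++) {
        prev = (int16_t)(prev + buf[i]);
        buf[i] = prev;
    }
}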


Also, you can pre-compute Huffman tables and have them stored on both sides. But for floating point data, the tree will be balanced and you won't save much.


Any compression takes advantage of redundancy in the data. If your data is not redundant in any way, then it is incompressible.


As Alex says, I'd do a static Huffman analysis on a large sample of potential data and then hard-code the encoding tree into the device. However, I would be astonished if this really helped very much.

 

What is your bandwidth limiter here anyway? Is it the bit rate on the radio channel or something else? Is there nothing that can be done to increase it?

 

Trying to compress data that isn't very compressible looks like a bit of a "clutching at straws" solution to me!


I'll try again: why do you need floating point?

If the data come from an ADC, then a sum, a sum of squares, or the like is integer by nature, and you lose precision by converting to FP (at least with 32-bit FP).

Can you tell us what is done with the samples before you send them?
 


@alexru I understand your point, but what compression can I do on binary data?


You can do any compression you like. But if this is purely random data, then it won't compress no matter how hard you try. That's the definition of randomness: there is no structure in the data that a compression algorithm can take advantage of.

 

So you are clearly not getting my point.


Why are you so focused on lossless compression when you are using FP, which loses more information than an integer? (Remember that you only have a 24-bit mantissa in a normal float.)

What is your maximum, 999999 or 99999? (You use both.)

If we say 999999, with at most 3 digits after the decimal point, then 999999.999 scaled up is 999999999 at most. That is 30 bits; with a sign it is 31, so every value fits 100% into a 32-bit integer, i.e. 4 bytes. (If you use FP you won't be able to get that precision.)

Or am I missing something?
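A minimal sketch of that scaled-integer (fixed-point) idea, assuming 3 decimal places (the helper names are mine, purely for illustration):

#include <stdint.h>

/* Scale a value with up to 3 decimal places into a 32-bit integer.
   -999999.999 ... +999999.999 maps to -999999999 ... +999999999,
   which fits in 31 bits including the sign. */
int32_t to_fixed(double v)
{
    return (int32_t)(v * 1000.0 + (v >= 0 ? 0.5 : -0.5)); /* round to nearest */
}

double from_fixed(int32_t f)
{
    return (double)f / 1000.0;
}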

 

 


@sparrow2 The float is the result of mathematical calculations which give me the desired output. To save some data it is converted into an integer, and since I can send binary data, I now want to compress that binary data.

Thanks


That's why I am asking you, sir: is there any lossless compression which will help me compress binary data of numeric type? And, as I said, the present value may or may not be totally different from the previous and future values.

Thanks


What answer do you want to hear? You need to clearly define the limits on your numbers.

For example, numbers in the range -99,999 ... +99,999 can be represented with 18 bits. Compared with a 32-bit integer, I just saved you 14 bits per value.
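A quick way to verify claims like this (a small helper of my own, not from the thread): count how many bits a symmetric signed range needs.

#include <stdint.h>

/* How many bits does the signed range -max..+max need?
   For max = 99999: 2*99999 + 1 = 199999 distinct values,
   and 2^18 = 262144 >= 199999, so 18 bits suffice. */
uint8_t bits_for_range(uint32_t max)
{
    uint64_t values = 2ULL * (uint64_t)max + 1;
    uint8_t bits = 0;
    while ((1ULL << bits) < values)
        bits++;
    return bits;
}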


But I have already converted that float into an integer, so will that work on integers?


My range is -999999 to +999999, and how did you save 14 bits per value?

Thanks


Sorry, yes, you saved 18 bits by sending the data in binary form.

Thanks


trulysahil wrote:
Sorry, yes, you saved 18 bits by sending the data in binary form.
No, I've saved them by not having to send bits that I know will be 0. That only works for integers.

 

If you are using 32-bit integers, then each of them takes 4 bytes (32 bits, naturally). 32 bits can represent numbers in the range -2,147,483,648 ... 2,147,483,647. This is way more than you need. Numbers in the range -999,999 ... 999,999 can be represented with just 21 bits. In fact, a 21-bit value can represent any number in the range -2^20 ... 2^20-1, i.e. -1,048,576 ... 1,048,575.

 

So you need to chop away the 11 most significant bits on transmission and add them back on the receiving side. Of course you can only send multiples of 8 bits, so you can't efficiently send a single value, but you can combine the entire array into one big stream of bits and split this stream into bytes.
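A minimal packing sketch (my own illustration of the idea above, not alexru's actual code): keep the low 21 bits of each value and emit the resulting bit stream as bytes.

#include <stdint.h>

/* Pack the low 21 bits of each value into a continuous bit stream.
   Values must lie in -1048576 ... 1048575, so the top 11 bits of the
   32-bit integer are redundant. Returns the number of bytes written. */
uint16_t pack21(const int32_t *in, uint16_t n, uint8_t *out)
{
    uint32_t acc = 0;   /* bit accumulator (never holds more than 28 bits) */
    uint8_t  nbits = 0; /* bits currently in the accumulator */
    uint16_t len = 0;

    for (uint16_t i = 0; i < n; i++) {
        acc |= ((uint32_t)in[i] & 0x1FFFFFUL) << nbits; /* keep 21 bits */
        nbits += 21;
        while (nbits >= 8) {
            out[len++] = (uint8_t)acc;
            acc >>= 8;
            nbits -= 8;
        }
    }
    if (nbits)
        out[len++] = (uint8_t)acc; /* flush the last partial byte */
    return len;
}

On the receiving side you would collect 21 bits at a time into x and sign-extend, e.g. v = (int32_t)(x << 11) >> 11; (this assumes an arithmetic right shift on signed values, which AVR GCC provides).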


Another thing you can do is just send the lower 3 bytes. That will be 24 bits, so you will be losing 3 bits of compression on each sent value, but the implementation is much easier and you still get a 25% size reduction.
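A sketch of that 3-byte variant (again my own illustration): send the low three bytes, and sign-extend bit 23 on the receiving side so negative values survive the trip.

#include <stdint.h>

void put24(int32_t v, uint8_t out[3])   /* transmit side */
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
}

int32_t get24(const uint8_t in[3])      /* receive side */
{
    uint32_t u = (uint32_t)in[0]
               | ((uint32_t)in[1] << 8)
               | ((uint32_t)in[2] << 16);
    /* Shift bit 23 up to bit 31, then arithmetic-shift back down:
       this sign-extends (assumes the compiler's signed right shift
       is arithmetic, as it is in AVR GCC). */
    return (int32_t)(u << 8) >> 8;
}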


What are you 'converting' into float?  What is the >>source<< of those numbers?  How are they converted?

 

In a previous post you said:

Quote:
But I can't send ADC values, as the value I am talking about is accumulated over a certain time and sent after a certain period, so it has to be converted into the corresponding format.
Describe >>precisely<< what you are doing to arrive at these floats.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

But if you do those mathematical calculations with only 24-bit precision (a normal float), you will send some bits where the rounding error is bigger than the value they represent, and that is not efficient.


alexru wrote:
So you need to chop away the 11 most significant bits on transmission and add them back on the receiving side. Of course you can only send multiples of 8 bits, so you can't efficiently send a single value, but you can combine the entire array into one big stream of bits and split this stream into bytes.

This is effectively the same as how SMS manages to pack 160 7-bit characters into a 140-byte payload ...


But didn't you say that you need a decimal part as well? Or is your 999999 actually the FP value 9999.99 scaled up?


trulysahil,

You seem to be deliberately evading questions here. Several times you have been asked about the precision that is actually going into the numbers to be conveyed in the first place, what the source of the numbers is, and how you are calculating them, but every time you ignore the question. There is purpose in these questions. For one thing, if it's an AVR making readings using its own ADC, then you cannot have more than 10 bits to convey in the first place. Sure, you may be doing rolling averages or whatever, but at the end of the day you can't make a silk purse from a sow's ear, and the ADC is just 10 bits of info.

Then there's the actual purpose of this "compression" anyway. The usual reason to employ compression is that storage or bandwidth is limited. As this is about a communications channel, the limit here is presumably bandwidth, so I asked previously what that bandwidth limit was. Again, this was to try to understand how much bit reduction you actually need, or whether there might be an alternative mechanism to overcome such a bandwidth limitation.

So, any chance of answering some of the questions already posed to you in this thread?


@clawson

Actually, I am working on a group assignment, and what I know is that they will send some data which could be anything, I mean values: float, long, short, etc., and I need to compress it.

I had never read anything related to compression before. I have gone through some algorithms like RLE, Huffman and some other combinations, and what I have understood so far is that I can't do compression without knowing the nature of the data, and the only possible way is binary compression, because for that I don't need to know the structure of the data.

Through all the answers in this thread, I got to know some other things which I didn't know before.

Now, if I am making sense to you, would you please suggest something?

My aim is not to save memory for storage; my aim is to pack the whole data into the smallest form so that when I send it, it takes fewer bytes.

Thanks
 


trulysahil wrote:
I can't do compression without knowing the nature of the data, and the only possible way is binary compression because for that I don't need to know the structure of the data.
You are incorrect here. All compression algorithms work with alphabets of characters, which can vary depending on the type of data you are compressing, but the general idea is the same. Take Huffman and run it on binary data; there is nothing wrong with that. I think you are reading simplified examples and assuming that what is shown in the example is the limit of what an algorithm can do.

trulysahil wrote:
My aim is not to save memory for storage; my aim is to pack the whole data into the smallest form so that when I send it, it takes fewer bytes.
The best compression method depends on your data. As I've said multiple times already, absolutely arbitrary and random data is not compressible.

Take a few of the algorithms you already know, implement them, and compare how they work. And if you think that some of them are not applicable to binary data, then that is something you should work on first.


Would you please explain what "depends on your data" means?


Yes, you are right: I was reading simplified examples and assuming that what is shown in the example is the limit of what an algorithm can do.


Quote:
Now, if I am making sense to you, would you please suggest something?
No, you are not making sense to us, because we have made many suggestions and asked you many questions to which you have given no answers. Start with those and get back to us.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 3

trulysahil wrote:
Would you please explain what "depends on your data" means?
There is no universal algorithm that just works for everything.

 

For example, RLE will work better if you expect to see long streaks of 1's and 0's in your data (assuming that you are working with a binary alphabet). So if you transmit some bitmask value which is 0 most of the time, with occasionally a bit set to 1, then RLE will work great.
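A toy byte-level RLE encoder to illustrate (my own sketch, not from this thread; the output is pairs of (run length, value), which is great on long runs of 0x00 and terrible on "busy" data, which is exactly the point):

#include <stdint.h>

uint16_t rle_encode(const uint8_t *in, uint16_t n, uint8_t *out)
{
    uint16_t len = 0;
    for (uint16_t i = 0; i < n; ) {
        uint8_t val = in[i];
        uint8_t run = 1;
        while (i + run < n && run < 255 && in[i + run] == val)
            run++;
        out[len++] = run;   /* how many times...      */
        out[len++] = val;   /* ...this byte repeats   */
        i += run;
    }
    return len;
}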

 

Other algorithms have their own strengths. If your data represents video, then doing RLE would be stupid; you want an algorithm that can take advantage of the repetitive nature of the data.

 

To come up with the best compression method you need to do some statistical analysis on a typical set of data and see if you can find some patterns. If you do that and it turns out that your data is pure random noise, then no compression will help.

 

There is a very simple illustration: compress some data using ZIP, then try to compress the result again with the same ZIP archiver. You won't get good results on the second run. Why? Because the first run already removed as many regularities as it could find, producing uniformly random output, which cannot be compressed any further.


Also note that most practical compression methods are hybrids of the pure academic compression algorithms. They try to compress different chunks of data using different algorithms and pick the best one for each specific chunk (see DEFLATE as an example).


I will add:

How much you can compress also depends greatly on how big a chunk of data you try to compress: the bigger, the better. (On PCs etc. it's normal to have buffers of more than 1 GB!)

If you were to invent an "optimal" solution for your kind of job, you would keep a lot of actual, representative data in a big "file" at both ends and use that as one or more code books.


As you said, the Huffman algorithm can be applied to binary data. How can we do that?

I didn't find anything related to it, so could you please help me with that?

Thanks

 


trulysahil wrote:
As you said, the Huffman algorithm can be applied to binary data. How can we do that?

The same way you would use it on ASCII. Your alphabet would be all possible byte values from 0 to 255.

Huffman is essentially counting how many times each character from the alphabet occurs and assigning the shortest encodings to the most common characters.

And "alphabet" here can be anything: just ASCII characters, full 8-bit bytes, or even entire words. It does not matter; the principle is the same. Define a limited set of distinct entities in your data (call this set "an alphabet") and find how often each entity appears in the input text.
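The counting step over a byte alphabet is tiny; a sketch (my own names, for illustration only):

#include <stdint.h>

/* One counter per possible byte value 0x00..0xFF. The Huffman tree is
   then built by repeatedly merging the two lowest-frequency entries. */
void count_bytes(const uint8_t *data, uint32_t n, uint32_t freq[256])
{
    for (uint16_t i = 0; i < 256; i++)
        freq[i] = 0;
    for (uint32_t i = 0; i < n; i++)
        freq[data[i]]++;
}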


Also, have you tried Wikipedia? It has a very detailed article on Huffman coding.


trulysahil wrote:
I didn't find anything related to it.

How hard did you try searching? I just typed "c code to build static huffman" into Google and it said:

 

http://www.programminglogic.com/...

 

Now that example is talking about a "dictionary" of "symbols" that are just 'a'..'z', so it is a little simplified. Your "dictionary" is going to be all the symbols from 0x00 to 0xFF. However, it illustrates the key technique here. You need to do a frequency analysis on a large quantity of sample data and build a probability mapping for it. Say you generate 1,000,000 of these floating point samples, write them all to a file, and then write a short program to open and read every byte in the file. You should then be able to come up with figures like: 0x00 appears 4.27% of the time, 0x01 appears 2.03% of the time, 0x02 appears 0.04% of the time, and so on. Or, perhaps more simply, probabilities:

 

0x00 - 0.0427

0x01 - 0.0203

0x02 - 0.0004

etc.

 

When you have the probabilities you can then build a code tree (the 0/1 branches shown on that web page). At the end of the day all the "leaves" will be the values 0x00 to 0xFF, and the path from the "root" to each will be a sequence of 0's and 1's. Because 0x00 has a much higher probability (0.0427) than 0x02 (0.0004), it will have a much shorter sequence of 0/1's in the encoded symbol. This is the whole point of Huffman: based on a sample of data, you encode the symbols you know will occur more often with fewer 0/1's than the rare ones. While in the original alphabet all symbols from 0x00 to 0xFF had 8 bits, you will find in Huffman that the very common symbols (0x00, 0xFF etc.) might be only 3 or 4 bits long, while some value that hardly ever occurs (0xD7, say) might now have a bit pattern that is 13 or 14 bits long.

 

As long as you sampled a lot of data (and the samples were truly representative of all future data), then you know that 0x00 will always compress to just 4 bits (say), so every time it occurs in the data you save 4 bits. If you have a run of bad luck and generate a batch of samples that happen to include many occurrences of 0xD7 (say), then you will have a number of 13/14-bit sequences which would actually bloat the data. But because the common symbols occur more often than the rare ones, the hope is that the "average" bit length is below 8, so you save almost all the time.

 

One thing you should be aware of, of course, is that your "symbols" no longer neatly fit into 8-bit bytes for transmission (assuming your transmission/storage system is 8 bits; most are). So say the data was 0x4F, 0x00, 0x2B, 0x55 and these have the symbol sequences (in bits) 1101001, 0101, 10110, 1001001101; then what you pass/store needs to be fitted to 8-bit boundaries:

 

1101001, 0101, 10110, 1001001101

# group in 8's

1101001,0 101,10110, 10010011 01

 

So the storage/transmission is 0xD2, 0xB6, 0x93, and 01 at the top of the next byte stored/sent. This has reduced 4 bytes to just 3 bytes and 2 bits, so it has saved 6 bits; the serialisation/deserialisation, though, is something you do need to think about.
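A sketch of that serialisation step (my own helper, not from any post here): append variable-length codes MSB-first into a byte buffer.

#include <stdint.h>

typedef struct {
    uint8_t *buf;     /* caller-provided output buffer            */
    uint32_t bitpos;  /* total bits written so far (start at 0)   */
} bitwriter_t;

void put_bits(bitwriter_t *w, uint16_t code, uint8_t nbits)
{
    for (int8_t i = (int8_t)(nbits - 1); i >= 0; i--) { /* MSB first */
        uint32_t byte = w->bitpos >> 3;
        uint8_t  bit  = 7 - (w->bitpos & 7);
        if (bit == 7)
            w->buf[byte] = 0;          /* starting a fresh byte */
        if ((code >> i) & 1)
            w->buf[byte] |= (uint8_t)(1 << bit);
        w->bitpos++;
    }
}

Feeding the four example codes above through put_bits (1101001 in 7 bits, 0101 in 4, 10110 in 5, 1001001101 in 10) produces exactly 0xD2, 0xB6, 0x93 and then 01 in the top two bits of the fourth byte.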

 

To encode/decode both sender and receiver need a copy of that tree with the 0/1 branches.

 

As Alex says, there's tons about this all over the internet. It should be easy to find both code to do the probability analysis and tree generation, and code to do the actual symbol encoding/decoding.

 

(Your hope, for the best compression rates, is that your probabilities are hugely biased: just a few symbols that occur very often (very short bit patterns output) and a load that are very rare (long bit patterns that you don't care about as they are so infrequent).)

 

Oh, and needless to say, the likes of Knuth, Dijkstra, Turing (Huffman of course!) and all the other computer "Gods" from the 1930's, 40's and 50's will have done all the analysis of this stuff many decades ago, and there are copious quantities of computer science textbooks that cover this and all other forms of symbol encode/decode that will be worth searching out. I wish I could remember the name, but when I was at university we had a great reference that covered all the well-known computer science tasks (encryption, compression, sorting, etc.) with just the right amount of detail on each. It seems a shame that the entire education of the modern engineer now appears to fall solely to single pages at Wikipedia!


alexru wrote:

Let's say you have an array of floating point values "float arr[25];". You can interpret this array as an array of bytes: "unsigned char *bytes = (unsigned char *)arr;".

 

 

Could you please explain how this works?


The memory in a microcontroller is organized as a block of bytes, so any other type must be represented as a combination of some number of bytes.

 

Storing a float value takes 4 bytes (8 for a double).

 

You can cast any type to any other type in C (with some limitations that are not important here), so in this case an array of floating point numbers is cast to an array of unsigned char values.

 

So if your array of floats is stored in memory as "F0 F1 F2 F3" (4 floats × 4 bytes = 16 bytes), then its byte representation is "B0.0 B0.1 B0.2 B0.3 B1.0 B1.1 B1.2 B1.3 B2.0 B2.1 B2.2 B2.3 B3.0 B3.1 B3.2 B3.3" (still 16 bytes). Which part of the floating point number goes into which Bx.y depends on the endianness of the MCU, so you need to think about that as well if you want your code to be portable to other platforms.
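A small illustration of both views (the array contents are made up; the memcpy variant is my addition and avoids any aliasing concerns the cast might raise):

#include <stdint.h>
#include <string.h>

float arr[25];
unsigned char *bytes = (unsigned char *)arr; /* same 100 bytes, viewed bytewise */
/* On a little-endian MCU like the AVR, bytes[0..3] are the four bytes
   of arr[0], bytes[4..7] those of arr[1], and so on. */

void float_to_bytes(float f, uint8_t out[4])
{
    memcpy(out, &f, sizeof f); /* copy the float's raw representation */
}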

 

 


joeymorin wrote:

A float is stored internally (in AVR GCC) as 4 bytes.  That's all you need to send.  Period.  If you are unable to figure out how to send binary (which should be >>very<< easy), the worst-case scenario would take 7 Base36-encoded characters to encode one float, 13 characters to encode two floats, 19 characters to encode three floats, etc...  For Base64, 6 characters will encode one float, 11 characters will encode two floats, 16 characters will encode three floats, etc...

How will the 4-byte float take 7 Base36-encoded characters?

I am assuming you are taking the maximum value of a 4-byte float and encoding it in base 36 here.


4 bytes is 32 bits. Base64 encodes 6 bits per ASCII character, so you need 32/6 = 5.3 characters, which you obviously have to round up to 6.

 

Base36 is tricky to work with, and I don't know why you would want to use it. But Base32 is nice, and you can still encode any float value into 32/5 = 6.4 Base32 characters. So even with Base32 you can send any float number in 7 characters.

 

Conversion code would look like:

Quote:

char map[] = "0123456789abcdefghijklmnopqrstuvwxyz"; // This should be at least 32 characters.

float f = 12345.6789f;

uint32_t i = *(uint32_t *)&f; // reinterpret the float's 32 bits as an integer

char c[7];

c[0] = map[(i >> 0) & 0x1f];
c[1] = map[(i >> 5) & 0x1f];
c[2] = map[(i >> 10) & 0x1f];
c[3] = map[(i >> 15) & 0x1f];
c[4] = map[(i >> 20) & 0x1f];
c[5] = map[(i >> 25) & 0x1f];
c[6] = map[(i >> 30) & 0x1f]; // only 2 bits remain for the last character

c is the result, encoded in Base32.
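For completeness, a matching decode sketch (my own code, assuming the same map and that the 7 characters were produced by the code above):

#include <stdint.h>
#include <string.h>

float base32_decode(const char c[7], const char *map)
{
    uint32_t i = 0;
    for (uint8_t k = 0; k < 7; k++) {
        uint32_t v = (uint32_t)(strchr(map, c[k]) - map); /* index in the map */
        i |= v << (5 * k);
    }
    float f;
    memcpy(&f, &i, sizeof f); /* reinterpret the 32 bits as a float */
    return f;
}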

 


trulysahil wrote:
How will the 4-byte float take 7 Base36-encoded characters?
4 bytes is 32 bits, i.e. 4,294,967,296 different values. 36^6 is 2,176,782,336 and 36^7 is 78,364,164,096, so it takes 7 Base36 digits to represent all of the values possible in 32 bits, i.e. in a float.

 

As I said earlier:

joeymorin wrote:
In general you can compute the minimum number of characters to encode a byte of data into a given base B with log 256 / log B (and rounding up).  For Base 36, that's a little more than 1.54 characters per byte.    For base 64 it's 1.3333 characters per byte.
Drawing from that general case:

log 4,294,967,296 / log 36 = 6.189644916

Rounding up, that's 7 digits.

 

alexru wrote:
Base36 is tricky to work with, and I don't know why you would want to use it. But Base32 is nice, and you can still encode any float value into 32/5 = 6.4 Base32 characters. So even with Base32 you can send any float number in 7 characters.
Agreed, but the OP specifically referenced Base36.

 

Although trickier to work with, Base36 can be a bit more efficient when sending larger groups all at once. Sending 4 floats at once, i.e. 128 bits, will take 26 Base32 digits, while in Base36:

log(2^128) / log 36 = 24.8, rounded up to 25 digits.

 

The gains are hardly worth it, IMO. However, this thread has been concerned with compression, and the OP has yet to answer a number of questions w.r.t. the nature of the data and the reasoning behind the desire to compress it.

 

Furthermore:

trulysahil wrote:
@Clawson, I have figured out how to send in binary format
... so I don't understand why the OP is returning to the question of baseN encoding.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

Last Edited: Mon. Oct 5, 2015 - 03:20 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

If this is the format the OP wants, then there should be some bits (2, I think) in the exponent that never change (the ± max value fits inside 2^32).

That gives log(2^30)/log(36) = 5.8, so it can fit inside 6 digits.

Even 31 bits stays just under 6 :)
