An oddity of the random() function and other sources of randomness

Go To Last Post
19 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I've been comparing various different methods for generating random numbers using the ATMega328 and running the results through a couple of random number test suites to check their quality and a very odd pattern has emerged that baffles me. One test suite is called ent, and among other things, it reads a file of random numbers byte-by-byte and lists the number and percentage of each byte value out of the whole. If you have a dramatic lack of randomness, this points it out pretty easily.

 

Here's the weird part. random() using any seed shows a distict lack of values 15 and 22. In a 20 meg file, there are between 80,000 and 82,000 of every single possible byte value (pretty good distribution) except for 15 and 22 in which there are only 331 and 296 instances respectively. That's absurdly low. At first I chalked this up to a weakness of the PRNG, but ran into it again in other places.

 

For instance: analogRead() on an unconnected pin gave really bad distribution but the byte value 15 only occurred 17 times in a 20 meg file. That's bonkers. Here's another example: Take the two lowest significant bits of analogRead() on an unconnected pin and XOR them to get a single bit and then bitshift those into a long integer. A 20 meg file built up using that method resulted in byte value 15 occuring only 90 times and byte value 22 never occuring once.

 

At that point I figured the mehod I was using to pull the serial data off the USB was faulty (screen -L) but when I put /dev/random through it there is no such similar oddity.

 

My next step is to try programming a 328 with single instructions such as "unsigned long testNumber = random();" and then disassemble the resulting hex file to see what exactly it's doing, but then I figured I'd just ask and save myself the trouble.

 

Anyone have any ideas why this oddity occures?

This topic has a solution.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

rohare wrote:

At that point I figured the mehod I was using to pull the serial data off the USB was faulty (screen -L) but when I put /dev/random through it there is no such similar oddity.

What are you talking about? What is screen -L? Elaborate on this section, please.

 

Quote:

My next step is to try programming a 328 with single instructions such as "unsigned long testNumber = random();" and then disassemble the resulting hex file to see what exactly it's doing, but then I figured I'd just ask and save myself the trouble.

Where is random() coming from? avr-libc? That's open source.

 

http://svn.savannah.nongnu.org/v...

 

JW

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

For instance: analogRead() on an unconnected pin gave really bad distribution

 

Why would that surprise you? As far as I know, random number generation from analog inputs is usually confined to using the LS bit, often with the input connected to a noise source. Just using an open circuit ADC input I would expect to be picking up 50Hz(or 60Hz, depending on where you live).

 

On the other hand, Halloween is coming up....surprise

Quebracho seems to be the hardest wood.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

If you have a dramatic lack of randomness, this points it out pretty easily.

The mathematical author Martin Gardner has written some great books. One of them includes a proof that there's no such thing as randomness which roughly follows the followimg argument that if you took a random generator and sampled its output and found that all numbers appeared with equal regularity then there'd be a predictable pattern so it isn't random!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

And, coincidentally, it's the 100th anniversary of Martin Gardner's birth today.

http://www.theguardian.com/science/alexs-adventures-in-numberland/2014/oct/21/martin-gardner-mathematical-puzzles-birthday

 

Quebracho seems to be the hardest wood.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

But doesn't Gardner's argument just point out the issue of the two definitions of randomness?

 

- that it is impossible to predict the next value in a random sequence

- that a random sequence will contain all values in equal proportion

 

He (I'm working from memory here) states that the requirements of the second enable the first to fail: if you have a 'random' sequence of ten values, by the ninth you know what the next digit must be to maintain the proportion. But he (carefully) ignores that the fact that his second definition ignores the implicit 'of infinite length'...

 

An interesting trivia - while tossing a thousand coins should give a 50:50 heads:tails split, in practice it doesn't: there are far more close points (49:51, 48:52) which, while they have individually lower probability, have a cumulative probability rather greater than the 50:50 split.

 

Neil

 

p.s. Happy Birthday to the late Mr Gardner.

This reply has been marked as the solution. 
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

wek3 wrote:
What are you talking about? What is screen -L? Elaborate on this section, please.
GNU Screen is a text-based window manager.

 

The problem isn't with analogRead(), nor with random().  ASCII 15 is CTRL-O and ASCII 22 is CTRL-V.  Either screen or your tty is filtering those out.  I ran into a similar problem with XON/XOFF flow control (which uses CTRL-S and CTRL-Q) which is enabled by default.  Running:

stty -F /dev/ttyACM0 9600 -ixon

... before a capture disables XON/XOFF flow control.  I'm not sure what could be filtering CTRL-O and CTRL-V from your capture, but I'd stop looking at AVR-GCC, AVR Libc, or Arduino as the source of the problem.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"Read a lot.  Write a lot."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

Last Edited: Tue. Oct 21, 2014 - 11:44 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Good catch(probably), Joey.

 

However, I would still hesitate to suggest a floating ADC input as a reliable source of random numbers.

 

 

Quebracho seems to be the hardest wood.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I had a requirement for a pseudo random number generator which was implemented as a linear feedback shift register with an outptu register 16 bits long.

 

A polynomial was selected from a set of known polynomials whihc produce a full length epoch ( allpossible 16 bit values are present in a pseudo random output with equal frequency).

 

While not trully random, the epoch is long enough to mask lack of randomness. Spreadsheet simulation suggesteda fairly uniform ( random?) distribution of outcomes.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Where is random() coming from? avr-libc? That's open source.

An interesting discussion.  Some random (pun intended) thoughts in no particular order...

 

1)  From the evidence, I'd also guess that the missing values were somehow "filtered".

2)  The only functionality from C libraries I've ever used is rand()/srand().  Now, these are int/uint.  The random()/srandom() are GCC extensions, with lint/luint to get more range?  Perhaps OP should try the standard function(s) as well?

3)  I'd be interested to hear about the AVR8 application needs for a pure/exacting/conforming random number generator.  In decades of C work, at least half on microprocessors/microcontrollers, I've rarely generated/used random numbers and as long as the sequences were more-or-less random it was good enough.

3a)  After all, e.g. Ethernet "random" collision backoff times work just fine if one picks an arbitrary value from a small table of "random" values.

3b)  Surely, cryptography and "real" secure comms need decent random numbers, if not perfect.  (Is "perfect random number" an oxymoron?) Same with some statistics work.  But (IME/IMO) those aren't really pertinent to AVR8.

4)  When I >>did< do random number work, I found it useful and comforting to be able to use srand() during test runs to generate the same sequence over and over for debugging and such.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just a thought but if you want an embedded app with true random number generator have you considered Cortex? There are ARM parts from both ST and Atmel that have TRNG in silicon. (I think SAM3A and SAM3X have TRNG - maybe others?).

 

(actually I think TRNG may go hand in hand with Ethernet controller for the same reason that Lee gave).

Last Edited: Tue. Oct 21, 2014 - 04:00 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

There are ARM parts from both ST and Atmel that have TRNG in silicon.

http://www.kerrywong.com/2014/10...

... with pretty pictures.  Y'all tell me what it means.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Ahhh, thanks Joey. It's probably the TTY! I verified that screen -L wasn't the source of the problem, but I didn't think to check the TTY. Dang. That means all my captures have been wrong. I'll double check, but it seems like you're right.

 

As for everyone's concerns, I don't need a TRNG, or cryptographically secure anything. I'm just taking the techniques that are easily available on the platform and checking the quality of entropy with standard test suites.

 

Thanks for all the help!

Last Edited: Tue. Oct 21, 2014 - 06:27 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm just taking the techniques that are easily available on the platform and checking the quality of entropy with standard test suites.

What about my comments/questions about standard rand()  vs. some half-baked random() developed and tested by who-knows-who?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:

What about my comments/questions about standard rand()  vs. some half-baked random() developed and tested by who-knows-who?

 

It's on the list! I'm going to be testing more than a dozen, but I've got to get my toolchain and testing methodology straight first.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

While not trully random, the epoch [of a LFSR PRNG] is long enough to mask lack of randomness.

 Be sure you know what you're looking for.  a LFSR will give you nice statistical properties for gaming, but is entirely predictable and completely unsuitable for (say) cryptography.  Also, the math used to give you "small" random numbers from a LFSR may end up removing some of the statistical randomness as well.

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

some half-baked random() developed and tested by who-knows-who

Do you mean the function in AVR-LibC?

 

http://www.nongnu.org/avr-libc/u...

 

As Wek says the implementation is open to study here:

 

http://svn.savannah.nongnu.org/v...

 

The check-in's (stable for 5 years) suggest the "who knows who" in this case is Joerg Wunsch - I'm pretty sure that's someone you can generally trust ;-)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

some half-baked random() developed and tested by who-knows-who

Do you mean the function in AVR-LibC?

Indeed I did.  I wasn't getting any response, so I threw out the flame bait.  I ran the mutton chop up the flagpole to see who would bite.

 

I'm still curious why the standard rand() wasn't sufficient, and why the development of random() was justified?  Perhaps rand() was a generic GCC implementation, and random() is "better" on an AVR?  Or the wider return value? 

 

Using your link, and then a Revision History click, I answered my own questions:

Revision 141 - (view) (download) (as text) (annotate) - [select for diffs]
Added Wed Oct 16 15:52:25 2002 UTC (12 years ago) by joerg_wunsch
File length: 2491 byte(s)

Fix and document rand() & Co.

The C standard says that rand() must return `int', but the algorithm
used had once been written with 32-bit ints in mind.  So fix all that
was broken, notably reduce RAND_MAX to 0x7FFF since this is all we can
do with 16-bit ints.  For the not-so-standard rand_r(), still a
pointer to an unsigned long (as opposed to an unsigned int) is required
to store the current context, since anything else would make no sense
given the 32-bit algorithm used.

Remove the traces of a rand_r() test program that was of no use to the
AVR platform anyway.

Add (#ifdef'ed out) the older poorer de-facto PNRG `standard', just
for reference.

Add a second file random.c, implementing a complementary set of
functions with `rand' replaced by `random', operating on 32-bit
integers (long int).  The algorithm used is identical, only the
result type is different.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

random() is not an AVR Libc phenomenon:

http://linux.die.net/man/3/random

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"Read a lot.  Write a lot."

"We see a lot of arses on handlebars around here." - [J Ekdahl]