I suspect this is a question for ADC experts -- those who know a lot about the inner workings and foibles of ADCs.
Background: I've created a noise source using an avalanche diode with the following characteristics: bandwidth roughly 500 kHz, amplitude 370 mV RMS, about 2 V pk-pk. The signal has been analyzed on a Tek scope with a 12.5M-sample record and shows a very flat power spectrum (estimated using the Welch method). This signal is connected to one of the ADC input channels on an ATmega328.
The ADC is running with a 125 kHz clock (about 9.6 kS/s, since each conversion takes 13 ADC clock cycles) -- it is intentionally undersampling, because aliasing is desirable for my purposes (true random number generation). The ADC reference is 3.3 V and the ATmega is running at 5 V with a 16 MHz system clock. The RMS value of the ADC data is about 330 mV, which is a good indication that the analog input bandwidth is close to the signal bandwidth and that I'm getting a lot of aliasing (since the input signal is 370 mV RMS). Because the noise doesn't span the entire ADC input range, only the lower 8 bits of the 10-bit result are exercised (256 codes x 3.3 V / 1024 ≈ 825 mV of total range).
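For reference, the ADC setup looks roughly like the sketch below (register names are from avr-libc for the ATmega328; the channel choice and exact initialization are illustrative rather than a copy of my firmware):

```c
#include <avr/io.h>
#include <stdint.h>

/* 16 MHz system clock / 128 prescaler = 125 kHz ADC clock;
   each conversion takes 13 ADC clocks, so roughly 9.6 kS/s. */
static void adc_init(void)
{
    ADMUX  = (0 << REFS1) | (0 << REFS0)   /* external AREF pin (3.3 V) */
           | 0x00;                          /* channel ADC0 (illustrative) */
    ADCSRA = (1 << ADEN)                    /* enable the ADC */
           | (1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0); /* /128 prescaler */
}

static uint8_t adc_read_low8(void)
{
    ADCSRA |= (1 << ADSC);                  /* start a single conversion */
    while (ADCSRA & (1 << ADSC))            /* wait for it to complete */
        ;
    uint16_t raw = ADC;                     /* 10-bit result (ADCL/ADCH) */
    return (uint8_t)(raw & 0xFF);           /* keep only the 8 LSBs */
}
```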
So here's the issue. When this data is analyzed, some odd behavior shows up. Taking a large sample of data (e.g. 10 million values), the number of occurrences of each code value (0..255) is counted and graphed (attached image). Some values are quite a bit less likely to occur than others -- and the pattern has an exact periodicity of 16 ADC counts. For example, values whose four LSBs are 0011 are much more likely than those whose LSBs are 1011.
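To make the pattern concrete, the counting step is essentially the host-side sketch below (the input filename is just a placeholder for however the samples get off the AVR); grouping the totals by the four LSBs is what makes the period-16 structure jump out:

```c
/* Count occurrences of each 8-bit code, then group by the four LSBs. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t counts[256] = {0};
    FILE *f = fopen("adc_samples.bin", "rb");   /* raw dump of low-8-bit samples */
    if (!f) { perror("adc_samples.bin"); return 1; }

    int c;
    while ((c = fgetc(f)) != EOF)
        counts[(uint8_t)c]++;
    fclose(f);

    uint64_t by_low4[16] = {0};
    for (int v = 0; v < 256; v++) {
        printf("%3d %llu\n", v, (unsigned long long)counts[v]);
        by_low4[v & 0x0F] += counts[v];
    }

    puts("-- totals grouped by the four LSBs --");
    for (int k = 0; k < 16; k++)
        printf("xxxx%c%c%c%c: %llu\n",
               (k & 8) ? '1' : '0', (k & 4) ? '1' : '0',
               (k & 2) ? '1' : '0', (k & 1) ? '1' : '0',
               (unsigned long long)by_low4[k]);
    return 0;
}
```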
This really smells like an issue with the ADC, and if so it may well have something to do with the high frequency content of the input signal. Any thoughts or suggestions for figuring this out would be appreciated.