Converting a float into an int with greater than 11 bit exponent?


Hi All,

This has probably been asked for in some shape or form. It's not a problem I've come across as I rarely do float maths, but the PID library I'm using is doing just that. 

 

The PID library (arm_pid_f32) returns a float32_t of value between -1.0 and 1.0. I limit the lower to 0.0.

I need to map this onto a 12-bit uint16_t, so 1.0 = 4095

Casting the float to a double offers an 11-bit exponent, which isn't enough to do a *4095 operation.

float32_t value = arm_pid_f32(&pid, motion.target - motion.kph);
if (value < 0.0f) {
  value = 0.0f;
}
printf("target %f \n", value);

 


Go on.   The mantissa holds the numeric value i.e. bits of resolution.  The exponent is the multiplier.
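To make the mantissa/exponent split concrete, here is a minimal sketch that pulls the three fields out of a single-precision value. It assumes the usual IEEE 754 binary32 layout (1 sign bit, 8 exponent bits biased by 127, 23 fraction bits) and only handles normal numbers; the names float_fields and decode_float are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of an IEEE 754 binary32 value. */
typedef struct {
    uint32_t sign;      /* 1 = negative */
    int32_t  exponent;  /* unbiased power of two */
    uint32_t fraction;  /* 23 stored mantissa bits (the leading 1 is implicit) */
} float_fields;

/* Assumes float is IEEE 754 binary32 and f is a normal number. */
static float_fields decode_float(float f)
{
    uint32_t bits;
    float_fields out;

    memcpy(&bits, &f, sizeof bits);   /* well-defined way to read the bit pattern */
    out.sign     = bits >> 31;
    out.exponent = (int32_t)((bits >> 23) & 0xFFu) - 127;
    out.fraction = bits & 0x7FFFFFu;
    return out;
}
```

For 4095.0f this gives an exponent of 11 and a fraction of 0x7FF000, i.e. 1.11111111111 (binary) x 2^11. The exponent field stores the power of two, so its width limits the range of the multiplier, not the number of bits in the value.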

 

 

 

You can simply say uint16_t val = value * 4095.0; where value is the float32_t returned by arm_pid_f32().

Equally well, uint16_t val = value * 4095;
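A minimal sketch of the whole mapping, assuming the output should be clamped to 0.0 .. 1.0 first and rounded to nearest rather than truncated (both are my assumptions, not something the thread specifies). scale_to_12bit is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: map a PID output in [0.0, 1.0] onto 0..4095. */
static uint16_t scale_to_12bit(float value)
{
    if (!(value > 0.0f)) {
        return 0;                 /* negative, zero, or NaN */
    }
    if (value > 1.0f) {
        value = 1.0f;             /* clamp so the result cannot exceed 12 bits */
    }
    /* +0.5f rounds to nearest; a plain cast would truncate toward zero */
    return (uint16_t)(value * 4095.0f + 0.5f);
}
```

With this, 0.5f maps to 2048 and 1.0f maps to 4095. Dropping the + 0.5f gives the truncating behaviour of the one-line statements above.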

 

David.


I thought the exponent was fixed, so the max value of that part would be 255. Now I know it's a multiplier - thanks!
 


People go to great lengths to avoid floating point.

 

This is crazy.   Most projects have enough Flash memory to accommodate the f-p code.   The execution time is pretty quick.

And once you have one f-p call,  the f-p code is linked anyway.

 

So if your PID library functions use f-p,   the f-p code is already added by the linker.

It is straightforward to scale values perfectly by simply multiplying an expression with a f-p value.

You don't need to worry about intermediate expression overflow, underflow,  integer size, ...

The calculation is performed correctly with ~ 24-bits of precision.   And the result is cast to whatever type you are assigning to.

 

No,  you don't want to handle financial transactions with f-p.

But if you intend to scale an expression to fit 0-4095 the result only has 12 significant bits.

Even scaling to a uint16_t there are only 16 significant bits.

Single precision f-p with 24-bits of precision is fine for this kind of scaling.

 

Note that a single statement like uint16_t result = value * 1.234567 is unambiguous.

It will be 100% correct for any input value between 0 and 53082

 

Yes,  you can use integer maths, but it looks fairly hairy and is prone to overflows / underflows over a wide range of input values

e.g. uint16_t result = ((uint32_t)value * 1234567uL) / 1000000uL is only valid for 0 to 3478

 

You would need to cast to uint64_t if you want an acceptable range of input values.   I bet that the 64-bit math is slower and fatter than the f-p version.

 

Of course,  embedded apps often know the value range.    If it is always within 0-3478 the integer statement is 100% correct.
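The 0-3478 limit can be checked directly. This sketch puts the three variants side by side; the function names are made up, and it uses explicit uint32_t / uint64_t operands instead of the uL suffix so that it behaves the same on a desktop host where unsigned long may be 64 bits wide.

```c
#include <stdint.h>

/* 32-bit intermediate: wraps once value * 1234567 exceeds 2^32 - 1 (value > 3478). */
static uint16_t scale_u32(uint16_t value)
{
    return (uint16_t)(((uint32_t)value * (uint32_t)1234567) / (uint32_t)1000000);
}

/* 64-bit intermediate: no wrap for any uint16_t input. */
static uint16_t scale_u64(uint16_t value)
{
    return (uint16_t)(((uint64_t)value * 1234567u) / 1000000u);
}

/* Floating-point version: the result is truncated toward zero by the cast.
   The caller must keep value * 1.234567 below 65536. */
static uint16_t scale_fp(uint16_t value)
{
    return (uint16_t)(value * 1.234567);
}
```

At 3478 all three return 4293. At 3479 the product is 4,295,058,593, which wraps to 91,297 in 32 bits, so scale_u32 returns 0 while the other two return 4295.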

 

David.


david.prentice wrote:
Most projects have enough Flash memory to accommodate the f-p code

and, as we're in the ARM section, many ARMs have hardware floating point anyhow.

 

But you do need to beware of float-vs-double: some FPUs only do single precision - so double still calls-in software FP.
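An easy way to trip over this in C is an unsuffixed literal. A small sketch of the difference, with hypothetical function names: 4095.0 has type double, so the float operand is promoted and the multiply is done in double precision, whereas 4095.0f keeps the whole expression in float.

```c
#include <stdint.h>

/* 4095.0 is a double literal: x is promoted and the multiply is done in double. */
static uint16_t scale_via_double(float x)
{
    return (uint16_t)(x * 4095.0);
}

/* 4095.0f is a float literal: the multiply stays in single precision. */
static uint16_t scale_via_float(float x)
{
    return (uint16_t)(x * 4095.0f);
}

/* The type of each expression shows up in its size. */
enum {
    DOUBLE_EXPR_BYTES = sizeof(1.0f * 4095.0),   /* sizeof(double) */
    FLOAT_EXPR_BYTES  = sizeof(1.0f * 4095.0f)   /* sizeof(float)  */
};
```

Both functions return the same result for inputs in 0.0 .. 1.0; on a part with a single-precision-only FPU the expected difference is in which code the compiler generates, not in the answer.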

 

david.prentice wrote:
once you have one f-p call,  the f-p code is linked anyway

Indeed. Surprising how few seem to appreciate this.

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

Yes I have the FPU (float only, M4). 

One question, is it generally faster to stick with floats and the FPU or stick to int maths all the way through? 

At the moment I'm crunching a lot of numbers in a buffer, it takes the M4 about 0.25s to complete.

I wonder how fast the FPU is. I guess it can't compute something relatively complex within one CPU clock, so it blocks the CPU while it computes.


Go on.   Nothing can take 0.25 seconds.

 

Even if it does take 0.25 seconds,  it is not going to make much difference for a one-off calculation.

 

If you are controlling a Nuclear Bomb or Guided Missile you probably want better performance.

 

Why would you want to be crunching a lot of numbers in a buffer ?

Give some example.

 

Yes,  hardware FPU is going to be faster than software.   Integers will be faster than f-p.   There is no point in using the wrong sort of variable.

Execution time generally depends on using the appropriate algorithm.

 

David.


I'm doing power analysis: the ADC buffers 17,500 samples in 0.35s by DMA and then they get crunched. For the crunching, I'd best post some code, so that's below. You can see that I'm using int maths for most of it, including sqrt; only P_RATIO is a float, and it doesn't need to be. I'm unsure if it's worth investigating the benefit of FPU floats, especially as the ints are large and may well fall into the double category, which will certainly be slower without FPU support.
 

#define V_RATIO ((float32_t)ADC_AC_CURRENT_SENSITIVITY/ADC_CVT_DIV)
#define I_RATIO ((float32_t) ADC_AC_VOLTAGE_RESISTOR_A/ADC_CVT_DIV)
#define P_RATIO (V_RATIO * I_RATIO)

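static uint16_t sqrt32 (uint32_t n) {

  // bit-by-bit integer square root: test each result bit from the top down
  uint32_t c = 0x8000;  // bit currently being tested
  uint32_t g = 0x8000;  // current guess
  while (1) {
    if (g * g > n) {
      g ^= c;           // guess too large: clear the bit again
    }
    c >>= 1;
    if (c == 0) {
      return g;
    }
    g |= c;             // try the next lower bit
  }
}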

void adcWork (void) {

  // clear
  memset((uint8_t*) &adc.ac, 0, sizeof(adc.ac));
  memset((uint8_t*) &adc.dc, 0, sizeof(adc.dc));

  struct {
    int64_t acVoltage;
    int64_t acCurrent;
    int64_t acPower;
    uint64_t dcVoltage;
    uint64_t dcCurrent;
  } accumulators = { 0, 0, 0, 0 };
  uint32_t value;

  // remove offsets and convert
  struct powerReadProto *p_buffer = &adc.buffer[0];

  while (p_buffer < &adc.buffer[ADC_POWERREAD_SAMPLES]) {
    p_buffer->acVoltage -= adc.calibration.ac.voltage;
    p_buffer->acCurrent -= adc.calibration.ac.current;
    if (p_buffer->dcVoltage > 0) {
      accumulators.dcVoltage += p_buffer->dcVoltage;
    }
    if (p_buffer->dcCurrent > 0) {
      accumulators.dcCurrent += p_buffer->dcCurrent;
    }

    accumulators.acPower += (int32_t) p_buffer->acVoltage * p_buffer->acCurrent;
    accumulators.acVoltage += (int32_t) p_buffer->acVoltage * p_buffer->acVoltage;
    accumulators.acCurrent += (int32_t) p_buffer->acCurrent * p_buffer->acCurrent;
    p_buffer++;
  }

  // dc voltage and current
  if (accumulators.dcVoltage) {
    value = accumulators.dcVoltage / ADC_POWERREAD_SAMPLES;
    value *= ADC_CVT_MULT*ADC_DC_VOLTAGE_RESISTOR_A;
    adc.dc.voltage = value / ADC_CVT_DIV;
  }
  if (accumulators.dcCurrent) {
    value = accumulators.dcCurrent / ADC_POWERREAD_SAMPLES;
    value *= (ADC_CVT_MULT * ADC_DC_CURRENT_MULTI);
    value /= (ADC_CVT_DIV * ADC_DC_CURRENT_RESISTOR * ADC_DC_CURRENT_SENSITIVITY);
    adc.dc.current = value;
    // wattage
    adc.dc.wattage = (value * adc.dc.voltage) / 1000U;
  }

  // real power
  adc.ac.power.real.wattage = accumulators.acPower > 0? ((P_RATIO * accumulators.acPower) / ADC_POWERREAD_SAMPLES) : 0;

  // apparent voltage
  value = accumulators.acVoltage / ADC_POWERREAD_SAMPLES;
  value = sqrt32(value);
  value *= (ADC_CVT_MULT*10U*ADC_AC_VOLTAGE_RESISTOR_A);
  value /= ADC_CVT_DIV;
  adc.ac.power.apparent.voltage = value;

  // apparent current
  value = accumulators.acCurrent / ADC_POWERREAD_SAMPLES;
  value = sqrt32(value);
  value *= ADC_CVT_MULT*ADC_AC_CURRENT_MULTI;
  value /= (ADC_CVT_DIV*ADC_AC_CURRENT_SENSITIVITY);
  value /= ADC_CVT_DIV;
  adc.ac.power.apparent.current = value;

  // apparent power
  adc.ac.power.apparent.wattage = (value * adc.ac.power.apparent.voltage) / 1000U;

  // power factor
  value = adc.ac.power.apparent.wattage;
  value *= 100U;
  adc.ac.power.factor = !adc.ac.power.real.wattage? 0 : value / adc.ac.power.real.wattage;

  // find zero cross points
  struct powerReadProto* zerocross[3];

  // voltage zero cross
  p_buffer = &adc.buffer[0];

  bool isHigh;
  for (uint8_t i=0; i<3; i++) {

    // advance until outside of zerocross area
    while (1) {
      if (p_buffer == &adc.buffer[ADC_POWERREAD_SAMPLES]) {
        adcSample();
        return;
      }
      if (p_buffer->acVoltage > ADC_ZEROCROSS_OFFSET) {
        isHigh = true;
        break;
      } else if (p_buffer->acVoltage < (0 - ADC_ZEROCROSS_OFFSET)) {
        isHigh = false;
        break;
      }
      p_buffer++;
    }

    // find zerocross
    while (1) {
      p_buffer++;
      if (p_buffer == &adc.buffer[ADC_POWERREAD_SAMPLES]) {
        //printf("hz failed \n");
        adcSample();
        return;
      }
      if ((!isHigh && p_buffer->acVoltage >= 0) || (isHigh && p_buffer->acVoltage <= 0)) {
        zerocross[i] = p_buffer;
        p_buffer++;
        break;
      }
    }
  }

  // voltage samples
  uint32_t samples = (zerocross[2] - zerocross[0]) * ADC_SAMPLE_PASS; // us
 
  // frequency
  adc.ac.frequency.value = 10000000U / samples; // us to hz

  adcSample();
}

 


snoopy33 wrote:

  for (uint8_t i=0; i<3; i++) {

Remember that ARM is 32-bit - so using a uint8_t might not be optimal ...
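One hedged way to express that is uint_fast8_t from <stdint.h>, which asks for "at least 8 bits, whatever width is fastest" and leaves the choice to the toolchain. Whether it actually ends up wider than 8 bits depends on the compiler and C library. The function below is a made-up example, not part of the posted code.

```c
#include <stdint.h>

/* Hypothetical example: sum three readings with a loop counter whose
   width is chosen by the toolchain rather than forced to 8 bits. */
static uint32_t sum_first_three(const uint16_t *v)
{
    uint32_t total = 0;
    for (uint_fast8_t i = 0; i < 3; i++) {
        total += v[i];
    }
    return total;
}
```

A plain unsigned int counter would do the same job; the point is only to avoid forcing the compiler to keep the counter within 8 bits.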


To be honest,  my eyes glazed over when I saw the reams of code.

 

I can understand two independent ADC channels that read voltage and current via DMA.   i.e. costs virtually nothing in CPU time.

I doubt whether anything requires 17500 sets of samples.    Hey-ho,   modern ARMs have a lot of processing power.

 

I did not really see what you were calculating.   Surely instantaneous power is V x A.   You can determine zero crossing, peak power, ... or perform all sorts of signal transformations.

 

An ARM's natural int is 32-bit.   I suspect your ADC is no more than 12-bit or 14-bit.   Which means that you can do most things in regular 32-bit integer maths.

 

So it comes down to choosing a sensible number of samples.   And a sensible post-processing algorithm for the results.   e.g. think carefully about what needs 32-bit maths and if you ever need 64-bit integers or doubles.
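As a rough worked check of where 32 bits stop being enough, this sketch computes how many squared samples fit in a uint32_t accumulator. It assumes a 12-bit ADC whose offset-corrected samples have a magnitude of at most 2048; that figure and the helper name are assumptions, not taken from the posted code.

```c
#include <stdint.h>

/* Hypothetical helper: how many worst-case squared samples can be summed
   in a uint32_t before it overflows. max_abs_sample must be <= 65535 so
   that the square itself fits in 32 bits. */
static uint32_t max_samples_u32(uint32_t max_abs_sample)
{
    uint32_t max_square = max_abs_sample * max_abs_sample;
    if (max_square == 0) {
        return UINT32_MAX;   /* nothing to accumulate */
    }
    return UINT32_MAX / max_square;
}
```

With a magnitude of 2048 each square can reach 2^22, so a 32-bit accumulator is only safe for 1023 worst-case samples. Summing squares over 17,500 samples therefore needs either a 64-bit accumulator or a smaller block size, while the per-sample multiply itself fits comfortably in 32 bits.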

 

My earlier comments were about the typical AVR project.    You normally choose a convenient set of units for your calculations e.g. milliVolts.   Perform some trivial maths and then convert to a human readable 12.345V decimal number.    It is far more important to get the correct human-readable display than worry about execution time.    For example you might do all your internal calculations in some arbitrary units but just multiply with a f-p scaling value to create the human-readable output.   Or you might use f-p maths throughout.    The time does not matter when the calculations are infrequent (like updating a display for humans).
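A minimal sketch of that last step, assuming the internal unit is millivolts held in a uint32_t and the display wants three decimal places. format_millivolts is a hypothetical name, and this version stays in integer maths; multiplying by a f-p scale factor and printing with %.3f would be the floating-point equivalent.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: render millivolts as a "12.345V" style string.
   Returns what snprintf returns (the length that was or would be written). */
static int format_millivolts(char *buf, size_t len, uint32_t mv)
{
    return snprintf(buf, len, "%lu.%03luV",
                    (unsigned long)(mv / 1000u),   /* whole volts */
                    (unsigned long)(mv % 1000u));  /* millivolt remainder, zero padded */
}
```

Calling it with 12345 produces "12.345V", and 5 produces "0.005V" because of the %03lu padding.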

 

David.