Enabling FPU on SAMV71Q21

5 posts

#1

Hello, I've been trying to enable the FPU on a SAMV71Q21 (on the ATSAMV71-XULT devkit).
I followed the instructions in the AN_44047 application note (which seems a bit outdated) and changed my compiler flags from:
-mfloat-abi=softfp -mfpu=fpv4-sp-d16
to:
-mfloat-abi=hard -mfpu=fpv5-d16

I added -mfloat-abi=hard to the linker flags as well, because I got a build error saying that the object files and the .ELF file were not using the same instruction set. After this the code compiled. However, it ran into a runtime fault immediately on execution, even before I got the chance to set up the UsageFault_Handler. The problem was caused by the Atmel START-generated ASF4 code, which implements the _fpu_enable() function described in the application note above and calls it

_fpu_enable();

in the _init_chip() function, which is called from main() through a long cascade of other init functions. That is of course far too late, since the first floating-point instructions are already executed in Reset_Handler() in "startup_samv71q21b.c", by the call to __libc_init_array().

So I moved the _fpu_enable() call so that it executes before __libc_init_array(), and now the code finally runs.
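For reference, a minimal sketch of an early FPU enable in the reset path, assuming a Cortex-M7 and GCC. It uses the architectural Coprocessor Access Control Register (CPACR at 0xE000ED88, exposed by CMSIS as SCB->CPACR): setting the CP10 and CP11 fields (bits 20-23) to full access enables the FPU, and the DSB/ISB pair makes sure the new setting is in effect before the next instruction. The helper name fpu_enable_at() and its pointer parameter (which only exists so the bit logic can be tested on a PC) are hypothetical; this is not the ASF4 _fpu_enable() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Coprocessor Access Control Register, ARMv7-M System Control Block.
   With CMSIS headers this is SCB->CPACR. */
#define CPACR_ADDR           0xE000ED88u
/* CP10 and CP11 fields (bits 20..23), both set to 0b11 = full access. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* Hypothetical helper: grant full access to the FPU (CP10/CP11).
   The register is passed in so the bit manipulation can run on a host. */
static void fpu_enable_at(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL;
#if defined(__ARM_ARCH_7EM__)
    /* Complete the write and flush the pipeline before any FP instruction. */
    __asm volatile ("dsb\n\tisb" ::: "memory");
#endif
}

/* Intended placement in startup_samv71q21b.c (sketch, not compiled here):

   void Reset_Handler(void)
   {
       ... existing .data copy / .bss zeroing ...
       fpu_enable_at((volatile uint32_t *)CPACR_ADDR);
       __libc_init_array();   // may already execute FP instructions
       main();
   }
*/
```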

However, I'm still uncertain whether I missed something, and whether I have all the functionality the FPU provides (I have also included arm_math.h and the CMSIS DSP library in the project). So does anyone have anything to add? Are there any other registers or handlers that should be configured to put the FPU to its full use? I've been trying to read the ARM documentation, but to an inexperienced embedded programmer such as myself, that documentation feels like a jigsaw puzzle suspended in a massive spider web: bits and pieces of relevant and irrelevant information spread across hundreds of documents and thousands of small 20-line pages, sometimes aimed at ASIC designers and sometimes at programmers.
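On the handler side: after reset, the UsageFault, BusFault and MemManage exceptions are disabled, so those faults escalate to HardFault (which then reports the escalation through the FORCED bit in HFSR). A minimal sketch of enabling them, assuming the ARMv7-M System Handler Control and State Register at 0xE000ED24 (SCB->SHCSR in CMSIS), so that a fault lands in UsageFault_Handler and friends instead. As before, the helper name and its pointer parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* System Handler Control and State Register (SCB->SHCSR with CMSIS). */
#define SHCSR_ADDR        0xE000ED24u
#define SHCSR_MEMFAULTENA (1u << 16)
#define SHCSR_BUSFAULTENA (1u << 17)
#define SHCSR_USGFAULTENA (1u << 18)

/* Hypothetical helper: route MemManage, BusFault and UsageFault to their
   own handlers instead of escalating them to HardFault. */
static void enable_fault_handlers_at(volatile uint32_t *shcsr)
{
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}

/* On target (sketch), call early in startup:
   enable_fault_handlers_at((volatile uint32_t *)SHCSR_ADDR);  */
```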


Anyway, this brings me to my second problem (which is related to the above):

Using floats where I previously used integers as placeholders gave me problems freeing dynamically allocated memory because of "unalignment". Now, let's temporarily ignore the fact that I probably shouldn't use the malloc() family at all in an embedded system (as I've learned while debugging this problem), because the problem seems to be related to the way floats are stored in memory, and I just happened to observe it in a call to free(). I don't fully understand how ARM processors handle alignment (or even how alignment works in the first place), but from what I've been able to figure out, the core expects 4-byte-aligned addresses for the LDM, STM, LDRD, and STRD instructions (ref: ARM docs). Looking at the disassembly, I've noticed that my code crashes on "lds" instructions, which I guess is what ARM calls LDM (?). I've also seen mentions in the ARM docs that most ARM processors don't support unaligned accesses at all, so this shouldn't really be an issue; but since I'm seeing this error, I suppose the SAMV71 implementation allows unaligned accesses and uses them during float operations. Can anyone comment on whether this is true? I can't find much information about the FPU in the datasheet. And if this is the case, how can I make sure that the floats are aligned in memory?
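A note on the rules themselves, from the ARMv7-M architecture that the Cortex-M7 implements: plain LDR/STR/LDRH/STRH accesses may be unaligned (unless CCR.UNALIGN_TRP is set), but LDM/STM, LDRD/STRD and the FPU loads/stores such as VLDR/VSTR always raise a UsageFault with the UNALIGNED bit set when the address is not word-aligned. That is one way switching an array from uint32_t to float can turn a bad address that used to be tolerated into a fault. A small, hypothetical debugging helper that checks a pointer before it is handed to float code, assuming C11 (alignof):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of `align` bytes (align must be a power of two). */
static bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1u)) == 0u;
}

/* Hypothetical check before passing a buffer to float code:
   VLDR/VSTR on a non-word-aligned address raises UsageFault (UNALIGNED). */
static bool float_buffer_ok(const float *buf)
{
    return buf != NULL && is_aligned(buf, alignof(float));
}
```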


Finally, quickly back to the dynamic allocation we ignored above. If the malloc() family is a no-go, how am I supposed to handle a function that stores a few hundred ADC readings in a local array and performs a few float operations on them, when that array overflows the stack?
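One common answer to the stack question is to give the buffer static storage duration: a file-scope (or function-local static) array lives in .bss rather than on the stack, and its size is fixed and visible in the linker's memory map at build time. A minimal sketch; NUM_READINGS and mean_of_readings() are hypothetical names. The trade-off is that the function is not reentrant, because every call shares the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_READINGS 300u   /* "a few hundred" ADC readings (illustrative) */

/* Static storage duration: placed in .bss, not on the stack, and
   aligned for float by the compiler. */
static float readings[NUM_READINGS];

/* Hypothetical example: copy raw ADC counts into the static buffer as
   floats and return their mean. Returns 0.0f for n == 0 or n too large.
   Not reentrant: all calls share `readings`. */
static float mean_of_readings(const uint32_t *raw, size_t n)
{
    if (n == 0u || n > NUM_READINGS)
        return 0.0f;
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        readings[i] = (float)raw[i];
        sum += readings[i];
    }
    return sum / (float)n;
}
```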

#2

Why do you think AN 44047 is outdated?


If you want to be sure that hardware floating point is being used, time a floating-point operation with the SysTick timer. Single-precision floating-point operations (add, sub, mul) should take 2 cycles of the CPU clock and div 10 cycles, presuming the operands are in memory. Fixed-point approximations will take 10x as long as hardware floating point. Alternatively, look at the compiler's disassembly (available in Atmel Studio under the Debug menu) and check whether floating-point instructions are being used. Since you were getting the hard fault when _fpu_enable() was in the wrong spot, you likely do have it right.
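A sketch of the SysTick measurement, assuming SysTick is configured as a free-running down-counter on the processor clock with the maximum 24-bit reload value (LOAD = 0x00FFFFFF), and that the measured code takes less than one reload period. Because SysTick counts down, the elapsed count is start minus end, masked to 24 bits to absorb one wrap. The register address is the architectural SYST_CVR (SysTick->VAL in CMSIS); the helper name is hypothetical. In the disassembly, hardware floating point shows up as VFP instructions such as vadd.f32 and vmul.f32, whereas a build without FPU instructions (-mfloat-abi=soft) calls library routines such as __aeabi_fadd and __aeabi_fmul instead.

```c
#include <assert.h>
#include <stdint.h>

/* SysTick Current Value Register (SysTick->VAL with CMSIS). */
#define SYST_CVR_ADDR 0xE000E018u
#define SYST_MASK     0x00FFFFFFu   /* SysTick is a 24-bit counter */

/* Elapsed ticks between two reads of a down-counting SysTick, assuming
   LOAD = 0x00FFFFFF and at most one wrap between the reads. */
static uint32_t systick_elapsed(uint32_t start, uint32_t end)
{
    return (start - end) & SYST_MASK;
}

/* On target (sketch; the count includes the overhead of the two reads):

   volatile uint32_t *cvr = (volatile uint32_t *)SYST_CVR_ADDR;
   volatile float a = 1.5f, b = 2.25f, c;
   uint32_t t0 = *cvr;
   c = a * b;
   uint32_t t1 = *cvr;
   uint32_t cycles = systick_elapsed(t0, t1);
*/
```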


You can't find much info on the FPU in the datasheet, because Atmel/Microchip refer to the ARM documentation for that. The processor is an ARM design, essentially identical within a small set of options whether you buy from Atmel/Microchip, ST, NXP, or Renesas. The compiler should generate proper alignment for you, unless you explicitly tell the compiler to not do that with a pragma. http://infocenter.arm.com/help/i...
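To illustrate the "unless you explicitly tell the compiler not to" part: by default GCC pads a structure so that each member is naturally aligned, but packing removes that padding and can leave a float at an odd offset. A minimal sketch using GCC's __attribute__((packed)) (#pragma pack(1) has the same effect on layout); the struct names are hypothetical. Accesses through the member name are compiled with the reduced alignment in mind, but a plain float * taken to such a member no longer carries that information.

```c
#include <assert.h>
#include <stddef.h>

/* Default layout: the compiler inserts padding after `tag` so that
   `value` is aligned to _Alignof(float). */
struct sample_default {
    char  tag;
    float value;
};

/* Packed layout (GCC attribute): no padding, so `value` sits at offset 1. */
struct __attribute__((packed)) sample_packed {
    char  tag;
    float value;
};
```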


Using dynamic memory allocation may be discouraged in embedded programming circles, but it is not forbidden. The main reason is that most microcontrollers are quite memory-limited. Not so long ago I was programming an ATmega168P with 1 kB of RAM, where it could be unclear whether dynamically allocating 64 bytes was even possible. With up to 384 kB of SRAM in the ATSAME70 family, and the designer's option of external SDRAM, that is far less of a concern.

Josh @ CIHOLAS Inc - We fill the gaps from chips to apps

#3

josh.w wrote:
AN 44047 is outdated?

Outdated is a stretch. It's just missing some information, mainly concerning ASF4 (for example, that the _fpu_enable() function is already implemented in the generated code). It also only describes and uses ASF3 functions.


josh.w wrote:
The compiler should generate proper alignment for you

That's all the info I've been able to find myself as well. But apparently the variable isn't properly aligned (or, of course, I'm misinterpreting the fault that's being triggered), and I'm not doing anything funky with it (no pointer casts or anything). Plus, the code worked as-is before I changed from uint32_t to float32_t.


josh.w wrote:
unless you explicitly tell the compiler to not do that with a pragma. http://infocenter.arm.com/help/i...

The information you're linking may very well still be true, but as far as I can tell that document covers the ARM compiler, not GCC which is used by Atmel Studio.

#4

simenzhor wrote:

The information you're linking may very well still be true, but as far as I can tell that document covers the ARM compiler, not GCC which is used by Atmel Studio.


Here are the equivalent pragmas from gcc's documentation: https://gcc.gnu.org/onlinedocs/g...

It certainly couldn't hurt to tell the compiler to force alignment on a 4- or 8-byte boundary.
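A minimal sketch of forcing alignment, assuming the buffers are statically allocated: GCC's aligned variable attribute and the standard C11 _Alignas specifier both raise a variable's alignment to the given power of two. The buffer names and sizes are hypothetical. Memory returned by malloc()/calloc() is already required by the C standard to be suitably aligned for any object type with fundamental alignment, which includes float.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SAMPLES 300   /* illustrative size */

/* GCC extension: align the whole array on an 8-byte boundary. */
static float adc_avg_buf[NUM_SAMPLES] __attribute__((aligned(8)));

/* Standard C11 equivalent. */
static _Alignas(8) float adc_std_buf[NUM_SAMPLES];
```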


Failing that, telling us a bit more about what operation seems to trigger the fault and exactly which fault bit(s) are set would help us help you.
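For reading the fault bits, a small decoder can help. This sketch covers the commonly used bits of the Configurable Fault Status Register (CFSR at 0xE000ED28), which packs MMFSR (bits 0-7), BFSR (bits 8-15) and UFSR (bits 16-31); the bit positions come from the ARMv7-M architecture, while the table and function names are hypothetical. It only lists set bits and does not interpret MMFAR/BFAR. UFSR.NOCP is the bit you would expect if an FP instruction runs before the FPU is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fault_bit {
    uint32_t    mask;
    const char *name;
};

/* CFSR = UFSR[31:16] | BFSR[15:8] | MMFSR[7:0] (ARMv7-M). */
static const struct fault_bit cfsr_bits[] = {
    {1u << 0,  "MMFSR.IACCVIOL"},   {1u << 1,  "MMFSR.DACCVIOL"},
    {1u << 3,  "MMFSR.MUNSTKERR"},  {1u << 4,  "MMFSR.MSTKERR"},
    {1u << 5,  "MMFSR.MLSPERR"},    {1u << 7,  "MMFSR.MMARVALID"},
    {1u << 8,  "BFSR.IBUSERR"},     {1u << 9,  "BFSR.PRECISERR"},
    {1u << 10, "BFSR.IMPRECISERR"}, {1u << 11, "BFSR.UNSTKERR"},
    {1u << 12, "BFSR.STKERR"},      {1u << 13, "BFSR.LSPERR"},
    {1u << 15, "BFSR.BFARVALID"},
    {1u << 16, "UFSR.UNDEFINSTR"},  {1u << 17, "UFSR.INVSTATE"},
    {1u << 18, "UFSR.INVPC"},       {1u << 19, "UFSR.NOCP"},
    {1u << 24, "UFSR.UNALIGNED"},   {1u << 25, "UFSR.DIVBYZERO"},
};

/* Store up to `max` matching table entries in `out`; return how many of
   the known bits are set in `cfsr` (the count may exceed `max`). */
static size_t cfsr_decode(uint32_t cfsr, const struct fault_bit **out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            if (found < max)
                out[found] = &cfsr_bits[i];
            found++;
        }
    }
    return found;
}
```

For example, a CFSR value of 0x01000000 decodes to UFSR.UNALIGNED alone.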

Josh @ CIHOLAS Inc - We fill the gaps from chips to apps

#5

josh.w wrote:
telling us a bit more about what operation seems to trigger the fault and exactly which fault bit(s) are set would help us help you.

Absolutely. I didn't have the code in front of me yesterday, but here is a somewhat simplified version of what I'm attempting to do. The code that's been omitted is mainly related to configuring an ASIC that's connected through SPI.

void calibration(void)
{
  int		min_delay = 35;
  int		max_delay = 350;
  const int	num_delays = max_delay - min_delay;
  int		num_charges_per_delay = 100;
  int		delay_idx = 0;
  float32_t	*adc_avg;
  float32_t	*adc_std;
  float32_t	*tmp_adc_readings;

  if(( adc_avg = calloc( num_delays, sizeof(*adc_avg) ) ) == NULL){
    gen_serial_printfln("Error: Calloc for adc_avg didn't work");
    return;
  }
  if(( adc_std = malloc( num_delays * sizeof(*adc_std) ) ) == NULL){
    gen_serial_printfln("Error: Malloc for adc_std didn't work");
    return;
  }
  if(( tmp_adc_readings = malloc( num_charges_per_delay * sizeof(*tmp_adc_readings) ) ) == NULL){
    gen_serial_printfln("Error: Malloc for tmp_adc_readings didn't work");
    return;
  }

  // a bit of application specific code. It configures an ASIC over SPI, doesn't touch any of the floats.

  for ( uint16_t delay = min_delay; delay <= max_delay; delay++ )
  {
    delay_idx = delay-min_delay;
    gen_serial_printfln("Testing delay: %u",delay);
    for ( int charge = 0; charge < num_charges_per_delay; charge++ )
    {
      //send data through SPI to ASIC, read ADC value over I2C in an interrupt handler.
      //The code below is simplified. Should be waiting for a flag from the interrupt handler, and handle errors if no flag is set.

      tmp_adc_readings[charge] = (float32_t)readout[0]; //this is ADC data that's originally stored as uint32_t
      adc_avg[delay_idx] += tmp_adc_readings[charge];
    }
    adc_avg[delay_idx] /= num_charges_per_delay;
    arm_std_f32(tmp_adc_readings,num_charges_per_delay, &adc_std[delay_idx]);
  }

  //a bit of printing happens here, no data manipulation

  free(tmp_adc_readings); //The crash happens when attempting to execute these free() operations.
  free(adc_avg);
  free(adc_std);
  return;
}
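Before attributing the crash in free() to float alignment, one thing in the code above is worth ruling out: the outer loop runs delay from min_delay to max_delay inclusive, which is num_delays + 1 = 316 iterations, so the last iteration writes adc_avg[315] and adc_std[315], one element past the end of both 315-element allocations. A write past the end of a heap block can overwrite the allocator's bookkeeping, and that kind of corruption typically surfaces later, inside free(). The sketch below is a host-runnable reduction under those assumptions: read_adc_sample() is a hypothetical stand-in for the SPI/I2C measurement, arm_std_f32() and the printing are left out, and every exit path frees what was allocated (the early returns in the original leak the earlier allocations).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MIN_DELAY   35
#define MAX_DELAY   350
/* Inclusive range: 35..350 is 316 delays, not MAX_DELAY - MIN_DELAY = 315. */
#define NUM_DELAYS  (MAX_DELAY - MIN_DELAY + 1)
#define NUM_CHARGES 100

/* Hypothetical stand-in for one ADC readout (uint32_t, as in the post). */
static uint32_t read_adc_sample(uint16_t delay, int charge)
{
    return (uint32_t)delay + (uint32_t)charge;
}

/* Reduced calibration loop; returns 0 on success, -1 if allocation fails.
   Both buffers are freed on every path (free(NULL) is a no-op). */
static int calibration_sketch(float *first_avg, float *last_avg)
{
    int rc = -1;
    float *adc_avg = calloc(NUM_DELAYS, sizeof *adc_avg);
    float *tmp = malloc(NUM_CHARGES * sizeof *tmp);
    if (adc_avg == NULL || tmp == NULL)
        goto out;

    for (uint16_t delay = MIN_DELAY; delay <= MAX_DELAY; delay++) {
        int idx = delay - MIN_DELAY;            /* 0 .. NUM_DELAYS - 1 */
        for (int charge = 0; charge < NUM_CHARGES; charge++) {
            tmp[charge] = (float)read_adc_sample(delay, charge);
            adc_avg[idx] += tmp[charge];
        }
        adc_avg[idx] /= NUM_CHARGES;
    }
    *first_avg = adc_avg[0];
    *last_avg  = adc_avg[NUM_DELAYS - 1];
    rc = 0;
out:
    free(tmp);
    free(adc_avg);
    return rc;
}
```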

I should add that during the block:

//a bit of printing happens here, no data manipulation

there is a malloc/free pair on a char array, used to print the entire contents of the adc_avg and adc_std arrays, that executes without problems.


Here's the debugging information I've been able to extract:

Registers at HardFault entry:
R00 = 0x204029AC
R01 = 0x204005AC
R02 = 0x62A200DE
R03 = 0x4261C756
R04 = 0xDDE05E1D
R05 = 0x20403990
R06 = 0x00000000
R07 = 0x2040299C
R08 = 0x20400018
R09 = 0x00000000
R10 = 0x00000000
R11 = 0x00000000
R12 = 0xFFFFFFFF 

Stack content at HardFault entry (reordered):
SCB->HFSR = 0x40000000 (=Forced)
SCB->CFSR = 0x01000000 (=Usage Fault: Unalignment error)
lr  = 0xffffffff
pc  = 0x204005b4
psr = 0x0040fc4c
r0  = 0x425fdb6b
r1  = 0x00000000
r2  = 0x204005ac
r3  = 0x62a200de
r12 = 0x4261c756

At the PC address in the disassembly:
204005B4   subs	r7, #216
I haven't programmed much in assembly, so here are the two ways I'm able to interpret that instruction.
(I've monitored that r7 doesn't change between running code and HardFault)
r7 		= 0x2040299C
r7 - 216 	= 0x204028C4 (interpreting 216 as decimal) 	-> 204028C4   strh	r0, [r0]
r7 - 0x216 	= 0x20402786 (interpreting 216 as hex)		-> 20402786   strh	r1, [r0, #8]

When I stepped through the disassembly earlier, the code appeared to crash at an "lds" instruction. I had not looked at the PC stored on the stack at that time.