Optimizing Code for UC3A3

I've written a custom f_read function for FatFS that allows me to read directly from the disk (SD Card) to the USB controller, which has almost doubled my read performance (to 5.3 MB/s). This is good, but I'm still only at about 1/3 of the maximum raw read performance (13+ MB/s), so I know there's something going on that's "wasting" cycles.

FatFS does a lot of arithmetic to calculate where a particular sector is located: lots of X / 512 or X % 512 (or, more generally, X / 2^Y). For unsigned X and a power-of-two divisor, both can be replaced by a shift and a bitwise AND:

X / 512 == X >> 9
X % 512 == X & (512 - 1)
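A quick way to convince yourself the identities hold is to compare them against real division over a range of values. This is a minimal sketch; div512, mod512 and check_identities are hypothetical helper names. Note the unsigned types: for a negative signed x, x / 512 truncates toward zero while x >> 9 (on GCC, an arithmetic shift) rounds toward negative infinity, and x % 512 can be negative while x & 511 never is.

```c
#include <stdint.h>

/* x / 512 written as a shift; only valid for unsigned x. */
static inline uint32_t div512(uint32_t x) { return x >> 9; }

/* x % 512 written as a mask; only valid for unsigned x. */
static inline uint32_t mod512(uint32_t x) { return x & (512u - 1u); }

/* Compare the shift/mask forms with real division for 0 <= x < limit.
 * Returns 1 if every value matches, 0 on the first mismatch. */
static int check_identities(uint32_t limit)
{
    for (uint32_t x = 0; x < limit; x++) {
        if (div512(x) != x / 512u || mod512(x) != x % 512u)
            return 0;
    }
    return 1;
}
```

With signed operands the two forms diverge: -1 / 512 is 0 and -1 % 512 is -1, whereas -1 & 511 is 511.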

FatFS uses divide/modulus because those numbers may not always be 512, but in my case they will be. Can I assume the compiler, using -Os, is smart enough to convert these to bitwise operations? Or does the UC3A3 have an arithmetic unit fast enough to handle these without a noticeable cost?
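The distinction that matters for the compiler is whether the divisor is known at compile time. If the sector size lives in a variable (as it must when a library supports several sizes), the compiler has to emit a real division. A common workaround is to compute log2 of the size once and shift by that. This sketch assumes power-of-two sector sizes; volume_info, sector_shift and the helper functions are hypothetical names, not FatFS's API.

```c
#include <stdint.h>

/* Hypothetical per-volume info: the sector size is only known at run time,
 * so "offset / ssize" cannot be turned into a shift by the compiler. */
typedef struct {
    uint32_t ssize;        /* bytes per sector, assumed a power of two */
    uint8_t  sector_shift; /* log2(ssize), computed once */
} volume_info;

/* log2 of a power of two (returns 0 for 1). */
static uint8_t log2_pow2(uint32_t v)
{
    uint8_t s = 0;
    while (v > 1u) {
        v >>= 1;
        s++;
    }
    return s;
}

/* Fill in the size and its precomputed shift, e.g. at mount time. */
static void volume_init(volume_info *vi, uint32_t ssize)
{
    vi->ssize = ssize;
    vi->sector_shift = log2_pow2(ssize);
}

/* Generic version: a real division, because ssize is a variable. */
static uint32_t sector_of_div(const volume_info *vi, uint32_t offset)
{
    return offset / vi->ssize;
}

/* Shift version: same result for power-of-two sector sizes. */
static uint32_t sector_of_shift(const volume_info *vi, uint32_t offset)
{
    return offset >> vi->sector_shift;
}

/* Byte position inside the sector, using a mask instead of a modulo. */
static uint32_t offset_in_sector(const volume_info *vi, uint32_t offset)
{
    return offset & (vi->ssize - 1u);
}
```

The trade-off is one extra byte per volume and a loop at setup time, in exchange for no division on every access.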

Have a look at the AVR32UC Technical Reference Manual: division and modulo are dead slow compared with shifts and bitwise ANDs.

The compiler should optimise those operations to bit shifts and bit ANDs, but if you want to see whether that actually happens, just take a look at the generated code with a disassembler.
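A small test file like the one below gives you something concrete to disassemble. The function names are made up for the test, and the toolchain commands in the comment assume the usual avr32- prefix, so adjust them for your installation. The unsigned constant cases are the ones expected to become a shift and a mask; the signed case needs extra fix-up instructions to round negative values toward zero, and the variable-divisor case cannot be reduced to a shift at all.

```c
#include <stdint.h>

/* Compile with something like: avr32-gcc -Os -c div_test.c
 * then inspect with:            avr32-objdump -d div_test.o
 * (toolchain prefix is an assumption; adjust for your setup). */

/* Unsigned, constant power-of-two divisor: expected to become a shift. */
uint32_t udiv_const(uint32_t x) { return x / 512u; }

/* Unsigned modulo by a constant power of two: expected to become an AND. */
uint32_t umod_const(uint32_t x) { return x % 512u; }

/* Signed: negative values must round toward zero, so this needs more
 * than a single shift. */
int32_t sdiv_const(int32_t x) { return x / 512; }

/* Divisor only known at run time: no shift is possible here. */
uint32_t udiv_var(uint32_t x, uint32_t d) { return x / d; }
```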

Ok, I had assumed there wasn't any magic "make division fast" module on the chip since I didn't see one in the datasheet, but I'll check out that technical reference manual as well.

What I'm taking from your reply, though, is that if I know I can get away with a bitwise operation instead of division/modulus, I should just code it as bitwise and add a comment saying what's going on? While it may not be strictly necessary, it's not much harder to type and seems like a valid thing to do when optimizing a section of code.

x = y >> 9;     // x = y / 512 (y unsigned)
x = y & 511;    // x = y % 512 (y unsigned)
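
If you do hand-optimize, one option is to give the magic numbers names and let the build fail if they ever stop agreeing. This is a sketch using C11 _Static_assert; SECTOR_SIZE, SECTOR_SHIFT and the helpers are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical configuration: the code below is only correct while
 * SECTOR_SIZE is a power of two and SECTOR_SHIFT is its log2. */
#define SECTOR_SIZE  512u
#define SECTOR_SHIFT 9u

/* Fail the build if someone changes one constant without the other. */
_Static_assert((SECTOR_SIZE & (SECTOR_SIZE - 1u)) == 0u,
               "SECTOR_SIZE must be a power of two");
_Static_assert((1u << SECTOR_SHIFT) == SECTOR_SIZE,
               "SECTOR_SHIFT must equal log2(SECTOR_SIZE)");

/* y / SECTOR_SIZE, written as a shift; y must be unsigned. */
static inline uint32_t sector_index(uint32_t y)  { return y >> SECTOR_SHIFT; }

/* y % SECTOR_SIZE, written as a mask; y must be unsigned. */
static inline uint32_t sector_offset(uint32_t y) { return y & (SECTOR_SIZE - 1u); }
```

The comment then lives in one place, and the unsigned parameter type enforces the condition the shift relies on.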

Whether or not you can get away with just optimising by hand depends on who you’re working with and for. I can’t answer that for you.

For any optimisation level above -O0 the compiler will definitely compute constant expressions. Writing (512 - 1) results in the same code as writing (511), so pick what looks better to you.
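One way to see that (512 - 1) is evaluated by the compiler rather than at run time: it is an integer constant expression, so C lets you use it where a compile-time constant is required, such as an enum value or a static assertion. The names below are hypothetical, and the "same instructions" remark is what you would expect to see in a disassembly, not something the C standard guarantees.

```c
#include <stdint.h>

/* (512 - 1) is an integer constant expression, evaluated at translation
 * time, so it is accepted as an enum value and in a static assertion. */
enum { SECTOR_MASK = 512 - 1 };
_Static_assert(SECTOR_MASK == 511, "512 - 1 is folded at compile time");

/* With optimization enabled, these two are expected to compile to the
 * same instructions. */
uint32_t mask_expr(uint32_t x)    { return x & (512u - 1u); }
uint32_t mask_literal(uint32_t x) { return x & 511u; }
```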

catweax wrote:
For any optimisation level above -O0 the compiler will definitely compute constant expressions. Writing (512 - 1) results in the same code as writing (511), so pick what looks better to you.

That raises another question: where does -Os fall relative to the other optimization levels? I'm not sure -Os is strictly necessary for me, but it's the default in the example I'm using, and everything I've read here says "Use -Os for embedded. Always."
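For what it's worth, GCC documents -Os as an optimizing level: it enables most -O2 optimizations except those that tend to increase code size, so it counts as "above -O0" in the sense used above. To check what a given build actually uses, GCC predefines __OPTIMIZE__ in every optimizing compilation and additionally __OPTIMIZE_SIZE__ when optimizing for size. opt_mode() is a hypothetical helper built on those two macros.

```c
/* Report which optimization mode this file was compiled with, based on
 * GCC's predefined macros. */
static const char *opt_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimizing for size (e.g. -Os)";
#elif defined(__OPTIMIZE__)
    return "optimizing, not for size";
#else
    return "not optimizing (-O0)";
#endif
}
```

Printing opt_mode() from a debug console is a cheap way to confirm the example project's flags really are what you think they are.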
