Performance Test of the C-Code


I wasn't sure whether I should put this in general programming or here; I hope I picked the right forum.

I'm working with a UC3C MCU.

 

So I have a main and some functions, and I want to measure the exact execution time of each function and of the whole main loop. I'm aware that interrupts will add to this, so I measure a few times and take the largest value.

I simply have a timer interrupt that increments a variable globalTime every 10 µs.

When a function starts, it reads the current globalTime, and at the end it stores the difference between start and end in a variable.
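For reference, the scheme described above could look roughly like the sketch below. This is not my real code: the struct `measurement_t`, the function `timer_tick_isr()` and the pointer parameters are made up for illustration, and the tick is simulated so the sketch also runs on a PC. The counter is `volatile`, so each read really goes to memory, and the unsigned subtraction still gives the right difference if the counter wraps once between start and end.

```c
#include <stdint.h>

/* Incremented by the timer interrupt every 10 us (assumption: 32-bit counter). */
static volatile uint32_t globalTime;

/* Hypothetical ISR body; on the target this would be attached to the timer. */
void timer_tick_isr(void)
{
    globalTime++;
}

/* One measurement slot per function or code section (hypothetical type). */
typedef struct {
    uint32_t start;  /* tick count when the section started */
    uint32_t worst;  /* largest duration seen so far, in 10 us ticks */
} measurement_t;

void startMeasurement(measurement_t *m)
{
    m->start = globalTime;
}

void endMeasurement(measurement_t *m)
{
    /* Unsigned arithmetic wraps modulo 2^32, so this stays correct
       across a single counter overflow. */
    uint32_t elapsed = globalTime - m->start;

    if (elapsed > m->worst) {
        m->worst = elapsed;  /* keep the largest value over several runs */
    }
}
```

Calling `startMeasurement(&m)` before and `endMeasurement(&m)` after a section leaves the worst case in `m.worst`; multiply by 10 to get microseconds.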

 

My problem is: how do I make sure that compiler optimization doesn't move the globalTime reads around? If it did, I would probably miss some code.

Or in other words, how can I guarantee that, after optimization, this

 

startMeasurement();

doStuff1();

doStuff2();

doStuff3();

endMeasurement();

 

doesn't turn into something like this:

 

startMeasurement();

doStuff1();

endMeasurement();

doStuff2();

doStuff3();

?

 

I'm not very familiar with optimizing compilers as you may have guessed.

I'd be thankful if you could shed some light on this.

 

edit: corrected some typos

 

[insert smart signature here]

Last Edited: Tue. Nov 8, 2016 - 02:52 PM

As my HS electronics teacher said, the "act of measuring something changes it"! Very profound now that I think about it.

 

One way of doing the measurement is to use a simulator, and have it count the cpu cycles, then translate that to wall clock time.

Just adding the extra code to do the measurement real time will change it, but can be used to measure a single function or a given set of code.
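The cycle-to-time translation is just a division by the CPU clock. A minimal sketch, where the function name `cycles_to_us` is made up and the clock frequency is whatever your board actually runs at:

```c
#include <stdint.h>

/* Convert a cycle count reported by the simulator into microseconds.
   cpu_hz is the CPU clock in Hz (an input you have to supply). */
double cycles_to_us(uint64_t cycles, uint32_t cpu_hz)
{
    return (double)cycles * 1e6 / (double)cpu_hz;
}
```

For example, 660 cycles at an assumed 66 MHz clock come out at 10 µs.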

 

 

Jim

 


Thanks, while this is definitely a good hint, it probably won't work for me.

The problem is that my code involves a third-party driver. That driver drives an external device with a protocol that involves multiple GPIOs and the SPI port. The driver is a black box to me, and I'm not sure that I could model a device that follows the protocol as strictly as the real device does (it's quite complex).

Most of the execution time is taken by this driver, so I can't just switch it off and use an approximation instead.

Do you have other suggestions? For starters it wouldn't be a problem if the measured time is being increased by the measurement functions.

 

 

[insert smart signature here]


I would not think the optimizer would do as you wrote above. I would try it and see what happens.

If you can put your timer functions in their own module, and call the black box functions from another module, that may isolate them from the optimizer as well.

I'm assuming here that you're using gcc or similar. I don't use gcc, so some others here may know better.

Since it's a black box, you can only probe it with some tests and see what happens.  Carefully record your results and test conditions and proceed logically.

 

Good luck

Jim

 


ki0bk wrote:
I would not think the optimizer would do as you wrote above,
Me neither. If you had a few mathematical expressions (say), I could see that the compiler might re-order things to make better use of the registers or something. But I have never known it to re-order function invocations. Have you EVER seen this happen?


a_random_Martin wrote:
Do you have other suggestions?
Non-intrusive trace :

  • NanoTrace

That was added in Atmel Studio 6.2 SP1 for UC3.

  • Trace

Exists in Atmel AVR32 Studio and IAR EWAVR32 C-SPY.

Third party driver - will need addresses from the link map.

 

Logic analyzer -

Not zero impact though close.

Insert macros to write to a spare port.

 

Debugger -

Tracepoints instead of breakpoints?

PC sampling
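The "macros to write to a spare port" idea could look something like the sketch below. The register here is a plain variable standing in for a memory-mapped GPIO output register, and all the names (`TRACE_PORT`, `TRACE_HIGH`, `TRACE_LOW`, `traced_work`, pin 3) are made up; on the real part you would substitute the port register from the device header. The logic analyzer then shows the pin high for exactly the duration of the bracketed code.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped GPIO output register (assumption: on the
   target this would be the real port register, not a variable). */
static volatile uint32_t fake_port_out;

#define TRACE_PORT     (fake_port_out)
#define TRACE_PIN(n)   (1u << (n))

/* Raise / lower one spare pin; the cost is a few cycles per macro. */
#define TRACE_HIGH(n)  do { TRACE_PORT |=  TRACE_PIN(n); } while (0)
#define TRACE_LOW(n)   do { TRACE_PORT &= ~TRACE_PIN(n); } while (0)

static int pin_high_during_work;

/* Placeholder for the code under test; it records whether the pin was
   high while it ran, so the sketch can be checked on a host. */
static void traced_work(void)
{
    pin_high_during_work = (TRACE_PORT & TRACE_PIN(3)) != 0;
}

void measured_section(void)
{
    TRACE_HIGH(3);   /* rising edge: section starts */
    traced_work();
    TRACE_LOW(3);    /* falling edge: section ends */
}

int trace_pin_was_high(void)      { return pin_high_during_work; }
uint32_t trace_port_value(void)   { return fake_port_out; }
```

Because the port access is `volatile`, the two edges stay in program order relative to each other.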

 


Atmel Studio 6.2 Service Pack 1

http://www.atmel.com/webdoc/GUID-ECD8A826-B1DA-44FC-BE0B-5A53418A47BD/index.html?GUID-A6C2F64D-BBB1-4838-99A0-27E14F6A4000

...

  • Support for trace buffers for ARM (MTB) and AVR32 UC3 (NanoTrace)

...

http://www.atmel.com/tools/studioarchive.aspx

http://www.atmel.com/Images/AVR32Studio_Release_Notea_2.6.0.pdf

https://www.iar.com/iar-embedded-workbench/#!?architecture=AVR32&currentTab=features

https://www.iar.com/support/user-guides/iar-embedded-workbench-for-atmel-avr32/ (page 163 of the C-SPY guide)

http://www.embedded.com/design/debug-and-optimization/4236800/Troubleshooting-real-time-software-issues-using-a-logic-analyzer

http://microchip.wikidot.com/mplabx:trace (Debugging, Instrumented Trace and Log)

http://www.atmel.com/webdoc/GUID-ECD8A826-B1DA-44FC-BE0B-5A53418A47BD/index.html?GUID-ECF8CABA-419E-4E99-8D7A-7C0CCDBECF98 (AVR32 PC Sampling)

 

"Dare to be naïve." - Buckminster Fuller


There is a counter (SYSTIMER) which is clocked at the CPU frequency.
You reset it with sysreg_write( AVR32_COUNT, 0 ); and read the ticks with sysreg_read( AVR32_COUNT );
The downside is that at 60 MHz it can only run for about 70 µs before it overflows. (Correction: it is actually about 70 seconds, see the next post.)

Last Edited: Wed. Nov 9, 2016 - 05:44 AM

The AVR32_COUNT register is 32 bits wide. I use it extensively in my apps for time measurement. At 66 MHz it rolls over every 2^32/66000000 seconds, which is 65.075 seconds. I have verified this. mikech, I think you should check your math. :)
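The rollover arithmetic can be checked with a few lines of C. The helper names `count_rollover_seconds` and `count_elapsed` are made up for this sketch; on the target, the two readings would come from sysreg_read( AVR32_COUNT ) as described in the previous post.

```c
#include <stdint.h>

/* Seconds until a free-running 32-bit cycle counter wraps at cpu_hz. */
double count_rollover_seconds(uint32_t cpu_hz)
{
    return 4294967296.0 / (double)cpu_hz;  /* 2^32 counts per wrap */
}

/* Elapsed cycles between two counter readings. Unsigned subtraction
   wraps modulo 2^32, so the result is valid as long as the interval
   is shorter than one rollover period. */
uint32_t count_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

At 66 MHz this gives about 65.075 seconds, and at 60 MHz about 71.6 seconds, so intervals shorter than that can be measured without resetting the counter first.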

Letting the smoke out since 1978

 

 

 

 


I was only wrong by 3 orders of magnitude. :embarrassed
I have some C routines that calculate micro/milliseconds from the count and the system clock, but they use 32-bit arithmetic which reduces the range if you want to avoid integer overflow.
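One way around the 32-bit overflow, sketched here as an alternative and not as the routines mentioned above, is to do the multiplication in a 64-bit intermediate. The name `ticks_to_us` is made up, and the sketch assumes the CPU clock is at least 1 MHz so that the result fits back into 32 bits.

```c
#include <stdint.h>

/* Convert cycle-counter ticks to microseconds.
   ticks * 1000000 can exceed 32 bits, so widen to 64 bits first.
   Assumption: cpu_hz >= 1000000, so the quotient fits in uint32_t. */
uint32_t ticks_to_us(uint32_t ticks, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / cpu_hz);
}
```

On a 32-bit core the 64-bit division is done in software, so it is better to convert once after the measurement than inside the timed section.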



... I must stop writing to the internet prior to my third morning coffee.

Last Edited: Wed. Nov 9, 2016 - 05:57 AM

gchapman wrote:
Non-intrusive trace: NanoTrace (added in Atmel Studio 6.2 SP1 for UC3); Trace (exists in Atmel AVR32 Studio and IAR EWAVR32 C-SPY). Third party driver: will need addresses from the link map. Logic analyzer: not zero impact, though close; insert macros to write to a spare port. Debugger: tracepoints instead of breakpoints? PC sampling.

Wow, that's the answer that I was hoping for! Thanks

 

mikech wrote:
There is a counter (SYSTIMER) which is clocked at the CPU frequency. You reset it with sysreg_write( AVR32_COUNT, 0 ); and read the ticks with sysreg_read( AVR32_COUNT );
:O If only I had known that earlier! Is this hidden in some application notes? Because I can't find it in the datasheet. I wonder how many "hidden" features are out there.

[insert smart signature here]


A compiler could re-order function calls if it can determine that the function calls do not have side-effects.

 

To get round that, I think if doStuff1() etc. are in a separate compilation unit (and not inline in a header file), then the compiler will call them in order. Otherwise, you might need a memory barrier to prevent reordering.
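A minimal sketch of the barrier idea, assuming gcc (or another compiler that accepts gcc-style inline asm). The names `COMPILER_BARRIER`, `tick_counter`, `do_work` and `measure_with_barriers` are made up, and the tick is faked so the sketch runs on a host. Note the limits: the empty asm with a "memory" clobber only stops the compiler from moving memory accesses across it; pure register computations may still move, and it is not a hardware memory barrier.

```c
#include <stdint.h>

/* gcc-style compiler barrier: emits no instructions, but the compiler
   may not move loads or stores across it. */
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

static volatile uint32_t tick_counter;  /* stand-in for the timer tick */
static uint32_t work_result;

/* Placeholder for doStuff1()..doStuff3(). It advances the fake tick by 3
   so that a host run has something to measure. */
static void do_work(void)
{
    work_result += 3;
    tick_counter += 3;
}

/* Returns the elapsed ticks around do_work(). */
uint32_t measure_with_barriers(void)
{
    uint32_t start = tick_counter;  /* volatile read, stays in program order */
    COMPILER_BARRIER();
    do_work();
    COMPILER_BARRIER();
    return tick_counter - start;    /* second volatile read */
}
```

The two volatile reads are already ordered relative to each other; the barriers additionally keep the memory accesses of the work between them.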

Bob.


donotdespisethesnake wrote:
A compiler could re-order function calls if it can determine that the function calls do not have side-effects.

Has anyone reading this ever seen that in reality?

 

Nope? Thought not.

 

Interesting discussions here though...

 

http://stackoverflow.com/questio...

http://stackoverflow.com/questio...

http://preshing.com/20120625/mem...

 

Definitely worth reading this too:

 

https://en.wikipedia.org/wiki/Se...


clawson wrote:

donotdespisethesnake wrote:
A compiler could re-order function calls if it can determine that the function calls do not have side-effects.

Has anyone reading this ever seen that in reality?

 

Yes, I have seen it. I was writing some bit banging code to get precise timing, but the compiler reordered calls for me which may have been more efficient but messed up my timing :)

 

Just because you haven't seen it, doesn't mean it is not possible. Really, never assume anything about C compilers.

Bob.


Real calls or static inline?