SAMA5D2 - GPIO toggling - why so slow?


Hi,

 

could someone, please, help me understand what's going on?

 

Consider the following main loop:

 

int main(void)
{
        unsigned int out = 0;

        while(1)
        {
                out = ~out;
                piob->PIO_MSKR = (1 << 14);
                piob->PIO_ODSR = out;
        }

        return 0;
}

 

This just toggles PB14; I have left out the initialization. The main clock comes from the 12 MHz internal RC oscillator and is multiplied by 83 in the PLL. The core runs at 498 MHz and MCK = 166 MHz, so I assume that the peripheral clock of the PIO is 83 MHz. This peripheral clock is enabled in the PMC. The MCK frequency can be verified by configuring, say, PCK1 to use MCK divided by 250: the resulting square wave on PCK1 measures close to 666 kHz, in line with the expected 166 MHz / 250 = 664 kHz.
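The clock figures above can be cross-checked with a few lines of C. This is only a sketch of the arithmetic, not PMC configuration code. The two divider values (2 between PLL and core, 3 between core and MCK) are assumptions inferred from the quoted 498 MHz and 166 MHz, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the post. */
#define RC_OSC_HZ      12000000u  /* 12 MHz internal RC oscillator */
#define PLL_MULTIPLIER 83u        /* PLL multiplication factor */
#define PCK1_DIVIDER   250u       /* divider used for the PCK1 check */

/* ASSUMPTION: the two dividers below are not stated in the post. They are
 * inferred from the quoted 498 MHz core clock and 166 MHz MCK. */
#define PLL_TO_CORE_DIVIDER 2u
#define CORE_TO_MCK_DIVIDER 3u

/* Divide a clock by an integer divider (hypothetical helper). */
uint32_t divide_clock(uint32_t source_hz, uint32_t divider)
{
    assert(divider != 0u);
    return source_hz / divider;
}

uint32_t core_clock_hz(void)
{
    /* 12 MHz * 83 = 996 MHz, which still fits in 32 bits. */
    return divide_clock(RC_OSC_HZ * PLL_MULTIPLIER, PLL_TO_CORE_DIVIDER);
}

uint32_t mck_hz(void)
{
    return divide_clock(core_clock_hz(), CORE_TO_MCK_DIVIDER);
}

uint32_t pck1_hz(void)
{
    return divide_clock(mck_hz(), PCK1_DIVIDER);
}
```

With these assumptions the nominal PCK1 frequency comes out at 664 kHz, close to the value measured on the scope.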

 

The program is built with GCC 4.9, with optimization -Os. The while loop disassembles to:

  2104ec:       43e4            mvns    r4, r4
  2104ee:       f44f 4280       mov.w   r2, #16384      ; 0x4000
  2104f2:       601a            str     r2, [r3, #0]
  2104f4:       619c            str     r4, [r3, #24]
  2104f6:       e7f8            b.n     2104ea <main+0xae>

 

I would expect this loop to take roughly 10 processor cycles, some in the core and some accessing the system bus. Even assuming that all of them are MCK cycles (the pessimistic case), PB14 would be updated at MCK / 10 = 16.6 MHz, which corresponds to a square wave of about 8.3 MHz.

 

My oscilloscope shows only ca. 1.85 MHz. Why?

 

Best regards,
Adam


The while loop disassembles to:

  2104ec:       43e4            mvns    r4, r4
  2104ee:       f44f 4280       mov.w   r2, #16384      ; 0x4000
  2104f2:       601a            str     r2, [r3, #0]
  2104f4:       619c            str     r4, [r3, #24]
  2104f6:       e7f8            b.n     2104ea <main+0xae>

That's not the whole loop - it doesn't even include the target of the branch (ok, it's MOST of the loop.)

I don't know anything about the SAMA.  But the shorter loop (just the two stores and the branch) takes 10 cycles on a SAMD21, which I interpret as 4 cycles for each store and 2 for the branch.  The SAMD21 runs at much lower clock speeds, so I think that assuming the sequence takes only 10 cycles of a 166 MHz clock is very optimistic...

 


Hi westfw,

 

you're right, the whole loop is:

 

  2104ea:       4b08            ldr     r3, [pc, #32]   ; (21050c <main+0xd0>)
  2104ec:       43e4            mvns    r4, r4
  2104ee:       f44f 4280       mov.w   r2, #16384      ; 0x4000
  2104f2:       601a            str     r2, [r3, #0]
  2104f4:       619c            str     r4, [r3, #24]
  2104f6:       e7f8            b.n     2104ea <main+0xae>

  21050c:       fc038040        stc2    0, cr8, [r3], {64}      ; 0x40

 

The word at 21050c is not an instruction: it is the literal fc038040, the address of PIO_MSKR, which the disassembler has decoded as stc2.
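To make the addressing in this listing explicit, here is a small sketch that reproduces it on paper. The struct is reconstructed only from the two store offsets in the disassembly (0 and 24); its name, the unnamed filler words and both helper functions are hypothetical and are not taken from any vendor header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Partial view of one PIO group, reconstructed only from the two store
 * offsets in the disassembly; the words in between are left unnamed. */
typedef struct {
    volatile uint32_t PIO_MSKR;    /* offset 0:  str r2, [r3, #0]  */
    volatile uint32_t unnamed[5];  /* offsets 4..20, not used by the loop */
    volatile uint32_t PIO_ODSR;    /* offset 24: str r4, [r3, #24] */
} pio_group_sketch_t;              /* hypothetical name */

/* Address read by a Thumb "ldr rX, [pc, #imm]" located at insn_addr:
 * the PC reads as insn_addr + 4, rounded down to a multiple of 4. */
uint32_t thumb_ldr_literal_addr(uint32_t insn_addr, uint32_t imm)
{
    assert((imm & 3u) == 0u);  /* the offset is a whole number of words */
    return ((insn_addr + 4u) & ~3u) + imm;
}

/* Address of PIO_ODSR given the PIO_MSKR address from the literal pool. */
uint32_t odsr_addr_from_mskr(uint32_t mskr_addr)
{
    return mskr_addr + (uint32_t)offsetof(pio_group_sketch_t, PIO_ODSR);
}
```

For the ldr at 2104ea with offset 32 this gives 21050c, the location of the literal, and with the literal fc038040 as the PIO_MSKR address the second store lands on fc038058.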

 

I found a timing diagram in the datasheet suggesting that a pin update takes 5 MCK cycles. This is Fig. 33-3 in datasheet DS60001476C:

 

[Figure: timing diagram (Fig. 33-3, DS60001476C)]

I'm not sure how long a read or write of PIO_MSKR can take. With the pin being updated at 2x1.85 MHz (twice the frequency of the observed square wave), one update takes 166/(2x1.85) = about 45 MCK cycles - kind of long!

 

The write to PIO_MSKR doesn't have to be in the loop, but that's not the essence of the problem here. However, if I take it out of the loop, it becomes:

 

  210360:       43d2            mvns    r2, r2
  210362:       619a            str     r2, [r3, #24]
  210364:       e7fc            b.n     210360 <main+0xa8>

 

and the observed square wave is now at 5.9 MHz.
The speed-up, 5.9/1.85 = 3.2, seems consistent with the removal of two memory accesses (the literal load and the PIO_MSKR store) and the mov.w.
A 5.9 MHz waveform corresponds to 166/(2x5.9) = about 14 MCK cycles per pin update. Still, why so many?
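The conversion from a measured frequency to MCK cycles used in this post can be written as a one-line helper. This is just a sketch of the arithmetic; the function name is hypothetical and MCK is taken as the nominal 166 MHz.

```c
#include <assert.h>

/* MCK cycles per pin update for a measured square-wave frequency.
 * Each loop iteration changes the pin once, so one period of the
 * waveform takes two iterations. */
double mck_cycles_per_update(double mck_hz, double waveform_hz)
{
    assert(mck_hz > 0.0 && waveform_hz > 0.0);
    return mck_hz / (2.0 * waveform_hz);
}
```

Calling it with 166e6 and 1.85e6 gives a little under 45 cycles for the original loop, and with 5.9e6 a little over 14 cycles for the shortened one.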

 

Best regards,
Adam

Last Edited: Sun. Feb 3, 2019 - 10:40 AM

Using Atmel Studio's debugger you can step over each line and see the number of actual cycles versus expected (Debug->Windows->Processor Status:Cycle Counter).  It could be wait states for the memory fetch.  Or interrupts.  Hard to say exactly.

 

Try using the register keyword on the "out" variable.  And try moving the PIO_MSKR write outside the loop (I'm not too familiar with the SAMA5D2, so I'm not sure it will still work).  Something like this:

 

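int main(void)
{
        /* "register" is only a hint; the compiler may ignore it. */
        register unsigned int out = 0;

        /* Select PB14 once, before the loop. */
        piob->PIO_MSKR = (1 << 14);
        while(1)
        {
                out = ~out;             /* invert all bits */
                piob->PIO_ODSR = out;   /* drive the new level */
        }

        return 0;
}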


Hi Scott,

 

moving the write to PIO_MSKR out of the loop of course makes the loop run faster, but only because there is less work to do, not because it solves the problem. As far as I can tell, no interrupts are active. The delay is very regular; if it were due to interrupts, I would expect to clearly see the moments when they are handled.

 

Tech support advised me to enable the MMU and the D-cache (the I-cache was already enabled). I learned how to do that, and it did make things run a bit faster, but still well below the performance that I would expect.

 

I was also told that the delay is significant because each access goes first over the AHB and then over the APB, and that takes time. But according to the datasheet, that should happen very quickly. I think that the problem is related to how the processor arbitrates access to the system bus. Maybe it tries to be "good for everybody" and so doesn't pay enough attention to the work I have for it, so to speak. I wonder whether enabling the MMU and both caches helped because it really was a step towards the solution, or whether everything simply started to run a bit faster, just as it did when I moved the write to PIO_MSKR out of the loop.

 

Best regards,
Adam