SAM4S XPlained Program Counter debug calc issue

Go To Last Post
5 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi!

 

I am trying to calculate the execution time of a function I wrote that uses the SAM4S PDC.

The method I am using is:

 

- add stopwatches at the beginning and ending of a function

- once reached the first breakpoint, reset the cycle counter

- let the code execute untill the second breakpoint (placed at the end of the function)

 

The issue I am facing is related to the result.

 

Let me better explain with the relative code:

 

/** This function executes in:
 *	2845120 cycles
 *	@ 120,000 MHz
 *	23.709,33 µs (23,709 ms)
 */
void ssd1306_twi_clrscr(void)
{
	uint8_t packet[(SSD1306_HEIGHT*SSD1306_WIDTH/8)];
	uint16_t column_scan;
	packet[0] = SSD1306_DATA;

	for (column_scan = 1; column_scan < (SSD1306_HEIGHT*SSD1306_WIDTH/8); column_scan++)
	{
	packet[column_scan] = 0x00;
	}
	packet[column_scan+1] = 0x00;

	twi_package_t tx = {
		.chip = SSD1306_TWI_ADDRESS,
		.buffer = (uint8_t *) packet,
		.length = sizeof(packet)+1
	};
	twi_master_pdc_write(SSD1306_TWI, &tx);
}

If I place a break point at the beginning and to the end of ssd1306_twi_c, I get a result of 2845120 cycles for a total of ~23 ms.

And it seems a lot to me, since I programmed the PDC with the scope to make everything faster.

 

So, doing a "breakpoint-break-down" I discovered that up to the twi_master_pdc_write() the processor spends 4113 cycles for a total of 34,28 µs @ 120MHz.

You may say, that's the problem, is the PDC transfer function taking a lot of time.

But If I step into the function and I use another break point to reach a calculation to the end, I get a total of 306 cycles.

 

 

Debug session in action, worth 1000 words :) :

 

sam4s function debug

Link: https://youtu.be/skFj3hcd5ig

 

 

My questions are:

1) Since the PDC takes care of the TWI transfer, why the debugger counts all those cycles?

2) Am I using the debugger in a wrong way, and/or my knowledge is limited to understand this phenomenon?

 

Thanks!
D.

 

The only way of discovering the limits of the possible is to venture a little way past them into the impossible - Arthur C. Clarke

Last Edited: Mon. Apr 30, 2018 - 08:09 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

One way to double check this could be to add a watch on 

(*((volatile uint32_t*)0xE0001004))

which is the DWT.CYCNT register (the source of the numbers in the Cycle Counter view).

:: Morten

 

(yes, I work for Atmel, yes, I do this in my spare time, now stop sending PMs)

 

The postings on this site are my own and do not represent Microchip’s positions, strategies, or opinions.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I just note the packet array is too small, i.e., this

packet[column_scan+1] = 0x00;

is off the end of the array.

/Lars

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thank you meolsen, very interesting method!

 

Ok the test just confirmed the same behavior:

 

- Stepping out of the functions the Cycle Counter gives huge results

- Stepping in gives very few cycles in comparison

 

I did an hardware debug, adding a ioport_set_pin_level() instead of using a break point.

The correct count is the longer one, stepping out of the function.

Important note: the yellow trace is the gpio init function from the processor.

 

 

Stepping in the transfer function still gives a very few cycle count, and surprisingly the count include the TWI transfer:

 

 

I am quite confused.

The transfer takes 24ms, but should be managed by PDC, instead I see the whole transfer affects the CPU overhead.

Uhm.

 

This is my function:

 

void twi_master_pdc_write(Twi *p_twi, twi_packet_t *p_packet)
{
	uint8_t *buffer = p_packet->buffer;

	pdc_twi_packet.ul_addr = (uint32_t)buffer;
	pdc_twi_packet.ul_size = (p_packet->length);

	pdc_tx_init(g_p_twim_pdc, &pdc_twi_packet, NULL);
	/* Set write mode, slave address and 3 internal address byte lengths */
	p_twi->TWI_MMR = 0;
	p_twi->TWI_MMR = TWI_MMR_DADR(p_packet->chip) |
	((p_packet->addr_length << TWI_MMR_IADRSZ_Pos) &
	TWI_MMR_IADRSZ_Msk);
	/* Set internal address for remote chip */
	p_twi->TWI_IADR = 0;
	p_twi->TWI_IADR = twi_mk_addr(p_packet->addr, p_packet->addr_length);

	Write_Interrupt_Flag = 1;

	twi_enable_interrupt(SSD1306_TWI, TWI_SR_TXCOMP);
	/* Enable the TX PDC transfer requests */
	pdc_enable_transfer(g_p_twim_pdc, PERIPH_PTCR_TXTEN);
	/* Waiting transfer done*/
	while((twi_get_interrupt_status(SSD1306_TWI) & TWI_SR_ENDTX) == 0);
	/* Disable the RX and TX PDC transfer requests */
	pdc_disable_transfer(g_p_twim_pdc, PERIPH_PTCR_TXTDIS | PERIPH_PTCR_RXTDIS);
	while((twi_get_interrupt_status(SSD1306_TWI) & TWI_SR_TXRDY) == 0);
	p_twi->TWI_CR = TWI_CR_STOP;
	while (!(p_twi->TWI_SR & TWI_SR_TXCOMP)) {
	}
}

This is the step in calculations:

 

ssd1306_twi_clrscr():

4113
120,000 MHz
34,28 µs

 

twi_master_pdc_write():

306
120,000 MHz
2,55 µs

 

Looking closer, I see a connection between the twi_master_pdc_write() step in calc and the logic analyzer measure of a clock period timing:

 

 

So, at this point up to you your considerations.

For the record: I hope this "report" gave enough learning material.

 

Still dubious why PDC transfer is affecting CPU.

 

The only way of discovering the limits of the possible is to venture a little way past them into the impossible - Arthur C. Clarke

Last Edited: Tue. May 1, 2018 - 12:25 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

yes it's a dirty code, it is an intentional and lazy work around to a buggy array calc :)

Also this, is not nice:

.length = sizeof(packet)+1

The only way of discovering the limits of the possible is to venture a little way past them into the impossible - Arthur C. Clarke

Last Edited: Tue. May 1, 2018 - 12:24 PM