USART0_TX ISR setting time


Hi,

I'm experimenting with the USART in SPI master mode and the USART0_TX ISR.

I use the ATmega256RFR2; F_CPU is an external 16 MHz clock.

The code below works fine, but I have trouble with the ISR latency: as you can see in the screenshot, there is more than 1 µs of delay between the end of the data transmission and entry into the interrupt (the yellow trace is SPI data, the blue trace is the SPI clock, and the red trace is the debug pin B1).

 

 

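#include <asf.h>
#include <avr/io.h>
#include <avr/interrupt.h>
#include <avr/wdt.h>

ISR(USART0_TX_vect)
{
	PORTB |= (1 << PORTB1);  // debug pin B1 high
	UDR0 = 0x55;             // start the next byte
	PORTB &= ~(1 << PORTB1); // debug pin B1 low
}

int main(void)
{
	// Disable the watchdog
	MCUSR &= ~(1 << WDRF);
	wdt_disable();

	// Debug pin B1 as output, initially low
	DDRB |= (1 << PORTB1);
	PORTB &= ~(1 << PORTB1);

	// Power down all peripherals
	PRR0 = 0xff;
	PRR1 = 0xff;

	// Init USART0 in SPI master mode (MSPIM)
	PRR0 &= ~(1 << PRUSART0); // clear PRUSART0 to power USART0 back up
	UBRR0 = 0; // datasheet init sequence: UBRRn is zero while the transmitter is enabled
	/* Setting the XCKn port pin as output enables master mode. */
	DDRE |= (1 << PORTE2);
	/* Set MSPIM mode of operation and SPI data mode 0. */
	UCSR0C = (3 << UMSEL00) | (0 << UCPHA0) | (0 << UCPOL0);
	/* Enable the transmitter (TXD0 = PE1) and the TX-complete interrupt. */
	UCSR0B = (1 << TXCIE0) | (1 << TXEN0);
	/* Set the baud rate after the transmitter is enabled: SCK = F_CPU / 4. */
	UBRR0 = 1;

	sei();

	UDR0 = 0x55; // first byte; the ISR sends the following ones
	while (1);
}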

This huge setting time is not acceptable for my project. Am I missing something?

Thanks in advance,

Xav

 

 


xav wrote:
F_CPU is external 16Mhz.

...

there is more than 1us delay between the end of data transmit and the coming into the interrupt

That is 16 clocks, which is entirely reasonable for the interrupt latency plus the ISR prologue.

 

xav wrote:
This huge setting time is not acceptable for my project
Huge? We must have different definitions of "huge".

Anyway, you can get a better time by not using the interrupt but polling the interrupt flag.

Stefan Ernst


Thanks for your answer, but according to the datasheet, section 7.8.1, "The interrupt execution response for all the enabled AVR interrupts is five clock cycles minimum."

In my case it is 18-19 clocks, and I don't think that is normal.

By huge, I mean "too big": it's 4 times what I expected.

I would like to use an ISR rather than polling because I have other work to do, so polling the flag is not useful in my case.


xav wrote:
Thanks for your answer, but according to the datasheet, section 7.8.1, "The interrupt execution response for all the enabled AVR interrupts is five clock cycles minimum." In my case it is 18-19 clocks, and I don't think that is normal.
It is normal. You are forgetting the ISR prolog. That is stuff to be done after entering the interrupt and before your first code line in the ISR (like saving registers onto the stack).

Stefan Ernst


Look at the assembler the compiler generates, then reconsider what you think is 'normal'. My guess is that the SPI send takes around 32 clocks. Take away 5 clocks for the hardware entry; how many clocks for a RETI? Add some more for the compiler to save and restore the CPU state, and probably 6 clocks to execute your code. They start to add up. Besides, avr-gcc is known to be less than optimal with ISRs. Assembler might be appropriate in this instance.


I think I understand.

When the interrupt arrives, the CPU first takes 5 clocks to push the program counter onto the stack. It then executes the jump to the interrupt routine (3 more clocks).

These 8 clocks are not visible in the asm code below.

Before executing my code, it saves the CPU state (the ISR prologue): 10 clocks.

So the ISR latency is 18 clocks.

Then it executes my code (7 clocks).

Finally it restores the CPU state (9 clocks) and executes RETI (5 clocks), which pops the PC back from the stack.

The assembler output is below (I wrote each instruction's cycle count at the start of the line):

------------- saves the cpu state ------------- 
2 * 000004A4  PUSH R1		Push register on stack 
2 * 000004A5  PUSH R0		Push register on stack 
1 * 000004A6  IN R0,0x3F		In from I/O location 
2 * 000004A7  PUSH R0		Push register on stack 
1 * 000004A8  CLR R1		Clear Register 
2 * 000004A9  PUSH R24		Push register on stack 
------------- My code ------------- 
    PORTB |= (1 << PORTB1); // PORTB1 = '1'
2 * 000004AA  SBI 0x05,1		Set bit in I/O register 
    UDR0 = (0x55);
1 * 000004AB  LDI R24,0x55		Load immediate 
2 * 000004AC  STS 0x00C6,R24		Store direct to data space 
    PORTB &= ~(1 << PORTB1); // PORTB1 = '0'
2 * 000004AE  CBI 0x05,1		Clear bit in I/O register 
}
------------- restore the cpu state ------------- 
2 * 000004AF  POP R24		Pop register from stack 
2 * 000004B0  POP R0		Pop register from stack 
1 * 000004B1  OUT 0x3F,R0		Out to I/O location 
2 * 000004B2  POP R0		Pop register from stack 
2 * 000004B3  POP R1		Pop register from stack 
5 * 000004B4  RETI 		Interrupt return 

 

Am I right?
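
The tally above can be checked with a few lines of host-side C. This is only a sketch of the arithmetic: the cycle counts are copied from the listing, and the names (`entry_latency_cycles`, `isr_total_cycles`, `cycles_to_ns`) are made up for illustration. It ignores the extra 1-2 clocks that may be needed to finish the instruction in progress when the interrupt arrives.

```c
#include <assert.h>

/* Cycle counts copied from the listing above (ATmega256RFR2). */
enum {
    HW_RESPONSE = 5,  /* hardware interrupt response, PC pushed to the stack */
    VECTOR_JMP  = 3,  /* JMP in the vector table                           */
    PROLOGUE    = 10, /* push r1, push r0, in, push r0, clr, push r24      */
    BODY        = 7,  /* sbi, ldi, sts, cbi                                */
    EPILOGUE    = 9,  /* pop r24, pop r0, out, pop r0, pop r1              */
    RETI_CYCLES = 5   /* reti                                              */
};

/* Cycles from interrupt acceptance to the first instruction of the C body. */
unsigned entry_latency_cycles(void)
{
    return HW_RESPONSE + VECTOR_JMP + PROLOGUE;
}

/* Cycles from interrupt acceptance until the interrupted code resumes. */
unsigned isr_total_cycles(void)
{
    return entry_latency_cycles() + BODY + EPILOGUE + RETI_CYCLES;
}

/* Convert a cycle count to nanoseconds for a given CPU clock in Hz. */
unsigned long long cycles_to_ns(unsigned cycles, unsigned long f_cpu_hz)
{
    assert(f_cpu_hz > 0);
    return (cycles * 1000000000ULL) / f_cpu_hz;
}
```

With F_CPU at 16 MHz, `cycles_to_ns(entry_latency_cycles(), 16000000UL)` gives 1125 ns, which is in line with the "more than 1 µs" seen on the scope before the debug pin toggles.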

 

 


I think the evidence supports the observation.

At a rough count, optimised assembler would be around 25 clocks. That leaves only around 25% of CPU cycles for other code. If you can interleave other code between writes to the USART, you will get better efficiency. That way you don't need to poll the USART; just make sure enough cycles pass between each write to UDR.
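
As a rough illustration of that estimate, here is a small host-side helper. It assumes one ISR invocation per byte and a byte period of 32 CPU clocks (the back-to-back case mentioned earlier in the thread); the function name `cpu_percent_left` is hypothetical.

```c
#include <assert.h>

/* Percentage of CPU cycles left for other code when an ISR costing
 * isr_cycles runs once every period_cycles. Returns 0 when the ISR
 * takes as long as, or longer than, the period. */
unsigned cpu_percent_left(unsigned isr_cycles, unsigned period_cycles)
{
    assert(period_cycles > 0);
    if (isr_cycles >= period_cycles) {
        return 0;
    }
    return (100u * (period_cycles - isr_cycles)) / period_cycles;
}
```

For a 25-clock ISR in a 32-clock period this gives 21 (integer percent), close to the "around 25%" quoted above. For the 39-clock compiler-generated ISR it gives 0, meaning the ISR alone is longer than one byte time.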


I'm getting a bit confused here.  At first, I thought the "problem" was a delay in sending the 0x55 'marker'.  But then I realized, if correct, that this is just a test bed to send bytes via the SPI one after the other.

 

Lessee, UBRR of 1 gives a bit rate of clk/4, right?  F_CPU of 16MHz then gives a byte every two microseconds, right?  Pretty fast to keep up, since in the real app you probably have to "do" something -- fetch next byte; update index/pointer; put away received byte; and similar.
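
A quick way to check those numbers on a PC, as a sketch: in USART SPI master mode the clock is F_CPU / (2 * (UBRR + 1)). The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SCK frequency in Hz for the USART in SPI master mode:
 * f_SCK = f_CPU / (2 * (UBRR + 1)). */
uint32_t mspim_sck_hz(uint32_t f_cpu_hz, uint16_t ubrr)
{
    assert(ubrr <= 4095); /* UBRRn is a 12-bit register */
    return f_cpu_hz / (2UL * ((uint32_t)ubrr + 1UL));
}

/* CPU clocks needed to shift out one 8-bit frame. */
uint32_t mspim_cycles_per_byte(uint16_t ubrr)
{
    assert(ubrr <= 4095);
    return 8UL * 2UL * ((uint32_t)ubrr + 1UL);
}
```

With F_CPU = 16 MHz and UBRR = 1 this gives a 4 MHz SCK and 32 CPU clocks per byte, i.e. the two microseconds per byte mentioned above.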

 

 

xav wrote:
Thanks for your answer, but according to the datasheet, section 7.8.1, "The interrupt execution response for all the enabled AVR interrupts is five clock cycles minimum."

 

I usually use 12 cycles as the minimum to service an interrupt.  I'd have to go to your datasheet section to see what "interrupt execution response" means in context.  On AVR8:

 

-- Finish the current instruction.  Of course this may vary, but use 1 or 2 clocks.

-- Get to the vector.  That is probably your 5 cycles.  It can be 4 on smaller AVRs.

-- Take the vector.  JMP is 3 cycles on your model, right?  Could be a 2-cycle RJMP on smaller AVRs.

-- RETI to get out is 4 cycles (5 on devices with a 22-bit program counter, such as yours).

So that is roughly 11-15 cycles depending on which AVR model.  Almost all ISRs will need SREG save and restore; add a few more cycles depending on your toolchain and skill at writing ISRs.  Then, you actually have to >>do something<< to make it worthwhile.

 

However, all is not lost.  Tell the purpose of your mission.  If it is to stuff a packet into an SPI as fast as practical, you will want to take advantage of the fact that USART-as-SPI is double-buffered.  You don't have to wait for TXC.  What is your mission?  How large is your packet?  Is it send-only?

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Unless I have missed something I don't think the ISR touches SREG or requires R1=0 so almost all the prologue and epilogue there is superfluous. You have two solutions:

 

1) take the Asm generated here and strip away the unnecessary bits and make it a .S file then build that instead of your C version

 

2) This "missed optimization" has actually been fixed. SprinterSB (Georg-Johann Lay) recently posted about a fix he added to the compiler to stop it from doing this when it is not necessary. So if you can get a very recent build of the compiler with the fix applied, you should be able to use that:

 

https://gcc.gnu.org/bugzilla/sho...

 

For now I'd probably go with (1)

 


clawson wrote:
Unless I have missed something I don't think the ISR touches SREG ...

Perhaps, in the test case shown.  And GCC just got better at "smart" ISRs, right?  But in a real app where the ISR does something useful (in this case almost certainly update of index/pointer and a check for end-of-packet) there will in all probability be SREG save/restore, along with a couple working registers:

 

theusch wrote:
depending on your toolchain and skill at writing ISRs

 

 



So I built the code using -save-temps and for the ISR the code is:

        .file   "avr.c"
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__RAMPZ__ = 0x3b
__tmp_reg__ = 0
__zero_reg__ = 1
        .text
.global __vector_27
        .type   __vector_27, @function
__vector_27:
        push r1
        push r0
        in r0,__SREG__
        push r0
        clr __zero_reg__
        push r24
/* prologue: Signal */
/* frame size = 0 */
/* stack size = 4 */
.L__stack_usage = 4
        sbi 0x5,1
        ldi r24,lo8(85)
        sts 198,r24
        cbi 0x5,1
/* epilogue start */
        pop r24
        pop r0
        out __SREG__,r0
        pop r0
        pop r1
        reti
        .size   __vector_27, .-__vector_27

As I say, I think you could tidy this up to be:

#define __SFR_OFFSET 0
#include <avr/io.h>
        .file   "uartISR.s"
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__RAMPZ__ = 0x3b
__tmp_reg__ = 0
__zero_reg__ = 1
        .text
.global USART0_TX_vect
        .type   USART0_TX_vect, @function
USART0_TX_vect:
        push r24
/* prologue: Signal */
/* frame size = 0 */
/* stack size = 1 */
.L__stack_usage = 1
        sbi PORTB,1
        ldi r24,lo8(85)
        sts UDR0,r24
        cbi PORTB,1
/* epilogue start */
        pop r24
        reti
        .size   USART0_TX_vect, .-USART0_TX_vect

That seems to build to the same code (without the prologue/epilogue - apart from R24):

C:\SysGCC\avr\bin>avr-gcc -c -mmcu=atmega256rfr2 -Os -save-temps uartISR.S -o uartISR.o

C:\SysGCC\avr\bin>avr-objdump.exe -S uartISR.o

uartISR.o:     file format elf32-avr

Disassembly of section .text:

00000000 <__vector_27>:
   0:   8f 93           push    r24
   2:   29 9a           sbi     0x05, 1 ; 5
   4:   85 e5           ldi     r24, 0x55       ; 85
   6:   80 93 c6 00     sts     0x00C6, r24
   a:   29 98           cbi     0x05, 1 ; 5
   c:   8f 91           pop     r24
   e:   18 95           reti

This saves:

        push r1             ; 2 cycles
        push r0             ; 2 cycles
        in r0,__SREG__      ; 1 cycle
        push r0             ; 2 cycles
        clr __zero_reg__    ; 1 cycle
...
        pop r0              ; 2 cycles
        out __SREG__,r0     ; 1 cycle
        pop r0              ; 2 cycles
        pop r1              ; 2 cycles

So 15 cycles saved.

 

EDIT: But, yes, as Lee points out, this only works because the example was trivial: the two writes to PORTB and the one to UDR0 do not change SREG. If the code in the ISR gets more complex, you would have to re-evaluate.

Last Edited: Fri. Sep 8, 2017 - 01:59 PM

As it is the UART being used for SPI, why not take advantage of its tx buffer?

Use the data register empty interrupt instead of tx complete. That should get the performance up, without micro-managing every last cycle in the interrupt handler.
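
A minimal sketch of what that could look like, assuming a send-only packet held in RAM. The names `tx_ptr`, `tx_left` and `spi_tx_start` are made up for this example, and the `#else` branch only provides host-side stand-ins so the logic can be compiled and exercised off-target; on the AVR the real registers and `ISR()` macro from avr-libc are used.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
#else
/* Host-side stand-ins (assumption: for off-target testing only). */
static volatile uint8_t UDR0, UCSR0B;
#define UDRIE0 5
#define ISR(vector) void vector(void)
#endif

/* Hypothetical transmit state shared between main code and the ISR. */
static const uint8_t *volatile tx_ptr;
static volatile size_t tx_left;

/* Queue a buffer and enable the data-register-empty interrupt.
 * The ISR then feeds UDR0 each time the transmit buffer has room. */
void spi_tx_start(const uint8_t *buf, size_t len)
{
    tx_ptr = buf;
    tx_left = len;
    UCSR0B |= (1 << UDRIE0);
}

ISR(USART0_UDRE_vect)
{
    if (tx_left > 0) {
        UDR0 = *tx_ptr++;          /* refill the transmit buffer */
        tx_left--;
    } else {
        UCSR0B &= ~(1 << UDRIE0);  /* nothing left: stop further DRE interrupts */
    }
}
```

Because the data register is refilled while the previous byte is still shifting out, the interrupt latency overlaps with the transmission instead of adding a gap after every byte. Whether this keeps up without gaps at UBRR0 = 1 still depends on how long the final ISR is compared with the 32-clock byte time. In the init code, TXEN0 stays set and TXCIE0 would no longer be needed.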

 


As it is the UART being used for SPI, why not take advantage of its tx buffer?
Use the data register empty interrupt instead of tx complete.

Yes, using DRE instead of TXC was my thought as well.

Greg Muth

Portland, OR, US

Xplained Boards mostly

Atmel Studio 7.0 on Windows 10