editing context saving within C program?

I'm trying to write a program that responds as fast as possible to an interrupt. I think I can prune out a bit of the context saving, as I don't think I'm using the saved registers anywhere in the program. The context saving is as follows (from the disassembler):

 

PUSH R1
PUSH R0
IN R0,0x3F
PUSH R0
PUSH R24
PUSH R25

 

so, as I can gather from web searches, R0 and R1 are used for results of arithmetic operations, 0x3F is the SREG, and R24 / R25 are used for 16 bit operations - is that all correct? I think I might want still to save the SREG but otherwise I might get away with not saving any of them, as my program is quite simple. So, I have a couple of questions regarding this 

 

- can I watch the registers R0, R1, R24, R25 inside an Atmel Studio C program to double check they don't change in the interrupt? Do I have to define a specific register name to watch or do they appear under different names in the IO view?

- can I edit the context saving from the C program, or can I only do this inside an assembly program? And if I can only do this in an assembly program, will I be able to take a big shortcut by writing a program in C, running a disassembler, then saving the disassembly file as my program, of course editing out the commenting etc. that appears as a bookmark from the original C program?

- is it a bad idea to delete context saving? (i mean, obviously the answer is "yes" if you need the information in the registers, but as long as you're not using them...)


You haven't given us much to go on.  What AVR model?  What toolchain?  What clock speed?  Define "fast as possible".  Tell what needs to be done in this ISR "as fast as possible".  Which will lead to possible alternate solutions.

 

Are there any other interrupt sources enabled, so could there be more latency?

 

I suspect that you have chosen to use the infinite-value toolchain.  It does many excellent things but skinny AVR8 ISRs ain't one of them.

 

Sure, you can go naked. (Search the forum for that as well as your toolchain documentation.)  It can get scary and cold when you are naked...in other words, if you don't do it right the 1/10000 glitch is hard to find.

 

But I'd like to see more description of the problem, and some numbers, and the ISR, and the cycle count (or LSS), and the desired cycle count.

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Not sure which toolchain I'm using! I installed from the WinAVR installer, so avr-gcc? Not sure. Otherwise, I'm using an ATtiny48, clocked with the internal oscillator at 8MHz (so I could potentially go up to 20MHz, but if I can get away without using an external crystal, that would be great, but not a deal breaker if not). I want to get the ISR done in around 1uS, or as near to that as possible... at the moment it runs ~10uS. I *am* running another interrupt on INT1_vect that is exactly the same thing, but I thought if I could get them both down to 1uS then there would be minimal interference between the two

 

Having looked in more detail at the ISR, it's obvious that the bulk of it comes not from the context saving but the contents, and that most of the registers that are saved are in fact modified within the routine! I suppose I'm going to have to think of a way of streamlining it, prepping as much of the data in the program loop so as to avoid any operations that use the registers.

 

this is the interrupt as it stands. I am using the interrupt as a clock, looking at each consecutive bit in a byte (state1) and outputting it to pind0....

 

ISR(INT0_vect)
	{
		if(bit_is_set(state1, count1))  // if bit [count1] is set,
		{
			PORTD |= (1 << 0);  // pind0 high
		}
		else
		{
			PORTD &= 0xFE;  // pind0 low
		}
	
		count1++;             //increment count and check if its over 7
		if(count1 == 8)
		{
			count1 = 0;   // if it is set back to 0
			PORTD ^=0b00001000;
		}
		
	}

 

I suppose maybe I could prep that next bit value for pind0 in the program loop, then just output it in the interrupt to avoid any comparisons, which would cut down the length of the interrupt and might avoid using the arithmetic registers


pgo480 wrote:
I want to get the ISR done in around 1uS, or as near to that as possible... at the moment it runs ~10uS.

???  To keep my interest, I'd need to know why the interrupt needs to >>complete<< in 1us.

 

If the above takes 10us, then you aren't telling the whole story.  [How are you determining that?  'Scope?]  That would be 80 clock cycles at 8MHz.  Doesn't sound right. CKDIV8 set?  -O0?

 

Post the .LSS for your ISR.

 

Certainly there can be a better mechanism for marching through "state" than the one you are using.

 

Where does "state" come from?  Is it volatile?  How often does it change in operation?

 

Again, describe what you need to do.  When you get a trigger signal, then you output one bit?  And you want to process a million of these every second.

 

Let's start here:  The minimum for an AVR8 to process any ISR is about 12 clock cycles.  That would be a null ISR.  You are running at 8MHz, supposedly, so it takes about 1.5us per hit without doing anything.
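
To put numbers on that, here is a minimal sketch of the cycle-to-time arithmetic in plain C, so it can be checked on a PC. The helper name `cycles_to_ns` is made up for illustration, and the 12-cycle figure is the estimate quoted above, not a datasheet value.

```c
#include <stdint.h>

/* Convert a CPU cycle count to nanoseconds at a given clock frequency.
 * Hypothetical helper for back-of-envelope checks; rounds down. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t f_cpu_hz)
{
    /* 64-bit intermediate so cycles * 1e9 cannot overflow */
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / f_cpu_hz);
}
```

For example, `cycles_to_ns(12, 8000000UL)` gives 1500 (1.5us for the empty ISR at 8MHz), and `cycles_to_ns(80, 8000000UL)` gives 10000, which is the 10us / 80 cycle figure questioned earlier in this post.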

 

The data can be pre-processed so that the pin can just be toggled.  Now to find that thread -- was that skeeve?


1us is nearly impossible at 20MHz. That means 20 clock cycles. First, you have 6-8 cycles burned, just getting into the ISR (depending on the instruction being executed when the interrupt hits) and several cycles getting out, not counting context management. So, there you have 0.5us that you can do nothing about.  Reducing the ISR to 10 clock cycles is nearly impossible for all but the most bare ISR. Any more than 1 or 2 C instructions, and you will be  over your limit. I would really rethink that 1us goal.
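
The same budget can be worked the other way round. A small sketch, assuming roughly 10 cycles of fixed entry/exit overhead as estimated above; `isr_body_budget` is a hypothetical name and the overhead figure is this post's estimate, not a measured value.

```c
#include <stdint.h>

/* Cycles left for the ISR body once fixed entry/exit overhead is paid.
 * Returns 0 if the overhead alone already uses up the deadline. */
static uint32_t isr_body_budget(uint32_t f_cpu_hz, uint32_t deadline_ns,
                                uint32_t overhead_cycles)
{
    /* Total cycles available inside the deadline, rounded down */
    uint32_t total = (uint32_t)(((uint64_t)f_cpu_hz * deadline_ns) / 1000000000ULL);
    return (total > overhead_cycles) ? total - overhead_cycles : 0;
}
```

With these assumptions `isr_body_budget(20000000UL, 1000, 10)` leaves 10 cycles for the body, while at 8MHz the whole 1us is only 8 cycles, so the budget comes out as 0.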

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net


If the Disassembly shows no change to SREG then make the ISR "NAKED" and add a terminating call to reti() but you are doing things like incrementing count which is bound to affect SREG.


It looks like you are outputting the data from a register onto a pin at a serial rate that is determined by an external clock connected to Port D2 (hence using INT0).   Does this Tiny48 have SPI, or a USART, or even USI?  It would be easier to get a stream of data bits out of the Tiny48 at near mega-bit rates with the USI because you could just load it with the byte that is being output and then use the timer to generate the rate that the bits will be popped out of the SDA pin.  Then you get one interrupt each time one entire byte has been sent.


to clear up any misunderstanding - the output is to drive a switch. The microprocessor needs to receive the interrupt signal, then decide *if* the switch needs to be set in response to that (this comes from the state variable which can be modified by the user) then flip the switch, all as quick as possible. The stream of bits out doesn't need to be in the order of MHz, it'll be around audio frequencies, but the time between signal and the output flipping needs to be as short as possible - that is, there are long pauses between interrupt signals, but when the interrupt comes, the response needs to be fast.
 

The reason why I chose 1uS is that 100kHz is around the highest rate I expect the circuit to switch repeatedly - 10uS between interrupts - and ideally I would like the switch to kick in within a tenth of the interval between interrupts, to avoid distortion of the destination switching. It's not *absolutely* critical, but the quicker the better

 

I changed the program so that inside the program loop there is 

 

if(count1 == 8)

		{
			count1 = 0;
		}
		
		if(bit_is_set(state1, count1))
		{
			nextstate1 = 0b00000001;
		}
		else
		{
			nextstate1 = 0b00000000;
		}
		
		intstate1 = PORTD ^ nextstate1;

so the variable intstate is used to operate the output bit. Now the interrupt reduces to 

ISR(INT0_vect)
	{
		PORTD ^= intstate1; 
		count1++;
	}

which is about 22 cycles in length (before the output is set) and gives around 2.7uS response time, which seems about right if the program takes 6-8 clock cycles to get to the interrupt in the first place. 
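
One thing worth double-checking in the loop code above: if intstate1 is computed as `PORTD ^ nextstate1` with no mask, then `PORTD ^= intstate1` sets the whole port to nextstate1, which would clear bits 1 to 7. A minimal sketch of a masked version, in plain C so it can be tried on a PC. `toggle_mask` and `OUT_MASK` are names invented here, and bit 0 is assumed to be the only pin that should change.

```c
#include <stdint.h>

#define OUT_MASK 0x01u  /* bit 0 only; assumed to be the switch pin (PD0) */

/* Value to XOR into the port so that bit 0 ends up equal to `want`
 * (0 or 1) while every other bit is left as it was. */
static uint8_t toggle_mask(uint8_t port_now, uint8_t want)
{
    return (uint8_t)((port_now ^ want) & OUT_MASK);
}
```

The ISR would still be `PORTD ^= intstate1;`; only the value stored by the main loop changes, e.g. `intstate1 = toggle_mask(PORTD, nextstate1);`.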

 

I could now scrape out the R0 and R1 context saving to get a slightly smaller response time, but it appears the biggest change i can get now is to use a 20MHz crystal, which should get me most of the way to 1uS

 

 

 


pgo480 wrote:
all as quick as possible. The stream of bits out doesn't need to be in the order of MHz, it'll be around audio frequencies, but the time between signal and the output flipping needs to be as short as possible - that is long pauses between interrupt signals but when the interrupt comes, the response needs to be fast.

 

You have chosen, apparently, not to answer any of the queries that I posed.

 

Apparently, you are responding to this audio signal -- like a light organ?

 

If you are all "prepared" then indeed you can get to the "flip" in about a microsecond.

 

Beyond that, there was a reason for each of my queries.  Respond to each of them for more detail.  In general, the 'Freaks here like puzzles.  But not knowing what this "switch" is gives rise to speculation that the 1us may be arbitrary and immaterial.

 

 



Erm yes, I did mean to respond to those but - yes, 1uS is sort of arbitrary, in that it seems like an achievable length, but the shorter you can make the response time the better. I would have said 1nS but it didn't seem realistic :) 

- yes, i was looking on the scope and it took about 8uS, but I don't know what a .LSS file is!

- clockdiv8 not set

- state comes from an array of 23 different states which can be selected by an encoder, changing it is up to the user but the frequency of changes is insignificant compared to the frequency of the interrupts

 

It's a little long-winded to explain what this does, as it's a small cog in a larger machine which is mostly of my own design. It would take me a while and probably involve me drawing pictures (to compensate for my bad explanation, rather than your lack of understanding, of course). Sorry if that's frustrating - I really appreciate the help!

ISR(INT0_vect)
	{
		PORTD ^= intstate1;
		count1++;
	}

Obviously we cannot know very much about "count1" from what you have posted but are you happy that it just increments unchecked and presumably wraps when it reaches the width limit of the type?

 

Also the previous code was making decisions for the output based on a global "state1". Now it appears any such preparation is done by intstate1 being set elsewhere but how is that now synchronised to the interrupts?

 

As for .LSS - if you use Studio that is usually automatically created as part of the build process. If you are building in some other environment it's basically:

avr-objdump -S outputname.elf > outputname.lss

that is, the captured output of the "objdump -S" command. To be of much use the ELF needs to contain links in the code back to the C source (the -S means "show disassembly with C source"). To get such links embedded the code should be compiled with the -g switch (often -gdwarf-2 in particular).

Last Edited: Wed. Jan 11, 2017 - 02:52 PM

yes, in the first code section there is an if statement which resets count1 when it gets to 8.

 

intstate1 is an intermediary state which is necessary because i have to xor the operation once and use the result to xor the output PORTD in order that i can do it in the minimum time


If it's a tiny48 (relatively modern AVR) then why ^= (which is a read-modify-write operation)? Why not use PIND writes to get atomic (quicker) toggles?
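
For reference, a rough sketch of the PIND-write toggle suggested here. On AVRs that have the feature (the ATtiny48 datasheet describes it), writing a 1 to a PINx bit flips the matching PORTx bit and leaves the rest alone. `TOGGLE_PD` and `toggle_pd0` are made-up names, and the non-AVR branch is only a stand-in so the snippet can be compiled and tried on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
/* On parts with the toggle feature, writing 1s to PIND flips those PORTD bits */
#define TOGGLE_PD(mask) (PIND = (mask))
#else
/* Host stand-in (assumption): a plain variable that models the same effect */
static uint8_t PORTD;
#define TOGGLE_PD(mask) (PORTD ^= (mask))
#endif

/* Flip PD0 and leave the other seven pins untouched */
static void toggle_pd0(void)
{
    TOGGLE_PD((uint8_t)(1u << 0));
}
```

Note that a plain `PIND = mask` typically compiles to a load-immediate plus an OUT, so it still needs a scratch register and does not by itself make a naked ISR safe.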


hmm that's interesting...   I'm doing it that way as two xor operations seemed the most logical (ha) when trying to write to only one pin in PORTD (the others have other stuff on them). Is there an easier way?


What else is this program loop doing?

2^74,207,281 - 1  The largest known Mersenne Prime

Measure twice, cry, go back to the hardware store


pgo480 wrote:
I want to get the ISR done in around 1uS, or as near to that as possible... at the moment it runs ~10uS.

So let's back up and summarize a bit (pun intended).

 

[and you still haven't told optimization level as far as I can see...]

 

1)  IME Tiny48 has wimpy pin drive.  And not a lot of resources for the bucks spent.  YMMV

2)  We've learned that the primary criterion is fast response...

3)  ...and "finishing the ISR in 1us" is arbitrary.  You later talked of 100kHz, and then mentioned audio frequencies.  So the ISR only needs to be finished before the next event.

4)  The two channels will certainly complicate things.  Even once the ISR is skinny, there will still be latency if both events fire.  So more numbers:  What is indeed the maximum event rate?  If a single event handler is delayed a few microseconds, is that drop-dead bad?  If one is missed totally, is that drop-dead bad?

5)  To get help, you indeed have to tell more about the variables and how they are set and volatile or not.

 

[edit]  In this thread http://www.avrfreaks.net/comment... I found the link to skeeve's unrolled shift register driving, with the data "prepared":

http://www.avrfreaks.net/comment...

 

I think it could be adapted in this case...

 

 

5a) Also tell how often this setpoint is likely to change.  Is a different pattern used each pass?  Or is one pattern used over and over again in the microcontroller scheme of things?  (E.g. if the pattern changes every one second or every ten seconds then one optimizes for those hundreds of thousands repeat passes)

 

 

 


Last Edited: Wed. Jan 11, 2017 - 03:15 PM

how is this :

bit_is_set(state1, count1)

Implemented.

 

don't do :

if(count1 == 8)

just AND (&) count1 with 0x07

 

But I guess it would be faster to use a mask on count so instead of count1++ do a <<1 on count1

 

and if you declare count1 as a register I guess that 4 clk is saved.
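
A minimal sketch of the two suggestions above, written as plain C functions so they can be tried on a PC. `next_index` and `next_mask` are names made up for illustration.

```c
#include <stdint.h>

/* Advance a 0..7 bit index; the AND replaces the compare-and-reset */
static uint8_t next_index(uint8_t idx)
{
    return (uint8_t)((idx + 1u) & 0x07u);
}

/* Advance a one-hot mask 0x01 -> 0x02 -> ... -> 0x80 -> 0x01 */
static uint8_t next_mask(uint8_t mask)
{
    mask = (uint8_t)(mask << 1);
    return mask ? mask : 0x01u;
}
```

With the walking mask, the bit test becomes `state1 & mask` and the run-time shift by count1 disappears.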

 

add:

and which type is state1? and how often does it change?

 

 

Last Edited: Wed. Jan 11, 2017 - 03:29 PM

sparrow2 wrote:
how is this :

bit_is_set(state1, count1)

Implemented.

Well bit_is_set() is in sfr_defs.h:

sfr_defs.h:#define bit_is_set(sfr, bit) (_SFR_BYTE(sfr) & _BV(bit))

Obviously _BV() is:

sfr_defs.h:#define _BV(bit) (1 << (bit))

and _SFR_BYTE() is:

sfr_defs.h:#define _SFR_BYTE(sfr) _MMIO_BYTE(_SFR_ADDR(sfr))

which uses:

sfr_defs.h:#define _MMIO_BYTE(mem_addr) (*(volatile uint8_t *)(mem_addr))

and:

sfr_defs.h:#define _SFR_ADDR(sfr) _SFR_MEM_ADDR(sfr)

which in turn uses:

sfr_defs.h:#define _SFR_MEM_ADDR(sfr) ((uint16_t) &(sfr))

so as an example:

uint8_t state1, count1;

int main(void) {
    if (bit_is_set(state1, count1)) {

becomes:

uint8_t state1, count1;

int main(void) {
    if (((*(volatile uint8_t *)(((uint16_t) &(state1)))) & (1 << (count1)))) {

To say this is "horrendous" (in this context) is probably an understatement.  The intended use of the bit_is_set() macro in sfr_defs.h is something more like:

while(bit_is_set(ADCSRA, ADSC));

which might expand out to something like:

while(((*(volatile uint8_t *)(((uint16_t) &((*(volatile uint8_t *)(0x7A)))))) & (1 << (6))));

where all the inputs are compile time constants. In fact sfr_defs.h has:

sfr_defs.h:#define loop_until_bit_is_clear(sfr, bit) do { } while (bit_is_set(sfr, bit))

intended to be used as something like:

loop_until_bit_is_clear(ADCSRA, ADSC);

and bit_is_set() is just a component in achieving that.

 

Of course OP here might do himself some favours if at least bit_is_set(GPIOR0, n) could be used - at least one of the inputs is a compile time constant and in a fast access memory space. This then justifies the use of the _SFR_ADDR() stuff at least.
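
For ordinary RAM variables like state1 and count1, the SFR macros are not needed at all. A hedged sketch of the plain-C equivalent, with hypothetical helper names `bit_is_set_u8` and `mask_is_set_u8`. The second form assumes a one-hot mask is kept instead of a bit index; AVR only has single-bit shift instructions, so avr-gcc typically turns a shift by a run-time count into a loop.

```c
#include <stdint.h>
#include <stdbool.h>

/* Test bit `bit` (0..7) of an ordinary RAM byte. No SFR macros involved. */
static bool bit_is_set_u8(uint8_t value, uint8_t bit)
{
    return (value & (uint8_t)(1u << bit)) != 0;
}

/* Same test with a precomputed one-hot mask, avoiding the variable shift */
static bool mask_is_set_u8(uint8_t value, uint8_t mask)
{
    return (value & mask) != 0;
}
```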


If you want to stick with C, try something like this:

ISR(INT0_vect)
{
	if(bit_is_set(state1, count1))  // ???
	{
		PORTD |= (1 << 0);  // pind0 high
	}
	else
	{
		PORTD &= 0xFE;  // pind0 low
	}

	uint8_t tmp = count1;
	tmp = (tmp + 1) & 0x07;

	count1 = tmp;

	if(tmp == 0)
		PIND = 0b00001000;

	//count1 = tmp;
}

 

In case of doing ISR naked, you have to go into inline assembly since r24/r25 (allocated for tmp variables) are commonly used registers by the compiler.

Last Edited: Thu. Jan 12, 2017 - 11:51 AM

Whilst we still don't know why the 1us constraint has been imposed or what happens in hardware if it is missed, I shall offer this optimal solution.

 

You rightly have decided to pre-process count1 and state but I now store the preprocessed result in Bit-0 of GPIOR0 which on most AVRs (including tiny48) is bit addressable.

I can now reduce the ISR to this ultra simple code and instead of incrementing count I indicate that the ISR has fired by setting Bit-1 of GPIOR0:

ISR(INT0_vect, __attribute__ ((naked)) )
{
	if (GPIOR0 & 1<<0)
		PORTD |= 1<<0;
	if (!(GPIOR0 & 1<<0))
		PORTD &= ~(1<<0);

	GPIOR0 |= 1<<1;
	reti();
}

The ISR can be naked now because the sbic, sbis, cbi and sbi instructions do not affect any registers or SREG.

This results in an 8 cycle ISR as disassembled below:

0000061a <__vector_1>:
 61a:	f0 99       	sbic	0x1e, 0	; 30
 61c:	58 9a       	sbi	0x0b, 0	; 11
 61e:	f0 9b       	sbis	0x1e, 0	; 30
 620:	58 98       	cbi	0x0b, 0	; 11
 622:	f1 9a       	sbi	0x1e, 1	; 30
 624:	18 95       	reti

Here's the refactored preprocessing code:  I've used a walking bit like sparrow2 suggested.  It runs only after the ISR has fired, by testing Bit-1 of GPIOR0, which it clears before the next IRQ.

uint8_t bitpos1, state1;
void preprocess (void)
{
	if (GPIOR0 & 1 << 1) {
		GPIOR0 &= ~(1 << 1);
		bitpos1 <<= 1;
		if (bitpos1 == 0)
			bitpos1 = 1;

		if (state1 & bitpos1) {
			GPIOR0 |= 1<<0;
		} else {
			GPIOR0 &= ~(1<<0);
		}
	}
}

Unless I've missed something in your spec. it should work and be really fast.
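
To see the handshake between the two pieces of code without any hardware, here is a rough PC-side model. It is only a sketch: `isr_model`, `preprocess_model` and the `FLAG_*` names are invented here, and a plain byte passed by pointer stands in for GPIOR0 and for PORTD.

```c
#include <stdint.h>

#define FLAG_NEXT  (1u << 0)  /* bit 0: level to drive on the next interrupt */
#define FLAG_FIRED (1u << 1)  /* bit 1: set by the ISR, cleared by the main loop */

/* What the ISR does to the port and the flag byte, written as plain C */
static void isr_model(uint8_t *flags, uint8_t *port)
{
    if (*flags & FLAG_NEXT)
        *port |= 0x01u;
    else
        *port &= (uint8_t)~0x01u;
    *flags |= FLAG_FIRED;
}

/* Main-loop side: after an interrupt, advance the walking bit and stage the next level */
static void preprocess_model(uint8_t *flags, uint8_t *bitpos, uint8_t state)
{
    if (!(*flags & FLAG_FIRED))
        return;                      /* nothing to do until the ISR has run */
    *flags &= (uint8_t)~FLAG_FIRED;
    *bitpos = (uint8_t)(*bitpos << 1);
    if (*bitpos == 0)
        *bitpos = 1;                 /* wrap 0x80 -> 0x01 */
    if (state & *bitpos)
        *flags |= FLAG_NEXT;
    else
        *flags &= (uint8_t)~FLAG_NEXT;
}
```

The model also makes one requirement visible: the main loop has to reach the preprocessing step between two interrupts, otherwise the ISR drives the same staged level again.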

 

Last Edited: Thu. Jan 12, 2017 - 11:23 AM

Just an aside but:

ISR(USI_START_vect, __attribute__ ((naked)) )
{

might be better written as:

ISR(USI_START_vect, ISR_NAKED )
{

At first I thought your suggested code may not be safe if it performed an AND. But luckily (because it is GPIOR0) it does not:

00000034 <__vector_15>:
#include <avr/io.h>
#include <avr/interrupt.h>

ISR(USI_START_vect, ISR_NAKED )
{
    if (GPIOR0 & 1<<0)
  34:	98 99       	sbic	0x13, 0	; 19
        PORTD |= 1<<0;
  36:	90 9a       	sbi	0x12, 0	; 18
    if (!(GPIOR0 & 1<<0))
  38:	98 9b       	sbis	0x13, 0	; 19
        PORTD &= ~(1<<0);
  3a:	90 98       	cbi	0x12, 0	; 18

    GPIOR0 |= 1<<1;
  3c:	99 9a       	sbi	0x13, 1	; 19
    reti();
  3e:	18 95       	reti

 


Thanks very much for the suggestions, I'm doing my best to keep up. I had a few boring tasks to attend to so haven't had more time to devote yet. theusch, I'll try and give you the answers you asked for...

 


 

- what do you mean by optimisation level? 

- what do you mean by pin drive? 

- the maximum event rate, if I understand you correctly, is the 100kHz. Both interrupts must be able to fire within that time; missed events are drop-dead bad, delayed event handlers are tolerable but should be minimised.

- the pattern changes at user discretion, I don't know how often - it's controlled by an encoder so maybe every 10 seconds, or maybe someone will feel adventurous and give it a full spin, in which case I suppose about 50ms? We all get carried away at times
- ok, which variables would you have to know about in this context? You mean volatile as in do they change in the rest of the program?