optimization -Os vs -O3 effect on: _delay_ms and ISR timings

Go To Last Post
20 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm using an ATtiny2313 to drive a constant current sink led driver (essentially an 8bit shift register) using the USI-SPI hardware.

In principal everything works as expected, but the delays I insert for debugging purposes (adjusting the led driver current pot etc.) are way off when using -Os optimization level. They're about a factor of 4.5 to 5.0 too long. I've checked the external oscillator and it runs at about 7.78MHz (should be 8.00MHz) and the DIV8 fuse is not active as well. At least that's what a web based fuse calculator tells me. The time discrepancy stays at about 4.5 if I wrongly change F_CPU to whatever value.

Also it seems that with -Os my soft-pwm ISR (or something inside - NO delays in there btw.) runs at a different speed (slower I guess) as compared to -O3. The effect of this is a noticeable difference in LED brightness.

I've compared the generated assembly code and can see that the ISR gets a few more push/pops depending upon optimization level, but the rest eludes me. Especially the major delay discrepancy disturbs me. The ISR (timer1_compa_vect, CTC mode) timing can be adjusted by playing with OCR1A if necessary.

I've attached my Makefile as well.

The code:

// moved to makefile // #define F_CPU 8000000UL
#include 
#include 
#include 
#include 

#define __LATCH_LOW PORTB &= ~(1 << PB4)
#define __LATCH_HIGH PORTB |= (1 << PB4)
#define __DISPLAY_ON PORTB &= ~_BV(PB1)
#define __DISPLAY_OFF PORTB |= _BV(PB1)

#define __max_brightness 128 // higher numbers at your own risk - should be divisible by 8 ;-)
#define __pwm_loop_max __max_brightness - 1
#define __OCR1A ( 0x0010 * (__max_brightness / 8) )

void setup(void);
void __delay_ms(uint16_t ms);
void current_calib(void);
void setup_timer1_ctc(void);
uint8_t spi_transfer(uint8_t data);
int main(void);

volatile uint8_t brightness[8] = {0,0,0,0,0,0,0,0};

int main(void) {
	setup();
	cli();
	__DISPLAY_OFF;
	for(;;) {} // stop here
};

void setup(void) {
  DDRB |= _BV(PB0); // set LED pin as output
  PORTB |= _BV(PB0); // turn the LED on
  DDRB |= _BV(PB1); // display enable pin as output
  PORTB |= _BV(PB1); // pullup on
  
  // USI stuff
  
  DDRB |= _BV(PB6); // as output (DO)
  DDRB |= _BV(PB7); // as output (USISCK)
  DDRB &= ~_BV(PB5); // as input (DI)
  PORTB |= _BV(PB5); // pullup on (DI)

  sei(); // turn global irq flag on

  setup_timer1_ctc();
  current_calib();
}

void current_calib(void) {
  uint8_t ctr1;
  for(ctr1 = 0; ctr1 <= 7; ctr1++) {
    brightness[ctr1] = 255;
  }
  __delay_ms(5000);
}

void __delay_ms(uint16_t ms) {
  uint16_t ctr1;
  for(ctr1 = 0; ctr1 < ms; ctr1++ ){
    _delay_ms(1);	  
  }
}

/*
Functions dealing with hardware specific jobs / settings
*/

uint8_t spi_transfer(uint8_t data) {
  USIDR = data;
  USISR = _BV(USIOIF); // clear flag

  while ( (USISR & _BV(USIOIF)) == 0 ) {
   USICR = (1<<USIWM0)|(1<<USICS1)|(1<<USICLK)|(1<<USITC);
  }
  return USIDR;
}

void setup_timer1_ctc(void)
{
	uint8_t _sreg = SREG;	/* save SREG */
	cli();			/* disable all interrupts while messing with the register setup */

	/* multiplexed TRUE-RGB PWM mode (quite dim) */
	/* set prescaler to 256 */
	TCCR1B |= (_BV(CS12));
	TCCR1B &= ~(_BV(CS11) | _BV(CS10));
	/* set WGM mode 4: CTC using OCR1A */
	TCCR1A &= ~(_BV(WGM11) | _BV(WGM10));
	TCCR1B |= _BV(WGM12);
	TCCR1B &= ~_BV(WGM13);
	/* normal operation - disconnect PWM pins */
	TCCR1A &= ~(_BV(COM1A1) | _BV(COM1A0) | _BV(COM1B1) | _BV(COM1B0));
	/* set top value for TCNT1 */
	OCR1A = __OCR1A;
	/* enable COMPA isr */
	TIMSK |= _BV(OCIE1A);
	/* restore SREG with global interrupt flag */
	SREG = _sreg;
}

ISR(TIMER1_COMPA_vect)
{				/* Framebuffer interrupt routine */
	uint8_t pwm_cycle;

	__DISPLAY_ON;		// only enable the drivers when we actually have time to talk to them

	for (pwm_cycle = 0; pwm_cycle <= __pwm_loop_max; pwm_cycle++) {

		uint8_t led;
		uint8_t red = 0x00;	// off

		for (led = 0; led <= 7; led++) {
			if ( pwm_cycle < brightness[led] ) {
				red |= _BV(led);
			}
			else {
				red &= ~_BV(led);
			}
		}

		__LATCH_LOW;
		spi_transfer(red);
		__LATCH_HIGH;

	}

	__DISPLAY_OFF;		// we're done with this line, turn the driver's off until next time

}

I'm aware of discussions about avr-libc and _delay_ms() issues. I tried version 1.7.2 (I think), but to no avail. My standard version is 1.7.0.x, which is the one I can get from the openSUSE repos.

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

For _delay_ms, there will be no difference between -O3 and -Os. However there may be a difference for your __delay_ms (with two '_'s) since the loop overhead may be different. But I would be more concerned about your ISR. You are doing far more than you should be in an ISR.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes I know... for this micro it doesn't make much sense, as it won't have to do much else except waiting for a few key presses. All of it can be done in main(). It's just a simple kitchen lamp after all.

A bit of history regarding the ISR. I ported it over from one of my ATmega168 projects (8x8 RGB matrix with 15bit colors) and wanted to see if it would work here as well. On the ATmega it's nice to push the frame-buffer stuff out of the way. It takes about 50% cpu time, but other things like serial or I2C still work. It only serves 1 scanline at a time, so it doesn't block for too long. Using TLC5947s would be much better, but these are notoriously out of stock and not exactly cheap.

Maybe the 'malfunctioning' of the __delay_ms() and the ISR oddity are just a freak coincidence. Both depend on the optimization level apparently.

I'll have to check on my scope if _delay_ms() is truly independent of this.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Your mention of "factor of 4" rings alarm bells. Could it be that you are using "AVR Toolchain" rather than "WinAVR" by any chance? It has a serious fault in that introduces exactly a factor of 4 error.

If it is the case that -O3 rather than -Os quells this issue then you may have just discovered a clever workaround to the problem! :-)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Software delays count a fixed number of cpu cycles. Your lengthy interrupt routine will steal cycles and make the delay longer. With O3 the interrupt would run faster making the delay seem shorter. If that's the reason for a 5x longer delay then the interrupt is using 80% of the CPU.
Another possibility is rapid hardware or software resets that make it look like the program is running slowly when what it is really doing is executing the startup code a lot. A distinctive startup sequence would flag that. In your case you can kill two birds with one stone with

main {
cli();
(led_on)
_delay_ms(1000); // one second led on startup
(led_off)
_delay_ms(100);
(initialize)
sei(); //allow interrupts now
(led_on)
_delay_ms(1000); //will be longer than a second
(led_off)
...
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

No resets. I would have seen that on my scope ;-)

int main(void) {
        /*
         * this runs with interrupts on and should light
         * a led bar for 5 seconds unsing __delay_ms(5000);
         * it turns out OK with -O3 (still a little of)
         * and is way off with -Os (factor 4 or 5)
         *
         */
        setup();

        /*
         * with interrupts off the next delays are 100%
         * correct regardless of opt. level
         */
        cli();
        PORTB |= _BV(PB0); // debug led on
        __delay_ms(5000);
        PORTB &= ~_BV(PB0); // off again
        __DISPLAY_OFF;
        for(;;) {} // stop here
};

But it IS correct that the ISR is interleaved into the delays and makes them longer. Letting a simple led blink with interrupts disabled gives perfect delays for the LED regardless of -Os or -O3.

A delay of 5000ms is totally wrong with -Os and only slightly off (by human standards) with -O3. Interrupts enabled in this case.

I've measured one ISR run to take these times:

with -Os: 6.4ms
with -O3: 2.3ms

The call rate is the same in both cases, about 8.4ms. This is set by OCR1A and the prescaler and I get the same results when calculating it with the measured oscillator frequency.

I'll do some scaling tests next.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

OK. Something is fishy here...

I've taken measurements at desired delays of 100ms, 200ms, 300ms and 400ms.

During a delay calculated for 400ms, the ISR gets called about about 48 times. This would add 48x2.3ms = 110.4ms in case of -O3 or 48x6.4ms = 307.2ms in case of -Os to the desired delay.

I measured these times:

-Os: 1740ms
-O3: 540ms

Correcting the measurement for these additional delays gives:

-Os: 1433ms (still not ok)
-O3: 430ms (pretty close)

Where's all the time getting lost?

The next thing to do would be to just use a dummy ISR that gets called but doesn't have any code in it.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

madworm wrote:
A delay of 5000ms is totally wrong with -Os and only slightly off (by human standards) with -O3. Interrupts enabled in this case.

And with interrupts disabled ?

Quote:
I've measured one ISR run to take these times:

with -Os: 6.4ms
with -O3: 2.3ms

The call rate is the same in both cases, about 8.4ms. This is set by OCR1A and the prescaler and I get the same results when calculating it with the measured oscillator frequency.


You interrupts are way too long. I would not trust them at all. An interrupt should run in the us range. Also, you seen to run delays in the interrupt this is directly in contradiction with the rule that an interrupt should finish as fast as possible. Without any delay !

Markus

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I see no reason to use another level as -Os.
On all other levels only the Flash size grows, without no positive influence to the CPU load.

Even if -Os suggest size optimization, its not true.
Often it reorder the code to use more Flash, than needed.
E.g. such statements are always disoptimized by inserting an additional jump:

  val = 0;
  if( condition )
    val = 1;

But its the best optimization, which available.

Peter

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

There are NO artificial delays in my ISR. It does what it has to do and exits. It's just a lot of stuff...

One option I've already tested is to remove the loop in the ISR and thereby chop it into thinner time slices, but the total runtime would be the same. And total ISR runtime is directly proportional to brightness in this case. This also increases overhead.

No matter how I shift things back and forth, there's only one sweet spot that gives good brightness and acceptable responsiveness of the thing. Seems I'll have to accept my _delay_ms() issue.

A dedicated led driver with included pwm circuitry would be the best thing - If only I didn't have a ton of these pwm-less driver chips sitting here.

Time to make extra crunchy croûtons for special friends out of them.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

If something is timing critical then why on earth are you using interrupt sensitive software delay loops anyway? It's crying out for the use of a timer interrupt. Put the none time critical work in the main() loop and ONLY the things that must happen at exact moments in time into timer ISR(s). Have the ISRs as short as possible - the absolute minimum work that is truly timing critical.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You're absolutely correct.

But what can you expect from a dumbed down A. user that is getting senile as well? This is what happens if people choose to live without their 'security blanket' and venture into the wild.

Time to steal the code for the system ticker ;-)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I have a system timer ticking now. Some jitter and my old scope (>35 years) doesn't trigger on these signals properly anymore.

Still the framebuffer ISR must be called so often/long to keep the brightness up, that everything else is affected by it. I need pwm-enabled led drivers with internal grayscale clock and compare units.

End of story.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Your program is an ideal case for Simulation.
You will see exactly how many us are taken in the ISR().
You will see exactly what effect this has on the main program i.e. __delay_ms().

But from a brief glance, your ISR() has a 128 x 8 loop. That is 1024 !
Your Timer1 IRQ can occur as frequently as 16 x 256 clock cycles (512us). Admittedly your ISR() loop count is proportionately shorter.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

madworm wrote:
OK. Something is fishy here...

During a delay calculated for 400ms, the ISR gets called about about 48 times. This would add 48x2.3ms = 110.4ms in case of -O3 or 48x6.4ms = 307.2ms in case of -Os to the desired delay.

I measured these times:

-Os: 1740ms
-O3: 540ms

Correcting the measurement for these additional delays gives:

-Os: 1433ms (still not ok)
-O3: 430ms (pretty close)

Where's all the time getting lost?


Just to wrap up what you probably already realize, the ISR gets called 48 times in 400 msec. That is not the same thing as being called 48 times during a software delay calculated for 400 msec. If the ISR takes 100% of CPU it will execute continuously and the software delay will never finish (well actually it might get one instruction between each interrupt).

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Simulation... maybe I'll go that way. If I find a simulator that isn't proprietary software. But not tonight.

Death to Micro$oft btw. There I said it.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

dak664 wrote:

If the ISR takes 100% of CPU it will execute continuously and the software delay will never finish (well actually it might get one instruction between each interrupt).

Yes I know this, and have seen the effect of my ISR starving the rest of the system. I adjust the timing so it doesn't go above about 66% load using my scope. Making the ISR speedier decreases brightness, reducing the amount of work done per invocation increases the overhead... and so on. Hardware PWM modules that run independently are such a nice thing, brute force software PWM sucks.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

No. With a little 'design' effort, you can make your software PWM work very efficiently.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You can get precise timing with interrupts AND no code in the handler by using the interrupt itself to wake the CPU from sleep, do your 'job' and then sleep again. It's really trivial and is completely foolproof. Oh and saves power too obviously ;D

Here is an example code straight out of simavr that does exactly that. BTW simavr is probably exactly what you need if you want to check timings on interrupts and events :D

volatile uint16_t tcnt;

ISR(TIMER2_COMPA_vect)		// handler for Output Compare 2 overflow interrupt
{
	// this really doesn't no anything but proves a way to wake the main()
	// from sleep at regular intervals
	PORTB ^= 1;
}

int main()
{	
	//
	// start the 16 bits timer, with default "normal" waveform
	// and no interupt enabled. This just increments TCNT1
	// at a regular rate
	//
	// timer prescaler to 64
	TCCR1B |= (0<<CS12 | 1<<CS11 | 1<<CS10);

	DDRB = 5;
	
	//
	// now enable a tick counter
	// using an asynchronous mode
	//
	ASSR |= (1 << AS2);		// use "external" 32.7k crystal source
	// use CLK/8 prescale value, clear timer/counter on compareA match
	// toggle OC2A pin too
	TCCR2A = (1 << WGM21) | (1 << COM2A0);
    TCCR2B = (2 << CS20); // prescaler
    OCR2A = 63;	// 64 hz
    TIMSK2  |= (1 << OCIE2A);

	sei();
	
	int count = 0;
    while (count++ < 100) {
    	// we read TCNT1, which should contain some sort of incrementing value
    	tcnt = TCNT1;		// read it
    	if (tcnt > 10000) {
    		TCNT1 = 500;	// reset it arbitrarily
    		PORTB ^= 2;		// mark it in the waveform file
    	}
		sleep_cpu();    	// this will sleep until a new timer2 tick interrupt occurs
	}
	// sleeping with interrupt off is interpreted by simavr as "exit please"
	cli();
	sleep_cpu();
}

This generates this curves:

Author of simavr - Follow me on twitter : @buserror

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'll have to checkout simavr when my brain is up to normal again - whenever that is.

I've fundamentally rewritten my ISR and now it executes in 100µs. CPU load is about 8%. But now there are some other issues. It still needs a correction for the runtime, as the error is cumulative. But I won't deal with that right now. Busy with other stuff.

Here's the code to dissect:

typedef struct {
	uint8_t number;
	uint8_t dutycycle;
} led_t;

// approx (1-dutycycle) - must be sorted ascending with respect to the 2nd number
volatile led_t brightness[8] = { {0, 10}, {1, 20}, {2, 30}, {3, 40}, {4, 50}, {5, 60}, {6, 70}, {7, 80} };

void setup_timer1_ctc(void)
{
	uint8_t _sreg = SREG;	/* save SREG */
	cli();			/* disable all interrupts while messing with the register setup */

	/* multiplexed TRUE-RGB PWM mode (quite dim) */
	/* set prescaler to 1024 */
	TCCR1B |= (_BV(CS10) | _BV(CS11));
	TCCR1B &= ~(_BV(CS11));
	/* set WGM mode 4: CTC using OCR1A */
	TCCR1A &= ~(_BV(WGM11) | _BV(WGM10));
	TCCR1B |= _BV(WGM12);
	TCCR1B &= ~_BV(WGM13);
	/* normal operation - disconnect PWM pins */
	TCCR1A &= ~(_BV(COM1A1) | _BV(COM1A0) | _BV(COM1B1) | _BV(COM1B0));
	/* set top value for TCNT1 */
	OCR1A = __OCR1A_max;
	/* enable COMPA isr */
	TIMSK |= _BV(OCIE1A);
	/* restore SREG with global interrupt flag */
	SREG = _sreg;
}

ISR(TIMER1_COMPA_vect)
{				/* Framebuffer interrupt routine */
	__LED0_ON;
	static uint8_t data = 0;	// init as off
	static uint8_t index = 0;

	/* starts with index = 0 */
	/* now calculate when to run the next time and turn on LED0 */
	if (index == 0) {
		OCR1A =
		    (uint16_t) ((uint32_t) (brightness
.dutycycle) * (uint32_t) (__OCR1A_max) / (uint32_t) (100)); index++; } else if (index == 8) { // the last led in the row data |= _BV(brightness[(index - 1)].number); /* calculate when to turn everything off */ OCR1A = (uint16_t) (((uint32_t) (100) - (uint32_t) (brightness[(index - 1)]. dutycycle)) * (uint32_t) (__OCR1A_max) / (uint32_t) (100)); index++; } else if (index == 9) { /* cycle completed, reset everything */ data = 0; index = 0; /* immediately restart */ OCR1A = 0; /* DON'T increase the index counter ! */ } else { /* turn on the LED we deciced to turn on in the last invocation */ data |= _BV(brightness[(index - 1)].number); /* calculate when to run the next time and turn on the next LED */ OCR1A = (uint16_t) (((uint32_t) (brightness
.dutycycle) - (uint32_t) (brightness[(index - 1)]. dutycycle)) * (uint32_t) (__OCR1A_max) / (uint32_t) (100)); index++; } __LATCH_LOW; spi_transfer(data); __LATCH_HIGH; __LED0_OFF; }