A modulus operation takes 40us on an ATmega328P?


Hi there freaks,

Hours are ticking by like seconds, and Google turns up only garbage...

I have now tried a lot of variations of a simple if clause and have come to the conclusion that this is a problem that I don't think is normal. My scope shows 40us for this if clause, which I think is way too much. From the assembler code of another variation I counted 31 clocks (which I calculated to be around 2us at 16MHz).

I have this code:

#define TIMING_TEST_PIN 6

int volatile pos = 16;

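void setup() {
    pinMode(TIMING_TEST_PIN, OUTPUT);  // Arduino pin 6 = PD6, the scope marker
    pinMode(LED_BUILTIN, OUTPUT);      // pin 13 = PB5, written via PORTB in test2()

}

void test2(){

    PORTD |= (1<<TIMING_TEST_PIN);     // marker high: start of measured section
    if(micros() % (1000L*1000) < 20000){
      PORTB &= ~(1<<5); // LED_BUILTIN(13) -> PB5 bit5
      //delayMicroseconds(10);
      //pos = pos+1;
    }
    PORTD &= ~(1<<TIMING_TEST_PIN);    // marker low: end of measured section

}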

void loop(){
    test2();
    delayMicroseconds(100);

}

Does anyone have a clue?

 

UPDATE:

I was able to shrink the code and still have the problem:

#define TIMING_TEST_PIN 6

unsigned volatile long var = 0;
unsigned volatile long var2 = 0;

void setup() {
    pinMode(TIMING_TEST_PIN, OUTPUT);
    pinMode(5, OUTPUT);

}

void test2(){
    PORTD |= (1<<TIMING_TEST_PIN);
    var2 = var % 1000000L;
    PORTD &= ~(1<<TIMING_TEST_PIN);
}

void loop(){
    test2();
    delayMicroseconds(100);

}

Is that bloody modulus operator taking all that time?

 

I use that quite often to time some functions. I don't know how else I could achieve this without using a timer (the timers are already in use).

Yes, I use Google, but there is no search engine within Google to search through all those useless results...

Last Edited: Wed. Mar 22, 2017 - 09:31 PM

You need to tell us more.

 

In particular, this is an Arduino sketch, right?  So it could be interrupting and doing other stuff.  [just for fun, turn off interrupts during your test sequence--but I don't think that's it]

 

But anyway, I'd like to see the 31 clocks code.  You are invoking the micros() function.  I'm guessing that this might be e.g. a 32-bit value with timer overflows counted, augmented by a microsecond tick count.  That's going to take a few microseconds.
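For reference, the stock Arduino core's micros() does roughly the following (as I recall wiring.c, for a 16 MHz part with Timer0 prescaled by 64): it combines the overflow count with the current TCNT0 value and scales the result to microseconds. The host-side function below is a hypothetical model of just that arithmetic; the real function also reads the counters with interrupts disabled and corrects for an overflow that is pending but not yet serviced.

```c
#include <stdint.h>

/* Hypothetical model of the micros() arithmetic, assuming F_CPU = 16 MHz
 * and a Timer0 prescaler of 64: one Timer0 tick is 64 / 16 = 4 us and
 * one 8-bit overflow is 256 ticks = 1024 us. */
#define CYCLES_PER_US     16u
#define TIMER0_PRESCALER  64u

static uint32_t micros_from_counts(uint32_t overflow_count, uint8_t tcnt0)
{
    /* Overflows supply the high bits, TCNT0 the low 8 bits of a tick count. */
    uint32_t ticks = (overflow_count << 8) + tcnt0;
    /* Scale ticks to microseconds; the 32-bit result wraps about every
     * 71.6 minutes, as micros() does. */
    return ticks * (TIMER0_PRESCALER / CYCLES_PER_US);
}
```

Even this is a 32-bit shift, add and multiply on an 8-bit CPU, which fits the "few microseconds" guess; the % applied to its result is a different matter.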

 

Then you force a modulus operation with a very large divider.  That could indeed take 10s of microseconds.

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


The compare will first have to AND the result from micros() (which must be 32-bit) with 1000L * 1000. Just getting 4 bytes out via the stack will take a few cycles. Then it will have to do a multi-byte compare. So even that is not going to be speedy.

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net


40us is 640 cycles at 16MHz. That sounds quite possible for a divide.


You're forcing a 32-bit division for the modulus operation, so 40us doesn't seem too unlikely.  I can't find benchmark figures for modulus specifically, but floating-point division (which is similar, but only 24 bits wide) reportedly takes 400+ cycles.  (A divide takes several operations for each bit of the operand, and each operation has to handle 32 bits, i.e. 4 bytes.)
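To make the "several operations per bit" point concrete, here is a plain C sketch of restoring shift-and-subtract division, the scheme used by the __udivmodsi4 routine that is disassembled later in this thread. The function name is made up; this is a host-side illustration, not the library code.

```c
#include <stdint.h>

/* Restoring shift-and-subtract division: one pass per dividend bit,
 * each pass doing a 32-bit shift, a 32-bit compare and maybe a 32-bit
 * subtract. Returns the remainder; stores the quotient if asked.
 * As with %, the divisor should be non-zero. */
static uint32_t mod_shift_subtract(uint32_t dividend, uint32_t divisor,
                                   uint32_t *quotient_out)
{
    uint32_t rem = 0, quot = 0;
    for (int bit = 31; bit >= 0; bit--) {
        rem = (rem << 1) | ((dividend >> bit) & 1u);  /* bring down next bit */
        quot <<= 1;
        if (rem >= divisor) {       /* divisor fits: subtract, set quotient bit */
            rem -= divisor;
            quot |= 1u;
        }
    }
    if (quotient_out)
        *quotient_out = quot;
    return rem;
}
```

The loop runs 32 times regardless of the operands; only the subtract step is conditional, so the run time varies by just a few cycles per set quotient bit.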


Oooh, I didn't see that the operation is "%" rather than "&". Big difference.

 

Jim


I thought the title was "simple if..."

 

We still haven't seen the generated code.

 

One can plug a few numbers into a simulator program to see how long it takes.

 

Helpful if one knows the AVR model and clock speed and toolchain and version and optimization level, without trying to infer it all.

 

Racing to count microseconds with looping code wouldn't be my first approach.


Tobey wrote:

...40us...

 

I measure 47.6usecs at 16MHz clock using CVAVR.

#1 This forum helps those that help themselves

#2 All grounds are not created equal

#3 How have you proved that your chip is running at xxMHz?

#4 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand." - Heater's ex-boss


Tobey wrote:

I don't know how else I could achieve this without using a timer (the timers are already in use).

 

It's often possible to use one timer to do multiple things.


Brian Fairchild wrote:
I measure 47.6usecs at 16MHz clock using CVAVR.

IIRC from previous threads where divide speed was measured, the time will depend on the values of the operands.  [Also] IIRC, CodeVision uses the "repeated subtract of powers of 10" and would be consistent with that.
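The thread doesn't settle what CodeVision actually does (the question comes up again further down), but the generic technique of repeated subtraction of powers of 10 looks like the sketch below. It is a hypothetical illustration, not CodeVision's code, of why such a routine's time depends on the operand: each decimal digit d costs d subtractions, so the loop count tracks the digit sum.

```c
#include <stdint.h>

/* Decimal digits of a 32-bit value by repeated subtraction of powers of
 * ten (no divide needed). Writes a NUL-terminated string to out (which
 * must hold 11 chars), returns its length, and reports how many
 * subtractions were performed. */
static unsigned to_decimal_by_subtraction(uint32_t value, char out[11],
                                          unsigned *subtractions)
{
    static const uint32_t pow10[10] = {
        1000000000u, 100000000u, 10000000u, 1000000u, 100000u,
        10000u, 1000u, 100u, 10u, 1u
    };
    unsigned len = 0, subs = 0;
    for (int i = 0; i < 10; i++) {
        char digit = '0';
        while (value >= pow10[i]) {   /* count how often this power fits */
            value -= pow10[i];
            digit++;
            subs++;
        }
        if (digit != '0' || len > 0 || i == 9)  /* skip leading zeros */
            out[len++] = digit;
    }
    out[len] = '\0';
    if (subtractions)
        *subtractions = subs;
    return len;
}
```

For 47 that is 4 + 7 = 11 subtractions; for 4294967295 it is 57, so the run time spreads far more than with the fixed 32-pass binary loop.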

 

Back when I was your age, I had an AT90S4433 app running at about 4MHz, driving a 4-digit 7-segment display.  Taking an ADC reading, converting it to the selected pressure units, and displaying it with the decimal point in the right spot took just shy of a millisecond.  Surprisingly, that was nearly the same with three different approaches.

 

 


theusch wrote:
Helpful if one knows the AVR model and clock speed and toolchain and version and optimization level, without trying to infer it all.

The fact that it is Arduino code possibly narrows the guessing game a little, but there have been several Arduino releases with varying avr-gcc versions and build options, and of course we don't know if the Arduino is a mega2560 or a mega328 or what.

 

Anyway, here is my guess:

C:\SysGCC\avr\bin>type avr.c
#include <avr/io.h>

uint32_t a, b, c;

int main(void){

        a = b % c;
        return 0;
}
C:\SysGCC\avr\bin>avr-gcc -mmcu=atmega328p -Os -g avr.c -o avr.elf

C:\SysGCC\avr\bin>avr-objdump -S avr.elf

avr.elf:     file format elf32-avr


Disassembly of section .text:

00000000 <__vectors>:
   0:   0c 94 34 00     jmp     0x68    ; 0x68 <__ctors_end>
   4:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
   8:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
   c:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  10:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  14:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  18:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  1c:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  20:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  24:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  28:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  2c:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  30:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  34:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  38:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  3c:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  40:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  44:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  48:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  4c:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  50:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  54:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  58:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  5c:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  60:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>
  64:   0c 94 46 00     jmp     0x8c    ; 0x8c <__bad_interrupt>

00000068 <__ctors_end>:
  68:   11 24           eor     r1, r1
  6a:   1f be           out     0x3f, r1        ; 63
  6c:   cf ef           ldi     r28, 0xFF       ; 255
  6e:   d8 e0           ldi     r29, 0x08       ; 8
  70:   de bf           out     0x3e, r29       ; 62
  72:   cd bf           out     0x3d, r28       ; 61

00000074 <__do_clear_bss>:
  74:   21 e0           ldi     r18, 0x01       ; 1
  76:   a0 e0           ldi     r26, 0x00       ; 0
  78:   b1 e0           ldi     r27, 0x01       ; 1
  7a:   01 c0           rjmp    .+2             ; 0x7e <.do_clear_bss_start>

0000007c <.do_clear_bss_loop>:
  7c:   1d 92           st      X+, r1

0000007e <.do_clear_bss_start>:
  7e:   ac 30           cpi     r26, 0x0C       ; 12
  80:   b2 07           cpc     r27, r18
  82:   e1 f7           brne    .-8             ; 0x7c <.do_clear_bss_loop>
  84:   0e 94 48 00     call    0x90    ; 0x90 <main>
  88:   0c 94 87 00     jmp     0x10e   ; 0x10e <_exit>

0000008c <__bad_interrupt>:
  8c:   0c 94 00 00     jmp     0       ; 0x0 <__vectors>

00000090 <main>:

uint32_t a, b, c;

int main(void){

        a = b % c;
  90:   60 91 00 01     lds     r22, 0x0100
  94:   70 91 01 01     lds     r23, 0x0101
  98:   80 91 02 01     lds     r24, 0x0102
  9c:   90 91 03 01     lds     r25, 0x0103
  a0:   20 91 04 01     lds     r18, 0x0104
  a4:   30 91 05 01     lds     r19, 0x0105
  a8:   40 91 06 01     lds     r20, 0x0106
  ac:   50 91 07 01     lds     r21, 0x0107
  b0:   0e 94 65 00     call    0xca    ; 0xca <__udivmodsi4>
  b4:   60 93 08 01     sts     0x0108, r22
  b8:   70 93 09 01     sts     0x0109, r23
  bc:   80 93 0a 01     sts     0x010A, r24
  c0:   90 93 0b 01     sts     0x010B, r25
        return 0;
  c4:   80 e0           ldi     r24, 0x00       ; 0
  c6:   90 e0           ldi     r25, 0x00       ; 0
  c8:   08 95           ret

000000ca <__udivmodsi4>:
  ca:   a1 e2           ldi     r26, 0x21       ; 33
  cc:   1a 2e           mov     r1, r26
  ce:   aa 1b           sub     r26, r26
  d0:   bb 1b           sub     r27, r27
  d2:   fd 01           movw    r30, r26
  d4:   0d c0           rjmp    .+26            ; 0xf0 <__udivmodsi4_ep>

000000d6 <__udivmodsi4_loop>:
  d6:   aa 1f           adc     r26, r26
  d8:   bb 1f           adc     r27, r27
  da:   ee 1f           adc     r30, r30
  dc:   ff 1f           adc     r31, r31
  de:   a2 17           cp      r26, r18
  e0:   b3 07           cpc     r27, r19
  e2:   e4 07           cpc     r30, r20
  e4:   f5 07           cpc     r31, r21
  e6:   20 f0           brcs    .+8             ; 0xf0 <__udivmodsi4_ep>
  e8:   a2 1b           sub     r26, r18
  ea:   b3 0b           sbc     r27, r19
  ec:   e4 0b           sbc     r30, r20
  ee:   f5 0b           sbc     r31, r21

000000f0 <__udivmodsi4_ep>:
  f0:   66 1f           adc     r22, r22
  f2:   77 1f           adc     r23, r23
  f4:   88 1f           adc     r24, r24
  f6:   99 1f           adc     r25, r25
  f8:   1a 94           dec     r1
  fa:   69 f7           brne    .-38            ; 0xd6 <__udivmodsi4_loop>
  fc:   60 95           com     r22
  fe:   70 95           com     r23
 100:   80 95           com     r24
 102:   90 95           com     r25
 104:   9b 01           movw    r18, r22
 106:   ac 01           movw    r20, r24
 108:   bd 01           movw    r22, r26
 10a:   cf 01           movw    r24, r30
 10c:   08 95           ret

0000010e <_exit>:
 10e:   f8 94           cli

00000110 <__stop_program>:
 110:   ff cf           rjmp    .-2             ; 0x110 <__stop_program>

I'm not in a position to simulate that, so I can't say how many cycles it takes. But clearly there's a fairly big loop there being executed 32 times!
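Hand-counting the listing above gives an estimate. Assuming the usual ATmega328P instruction timings (one cycle per ALU instruction, branches one cycle if not taken and two if taken, rjmp two, call and ret four each), a loop pass costs 17 cycles when it skips the subtract and 20 when it takes it, and the subtract is taken once per set bit of the quotient q. This is a paper estimate, not a simulator run:

```latex
% Assumed: ALU ops 1 cycle; branch 1 (not taken) / 2 (taken); rjmp 2; call 4; ret 4.
% q = b / c; popcount(q) = number of set bits = number of passes that subtract.
\begin{aligned}
T_{\text{entry}} &= 4 + 5 + 2 + (4 + 1 + 2) = 18
  && \text{call; ldi \ldots movw; rjmp; first pass through \_ep} \\
T_{\text{skip}} &= 4 + 4 + 2 + (4 + 1 + 2) = 17
  && \text{shift, compare, brcs taken, \_ep} \\
T_{\text{sub}} &= 4 + 4 + 1 + 4 + (4 + 1 + 2) = 20
  && \text{shift, compare, brcs not taken, subtract, \_ep} \\
T_{\text{exit}} &= 4 + 4 + 4 = 12
  && \text{com, movw, ret} \\
T &= T_{\text{entry}} + 32\,T_{\text{skip}} + 3\,\mathrm{popcount}(q) - 1 + T_{\text{exit}}
  && \text{last brne not taken} \\
  &= 573 + 3\,\mathrm{popcount}(q)
\end{aligned}
```

For c = 1,000,000 the quotient is at most 4294, which has at most 12 set bits, so this comes to 573 to 609 cycles, about 35.8 to 38.1us at 16MHz. The eight lds and four sts around the call in this test program add another 24 cycles (1.5us), which is in the same ballpark as the 40us on the OP's scope.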


theusch wrote:

...from previous threads where divide speed was measured, the time will depend on the values of the operands.  [Also] IIRC, CodeVision uses the "repeated subtract of powers of 10" and would be consistent with that.

 

Over a cup of coffee I tried divisors from 10 up to the OP's 1,000,000 (in steps of x10), and the spread in timing was 1.5us.


Brian Fairchild wrote:

Tobey wrote:

I don't know how else I could achieve this without using a timer (the timers are already in use).

 

It's often possible to use one timer to do multiple things.

 

I searched for the interrupt handler and found it in wiring.c:

 

#if defined(__AVR_ATtiny24__) || defined(__AVR_ATtiny44__) || defined(__AVR_ATtiny84__)
ISR(TIM0_OVF_vect)
#else
ISR(TIMER0_OVF_vect)
#endif
{
	// copy these to local variables so they can be stored in registers
	// (volatile variables must be read from memory on every access)
	unsigned long m = timer0_millis;
	unsigned char f = timer0_fract;

	m += MILLIS_INC;
	f += FRACT_INC;
	if (f >= FRACT_MAX) {
		f -= FRACT_MAX;
		m += 1;
	}

	timer0_fract = f;
	timer0_millis = m;
	timer0_overflow_count++;
}

How can I define a function in my source file that is called from this interrupt?


Last Edited: Thu. Mar 23, 2017 - 06:54 PM

You can't. So you need to disable that copy and implement your own. If you want to retain what that one was doing, then you need to do that work from there too.


Tobey wrote:

I don't know how else I could achieve this without using a timer (the timers are already in use).

You haven't really said what you are trying to time, what the minimum and maximum durations are, and what accuracy is needed.

 

Your simple if() example seems to imply 20ms.  So a simple "soft timer" using millis() would be my first guess.  But tell us all the needed parameters and specifications.  Tell us what else is going on in the app such that all the timers are used.  Tell >>how<< they are used.  For example, if one timer is free-running doing PWM, it can also be used to time things to a resolution of one timer tick.
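A minimal sketch of such a soft timer, assuming it is polled from loop() with millis() as the time source. The struct and function names are hypothetical, and "now" is a parameter so the logic can be checked on a PC. The key points are that it compares elapsed time rather than absolute time, and that unsigned subtraction keeps that comparison correct when the 32-bit counter wraps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical soft timer: fires once per interval of "now" units. */
typedef struct {
    uint32_t last;      /* scheduled time of the previous firing */
    uint32_t interval;  /* period, same units as now (e.g. ms) */
} soft_timer;

static bool soft_timer_due(soft_timer *t, uint32_t now)
{
    /* Elapsed time via unsigned subtraction stays correct across the
     * 32-bit wraparound of the time source. */
    if ((uint32_t)(now - t->last) >= t->interval) {
        t->last += t->interval;   /* advance by the period to avoid drift */
        return true;
    }
    return false;
}
```

On the Arduino this would be something like `static soft_timer t = {0, 20};` plus `if (soft_timer_due(&t, millis())) { ... }` in loop(). Advancing `last` by the interval, rather than setting it to `now`, keeps the period from drifting when a check runs a little late; if a check is late by several periods, the timer reports due on successive calls until it has caught up.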


clawson wrote:

You can't. So you need to disable that copy and implement your own. If you want to retain what that one was doing then you need to do the work form there too.

 

Unless you use the ancient forbidden arts and hook that interrupt vector into your own function, like an old DOS virus.

Nah, that's just crazy talk, a flashback to my old days as a teenage hacker. Carry on.


Here is some more information:

I wrote in the title that it is an ATmega328P. It runs at 16MHz.

The reason I was using the modulus is that I wanted to make things happen at an exact interval or at a certain time.

Timer0 is used for millis() and delay().

Timer1 is used for a servo.

Timer2 for tone.

 

Now... this is what I came up with...

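... from wiring.c (modified)

// Shared between the ISR and the sketch, so all are volatile.
volatile unsigned int tobysCounter = 0;
volatile unsigned int tobysCounterTop = 1000;   // hypothetical period in ms; set as needed
volatile unsigned int tobysCounterCompA = 0;
volatile boolean tobysCounterCompFlagA = false;

// Advance the counter by one ms, wrap at Top, and latch the compare flag.
// Checking after every single step means CompA cannot be skipped when
// the ISR advances the counter twice in one overflow.
static inline void tobysCounterTick(void){
	if(++tobysCounter >= tobysCounterTop){
		tobysCounter = 0;
	}
	if(tobysCounter == tobysCounterCompA){
		tobysCounterCompFlagA = true;
	}
}

ISR(TIMER0_OVF_vect)
{
	// copy these to local variables so they can be stored in registers
	// (volatile variables must be read from memory on every access)
	unsigned long m = timer0_millis;
	unsigned char f = timer0_fract;

	m += MILLIS_INC;
	f += FRACT_INC;
	tobysCounterTick();
	if (f >= FRACT_MAX) {
		f -= FRACT_MAX;
		m += 1;
		tobysCounterTick();   // millis() gains an extra ms here too
	}

	timer0_fract = f;
	timer0_millis = m;
	timer0_overflow_count++;
}

// Returns true once per latched compare event, then clears the flag.
boolean tobysCounterCompFlagAResult(){
	if(tobysCounterCompFlagA){
		tobysCounterCompFlagA = false;
		return true;
	}
	else
		return false;
}

void setTobysCounterCompA(unsigned int value){
	noInterrupts();   // 16-bit write: keep the ISR from seeing half of it
	tobysCounterCompA = value;
	interrupts();
}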

Usage:


	if(tobysCounterCompFlagAResult()){//tobysCounterCompare(8000)){
		PORTD |= (1<<TIMING_TEST_PIN);
		delay(1);
	}
	else{
		PORTD &= ~(1<<TIMING_TEST_PIN);
	}

I just realized that this is much better: the flag stays set until the function is called, so the call doesn't have to happen at an exact time, and there is no "double execution" when the window compared against the modulus result is too big.
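The difference can be shown with a small host-side model (all names hypothetical): the window test is true on every pass of loop() that lands inside the window, while a flag that the periodic tick sets and the reader clears is consumed exactly once per period.

```c
#include <stdbool.h>
#include <stdint.h>

/* Window test as in the first post: true for every call whose time falls
 * inside the window, so a fast loop() fires many times per period. */
static bool in_window(uint32_t now_us, uint32_t period_us, uint32_t window_us)
{
    return (now_us % period_us) < window_us;
}

/* Latched flag as in the ISR version: the periodic tick sets it once,
 * and the reader clears it, so each period is handled exactly once. */
static volatile bool event_flag = false;

static void tick_period_elapsed(void)   /* would run in the timer ISR */
{
    event_flag = true;
}

static bool take_event(void)            /* polled from loop() */
{
    if (event_flag) {
        event_flag = false;
        return true;
    }
    return false;
}
```

With a 1,000,000us period and a 20,000us window, a loop() that comes around every 140us or so would see in_window() true over a hundred times per period, whereas take_event() returns true once.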

 

 


IIRC, CodeVision uses the "repeated subtract of powers of 10"

Is that for ALL divisions by powers of 10, or just for number to string conversion?

 

 

The reason I was using the modulus is that I wanted to make things happen at an exact interval or at a certain time.

    if(micros() % (1000L*1000) < 20000){

Doesn't look THAT exact.   Within 20 milliseconds of every second, right?  An AND with 1024*1024-1 would be within 50ms of that (a 1,048,576us period)...  (And on an Arduino, isn't millis() accurate to within 2ms? So (millis() & 1023) < 8 should be within ~10ms...)
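Side by side, as host-side C with hypothetical function names: the original modulus test, the power-of-two version that reduces to a single AND, and the millis() variant with the parentheses it needs. In C, < binds tighter than &, so without them the expression would be ms & (1023 < 8), which is always 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original test: 20 ms window at the start of every 1,000,000 us.
 * Costs a full 32-bit division on the AVR. */
static bool window_mod(uint32_t us)  { return (us % 1000000UL) < 20000UL; }

/* Power-of-two period: 2^20 us = 1,048,576 us, about 48.6 ms longer than
 * one second. The modulus becomes an AND with 2^20 - 1 = 0xFFFFF. */
static bool window_mask(uint32_t us) { return (us & 0xFFFFFUL) < 20000UL; }

/* Same idea on millis(): 1024 ms period, 8 ms window. */
static bool window_ms(uint32_t ms)   { return (ms & 1023UL) < 8UL; }
```

A side benefit of the power-of-two period: 2^32 is a multiple of 2^20, so the masked version keeps its phase when micros() wraps, whereas 2^32 mod 1,000,000 = 967,296, so the % version jumps once every 71.6 minutes or so.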

 


You got me ;-)

Initially it was supposed to be exact, but then a few obstacles came up.

 
