Inefficient (u)int64_t operations

Any ideas why operations with (u)int64_t are so inefficient?

Instead of using an add/adc combination, the compiler for some reason emulates the carry flag in software.

NOTE: I no longer actively read this forum. Please ask your question on www.eevblog.com/forum if you want my answer.

I think it's probably falling back to the generic code in libgcc.a, which is really written for 32-bit Intel. You may want to take a look at the source of the hand-crafted AVR libm.a to verify this.

It just seems a little strange. Who on earth would implement their own carry flag? Every processor I have worked with had one. :)

You'd need to read the source to see what the author had in mind.

clawson wrote:
You'd need to read the source to see what the author had in mind.

I'll take a look, but it won't help me right now. For a commercial product I cannot patch WinAVR. I'll work around it somehow.

Quote:

For a commercial product I cannot patch WinAVR. I'll work around it somehow.

Err, yes you can: the lib functions will be weak links. If you provide your own implementation of the same-named function in a lib and link it correctly, the linker will use your version rather than the lib one.

Did you at least try Cliff's first suggestion and link using the -lm linker flag? Or, if that is too obtuse, perhaps take a look at the source for libm in the avr-libc source tree?

You might find that you are using the default library implementations which, as Cliff said, are clumsy.

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!

clawson wrote:

Err, yes you can: the lib functions will be weak links. If you provide your own implementation of the same-named function in a lib and link it correctly, the linker will use your version rather than the lib one.

That's an idea! Thanks!

stu_san wrote:
Did you at least try Cliff's first suggestion and link using the -lm linker flag? Or, if that is too obtuse, perhaps take a look at the source for libm in the avr-libc source tree?

Yes, I tried. Code like this:

#include <stdint.h>

volatile uint64_t a;
volatile uint64_t b;
volatile uint64_t c;

int main(void)
{
  a = 0x12345678abcdabcdull;
  b = 10ull;

  /* nop markers make the addition easy to find in the disassembly */
  asm("nop");
  asm("nop");

  c = a + b;

  asm("nop");
  asm("nop");

  return 0;
}

Compiling: avr-gcc -lm -Os t.c

avr-objdump -d a.out produces output like:

0000003c <main>:
  3c: bf 92        push r11
  3e: cf 92        push r12
  40: df 92        push r13
  42: ef 92        push r14
  44: ff 92        push r15
  46: 0f 93        push r16
  48: 1f 93        push r17
  4a: 9d ec        ldi r25, 0xCD ; 205
  4c: 90 93 70 00  sts 0x0070, r25
  50: 8b ea        ldi r24, 0xAB ; 171
  52: 80 93 71 00  sts 0x0071, r24
  56: 90 93 72 00  sts 0x0072, r25
  5a: 80 93 73 00  sts 0x0073, r24
  5e: 88 e7        ldi r24, 0x78 ; 120
  60: 80 93 74 00  sts 0x0074, r24
  64: 86 e5        ldi r24, 0x56 ; 86
  66: 80 93 75 00  sts 0x0075, r24
  6a: 84 e3        ldi r24, 0x34 ; 52
  6c: 80 93 76 00  sts 0x0076, r24
  70: 82 e1        ldi r24, 0x12 ; 18
  72: 80 93 77 00  sts 0x0077, r24
  76: 8a e0        ldi r24, 0x0A ; 10
  78: 80 93 60 00  sts 0x0060, r24
  7c: 10 92 61 00  sts 0x0061, r1
  80: 10 92 62 00  sts 0x0062, r1
  84: 10 92 63 00  sts 0x0063, r1
  88: 10 92 64 00  sts 0x0064, r1
  8c: 10 92 65 00  sts 0x0065, r1
  90: 10 92 66 00  sts 0x0066, r1
  94: 10 92 67 00  sts 0x0067, r1
  98: 00 00        nop
  9a: 00 00        nop
  9c: 50 91 70 00  lds r21, 0x0070
  a0: 20 91 71 00  lds r18, 0x0071
  a4: e0 91 72 00  lds r30, 0x0072
  a8: 10 91 73 00  lds r17, 0x0073
  ac: e0 90 74 00  lds r14, 0x0074
  b0: d0 90 75 00  lds r13, 0x0075
  b4: c0 90 76 00  lds r12, 0x0076
  b8: b0 90 77 00  lds r11, 0x0077
  bc: 80 91 60 00  lds r24, 0x0060
  c0: 30 91 61 00  lds r19, 0x0061
  c4: 40 91 62 00  lds r20, 0x0062
  c8: 70 91 63 00  lds r23, 0x0063
  cc: f0 91 64 00  lds r31, 0x0064
  d0: b0 91 65 00  lds r27, 0x0065
  d4: f0 90 66 00  lds r15, 0x0066
  d8: 60 91 67 00  lds r22, 0x0067
  dc: 58 0f        add r21, r24
  de: 91 e0        ldi r25, 0x01 ; 1
  e0: 58 17        cp r21, r24
  e2: 08 f0        brcs .+2 ; 0xe6 <__SREG__+0xa7>
  e4: 90 e0        ldi r25, 0x00 ; 0
  e6: 83 2f        mov r24, r19
  e8: 82 0f        add r24, r18
  ea: 21 e0        ldi r18, 0x01 ; 1
  ec: 83 17        cp r24, r19
  ee: 08 f0        brcs .+2 ; 0xf2 <__SREG__+0xb3>
  f0: 20 e0        ldi r18, 0x00 ; 0
  f2: 09 2f        mov r16, r25
  f4: 08 0f        add r16, r24
  f6: 91 e0        ldi r25, 0x01 ; 1
  f8: 08 17        cp r16, r24
  fa: 08 f0        brcs .+2 ; 0xfe <__SREG__+0xbf>
  fc: 90 e0        ldi r25, 0x00 ; 0
  fe: 29 2b        or r18, r25
 100: 84 2f        mov r24, r20
 102: 8e 0f        add r24, r30
 104: 31 e0        ldi r19, 0x01 ; 1
 106: 84 17        cp r24, r20
 108: 08 f0        brcs .+2 ; 0x10c <__SREG__+0xcd>
 10a: 30 e0        ldi r19, 0x00 ; 0
 10c: a2 2f        mov r26, r18
 10e: a8 0f        add r26, r24
 110: 91 e0        ldi r25, 0x01 ; 1
 112: a8 17        cp r26, r24
 114: 08 f0        brcs .+2 ; 0x118 <__SREG__+0xd9>
 116: 90 e0        ldi r25, 0x00 ; 0
 118: 39 2b        or r19, r25
 11a: 87 2f        mov r24, r23
 11c: 81 0f        add r24, r17
 11e: 21 e0        ldi r18, 0x01 ; 1
 120: 87 17        cp r24, r23
 122: 08 f0        brcs .+2 ; 0x126 <__SREG__+0xe7>
 124: 20 e0        ldi r18, 0x00 ; 0
 126: e3 2f        mov r30, r19
 128: e8 0f        add r30, r24
 12a: 91 e0        ldi r25, 0x01 ; 1
 12c: e8 17        cp r30, r24
 12e: 08 f0        brcs .+2 ; 0x132 <__SREG__+0xf3>
 130: 90 e0        ldi r25, 0x00 ; 0
 132: 29 2b        or r18, r25
 134: 8f 2f        mov r24, r31
 136: 8e 0d        add r24, r14
 138: 31 e0        ldi r19, 0x01 ; 1
 13a: 8f 17        cp r24, r31
 13c: 08 f0        brcs .+2 ; 0x140 <__SREG__+0x101>
 13e: 30 e0        ldi r19, 0x00 ; 0
 140: 72 2f        mov r23, r18
 142: 78 0f        add r23, r24
 144: 91 e0        ldi r25, 0x01 ; 1
 146: 78 17        cp r23, r24
 148: 08 f0        brcs .+2 ; 0x14c <__SREG__+0x10d>
 14a: 90 e0        ldi r25, 0x00 ; 0
 14c: 39 2b        or r19, r25
 14e: 8b 2f        mov r24, r27
 150: 8d 0d        add r24, r13
 152: 21 e0        ldi r18, 0x01 ; 1
 154: 8b 17        cp r24, r27
 156: 08 f0        brcs .+2 ; 0x15a <__SREG__+0x11b>
 158: 20 e0        ldi r18, 0x00 ; 0
 15a: 43 2f        mov r20, r19
 15c: 48 0f        add r20, r24
 15e: 91 e0        ldi r25, 0x01 ; 1
 160: 48 17        cp r20, r24
 162: 08 f0        brcs .+2 ; 0x166 <__SREG__+0x127>
 164: 90 e0        ldi r25, 0x00 ; 0
 166: 29 2b        or r18, r25
 168: 8f 2d        mov r24, r15
 16a: 8c 0d        add r24, r12
 16c: 91 e0        ldi r25, 0x01 ; 1
 16e: 8f 15        cp r24, r15
 170: 08 f0        brcs .+2 ; 0x174 <__SREG__+0x135>
 172: 90 e0        ldi r25, 0x00 ; 0
 174: 32 2f        mov r19, r18
 176: 38 0f        add r19, r24
 178: 21 e0        ldi r18, 0x01 ; 1
 17a: 38 17        cp r19, r24
 17c: 08 f0        brcs .+2 ; 0x180 <__SREG__+0x141>
 17e: 20 e0        ldi r18, 0x00 ; 0
 180: 92 2b        or r25, r18
 182: 6b 0d        add r22, r11
 184: 96 0f        add r25, r22
 186: 50 93 68 00  sts 0x0068, r21
 18a: 00 93 69 00  sts 0x0069, r16
 18e: a0 93 6a 00  sts 0x006A, r26
 192: e0 93 6b 00  sts 0x006B, r30
 196: 70 93 6c 00  sts 0x006C, r23
 19a: 40 93 6d 00  sts 0x006D, r20
 19e: 30 93 6e 00  sts 0x006E, r19
 1a2: 90 93 6f 00  sts 0x006F, r25
 1a6: 00 00        nop
 1a8: 00 00        nop
 1aa: 80 e0        ldi r24, 0x00 ; 0
 1ac: 90 e0        ldi r25, 0x00 ; 0
 1ae: 1f 91        pop r17
 1b0: 0f 91        pop r16
 1b2: ff 90        pop r15
 1b4: ef 90        pop r14
 1b6: df 90        pop r13
 1b8: cf 90        pop r12
 1ba: bf 90        pop r11
 1bc: 08 95        ret
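
For readers following the listing: the repeated add / ldi / cp / brcs / or groups appear to compute each byte's carry with an unsigned comparison rather than reading the carry flag. A rough C sketch of that pattern is below. The function name add64_emulated_carry is made up for illustration, and this is a reading of the disassembly, not the actual GCC source.

```c
#include <stdint.h>

/* Illustrative only: adds two 64-bit values one byte at a time and derives
 * each carry from an unsigned comparison (sum < operand) instead of the
 * CPU carry flag. The name is hypothetical, not a libgcc function. */
static uint64_t add64_emulated_carry(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    uint8_t carry = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));

        /* add the two operand bytes; a wrapped result means a carry out */
        uint8_t partial = (uint8_t)(x + y);
        uint8_t c1 = partial < x;

        /* add the incoming carry; this can wrap as well */
        uint8_t sum = (uint8_t)(partial + carry);
        uint8_t c2 = sum < partial;

        /* at most one of c1/c2 is set, so OR combines them ("or r18, r25") */
        carry = c1 | c2;

        result |= (uint64_t)sum << (8 * i);
    }
    return result;
}
```

Each byte costs two additions, two comparisons and an OR here, where a single adc would do the same work using the hardware carry.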

So this code looks like an implementation of the arithmetic in C. It's true that most or all CPUs have a carry flag, but the C language does not support it directly.
So there seems to be no optimized support in libm yet.

You may have to make your own ASM version of this. Using the carry flag is one of the points where ASM can really improve on C code.
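
A minimal sketch of what such an ASM version could look like with avr-gcc inline assembly follows. Everything in it is an assumption made for illustration: the name add64_adc, the choice to split each operand into two 32-bit halves so the %A..%D byte modifiers can be used, and the little-endian union layout. It has not been tested on hardware. When compiled for anything other than AVR, it falls back to plain C addition.

```c
#include <stdint.h>

/* Hypothetical helper, not part of avr-libc or libgcc. */
static uint64_t add64_adc(uint64_t a, uint64_t b)
{
#if defined(__AVR__)
    /* AVR is little-endian, so u32[0] is the low half. */
    union { uint64_t u64; uint32_t u32[2]; } x, y;
    x.u64 = a;
    y.u64 = b;

    __asm__ (
        "add %A0, %A2" "\n\t"   /* lowest byte: plain add sets the carry */
        "adc %B0, %B2" "\n\t"   /* every later byte adds the carry in */
        "adc %C0, %C2" "\n\t"
        "adc %D0, %D2" "\n\t"
        "adc %A1, %A3" "\n\t"   /* carry continues into the high half */
        "adc %B1, %B3" "\n\t"
        "adc %C1, %C3" "\n\t"
        "adc %D1, %D3"
        : "+r" (x.u32[0]), "+r" (x.u32[1])
        : "r" (y.u32[0]), "r" (y.u32[1])
        : "cc"
    );
    return x.u64;
#else
    /* Non-AVR build: ordinary addition so the function still works. */
    return a + b;
#endif
}
```

With something like this, c = a + b in the test program would become c = add64_adc(a, b), at the cost of eight instructions for the addition itself.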

Kleinstein wrote:
So this code looks like an implementation of the arithmetic in C. It's true that most or all CPUs have a carry flag, but the C language does not support it directly.
So there seems to be no optimized support in libm yet.

I have looked through the GCC source. Yes, there is no 64-bit support in the AVR part, so GCC substitutes a generic handler.

Seems like an interesting task to add this.

Patches welcome.

As I posted in a different thread, wasn't there a project by a guy in Boulder to build 64-bit integer support for AVR-libC? I cannot remember the result of that project, but a forum search should bring it up.

Stu

The function that we use is a 16 x 16 -> 32 bit multiply. I can give you source for that part of the problem, if you wish.

The real problem is that a 32 x 32 multiply on an 8-bit machine is going to be ugly no matter how you tackle it.

Stu
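
To illustrate why the wider multiply gets ugly, here is a sketch of building a 32 x 32 -> 64 bit multiply out of four 16 x 16 -> 32 bit partial products. The names mul16x16 and mul32x32 are hypothetical and this is not the source Stu offered. The 16-bit primitive is written in plain C here, where a real AVR version would presumably be hand-written.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16 x 16 -> 32 bit multiply primitive. */
static uint32_t mul16x16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}

/* 32 x 32 -> 64 bit multiply from four 16-bit partial products:
 *   a * b = al*bl + ((al*bh + ah*bl) << 16) + ((ah*bh) << 32)        */
static uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint16_t al = (uint16_t)a, ah = (uint16_t)(a >> 16);
    uint16_t bl = (uint16_t)b, bh = (uint16_t)(b >> 16);

    uint64_t result = mul16x16(al, bl);
    result += (uint64_t)mul16x16(al, bh) << 16;
    result += (uint64_t)mul16x16(ah, bl) << 16;
    result += (uint64_t)mul16x16(ah, bh) << 32;
    return result;
}
```

On an 8-bit core each of those four 16-bit multiplies breaks down again into 8-bit multiplies, and every accumulation is a multi-byte add with carry, which is where the instruction count comes from.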

stu_san wrote:
As I posted in a different thread, wasn't there a project by a guy in Boulder to build 64-bit integer support for AVR-libC? I cannot remember the result of that project, but a forum search should bring it up.

Stu

IIRC, it was the same guy (Sean D'Epagnier) who wrote the fixed-point math patches (which are still not integrated).