long long multiplication and optimization.

Go To Last Post
33 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

First of all my apologies if I am posting this in the wrong forum. I am using avr-gcc, however the solution to my problem may or may not be avr-gcc specific.

I have been toying recently with a AD9851 dds module which I currently have hooked up to a mega48 also running v-usb (software usb) in cdc mode. In other words it connects to a pc via usb and presents itself as a serial com device.

Naturally space is getting a bit tight but it all works and fits quite comfortably until I try to perform the following calculation in firmware.

#include 

#define DDS_XTAL 180000000

unsigned long tuning_word = (frequency * pow(2, 32)) / DDS_XTAL

So at last to my question. Is there any way I can optimize this somehow to squeeze it into my mega48? So far all my attempts to replace the pow function with constants or macros seem to have resulted in even more code and/or compiler warnings.

I guess the obvious solution in upgrade to a mega88 but since there is not that much in it (less than 1K) I thought it might be worth asking.

The other solution (I am currently using) is to have the pc perform the heavy maths and simply send the pre-calculated tuning word to the mcu. This of course requires me to have a custom application to communicate with the module whereas my original intention was to be able control the device via any serial terminal software.

Some readers may be wondering how I am compiling the code and with what optimizations. The answer is that I am using the Makefile (attached) which came with the cdc-io software I adapted to suit my needs.

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

pow(2,32) is surely just 4294967296ULL ?

BTW so others don't have to download the text file the key lines are:

## Options common to compile, link and assembly rules
COMMON = -mmcu=$(MCU)

## Compile options common for all C compilation units.
CFLAGS = $(COMMON)
CFLAGS += -Wall -gdwarf-2 -DF_CPU=12000000UL -Os -fsigned-char
CFLAGS += -MD -MP -MT $(*F).o -MF dep/$(@F).d 

Unless I missed my guess that's a Makefile created straight out of AS4.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm way confused. Put any value into "frequency", shift it left 32 bits, store the low 32 bits--won't it always be 0?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I wondered about that - thread title says "long long" but code says just "long". I presume OP is aware that unlike a PC sizeof(int) = 2, sizeof(long) = 4, sizeof(long long) = 8?

I guess the code snippet may have been a typo?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
pow(2,32) is surely just 4294967296ULL ?

Yes, and unfortunately it produces about (if not exactly) the same amount of code as the pow function :(
I also tried (1 << 32) which gave a warning that the bit shift was to wide for the long type.

btw, thanks for pointing out the relevant section of the makefile.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
I wondered about that - thread title says "long long" but code says just "long". I presume OP is aware that unlike a PC sizeof(int) = 2, sizeof(long) = 4, sizeof(long long) = 8?

I guess the code snippet may have been a typo?


Sorry if the thread tittle is adding confusion, I am aware of the size of the long and long long types.

The problem seems to be that the intermediate type needs to be ULL to multiply the UL prior to the division.

However there was a slight typo in the code snippet I posted. DDS_XTAL is defined as 180000000UL not 180000000.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

The problem seems to be that the intermediate type needs to be ULL to multiply the two UL's prior to the division.


??? Even thought AFAIK pow() only returns a float result, the low 32 bits of your result will always be zero.

In the meantime you've pulled in a bunch of float code.

We can't see what "frequency" type is. But if starting with an integral frequency, doing integral operations, and getting/using an integral result then avoid the FP operations. The operation shown doesn't make much sense, so it is TBD if even long-long is needed.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

DDS_XTAL is defined as 180000000UL

Have you tried:

unsigned long tuning_word = ((unsigned long long)frequency * 4294967296ULL) / DDS_XTAL 

The intermediate part of this is likely to overflow otherwise.

But the 4294967296ULL / 180000000UL terms in this are fixed constants and my calculator says that is 23.86092942222 so why not simply multiply 'frequency' by that number? Depending on the accuracy you are looking for it could be that an integer multiply by 24 might be OK?

If you do need to use 23.8609etc then this is going to bring in 700 bytes of FP lib. But either way use:

unsigned long tuning_word = frequency * (4294967296.0f / DDS_XTAL) 

which will at least force the division to occur at compile time. The .0f suffix will guarantee that it's replaced by a float constant.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Ok, I will try to clarify. The original formula came from an arduino example I found on the net, it works just fine but then space was not an issue on a mega328 without sw usb stack.

I have tried using both UL and float types for frequency, so far the following produces the least code but still it wont quite fit.

#define DDS_MULT 23.8609294  /* ( pow(2, 32) / 180000000 ) */

float frequency = 1000.0;

unsigned long tuning_word = frequency * DDS_MULT;

I am now thinking I need to order up a mega88 and just be done with it.

Edit: Sorry Cliff, you posted while I was typing, looks like we think alike but still no dice.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

How about:

#define DDS_MULT 238609UL

uint16_t frequency = 1000;

uint32_t tuning_word = (frequency * DDS_MULT) / 10000;

That still allows "headroom" for frequencies up to 18,000 and keeps everything in 32bits. But to use scaled integers like this you need to think about if you need any decimal places of accuracy in frequency or is it OK in integer steps? If you want more digits of accuracy in DDS_MULT and more in freuqnecy (by scaling it) then this would spill into uint64_t which are almost as bad as floats.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

still no dice.

Quote:

frequencies up to 18,000

Quote:

(frequency * DDS_MULT) / 10000;

Any of the small(er) ratios below are off by less than 0.001:

23.8609294 0.000181711111110872 859 36 
23.8609294 0.000464283720930325 1026 43 
23.8609294 0.000929400000000413 1193 50 
23.8609294 0.000609061538462186 1551 65 
23.8609294 0.000181711111110872 1718 72 
23.8609294 0.000169906329112735 1885 79 
23.8609294 0.000464283720930325 2052 86 
23.8609294 0.000714346236559749 2219 93 
23.8609294 0.000772727659573036 2243 94 
23.8609294 0.000929400000000413 2386 100 
23.8609294 0.000456738613863195 2410 101 
23.8609294 0.000181711111110872 2577 108 
23.8609294 5.98347826077372E-05 2744 115 
23.8609294 0.000273662295082744 2911 122 
23.8609294 0.000859217886180375 2935 123 
23.8609294 0.000464283720930325 3078 129 
23.8609294 0.000609061538462186 3102 130 
23.8609294 0.000635282352941857 3245 136 
23.8609294 0.000384468613138012 3269 137 
23.8609294 0.000789539860139854 3412 143 
23.8609294 0.000181711111110872 3436 144 
23.8609294 0.000929400000000413 3579 150 
23.8609294 2.24768211865012E-06 3603 151 
23.8609294 0.000912705263157676 3627 152 
23.8609294 0.000169906329112735 3770 158 
23.8609294 0.000705820125787682 3794 159 
23.8609294 0.000323339393940358 3937 165 
23.8609294 0.000516383132531217 3961 166 
23.8609294 0.000464283720930325 4104 172 
23.8609294 0.000342276300578703 4128 173 
23.8609294 0.000594204469273052 4271 179 
23.8609294 0.000181711111110872 4295 180 
23.8609294 0.000949053038674208 4319 181 
23.8609294 0.000714346236559749 4438 186 
23.8609294 3.31668449184974E-05 4462 187 
23.8609294 0.000772727659573036 4486 188 
23.8609294 0.00082577305699516 4605 193 
23.8609294 0.00010465773195989 4629 194 
23.8609294 0.000609061538462186 4653 195 
23.8609294 0.000929400000000413 4772 200 
23.8609294 0.000232882587063443 4796 201 
23.8609294 0.000456738613863195 4820 202 
23.8609294 0.000352476923076495 4963 208 
23.8609294 0.000314619138755745 4987 209 
23.8609294 0.000975361904760774 5011 210 
23.8609294 0.000464283720930325 5130 215 
23.8609294 0.000181711111110872 5154 216 
23.8609294 0.000821752073733251 5178 217 
23.8609294 0.000569039639639612 5297 222 
23.8609294 5.71470852008815E-05 5321 223 
23.8609294 0.000677742857142505 5345 224 
...

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thanks for the suggestions and insight guys but among other things, I plan to use this as a VFO for a homebrew shortwave receiver so frequency needs to be in the range of 100KHz - 30MHz accurate to the nearest 1Hz or so.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

so frequency needs to be in the range of 100KHz - 30MHz accurate to the nearest 1Hz or so.

I'd say that's an impossible dream. The crystal that clocks your AVR is probably only good for 30ppm yet you are asking for 0.03ppm

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
I also tried (1 << 32) which gave a warning that the bit shift was to wide for the long type.
Because 1 is a 16 bit int. You need (1ULl << 32).
But 180000000 is divisible by 256, so you could use:

unsigned long tuning_word = frequency * (1ULL<<24) / 703125;

What is the range and accuracy of frequency and what accuracy do you expect from the result?

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
I'd say that's an impossible dream. The crystal that clocks your AVR is probably only good for 30ppm yet you are asking for 0.03ppm

I am not sure I follow you, the crystal clocking the avr only needs to be good enough for v-usb. The dds module has it's own crystal and I do not know the accuracy of this module in practice. I guess I mean to say that I want the control frequency to be specified in Hz to the nearest 1Hz. What I actually get may be another matter.

Koshchi, thanks, I will look at your suggestion more closely and see if that helps. I hope the above answers your question regarding accuracy.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

has it's own crystal

So that's a sub 1ppm crystal is it?
Quote:

What I actually get may be another matter.

What is the point in allowing control in 1Hz steps if you cannot be sure within 10's of Hz. Let's say that 180MHz crystal is 30ppm (many are about that). So it's only accurate to +/- 30 * 180 = +/-5,400Hz. So, yeah, maybe let the user step in 5kHz steps but 1Hz steps is not realistic. Also doing the maths to more accuracy than 5Khz in 180MHz does not make much sense either.

Some of those integer ratios Lee presented looked pretty good to me. Especially 3603 / 151 = 23.860927152317880794701986754967 so you multiply 'frequency' by 3603 then divide by 151.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:

So that's a sub 1ppm crystal is it?

The best info I can find on the module is available here:
http://www.rocketnumbernine.com/2011/10/25/programming-the-ad9851-dds-synthesizer
For more info you would need to refer to the AD9851 datasheet.

All I know is that when used as a vfo in a simple direct conversion receiver, I am able to hear subtle changes in ssb modulation as I tune in steps of 1 or 2Hz

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

BTW: Along the lines of crystal precision and such, you might want to do a bit of numerical analysis on the results if you >>do<< the float stuff using Mega88 or whatever: A GCC float has only a 22 bit mantissa giving 23 bits after normalization (IIRC). The point is that it is only 6-7 decimal digits so your constant is only good to a 1/10000 probably anyway.

kavrcalc says that putting 23.8609294 into a float and then bringing it back gives 23.86093.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
kavrcalc says that putting 23.8609294 into a float and then bringing it back gives 23.86093.

The example in the link I posted above actually uses double for frequency. Not sure if that makes much difference to what you are saying.

If I do opt for the bigger part I will probably stick with the pow function as this has been proven to work and produces no more code than the constant equivalent.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

The example in the link I posted above actually uses double for frequency. Not sure if that makes much difference to what you are saying.

Yeah, it does, 'cause given the forum you are posting in, in your toolchain "float" and "double" are synonymous. (They have the same storage size and thus precision.)
Quote:

All I know is that when used as a vfo in a simple direct conversion receiver, I am able to hear subtle changes in ssb modulation as I tune in steps of 1 or 2Hz

That is pretty cool. Given the discussion about absolute accuracy, one approach would be to do as you are doing to get to the "nominal" frequency, and then tweak in one-count steps of "tuning_word". That's as small a step as it looks like you can get. From that, you can display e.g. the +/- in unitless "steps" from the nominal/programmed value, and if desired back-calculate the estimated frequency.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

maximax wrote:
... I plan to use this as a VFO for a homebrew shortwave receiver so frequency needs to be in the range of 100KHz - 30MHz accurate to the nearest 1Hz or so.
Not as flexible as what you're designing but here's a portable frequency reference:
AADE Precision Frequency Reference
AADE makes frequency counters for the amateur radio market.

"Dare to be naïve." - Buckminster Fuller

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

What is the type of frequency?
Why?
If it's float, that could be why your code is so big.
It also does not have the accuracy you claim to desire.
24 bits is not enough for one part in 30 million.

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Shortwave is in the range of 1,800–30,000 kHz.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Koshchi wrote:
Shortwave is in the range of 1,800–30,000 kHz.
With a step size of 0.001 kHz, that is one part in 30 million.
avr-gcc float cannot represent 29999999.
It can represent the even numbers on either side.

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

The example in the link I posted above actually uses double for frequency. Not sure if that makes much difference to what you are saying.

If you want to use doubles (and hence almost 16 digits of accuracy) on an AVR your current two choices are IAR ($3000) or the pro version of Imagecraft ($800 is it?)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You could of course roll your own 64 bit division for just this calculation.

I once wrote a set of bignum functions that operate on an array of bytes.

void mb_rol(char n,void *p)
{
 asm("      movw  r26,r18   \n"      //  x=p
     "      clc             \n"
     "lp1:  ld    r24,x     \n"
     "      rol   r24       \n"
     "      st    x+,r24    \n"
     "      dec   r16       \n"
     "      brne  lp1       \n"); 
}

//
// Shift a into b; both are n bytes long
//
void mb_rol2(char n,void *a,void *b)
{
 asm("      movw  r26,r18   \n"    //  x=p
     "      clc             \n"
     "      mov   r17,r16   \n"     // save n

     "lp1a: ld    r24,x     \n"
     "      rol   r24       \n"
     "      st    x+,r24    \n"
     "      dec   r16       \n"
     "      brne  lp1a      \n"
     "                      \n"    
     "      ldd   r26,y+0   \n"     // x=b       
     "      ldd   r27,y+1   \n"
     "lp1b: ld    r24,x     \n"
     "      rol   r24       \n"
     "      st    x+,r24    \n"
     "      dec   r17       \n"
     "      brne  lp1b      \n"
);  
}


//
// Multibyte add, b<=a+b
//
void mb_add(char n,void *a,void *b)
{
 asm("      movw   r26,r18  \n"
     "      ldd    r30,y+0  \n"
     "      ldd    r31,y+1  \n"
     "      clc             \n"
     " lp2: ld     r24,x+   \n"
     "      ld     r25,z    \n"
     "      adc    r24,r25  \n"
     "      st     z+,r24   \n"
     "      dec    r16      \n"
     "      brne   lp2      \n");
}

//
// Subtract, a=a-b
//
void mb_sub(char n,void *a,void *b)
{
 asm("      movw    r26,r18 \n"
     "      ldd     r30,y+0 \n"
     "      ldd     r31,y+1 \n"
     "      clc             \n"
     " lp6: ld      r24,x   \n"
     "      ld      r25,z+  \n"
     "      sbc     r24,r25 \n"
     "      st      x+,r24  \n"
     "      dec     r16     \n"
     "      brne    lp6     \n");
}

//
// Compare two multiwords
// Returns 1 if b>=a
//
char mb_cmp(char n,void *a,void *b)
{
 asm("      movw    r26,r18 \n"
     "      ldd     r30,y+0 \n"
     "      ldd     r31,y+1 \n"
     "      sec             \n"
     " lp7: ld      r24,x+  \n"
     "      ld      r25,z+  \n"
     "      cpc     r24,r25 \n"
     "      dec     r16     \n"
     "      brne    lp7     \n"
     "      brcc    lp8     \n"     // if no carry, skip setting r16 to 1
     "      inc     r16     \n"
     " lp8:                 \n");
 return(n);
}

//
// Add one byte to a multibyte word, a=a+b
//
void mb_add8(char n,void *a,char b)
{
asm("       movw   r26,r18  \n" // load z  with a
     "      ldd    r25,y+0  \n" // get b
     "      ld     r24,x    \n" // get  lsb 
     "      add    r24,r25  \n" // add B
     "      st     x+,r24   \n" // store  back
     "      brcc   lp5      \n" // If no carry is generated return early
     "      clr    r25      \n"
     " lp4: dec    r16      \n" // correct n
     "      breq   lp5      \n"
     " lp3: ld     r24,x    \n" // loop to propagate carry
     "      adc    r24,r25  \n"
     "      st     x+,r24   \n"
     "      brcs   lp4      \n" // Early exit if no carry   
     " lp5: ret             \n");
}

// 
// 40x16 => 56 multiply
// Watch out, a and r are pointers, but b is not!
//
void mul40(uint40 *a,unsigned int b,uint56 *r)
{
 char i;
 uint56 ac;
 
 // Extend a to 56 bits
 ac.n[0]=a->n[0];
 ac.n[1]=a->n[1];
 ac.n[2]=a->n[2];
 ac.n[3]=a->n[3];
 ac.n[4]=a->n[4];
 ac.n[5]=0;
 ac.n[6]=0;

 // Set result to zero
 memset(r,0,7);
 
 for (i=16;i>0;i--)
  {
  if (b & 1) mb_add(7,&ac,r);
  b>>=1;
  mb_rol(7,&ac);
  }
}

//
// Divide a two 56 bit numbers 
// Stores the result in dividend
void div56(uint56 *dividend,uint56 *divisor)
{
 char i;
 uint56 rem;  

 for (i=0;i<7;i++)
  rem.n[i]=0;

 for (i=0;i<56;i++)
  {
  // shift dividend into remainder
  mb_rol2(7,dividend,&rem); 

  // Can divisor be subtracted from remainder?
  if (!mb_cmp(7,&rem,divisor))
   {
   mb_sub(7,&rem,divisor);
   dividend->n[0]|=1;
   }
  }
} 

I only needed two specific operations, mult40x16 and div56/56, but it would be easy to extend to any size you like.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just remembered this thread:
https://www.avrfreaks.net/index.p...
Simply add dannis64bit.S to your project and your memory problems will disappear. A test program using your formula went from:

Quote:
Program: 4220 bytes (51.5% Full)
(.text + .data + .bootloader)

Data: 256 bytes (25.0% Full)
(.data + .bss + .noinit)

to:
Quote:
Program: 666 bytes (8.1% Full)
(.text + .data + .bootloader)

Data: 0 bytes (0.0% Full)
(.data + .bss + .noinit)

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Quote:
has it's own crystal

So that's a sub 1ppm crystal is it?
Quote:
What I actually get may be another matter.

What is the point in allowing control in 1Hz steps if you cannot be sure within 10's of Hz. Let's say that 180MHz crystal is 30ppm (many are about that). So it's only accurate to +/- 30 * 180 = +/-5,400Hz. So, yeah, maybe let the user step in 5kHz steps but 1Hz steps is not realistic. Also doing the maths to more accuracy than 5Khz in 180MHz does not make much sense either.
Your confusing accuracy with resolution. The accuracy of the crystal may be off by 30ppm, but that is not going to change moment to moment. It should remain relatively constant for the short term. The actual frequency may not be what the requested frequency is, but the step size should be controllable accurately enough (in fact, the resolution of the chip is considerably better than 1Hz).

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

jayjay1974 wrote:
I once wrote a set of bignum functions that operate on an array of bytes.
Is this code suposed to work with avr-gcc?

If so, I do't understand it. For example you use Y without initializing it in any functions, don't tell the compiler which registers are messed up, use non-local labels, etc.

clawson wrote:
If you want to use doubles (and hence almost 16 digits of accuracy) on an AVR your current two choices are IAR ($3000) or the pro version of Imagecraft ($800 is it?)
Or change 1 line of avr-gcc, recompile it and use newlib :-)

avrfreaks does not support Opera. Profile inactive.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Or change 1 line of avr-gcc, recompile it and use newlib

If it's that easy why hasn't there been a release with this used?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Is this code suposed to work with avr-gcc?

No, my code is for ImageCraft ;)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
If you want to use doubles (and hence almost 16 digits of accuracy) on an AVR your current two choices are ...
A third one is: Rowley CrossWorks for AVR, Floating-point implementation

"Dare to be naïve." - Buckminster Fuller

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
Quote:
Or change 1 line of avr-gcc, recompile it and use newlib
If it's that easy why hasn't there been a release with this used?
GCC is free software. Anyone who likes to supply such a distribution can do so and even supply the tools/support against a fee if she likes.

If there is no such distribution/support, then there is obviously no market for such a thing.

avrfreaks does not support Opera. Profile inactive.