128 and 64 bit arithmetic in 'mega16


Hi,

I need to perform 64 and 128 bit operations with numbers made up of a sequence of bytes. I need to perform add and multiply on signed and unsigned numbers. Are add and multiply available in AVR-GCC for this type of numbers?

What I have done is to implement a MAC (multiply-accumulate) in an FPGA that is connected to my 'mega16 by means of SPI. (The SPI works, so it's only the GCC math libs I'm uncertain about.) The FPGA code is in development, and I really need to verify it. And generic tools aren't well suited for 128-bit numbers.

The FPGA MAC is designed to multiply signed 64-bit numbers and add (accumulate) those in a signed 128-bit register. I'd like my 'mega16 to generate random or sequenced input to the MAC, send them as bytes to the FPGA, perform a 64x64->128 multiply and a 128+128 addition, and then compare its own result to the one from the FPGA.

The MAC does
X = X + A * B
where X is signed 128-bit, and A and B are signed 64-bit. However, I'd like to access the individual bytes of X, A and B in order to send them back and forth over SPI.

This doesn't have to be very speedy, but it needs to be 100% correct about the numbers. I'd basically set the MCU up to throw numbers at the FPGA all night, and halt the process if it detects a flaw.

The MCU is connected to a PC over a UART, so I could do the verification on the PC side, but I'd prefer to do the whole thing in the MCU.

Cheers,
Borge


Well GCC will support up to 64 bit with "long long". Beyond that you'll have to handle it yourself. Even for 64 bit * 64 bit there are going to be 16 of the 32 AVR registers involved.


Okay, so is there a neat way of splitting (u)int64_t and (u)int32_t into individual bytes? I guess multiplying signed and unsigned 32-bit numbers into two 64-bit numbers could do the trick.

Cheers,
Borge


Quote:

Okay, so is there a neat way of splitting (u)int64_t and (u)int32_t into individual bytes?

Union or just mask and shift
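A minimal sketch of both suggestions. The helper and type names here are hypothetical, not from any library. The shift version gives the same byte order on any CPU; the union version exposes whatever byte order the compiler uses in memory (little-endian with avr-gcc).

```c
#include <stdint.h>

/* Hypothetical helper: store v into out[0..7], least significant byte
   first, using shifts. The result does not depend on CPU byte order. */
static void u64_to_bytes_le(uint64_t v, uint8_t out[8])
{
    for (uint8_t i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper: rebuild the value from bytes, least significant
   byte first. */
static uint64_t bytes_le_to_u64(const uint8_t in[8])
{
    uint64_t v = 0;
    for (uint8_t i = 0; i < 8; i++) {
        v |= (uint64_t)in[i] << (8 * i);
    }
    return v;
}

/* The union alternative: all members share the same 8 bytes of storage,
   so b[] shows the value in the machine's own byte order. */
typedef union {
    uint64_t u;
    int64_t  s;
    uint8_t  b[8];
} word64_t;
```

With the union, write the number through .s or .u and read the bytes from .b[0] to .b[7]; with the shift helpers the byte order on the SPI link is fixed by the code rather than by the CPU.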


Thanks, I'll play around with this tomorrow. I've done all of this in Verilog. Could you please point me towards the syntax for union in C?

Borge


Quote:

Could you please point me towards the syntax for union in C?

Any C textbook?
Any (many, most) of the online references in the sticky post at the top of the main Forum?
http://www.open-std.org/jtc1/sc2...

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


clawson wrote:
Even for 64 bit * 64 bit there are going to be 16 of the 32 AVR registers involved.

Why would it be required to have all bytes in registers? You can simply loop through the bytes, loading one byte from number 1 from RAM, one byte of number 2, add them, then store back the result. This way you can have as many bits as you like.
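The byte-by-byte loop described above can be sketched in plain C, with the carry kept in a 16-bit temporary. The function name is hypothetical, and the numbers are assumed to be stored least significant byte first. For two's-complement numbers the same routine works for signed addition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: acc += addend, both n bytes long, least
   significant byte first. Returns the carry out of the top byte. */
static uint8_t bytes_add(uint8_t *acc, const uint8_t *addend, size_t n)
{
    uint16_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* A 16-bit sum holds byte + byte + carry (at most 0x1FF). */
        uint16_t sum = (uint16_t)(acc[i] + addend[i] + carry);
        acc[i] = (uint8_t)sum;   /* low 8 bits are the result byte */
        carry  = sum >> 8;       /* bit 8 carries into the next byte */
    }
    return (uint8_t)carry;
}
```

Calling it with n = 16 gives a 128-bit add; only two bytes and the carry are live at any time, so the register pressure mentioned earlier does not arise.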

Arbitrary precision arithmetic has been discussed fairly recently.

I'm just wondering what quantities need 128 bits?


Hi JJ,

that was my first thought, too, except I don't know how I would do carry propagation in that scheme.

Or I guess I could multiply int8_t by int8_t and cast the result into int16_t. I've been messing around with bit widths a lot in Verilog, but C syntax and casting in this domain are new to me.

The C code is designed to verify a MAC (multiply/accumulate) unit in an attached FPGA. The MAC is for a high-order IIR filter that needs hideous resolution in order to avoid harmonic noise and instability.

In the final application the MAC will run at 50-ish MHz in the FPGA, but the execution speed of the MCU during the MAC core verification is not important.

Cheers,
Borge


I solved the carry propagation problem by simply writing the routines in assembly :)

I made a few basic assembler routines to add/sub/comp/shift arbitrary length numbers. Multiplication and division were done in C functions. I used a typedef to hide the string of bytes.

Can't you verify the FPGA with a testbench?


I'm tempted to use asm myself. But for now I've played around in GCC (3.4.4 on Cygwin). I have a UART link between PC and MCU that I have already used for manual verification. But I'd prefer to have the code in the MCU and not involve another platform.

It looks like the upper 32 bits of the 64-bit result are zeroed. That also seems to be the case with less trivial numbers.

int32_t a = -1; // same result with 0xFFFFFFFF;
int32_t b = 0x00000001;
int64_t mult_out_64;

mult_out_64 = (int64_t)(a*b);
printf("0x%08X x 0x%08X = 0x%016X\n", a, b, mult_out_64);

// Prints: 0xFFFFFFFF x 0x00000001 = 0x00000000FFFFFFFF

I have thought about a testbench, but the numbers involved are so large. There are quite a few carry transitions in the pipelined FPGA stuff. So my plan was basically to have the MCU throw random numbers at it all night and read the log in the morning.

Borge


You have to cast either a or b to int64_t before multiplying.

mult_out=(int64_t)a*b;

Your code casts the 32-bit product to 64 bits, but by then the upper bits have already been lost.
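A small sketch of the difference, with hypothetical function names. The narrow version multiplies in unsigned 32-bit arithmetic so that the truncation is well defined (overflow in a signed multiply is undefined behaviour in C), then widens; the wide version widens one operand first so the multiply itself is done in 64 bits.

```c
#include <stdint.h>

/* Product computed at 32-bit width, then widened: only the low
   32 bits of the true product survive. */
static int64_t mul_narrow(int32_t a, int32_t b)
{
    uint32_t low = (uint32_t)a * (uint32_t)b;
    return (int64_t)low;
}

/* One operand widened first: the other is converted to match and the
   multiply is carried out in 64 bits. */
static int64_t mul_wide(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}
```

For a = 0x79FFE321 and b = 0x6988FAFF, mul_wide() returns 0x324B3BB69C3977DF while mul_narrow() returns only the low half, 0x9C3977DF.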


Thanks, but I don't think this quite did it. This time around I tried with larger numbers. The cast did improve things, though. Now I suspect the error is with printf.

int32_t a = 0x79FFE321;
int32_t b = 0x6988FAFF;
int64_t mult_out_64;

mult_out_64 = (int64_t)a * b;
	
printf("0x%08X x 0x%08X = 0x%016X\n", a, b, mult_out_64);
printf("0x%08X x 0x%08X = 0x%016X\n", a, b, mult_out_64>>32);

Prints:
0x79FFE321 x 0x6988FAFF = 0x000000009C3977DF
0x79FFE321 x 0x6988FAFF = 0x00000000324B3BB6

The shifted MSBs on the second line look right; I was able to check them with a 20x20-bit multiplication in Excel.


Quote:

0x%016X

I don't know the conventions of your printf(). In CV you'd need an lX just for the 32-bitters.

http://www.open-std.org/jtc1/sc2...
That's what the C standard says, too, at 7.19.6.1-7 of the above link, and llX for long-long:

Quote:
ll (ell-ell) Specifies that a following d, i, o, u, x, or X conversion specifier applies to a
long long int or unsigned long long int argument; or that a
following n conversion specifier applies to a pointer to a long long int
argument.

(Hmmmm--must it be "lX" or is "LX" OK? )
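A hedged sketch of a portable way to print a 64-bit value, assuming a hosted C99 library: the PRIX64 macro from <inttypes.h> expands to the right length modifier for the platform. The helper name is hypothetical. Some embedded C libraries implement only a subset of the length modifiers, so check the printf documentation of the library in use. As for the question above: in the C standard the uppercase L modifier applies to long double conversions only, so it is not a substitute for l or ll before X.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format v as "0x" plus 16 hex digits.
   buf must hold at least 19 bytes (2 + 16 + terminating NUL). */
static int format_hex64(char *buf, size_t len, int64_t v)
{
    /* PRIX64 supplies the length modifier that matches uint64_t,
       so the argument width and the format always agree. */
    return snprintf(buf, len, "0x%016" PRIX64, (uint64_t)v);
}
```

Passing a 64-bit argument to a plain %X is what made the earlier output show only 32 bits: the format and the argument width did not match.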



llx did the trick. Thanks guys!

Next I'll try to convert it from 32x32->64 to 64x64->128. But int128_t is not known, so I guess I'll play around with carry propagation for the 64-bit numbers.
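One possible way to take that step, sketched under the assumption of two's-complement numbers: build the unsigned 64x64->128 product from four 32x32->64 partial products, then correct the high half for negative operands. The type and function names are hypothetical; this is the schoolbook method, not something taken from a library.

```c
#include <stdint.h>

/* Hypothetical type: a 128-bit value as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_t;

/* Unsigned 64x64 -> 128 from four 32x32 -> 64 partial products. */
static u128_t mul_u64(uint64_t a, uint64_t b)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t p00 = a0 * b0;   /* weight 2^0  */
    uint64_t p01 = a0 * b1;   /* weight 2^32 */
    uint64_t p10 = a1 * b0;   /* weight 2^32 */
    uint64_t p11 = a1 * b1;   /* weight 2^64 */

    /* Middle column: three 32-bit terms, so it cannot overflow 64 bits.
       Its upper half is the carry into the high word. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    u128_t r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}

/* Signed 64x64 -> 128. Reading a negative a as unsigned adds 2^64 to
   it, which adds b * 2^64 to the product, so b is subtracted from the
   high half to compensate (and likewise for a negative b). */
static u128_t mul_s64(int64_t a, int64_t b)
{
    u128_t r = mul_u64((uint64_t)a, (uint64_t)b);
    if (a < 0) r.hi -= (uint64_t)b;
    if (b < 0) r.hi -= (uint64_t)a;
    return r;
}
```

The result is the raw 128-bit two's-complement pattern, which can then be split into bytes and accumulated with a byte-wise or word-wise add with carry.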

Or could you please point me to the arbitrary-length arithmetic pages you mentioned?


Quote:

Or could you please point me to the arbitrary-length arithmetic pages you mentioned?

"arbitrary" is the keyword in that - try a thread search for it here (maybe adding "arithmetic" or similar?) and you should find what you seek. The usual requirement for it is in cryptography so adding that word to the search may also help.


Actually, I did find the GMP and MPFR C libraries and my head is spinning with the stuff I can build with those!


Another thought I had was that rather than trying to do the maths on the AVR - just use it as a conduit between the FPGA and a PC program that actually calculates the test vectors. It'll be far easier doing "wide" maths on a PC.
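On the PC side, a reference model of X = X + A * B can be very short if the compiler offers a 128-bit integer. The sketch below assumes GCC or Clang on a 64-bit host, where __int128 is a compiler extension (it is not available in avr-gcc). The function names are hypothetical, and it assumes the accumulator wraps modulo 2^128 on overflow; whether the FPGA wraps or saturates is not stated in the thread.

```c
#include <stdint.h>

/* Compiler extension, assumed available (GCC/Clang, 64-bit host). */
typedef unsigned __int128 u128;

/* Hypothetical reference model of X = X + A * B. The accumulator is
   held as a raw 128-bit two's-complement pattern. */
static u128 mac_ref(u128 x, int64_t a, int64_t b)
{
    __int128 product = (__int128)a * b;   /* exact: |a * b| <= 2^126 */
    return x + (u128)product;             /* unsigned add wraps mod 2^128 */
}

/* Split the accumulator into 16 bytes, least significant first, for
   comparison with the bytes read back from the FPGA. */
static void u128_to_bytes_le(u128 v, uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}
```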


Just some input:
As jayjay1974 wrote, why not just loop? The AVR has an 8-bit hardware multiplier, so if you can write Verilog you should be able to do that in AVR asm ;)
Remember that once you use bigger numbers than the C compiler knows about, you have to deal with sign extension yourself.
To me it's a bit odd to do 64x64 and just add that into 128 bits in a MAC. I would add 8 bits to the adder so you avoid overflow ;)

Jens

Edit: I should add that you can do the 8-bit multiply as a char multiply on a struct in C, but again remember that you need to deal with sign extension (I assume that you use two's complement).
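As a sketch of what handling sign extension yourself means for a byte-array number, assuming two's complement and least-significant-byte-first storage: when a signed 64-bit value is widened to 128 bits, the upper eight bytes are filled with 0xFF if the value is negative and 0x00 otherwise. The helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: widen a signed 64-bit value to a 16-byte
   two's-complement number, least significant byte first. */
static void s64_to_bytes128_le(int64_t v, uint8_t out[16])
{
    uint64_t u = (uint64_t)v;
    /* Fill byte repeats the sign bit: 0xFF if negative, else 0x00. */
    uint8_t fill = (v < 0) ? 0xFF : 0x00;

    for (uint8_t i = 0; i < 8; i++) {
        out[i] = (uint8_t)(u >> (8 * i));
    }
    for (uint8_t i = 8; i < 16; i++) {
        out[i] = fill;
    }
}
```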


Hi Jens,

you're absolutely right. But sooner or later I'll have to generate the coefficients for the MAC, so I might as well read up on a math library now. (Octave doesn't have the precision.) I've done clean asm on AVR and clean C, but have no experience mixing the two.

The largest output of the multiply results from -2^63 multiplied by itself, resulting in a 0 sign bit and 127 bits for 2^126. So right there you have almost x2 headroom to the largest positive number held by 128-bit signed (2^127-1). And besides, I will allow for 4 to 6 bits of overflow in the way I design my coefficients. But thanks for your concern.
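Written out, assuming two's-complement operands, the headroom argument is:

```latex
\[
\max |A \cdot B| = (-2^{63}) \cdot (-2^{63}) = 2^{126},
\qquad
\frac{2^{127} - 1}{2^{126}} = 2 - 2^{-126} < 2 .
\]
```

So a single worst-case product fits in the signed 128-bit accumulator, but two of them in a row would exceed 2^127-1; the extra margin has to come from the coefficient design.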

Borge