forcing code to be inlined Previous  1, 2 All
Author Message
 SprinterSB
 Posted: Aug 02, 2012 - 09:25 PM
 Joined: Dec 21, 2006 Posts: 1483 Location: Saar-Lor-Lux

 skeeve
 Posted: Aug 03, 2012 - 12:04 AM
 Joined: Oct 29, 2006 Posts: 2648
 SprinterSB wrote: Writing the two factors R and S in base B = 2^16, we haveP = r·s = (r0 + r1·B) · (s0 + s1·B)Using the distributive law this results in 4 widening multiplications and some additions:P = r0·s0 + r0·s1·B + r1·s0·B + r1·s1·B²Following an idea of Karatsuba, we can write this asb = (r0 + r1) · (s0 + s1) a = r0 · s0 c = r1 · s1 P = a + (b-a-c)·B + c·B²leaving us with only 3 multiplications (multiplications with B effectively cost nothing). Thus, we now have 3 MUL + 6 ADD instead of 4 MUL + 3 ADD. A problem with this is that the first two sums can overflow 16 bits. The matter can be dealt with. I expect that whether the Karatsuba formula reduces the cycle count depends on the details of how it is dealt with. _________________Michael Hennebry Iluvatar is the better part of Valar.

 nick.parker
 Posted: Aug 03, 2012 - 12:17 AM
 Joined: Jul 05, 2002 Posts: 61 Location: Perth, Australia
 SprinterSB, thanks for your excellent input.

 joeymorin
 Posted: Aug 03, 2012 - 01:14 AM
 Joined: Jul 17, 2012 Posts: 428

 nick.parker
 Posted: Aug 03, 2012 - 01:38 AM
 Joined: Jul 05, 2002 Posts: 61 Location: Perth, Australia
 gregd99 wrote: out of interest.... what sort of application are you working on that requires the space and time efficient multiplication. My hobby, engine management. I wanted faster division and to give a new lease of life to my software by clawing back CPU time, and speeding up some calculation rates. There is lots of table look-up type code (various index calcs require division), and I like to use 'real' physics too. I traded off 16Kb of flash (4096 x 4B) for speed. The table of reciprocals is scaled by 2^32 to implement division quickly in my application. I've tried other 'pseudo floating' point ideas with a table of reciprocals from say 1/2 to 1, but listing below is faster for my app where i typically divide by less than 4096. MulU4U4SH32RND is the 32x32 mutliply with an add of 0x00008000 and then shift (ie rounding). Ideally I should incorporate the ">>s" into the mult and also round this too. Cheers, Nick This code is way faster (2-3x) than GCCs 'proper division'. I save 100's of cycles per division, and there are many division per 100Hz control loop in my code. Code: uint32_t divU4U2(uint32_t n, uint16_t d) {   uint8_t s=0;   uint32_t temp;   if (d==0) { return MAX_UINT4; }   if (d==1) { return n; }   // 'normalise' denominator   while (d > 4096) { d >>= 1; s++; }   temp = pgm_read_dword(&reciprocal[d]);  // 32-bit reciprocal   temp = MulU4U4SH32RND(n, temp) >> s;   return(temp); } if you sacrifice some more precision its faster still to use mulu3u3 and top 24 bits from the table. A 16-bit reciprocal table is too imprecise to be useful for 'larger' denominators.

 gregd99
 Posted: Aug 03, 2012 - 04:01 AM
 Joined: Oct 10, 2011 Posts: 238 Location: Sydney, Australia
 Thanks for the info Nick. That sounds interesting. My hobby is model railway and I use atmegas in that. the consequence of me getting something wrong is probably less spectacular than in engine management you have probably long since taken this action but.... is there an option for a faster processor in the same footprint that can help with time constraints? _________________regards Greg

 nick.parker
 Posted: Aug 03, 2012 - 04:14 AM
 Joined: Jul 05, 2002 Posts: 61 Location: Perth, Australia
 No problems Greg. I'm still on my first engine If an AVR cant do it, what are you doing? I'd probably go Cortex M4. The Cortex M4 kind of invalidates all the stuff discussed in this thread. No fun problems to solve, anymore now you're dealing with DMA issues and more complex peripherals. The M4 is more capable than the Texas DSPs I've used at work - a sign of the times. For the STMicro STM32F4: -168MHz clock -Native floating point -32 x 32 = 64 in one cycle -32-bit Division in 2-12 cycles (wow) AVR's are efficient and highly capable in the right hands. And frankly I enjoy trying to squeeze more out of the devices and learning GCC/AvrStudio etc. Nick.

 Display posts from previous:  All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First
 Jump to: Select a forum Forum index|--[AVR (8-bit) Technical Forums]|   |-- AVR forum|   |-- XMEGA forum|   |-- AVR Wireless forum|   |-- AVR GCC forum|   |-- AVR Studio 5 and Atmel Studio 6 forum|   |-- AVR studio 4 forum|   |-- AVRfreaks Academy forum|   |-- AVR Tutorials|--[AVR Software Framework]|   |-- AVR Software Framework|--[AVR32 (32-bit) Technical Forums]|   |-- AVR32 Linux Forum|   |-- AVR32 General (standalone)|   |-- AVR32 Software Tools|   |-- AVR32 Hardware|--[General Electronics Technical Forums]|   |-- General Electronics|   |-- Atmel Security Products|--[Non-technical forums]|   |-- AVRfreaks.net Housekeeping|--[Non-topical forums]|   |-- Off-topic forum|   |-- AVRfreaks Trading Post
All times are GMT + 1 Hour