Compile-time execution of arbitrary algorithms

Go To Last Post
14 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Fellow Freaks,

I'm making a framework for Tiny26 (and more small devices, in the future). The goal of this framework is hide detail and present a programming API with easy to use functions and arguments. This is aimed at people who know C.
Since this device is very short on memory, I'm thinking hard on code size.
I reached a point where the "user-friendliness" of some functions demands calculations that take up to much space, even when invoked with constant arguments.
One specific example is the timers. Let's pick the function

timer0_SetupPeriodicHz(uint32_t frequency)

which mainly sets the prescaler and initial counter value.
Because of the prescaler fixed options I need to perform a small algorithm to calculate the values, and not only a simple (C) mathematical expression (which the compiler would perfectly optimise).
In a "perfect World", if my application only invokes timer0_SetupPeriodicHz() with a constant argument, which is almost always true, I would like the "compiler" to run the algorithm and replace the call by its constant results. When *all* inputs to a function are constant, the result is also constant. Additionally, I would like to be able to issue warnings or errors, like for example if the requested frequency could not be achieved with targeted CPU clock.
This would allow me to have simple high-level calls like timer0_SetupPeriodicHz() giving arguments in Hz, and still have the smallest code size possible.
But, of course, this would force the compiler to include a C interpreter!

I thought of and tried to implement several methods using macros, but without success. I concluded that I could do it in a reasonable time only by inserting a new processing step in the build pipeline, and ended-up doing it. I made a simple parser which I named "AVR-C preprocessor", which runs between the C preprocessor and the C compiler.
This allows me to have "calls" to special macros that make complex compile-time calculations. For example, my current implementation of the function mentioned above is

#define timer0_SetupPeriodicHz(frequency) \
        { \
            TCCR0 = 0; \
            TCNT0 = _timer_cnt_init[0] = [[t0cnt_hz frequency]]; \
            TCCR0 = (TCCR0 & 0xFC) | [[t0pre_hz frequency]]; \
            TIMSK |= _BV(TOIE0); \
        }

The "AVR-C preprocessor" runs the "special macros" inside [[ and ]] and replaces the invocation by the macro's result (the macros are pre-defined functions in TCL).

I also thought about having a tool to calculate aside the values (like some of the existing tools, as AVRCALC), but putting the values in the code is more friendly and improves code readability. Also, this extra pre-processing tool is fully transparent, you don't notice it's there; all it's needed is a change in the project's makefile.

Well, I was wondering if anyone has a better idea. Any thoughts?...

Edit: fixed spelling errors

Embedded Dreams
One day, knowledge will replace money.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I don't see the purpose of #define-ing the entire function. You would get the same functionality by #define-ing just the parts that do the calculation. And it would help greatly to see the actual calculations that you want done. Since the output of those calculations is always an integer, and the preprocessor is perfectly capable of doing integer math, I doubt that your extra step is even necessary.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Koshchi wrote:
I don't see the purpose of #define-ing the entire function. You would get the same functionality by #define-ing just the parts that do the calculation.

How?... I want a clean abstraction.

Koshchi wrote:
And it would help greatly to see the actual calculations that you want done. Since the output of those calculations is always an integer, and the preprocessor is perfectly capable of doing integer math, I doubt that your extra step is even necessary.

If the calcs where just a math expression, I agree.
I have only the TCL version, but I think you'll more or less understand it even if you don't know TCL. This is an example:

set gTimer0PrescalerTab  {1 8 64 256 1024}
set gTimer1PrescalerTab  {1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384}

proc timer_calc {prestab us}  {

    set max_time  [expr { 1000000 * 256 / $::Fcpu }]
    set m_time  $max_time

    # If the desired period is not achievable (too big), stop.
    if {$m_time * 1024 < $us}  {
        error "Unable to achieve $us us"
    }

    set i  1
    foreach pres $prestab  {
        if {$max_time * $pres >= $us}  break
        incr i
    }

    set cnt  [expr { ($max_time * $pres - $us) % 256 }]

    list $i $cnt
}

proc t0cnt_hz hz  {
    set us  [expr { round(1.0 / $hz) }]
    foreach {pres cnt} [timer_calc $::gTimer0PrescalerTab $us] break
    return $cnt
}

The argument prestab in the function is a prescaler list, as the ones on the top.
There are loop constructs, I don't think you can do that with the C preprocessor. Or is there a simpler way (one math expression) of calculating the values for the registers?

Embedded Dreams
One day, knowledge will replace money.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Not sure it is worth the trouble to compute this during compilation (will you really be changing the timers that often?).
But anyway, you could use programs called from the makefile and just genereate some -D defines to pass to the compiler.
Or, note that you are not limited to what the preprocessor can do, there is also the optimizer. Something like this should work (example just for the prescaler bits):

#define PRESCALE8(hz) \
({ \
    const uint32_t maxt = 1000000*256/F_CPU; \
    const uint32_t us = 1000000/hz; \
    int p = 0; \
    if (maxt >= us) { \
        p = 1; \
    } else if (maxt * 8 >= us) { \
        p = 8; \
    } else if (maxt * 64 >= us) { \
        p = 64; \
    } else if (maxt * 256 >= us) { \
        p = 256; \
    } else if (maxt * 1024 >= us) { \
        p = 1024; \
    } \
    p; \
})

#define PRESCALE_BITS8(hz) \
({ \
    uint8_t p; \
    switch(PRESCALE8(hz)) { \
    case 1: \
       p = 1; \
       break; \
    case 8: \
       p = 2; \
       break; \
    case 64: \
       p = 3; \
       break; \
    case 256: \
       p = 4; \
       break; \
    case 1024: \
       p = 5; \
       break; \
    default: \
       break; \
    } \
    p; \
})
.
.
.
TCCR0 |= PRESCALE_BITS8(100);

That would work with -std=gnu-99 only (i.e., it is not standard C) however something similar can be made with static functions (that will be optimized away). Any optimization level except -O0 would work. The code above will give you a warning if the prescaling is not possible (since p is not assigned in the default case).

/Lars

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I think you can rely on the constant folding optimizations to make any simple 4-banger arithmetic go away. Maybe even some common math lib fuctions -- I don't know the details of that, it would be worth an experiment. Also, if you encourage the compiler to inline, even more stuff will get exposed to the constant folder. In fact, in many applications, the #1 benefit of inlining is that it exposes more code for constant propagation, the #2 benefit is that it exposes bigger straight line blocks for code motion (not that code scheduling is a big deal on AVR's). Remember, our avr-gcc has the same optimizer as the gcc used for Itanium's, and without these optimizations, an Itanium can't get out of the starting blocks.

The tricky part is going to be getting your warnings printed. One approach would be to segregate the computations away from the cross-check. That way, the computations will optimize through constant propagation, and the warnings can be put under a #ifdef DEBUG or similar. That way, if the warnings cause the code to explode, it will only be in debug versions. Or, perhaps the warnings can be done entirely in macros, and you can use #error to halt compilation on bad cases.

Also... be aware that constant propagation is perfectly happy with floating point arithmetic... so that way you can incorporate floating point constants in the computation that produce an integer result, and all the floating point will evaporate during compile time as it gets folded away.

Write some simple test cases, try the optimization switches, and look at the resulting assembly. I think you will find that you can achieve what you want simply by letting the optimizer do it's thing.

-dave

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Lajon wrote:
Not sure it is worth the trouble to compute this during compilation (will you really be changing the timers that often?).

The issue is not that they will change often; is that the interface should be as simple as possible, and abstract the details.

Lajon wrote:
But anyway, you could use programs called from the makefile and just genereate some -D defines to pass to the compiler.

I think the parser as I implemented is "cleaner" that that, and more flexible.

Lajon wrote:
Or, note that you are not limited to what the preprocessor can do, there is also the optimizer.

Unfortunately the optimizer doesn't optimize "away" constructs like loops.

Lajon wrote:
(...)
That would work with -std=gnu-99 only (i.e., it is not standard C) however something similar can be made with static functions (that will be optimized away).

Interesting syntax. And in fact it's nice that the switches can be optimized away... I didn't remember that.

Lajon wrote:
Any optimization level except -O0 would work. The code above will give you a warning if the prescaling is not possible (since p is not assigned in the default case).

Nice trick... I'll re-evaluate my options.
Thanks for all the answers so far.

Embedded Dreams
One day, knowledge will replace money.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

dbc wrote:
I think you can rely on the constant folding optimizations to make any simple 4-banger arithmetic go away. Maybe even some common math lib fuctions -- (...)

Thanks for the answer Dave, I'm going to explore better what the compiler can do with optimizations. I know that math expressions get "fully" optimized, my problem was that the code I wanted to optimize away was not composed of expressions only.
I tried one version without the loop, using a bunch of ifs, but gcc didn't optimize it away. I don't remember if I had the function inline, maybe I hadn't and that's why it didn't went away. I'll do more tests.
Cheers!

Embedded Dreams
One day, knowledge will replace money.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The optimizer should be capable of making an if..then..else with a constant condition go away. Likewise, a loop with a constant number of passes can be made to go away. If things are not going away, and you think they should, study the optimization switches, and look at your code to make sure there is nothing getting in the way of constant propagation.

A function call in the middle that is *not* inlined will, in general cause problems since strictly local analysis is no longer sufficient to guarantee a safe optimization. But, certainly, the more you can pull out of if's and put into expressions, the easier it is to see the optimizations. Using ?: might make the intent more obvious, although it shouldn't matter to the compiler.

-dave

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I've been trying and gcc can actually optimize away things like:

inline static bool timer0_Setup (word us)
{
    // Calculate the maximum possible time that can be achieved with 1 full
    // timer oveflow (an 8 bit count), by using a freq divider of 1, in us.
    word  max_time = 1000000 * 256 / F_CPU;
    dword m_time = max_time;
    byte  pres;

    // If wanted_time is not achievable (too big), exit with false.
    if (m_time * 1024 < us)
    {
        return false;
    }

    pres = 1;
    while (m_time < us)
    {
        pres++;
        m_time <<= 3;
    }

    TCCR0 = 0;                  // counter off
    TCNT0 = (m_time - us) % 256;
    TCCR0 = (TCCR0 & 0xFC) | pres;
    TIMSK |= _BV(TOIE0);        // enable the interrupt

    return true;
}

if called with constant arguments.
Using -Os, if I have more than 1 call to the function (always with constant arguments), it inserts a call instead of inlining the function. But, with -Os -finline-limit=600, it inlines all calls (constant or not, in as many places as they are invoked).
The curious thing is that, according to Gcc's manual, -finline-limit=600 is the default and -Os doesn't enable nor disable any inline-related Gcc options. In fact, none of the -f*inline* options except -finline-limit makes any difference when added to -Os.
I think -finline-limit=xx turns ON inlining for functions marked "inline" that respect the limits, always, although is not my direct interpretation of the manual.

I couldn't find a way of making Gcc inline when the result is constant and call when it is not. It either always inlines or not. Anyways, I think always inlining serves my purpose, since the calls I have in mind will all have constant results.

Embedded Dreams
One day, knowledge will replace money.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Nuno,

When using C, as other fellow freakers already said, you have to rely on the optimizer to remove code like you're doing or have some other preprocessor that accomplish the same work.

The other option would be use C++ and templates. Using templates allows you to do the things you're after without any other preprocessor and it also works independently of the optimization level selected. I'm using C++ exactly in this way for my own projects to allow compile time evaluations of expressions.

/Robert

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

rejsbob wrote:
The other option would be use C++ and templates. Using templates allows you to do the things you're after without any other preprocessor and it also works independently of the optimization level selected. I'm using C++ exactly in this way for my own projects to allow compile time evaluations of expressions.

Yes, seems another good technic. I keep forgetting that I can use C++ on AVR too. Ahhh, templates... I'll have to go dig by books, it has been a long time!
Thanks!

Embedded Dreams
One day, knowledge will replace money.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Nuno wrote:
Yes, seems another good technic. I keep forgetting that I can use C++ on AVR too. Ahhh, templates... I'll have to go dig by books, it has been a long time!
Thanks!
If you need to make loops go away,
I'm pretty sure that templates won't do what you want by themselves.
I think that you will have to do some preprocessing.
You might require that a parameter be either a constant
expressionor a simple function call without arguments.
Require that the function be defined in the
same file and not require any AVR headers.
Compile and run the function to get the value.
Replace the function call with its value in the new C/C++ code.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

skeeve wrote:
If you need to make loops go away,
I'm pretty sure that templates won't do what you want by themselves.
I think that you will have to do some preprocessing.(...)

I never used templates for anything useful, but I "eared" that you can make recursive "calls", so you can probably make loops.

Embedded Dreams
One day, knowledge will replace money.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Nuno wrote:
skeeve wrote:
If you need to make loops go away,
I'm pretty sure that templates won't do what you want by themselves.
I think that you will have to do some preprocessing.(...)

I never used templates for anything useful, but I "eared" that you can make recursive "calls", so you can probably make loops.
They won't do explicit loops.
Recursion can produce the effect of a loop, but the syntax is pretty awful.
Also, the depth limit might beat you.
Another constraint is that by definition a constant expression does not involve floating point,
though one may have arbitrarily complex expressions involving the built-in operators.

Templates will guarantee compile-time evaluation of constant expressions,
but will reject others as syntax errors.
A constant expression may involve recursive template calls,
so one may get the effect of a loop, if one does not mind the syntax.
Constant-folding by the optimizer will cause compile-time evaluation of constant expressions
other invariant expressions,
even (much to my surprise) some involving loops.

If simple enough constant expressions are what one wants, templates are the way to go.
They are guaranteed to work and optimization won't get in the way of debugging.

If one needs to deal with loops, templates can still be used:

template iter class {
    enum {val=iter::val, lo+1, hi, expr>::val } ;
} ;

template class {
    enum { val=expr::val } ;
}

The user would define a template to pass to iter.

Preprocessing is likely to give your users the least amount of synactic horror.
It will give the most generality without interfering too much with debugging.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods