Time for a quiz.


Quote:

you can't do this in C and come close to ASM" "contests"?

Please define close


In general I can match or beat any fragment from a "normal" programmer. If you are one of those gurus--I've known a handful in my life--that think in opcodes then I can't beat them regardless of the language or toolchain.

But I ask specific questions about one of your points, and then you come back on a tangent.

Now the latest response brought my choice of toolchain into a discussion of "a cost of a bigger program" spoken as a postulate. Let's clear that one up first. What kind of app did you port 10 years ago? How big? What toolchain was used for the comparisons? What kind of size difference did you see?

Then if you want to start a "my toolchain is better than yours" I'll be happy to participate. I did a bit of back-scouting and did not uncover your choice of perfect toolchain, other than the indirect references to "few registers available" and "register misalignment" which gives me some hints. lol

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Quote:

But I ask specific questions about one of your points, and then you come back on a tangent.

That's you, I think!
If "close to ASM" means 3-4 times slower, I'm OK with that (that is fine for the real world, but not for a contest, if you ask me).
As I already said, I don't have the code (or the compiler) here, and yes, the AVR compiler was IAR (at that time it was the only one).
But I spent some time looking at the instruction set and at how I (read: we) would write a C compiler for the AVR. We had just written a C compiler for our own CPU, using the open LCC compiler (the same one ImageCraft is based on, or at least was), so you only have to write the back end.

Are there other things I forgot?

And yes, I'm an opcode freak, and all this is just for fun!

So if you ask me for something, it will take 6 hours to get an answer, because I have work to do (IAR on a 6812: bad, bad).


Quote:

But I spent some time looking at the instruction set and at how I (read: we) would write a C compiler for the AVR,

Aahh, pure speculation.

Quote:

If "close to ASM" means 3-4 times slower, I'm OK with that (that is fine for the real world, but not for a contest, if you ask me).

No, in general I'd say 10% or less. The exception would be if a fragment or algorithm can take great advantage of things that just ain't in C. A short list includes algorithms that can make great use of the concept of Carry and/or Rotate; C has no "concept" of those. Another is variable widths other than 8-,16-,32-bits: if the algorithm can do 24-bit arithmetic to great advantage then standard C has no concept of that.
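
To make the rotate and 24-bit points concrete, here is a minimal sketch in portable C of what the programmer has to spell out by hand. The function names `rotl8` and `add24` are illustrative assumptions, not taken from any toolchain discussed in this thread, and whether a given compiler turns them into a single rotate or a three-byte add depends on the compiler.

```c
#include <stdint.h>

/* Rotate an 8-bit value left by one bit. C has no rotate operator, so the
   bit that falls off the top is moved to the bottom by hand. */
static uint8_t rotl8(uint8_t v)
{
    return (uint8_t)((v << 1) | (v >> 7));
}

/* Add two 24-bit quantities held in 32-bit variables, wrapping at 24 bits.
   Standard C has no 24-bit integer type, so the top byte is masked off. */
static uint32_t add24(uint32_t a, uint32_t b)
{
    return (a + b) & 0xFFFFFFul;
}
```

In assembler the first is one rotate-through-carry sequence and the second is one add followed by two add-with-carry instructions; in C the intent has to be recovered by the optimizer.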

But in general AVR microcontroller applications I'd see no reason why the modern compilers in the hands of a competent, experienced programmer should be more than a few percent larger/slower. Note that in some of the "beat this" contests, multiple C programmers with different toolchains actually beat the ASM "benchmark".

Before you lambaste me further, you might want to explore a few prior "C vs. ASM" and "Compiler Wars" battles that I have participated in. Read the whole threads to get the back-and-forth.

https://www.avrfreaks.net/index.p...

The thrust of the "contest":

Quote:
Hi, just to add a number to the discussion: “ASM” 20 clk vs “C” 92 clk for the “same” code.
I have programmed the same function, an interrupt-driven PWM routine for 4 standard pins on an ATmega16, in both assembler and C. To my horror, the C code, which did not use any for loops and only gets data and puts it on the pins, took 92 clk at the optimal speed settings in CodeVisionAVR. The assembler version took 20 clk.

Further discussion reveals that no way could it be a 20-clock fragment (the real code conveniently cannot be found), and the C versions are quite competitive thank you.

https://www.avrfreaks.net/index.p...

Aahh, the seminal thread on "Beat this, C breath!". My C versions are almost exactly the same size, and 8 cycles shorter than the "benchmark" ASM presented for the contest by a competent ASM programmer. See my other posted ISR in the thread, and >>you<< make it shorter in ASM. ;)

https://www.avrfreaks.net/index.p...

...and one on Compiler Wars. CV did nicely in this "contest" for size. There was another "contest" for size that I could dig up on how big a program is needed to do "Hello, World!" where the original speculation was a '2313 wasn't big enough. Plenty big with CV on the first pass, though the others came close after some gnashing of teeth. [Somewhere are another couple threads on speed--FP library IIRC.]

Review those, especially the middle one. Come back with a "Beat this, C breath!" contest and I'll be happy to participate.



Quote:
Aahh, the seminal thread on "Beat this, C breath!". My C versions are almost exactly the same size, and 8 cycles shorter than the "benchmark" ASM presented for the contest by a competent ASM programmer. See my other posted ISR in the thread, and >>you<< make it shorter in ASM.

Did you have to bring that up... :lol:


From the first link, I guess you mean this code.

/*----------------------------------------------------
  Timer 1 output compare A interrupt service routine*/
interrupt [TIM1_COMPA] void timer1_compa_isr(void)
{
   if(phase ==1){  //check what phase the program is in
      if (motorpointer==8){ //check which motor output is next
         phase=0; //set phase to next phase
      }else{
         motorpointer ++;   // point to the next output
         motorport=out[motorpointer];  //set the pins
         OCR1A= motordelays[motorpointer];
         //Set the timer to the next change on the pins.
      }

   }else if(phase ==0){   //step up the phase if phase ==0
      phase ++;   //step up
   }
}
//------------------------------------------------------

First, I don't think the code does what it was meant to do.
If we want to do this fast in ASM, it will be something like this:
3 low registers dedicated to this interrupt: R2, R3, R4
phase is a boolean (but takes up a byte): R5
motorpointer is a byte: R16
motorport is a byte in I/O space (a guess)
out is an array of 256 bytes in flash
motordelays is an array of 256 bytes in flash

1   movw r2,r30 ; save Z
1   in   r3,SFR
1   sbrs r5,0 ; check ph
1/2 rjmp L000 ; jmp if ph <> 1
1   cpi  r16,8 ; check motorpointer
1/2 breq L001
1   inc  r16 ; motorpointer++
1   mov ZL,r16
1   ldi ZH, (page of out array)
3   lpm ZH
1   out (motor port???)
1   ldi ZH, (page of motor delay)
3   lpm ZH
1   out OCR1A,ZH
L002:
1   movw r30,r2
1   out SFR,r3
4   iret
L000:
    inc r4 ; set phase
    rjmp L002
L001:
    eor r5,r5 ; clear phase
    rjmp L002
 

So that is 24 clk including the iret (27 with the hidden IRQ call); that is close to 20!
Moving phase to a low register adds 1 clk.
Moving the iret to the end adds 1 clk.

I will take a look at the rest later


About the PWM: is there a trap?
The way I see it, this will do:

;tick in R16
;the ADC's in R7..R14
cp  r16,r7
rol r17
cp  r16,r8
rol r17
cp  r16,r9
rol r17
cp  r16,r10
rol r17
cp  r16,r11
rol r17
cp  r16,r12
rol r17
cp  r16,r13
rol r17
cp  r16,r14
rol r17
out ,r17
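
For comparison, the same compare-and-shift idea can be sketched in C. This is only an illustration under assumptions: the name `pwm_bits` and the `duty` array (standing in for the eight values held in R7..R14) are hypothetical, and the result would still have to be written to whichever port the listing above leaves unnamed.

```c
#include <stdint.h>

/* Build the output byte for 8 software PWM channels.
   Each step shifts in a 1 while tick < duty[i], mirroring the pair
   "cp r16,rN / rol r17", where carry is set when r16 < rN (unsigned).
   duty[0] ends up in bit 7 and duty[7] in bit 0. */
static uint8_t pwm_bits(uint8_t tick, const uint8_t duty[8])
{
    uint8_t out = 0;
    for (uint8_t i = 0; i < 8; i++) {
        out = (uint8_t)((out << 1) | (tick < duty[i]));
    }
    return out;
}
```

The assembler version gets the comparison result for free in the carry flag; a C compiler has to recognise the pattern to produce the same 16 instructions, which is exactly the kind of case where carry and rotate matter.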


You aren't getting the point. Yes, you can always micro-manage. One of my profs had a saying: "Any program can be made one cell shorter." If you are the guru you can always shave off a word or a cycle, and apply recursively.

But the average competent ASM programmer is going to create average, competent programs. The decent C compiler in the hands of a competent operator can get to the same point just as fast (cycles) by a route no longer (words). Right, Lennart?

I gave them as examples of prior discussions. You can pick at them all you like. Note: if you are going to speed things up by

Quote:

If we want to do this fast in ASM, it will be something like this:
3 low registers dedicated to this interrupt: R2, R3, R4
phase is a boolean (but takes up a byte): R5
motorpointer is a byte: R16 ...

I can do exactly the same thing in my C program.

Here you go off on the tangent, again, picking at a prior thread. I'm still waiting for things from you:

A) Defend this statement of yours, with something other than "common knowledge". I dug up several benchmark studies that showed the opposite. If it is such common knowledge and apparent, surely there must be some numbers out there?

Quote:
On a AVR ..., but for a cost of a bigger program (counted in byte)

B) If you want to play "Beat this, C breath!", then bring it on. Post an ISR or an algorithm routine, self-contained. No, I ain't gonna take the time to recode a whole app. Besides the "contest", I like to see how my skills and toolchain stack up. There are certain ones that I >>know<< I can't beat with my compiler due to the code generation model, but GCC can.

C)

Quote:
I will take a look at the rest later

If you indeed want to micro-poke, then go to the second link and
Quote:
See my other posted ISR in the thread, and >>you<< make it shorter in ASM.
This was from a real app, the most time-critical I had ever written on an AVR. That ISR was indeed cycle-counted and massaged, and makes heavy use of global register variables. When I was done I could find no fat; it was as skinny as I could make it and resulted in exceeding the design specs. [I indeed have another version that used a dedicated register for SRAM save/restore so I can post that if needed--saves 4 cycles IIRC. The posted version was "fast enough" to meet my specs.]



Re c) OK, micro-poke away. Below is the version with the SREG save, and my notes say 6 cycles less. The debugging pin toggle could be removed for two cycles. Comments deleted in the generated code for brevity. C version below.

                 ;    5905 #pragma savereg-
                 ;    5906 interrupt [TIM1_CAPT] void timer1_capt_isr(void)
                 ;    5907 {
                 _timer1_capt_isr:
                 ;    5908 	REG_SREG = SREG;
000b66 b6ef      	IN   R14,63
                 ;    5909 // Debugging
                 ;    5910 TOGGLE_SPARE();
000b67 9a32      	SBI  0x6,2
                 ;    5916 	REG_PARTIAL = ICR1;
000b68 9060 0086
000b6a 9070 0087 	__GETWRMN 6,7,0,134
                 ;    5919 	REG_DIFFERENCE = REG_PARTIAL;	// force use of no other working registers
000b6c 0153      	MOVW R10,R6
                 ;    5920 	REG_DIFFERENCE -= REG_PREVIOUS;
000b6d 18a8
000b6e 08b9      	__SUBWRR 10,11,8,9
                 ;    5923 	REG_PREVIOUS = REG_PARTIAL;
000b6f 0143      	MOVW R8,R6
                 ;    5928 	REG_DIFFERENCE -= REG_TARGET;
000b70 18ac
000b71 08bd      	__SUBWRR 10,11,12,13
                 ;    5931 	REG_PARTIAL = REG_LEEWAY;
000b72 2c63      	MOV  R6,R3
000b73 2477      	CLR  R7
                 ;    5933 	if (REG_DIFFERENCE > REG_PARTIAL)
000b74 146a
000b75 047b      	__CPWRR 6,7,10,11
000b76 f410      	BRSH _0x1A0
                 ;    5937 		{
                 ;    5938 		// Not in window
                 ;    5939 		REG_COUNT = 0;
000b77 2444      	CLR  R4
                 ;    5940 		}
                 ;    5941 	else if (++REG_COUNT >= REG_NEEDED)
000b78 c006      	RJMP _0x1A1
                 _0x1A0:
000b79 9443      	INC  R4
000b7a 1445      	CP   R4,R5
000b7b f018      	BRLO _0x1A2
                 ;    5954 		flg_pulse_done = 1;
000b7c 9af0      	SBI  0x1E,0
                 ;    5963 		TIMSK1 = (unsigned char) (REG_PARTIAL>>8);
000b7d 9270 006f 	STS  111,R7
                 ;    5971 		}
                 ;    5972 
                 ;    5973 	SREG = REG_SREG;
                 _0x1A2:
                 _0x1A1:
000b7f beef      	OUT  0x3F,R14
                 ;    5974 }
000b80 9518      	RETI
                 ;    5975 #pragma savereg+

#pragma savereg-
interrupt [TIM1_CAPT] void timer1_capt_isr(void)
{
	REG_SREG = SREG;
// Debugging
TOGGLE_SPARE();

// Need extra 16-bit register to hold ICR1 value.  Re-use REG_PARTIAL since it
//	is not needed in this phase till a good echo is detected.

//	Get the new value
	REG_PARTIAL = ICR1;

// Calculate the difference from previous in 50ns counts
	REG_DIFFERENCE = REG_PARTIAL;	// force use of no other working registers
	REG_DIFFERENCE -= REG_PREVIOUS;

// Save for next time
	REG_PREVIOUS = REG_PARTIAL;

// If less than min target, or greater than max target, no good edges found
//
//	Do a single 16-bit comparison.  If too short, will be a negative number and very large
	REG_DIFFERENCE -= REG_TARGET;

//	Extend REG_LEEWAY so no other working registers
	REG_PARTIAL = REG_LEEWAY;

	if (REG_DIFFERENCE > REG_PARTIAL)
		{
		// Not in window
		REG_COUNT = 0;
		}
	else if (++REG_COUNT >= REG_NEEDED)
		{
		// A "hit" within the window--save
		// (note: WHOLE may have just rolled over)

// REG_WHOLE has the number of 3.2768ms periods
// REG_PREVIOUS has the partial; copy to REG_PARTIAL in mainline
// REG_COUNT has the number of good hits
// REG_TARGET + (REG_LEEWAY/2) is the (average) width of each echo pulse
//
//	so distance = REG_WHOLE:REG_PARTIAL - REG_COUNT * (REG_TARGET + (REG_LEEWAY/2))
//
		// mark as "done"
		flg_pulse_done = 1;

		// Cleanup:
		//	-- Disable this interrupt and the rollover OCIE1A
		//	-- Clear any bits in the TIFR1 for timer1 pending interrupts.

		// In order to avoid having the compiler use a temporary, use the high byte of
		//	REG_PARTIAL which was cleared above as a zero register for TIMSK1,
		//	and postpone the TIFR1 step till back in the mainline.
		TIMSK1 = (unsigned char) (REG_PARTIAL>>8);
		}

	SREG = REG_SREG;
}
#pragma savereg+
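
The "single 16-bit comparison" used in the ISR above is a general unsigned-arithmetic trick, and it can be sketched on its own in standard C. The function name `in_window` and its parameters are assumptions made for illustration; they mirror REG_DIFFERENCE, REG_TARGET and REG_LEEWAY but are not the original code.

```c
#include <stdint.h>
#include <stdbool.h>

/* True when target <= width <= target + leeway, using one unsigned compare.
   If width < target, the 16-bit subtraction wraps around to a very large
   value, so it fails the same test that rejects widths that are too long. */
static bool in_window(uint16_t width, uint16_t target, uint8_t leeway)
{
    uint16_t diff = (uint16_t)(width - target);
    return diff <= leeway;
}
```

For example, with a target of 1000 and a leeway of 20, a width of 999 gives a difference of 65535 and is rejected, while 1000 through 1020 are accepted. This replaces two range comparisons with one subtraction and one compare.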



I think sparrow2 is actually an alias of The Usch. Since nobody else will play the C versus assembler game with Lee anymore, he's had to become his own adversary. It's all that keeps him alive...

Quote:
One of my profs had a saying: "Any program can be made one cell shorter." If you are the guru you can always shave off a word or a cycle, and apply recursively.

I used to know a guy, who claimed that any amount of data could be compressed into 40 bytes. Never made any sense to me - why couldn't you combine two 40 byte packets in 80 bytes, and then compress them down to 40? He tried to explain this to me, but his English was bad, and my Greek almost non-existent, so I never found out. My theory is that he was crazy. Just out of idle curiosity, what nationality was your professor, Lee?

Four legs good, two legs bad, three legs stable.


Quote:

Since nobody else will play the C versus assembler game with Lee anymore, he's had to become his own adversary. It's all that keeps him alive...

lol. Perhaps I resemble that remark.

I'll be among the first to dive into a chunk of ASM when it suits the need. But I also will nearly always point out when the Emperor has no clothes. After listing specific points so that the bluster factor cannot overwhelm things, things have gotten quite quiet from sparrow2.

From above, I'm still waiting for:
A) evidence--ANY evidence--that x51 code is significantly more compact in a real app.
B) An ISR or fragment that cannot be done efficiently in C (given the rotate, carry, and data size restrictions that I mentioned).
C) Micro-poke for cycles at my posted ISR, completely in C.

As I expected, things got quiet. And now, after micro-poking for cycles throughout this thread he is advocating printf() and escape string decoding for LCD positioning...

Lee
