Quiz time.
I don’t know if it fits here but I will start it here.

I think we have a good mix of new and old people in this forum, so I want to start a little quiz: the old hands can probably say "yes, I remember", and the new ones can learn, or we can all learn.

My little start is simple:

How to swap two registers without changing any other registers?

What I want to solve is this:

;swap r16,r8

MOV r30,r16
MOV r16,r8
MOV r8,r30

But without changing r30.
Using IN and OUT is not allowed!
The code must work on any AVR, so no PUSH and POP.

The solution is 3 instructions in 3 clocks.

Hint: the Zero flag may change value.

Please don't post the solution until 48 hours after this post.

Have fun

Jens

Isn't this the classic xor swap?

This one comes up about once every month or two, so a thread search should find all the previous traffic. As tim says, it involves XOR and 0xaa/0x55's (there are other ways to do it, but that's the one I prefer).

Pushing the two registers then popping them in the other order would be another way.

Cliff

To Cliff:
A push/pop version would take 5 clk and need a stack!
How can it involve 0xaa/0x55? There is no EORI (XORI, to keep your notation) instruction on an AVR.

Edit: and it must work with the low registers.

sparrow2 wrote:
How to swap two registers without changing any other registers?

If I remember right, this question was already posted several years ago (and the answer).
The solution uses the same instruction 3 times.

Peter

My guess is that there are some restrictions that haven't been stated, like at least one of the registers has to be r16 - r31. I imagine the first two instructions are EOR A,B and EOR B,A. The third is left as an exercise to the student.

Chuck Baird

"I wish I were dumber so I could be more certain about my opinions. It looks fun." -- Scott Adams

http://www.cbaird.org

OK, Peter is correct, and sorry if it has been shown before; I did a search before I posted, without a result. But I totally believe Peter.
But at least it looks like Chuck and Cliff didn't know (or Cliff didn't read the question).
As Peter says, it's this:

;swap r6 and r7

EOR R6,R7
EOR R7,R6
EOR R6,R7
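
For anyone who wants to check why the three EORs work, here is a minimal sketch in C that models the two registers as bytes. The function name `xor_swap_u8` and the parameter names are made up for illustration; each line mirrors one EOR of the sequence above. As the original hint says, on the real part the flags (including Zero) are affected, which this model does not show.

```c
#include <stdint.h>

/* Models the three-EOR sequence on two 8-bit "registers".
 * reg_a and reg_b are illustrative names, not AVR syntax. */
void xor_swap_u8(uint8_t *reg_a, uint8_t *reg_b)
{
    *reg_a ^= *reg_b;   /* EOR A,B : A = a ^ b           */
    *reg_b ^= *reg_a;   /* EOR B,A : B = b ^ (a ^ b) = a */
    *reg_a ^= *reg_b;   /* EOR A,B : A = (a ^ b) ^ a = b */
}
```

No third storage location is used at any step, which is the whole point of the quiz.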

From Cliff's negative response I get the hint and will not do this again.

Jens

The solution has a Wikipedia article dedicated to it.

Quote:

Or Cliff didn't read the question

Guilty as charged :oops:

Cliff

Quote:
But at least it looks like Chuck and Cliff didn't know
No award for stumping Chuck. It happens all the time.

Nice technique. I had not seen it before.

Chuck Baird

Jens,
New to the forum.
I learned something.
So don't sweat the forum curmudgeons.

A

Why would you want to swap them in the first place?
Whatever you're trying to achieve, sparrow2 (if that's really your name) I'll bet there's a much easier way to do it.
I can't remember the last time I needed to do anything like this, and I've got a really bad memory, but it was probably back in the sixties, when byte-swapping parties were all the rage.
If programmers needed to do this sort of thing on a regular basis, then Brian and Guy would have built it into the C language.

Four legs good, two legs bad, three legs stable.

Quote:
Why would you want to swap them in the first place?

???

And let's get rid of all those silly load and store instructions that nobody ever uses. :)

Chuck Baird

Quote:

Why would you want to swap them in the first place?

E.G. when passing values to a function or subroutine.

Nice trick. :D I shall keep it in mind if I run out of registers. :)

If you think education is expensive, try ignorance.

Are you people on drugs or something? I will NEVER use this "trick". I will ensure that I put the values in the correct registers in the first place. If I wanted to be doing tricks I would have become a magician or a circus pony. Youth of today, swapping bytes, EORing things with each other all the time... Where's it going to end, I ask you?

Four legs good, two legs bad, three legs stable.

Spoken like a true curmudgeon. :D

If you think education is expensive, try ignorance.

Quote:
Spoken like a true curmudgeon.

And a Charter Member of the Turing Restoration Society (our motto: Achievement Through Perseverance)

Question: can you teach a circus pony to be a Turing Machine?

Chuck Baird

el oh el, flaming @ avrfreaks :D

I laughed loudly at this one ^^:

Quote:

???

And let's get rid of all those silly load and store instructions that nobody ever uses. :)

Quote:
Where's it going to end, I ask you?
Probably not soon enough.


Chuck Baird

Quote:

Three EORs is a trick

Is that result then a "one-trick pony"?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

And who will step forward and make the "ponyprog" joke?

Chuck Baird

lfmorrison

Quote:

The solution has a Wikipedia article dedicated to it.

Thanks for the info. From that article I learned that care should be taken if you make this a macro, because swapbyte r6,r6 gives the wrong result: the register is cleared to zero instead of being left unchanged.
But because I only use it in handwritten code, I would catch that one.
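
The same-operand pitfall is easy to reproduce in C. The sketch below is only an illustration; both function names are hypothetical. When both pointers name the same byte, the first XOR already produces zero and nothing can recover the value afterwards. A guard that compares the operands (the addresses, not the values) avoids it.

```c
#include <stdint.h>

/* Unguarded XOR swap: if x and y point at the same byte, it ends up 0. */
void xor_swap_unguarded(uint8_t *x, uint8_t *y)
{
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}

/* Guarded variant: do nothing when both operands are the same object.
 * Two different objects holding equal values still swap correctly. */
void xor_swap_guarded(uint8_t *x, uint8_t *y)
{
    if (x == y) {
        return;
    }
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}
```

In an assembler macro the equivalent check would have to compare the register numbers at assembly time; how to write that depends on the assembler, so it is left out here.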

Why ever use this?
First, on AVRs without RAM you often run out of registers.
When you write ASM code for a C function you often have a limited number of registers to work with.
The difference from using 3 MOV instructions is that this changes the flags. Sometimes that saves an instruction after the swap (e.g. a BRNE works directly); when you want to preserve the flags, use MOV.

Jens

Quote:

First, on AVRs without RAM you often run out of registers.

First, most of us have run out of AVRs without SRAM. At least in this millennium. The Tiny11 is scarce or non-existent. The Tiny12 costs more than a Tiny25.

Tiny28, with its boutique features? Hmmm, maybe.

Quote:

When you write ASM code for a C function you often have a limited number of registers to work with.

Not with my compiler.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Quote:

Not with my compiler.

There can be two reasons for that:
1) You or your compiler don't use register variables.

2) Your compiler spends a lot of time with PUSH and POP.

The reason I was thinking about this was the Uint16 to BCD routine, where I ended up with some misaligned registers, because hardware MUL and MOVW only work with even-aligned register pairs. (I didn't end up using it.)
I ended up at 70 clk in the worst case (about 100 clk if used in a C function, depending on the compiler), where the best C code is about 300 clk (and has problems with some high numbers).

Jens

Quote:

There can be two reasons for that:
1) You or your compiler don't use register variables.

2) Your compiler spends a lot of time with PUSH and POP.

3) None of the above.

I know the code generation model of my compiler. When I drop into #asm, I know what registers are available to use without repercussions. If I need a lot, >>I<< know what global and local register variables I am using; chances are that several of these are indeed used by me as scratchpads. In short, I have very few of the AVR GP registers unavailable in CodeVision when in #asm. And no, I don't have a lot of pushes and pops.

Yes, the first time one comes across the swap thingy it is a curiosity. But it has been around for like forever and rarely if ever gets used in modern processors. If it were that valuable the compilers themselves would generate the sequence, and I haven't seen it yet. So don't obsess over it. And if you keep coming back with supposed reasons I'll keep refuting them.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

You run out of work registers? If you come from the PIC world anything more than one work reg is like heaven ^^
I kinda like this trick though.

It seems to be rather hard to come up with genuinely useful tricks that nobody knew before; at least in this community.

By the way, this trick can be implemented using addition and subtraction, too.

Quote:
And who will step forward and make the "ponyprog" joke?

There once was a pony called Prog
who actually looked like a dog
He never was claimed to be good
because he was misunderstood
"Why wasn't I born as a frog"..

Quote:
Not with my compiler.

So your compiler creates new registers when it needs them? What a great feature!!

Quote:
Yes, the first time one comes across the swap thingy it is a curiosity. But it has been around for like forever and rarely if ever gets used in modern processors. If it were that valuable the compilers themselves would generate the sequence, and I haven't seen it yet. So don't obsess over it. And if you keep coming back with supposed reasons I'll keep refuting them.

I have seen it used in some C libraries, though there is really no necessity for it. But on a micro with limited registers it can be invaluable. Also, I wonder if some implementations of FORTH use it since SWAP is a basic action.

Quote:
Three EORs is a trick

And two wrongs don't make a right, but three do.

Regards,
Steve A.

The Board helps those that help themselves.

Easy, now, Steve--

Quote:
But on a micro with limited registers it can be invaluable.

So what does this have to do with an AVR "quiz"?

Quote:

So your compiler creates new registers when it needs them? What a great feature!!

Either you are taking things out of context, or extrapolating WAAYYYY out there.

I will concede that very few micros have an infinite number of registers--one could say that micros that can hit cache or internal RAM in a single-cycle have virtually infinite registers.

But I certainly did not imply that the compiler creates registers, did I? that was a response to a particular claim:

Quote:

Quote:

When you write ASM code for a C function you often have a limited numbers of reg. to work with.

Not with my compiler.

and the implication is that "limited" means "few of those available". You took my response to be "unlimited". Is THAT the way you would interpret sparrow's claim? I'd take it to mean the small percentage of AVR GP registers that are readily available when dropping into assembler in a code-generation model such as GCC's. And I continued refuting the PUSH-POP and lack of global register variables.

The fact remains that with CodeVision's model that when you drop into #asm there are very few registers--less than 10% of the GP registers--that you as the C programmer do not have total awareness of the contents and could freely manipulate without PUSH/POP worries. Are any of the other mainstream compilers as forgiving?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Quote:
Also, I wonder if some implementations of FORTH use it since SWAP is a basic action.

In a past life when Z80s ruled and IBM had just come out with the PC using the upstart 8088, I took FigFORTH written for the 8080 and ran it through a translator program that Intel had and created an x86 source. AFAICR at that time there was no x86 version of FigFORTH.

After filing off a couple of rough edges, the translated version worked. But was really ugly to those used to writing x86 for industrial control applications in ASM86 and PLM86. The dispatcher in the main loop took like 50 instructions to do a no-operation.

Examining and tweaking and recoding for full 16-bit operation I got the dispatch down to a very few instructions and also skinnied the primitives down a lot. Getting back to the topic, this was done by keeping TOP in AX. I actually had three versions that had similar performance: Just top-of-stack in a register; top two in registers; and top plus a "shadow" copy of the second which lived on the stack.

The "best" depended on the instruction mix, as a certain sequence that did a lot of stack manipulation caused more washing on-and-off the stack when in registers than when a fully SRAM stack where the pointer only needed adjustment. But in any of those models SWAP was fairly painless as I always had a spare "working" register anyway.

For a FORTH it was a screamer and I could beat the simple benchmarks of the commercial x86 offerings of the time.

x86 seemed like a lot of registers after the Z80 and 6809. ;) But the 68000 had 2x (?) the registers and the battle began...

Lee

Quote:

By the way, this trick can be implemented using addition and subtraction, too.

In 3 instructions?
I can only get A,B => -B,A.
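
Here is one way to see where the -B,A result comes from, sketched in C with bytes standing in for registers. The function names are hypothetical. It assumes only two-operand steps of the form dst = dst + src or dst = dst - src, like ADD Rd,Rr and SUB Rd,Rr; with that restriction three steps leave one register negated (modulo 256), and a fourth step such as NEG finishes the swap. It does not rule out some other three-instruction sequence.

```c
#include <stdint.h>

/* Three two-operand steps; ends with A = -b and B = a (mod 256). */
void addsub_three_steps(uint8_t *reg_a, uint8_t *reg_b)
{
    *reg_a = (uint8_t)(*reg_a - *reg_b);  /* SUB A,B : A = a - b           */
    *reg_b = (uint8_t)(*reg_b + *reg_a);  /* ADD B,A : B = b + (a - b) = a */
    *reg_a = (uint8_t)(*reg_a - *reg_b);  /* SUB A,B : A = (a - b) - a = -b */
}

/* Adding a fourth step (NEG A) completes the swap. */
void addsub_swap(uint8_t *reg_a, uint8_t *reg_b)
{
    addsub_three_steps(reg_a, reg_b);
    *reg_a = (uint8_t)(0u - *reg_a);      /* NEG A : A = b */
}
```

In plain C the third step could be written as a = b - a, but that is not a dst = dst - src form, which is why the register version comes up one instruction short.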

Quote:

x86 seemed like a lot of registers after the Z80 and 6809. But the 68000 had 2x (?) the registers and the battle began...

I came from the 6502 and Z80 to the x86 and 68000 at about the same time.
And yes, x86 is very nice if you stay in a COM file (64k), but it was messy using 20-bit addresses.
I love the clean structure of the 68000.
PLM 86, I remember, could take 15 min to compile and link. At that time I was already used to Turbo Pascal, first on CP/M and later on x86.

Jens

On the 8051 the XCH instruction was useful to read and clear a counter without losing a count:

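```
CLR  A          ; A = 0
XCH  A, TL0     ; exchange: A = low count, TL0 = 0
MOV  R7, A      ; keep the low byte in R7
CLR  A          ; A = 0
XCH  A, TH0     ; exchange: A = high count, TH0 = 0
MOV  R6, A      ; keep the high byte in R6
```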

This cannot be done on the AVR.

But naturally you can solve this task on the AVR too.
Simply store the previous reading and subtract it from the current reading.
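
A minimal sketch of that subtract-the-previous-reading idea in C. The helper name `counts_since` is made up, and how the 16-bit counter value is actually read is left out. Unsigned subtraction wraps modulo 65536, so the difference stays correct across a counter overflow as long as the counter wraps at most once between two readings.

```c
#include <stdint.h>

/* Hypothetical helper: returns how many counts have elapsed since the
 * reading stored in *previous, then remembers the current reading. */
uint16_t counts_since(uint16_t current, uint16_t *previous)
{
    uint16_t delta = (uint16_t)(current - *previous); /* wraps mod 65536 */
    *previous = current;
    return delta;
}
```

The counter itself is never cleared, so no count can be lost between the read and the clear.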

Peter

Quote:

x86 seemed like a lot of registers after the Z80 and 6809.

Though the 6809 was nice as the two stack pointers meshed nicely with Forth's requirement (I did it on a 6809 and also a PDP/11 myself ;-) )

As for Xch instructions I used to love the power of the Z80's "EX (SP), HL" it seemed to achieve so much more in one instruction than a lot of the other opcodes (with the possible exception of LDIR which was like a whole program in one opcode)

Z80 never ruled anything, IMO. LDIR? If memory serves it was something like 16 clocks per byte. Z80 was what happened when PR guys started to get involved in the microprocessor business. All the IX and IY instructions took so many clock cycles that a colleague of mine theorised that the IX and IY registers didn't really exist, they were two-timing with the HL or DE pairs. If you've ever written an assembler, or a disassembler for the Z80, you'll know what a top-heavy mongrel it was. There's no comparison with the 6809.
jsr[b,x], PCR (program counter relative addressing) multiple register pull and push instructions... drool....

Four legs good, two legs bad, three legs stable.

John,

Many cycled it may have been, but it wasn't too shabby in overall performance. How many of the people working in computers in the UK or Europe were brought up on a diet of Sinclair ZX80, ZX81, Spectrums, Radio Shack/Tandy TRS-80, Sharp MZ, Amstrad CPCs or Amstrad PCWs (not to mention the most excellent NC100 and NC200)? It could certainly perform OK in those environments.

Cliff

Remember that at that time most Z80s ran at 4 MHz (the 6809 at 1 MHz), and the Z80 could run CP/M (a lot of programs needed a Z80, such as my Poly (Turbo) Pascal).

Not fair. The 1 MHz 6809 really had E and Q out of phase, so it was like 2 MHz. A 2 MHz 68B09 was like a 4 MHz Z80, I think.

Imagecraft compiler user

John_A_Brown wrote:
Are you people on drugs or something? I will NEVER use this "trick"...

EORs are usually found in their "gloomy place". I learned that before kindergarten. :wink:

John

Quote:

Not fair. The 1 MHz 6809 really had E and Q out of phase, so it was like 2 MHz. A 2 MHz 68B09 was like a 4 MHz Z80, I think.

I do think it is fair, because we are talking about the speed of a memory cycle.
For me a 6809 is just a 'super' 6502.
And yes, the 6502, 6809 and Z80 were all made faster. I'm talking about the 'normal speed' during the boom in the early 80's.
My Commodore PET could be overclocked to 2 MHz (with new EPROMs; I had an original 1977 model), but the monitor could only show the picture up to 1.25 MHz (10 MHz xtal), and as I remember that was also the limit for still reading the tapes.

While benchmarks are a pretty meaningless indicator of speed they are probably a better indication of performance than raw clock speed. If you had a 1MHz AVR doing most ops in 1 cycle it might actually process faster than a Z80 doing 5 or 6 cycle (avg.) opcodes on 4MHz. (there again they are arguably more powerful opcodes - well some of them)
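
To put rough numbers on that comparison, here is a tiny sketch in C. The function name is made up, and the cycle counts are the post's own ballpark averages, not measurements.

```c
/* Rough throughput in millions of instructions per second:
 * clock in MHz divided by average cycles per instruction. */
double approx_mips(double clock_mhz, double avg_cycles_per_instruction)
{
    return clock_mhz / avg_cycles_per_instruction;
}
```

With those figures a 1 MHz AVR at one cycle per op comes out at about 1 MIPS, against roughly 0.67 to 0.8 MIPS for a 4 MHz Z80 at 5 or 6 cycles per opcode. It says nothing about how much work each instruction does, which is the caveat in the post.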

Historically, Intel CPUs were microcoded, as was the Z80A; Motorola and the 6502 used random-logic control units. With the multi-phase clock the 68/65 series did more per clock than their Intel brothers. As for overall performance, they were neck and neck, depending on what you were actually doing. I've spent years counting CPU cycles! AVRs, as they subscribe to the RISC philosophy, have many registers to cut down on memory access, as this was considered slow.

If you wanted more registers, you could always look at the AMD 29k series (now obsolete).

When I went over to the 8051, I was amazed at how fast the I/O was: one instruction of 12 clocks (~1 µs) to set/clear/flip an I/O bit, vs the Z80 read/modify/write that used a shiteload of CPU clocks to perform the same function. Receiving DMX512 became easy on the 8051 vs the hardware/software nightmare on the Z80.

Remember that the AVR is a Harvard architecture, so it can do more than one memory cycle at the same time.
On an AVR a RAM cycle takes 2 clk and a flash cycle 1 clk (16 bits wide), so it had better be better per clk, and it is, but at the cost of a bigger program (counted in bytes) and no code changes on the fly. (I know that the 94's can!)

Jens

Quote:

On an AVR ..., but at the cost of a bigger program (counted in bytes)

danni claims that he can get an x51 program same/smaller than an AVR. In a mix of small benchmarks the AVR code density compares quite well with other architectures so I'll ask you to give evidence to back up your claim of larger programs on an AVR.
http://www.maxim-ic.com/appnotes...

Table 2. TI Study Results: Code Size (no. of bytes)

| Application | MSP430F135 | ATmega8 | PIC18F242 | 8051 | H8/300L | MC68HC11 | MAXQ20 | ARM7-TDMI (Thumb) |
|---|---|---|---|---|---|---|---|---|
| 8-bit math | 172 | 116 | 386 | 141 | 354 | 285 | 352 | 660 |
| 8-bit matrix | 118 | 364 | 676 | 615 | 356 | 380 | 378 | 408 |
| 8-bit switch | 180 | 342 | 404 | 209 | 362 | 387 | 202 | 504 |
| 16-bit math | 172 | 174 | 598 | 361 | 564 | 315 | 286 | 676 |
| 16-bit matrix | 156 | 570 | 846 | 825 | 450 | 490 | 526 | 428 |
| 16-bit switch | 178 | 388 | 572 | 326 | 404 | 405 | 188 | 504 |
| 32-bit math | 250 | 316 | 960 | 723 | 876 | 962 | 338 | 620 |
| Floating point | 662 | 1042 | 1778 | 1420 | 1450 | 1429 | 1596 | 1556 |
| FIR filter | 668 | 1292 | 2146 | 1915 | 1588 | 1470 | 1828 | 1420 |
| Matrix multiply | 252 | 510 | 936 | 345 | 462 | 499 | 494 | 432 |
| TOTALS | 2808 | 5114 | 9302 | 6880 | 6866 | 6622 | 6188 | 7208 |

Table 4. Results from Maxim's Study: Code Size (no. of bytes)

| Application | MSP430F149 IAR (Small) | MSP430F149 IAR (Fast) | MSP430F149 Rowley (Small) | MSP430F149 Rowley (Fast) | ATmega8 IAR (Small) | ATmega8 IAR (Fast) | ATmega8 Rowley (Small) | ATmega8 Rowley (Fast) | MAXQ2000 Rowley (Small) | MAXQ2000 Rowley (Fast) |
|---|---|---|---|---|---|---|---|---|---|---|
| 8-bit math | 192 | 192 | 258 | 262 | 98 | 98 | 212 | 212 | 248 | 284 |
| 8-bit matrix | 152 | 180 | 240 | 232 | 318 | 304 | 220 | 250 | 202 | 222 |
| 8-bit switch | 180 | 180 | 230 | 230 | 312 | 164 | 202 | 200 | 152 | 152 |
| 16-bit math | 140 | 140 | 220 | 220 | 162 | 154 | 222 | 238 | 162 | 164 |
| 16-bit matrix | 240 | 240 | 312 | 312 | 398 | 374 | 294 | 350 | 260 | 378 |
| 16-bit switch | 178 | 178 | 230 | 230 | 346 | 178 | 212 | 240 | 152 | 152 |
| 32-bit math | 236 | 236 | 284 | 388 | 306 | 296 | 380 | 460 | 274 | 324 |
| Floating point | 1100 | 1100 | 966 | 1004 | 1026 | 1046 | 816 | 936 | 1018 | 1090 |
| FIR filter | 1178 | 1174 | 924 | 966 | 1258 | 1258 | 860 | 896 | 1024 | 1044 |
| Matrix multiply | 266 | 250 | 312 | 316 | 476 | 324 | 294 | 348 | 254 | 264 |
| TOTALS | 3862 | 3870 | 4076 | 4160 | 4700 | 4196 | 3712 | 4130 | 3746 | 4074 |

The summary when you take everything in that document into account is that the AVR code density is fine, and roughly the same as other architectures. If you refute that, please give other third-party evidence. When you read the whole paper you find that a lowly Mega8 running at the AVR's modest clock speed ain't no slouch in the throughput department either.

Lee

I will agree with danni: 'normal' code on an 8051 is more compact (but not as fast).
The problem is to define 'normal'; I don't have any tools here, so I can't produce the numbers!
But using the Keil compiler would probably help the 8051.
A lot of the things that are very good on an 8051 are hard to implement in C.

Jens

Edit: I think the problem is that 'normal' today is 16-bit (your numbers were made for the 430, which is 16-bit). The 8051 loves 8-bit.

Quote:

I will agree with Danny the 'normal' code on a 8051 is more compact.

Why would you, without any evidence?!? At least I'm referring to a third-party set of benchmarks with two implementations. There is no evidence there of the AVR having "a cost of a bigger program". danni is a craftsman and can squeeze whatever he has to work with but let's have him chime in--even if his x51 is the same size or smaller, is it significantly smaller such that you can support your claim? At this point it seems to be unfounded speculation, or parroting of "common knowledge".

Lee

Let's look at more numbers, from a very unbiased source--Atmel itself:
http://mathcs.slu.edu/~fritts/cs...

| Device | Max Speed [MHz] | Code Size [Bytes] | Cycles | Execution Time [µs] |
|---|---|---|---|---|
| ATmega16 | 16 | 32 | 227 | 14.2 |
| MSP430 | 8 | 34 | 246 | 30.8 |
| T89C51RD2 | 20 | 57 | 4200 | 210.0 |
| PIC18F452 | 40 | 92 | 716 | 17.9 |
| PIC16C74 | 20 | 87 | 2492 | 124.6 |
| 68HC11 | 12 | 59 | 1238 | 103.2 |

It may not come out in columns, but the AVR is ~40% smaller than the x51. A tiny benchmark, eh? Then let's scroll down a chart for a suite of "real programs"--hmmm, 19 different architectures, and the AVR is the >>smallest<<! x51 is 50% larger. How CAN you say a blanket "at the cost of a bigger program"?!?
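
The arithmetic behind those figures can be checked with a couple of one-liners. This is just a sketch in C with made-up function names; the inputs are the numbers from the listing above.

```c
#include <stdint.h>

/* Execution time in microseconds = cycles / clock in MHz. */
double exec_time_us(uint32_t cycles, double clock_mhz)
{
    return (double)cycles / clock_mhz;
}

/* How much smaller size_a is than size_b, as a percentage of size_b. */
double percent_smaller(uint32_t size_a, uint32_t size_b)
{
    return 100.0 * ((double)size_b - (double)size_a) / (double)size_b;
}
```

For the ATmega16 row, 227 cycles at 16 MHz is about 14.2 µs, and the T89C51RD2 row gives 4200 / 20 = 210 µs, matching the listing. For code size, 32 bytes against 57 bytes works out to about 44% smaller, in line with the "~40%" quoted.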

"The Emperor has no clothes!"

Lee

Last Edited: Thu. Jan 15, 2009 - 07:56 PM

Speaking of clothes, what about the real issues? Which chip is cuter? Which is going to look marvelous on the runway?

Chuck Baird

"I wish I were dumber so I could be more certain about my opinions. It looks fun." -- Scott Adams

http://www.cbaird.org

Quote:

Which chip is cuter?

A chip off the old block? Or the chip on Smiley's shoulder? :roll:

If you think education is expensive, try ignorance.

Quote:

Why would you, without any evidence?!?

From about '90 to '98 I programmed 8051s, and when the AVR came out we ran a lot of tests (yes, on the old version).
At that time we were making our own CPU that produced even smaller C code (a true stack machine with addressing like the transputer), so I do know!
And again, it depends on how the compiler is set up.
I downloaded the compiler you use just to see how it can avoid push/pop. Everything for a function is pushed onto the data stack using Y, and when there are local variables in a function it pushes whatever else was in R16-R21. But yes, the instructions PUSH and POP aren't used.
And even when optimized for maximum speed, this code
a+=*p; //both int
calls a subroutine! (Then it's easy to make small code.)
OK, if you have a lot of 16-bit code in your C program, I can cut it down to about 60-70% by writing an 'interpreter', and the speed perhaps drops 10-30% (a call/ret = 7 clk; getting a token and returning takes 12 clk, so the overhead isn't that big).

Jens
If you really want small code (ASM), take a look at a SAM47xxxxx.

So, in all that I guess it boils down to "everyone knows that AVR produces bigger code". You mention tests but give no details.

Note that the AVR compilers 10 years ago were new.

I doubt if Atmel used Codevision; they are tight with IAR.

I'm glad that your AVR C compiler emits "perfect" code. Obviously Codevision is vastly deficient. Given that I can pack all I need to into modest-sized AVRs and don't come close to running out of cycles, I just keep muddling along with these production AVR apps, month after month, using the crappy old AVR architecture.

In my old fuddy-duddy style I use few pointers at all, and can't ever remember using a multi-byte source/destination except for manipulating and passing small structures.

True, most of the time parameters are stacked. Not always but close enough for a generality. It works fine for me. Pointers could be improved especially in tight loops.

If the code emitted by CV is so crappy how come I can match the "you can't do this in C and come close to ASM" "contests"?

Lee