ASM: Can this be done faster?

This is not important :lol:

I’m still playing with my emulator, and now I’m playing with the ldd instruction (ld is just a ldd with q=0).

The instruction has this form

xxqx qqxd dddd xqqq

For easier reference I will name the bits like this:

x x q5 x q4 q3 x d4 d3 d2 d1 d0 x q2 q1 q0

I have the high byte in zl and the low byte in r16.
When done, r16=q and r17=d (but they can end up anywhere).

lddinst:
mov r17,r16
swap r17
andi r17,0x0f ; keep low 4 bits (d3..d0)
sbrc zl,0 ; check d4
ori r17,16 ; if d4=1 place the bit
andi r16,7 ; keep low 3 q bits (q2..q0)
sbrc zl,2 ; check q3
ori r16,8 ; if q3=1 place the bit
sbrc zl,3 ; check q4
ori r16,16 ; if q4=1 place the bit
sbrc zl,5 ; check q5
ori r16,32 ; if q5=1 place the bit
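
For checking the asm on a PC, here is a minimal C sketch of the same field extraction, assuming the bit layout given above. The names `ldd_fields` and `ldd_decode` are made up for illustration; this is only a reference model, not part of the emulator.

```c
#include <stdint.h>

/* Reference model of the field extraction above (host-side C, not AVR code).
 * Opcode bits, bit 15 first:
 *   high byte: x x q5 x q4 q3 x d4
 *   low byte : d3 d2 d1 d0 x q2 q1 q0
 * ldd_fields and ldd_decode are hypothetical names. */
typedef struct {
    uint8_t q;  /* 6-bit displacement, 0..63 */
    uint8_t d;  /* 5-bit register number, 0..31 */
} ldd_fields;

static ldd_fields ldd_decode(uint8_t hi, uint8_t lo)
{
    ldd_fields f;
    /* d3..d0 are the high nibble of the low byte, d4 is bit 0 of the high byte */
    f.d = (uint8_t)((lo >> 4) | ((hi & 0x01) << 4));
    f.q = (uint8_t)((lo & 0x07)          /* q2..q0 */
                  | ((hi & 0x0C) << 1)   /* q4,q3: high-byte bits 3,2 -> q bits 4,3 */
                  | (hi & 0x20));        /* q5 already sits at bit 5 */
    return f;
}
```

Feeding the same byte pairs to this function and to the asm routine should give matching q and d values.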

Is there a faster way? The problem is that q5 is an oddball for using rol etc.

Jens


Since those bits are in the high byte, which you already use for your jumps, have each unique vector jump to a separate load stub (ldi rx, value) that preloads a register with the high q bits and then jumps to the common code. This saves you a lot of shifting and combining; all you have to do is OR in the low 3 bits at the end.

The cost is an ldi and an rjmp (3 cycles), far less than the sbrc/ori sequence you have above.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

Last Edited: Sun. Dec 16, 2007 - 08:00 AM

Yes, but then I need to have (almost) the same code 8 times.

Jens


Sorry, now I understand what you mean:

lddsomeq1:
ldi r17,number1
rjmp lddcode

lddsomeq2:
ldi r17,number2
rjmp lddcode

lddsomeq3:
ldi r17,number3
rjmp lddcode

thanks

Jens


[edit] I see you got it while I was posting :) [/edit]

sparrow2 wrote:
Yes, but then I need to have (almost) the same code 8 times.

Jens

No, you have one common block and 8 unique loads.

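fetch:
  ld  r16, X+ ; opl (low opcode byte)
  ld  r17, X+ ; oph (high opcode byte)
  mov  r30, r17 ; ZL = oph; ZH must already hold the high byte of optable's word address
  ijmp ; execute it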
; end of fetch

optable:
  rjmp fetch ; entry 0x00, NOP
.
.
.
  rjmp ldd00 ; entry 0x80, ldd
  rjmp ldd00 ; entry 0x81, ldd
  rjmp std00 ; entry 0x82, std
  rjmp std00 ; entry 0x83, std
  rjmp ldd08 ; entry 0x84, ldd
  rjmp ldd08 ; entry 0x85, ldd
  rjmp std08 ; entry 0x86, std
  rjmp std08 ; entry 0x87, std
  rjmp ldd10 ; entry 0x88, ldd
  rjmp ldd10 ; entry 0x89, ldd
  rjmp std10 ; entry 0x8a, std
  rjmp std10 ; entry 0x8b, std
  rjmp ldd18 ; entry 0x8c, ldd
  rjmp ldd18 ; entry 0x8d, ldd
  rjmp std18 ; entry 0x8e, std
  rjmp std18 ; entry 0x8f, std
.
.
.
  rjmp ldd20 ; entry 0xa0, ldd
  rjmp ldd20 ; entry 0xa1, ldd
  rjmp std20 ; entry 0xa2, std
  rjmp std20 ; entry 0xa3, std
  rjmp ldd28 ; entry 0xa4, ldd
  rjmp ldd28 ; entry 0xa5, ldd
  rjmp std28 ; entry 0xa6, std
  rjmp std28 ; entry 0xa7, std
  rjmp ldd30 ; entry 0xa8, ldd
  rjmp ldd30 ; entry 0xa9, ldd
  rjmp std30 ; entry 0xaa, std
  rjmp std30 ; entry 0xab, std
  rjmp ldd38 ; entry 0xac, ldd
  rjmp ldd38 ; entry 0xad, ldd
  rjmp std38 ; entry 0xae, std
  rjmp std38 ; entry 0xaf, std
.
.
.
  rjmp em_sbrs ; entry 0xff
; end of opcode table

; dispatch routines
ldd00:
  ldi r18, 0x00
  rjmp ldd_common
ldd08:
  ldi r18, 0x08
  rjmp ldd_common
ldd10:
  ldi r18, 0x10
  rjmp ldd_common
ldd18:
  ldi r18, 0x18
  rjmp ldd_common
ldd20:
  ldi r18, 0x20
  rjmp ldd_common
ldd28:
  ldi r18, 0x28
  rjmp ldd_common
ldd30:
  ldi r18, 0x30
  rjmp ldd_common
ldd38:
  ldi r18, 0x38
  rjmp ldd_common

ldd_common:
  mov r19, r16 ; q2..q0 are in the low opcode byte (opl), not in oph
  andi r19, 0x07
  or  r18, r19 ; r18 = complete 6-bit q
  ; rest of ldd code here
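
To see why 8 load stubs are enough, here is a small C sketch of what the table plus stubs compute. It assumes the bit layout from the first post and that ldd/std high bytes have the pattern 10q0 xxxx (entries 0x80..0x8f and 0xa0..0xaf in the table above). The function names are hypothetical, and this is a host-side model only, not emulator code.

```c
#include <stdint.h>

/* Value a load stub would preload for a given opcode high byte:
 * q5 stays at bit 5, q4/q3 move from high-byte bits 3,2 to bits 4,3. */
static uint8_t q_high_bits(uint8_t hi)
{
    return (uint8_t)(((hi & 0x0C) << 1) | (hi & 0x20));
}

/* What the common block then does: OR in q2..q0 from the low opcode byte. */
static uint8_t ldd_q(uint8_t hi, uint8_t lo)
{
    return (uint8_t)(q_high_bits(hi) | (lo & 0x07));
}

/* Count how many different preload values occur over all ldd/std high bytes. */
static int count_distinct_preloads(void)
{
    uint64_t seen = 0;  /* bit n is set once preload value n has occurred */
    int count = 0;
    for (unsigned hi = 0x80; hi <= 0xBF; hi++) {
        if ((hi & 0xD0) != 0x80)
            continue;   /* not a 10q0 xxxx high byte */
        seen |= (uint64_t)1 << q_high_bits((uint8_t)hi);
    }
    for (int n = 0; n < 64; n++)
        count += (int)((seen >> n) & 1);
    return count;
}
```

Under these assumptions the count comes out as 8, matching the stubs ldd00 through ldd38, and `q_high_bits(0x84)` gives 0x08 just like the ldd08 stub.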



The problem is that I only save 2 clk:
7 clk in the old code,
5 clk in the new (you have an extra mov I don't need).
But if I make 2 whole code pieces, one with q5=0 and one with q5=1, it will be about the same: 5 clk for 0 and 6 for 1.
So it's actually only the 8 repeated copies that give a real gain; the code would then be 3 clk (for the q calc).
And for the emulator ldd is not a big problem, because it's a 2 clk instruction. There are some bad 1 clk instructions!
A bigger problem is that if you make an ldd pointing at SREG, the emulator will give a wrong result, because for speed I keep SREG in a register and don't do a st at the SREG address!
Otherwise load and store would need an extra check that slows them down. For now I don't plan on fixing it because it's not used that often.
But I plan on adding the check to IN and OUT.
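
The extra check mentioned above could look roughly like this C sketch. It assumes the classic AVR mapping where SREG sits at data address 0x5F (I/O address 0x3F), so adjust that for the target chip. The struct and the function names `emu_load` and `emu_store` are invented for illustration and are not taken from the emulator.

```c
#include <stdint.h>

/* Assumption: classic AVR mapping, SREG at I/O 0x3F = data address 0x5F. */
#define SREG_DATA_ADDR 0x5Fu

/* Hypothetical emulator state: SREG is cached outside memory for speed. */
typedef struct {
    uint8_t sreg;        /* emulated SREG, kept in a "register" */
    uint8_t mem[0x100];  /* small emulated data space, size chosen for the sketch */
} emu_state;

/* Load with the extra compare: reads of the SREG address use the cached copy. */
static uint8_t emu_load(const emu_state *s, uint16_t addr)
{
    if (addr == SREG_DATA_ADDR)
        return s->sreg;
    return s->mem[addr & 0xFF];
}

/* Store with the extra compare: writes to the SREG address update the cached copy. */
static void emu_store(emu_state *s, uint16_t addr, uint8_t value)
{
    if (addr == SREG_DATA_ADDR) {
        s->sreg = value;
        return;
    }
    s->mem[addr & 0xFF] = value;
}
```

The cost is one compare and branch on every emulated load and store, which is the slowdown described above.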