Basic question about opcodes


I am trying to get my feet wet with embedded systems by creating a little assembler for my ATmega microcontroller.
I got stuck on something so basic that I'm ashamed to ask (I'm quite new to all this), but here goes anyway:

If I take an opcode, like for example (a simple one) 'RET', which is:

| 1001 | 0101 | 0000 | 1000 |

and if I simply convert this to hexadecimal, I get: 0x9508
but what I should have got (according to the output of avr-as) is: 0x0895

For some, probably obvious, reason the two bytes are swapped. I cannot explain why; can somebody else, please?

Thank you in advance.


Quote:
For some, probably obvious, reason the two bytes are swapped.

They are not swapped; you are reading them backwards.

Regards,
Steve A.

The Board helps those that help themselves.


I think it's about endianness. Check this link out, just in case: http://en.wikipedia.org/wiki/End...


Drahcir wrote:
I am trying to get my feet wet with embedded systems by creating a little assembler for my ATmega microcontroller.

I'm curious. Most people start off by blinking an LED or something simple like that. Why are you writing an assembler to run on a microcontroller?

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius

               PORT_INIT:
000011 e000        ldi   temp,0x00
000012 bb08        out   PORTB,temp
000013 e10f        ldi   temp,0x1f
000014 bb07        out   DDRB,temp
000015 bb08        out   portb,temp
000016 9508        ret

I copied the above snippet from a listing of an assembly program. The last line shows that the RET instruction code is exactly as defined in the AVR Instruction Set Manual.

How did you get 0x0895?

I'll believe corporations
are people when Texas executes one.


tubecut wrote:
How did you get 0x0895?

Page 115 of this:

Instruction Set Manual


Drahcir wrote:
For some, probably obvious, reason the two bytes are swapped.

The two bytes aren't swapped; it's just how they are stored in memory and displayed.

As you stated, the opcode is 0x9508, which is a 16-bit word with a high byte and a low byte. On a little-endian device, the low byte is stored first (at the lower address), followed by the high byte. On a big-endian device it is just the opposite. For opcodes, the AVR is little-endian.

Here is a short test program I wrote:

#include <stdint.h>

/* x is volatile so the compiler cannot optimize the store away */
volatile uint8_t x;

void Test_Endian_ness(void)
{
    x = 2;
}

int main(void)
{
    Test_Endian_ness();

    while (1)
    {
    }
}

Here is the .lss file that was generated:

volatile uint8_t x;

void Test_Endian_ness(void)
{
	x = 2;
  9c:	82 e0       	ldi	r24, 0x02	; 2
  9e:	80 93 00 01 	sts	0x0100, r24
	
}
  a2:	08 95       	ret

000000a4 <main>:

int main(void)
{
    Test_Endian_ness();
  a4:	0e 94 4e 00 	call	0x9c	; 0x9c
  a8:	ff cf       	rjmp	.-2      	; 0xa8

000000aa <_exit>:
  aa:	f8 94       	cli

000000ac <__stop_program>:
  ac:	ff cf       	rjmp	.-2      	; 0xac <__stop_program>

Here is part of the .hex file (spaces added to highlight the 0895):

:100090000E9452000C9455000C94000082E0809362
:0E00A0000001 0895 0E944E00FFCFF894FFCF9C
:00000001FF

As this shows, for opcode storage the AVR is little-endian: the low byte of the opcode is stored first, at the lower memory address, followed by the high byte.
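
For someone writing their own assembler, the storage order described above could be applied when emitting each 16-bit opcode. This is only a minimal sketch: the function name `emit_opcode16` and the output-buffer layout are illustrative assumptions, not part of avr-as or any AVR toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append one 16-bit AVR opcode to an output buffer
 * in flash (little-endian) order: low byte first, then high byte.
 * The caller must ensure the buffer has room for two more bytes.
 * Returns the new write offset. */
size_t emit_opcode16(uint8_t *out, size_t offset, uint16_t opcode)
{
    assert(out != NULL);
    out[offset]     = (uint8_t)(opcode & 0xFF);  /* low byte, lower address */
    out[offset + 1] = (uint8_t)(opcode >> 8);    /* high byte, next address */
    return offset + 2;
}
```

Called with `opcode = 0x9508` (RET), this writes 0x08 followed by 0x95, which matches the `08 95` seen in the .lss and .hex output.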

For more on endianness in the AVR, see this post by lfmorrison.


Quote:

Drahcir wrote:
I am trying to get my feet wet with embedded systems by creating a little assembler for my ATmega microcontroller.

I'm curious. Most people start off by blinking an LED or something simple like that. Why are you writing an assembler to run on a microcontroller?


LOL. It is a time-immemorial thing, I think: language developers are always intrigued by writing a compiler in its own language. A simple corollary is writing self-hosted tools.

[In a past life in a big app using x86 (80186), Intel tools were used. First on "blue box" and VAX, and later on PCs. However, there was a little debugging kernel that had disassembly and instruction assembly. And indeed the disassembled version could be "dumped" to a file, and within some limits a source file could be loaded.]

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Quote:
Page 115 of this:

I read 0x9508 as the RET instruction code in the manual.

If you examine the code snippet I posted, which shows the op code generated by the assembler, the list file shows the same value.

Is the OP dissecting the hex output file?

I have not attempted to examine the contents of a hex file in a long time. If the hex file is what reverses the high/low bytes, then that is a property of the hex format and doesn't change the actual op code as defined by Atmel.

I do recall that with my first computer I did not have any tools like an assembler. I had to manually build up the op code exactly as specified in the micro's documentation and enter it at the desired memory location. If the high/low bytes had been reversed, it would not have worked as expected for the op code I wanted.

If you have to know how the op code is stored, then I guess you could use the AS debugger and examine the flash contents.

I think chuck99's post indicates that the AVR 'stores' code as little-endian, and this does not change the op code.



Quote:

I read 0x9508 as the RET instruction code in the manual.

That's because humans naturally read big-endian. It's already established above that the AVR stores code little-endian, and that's why the line:

:0E00A0000001 0895 0E944E00FFCFF894FFCF9C

unequivocally holds 0x08, 0x95 in consecutive bytes, but when the AVR makes the 16-bit fetch into its instruction decoder it will see 0x9508. There is no mystery here, just endianness.
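
The fetch described above can be mimicked in a disassembler with a few lines of C. This is a hedged sketch only; `fetch_opcode16` is a hypothetical name, and it models the byte order rather than any real AVR hardware interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combine two consecutive flash bytes into the
 * 16-bit opcode value. The byte at the lower address is the low byte
 * of the word (little-endian). */
uint16_t fetch_opcode16(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return (uint16_t)(bytes[0] | ((uint16_t)bytes[1] << 8));
}
```

Given the consecutive bytes 0x08, 0x95, the function returns 0x9508, the RET encoding as printed in the instruction set manual.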

Further interesting observations. If I start AS4 and write an avr-gcc program:

int main(void) {
}

this ends with:

int main(void) {
}
  7c:	80 e0       	ldi	r24, 0x00	; 0
  7e:	90 e0       	ldi	r25, 0x00	; 0
  80:	08 95       	ret

Here avr-gcc is very "byte-centric" and simply lists the opcode bytes in the order they are generated. If I now load that selfsame .elf file into the simulator and observe it with Atmel's disassembler, I see:

+0000003E:   E080        LDI       R24,0x00       
+0000003F:   E090        LDI       R25,0x00       
+00000040:   9508        RET         

For one thing, the addressing has changed (the numbers have halved: 0x3E = 0x7C / 2, etc.) because it is showing word addresses rather than byte addresses. The opcode encoding has also apparently changed, because the disassembler chooses to display the opcodes as their big-endian 16-bit values. Meanwhile the .hex file has:

:100070000E943E000C9441000C94000080E090E04F
:060080000895F894FFCF83

which contains:

                                80 E0 90 E0
          08 95

showing that in flash memory they are actually in little endian order.
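
To check this kind of thing programmatically, a single Intel HEX data record can be decoded as sketched below. The function names are hypothetical, and the sketch assumes the record is passed without a trailing newline and handles only record type 00 (data). Note that the address field inside the record is big-endian, while the AVR opcode bytes in the payload are little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode one Intel HEX data record (type 00).
 * Copies the payload into data[] (room for 255 bytes), stores the
 * record's byte address in *byte_addr, and returns the payload length,
 * or -1 if the record is malformed, not type 00, or fails its checksum. */
int parse_hex_data_record(const char *rec, uint8_t *data, uint16_t *byte_addr)
{
    uint8_t raw[5 + 255];  /* count, addr hi, addr lo, type, data, checksum */
    size_t len, n = 0;
    uint8_t sum = 0;

    assert(rec != NULL && data != NULL && byte_addr != NULL);
    len = strlen(rec);
    if (len < 11 || rec[0] != ':' || (len - 1) % 2 != 0)
        return -1;
    if ((len - 1) / 2 > sizeof raw)
        return -1;

    /* Turn each pair of hex digits into one byte. */
    for (size_t i = 1; i < len; i += 2) {
        int hi = hex_nibble(rec[i]);
        int lo = hex_nibble(rec[i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        raw[n++] = (uint8_t)(hi << 4 | lo);
    }
    if (n != (size_t)raw[0] + 5 || raw[3] != 0x00)
        return -1;

    /* All bytes, including the checksum, must sum to 0 modulo 256. */
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + raw[i]);
    if (sum != 0)
        return -1;

    *byte_addr = (uint16_t)(raw[1] << 8 | raw[2]);
    memcpy(data, raw + 4, raw[0]);
    return raw[0];
}
```

For the record `:060080000895F894FFCF83` quoted above, this yields byte address 0x0080 (word address 0x40 after dividing by two, the address Atmel's disassembler shows for RET) and a six-byte payload beginning 08 95.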


Drahcir wrote:
I am trying to get my feet wet with embedded systems by creating a little assembler for my ATmega microcontroller.

I don't read that as the OP wanting to write an assembler; that would hardly be a feet-wetting task.

I read it as the OP trying to write a small amount of ASM code, looking at the assembler output, and getting confused.

...but until the OP increases his post count to >1, the forum can carry on hypothesising and speculating like it always does ;)


I actually am writing a little assembler, by disassembling hex files, to get my feet wet :) It's just a puzzle hobby; some people like making sudokus.

At first it didn't work, but when I examined a working hex file I saw that for some reason I had to flip the bytes around. When I did this, everything worked.
Then I found out that it didn't work for 32-bit opcodes, so I thought that I'd better find out why those bytes were not in the place I expected them to be.

I understand now what I was doing wrong, thank you all for the information.
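
A likely explanation for the 32-bit case, judging from the listings earlier in the thread (`80 93 00 01` for `sts 0x0100, r24` and `0e 94 4e 00` for `call 0x9c`): a 32-bit instruction appears to be stored as two 16-bit words in program order, each word low byte first, rather than as one 32-bit value with all four bytes reversed. A minimal sketch under that assumption, with `emit_opcode32` as a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write a two-word (32-bit) AVR instruction.
 * The first word is stored first, and each 16-bit word is stored low
 * byte first; the four bytes are NOT reversed as a single 32-bit unit.
 * Returns the new write offset. */
size_t emit_opcode32(uint8_t *out, size_t offset,
                     uint16_t word1, uint16_t word2)
{
    assert(out != NULL);
    out[offset + 0] = (uint8_t)(word1 & 0xFF);
    out[offset + 1] = (uint8_t)(word1 >> 8);
    out[offset + 2] = (uint8_t)(word2 & 0xFF);
    out[offset + 3] = (uint8_t)(word2 >> 8);
    return offset + 4;
}
```

With `word1 = 0x9380` and `word2 = 0x0100` this produces 80 93 00 01, the byte sequence the .lss listing shows for `sts 0x0100, r24`; reversing all four bytes would instead give 00 01 80 93.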


UmHum, now I see. Drahcir has TCS - Terminal Curiosity Syndrome. Very common amongst techno-types. There's no known cure.