atmega binary code asks me riddles

I am currently trying to understand the hex code of an atmega processor.

If I look into my main.lss file, I see

0c 94 d8 01 	jmp	0x3b0

which is a simple jump to address 0x3b0.

What I'm puzzling over is how the '0x3b0' value is encoded within '0c 94 d8 01'. As an atmega is a little-endian machine, the real reading is likely "94 0c" to get the opcode, and then "01 d8" to get the jump-address.
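
A minimal sketch of that byte-swapped reading, in Python. It assumes the four bytes from the .lss line are two 16-bit words stored low byte first; the helper name `words_from_bytes` is made up for illustration.

```python
import struct

def words_from_bytes(raw):
    """Interpret raw bytes as little-endian 16-bit words (low byte first)."""
    if len(raw) % 2 != 0:
        raise ValueError("expected an even number of bytes")
    return struct.unpack("<%dH" % (len(raw) // 2), raw)

# The bytes exactly as printed in the .lss line for "jmp 0x3b0".
raw = bytes.fromhex("0c 94 d8 01")
print(["0x%04x" % w for w in words_from_bytes(raw)])  # ['0x940c', '0x01d8']
```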

I have absolutely no clue how the destination address is encoded in that fragment. I cannot find any logical pattern, and it gets even worse when I look at the next entry in the lss-file:

0c 94 ea 01 	jmp	0x3d4

which is again a jump, but does not reveal any arithmetic rule to calculate the different offsets of both jumps. The story continues with other types of opcodes: The corresponding parameters do not seem to be self-evident.

I know that there is of course such a rule - otherwise these processors wouldn't work :?

But how is it done?

I am asking this because I am trying to build a disassembler. I've already been successful in doing so for e.g. a 68HC11 or an 8051. But for the atmega I simply have no clue how values/addresses can be derived from the hexadecimal code.

Can someone point me in the right direction or to some documentation?

PS: The atmega-manuals (instruction set) I've seen so far didn't really help me.

I'm fairly certain that there is a document somewhere that shows how the opcodes are constructed.
Oh look! Here it is!
http://www.atmel.com/images/doc0856.pdf

Four legs good, two legs bad, three legs stable.

John_A_Brown wrote:
I'm fairly certain that there is a document somewhere that shows how the opcodes are constructed.
Oh look! Here it is!
http://www.atmel.com/images/doc0856.pdf

By the way, all I did was google for "AVR Instruction set"

Four legs good, two legs bad, three legs stable.

0x3b0 = 2 * 0x1d8
0x3d4 = 2 * 0x1ea

I will leave it up to the reader as to why.

Regards,
Steve A.

The Board helps those that help themselves.

Quote:

PS: The atmega-manuals (instruction set) I've seen so far didn't really help me.


??? Say what?

www.atmel.com/images/doc0856.pdf

JMP instruction:

Syntax: (i) JMP k
Operands: 0 ≤ k < 4M
Program Counter: PC ← k
Stack: Unchanged
32-bit Opcode:
1001 010k kkkk 110k
kkkk kkkk kkkk kkkk
Words: 2 (4 bytes)
Cycles: 3

I do have to partially agree with you, though--the parts of kkkk aren't obvious as to bit order.

You should be able to find existing disassembler work to get you started, at least with the decoding. Five entries in the Projects section of this Web site alone.

Quote:
User projects
Project Type Compiler
Simple AVR Disassembler
Posted: 2013-09-06
Simple AVR Disassembler

ReAVR
Posted: 2012-05-25
Interactive AVR Disassembler creating asm source
Complete code AVR Assembler
Disassembler ATMEL AVR
Posted: 2008-05-10
Disassembler for ATMEL AVR hex files
Complete code AVR Assembler
AVR disassembler
Posted: 2006-11-05
Disassembler for Atmel AVR chips for GNU/Linux
Complete code
AVR opcodes
Posted: 2006-11-05
AVR opcodes taken apart
Part-specific Lib.functions
Articles

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

John_A_Brown wrote:
John_A_Brown wrote:
I'm fairly certain that there is a document somewhere that shows how the opcodes are constructed.
Oh look! Here it is!
http://www.atmel.com/images/doc0856.pdf

By the way, all I did was google for "AVR Instruction set"

All you should do is to read my first post: I know this document. It didn't help me.

Koshchi wrote:
0x3b0 = 2 * 0x1d8
0x3d4 = 2 * 0x1ea

I will leave it up to the reader as to why.

Thank you Steve! I tried a lot of complicated transformations coming to my mind, and didn't expect this to be that simple.

theusch wrote:

??? Say what?

www.atmel.com/images/doc0856.pdf

JMP instruction:

Syntax: (i) JMP k
Operands: 0 ≤ k < 4M
Program Counter: PC ← k
Stack: Unchanged
32-bit Opcode:
1001 010k kkkk 110k
kkkk kkkk kkkk kkkk
Words: 2 (4 bytes)
Cycles: 3


Thank you, I had seen this paragraph in the manual but simply didn't understand it. Maybe I messed up my bit-fiddling.

It was not clear to me to what extent the little-endian rule would apply to the kkk...kkk collection of bits.

Last Edited: Thu. Apr 17, 2014 - 05:19 PM

Quote:

All you should do is to read my first post: I know this document. It didn't help me.

If it didn't help you, how could you possibly say:

Quote:

What I'm puzzling over is how the '0x3b0' value is encoded within '0c 94 d8 01'. As an atmega is a little-endian machine, the real reading is likely "94 0c" to get the opcode, and then "01 d8" to get the jump-address.

The bit chart of the op code, from that document, shows exactly where the "opcode" bits are.

IIRC the operand bits are always MSB to LSB but a few trials should help. Or consult one of the projects I listed.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

> how could you possibly say: ...

Because one sometimes simply needs a kick to get over a mental block.

emax.. wrote:
John_A_Brown wrote:
John_A_Brown wrote:
I'm fairly certain that there is a document somewhere that shows how the opcodes are constructed.
Oh look! Here it is!
http://www.atmel.com/images/doc0856.pdf

By the way, all I did was google for "AVR Instruction set"

All you should do is to read my first post: I know this document. It didn't help me.


I did read your first post. That should be fairly evident since I responded to it. What do you think I do? Make posts at random and hope they might be pertinant in some way? :D

Four legs good, two legs bad, three legs stable.

@Theusch:

Ok, maybe I am too dumb, but I am still not understanding the documentation. Perhaps you can help me to find my error in reasoning.

The manual states:

Quote:
1001 010k kkkk 110k
kkkk kkkk kkkk kkkk

which is in one line

Quote:
1001 010k kkkk 110k kkkk kkkk kkkk kkkk

My opcode is 0c 94 d8 01. Read as two little-endian 16-bit words, this yields

Quote:
1001 0100 0000 1100 0000 0001 1101 1000

If I mask out the non-'k' values, I get:

Quote:

1001 010k kkkk 110k kkkk kkkk kkkk kkkk < opcode pattern
1001 0100 0000 1100 0000 0001 1101 1000 < opcode in binary file
xxxx xxx0 0000 xxx0 0000 0001 1101 1000 < masked out for 'k' bits only

So the final bitpattern is (if I interpreted the manual correctly):

00-0000-0000-0001-1101-1000 == 0x0001D8

This is a 22-bit value as described in the manual: 4M address space, fine. That's exactly where I already was yesterday.
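
The masking steps above can be sketched in Python as follows. This is only an illustration: the function name `jmp_k_field` is made up, and it assumes the k bits run from most to least significant in the order the manual prints them, with the six k bits of the first word forming the top of the 22-bit value.

```python
JMP_MASK = 0xFE0E   # the fixed (non-k) bits of the first word: 1111 1110 0000 1110
JMP_BITS = 0x940C   # their required values:                    1001 010. .... 110.

def jmp_k_field(word1, word2):
    """Return the 22-bit k operand of a JMP given its two 16-bit words."""
    if word1 & JMP_MASK != JMP_BITS:
        raise ValueError("first word does not match the JMP pattern")
    # Bits 8..4 of word1 become k bits 21..17, bit 0 of word1 becomes k bit 16.
    high = ((word1 >> 3) & 0x3E) | (word1 & 0x01)
    return (high << 16) | word2

print(hex(jmp_k_field(0x940C, 0x01D8)))  # 0x1d8
```

For the words 0x940c and 0x01d8 all six k bits in the first word are zero, so the result is simply the second word.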

But I still get 0x01D8. Now that Koshchi has given the hint that this is exactly half of the address in question, I could simply multiply all such values by two. But this is not the complete story: I still cannot find the crucial indication for that rule in the manual.

What did I miss?

But apart from that: Is my calculation basically correct?

Last Edited: Thu. Apr 17, 2014 - 05:55 PM

John_A_Brown wrote:
That should be fairly evident since ...
What do you think I do? Make posts at random and hope they might be pertinant in some way? :D

I am just handing your claim back to you: I had written in my first post that the instruction set manual didn't help me to get further. So it should be fairly evident that I found this manual myself.

So the only explanation for your rebuke "just googled for instruction set" was that you didn't get what I wrote.

emax.. wrote:
...I could simply multiply all such values by two. But this is not the complete story: I still cannot find the crucial indication for that rule in the manual.

What did I miss?

All execution addresses, when encoded, are word addresses (because program memory is organized in 16-bit words, and every opcode occupies one or two of them).
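
As a small sketch of that rule in Python (the helper names are made up for illustration): a word address counts 16-bit words, so the byte address that avr-objdump prints in the .lss is twice the encoded value.

```python
def word_to_byte_address(word_address):
    """Convert an encoded word address to the byte address shown in a .lss file."""
    return word_address * 2

def byte_to_word_address(byte_address):
    """Convert a byte address back to the word address stored in the opcode."""
    if byte_address % 2 != 0:
        raise ValueError("code addresses must fall on a word boundary")
    return byte_address // 2

# The two jumps from the first post.
print(hex(word_to_byte_address(0x1D8)))  # 0x3b0
print(hex(word_to_byte_address(0x1EA)))  # 0x3d4
```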

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#2 Hardware Problem? Read AVR042.

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

Brian Fairchild wrote:
emax.. wrote:
...I could simply multiply all such values by two. But this is not the complete story: I still cannot find the crucial indication for that rule in the manual.

What did I miss?

All execution addresses, when encoded, are word addresses (because program memory is organized in 16-bit words, and every opcode occupies one or two of them).

BINGO!

Thank you so much! That was my blocker. Now, things are clear.

Thank you again !! :D

Quote:

I had written in my first post that the instruction set manual didn't help me to get further. So it should be fairly evident that I found this manual myself.

Only "fairly". As your next statement about the opcode/operand parts was so far off, I could not tell whether you were working from the Instruction Set Summary of a datasheet or not. If you >>hadn't<< found that full manual, we gave links and told you how to find it.

Quote:

But apart from that: Is my calculation basically correct?

I suppose. You posted in the general forum and not the GCC forum. But we can infer GCC from
Quote:

If I look into my main.lss file, I see ...

So, the toolchain >>you have chosen to use<< uses all byte addresses.

If I use the Atmel assembler, it looks different. And indeed, it appears as if the opcode and operand are separated cleanly:

                 ;0000 000C #asm
                 ;0000 000D         JMP 0x1234
0000e2 940c 1234         JMP 0x1234
                 ;0000 000E         JMP 0x03d4
0000e4 940c 03d4         JMP 0x03d4
                 ;0000 000F         JMP 0x01ea
0000e6 940c 01ea         JMP 0x01ea
                 ;0000 0010 #endasm
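
To relate this listing to the .lss lines from the first post: the listing above prints each 16-bit word as one unit, while the .lss prints the individual bytes in memory order. A rough Python sketch of the conversion, assuming each word is stored low byte first (the helper name `listing_words_to_lss_bytes` is made up for illustration):

```python
import struct

def listing_words_to_lss_bytes(*words):
    """Render 16-bit opcode words as the space-separated byte dump of a .lss file."""
    raw = struct.pack("<%dH" % len(words), *words)
    return " ".join("%02x" % b for b in raw)

# "940c 01ea" from the listing above.
print(listing_words_to_lss_bytes(0x940C, 0x01EA))  # 0c 94 ea 01
```

These are the same four bytes that the .lss shows for "jmp 0x3d4", so the two listings describe the same encoding and differ only in whether the operand is displayed as a word address or a byte address.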

Again, I'd think that examination of the existing User Projects would give you some hints. (Didn't Cliff do one also as an exercise? I think it was an emulator?)

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

An opcode "chart":
https://www.avrfreaks.net/index.p...
Lots of links to prior in this recent thread:
https://www.avrfreaks.net/index.p...
This avr-objdump of a typical vector table might be of interest to you...
https://www.avrfreaks.net/index.p...
Endianness of assembler listing discussion:
https://www.avrfreaks.net/index.p...
General disassembler query:
https://www.avrfreaks.net/index.p...
Lots of discussion here, including the quote "actually thinking about it "avr-objdump -S" in the GCC binutils is a disassembler for which the source is DEFINITELY open source (GPL in fact)":
https://www.avrfreaks.net/index.p...

(...still haven't found Cliff's emulator ...)

Found it! It should be of good use to you:
https://www.avrfreaks.net/index.p...

If I were to tackle disassembly, I'd start with the opcode "charts". And then enough of a test program and binary image and ASM listings to get an idea of byte order.

Then I'd take e.g. your JMP and write a set of ASM instructions with various address targets to verify the kkkkk order.
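
One way to prepare such a check is to write down the encoding you expect and compare it with what the assembler actually emits. The Python sketch below is only a hypothesis to test, not a reference encoder: the name `encode_jmp` is made up, and it assumes the six k bits in the first word are the most significant ones, MSB first.

```python
import struct

def encode_jmp(word_address):
    """Build the 4 bytes expected for JMP to a word address (hypothesised layout)."""
    if not 0 <= word_address < (1 << 22):
        raise ValueError("JMP target must fit in 22 bits")
    high = word_address >> 16                      # k bits 21..16
    # k21..k17 go to bits 8..4 of the first word, k16 goes to bit 0.
    word1 = 0x940C | ((high & 0x3E) << 3) | (high & 0x01)
    word2 = word_address & 0xFFFF                  # k bits 15..0
    return struct.pack("<2H", word1, word2)        # low byte of each word first

for target in (0x0001D8, 0x0001EA, 0x010000, 0x3FFFFF):
    print("%06x -> %s" % (target, encode_jmp(target).hex()))
```

Targets such as 0x010000 set exactly one k bit in the first word, which makes a mismatch in the assumed bit order easy to spot.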

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

emax.. wrote:
John_A_Brown wrote:
That should be fairly evident since ...
What do you think I do? Make posts at random and hope they might be pertinant in some way? :D

I am just handing your claim back to you: I had written in my first post that the instruction set manual didn't help me to get further. So it should be fairly evident that I found this manual myself.

So the only explanation for your rebuke "just googled for instruction set" was that you didn't get what I wrote.


To be fair, you said "PS: The atmega-manuals (instruction set) I've seen so far didn't really help me."
And that's because the ATMega manuals just give an instruction set summary - they don't go into detail about how the code is made up. So I was trying to be helpful by linking to a document that does go into that sort of detail. Have a nice weekend.

Four legs good, two legs bad, three legs stable.

@theusch
Some clarifications: You're right, I was somewhat blind concerning the toolchain: I am a linuxer through and through and thus hadn't even considered other platforms and tools. Sorry, my bad.

Secondly, English is not my native language (I am German), so it's really difficult to explain a (from my point of view) complicated problem. Maybe I was somewhat knee-jerk after reading the atmega-manuals for hours and then getting an answer like "it's so simple to google the manual". I usually struggle through things as much as I can and only ask forums as my last resort.

However: I apologize for having been a bit bitchy.

Thank you all for your help, I've already tried and things now decode as they should.

Quote:

Again, I'd think that examination of the existing User Projects would give you some hints.

Yes, I had looked into a project (simulavr) before to find an explanation, but instead, I only found a solution. :wink:

It is one thing to find a solution, but a different thing to understand that solution. Multiplying an execution-address offset by two will yield correct addresses, but the reason for the multiplication stays hidden. I like to understand what I am doing; copying alone is a bad strategy.

So the hint with the word-addressing in the post before made the scales fall from my eyes.

I think most of us have already experienced a situation where you stubbornly look at a problem over and over again, but you're stuck because for some reason you're consistently thinking in the wrong direction.

Maybe I'll come back here since I have seen that atmega-opcodes seem to have some uncomfortable surprises (for disassembly).

But for the time being I am happy with what I've learned. :-)

Quote:
Secondly, English is not my native language (I am German)

That must be why I got away with spelling pertinent incorrectly. :oops: :oops:

Four legs good, two legs bad, three legs stable.