Mapping back to register names in avr-gcc ssa form


Hi guys,

For the following simple sample program

=================================
#include <avr/io.h>

int main( void ) {
    PORTB = 0xFF;
    DDRB = 0x1;
}
================================

I get the following ssa form when doing "avr-gcc -mmcu=atmega8 -O3 -fdump-tree-all program.c"

-----------------------------------------------
;; Function main (main, funcdef_no=0, decl_uid=1495, cgraph_uid=0)

main ()
{
volatile uint8_t * _1;
volatile uint8_t * _4;

<bb 2>:
_1 = 56B;
*_1 ={v} 255;
_4 = 55B;
*_4 ={v} 1;
return;
}
--------------------------------------------------

When looking at this ssa tree through a gcc plugin, I wish to be able to get the register names PORTB and DDRB when dealing with the statements in basic block 2.

How can I do that? Can I somehow get back to the register names with the help of 56B and 55B?

I understand that the register names are just macros, which is why this happens. But I still need to be able to get the names for my work. Would debug info help? Please let me know and help me with it.

Thanks and regards,
Sandeep.

Thanks and regards,
Sandeep K Chaudhary,
University of Waterloo, Canada.


Well, 56B is 0x38.

This implies an SFR with a data-space (LDS/STS) address of 0x38, i.e. an I/O (IN/OUT) address of 0x18, e.g. PORTB on a Mega32.
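
The two numbers are the same register seen through two address spaces. On classic AVRs such as the ATmega8 and Mega32, the registers reachable with IN/OUT (I/O addresses 0x00 to 0x3F) appear in data space 0x20 higher. A minimal sketch of the conversion follows; the function and macro names are made up for illustration, and the fixed 0x20 offset is an assumption that holds for classic parts but not for every AVR family.

```c
#include <stdint.h>

/* Assumed offset between I/O space and data space on classic AVRs. */
#define AVR_IO_OFFSET 0x20u

/* I/O address (as used by IN/OUT) -> data-space address (as used by LDS/STS). */
static uint16_t io_to_data_addr(uint8_t io_addr)
{
    return (uint16_t)(io_addr + AVR_IO_OFFSET);
}

/* Data-space address -> I/O address, or -1 if it is outside the IN/OUT range. */
static int data_to_io_addr(uint16_t data_addr)
{
    if (data_addr < AVR_IO_OFFSET || data_addr > AVR_IO_OFFSET + 0x3Fu)
        return -1;
    return (int)(data_addr - AVR_IO_OFFSET);
}
```

With these, the 56B from the dump (0x38) maps back to I/O address 0x18, and 55B (0x37) to 0x17.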

You would simply use a lookup table for all the STS addresses of the SFRs.
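
Such a table could be as simple as the sketch below. It only lists the two registers that appear in this thread, using the addresses from the dump (56B and 55B); the struct and function names are hypothetical, and a real table would have to be filled in per device from its header or datasheet.

```c
#include <stddef.h>
#include <stdint.h>

struct sfr_entry {
    uint16_t    addr;   /* data-space (LDS/STS) address */
    const char *name;   /* register name as used in the source */
};

/* Only the two registers from this thread; extend per device. */
static const struct sfr_entry sfr_table[] = {
    { 0x37, "DDRB"  },  /* printed as 55B in the dump */
    { 0x38, "PORTB" },  /* printed as 56B in the dump */
};

/* Returns the register name for a data-space address, or NULL if unknown. */
const char *sfr_name(uint16_t addr)
{
    for (size_t i = 0; i < sizeof sfr_table / sizeof sfr_table[0]; i++) {
        if (sfr_table[i].addr == addr)
            return sfr_table[i].name;
    }
    return NULL;
}
```

Returning NULL for unknown addresses keeps the caller honest: an address that is not in the table is reported as unknown rather than guessed.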

Life is considerably simpler if you just write clear unambiguous source code in the first place.

Attempting to reconstruct C code from a HEX file is a mug's game.

David.


Quote:
I understand that the register names are just macros, and therefore this situation. But I still I need to be able to get the names for my work. Would debug info help?
No, it wouldn't. As you said yourself, they are macros, so they are replaced by the preprocessor before the compiler proper ever runs. The compiler itself has never heard of PORTB etc. As David says, your best bet is some kind of reverse lookup table. But the problem is that not every 0x38 literal in your program (or whatever the value is) will be "PORTB".


Depending on why you need to do this, you could make the compiler aware, at least indirectly:

#include <avr/io.h>
int main( void ) {
  volatile uint8_t * const portb_ptr = &PORTB;
  volatile uint8_t * const ddrb_ptr = &DDRB;
  *portb_ptr = 0xFF;
  *ddrb_ptr = 0x1;
}

I expect that in most cases this will compile to the same optimal code:

00000080 <main>:
#include <avr/io.h>
int main( void ) {
  volatile uint8_t * const portb_ptr = &PORTB;
  volatile uint8_t * const ddrb_ptr = &DDRB;
  *portb_ptr = 0xFF;
  80:   8f ef           ldi     r24, 0xFF       ; 255
  82:   85 b9           out     0x05, r24       ; 5
  *ddrb_ptr = 0x1;
  84:   81 e0           ldi     r24, 0x01       ; 1
  86:   84 b9           out     0x04, r24       ; 4

... but should allow you to see what's going on:

;; Function main (main, funcdef_no=0, decl_uid=1495, cgraph_uid=0)

main ()
{
  volatile uint8_t * const ddrb_ptr;
  volatile uint8_t * const portb_ptr;
  volatile uint8_t * _3;
  volatile uint8_t * _6;

  <bb 2>:
  portb_ptr_1 = 56B;
  # DEBUG portb_ptr => portb_ptr_1
  ddrb_ptr_2 = 55B;
  # DEBUG ddrb_ptr => ddrb_ptr_2
  _3 = 56B;
  *_3 ={v} 255;
  _6 = 55B;
  *_6 ={v} 1;
  return;

}


Thanks for the reply, David! I get it now, and I also understand that attempting to reconstruct C code from a HEX file is indeed a mug's game. But I have to do it. The relief for me is that I don't care about the entire C code, just the register assignments. I only need to get the registers for performing a consistency match.

Thanks and regards,
Sandeep K Chaudhary,
University of Waterloo, Canada.


Hi Clawson,

Thanks for your reply !

"But the problem is that every 0x38 literal in your program (or whatever the value is) may not be 'PORTB'."

Can you please explain why every 0x38 literal might not be "PORTB"? Could it not be "PORTB" even in assignment statements? Please explain.

Thanks and regards,
Sandeep K Chaudhary,
University of Waterloo, Canada.


Hi Joey,

Thanks for the reply !

I am not at liberty to modify the code. In fact, I don't wish to modify the source code in any manner. What I need to do is inspect the values being assigned to different registers in the program. I want to do this during compilation by writing a GCC plugin for it, which is why I am looking at the SSA tree of the code.

Thanks and regards,
Sandeep K Chaudhary,
University of Waterloo, Canada.


Writing a disassembler is fairly trivial.
Analysing which parts of the binary are data, strings, tables, ... or machine instructions is the hard part.

If you know the compiler well, you can have a pretty good guess. I used to be able to deconstruct C code from popular compilers on a 68000. Then distinguish data and code areas. This would mean I could create different types of variables.

If the binary was produced from ASM, the gloves were off. For a start there are an infinite number of ways of doing things and secondly, the author may have deliberately introduced self-modifying code, dummy functions, dummy strings, dummy tables, ...

I have no desire to deconstruct GCC binaries.

Now, you can probably set data breakpoints on addresses of interest. Then JTAG can tell you.

However, I have never used data breakpoints. Nor do I know if they work properly. And I don't know whether it will catch both OUT 0x18,Rx and STS 0x38,Rx.
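
On the OUT versus STS point: when working from the machine code, both instruction forms can be reduced to the same data-space address before any lookup. The sketch below assumes the standard AVR encodings for OUT (`1011 1AAr rrrr AAAA`) and the two-word STS (`1001 001r rrrr 0000` followed by the 16-bit address), plus the 0x20 I/O offset of classic parts. The function name is made up, and stores through the X/Y/Z pointer registers are not covered at all.

```c
#include <stdint.h>

/* Returns the data-space address written by an OUT or two-word STS
 * instruction, or -1 if `op` is neither. `next` is the program word
 * that follows `op`; it is only used for STS.
 * Assumes the classic-AVR offset of 0x20 between I/O and data space. */
int avr_store_target(uint16_t op, uint16_t next)
{
    if ((op & 0xF800u) == 0xB800u) {
        /* OUT A, Rr: A is split into bits 10:9 and bits 3:0. */
        unsigned io = ((op >> 5) & 0x30u) | (op & 0x0Fu);
        return (int)(io + 0x20u);
    }
    if ((op & 0xFE0Fu) == 0x9200u) {
        /* STS k, Rr: the address is the whole second word. */
        return (int)next;
    }
    return -1;
}
```

The bytes `85 b9` from the listing earlier in the thread form the little-endian word 0xB985, which decodes as OUT 0x05 and therefore data-space address 0x25.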

I would think that your time would be better spent with MISRA or enforcing some corporate style.

There are also LINT tools. But a determined Luddite will find ways to frustrate you.

As I said earlier, just write clear, straightforward, simple source code. Be kind and polite to your employees.

You will end up with reliable software.

David.


Quote:

Can you please explain why every 0x38 literal might not be "PORTB"?

Simply because there might be other literal 0x38's coming from completely different code. What about

something = PORTA + 56;

for example?

Sure, as long as you identify a register-manipulating machine instruction, and the register operand is 0x38 you can be sure that this is indeed PORTB. But that is a more elaborate "pattern matching" than just saying "every 0x38 is PORTB".
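
Applied to the SSA dump from the first post, that kind of pattern matching could look like the sketch below. It is a text-based stand-in only: a real plugin would inspect the statements through GCC's own data structures rather than parse the dump. It assumes the dump format shown in this thread, where a pointer constant is printed with a trailing `B` and a volatile store as `={v}`; the function name and the fixed-size table are hypothetical.

```c
#include <stdio.h>

#define MAX_SSA 64

/* Last constant address assigned to each SSA name _N (0 = unknown).
 * This would need resetting at the start of every function in the dump. */
static unsigned long ssa_addr[MAX_SSA];

/* Feed one line of the dump. Returns 1 and fills *addr and *value when the
 * line is a volatile store through an SSA pointer with a known constant
 * address; returns 0 for every other line. */
int scan_gimple_line(const char *line, unsigned long *addr, long *value)
{
    unsigned n;
    unsigned long a;
    long v;
    char tail;

    /* "_1 = 56B;" : a pointer constant, marked by the trailing B */
    if (sscanf(line, " _%u = %luB%c", &n, &a, &tail) == 3 && tail == ';') {
        if (n < MAX_SSA)
            ssa_addr[n] = a;
        return 0;
    }

    /* "*_1 ={v} 255;" : a volatile store through that pointer */
    if (sscanf(line, " *_%u ={v} %ld%c", &n, &v, &tail) == 3 && tail == ';'
        && n < MAX_SSA && ssa_addr[n] != 0) {
        *addr = ssa_addr[n];
        *value = v;
        return 1;
    }
    return 0;
}
```

A plain integer 56 that is not used as an address never produces a match here, because only a value that is first assigned as a pointer constant and then stored through is reported.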


Quote:

for example?

I was even thinking of:

for (uint8_t i = 0; i < 0x38; i++) {
  PORTB = 0x55;
}

The compiler sees this (roughly) as:

for (uint8_t i = 0; i < 0x38; i++) {
  *(volatile uint8_t *)0x38 = 0x55;
}

If you then just replaced every occurrence of 0x38 with &PORTB you would reconstruct the source as:

for (uint8_t i = 0; i < &PORTB; i++) {
  *(volatile uint8_t *)&PORTB = 0x55;
}

which would not be right.

BTW, you do know about the .s file from -save-temps, and even my avr-source program:

https://spaces.atmel.com/gf/proj...

That shows a source annotated version of the assembler that was generated by the C compiler.


The real mystery is "Why".

Some people attempt to convert a locked binary into source code.
Some people attempt to convert a valid binary for which they do not own the source code.
Some people write their source code but the hard-drive crashes and they want to reconstruct their own intellectual property.

Since you come from a University, perhaps you want to check your students' work.

Or even verify your own work.

At the end of the day, stealing others' intellectual property is a mug's game. You might just as well re-write from scratch.

Likewise, if you are trying to recover from your PC disaster, a box of tea bags and a pad of paper is your best approach.

David.


Quote:

The real mystery is "Why".

+1

As the OP here is compiling to look at the SSA, he is clearly not trying to break into a .bin/.hex but has some source. So I simply don't understand why he doesn't just compile it normally and look at the .lss or, as I say, the -save-temps .s file, with my avr-source utility to annotate it.


He is not compiling a module, he is trying to extend gcc by means of a plugin.



s.chaudhary wrote:
Can you please explain why every 0x38 literal might not be "PORTB"?

Obvious example: it could be the character '8' - couldn't it?!

s.chaudhary wrote:
What I need to do is to perform some inspections on the values being assigned to different registers in the program. I kind of want to do it somewhere in the compilation stage

So why not just a simple grep?

You could add that as a pre-build step...

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

Thanks everyone for the replies and all the help !

And for some clarity about what I want to achieve, and why I want to do it -

I am a student, and I am looking to improve the correctness of published code based on the information in MCU specifications. To do this, I am attempting to write a compiler (GCC) plugin which would inspect the program and check that it satisfies the constraints stated in the specs. The reason I want to make this a compiler plugin is that I want the whole process to be automatic, without any manual effort (this is why I don't want to use tools such as 'grep'). Also, I am not trying to steal anyone's IP by doing anything to the binaries of their code. I am simply looking at publicly available code.

Thanks again, everyone ! This forum has been so helpful for me, and I will continue to seek help in future as well. :-)

Regards,
Sandeep.



s.chaudhary wrote:
am looking to enhance correctness in ... code

I guess the biggest challenge there is to know what is "correct" - i.e., how to infer the programmer's intention from what they actually wrote.

That's a big enough challenge at the 'C' level - trying to do it at the binary level must be nigh-on impossible...?
