8 bit - 32 bit??? which code is fastest


Hello guys, I have a thinking problem; maybe you could help me.

We have an ATmega128 and the following code.

volatile char a;
volatile char b;
volatile char c;
volatile char d;

a = 1;
b = 2;
c = 3;
d = 4;

As far as I know, each assignment will take one clock tick.

Now let's presume we have a 32-bit ARM7 or something.

Code example 1:

volatile char a;
volatile char b;
volatile char c;
volatile char d;

a = 1;
b = 2;
c = 3;
d = 4;

Code example 2:

volatile int32 a;
volatile int32 b;
volatile int32 c;
volatile int32 d;

a = 1;
b = 2;
c = 3;
d = 4;

If the 32-bit ARM7 can access memory at byte addresses, then example 1 will be faster on this ARM than example 2.

But if the ARM7 can only access memory in 32-bit chunks, then example 2 is fastest.
Why? Because in example 1 the compiler can use one int32 memory variable to store all 4 chars. If so, when accessing these chars the compiler must do byte shifting and other stuff that will take time.

Of course, if the compiler uses an int32 for every char, then it does not matter which example you use because they are both fast; only example 2 will take more memory space.

So can I conclude that which of these examples is fastest depends on how the ARM7 can access memory (8-bit or 32-bit) and on the code the compiler generates?

Thanks for the help.

P.S. When I have read and understood the whole manual of my Atmel ARM7,
I will try this in GCC and let you guys know what the compiler makes of it. Thanks again, and sorry if my English is not 100%.


PP

AFAIK the ARM has MOV, MOV.W and MOV.B instructions to make 32bit, 16bit and 8bit memory accesses so I think it would use MOV.B as long as the compiler is working optimally.

Cliff

PS But the one thing I'm not sure about is what the situation is with non 32 bit boundary aligned data

Last Edited: Fri. Feb 3, 2006 - 05:10 PM

It's easy to compile that up and look at the assembler generated. On the AVR, each 8-bit operation is one load and one store, and each 32-bit operation is four loads and four stores, so AVRs are great for moving 8-bit things around, like a bridge or router, but 4 times slower moving 32-bit things around.
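
One way to set that experiment up, as a rough sketch: the file below puts the two examples from the question into separate functions so the stores for each are easy to find in the assembly output. The function names and the `g8_`/`g32_` variable names are made up for illustration, and `int32_t` from `<stdint.h>` stands in for the `int32` used in the question.

```c
#include <stdint.h>

/* Example 1: four byte-sized globals (names are hypothetical). */
volatile char g8_a, g8_b, g8_c, g8_d;

/* Example 2: four 32-bit globals. */
volatile int32_t g32_a, g32_b, g32_c, g32_d;

/* volatile forces the compiler to emit every store, in program order. */
void store_bytes(void)
{
    g8_a = 1;
    g8_b = 2;
    g8_c = 3;
    g8_d = 4;
}

void store_words(void)
{
    g32_a = 1;
    g32_b = 2;
    g32_c = 3;
    g32_d = 4;
}
```

With a GCC cross toolchain, something like `arm-none-eabi-gcc -mcpu=arm7tdmi -O1 -S file.c` or `avr-gcc -mmcu=atmega128 -O1 -S file.c` writes the assembly to `file.s`. Which instructions show up there depends on the compiler version and flags, so treat the listing as the answer for that one setup only.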

Imagecraft compiler user


Quote:

As far as i know each assignment will take one clock tick.

In the general case, let's assume that your variables are NOT register-based or otherwise optimized away by the compiler.

Then you will get AVR code something like this for a global variable in SRAM:

         ;       8 unsigned char frog;
         
          	.DSEG
          _frog:
000160      	.BYTE 0x1
...
         ;      11 frog = 1;
00004a e0e1      	LDI  R30,LOW(1)
00004b 93e0 0160 	STS  _frog,R30

3 cycles, 3 words, repeated for each byte.
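
As a back-of-the-envelope sketch of what that means for the four assignments in the question: the helper below multiplies out the per-byte cost, using 1 cycle for LDI and 2 cycles for STS. It assumes the naive pattern of one LDI + STS pair per byte; a compiler that keeps a zero register around can skip some LDIs for the upper bytes of a small 32-bit constant, so the 32-bit figure is an estimate, not a measurement. The function names are made up for illustration.

```c
/* Per-instruction costs on a classic AVR core: LDI = 1 cycle, STS = 2 cycles. */
enum { LDI_CYCLES = 1, STS_CYCLES = 2 };

/* Estimated cycles to store one constant that is `bytes` bytes wide,
   assuming one LDI + STS pair per byte and no register reuse. */
unsigned store_cycles(unsigned bytes)
{
    return bytes * (LDI_CYCLES + STS_CYCLES);
}

/* Estimated cycles for `count` such assignments in a row. */
unsigned assignments_cycles(unsigned count, unsigned bytes)
{
    return count * store_cycles(bytes);
}
```

Under those assumptions, `assignments_cycles(4, 1)` gives 12 cycles for the four `char` stores and `assignments_cycles(4, 4)` gives 48 for four 32-bit stores.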

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Bob, you are correct about trying it out, but I'm still reading the 544 pages of the **** thing.
After that I need to get WinARM and try to build these examples.
But as with the AVR, I first want to understand what the registers do before writing any code.

Thanks for the replies, guys. I will look into the ARM assembly to see those types of MOV instructions.


Take a look at the LDR, STR, LDRB, STRB, LDRH, and STRH single-register load/store instructions. Have you downloaded the PDF of the ARM Architecture Reference Manual (yes, the ARM ARM)? This is a must-read. Also, the book "ARM System Developer's Guide" by Sloss, Symes, and Wright is very helpful.

There is also "Embedded System Design on a Shoestring", this is the ARM equivalent to Smiley's book for the AVR.


I downloaded the ARM ARM. Wooowwww, AVRs are easy, but this ARM stuff... a lot to read, I think.
I'm not using the ARM for anything fancy in my project, so I can keep the learning to a minimum.
Thanks for the names of the books; they seem very good.


Oh.... ARM.... I thought you were asking about 8 vs 32 in an AVR..... that I know about....... never mind me!

Imagecraft compiler user


Atmel has some interesting ARM parts. Is there an ARM forum similar to this? I don't mean an implied criticism, I just want to know.


Miel, look at this:
http://www.at91.com/
Seems OK. I would like a www.armfreaks.com better, but that name is already in use.


PinkPanter: This is all a very loaded set of questions.

Mostly this has something to do with you, and not the compiler.

If you could get an ARM slower than an AVR... it would be slower. The fact is that even if an ARM were (not that I'm saying it is) less efficient than an AVR, it's clocked so much faster, so who cares about efficiency? An ARM would clean the AVR's clock, so to speak.

Your question is soooo narrow in scope as to be meaningless.

You can in fact read all 4 bytes into an ARM register with a single load. However, what are you going to do with them? Process them? You need a different set of binary operations to work on each different byte in the word. Numeric operations would overflow into neighbouring bytes... You could build routines that handled this, but they would be slow...
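
To make the overflow problem concrete, here is a small sketch. The first function does the obvious thing, one 32-bit add on four packed bytes, and lets a carry leak into the neighbouring byte. The second is one common way to handle it (mask off the top bit of each byte, add, then patch the top bits back in with XOR). Both function names are invented for illustration, and this is only one possible routine, not what any particular compiler would generate.

```c
#include <stdint.h>

/* Add 1 to each of four packed bytes with a single 32-bit add.
   A byte holding 0xFF carries into the byte above it. */
uint32_t add1_naive(uint32_t packed)
{
    return packed + 0x01010101u;
}

/* Per-byte add with wraparound inside each byte.
   The low 7 bits of every byte are added first (cannot carry across bytes),
   then bit 7 of every byte is fixed up with XOR. */
uint32_t add_bytes(uint32_t x, uint32_t y)
{
    uint32_t low = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);
    return low ^ ((x ^ y) & 0x80808080u);
}
```

For example, with the packed value `0x000000FF`, the naive add produces `0x01010200` (the second byte is off by one), while `add_bytes(0x000000FF, 0x01010101)` produces `0x01010100`. The extra masking is the cost being described: three logical operations on top of the add.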

Which brings me to the next point... by default the compiler is likely to treat example 1 the same as example 2. If I recall correctly, by default most compilers will treat chars and


ARM has 8-, 16- and 32-bit loads and stores. However, 16-bit accesses must be halfword (2-byte) aligned and 32-bit accesses must be word (4-byte) aligned, otherwise strange things will happen! Unless slow 8- or 16-bit-wide memory is used, 32-bit types are fastest in C as they don't incur the overhead of automatic promotion to int.
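
A short sketch of both points, with invented helper names. `is_aligned` is the check implied by the alignment rule, `read_u32_le` shows one portable way to fetch a 32-bit little-endian value from an address that may not be aligned (byte loads only), and `add_u8` shows where the promotion overhead comes from: the operands are widened to `int`, added, and the result has to be narrowed back to 8 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if `p` is aligned to `size` bytes (`size` must be a power of two). */
int is_aligned(const void *p, size_t size)
{
    return ((uintptr_t)p & (size - 1)) == 0;
}

/* Read a little-endian 32-bit value from any address using byte loads only,
   so no alignment requirement applies. */
uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Both operands are promoted to int for the addition; the cast back to
   uint8_t is the narrowing step a 32-bit type would not need. */
uint8_t add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

For instance, `add_u8(200, 100)` computes 300 as an `int` and then truncates it to 44. With `uint32_t` operands the add is the whole job.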


Quote:
Packing the bytes into a 32-bit word is way, way less efficient on the CPU but obviously more efficient with memory.

This is not true with the ARM. When the ARM7TDMI reads a byte or half-word from a 32-bit wide memory, an entire word is read.

From the ARM7TDMI Manual:

Quote:
When a halfword or byte read is performed, a 32-bit memory system can return the
complete 32-bit word, and the processor extracts the valid halfword or byte field from
it. The fields extracted depend on the state of the BIGEND signal, which determines
the endian configuration of the system. See Memory formats on page 2-4.

This does not require any additional clock cycles.
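
As a software model of what the manual describes (the hardware does this selection for free; the code is only there to show which part of the word is picked): the low address bits choose a byte or halfword lane, and the endian setting flips the lane order. The function names and the `bigend` parameter are made up for this sketch.

```c
#include <stdint.h>

/* Pick the byte at `addr` out of the 32-bit word that contains it.
   bigend = 0 models little-endian, bigend = 1 models big-endian. */
uint8_t extract_byte(uint32_t word, uint32_t addr, int bigend)
{
    uint32_t lane = addr & 3u;          /* byte position within the word */
    if (bigend)
        lane = 3u - lane;               /* big-endian: address 0 is the top byte */
    return (uint8_t)(word >> (8u * lane));
}

/* Same idea for a halfword: address bit 1 selects the lower or upper half. */
uint16_t extract_halfword(uint32_t word, uint32_t addr, int bigend)
{
    uint32_t lane = (addr >> 1) & 1u;
    if (bigend)
        lane = 1u - lane;
    return (uint16_t)(word >> (16u * lane));
}
```

With the word `0x11223344`, a byte read at offset 0 yields `0x44` in little-endian mode and `0x11` in big-endian mode.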

Also, remember that all data-processing instructions (addressing mode 1) take a shifter operand. This allows for operations such as:

mov  r2, r0, lsl#2   ; Shift R0 left by 2, write to R2
add  r9, r5, r5, lsl #3 ; r9 = r5 + r5 * 8 (or r9 = r5 * 9)
mov  r12, r4, ROR R3 ; r12 = r4 rotated right by value of r3

Pretty nifty, eh?
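
For anyone reading along in C, these are roughly the expressions that map onto those three instructions. The function names are invented, and the rotate takes its amount modulo 32, which is a simplification of how the register-specified form behaves.

```c
#include <stdint.h>

/* mov r2, r0, lsl #2 */
uint32_t shl2(uint32_t r0)
{
    return r0 << 2;
}

/* add r9, r5, r5, lsl #3   (r5 + r5 * 8 = r5 * 9) */
uint32_t times9(uint32_t r5)
{
    return r5 + (r5 << 3);
}

/* mov r12, r4, ror r3      (rotate amount reduced modulo 32 here) */
uint32_t ror32(uint32_t r4, uint32_t r3)
{
    uint32_t n = r3 & 31u;
    /* The second mask keeps the left shift in range when n is 0. */
    return (r4 >> n) | (r4 << ((32u - n) & 31u));
}
```

Whether a compiler folds the shift into the add as a single instruction is up to the compiler; the C only expresses the arithmetic.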


Quote:
There is also "Embedded System Design on a Shoestring", this is the ARM equivalent to Smiley's book for the AVR.

I just read some sample pages from this at Amazon...this looks like a really good book :)
Do you have this book, and if so, did you find it useful?


I do have this book. I bought it after having worked with various ARM processors for 10 years. I think that anyone new to ARM would find it very useful for content, including the tools included on the CDROM.

I wish that the "ARM System Developer's Guide" had been written 8 years earlier. I had to learn most of what they cover the hard (but lasting) way. The section on the MMU is particularly useful.


If starting with ARM, then I recommend this device also:

http://www.amontec.com/chm_world...

It is a re-configurable probe that can be used for ARM JTAG and programming Xilinx FPGA, and...

They have announced a USB version of their JTAG for the ARM. Looks interesting.


zoomcityzoom: You're all up in ARMs! (Sorry I just had to say that.)

Do you know of an ARM compiler that supports ARM conditional instructions properly? My understanding (according to the Philips rep) is that none of the ARM compilers support conditional instructions. What most compilers do is branch around conditional pieces of code. The issue is of course that the pipeline in the ARM may take a hit on each branch.

A programmer here used conditional instructions quite a lot, and it yielded amazing performance.
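
A small C routine of the kind where this matters, as a sketch: each arm of the `if` is a single subtract, so after one compare both subtracts could be encoded as conditionally executed instructions with no branch inside the loop body. Whether a given compiler actually does that, or branches around each arm instead, is exactly the thing to check in the assembly output. The function is a generic subtraction-based GCD, not code from this thread.

```c
/* Greatest common divisor by repeated subtraction.
   Both inputs must be non-zero, otherwise the loop never ends. */
unsigned gcd(unsigned a, unsigned b)
{
    while (a != b) {
        if (a > b)
            a -= b;   /* candidate for a conditionally executed SUB */
        else
            b -= a;   /* candidate for the opposite condition */
    }
    return a;
}
```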


I know that the GreenHills ARM compiler supports conditional instructions. I haven't actually checked to see what GCC does.


Don't forget that with an ARM9 with an MMU, you can lock critical sections of code into the I-Cache.


UHF wrote:
Do you know of an ARM compiler that supports ARM conditional instructions properly?

I'd be hugely astonished if ADT and now ADS didn't produce the most optimal ARM code possible taking into account the pipelining and such - if ARM can't get it right for their own silicon then who could?

Cliff

PS Looking at arm.com it looks like this is now going under the name Realview (wonder exactly how many times they need to rename their development systems? (probably to justify yet another price hike!))


With most things in life, you win on some, lose on others. The simple example given really doesn't tell us too much - real life code does not do just assignments. The bottom line is to time the critical pieces of code.

Just today, I've timed some encryption code on an LPC2129 that was written in 'c' vs asm on a PIC. The ARM does the encryption/decryption and some CRC stuff in about 470 µs vs about 15 ms on a PIC16 at 4 MHz. Sure, the ARM is running at 60 MHz (15 times the PIC), but it is still twice as fast clock for clock. Also don't forget we're talking compiled 'c' code vs assembler on the PIC. The 'c' code has not been optimised - it is written using the same algorithm as the PIC code (the code was translated into C from the assembler).

I'm not sure how the AVR would fare - I dare say it might match the ARM clock for clock but I don't think compiled 'c' on the AVR would be able to match the ARM in this instance.

The other consideration may be power consumption - I have yet to compare the ARM power vs PIC power. This might be of interest to some people - especially battery operated equipment. It always pays to do some comparative testing rather than throw around claims. Sometimes the results are surprising.
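
The clock-for-clock figure in the timing comparison can be checked with a couple of lines of arithmetic. This sketch just converts the quoted times and clock rates into cycle counts; the inputs are the approximate numbers from the post (about 470 µs at 60 MHz, about 15 ms at 4 MHz), so the result is only as good as those. The function names are made up.

```c
/* Clock cycles elapsed = time in microseconds * clock rate in MHz. */
unsigned long cycles(unsigned long time_us, unsigned long clock_mhz)
{
    return time_us * clock_mhz;
}

/* Ratio a / b scaled by 100, using integer division (200 means 2.00x). */
unsigned long ratio_x100(unsigned long a, unsigned long b)
{
    return a * 100ul / b;
}
```

`cycles(470, 60)` is 28200 for the ARM and `cycles(15000, 4)` is 60000 for the PIC, so `ratio_x100(60000, 28200)` comes out at 212, i.e. a bit over twice as many clocks on the PIC.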


Quote:

I'm not sure how the AVR would fare - I dare say it might match the ARM clock for clock but I don't think compiled 'c' on the AVR would be able to match the ARM in this instance.

Maxim's MAXQ analysis work, based on TI benchmarks,
http://www.maxim-ic.com/appnotes...
may give some reasonably unbiased numbers. AVR hangs in there nicely for 8-bit work. There are a number of tables; here is one:

Table 1. TI Study Results: Execution Speed (no. of cycles)

Application       MSP430F135  ATmega8  PIC18F242    8051  H8/300L  MC68HC11  MAXQ20  ARM7-TDMI (Thumb)
8-bit math               299      157        318     112      680       387     421                185
8-bit matrix            2899     5300      20045   17744     9098     15412   31691               2227
8-bit switch              50      131        109      84      388       214      58                146
16-bit math              343      319        625     426      802       508     815                259
16-bit matrix           5784    24426      27021   29468    15280     23164   60214               2998
16-bit switch             49      144        163     120      398       230      51                146
32-bit math              792      782       1818    2937     1756      1446    1034                115
Floating point          1207     1601       1599    2487     2458      4664    1943                108
FIR filter            152193   164793     248655  206806   245588    567139  464558              43191
Matrix multiply         6633    16027      36190    9454    26750     26874   66534               2918
TOTALS                170249   213680     336543  269638   303198    640038  627319              52293

The summary was that the AVR was equivalent to the ARM7-TDMI (in Thumb) cycle-for-cycle on 8-bit and some 16-bit operations, but is about 1/4 the speed (cycle for cycle) in other operations such as floating point, matrix manipulation, and FIR filtering.
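
To put numbers on that summary, here is a small sketch that computes the ATmega8 cycle count as a percentage of the ARM7-TDMI (Thumb) count for a few rows of the table. The values are copied from the table; the struct and function names are made up, and only a subset of rows is included.

```c
/* A few rows from the table: ATmega8 and ARM7-TDMI (Thumb) cycle counts. */
struct row {
    const char *name;
    unsigned long avr;
    unsigned long arm;
};

static const struct row rows[] = {
    { "8-bit math",      157ul,    185ul   },
    { "8-bit switch",    131ul,    146ul   },
    { "16-bit math",     319ul,    259ul   },
    { "FIR filter",      164793ul, 43191ul },
    { "Matrix multiply", 16027ul,  2918ul  },
    { "TOTALS",          213680ul, 52293ul },
};

/* AVR cycles as a percentage of ARM cycles, integer division.
   100 means equal; 400 means the AVR needs four times as many cycles. */
unsigned long avr_vs_arm_percent(const struct row *r)
{
    return r->avr * 100ul / r->arm;
}
```

On these rows, 8-bit math comes out at 84 (the AVR needs fewer cycles), while the totals come out at 408, which is where the "about 1/4 the speed" figure comes from.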

Lee



Hi Zoom,

Where can I download the ARM Architecture Reference Manual (the ARM ARM)?

Many thanks

Have a nice day

Pippo


Pippo,

If it's available online then it's probably to be found somewhere off:

http://www.arm.com/documentation/

Cliff


Pippo,

Fill out the form here: http://www.arm.com/documentation...

They'll send you a free CDROM that includes the ARM ARM.