AVR32 little-endian instructions

In the AVR32 Architecture manual I read:

All instructions are interpreted as being big-endian. However, in order to support data transfers that
are little-endian, special endian-translating load and store instructions are defined.

 

Does anyone know how to make use of these instructions writing C/C++ code?

 


savman wrote:
in order to support data transfers that are little-endian

 

The "standard" - and portable - way to do that is with library functions like htons():

 

https://linux.die.net/man/3/htons

 

 

 

Quote:
Does anyone know how to make use of these instructions writing C/C++ code?

It would be a proprietary language extension - so you would need to check the documentation for the specific toolchain you're using...

 

Start with the Library documentation - look for htons() et al...


As far as I am aware, htons() (Host to Network Short) is specific to networking, so it is not part of the C/C++ library in Atmel Studio.

 

From memory I found the macros I currently use in winsock.h:

#define htons(A) ((((uint16_t)(A) & 0xff00) >> 8) | (((uint16_t)(A) & 0x00ff) << 8))

I'm not sure whether the optimizer is smart enough to replace that logic with the endian assembly instructions.

 

I did some googling and digging in headers, and in compiler.h I found:

/*! \brief Toggles the endianism of \a u16 (by swapping its bytes).
 *
 * \param u16 U16 of which to toggle the endianism.
 *
 * \return Value resulting from \a u16 with toggled endianism.
 *
 * \note More optimized if only used with values unknown at compile time.
 */
#if (defined __GNUC__)
#  if (!defined __OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
  #define swap16(u16) ((U16)__builtin_bswap_16((U16)(u16)))
#  else
  // swap_16 must be not used when GCC's -Os command option is used
  #define swap16(u16) Swap16(u16)
#  endif
#elif (defined __ICCAVR32__)
  #define swap16(u16) ((U16)__swap_bytes_in_halfwords((U16)(u16)))
#endif

This may be a better implementation for htons(); my guess is that it is more likely to use the endian assembly instructions.

 


htons() and htonl() are the standard byteswapping access macros/functions, even if they did originate in the networking world (that's where "endianness issues" show up most dramatically.)

They should be easy to adapt to any special instructions that you have available.  We've also used "GETSHORT()" and "GETLONG()" macros, which handle alignment issues.  Although it's always been a bit ambiguous whether GETSHORT() should do the byteswap if needed, or whether programmers should be forced to do "ntohs(GETSHORT(p))".  Sigh.
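One common way to write GETSHORT()/GETLONG()-style accessors is to read the buffer a byte at a time. This is only a sketch, not the actual macros mentioned above: the names get_be16() and get_be32() are made up for illustration. Putting the byte order in the name removes the ambiguity about whether the swap is included, and byte-wise reads work at any alignment and on either endianness of host.

```c
#include <stdint.h>

/* Hypothetical helpers (names are illustrative, not from any library).
 * Read a big-endian ("network order") value one byte at a time.
 * Byte-wise access is safe at any alignment and gives the same
 * result on big-endian and little-endian hosts. */
static inline uint16_t get_be16(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    return (uint16_t)(((uint16_t)b[0] << 8) | b[1]);
}

static inline uint32_t get_be32(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

With these, a caller never wraps the result in ntohs(); the value returned is already in host order.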

 

Intel has a swell "bi-endian" compiler that lets you tag data so that it will be byteswapped on each load/store (a byteswap instruction taking a fraction of the time of any memory operation), but it's only for x86.

 

 


If you want specifically optimised AVR32 code using these special opcodes, you probably need to write an inline asm() macro to ensure those opcodes are used.


I was interested in a way that doesn't resort to assembly; the trouble, as you say, is that it becomes specific to AVR32.

 

__builtin_bswap_16() and __swap_bytes_in_halfwords() appear to be compiler-specific at least; there doesn't appear to be any C/C++ standard equivalent.

 

Sounds like Intel has the ideal solution.


__builtin's are a GCC mechanism for implementing intrinsics. Nothing to do with the C standard. 


Was interested in a way without resorting to assembly

Well, you can see how good the optimizer is.  If one of the standard (shift or union based) byteswap methods compiles to the special instructions, then you're all set.

Otherwise... this is exactly when you should feel OK about using assembly.  You bury the definition of n2h and n2hl and such off in some CPU-specific file, and you never have to worry about it again.
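As a rough sketch of the shift-based method, written as inline functions so the argument is evaluated only once (the function names here are illustrative, not from any header):

```c
#include <stdint.h>

/* Portable shift-based byte swaps. These would live in the
 * CPU-specific file; on a target with a swap instruction the bodies
 * could be replaced by an intrinsic or inline asm without touching
 * any caller. Function names are hypothetical. */
static inline uint16_t bswap16_generic(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

static inline uint32_t bswap32_generic(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}
```

Whether a given compiler recognises this pattern and emits a single swap instruction is exactly the thing to check in the disassembly.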


Ok going all out on this one

 

Test code:

 #define htonl(A) ((((uint32_t)(A) & 0xff000000) >> 24) | (((uint32_t)(A) & 0x00ff0000) >> 8) | (((uint32_t)(A) & 0x0000ff00) << 8) | (((uint32_t)(A) & 0x000000ff) << 24))

 volatile int32_t x = 0x0F;
 volatile int32_t y = htonl(x);
 volatile int32_t z = swap32(x);

 

Disassembly - Optimize Most (-O3):

    volatile int32_t x = 0x0F;
8000A166  mov R8, 15   
8000A168  stdsp SP[0x12c], R8   
    volatile int32_t y = htonl(x);
8000A16A  lddsp R10, SP[0x12c]   
8000A16C  lddsp R9, SP[0x12c]   
8000A16E  lsr R10, 24   
8000A170  lddsp R8, SP[0x12c]   
8000A172  andh R9, 0x00ff, COH   
8000A176  lddsp R11, SP[0x12c]   
8000A178  or R10, R10, R11 << 24   
8000A17C  or R9, R10, R9 >> 8   
8000A180  andl R8, 0xff00, COH   
8000A184  or R8, R9, R8 << 8   
8000A188  stdsp SP[0x128], R8   
    volatile int32_t z = swap32(x);
8000A18A  lddsp R9, SP[0x12c]   
8000A18C  swap.b R9   
8000A18E  sub R8, R6, -5640   
8000A192  stdsp SP[0x124], R9   

 

Not quite what I was expecting but definitely an improvement
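Part of the difference in the listing probably comes from the test itself rather than the optimizer: htonl() is a macro that names its argument four times, and x is volatile, so the compiler has to perform four separate loads (the four lddsp instructions). A sketch that reads the volatile once into a local before applying the same shifts might look like this; HTONL_SHIFT and swap_via_macro are names invented for the example.

```c
#include <stdint.h>

/* Same shift-and-mask logic as the htonl() macro in the test code,
 * renamed so it cannot clash with a system-provided htonl(). */
#define HTONL_SHIFT(A) ((((uint32_t)(A) & 0xff000000u) >> 24) | \
                        (((uint32_t)(A) & 0x00ff0000u) >> 8)  | \
                        (((uint32_t)(A) & 0x0000ff00u) << 8)  | \
                        (((uint32_t)(A) & 0x000000ffu) << 24))

/* Hypothetical test helper: copy the volatile into a plain local so
 * the macro's four uses of its argument cost only one memory read. */
uint32_t swap_via_macro(const volatile uint32_t *src)
{
    uint32_t tmp = *src;      /* exactly one volatile read */
    return HTONL_SHIFT(tmp);  /* macro now operates on a plain local */
}
```

Whether avr32-gcc then reduces the shifts to swap.b would still need to be confirmed in the disassembly.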

 

 


savman wrote:
doesn't appear to be any C/C++ standard.

That's right; there isn't - which is why we have htons(), compiler intrinsics, etc.

Last Edited: Sat. Dec 31, 2016 - 01:28 PM

I am starting to think my implementation of htons is wrong.

What htons should do depends on the endianness of the host and the network.

If the host and network endianness are the same, then no swap.

If the host and network endianness are different, then swap.

 

For example, if I were to recompile my UC3 code for an ATmega, I would expect one to swap and the other not to.

 

The question, though, is whether there is a standard way to identify the endianness of the host, like a define or something?


I am starting to think my implementation of htons is wrong.

What htons should do depends on the endianness of the host and the network.

 

The question, though, is whether there is a standard way to identify the endianness of the host, like a define or something?

 

Ah.  Yes, this is a substantial problem.  You're right and ntoXXX should be processor-dependent.   When we all said "use h2ns()", we were referring to the typical problem where you do WANT it to be processor-dependent.  If you want an explicit byteswap operation for some other reason, you shouldn't use h2ns.

 

It's a common problem, and bites the best of us.  At one time, <big networking company> (it was smaller then) had both hton() and h2n() macros, one of which was unconditional and one conditional.  And neither was used anywhere near as often as was needed to make the code actually run on a CPU of either endianness (because the original codebase was written on a big-endian processor).  It gets especially annoying when you have the occasional little-endian network protocol :-(
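To make the conditional/unconditional split concrete, here is a minimal sketch. It assumes the compiler predefines __BYTE_ORDER__, __ORDER_BIG_ENDIAN__ and __ORDER_LITTLE_ENDIAN__, as recent GCC and Clang do; older GCC releases and other compilers may not, so that needs checking for avr32-gcc and IAR. The names swap_u16, my_htons and host_is_little_endian are invented for the example.

```c
#include <stdint.h>

/* Unconditional: always reverses the two bytes, whatever the host is. */
static inline uint16_t swap_u16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Conditional: host order to network (big-endian) order.
 * Assumes the GCC/Clang predefined byte-order macros are available. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#  define my_htons(v) ((uint16_t)(v))            /* already network order */
#elif defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#  define my_htons(v) swap_u16((uint16_t)(v))    /* swap needed */
#else
#  error "Unknown byte order: define my_htons() for this toolchain"
#endif

/* Run-time check that needs no compiler support: look at which byte
 * of a 16-bit 1 is stored first. */
static inline int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

On a big-endian part such as the UC3, my_htons() compiles to nothing, while swap_u16() is what you would reach for when handling a little-endian protocol or file format.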