Arduino Due + Assembler

I'm trying to call an assembler routine from C code.  It looks like it ought to work, but when I try to execute a line of the asm code it jumps to Dummy_Handler; it almost seems like a pointer problem.
It compiles OK and the map and lss files look OK, so I'm not sure what's happening.  Any help would be appreciated.

The count value is passed in R0.

extern void delay(int);

/**
 * \brief Application entry point.
 *
 * \return Unused (ANSI-C compatibility).
 */
int main(void)
{
    /* Initialize the SAM system */
    SystemInit();

	//Disable WDT
	WDT->WDT_MR = WDT_MR_WDDIS;
	
    while (1) 
    {		
        delay(5); 
    }
}

/*
 * utils.s
 *
 * Created: Mon 10 6 2014 6:14:19 PM
 *  Author: Mike
 */ 
	.section .text.utils
	.global delay
delay:	mov	R1, #0x0021
loop:	sub	R1, R1, #1
	cbz	R1, iloop
	b	loop
iloop:	sub	R0, R0, #1
		cbnz	R0, ret
	b	delay
ret:	bx	lr	
		.end

 

Happy Trails,

Mike

JaxCoder.com

Last Edited: Tue. Oct 7, 2014 - 01:34 AM

Can you show the asm listing for the C code? It sounds like you are perhaps getting a data abort because of an alignment issue or something.

   8030c:	4b07      	ldr	r3, [pc, #28]	; (8032c <main+0x28>)
   8030e:	f44f 4200 	mov.w	r2, #32768	; 0x8000
   80312:	605a      	str	r2, [r3, #4]
	
    while (1) 
    {
		static int count = 0;
		
        delay(5); 
   80314:	2005      	movs	r0, #5
   80316:	4b06      	ldr	r3, [pc, #24]	; (80330 <main+0x2c>)
   80318:	4798      	blx	r3
		
		++count;
   8031a:	4b06      	ldr	r3, [pc, #24]	; (80334 <main+0x30>)
   8031c:	681b      	ldr	r3, [r3, #0]
   8031e:	1c5a      	adds	r2, r3, #1
   80320:	4b04      	ldr	r3, [pc, #16]	; (80334 <main+0x30>)
   80322:	601a      	str	r2, [r3, #0]
    }
   80324:	e7f6      	b.n	80314 <main+0x10>
   80326:	bf00      	nop
   80328:	00080235 	.word	0x00080235
   8032c:	400e1a50 	.word	0x400e1a50
   80330:	00080338 	.word	0x00080338
   80334:	20000454 	.word	0x20000454

00080338 <delay>:
 * Created: Mon 10 6 2014 6:14:19 PM
 *  Author: Mike
 */ 
		.section .text.utils
		.global delay
delay:	mov		R1, #0x0021
   80338:	2121      	movs	r1, #33	; 0x21

0008033a <loop>:
loop:	sub		R1, R1, #1
   8033a:	3901      	subs	r1, #1
		cbz		R1, iloop
   8033c:	b101      	cbz	r1, 80340 <iloop>
		b		loop
   8033e:	e7fc      	b.n	8033a <loop>

00080340 <iloop>:
iloop:	sub		R0, R0, #1
   80340:	3801      	subs	r0, #1
		cbnz	R0, ret
   80342:	b900      	cbnz	r0, 80346 <ret>
		b		delay
   80344:	e7f8      	b.n	80338 <delay>

00080346 <ret>:
ret:	bx		lr	
   80346:	4770      	bx	lr

Hmm, I hadn't thought of an alignment problem, but it looks OK?

 

[edit]

I tried adding an .align 8, with no success.

[/edit]

Happy Trails,

Mike

JaxCoder.com

Last Edited: Tue. Oct 7, 2014 - 01:58 PM

That's curious. In Thumb mode (which all Cortex-M code is), the destination of a BRANCH (or the fancier BRANCH with LINK and EXCHANGE) through a register is usually given as the destination address plus 1. All opcodes (16 bit) are on an even byte boundary, and on ARM a CALL/BRANCH/whatever to an odd address means "call this, but in Thumb mode". So I would have expected:

   80316:	4b06      	ldr	r3, [pc, #24]	; (80330 <main+0x2c>)
   80318:	4798      	blx	r3
...
   80330:	00080338 	.word	0x00080338

to have a target of 0x00080339, not 0x00080338. With the odd value it still goes to 0x80338 and executes what's there, but it does it in Thumb mode.
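The rule described here is plain integer arithmetic, so it can be sketched in C and checked on a host. This is only an illustration: the helper names `thumb_call_target` and `code_address` are made up for this sketch, and the addresses are the ones from the listing.

```c
#include <stdint.h>

/* Hypothetical helper: the value a register should hold before
 * BX/BLX when the code at `addr` is Thumb code. */
uint32_t thumb_call_target(uint32_t addr)
{
    return addr | 1u;      /* bit 0 set means "execute in Thumb state" */
}

/* Hypothetical helper: the address the core actually fetches from
 * for a given branch target value. */
uint32_t code_address(uint32_t target)
{
    return target & ~1u;   /* bit 0 is a state flag, not part of the address */
}
```

So for `delay` at 0x00080338, the literal pool word would be expected to read 0x00080339, while the fetch address stays 0x00080338.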

 

I wonder if the assembler is mistakenly assembling the .S as if it is ARM rather than Thumb code? What is the command line invocation when the .S file is assembled?

 

(Having said that, you can see the opcode addresses in the .S listing going up in 2-byte steps, so they do appear to be 16-bit not 32-bit instructions. But that 0x80338 still puzzles me.)

 

As a test (I suppose I could do this) build the same but implement delay: as a C function (but in a separate file) and see what the BLX to it shows. Also in doing this add -save-temps then look at the .s the .c generates and compare to your .S and see if anything is obviously different (like some directive you missed).


Clawson, I appreciate your time with this.

 

The asm invocation is: -mthumb -D_SAM3X8E_   //Looks OK

 

void cdelay(int count)
{
   80304:	b480      	push	{r7}
   80306:	b085      	sub	sp, #20
   80308:	af00      	add	r7, sp, #0
   8030a:	6078      	str	r0, [r7, #4]
	for (int i = 0; i < count; i++){}
   8030c:	2300      	movs	r3, #0
   8030e:	60fb      	str	r3, [r7, #12]
   80310:	e002      	b.n	80318 <cdelay+0x14>
   80312:	68fb      	ldr	r3, [r7, #12]
   80314:	3301      	adds	r3, #1
   80316:	60fb      	str	r3, [r7, #12]
   80318:	68fa      	ldr	r2, [r7, #12]
   8031a:	687b      	ldr	r3, [r7, #4]
   8031c:	429a      	cmp	r2, r3
   8031e:	dbf8      	blt.n	80312 <cdelay+0xe>
}
   80320:	3714      	adds	r7, #20
   80322:	46bd      	mov	sp, r7
   80324:	f85d 7b04 	ldr.w	r7, [sp], #4
   80328:	4770      	bx	lr
   8032a:	bf00      	nop

0008032c <main>:
 * \brief Application entry point.
 *
 * \return Unused (ANSI-C compatibility).
 */
int main(void)
{
   8032c:	b580      	push	{r7, lr}
   8032e:	af00      	add	r7, sp, #0
    /* Initialize the SAM system */
    SystemInit();
   80330:	4b06      	ldr	r3, [pc, #24]	; (8034c <main+0x20>)
   80332:	4798      	blx	r3

	//Disable WDT
	WDT->WDT_MR = WDT_MR_WDDIS;
   80334:	4b06      	ldr	r3, [pc, #24]	; (80350 <main+0x24>)
   80336:	f44f 4200 	mov.w	r2, #32768	; 0x8000
   8033a:	605a      	str	r2, [r3, #4]
	
    while (1) 
    {
		cdelay(5);
   8033c:	2005      	movs	r0, #5
   8033e:	4b05      	ldr	r3, [pc, #20]	; (80354 <main+0x28>)
   80340:	4798      	blx	r3
        delay(5); 
   80342:	2005      	movs	r0, #5
   80344:	4b04      	ldr	r3, [pc, #16]	; (80358 <main+0x2c>)
   80346:	4798      	blx	r3
    }
   80348:	e7f8      	b.n	8033c <main+0x10>
   8034a:	bf00      	nop
   8034c:	00080235 	.word	0x00080235
   80350:	400e1a50 	.word	0x400e1a50
   80354:	00080305 	.word	0x00080305
   80358:	0008035c 	.word	0x0008035c

0008035c <delay>:
 *  Author: Mike
 */ 
		.section .text.utils
		.align	2
		.global delay
delay:	mov		R1, #0x0021
   8035c:	2121      	movs	r1, #33	; 0x21

0008035e <loop>:
loop:	sub		R1, R1, #1
   8035e:	3901      	subs	r1, #1
		cbz		R1, iloop
   80360:	b101      	cbz	r1, 80364 <iloop>
		b		loop
   80362:	e7fc      	b.n	8035e <loop>

00080364 <iloop>:
iloop:	sub		R0, R0, #1
   80364:	3801      	subs	r0, #1
		cbnz	R0, ret
   80366:	b900      	cbnz	r0, 8036a <ret>
		b		delay
   80368:	e7f8      	b.n	8035c <delay>

0008036a <ret>:
ret:	bx		lr	
   8036a:	4770      	bx	lr

Interesting that the cdelay pointer in the jump table is 0x00080305, but the listing says cdelay begins at 0x00080304. And if you look at the entry for the asm code, it points to what I would think would be the correct address.

The C code equivalent works and the asm code does not?  Now I'm really confused...but at my age that don't take much. :)

Happy Trails,

Mike

JaxCoder.com


That was my point. In Thumb code the address loaded into a register for a BX/BLX is never the bare address of the target. It should always be destination + 1 byte (i.e. bit 0 set). Yet in the code you showed previously it was 80338 when I would have expected 80339. That was odd! If you get it wrong then it will throw an exception, just as you are seeing.
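To make the comparison concrete, here is a small host-side sketch that scans the function pointers from the literal pool in the listing and reports the first one with bit 0 clear. The struct and function names are invented for this illustration. As far as I understand it, a Cortex-M core that does a BLX to such a value raises a UsageFault (invalid state), which with default startup code ends up in a catch-all handler like Dummy_Handler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type for this sketch. */
struct pool_entry {
    const char *name;
    uint32_t    word;
};

/* Function-pointer words copied from main's literal pool in the listing. */
const struct pool_entry main_pool[] = {
    { "SystemInit", 0x00080235u },
    { "cdelay",     0x00080305u },
    { "delay",      0x0008035cu },
};

/* Return the first entry whose Thumb bit (bit 0) is clear, or NULL
 * if every entry is a valid Thumb branch target. */
const struct pool_entry *first_non_thumb_target(const struct pool_entry *pool,
                                                size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if ((pool[i].word & 1u) == 0u) {
            return &pool[i];
        }
    }
    return NULL;
}
```

Run over `main_pool`, the scan stops at the `delay` entry, which is the only routine written by hand in assembler.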

 

BTW I was interested in seeing the .s not the .lss. I'll try it myself...


Clawson,

 

I included the .s file in my original post.

 

Why does the Thumb code point to destination + 1 instead of destination?  All the asm code I've written over the years always pointed to the actual address, but then again it never used a jump table like they do here.

 

Here's the .s file again;

 

/*
 * utils.s
 *
 * Created: Mon 10 6 2014 6:14:19 PM
 *  Author: Mike
 */ 
		.section .text.utils
		.align	2
		.global delay
delay:	mov		R1, #0x0021
loop:	sub		R1, R1, #1
		cbz		R1, iloop
		b		loop
iloop:	sub		R0, R0, #1
		cbnz	R0, ret
		b		delay
ret:	bx		lr	
		.end

 

Happy Trails,

Mike

JaxCoder.com

Last Edited: Tue. Oct 7, 2014 - 04:08 PM

No, I said:

As a test (I suppose I could do this) build the same but implement delay: as a C function (but in a separate file) and see what the BLX to it shows. Also in doing this add -save-temps then look at the .s the .c generates and compare to your .S and see if anything is obviously different (like some directive you missed).

So I did this:

void mydelay(int n) {
	volatile int m = n;
	while (m--);
}

and using -save-temps I found the C compiler converted that to be:

	.syntax unified
	.cpu cortex-m4
	.fpu softvfp
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 1
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.thumb
	.file	"delay.c"
	.section	.text.mydelay,"ax",%progbits
	.align	2
	.global	mydelay
	.thumb
	.thumb_func
	.type	mydelay, %function
mydelay:
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	sub	sp, sp, #8
	str	r0, [sp, #4]
.L3:
	ldr	r3, [sp, #4]
	subs	r2, r3, #1
	str	r2, [sp, #4]
	cmp	r3, #0
	bne	.L3
	add	sp, sp, #8
	@ sp needed
	bx	lr
	.size	mydelay, .-mydelay
	.ident	"GCC: (crosstool-NG 1.19.0 - Atmel build: 275) 4.8.3 20131129 (release) [ARM/embedded-4_8-branch revision 205641]"

Admittedly I did that in a SAM4 project I happened to have rather than a SAM3 one but, like I say, you could do the same. Some key things I see in there are:

	.cpu cortex-m4

and perhaps this is the big one:

	.thumb

If it were me and I wanted to inter-work C and Asm, I don't think I'd ever start writing a blank .S from scratch. I'd always at least sketch out the ABI of the functions I want to provide in C, build that with -save-temps, then take the generated .s, rename it to .S and use that as a template for the Asm I wanted to write. That way you get to know things that currently only the C compiler "knows", like whether ".thumb" and ".cpu cortex-mX" are important. I'd also be looking up all those other directives the C compiler has chosen to use. I wonder what ".syntax unified" means and whether all those ".eabi_attribute" lines are something important.

 

BTW as I say I happened to have a SAM4 not SAM3 project so not only is ".cpu cortex-m4" wrong for you I rather imagine ".fpu softvfp" is wrong too as M3 differs from M4 in the floating point support.

 

So do what I did. Build with -save-temps and study the .s

 

Oh and another interesting thing. When I look at the call to mydelay() from the main .c file I see:

	ldr	r3, .L10+8
	blx	r3
...
.L10:
	.word	SystemInit
	.word	.LC0
	.word	myprintf
	.word	mydelay

yet in the LSS I see:

  400284:	4b05      	ldr	r3, [pc, #20]	; (40029c <main+0x24>)
  400286:	4798      	blx	r3
...
  400294:	0040021d 	.word	0x0040021d
  400298:	004003d4 	.word	0x004003d4
  40029c:	00400259 	.word	0x00400259
  4002a0:	0040022d 	.word	0x0040022d

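Note that the function targets, including mydelay, are all odd (the one even word, 0x004003d4, is the string constant .LC0, which is data rather than code). Yet the .s just showed ".word mydelay", so the linker assigned an odd address to mydelay. It's my belief that this IS because of the ".thumb" it contains.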


PS thanks for this thread - one of the most interesting on Freaks in weeks. I definitely learned something new today (the need for .thumb in .s for Cortex).

 

Oh and I gotta ask: why Asm? If anything the ARM C compiler is even better at creating Asm than the AVR one, so it is very rare that you can out-think the C compiler when working with ARM. About the only reason I can ever think of is cycle-correct video generation (though I imagine that could actually be tricky on Cortex because of the 3 stage pipeline and the difficulty of doing accurate cycle counting).


I'm starting a large IoT project. I've been researching for a long time and ordering the parts I need, and one of the first things I want to do is convert some nRF24L01+ code I wrote in C++ for the ATMega to C code for the Due. In the code are some areas where I need to pulse the CE line for 10uS+, and while it's not critical I thought an asm method called from C might be the best way to go.  I had thought about using RTT, but the best I could do with it would be 8mMHz / SCLK, which would give me around 30uS or so; I don't have the calculation handy.  I could use one of the timers I guess, but just thought asm would be quick and clean?
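For sizing a busy-wait like this, the arithmetic can be sketched in C. This is a rough sketch under stated assumptions, not a calibrated delay: it assumes the Due's SAM3X8E is running at 84 MHz, the helper name `delay_iterations` is made up, and the cycles-per-iteration figure has to be measured on real hardware because flash wait states and branch timing affect it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: core clock of the Arduino Due (SAM3X8E) at 84 MHz. */
#define CPU_HZ 84000000u

/* Hypothetical helper: number of loop iterations needed to wait at
 * least `us` microseconds when one iteration costs `cycles_per_iter`
 * CPU cycles. Rounds up so the delay is never shorter than requested. */
uint32_t delay_iterations(uint32_t us, uint32_t cycles_per_iter)
{
    assert(cycles_per_iter != 0u);   /* avoid division by zero */

    uint64_t cycles = (uint64_t)us * (CPU_HZ / 1000000u);
    return (uint32_t)((cycles + cycles_per_iter - 1u) / cycles_per_iter);
}
```

With these assumptions, a 10 uS pulse is 840 CPU cycles, so a loop costing 4 cycles per pass (an assumed figure) would need 210 passes.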

 

This code works;

/*
 * utils.s
 *
 * Created: Mon 10 6 2014 6:14:19 PM
 *  Author: Mike
 */ 
		.section .text.utils
		.align	2
		.global delay
		.thumb
		.thumb_func
		.type	delay, %function
delay:	mov		R1, #0x0021
loop:	sub		R1, R1, #1
		cbz		R1, iloop
		b		loop
iloop:	sub		R0, R0, #1
		cbnz	R0, ret
		b		delay
ret:	bx		lr	
		.end

Did a real quick test and it works fine, will look up the other stuff later right now I'm putting a dishwasher in for my son and daughter-in-law.
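For anyone reusing the routine, it may help to trace the loop structure on a host first. The following is a C model of the branches as posted, written for this illustration only (the function name is invented, and it counts passes through the inner loop rather than time).

```c
#include <stdint.h>

/* Host-side model of the posted delay routine. Returns how many times
 * the inner `loop:` body runs before the routine reaches `bx lr`. */
uint32_t model_inner_iterations(uint32_t count)
{
    uint32_t r0 = count;             /* argument arrives in R0 */
    uint32_t total = 0;

    for (;;) {
        uint32_t r1 = 0x21;          /* delay: mov R1, #0x0021 */
        do {
            r1--;                    /* loop:  sub R1, R1, #1  */
            total++;
        } while (r1 != 0);           /*        cbz R1, iloop / b loop */

        r0--;                        /* iloop: sub R0, R0, #1  */
        if (r0 != 0) {               /*        cbnz R0, ret    */
            return total;            /* ret:   bx lr           */
        }
        /* b delay: only taken when R0 has just reached zero */
    }
}
```

In this model a call with 5 returns after a single pass of 33 inner iterations, because `cbnz R0, ret` leaves as soon as R0 is non-zero. If the intent was for the count to scale the delay, the sense of that branch may be worth a second look.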

 

Thanks for all your help, I really appreciate it.  I guess I didn't get the output that you did because I didn't do the -save-temps thing, as I didn't understand it and thought that the .lss would give me the same.

Happy Trails,

Mike

JaxCoder.com


Clawson,

 

Followup;

While researching my problem I found an excellent article, "Whirlwind Tour of ARM Assembly", at http://www.coranac.com/tonc/text/asm.htm. It's very good and explains a lot.

 

Thanks again for your help,

Mike

Happy Trails,

Mike

JaxCoder.com


Note that the tutorial you found is essentially pre-Cortex, and assumes that the full 32-bit ARM instruction set is available.

 

I've also been playing with ARM assembly language recently (for another vendor's chip, alas), and it's made me rather uncomfortable that as far as I can tell, the gnu assembler doesn't have a "Cortex" mode that enables both thumb and thumb-2 instructions.  Instead, that's sort-of kludged in there with .thumb and .syntax unified and maybe influenced by .cpu ?  In this particular case, I'm very surprised that -mthumb on the command line doesn't do the same thing as .thumb in the source.  Perhaps it's a failure to provide as much information to the linker as it needs.

 


westfw wrote:

Note that the tutorial you found is essentially pre-Cortex, and assumes that the full 32-bit ARM instruction set is available.

 

I've also been playing with ARM assembly language recently (for another vendor's chip, alas), and it's made me rather uncomfortable that as far as I can tell, the gnu assembler doesn't have a "Cortex" mode that enables both thumb and thumb-2 instructions.  Instead, that's sort-of kludged in there with .thumb and .syntax unified and maybe influenced by .cpu ?  In this particular case, I'm very surprised that -mthumb on the command line doesn't do the same thing as .thumb in the source.  Perhaps it's a failure to provide as much information to the linker as it needs.

 

Found a good article on ARM assembler, http://www.coranac.com/tonc/text/asm.htm, which says .thumb can be used but also mentions the .code 16 directive, which is what I use.  I guess that's still a bit of a kludge, but .code 16 is a little more definitive.

Happy Trails,

Mike

JaxCoder.com


But that's the same link as before?

 

Anyway, as I said before the best way to learn "thumb/cortex Asm" is to get the C compiler to write it for you - just sketch out what you want in C, -save-temps then use the .s as a template for your own .S.

 

It can also help to debug (C) in "goto disassembly" view and step opcodes not statements then look at the CPU view to see what they are actually achieving (alongside a manual or quick ref card).

Last Edited: Thu. Oct 9, 2014 - 09:46 AM

clawson wrote:

But that's the same link as before?

 

Anyway, as I said before the best way to learn "thumb/cortex Asm" is to get the C compiler to write it for you - just sketch out what you want in C, -save-temps then use the .s as a template for your own .S.

Yes I was just passing it on to the OP.

 

I agree let the C compiler do the work!

Happy Trails,

Mike

JaxCoder.com


Grr.  I just spent several hours puzzling over essentially the same problem, only with a runtime-computed ISR vector.

You'd think, having participated in this discussion, it wouldn't have taken so long.  I guess part of the problem is that sometimes the assembler adds the 1 for you, and other times it doesn't.
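One defensive habit for the runtime-computed case is to force bit 0 on whenever an address is stored into a vector table, since Cortex-M vector entries need the Thumb bit set. The sketch below models that on a host with plain integers; the table size, the array and the `install_vector` helper are all hypothetical, and on real hardware the table would also have to be aligned and selected through the vector table offset register.

```c
#include <stdint.h>

#define VECTOR_COUNT 16u                 /* hypothetical table size */

uint32_t ram_vectors[VECTOR_COUNT];      /* stand-in for a RAM vector table */

/* Hypothetical helper: store a handler address computed at run time,
 * forcing the Thumb bit so it does not matter whether the toolchain
 * already added the 1. Returns 0 on success, -1 on a bad index. */
int install_vector(uint32_t index, uint32_t handler_addr)
{
    if (index >= VECTOR_COUNT) {
        return -1;
    }
    ram_vectors[index] = handler_addr | 1u;  /* OR is harmless if bit 0 is already set */
    return 0;
}
```

Using OR rather than adding 1 means an address that already carries the Thumb bit is left unchanged.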