Schroedinger's Carry

I encountered something strange last week. If it's a real problem it's probably been fixed by Atmel, but I figured I'd go ahead and post it anyway because it's one of the more unusual bugs I've ever seen. If you happen to be wrestling with something similar, you may not actually be crazy. As in Schroedinger's cat experiment, the result depends on whether you're looking or not.

I have some boards designed around the AT90S2313, which I upgraded to the ATtiny2313. Mainly that involved adding a longer interrupt table, which moved the code. So now I had a lookup table problem:

000160 708f      	andi	r24,0x0F
000161 e0f2      	ldi	ZH,high(table*2)
000162 efe6      	ldi	ZL,low (table*2)
000163 0fe8      	add	ZL,r24
000164 9184      	lpm	r24,Z
 ...
                 table:
00017B 0200
00017C 0400      	.db	0x00,0x02,0x00,0x04
00017D 1008
00017E 4020      	.db	0x08,0x10,0x20,0x40
00017F 0000
000180 0000      	.db	0x00,0x00,0x00,0x00
000181 0000
000182 8000      	.db	0x00,0x00,0x00,0x80
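
The addresses in this listing can be checked with a little arithmetic. What follows is a minimal Python sketch (a model of the pointer math, not AVR code) of the 8-bit `add ZL,r24` with nothing carried into ZH. The function name `lookup_address_no_carry` is made up for illustration. LPM addresses flash in bytes, so word address 0x17B becomes byte address 0x2F6.

```python
TABLE_WORD_ADDR = 0x17B                # label address taken from the listing
TABLE_BYTE_ADDR = TABLE_WORD_ADDR * 2  # LPM uses byte addresses: 0x2F6

def lookup_address_no_carry(index):
    """Model ldi ZH / ldi ZL / add ZL,r24 with no adc: ZH never changes."""
    zh = (TABLE_BYTE_ADDR >> 8) & 0xFF
    zl = TABLE_BYTE_ADDR & 0xFF
    zl = (zl + (index & 0x0F)) & 0xFF  # 8-bit add, the carry out is dropped
    return (zh << 8) | zl

for index in range(16):
    wanted = TABLE_BYTE_ADDR + index
    got = lookup_address_no_carry(index)
    note = "" if got == wanted else "  <-- 256 bytes too low"
    print(f"index {index:2d}: wanted 0x{wanted:03X}, got 0x{got:03X}{note}")
```

Under this model, entries 0 to 9 resolve correctly and entries 10 to 15 wrap around to 0x200 through 0x205, so only the high end of the table is affected.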

You can see the problem by inspection - the code moved up, so the table now straddles an xx00 byte-address boundary, and the 8-bit add to ZL never carries into ZH. Ok then, I'll add carry propagation:

	andi	r24,0x0F
	ldi	ZH,high(table*2)
	ldi	ZL,low (table*2)
	clr	r0
	add	ZL,r24
	adc	ZH,r0
	lpm	r24,Z
...
table:
	.db	0x00,0x02,0x00,0x04
	.db	0x08,0x10,0x20,0x40
	.db	0x00,0x00,0x00,0x00
	.db	0x00,0x00,0x00,0x80

Should do the trick, right? Imagine my surprise when the lookup failed again, but this time at the LOW end of the table. So I pulled out my new Dragon and had a look using DebugWire. The problem disappeared. Thinking I probably needed more sleep, I took the debug wire off and ran the board again. The problem came back.

I tried moving the code elsewhere in memory, forcing the table to an X00 boundary, etc. Nothing worked, though the byte R24 picked up from the LPM instruction changed every time something moved. Cutting a day's work down to a few words, eventually I deduced that the Tiny2313's Carry flag was always set after the "add ZL,r24" instruction, whatever the values of ZL and R24. However, when it was being observed by DebugWire, the Carry flag worked as advertised. This was a nuisance, as it had to be debugged without tools.

My part's date code is 0447, so it's quite an early one.

That >>is<< weird.

Does it happen under debugWire if you put a breakpoint after the sequence, or only if you step through it? In either case, what happens if you slow down the clock speed of the AVR?

And also try it on another part if you have one. I always try to insist on two prototype units for sanity checks of situations like this.

lpm   r24,Z 

is a bit of a curiosity if the code was ported from an AT90S2313, as that wouldn't be allowed there. Are you sure that there aren't some side-effects from the port to the Tiny, like a register being clobbered in an ISR?

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

As far as I can tell, it never happens under DebugWire at all, whether stepping, running at full speed or using breakpoints. The code just magically works. You're right about the LPM - the original code had a bare LPM to load up R0. That was one of the first things I changed, but it didn't make any difference. I tried it with different clock speeds too. The chip has a 7.37 MHz crystal; I prescaled it down to roughly 1 MHz with no effect. And I paid special attention to the ISRs because that was my first guess.

I found it in the end by shoving the table lookups out on a port, writing them all down and then searching the code for a match, which I found 256 bytes further up the flash when there should not have been a carry. If I put the table on the 0x0180 address boundary the lookup took place at 0x0200, consistently, although it was quite impossible for the "ADD ZL,R24" to generate a carry for any value in R24.
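
The 256-byte offset is what a carry flag stuck at 1 would produce. Here is a minimal Python sketch of that hypothesis, assuming the 0x0180 and 0x0200 figures are word addresses like the ones in the assembler listing. `lookup_address` and its `stuck_carry` parameter are hypothetical names for illustration; this models the arithmetic only, not the actual silicon.

```python
def lookup_address(table_word_addr, index, stuck_carry=False):
    """Model add ZL,r24 / adc ZH,r0 (r0 = 0); return the word address LPM reads."""
    byte_addr = table_word_addr * 2          # LPM uses byte addresses
    zh = (byte_addr >> 8) & 0xFF
    zl = byte_addr & 0xFF
    total = zl + (index & 0x0F)
    # A healthy part sets carry only when the 8-bit add overflows.
    carry = 1 if stuck_carry else (total >> 8)
    zl = total & 0xFF
    zh = (zh + carry) & 0xFF
    return ((zh << 8) | zl) // 2             # back to a word address

print(hex(lookup_address(0x180, 0)))                    # 0x180
print(hex(lookup_address(0x180, 0, stuck_carry=True)))  # 0x200
```

With the table at word 0x0180 (byte 0x300), ZL starts at zero, so no 4-bit index can overflow it; forcing the carry to 1 still bumps ZH and moves the read to word 0x0200.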

It's not just one chip, it's all the ones I have with the 0447 date code. I'll check into it again with chips from a different series.