XMega SRAM slow turnaround? - Solved (Glitchy Power Supply).

Something that once worked on Mega seems to fail on XMega....


ld r16,x
st x,r17

This is part of a tight loop that reads data from SRAM as set by the X pointer and then clears that exact location on the next instruction using data in the R17 register.

Seems simple enough, and works just fine on Mega series.

But on XMega, it fails horribly, setting whatever damn address in SRAM it feels like. In fact it is so random that it causes an instant crash as SRAM, stack, and IO space are randomly overwritten.

Any ideas?

Is there some kind of slow turnaround in the XMega SRAM?

It seems as though the x pointer flails wildly in the wind if you try to ST just after a LD instruction.

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

Last Edited: Fri. Dec 27, 2013 - 09:18 PM

Do you need to configure wait states on external memory by any chance?

clawson wrote:
Do you need to configure wait states on external memory by any chance?

Sorry, did not mention... using the internal SRAM.

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

I've never heard of this kind of issue. The code works in the Simulator and on my '256A3, using the tools in sig. #1 and #2 below.

	ldi XL, lo8(0x2000)
	ldi XH, hi8(0x2000)
	ld r16,X
	st X,r17 

	ret

BUT it doesn't write to the RAM until after I step past the "ret" instruction. IOW it should do it in the Sim. v2 when the cursor lands on "ret", but it does the write after stepping the last instruction!

Edit: It prints the correct value to my LCD after I add code to then read it back:

ld r24, X 

1) Studio 4.18 build 716 (SP3)
2) WinAvr 20100110
3) PN, all on Doze XP... For Now
A) Avr Dragon ver. 1
B) Avr MKII ISP, 2009 model
C) MKII JTAGICE ver. 1

Last Edited: Sat. Mar 31, 2012 - 09:12 PM

Sounds like something is SERIOUSLY wrong here then. This is going to break my project in a big way since I need to read a location into a reg and then instantly replace it with another value.

I do this 15000 times in a tight loop at 32MHz, writing to the internal SRAM!!!!

This is a real bummer. So far every bit of code I ported from my Mega projects worked as well or better on the XMega until now.

Is this some new functionality of internal SRAM use or a bug?

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

In my application, the issue is much worse than having to wait for a ret. Here, the XMega just pukes all over the internal SRAM randomly until a hard crash...


ldi xl,low(8200) ;1
ldi xh,high(8200) ;1
clr r2 ;1
ldi r16,150 ;1

pixels:
ld r17,x ;2
st x,r2 ;2
sts PORTD_OUT,r17 ;2
andi r17,15 ;1
sts TCC0_CNT,r17 ;2
sts TCC0_CNT+1,r2 ;2
adiw xh:xl,1 ;2
dec r16 ;1
brne pixels ;1/2

In this simple loop, 150 SRAM locations are read and then cleared.

It works very well in any Mega AVR, but in the XMega, I have to remove the ST command completely.

I can read (LD) or write (ST) as fast as I want, but can never do both for some reason!

Odd.

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

Sounds like you bought a broken chip, maybe! Do you have another Xmega to try it on?

Does it mess up at lower speeds?

1) Studio 4.18 build 716 (SP3)
2) WinAvr 20100110
3) PN, all on Doze XP... For Now
A) Avr Dragon ver. 1
B) Avr MKII ISP, 2009 model
C) MKII JTAGICE ver. 1

Tried it on several. XMega256 in both packages.

I would also assume the internal SRAM may be toast if it wasn't for the fact that I can read or write to it all I want as long as I don't do it one after the other.

In the code I posted, I can simply replace ST X,R2 with 2 NOPs and all is well. Other parts of the code write to SRAM after the video generation interrupt has completed.

In the code I posted, I was clearing a pixel right after it was sent to the monitor, which means this does not have to be done in the main loop.

This works on Mega and unless there are new timing constraints on the XMega, it "should" work here as well.

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

Just tried on ATxmega64A1 rev H, running at 32MHz (8MHz oscillator plus 4x PLL multiplier). Seems to work. You are not overclocking it or running at low voltage or anything like that?

Well... I am planning on it!

But at this point, running at 3.3 volts and at a nice slow pace of "only" 28MHz.

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

Still no solution.

Even tried using the x reg to load and the y reg to save, but as long as ST follows LD in a tight loop, the SRAM is written randomly.

Code always works on any Mega, so am ruling out "operator error".

Oh, and "simulation", which is something I never use... works fine.

Hmmmmm.

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

Longshots:
Is the compiler (assembler?) outputting the correct opcodes?
Could it be RAMPX|Y|Z shenanigans?
Does it work if the program is in a different location in flash? (above/below 64k boundaries??)
Does it work to clear a different area of SRAM?

Workaround:
Can you shuffle the post-LD code such that the ST is a few instructions later?

Nigel Batten
www.batsocks.co.uk

Thanks for the ideas. Tried most of them, and so far the problem persists.

This project generates a 256 color NTSC signal directly from software using only a few resistors, and the frame buffer lives in the XMega SRAM from location 8200 to 23200.

The video signal is perfect, and having 2 free cycles between pixels, I was simply trying to clear a pixel right after sending it to the TV. This saves the main program 15000 cycles later on because it no longer has to clear the 150x100 pixel screen before drawing new graphics.

So at this point everything is perfect. I can read pixels from SRAM during the video line rendering interrupt and I can read or write them during the program main loop.

But I simply cannot write the same location after reading it in the tight horizontal rendering loop. No way, no how.

... works on an AVR644 overclocked to 28.636 MHz, but NOT on an XMega clocked correctly at the same.

I have tried all sorts of workarounds, mostly stupid, but just in case...

- used the Y reg rather than X reg for an SRAM pointer since the help in Studio5 does not claim you can use the X reg for this (I know you can).

- used both the x reg and the y reg, having x read and y write.

- moved the instructions around to add a delay of 4 cycles between read and write.

- checked ramp, other SRAM locations.

Even stranger is the fact that the "core" system works at 57.272MHz!!! It's just that LD and ST thing that fails at any speed. Again, only on the XMega, not any Mega series.

I think it's time to let it go and deal with the 15000 lost cycles in the main loop sadly.

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

I've known freaky folk that got screwed up behavior in their non-Xmega MCUs using AS5, and it cleared up when they tried Studio 4! Worth a swing!

1) Studio 4.18 build 716 (SP3)
2) WinAvr 20100110
3) PN, all on Doze XP... For Now
A) Avr Dragon ver. 1
B) Avr MKII ISP, 2009 model
C) MKII JTAGICE ver. 1

I have discovered part of my problem.

Get this....

ST no longer takes 2 cycles, it only takes 1!!!!

Now this may seem trivial, but not to anyone counting cycles in a tight loop like I am. This is a REALLY BIG DEAL!

LD still takes 2 cycles, but ST only takes a single cycle. (On Mega, both take 2 cycles).
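
To make the difference concrete, here is the pixel loop from earlier in the thread annotated with per-instruction cycle counts for both families. The counts are my reading of the AVR instruction set manual (LD from internal SRAM 2 cycles on both, ST X 2 cycles on Mega and 1 on XMega), so treat this as a sketch rather than a measured timing. The register names are the XMega ones from the original post; the Mega column refers to the same instructions, not the same peripherals.

```asm
; Cycles per pass: Mega / XMega (assumed from the instruction set manual)
pixels:
    ld   r17,x            ; 2 / 2  read pixel from internal SRAM
    st   x,r2             ; 2 / 1  clear it (r2 = 0)
    sts  PORTD_OUT,r17    ; 2 / 2
    andi r17,15           ; 1 / 1
    sts  TCC0_CNT,r17     ; 2 / 2
    sts  TCC0_CNT+1,r2    ; 2 / 2
    adiw xh:xl,1          ; 2 / 2
    dec  r16              ; 1 / 1
    brne pixels           ; 2 / 2  (1 on the last, not-taken pass)
                          ; total per pass: 16 / 15
```

If the loop has to stay at 16 cycles per pixel on the XMega, a single nop after the st should make up the difference.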

Go figure!

But alas, I still cannot get them to work one after the other. I am suspecting that this only happens in the interrupt though.

More confusion, more testing....

I Like to Build Stuff : http://www.AtomicZombie.com

Finally figured it out.

1) On the XMega, the ST instruction only takes 1 cycle as opposed to the Mega, where it takes 2 cycles. This is not shown in the help file. The LD instruction still takes 2 cycles on the XMega.

2) I had to add a delay after setting up the PLL for some reason. I guess things were not stabilized enough when hitting the interrupt, and the SRAM was getting written randomly by the x pointer.
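
The post does not say how the delay was done. One common alternative to a fixed delay is to poll the PLL ready flag before switching the system clock over. The sketch below assumes the PLL is fed from the internal 2 MHz RC oscillator with a x16 factor (neither is stated in the thread) and uses register and bit names in the style of Atmel's ATxmega def.inc files; check them against the include file for your device.

```asm
; Sketch only: PLL source and factor are assumptions, not from the thread.
    ldi  r16,OSC_PLLSRC_RC2M_gc | 16   ; source = 2 MHz RC, multiply by 16 -> 32 MHz
    sts  OSC_PLLCTRL,r16
    lds  r16,OSC_CTRL
    ori  r16,OSC_PLLEN_bm              ; enable the PLL, leave other oscillators alone
    sts  OSC_CTRL,r16
pll_wait:
    lds  r16,OSC_STATUS
    sbrs r16,OSC_PLLRDY_bp             ; skip the rjmp once the PLL reports ready
    rjmp pll_wait
    ldi  r16,CCP_IOREG_gc              ; signature that unlocks protected I/O registers
    ldi  r17,CLK_SCLKSEL_PLL_gc        ; select the PLL as system clock
    out  CPU_CCP,r16
    sts  CLK_CTRL,r17                  ; must follow the CCP write within 4 cycles
```

Enabling the video interrupt only after this point should keep the timing-critical code from running on a PLL that has not locked yet.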

Now it works.

I have 256 colors displayed on an NTSC TV and a 4 channel sample sound system all running on a bare XMega 256 with only 10 resistors and a capacitor.

This was a long time personal challenge that started by trying to synthesize full NTSC color on a bare Mega644 (totally in software), and it kind of worked with some serious overclocking. The XMega makes it work perfectly.

The video system is fully bitmapped from internal SRAM and the NTSC signal is rock solid, showing 16 shades of 16 colors for a total of 256 colors. Sound samples are mixed on the fly and sent to the onboard DAC.

Will be releasing the project "soon" with example code and a few arcade game ports.

I am eagerly awaiting the XMega384 so I can boost the resolution of this system. If the 384 ever comes out, I may even make boards.

Thanks again for this community!
Brad

I Like to Build Stuff : http://www.AtomicZombie.com

AtomicZombie wrote:
... and a few arcade game ports.
Cool! Looking forward to your creations. Asteroids (Atari, MOS 6502)? If yes you'll have a customer!
ST - XMEGA AU series datasheet does show 1 clock cycle for most ST instructions, but I wasn't aware of that change until you found it.

"Dare to be naïve." - Buckminster Fuller

So there was never a problem with the ST instruction at all? There was just memory corruption in the startup code due to PLL issues?

It sounds like a great project! Did this discussion imply that you're not using DMA at all, so this should run on D-series xMegas as well as the "full featured" parts?

Thanks!

No DMA used, just one timer interrupt for the video rendering engine and another timer for the chroma phase trickery.

I am using the onboard DAC as well, but may switch to an R2R network as it sounds better (cleaner).

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

Quote:

This is not shown in the help file.

But is in the opcode manual. (in fact the opcode manual has been complicated by things like this - trying to document all of the small tinys, normal tiny/mega and xmega in a single document)

Time to resurrect a 2 year old thread!

I have had some time to experiment, and this LD / ST bug is still present on all of my XMega chips (256 and 384).

If I do this, all is fine...

ldi r16,200 ;1
TESTLOOP:
ld r17,Y+ ;2
out VPORT1_OUT,r17 ;1
dec r16 ;1
brne TESTLOOP ;1/2

This reads 200 bytes from SRAM as pointed by Y and sends them out to a port. It works as expected.

But as soon as I add an ST op into the tight loop, all XMegas fail. Sure, simulation may show a green light, but real hardware just crashes. I did my tests with multiple clocks ranging from an external 1MHz module to the internal 32MHz and the failure is always the same.

This fails...

movw XL,YL ;1
ldi r16,200 ;1
clr r1
TESTLOOP:
ld r17,Y+ ;2
out VPORT1_OUT,r17 ;1
st X+,r1 ;1
dec r16 ;1
brne TESTLOOP ;1/2

What should happen is that after the byte is read from internal SRAM and sent to the port, it is cleared by writing the value of zero (r1) to the same SRAM location.

I have reworked this code many ways, and as long as there is an ST closely following an LD, it always results in a complete lockup of any XMega.

I have the latest XMega384C3 and 2 revisions of XMega256. All fail.

This has stumped me for 2 years, and I am still not brave enough to blame the hardware.

Anyone have any suggestions, or an idea on how to further test this problem?

Cheers and happy holidays,
Brad

I Like to Build Stuff : http://www.AtomicZombie.com

Sounds like a pipeline problem. Put it to Atmel as you seem to have a simple, concrete example.

Indeed, the more I mess around, the more I see that something is not working as expected internally.

In fact, if I simply do 2 consecutive reads from SRAM in my loop, there is significant jitter in my code, about 1 half cycle to 1 cycle randomly.

This causes it...

ld r17,Y+ ;2 
ld r17,Y+ ;2 

Whereas this creates no jitter at all...

ld r17,Y+ ;2 
nop ;1
nop ;1

Unfortunately, jitter is not acceptable in my application, as I am sending bytes to a VGA monitor, and a timing error of even half a cycle is very noticeable.

So 2 consecutive reads cause internal timing issues and a read + write in a tight loop crashes all XMegas.

Where do I go on the Atmel site to open a webcase? I will strip the code to a simple LED flasher, showing how easy it is to completely lock up an XMega.

Thanks,
Brad

I Like to Build Stuff : http://www.AtomicZombie.com

I thought it looked from one of the docs I read (can't find it now) that the xmega deferred the maths on a post inc/dec to speed up those operations.

If you tried to do them back to back it added an extra clock (unless there was an interrupt between).

A post inc then an out was all 1clock each, but two post incs in a row made the first one 2 clocks.

Oh I realise it is over a year late, but if gchapman has not packed up shop and moved to Florida he said

"Cool! Looking forward to your creations. Asteroids (Atari, MOS 6502)? If yes you'll have a customer!"

Maybe you should look at this one on the Uzebox

http://www.youtube.com/watch?v=X...

Interesting. So this introduces some ARM-like quirks into the otherwise predictable Xmega then.

Bummer, that's why I stayed away from ARM so far - cycle counting is my bag baby!

I think XMega is the best 8 bit chip out there since you can easily run it at 64MHz and predict assembly timing with 99% accuracy. I used to say 100% accuracy, but now that I know of this LD ST pipeline glitch, I can no longer claim 100%.

Too bad, I could have increased my graphics performance by almost 30% by clearing the video memory right after it was displayed in the tight active line loop. This way, I wouldn't have to clear the entire video memory before drawing a new frame.

Even so, I managed to achieve great performance from a single XMega384...

https://www.youtube.com/watch?v=...
Single XMega does color video and sound!

Over my holidays, I am converting this from NTSC to VGA for release here in the projects section.

I am hoping more will jump into XMega, since it is a real alternative for ARM. Advantages as I see them...

- Still 8 bit so easy to learn
- Great free toolchain that works with minimal setup
- 64 MIPs if you are not afraid to overclock
- 32K SRAM and 384K progmem in the XMega384!
- Very predictable assembly cycle counting
- Super easy assembly instruction set

And of course, a great community!

If I ever solve this LD ST crashing dilemma, I will report back, but for now I am moving on with my NTSC to VGA conversion using the original code.

Cheers,
Brad

I Like to Build Stuff : http://www.AtomicZombie.com

Brad/Mr_Zombie,

I've only got an xmega I recently purchased myself in a bag. I have no PCB for it yet so I can't test this idea.

What happens if you put a NOP between the Post-Inc-X and the dec?

movw XL,YL ;1 
 ldi r16,200 ;1 
 clr r1 
 TESTLOOP: 
 ld r17,Y+ ;2 
 out VPORT1_OUT,r17 ;1 
 st X+,r1 ;1 
 nop              ; <<<<< NOP in here
 dec r16 ;1 
 brne TESTLOOP ;1/2

Does that stop the random trashing of RAM ?

Brad, it's nice to see you are still around. Happy Holidays!

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius

AtomicZombie wrote:

Too bad, I could have increased my graphics performance by almost 30% by clearing the video memory right after it was displayed in the tight active line loop. This way, I wouldn't have to clear the entire video memory before drawing a new frame.

If you had to place a NOP after each memory fetch/write, you would not be any further behind than the 644 that takes 2 clocks always would you?

In the asteroid and tempest render loops I clear RAM during the H_Blank. Uze/Alec has set aside 200 clocks per scanline for the Audio-out routines, but I still have enough time to clear RAM then. Though my video buffer is only 2K.

In the Tempest render code I had to use several silly tricks to get enough free time. I did not have enough spare clocks left to do a DEC/BRNE to count the pixels even.

Do you have the complete render code available somewhere to look at? Maybe I or someone else might be able to spot somewhere else you could save a clock or two.

AtomicZombie wrote:

I am hoping more will jump into XMega, since it is a real alternative for ARM. Advantages as I see them...

- Still 8 bit so easy to learn
- Great free toolchain that works with minimal setup
- 64 MIPs if you are not afraid to overclock
- 32K SRAM and 384K progmem in the XMega384!
- Very predictable assembly cycle counting
- Super easy assembly instruction set

And of course, a great community!

I think the ARM M0 looks about as clean/predictable as the AVR/XMega. I think it only gets silly on the M3 upwards.

I have bought xmega A and E series to play with, but have not had enough time yet.

With 32K RAM, a 57.272 MHz clock, and single cycle RAM access, it could make for a lot more options.

Have you tried eliminating the post-increment thing?

Has your testing always been done with a loop? How many times through the loop before it fails, or does it fail the first time? In the loop, LD follows ST as well as ST follows LD, so which causes the failure?

Are you sure the ST operand address doesn't go out of bounds and clobber something else?

Does the program crash if you replace the ST with another LD? Or if you replace the LD with another ST?

Happy holidays to all!
Thanks for all the responses and ideas.

Andrew...
My video engine draws 200x160 over the VGA standard 640x480 screen. Also included is a 4 channel stereo sound system that fetches and mixes sound effects stored as program bytes. Needless to say, I have used all but 25 free cycles in the entire 2034 cycle horizontal line interrupt.

I do however have 2 free cycles in the pixel drawing loop, and could indeed jam the ST instruction in there to clear after read, but this crashes every time. Yes, I also tried to unroll and add nops, but even with 2 nops, it still fails.

Here is my working pixel loop...

// DRAW 200 HORIZONTAL PIXELS (1600 CYCLES / 8 = 200 PIXELS)
ldi r16,200 ;1
PIX:
ld r17,Y+ ;2
out VPORT1_OUT,r17 ;1
nop ;1
nop ;1
dec r16 ;1
brne PIX ;1/2

But like I said before, adding ST in there crashes the XMega (all variants), and even adding a second LD in there causes random jitter.

Methinks something is wrong under the hood, and not in my code!

Note : adding ST after LD on any ATmega does not fail, it only causes a lockup on an XMega.

I will post the entire project for examination as soon as I finish the PC program that converts BMPs and WAVs to usable sound and graphics includes.

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

Last Edited: Wed. Dec 25, 2013 - 05:14 PM

AtomicZombie wrote:

This fails...

movw XL,YL ;1
ldi r16,200 ;1
clr r1
TESTLOOP:
ld r17,Y+ ;2
out VPORT1_OUT,r17 ;1
st X+,r1 ;1
dec r16 ;1
brne TESTLOOP ;1/2

Hi. Sorry for my English.

This code is fully working.
What does not work for you?

.include "ATxmega128A1Udef.inc"

	ldi	ZL, low(SRAM_START)
	ldi	ZH,high(SRAM_START)
	ser	R16
	out	VPORT0_DIR,R16
	clr	R1

START:	movw	YL,ZL
	ldi	R16,200
LOOP1:	st	Y+,R16
	dec	R16
	brne	LOOP1
	
	movw	YL,ZL
	movw	XL,YL
	ldi	R16,200
LOOP2:	ld	R17,Y+
	out	VPORT0_OUT,R17
	st	X+,R1
	dec	R16
	brne	LOOP2

	rjmp	START

AtomicZombie wrote:

Here is my working pixel loop...

// DRAW 200 HORIZONTAL PIXELS (1600 CYCLES / 8 = 200 PIXELS)
ldi r16,200 ;1
PIX:
ld r17,Y+ ;2
out VPORT1_OUT,r17 ;1
nop ;1
nop ;1
dec r16 ;1
brne PIX ;1/2

It certainly sounds like something "under the hood". You are not making any mistakes at all. It may be a poorly documented artifact of how they sped things up. A lot of the new docs from Atmel are bad. Mixing up the words internal and external seems to be a favorite.

Have you tried just using the Y register alone?

Read the pixel with Y and don't increment.
Write zero using the Y register and do increment.
Do the OUT instruction after this to give the CPU some breathing space.

// DRAW 200 HORIZONTAL PIXELS (1600 CYCLES / 8 = 200 PIXELS)

   ldi   r16,200
PIX:
   ld    r17,Y            ; Don't post inc here and use Y for both LD and ST
   st    Y+,r0            ; Clear the pixel just read (r0 must hold 0) and post inc
   out   VPORT1_OUT,r17 
   dec   r16
   brne  PIX

It also saves a precious 16 bit register. I need all the 16 bit registers I can get hold of :)

The final thing to try is this. I know it blows your 8 clock budget, but you can do some other tricks to get that down if it works.

// DRAW 200 HORIZONTAL PIXELS (1600 CYCLES / 8 = 200 PIXELS)

   ldi   r16,200
PIX:
   ld    r17,Y            ; Don't post inc here and use Y for both LD and ST
   st    Y,r0             ; Clear the pixel just read (r0 must hold 0) and still don't inc
   out   VPORT1_OUT,r17 
   adiw  YH:YL,1          ; Increment Y separately (2 cycles)
   dec   r16
   brne  PIX

I will try to knock up a test PCB today.

I have an ATXMEGA192A3U in a static bag. I will just put it on a PCB and then do timing test on the scope.

See what I can work out about the instructions.

Found it!!!

Seems after all this time, I had a flaky power supply!
The extra SRAM access must have just been enough to push the power supply to begin ringing. Ground bounce perhaps?

Anyhoo, I built a new supply and right away the new loop with LD and ST began to work as expected.

Wow, that is the last thing I would have ever tried, and only found it because I moved my board to another desk and had another 3.3 volt supply there already.

I was 99.9% certain that this was an internal issue in the XMega, but once again the blame falls on the operator.

This would also explain why the code worked on the ATMegas I tested... they are running from another 5v supply.

What a great XMas present... a 30%-40% increase in the performance of my video engine running the auto-clear code.

Thanks to all who offered advice.
Will post some updated videos and code once my new audio/video demo called "Bare Metal" is completed.

Cheers,
Brad

I Like to Build Stuff : http://www.AtomicZombie.com

excellent.

BTW are you planning on using DMA for that task eventually?

AtomicZombie wrote:
The extra SRAM access must have just been enough to push the power supply to begin ringing. Ground bounce perhaps?
Maybe.
The MCU's load will end up as a (kinda) randomized step input to the power supply's error amplifier.
Doesn't take much with some power supplies to get a slight, or destructive, oscillation.
With analog circuits the first try is with a cell or battery;
could likewise do the same with digital.
For XMEGA a proposed cell is LFP or a battery made of two series-connected NiZn cells.
Use the matching XMEGA schematic checklist.
Keep wires short and use appropriate capacitor(s).
LFP = Lithium Iron Phosphate
Testing power supply: Measuring stability by Bob Hanrahan (Texas Instruments, Apr 23 2013)

"Dare to be naïve." - Buckminster Fuller

andrewm1973 wrote:
Oh I realise it is over a year late, but if gchapman has not packed up shop and moved to Florida ...
Nope.
This Texan deals with Texas.
Still interested in Brad's AMAZING creations.

"Dare to be naïve." - Buckminster Fuller

Given that the problem occurred at 1 MHz, where power consumption is much lower, it seems very odd that the power supply would only fail with that particular instruction sequence instead of a general failure at high clock rates. Until that can be explained, I would be leery of declaring the problem completely solved.

You might look at transient current draw with a scope and current probe. If you have no probe, make a current transformer with a small toroid core.

Rick

Another current measuring instrument for embedded computing is Real-Time Current Monitor (ee-equipment.com).
Its dynamic range may be better than a current transformer though its price may be greater.

Could use the XMEGA as a step load on the power supply using a scope to observe the XMEGA's power supply voltage.
The problem could be a brownout then a non-spec power-up causing the XMEGA to hang (maybe a malfunctioning main oscillator due to mis-operation).
Or Vcc exceeds the spec and the XMEGA's Vcc shunt activates and this feeds back to the power supply and so forth and so on.
Comparing the XMEGA A and C schematic checklists, power supplies section, shows more information in the notes for XMEGA C.
For both XMEGA A and C Atmel recommends a 10micro-henry wire wound inductor between the power supply and the XMEGA Vcc; that's significant and may hint that Atmel has seen XMEGAs upset some power supplies.

MEGA -
AVR042: AVR Hardware Design Considerations
states use of a ferrite bead between the power supply and the MEGA's Vcc.

Ref.
Profiling power: real-time current monitor by Jack Ganssle (embedded.com; March 18, 2013)

"Dare to be naïve." - Buckminster Fuller

andrewm1973 wrote:
Maybe you should look at this one on the Uzebox
Thanks for the link.
An additional URL wrt Asteroids, Vector Game - Uzebox (uzebox.org)

"Dare to be naïve." - Buckminster Fuller

AtomicZombie wrote:
The extra SRAM access must have just been enough to push the power supply to begin ringing. Ground bounce perhaps?
You might be onto something.
VRM Stability - Part I: Feedback by Dr. Howard Johnson (9/10/2007)
VRM = Voltage Regulator Module
Search for bounce.
You may want to examine your PCB layout.
VRM stability is dependent on the load impedance which varies.
Some VRM data sheets state some confidence about stability, others are conditional (analyze, measure, test), and others have preconditions (you better get it right).

"Dare to be naïve." - Buckminster Fuller

gchapman wrote:
andrewm1973 wrote:
Maybe you should look at this one on the Uzebox
Thanks for the link.
An additional URL wrt Asteroids, Vector Game - Uzebox (uzebox.org)

The actual game code (in C) is a bit of a dog's breakfast. It is poorly factored and very lacking in comments. It was done in a bit of a rush.

I am going to go through and fix it up so it all looks neat and is readable.

The ASM code is quite OK though. I spent a lot of time optimizing the line/object draw routines. When I write ASM I always try to keep the comments up to date as I go, because it is harder to go back to.

Still, after all the optimizing I did, when I just re-looked at it I saw I could shave a few clocks off the C-callable SetPixel routine.

The original mode-6 from Uze/Alec was a good starting point. It didn't have anywhere near enough speed to pull off the asteroids game though.

BTW, this is the first thing in the forum I asked about using the linker to put a variable at a fixed RAM location (and was no doubt told the compiler is smarter than me and I should let it sort it out)

My power supply was just a bodge job I made for my breakout board. A single 1117 connected to a 5v wall wart. No caps.... I know, I asked for it!

But things are humming along now, so all is good.

Andrew... no DMA, this slows things down, halting the CPU during transfers.

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

AtomicZombie wrote:
no DMA, this slows things down, halting the CPU during transfers.

Brad

That sounds counter to the way I would expect DMA to work.

I have played with the DMA on the AT32-UC3 devices and it can quite happily transfer things while the CPU belts along at full speed.

I wonder why Atmel did a crippled version of DMA on the XMega.

gchapman wrote:
andrewm1973 wrote:
Maybe you should look at this one on the Uzebox
Thanks for the link.
An additional URL wrt Asteroids, Vector Game - Uzebox (uzebox.org)

OH - There is an emulator for the Uzebox you can run on your desktop PC if you want to try playing the Asteroids clone.

http://uzebox.org/wiki/index.php?title=Emulator

You don't actually need to build it yourself as there is a windows EXE in the full download.

You do have to install SDL for it to work though.

Indeed, I have been messing around with the AT32-UC3 as well, and it does have some serious juice. Sadly, cycle counting is sketchy at best, so most of my work is a no-go on the UC3, just as it is on the ARM.

I do have a second project in the works though, and it is another XMega "impossibility". My other project uses ONLY an Xmega, a pair of external SRAMs, a few 74HC logic chips, and offers the following specs...

- 640x480 VGA with 256 colors
- 4 or 8 stereo sample sound channels
- Fully double buffered video screens
- 95% free CPU time to draw graphics!
- 64 MIPs operation and 380K free program memory
- 32k internal SRAM 100% free for program use.

The new unit is based on my 200x160 VGA system I am working on now, but I want to finish this one first and release it because it is easy to build... just a single XMega and a few resistors to solder.

The entire audio/video engine is an assembly include, and all of the graphics routines are part of the assembly file. Programming both systems is easy, since all routines are called from the "C side" like so...

DrawSprite(x,y,SpaceShip);
PlaySound(vol,freq,LaserBeam);

I also modified the "GetFarAddress" routine by Carlos, and now accessing all of the 384K program space is seamless to the C programmer. No worrying about where your data is stored.

To make game sounds and graphics, I wrote a converter that opens BMPs and WAVs and then spews out C files that are just included into the game.

I have so little time to work on these projects lately, but this one is so close now that I sorted out that ST LD issue!

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

The specs are good, but do you fear you are losing the "cool" factor of Baremetal or Uzebox with their single-chip solutions?

To the layperson the single chip ATMega or XMega looks less impressive than the single chip Raspberry Pi.

To the people that know what it takes to get something like this running, the single ATMega/XMega seems a whole lot more interesting than throwing a lot of silicon at the problem.

The thing you did with the 8 pin AVR might not have 256 colour bouncing balls, but it was something only a few people around the interwebs could have pulled off. I thought IT was cool.

Hooking a counter up to a RAM as a linear frame buffer doesn't seem as magical. Even though there might be a little bit of magic with say using an internal timer and the event system to pump out the address lines.

At the end of the day you have one byte of RAM per pixel and a CPU that can run rings around a 68000. It looks a whole lot less of a challenge than getting ~~200x200 pixel resolution from 4K of internal RAM and no split data bus.

Still, in saying all that, if you are going to make and sell a dirt cheap board with it on, I will buy one to try and write a version of Zarch/Virus.

I fully agree!

This is why my new site launch is starting with the single XMega 200x160 VGA version. Only one chip and a dozen resistors.

The cool factor is there since anyone can just solder down the Xmega to a carrier board and be up and running in an hour. Single IC, loads of processing power, fun games and demos!

I actually did an XMega/FPGA system last year and put it on hold because like you said... cool factor wasn't there. Well, I guess if you know the quirks of FPGAs, then there was some cool factor writing a 100MHz SRAM interface, taking into account 24ns of propagation delay! Yeah... you send 2 addresses before you get back the first byte, as if it was caught in a time warp!

Anyhow, my single chipper is definitely my favorite, and anyone with a soldering iron can build it.

The other version still keeps some "coolness" though since the XMega pushes the limits using all 5 timers just to keep the counters in sync and pushing pixels. I did the original with a Xilinx XC2C256 CPLD, but realized I could replace the $15.00 CPLD with $5.00 in old school logic chips!

My only final decision is if this single chip system goes out as a "Demo Box" or a "Game Box". If I do the game route, the choice then comes down to either a simple 8 pin Atari/Commodore like joystick port where users can just build their own controller by putting a few switches in a box or using the SNES controller.

If I do the SNES controller, I won't torture anyone with finding a receptacle, but instead show how to cut the plug, decode the wires and just solder to the board.

Great fun.

Brad

I Like to Build Stuff : http://www.AtomicZombie.com

50MHz CPU clock
25MHz is the VGA clock at 640x480

So you would have 4 clocks per pixel at 320x240.

OR 5 clocks per pixel at 256x240
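
The arithmetic behind those figures can be written as assembler constants so it sits next to the render loop. This is only a sketch using the round numbers from the post (the nominal 640x480 dot clock is 25.175 MHz, not exactly 25 MHz), and the constant names are made up for illustration.

```asm
.equ F_CPU       = 50000000                 ; CPU clock from the post, in Hz
.equ F_DOT       = 25000000                 ; 640x480 dot clock, rounded as in the post
.equ CYC_PER_DOT = F_CPU / F_DOT            ; 2 CPU cycles per 640-wide pixel
.equ CYC_320     = CYC_PER_DOT * 640 / 320  ; 4 cycles per pixel at 320 wide
.equ CYC_256     = CYC_PER_DOT * 640 / 256  ; 5 cycles per pixel at 256 wide
```

A render loop for either mode then has to fit its load, output and loop overhead inside CYC_320 or CYC_256 cycles.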

I will do you a 16 colour Star Wars arcade game if you like. (Probably 256x240 maybe 320x240)

32K of RAM will be sheer luxury compared to 4K I have for Tempest.

AtomicZombie wrote:
If I do the game route, the choice then comes down to either a simple 8 pin Atari/Commodore like joystick port where users can just build their own controller by putting a few switches in a box or using the SNES controller.

Put an unpopulated 4021/74HC165 on the PCB and have best of both worlds. If they want to hook up atari style put on the shift register. If they want NES style leave the extra IC off and hook up to the serial lines.
