Help with HMATRIX @ UC3B - DMA needs to speed up.


Hi,
I am playing with the Helix MP3 decoder software. It works well on the UC3B0256.

My problem: I have several DMA channels that are transferring memory:
UART RX+TX for CPU communication
SSC TX for playback
SPI RX+TX for SD card

I am getting errors now. Sometimes a peripheral DMA channel loses some bytes. (It seems to be a buffer overflow in the hardware due to slow memory access.) I am sure of this.
If I change the number of a DMA channel, the erroneous behavior moves to whichever channel then has the lower priority.

I can assign the higher-priority channels (0 and 1) to the SSC; then playback works perfectly and the UART communication fails periodically. If I assign the high-priority channels to the UART, the UART works perfectly and the SSC fails.

I am sure this behavior is due to bandwidth limitations on the HSB bridge.
I am absolutely sure this problem can be fixed with some settings in the HMATRIX module.

I have no idea how to optimize the settings.
It is mandatory that DMA works perfectly, but due to the bandwidth needs of MP3 decoding the RAM also needs to be very fast, and since the program is executed from flash, the flash also needs to be fast ;-)
Everything needs to be fast :-D
But DMA needs to be the fastest.

I have no code for the HMATRIX since it is magic to me.
If someone could support me with this, I would be really thankful.

Greetings :-)


Have you checked out the HMATRIX example in ASF? Go to http://asf.atmel.no/selector/ and enter hmatrix in the search field.

Hans-Christian


Try to set the PBA frequency as high as possible.

-sma


Quote:
Try to set the PBA frequency as high as possible.
Basically you want it synchronous with the CPU if you do huge DMA transfers and heavy processing on the side.

Hans-Christian


PBA is currently equal to the CPU frequency (28 MHz).
The example does not help very much (or I am missing something). I tried to set all slaves to LastDefaultMaster (like the example), but then my software crashes completely when the memory load gets higher.

I think I need something more specific :-(


Why don't you just double the clock speed? 56 MHz is within the maximum specification.

Hans-Christian


Power consumption ;-)
The less the better.

And I am sure this can be fixed by proper settings in the HMATRIX, so cheap workarounds are not an option :-)

Which setting speeds things up? LastDefaultMaster: I doubt it is the best choice, because 44100*2*16 bits are transferred per second. These accesses are very frequent and short. Do I need to play around with the arbitration? Who needs to be interrupted? :-/
Making the DMA the fixed master of the RAM would slow down decoding, right?


Quote:
Power consumption.
The less the better.

That depends; go to sleep when you have nothing to do.

Hans-Christian


Yes, but all I can shut down is the CPU. As long as the peripherals are running that fast, they draw more power than necessary. AND: I tried this.

Symptom: it happens less often, but it still happens that, e.g., the UART communication loses some bytes.
I think it is an arbitration thing.

The more efficient way would be to set up a timer interrupt that PWMs the CPU clock based on an estimate of the audio buffer fill level. But that is for the future, as long as I can't give the DMA a higher priority.


How long are the transfers you use? With longer bursts the CPU has to handle fewer interrupts. I think it's weird that your application is struggling at 28 MHz. Have you tried measuring the various clock outputs, just to be sure you are actually running at the speeds you think you are?

Some basic hints below to work further. It's tricky to help debug an application when you don't have the sources or the platform to test on.

Are you doing any busy waits at all? If so, remove them.
Are all critical sections (sections with interrupts disabled) short and non-locking?
Are you sure none of your interrupt handlers are called excessively? Most interrupts need an interrupt flag to be cleared within the interrupt handler.
If you do a final write to the peripheral bus within the interrupt handler before returning, make sure to add a read from the peripheral bus as well.
Do you divide down the buses? If you transfer loads of data it is also important to have fast buses, basically the same as the CPU clock.
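
The interrupt-handler hints in this list can be sketched roughly as follows. This is only an illustration: `periph_regs_t`, its two registers and `rx_complete_handler` are hypothetical stand-ins, not a real UC3 peripheral or ASF API.

```c
#include <stdint.h>

/* Hypothetical register block standing in for a real peripheral. */
typedef struct {
    volatile uint32_t status;     /* status register, read-only on real hardware */
    volatile uint32_t int_clear;  /* write 1 to acknowledge the interrupt */
} periph_regs_t;

/* Counter the main loop polls; the handler itself does no protocol work. */
volatile uint32_t rx_frames_pending;

void rx_complete_handler(periph_regs_t *regs)
{
    rx_frames_pending++;      /* keep the handler short: defer work to the main loop */
    regs->int_clear = 1u;     /* clear the flag, otherwise the handler is called again */
    (void)regs->status;       /* dummy read after the final write to the peripheral bus */
}
```

The dummy read is there so the preceding write has actually reached the peripheral before the handler returns.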

Hans-Christian


As said, the peripheral clocks are not divided ;-)
I have no buggy interrupts and no atomic code, or at least if I have, it is safe.
Busy waits: yes, I wait until the previous audio buffer is finished, and for the SD card of course.

Basically it is just that the peripheral DMA has too low a priority:
My communication system depends on the UART DMA. The receive channel is set up to receive 5 bytes. Then an interrupt is triggered and the software follows a protocol.
Now I have experienced random problems with my protocol. Sometimes a command does not trigger the interrupt because a byte or two is missing. This only happens during playback (when the SSC DMA is active).

This behavior changed when I changed the DMA channel numbers. Lower numbers have higher priority. So the UART RX is now handled by DMA channel 0. Now the SSC, which gave perfect playback before, is causing problems.

The communication system uses synchronous transfer at up to several Mbaud (600 kB/s max). I think that the DMA is sometimes not able to store an incoming byte before the next one arrives. Remember: the SSC also needs 176 kB/s.

Slowing down the communication only works partially and is a workaround that actually costs power. No way :-)

That's basically it.


Let's put some numbers on it. 5 bytes on a 115k baud UART?
SSC output at 176 kBps, stereo 44 kHz signal.
SPI input at 600 kBps.

Basically you shuffle less than 800 kBps through the peripheral bus, which has a theoretical maximum throughput of 112 MBps.
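
The figures above can be reproduced with a few lines of arithmetic. A minimal sketch, assuming 16-bit stereo samples at 44.1 kHz, a 28 MHz bus clock and a 32-bit bus moving one word per cycle (the bus width is an assumption that happens to match the 112 MBps figure); the function names are made up for illustration.

```c
#include <stdint.h>

enum {
    CPU_HZ           = 28000000, /* CPU = PBA clock, from the thread */
    BUS_BYTES        = 4,        /* assumed: 32-bit bus, one word per cycle */
    SAMPLE_RATE_HZ   = 44100,
    CHANNELS         = 2,
    BYTES_PER_SAMPLE = 2,
    FAST_STREAM_BPS  = 600000    /* the 600 kB/s stream from the thread */
};

/* Bytes per second needed by the SSC playback stream. */
uint32_t ssc_bytes_per_second(void)
{
    return (uint32_t)SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE;
}

/* Sum of the two DMA streams discussed in the thread. */
uint32_t total_dma_bytes_per_second(void)
{
    return ssc_bytes_per_second() + (uint32_t)FAST_STREAM_BPS;
}

/* Theoretical peak of the bus under the assumptions above. */
uint32_t bus_peak_bytes_per_second(void)
{
    return (uint32_t)CPU_HZ * BUS_BYTES;
}
```

Under these assumptions the two streams together use well under 1% of the theoretical peak, which is the basis of the average-throughput argument.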

I really think your problem is somewhere else ;)

What interrupt source do you use? Do you use automatic reload for the PDCA?

Hans-Christian


Interrupts are currently only used for starting the command interface (UART RX interrupt after the DMA finishes 5 bytes).
The SSC uses automatic reload, yes.
But you mixed up some stuff ;-)
- UART works at 600 kB/s (28 MHz / 5 baud)
- SPI is currently not driven by DMA (if it is, things get worse)
- SSC was correct.

I think you are right about that 1 MB; there should be no problem!
BUT as I have said, there COULD be a problem if the UART DMA does not win arbitration in time to get the received byte out of the peripheral.

This is not a question of what the bus is capable of per _second_. It is a question of what the bus is capable of at the _moment_ a command comes in to the UART.
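
To put a number on "the moment", the time budget per transfer can be estimated in CPU cycles. A rough sketch, assuming 10 bit times per UART byte and one 16-bit SSC transfer per channel per sample (both are assumptions, not stated in the thread); the helper names are made up.

```c
#include <stdint.h>

/* CPU cycles available between two consecutive transfers of one peripheral. */
uint32_t cycles_per_transfer(uint32_t cpu_hz, uint32_t transfers_per_second)
{
    return cpu_hz / transfers_per_second;
}

/* UART: baud = CPU clock / 5, as stated in the thread. */
uint32_t uart_cycles_per_byte(uint32_t cpu_hz)
{
    const uint32_t baud = cpu_hz / 5u;
    const uint32_t bits_per_frame = 10u; /* assumed framing overhead */
    return cycles_per_transfer(cpu_hz, baud / bits_per_frame);
}

/* SSC: 44.1 kHz stereo, assumed one halfword transfer per channel. */
uint32_t ssc_cycles_per_halfword(uint32_t cpu_hz)
{
    return cycles_per_transfer(cpu_hz, 44100u * 2u);
}
```

With a 28 MHz clock this gives about 50 cycles per UART byte and about 317 cycles per SSC halfword, so a receiver that buffers only one byte has to be served within a few tens of cycles no matter how low the average load is.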

AND a question: to support your theory, please tell me why changing the DMA priority changes the behavior IF it is not a hardware cause ;-)
DMA channels are assigned by #define, so the code does not change if I change the DMA number (except for some memory constants).

Greetings.


28 MHz / 5 = 5.6 MHz, quite fast UART ;) But up to five small bytes, that is basically nothing, handling that by interrupts sounds like the correct solution.

Do you do a lot in the interrupt handlers, or typically move the data to a memory buffer and process it in your main loop?

SPI should get a lot better if you used DMA, weird.

Have you verified that the chip actually runs on the clock you have set it up to run at?

Do you have an old device? Old revision? Checked the errata list for your revision of the chip?

Hans-Christian


Thanks hce for your answers :-)
The UART is clocked synchronously, so it is just as fast as the device can go (CLK/4.5).
It is not such an old revision. I can't give the letter right now. But it has nothing to do with a silicon bug; I have checked this already.

In my interrupt handler the whole protocol is handled. The current command asks for 13 bytes to be sent to the host processor.
Handling single bytes by interrupts is a second way with "more" overhead. BUT the incoming 5 bytes MUST also be receivable by the DMA channel. There is no reason why they cannot be.
The speed... yeah... hmm. It basically needs to be fast :-) If my AP7000 needs fast SD card access, the decoder asks the AP7000 to do the memory access, because it is the SD master then. So the whole source data for decoding comes from the UART. Then speed is needed, and usually that is no problem. But in that case the 512 bytes are not transferred with DMA ;-)
It IS a working concept.
(Hard to say that ;-) )

There was a misunderstanding: the SD card gets better ;-) and it is faster with DMA (high-priority channels), but other DMA channels become worse. If the SPI DMA has a low priority, it does not work when the SSC is active ;-)

As you see, I checked many cases to verify my theory.
This cannot be a pure software bug.

Think about it:
- The SSC is clocked externally, so memory access NEEDS to be done when the SSC module wants it.
- The UART is clocked externally, so memory access NEEDS to be done when the UART module wants it.

Both modules need immediate memory access to load or store bytes / halfwords.
If the HSB bridge blocks that access... hmm ;-)


Hello,

I recently ran into the same issues DerAlbi is describing. I have an AT32UC3B0256 connected to a microSD card, a TFT LCD and a VS1003 on the same SPI bus; I also have a GPS module and a Bluetooth module, each on its own UART. The SD card transfers are handled by the PDCA, as are the transfers to the TFT. The PDCA also sends/receives blocks of data to/from the Bluetooth and GPS modules.

The SD card is set to 24 MHz (PBA is running at 48 MHz), and sometimes the data received from the memory card is corrupted. I had to lower the SPI bus speed to 12 MHz in order to work without problems, but yesterday I tried dealing with the HMATRIX configuration and so far I haven't had any problems at 24 MHz. Here is the HMATRIX config:

static void vInitHmatrix(void)
{
	/* Config flash slave as last default master */
	union
	{
		unsigned long                 scfg;
		avr32_hmatrix_scfg_t          SCFG;

	} u_avr32_hmatrix_scfg = {AVR32_HMATRIX.scfg[AVR32_HMATRIX_SLAVE_FLASH]};

	u_avr32_hmatrix_scfg.SCFG.defmstr_type = AVR32_HMATRIX_DEFMSTR_TYPE_LAST_DEFAULT;
	AVR32_HMATRIX.scfg[AVR32_HMATRIX_SLAVE_FLASH] = u_avr32_hmatrix_scfg.scfg;


	/* Config SRAM slave with the PDCA as the fixed master */
	union
	{
		unsigned long                 scfg;
		avr32_hmatrix_scfg_t          SCFG;

	} u_avr32_hmatrix_scfg_sram = {AVR32_HMATRIX.scfg[AVR32_HMATRIX_SLAVE_SRAM]};

	u_avr32_hmatrix_scfg_sram.SCFG.defmstr_type = AVR32_HMATRIX_DEFMSTR_TYPE_FIXED_DEFAULT;
	u_avr32_hmatrix_scfg_sram.SCFG.arbt = AVR32_HMATRIX_ARBT_FIXED_PRIORITY;
	u_avr32_hmatrix_scfg_sram.SCFG.fixed_defmstr = AVR32_HMATRIX_MASTER_PDCA;
	AVR32_HMATRIX.scfg[AVR32_HMATRIX_SLAVE_SRAM] = u_avr32_hmatrix_scfg_sram.scfg;

	/* Give the PDCA the highest priority */
	avr32_hmatrix_pras_t avr32_hmatrix_pras_sram = AVR32_HMATRIX.prs[AVR32_HMATRIX_SLAVE_SRAM].PRAS;
	avr32_hmatrix_pras_sram.m0pr = 2;
	avr32_hmatrix_pras_sram.m1pr = 2;
	avr32_hmatrix_pras_sram.m2pr = 2;
	avr32_hmatrix_pras_sram.m3pr = 3;

	AVR32_HMATRIX.prs[AVR32_HMATRIX_SLAVE_SRAM].PRAS = avr32_hmatrix_pras_sram;
}

Daniel Campora http://www.wipy.io

Last Edited: Fri. Mar 25, 2011 - 08:48 AM

YEEES! Great that someone else has had that problem!

Have you measured any performance loss after that modification? An interesting test would be memory access and calculation mixed, e.g. with an FFT algorithm...

pow(interesting, 3) ;-)

Greetings


Hello,

I am finishing a routine that applies a FIR filter and another that does a lot of trigonometry calculations with the GPS data; both routines use fixed-point math. I will let you know if I notice any performance difference with the HMATRIX configured.

Best regards,

Daniel Campora http://www.wipy.io


Any progress on this topic?

Actually I have some similar problems.
Hardware:
32UC3A0512
PBA and CPU 60 MHz
SDRAM 64 MB
ABDAC 22050 samples/s (clock 11.2896 MHz)

When I do not use the ABDAC, the SD card clock at 30 MHz works correctly, but when I enable the ABDAC, the SPI PDCA loses one byte. I get only 511 bytes of a 512-byte packet.
After I reduced the SPI clock to 15 MHz everything looked better. But after 3, 5 or sometimes 8 minutes the PDCA loses one byte again. Now I did some "magic" with the HMATRIX configuration and everything works well. But if I speed the SPI clock up to 20 MHz or 30 MHz again, the situation becomes unstable.
I tried working with SRAM and with SDRAM; the situation is the same.

I even did some testing with 48 MHz CPU speed and 24 MHz SPI, but the results are the same.
And with other frequencies I can see the same problem when SPI = CPU / 2 and the ABDAC is running.

Last Edited: Tue. Apr 5, 2011 - 03:13 PM

Hello,

I experienced the same problems you describe: sometimes the PDCA was losing a byte after 1 minute, sometimes after 30 minutes, etc., but after playing with the HMATRIX it is working reliably at SPI = CPU / 2. I also noticed that when receiving from the SD card the best place to locate the TX dummy data buffer is an unused flash location, but not the user page. With my UC3B0256 I am using the last 512 bytes of the flash area:

pdca_load_channel(SPI_PDCA_CHANNEL_TX, (void *)0x803FC000, MMC_SECTOR_SIZE);

Try it that way and check if you get better results. Also, in the HMATRIX code I posted before, the unions should be declared as volatile...
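
For readers who want to derive such an address rather than hard-code it, the "last 512 bytes of flash" can be computed from the flash base and size. This is a sketch under assumptions: a flash base of 0x80000000 and a size of 256 KB (inferred from the "0256" part suffix) are not stated in the thread, and `last_sector_address` is a made-up helper.

```c
#include <stdint.h>

#define ASSUMED_FLASH_BASE  0x80000000u      /* assumption: internal flash base */
#define ASSUMED_FLASH_SIZE  (256u * 1024u)   /* assumption: 256 KB part */
#define DUMMY_SECTOR_SIZE   512u             /* MMC_SECTOR_SIZE in the post */

/* Start address of the last sector-sized block of a flash region. */
uint32_t last_sector_address(uint32_t flash_base, uint32_t flash_size,
                             uint32_t sector_size)
{
    return flash_base + flash_size - sector_size;
}
```

With these assumed values the helper returns 0x8003FE00, which is not the literal used in the post above, so the address is worth checking against the datasheet memory map for the actual part. An unused flash area presumably works as dummy TX data because erased flash reads back as 0xFF.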

Daniel Campora http://www.wipy.io


Thanks danicampora for the answer.
And I have one more question: do you use SRAM or SDRAM?

Thanks


I use SRAM. With SDRAM the results should be worse, since accessing the SDRAM is a lot slower than accessing the internal SRAM.

Daniel Campora http://www.wipy.io


Normally everything should be OK with this configuration.
Try to use PDCA channels 0 and 1 for SPI, if a failure there is what causes serious problems such as a hang.
E.g. the ABDAC can have a higher number (and therefore a lower priority), because small clicks are not nice but will not cause your system to hang.

Remember: 0 is the highest-priority channel, and the higher the number, the lower the priority. A good ranking makes things easier.
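
One way to keep such a ranking visible in the code is to put all channel numbers in a single enum, ordered by how bad a lost byte would be. A small sketch; the names are hypothetical and the numbering simply follows the advice above.

```c
/* Lower number = higher PDCA priority, so list the most critical stream first. */
enum pdca_channel {
    PDCA_CH_SPI_RX = 0,  /* a lost byte corrupts a sector or hangs the driver */
    PDCA_CH_SPI_TX = 1,  /* dummy bytes that clock the SD card */
    PDCA_CH_ABDAC  = 2,  /* a late sample is only an audible click */
    PDCA_CH_COUNT
};

/* Returns 1 when channel a is served before channel b. */
int pdca_outranks(enum pdca_channel a, enum pdca_channel b)
{
    return a < b;
}
```

Re-ranking the streams then means reordering one list instead of hunting for scattered #define values.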

Greetings


Yes, SDRAM is much slower, but the results were the same.
And yes, I played a lot with the PDCA channels trying to get the best results, without success.

OK, now I will try to implement your suggestions.
So the plan:

PDCA channel:0 for SPI RX ( buffer in SRAM )
PDCA channel:1 for SPI TX ( dummy array from flash )
PDCA channel:2 for ABDAC ( buffer in SDRAM )

flow:
load a piece of wav file one into SRAM_buf_wav1
load a piece of wav file two into SRAM_buf_wav2

dsp_multiply
SRAM_buf_wav1 = SRAM_buf_wav1 * 0.5
dsp_multiply
SRAM_buf_wav2 = SRAM_buf_wav2 * 0.5

dsp_add
SDRAM_buf[buffNum] = SRAM_buf_wav1 + SRAM_buf_wav2

ABDAC_PDC_RELOAD with SDRAM_buf[buffNum]

buffNum++;

.....
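
The mixing step of the flow above can be sketched in plain C. This is a stand-in for the dsp_multiply / dsp_add calls, not the DSP library itself; it assumes 16-bit signed PCM samples, and the wrap-around of the buffer index is an addition the flow above does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix two 16-bit PCM buffers: out[i] = a[i] * 0.5 + b[i] * 0.5. */
void mix_half_half(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Halving each input first keeps the sum inside the int16_t range. */
        out[i] = (int16_t)(a[i] / 2 + b[i] / 2);
    }
}

/* Advance the output buffer index, wrapping around the ring of buffers. */
size_t next_buffer(size_t buff_num, size_t buffer_count)
{
    return (buff_num + 1u) % buffer_count;
}
```

Scaling before adding is what makes the sum safe: two full-scale inputs still fit into one 16-bit output sample.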

I hope this should work without problems.

Thanks danicampora and DerAlbi.

UPDATE:
THANKS ONE MORE TIME !!!!
it's working !! :D


Any solutions?
I have the same problem with SDHC, SPI and PDCA.
System clock 64 MHz, UC3A0512.
When I speed the SPI up to 12.5 MHz and above, in some cases I lose one byte (or more if I speed it up further).
I use 4 PDCA channels in my system. It's a really great feature, but if it fails "once in a while" it loses all its power...


Is the PBA clock 64 MHz as well? With the PBA clock equal to the CPU clock and an SPI frequency of up to CPU/4 (and with the HMATRIX configured) I haven't had any problems in 6 months of testing.

Daniel Campora http://www.wipy.io


Yes, all clocks are at the maximum of 64 MHz.
(But I use 4 PDCA channels, all at ~10 MHz, or more if I can...)
Do you think the CPU can handle this?
Generally, how can I know the HMATRIX doesn't miss a byte? Is there any performance counter for that? Or an error bit?
Even if I try your solution of playing with the matrix (and I will), how can I know it will not happen again in the future?


I didn't use any performance counters. If I am not mistaken the UC3 doesn't have any, only the AP7K.

Daniel Campora http://www.wipy.io


So is there any other way to know that there was a "miss"? I don't think one of the exceptions can help?
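
One software-side check, not mentioned by the posters, is to verify each SD data block against the 16-bit CRC the card sends after it. A minimal sketch, assuming the card's data CRC is the CCITT polynomial 0x1021 with an initial value of 0 and that the driver keeps the two CRC bytes instead of discarding them; the function names are made up. As far as I recall, the USART and SPI status registers also have overrun flags that could be polled, but that should be checked against the datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection. */
uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Returns 1 when the block matches the CRC that followed it on the wire. */
int sector_is_intact(const uint8_t *sector, size_t len, uint16_t received_crc)
{
    return crc16_xmodem(sector, len) == received_crc;
}
```

A dropped or corrupted byte in the 512-byte block will almost always change the computed CRC, so the read can be retried instead of silently using bad data.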