VGA demo: Texture mapping on an M2560@16MHz

This is a demo I wrote in assembly on an ATMEGA2560.  It features SNES Mode 7 style rotation and scaling at an effective resolution of 100x120.

 

https://www.youtube.com/watch?v=...

Last Edited: Thu. Feb 9, 2017 - 08:04 PM

Wow, that's pretty amazing - surprised no one else has commented on this so far. Do you have a blog or web site or similar with the details of the hardware and software involved?

Thanks.  I'll post more details here:

 

Hardware:

 

The board used is an Arduino Mega 2560 clone made by Elegoo.  It is unmodified and runs at the stock 16MHz.  The second AVR, which handles the USB port, is disabled.

 

Joystick connection:

 

The X and Y axes connect to the AVR's single ended analog inputs with a 100K Ohm external pull-down each.  Button pins connect directly to AVR pins with the internal pullup enabled.

 

VGA connection:

 

HSYNC and VSYNC pins are connected directly to the AVR.  The RGB signals connect to PORTF through resistors of various values.  Red and green get three pins each and blue gets two.  PORTF is always 0xFF and color is set via DDRF.  The monitor presents a fairly low impedance load, so pushing the pixel data through PORTF instead skews the color intensity significantly.  The monitor's black level correction compensates for the small voltages produced by the AVR's pullups when DDRF==0 at the end of a scan line.
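
To illustrate why setting color through DDRF rather than PORTF makes a difference, here is a rough Python model of one 3-bit color channel feeding the monitor input. The 75 ohm load is the usual VGA termination. The resistor values (3200, 1600 and 800 ohms) are made up for the example, since the real values are not listed, and the weak pullups are ignored.

```python
VCC = 5.0          # AVR supply voltage
LOAD_OHMS = 75.0   # usual VGA input termination
# Hypothetical resistor per bit, LSB first; the real values are not given.
RESISTORS = [3200.0, 1600.0, 800.0]


def _conductance(code):
    """Sum of 1/R over the pins whose bit is set in code."""
    return sum(1.0 / r for bit, r in enumerate(RESISTORS) if (code >> bit) & 1)


def level_via_ddr(code):
    """PORT stays 1; DDR bits select pins. Unselected pins float (pullup ignored)."""
    g_on = _conductance(code)
    return VCC * g_on / (1.0 / LOAD_OHMS + g_on)


def level_via_port(code):
    """All pins are outputs; 0 bits pull the node toward ground through their resistor."""
    g_on = _conductance(code)
    g_all = _conductance(0b111)
    return VCC * g_on / (1.0 / LOAD_OHMS + g_all)


if __name__ == "__main__":
    for code in range(8):
        print(code, round(level_via_ddr(code), 3), round(level_via_port(code), 3))
```

In this model the two methods agree at black and at full scale, but every intermediate code comes out darker when driven through PORTF, because the pins held low also load the output.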

 

Display:

 

The VGA signal follows industry standard 640x480@60Hz timing.  Even lines are drawn and odd lines are black.  During the black lines, the picture is rendered.  A visible line writes 100 pixels to the screen using LDS+OUT.  It takes two black lines to render one visible line in this eight line cycle:

 

Render first half of line A
Display line B
Render second half of line A
Display line B
Render first half of line B
Display line A
Render second half of line B
Display line A
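
The eight line cycle above can be written as a small lookup. This Python sketch only restates the list; the function name and the zero-based step numbering are illustrative and not taken from the actual code.

```python
def line_action(step):
    """Describe what happens on a given step (0..7) of the eight line cycle."""
    step %= 8
    # Steps 0-3 render line buffer A while B is shown; steps 4-7 swap roles.
    rendered, shown = ("A", "B") if step < 4 else ("B", "A")
    if step % 2 == 0:
        half = "first" if step % 4 == 0 else "second"
        return f"Render {half} half of line {rendered}"
    return f"Display line {shown}"


if __name__ == "__main__":
    for step in range(8):
        print(line_action(step))
```

Each buffer is shown twice per cycle, which is where the effective vertical resolution of 120 comes from: 240 drawn lines, two per rendered line.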

 

Rendering:

 

It takes nine clocks to render one pixel and three clocks to draw it.  The displayed texture resides in an aligned 64KB block in flash.  This turns R30 and R31 into horizontal and vertical texture pointers when used with the ELPM instruction.
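
A sketch of the addressing trick in Python, assuming the 64KB block holds a 256x256 texture at one byte per texel (the block size is given, the dimensions are inferred). The low byte of the offset plays the role of R30 and the high byte the role of R31; `texel_offset` and `sample` are illustrative names.

```python
def texel_offset(u, v):
    """Offset inside the 64KB block: low byte = horizontal, high byte = vertical."""
    return ((v & 0xFF) << 8) | (u & 0xFF)


def sample(texture, u, v):
    """Fetch one texel, roughly what ELPM does once Z holds (v, u)."""
    return texture[texel_offset(u, v)]


if __name__ == "__main__":
    # Dummy texture whose value equals the horizontal coordinate.
    texture = bytes(i & 0xFF for i in range(65536))
    print(sample(texture, 10, 3))    # u = 10
    print(sample(texture, 266, 3))   # u wraps to 10
```

Because each coordinate is a single byte, the texture wraps in both directions without any bounds checks.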

Math operations are usually 16-bit fixed point.  A 256 sample, 16-bit division lookup table in flash is used to speed up perspective processing.  Cosine and sine lookups are used for rotation and derived from a 256 sample, 16-bit table, covering cos(0) to cos(90).  This allows the player to rotate in 0.35 degree increments.
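
Here is one way a quarter wave table can serve a full circle, sketched in Python. It assumes 1024 angle steps per revolution (4 x 256, which matches the 0.35 degree figure) and a signed amplitude of 32767; neither the scale nor the exact folding used on the AVR is stated, so treat both as guesses.

```python
import math

QUARTER = 256             # samples in the stored quarter wave
FULL_TURN = 4 * QUARTER   # assumed: 1024 angle steps per revolution
SCALE = 32767             # assumed 16-bit signed amplitude

# cos(0) up to just short of cos(90 degrees)
COS_TABLE = [round(math.cos(math.pi / 2 * i / QUARTER) * SCALE)
             for i in range(QUARTER)]


def _quarter(i):
    """Lookup for i in 0..256; cos(90 degrees) is 0 and is not stored."""
    return 0 if i == QUARTER else COS_TABLE[i]


def fixed_cos(angle):
    """Cosine of angle (in 1/1024 turns) by mirroring the quarter wave."""
    quadrant, i = divmod(angle % FULL_TURN, QUARTER)
    if quadrant == 0:
        return _quarter(i)
    if quadrant == 1:
        return -_quarter(QUARTER - i)
    if quadrant == 2:
        return -_quarter(i)
    return _quarter(QUARTER - i)


def fixed_sin(angle):
    """Sine is cosine shifted back by a quarter turn."""
    return fixed_cos(angle - QUARTER)


STEP_DEGREES = 360 / FULL_TURN   # 0.3515625
```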

These tables are used to generate a third lookup table which is stored in RAM and contains four 8.8 words per line.

This table is updated by the main program when the player turns.  It does not need to be updated when the player moves forward or changes speed.
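
The meaning of the four 8.8 words per line is not spelled out. One plausible reading is a starting texture offset (u0, v0) relative to the player plus a per-pixel step (du, dv), all of which depend only on the viewing angle. Under that assumption, and the 256x256 texture assumption, rendering one line would look roughly like this in Python; every name here is hypothetical.

```python
def render_line(texture, cam_u, cam_v, entry, width=100):
    """Render one scan line from a per-line table entry.

    entry is (u0, v0, du, dv) as unsigned 16-bit 8.8 fixed point values;
    negative steps are stored in two's complement. cam_u and cam_v are the
    player position in the same format.
    """
    u0, v0, du, dv = entry
    u = (cam_u + u0) & 0xFFFF
    v = (cam_v + v0) & 0xFFFF
    out = bytearray(width)
    for x in range(width):
        # Integer parts of v and u form the high and low address bytes.
        out[x] = texture[(v & 0xFF00) | (u >> 8)]
        u = (u + du) & 0xFFFF
        v = (v + dv) & 0xFFFF
    return out


if __name__ == "__main__":
    texture = bytes(i & 0xFF for i in range(65536))
    line = render_line(texture, 0x0500, 0, (0, 0, 0x0080, 0))
    print(list(line[:8]))
```

With this layout, moving the player only changes `cam_u` and `cam_v`, so the table stays valid until the angle changes.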

 

Interrupt handling:

 

The AVR spends 93% of its time in interrupt code.

 

All three compare match interrupts and the overflow interrupt of TIMER5 are used for display output.  The timer overflows every other line.  It generates an even line of video at its halfway point and an odd line at TOP.

 

T5COMPA - Even line - Rendering within visible region
T5COMPB - Odd line - Display within visible region
T5COMPC - Even line - VBLANK region
T5OVF  - Odd line - VBLANK region

 

There are several advantages to using four interrupts:  

 

GPIOR0 is used as a line countdown which is decremented every other line.

 

This reduces the number of DEC and OUT operations by half.  Also, it does not overflow when counting down in the visible region (480/2=240), making it possible to use a single byte to track line number.

Instead of using 16-bit comparisons to test for all the possible events happening within a 525-line video frame, each of the four interrupts tests only for conditions which can occur inside that interrupt.  For example, there's no need to test for VSYNC within the visible region interrupts.  Also quite conveniently, since a VSYNC pulse lasts for two lines, it can be switched entirely from a single VBLANK region interrupt.

The visible interrupt pair can turn over control to the VBLANK interrupt pair and vice versa by flipping the TIMER5 interrupt enable bits.
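
The hand-off between the two interrupt pairs can be modelled roughly as below. This Python sketch counts in line pairs, as GPIOR0 does, and swaps which pair is enabled when the count reaches zero. The value of 22 blanking pairs is a placeholder: the standard frame has 525 lines, which is not a whole number of pairs, and how the extra line is handled is not described here.

```python
VISIBLE_PAIR = ("T5COMPA", "T5COMPB")   # render line, then display line
VBLANK_PAIR = ("T5COMPC", "T5OVF")      # even and odd blanking lines


def simulate_frame(visible_pairs=240, vblank_pairs=22):
    """Return the sequence of interrupts fired over one simulated frame.

    vblank_pairs is a placeholder value, not taken from the real code.
    """
    if visible_pairs > 0xFF or vblank_pairs > 0xFF:
        raise ValueError("countdown must fit in one byte")
    enabled = VISIBLE_PAIR
    countdown = visible_pairs           # stands in for GPIOR0
    fired = []
    for _ in range(visible_pairs + vblank_pairs):
        fired.extend(enabled)           # one timer period covers two lines
        countdown -= 1                  # decremented every other line
        if countdown == 0:
            # Flip the interrupt enable bits to hand over to the other pair.
            if enabled is VISIBLE_PAIR:
                enabled, countdown = VBLANK_PAIR, vblank_pairs
            else:
                enabled, countdown = VISIBLE_PAIR, visible_pairs
    return fired


if __name__ == "__main__":
    frame = simulate_frame()
    print(len(frame), frame[478:482])
```

Counting pairs keeps the visible countdown at 240, which fits in a byte, whereas a per-line count of 480 would not.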

 

All four interrupts drive HSYNC via bit bang.

 

One of the VBLANK interrupts also bit bangs the VSYNC line.

 

The typical approach of pushing and popping registers becomes very expensive at 31.5 kHz.  Each register pushed and popped consumes 126K clocks per second (about 0.79% of total CPU time).  Also, by the time all the pushing and popping is complete, there's little time for the main program to do much.
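
The arithmetic behind that figure, as a short Python check. It uses the rounded 31.5 kHz line rate from above and the two clocks each that PUSH and POP take on this AVR core; the function name is illustrative.

```python
F_CPU = 16_000_000    # clocks per second
LINE_RATE = 31_500    # interrupts per second, one per scan line
PUSH_CLOCKS = 2
POP_CLOCKS = 2


def save_restore_cost(registers):
    """Return (clocks per second, fraction of CPU) spent saving registers."""
    clocks = registers * (PUSH_CLOCKS + POP_CLOCKS) * LINE_RATE
    return clocks, clocks / F_CPU


if __name__ == "__main__":
    for n in (1, 8, 32):
        clocks, share = save_restore_cost(n)
        print(n, clocks, f"{share:.2%}")
```

Saving all 32 registers on every line would cost about a quarter of the CPU by this estimate, before any drawing is done.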

 

For this reason, visible region interrupts do not return to the main program.

 

They undo the effect of the interrupt call on the stack by manipulating SPL to prevent it from growing (three POP instructions are the safe way to do this and avoid stack pointer alignment issues, but they take three extra clocks).  They drive HSYNC, re-enable interrupts and then IJMP to either drawing or rendering code.  When this code is finished, it falls into an endless loop, awaiting the next timer compare match.

 

Without the stack adjustment, the AVR would have to execute 480 RETI instructions in addition to needlessly consuming >1.5K of RAM at the end of a frame.  

 

In this solution, both the main program and interrupt drawing/rendering code are free to use all registers.  PUSH/POP of most registers occurs at only 60Hz for much lower overhead.

 

The odd/even interrupt scheme is inherited from another of my projects, in which the HSYNC interrupt code also drives 4-channel, 15.75 kHz 8-bit PWM audio playback with independent volume and frequency control (allowing for Amiga MOD playback, etc), in addition to a full width video frame.

 

Programs:

 

I developed the project under AVR Studio 4.   

I wrote a C# program to convert 24-bit RGB bitmaps into 8bpp assembly, according to the AVR's resistor network, which was then imported into AVR Studio.  
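
A minimal sketch of that kind of converter, in Python rather than C#. It assumes a plain 3-3-2 packing with red in the top bits and simple truncation; the real bit order and any correction for the resistor network are not given, so both are guesses. `.db` is the byte data directive of the AVR assembler.

```python
def pack_rgb332(r, g, b):
    """Pack 8-bit R, G, B into one byte: RRRGGGBB (assumed bit order)."""
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


def to_db_lines(pixels, per_line=8):
    """Format packed pixels as .db lines for inclusion in an assembly file."""
    lines = []
    for start in range(0, len(pixels), per_line):
        chunk = pixels[start:start + per_line]
        lines.append(".db " + ", ".join(f"0x{p:02X}" for p in chunk))
    return lines


if __name__ == "__main__":
    rgb = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
    packed = [pack_rgb332(*px) for px in rgb]
    print("\n".join(to_db_lines(packed)))
```

Keeping an even number of bytes per `.db` line avoids the assembler padding flash words with an extra zero byte.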

I wrote another C# program to generate the RAM lookup tables while getting the 3D math right (far easier to do in floating point in C# than fixed point AVR ASM) and eventually to generate the fixed point tables for the AVR to perform the calculations itself.

 

 

 

Last Edited: Thu. Feb 9, 2017 - 03:30 AM

Wow. I think you rank up there with the likes of the Atomic Zombie.

The largest known prime number: 2^82589933-1

Without adult supervision.