[TUT] Generating video on M32 for PAL

Generating video on an AVR is simple in concept, but tricky in the details. You will need to understand the basic terminology and timings of the video system you are trying to drive, and the limitations of the system. This tutorial first explains the video system parameters, then discusses a number of ways of generating the video and considers some of the trade-offs involved in the process.

It is assumed that the video signal required is for PAL or NTSC formats, and I'll use PAL as an example mostly because I've worked with PAL broadcast systems for thirty years and therefore consider it the superior system. It is also assumed that the signal will be created entirely within an AVR - that is, apart from a couple of mixer resistors, there will be no external components.

The PAL and NTSC standards

Both PAL and NTSC are broadcast colour systems, but we will ignore the colour aspects and simply generate a monochrome image. Colour is carried as a modulated subcarrier added on top of the monochrome image, referenced to a colour burst in an otherwise unused part of the signal. If there is no colour burst, a normal TV assumes the signal is monochrome.

The PAL picture is made of a series of horizontal lines across the picture, separated by line synchronisation pulses; at the end of each field a field sync pulse is also added. Both PAL and NTSC are interlaced formats, so a half-line is added every field to force the lines of this field to fall between the lines drawn in the last field.

A diagram of the signal across one line (from www.epanorama.net) shows the structure: each line period is 64us (63.5us for NTSC). The sync pulse is 4.7us long, and the active picture time is 52us. The intervals before and after the sync are called the front and back porch. They are 1.65us and 5.65us respectively, and should be kept at black level (see levels, later).

The field rate is 50Hz for PAL and 60Hz for NTSC. From the line rates you can easily calculate that this gives 312.5 lines per field (PAL) or 262.5 (NTSC). The terms '625 line' and '525 line' come from the fact that two interlaced fields make up one complete frame, so the line count doubles. A static image drawn across interlaced fields tends to flicker at a particularly annoying rate, so we won't use that many lines. Instead we'll restrict ourselves to a single field, drawing the same 312 lines every time.
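The arithmetic in the last two paragraphs can be checked with a few lines of C. This is a host-side sketch, not AVR code, and the function names are made up for illustration. The NTSC figure uses the rounded 63.5us and 60Hz values quoted above, so it comes out close to, rather than exactly, 262.5.

```c
#include <assert.h>

/* PAL line segments in microseconds, as quoted in the text. */
#define PAL_FRONT_PORCH_US  1.65
#define PAL_SYNC_US         4.70
#define PAL_BACK_PORCH_US   5.65
#define PAL_ACTIVE_US      52.00

/* Total line period implied by the four segments (should be 64 us). */
double pal_line_us(void)
{
    return PAL_FRONT_PORCH_US + PAL_SYNC_US + PAL_BACK_PORCH_US + PAL_ACTIVE_US;
}

/* Lines per field = line rate / field rate = 1 / (line period * field rate). */
double lines_per_field(double line_us, double field_hz)
{
    assert(line_us > 0.0 && field_hz > 0.0);
    return 1e6 / (line_us * field_hz);
}
```

For example, `lines_per_field(64.0, 50.0)` gives 312.5, and `lines_per_field(63.5, 60.0)` gives about 262.47.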

The field sync signal comes right after the visible picture lines. Strictly speaking, it's slightly different after odd and even fields. It contains a number of half-width and inverted line sync pulses designed to keep early analogue TVs happy, but we can ignore them and simply hold the signal at sync level for three lines.

But you can't put images on all the remaining lines: the first 25 lines of each field are reserved by the system to carry either the field sync or teletext data.

The bandwidth of the signal is nominally limited to 5.5MHz, with a notch for the colour subcarrier at 4.433MHz (PAL) or 3.58MHz (NTSC). The specifications require that the receiver remove this notch when a monochrome signal is being received, but it doesn't always happen - particularly as no major broadcaster has transmitted mono signals for forty years or so.

Usable image statistics

So, after that lot, what have we got? A space on each line 52us long, and (assuming we use the 312-line non-interlaced model) 287 usable lines vertically (237 for NTSC). The bandwidth limits suggest that we should be safe with 4MHz. (I'm going to stick to PAL hereafter; you can do the sums yourselves!) A dot clock of 8MHz can produce at most one on/off cycle every two dots, so nothing higher than 4MHz gets through. If we use eight dots for a character width, that gives us 52 characters across the screen. However, most non-digital TVs overscan horribly: if you try to use all 52us, you'll find you can't see the start or the end of each line. Rule of thumb: assume only 80% of the screen is visible.

(In pixel terms, we could be looking at 52*8*287 = 119,392 pixels on the screen, but storing them would require 52*287 = 14,924 bytes - seven or eight times more RAM than most AVRs have available. Damn. So we'll stick with characters instead.)
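The horizontal sums above can be sketched in C, assuming the 52us active line, the 8MHz dot clock and the 287 usable lines from the text. The function names are hypothetical, and the 80% figure is the overscan rule of thumb, not a standard value.

```c
#include <assert.h>

/* Dots that fit in the active line at a given dot clock. */
unsigned dots_per_line(unsigned active_us, unsigned dot_clock_mhz)
{
    return active_us * dot_clock_mhz;          /* 52 us * 8 MHz = 416 dots */
}

/* Characters per line for a cell width in dots, keeping only the
   visible percentage of the line (100 = use it all). */
unsigned chars_per_line(unsigned dots, unsigned cell_width, unsigned visible_pct)
{
    assert(cell_width > 0 && visible_pct <= 100);
    return dots * visible_pct / 100 / cell_width;
}

/* Bytes for a one-bit-per-pixel bitmap of the whole usable area. */
unsigned long bitmap_bytes(unsigned dots, unsigned lines)
{
    return (unsigned long)dots * lines / 8;
}
```

With these numbers, `chars_per_line(416, 8, 100)` is 52, the visible 80% holds about 41 characters, and `bitmap_bytes(416, 287)` is the 14,924 bytes quoted above.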

A character has height as well as width. For a character seven pixels wide (plus one space), twelve or fourteen lines high is common. (You'll find several fonts elsewhere in the tutorials under my name, in various sizes.) If we assume 12 lines per character, we can get 22 character rows in on PAL and, if we play a bit loose with the spec, 20 under NTSC - so let's choose 12 lines.

Other things constrain the number of characters, too. We need to store the value of each character somewhere, plus have room for the other variables the system will need. And however we generate the image, we can assume we're going to be extremely busy during active lines, and reasonably busy even during the blank lines. As we'd like to have *some* processor time available to do something useful, we can reduce that load by reducing the number of lines displayed.

We'll pick 48 characters by 16 lines - that needs a storage area of 768 bytes, which isn't too unmanageable. It fits in the centre 80% of the screen, and it leaves about 40% of the time with the processor not too busy.
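The vertical side of the budget can be checked the same way. This sketch assumes the 287 usable lines and 312-line field from the text, and its names are illustrative only. `max_char_rows` gives the raw upper bound of 23 whole rows before any allowance for overscan. `busy_percent` shows that 16 rows of 12 lines keep about 61% of the field's lines busy with active video, leaving roughly the 40% mentioned above.

```c
#include <assert.h>

#define USABLE_LINES  287u  /* 312 - 25 reserved lines (PAL, one field) */
#define FIELD_LINES   312u
#define FONT_HEIGHT    12u  /* scan lines per character row */

/* Whole character rows that fit in the usable lines. */
unsigned max_char_rows(unsigned usable_lines, unsigned font_height)
{
    assert(font_height > 0);
    return usable_lines / font_height;
}

/* Screen buffer size: one byte (the character code) per cell. */
unsigned screen_bytes(unsigned cols, unsigned rows)
{
    return cols * rows;
}

/* Percentage of the field's lines that carry active text. */
unsigned busy_percent(unsigned rows, unsigned font_height, unsigned field_lines)
{
    assert(field_lines > 0);
    return rows * font_height * 100u / field_lines;
}
```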

Displaying a character

Let's pretend that we've (somehow!) arranged that we're displaying a character. We know which line of the display we're on, because we're keeping count, and from there we know which line of the character (0-11). We know which character in the line, because we're counting microseconds... each character takes about a microsecond to display. We have to arrange for the bits representing the appropriate line of the correct character to be presented *somewhere* to be output as bits on the video signal.

The way we do this is to arrange an array of bit patterns, one per character: the font table. We can assume that we need 92 of these, since characters below 0x20 are not displayable. We need twelve bytes to record each character, one per line (though we actually use sixteen, for reasons which will become clear later).

So to get the right byte pattern to display, we use the character row and column position to index into the video memory. The ASCII value found there and the current video line count (mod 12) are then used to index into the font table. That gives us a byte of data which can be output.
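Before any of the speed tricks, the two-step lookup described above might look like this in C. It is a minimal sketch: `screen`, `font` and `pixel_byte` are hypothetical names, the 48x16 layout and the 16-byte glyph stride come from the text, and treating the most significant bit as the leftmost dot is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define COLS         48    /* characters per row (48 x 16 layout) */
#define ROWS         16
#define FONT_HEIGHT  12    /* scan lines used per glyph */
#define FONT_STRIDE  16    /* bytes reserved per glyph */
#define FIRST_CHAR 0x20    /* first displayable character */
#define NUM_CHARS    92

static uint8_t screen[ROWS * COLS];             /* one ASCII code per cell */
static uint8_t font[NUM_CHARS][FONT_STRIDE];    /* glyph rows, MSB = leftmost dot (assumed) */

/* Byte to shift out for text column `col` on display line `line`,
   counted from the first displayed line. */
uint8_t pixel_byte(unsigned line, unsigned col)
{
    unsigned row     = line / FONT_HEIGHT;      /* character row (char_block) */
    unsigned char_ln = line % FONT_HEIGHT;      /* line within glyph (char_line) */
    assert(row < ROWS && col < COLS);
    uint8_t ascii = screen[row * COLS + col];   /* step 1: screen memory */
    assert(ascii >= FIRST_CHAR && ascii < FIRST_CHAR + NUM_CHARS);
    return font[ascii - FIRST_CHAR][char_ln];   /* step 2: font table */
}
```

Each call costs a divide, a modulo, a multiply and two table lookups. That is far too much for the roughly one microsecond available per character, which is why the assembly version below rearranges the tables.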

How about the syncs?

A critical point about video is that the eye is very sensitive to flicker. Particularly with vertical lines, tiny misalignments are immediately obvious and annoying. We need to ensure that the vertical timing, and each and every video line, start correctly: on the same clock each time.

We can use a timer to trigger an interrupt, and use that interrupt to draw the sync pulse... except that the time the AVR takes to respond to an interrupt depends on what it's doing at the time. It finishes the current instruction first, and instructions take anything from one to several clocks. Only if the AVR is in sleep mode when the interrupt arrives can we guarantee the response: the code in the handler will then begin execution a fixed number of clocks after the interrupt.

Now the trick is to ensure that we're asleep at the appropriate moment. What we do is a bit sneaky - bear with me here.

We need to be in the interrupt handler from the start of the line until the end of its active period, roughly sixty microseconds later. But we also need to be asleep when the line starts. So we set a counter running in CTC mode, with a compare match to reset it every 64us. We then use a second compare value to trigger a different interrupt at 63us, and in that interrupt all we do is enable interrupts and go to sleep. Provided we've done all our display work and returned from the line interrupt before that second interrupt triggers, everything works fine.
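The compare values follow directly from the clock rate. This sketch assumes a 16MHz system clock feeding a 16-bit timer with no prescaler, which is consistent with the eighteen-clock character budget later on. On an ATmega32, one plausible arrangement is Timer1 with OCR1A as the CTC top and OCR1B for the sleep interrupt, but that register assignment is an assumption. The code below only computes the numbers.

```c
#include <assert.h>

/* Timer counts in one line period at f_mhz, rounded to the nearest count. */
unsigned line_counts(double line_us, double f_mhz)
{
    assert(line_us > 0.0 && f_mhz > 0.0);
    return (unsigned)(line_us * f_mhz + 0.5);
}

/* In CTC mode the counter runs 0..TOP, so the period is TOP + 1 counts. */
unsigned ctc_top(unsigned counts)
{
    assert(counts > 0);
    return counts - 1;
}

/* Compare value for the "go to sleep" interrupt, early_counts before the
   line interrupt (16 counts = 1 us at 16 MHz). */
unsigned sleep_compare(unsigned counts, unsigned early_counts)
{
    assert(early_counts < counts);
    return counts - 1 - early_counts;
}
```

For PAL this gives a top of 1023 and a sleep compare of 1007. With NTSC's 63.5us line, both values come out eight counts lower (1015 and 999), which is where the eight-count NTSC adjustment comes from.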

When the 64us trigger comes along, we have three conditions:

1: we're on a display line. We drop the sync line to zero, wait 4.7us, return it to one, wait for the back porch, and then start spitting out video. Then we return.

2: we're on a blank line at the top or bottom of the screen. We drive the sync the same way, but once we've finished the sync pulse we can return, and let the processor do something useful.

3: we're on a field sync line (1, 2, or 3). All we need to do there is drive the sync output low, and return, since we want that line low for three line periods.
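The three cases map naturally onto a small classifier that the line handler could consult. The numbering here is a hypothetical choice. Lines 0-2 are the field sync lines (the text's 1, 2 and 3). The 192 text lines (16 rows of 12) start at line 72, which roughly centres them in the 287 usable lines.

```c
#include <assert.h>

#define FIELD_LINES   312u
#define SYNC_LINES      3u   /* field sync on lines 0..2 (hypothetical numbering) */
#define FIRST_ACTIVE   72u   /* hypothetical first text line */
#define ACTIVE_LINES  192u   /* 16 rows x 12 lines */

typedef enum { LINE_FIELD_SYNC, LINE_BLANK, LINE_DISPLAY } line_kind;

line_kind classify_line(unsigned vid_line)
{
    assert(vid_line < FIELD_LINES);
    if (vid_line < SYNC_LINES)
        return LINE_FIELD_SYNC;   /* case 3: drive sync low and return */
    if (vid_line >= FIRST_ACTIVE && vid_line < FIRST_ACTIVE + ACTIVE_LINES)
        return LINE_DISPLAY;      /* case 1: sync, back porch, then text */
    return LINE_BLANK;            /* case 2: sync pulse only, then return */
}
```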

Generating video levels

I've been chattering away about video and sync pulses, but I haven't discussed what the levels actually are or how we drive them.

To meet the spec, the signal should be 1V peak-to-peak into a 75 ohm load, with the sync tip at 0V and black level at 0.3V. The normal way to do this is to use two pins as outputs, one for sync and one for video, and mix them together with a couple of resistors. Feel free to include a buffer transistor if you wish; it will make things more robust. But you can usually assume that a 330 ohm resistor from the video pin and a 1k ohm resistor from the sync pin, joined together with the junction taken to ground through an 82 ohm resistor, will provide a suitable signal at that junction on a 5V system. Adjust to suit; you may need to play a little.

If both signals are low, the output will be at 0v - sync tip. If sync is high and video is low, then the output should be around 0.3v - black level. And if both are high, then the output should be around a volt - white level. We ignore the other combination as it should never occur.
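The three levels can be estimated with Millman's theorem. The junction voltage is the sum of each source voltage divided by its resistance, divided by the sum of the conductances. This sketch idealises each pin as a perfect 0V or 5V source and ignores the pins' own output resistance. `load_ohms` is a hypothetical extra parameter for a ground-referenced load across the junction, such as a TV input.

```c
#include <assert.h>

/* Mixer junction voltage: 1k from sync, 330 ohm from video, 82 ohm to
   ground, plus an optional extra load (pass 0 for none). */
double mixer_volts(int sync_high, int video_high, double vcc, double load_ohms)
{
    const double r_sync = 1000.0, r_video = 330.0, r_gnd = 82.0;
    double num = (sync_high ? vcc : 0.0) / r_sync
               + (video_high ? vcc : 0.0) / r_video;
    double den = 1.0 / r_sync + 1.0 / r_video + 1.0 / r_gnd;
    if (load_ohms > 0.0)
        den += 1.0 / load_ohms;     /* extra load in parallel with the 82 ohm */
    assert(den > 0.0);
    return num / den;
}
```

With no extra load, the three states come out at 0V, about 0.31V and about 1.24V. Passing 75.0 as `load_ohms` models a 75 ohm input in parallel with the 82 ohm resistor and drops black and white to roughly 0.17V and 0.68V, which is one reason the values may need adjusting.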

Generating the video

At this point, if we got our code correct, we would have a nice stable picture - albeit black. Now we have to think about outputting the video information.

As it happens, there's a convenient parallel-to-serial converter built into most AVRs: the SPI system. We can use it to output the bit pattern we need automatically. We have to accept a couple of small issues, though. Firstly, the system requires at least one clock to separate each byte sent, so we need nine clocks to output eight bits. Secondly, the level during the last of those nine clocks is always high, which is a pain. That means that we have to display black text on a white ground if we do it this way. If you must have white on black, you'd have to invert the output before the mixer resistor.

The SPI system can output our text, but we only have eighteen CPU clocks (nine SPI bit times, with the SPI clocked at half the CPU clock) to get the right data into it... tricky. So let's do all we can to optimise that.
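The eighteen-clock budget also fixes how long a row of text takes. This sketch assumes a 16MHz CPU clock and the SPI running at half that, as the eighteen-clock figure implies. The function names are hypothetical.

```c
#include <assert.h>

/* CPU clocks per character: spi_bits SPI bit times (eight data bits plus
   the gap) at an SPI clock of fosc / spi_divider. */
unsigned clocks_per_char(unsigned spi_bits, unsigned spi_divider)
{
    return spi_bits * spi_divider;
}

/* Microseconds needed to shift out a row of `chars` characters. */
double row_us(unsigned chars, unsigned clocks_per_ch, double f_mhz)
{
    assert(f_mhz > 0.0);
    return chars * clocks_per_ch / f_mhz;
}
```

Each character takes 1.125us, slightly more than the "about a microsecond" used earlier. At that rate 40 characters take 45us and 48 characters take 54us, so it is worth running these numbers against the 52us active period before settling on a row width.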

Recall how the character is created. We need to know the character row and column to get the character's ASCII value, and we need to know the ASCII value and video line to get the right byte from the font table. There's a lot of indexing going on in there. But we can simplify matters if we put the tables in the right places.

To index the character, we multiply the row number by the number of characters in the row, and add the column number. We can calculate the first part before we need it, since it remains the same for the whole row. Then all we need to do is add the column number... and since that's sequential, we can use an autoincrement on a pointer.

We cunningly avoid having to add an offset to the low value of the screen buffer by ensuring it starts at a page boundary; we choose 0x0400.
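The page-boundary trick can be checked on a PC. This C sketch (hypothetical name `row_address`) mirrors the assembly. It multiplies the row number by the row width, keeps the low byte as-is, and adds the base only to the high byte, which gives the right address because the base's low byte is zero. The 48-character row width is the figure from the text.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_BASE 0x0400u   /* page-aligned: low byte is zero */
#define COLS        48u       /* characters per row */

/* Address of the first cell of character row `block`. */
uint16_t row_address(uint8_t block)
{
    assert(block < 16);
    uint16_t offset = (uint16_t)(block * COLS);                   /* mul -> r1:r0 */
    uint8_t  lo = (uint8_t)(offset & 0xFFu);                      /* yl = r0 */
    uint8_t  hi = (uint8_t)((offset >> 8) + (SCREEN_BASE >> 8));  /* yh = r1 + 4 */
    return (uint16_t)((hi << 8) | lo);
}
```

From there, the column is never added explicitly: each `ld ...,y+` in the output loop steps to the next cell.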

	.org 0x400
screen:		.byte 48*16

// 256 bytes of circular buffer
buffer:		.byte	256
// that takes us to 0x800, so we will keep the 0x60 bytes above here for the
// stack, and start normal program variables at the bottom of ram...
	.org	0x060

// the video line counts
vid_line:	.byte 2				// 0-312, the video line being output
char_line:	.byte 1				// 0-11, the video line within the character
char_block:	.byte 1				// 0-15, the character line we're outputting

// the read and write pointers for the serial incoming data buffer
read_ptr:	.byte 1
write_ptr:	.byte 1

After we finish the sync pulse, calculate the buffer position for the start of the current row:

	lds	iarg0,char_line
	// the constants we need
	ldi	iarg1,lines_per_pic		// constant 16
	ldi	iarg2,chars_per_line	// constant 40
	// the pointer to the start of the line to output
	// ie the character block
	lds 	iarg3,char_block
	mul 	iarg3,iarg2
	movw	yl,r0							
	subi	yh,-(screen>>8)				// now Y points to the ram row to read

Then we can wait out our remaining microseconds till the video write starts. When we're hot to trot, we turn on the SPI system and start writing the data. We're pointing at the first character of the row in the screen buffer, so we take that value and index it into the font table. We do this by multiplying it by sixteen (since we earlier reserved sixteen bytes per character) and adding the current line within the character (char_line). We grab that data and output it to the SPI data register. If we time it right we don't need to check that the register is empty, because it always is after nine dots. We then decrement a character counter and loop until we've finished. It turns out that the loop body takes fourteen clocks, leaving four whole clocks unused which we need to eat up with NOPs.

Another time-saving tip is to ensure that the font table is in the right place. If you started it at address zero in flash, you'd have to subtract an offset from the calculated address, since the table doesn't start until character 0x20. But if you start it at byte address 0x20 * 16 = 0x200 (that's 0x100 in the assembler's word count!), then you can index into it directly without the need for the offset.

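squirt:
	// squirt data...
	ld	iarg3,y+		// get screen byte (ASCII code) and point to next
	mul	iarg3,iarg1		// code * 16: offset to character in flash
	add	r0,iarg0		// + line within character; will never overflow,
					// as the low nibble of code*16 is zero, so we
					// don't care about the high byte
	movw	zl,r0			// get it into the Z pointer
	lpm	iarg3,z			// read the font info
	com	iarg3			// invert: black text on a white ground
	out	spdr,iarg3		// send the data to the video stream
	nop				// timing adjustment: the loop body is
	nop				// 14 clocks, so four NOPs pad it to the
	nop				// 18 clocks one SPI byte takes
	nop
	dec	iarg2			// finished the line yet?
	brne	squirt			// nope... more!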
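The font-table placement trick can also be checked on a PC. This C sketch, with hypothetical function names, forms the flash byte address the way the loop does: it multiplies by 16 and adds the line to the low byte only. It then compares the result with the straightforward "subtract the 0x20 offset" calculation for a table starting at byte address 0x200.

```c
#include <assert.h>
#include <stdint.h>

#define FONT_STRIDE 16u   /* bytes reserved per glyph */
#define FIRST_CHAR  0x20u

/* Address as the loop forms it: mul, then add to the low byte only. */
uint16_t font_row_address(uint8_t ascii, uint8_t line)
{
    assert(line < FONT_STRIDE);
    uint16_t product = (uint16_t)(ascii * FONT_STRIDE);   /* mul -> r1:r0 */
    uint8_t  lo = (uint8_t)((product & 0xFFu) + line);    /* add r0,iarg0: no carry */
    return (uint16_t)((product & 0xFF00u) | lo);          /* movw zl,r0 */
}

/* The "obvious" calculation, with the table offset subtracted. */
uint16_t font_row_address_naive(uint16_t table_base, uint8_t ascii, uint8_t line)
{
    return (uint16_t)(table_base + (ascii - FIRST_CHAR) * FONT_STRIDE + line);
}
```

The two agree for every displayable character, so the loop saves both the subtraction and any carry handling.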

After the last character, we simply wait until the SPI finishes spitting it out, and then turn the SPI off again and return - after precalculating the character line and block counts for the next line.
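The counter bookkeeping at the end of the handler might look like this. It is a sketch with a hypothetical `video_state` struct standing in for the `vid_line`, `char_line` and `char_block` variables. It also assumes the glyph line advances after every displayed line and the field counter wraps after 312 lines.

```c
#include <assert.h>

typedef struct {
    unsigned vid_line;    /* 0..311 */
    unsigned char_line;   /* 0..11  */
    unsigned char_block;  /* 0..15  */
} video_state;

#define FIELD_LINES 312u
#define FONT_HEIGHT  12u
#define TEXT_ROWS    16u

/* After a display line: next glyph line, and next text row after the twelfth. */
void advance_display_line(video_state *s)
{
    assert(s->char_line < FONT_HEIGHT && s->char_block < TEXT_ROWS);
    if (++s->char_line == FONT_HEIGHT) {
        s->char_line = 0;
        if (++s->char_block == TEXT_ROWS)
            s->char_block = 0;
    }
}

/* After every line: wrap at the end of the field and restart the text. */
void advance_field_line(video_state *s)
{
    if (++s->vid_line == FIELD_LINES) {
        s->vid_line = 0;
        s->char_line = 0;
        s->char_block = 0;
    }
}
```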

What didn't I mention?

Well, a display isn't much use unless it has something to display. For this example, I've assumed that there will be a serial data feed which will be displayed on the screen. The problem is that we can't have any more interrupts since they might occur when their execution would break the display timings - which are paramount. But there's quite a lot of spare time hanging around here and there - so what I've done is sneaked a polling routine into the sync delay - if it finds a character in the UART input register it transfers it to a circular buffer and adjusts the pointers of that buffer accordingly.
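The polled circular buffer is simple because it is exactly 256 bytes with one-byte pointers. Incrementing an 8-bit index wraps from 255 to 0 by itself, so no explicit modulo is needed. This is a host-side C sketch. `uart_has_byte` and `uart_data` are hypothetical stand-ins for the UART's receive-complete flag and data register, and, like the original, it does not check for overflow.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t rx_buffer[256];
static uint8_t read_ptr, write_ptr;   /* 8-bit indices wrap at 256 */

/* Hypothetical stand-ins for the UART hardware, so the sketch runs on a PC. */
static int     uart_has_byte;         /* would be the receive-complete flag */
static uint8_t uart_data;             /* would be the UART data register */

/* Called from the sync delay: move any waiting byte into the buffer. */
void poll_uart(void)
{
    if (uart_has_byte) {
        rx_buffer[write_ptr++] = uart_data;   /* no overflow check: tough */
        uart_has_byte = 0;
    }
}

/* Main-loop side: -1 if the buffer is empty, otherwise the next byte. */
int buffer_get(void)
{
    if (read_ptr == write_ptr)
        return -1;
    return rx_buffer[read_ptr++];
}
```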

It doesn't attempt to check for overflow; if the buffer overflows, tough. Neither is there any code to decide what to do with the data once you've got it... that needs writing in the main routine. It is left, as they say, as an exercise for the student! As written, the code outputs the video on PB5 and the syncs on PB3. Mix them together, stuff them into a TV and you'll get a stable screen full of repeated 'AAAA's. It's up to you to do something useful with it. If you're in the States, you'll want to change some parts, mostly related to the reduced line count available. You'll still be able to get the same number of lines and characters, but you'll have less time for non-display purposes. You can change the two comparison values to trigger eight counts lower, and you'll get the correct values for NTSC line timings - though I think you can get away with them as they are.

This software is unusual in that so much of it is run in the interrupt state. Normally, one enters an interrupt, does what one needs as fast as possible, and gets out as soon as one can. This is generally good programming practice - but this program demonstrates that sometimes you need to throw away the rule book.