AVR GCC +700% Code Size


Good evening,

 

First off, thank you to the broader community for such a great site and wealth of knowledge.  I've lost the last week's worth of sleep diving into Assembly / C / AVR Tool Chain for the first time (using https://blog.podkalicki.com/100-projects-on-attiny13/ and http://www.avr-asm-tutorial.net/avr_en/micro_beginner/micro_beginner_complete.pdf).

 

I have a pretty basic question, and I know the answer is probably to not do premature optimization, or that this is an edge case.  But I'd still appreciate a conceptual understanding of what is happening here.

 

I started by recreating Lukasz's Blink with Timer OVF project, with just some slight adjustments, and I was surprised that AVR-GCC compiled it to 200+ bytes.  I rewrote the 'same' code in Assembly, assembled it with AVRA, and the code is just 31 bytes.  That experience alone has been quite educational and I see the value in both approaches, but considering how small the ATtiny13 is, can anyone explain why the same code compiles 7 times larger with AVR-GCC?  The generated assembly is definitely 'different' from mine, but only by about 20%, so I must be missing something.  My only guess is that it has something to do with the target device / boilerplate: I noticed that if I changed TIMER0_OVF_vect to the attiny13 equivalent and recompiled with -mmcu=attiny13, the code size dropped, but it was still nowhere close to the assembly version I wrote.

 

Finally, any comments on the code are more than welcome.  I've been teaching myself to code for the last year, and this is my first dive into any of these topics.  RTFM isn't a substitute for experience ;)

 

Thanks in advance, Rob

 

/* Compiled with "avr-gcc -Wall -g -Os -mmcu=atmega328p -o blink.bin blink.c" version 10.1.0 */

#include <avr/io.h>
#include <avr/interrupt.h>

#define F_CPU 		8000000UL 	// 8 MHz internal clock
#define CLKPRESCALE	256		// Slowest system clock prescale
#define PRESCALE	1		// Timer Prescale
#define PERIOD		8
#define T0_CLKS		((F_CPU/CLKPRESCALE*PERIOD)/1000) 

unsigned char count = 25;

ISR(TIMER0_OVF_vect){
	TCNT0 += (256 - T0_CLKS);		// Tick Interupt every 8ms
	count--;
}

int main (void) {

	CLKPR = (1 << CLKPCE);			// Enable setting System Clock Prescaler
	CLKPR = (1 << CLKPS3);			// Set System Clock Prescaler to 256 -- 31.25KHz

	TIMSK0 = (1 << TOIE0);			// Enable overflow interrupt
	TCCR0A = 0;				// Normal mode
	TCCR0B = (1 << CS00);			// Set Timer Clock Prescaler to 1
	TCNT0 = (256 - T0_CLKS);		// Inital Load Timer0 Counter

	DDRB |= (1 << DDB5);			// Enable PB5 for Builtin_LED

	while (1) {
		cli();
		if(~count){		// 200ms (25*8) intervals
			count = 25;
			PINB |= (1 << PB5);	// Toggle PB5
		}
		sei();
	}
	return 0;
}
; Compiled with avra version 1.4.1

.NOLIST ; Output of listing off
.INCLUDE "m328Pdef.inc" ; Read port definitions
.LIST ; Output of listing on

.def rmp = R16			; Multi Purpose Register
.def rcounter = R17		; Counter Register

.equ counter = 25		; Counter Value (25*8ms=200ms)

.CSEG
.ORG 0
	rjmp Start		; RESET

.ORG OVF0addr
	rjmp T0OVFISR		; TIMER0 OVF

T0OVFISR:
	in R15, SREG		; Save Status Register
	dec rcounter		; Decrement counter
	ldi rmp, 6		; 256 - 250 = 6
	out TCNT0, rmp 		; Reset Timer to 250 clicks
	out SREG, R15		; Reset Status Register
	reti			; End ISR and re-enable Interrupt

Start:
	ldi rmp, LOW(RAMEND)
	out SPL, rmp	 	; Stack pointer to RAMEND

	ldi rmp, 1 << CLKPCE
	sts CLKPR, rmp 		; System Clock Prescaler enable

	ldi rmp, 1 << CLKPS3
	sts CLKPR, rmp 		; Set System Clock Prescaler to 256:31.25KHz

	ldi rmp, 1 << TOIE0
	sts TIMSK0, rmp		; Enable Timer0 Interrupt Overflow

	ldi rmp, 1 << CS00	;
	sts TCCR0B, rmp 	; Set Timer Clock to Prescaler 1

	sbi DDRB, DDB5		; Set PB5 as output (BUILTIN_LED)

	ldi rmp, 6		;
	out TCNT0, rmp 		; Reset Timer to 250 clicks
	ldi rcounter, counter	; Pre-load Counter

Loop:
	tst rcounter		; Is Counter 0? ( 25 * 8ms has elapsed )
	brne Loop		; If not 0, loop
	sbi PINB, PB5		; Else Toggle PB5
	ldi rcounter, counter	; Reset Counter to 25
	rjmp Loop		; Loop

 

This topic has a solution.
Last Edited: Mon. Jun 29, 2020 - 11:02 AM

rob6118 wrote:
so I must be missing something

Probably the runtime library support?

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

The compiler generates general-purpose code; for example, it creates a vector table for ALL interrupts on the chip in use (mapping every unused one to an error handler).

If you want to see this, look at the .lss file from the compiler (you may need to enable it somewhere in the project setup).


 just 31 bytes

It must be 31 words, so 62 bytes.

On the other hand, your ASM code is far from optimal ;)

 

  


rob6118 wrote:

if(~count){

~ seems strange here: you're testing whether the bitwise NOT (complement) of count is non-zero.

Did you mean

if (count == 0)

which could be written as

if (!count)
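
The reason `if (~count)` never skips its body can be checked on any host compiler. This is a minimal sketch in standard C (the helper names are made up for illustration, they are not from the thread): because of integer promotion, `~` is applied to an `int`, so the result is non-zero for every possible 8-bit value.

```c
/* Hypothetical helper: evaluates the condition used in `if (~count)`.
 * `c` is promoted to int before `~` is applied, so for c in 0..255 the
 * result lies in -256..-1 and can never be zero. */
int not_count_is_nonzero(unsigned char c)
{
    return (~c) != 0;
}

/* The test that was probably intended: true only when the counter
 * has reached zero. */
int count_is_zero(unsigned char c)
{
    return !c;
}
```

This is consistent with the disassembly posted later in the thread, where the store of 25 and the pin toggle are executed unconditionally on every pass through the loop.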

 

 


rob6118 wrote:
I rewrote the 'same' code in Assembly

But you did not! 

Your assembly does not do:

TCNT0 += (256 - T0_CLKS);		// Tick Interupt every 8ms

its more like:

TCNT0 = (256 - T0_CLKS);		// Tick Interupt every 8ms

A big difference for a one-character change in the source: the former does a read-modify-write of the TCNT0 register, which requires more registers, so the ISR prologue/epilogue has to save and restore more registers. Your asm does not; it just loads a constant (held in a register). Also, your assembler does not cli() or sei(), a small difference indeed. Making a few mods to the C code to better match what the asm is doing, I come up with 137 bytes, still 100 more than your asm, but the compiler stores "rcount" in RAM and, as that is a global, it is initialized by the crt before starting main().

Note: I'm using ICC compiler, not gcc, perhaps gcc's more aggressive optimizer will do better.
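
The two reload styles also differ in timing, not only in size. The following is a rough host-side model, not AVR code: the function names are invented, `latency` stands for the ticks the timer has already counted after the overflow when the reload executes, and any ticks lost during the write itself are ignored.

```c
/* Ticks between two overflows when the ISR does `TCNT0 += reload`.
 * The ticks already counted since the overflow are kept.
 * Assumes latency + reload < 256. */
unsigned period_with_add(unsigned reload, unsigned latency)
{
    unsigned tcnt = latency + reload;   /* read-modify-write */
    return latency + (256u - tcnt);
}

/* Ticks between two overflows when the ISR does `TCNT0 = reload`.
 * The ticks already counted since the overflow are discarded. */
unsigned period_with_assign(unsigned reload, unsigned latency)
{
    unsigned tcnt = reload;             /* plain store of a constant */
    return latency + (256u - tcnt);
}
```

In this model the `+=` form always yields 250 ticks for a reload of 6, while the `=` form stretches the period by however long the ISR took to reach the store.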

 

Jim

My modified C, program size 42 bytes, not counting crt

/* #Compiled with "avr-gcc -Wall -g -Os -mmcu=atmega328p -o blink.bin blink.c" version 10.1.0 */

#include <iccioavr.h> //<avr/io.h>
#include <macros.h>
//#include <avr/interrupt.h>

#define F_CPU 		8000000UL 	// 8 MHz internal clock
#define CLKPRESCALE	256		// Slowest system clock prescale
#define PRESCALE	1		// Timer Prescale
#define PERIOD		8
#define T0_CLKS		((F_CPU/CLKPRESCALE*PERIOD)/1000)

#define count 25
volatile unsigned char rcount = 25;

//ISR(TIMER0_OVF_vect){
#pragma interrupt_handler T0_OVRFLO:iv_TIM0_OVF
void T0_OVRFLO(void){
	TCNT0 = (256 - T0_CLKS);		// Tick Interupt every 8ms
	rcount--;
}

int main (void) {

	CLKPR = (1 << CLKPCE);			// Enable setting System Clock Prescaler
	CLKPR = (1 << CLKPS3);			// Set System Clock Prescaler to 256 -- 31.25KHz

	TIMSK0 = (1 << TOIE0);			// Enable overflow interrupt
	TCCR0A = 0;				// Normal mode
	TCCR0B = (1 << CS00);			// Set Timer Clock Prescaler to 1
	TCNT0 = (256 - T0_CLKS);		// Inital Load Timer0 Counter

	DDRB |= (1 << DDB5);			// Enable PB5 for Builtin_LED

	while (1) {
		//CLI();
		if(~count){		// 200ms (25*8) intervals
			rcount = count;
			PINB |= (1 << PB5);	// Toggle PB5
		}
		//SEI();
	}
	return 0;
}

 

 

(Possum Lodge oath) Quando omni flunkus, moritati.

"I thought growing old would take longer"

 

Last Edited: Fri. Jun 26, 2020 - 04:32 PM

When I build your code for 328P in AS7 I get:

00000000 <__vectors>:
   0:	33 c0       	rjmp	.+102    	; 0x68 <__ctors_end>
   2:	00 00       	nop
   4:	44 c0       	rjmp	.+136    	; 0x8e <__bad_interrupt>
   6:	00 00       	nop
   8:	42 c0       	rjmp	.+132    	; 0x8e <__bad_interrupt>
   a:	00 00       	nop
   c:	40 c0       	rjmp	.+128    	; 0x8e <__bad_interrupt>
   e:	00 00       	nop
  10:	3e c0       	rjmp	.+124    	; 0x8e <__bad_interrupt>
  12:	00 00       	nop
  14:	3c c0       	rjmp	.+120    	; 0x8e <__bad_interrupt>
  16:	00 00       	nop
  18:	3a c0       	rjmp	.+116    	; 0x8e <__bad_interrupt>
  1a:	00 00       	nop
  1c:	38 c0       	rjmp	.+112    	; 0x8e <__bad_interrupt>
  1e:	00 00       	nop
  20:	36 c0       	rjmp	.+108    	; 0x8e <__bad_interrupt>
  22:	00 00       	nop
  24:	34 c0       	rjmp	.+104    	; 0x8e <__bad_interrupt>
  26:	00 00       	nop
  28:	32 c0       	rjmp	.+100    	; 0x8e <__bad_interrupt>
  2a:	00 00       	nop
  2c:	30 c0       	rjmp	.+96     	; 0x8e <__bad_interrupt>
  2e:	00 00       	nop
  30:	2e c0       	rjmp	.+92     	; 0x8e <__bad_interrupt>
  32:	00 00       	nop
  34:	2c c0       	rjmp	.+88     	; 0x8e <__bad_interrupt>
  36:	00 00       	nop
  38:	2a c0       	rjmp	.+84     	; 0x8e <__bad_interrupt>
  3a:	00 00       	nop
  3c:	28 c0       	rjmp	.+80     	; 0x8e <__bad_interrupt>
  3e:	00 00       	nop
  40:	27 c0       	rjmp	.+78     	; 0x90 <__vector_16>
  42:	00 00       	nop
  44:	24 c0       	rjmp	.+72     	; 0x8e <__bad_interrupt>
  46:	00 00       	nop
  48:	22 c0       	rjmp	.+68     	; 0x8e <__bad_interrupt>
  4a:	00 00       	nop
  4c:	20 c0       	rjmp	.+64     	; 0x8e <__bad_interrupt>
  4e:	00 00       	nop
  50:	1e c0       	rjmp	.+60     	; 0x8e <__bad_interrupt>
  52:	00 00       	nop
  54:	1c c0       	rjmp	.+56     	; 0x8e <__bad_interrupt>
  56:	00 00       	nop
  58:	1a c0       	rjmp	.+52     	; 0x8e <__bad_interrupt>
  5a:	00 00       	nop
  5c:	18 c0       	rjmp	.+48     	; 0x8e <__bad_interrupt>
  5e:	00 00       	nop
  60:	16 c0       	rjmp	.+44     	; 0x8e <__bad_interrupt>
  62:	00 00       	nop
  64:	14 c0       	rjmp	.+40     	; 0x8e <__bad_interrupt>
	...

00000068 <__ctors_end>:
  68:	11 24       	eor	r1, r1
  6a:	1f be       	out	0x3f, r1	; 63
  6c:	cf ef       	ldi	r28, 0xFF	; 255
  6e:	d8 e0       	ldi	r29, 0x08	; 8
  70:	de bf       	out	0x3e, r29	; 62
  72:	cd bf       	out	0x3d, r28	; 61

00000074 <__do_copy_data>:
  74:	11 e0       	ldi	r17, 0x01	; 1
  76:	a0 e0       	ldi	r26, 0x00	; 0
  78:	b1 e0       	ldi	r27, 0x01	; 1
  7a:	e6 ee       	ldi	r30, 0xE6	; 230
  7c:	f0 e0       	ldi	r31, 0x00	; 0
  7e:	02 c0       	rjmp	.+4      	; 0x84 <__do_copy_data+0x10>
  80:	05 90       	lpm	r0, Z+
  82:	0d 92       	st	X+, r0
  84:	a2 30       	cpi	r26, 0x02	; 2
  86:	b1 07       	cpc	r27, r17
  88:	d9 f7       	brne	.-10     	; 0x80 <__do_copy_data+0xc>
  8a:	16 d0       	rcall	.+44     	; 0xb8 <main>
  8c:	2a c0       	rjmp	.+84     	; 0xe2 <_exit>

0000008e <__bad_interrupt>:
  8e:	b8 cf       	rjmp	.-144    	; 0x0 <__vectors>

00000090 <__vector_16>:
#define PERIOD		8
#define T0_CLKS		((F_CPU/CLKPRESCALE*PERIOD)/1000)

unsigned char count = 25;

ISR(TIMER0_OVF_vect){
  90:	1f 92       	push	r1
  92:	0f 92       	push	r0
  94:	0f b6       	in	r0, 0x3f	; 63
  96:	0f 92       	push	r0
  98:	11 24       	eor	r1, r1
  9a:	8f 93       	push	r24
	TCNT0 += (256 - T0_CLKS);		// Tick Interupt every 8ms
  9c:	86 b5       	in	r24, 0x26	; 38
  9e:	8a 5f       	subi	r24, 0xFA	; 250
  a0:	86 bd       	out	0x26, r24	; 38
	count--;
  a2:	80 91 00 01 	lds	r24, 0x0100	; 0x800100 <__DATA_REGION_ORIGIN__>
  a6:	81 50       	subi	r24, 0x01	; 1
  a8:	80 93 00 01 	sts	0x0100, r24	; 0x800100 <__DATA_REGION_ORIGIN__>
}
  ac:	8f 91       	pop	r24
  ae:	0f 90       	pop	r0
  b0:	0f be       	out	0x3f, r0	; 63
  b2:	0f 90       	pop	r0
  b4:	1f 90       	pop	r1
  b6:	18 95       	reti

000000b8 <main>:

int main (void) {

	CLKPR = (1 << CLKPCE);			// Enable setting System Clock Prescaler
  b8:	e1 e6       	ldi	r30, 0x61	; 97
  ba:	f0 e0       	ldi	r31, 0x00	; 0
  bc:	80 e8       	ldi	r24, 0x80	; 128
  be:	80 83       	st	Z, r24
	CLKPR = (1 << CLKPS3);			// Set System Clock Prescaler to 256 -- 31.25KHz
  c0:	88 e0       	ldi	r24, 0x08	; 8
  c2:	80 83       	st	Z, r24

	TIMSK0 = (1 << TOIE0);			// Enable overflow interrupt
  c4:	81 e0       	ldi	r24, 0x01	; 1
  c6:	80 93 6e 00 	sts	0x006E, r24	; 0x80006e <__TEXT_REGION_LENGTH__+0x7f806e>
	TCCR0A = 0;				// Normal mode
  ca:	14 bc       	out	0x24, r1	; 36
	TCCR0B = (1 << CS00);			// Set Timer Clock Prescaler to 1
  cc:	85 bd       	out	0x25, r24	; 37
	TCNT0 = (256 - T0_CLKS);		// Inital Load Timer0 Counter
  ce:	86 e0       	ldi	r24, 0x06	; 6
  d0:	86 bd       	out	0x26, r24	; 38

	DDRB |= (1 << DDB5);			// Enable PB5 for Builtin_LED
  d2:	25 9a       	sbi	0x04, 5	; 4

	while (1) {
		cli();
		if(~count){		// 200ms (25*8) intervals
			count = 25;
  d4:	89 e1       	ldi	r24, 0x19	; 25
	TCNT0 = (256 - T0_CLKS);		// Inital Load Timer0 Counter

	DDRB |= (1 << DDB5);			// Enable PB5 for Builtin_LED

	while (1) {
		cli();
  d6:	f8 94       	cli
		if(~count){		// 200ms (25*8) intervals
			count = 25;
  d8:	80 93 00 01 	sts	0x0100, r24	; 0x800100 <__DATA_REGION_ORIGIN__>
			PINB |= (1 << PB5);	// Toggle PB5
  dc:	1d 9a       	sbi	0x03, 5	; 3
		}
		sei();
  de:	78 94       	sei
	}
  e0:	fa cf       	rjmp	.-12     	; 0xd6 <main+0x1e>

000000e2 <_exit>:
  e2:	f8 94       	cli

000000e4 <__stop_program>:
  e4:	ff cf       	rjmp	.-2      	; 0xe4 <__stop_program>

A few things to note about the size of this. Locations 0x00 to 0x67 (these are hex byte addresses, so 104 of the 230 bytes in total) are the reset jump and a complete interrupt vector table. C does this for you as a "favour", assuming that you may want to use some interrupt vectors, so it provides entries ready for all of them. As the header for the 328P shows, this is therefore:

#define _VECTORS_SIZE (26 * 4)

so 26 vectors each of 4 bytes.
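
The same arithmetic explains why the size dropped when building with -mmcu=attiny13. A small sketch follows; the ATmega328P figures come from the header line above, while the ATtiny13 figures (10 vectors, one 2-byte rjmp each) are my reading of its datasheet and should be treated as an assumption. The function name is made up.

```c
/* Size in bytes of a fully populated interrupt vector table.
 * ATmega328P: 26 vectors, 4 bytes per slot (room for a 2-word jmp).
 * ATtiny13:   10 vectors, 2 bytes per slot (assumed, from the datasheet). */
unsigned vector_table_bytes(unsigned vectors, unsigned bytes_per_vector)
{
    return vectors * bytes_per_vector;
}
```

So `vector_table_bytes(26, 4)` accounts for 104 bytes of the 328P build, whereas a tiny13 build under these assumptions would spend only 20 bytes on its table.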

 

Some other code in the C generated version that you "don't own" is the "C runtime". This is the bit the C compiler provides to get the CPU into a known state before it enters main(). In this case that is from 0x68 to 0x8d, so that is another 38 bytes. More than half of that is __do_copy_data. That exists because you used:

unsigned char count = 25;

The whole loop (which is standard no matter how many .data items you have) is there to copy the value 25 out of flash and into a RAM location called "count" at startup. The rest of the CRT (C RunTime) is about making sure R1 holds 0 (C relies on this a lot), clearing SREG (to make sure the I bit is clear), and setting the stack pointer (which is a "belts and braces" action on 328P because it defaults the stack to RAMEND anyway). Oh, and until you get to GCC V9 the ISRs are not necessarily as optimized as they could be, so:

ISR(TIMER0_OVF_vect){
	TCNT0 += (256 - T0_CLKS);		// Tick Interupt every 8ms
	count--;
}

is generating:

  90:	1f 92       	push	r1
  92:	0f 92       	push	r0
  94:	0f b6       	in	r0, 0x3f	; 63
  96:	0f 92       	push	r0
  98:	11 24       	eor	r1, r1
  9a:	8f 93       	push	r24
	TCNT0 += (256 - T0_CLKS);		// Tick Interupt every 8ms
  9c:	86 b5       	in	r24, 0x26	; 38
  9e:	8a 5f       	subi	r24, 0xFA	; 250
  a0:	86 bd       	out	0x26, r24	; 38
	count--;
  a2:	80 91 00 01 	lds	r24, 0x0100	; 0x800100 <__DATA_REGION_ORIGIN__>
  a6:	81 50       	subi	r24, 0x01	; 1
  a8:	80 93 00 01 	sts	0x0100, r24	; 0x800100 <__DATA_REGION_ORIGIN__>
}
  ac:	8f 91       	pop	r24
  ae:	0f 90       	pop	r0
  b0:	0f be       	out	0x3f, r0	; 63
  b2:	0f 90       	pop	r0
  b4:	1f 90       	pop	r1
  b6:	18 95       	reti

Some of this is about saving and then restoring SREG, but there's also code there preserving R1 (which must always hold 0x00) even though, in reality, this ISR code doesn't actually touch R1 anyway.
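
To put the pieces together, here is a sketch that tallies the 230 bytes by region. The address ranges are read off the disassembly above; the region labels and function names are my own and purely illustrative.

```c
#include <stddef.h>

/* Address ranges taken from the disassembly above (end is exclusive). */
struct region {
    const char *name;
    unsigned start;
    unsigned end;
};

static const struct region regions[] = {
    { "__vectors (reset + vector table)", 0x00, 0x68 },
    { "C runtime init + __do_copy_data",  0x68, 0x8e },
    { "__bad_interrupt",                  0x8e, 0x90 },
    { "__vector_16 (the ISR)",            0x90, 0xb8 },
    { "main",                             0xb8, 0xe2 },
    { "_exit + __stop_program",           0xe2, 0xe6 },
};

#define REGION_COUNT (sizeof regions / sizeof regions[0])

/* Bytes occupied by region i. */
unsigned region_bytes(size_t i)
{
    return regions[i].end - regions[i].start;
}

/* Sum over all regions; should match the reported program size. */
unsigned total_bytes(void)
{
    unsigned sum = 0;
    for (size_t i = 0; i < REGION_COUNT; i++)
        sum += region_bytes(i);
    return sum;
}

/* Only the ISR and main() correspond to source the user wrote. */
unsigned user_code_bytes(void)
{
    return region_bytes(3) + region_bytes(4);
}
```

By this count only 82 of the 230 bytes come from the ISR and main(); the rest is vector table and runtime.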

 

BTW, if you ever find yourself modifying TCNT in a timer ISR, the chances are you missed the possibility of using CTC mode (which would reset the counter anyway).
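
For the numbers in this thread the CTC compare value is easy to work out. The sketch below is plain host C with invented function names; it assumes the timer clock is 8 MHz / 256 = 31250 Hz with a timer prescaler of 1, as in the original post. On the ATmega328P the result would, as far as I understand the datasheet, go into OCR0A with WGM01 set, and the interrupt would be TIMER0_COMPA_vect.

```c
/* Timer ticks that elapse in period_ms at a timer clock of timer_hz. */
unsigned long ticks_per_period(unsigned long timer_hz, unsigned long period_ms)
{
    return timer_hz * period_ms / 1000UL;
}

/* CTC mode counts 0..TOP inclusive, so TOP is one less than the tick count. */
unsigned char ctc_top(unsigned long timer_hz, unsigned long period_ms)
{
    return (unsigned char)(ticks_per_period(timer_hz, period_ms) - 1UL);
}

/* Overflow approach used in the original post: preload 256 - ticks. */
unsigned char overflow_reload(unsigned long timer_hz, unsigned long period_ms)
{
    return (unsigned char)(256UL - ticks_per_period(timer_hz, period_ms));
}
```

With these inputs an 8 ms period is 250 ticks, giving a compare value of 249 in CTC mode versus the reload value of 6 seen in the listings.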

 

In your own code consider replacing:

	CLKPR = (1 << CLKPCE);			// Enable setting System Clock Prescaler
  b8:	e1 e6       	ldi	r30, 0x61	; 97
  ba:	f0 e0       	ldi	r31, 0x00	; 0
  bc:	80 e8       	ldi	r24, 0x80	; 128
  be:	80 83       	st	Z, r24
	CLKPR = (1 << CLKPS3);			// Set System Clock Prescaler to 256 -- 31.25KHz
  c0:	88 e0       	ldi	r24, 0x08	; 8
  c2:	80 83       	st	Z, r24

with:

clock_prescale_set(clock_div_256);

which is guaranteed to meet the 4 cycle requirement (because it's coded in Asm) and is likely more efficient anyway. I get:

void clock_prescale_set(clock_div_t __x)
{
    uint8_t __tmp = _BV(CLKPCE);
    __asm__ __volatile__ (
  b8:	98 e0       	ldi	r25, 0x08	; 8
  ba:	80 e8       	ldi	r24, 0x80	; 128
  bc:	0f b6       	in	r0, 0x3f	; 63
  be:	f8 94       	cli
  c0:	80 93 61 00 	sts	0x0061, r24	; 0x800061 <__TEXT_REGION_LENGTH__+0x7f8061>
  c4:	90 93 61 00 	sts	0x0061, r25	; 0x800061 <__TEXT_REGION_LENGTH__+0x7f8061>
  c8:	0f be       	out	0x3f, r0	; 63

That is no smaller, but in part that's because it has added something you missed: the two writes to the register are protected against interrupts.


which is a "belts and braces" action on 328P because it defaults the stack to RAMEND anyway

A tiny13 also initializes the stack pointer in hardware, so there is no need to init it in the code.


Really appreciate the quick replies...and the eye for details....lots to chew on.  

 

sparrow2 wrote:

 just 31 bytes

it must be 31 words, so 62 byte.

 

 

on the other hand your ASM code is far from optimal ;)

 

  

 

Yes, 62 bytes (blush).  Any tips on the optimization?  It's my very 1st attempt at assembly, so understanding the addressing limitations of out/sbi/sts and defining the vector tables with .ORG was challenging enough.

 

ki0bk wrote:

 

Your assembly does not do:

TCNT0 += (256 - T0_CLKS);		// Tick Interupt every 8ms

its more like:

TCNT0 = (256 - T0_CLKS);		// Tick Interupt every 8ms

 

Fair enough; that is something I intentionally changed while trying to RTFM and work out why I couldn't use the same (smaller) assembly examples on the atmega328 registers.  I read through your example (thank you!) and made the same modifications except for:

#pragma interrupt_handler T0_OVRFLO:iv_TIM0_OVF
void T0_OVRFLO(void){

Not sure if this is a difference specific to the compiler or the macro include you added to the top.  But commenting out the cli()/sei() and switching to rcount only got me down to 216.

 

I did use avr-gcc's -S flag to try to compare apples to apples.  I'm not seeing too much difference besides the use of 'lo8(-k)' and the TCNT0 +=.  
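
On the `lo8(-(6))` point: AVR has no add-immediate instruction for a single register, so the compiler adds 6 by subtracting the low byte of -6 (0xFA, which the earlier disassembly shows as `subi r24, 0xFA`). A host-side sketch of that equivalence, with function names that are mine rather than anything from the toolchain:

```c
#include <stdint.h>

/* Low byte of -k, i.e. what the assembler computes for lo8(-(k)). */
uint8_t lo8_neg(uint8_t k)
{
    return (uint8_t)(0u - k);
}

/* Models `subi reg, lo8(-(k))`: subtracting -k modulo 256 adds k. */
uint8_t add_via_subi(uint8_t reg, uint8_t k)
{
    return (uint8_t)(reg - lo8_neg(k));
}
```

The same reasoning covers `subi r24,lo8(-(-1))`, which is simply a subtraction of 1.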

 

Much obliged and thanks again,

 

Rob

 

; Updated assembly adding:
; subi += increment for TCNT0
; Timer interrupt disable and re-enable for each branch of loop

.NOLIST ; Output of listing off
.INCLUDE "m328Pdef.inc" ; Read port definitions
.LIST ; Output of listing on

.def rmp = R16                  ; Multi Purpose Register
.def rcounter = R17             ; Counter Register

.equ counter = 25               ; Counter Value (25*8ms=200ms)

.CSEG
.ORG 0
        rjmp Start              ; RESET

.ORG OVF0addr
        rjmp T0OVFISR           ; TIMER0 OVF

T0OVFISR:
        in R15, SREG            ; Save Status Register
        dec rcounter            ; Decrement counter
        in rmp, TCNT0           ; Is this the canonical way to load the counter, increment it and reset?
        subi rmp, -6            ;
        out TCNT0, rmp          ; Reset Timer to 250 clicks
        out SREG, R15           ; Reset Status Register
        reti                    ; End ISR and re-enable Interrupt

Start:
        ldi rmp, LOW(RAMEND)
        out SPL, rmp            ; Stack pointer to RAMEND

        ldi rmp, 1 << CLKPCE
        sts CLKPR, rmp          ; System Clock Prescaler enable

        ldi rmp, 1 << CLKPS3
        sts CLKPR, rmp          ; Set System Clock Prescaler to 256:31.25KHz

        ldi rmp, 1 << TOIE0
        sts TIMSK0, rmp         ; Enable Timer0 Interrupt Overflow

        ldi rmp, 1 << CS00      ;
        sts TCCR0B, rmp         ; Set Timer Clock to Prescaler 1

        sbi DDRB, DDB5          ; Set PB5 as output (BUILTIN_LED)

        ldi rmp, 6              ;
        out TCNT0, rmp          ; Reset Timer to 250 clicks
        ldi rcounter, counter   ; Pre-load Counter

Loop:
        cli                     ;
        tst rcounter            ; Is Counter 0? ( 25 * 8ms has elapsed )
        sei                     ;
        brne Loop               ; If not 0, loop
        cli                     ;
        sbi PINB, PB5           ; Else Toggle PB5
        ldi rcounter, counter   ; Reset Counter to 25
        sei                     ;
        rjmp Loop               ; Loop
; .lss generated by avr-gcc inclusive of the changes described to blink.c
; 

        .file   "blink.c"
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__tmp_reg__ = 0
__zero_reg__ = 1
        .text
.global __vector_16
        .type   __vector_16, @function
__vector_16:
        __gcc_isr 1
/* prologue: Signal */
/* frame size = 0 */
/* stack size = 0...4 */
.L__stack_usage = 0 + __gcc_isr.n_pushed
        in r24,0x26
        subi r24,lo8(-(6))
        out 0x26,r24
        lds r24,rcount
        subi r24,lo8(-(-1))
        sts rcount,r24
/* epilogue start */
        __gcc_isr 2
        reti
        __gcc_isr 0,r24
        .size   __vector_16, .-__vector_16
        .section        .text.startup,"ax",@progbits
.global main
        .type   main, @function
main:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
        ldi r24,lo8(-128)
        sts 97,r24
        ldi r24,lo8(8)
        sts 97,r24
        ldi r24,lo8(1)
        sts 110,r24
        out 0x24,__zero_reg__
        out 0x25,r24
        ldi r24,lo8(6)
        out 0x26,r24
        sbi 0x4,5
.L3:
        rjmp .L3
        .size   main, .-main
.global rcount
        .data
        .type   rcount, @object
        .size   rcount, 1
rcount:
        .byte   25
        .ident  "GCC: (GNU) 10.1.0"
.global __do_copy_data


 


Wow, thank you as well.  Truly impressive the level of detail in this forum and the time you guys are taking on someone's first post:

 

1)  Yes, I suspected that the compiler was setting up a complete ISR vector table, given the warnings about what happens if you trigger an undefined interrupt.  The example I referenced used individual reti instructions, which break on the ATmega328 because of the 2-word entries, and which were a pain to manually recreate just to find out they didn't work :).  What is the canonical way of doing it?  I found the .ORG + address approach on this forum and adapted it to suit.  Is the compiler too conservative or am I playing with fire?

 

2) As per my reply ^ the .lss didn't look too bad for the ISR code.  A bit cleaner than the example you pasted above.

 

3) Noted on the CTC.  I discovered this in the next example in the PDF I referenced.  Super cool & efficient way to avoid the TCNT0, but I'm using an Arduino Uno and just the builtin_led on PB5 so I'd have to jumper the pin to use a different output.

 

4) I still need to learn / understand how I can use that function clock_prescale_set within an .asm file.  Wanted to try without any helper code before including outside libraries / calls.

 

5) Is it really important to disable interrupts on the prescale set?  This is done in the file prior to setting the interrupt as enabled? 
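
On point 1, the mismatch between one-word and two-word vector slots can be made concrete. This is only a sketch: the function names are invented, and the statement that Timer0 overflow is vector index 16 on the ATmega328P and index 3 on the ATtiny13 is taken from the `__vector_16` symbol in the listing and from my reading of the tiny13 datasheet respectively.

```c
/* Word address of a vector slot (the unit the assembler's .ORG uses). */
unsigned vector_word_address(unsigned index, unsigned words_per_slot)
{
    return index * words_per_slot;
}

/* Byte address of the same slot (the unit shown in the objdump listing). */
unsigned vector_byte_address(unsigned index, unsigned words_per_slot)
{
    return 2u * vector_word_address(index, words_per_slot);
}
```

With two words per slot, index 16 lands at word 0x20 and byte 0x40, which matches the `40: rjmp ... <__vector_16>` line in the listing. A run of single-word reti entries would put entry 16 at word 0x10 instead, which is why a tiny13-style table does not line up on the 328P and why placing each entry with .ORG works.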

 

 


The advantages of working in assembler are only apparent when using microcontrollers that are tiny in size and flash memory space: specifically the Tiny13 and the new Tiny10, which have only about 1000 bytes (1 KB) of flash memory in total.

 

The cost of writing and especially maintaining programs that are 10,000+ instructions in size is much greater in assembler than it is in higher level languages like C or C++. Even if you have one super-coder on staff who can keep the complex assembler patterns organized in his head, the advantage is lost when he has to create documentation that will bring his assistant coders up to speed.  So with complex assembler-based programs, the company often has one guy who is the "high priest" or "guru" of the specific program.  And if you have to go to a new microcontroller of a different family, or the super-coder leaves the company, everything starts over from the beginning when using assembler.

 

Instead of learning assembler, I suggest learning how Arduino works on all of its various levels.  It is a system for moving advanced code between different microcontroller devices and families with as little friction as possible.


Things to optimize:

If you want an 8 ms timer ISR, don't use overflow but compare match (then you don't need to correct the timer).

You don't need to init the stack on a tiny13.

There is no need to use STS and LDS on I/O registers on a tiny13; IN and OUT will do. (If you need LDS and STS for RAM, make Y point to the first RAM address; then all RAM can be reached with LDD and STD instructions, which are half the size, and YH will also be your zero register.)

Since rcounter is a single 8-bit register, you don't need cli and sei around it.

I don't know what other code you want to add, but often you would move the handling of the interrupt counter into the interrupt itself (clear the counter and set a flag that "main" can check and clear).
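
The LDS/STS versus LDD/STD suggestion is a size trade-off that can be estimated up front. A back-of-the-envelope sketch, assuming the usual encodings (LDS/STS are two words, LDD/STD and LDI one word each) and counting the two LDI instructions that set up Y as a one-off cost; the names are illustrative only.

```c
enum {
    LDS_STS_BYTES = 4,  /* two-word direct load/store */
    LDD_STD_BYTES = 2,  /* one-word load/store with displacement */
    LDI_BYTES     = 2   /* one-word load immediate */
};

/* Flash bytes for n RAM accesses using LDS/STS. */
unsigned direct_access_bytes(unsigned accesses)
{
    return accesses * LDS_STS_BYTES;
}

/* Flash bytes for n RAM accesses using LDD/STD via Y, including the
 * two LDIs that point Y at the start of RAM. The displacement is
 * limited to 0..63, so this reaches 64 bytes from the base. */
unsigned indexed_access_bytes(unsigned accesses)
{
    return 2u * LDI_BYTES + accesses * LDD_STD_BYTES;
}
```

Under these assumptions the indexed form breaks even at two accesses and saves two bytes for each access after that.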

 

 


sparrow2 wrote:
You don't need to init the stack on a tiny13

We've all been round and round on this one -- is there any possibility that your app could run away, or have a stack overflow, or other method to get to 0?  Then the RAMEND is a precaution.  Is the aim of this exercise to get the smallest program?  To what end?  Is the aim to configure your C toolchain to match your ASM?  Then you can skip vector table or have your own.  You can run "naked" without a preamble.  You can skip initial values, and hope you only have a poweron from a clean chip.

 

But there is a reason for the preamble, and not just C requirements (or preferences) with the way it gets to main().  My toolchain turns off the watchdog to prevent cascading.  There are other little pieces, that you want in any "real" app.

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Simonetta wrote:

Instead of learning assembler, I suggest learning how Arduino works on all of its various levels.  It is a system for moving advanced code between different microcontroller devices and families with as little friction as possible.

 

Isn't assembler just one level of 'Arduino'? ;) Now we are definitely going down the philosophical path of design choices.  That sort of key man dependency and complexity equally applies when the ability to write ANY code sits in the hands of a finite number of Java developers.  When any change request, or frankly even fixing any of the numerous defects that make their way into production, becomes a multi-year affair, it's better to strip away the layers of abstraction and encourage people to take a 'bootstrap' approach and 'waste' some of their time getting their hands dirty. In my humble experience, having as complete a view of the system as practical is the secret to coming up with insights and solutions that 99% of people miss.  Or maybe it's a bit of cart before the horse, and the sort of people who try to understand the big picture are the questioning types who come up with creative solutions.

 

sparrow2 wrote:

There is no need to use STS and LDS on IO's on a tiny13, IN and OUT will do. (If you need LDS and STS for RAM then make Y point to first RAM Addr, then all RAM can be reached with LDD and STD instructions that is half size, and then YH will then also be your zero register).

 

Thank you sparrow.  Could I trouble you for a bit of example code?  Noted that it wouldn't be needed on a tiny13, but my learning platform is an Arduino Uno, just to avoid the jumble that is the breadboard.  If it's an obvious or simple question, could you point me to any learning resources?  The .lss implements it with STS/LDS so it's not something I can learn through imitation, and I'm finding it rare that someone explains a coding difference between an attiny13 and an atmega328p.  The vector table was a prime example: it was pretty simple to find an example of defining a table, but quite a pain to find an example of how to manage the 2-word entries of the atmega328.


I learned by trial and error. When I started with AVR (about 20 years ago), it was a lot about comparing it to the 8051 structure (plus the Z80 and 6502) that I had used in the past. I have never read any "real" AVR books, and I guess that most of what is out there is outdated.

I have never used Arduino (I have mostly only used the stk200 and later the stk500), but for sure there is sample code in C(PP). Try to understand how that code works.

We need to know what your goal is.

The main thing about a project is planning; I also know that is hard during a learning process.

Because I have done this kind of thing for 40 years, the planning is hard to explain.

Often there is a new chip involved and I get that to work with the dev. board (often I have AC power control involved, so care must be taken, and I use SSR blocks at least to begin with).

I spend some time (more than others I know) on what worst-case timing we can live with, and on the approximate timing of all the functional blocks: when to start the ADC, RX, TX, etc.

Then an ISR plan. I have only used original AVRs (1-level ISR), so I very often only have a timer ISR that runs relatively fast (something like 10 kHz for C and more for ASM). That will then poll all the other "interrupts"; decoding of incoming data is often handled on the fly. This part of the code I will always check for worst-case time (even if I write it in C).

If you want fast and small code, have a good plan for the use of registers and the lifetime of the values in them (when are they holding a variable, and when are they temp. registers free for use). (Here I run into problems with xmegas and newer AVRs, where you can't load and store registers with pointers (like memcpy).)

If you do a lot of flash reads, so that Z is busy, consider reserving a low register pair where you can save Z (so that movw can be used instead of 2 pushes).

Last Edited: Sat. Jun 27, 2020 - 11:58 AM

sparrow2 wrote:
You don't need to init the stack on a tiny13
but according to
rob6118 wrote:

#Compiled with "avr-gcc -Wall -g -Os -mmcu=atmega328p -o blink.bin blink.c" version 10.1.0

he's building 328p code not t13 (in part this is why there is so much "wasted space" from the large IVT)


ok 

then :

you don't need to init the stack pointer on a mega328

 


sparrow2 wrote:

ok 

then :

you don't need to init the stack pointer on a mega328

 

lol...
theusch wrote:
We've all been round and round on this one -- is there any possibility that your app could run away, or have a stack overflow, or other method to get to 0?  Then the RAMEND is a precaution. 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Can anyone tell me if there is a good reason not to init your stack pointer? Or your global variables? Or your peripheral registers? You can accomplish all that in what, 199 bytes for a '328?

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#2 Hardware Problem? Read AVR042.

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."


Brian Fairchild wrote:
You can accomplish all that in what, 199 bytes for a '328?

;)  I thought we just did this counting not too long ago.

 

-- How do you get 199 bytes?  or any odd number.

-- 100 words initializes about 50 registers.  Or maybe it is just the "important" registers?

 

[edit] I guess that is "all":

theusch wrote:

Brian Fairchild wrote:

It costs me 22 bytes of flash which is, IMHO, memory well spent.

 

Brian Fairchild wrote:

Some people, myself included, initialise EVERY peripheral register even if the peripheral is not used.

 

Now, let's say your target AVR model is a Mega48.  How much does it take then?  Exercising the wizard to see...

[I'm partially on your side, but in a full 48/88 app it can get onerous.  BTW, how do you handle interrupt vectors?  One of my colleagues would make a handler for each, disabling the offending enable bit.]

74 words,

Answer to the above:  74 words, 148 bytes.  Acceptable in most situations.  206 bytes for a '328.

 

 

...from https://www.avrfreaks.net/commen...


Last Edited: Sat. Jun 27, 2020 - 03:53 PM
This reply has been marked as the solution. 

You can get the size to near assembly by using certain unholy tricks that you shouldn't use unless you really need to.

  • Replace the global variable with a dedicated register variable, which you will have to initialize in main(). This gets rid of __do_copy_data().
  • Pass the -nostartfiles option to the linker. This removes the vector table, the stack init and, well, everything that you don't put there explicitly.
  • Place main() in the ".init0" section, since it's small enough to fit in the vector table before the timer ISR jump.
  • Align main() so that whatever comes next will be placed at the timer ISR position.
  • Place the timer ISR in the ".init1" section, so that it comes after main().
  • Use a recent version of avr-gcc (at least 8) so that ISRs are optimized.

 

#include <avr/io.h>
#include <avr/interrupt.h>

#define F_CPU 		16000000UL 	// 16 MHz clock (the first post used 8 MHz)
#define CLKPRESCALE	256		// Slowest system clock prescale
#define PRESCALE	1		// Timer Prescale
#define PERIOD		8
#define T0_CLKS		((F_CPU/CLKPRESCALE*PERIOD)/1000)

unsigned register char count asm("r16");

ISR(TIMER0_OVF_vect, __attribute__((section(".init1")))){
  TCNT0 += (256 - T0_CLKS);		// Tick interrupt every 8ms
  count--;
}

int main() __attribute__((section(".init0")));

int main (void) {
  asm(".balign 0x40");

  count = 25;

  CLKPR = (1 << CLKPCE);			// Enable setting System Clock Prescaler
  CLKPR = (1 << CLKPS3);			// Set System Clock Prescaler to 256 -- 31.25KHz

  TIMSK0 = (1 << TOIE0);			// Enable overflow interrupt
  TCCR0A = 0;				// Normal mode
  TCCR0B = (1 << CS00);			// Set Timer Clock Prescaler to 1
  TCNT0 = (256 - T0_CLKS);		// Initial load of Timer0 counter

  DDRB |= (1 << DDB5);			// Enable PB5 for Builtin_LED

  while (1) {
    cli();
    if(count == 0){		// 200ms (25*8) intervals
      count = 25;
      PINB |= (1 << PB5);	// Toggle PB5
    }
    sei();
  }
  return 0;
}

 

result on avr-gcc v10 (86 bytes):

 

Disassembly of section .text:

00000000 <main>:
   0:	09 e1       	ldi	r16, 0x19	; 25
   2:	80 e8       	ldi	r24, 0x80	; 128
   4:	80 93 61 00 	sts	0x0061, r24	; 0x800061 <__TEXT_REGION_LENGTH__+0x7e0061>
   8:	88 e0       	ldi	r24, 0x08	; 8
   a:	80 93 61 00 	sts	0x0061, r24	; 0x800061 <__TEXT_REGION_LENGTH__+0x7e0061>
   e:	81 e0       	ldi	r24, 0x01	; 1
  10:	80 93 6e 00 	sts	0x006E, r24	; 0x80006e <__TEXT_REGION_LENGTH__+0x7e006e>
  14:	14 bc       	out	0x24, r1	; 36
  16:	85 bd       	out	0x25, r24	; 37
  18:	8c e0       	ldi	r24, 0x0C	; 12
  1a:	86 bd       	out	0x26, r24	; 38
  1c:	25 9a       	sbi	0x04, 5	; 4
  1e:	f8 94       	cli
  20:	01 11       	cpse	r16, r1
  22:	02 c0       	rjmp	.+4      	; 0x28 <__gcc_isr.n_pushed.001+0x26>
  24:	09 e1       	ldi	r16, 0x19	; 25
  26:	1d 9a       	sbi	0x03, 5	; 3
  28:	78 94       	sei
  2a:	f9 cf       	rjmp	.-14     	; 0x1e <__gcc_isr.n_pushed.001+0x1c>
	...

00000040 <__vector_16>:
  40:	8f 93       	push	r24
  42:	8f b7       	in	r24, 0x3f	; 63
  44:	8f 93       	push	r24
  46:	86 b5       	in	r24, 0x26	; 38
  48:	84 5f       	subi	r24, 0xF4	; 244
  4a:	86 bd       	out	0x26, r24	; 38
  4c:	01 50       	subi	r16, 0x01	; 1
  4e:	8f 91       	pop	r24
  50:	8f bf       	out	0x3f, r24	; 63
  52:	8f 91       	pop	r24
  54:	18 95       	reti

 

note: this is for mega328p only! For other chips main() may not fit in the vector table, and the timer interrupt vector may be at a different address.
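
As a sanity check on the constants in the listing above, the arithmetic can be reproduced on a host PC. This is only a sketch: the helper names `t0_clks` and `reload_value` are made up for illustration, and it assumes the reload expression is truncated to the 8-bit width of TCNT0. With F_CPU = 16000000 the macro yields T0_CLKS = 500, which does not fit in 8 bits; the truncated reload is 12, which matches the `ldi r24, 0x0C` in the disassembly. With the 8000000 from the first post it would yield 250 and a reload of 6.

```c
#include <stdint.h>

/* Same expression as the T0_CLKS macro: timer ticks per period_ms milliseconds. */
unsigned long t0_clks(unsigned long f_cpu, unsigned long clk_prescale,
                      unsigned long period_ms)
{
    return (f_cpu / clk_prescale * period_ms) / 1000UL;
}

/* Value that ends up in the 8-bit TCNT0 for "256 - T0_CLKS" (assumed truncation). */
uint8_t reload_value(unsigned long clks)
{
    return (uint8_t)(256L - (long)clks);
}
```

Calling `reload_value(t0_clks(16000000UL, 256, 8))` gives 12, so the timer counts 244 ticks per overflow rather than the intended number for an 8 ms period.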

Last Edited: Sat. Jun 27, 2020 - 04:49 PM

Then the RAMEND is a precaution

against what?

If you want a precaution, you should make a forever loop so the watchdog does the cleanup.


I don't see r1 being cleared anywhere, but it is used as zero!


sparrow2 wrote:
I don't see r1 being cleared anywhere, but it is used as zero!

 

You're right! R1 needs to be initialized to zero. Fortunately my post is full of disclaimers ;)

 

edit: we can put it right after the align

int main (void) {
  asm(".balign 0x40");
  asm("sub r1,r1");

 

Last Edited: Sat. Jun 27, 2020 - 10:12 PM

I started by recreating Lukasz' Blink with Timer OVF project, with just some slight adjustments and I was surprised that AVR-GCC compiled it to ~200+ bytes.  I rewrote the 'same' code in Assembly and compiled using AVRA and the code is just 31 bytes. 

... why the same code compiles 7 times larger ...

 I want to go back and look at that "31 bytes" more carefully...

First, as was pointed out early but not emphasized enough, that's 31 WORDS, or 62 bytes...

Next, there is this interesting output from AS7:
 

        "ATmega328P" memory use summary [bytes]:
        Segment   Begin    End      Code   Data   Used    Size   Use%
        ---------------------------------------------------------------
        [.cseg] 0x000000 0x00007c     62      0     62   32768   0.2%

 

It confirms that 62 bytes.  HOWEVER: notice that the begin/end addresses are NOT AT ALL consistent with that "62 bytes."   0x0007C end address means 122 bytes!
Apparently, when you do:

.org 0
  jmp start
.org 0x20
  jmp myISR

The assembler will count that as two instruction words, NOT including the words in between them.  We can debate whether that's "fair", but a C program will count all those bytes from unused vectors (including ALL the vectors.)

So the vectors are counted as four bytes in the ASM program, but take 104 bytes in the C program (for a m328.)

That means that the code, not including the vectors, is 58 bytes for the ASM version, and ~130 bytes for the C version: only about 2.2 times the size.  (My Atmel-latest has a 236-byte final size for the .C version.)
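
The accounting above can be written out as a few lines of arithmetic. This is a host-side sketch, not something from the thread: the function names are invented, and the 26 vectors of two words each are the ATmega328P table size implied by the 104 bytes quoted above.

```c
/* Bytes occupied by a full vector table: one 2-word (4-byte) jmp slot per vector. */
unsigned vector_table_bytes(unsigned vectors)
{
    return vectors * 4u;
}

/* Reported size with the vector bytes that the tool counted taken out. */
unsigned code_without_vectors(unsigned total_bytes, unsigned counted_vector_bytes)
{
    return total_bytes - counted_vector_bytes;
}
```

With the numbers in this post, `code_without_vectors(62, 4)` is 58 bytes for the ASM version and `code_without_vectors(236, vector_table_bytes(26))` is 132 bytes for the C version, roughly 2.3 times as much.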

 

That includes 40-odd bytes of C runtime initialization that is only paid once, some useless "call main" and "return" instructions, some overzealous context saving in the ISR (which is better in some newer C compilers, I hear), and the sort of growth that you'd expect from moving your COUNT variable from a register to a memory location, and your sreg save from a register to the stack...

 

The actual written code from the program has 82 bytes:

 

avr-nm -S --size-sort a.out
00000096 00000028 T __vector_16
000000be 0000002a T main

 

PS: don't use an ATtiny13.  The newer attiny402 has 4x the memory (both flash and ram), more and better peripherals, and is cheaper.

 

 

Last Edited: Sun. Jun 28, 2020 - 02:05 AM

rob6118 wrote:

PINB |= (1 << PB5);	// Toggle PB5

I recommend against.

What happens can depend on the optimization level.

I recommend either PINB=(1<<5) or inline assembly.

Whether either statement ought to do what you want is a matter of some debate.
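
A host-side model may make the difference clearer. It assumes the toggle-on-write behaviour of the PINx registers (writing a 1 to a PINx bit flips the matching PORTx bit) and, as a simplification, that every pin is an output so that reading PINB returns the PORTB value. The struct and function names are invented for this sketch. If `PINB |= mask` is compiled as a separate read, OR and write, it writes back every bit that currently reads 1, so those pins toggle too; a plain `PINB = mask` (or a single `sbi`) touches only the requested bit.

```c
#include <stdint.h>

/* Toy model of one I/O port; this is not real hardware access. */
typedef struct { uint8_t port; } port_model;

/* Reading PINx: with all pins as outputs, the pin state follows PORTx. */
uint8_t pin_read(const port_model *p) { return p->port; }

/* Writing PINx: every 1 bit written toggles the matching PORTx bit. */
void pin_write(port_model *p, uint8_t value) { p->port ^= value; }

/* PINB |= mask done as separate read, OR and write (e.g. in/ori/out). */
void toggle_read_modify_write(port_model *p, uint8_t mask)
{
    pin_write(p, (uint8_t)(pin_read(p) | mask));
}

/* PINB = mask: only the requested bits are written as 1. */
void toggle_direct(port_model *p, uint8_t mask) { pin_write(p, mask); }
```

Starting from a port value of 0x21 with mask 0x20, the read-modify-write version leaves 0x00 (bit 0 was flipped as well), while the direct write leaves 0x01.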

 

BTW if you are looking for small code, you could get rid of interrupts.

Use CTC mode and poll the compare-match flag (in CTC mode the overflow flag is normally not set).

 

Also, I recommend against PB5.

All the Pxd things (PB5, PORTB5, DDB5, PINB5) are just d.

The Px's are just harder to read.

 

Iluvatar is the better part of Valar.


And another weekend gone getting AVR_SIM to work on Linux, and FINALLY getting back to functional code that works.  I read every bit of code above and thank you all for the great feedback.  ATTiny13 is still cheaper on Aliexpress, and I basically just bought a mixed cluster of bottom bucket specials: ATTiny13, ATmega2313, STC89rc52.  I'll keep an eye out for the attiny402 because indeed it's pretty much what I'm looking for at only .66 cents before shipping!

 

@sparrow2

 

You asked what my goal is:  Learning for the sake of learning, so I can be as lazy or as pedantic as the moment dictates.  I have a 5 yr old little boy and I bought him a robot on a whim (he loves to tinker and since we moved to Singapore there is no garage, garden, or other mechanical things I can occupy him with).  That combined with a couple raspberry pis I had laying around got us started with tinkering with hardware.  Little boys aren't too good with understanding the basics of SD corruption, so one of the first things I noticed was no power monitoring for the Pi Tablet or the Pi based robot I built for him.  The Pi has no ADC so that got me started tinkering with SPI bit banging on a BoB ADC I had laying around in an electronics kit.  Then I picked up one of these from aliexpress which are pretty nifty for power supply, but the BMS chip is pretty chinese and I couldn't figure a way to capture an interrupt signal before the power gets cut.  So now I'm looking to either embed an ADC or ADC on a chip (ATtiny13) to piggy back onto the battery voltage and speak to the Pi via SPI.  I might go down the path of seeing how well can the ATtiny work as a BMS (alternate samples of main vs battery voltage and selecting power routing to the Pi).  

 

My little boy has already let out the magic blue smoke on another project (silly dad soldered a DC barrel jack knowing full well the number of laptop power supplies we have lying around) so an abundance of cheap parts and patience is a necessity.  Aliexpress is a blessing, and we've made the most out of 4 months of quarantine :).

 

Anyways thanks to all for taking the time to reply.  Back to hitting the books! ;)

 

Rob

 

 

 

 


@sparrow2

 

Thanks for pointing me in the right direction and this helped get me across the finish line.  I'd welcome any further critiques.........

 

.NOLIST                                                 ; Output of listing off
.DEVICE "ATmega328P"                                    ;
.LIST                                                   ; Output of listing on

.def rmp1 = R16                                         ; Multi Purpose Register1

.equ T0_CLKS = 98                                       ;
.equ OFFSET = 0x60                                      ; Store Indirect Offset

.CSEG
.ORG 0

        rjmp Reset                                      ; Reset

.ORG 0x001C

        rjmp TIM0_OCA                                   ; Jump to ISR

.ORG 0x0034                                             ; Is it bad to overwrite the other vector addresses even when they aren't needed?

TIM0_OCA:                                               
        reti                                            ; End ISR and re-enable interrupt

Reset:

        ldi rmp1, LOW(RAMEND)                           ; Initializing the 328 requires a 2 byte pointer
        out SPL, rmp1                                   ;
        ldi rmp1, HIGH(RAMEND)                          ;
        out SPH, rmp1                                   ;

        sbi DDRD, DDD6                                  ; Simulators make everything easier (but not easy)

        ldi YH, 0x00                                    ; Too bad you are limited to 63 bytes advance
        ldi YL, OFFSET                                  ; Can't replace OUT with STD?'

        ldi rmp1, 1 << CLKPCE                           ; Enable System Clock Prescaler
        std Y+(CLKPR-OFFSET), rmp1                      ;

        ldi rmp1, 1 << CLKPS3                           ; Set System Clock Prescaler to 256 == 31.25KHz
        std Y+(CLKPR-OFFSET), rmp1                      ;

        ldi rmp1, 1 << CS00 | 1 << CS01                 ; Set Timer Clock to Prescale 64
        out TCCR0B, rmp1                                ; Off by .7ms from 200ms target

        ldi rmp1, 1 << WGM01 | 1 << COM0A0              ; CTC Mode and enable OC0A Toggle
        out TCCR0A, rmp1                                ;
        
        ldi rmp1, 1 << OCIE0A                           ;
        std Y+(TIMSK0-OFFSET), rmp1                     ; Enable Compare A Interrupt

        ldi rmp1, T0_CLKS                               ;
        out OCR0A, rmp1                                 ; Set Timer

        sei

Loop:
        rjmp Loop                                       ; Loop
        
Program             :       24 words.
Constants           :        0 words.
Total program memory:       24 words.
Eeprom space        :        0 bytes.
Data segment        :        0 bytes.
Compilation completed, no errors.
Compilation endet 29.06.2020, 23:42:40

 

        ldi YH, 0x00                                    ; Too bad you are limited to 63 bytes advance
        ldi YL, OFFSET                                  ; Can't replace OUT with STD?'

        ldi rmp1, 1 << CLKPCE                           ; Enable System Clock Prescaler
        std Y+(CLKPR-OFFSET), rmp1                      ;

        ldi rmp1, 1 << CLKPS3                           ; Set System Clock Prescaler to 256 == 31.25KHz
        std Y+(CLKPR-OFFSET), rmp1                      ;

Maybe it's just me but why do you add an offset to then subtract it? Wouldn't this be the same as:

        ldi YH, 0x00                                    ; Too bad you are limited to 63 bytes advance
        ldi YL, 0x00                                  ; Can't replace OUT with STD?'

        ldi rmp1, 1 << CLKPCE                           ; Enable System Clock Prescaler
        std Y+ CLKPR, rmp1                      ;

        ldi rmp1, 1 << CLKPS3                           ; Set System Clock Prescaler to 256 == 31.25KHz
        std Y + CLKPR, rmp1                      ;

but then if you were going to do that why not simply:

        ldi YH, 0x00                                    ; Too bad you are limited to 63 bytes advance
        ldi YL, CLKPR                                  ; Can't replace OUT with STD?'

        ldi rmp1, 1 << CLKPCE                           ; Enable System Clock Prescaler
        st Y, rmp1                      ;

        ldi rmp1, 1 << CLKPS3                           ; Set System Clock Prescaler to 256 == 31.25KHz
        st Y, rmp1                      ;

what am I missing here?

 

(it's recently been proven I need more caffeine so perhaps it's something obvious?!?). 


clawson wrote:

what am I missing here?

 

(it's recently been proven I need more caffeine so perhaps it's something obvious?!?). 

The website I linked says that Y+q is limited to advancing 63 bytes from the reference pointer. I tried YL=0x00, but that would need a displacement of roughly 90-110 for those registers, so no go. I also tried a negative q to reach DDRD (to replace the OUT instructions), but a negative displacement is out of range as well.
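
The 0 to 63 limit can be checked with a little host-side arithmetic. This is a sketch with an invented helper name; the data-space addresses used are CLKPR = 0x61 and TIMSK0 = 0x6E (as seen in the `sts` instructions of the avr-gcc disassembly earlier in the thread) and DDRD = 0x2A, which assumes I/O address 0x0A plus the 0x20 data-space offset on the ATmega328P.

```c
#include <stdbool.h>
#include <stdint.h>

/* std Y+q / ldd Y+q encode q in 6 bits, so only displacements 0..63 are reachable. */
bool reachable_with_displacement(uint16_t base, uint16_t target)
{
    return target >= base && (uint16_t)(target - base) <= 63;
}
```

With Y = 0x60 both CLKPR (q = 1) and TIMSK0 (q = 14) are in range; with Y = 0x00 neither is, and DDRD at 0x2A lies below a base of 0x60, so it would need a negative displacement.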


Don't try RTFM.


sparrow2 wrote:

Don't try RTFM.


Yeah um 1 line on Page 18 out of 660 pages. Gee I wonder why I tried trial and error with a debugger first?


rob6118 wrote:
sparrow2 wrote:

Don't try RTFM.

Yeah um 1 line on Page 18 out of 660 pages. Gee I wonder why I tried trial and error with a debugger first?

Start with the table of contents or the index.

 

As noted earlier, you can make the code even smaller by eliminating the ISR.

Your variable can be local and non-volatile.



Sorry then

 

The AVR instruction manual

 

http://ww1.microchip.com/downloa...