I'm writing a stripped-down AVR emulator so I can simulate, in real time, an ATmega1280/ATmega2560-based game I'm developing (the emulator I currently use, simavr, is fairly accurate but slower than I'd like). While implementing the timers, though, I noticed a pretty consistent one-cycle discrepancy between my emulator and the real hardware. The problem, I think, comes down to the cycle on which sts and lds "effectively" perform their access.
Both are two-cycle instructions. In my emulator I do all the work on the first cycle and nothing on the second, but the hardware seems to use both cycles, effectively performing the data-space access on the second one. My tests appear to confirm this. Usually the difference is unobservable, but it shows up when the access touches certain timer registers (TCCRnB or TCNTn, and probably others). Here's one of the tests:
    clr  r1
    ldi  r16, (1 << CS10)
    ldi  r17, (1 << TSM) | (1 << PSRSYNC)
    sts  TCCR1A, r1       ; normal mode
    sts  TCCR1B, r16      ; 1/1 prescaling
                          ; TCNT1 = -/0
                          ; TCNT1 = 0/1
    nop                   ; TCNT1 = 1/2
    nop                   ; TCNT1 = 2/3
    out  GTCCR, r17       ; TCNT1 = 3/4
    lds  r20, TCNT1L
    write_byte framebuffer, r20   ; get 3
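To sanity-check the numbers in those comments, here's a tiny C model of that sequence. It's purely illustrative, not real emulator code, and it rests on two assumptions: TCNT1 ticks once per CPU clock while CS10 is set and the prescaler isn't held in reset, and a data-space write commits at the end of its cycle, after that cycle's count. The only difference between the two runs is which of the two sts cycles performs the store; they come out to 4 and 3, which is exactly the one-cycle gap I see between my emulator and the chip.

```c
/* Rough cycle-by-cycle model of the test above (not real emulator code). */
#include <stdio.h>
#include <stdbool.h>

enum event { NONE, WRITE_TCCR1B, WRITE_GTCCR, READ_TCNT1L };

static int run(const enum event *cycle, int n)
{
    bool running = false;   /* set by the TCCR1B write (CS10)        */
    bool frozen  = false;   /* set by the GTCCR write (TSM|PSRSYNC)  */
    int  tcnt1   = 0;
    int  result  = -1;

    for (int i = 0; i < n; i++) {
        if (running && !frozen)
            tcnt1++;                     /* timer ticks during this cycle */
        switch (cycle[i]) {              /* writes/reads commit afterwards */
        case WRITE_TCCR1B: running = true;  break;
        case WRITE_GTCCR:  frozen  = true;  break;
        case READ_TCNT1L:  result  = tcnt1; break;
        case NONE:                          break;
        }
    }
    return result;
}

int main(void)
{
    /* sts TCCR1B (2 cycles), nop, nop, out GTCCR, lds TCNT1L (2 cycles).
     * "first"  = sts/lds act on their first cycle (what my emulator does),
     * "second" = sts/lds act on their second cycle.                        */
    enum event first[7]  = { WRITE_TCCR1B, NONE, NONE, NONE,
                             WRITE_GTCCR, READ_TCNT1L, NONE };
    enum event second[7] = { NONE, WRITE_TCCR1B, NONE, NONE,
                             WRITE_GTCCR, NONE, READ_TCNT1L };

    printf("access on first cycle : lds gets %d\n", run(first, 7));   /* 4 */
    printf("access on second cycle: lds gets %d\n", run(second, 7));  /* 3 */
    return 0;
}
```

The count-then-commit ordering is just a modeling choice; the point is only that moving the store by one cycle moves the value latched when the prescaler freezes, and hence the final read, by exactly one.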
That seems reasonable, but is this behavior actually specified anywhere? I'd expect it to be, since it can make a noticeable difference, but I haven't found anything in the datasheet or in the various app notes.