I have an external memory-like device connected to an ATMega128 (crystal = 14.7456MHz), with the 8-bit data bus connected to one port and the /RD signal "manually" bit-banged on a pin of another port.
As usual, I am trying to squeeze the last cycle out of the 'M128. The device's spec says that /RD falling edge to data-out valid is max 150ns. Now, considering the AVR input synchronizer, which wastes half a cycle before the actual IN, this means that after setting /RD low I have to insert 4 cycles (NOPs) to get at least a 60-70ns safety margin for the unknown delays on my board and on the way from the pin to the latches on the chip. (Btw., my measurements indicate that the device's spec is very conservative, with data-out times as low as 50ns, so I am now quite happy with 2 NOPs, but I want to stay on the safe side.)
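The cycle count above can be checked with a few lines of host-side C. This is only a sketch of the arithmetic, under the simplified assumption taken from the paragraph above that the pin is sampled half a cycle before the IN executes; the function names `cycle_ns` and `min_delay_cycles` are made up for this illustration and do not come from any AVR library.

```c
/* Hypothetical helpers for the timing budget (names are illustrative only). */

/* Length of one CPU cycle in nanoseconds. */
double cycle_ns(double f_cpu_hz)
{
    return 1e9 / f_cpu_hz;
}

/*
 * Smallest number of delay cycles between the instruction that drives /RD low
 * and the IN, assuming (simplified model) the pin is sampled half a cycle
 * before the IN executes:
 *     (n - 0.5) * t_cycle >= t_access + margin
 */
unsigned min_delay_cycles(double f_cpu_hz, double t_access_ns, double margin_ns)
{
    double needed = (t_access_ns + margin_ns) / cycle_ns(f_cpu_hz) + 0.5;
    unsigned n = (unsigned)needed;   /* truncate toward zero ...            */
    if ((double)n < needed) {
        n++;                         /* ... then round up to a whole cycle  */
    }
    return n;
}
```

At 14.7456MHz one cycle is about 67.8ns. With these assumptions, `min_delay_cycles(14745600.0, 150.0, 60.0)` comes out as 4 (the spec value plus margin), and `min_delay_cycles(14745600.0, 50.0, 0.0)` comes out as 2 (the measured 50ns with no margin).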
I am somewhat reluctant to burn cycles idling, so I came up with the following construct:
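    cbi RD_PORT, RD_PIN_NR   ; /RD low
    nop
    nop
    sbi RD_PORT, RD_PIN_NR   ; /RD high again; the 2-cycle SBI takes the place of the other two NOPs
    in  r16, DATA_PIN        ; destination register was omitted in the original; r16 is an arbitrary choice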
PS. ... of course, with interrupts disabled momentarily ...