IO register address inconsistencies


I had thought that different AVRs used the same address for the same register. For example SREG always seems to be at 0x3f. I find it convenient since I can build a binary for one part that will work on another part without rebuilding.

While testing picoboot, after testing a build on an ATtiny84, I decided to test it on an ATtiny88. I was able to flash it to the chip without problems, but it didn't run. I checked my connections, and everything was OK. So I did a fresh build for the ATtiny88, flashed it, and it worked. So something is different.

Looking at the datasheets, for ATtinyx5, x4, & x313, PORTB = 0x18. On the ATtinyx8, it's at 0x05.
After a little more digging through datasheets, I noticed that's the PORTB address for most of the ATmega parts, such as the 88, 168, and 328. So the ATtiny88 seems more like a stripped-down ATmega88 than an original ATtiny part.

I have no special talents.  I am only passionately curious. - Albert Einstein

 


Quote:
[T]he ATtiny88 seems more like a stripped-down ATmega88 than an original ATtiny part.

And, indeed, they are -- pin compatible with stripped-down peripherals.

There are other places where the registers have moved (in different families), so a rebuild is really the safest approach.

Martin Jay McKee

As with most things in engineering, the answer is an unabashed, "It depends."


Quote:
I had thought that different AVRs used the same address for the same register.

Nope. For a big shock, check out the UART on mega8 vs mega88. They're not even the same peripheral!
(It looks like in the early days, they really tried to cram the peripherals down in the low memory where CBI/etc and IN/OUT worked. When the chips with bigger memory came along, they didn't bother any more. It annoys me on some level that nearly half of the desirable low addresses aren't even USED on ATmegaXX8!)


Just use egrep on your system header files. This can tell you which AVRs have different addresses. Typing a single command is a lot easier than trawling through data sheets.

Since a bootloader is a one-off upload, I really would not worry about different builds. Most bootloaders identify themselves with a hard-coded signature.

David.


Once you have been appalled by the tiny/mega and registers/bits "all over the place" then take a look at Xmega. As you will see, Atmel have seen the error of their ways and designed a symmetric layout for all the Xmega devices. All that changes (on the whole, though there are a few exceptions) is that the simpler devices, with fewer peripherals, don't have all the register blocks in place. But those that do exist are the same. This makes it easy to develop on an A1 (say) and then port down to a D4 for production (or whatever).

Of course this was a trade-off on Atmel's part. They could do this because almost all the registers are only LDS/STS addressable and don't fall in range of IN/OUT or SBI/CBI.

Clearly the reason the tiny/mega are "all over the place" was that many devices tried to make sure that their regularly accessed peripherals had their registers in range of IN/OUT or even SBI/CBI. Especially all the early/small ones that packed everything into 64 locations (even doing stuff like URSEL double-ups to make everything fit).

This is why almost all "library" code for AVR (tiny/mega) is not supplied as a precompiled linkable binary but as just a handful of C source files (.c/.h), as it's expected that it will have to be recompiled once a target is chosen.

Thinks: if Atmel had even made 7 bits available in IN/OUT decode the situation could all have been quite different. (Bet they are kicking themselves for not having made that decision earlier on.) Oh, and the SBI/CBI could have done with at least one more bit too.


Quote:
Thinks: if Atmel had even made 7 bits available in IN/OUT decode the situation could all have been quite different.
Except of course that that would have taken up 2048 more opcodes (which means that they would have to drop something else since there are only about 1300 left).
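
For anyone wanting to check that figure, here is a back-of-the-envelope sketch in C. It assumes each of IN and OUT carries a 5-bit register field and a 6-bit I/O address field inside a 16-bit opcode; the function names are made up for illustration, and it says nothing about how many opcodes are actually still free.

```c
#include <stdint.h>

/* Number of distinct 16-bit opcodes one instruction occupies when it
   carries a register field and an address field of the given widths. */
uint32_t encodings(unsigned reg_bits, unsigned addr_bits)
{
    return (uint32_t)1 << (reg_bits + addr_bits);
}

/* Extra opcodes needed if the address field grows from old_bits to
   new_bits while the register field stays the same. */
uint32_t extra_encodings(unsigned reg_bits, unsigned old_bits, unsigned new_bits)
{
    return encodings(reg_bits, new_bits) - encodings(reg_bits, old_bits);
}
```

Under those assumptions, extra_encodings(5, 6, 7) comes out at 2048 for IN alone, and OUT would need the same amount again.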

Regards,
Steve A.

The Board helps those that help themselves.


david.prentice wrote:

Since a bootloader is a one-off upload, I really would not worry about different builds. Most bootloaders identify themselves with a hard-coded signature.

David.


Once I've completed and debugged the bootloader, I agree making different builds is not a big deal. While I'm testing and rebuilding, it would be a lot more convenient if I only had to build once for the four different chip families I'm supporting. At least I'll be able to use the same build on the t85 and t84. Now that I think of it, I might be able to use the same build for the t2313, t85, and t84. If I build for a target with 2K flash, it should work fine even with an 8K part... and only for a final release version would I need different builds for the 8K vs 2K parts.


westfw wrote:
Quote:
I had thought that different AVRs used the same address for the same register.

Nope. For a big shock, check out the UART on mega8 vs mega88. They're not even the same peripheral!
(It looks like in the early days, they really tried to cram the peripherals down in the low memory where CBI/etc and IN/OUT worked. When the chips with bigger memory came along, they didn't bother any more. It annoys me on some level that nearly half of the desirable low addresses aren't even USED on ATmegaXX8!)

They seem to have done a better job with that on the ATtiny series. The ATtinyx5, which is a relatively new (<10 yr) part, has 2/3rds of the address space below 0x20 used, and has no IO space above 0x3f (where in/out stop working). Performance-wise, that means the more expensive mega parts tend to be slower and need more code to do the same thing than the tiny parts: lds is 4 bytes and 2 cycles vs. 2 bytes and 1 cycle for in.
Now if the mega parts had the 16-bit lds instruction for 1-cycle access to 0x40..0xbf this would be less of an issue.


Koshchi wrote:
Quote:
Thinks: if Atmel had even made 7 bits available in IN/OUT decode the situation could all have been quite different.
Except of course that that would have taken up 2048 more opcodes (which means that they would have to drop something else since there are only about 1300 left).

???
They already have an instruction on some devices that acts like a 7-bit in/out: opcode 10100. See the 16-bit lds I mentioned. But for some reason it's not implemented on many AVRs. So far the tiny9/10 are the only ones I've noticed that have it (there must be at least a few others though).


Quote:
So far the tiny9/10 are the only ones I've noticed that have it
But those only have 16 GP registers, so the opcodes have an extra bit. They just repurposed the unused bit for the address.

Regards,
Steve A.


Quote:

So far the tiny9/10 are the only ones I've noticed that have it (there must be at least a few others though).

lol -- commonly referred to here as the "brain dead" AVR models. ;) (Tiny 4/5/9/10; Tiny20; Tiny40)

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Koshchi wrote:
Quote:
So far the tiny9/10 are the only ones I've noticed that have it
But those only have 16 GP registers, so the opcodes have an extra bit. They just repurposed the unused bit for the address.

I still don't see the problem.
Sure, in/out work with 32 registers, but ldi only works with the upper 16, so a 16-bit lds/sts on the upper 16 registers would still be a big improvement.
If you're saying there's another opcode that already uses the 10100 & 10101 prefixes I couldn't find it.
The 32-bit lds/sts opcodes start with 1001000 and 1001001.


Quote:
the more expensive mega parts tend to be slower and need more code to do the same thing than the tiny parts. lds is 4 bytes and 2 cycles vs. 2 bytes and 1 cycle for in.

When I was trying to shrink optiboot down to 256 bytes, I figured out that I could get another 64 bytes' worth of cheap IO register access by sticking a base address in Z and using the "LDD r, Z+q" instructions (which is how I came to notice that the UART addresses are so different). It wasn't enough, though. :-(


Quote:
If you're saying there's another opcode that already uses the 10100 & 10101 prefixes I couldn't find it.
LDD Rd, Y+q, LDD Rd, Z+q, STD Y+q, Rr and STD Z+q, Rr.

Regards,
Steve A.


westfw wrote:
Quote:
the more expensive mega parts tend to be slower and need more code to do the same thing than the tiny parts. lds is 4 bytes and 2 cycles vs. 2 bytes and 1 cycle for in.

When I was trying to shrink optiboot down to 256 bytes, I figured out that I could get another 64 bytes' worth of cheap IO register access by sticking a base address in Z and using the "LDD r, Z+q" instructions (which is how I came to notice that the UART addresses are so different). It wasn't enough, though. :-(

I think it's still doable. I started an stk500 (arduino) compatible version of picoboot a while back. It's <100 bytes, and I'd say it's close to half way complete.
http://code.google.com/p/picoboo...


Koshchi wrote:
Quote:
If you're saying there's another opcode that already uses the 10100 & 10101 prefixes I couldn't find it.
LDD Rd, Y+q, LDD Rd, Z+q, STD Y+q, Rr and STD Z+q, Rr.

Ahh, so much for that idea.
You know your opcodes well (or at least are better at searching through them).


If you are using my list of synonymous opcodes then note that it is not exactly in binary order as:

10o0 oo0d dddd booo    ldd r,b        1

occurs early on. So the first 4 bits aren't exactly 1010, and ordered in the list in that position, but the pattern means it can be 1010 (or 1000).
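
A small host-side sketch of that overlap, for anyone who wants to try opcodes by hand. It assumes the usual LDD/STD layout of 10q0 qqsd dddd bqqq (s = 0 for load, 1 for store; b = 0 for Z, 1 for Y); the function names are invented here and are not from any toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* LDD/STD with displacement: 10q0 qqsd dddd bqqq.
   Only bits 15, 14 and 12 are fixed, so mask those three. */
bool is_ldd_std(uint16_t opcode)
{
    return (opcode & 0xD000u) == 0x8000u;
}

/* Gather the scattered 6-bit displacement q (bit 13, bits 11..10, bits 2..0). */
uint8_t ldd_std_displacement(uint16_t opcode)
{
    return (uint8_t)(((opcode >> 8) & 0x20u) |
                     ((opcode >> 7) & 0x18u) |
                     (opcode & 0x07u));
}

/* Top five bits of an opcode, e.g. 0x14 for the 10100 prefix. */
uint8_t top5(uint16_t opcode)
{
    return (uint8_t)(opcode >> 11);
}
```

For example, 0xA100 (ldd r16, Z+32) starts with 10100 and still matches the LDD/STD pattern, because the third bit is just the top bit of q.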


The crappy register layout on some devices is making it a real pain to use anything more than the PORT registers (which always seem to be below 0x20) in assembler.
Something like this in C:
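while(!(SPSR & (1<<SPIF)));

could be:
wait:
sbis SPSR, SPIF ; needs SPSR at an I/O address below 0x20
rjmp wait

or:
wait:
in r16, SPSR ; needs SPSR at an I/O address below 0x40
andi r16, (1<<SPIF)
breq wait

or even:
wait:
lds r16, SPSR ; SPSR given as a data-space address
andi r16, (1<<SPIF)
breq wait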

Maybe one of these days I'll write a high-level assembly compiler. I'd use C-style flow control like if/while, but no types - variables would be defined by size only. i.e.
r8 bitcount; // 8-bit register
r16 delay; // 16-bit register


Quote:

I'll write a high-level assembly compiler.

Isn't that simply called "C" already? I think you'll find someone (several in fact) already beat you to it.

My C compiler of choice (avr-gcc) already does a pretty good job of picking the most appropriate instructions for the build target and its memory map.

As I said above:

Quote:
This is why almost all "library" code for AVR (tiny/mega) is not supplied as a precompiled linkable binary but as just a handful of C source files (.c/.h), as it's expected that it will have to be recompiled once a target is chosen.

There's little you can do to change that situation.


clawson wrote:
Quote:

I'll write a high-level assembly compiler.

Isn't that simply called "C" already?

C doesn't give you access to the status flags like carry or to the T bit. C types, conversions, and casts add complexity and are a source of invisible errors.

I want to be able to use the assembler instruction set without resorting to gcc's hokey inline assembler syntax, but I don't want to have to keep track of registers and which instructions work with which registers. For example I'd like to be able to write:
r8 data; // 8-bit register
...
andi data, 0x55;

And have the compiler assign data to a register in the 16-31 range. Or if I only use in/out and ldi, then data can be assigned to any register.


Quote:

C doesn't give you access to the status flags like carry or to the t bit.

Really?:

SREG |= (1 << SREG_T);

if (SREG & (1 << SREG_T)) {
//==>     SREG |= (1 << SREG_T) ;
	in r24,__SREG__
	ori r24,lo8(64)
	out __SREG__,r24
//==>     if (SREG & (1 << SREG_T)) {
	in __tmp_reg__,__SREG__
	sbrs __tmp_reg__,6
	rjmp .L2

(But I take your point that C may be using T for its own purposes and may have "stomped" on whatever you thought you were going to use it for.)

SREG_C, SREG_Z etc. also available.

Last Edited: Tue. Feb 18, 2014 - 03:36 PM

And why are you so concerned with this? Is everything that you do so timing critical that every bit of code absolutely must have the most efficient opcodes possible? In my experience people like you spend so much time fretting about optimization that they get very little actual work done, and the optimization they do has absolutely no benefit.

Regards,
Steve A.


Quote:

r8 data; // 8-bit register
...
andi data, 0x55;

try something like this:

register unsigned char temp asm("r16")=0;

Not always optimal, but perhaps better.


clawson wrote:
Quote:

C doesn't give you access to the status flags like carry or to the t bit.

Really?:

SREG |= (1 << SREG_T);

if (SREG & (1 << SREG_T)) {

Thanks. I didn't know about the SREG_x definitions. It does get closer to doing some of the things in C that I thought I could only do in assembler.

How about shift with and without carry in C?


sparrow2 wrote:
Quote:

r8 data; // 8-bit register
...
andi data, 0x55;

try something like this:

register unsigned char temp asm("r16")=0;

Not always optimal, but perhaps better.


Neat. Is there something similar for 16-bit? like:

register uint16_t temp_16 asm("r25:r24")=0;


Yes
register unsigned int temp asm("r16"); //r17:r16

As I remember, it needs to start on an even register number.


yes

register unsigned int temp asm("r17"); //r18:r17 

Doesn't work.
Remember that the compiler doesn't like you taking registers away from it!
Which registers you can and cannot use is in the manual.
(There is a way to tell the compiler that it should stay away from a register; just declaring a variable this way doesn't do it, and many libs just use registers without asking.)
But if you start to code this way, I guess it's better to write some of the code in ASM.


Quote:

How about shift with and without carry in C?

Well, shifts without carry are easy: that's what the >> and << operators do.

To implement carry you have to mess about with reading the bit about to drop off, shifting then possibly or'ing back in your stored "carry" at either end of the byte - depending on which way you are shifting.
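
As a minimal sketch of that description in portable C: the carry is just an ordinary variable passed by pointer, and the function names are made up for this example. Whether avr-gcc turns this into a single rol/ror is not something this sketch claims.

```c
#include <stdint.h>

/* Rotate left through an explicit carry, like the AVR ROL instruction:
   the old carry enters bit 0 and the old bit 7 becomes the new carry. */
uint8_t rol_through_carry(uint8_t value, uint8_t *carry)
{
    uint8_t out = (uint8_t)(value >> 7);   /* the bit about to drop off */
    uint8_t result = (uint8_t)((value << 1) | (*carry & 1u));
    *carry = out;
    return result;
}

/* Rotate right through carry, like ROR: the old carry enters bit 7
   and the old bit 0 becomes the new carry. */
uint8_t ror_through_carry(uint8_t value, uint8_t *carry)
{
    uint8_t out = value & 1u;              /* the bit about to drop off */
    uint8_t result = (uint8_t)((value >> 1) | ((*carry & 1u) << 7));
    *carry = out;
    return result;
}
```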

But if you are trying to mimic rotates in C you probably aren't looking at the overall problem/solution correctly - you are just trying to apply asm-like thinking to a C problem.

Can you give an example of something in C where you absolutely must have rotates?


Remember that << or >> on an int or a long is done correctly in C (using Carry).
But I guess that you want it the other way, where Carry doesn't change.


Quote:

But I guess that you want it the other way, where Carry doesn't change.

If you are writing in C alone why does it matter? The C bit (and other SREG flags) is a hidden resource the compiler can choose to utilize in its own way.


clawson wrote:
Quote:

Can you give an example of something in C where you absolutely must have rotates?

"absolutely must" is a high threshold. Having written a few high-speed bitbang UARTs and worked with bitbang SPI (in C), I am curious how close to hand-coded asm performance it is possible to get with writing a software shift register in C.
It's not just an academic question either; I've seen projects like large LED matrices driven by 74HC595 shift registers where the speed at which the data is shifted affects the refresh rate of the display.


Even a Tiny4 has a hardware SPI. Modern Megas have UART_MSPI as well as their regular SPI.

If you want to achieve the fastest bit-bang SPI, you write the appropriate ASM.

But as a general rule, if you make a mistake with your pin allocation, you should get out your soldering iron.

Oh, software UARTs are plenty fast enough in C. Again, you use the hardware Timers in an appropriate way.

Yes, there are occasions when 10-20 lines of ASM can make a dramatic difference to the performance of a C program. It is not too difficult to write or maintain this quantity of ASM.

YMMV.

David.

p.s. the most important lesson for a 74HC595 is that it is an SPI device.


Quote:

I am curious how close to hand-coded asm performance it is possible to get with writing a software shift register in C.

While I think some of your queries are, at best, theoretical (the AVR8 is what it is and it ain't that bad compared with other architectures), I do enjoy the challenges of "you can't do this in C" just as a fun thing.

There is a current thread on "ternary" where one of the examples is conditionally setting/clearing an AVR pin based on a data bit, as you would do when bit-banging to your '595. As you [and everyone?] know, it is somewhat painful with the AVR8 instruction set, as some kind of conditional logic is needed.

If it is a bank of '595s, as you hinted, I'd probably do some unrolling. For one or two '595s I do it in a loop. So ...

1) Post your ideal ASM sequence for loading a '595. Let the games then begin.
2) There have been extensive threads on this in the past. One trick is to pre-process the data byte creating an XOR mask. The resulting code is then cycle-constant for each bit. I'd have to dig to find the thread--GCC forum IIRC?
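
One plausible reading of the XOR-mask idea in item 2, as a host-side sketch only. It is not taken from the linked threads; it assumes MSB-first output and a part where the data pin can be toggled (for example by writing to PINx on AVRs that support that). The function names are invented for illustration.

```c
#include <stdint.h>

/* Bit i of the result is 1 when the data pin must be toggled before
   sending bit i (MSB first). pin_level is the pin's state beforehand. */
uint8_t toggle_mask(uint8_t data, uint8_t pin_level)
{
    /* For each bit position, the level the pin had for the previous bit. */
    uint8_t previous = (uint8_t)((data >> 1) | ((pin_level & 1u) << 7));
    return (uint8_t)(data ^ previous);
}

/* Host-side check: replay the toggles and rebuild the byte that would
   appear on the pin, MSB first. */
uint8_t replay_toggles(uint8_t mask, uint8_t pin_level)
{
    uint8_t level = pin_level & 1u;
    uint8_t seen = 0;
    for (int bit = 7; bit >= 0; bit--) {
        if (mask & (1u << bit))
            level ^= 1u;    /* on the target this would be the pin toggle */
        seen = (uint8_t)((seen << 1) | level);
    }
    return seen;
}
```

The point of the pre-processing is that the per-bit work no longer depends on the data value itself, only on whether a toggle is due.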


I had to dig to find these. ;) A discussion on just this:
https://www.avrfreaks.net/index.p...
with the links to prior discussions, including skeeve's XOR method
https://www.avrfreaks.net/index.p...


Quote:

p.s. the most important lesson for a 74HC595 is that it is an SPI device.

???
No it's not (but no problem driving it with SPI)
Or do you make a point I'm missing?


No, I think that you have put it more accurately.

My main point was: use the AVR Hardware peripherals.
You can achieve F_CPU/2 with SPI, USI, USART.

From memory, the best ASM bit-bash SPI is F_CPU/5 and you lose cycles during housekeeping.

unsigned char softspi(unsigned char val)
{
#asm
    .equ PORTMOSI = 0x5
    .equ MOSI     = 1
    .equ PORTSCK = 0xB
    .equ SCK     = 2
    .equ PINSCK  = PORTSCK-2
    .macro __SPIBIT
    BST r26,@0 
    BLD r30,MOSI        
    OUT PORTMOSI,r30      ;sda
    OUT PINSCK,r31      ;PINSCK = (1<<2)
    OUT PINSCK,r31      ;5 cycles per bit
    .endm
    IN  r30,PORTMOSI     ;existing value of latch
    ldi r31,(1<<SCK)    ;toggle value
    cbi PORTSCK,SCK       ;start with scl = 0
    __SPIBIT 7
    __SPIBIT 6
    __SPIBIT 5
    __SPIBIT 4
    __SPIBIT 3
    __SPIBIT 2
    __SPIBIT 1
    __SPIBIT 0
#endasm    
}

David.


theusch wrote:

1) Post your ideal ASM sequence for loading a '595. Let the games then begin.

At a different time I'd have taken up the challenge, but I'm deep in avrdude code trying to add support for picoboot.


david.prentice wrote:

My main point was: use the AVR Hardware peripherals.
You can achieve F_CPU/2 with SPI, USI, USART.

From memory, the best ASM bit-bash SPI is F_CPU/5 and you lose cycles during housekeeping.

    OUT PINSCK,r31      ;5 cycles per bit

I was mulling over the same thing last week during lunch, so I just went and found the envelope I was scribbling code on. You don't need the extra OUT instruction above (since the out r30 zeros the sck pin at the same time as outputting the data), so that brings it down to F_CPU/4.

With USI it's possible to do one shift per clock (at least on the parts that can clock from Counter0). If you enable the clock output fuse and connect CKO to the '595 clock in, I think you could get 8 bits shifted out every 10 cycles. You'd need to toggle the latch with 2 back-to-back out instructions exactly 8 cycles after writing to USIDR.


Quote:

At a different time I'd have taken up the challenge, ...

At the risk of being accused of using the s-word https://www.avrfreaks.net/index.p... why does it seem that so often this is the case? A claim is made; the "test case" is well defined; there must have been a reason for mentioning this test case (I'd assume that ralphd already had an efficient sequence). As I've often replied: "the Emperor has no clothes".

I've posted my naive, not-for-contest-purposes, 9-bit '595 production code before. With the generated machine code in 2005 and maybe other times. Indeed it isn't optimal but there ain't much fat for a loop version. I'm awaiting your optimal sequence and then we'll do some cycle counting. I'm guessing the ASM version might save a cycle or two per loop and end up [the typical] 5%-10% faster. But maybe not.
https://www.avrfreaks.net/index.p...


theusch wrote:
Quote:

At a different time I'd have taken up the challenge, ...

At the risk of being accused of using the s-word https://www.avrfreaks.net/index.p... why does it seem that so often this is the case? A claim is made; the "test case" is well defined; there must have been a reason for mentioning this test case (I'd assume that ralphd already had an efficient sequence). As I've often replied: "the Emperor has no clothes".

Excuse me, but I never made any claim. I don't think Cliff made any claim. I stated I do some things in asm because I don't know how to do it in C, and suspect it may not be possible. In the discussion I've learned a couple tricks that make some things easier in C than I thought they were before.

For me, I'm not looking for a flame war on which language is the best for AVR programming. I think we all know that for some things, asm is the only way to go; things like v-usb and avr-vga come to mind. For other things (like floating-point math), doing it in asm would be a nightmare. The point is each language has its place.


Quote:

From memory, the best ASM bit-bash SPI is F_CPU/5 and you lose cycles during housekeeping.

Y'all (ralphd and david.prentice) appear to postulate that when on a quest for "fastest" one has the luxury of using a whole port?

The '595 datasheet is confusing to me (what "SPI mode" would be represented?) but I'd guess that if using a full port one could do one of the clock edges in the same operation as the data edge? From one datasheet's "Data set-up and hold times" it looks like the data line DS is sampled on the rising edge of the clock SHCP.

So, if all registers were pre-loaded then each bit could be

OUT PORTx, Rdown ; Rdown has data bit and clock falling
OUT PORTx, Rup ; Rup has data bit and clock rising

That would be two clocks per bit -- of actual transfer time. At the expense of many registers and clock cycles for setup.

So, what are the rules? Whole port available, or arbitrary pins? Does setup count, or only clocks/bit during actual transfer?

For the contest ...

Quote:

I am curious how close to hand-coded asm performance it is possible to get with writing a software shift register in C.

... I'd vote for real-world situation: Arbitrary pins and the entire "send-this-byte-to-595" routine as shown in my config_leds() in the links above.


theusch wrote:

Y'all (ralphd and david.prentice) appear to postulate that when on a quest for "fastest" one has the luxury of using a whole port?

No, load the port state first and only change the data and clock bits. It's only using one bit, but I used the same concept in one of my soft uart implementations:

; transmit byte contained in r24
; AVR305 has 1 cycle of jitter per bit, this has none
TxByte:
sbi UART_Port-1, UART_Tx ; set Tx line to output
cbi UART_Port, UART_Tx ; start bit
in r0, UART_Port
ldi r25, 3 ; stop bit & idle state
TxLoop:
; 8 cycle loop + delay
ldi delayArg, TXDELAY
rcall Delay3Cycle ; delay + 3 cycles for rcall
bst r24, 0 ; store lsb in T
bld r0, UART_Tx
lsr r25
ror r24 ; 2-byte shift register
out UART_Port, r0
brne TxLoop
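
To see why loading r25 with 3 yields exactly a stop bit and an idle bit, here is a host-side C model of the loop above. It only mirrors the bst/lsr/ror/brne logic (no timing, no start bit, which the cbi already sent); the function name and the levels[] buffer are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side model of TxLoop: stores the Tx line level written by each
   `out` in levels[] and returns how many levels the loop produced. */
size_t tx_loop_levels(uint8_t data, uint8_t levels[], size_t capacity)
{
    uint8_t r24 = data;   /* low byte of the 2-byte shift register */
    uint8_t r25 = 3;      /* two 1 bits: stop bit and idle state */
    size_t count = 0;

    do {
        uint8_t t = r24 & 1u;                        /* bst r24, 0 */
        uint8_t carry = r25 & 1u;                    /* lsr r25 */
        r25 >>= 1;
        r24 = (uint8_t)((r24 >> 1) | (carry << 7));  /* ror r24 */
        if (count < capacity)
            levels[count] = t;                       /* bld + out */
        count++;
    } while (r24 != 0);                              /* brne TxLoop */

    return count;
}
```

In this model the loop always runs ten times: eight data bits, LSB first, followed by the two 1 bits shifted in from r25. The loop ends when the last of those has left r24, because brne sees the zero flag left by ror.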


theusch wrote:

... I'd vote for real-world situation: Arbitrary pins and the entire "send-this-byte-to-595" routine as shown in my config_leds() in the links above.

The best I could do bitbanging would be 8 bits out in 32 cycles: 4 cycles per bit unrolled to 32 instructions.
bst
bld
out dr (clock low & data)
out pin (for toggle clock)


Having explicit access to the carry bit is useful for implementing the IP/TCP checksum algorithm (ones' complement sum, with end-around carry.)
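
For comparison, a minimal portable C sketch of that checksum. Without access to the carry flag, the usual workaround is a wider accumulator whose overflow is folded back in; in asm the same step would be an add followed by adc. The function names are chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of 16-bit words: any carry out of bit 15 is
   added back in at bit 0 (the end-around carry). */
uint16_t ones_complement_sum(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += words[i];
        sum = (sum & 0xFFFFu) + (sum >> 16);   /* fold the carry back in */
    }
    return (uint16_t)sum;
}

/* The checksum field is the bitwise complement of that sum. */
uint16_t internet_checksum(const uint16_t *words, size_t count)
{
    return (uint16_t)~ones_complement_sum(words, count);
}
```

With the words 0x0001, 0xF203, 0xF4F5, 0xF6F7 the sum is 0xDDF2 and the checksum is 0x220D.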

Rotate usually comes up in CRC and PRNG generators...


ralphd wrote:
theusch wrote:

... I'd vote for real-world situation: Arbitrary pins and the entire "send-this-byte-to-595" routine as shown in my config_leds() in the links above.

The best I could do bitbanging would be 8 bits out in 32 cycles: 4 cycles per bit unrolled to 32 instructions.
bst
bld
out dr (clock low & data)
out pin (for toggle clock)

Shirley, this breaks the minimum TSU of the 74HC595. Of course, you might get away with it if you have some capacitance on the SH_CP pin.

I quite understand that your pins might be on different ports. This is another argument for designing your pin budget with the hardware in mind. Look at the NXP LPC800 series if you design PCB layout before designing software. You can map hardware pins in software!

David.


Quote:

Shirley, this breaks the minimum TSU of the 74HC595. Of course, you might get away with it if you have some capacitance on the SH_CP pin.

I don't think so, Tim. (At least not with the datasheet I looked at.) Data is sampled on the rising edge of the clock. With the sequence above it is present since the previous falling clock and maintained till the next falling clock.

As we are dealing with the unrolled sequence, the BLD/BST could be split to introduce more even clock timing. (And indeed even if I could coerce a C sequence to match bst/bld/out/out it would be difficult or impossible to split the bst/bld with straight C.)


In some configurations only one bit of a port register is active, so there is no loss from stomping on the other bits of the port register. For example PB2 of the ATtinyx4 when using a crystal. In such a case I could get 8 bits out in 16 cycles, with a series of ror & out. OC0 or OC1 would be used to generate the clock with a frequency of F_CPU/2.


Oops. You are quite correct. The "out dr (clock low & data) " is the Clock falling edge.
Even @ 20MHz, TSU is going to be fine. (I am looking at NXP 74HC595 data sheet)

I still reckon that it is worth using SPI, USI, USART instead of bit-bashing. And USART_MSPI can run with no gaps.

David.


david.prentice wrote:

I still reckon that it is worth using SPI, USI, USART instead of bit-bashing. And USART_MSPI can run with no gaps.

I generally agree. Though in some cases you have none of them (i.e. ATtiny13). And in some cases you want something that you don't have to write 3 different versions of shift out code.


ralphd wrote:
In such a case I could get 8 bits out in 16 cycles, with a series of ror & out. OC0 or OC1 would be used to generate the clock with a frequency of F_CPU/2.

It would also work with 9 bits, like the situation in the 2005 thread. Just add sbrc datahi, 0 and sec before the ror/out sequence.
