Best way to store 13-nibble data stream

Go To Last Post
13 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

All,

 

I have a data stream from a device (Mitutoyo Gauge) that is a clocked data stream consisting of 13 4-bit pieces of data.  I'm using an ISR to read this 52-bit data stream, and I'm trying to find the best way to store it.  My first method was to store it in a 13 byte array:

 

static volatile uint8_t dig1DataBuff[13];
static volatile uint8_t dig1DataBit;
.
.
.
ISR( INTn_vect)
{
    dig1DataBuff[dig1DataBit / 4] |= (DIG1_DATA( PIN) << (dig1DataBit % 4));
    ++dig1DataBit;
}

That worked well at 26 bytes of code and approximately 3.9µs with a 16MHz8MHz clock.  Hoping to do better (although I should have known better as well) I decided to dump it into a 64-bit variable:

 

static volatile uint64_t dig1DataBuff;
static volatile uint8_t dig1DataBit;
.
.
.
ISR( INTn_vect)
{
    dig1DataBuff |= ((uint64_t)DIG1_DATA( PIN) << dig1DataBit++);
}

This was markedly worse at 56 bytes and 9.5µs as I should have expected when having an 8-bit device mess with long longs.

 

The clock pulse width is spec'ed at a minimum width of 100µs, but I'm trying to read four of these independent sources simultaneously (on four different external interrupts) so I'd like to pare this down as much as possible.  Hence my question; do any of you 'Freaks have other ideas about snagging these bitstreams?

 

 

Cheers,

Tom

Last Edited: Fri. Mar 13, 2015 - 02:08 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You first solution makes the most sense.

are you so low on memory you have to pack the data??

Keith Vasilakes

Firmware engineer

Minnesota

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Keith - at this point, space is not a problem.  It's not that I'm trying to pack the data.  Since the data already comes "packed" as a serial 52-bit stream, I was just trying to find the quickest way to store it and thought storing it with the least amount of transformation would be the fastest.

Cheers,

Tom

Last Edited: Thu. Mar 12, 2015 - 02:31 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

If you are hard up for space, replacing the 4's with 8's will allow you to put the data in 7 bytes instead of 52.

If your code is not fast enough, you can eliminate two unnecessary fetches of dig1DataBit by using a local copy.

 

There is also the matter of what you are going to use the data for.

I suspect that 4 bits per byte will result on simpler code elsewhere.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thanks, Skeeve.

ISR( INT0_vect)
{
    uint8_t bitpos = dig1DataBit;
    dig1DataBuff[bitpos / 4] |= (DIG1_DATA( PIN) << (bitpos % 4));
    dig1DataBit = ++bitpos;
}

 

 

That gave a 270ns reduction down to 3.67µs.  I though about replacing the /4 and %4 operations with bitwise operations, but the compiler was already doing those optimizations.  Looks like short of coding in ASM I'm not going to get much better and I don't think that will even gain much.

Cheers,

Tom

Last Edited: Fri. Mar 13, 2015 - 02:07 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You could save another couple cycles by storing dig1DataBit in an IO register in range of IN and OUT.

You might be able to save some more by copying avr-gcc's asm code and removing unnecessary PUSHes and POPs.

Also, I suspect that (DIG1_DATA(PIN) << (bitpos % 4)) might be better expressed.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This will get the ISR down to 49 cycles.

#define Rbitpos R16
#define Rdata   Rbitpos
#define Rmask   R17

  .global INTn_vect
  .global dig1DataBit
  .global dig1DataBuff

; dig1DataBit should start at -1, i.e. 0xFF

INTn_vect:
  ; 11
  PUSH Rbitpos
  PUSH Rmask
  PUSR XL
  PUSH XH
  IN XH, SREG
  PUSH XH

  ; 7
  LDS Rbitpos, dig1DataBit
  INC Rbitpos
  STS dig1DataBit, Rbitpos
  SBIC PINx, y
    RJMP 9f

  ; 5
  LDI Rmask, 1
  SBRC Rbitpos, 1
    LDI Rmask, 4
  SBRC Rbitpos, 0
    LSL Rmask

  ; 2
  LSR Rbitpos
  LSR Rbitpos

  ; 5
  LDI XH, hi8(dig1DataBuff)
  LDI XL, lo8(dig1DataBuff)
  ADD XL, Rbitpos
  BRCC 1f
    INC XH
  1:

  ; 4
  LD Rdata, X     ; reuses Rbitpos
  OR Rdata, Rmask
  ST X, Rdata

  9:
  ; 11
  POP XH
  OUT SREG, XH
  POP XH
  POP XL
  POP Rmask
  POP Rbitpos

  ; 4
  RETI

22 cycles are just for saving and restoring registers.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Skeeve,

 

Thanks!  Right now I at 75 cycles and 3.7µs with the C code so that should get me down to around 2.5µs.  I haven't done assembler with GCC before.  Can I just replace the

ISR( INT0_Vect)
{
    .
    .
    .
}

With

__ASM__( INT0_vect:
    .
    .
    .
    RETI;
);

inside my C file?

Cheers,

Tom

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm a bit lost.  It would appear that the input "clocked" stream would be very similar to reading a 74HC165 shift register?

 

I wouldn't array index every bit, and |=, and MOD and such.  I'd probably gather bit-by-bit into a byte.  Yes, when the byte is full there would be some storage overhead -- but only once per byte, and indexing could be a pointer.  I can't easily tell from the posted code whether it is MSb or LSb.  No matter; simple adjustment. 

 

For one byte starting "fresh", and MSb:

 

result <<= 1; // make room for new bit; makes a 0 bit

if (THEPINVALUE) result |= 1;

 

count the bits, and when the byte is full then store.  As it is 52 bits, for efficiency I'd probably keep two counters, one for bits this byte and one for overall.

 

I'd think it would be more efficient with e.g. CodeVision and dedicated register variables.

 

With a normal low-addressed AVR port, the shift/check/or (or inc) sequence would be constant-time, especially with a register variable -- SHL;SBIC;INC

 

 

-- shift "result"

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Theusch,

 

I'm not really seeing how this will be faster.  Whether I shift the result and always set the lowest bit, or shift the bit itself doesn't seem like it would make much difference.  Right now loading the current data byte, dig1DataBuff[dig1DataBit / 4], takes 4 cycles.  If I was keeping this as a separate counter I'd still have to load the counter, check it, and load a new byte if necessary which looks like it would save 1 cycle.

 

As Skeeve pointed out, it looks like the real killer is the register overhead in the ISR.  My current C implementation is using 75 clock cycles, only 27 of which are my 3 lines of code (Skeeve reduces that part to 23 cycles).  If I want to make significant improvements, I'll need to switch to assembly to reduce the number of unneeded register pushes and pops.

Cheers,

Tom

Last Edited: Sat. Mar 14, 2015 - 12:16 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You can use inline assembly if you really want.
The syntax is
ISR(INT0_vect, ISR_NAKED)
{
__asm__(" ... " ::: ???);
}
I'm not sure what should go in the place of ??? to indicate that
dig1DataBit and dig1DataBuff might be clobbered.
Since they are volatile, ??? probably does not have to be there.

 

That said, I'd be interested to know what would go there were they not volatile.

 

If dig1DataBit or dig1DataBuff is used in another function,
in-line assembly is probably the way to go.
I can think of other ways, but they are trickier.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm not really seeing how this will be faster.

Look at the generated code  for your "fast" version.  Is it three instructions, three cycles for an "ordinary" bit?  I think not.  Whole-byte store and housekeeping should be about a wash.

 

Suit yourself.

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

One mechanism for optimizing code is to compile with -S and hand-edit the resulting assembly.

Keep the original .c file for documentation.

"SCSI is NOT magic. There are *fundamental technical
reasons* why it is necessary to sacrifice a young
goat to your SCSI chain now and then." -- John Woods