Any interest in Cortex M0+ ASM code to drive WS2812B LEDs?

Not sure if this is the right forum to ask, but I've written an ASM routine for Cortex M0+ to bit-bang output data for WS2812B LEDs that I'm happy to share.  The function has a C declaration of

 

void output_ws2812b(uint8_t *pbuf, uint32_t count);    // sends out 'count' bytes of data located at 'pbuf'.

 

Code was designed for a 12 MHz clock, but of course nops can be added for faster clocks.  Bit encoding is 8 clocks hi, 4 lo for a '1' bit, and 4 clocks hi, 8 lo for a '0' bit.  31 lines of actual code.
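
For anyone wanting to try it, here is a minimal sketch of how the byte buffer might be filled before calling the routine. The helper name ws2812_set_pixel is hypothetical and not part of my code; it only assumes the usual WS2812B byte order of green, red, blue, with each byte sent MSB first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the posted routine): store one LED's
   colour in the buffer in the G, R, B byte order the WS2812B expects. */
void ws2812_set_pixel(uint8_t *buf, size_t led, uint8_t r, uint8_t g, uint8_t b)
{
    assert(buf != NULL);
    buf[3 * led + 0] = g;   /* green is sent first */
    buf[3 * led + 1] = r;   /* then red */
    buf[3 * led + 2] = b;   /* then blue */
}
```

With a buffer of 3 bytes per LED, the whole strip would then be sent with a single call such as output_ws2812b(buf, 3 * num_leds), where num_leds is whatever strip length you are driving.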

Last Edited: Mon. Jun 24, 2019 - 09:13 PM
Moved to the ARM forum.

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

I certainly wouldn't mind seeing it.  Which chips does it work on?  12MHz seems like zero-wait-state, moderately deterministic territory - running on a 48MHz chip with poorly-described "flash acceleration" might be thornier.  (Arduino's delayMicroseconds() function was a busy loop on the SAMD21 boards, but it all went crazy on the 120MHz SAMD51 boards!)

 

The code was developed in Segger Embedded Studio for a FRDM-KE04Z board (Cortex M0+).  The goals were to use the lowest possible clock speed (fewest CPU cycles per loop), and to keep the timing of every bit the same.  Since the WS2812B has pretty relaxed timing specs, the latter restriction was not necessary, but was a self-imposed goal "just because".

 

Timing for '1' bits: 8 cycles HI, 4 cycles LO

Timing for '0' bits: 4 cycles HI, 8 cycles LO
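
To turn those cycle counts into real time, here is a small sketch that converts cycles to nanoseconds for a given core clock. The function name cycles_to_ns is just for illustration. At 12 MHz it gives roughly 667 ns for 8 cycles, 333 ns for 4 cycles, and 1000 ns for the full 12-cycle bit; check those against the datasheet for your particular LEDs.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: convert a CPU cycle count to nanoseconds at the
   given core clock, rounded to the nearest nanosecond. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz != 0);
    /* 64-bit intermediate so cycles * 1e9 cannot overflow */
    return (uint32_t)(((uint64_t)cycles * 1000000000u + clock_hz / 2) / clock_hz);
}
```

The same function shows how many cycles are needed at a faster clock: at 48 MHz, for example, each of the counts above would have to be multiplied by four to keep the same pulse widths.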

 

I should add that this is my first and only attempt at Cortex M0+ assembly programming.

 

  .syntax unified

  .global output_ws2812b
  .extern _vectors

  .section .init, "ax"
  .thumb_func

// WS2812 Cortex M0+ code

// output_ws2812b(uint8_t * pbuf, uint32_t count);
// optionally: output_ws2812b(uint8_t * pbuf, uint32_t count, uint32_t gpio_mask);

// '1' bit: 8 clocks hi, 4 clocks lo
// '0' bit: 4 clocks hi, 8 clocks lo
// recommended cpu clock: 12 MHz

// R0 = (param 1) byte buffer address
// R1 = (param 2) byte count
// R2 = FGPIO port bit (could be passed as param 3)
// R3 = data bytes
// R4 = data bit mask (must be saved and restored)
// R5 = FGPIO port base (must be saved and restored)
// R6 = #1, for rors instruction

  .equ  FGPIOA_BASE,  0xF8000000        // fast IO
  .equ  PSOR,         0x4               // output set register offset
  .equ  PCOR,         0x8               // output clear register offset
  .equ  PTB2,         1<<10             // gpio output bit (next to the GND pin)
  .equ  MASK,         0x80808080        // never-ending bit mask (8 LSBs are what count)

output_ws2812b:
    push   {r4-r6}                 // need to preserve all above r0-r3
    add    r0,r1                   // point r0 to end of data + 1
    rsbs   r1,r1,#0                // negate byte count
    ldr    r2,=PTB2                // using this bit for our output
    ldrb   r3,[r0,r1]              // load r3 with data byte [r0+r1]
    ldr    r4,=MASK                // endless data bit mask, so no reloading (output data in MSB->LSB order)
    ldr    r5,=FGPIOA_BASE         // pointer to FGPIO
    movs   r6,#1                   // mask rotate count (rors rotates by the value in r6)
    cpsid  i                       // disable ints (may want to save and restore state later)

L0: str    r2,[r5,PSOR]            // set output to 1
    tst    r3,r4                   // check state of current bit
    beq    ZBit                    // branch if bit is 0 (shorter high time)

    rors    r4,r6                  // shift mask bit right
    bcc     L1                     // no new byte, finish '1' bit

    adds   r1,#1                   // advance to new byte
    ldrb   r3,[r0,r1]              // load r3 with new byte [r0+r1]
    str    r2,[r5,PCOR]            // set output to 0
    beq    Done                    // (Z holds result of the adds instruction)
    b      L0                      // more bytes

L1: nop
    nop
    str    r2,[r5,PCOR]            // set output to 0
    nop
    b      L0                      // next bit

ZBit:
    str    r2,[r5,PCOR]            // set output to 0
    rors   r4,r6                   // shift mask bit right
    bcc    L2                      // no new byte, finish '0' bit

    adds   r1,#1                   // advance to new byte
L2: ldrb   r3,[r0,r1]              // load r3 with byte [r0+r1] (just a delay if not a new byte)
    bne    L0                      // more bytes (if we jumped over adds, bne is taken because rors left r4 != 0)

Done:
    cpsie  i                       // enable ints (may want to save and restore state later)
    pop   {r4-r6}
    bx    lr                       // return
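
The rotating 0x80808080 mask is probably the least obvious part, so here is a host-side C model of it. This is only a sketch for checking the bit order on a PC; the names ror1 and ws2812_model_bits are made up for this example, and it models the sequencing only, not the cycle timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "rors r4,r6" with r6 = 1: rotate right by one and report the
   bit that was rotated out, which is what ends up in the carry flag. */
uint32_t ror1(uint32_t mask, int *carry)
{
    *carry = (int)(mask & 1u);
    return (mask >> 1) | (mask << 31);
}

/* Walk the buffer the way the loop does and record each bit (0 or 1) in
   out[], which must hold at least 8 * count entries. Returns the number
   of bits produced. */
size_t ws2812_model_bits(const uint8_t *buf, size_t count, uint8_t *out)
{
    uint32_t mask = 0x80808080u;   /* only the low 8 bits matter to the test */
    size_t n = 0, i = 0;
    assert(buf != NULL && out != NULL);
    while (i < count) {
        int carry;
        out[n++] = (buf[i] & mask) ? 1 : 0;  /* like tst r3,r4 */
        mask = ror1(mask, &carry);
        if (carry)                           /* mask wrapped: next byte */
            i++;
    }
    return n;
}
```

Because the mask wraps from 0x01010101 back to 0x80808080 with carry set, it never needs reloading and the carry doubles as the "next byte" signal. One difference worth noting: this model returns immediately for a count of 0, whereas the assembly as posted does not appear to guard against that case, so callers should pass at least one byte.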

 

Last Edited: Sun. Jun 30, 2019 - 03:06 PM