Zeta's Stories: How To Make SPI Cry
Jan. 2nd, 2024 07:23 pm
This is the story of a bug that baffled a room full of engineers (some of whom weren't EEs, so it's more understandable) for more than a week.
The product in question is a very low-power device with a radio, a couple of sensors, nothing really special. It had an MSP430 running the whole show, and had to run off battery power for multiple years in a harsh environment. In the lab, with debug builds, everything worked great, but as soon as we dropped the production code onto the board, things would start jamming up and not operating properly. This was weird, because all of the various devices were on their own isolated power planes, with MOSFETs controlling the power to them to cut off even idle draw when not in use.
It took a while, but I eventually figured out the problem, and it was that we were being too clever for our own good. You see, in a SPI bus, you have four lines:
1. SCLK: Serial Clock (clock signal from main)
2. MOSI: Main Out Sub In (data output from main)
3. MISO: Main In Sub Out (data output from sub)
4. CS: Chip Select (used to select which chip on the shared bus you want to talk to)
Because some of the chips we were using didn't have safe tri-state outputs on MISO with the power cut, the engineer who designed the board had inserted a cheap directional buffer in-between the MSP430 and the various chips that were to have their power removed. The buffer, however, only covered the first /three/ lines. He didn't consider chip select a problem because it wouldn't affect any other chips, which turned out to be wrong!
What he forgot were the ESD diodes in the sensors/radios, which 'happily' (they weren't quite designed for this) passed power from the /active low/ chip select line, which idles high whenever the chip is deselected, up to the isolated power plane. They didn't pass /enough/ power to bring up the (3.3V) chips themselves, but that sneaky directional buffer was operational down to 1.2V and would /completely/ jam the MISO line for everything else on that bus.
Easy fix: Just have the software drive the CS line low after cutting power, so there's no voltage left on that pin to leak through the ESD diodes.
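For the curious, here's a rough sketch of what that fix looks like in C. This is not the actual product code: the pin assignments, the function names, and the `sim_port_out` variable (a plain variable standing in for a memory-mapped GPIO output register, so the snippet runs on a PC) are all made up for illustration, and the power-up ordering is an assumption too.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GPIO output register. On real hardware this
 * would be a memory-mapped port register; here it is a plain variable so
 * the sketch can be compiled and exercised on a desktop machine. */
static uint8_t sim_port_out = 0;

/* Hypothetical pin assignments, not taken from the real board. */
#define SENSOR_CS_PIN   ((uint8_t)(1u << 0))  /* active-low chip select    */
#define SENSOR_PWR_PIN  ((uint8_t)(1u << 1))  /* 1 = MOSFET on, rail live  */

/* Power the sensor's rail, then park CS high (deselected).
 * The ordering here is an assumption, not something from the original fix. */
static void sensor_power_up(void)
{
    sim_port_out |= SENSOR_PWR_PIN;
    sim_port_out |= SENSOR_CS_PIN;
}

/* Cut the rail, then drive CS low so the pin cannot feed the dead rail
 * through the chip's ESD diodes. */
static void sensor_power_down(void)
{
    sim_port_out &= (uint8_t)~SENSOR_PWR_PIN;
    sim_port_out &= (uint8_t)~SENSOR_CS_PIN;
}

/* True when the pins are in the state that bit us: rail switched off
 * while CS is still being driven high. */
static bool sensor_rail_can_be_backfed(void)
{
    bool powered = (sim_port_out & SENSOR_PWR_PIN) != 0;
    bool cs_high = (sim_port_out & SENSOR_CS_PIN) != 0;
    return !powered && cs_high;
}
```

The point of keeping the CS write inside `sensor_power_down()` is that nobody can cut the power and forget the second step; a check like `sensor_rail_can_be_backfed()` could sit behind an assertion in debug builds.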
