A quick reference on how to extract words from a bit stream, handy for quickly showing someone how a given word is laid out.

Bit streams

When extracting a word from a bit stream, we need to know the following:

  • Bit order (MSB first / LSB first)
  • MSB / LSB position (Example: bit 3; byte offset 1)
  • Bit width (Example: 10 bits)
  • Which byte comes next (offset increases / offset decreases)

The following examples show the four ways a 10 bit word could be stored.


MSB first; Offset increases.


MSB first; Offset decreases.


LSB first; Offset increases.


LSB first; Offset decreases.


Legend.
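
The four layouts can also be sketched in code. This is a minimal sketch rather than a definitive implementation: the function name `extract_word` and its parameters are made up for this page, and it assumes that bit 7 is the most significant bit of each byte, that an MSB first word walks from bit 7 down to bit 0 before moving to the next byte, and that an LSB first word walks from bit 0 up to bit 7. Check those assumptions against your own protocol before relying on it.

```python
def extract_word(data, byte_offset, bit, width,
                 msb_first=True, offset_increases=True):
    """Extract an unsigned word of `width` bits from `data`.

    (byte_offset, bit) is the position of the word's MSB when
    msb_first is True, or of its LSB when msb_first is False.
    Assumption: bit 7 is the most significant bit of a byte.
    """
    step = 1 if offset_increases else -1
    value = 0
    for i in range(width):
        if not 0 <= byte_offset < len(data):
            raise IndexError("word runs past the end of the buffer")
        sample = (data[byte_offset] >> bit) & 1
        if msb_first:
            # Bits arrive from most to least significant.
            value = (value << 1) | sample
            bit -= 1
            if bit < 0:
                bit, byte_offset = 7, byte_offset + step
        else:
            # Bits arrive from least to most significant.
            value |= sample << i
            bit += 1
            if bit > 7:
                bit, byte_offset = 0, byte_offset + step
    return value


# The same 10 bit value, 0x155, stored in the four ways.
# Each word starts at bit 3 of its first byte.
print(hex(extract_word(bytes([0x00, 0x05, 0x54, 0x00]), 1, 3, 10, True, True)))
print(hex(extract_word(bytes([0x00, 0x54, 0x05, 0x00]), 2, 3, 10, True, False)))
print(hex(extract_word(bytes([0x00, 0xA8, 0x0A, 0x00]), 1, 3, 10, False, True)))
print(hex(extract_word(bytes([0x00, 0x0A, 0xA8, 0x00]), 2, 3, 10, False, False)))
```

All four calls print 0x155. Only the byte contents and the starting offset differ between them, which is why every item in the list above has to be agreed on before a word can be read.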

Alignment & Padding

Frequently, the words extracted from a bit stream don’t fit nicely into the standard integer types, whose widths are commonly multiples of 8 bits. For this conversion we need to decide the following:

  • Alignment (left / right)
  • Padding (pad with ones, zeroes or sign-extend)

The most common options are right alignment with sign extension for signed types, and right alignment with zero padding for unsigned types, since these options preserve the original value. I have seen the other options get some use in niche cases.

In the following examples, two 10 bit signed integers, 341 (0x155 in hex) and -342 (0x2AA in hex, as two's complement), are converted to a 16 bit signed integer using different alignments and padding methods, showing how the stored value may or may not change when read as a 16 bit integer.

Right alignment with sign extension.

Right alignment with zero padding.

Right alignment with one padding.

Left alignment with zero padding.

Left alignment with one padding.

Legend.
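
The same conversions can be reproduced with a few lines of code. This is only a sketch under stated assumptions: `pad_to_int16` and its `align` / `pad` arguments are names invented for this page, the 16 bit result is interpreted as two's complement, and sign extension is only offered for right alignment.

```python
def pad_to_int16(word, width=10, align="right", pad="sign"):
    """Place a `width` bit word in 16 bits and read it back as a
    signed (two's complement) 16 bit integer.

    align: "right" or "left"
    pad:   "zero", "one", or "sign" (right alignment only)
    """
    word &= (1 << width) - 1
    spare = 16 - width                 # number of padding bits
    fill = (1 << spare) - 1            # `spare` ones

    if align == "right":
        if pad == "sign":
            use_ones = bool(word >> (width - 1))   # copy the sign bit
        elif pad in ("zero", "one"):
            use_ones = pad == "one"
        else:
            raise ValueError("unknown padding: %r" % pad)
        raw = word | ((fill << width) if use_ones else 0)
    elif align == "left":
        if pad not in ("zero", "one"):
            raise ValueError("left alignment pads with zero or one")
        raw = (word << spare) | (fill if pad == "one" else 0)
    else:
        raise ValueError("unknown alignment: %r" % align)

    # Reinterpret the 16 bit pattern as a signed integer.
    return raw - 0x10000 if raw & 0x8000 else raw


for align, pad in [("right", "sign"), ("right", "zero"), ("right", "one"),
                   ("left", "zero"), ("left", "one")]:
    print(align, pad,
          pad_to_int16(0x155, 10, align, pad),
          pad_to_int16(0x2AA, 10, align, pad))
```

Only right alignment with sign extension returns both 341 and -342 unchanged. Right alignment with zero padding turns -342 into 682, right alignment with one padding turns 341 into -683, and both left alignments scale the values by roughly 64.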

I’ve only seen left alignment in the wild a couple of times, namely in conversion results from the 10 bit ADC on PIC microcontrollers, so I consider it necessary to document those odd cases too.
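
For a left-aligned value, getting the original word back is a right shift by the number of padding bits. The helper below is a hypothetical sketch (`from_left_aligned` is not taken from any library or datasheet) and assumes the word sits in the top bits of a 16 bit container.

```python
def from_left_aligned(raw16, width=10, signed=False):
    """Recover a `width` bit word stored in the top bits of a
    16 bit container. The padding bits are simply discarded."""
    word = (raw16 & 0xFFFF) >> (16 - width)
    if signed and word & (1 << (width - 1)):
        word -= 1 << width             # two's complement sign fix-up
    return word


print(hex(from_left_aligned(0x5540)))           # zero padded 0x155
print(hex(from_left_aligned(0x557F)))           # one padded 0x155
print(from_left_aligned(0xAA80, signed=True))   # zero padded 0x2AA
```

Since the shift throws the padding away, it does not matter here whether the low bits were filled with zeroes or ones.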

Why?

This page exists because somehow at work we managed to waste an hour trying to explain to each other how words from a CAN bus frame ought to be interpreted. For future reference, I can now show this page and ask people to point out which of the four bit order options is needed.

And since padding / extension is common and also causes issues, I added those cases too, just in case.