Extracting data from bit streams
A quick reference on extracting words from a bit stream, handy for quickly pointing out how a given piece of information should be read.
Bit streams
When extracting a word from a bit stream, we need to know the following:
- Bit order (MSB first / LSB first)
- MSB / LSB position (Example: bit 3; offset 1)
- Bit width (Example: 10 bit)
- Which byte offset is next (offset increases / offset decreases)
The following examples show the four ways a 10 bit word could be stored.
MSB first; Offset increases.
MSB first; Offset decreases.
LSB first; Offset increases.
LSB first; Offset decreases.
Legend.
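As a rough illustration, the four parameters listed above can be turned into a small Python function. This is only a sketch under assumptions the text does not state: bit 7 is taken to be the most significant bit of each byte; "MSB first" is taken to mean that the starting position holds the word's MSB and the following bits sit at lower bit numbers within the byte (the reverse for "LSB first"); and the byte offset moves by one each time a byte boundary is crossed. The name `extract_word` and its parameters are hypothetical.

```python
def extract_word(data, byte_offset, bit, width, msb_first=True, offset_increases=True):
    """Read `width` bits starting at bit `bit` (7 = byte MSB) of data[byte_offset]."""
    step = 1 if offset_increases else -1
    value = 0
    for i in range(width):
        # Guard against running off either end (negative indexes would wrap silently).
        if not 0 <= byte_offset < len(data):
            raise IndexError("word runs past the end of the buffer")
        current = (data[byte_offset] >> bit) & 1
        if msb_first:
            # The first bit read is the word's MSB, so earlier bits shift up.
            value = (value << 1) | current
            bit -= 1
            if bit < 0:
                bit, byte_offset = 7, byte_offset + step
        else:
            # The first bit read is the word's LSB, so bit i lands at position i.
            value |= current << i
            bit += 1
            if bit > 7:
                bit, byte_offset = 0, byte_offset + step
    return value


# The same 10 bit value, 341 (0x155), starting at bit 3 of offset 1 in each layout.
print(extract_word(bytes([0x00, 0x05, 0x54]), 1, 3, 10, True, True))    # 341
print(extract_word(bytes([0x54, 0x05]), 1, 3, 10, True, False))         # 341
print(extract_word(bytes([0x00, 0xA8, 0x0A]), 1, 3, 10, False, True))   # 341
print(extract_word(bytes([0x0A, 0xA8]), 1, 3, 10, False, False))        # 341
```

The returned value is the raw, unsigned bit pattern; turning it into a signed number is a separate step.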
Alignment & Padding
Frequently, the words extracted from a bit stream don’t fit neatly into the standard integer types, whose widths are commonly multiples of 8 bits. For this conversion we need to decide the following:
- Alignment (left / right)
- Padding (pad with ones, zeroes or sign-extend)
The most common options are right alignment with sign extension for signed types and right alignment with zero padding for unsigned types, since both preserve the original value. I have seen the other options get some use in niche cases.
In the following examples, two 10 bit signed integers, 341 (0x155 in hex) and -342 (0x2AA in hex), are translated to a 16 bit signed integer using different alignments and padding methods, showing how the stored value may or may not change when read as a 16 bit integer.
Right alignment with sign extension.
Right alignment with zero padding.
Right alignment with one padding.
Left alignment with zero padding.
Left alignment with one padding.
Legend.
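The five combinations above can be reproduced numerically with a short Python sketch. The function name `to_int16` and its `align` / `pad` parameters are hypothetical, chosen only for this illustration; the 16 bit result is assumed to be read as two's complement.

```python
def to_int16(raw, width=10, align="right", pad="sign"):
    """Place a `width`-bit raw word into 16 bits and read the result as signed."""
    if pad not in ("sign", "zeros", "ones"):
        raise ValueError("pad must be 'sign', 'zeros' or 'ones'")
    spare = 16 - width
    fill = (1 << spare) - 1              # the spare bits, all set to one
    if align == "right":
        if pad == "sign":
            ones = bool(raw >> (width - 1))   # copy the word's own sign bit
        else:
            ones = pad == "ones"
        bits = raw | ((fill << width) if ones else 0)
    elif align == "left":
        if pad == "sign":
            raise ValueError("sign extension only applies to right alignment")
        bits = (raw << spare) | (fill if pad == "ones" else 0)
    else:
        raise ValueError("align must be 'left' or 'right'")
    # Reinterpret the 16 bit pattern as a two's complement number.
    return bits - 0x10000 if bits & 0x8000 else bits


for align, pad in [("right", "sign"), ("right", "zeros"), ("right", "ones"),
                   ("left", "zeros"), ("left", "ones")]:
    print(align, pad, to_int16(0x155, 10, align, pad), to_int16(0x2AA, 10, align, pad))
# right sign 341 -342
# right zeros 341 682
# right ones -683 -342
# left zeros 21824 -21888
# left ones 21887 -21825
```

Only right alignment with sign extension returns both 341 and -342 unchanged. Zero padding keeps the positive value but turns -342 into 682, and one padding does the opposite.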
I’ve only seen left alignment in the wild a couple of times, namely in conversion results from the 10 bit ADC on PIC microcontrollers, so I consider it necessary to document those odd cases too.
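A left aligned reading can be brought back to its original range with a right shift. The Python sketch below assumes an unsigned 10 bit value stored in the top bits of a 16 bit word with zero padding underneath; the helper name `from_left_aligned` is hypothetical and not tied to any particular device.

```python
def from_left_aligned(reading16, width=10):
    """Recover an unsigned `width`-bit value from a left aligned 16 bit reading."""
    return (reading16 & 0xFFFF) >> (16 - width)


reading = 0x155 << 6                 # 341 left aligned with zero padding: 0x5540
print(from_left_aligned(reading))    # 341
print(reading >> 8)                  # 85: the high byte alone is the value's top 8 bits
```

A side effect of this layout is that the high byte on its own is a coarser 8 bit version of the same reading (341 >> 2 = 85).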
Why?
This page exists because at work we somehow managed to waste an hour trying to explain to each other how words from a CAN bus frame ought to be interpreted. For future reference, I can now show this page and ask people to point out which of the four bit order options is needed.
And since padding / extension is common and also causes issues, I added those just in case.