Bitfields
Overview:
Bitfields are a handy C construct for setting individual bits in a structure. They are seemingly straightforward but have many small gotchas that can be painful to figure out how to deal with or ruin program portability. This tutorial will give a brief overview of how to use bitfields, and explore a few of the more problematic issues I've personally run into and how to work around them.
Basic Usage:
You can declare a variable of type signed or unsigned that uses foo bits in any struct. For example:
struct
{
unsigned header : 7;
unsigned data : 20;
unsigned footer : 5;
} bitExample;
struct
{
unsigned : 7;
unsigned data : 25;
} bitExample;
which will allocate the first 7 bits as unnamed padding. This is *not* guaranteed to be 0 when reading out the data, more on that later.
Getting the data out is easily accomplished using unions:
union
{
struct
{
unsigned : 7;
unsigned data : 25;
} bitExample;
int rawData;
} bitUnion;
or using memcpy:
struct
{
unsigned : 7;
unsigned data : 25;
} bitExample;
int rawData = 0;
memcpy( &rawData, &bitExample, sizeof(int) );
Problem 1 -- Bad bit ordering due to little endian architecture
Consider the following example program:
#includeint main() { union { struct { unsigned x :3; unsigned y :5; }bitExample; char rawData; } bitUnion; bitUnion.bitExample.x = 0; bitUnion.bitExample.y = 0xF; printf("%hhXn", bitUnion.rawData); }
We would expect the binary and hex output to look like:
x y 000 01111 == 0000 1111 == 0FWhich will be the case if running on a big endian machine. But on a little endian machine we will see:
y x
78 == 0111 1000 == 01111 000
As you can see it built it up backwards from what you would expect! Simply reversing the bit order of the struct like:
struct
{
unsigned y :5;
unsigned x :3;
}bitExample;
fixes this. To allow it to work on either big or little endian one could do something such as:
struct
{
#ifdef LITTLE_ENDIAN
unsigned y :5;
unsigned x :3;
#else
unsigned x :3;
unsigned y :5;
#endif
} bitExample;
Problem 2 -- The bitfield reordering for little endianess is only done on a word by word basis
Consider the following on a little endian machine:
#includeint main() { union { struct { unsigned x :32; unsigned y :32; }bitExample; int rawData[2]; } bitUnion; bitUnion.bitExample.x = 0xFFFF; bitUnion.bitExample.y = 0; printf("As integers:n"); printf("%X, %Xnn", bitUnion.rawData[0] , bitUnion.rawData[1]); }
you would expect from the previous example that we would see:
0, FFFF
as the output since the bitfields are made backwards. But this only occurs per word, so we actually see:
FFFF, 0So the ordering of words should be the same, but the bits in those words should be reversed for little endian. So for the following struct:
struct
{
unsigned x : 3;
unsigned y : 29;
unsigned z : 26;
unsigned a : 2;
unsigned b : 4;
unsigned c : 32;
}
to be compatible on both big and little endian you would need to do the following:
struct
{
#ifdef LITTLE_ENDIAN
unsigned y : 29;
unsigned x : 3;
unsigned b : 4;
unsigned a : 2;
unsigned z : 26;
unsigned c : 32;
#else
unsigned x : 3;
unsigned y : 29;
unsigned z : 26;
unsigned a : 2;
unsigned b : 4;
unsigned c : 32;
#endif
}
Again swapping bits within a 32 bit word, but not the words themselves.
Problem 3 -- Misreprensentation due to copying to less than word sized structures
Yet another endianess problem. Consider the following program:
#include
int main()
{
union
{
struct
{
unsigned x :32;
unsigned y :30;
unsigned z :2;
}bitExample;
int rawData[2];
short sRawData[4];
} bitUnion;
bitUnion.rawData[0] = 0;
bitUnion.rawData[1] = 0;
bitUnion.bitExample.x = 0xFFFF;
bitUnion.bitExample.y = 0;
bitUnion.bitExample.z = 3;
printf("As integers:n");
printf("%X, %Xnn", bitUnion.rawData[0]
, bitUnion.rawData[1]);
printf("As shorts:n");
printf("%4hX, %4hXn", bitUnion.sRawData[0]
, bitUnion.sRawData[1]);
printf("%4hX, %4hXn", bitUnion.sRawData[2]
, bitUnion.sRawData[3]);
}
As before, everything behaves as you would expect on a big endian machine. But lets do some bit busting assuming little endian architecture:
x = 0xFFFF hex therefore x = 1111 1111 1111 1111 z = 3 dec therefore z = 11 bin y = 0 dec therefore y = 00 0000 0000 0000 0000 0000 0000 0000 bin
so using the idea that word order is the same but bits are added backwards we would assume:
0000 FFFF C000 0000
as the hex output. Taking into account the printf formatting in the example we would expect:
As integers: FFFF, C0000000 As shorts: 0, FFFF C000, 0
Unfortunately we actually get:
As integers: FFFF, C0000000 As shorts: FFFF, 0 0, C000
As you can see when printing it out in 16 bit chunks we get each set of shorts swapped.
Unfortunately the fix entirely depends on how you plan to spit it out in the end. If it is in 32 bit chunks you would be fine, 16 bit chunks you would need to swap the values that you loaded up in 16 bit chunks using either htons functions (For TCP/IP, since it is big-endian often converted to little-endian) or making your own simple variant.
Problem 4 -- Internal memory padding causing extra padding during retrieval of raw data
The first problem that is not an endianess issue, it will cause issues on any architecture. This is assuming a word size of 32 bits, on a 64 bit architecture it will be different. For portabilities sake one should assume 32 bit word boundaries, since that will work for both.
Consider the following example program:#includeint main() { union { struct { unsigned x :30; unsigned y :5; unsigned z :29; }bitExample; int rawData[2]; } bitUnion; bitUnion.bitExample.x = 1073741823; // 30 bit max bitUnion.bitExample.y = 31; // 5 bit max bitUnion.bitExample.z = 536870911; // 29 bit max printf("%Xn", bitUnion.rawData[0]); printf("%Xn", bitUnion.rawData[1]); }
Running it on my machine produced(It can be different depending on the state of memory, but probably won't be right):
BFFFFFFF B7F5DFFF
Which is wrong since we maxed out x,y,z we would expect to see both integers as FFFFFFFF
The issue is that the bitfields are not aligned on the word boundary. To speed up handling of bitfields internally the C compiler makes sure that the fields are aligned, if something goes past it pushes it into the next word. So for example the the 30 + 5 = 35 which is greater than the 32 bit word boundary so the 5 bits is moved and we pick up garbage at the end of the word in the last 2 bits. Then we have 5 + 29 = 34 same thing and we see more garbage in the last 27 bits.
A quick fix is below:
#includeint main() { union { struct { // Word 1 unsigned x :30; unsigned y1 :2; // Word 2 unsigned y2 :3; unsigned z :29; }bitExample; int rawData[2]; } bitUnion; bitUnion.bitExample.x = 1073741823; // 30 bit max bitUnion.bitExample.y1 = 3; // 2 bit max bitUnion.bitExample.y2 = 7; // 3 bit max bitUnion.bitExample.z = 536870911; // 29 bit max printf("%Xn", bitUnion.rawData[0]); printf("%Xn", bitUnion.rawData[1]); }
As expected we get:
FFFFFFFF
FFFFFFFF
But the problem is that the field is now split into two. Unfortunately we are limited in what we can do, the easiest thing I've found to do is just split up an incoming value like so:
unsigned integer y = 31; // 0x03 == 0b00011 So take only the last 2 bits bitUnion.bitExample.y1 = y & (0x03); // Shift out the last bits so that the top 3 bits // are now the entire number bitUnion.bitExample.y2 = (y >> 2);
Any value of y will now work, although any bitfield across words has to be tweaked like this.
Problem 5 -- Padding Fields Are Worthless
Consider the following:
#includeint main(){ union { struct { unsigned :16; unsigned x :16; } fields; int out; } test; test.fields.x = 65535; // Max 16 bit field printf("%Xn", test.out); return 0; }
We would expect FFFF0000 on a little endian machine, of course this isn't what happens. It depends on the state of your machines memory but I got FFFF7050. The problem is depending on compiler (This was done using gcc 4.1) it will not initialize padding to 0, and you can't explicit reference it. You could set test.out = 0; or just name the value and set it to 0
Also if instead we did something like test.out = 0xAAAAAAAA; it would overwrite the padding. The easiest fix is to name the area and before using the bitfield set it to 0.
Conclusion:
If you've read this far and really had to make an attempt to understand this, I'm sorry. Bitfields should be a simple construct for building up binary but due to inane compiler design they can become almost useless on little endian without properly knowing how they are internally managed. Don't use them unless there really is no other simpler alternative or you know you will always be running on a big-endian architecture. Also make sure your fields are word aligned if you can, if not you are just introducing more headaches.
Comments(2)
2008-11-01 14:07:57
(2009-09-26 16:44:11) Gavin Black said:
Thankfully I haven't had to work with them either....I really hated porting themfrom SPARC to x86. I think that was the most painful programming task I've ever
done.
Hope I can get down there one of these days. Between school, work, hobbies,
friends, and family I barely have time to sleep let alone plan a trip :p
(2009-09-26 16:40:49) Matt said:
Excellent analysis of bit fields! I haven't had the pleasure of working withthem since January 2008, when you and I first worked with the concept on
*snip*.
Keep up the great posts, and I hope to see you again sometime soon.