Signed bit-fields
jm December 28th, 2006
Section 6.7.2 of the C standard states:
Each of the comma-separated sets designates the same type, except that for bit-fields, it is implementation-defined whether the specifier int designates the same type as signed int or the same type as unsigned int.
So, when you have a bit-field of type int, it’s implementation-defined whether that bit-field is actually treated as a signed integer variable. Based on our limited testing, it appears that mainstream implementations do treat it as a signed integer, which is pretty interesting.
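If you want to check what your own compiler does, a small probe along these lines should work. This is only a sketch: the struct and function names are made up for illustration. It stores -1 into a plain int bit-field and looks at what comes back.

```c
/* Hypothetical probe: one plain int bit-field, one explicitly signed. */
struct probe {
    int plain : 3;          /* signedness is implementation-defined */
    signed int forced : 3;  /* always signed */
};

/* Returns 1 if a plain int bit-field behaves as signed, 0 if unsigned. */
int plain_int_bitfield_is_signed(void) {
    struct probe p = {0, 0};
    p.plain = -1;           /* stays -1 if signed, wraps to 7 if unsigned */
    return p.plain < 0;
}

/* Reads back -1 from the explicitly signed field, for comparison. */
int forced_signed_value(void) {
    struct probe p = {0, 0};
    p.forced = -1;
    return p.forced;
}
```

Call plain_int_bitfield_is_signed() from main and print the result. A return value of 1 means your implementation treats the plain int bit-field as signed.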
Say you have a struct with bit-fields like this:
struct {
    unsigned int a:3;
    int b:11;
    int c:5;
    unsigned int d:13;
} bits;
This means that the variables b and c are treated as 11-bit and 5-bit signed integers respectively. They are represented using two’s complement notation, so there’s actually a sign bit at their most significant bits. In memory, on our little-endian gcc ia32 box, this looks like:
0x402000:
b b b b b a a a
0x402001:
c c b b b b b b
0x402002:
d d d d d c c c
0x402003:
d d d d d d d d
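To make that layout concrete, here is a rough sketch that pulls the raw (not yet sign-extended) field bits out of the 32-bit word by hand. It assumes exactly the layout shown above (a in bits 0-2, b in bits 3-13, c in bits 14-18, d in bits 19-31); other compilers and targets are free to lay things out differently, and the helper names are ours.

```c
#include <stdint.h>

/* Rebuild the four bytes from the dump into one 32-bit word,
   lowest address first (little-endian). */
uint32_t word_from_bytes(const unsigned char bytes[4]) {
    return (uint32_t)bytes[0]
         | (uint32_t)bytes[1] << 8
         | (uint32_t)bytes[2] << 16
         | (uint32_t)bytes[3] << 24;
}

/* Raw field bits under the assumed layout. */
uint32_t raw_a(uint32_t w) { return w & 0x7u; }             /* 3 bits  */
uint32_t raw_b(uint32_t w) { return (w >> 3) & 0x7FFu; }    /* 11 bits */
uint32_t raw_c(uint32_t w) { return (w >> 14) & 0x1Fu; }    /* 5 bits  */
uint32_t raw_d(uint32_t w) { return (w >> 19) & 0x1FFFu; }  /* 13 bits */
```

For example, the bytes ff 3f 00 00 give a word with every bit of a and b set and nothing in c or d.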
So, consider the following code:
int f = bits.b;
We know this will perform sign-extension, as it’s causing a conversion from the 11-bit signed bit-field to a 32-bit signed int. It results in the following assembly generated by gcc (in AT&T syntax):
0x00401086 <main+54>: movzwl 0x402000,%eax
0x0040108d <main+61>: shl $0x2,%eax
0x00401090 <main+64>: cwtl
0x00401091 <main+65>: sar $0x5,%eax
0x00401094 <main+68>: cwtl
0x00401095 <main+69>: mov %eax,0xfffffffc(%ebp)
cwtl, or cwde in Intel syntax, might be unfamiliar to you. It sign-extends the word in %ax to the long in %eax. So, this code moves the lower 16 bits of bits into eax, performing zero extension. It then shifts eax left by two bits, which pushes the two bits that belong to bits.c out of the low word and leaves the sign bit of bits.b in bit 15. It then sign-extends the low word into the rest of eax to get the sign bit propagated. It then does an arithmetic right shift by 5 to get rid of the bits that belong to bits.a and situate the number in the correct place in the int. The second cwtl appears superfluous, but we’ll trust that they know what they are doing. Finally, it moves the result into f on the stack. That’s one complicated assignment!
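Here is the same sequence replayed step by step in C, as a sketch. The helper names (cwtl, asr, load_b) are ours, and the helpers are written so they do not depend on how the compiler handles right shifts of negative numbers.

```c
#include <stdint.h>

/* What cwtl does: sign-extend the low 16 bits of x to 32 bits. */
int32_t cwtl(uint32_t x) {
    uint32_t low = x & 0xFFFFu;
    return (int32_t)(low ^ 0x8000u) - 0x8000;
}

/* Arithmetic right shift (sar), spelled out portably. */
int32_t asr(int32_t v, unsigned n) {
    return v < 0 ? ~(~v >> n) : v >> n;
}

/* Replays the gcc sequence for "int f = bits.b;" on a raw 32-bit word. */
int32_t load_b(uint32_t word) {
    uint32_t eax = word & 0xFFFFu;  /* movzwl: low 16 bits, zero-extended     */
    eax <<= 2;                      /* shl $2: sign bit of b lands on bit 15   */
    int32_t v = cwtl(eax);          /* cwtl: copy bit 15 into bits 16-31       */
    v = asr(v, 5);                  /* sar $5: drop a's 3 bits and the 2 zeros */
    return cwtl((uint32_t)v);       /* second cwtl: value already fits, no-op  */
}
```

With the layout from the dump, a word of 0x2000 (only the top bit of b set) comes out as -1024, the most negative value an 11-bit signed field can hold.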
Similarly, this code:
int f=bits.c;
Results in this assembly:
0x004010a4 <main+84>: mov 0x402000,%eax
0x004010a9 <main+89>: shl $0xd,%eax
0x004010ac <main+92>: sar $0x1b,%eax
0x004010af <main+95>: mov %eax,0xfffffffc(%ebp)
This is a little simpler. It loads the full 32 bits that make up bits into eax. It then shifts left by 13 bits to get rid of d and to place bits.c’s sign bit in the sign bit of eax. Then an arithmetic right shift of 27 bits isolates the 5 bits of bits.c and situates them correctly in eax. Finally, the result is stored in f on the stack.
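The shift-left-then-shift-right trick generalizes to any field. The sketch below is our own helper, not anything gcc emits: it takes the position of the field's lowest bit and its width, and for c (lowest bit 14, width 5) the two shift counts come out as 13 and 27, matching the assembly. The arithmetic shift is replaced by a logical shift plus an explicit sign fix-up so the result does not depend on implementation-defined behavior.

```c
#include <stdint.h>

/* Sign-extend the field of `width` bits starting at bit `lsb` of `word`.
   Assumes 1 <= width <= 31 and lsb + width <= 32. */
int32_t sign_extend_field(uint32_t word, unsigned lsb, unsigned width) {
    uint32_t up  = word << (32u - lsb - width); /* shl: field's top bit -> bit 31 */
    uint32_t raw = up >> (32u - width);         /* field now in the low bits      */
    uint32_t sign = 1u << (width - 1u);         /* mask for the field's sign bit  */
    return (int32_t)(raw ^ sign) - (int32_t)sign;
}
```

For instance, sign_extend_field(word, 14, 5) reads c and sign_extend_field(word, 3, 11) reads b under the layout shown earlier.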
So, you can see that it really does make little signed integer variables for bit-fields, which causes it to go through some pretty significant gymnastics. So, what’s the point? Well, basically, bit-fields are usually used for working with precise binary structures, like packet headers and file formats. In general, if you see a bit-field that uses an int instead of an unsigned int, this should raise some concern. Chances are pretty good the developer isn’t going to be thinking about the possibility of that variable encoding a negative value, and you should be on the lookout for sign extension or other numeric games you can play.
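As a made-up illustration of the kind of bug to look for, imagine a header with a 5-bit length field declared as a signed bit-field. The struct and function names here are hypothetical, and the field is spelled signed int so the behavior does not hinge on the compiler's choice for plain int.

```c
#include <stddef.h>

/* Hypothetical header with a length field that should have been unsigned. */
struct header {
    signed int len : 5;   /* holds -16..15, not 0..31 */
};

/* A naive bounds check: a negative length sails straight through. */
int length_check_passes(struct header h, int limit) {
    return h.len <= limit;
}

/* The length as it would be handed to something like memcpy. */
size_t copy_length(struct header h) {
    return (size_t)h.len;  /* negative len is converted to a huge size_t */
}
```

If the five bits on the wire are all ones, len reads as -1, the check against a small limit passes, and the copy length becomes the largest possible size_t.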

Oh, it’s much worse than this. 8^)
The C standard does not really define how to handle bit-fields as
rvalues. If you are unfamiliar with rvalues, in the expression E1 = E2;
E1 is an "lvalue" and E2 is an "rvalue". As a result, implementations
have adopted different philosophies. The philosophy adopted by gcc is
"use the length of the type (e.g., int)". Consider the following program:
#include <stdio.h>
struct {
    unsigned int a: 8;
} bits = {255};
int main(void) {
    printf("unsigned 8-bit field promotes to %s.\n",
           (bits.a << 24) < 0 ? "signed" : "unsigned");
    return 0;
}
When compiled using gcc, bits.a is interpreted as an unsigned integer value.
The philosophy adopted by Microsoft Visual Studio is to "use the length
specified in the bit field", which in this case is 8. When this eight-bit
value is used as an rvalue, it is subjected to integer promotions. On
a typical system (such as IA-32) all of these values can be represented as
a signed int so the value is promoted to signed int.
This means, of course, that whether this value is treated as signed or
unsigned is implementation-dependent. This is one of many
implementation-dependent aspects of bit-fields.
To find out how your compiler handles this, try running the following code. The
results from running this under Microsoft Visual Studio 2005 are provided
as comments.
—————————————-
#include <stdio.h>
#ifdef __cplusplus
#define LANG "C++"
#else
#define LANG "C"
#endif
struct {
    unsigned int a: 8;
} bits = {255};
struct {
    unsigned char a;
} bytes = {255};
/* Assumed declaration: the listing uses charfield below but its
   definition was missing. */
struct {
    unsigned char a: 8;
} charfield = {255};
unsigned char a = 255;
int main(void) {
    printf(LANG ", unsigned 8-bit field promotes to %s.\n",
           (bits.a << 24) < 0 ? "signed" : "unsigned");
    // C, unsigned 8-bit field promotes to unsigned.
    // C++, unsigned 8-bit field promotes to unsigned.
    printf(LANG ", unsigned 8-bit byte promotes to %s.\n",
           (bytes.a << 24) < 0 ? "signed" : "unsigned");
    // C, unsigned 8-bit byte promotes to signed.
    // C++, unsigned 8-bit byte promotes to signed.
    printf(LANG ", unsigned char variable promotes to %s.\n",
           (a << 24) < 0 ? "signed" : "unsigned");
    // C, unsigned char variable promotes to signed.
    // C++, unsigned char variable promotes to signed.
    printf(LANG ", unsigned char bitfield promotes to %s.\n",
           (charfield.a << 24) < 0 ? "signed" : "unsigned");
    // C, unsigned char bitfield promotes to signed.
    // C++, unsigned char bitfield promotes to signed.
    return 0;
}
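One caveat about the << 24 test: when the operand really is promoted to a signed int, shifting 255 left by 24 overflows a 32-bit int, which the C standard leaves undefined. A variant that sidesteps this is to subtract instead of shift. The sketch below is only an alternative probe under that idea, with made-up names: 255 - 256 is -1 as an int, but wraps to a large positive value as an unsigned int.

```c
/* Hypothetical probe object, separate from the listing above. */
struct { unsigned int a : 8; } field8 = {255};

/* Returns 1 if the 8-bit unsigned bit-field promotes to (signed) int,
   0 if it is treated as unsigned int. */
int field_promotes_to_signed(void) {
    return (field8.a - 256) < 0;
}

/* Same probe for an ordinary unsigned char, for comparison. */
int uchar_promotes_to_signed(void) {
    unsigned char c = 255;
    return (c - 256) < 0;
}
```

On an implementation where int is wider than unsigned char, the second function should return 1; the first is the one whose answer varies between compilers.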
Oh, interesting! :>
So, looking at the first example snippet, with my copy of gcc, I actually get that it’s promoted to a signed integer value, which is what I suspected would happen. Now, with g++ on the same machine, it gets promoted to an unsigned int, which I was not expecting. :>
I need to install VC++ real quick, as this pretty much runs counter to my understanding of C’s integral promotions.
This is interesting:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16376
ISO/IEC 14882:2003 Section 4.5 Integral promotions [conv.prom] Paragraph 3 states that:
An rvalue for an integral bit-field (9.6) can be converted to an rvalue of type int if int can represent all the values of the bit-field; otherwise, it can be converted to unsigned int if unsigned int can represent all the values of the bit-field.
According to my notes, however, for C++ on MS Visual Studio 2005 you get the following behavior: unsigned 8-bit field promotes to unsigned.
This may or may not comply with the standard, depending on how you interpret “can be”.