Signed bit-fields

jm December 28th, 2006

Section 6.7.2 of the C standard (Type specifiers) indicates that:

Each of the comma-separated sets designates the same type, except that for bit-fields, it is implementation-defined whether the specifier int designates the same type as signed int or the same type as unsigned int.

So, when you have a bit-field of type int, it’s implementation-defined whether that bit-field is actually treated as a signed integer variable. Based on our limited testing, it appears that mainstream implementations do treat it as a signed integer, which is pretty interesting.

Say you have a bit-field like this:

struct {
unsigned int a:3;
         int b:11;
         int c:5;
unsigned int d:13;
} bits;

This means that the members b and c are treated as 11-bit and 5-bit signed integers respectively. They are represented using two’s complement notation, so the most significant bit of each one acts as a sign bit. In memory, on our little-endian gcc ia32 box, this looks like (each byte is shown with its most significant bit on the left):

0x402000:
b b b b b a a a
0x402001:
c c b b b b b b
0x402002:
d d d d d c c c
0x402003:
d d d d d d d d

So, consider the following code:

int f = bits.b;

We know this will perform sign-extension, as it’s causing a conversion from the 11-bit signed bit-field to a 32-bit signed int. It results in the following assembly generated by gcc (in AT&T syntax):

0x00401086 <main+54>:   movzwl 0x402000,%eax 
0x0040108d <main+61>:   shl    $0x2,%eax 
0x00401090 <main+64>:   cwtl 
0x00401091 <main+65>:   sar    $0x5,%eax 
0x00401094 <main+68>:   cwtl 
0x00401095 <main+69>:   mov    %eax,0xfffffffc(%ebp)

cwtl, or cwde in Intel syntax, might be unfamiliar to you. It sign-extends the 16-bit word in %ax to the 32-bit long in %eax. So, this code moves the lower 16 bits of bits into eax, performing zero extension. It then shifts left by two, which pushes the two bits that belong to bits.c out of ax and leaves the sign bit of bits.b at bit 15. The cwtl then sign-extends ax into the rest of eax to get the sign bit propagated. Next comes an arithmetic right shift by 5, which gets rid of the three bits that belong to bits.a (plus the two zero bits the left shift brought in) and situates the number in the correct place in the int. The second cwtl appears superfluous, but we’ll trust that they know what they are doing. Finally, it moves the result into f on the stack. That’s one complicated assignment!

Similarly, this code:

int f=bits.c;

Results in this assembly:

0x004010a4 <main+84>:   mov    0x402000,%eax
0x004010a9 <main+89>:   shl    $0xd,%eax
0x004010ac <main+92>:   sar    $0x1b,%eax
0x004010af <main+95>:   mov    %eax,0xfffffffc(%ebp)

This is a little simpler. It takes the full 32 bits that make up bits and puts them into eax. The value is shifted left by 13 bits, to get rid of d and place bits.c’s sign bit in the sign bit of eax. Then, an arithmetic shift right of 27 bits is performed to isolate the 5 bits of bits.c and situate them correctly in eax. Finally, the result is placed in f on the stack.

So, you can see that it really does make little signed integer variables for bit-fields, which causes it to go through some pretty significant gymnastics. So, what’s the point? Well, basically, bit-fields are usually used for working with precise binary structures, like packet headers and file formats. In general, if you see a bit-field that uses an int instead of an unsigned int, this should raise some concern. Chances are pretty good the developer isn’t thinking about the possibility of that variable encoding a negative value, and you should be on the lookout for sign extension or other numeric games you can play.

4 Responses to “Signed bit-fields”

  1. Robert C. Seacord on 28 Dec 2006 at 11:30 pm

    Oh, it’s much worse than this. 8^)

    The C standard does not really define how to handle bit-fields as
    rvalues. If you are unfamiliar with rvalues, in the expression E1 = E2;
    E1 is an "lvalue" and E2 is an "rvalue". As a result, implementations
    have adopted different philosophies. The philosophy adopted by gcc is
    "use the length of the type (e.g., int)". Consider the following program:

    #include <stdio.h>

    struct {
        unsigned int a: 8;
    } bits = {255};

    int main(void) {
        printf("unsigned 8-bit field promotes to %s.\n",
            (bits.a << 24) < 0 ? "signed" : "unsigned");
        return 0;
    }

    When this is compiled using gcc, bits.a is interpreted as an unsigned integer value.

    The philosophy adopted by Microsoft Visual Studio is to "use the length
    specified in the bit field", which in this case is 8.  When this eight-bit
    value is used as an rvalue, it is subjected to integer promotions.  On
    a typical system (such as IA-32) all of these values can be represented as
    a signed int so the value is promoted to signed int.

    This means, of course, that whether or not this value is treated as a
    signed or unsigned value is implementation dependent.  This is one of many
    implementation dependent aspects of bit-fields.

    To find out how your compiler handles this, try running the following code.  The
    results from running this under Microsoft Visual Studio 2005 are provided
    as comments.

    —————————————-
    #include <stdio.h>
    #ifdef __cplusplus
    #define LANG "C++"
    #else
    #define LANG "C"
    #endif

    struct {
        unsigned int a: 8;
    } bits = {255};

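    struct {
        unsigned char a;
    } bytes = {255};

    /* charfield is used in main() but its definition is absent here;
       an 8-bit unsigned char bit-field is assumed. */
    struct {
        unsigned char a: 8;
    } charfield = {255};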

    unsigned char a = 255;

    int main(void) {
      printf(LANG ", unsigned 8-bit field promotes to %s.\n",
        (bits.a << 24) < 0 ? "signed" : "unsigned");
      // C, unsigned 8-bit field promotes to unsigned.
      // C++, unsigned 8-bit field promotes to unsigned.
        

      printf(LANG ", unsigned 8-bit byte promotes to %s.\n",
        (bytes.a << 24) < 0 ? "signed" : "unsigned");
      // C, unsigned 8-bit byte promotes to signed.
      // C++, unsigned 8-bit byte promotes to signed.

      printf(LANG ", unsigned char variable promotes to %s.\n",
        (a << 24) < 0 ? "signed" : "unsigned");
      // C, unsigned char variable promotes to signed.
      // C++, unsigned char variable promotes to signed.

      printf(LANG ", unsigned char bitfield promotes to %s.\n",
        (charfield.a << 24) < 0 ? "signed" : "unsigned");
      // C, unsigned char bitfield promotes to signed.
      // C++, unsigned char bitfield promotes to signed.

      return 0; 
    }

  2. jm on 29 Dec 2006 at 12:51 am

    Oh, interesting! :>

    So, looking at the first example snippet, with my copy of gcc, I actually get that it’s promoted to a signed integer value, which is what I suspected would happen. Now, with g++ on the same machine, it gets promoted to an unsigned int, which I was not expecting. :>

    I need to install VC++ real quick, as this pretty much runs counter to my understanding of C’s integral promotions.

  3. jm on 29 Dec 2006 at 4:39 am

    This is interesting:
    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16376

  4. Robert C. Seacord on 30 Dec 2006 at 5:52 pm

    ISO/IEC 14882:2003 Section 4.5 Integral promotions [conv.prom] Paragraph 3 states that:

    An rvalue for an integral bit-field (9.6) can be converted to an rvalue of type int if int can represent all the values of the bit-field; otherwise, it can be converted to unsigned int if unsigned int can represent all the values of the bit-field.

    According to my notes, however, for C++ on MS Visual Studio 2005 you get the following behavior: unsigned 8-bit field promotes to unsigned.

    This may or may not comply with the standard, depending on how you interpret “can be”.
