Tag: bitwise operations

Bytes and bitwise operators in C

Bitwise operations have many uses. I asked a question a few months ago at programmers.stackexchange.com, where I was taught that. The answer which I accepted contained the following list of uses (credit goes to user whatsisname):

* Juggling blocks of bytes around that don’t fit in the programming languages data types
* Switching encoding back and forth from big to little endian.
* Packing 4 6bit pieces of data into 3 bytes for some serial or usb connection
* Many image formats have differing amounts of bits assigned to each color channel.
* Anything involving IO pins in embedded applications
* Data compression, which often does not have data fit nice 8-bit boundaries.\
* Hashing algorithms, CRC or other data integrity checks.
* Encryption
* Psuedorandom number generation
* Raid 5 uses bitwise XOR between volumes to compute parity.
* Tons more

I could myself further add the following:

  • You can improve performance
  • You can decrease system memory usage

Since I asked the question, I have played around with them, and additionally, I have started programming at a much lower level than before. I will now present an introduction to bitwise operators in C and a refresher about binary numbers. In the end, I will show and explain an implementation of a boolean datatype, with 8 booleans per byte, that is, 100% of the bits will get used. Programmers who know about the bitwise operators can skip ahead to that.

Bits and bytes

As you (should) know, everything in most modern computers is stored as sequences of bits, often represented as 0′s and 1′s (though really it is about differences in electrical charges, but 1′s and 0′s are more handy representations). A byte consists of 8 bits, for instance 1001 1101. The first bit, read from the right, represents a 1, the second a 2, the third a 4, the fourth an 8 etc. In decimal, 1001 1101 would be (1*128)+(0*64)+(0*32)+(1*16)+(1*8)+(1*4)+(0*2)+(1*1), which is 128+16+8+4+1, or 157. Typing something like 1001 1101 can get a bit tedious after some time. Programmers don’t use base 10 to simplify this though (10 is not a power of 2), but base 16 (hexadecimal), which, as a power of 2, has a pretty neat relation between base 2 (binary).

You can represent every combination one byte might have using 2 numbers in hexadecimal. There are 16 numbers in hexadecimal (including 0), and each of the two groups which we divided the byte into earlier can be represented by one number. The first number, coming from 1001 would be 9 and the second, coming from 1101 would be D (13, hexadecimal uses the letters A-F when using numbers higher than 9). Let’s check this, 0x9D (programmers often prefix numbers represented in base 16 with 0x) would be (9*16)+(13*1)=144+13=157. Yes, that is correct.

Now, how do you represent things other than numbers? Well, RGB color values, for instance, are represented using 3 bytes, one for red, one for green and one for blue. Each has 256 different
combinations (a value of 0 means none of that color and 255 means all of that color), and thus RGB can be used to represent 256^3=16 777 216 combinations, roughly 16,8 millions. An example, some sort of grey, would be 0x888888.

Another example is characters. Characters are most of the time represented in ASCII or Unicode (Unicode is basically a superset of ASCII containing characters for many international alphabets etc.). ASCII has 128 characters (uses 7-bits). Different characters have different values, for instance, an upper-case A is 65 (0x41). In C, strings are null-terminated; strings (should) end with a null-character (ASCII character 0). The string ABC would have a 0x41 byte, followed by a 0x42 byte, 0x43 byte, ended with a 0x00 byte (the null character). In memory, you could have a chunk of 4 bytes representing that string, like 01000001 01000010 01000011 00000000. Note that there is nothing that says that this is four characters that should be used as a string, you could use it like one 32-bit integer, or something else. Memory has no meaning at the byte level.

Big- and little endians

There are two orders in which you store bytes in memory that are common in modern machines, these called big- and little endian. In big endian, the most significiant byte comes first, and in little endian, the least significant number comes first. Big endian is what we humans use most of the time, for instance, in 113, one hundred is the most significant number, and it comes first. One hundred thirteen would have been 311 if it was written in little endian form. Intel (and compatible) processors use little endian. Note that bits in bytes are stored the same way in both formats (the most significiant bit is first), only the order in which whole bytes come in collections of bytes is altered by this. A 32-bit integer with the value 303153 would be stored as 00 04 A0 31 in big endian form and as 31 A0 04 00 in little endian form.

C deals with endianness for you and you won’t have to worry about it. Only Assembly programmers do have to worry about this in general (and IT security folks and reverse engineers and some more).

Integer literals in C

C provides several options which you can use when writing down integer literals. We will only concern us with the ability to use different bases. If you type out an integer “like you normally would”, C will interpret it as a decimal value (e.g. 73). If it has a leading 0, C will think that it is a base 8 number (e.g. 0777). We are not concerned with base 8 numbers here, but some older computers used them. Hexadecimal literals start with 0x, followed by a value in hex, with no spaces in between (e.g. 0xF7).

Sadly, C has no way to define binary literals.

The bitwise operators

A bitwise operation is an operation that works on the individual bits of a byte (or a composition of bytes). We will now look at the bitwise operators which are available to C programmers.

Bitshift

C has two bitshift operators, left- and right bitshift, or << and >>, respectively.

The action of both of these is common. They shift n bits, either to the left or to the right (I will let you guess yourself which one shifts to which direction). Bitshift left by one step is the same as integer multiplication by 2, and bitshift one step to the right is the same as integer division by 2. The following (bitshift 3 steps to the left) is equal to integer multiplication by 8:

0x2A << 3 /*0x2A=0000000 00101010, (0x2A << 3)=00000001 01010000*/

Note that you set how many steps you want to do your bitshift on the right side of the operator. You must always set the amount of steps.

We can create a simple function for calculating the n:th power of 2 using this:

int powerOf2(int n){
  return (1 << n);
}

Bitwise AND

Bitwise AND takes two numbers and returns a new one, where 1's are located at and only at places where both values had 1's:

10011011 AND        11001100 AND
10110101            00011011
--------            --------
10010001            00001000

The operator is an ampersand (&). So we can for instance run the following:

19 & 18

We can break this doẃn to:

19 & 18=   00010011 &
           00010010
           --------
           00010010   (=18)

So we can use this to find common 1's for two bytes, but what is that good for? We can for instance check if a number is odd by doing:

XXXXXXXX
00000001
--------
0000000?

The only case when you get a non-zero number is when the 1 is "on", and that is the only case a number is odd. The following is this as a C function (it could also have been implemented using the modulo operator, as it is normally done):

int isOdd(int x){
  return(1 & x);
}

Bitwise OR

Bitwise OR, compared to bitwise AND, is happy if just one of the numbers has a 1 at a location (both may have it though). Some examples:

10011011 OR         11001100 OR
10110101            00011011
--------            --------
10111111            11011111

The operator for bitwise OR is a pipe (|). We can run something like the following in C:

35 | 92

Broken down:

35 | 92=   00100011 |
           01011100
           --------
           01111111   (=127)

Bitwise XOR

The bitwise XOR (exclusive or) is like bitwise OR, but only one of the numbers can have a 1 at a given position in order to return a 1 at that position. Examples:

10011011 XOR        11001100 XOR
10110101            00011011
--------            --------
00101110            11010111

A hat (^) is the bitwise XOR operator in C.

Bitwise NOT

Bitwise NOT basically flips the value of each position (0 becomes 1 and vice verse). Examples:

11010111 NOT = 00101000
01011001 NOT = 10100110

It is the only unary bitwise operator in C and it is a tilde (~), which you should place in front of the value to "invert".

Booleans that use memory efficiently

In C, it is customary to use integers to store boolean values (no "real" boolean datatype exists), where 0 is false and everything else is true. So, as an example, 00000000 00000000 00000000 00000000 is false, and e.g. 00000000 00000000 00000000 00000001 is true. What do we do with the other 31 bits or the other 96.875% of the 4 bytes? Umm... nothing, actually. Using a char would be a bit better (24 bits better, actually), wasting "only" 7 out of 8 bits (87.5%). That is still not very good, one char has enough room for 8, not 1, boolean values. While this doesn't matter that much on modern home computers, on some embedded computers this is the difference between success and failure; a considerable difference. What we are going to do is to have a char variable, and then we will add functions allowing us to easily use that single variable as if it was 8 different boolean values.

We will use a typedef to create our boolean variable (which is just a char in disguise):

typedef char bool;

You can change this to something else than a char to allow for more values than 8.

We will also define macros for TRUE and FALSE:

#define TRUE 1
#define FALSE 0

Our boolean will act like an array, to some extent. A simple function to get one individual value would be the following, getBool, which takes 2 arguments; the variable and an "index":

bool getBool(bool boolean, int index){
  return ((boolean >> index) & 1);
}

What this does is basically to shift the value we are interested to the right, and then it uses AND to clear all values but the last, and if any value is left, we will have a 1, which is true, and else, a 0, false.

We will also have a function called setBool, with 3 arguments, a pointer to the boolean, the "index" of the boolean we want to access and the value we want to set at that position:

void setBool(bool *boolean, int index, bool value){
  if(value==0){
    *boolean=*boolean & ~(1<<index);
  }else{
    *boolean=*boolean | (1<<index);
  }
}

The following graphic explains line 3:

*boolean=*boolean & ~(1<<index);
--------------------------------
Assume boolean is 01001000
Assume index is 3

00000001 << 3  = 00001000
00001000 NOT   = 11110111

01001000 &
11110111
--------
01000000

The following graphic explains line 5:

*boolean=*boolean | (1<<index);
-------------------------------
Assume boolean is 10001100
Assume index is 1

00000001 << 1  = 00000010

10001100 |
00000010
--------
10001110

I apologise if my solution is suboptimal; I'm new to low-level programming. And that was it.