Bitwise operations have many uses. I asked a question a few months ago at programmers.stackexchange.com, where I was taught that. The answer which I accepted contained the following list of uses (credit goes to user whatsisname):
* Juggling blocks of bytes around that don’t fit in the programming languages data types
* Switching encoding back and forth from big to little endian.
* Packing 4 6bit pieces of data into 3 bytes for some serial or usb connection
* Many image formats have differing amounts of bits assigned to each color channel.
* Anything involving IO pins in embedded applications
* Data compression, which often does not have data fit nice 8-bit boundaries.\
* Hashing algorithms, CRC or other data integrity checks.
* Encryption
* Psuedorandom number generation
* Raid 5 uses bitwise XOR between volumes to compute parity.
* Tons more
I could myself further add the following:
- You can improve performance
- You can decrease system memory usage
Since I asked the question, I have played around with them, and additionally, I have started programming at a much lower level than before. I will now present an introduction to bitwise operators in C and a refresher about binary numbers. In the end, I will show and explain an implementation of a boolean datatype, with 8 booleans per byte, that is, 100% of the bits will get used. Programmers who know about the bitwise operators can skip ahead to that.
Bits and bytes
As you (should) know, everything in most modern computers is stored as sequences of bits, often represented as 0′s and 1′s (though really it is about differences in electrical charges, but 1′s and 0′s are more handy representations). A byte consists of 8 bits, for instance 1001 1101
. The first bit, read from the right, represents a 1, the second a 2, the third a 4, the fourth an 8 etc. In decimal, 1001 1101
would be (1*128)+(0*64)+(0*32)+(1*16)+(1*8)+(1*4)+(0*2)+(1*1)
, which is 128+16+8+4+1
, or 157. Typing something like 1001 1101
can get a bit tedious after some time. Programmers don’t use base 10 to simplify this though (10 is not a power of 2), but base 16 (hexadecimal), which, as a power of 2, has a pretty neat relation between base 2 (binary).
You can represent every combination one byte might have using 2 numbers in hexadecimal. There are 16 numbers in hexadecimal (including 0), and each of the two groups which we divided the byte into earlier can be represented by one number. The first number, coming from 1001
would be 9
and the second, coming from 1101
would be D
(13, hexadecimal uses the letters A-F when using numbers higher than 9). Let’s check this, 0x9D
(programmers often prefix numbers represented in base 16 with 0x
) would be (9*16)+(13*1)=144+13=157
. Yes, that is correct.
Now, how do you represent things other than numbers? Well, RGB color values, for instance, are represented using 3 bytes, one for red, one for green and one for blue. Each has 256 different
combinations (a value of 0 means none of that color and 255 means all of that color), and thus RGB can be used to represent 256^3=16 777 216
combinations, roughly 16,8 millions. An example, some sort of grey, would be 0x888888
.
Another example is characters. Characters are most of the time represented in ASCII or Unicode (Unicode is basically a superset of ASCII containing characters for many international alphabets etc.). ASCII has 128 characters (uses 7-bits). Different characters have different values, for instance, an upper-case A
is 65 (0x41
). In C, strings are null-terminated; strings (should) end with a null-character (ASCII character 0). The string ABC
would have a 0x41
byte, followed by a 0x42
byte, 0x43
byte, ended with a 0x00
byte (the null character). In memory, you could have a chunk of 4 bytes representing that string, like 01000001 01000010 01000011 00000000
. Note that there is nothing that says that this is four characters that should be used as a string, you could use it like one 32-bit integer, or something else. Memory has no meaning at the byte level.
Big- and little endians
There are two orders in which you store bytes in memory that are common in modern machines, these called big- and little endian. In big endian, the most significiant byte comes first, and in little endian, the least significant number comes first. Big endian is what we humans use most of the time, for instance, in 113, one hundred is the most significant number, and it comes first. One hundred thirteen would have been 311 if it was written in little endian form. Intel (and compatible) processors use little endian. Note that bits in bytes are stored the same way in both formats (the most significiant bit is first), only the order in which whole bytes come in collections of bytes is altered by this. A 32-bit integer with the value 303153 would be stored as 00 04 A0 31
in big endian form and as 31 A0 04 00
in little endian form.
C deals with endianness for you and you won’t have to worry about it. Only Assembly programmers do have to worry about this in general (and IT security folks and reverse engineers and some more).
Integer literals in C
C provides several options which you can use when writing down integer literals. We will only concern us with the ability to use different bases. If you type out an integer “like you normally would”, C will interpret it as a decimal value (e.g. 73
). If it has a leading 0, C will think that it is a base 8 number (e.g. 0777
). We are not concerned with base 8 numbers here, but some older computers used them. Hexadecimal literals start with 0x
, followed by a value in hex, with no spaces in between (e.g. 0xF7
).
Sadly, C has no way to define binary literals.
The bitwise operators
A bitwise operation is an operation that works on the individual bits of a byte (or a composition of bytes). We will now look at the bitwise operators which are available to C programmers.
Bitshift
C has two bitshift operators, left- and right bitshift, or <<
and >>
, respectively.
The action of both of these is common. They shift n
bits, either to the left or to the right (I will let you guess yourself which one shifts to which direction). Bitshift left by one step is the same as integer multiplication by 2, and bitshift one step to the right is the same as integer division by 2. The following (bitshift 3 steps to the left) is equal to integer multiplication by 8:
0x2A << 3 /*0x2A=0000000 00101010, (0x2A << 3)=00000001 01010000*/
Note that you set how many steps you want to do your bitshift on the right side of the operator. You must always set the amount of steps.
We can create a simple function for calculating the n:th power of 2 using this:
int powerOf2(int n){ return (1 << n); }
Bitwise AND
Bitwise AND takes two numbers and returns a new one, where 1's are located at and only at places where both values had 1's:
10011011 AND 11001100 AND 10110101 00011011 -------- -------- 10010001 00001000
The operator is an ampersand (&
). So we can for instance run the following:
19 & 18
We can break this doẃn to:
19 & 18= 00010011 & 00010010 -------- 00010010 (=18)
So we can use this to find common 1's for two bytes, but what is that good for? We can for instance check if a number is odd by doing:
XXXXXXXX 00000001 -------- 0000000?
The only case when you get a non-zero number is when the 1 is "on", and that is the only case a number is odd. The following is this as a C function (it could also have been implemented using the modulo operator, as it is normally done):
int isOdd(int x){ return(1 & x); }
Bitwise OR
Bitwise OR, compared to bitwise AND, is happy if just one of the numbers has a 1 at a location (both may have it though). Some examples:
10011011 OR 11001100 OR 10110101 00011011 -------- -------- 10111111 11011111
The operator for bitwise OR is a pipe (|
). We can run something like the following in C:
35 | 92
Broken down:
35 | 92= 00100011 | 01011100 -------- 01111111 (=127)
Bitwise XOR
The bitwise XOR (exclusive or) is like bitwise OR, but only one of the numbers can have a 1 at a given position in order to return a 1 at that position. Examples:
10011011 XOR 11001100 XOR 10110101 00011011 -------- -------- 00101110 11010111
A hat (^
) is the bitwise XOR operator in C.
Bitwise NOT
Bitwise NOT basically flips the value of each position (0 becomes 1 and vice verse). Examples:
11010111 NOT = 00101000 01011001 NOT = 10100110
It is the only unary bitwise operator in C and it is a tilde (~
), which you should place in front of the value to "invert".
Booleans that use memory efficiently
In C, it is customary to use integers to store boolean values (no "real" boolean datatype exists), where 0 is false and everything else is true. So, as an example, 00000000 00000000 00000000 00000000
is false, and e.g. 00000000 00000000 00000000 00000001
is true. What do we do with the other 31 bits or the other 96.875% of the 4 bytes? Umm... nothing, actually. Using a char
would be a bit better (24 bits better, actually), wasting "only" 7 out of 8 bits (87.5%). That is still not very good, one char
has enough room for 8, not 1, boolean values. While this doesn't matter that much on modern home computers, on some embedded computers this is the difference between success and failure; a considerable difference. What we are going to do is to have a char
variable, and then we will add functions allowing us to easily use that single variable as if it was 8 different boolean values.
We will use a typedef
to create our boolean variable (which is just a char
in disguise):
typedef char bool;
You can change this to something else than a char
to allow for more values than 8.
We will also define macros for TRUE
and FALSE
:
#define TRUE 1 #define FALSE 0
Our boolean will act like an array, to some extent. A simple function to get one individual value would be the following, getBool
, which takes 2 arguments; the variable and an "index":
bool getBool(bool boolean, int index){ return ((boolean >> index) & 1); }
What this does is basically to shift the value we are interested to the right, and then it uses AND to clear all values but the last, and if any value is left, we will have a 1, which is true, and else, a 0, false.
We will also have a function called setBool
, with 3 arguments, a pointer to the boolean, the "index" of the boolean we want to access and the value we want to set at that position:
void setBool(bool *boolean, int index, bool value){ if(value==0){ *boolean=*boolean & ~(1<<index); }else{ *boolean=*boolean | (1<<index); } }
The following graphic explains line 3:
*boolean=*boolean & ~(1<<index); -------------------------------- Assume boolean is 01001000 Assume index is 3 00000001 << 3 = 00001000 00001000 NOT = 11110111 01001000 & 11110111 -------- 01000000
The following graphic explains line 5:
*boolean=*boolean | (1<<index); ------------------------------- Assume boolean is 10001100 Assume index is 1 00000001 << 1 = 00000010 10001100 | 00000010 -------- 10001110
I apologise if my solution is suboptimal; I'm new to low-level programming. And that was it.