Tag: linux

Designing command-line interfaces

There are few articles on the design of command-line interfaces (CLIs), even if plenty of articles on designing graphical user interfaces (GUIs) exist. This article is an attempt at presenting some of the most important guidelines for CLI design.

The article assumes the command-line utilities are to be used on a *nix system (e.g. GNU/Linux, BSD, Mac OS X, Unix), and it will frequently reference to common tools on such systems.

Types of command-line interfaces

There are three major sorts of command-line interfaces:

  • Non-interactive
  • Interactive, line-based
  • Text-based user interface

Non-interactive programs don’t need any user interaction after invocation. Examples include ls, mv and cp.

Interactive, line-based programs are programs that often need interaction from the user during execution. They can write text to standard output, and may request input from the user via standard input. Examples include ed and metasploit.

Text-based user interfaces are a cross between a GUI and a CLI. They are graphical user interfaces running inside a terminal emulator. Examples include nethack and vi. Many (but not all) text-based user interfaces on *nix use either curses or the newer ncurses.

Non-interactive programs get the most attention in this article, while text-based user interfaces are barely covered at all.

Advantages of command-line interfaces

Why use command-line interfaces in the 21st century? Graphical user interfaces were invented decades ago!

Many command-line interfaces provide several advantages over graphical user interfaces, still today. They are mostly popular among power users, programmers and system administrators; partly because many of the advantages apply to their tastes well:

  • Ease of automation: Most command-line interfaces can easily be automated using scripts.
  • Fast startup times: Most command-line interfaces beat their graphical counterparts several times at startup time
  • Easier to use remotely: Maybe it’s just me, but I prefer to remotely control computers via SSH over VNC
  • Lower system requirements: The lower system requirements make CLIs useful on embedded systems.
  • Higher efficiency: Being able to do a lot by just typing some characters instead of searching in menus increases the efficiency which you are able to do work at
  • Keyboard friendly: Grabbing the mouse is a distraction

Disadvantages of command-line interfaces

Having covered the advantages of command-line interfaces, it would be a good idea to also cover the disadvantages of them. The most major disadvantage is that the learning curve is steeper. You will also have to check the manual in many cases, while in a GUI, you can figure out many more things while using the product.

GUIs also have the advantage when it comes to presenting and editing information that is by nature graphical. This includes photo manipulation and watching movies (I have always wanted a program that shows a movie in my terminal by converting it to ASCII art in real-time, that would be sweet).

Try to avoid interactivity

Interactive interfaces are harder to automate than non-interactive user interfaces. The ease of automation is one of the greatest advantages of command-line interfaces, and by making your utility interactive, you give away much of that advantage.

There will be cases when an interactive utility makes more sense than a non-interactive one, but at least don’t make a utility interactive when the advantages are dubious. You shouldn’t create a utility which just asks (interactively) for stuff that could be easily sent to the program as arguments (imagine mv asking you for a source and destination path interactively).

Naming your utility

I would like to stress on the importance of a good name for each command-line utility. This is because a bad name is easy to forget, meaning that users will have to spend more time looking it up.

The name should be short. A long name will be tedious to type, so don’t call you version control program my-awesome-version-control-program. Call it something short, such as avc (Awesome Version Control).

The name should be easy to remember. Don’t call your utility ytzxzy.

Arguments

A lot can be said about the arguments to give to programs. First of all, follow standard practice; single-letter options are prefixed with a hyphen, and multiple of them may follow directly (e.g. -la is the same as -l -a). Multi-letter options start with two hyphens, and each such argument must be separated with spaces. Look at the arguments you give to ls, or cp. Follow the way those work. Examples of commands following non-standard practice include dd, which has been criticized for that, many times.

Continuing on the theme of standard practice, if a similar tool to yours (or a tool in the same category, such as file management) uses some option for some thing, it could be a good idea to copy that behaviour to your program. Look at most *nix file management utilities such as mv, cp and rm. All of these provide a flag called -i, with the same behaviour; asking the user interactively to confirm an action. They also provide a -f flag, to force actions (some of which the computer would think seem stupid, but you know what you are doing while using that option, don’t you?).

Options should be optional. That is part of the meaning of the word, but sometimes it is forgotten. Command-line utilities should be callable with zero, nil and no options at all. Examples include cd, calling it with no arguments returns you to the home directory. Some programs might not make sense to call with no arguments, such as mv, but in many cases, you can find some sensible standard behaviour (remember that it is, however, better to just print an error and exit than to do something stupid the user would never expect to happen (“Oh, you gave rm no options, better remove every file on your disk, just to be sure that the file you wanted to remove gets removed”)).

In the *nix world, there’s a practice that anything after double-hyphens -- should be read as a filename, even if the name contains hyphens (e.g. cmd -- -FILE-). If the arguments list ends with a single-hyphen, input should be read from standard input.

Be careful using many flags whose only difference is the case (e.g. -R and -r), as it will be hard to remember.

Always provide a long form of short arguments. Don’t provide just -i, provide --interactive as well.

Provide --version and --help

There are two options you should always include in your program: --version and --help. The first one should print version information for the program. The second should tell one what the program is for, how to use it and present common, if not all, options.

Reading the input

Make sure your program can read input from pipes, and through file redirection.

If the name of a file is passed as an argument, read the file and use its contents as input. If no such argument was passed, read from standard input until a CTRL+D key sequence is sent.

Silence trumps noise

If a program has nothing of importance to say, then be quiet (the same applies to human beings). When I run mv, I don’t want it to tell me that it moved a file to some other location. After all, isn’t that what I asked it to do? It should come to me as no surprise that that happened, so I don’t need to be told that explicitly. When things you don’t expect to happen do happen, then should you break the silence. Examples include, the file I wanted to move didn’t exist, or I didn’t have permissions to write to the directory I tried to move the file to.

Note that your program don’t need to tell its version or copyright information during each invocation, or print the names of the authors, either. That’s just extra noise, wasting space, bandwidth during remote sessions and possibly making the output harder to automatically process, for instance by sending it to another program through a pipe.

Don’t tell me what the output is, either. One should know what the program they use do. whoami prints the name of the current user, and only the name. If it printed The name of the current user is: x instead of just x, much more work would be involved in extracting just the name.

Most programs provide -v (verbose) options making the program more verbose, and a -q (quiet) option, making the program shut up all together (except possibly for some fatal error). The default behaviour should not be completely silent in every case, but in most cases. Programs that print something should only print what is relevant.

How to ask users for a yes or a no

At times, your programs might need to ask a user for a yes or a no, for different reasons, the most common being confirmations (“do you really want to do this?”). Others could be problems that the computer may offer to fix (“Table USERS doesn’t exist, do you want me to create it for you?”).

While asking for questions requiring either a yes or a no (or a y of an n), you should put (y/n) after the question:

Do you really want to do this (y/n)?

Most programs require you to manually hit return after typing the letter. While this might seem superfluous, most programs do this, and your program should require user to do this as well in order to be consistent and not surprise anyone.

Tell me what kind of input you want, and how you want it

If your program asks for a date, and it doesn’t tell me how I should type that date, I will be confused:

Enter a date:

Tell me the format in which I should input the date, and the confusion is gone:

Enter a date (YYYY-MM-DD):

Likewise, if the program just asks for a length, I would be confused. In kilometers, meters, miles, feet…? When different units for things exist, tell me the unit.

Every program should do one, and only one thing, and do it well

The above heading is one of the most important parts of the Unix philosophy, and it has survived the test of time, being just as solid advice as it was back in the late 1960′s/early 1970′s. In practice, this means that you shouldn’t create a file management program, instead, you should create a program for removing files, another for moving them and another for copying them.

Doing one thing well is partly a side-effect of doing only one thing, which allows greater focus.

The most useful GCC options and extensions

This post contains information about some of the most useful GCC options. It is meant for people new to GCC, but you should already know how to compile, link and the other basic things using GCC; this is no introduction. If you want an introductory tutorial to GCC, just .

The things this article attempts to cover include (as well as a few other things):

  • Optimization
  • Compiler warning options
  • GCC-specific extensions to C
  • Standards compliance
  • Options for debugging
  • Runtime checks (e.g. stack overflow protection)

For more information on GCC, the freely available An Introduction to GCC book is pretty good. A manual with over 700 pages is available as well (it’s a reference, not a tutorial, though) from the GCC website. The manpages for GCC (man gcc) can also be useful.

Basic compiler warning and error options

An option many programmers always use while compiling C programs is the -Wall option. It enables several compiler warnings not enabled by default, such as -Wformat warning at incorrect format strings. To enable even more warnings, use the -Wextra option. All warnings can be turned off with -w. More warnings will make catching eventual bugs easier, but it may also raise the amount of false-positives. The exact implications of these different options, as well as individual warnings can be found here.

To treat compiler warnings as errors, use the -Werror option. To stop compilation when the first error occurs, use -Wfatal-errors.

Standards compliance

By default, GCC may compile C code that is not necessary standards-compliant, or it might not even compile code that complies to the C standard (“the C standard” here means either C89 or C99). Some C standard features are disabled by default, such as trigraphs (can be enabled with -trigraphs), and several GCC extensions (will be talked about later on in this article) will work, even if they aren’t parts of the official C standard.

The -ansi option can be used to make GCC correctly compile any valid C89 program (if not, it is due to a compiler bug). It will still accept some GCC extensions (those that aren’t incompatible with the standard); use the -pedantic option to make GCC a pedant when it comes to standards compliance. The -std= option can be used to set the specific standard. There’s a bunch of supported standards (and most standards have several valid names), but the important ones to know are c89 (equal to the earlier -ansi option), c99, gnu89 (C89 with GCC extensions, which is the default) and gnu99 (C99 with GCC extensions). You can also use c1x to enable experimental support for the upcoming C1X standard (or gnu1x for the same with GCC extensions).

Code optimization levels

You can set the code optimization level for GCC, which decides how aggressively GCC will optimize the code. By default, GCC will try to compile fast, thus no optimizations will be made. By setting an optimization level, GCC will spend more time compiling, and the code might be harder to debug as well, but optimize the code better, possibly resulting in a faster executing program and/or smaller binary filesize. Because of longer compile-time and possible complications making debugging harder, it can be a good idea not to optimize during the development process and wait with that for building the production binary.

The default optimization(less) level is set with the -O0 option, or by giving no optimization option at all.

Some of the most common optimization forms can be activated by using the -O1 (or simply just -O) option. This option tries to produce smaller and faster binaries, and in many cases it can compile faster than -O0 because some optimizations will simplify the program for the compiler as well.

The next level is, perhaps unsurprisingly, -O2. It tries to improve the speed of programs even more than -O1 does, without increasing the size. It can take a much more considerable amount of time to compile. This is recommended for production releases as it optimizes well for speed without sacrificing space.

The -O3 level does some of the heaviest, most time consuming optimizations. It may also increase the size of the binary. In some cases, the optimizations may backfire and actually produce a slower binary.

If you want a small binary (most of all), you should use the -Os option.

Just pre-process, compile or assemble

When you ask GCC to compile a C program, the following steps are usually taken:

  1. Pre-processing
  2. Compilation
  3. Assembling
  4. Linking

For different reasons, you may want to stop at some of the steps. You might just want to pre-process, for instance, to find an error you suspect comes from a faulty pre-processor directive. If you do so, you will see the output from the pre-processor instead of getting the complete finished binary. Likewise, you may wish not to link because you are going to link manually later on, or maybe you just want to get the assembly output, modify that in some way and then manually assemble and link it. The reasons why you would want to do that isn’t the important thing here though, but how to do it.

To only pre-process, you should use the -E option. To stop after compilation, use -S. To do all steps but linking, use -c.

Controlling assembly output

Normally GCC produces AT&T syntax assembly output, but if you want to use Intel’s syntax (which is, in my opinion, much more readable), you should set the assembly dialect with the -masm= option, with intel as the value (-masm=intel). Note that this won’t work on Mac OS X.

A useful option for making the Assembly code more readable is the -fverbose-asm, which adds comments to the assembly output.

Adding debug information

If you are going to debug your program later, and don’t want to debug the assembly version, the -g option is absolutely essential. It adds debugging information, so that you can do source-level debugging later on the binary. The -g option produces debug information specifically for GDB, so what you get will not necessarily work on other debuggers.

You can set the level of debug information to generate. The default level is 2. With -g1 you can inform GCC to produce minimal debugging information, and with -g3 you can tell GCC that you want even more debug information than what you get by default.

Adding runtime checks

GCC can add different runtime checks to C programs at compilation, making debugging easier and avoiding some of the most common security vulnerabilities in C programs (as long as vulnerabilities/bugs don’t exist in the checking…). Note that runtime checks can degrade performance of programs.

There is an incredibly useful GCC option, -fmudflap, which can be used to find pointer arithmetic errors during runtime. This can help you find many pointer arithmetic related errors.

Stack overflow protection can be enabled by using -fstack-check.

The -gnato option enables checking for overflows during integer operations.

GCC extensions to the C language

GCC provides several extensions to the C programming language that aren’t actually parts of the C standard. You should always be careful while using non-standard features as that would, in most cases, make your code incompatible with other compilers. Anyway, I will cover some of the most useful extensions GCC provides, and you decide if you use them or not.

All extensions can be found in the GCC documentation.

Likely and unlikely cases

One GCC extension frequently used in the Linux kernel is the GCC extension __builtin_expect option, commonly known as the likely() and unlikely() macros. The Linux kernel would use something like the following for telling GCC which if statements are likely and unlikely to execute, so that GCC can do better branch prediction:

/*This is the likely case which will occur most of the time*/
if(likely(x>0)){
  return 1;
}
/*This is the unlikely case which will occur much more
 *seldom than the earlier case*/
if(unlikely(x<=0)){
  return 0;
}

The likely() and unlikely() macros are defined in the Linux kernel as:

#define likely(x)       __builtin_expect((x),1)
#define unlikely(x)     __builtin_expect((x),0)

If you want to use this outside the Linux kernel, you could always type __builtin_expect(condition, 1) for likely cases and __builtin_expect(condition, 0) for unlikely cases, but it would be much easier to use the same macros as the Linux kernel uses.

Additional datatypes

GCC provides some additional datatypes to the C programming langauge not defined by the standard. These are:

Note that 80- and 128-bit floating point values are not supported on all architectures (they are supported on common x86 and x86_64 systems, though). On ARM platforms, half-precision (16-bit) floating points are supported.

Ranges in switch/case

A GCC extension provides the support for ranges in switch/case statements, so you can have a case for values between 10 and 1000, for instance. A range is defined as x ... y, where x is the lower-bound and y is the upper-bound. You may not leave the spaces before and after the dots out. An example switch statement using cases with ranges:

switch(x){
  case 0 ... 9:
    puts("One digit"); break;
  case 10 ... 99:
    puts("Two digits"); break;
  case 100 ... 999:
    puts("Three digits"); break;
  default:
    puts("I sense a disturbance in the force");
}

This is more convenient than writing 1000 different cases (but you wouldn’t solve the problem like that, would you?).

Binary literals

GCC supports binary literals in C programs using the 0b prefix, pretty much like you would use 0x for hexadecimal literals. In the following example, we initialize an integer using a binary literal for its value:

int integer=0b10111001;

man is your friend as a programmer

The GNU manpages are useful for programmers; first of all, you can lookup commands that you might use while programming, in case you have forgotten an option or something like that, but additionally, they contain documentation for many C functions, so you can for instance run man printf to get information on C’s formatted output function. Also, GNU/Linux kernel calls are documented.

You can also run man ascii to get a manpage containing every ASCII character in the set:

man ascii

There is no need to open up Firefox or Chrome/Chromium (but dare not use Internet Explorer) each time you need some of that information, just use the manpages (especially useful if you happen to be without an Internet connection).