--help is not a specification

osh is a new shell that aims to be a drop-in replacement for bash. To reach that goal, osh has to match bash’s behavior in almost every case. However, osh did not have a kill builtin, which is of course a problem for compatibility. So I set out to write a kill implementation, and doing so made me realize that you cannot rely on the --help output of a command to figure out what it does.

kill --help

If you run kill --help it shows the following help page:

kill: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
    Send a signal to a job.

    Send the processes identified by PID or JOBSPEC the signal named by
    SIGSPEC or SIGNUM.  If neither SIGSPEC nor SIGNUM is present, then
    SIGTERM is assumed.

    Options:
      -s sig    SIG is a signal name
      -n sig    SIG is a signal number
      -l        list the signal names; if arguments follow `-l' they are
                assumed to be signal numbers for which names should be listed
      -L        synonym for -l

    Kill is a shell builtin for two reasons: it allows job IDs to be used
    instead of process IDs, and allows processes to be killed if the limit
    on processes that you can create is reached.

    Exit Status:
    Returns success unless an invalid option is given or an error occurs.

Like I said, I wanted to implement the kill command in osh, and the goal was to match bash’s behavior. So I thought the output of --help would be a great starting point for seeing what kill is supposed to do. Having completed the implementation, though, it is clear to me that this is not a great approach. If you have no idea what a command does, running --help is a fine place to start. But if you actually want to replicate its behavior, you cannot treat the help output as a specification.

Let’s look at the first line of the help output again:

  kill: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ...

Reading it, I understand the following: you can send one or more processes (each identified by a PID or a job specification) a signal, which is either:

SIGTERM, if you specify nothing
a signal given by its number
a signal given by a “sigspec”

But what it doesn’t tell you is what a sigspec looks like. So allow me to.

What can you actually do

Take the default signal, SIGTERM. You can specify it in many ways; all of the following are allowed:

TERM
sigterm
SiGTeRM
term

It would have been nice if the help mentioned that, but it doesn’t stop there. The help says -n takes a signal number, but all of the signal names above work too:

$ sleep 100 &
[1] 4410
$ kill -n SiGTeRM 4410
$

It also works the other way around: you can pass a number to -s and bash won’t complain. So in the end, -s sigspec, -n sigspec, and -sigspec are exactly the same thing. But there’s more.
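To make that equivalence concrete, here is a minimal Python sketch of how a reimplementation might resolve a sigspec. It is not osh’s actual code: the function name parse_sigspec is hypothetical, and it leans on Python’s signal.Signals enum for the name and number tables, so the numbers follow whatever platform it runs on. Real-time names such as RTMIN+1 are not handled. Because -s, -n, and -sigspec behave identically, one function can serve all three.

```python
import signal


def parse_sigspec(spec: str) -> int:
    """Resolve a sigspec such as "15", "TERM", "sigterm" or "SiGTeRM".

    Hypothetical helper: a single function for -s, -n and -SIG, since
    bash treats all three the same way.
    """
    if spec.isascii() and spec.isdigit():
        num = int(spec)
        if num == 0:
            return 0  # signal 0 sends nothing; kill(2) only checks the target
        try:
            return signal.Signals(num).value
        except ValueError:
            raise ValueError(f"{spec}: invalid signal specification") from None

    # Names are case-insensitive and the SIG prefix is optional.
    name = spec.upper()
    if not name.startswith("SIG"):
        name = "SIG" + name
    try:
        return signal.Signals[name].value
    except KeyError:
        raise ValueError(f"{spec}: invalid signal specification") from None


if __name__ == "__main__":
    for spec in ["TERM", "sigterm", "SiGTeRM", "term", "15"]:
        print(spec, "->", parse_sigspec(spec))
```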

kill -l

Let’s look at kill -l next. It can do a lot more than the help suggests, which only says this:

  kill -l [sigspec]
  ...
  Options:
  ...
    -l        list the signal names; if arguments follow `-l' they are
              assumed to be signal numbers for which names should be listed
    -L        synonym for -l

The help describes kill -l and kill -l sigspec, but you can do more. For example, you can list multiple signals:

$ kill -l 10 11 12
USR1
SEGV
USR2

But you can also translate signal names to numbers (the help only mentions going from numbers to names):

$ kill -l SIGSEGV segv SEGV SegV
11
11
11
11

As you can see, the names can again be written in different formats.
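The translation in both directions can be sketched the same way. This is again a hypothetical Python helper built on the signal.Signals enum, not osh’s implementation, and it does not cover every special case of kill -l. Like bash, it prints names without the SIG prefix, and it uses the host platform’s signal numbering (the outputs in this post come from Linux).

```python
import signal
from typing import List


def translate_signals(args: List[str]) -> List[str]:
    """Sketch of `kill -l ARG...`: numbers map to names, names to numbers."""
    out = []
    for arg in args:
        try:
            if arg.isascii() and arg.isdigit():
                # Number -> name, printed without the "SIG" prefix.
                out.append(signal.Signals(int(arg)).name[len("SIG"):])
            else:
                # Name -> number; case-insensitive, SIG prefix optional.
                name = arg.upper()
                if not name.startswith("SIG"):
                    name = "SIG" + name
                out.append(str(signal.Signals[name].value))
        except (KeyError, ValueError):
            raise ValueError(f"{arg}: invalid signal specification") from None
    return out


if __name__ == "__main__":
    print("\n".join(translate_signals(["SIGSEGV", "segv", "SEGV", "SegV"])))
```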

Here’s another edge case. kill -l on its own lists all the available signals (which the help output only vaguely describes):

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 .....

But then you can also do this:

$ kill -l 0
EXIT

This seems to be a special case that kill -l allows, even though EXIT doesn’t show up when you list all the available signals (I guess because EXIT isn’t really a signal). In case you were wondering, kill -l 128 results in an error.

But there’s even more. This also works:

$ kill -l 134
ABRT

How does that work? Well, if a process exits because of a signal it received, its exit code is 128 plus the number of the signal that terminated it. And signal 6 is ABRT:

$ kill -l 6
ABRT

So if a process exits with exit code 134, it was terminated by an ABRT signal (128 + 6 = 134), and kill -l 134 decodes that for you. The help doesn’t mention this anywhere.
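The number handling of kill -l described above fits in a few lines. Below is a minimal Python sketch under the same assumptions as before (a hypothetical helper, with signal names taken from Python’s signal.Signals enum): numbers above 128 are treated as exit codes and reduced by 128, 0 is the EXIT special case, and anything that is still not a known signal, such as 128 itself, is rejected.

```python
import signal

SIGNAL_EXIT_BASE = 128  # a process killed by signal N exits with code 128 + N


def name_for_number(arg: int) -> str:
    """Sketch of how `kill -l NUMBER` maps a number to a signal name."""
    # Treat values above 128 as exit codes of signal-terminated processes.
    num = arg - SIGNAL_EXIT_BASE if arg > SIGNAL_EXIT_BASE else arg
    if num == 0:
        return "EXIT"  # not a real signal, but kill -l 0 accepts it
    try:
        return signal.Signals(num).name[len("SIG"):]
    except ValueError:
        raise ValueError(f"{arg}: invalid signal specification") from None


if __name__ == "__main__":
    for arg in (6, 134, 0):
        print(arg, name_for_number(arg))
```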

You may read these differences and think, “sure, those are differences, but do they actually affect anyone?” Unfortunately, the answer is yes. Because osh’s kill did not support kill -l 134, it was impossible to build a package called libnbd with osh as the system shell. One of that package’s test scripts ran kill -l 134, which caused an error in osh (invalid signal); that made the test fail, and with it the build. Incompatibility is very often a problem precisely because it breaks existing scripts. So even though end users may not miss kill -l 134 much, leaving it out causes real failures in scripts that already use it.

Creating a specification

If you want to figure out all the behaviors of a command, I think the best way is to play around in a shell and see what’s possible. For osh, we also have a nice test framework that lets us run the same tests not only against bash, but also against other shells (and of course osh). This is valuable because it also shows you how other shells implement a command. You can read about it here. Ideally, the spec tests cover every flag, option, and edge case of a command. Here are the test cases we currently have for the kill builtin:

  case  dash    bash    mksh    osh     description
  0     pass    pass    BUG     pass    kill -15 kills the process with SIGTERM
  1     pass    pass    pass    pass    kill -KILL kills the process with SIGKILL
  2     N-I     pass    N-I     pass    kill -n 9 specifies the signal number
  3     pass    pass    BUG     pass    kill -s TERM specifies the signal name
  4     N-I     pass    N-I     pass    kill -terM -SigterM isn't case sensitive
  5     N-I     pass    pass    ok      kill HUP pid gives the correct error
  6     N-I     pass    pass    pass    kill -l shows signals
  7     N-I     pass    N-I     pass    kill -L also shows signals
  8     N-I     pass    N-I     pass    kill -l 10 TERM translates between names and numbers
  9     N-I     pass    N-I     FAIL    kill -L checks for invalid input
 10     pass    pass    pass    pass    kill -l with exit code
 11     pass    pass    N-I     pass    kill -l with 128 is invalid
 12     N-I     pass    N-I     pass    kill -l 0 returns EXIT
 13     N-I     pass    pass    pass    kill -9999 is an invalid signal
 14     N-I     pass    BUG     FAIL    kill -15 %% kills current job
 15     BUG     pass    BUG     pass    kill -15 %- kills previous job
 16     pass    pass    pass    pass    kill multiple pids at once
 17     BUG     pass    pass    pass    kill pid and job at once
 18     ok      pass    pass    pass    Numeric signal out of range - OSH may send it anyway

N-I means the functionality is not implemented, BUG indicates a bug (surprise), ok marks an accepted difference in behavior, and FAIL means the test currently fails. As you can see, osh still has 2 cases (marked FAIL) where we know the behavior is not yet correct.

To sum this all up, here is a simple workflow that I think works quite well:

Step 1: Read --help (gives you the basic idea)

Step 2: Experiment in bash

Step 3: Write spec tests based on your experiments
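As a rough illustration of step 3, here is a tiny Python sketch that replays experiments against whichever shells are installed and reports pass or FAIL for each. It is not osh’s spec test framework: the case list, the shell names looked up on PATH, and the expected outputs (recorded from bash on Linux, so the numeric ones are Linux-specific) are assumptions made for the example.

```python
import shutil
import subprocess

# (snippet, stdout observed from bash on Linux while experimenting)
CASES = [
    ("kill -l 10 11 12", "USR1\nSEGV\nUSR2\n"),
    ("kill -l SIGSEGV segv SEGV SegV", "11\n11\n11\n11\n"),
    ("kill -l 0", "EXIT\n"),
    ("kill -l 134", "ABRT\n"),
]

SHELLS = ["dash", "bash", "mksh", "osh"]


def run(shell: str, snippet: str) -> str:
    """Run one snippet with `shell -c` and return what it wrote to stdout."""
    result = subprocess.run([shell, "-c", snippet], capture_output=True, text=True)
    return result.stdout


def main() -> None:
    shells = [s for s in SHELLS if shutil.which(s)]  # skip shells not installed
    print("case".ljust(34) + "".join(s.ljust(8) for s in shells))
    for snippet, expected in CASES:
        cells = ["pass" if run(s, snippet) == expected else "FAIL" for s in shells]
        print(snippet.ljust(34) + "".join(c.ljust(8) for c in cells))


if __name__ == "__main__":
    main()
```

Unlike the real table, this sketch lumps every mismatch into FAIL; telling N-I, BUG, and ok apart requires recording each shell’s known deviations next to the expected output.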

A better help

Once it is clear exactly what the command is supposed to do, it can help to write a better version of the --help output. Of course, this helps your users when they start using the command. But it also gives you a clearer picture of the command’s behavior and of how the code should be structured.

For example, here is the current documentation for kill in osh, which I think describes the command much better:

kill builtin

The kill builtin sends a signal to one or more processes. Usage:

kill (-s SIG | -SIG)? WHAT+  # send SIG to the given processes

where

SIG  = NAME | NUMBER   # e.g. USR1 or 10
WHAT = PID  | JOBSPEC  # e.g. 789 or %%

Examples:

kill -s USR1 789       # send SIGUSR1 to PID 789

kill -s USR1 789 %%    # send signal to PID 789 and the current job
kill -s 10   789 %%    # specify SIGUSR1 by number instead

kill -USR1   789 %%    # shortcut syntax
kill -10     789 %%    # shortcut using a number

kill -n USR1 789 %%    # -n is a synonym for -s
kill         789 %%    # if not specified, the default is SIGTERM

It can also list signals:

kill -L                # List all signals
kill -L SIG+           # Translate signals from name to number, and vice versa

Examples:

kill -l                # List all signals; -l is a synonym for -L
kill -L USR1 USR2      # prints '10 12'
kill -L USR1 15        # prints '10 TERM'

First off, there are examples that show specific things you may do with the command, which help explain why you might want to use it. These examples also make it clear that -s, -n, and -sigspec are the same thing. The help uses regular-expression notation (?, +, and |) to show optional arguments and where you can pass multiple values. This means people who are unfamiliar with CLI tools can look up what the symbols mean. If you google “what does [ ] mean in --help”, you will not get useful results (at least I didn’t when I just tried it). I also think this help does a much better job of explaining why and how you might use the kill command. Note: the documentation above does not cover kill -l with numbers above 128, since I only recently found out about that behavior.
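The usage line in the osh documentation maps almost one-to-one onto code. Below is a minimal Python sketch of an argument parser for that grammar, including the -n synonym and the -l/-L list mode. It is only an illustration, not osh’s actual parser: the KillArgs type and the parse_kill_args function are hypothetical, SIG is kept as raw text to be resolved later, and details such as -- and negative PIDs are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional

USAGE = "usage: kill (-s SIG | -SIG)? WHAT+"


@dataclass
class KillArgs:
    list_mode: bool      # True for -l / -L
    sig: Optional[str]   # raw SIG text such as "USR1" or "10"; None means SIGTERM
    operands: List[str]  # WHAT+ (PIDs or jobspecs), or SIG* in list mode


def parse_kill_args(argv: List[str]) -> KillArgs:
    """Hypothetical parser for `kill (-s SIG | -SIG)? WHAT+` and `kill -L SIG*`."""
    if not argv:
        raise ValueError(USAGE)
    first = argv[0]
    if first in ("-l", "-L"):
        # List mode: remaining arguments (possibly none) are signals to translate.
        return KillArgs(True, None, argv[1:])
    if first in ("-s", "-n"):
        # -n is just a synonym for -s; both take SIG as the next argument.
        if len(argv) < 2:
            raise ValueError(f"{first}: option requires an argument")
        sig, rest = argv[1], argv[2:]
    elif first.startswith("-") and len(first) > 1:
        sig, rest = first[1:], argv[1:]  # shortcut syntax: -USR1 or -10
    else:
        sig, rest = None, argv  # no signal given: default to SIGTERM
    if not rest:
        raise ValueError(USAGE)  # WHAT+ requires at least one target
    return KillArgs(False, sig, rest)


if __name__ == "__main__":
    print(parse_kill_args(["-s", "USR1", "789", "%%"]))
    print(parse_kill_args(["-L", "USR1", "15"]))
```

Keeping SIG as raw text lets -s, -n, and -SIG share a single resolution step afterwards, which mirrors the structure the documentation describes.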

Conclusion

So I think there are two main takeaways here: don’t rely solely on the --help output of a command to figure out what it does, and if you are reimplementing a command, make sure your --help can be depended on.
