osh is a new shell that aims to be a drop-in
replacement for bash.
To reach that goal, it is important that osh matches bash’s behavior in almost every case.
However, osh did not have a kill builtin, which is of course a problem for compatibility.
So I set out to work on a kill implementation.
And doing this made me realize that you cannot just rely on the --help output
of a command to figure out what it does.
kill --help
If you run kill --help it shows the following help page:
kill: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
Send a signal to a job.
Send the processes identified by PID or JOBSPEC the signal named by
SIGSPEC or SIGNUM. If neither SIGSPEC nor SIGNUM is present, then
SIGTERM is assumed.
Options:
-s sig SIG is a signal name
-n sig SIG is a signal number
-l list the signal names; if arguments follow `-l' they are
assumed to be signal numbers for which names should be listed
-L synonym for -l
Kill is a shell builtin for two reasons: it allows job IDs to be used
instead of process IDs, and allows processes to be killed if the limit
on processes that you can create is reached.
Exit Status:
Returns success unless an invalid option is given or an error occurs.
Like I said, I wanted to implement the kill command in osh, matching bash’s behavior.
So I thought the output of --help would be a great starting point to see
what kill is supposed to do.
However, now that I have completed the implementation, it is clear to me that this
is not a great way to go about it.
If you have no idea what the command is supposed to do, running --help is a fine place to start.
But if you actually want to replicate the behavior of the command, you cannot treat
the help output as a specification.
Let’s look at the first line of the help output again:
kill: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ...
Reading it, I understand the following: you can send a signal to one or more processes (identified by a PID or a job specification), and that signal is either:
- SIGTERM if you defined nothing
- a signal defined with a number
- a signal defined by a “sigspec”
But what it doesn’t tell you is what a sigspec looks like. So allow me to.
What can you actually do
Take the default signal, SIGTERM: you can specify it in many ways.
Any of the following are allowed:
TERM
sigterm
SiGTeRM
term
It would have been nice if the help mentioned that, but it doesn’t stop there.
The help explicitly says that -n takes a signal number (“SIG is a signal number”).
But all of the signal names above also work:
$ sleep 100 &
[1] 4410
$ kill -n SiGTeRM 4410
$
It also works the other way around: you can give -s a number and bash won’t complain.
So in the end, -s sigspec, -n sigspec, and -sigspec are exactly
the same thing.
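To make that equivalence concrete, here is a minimal Python sketch (osh itself is written in Python) of how a reimplementation might normalize a sigspec so that -s, -n, and -SIG all funnel into one function. The names parse_sigspec and parse_signal_option, the use of ValueError, and the signal table (an assumed subset of the Linux x86-64 numbers) are all hypothetical; this is not osh’s actual code.

```python
# Hypothetical sketch, not osh's real code. Signal numbers are an assumed
# subset of the Linux x86-64 table.
SIGNALS = {
    "HUP": 1, "INT": 2, "QUIT": 3, "ILL": 4, "TRAP": 5, "ABRT": 6,
    "KILL": 9, "USR1": 10, "SEGV": 11, "USR2": 12, "TERM": 15,
}
MAX_SIGNAL = 64  # assumption: highest signal number on Linux


def parse_sigspec(spec: str) -> int:
    """Turn TERM, sigterm, SiGTeRM, term, or 15 into a signal number."""
    if spec.isascii() and spec.isdigit():
        num = int(spec)
        if num > MAX_SIGNAL:
            raise ValueError(f"{spec}: invalid signal specification")
        return num  # numbers in range are accepted even if missing from SIGNALS
    name = spec.upper()  # names are case-insensitive
    if name.startswith("SIG"):
        name = name[3:]  # accept both SIGTERM and TERM
    if name not in SIGNALS:
        raise ValueError(f"{spec}: invalid signal specification")
    return SIGNALS[name]


def parse_signal_option(args: list[str]) -> tuple[int, list[str]]:
    """Return (signal, remaining args); assumes -l/-L and -- are handled elsewhere."""
    if args and args[0] in ("-s", "-n"):  # -n accepts names, -s accepts numbers
        if len(args) < 2:
            raise ValueError(f"{args[0]}: option requires an argument")
        return parse_sigspec(args[1]), args[2:]
    if args and args[0].startswith("-") and len(args[0]) > 1:
        return parse_sigspec(args[0][1:]), args[1:]  # -TERM, -15, -SiGTeRM
    return SIGNALS["TERM"], args  # no signal given: SIGTERM


for argv in (["-s", "SiGTeRM", "4410"], ["-n", "term", "4410"],
             ["-TERM", "4410"], ["-s", "15", "4410"]):
    print(parse_signal_option(argv))  # prints (15, ['4410']) each time
```

Because every form ends up in parse_sigspec, there is no way for -s and -n to drift apart in behavior.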
But there’s more.
kill -l
If we look at kill -l, it can do a lot more than the help shows.
kill -l [sigspec]
...
Options:
...
-l list the signal names; if arguments follow `-l' they are
assumed to be signal numbers for which names should be listed
-L synonym for -l
kill -l and kill -l sigspec are described, but you can do more.
For example, you can list multiple signals:
$ kill -l 10 11 12
USR1
SEGV
USR2
But you can also translate signal names into numbers (the help only mentions numbers as arguments):
$ kill -l SIGSEGV segv SEGV SegV
11
11
11
11
And as you can see, again it is possible to use different formats.
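Translating in both directions fits in a small function. Below is a hedged Python sketch; kill_l is a hypothetical name, the signal table is an assumed subset of the Linux x86-64 numbers, and it deliberately ignores the special numeric cases (0 and exit statuses).

```python
# Assumed subset of Linux x86-64 signal numbers, for illustration only.
SIGNALS = {"HUP": 1, "INT": 2, "ABRT": 6, "KILL": 9, "USR1": 10,
           "SEGV": 11, "USR2": 12, "TERM": 15}
NAMES = {num: name for name, num in SIGNALS.items()}


def kill_l(args: list[str]) -> list[str]:
    """Mimic `kill -l ARG...`: numbers become names, names become numbers.

    Unknown input raises KeyError here; a real builtin would print an
    error for that argument and keep going.
    """
    out = []
    for arg in args:
        if arg.isascii() and arg.isdigit():
            out.append(NAMES[int(arg)])  # 10 -> USR1 (no SIG prefix)
        else:
            name = arg.upper()  # SegV, segv, SEGV all match
            if name.startswith("SIG"):
                name = name[3:]  # SIGSEGV -> SEGV
            out.append(str(SIGNALS[name]))  # SEGV -> 11
    return out


print(kill_l(["10", "11", "12"]))                    # ['USR1', 'SEGV', 'USR2']
print(kill_l(["SIGSEGV", "segv", "SEGV", "SegV"]))   # ['11', '11', '11', '11']
```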
Here’s another edge case:
kill -l lists all the available signals (which is vaguely described in the help output):
$ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
.....
But then you can also do this:
$ kill -l 0
EXIT
This seems to be a special case that kill -l allows, even though EXIT is not
shown when you list all the available signals (I guess since EXIT isn’t really
a signal).
In case you were wondering, kill -l 128 results in an error.
But there’s even more. This also works:
$ kill -l 134
ABRT
How does that work? Well, when a process is terminated by a signal, the shell reports its exit status as 128 plus the number of that signal:
$ kill -l 6
ABRT
So an exit status of 134 means the process was terminated by an ABRT
signal (128 + 6 = 134). The help doesn’t mention anywhere that kill -l accepts
exit statuses like this.
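The numeric rules from this section can be summarized as: 0 maps to EXIT, values above 128 are treated as exit statuses (subtract 128), and 128 itself is rejected. Here is a minimal Python sketch of that logic. The function name name_for_number, the partial signal table, and MAX_SIGNAL = 64 (the highest signal number on Linux) are assumptions made for illustration.

```python
# Assumed subset of Linux x86-64 signal numbers, for illustration only.
NAMES = {1: "HUP", 2: "INT", 6: "ABRT", 9: "KILL", 10: "USR1",
         11: "SEGV", 12: "USR2", 15: "TERM"}
MAX_SIGNAL = 64  # assumption: highest signal number on Linux


def name_for_number(n: int) -> str:
    """Name printed by `kill -l N` for a numeric argument N."""
    if n > 128:
        n -= 128  # exit status 128 + N means "terminated by signal N"
    if n == 0:
        return "EXIT"  # special case: EXIT is not a real signal
    if not 1 <= n <= MAX_SIGNAL:
        # 128 lands here: it is not above 128, and 128 is no valid signal
        raise ValueError(f"{n}: invalid signal specification")
    return NAMES[n]  # KeyError only because this table is partial


print(name_for_number(134))  # ABRT
print(name_for_number(0))    # EXIT
```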
You may read these differences and think: “Sure, those are differences, but does this
actually affect anyone?”
Unfortunately, the answer is yes.
In fact, because osh’s kill did not support kill -l 134, it was impossible
to build a package called libnbd with osh as the system shell.
One of the test scripts in that package ran kill -l 134, which caused an error in osh (invalid signal); that
made the test fail, and with it the build.
Very often, incompatibility matters mostly because it breaks existing scripts.
So even though kill -l 134 may not look like a feature end users will greatly miss,
it can still cause failures, because existing scripts may use it.
Creating a specification
If you want to figure out all the behaviors of a command, I think the best way
is to just play around in a shell to see what’s possible.
For osh we also have a nice test framework that allows us to test not only bash,
but also other shells (and of course osh). This can be valuable since it also
gives you an idea of how other shells implement a command.
You can read about it here.
Ideally, for every command we have all the flags, options, and edge cases covered in the spec tests.
Here are the test cases we have for the kill builtin currently:
case  dash  bash  mksh  osh   test
0     pass  pass  BUG   pass  kill -15 kills the process with SIGTERM
1     pass  pass  pass  pass  kill -KILL kills the process with SIGKILL
2     N-I   pass  N-I   pass  kill -n 9 specifies the signal number
3     pass  pass  BUG   pass  kill -s TERM specifies the signal name
4     N-I   pass  N-I   pass  kill -terM -SigterM isn't case sensitive
5     N-I   pass  pass  ok    kill HUP pid gives the correct error
6     N-I   pass  pass  pass  kill -l shows signals
7     N-I   pass  N-I   pass  kill -L also shows signals
8     N-I   pass  N-I   pass  kill -l 10 TERM translates between names and numbers
9     N-I   pass  N-I   FAIL  kill -L checks for invalid input
10    pass  pass  pass  pass  kill -l with exit code
11    pass  pass  N-I   pass  kill -l with 128 is invalid
12    N-I   pass  N-I   pass  kill -l 0 returns EXIT
13    N-I   pass  pass  pass  kill -9999 is an invalid signal
14    N-I   pass  BUG   FAIL  kill -15 %% kills current job
15    BUG   pass  BUG   pass  kill -15 %- kills previous job
16    pass  pass  pass  pass  kill multiple pids at once
17    BUG   pass  pass  pass  kill pid and job at once
18    ok    pass  pass  pass  Numeric signal out of range - OSH may send it anyway
N-I means the functionality is not implemented, BUG indicates a known bug (surprise), ok marks a difference that is accepted, and FAIL means the test currently fails.
As you can see, osh still has 2 cases (9 and 14) where we know the behavior is not yet correct.
To sum this all up into a really simple workflow that I think works quite well:
Step 1: Read --help (gives you the basic idea)
Step 2: Experiment in bash
Step 3: Write spec tests based on your experiments
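Steps 2 and 3 can be partly automated by running the same snippet in several shells and comparing the results. The sketch below is not osh’s spec test framework, which has its own format; run_in_shells is a hypothetical helper that calls each shell found on PATH with -c and records its exit status and stdout.

```python
import shutil
import subprocess


def run_in_shells(snippet: str, shells: list[str]) -> dict[str, tuple[int, str]]:
    """Run `snippet` via each shell's -c flag; skip shells not on PATH."""
    results = {}
    for shell in shells:
        path = shutil.which(shell)
        if path is None:
            continue  # not installed, nothing to compare
        proc = subprocess.run([path, "-c", snippet], capture_output=True,
                              text=True, timeout=10)
        results[shell] = (proc.returncode, proc.stdout)
    return results


if __name__ == "__main__":
    # Which of these shells accept an exit status with kill -l?
    found = run_in_shells("kill -l 134", ["bash", "dash", "mksh", "osh"])
    for shell, (status, out) in found.items():
        print(f"{shell}: status={status} stdout={out!r}")
```

Once the shells agree (or you know why they don’t), the observed outputs are a natural starting point for the expected values in a spec test.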
A better help
Once it is clear exactly what the command is supposed to do, it can help to
write a better version of the --help output. Of course, this will help your
users when they start using the command.
But it will also help you get a clearer picture of the behavior of the command,
and of what the structure of the code should look like.
For example, here is the current documentation for kill in osh, which I think
describes the command a lot better:
kill builtin
The kill builtin sends a signal to one or more processes. Usage:
kill (-s SIG | -SIG)? WHAT+ # send SIG to the given processes
where
SIG = NAME | NUMBER # e.g. USR1 or 10
WHAT = PID | JOBSPEC # e.g. 789 or %%
Examples:
kill -s USR1 789 # send SIGUSR1 to PID 789
kill -s USR1 789 %% # send signal to PID 789 and the current job
kill -s 10 789 %% # specify SIGUSR1 by number instead
kill -USR1 789 %% # shortcut syntax
kill -10 789 %% # shortcut using a number
kill -n USR1 789 %% # -n is a synonym for -s
kill 789 %% # if not specified, the default is SIGTERM
It can also list signals:
kill -L # List all signals
kill -L SIG+ # Translate signals from name to number, and vice versa
Examples:
kill -l # List all signals; -l is a synonym for -L
kill -L USR1 USR2 # prints '10 12'
kill -L USR1 15 # prints '10 TERM'
First off, there are examples that show specific things you may do with the command,
which help explain why you might want to use it.
These examples also make it clear that -s, -n, and the -SIG shortcut are the same thing.
The help uses regex-like notation (such as ?, + and |) to show optional arguments and where multiple values are allowed.
This means people who are unfamiliar with CLI conventions can look up what the symbols mean.
By contrast, if you google “what does [ ] mean in --help” you will not get useful results
(at least I didn’t when I just tried it).
I also think this help does a much better job at explaining why and how you might use the
kill command.
Note: the help above does not yet explain kill -l with numbers above 128, since I only found out
about that behavior recently.
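The grammar in the new help also hints at how the argument loop could be structured: after the optional signal, every remaining argument is a WHAT, either a PID or a job spec. Here is a hedged Python sketch of that classification step. classify_target and its return values are hypothetical; resolving job specs such as %% or %- would need the shell’s job table, and negative PIDs (process groups) are not handled.

```python
def classify_target(arg: str) -> tuple[str, str]:
    """Classify a WHAT argument as ("pid", arg) or ("jobspec", arg)."""
    if arg.startswith("%"):
        return ("jobspec", arg)  # e.g. %% or %-, resolved against the job table later
    if arg.isascii() and arg.isdigit():
        return ("pid", arg)
    # `kill HUP 789` ends up here: HUP is neither a PID nor a job spec
    raise ValueError(f"{arg}: not a PID or job spec")


print([classify_target(a) for a in ["789", "%%"]])
# [('pid', '789'), ('jobspec', '%%')]
```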
Conclusion
So I think there are two main takeaways here: don’t rely only on the --help output
of a command to figure out what it does, and if you are reimplementing a command,
make sure your --help can be depended on.