Tracing shell scripts

Introduction

osh is a new shell that is intended to be an upgrade path from bash (and similar shells) to a better language and runtime. We’re trying to prove osh is mature enough to be the only shell a distro needs. We are doing this by building all the packages in the Alpine distro, using osh as the only shell on the system. If a package fails to build with osh, we try to figure out why it doesn’t build with osh but does with ash (the default shell on Alpine) and bash (explicitly used by some scripts). Since we are down to the last issues, the disagreements between osh and ash (and bash) are often subtle. Figuring out exactly which disagreement causes a package build to fail is often quite a journey. In this post I’d like to show what one particular journey looked like, and hopefully share some useful shell debugging tips along the way.

The problem

Test failure

After a package has been built, the Alpine build tool runs the package’s test suite. If any of these tests fail, the build also counts as a failure. When running the shunit2 package tests with osh, we got the following errors:

ASSERT:assert message was not generated
...
ASSERT:failure message for assertFalse was not generated
shunit2:ERROR testIssue84() returned non-zero return code.

It was not just this error, but a bunch of test cases failing with similar error messages. This was a tricky issue to figure out, but in the end the cause turned out to be a subtle difference (as usual) between how osh handles errors when running a command through eval and how the other shells do it.

Minimal example

Consider these lines:

set -u
test_function() {
    x=$1
}

echo "Calling eval test_function"
eval test_function
echo "Eval done"

Let’s go through this example line by line.

The set -u line tells whatever shell is running this code to exit with an error if an undefined variable is accessed. This is very useful to add to your script to catch typos and other bugs (if you don’t, undefined variables are treated as empty strings).
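To see the effect of set -u in isolation, here is a minimal sketch that runs the same two commands in a child sh, once without and once with set -u. The variable name typo_var is made up for this illustration and is assumed to be unset in your environment.

```shell
# Run a tiny snippet in a child shell, without and with `set -u`.
# `typo_var` is a hypothetical, deliberately unset variable.
without_u=$(sh -c 'echo "value: [$typo_var]"; echo "still running"' 2>/dev/null)

with_u_status=0
with_u=$(sh -c 'set -u; echo "value: [$typo_var]"; echo "still running"' 2>/dev/null) \
    || with_u_status=$?

echo "without set -u:"
echo "$without_u"
echo "with set -u: output='$with_u', exit status=$with_u_status"
```

Without set -u the unset variable silently expands to an empty string and the snippet keeps going. With set -u the child shell should stop at the first echo, print nothing on stdout and return a non-zero exit status.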

Next up, a function called test_function is declared. All it does is set the variable x to the value of the first parameter that was passed to the function (available as $1).

Then, we have a simple echo statement to show we are about to call the test function. And then the most important line, eval test_function, which calls test_function but doesn’t provide any parameters to it. This will cause an error, since test_function will assign $1 to x, but we haven’t provided any arguments to test_function, so $1 will be undefined. This error is intentional. Finally we have another echo statement to show the script is done with the eval command and keeps running the script.

Let’s see how bash runs this (I wrote the script above to the file /tmp/test so it is easy to run):

$ bash /tmp/test
Calling eval test_function
/tmp/test: line 3: $1: unbound variable
$

It fails, as expected. But what is important to notice is that bash didn’t execute the last line, echo "Eval done". Instead it completely stopped running the script after encountering the unbound (undefined) variable. Almost all the other shells I tried behave the same way (we’ll compare them later on), so this behavior seems to be very standard. But if we run the same script with osh, we see something different:

$ bin/osh /tmp/test
Calling eval test_function
      x=$1
        ^~
/tmp/test:3: fatal: Undefined variable '1'
Eval done
$

Osh does print Eval done, meaning it doesn’t exit the script upon hitting the undefined variable, but it keeps going!

The journey

Once you have a minimal reproducible example, it’s very easy to see what’s wrong. But going from error messages like:

ASSERT:assert message was not generated
...
ASSERT:failure message for assertFalse was not generated
shunit2:ERROR testIssue84() returned non-zero return code.

to a code snippet like:

set -u
test_function() {
    x=$1
}

echo "Calling eval test_function"
eval test_function
echo "Eval done"

can be quite the journey (a 5-hour journey in this case). We’ll get more into the details later, but what made this difficult is that shunit2 is a unit testing framework for shell scripts, which was testing itself. That made it quite a challenge to figure out who was supposed to call what, and why bash and osh behaved differently.

Pinpointing the failure

When a package fails to build (or test) there are 2 main questions that have to be answered:

1. What are the problematic lines of code?
2. How do they behave differently from ash and/or bash?

If we know which lines in the code are giving us issues, and if we also know what behavior is different, it is easy to create a minimal example that showcases the issue clearly. Getting a vague idea of which lines of code are the issue is usually not that difficult, but getting a precise idea can be.

In the shunit2 build log, we can see these lines:

--- Executing the 'shunit2_misc' test suite. ---
testUnboundVariable
 ASSERT:assert message was not generated
testIssue7
testIssue29
testIssue69
testIssue77
testIssue84
  ASSERT:failure message for assertFalse was not generated
shunit2:ERROR testIssue84() returned non-zero return code.

If we grep (actually I prefer ripgrep) for testUnboundVariable in the shunit2_misc_test.sh file, we can see this piece of code (don’t worry, you don’t have to understand what this does):

testUnboundVariable() {
  unittestF="${SHUNIT_TMPDIR}/unittest"
  sed 's/^#//' >"${unittestF}" <<EOF
## Treat unset variables as an error when performing parameter expansion.
#set -u
#
#boom() { x=\$1; }  # This function goes boom if no parameters are passed!
#test_boom() {
#  assertEquals 1 1
#  boom  # No parameter given
#  assertEquals 0 \$?
#}
#SHUNIT_COLOR='none'
#. ${TH_SHUNIT}
EOF
  ( exec "${SHELL:-sh}" "${unittestF}" >"${stdoutF}" 2>"${stderrF}" )
  assertFalse 'expected a non-zero exit value' $?
  grep '^ASSERT:unknown failure' "${stdoutF}" >/dev/null
  assertTrue 'assert message was not generated' $?
  grep '^Ran [0-9]* test' "${stdoutF}" >/dev/null
  assertTrue 'test count message was not generated' $?
  grep '^FAILED' "${stdoutF}" >/dev/null
  assertTrue 'failure message was not generated' $?
}

The boom() function hopefully looks familiar. Probably there is a lot of stuff that doesn’t make sense. We see some assert stuff at the bottom, which might be related to our issue but we don’t know where that comes from or how it works. So we have a vague idea of where our issue is, but it doesn’t feel like we are close to answering question 1 (what are the problematic lines of code).

Understanding the flow of this code just by reading it is really difficult. Some script is being sourced (. ${TH_SHUNIT}) and later on something gets executed (( exec ...)), but that’s really all we can easily tell. And because this whole block of code is just a function, we can’t easily rerun this code ourselves. Whatever calls this function fills in all the variables it uses, and those will be empty if we run the function directly ourselves. So ideally we would just like to rerun the test like before, but get some more insight into what’s going on.

set -x

Luckily, almost every shell (including osh) provides a powerful debugging tool: set -x. set -x causes your shell to print out all the commands it executes (this is called xtrace):

$ cat /tmp/x
set -x
VAR="hello"
echo "${VAR}"
VAR="world"
$ bash /tmp/x
+ VAR=hello
+ echo hello
hello
+ VAR=world
$

As you can see, all the commands that bash executes get printed with a leading +. If we just ran the script on its own, we would see it print hello, but we would have no idea where it comes from ($VAR). This is similar to the situation we have with testUnboundVariable: we see some output but we have no idea where it comes from. Instead of adding set -x to a script, we can also just pass -x as an argument to a shell and it will behave the same way. Let’s try doing this with this strange ( exec ...) command to see what it is doing:

( exec "${SHELL:-sh}" "-x" "${unittestF}" >"${stdoutF}" 2>"${stderrF}" )
# We added this ~~~~~~~^

What this line will do is start a subshell (the parentheses) and then replace that subshell with a new shell process (exec). Which shell to run can be specified using the variable $SHELL, but if that is not set it defaults to sh (which in this test setup links to osh). The new shell executes $unittestF (which is most likely a path to some script), and the output of that script is redirected to a file whose name is saved in the variable $stdoutF. The error output of the shell, which is also where the xtrace goes, is redirected to a file whose name is saved in $stderrF.
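Here is a small, self-contained sketch of the same pattern, so you can try it outside of shunit2. The file names and the DEMO_SHELL variable are stand-ins invented for this example; only the ( exec ... ) line mirrors the test code.

```shell
# Hypothetical stand-ins for shunit2's ${unittestF}, ${stdoutF} and ${stderrF}.
work_dir=$(mktemp -d)
unittestF="$work_dir/unittest"
stdoutF="$work_dir/stdout"
stderrF="$work_dir/stderr"

# A two-line script to trace.
printf '%s\n' 'VAR="hello"' 'echo "${VAR}"' > "$unittestF"

# The parentheses create a subshell; exec then replaces that subshell with
# the chosen shell, so the calling script itself keeps running afterwards.
# DEMO_SHELL is a made-up variable for this sketch; the default is sh.
( exec "${DEMO_SHELL:-sh}" -x "$unittestF" >"$stdoutF" 2>"$stderrF" )
status=$?

stdout_text=$(cat "$stdoutF")
stderr_text=$(cat "$stderrF")
echo "exit status: $status"
echo "stdout: $stdout_text"
echo "stderr (xtrace):"
printf '%s\n' "$stderr_text"
rm -rf "$work_dir"
```

The script’s own output and the xtrace end up in separate files, because the trace is written to stderr. The shunit2 test writes its trace to ${stderrF} in the same way.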

Let’s see what the output is:

+ set -u
+ SHUNIT_COLOR=none
+ . ./shunit2
+ command '[' -n '' ']'
+ SHUNIT_VERSION=2.1.8
+ SHUNIT_TRUE=0
+ SHUNIT_FALSE=1
.... # a bunch more lines like the ones above
+ boom
  boom() { x=$1; }  # This function goes boom if no parameters are passed!
             ^~
/tmp/shunit.PegkcD/tmp/unittest:4: fatal: Undefined variable '1'
+ command '[' 1 -ne 0 ']'
+ _shunit_error 'test_boom() returned non-zero return code.'
+ echo -e 'shunit2:ERROR test_boom() returned non-zero return code.'
shunit2:ERROR test_boom() returned non-zero return code.
+ __shunit_testSuccess=2
+ _shunit_incFailedCount
+ 67331 expr 0 '+' 1
+ __shunit_assertsFailed=1
+ 67332 expr 1 '+' 1
+ __shunit_assertsTotal=2
+ tearDown
....

This is already trimmed down a lot; it actually kept going for a few hundred lines after that. We can see the boom() function gets executed, but the rest of the lines are mostly noise. In my opinion, this doesn’t contain any super suspicious or helpful lines of code. There are a bunch of comparisons, the lines that look like command '[' x -eq y ']', but we are missing all the context of what they mean. You can also see a bunch of variables getting set and updated, but again no idea where they come from or how they are used. Even though this looks like a bunch of gibberish, and it still doesn’t feel like we are close to answering question 1, I’d like to make a jump to question 2 already (how do the problematic lines of code behave differently from ash and bash?). If we do the same set -x trick with ash, we might see how the output differs. That could give us a hint of where to look next. This is the output of running the same function with set -x but with ash:

+ boom  
/tmp/shunit.KKmmll/tmp/unittest: eval: line 4: 1: parameter not set
+ _shunit_cleanup EXIT
+ _shunit_name_=EXIT
+ '[' EXIT '!=' EXIT ]
+ '[' 1 -eq 1 ]
+ __shunit_clean=0
+ tearDown

Again, we don’t exactly need to understand what is going on, but there is already an interesting pattern here. Let’s assume here that + tearDown indicates the end of the test. If you compare the lines between + boom (our test calls the boom function here) and + tearDown in the two examples, you can see ash and osh execute different code. The code is actually very different, which is quite strange. If we want to understand what is going on here, knowing what code the shell is executing is not enough. We want to know where the code it is executing is coming from as well.
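With traces that are hundreds of lines long, it helps to cut out just the region between two markers before comparing. Here is a rough sketch of how that could be done; trace_between is a hypothetical helper written for this post, and the sample trace is a shortened copy of the ash output above.

```shell
# trace_between START END FILE (hypothetical helper): print the lines of FILE
# from the first line that starts with START up to and including the next
# line that starts with END.
trace_between() {
    awk -v start="$1" -v end="$2" '
        index($0, start) == 1 { printing = 1 }
        printing { print }
        printing && index($0, end) == 1 { exit }
    ' "$3"
}

# A shortened sample trace, standing in for a real xtrace log.
sample=$(mktemp)
printf '%s\n' '+ set -u' '+ boom' '+ _shunit_cleanup EXIT' \
    '+ __shunit_clean=0' '+ tearDown' '+ echo done' > "$sample"

region=$(trace_between '+ boom' '+ tearDown' "$sample")
printf '%s\n' "$region"
rm -f "$sample"
```

Saving the osh and ash traces to two files and running this on each gives two short snippets that are easy to put side by side or feed to diff.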

$PS4

We can use set -x and set +x to enable and disable the xtrace. But we can also customize the format of the xtrace quite a bit, using the $PS4 variable. PS4 stands for Prompt String 4 (there are other $PS variables too, but we don’t care about those now). By default the value of $PS4 is

$ echo $PS4
+

That’s why there is a + sign in front of all the xtrace commands. If we set it to % , a % will show up in front of our xtrace outputs:

$ PS4="% "
$ set -x
$ echo 'hello'
% echo hello
hello

Unless you really like % signs this isn’t super useful, but you can put a variable into the PS4 string and it will be expanded each time a new command is run:

$ PS4='${VARIABLE}: '
: PS4='${VARIABLE}: '
$ echo "hi!"
: echo 'hi!'
hi!
$ VARIABLE="1"
: VARIABLE=1
$ echo "hello!"
1: echo 'hello!'
hello!
$ VARIABLE="2"
1: VARIABLE=2
$ echo "bye"
2: echo bye
bye
$

As you can see, for each command the variable is expanded again. You could create your own debug variables that would be shown in the xtrace output, which could be pretty useful. Luckily you don’t even have to do that in most cases. Bash already has $BASH_SOURCE and $LINENO.

$BASH_SOURCE contains the path to the script that is currently being run.
$LINENO contains the current line number in the file that is being executed.

If we put these into the $PS4 variable, the control flow of the script can become a lot clearer. Here is a simple example without changing $PS4:

$ bash -x /tmp/main.sh # -x is the same as running with `set -x`
+ source /tmp/lib.sh   # We source some library files here.
+ source /tmp/lib_b.sh # This makes any functions defined in these files
+ source /tmp/lib_c.sh # available to call in our script
+ echo Starting
Starting
+ calculate
+ ANSWER=50
+ echo 'Result: 50'
Result: 50

If we want to see what calculate does, we wouldn’t know where to look. However, if we run the same script with $BASH_SOURCE and $LINENO in $PS4:

$ . /tmp/main.sh
+ :97: . /tmp/main.sh
++ /tmp/main.sh:2: source /tmp/lib.sh
++ /tmp/main.sh:3: source /tmp/lib_b.sh
++ /tmp/main.sh:4: source /tmp/lib_c.sh
++ /tmp/main.sh:6: echo Starting
Starting
++ /tmp/main.sh:7: calculate
++ /tmp/lib.sh:2: ANSWER=50
++ /tmp/main.sh:8: echo 'Result: 50'
Result: 50
++ :97: printf '\033]0;%s@%s:%s\007' bram bram-dell /tmp

You can clearly see now that after calling calculate $ANSWER gets set to 50 in /tmp/lib.sh, so that must be where that function is defined!

Note: you must source (or .) /tmp/main.sh. If you run it using bash /tmp/main.sh or ./main.sh you will run the script in a new shell, and in that new shell $PS4 will have the default value of just + again. source runs the script in your current shell, preserving the value of $PS4.
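If sourcing is not an option, another approach is to set $PS4 inside the script itself, so it travels along into the new shell. Below is a minimal sketch of that idea, assuming bash is installed. The files lib.sh and main.sh are throwaway examples created in a temporary directory, not the files from the trace above.

```shell
demo_dir=$(mktemp -d)

# A tiny library that defines one function.
cat > "$demo_dir/lib.sh" <<'EOF'
calculate() {
    ANSWER=50
}
EOF

# The main script sets PS4 itself, then enables xtrace.
cat > "$demo_dir/main.sh" <<'EOF'
PS4='+ ${BASH_SOURCE}:${LINENO}: '
set -x
source "${0%/*}/lib.sh"
calculate
echo "Result: ${ANSWER}"
EOF

# Keep only the xtrace (stderr) and throw away the script's normal output.
trace=$(bash "$demo_dir/main.sh" 2>&1 >/dev/null)
printf '%s\n' "$trace"
rm -rf "$demo_dir"
```

Each trace line now starts with the file and line number it came from, so the ANSWER=50 assignment shows up as coming from lib.sh even though main.sh was started as a separate process.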

Back to the problem

Let’s see if setting $PS4 helps us with our problem.

As I mentioned before, the testing setup of this package is quite complicated, since this is a shell unit test framework that is testing itself with itself. Let me give a brief overview of how it works. We have the testUnboundVariable function we have been looking at before. That function defines a test called test_boom and then sources the shunit2 framework through . ${TH_SHUNIT}. $TH_SHUNIT points to the file shunit2.sh, which is the testing framework executable; we were able to see this in our first run with set -x. You run the testing framework by sourcing the shunit2.sh file. It then runs every function whose name begins with test (I learned this from the package’s README.md). It looks a bit like this:

┌─────────────────────────────┐
│  shunit2_misc_tests.sh      │  (test definitions)
│                             │  
│  testUnboundVariable() {    │
│      test_boom() { ... }    │
│      . ./shunit2.sh     ────┼──┐
│  }                          │  │
│                             │  │  
└─────────────────────────────┘  │
                                 │ sources
                                 ↓
                    ┌────────────────────────────┐
                    │  shunit2.sh                │  (test framework)
                    │                            │
                    │  1. Initialize framework   │
                    │  2. Discover test_* funcs  │
                    │  3. Run each test:         │
                    │     - setUp()              │
                    │     - test_boom()    ←─────┼── calls back into
                    │     - tearDown()           │   shunit2_misc_tests.sh
                    │  4. Report results         │
                    └────────────────────────────┘

The most important part to know is that testUnboundVariable invokes shunit2.sh, which does some magic and runs the test_boom test. We want to know what kind of magic it is doing, because that magic is the code that differed between running the tests with osh and with ash (that’s what we were trying to figure out, remember?). I annotated the lines we want to decipher below:

# Running with osh:
+ boom            # <--- this is our test function
  boom() { x=$1; }  # This function goes boom if no parameters are passed!
             ^~
/tmp/shunit.PegkcD/tmp/unittest:4: fatal: Undefined variable '1'
+ command '[' 1 -ne 0 ']'                                            # but what is all this:
+ _shunit_error 'test_boom() returned non-zero return code.'         # <
+ echo -e 'shunit2:ERROR test_boom() returned non-zero return code.' # <
shunit2:ERROR test_boom() returned non-zero return code.             # <
+ __shunit_testSuccess=2                                             # <
+ _shunit_incFailedCount                                             # <
+ 67331 expr 0 '+' 1                                                 # <
+ __shunit_assertsFailed=1                                           # <
+ 67332 expr 1 '+' 1                                                 # <
+ __shunit_assertsTotal=2                                            # <
+ tearDown                                                           # <

# Running with ash
+ boom            # <--- again, our test case
/tmp/shunit.KKmmll/tmp/unittest: eval: line 4: 1: parameter not set
+ _shunit_cleanup EXIT      # where did this come from?
+ _shunit_name_=EXIT        # <
+ '[' EXIT '!=' EXIT ]      # <
+ '[' 1 -eq 1 ]             # <
+ __shunit_clean=0          # <
+ tearDown                  # <

Since shunit2.sh is running the actual test, I just defined the PS4 variable in that script, right at the top:

osh-0.36$ cat shunit2 | head -n 3
#! /bin/sh
PS4='${BASH_SOURCE}:${LINENO}: '
# vim:et:ft=sh:sts=2:sw=2

I enabled xtracing just like before:

( exec "${SHELL:-sh}" "-x" "${unittestF}" >"${stdoutF}" 2>"${stderrF}" )
# We added this ~~~~~~~^

The result:

...
./shunit2:1059: echo test_boom
./shunit2:1060: eval test_boom
...
/tmp/shunit.iHaEOA/tmp/unittest:8: boom
  boom() { x=$1; }  # This function goes boom if no parameters are passed!
             ^~
/tmp/shunit.iHaEOA/tmp/unittest:5: fatal: Undefined variable '1'
./shunit2:1061: command '[' 1 -ne 0 ']'
./shunit2:1062: _shunit_error 'test_boom() returned non-zero return code.'
...

Now we can look at line 1061 in shunit2 to see what it is comparing:

1058:  # Execute the test.
1059:  echo "${__SHUNIT_TEST_PREFIX}${_shunit_test_}"
1060:  eval "${_shunit_test_}"
1061:  if command [ $? -ne ${SHUNIT_TRUE} ]; then
1062:    _shunit_error "${_shunit_test_}() returned non-zero return code."
1063:    __shunit_testSuccess=${SHUNIT_ERROR}
1064:    _shunit_incFailedCount
1065:  fi

Okay, so we can see it executes the test function test_boom with eval. Then it checks whether the exit code of the eval call indicates an error. Or rather, if it does not equal (-ne) success, it reports an error. $? stores the exit code of the previously executed command, which is non-zero in case of a failure. In our xtrace we can see $? was 1, and $SHUNIT_TRUE is 0 (success):

./shunit2:1061: command '[' 1 -ne 0 ']'

Even though we expect our test_boom function to fail, it seems like the shunit2 script is not expecting a non-zero exit code. This is a bit strange.
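The eval-then-check-$? pattern from lines 1060 to 1065 can be reproduced in a few lines. This is a simplified sketch and not the real shunit2 code: run_test, failing_test and passing_test are names made up for the example, and only SHUNIT_TRUE=0 is taken from the trace.

```shell
SHUNIT_TRUE=0   # same value as in the xtrace above

# Two stand-in test functions (hypothetical).
passing_test() { return 0; }
failing_test() { return 1; }

# run_test NAME: run the named function through eval, then inspect $?.
run_test() {
    eval "$1"
    if [ $? -ne ${SHUNIT_TRUE} ]; then
        echo "$1() returned non-zero return code."
        return 1
    fi
    echo "$1() passed"
}

run_test passing_test
run_test failing_test || echo "(run_test reported the failure)"
```

The important detail is that $? has to be read immediately after eval, because every command that runs afterwards overwrites it.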

Comparing with ash (again)

Maybe comparing to ash we can see what shunit2 expects to happen:

tmp/shunit.pKGjjD/tmp/unittest: ./shunit2: line 377: BASH_SOURCE: parameter not set
${BASH_SOURCE}:${LINENO}: _ASSERT_NOT_NULL_='eval assertNotNull --lineno "${LINENO:-}"'
/tmp/shunit.pKGjjD/tmp/unittest: ./shunit2: line 409: BASH_SOURCE: parameter not set
${BASH_SOURCE}:${LINENO}: _ASSERT_SAME_='eval assertSame --lineno "${LINENO:-}"'
/tmp/shunit.pKGjjD/tmp/unittest: ./shunit2: line 441: BASH_SOURCE: parameter not set

Okay, it seems like ash doesn’t support $BASH_SOURCE (I guess that’s not completely unexpected, since it has BASH in its name). Luckily $LINENO is part of POSIX, so most shells at least support printing the line number:

/tmp/shunit.hBaILJ/tmp/unittest: line 4: $1: unbound variable
1: _shunit_cleanup EXIT
951: _shunit_name_=EXIT

This is indeed doing something different. The _shunit_cleanup EXIT call looks like it comes from a trap on EXIT. Yes, if we search shunit2 for “trap” we can find these lines:

# Setup traps to clean up after ourselves.
trap '_shunit_cleanup EXIT' 0

This “traps” EXIT (0 is another way to write EXIT), which means that when the shell exits, for whatever reason, it first runs _shunit_cleanup EXIT, no matter what it was doing before. Strictly speaking EXIT is not a real signal, but a special condition the shell lets you trap. This sounds like a great hint. If we go back to this code:

1058:  # Execute the test.
1059:  echo "${__SHUNIT_TEST_PREFIX}${_shunit_test_}"
1060:  eval "${_shunit_test_}"
1061:  if command [ $? -ne ${SHUNIT_TRUE} ]; then
1062:    _shunit_error "${_shunit_test_}() returned non-zero return code."
1063:    __shunit_testSuccess=${SHUNIT_ERROR}
1064:    _shunit_incFailedCount
1065:  fi

When we run this with ash, the unbound variable error inside the eval on line 1060 makes the shell exit right there. Exiting fires the EXIT trap, so _shunit_cleanup runs and then the script ends. Line 1061 is never reached, so the $? check never happens. With osh, the eval just returns with exit code 1 and the script keeps going: line 1061 sees the non-zero $?, the if statement is entered, and _shunit_error gets called. That must be the reason for the different behavior when running the tests in ash or osh. Let’s visualize the difference between ash and osh:

               ash                                    osh 
    ===========================            ===========================   
                                                     
┌─────────────────────────────────┐    ┌─────────────────────────────────┐
│  shunit2.sh (line 1060)         │    │  shunit2.sh (line 1060)         │
│                                 │    │                                 │
│  eval test_boom                 │    │  eval test_boom                 │
│    ↓                            │    │    ↓                            │
│    test_boom() executes         │    │    test_boom() executes         │
│    ↓                            │    │    ↓                            │
│    boom() called                │    │    boom() called                │
│    ↓                            │    │    ↓                            │
│    ERROR: $1 unbound variable   │    │    ERROR: $1 unbound variable   │
│    ↓                            │    │    ↓                            │
│  [EXIT signal triggered]        │    │  eval returns (error code 1)    │
│    ↓                            │    │    ↓                            │
│  _shunit_cleanup() runs         │    │  Line 1061: checks $? == 1      │
│    ↓                            │    │    ↓                            │
│  Script exits                   │    │  Line 1062: _shunit_error()     │
│                                 │    │    ↓                            │
│  Line 1061 never reached        │    │  Script continues...            │
│  $? check never happens         │    │    ↓                            │
│                                 │    │  Eventually...                  │
│                                 │    │  _shunit_cleanup() at end       │
└─────────────────────────────────┘    └─────────────────────────────────┘
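The EXIT trap on its own is easy to try out. The following sketch runs a child sh with a trap on 0, the same spelling shunit2 uses; the messages and the exit code 3 are arbitrary choices for the example.

```shell
trap_status=0
trap_out=$(sh -c '
    trap "echo cleanup ran" 0   # 0 is the EXIT condition, as in shunit2
    echo "working"
    exit 3
    echo "never printed"
') || trap_status=$?

printf '%s\n' "$trap_out"
echo "exit status: $trap_status"
```

The trap body runs as the child shell exits, after "working" has been printed, nothing after the exit is executed, and the exit status of the shell is still the 3 that was passed to exit.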

Who is to blame

Now we know where it is going wrong, but not yet what exactly. Is trap in osh just not working in this scenario for some reason? Let’s test this with a separate script:

$ cat /tmp/test
# instead of running a clean up function, we just print EXIT TRAP FIRED
trap 'echo "EXIT TRAP FIRED"' EXIT

set -u

test_function() {
    boom() { x=$1; }
    boom  # No argument
    echo "After boom"
}

echo "Calling test_function"
test_function
echo "After test_function, exit code: $?"
echo "Script continues"

$ ash /tmp/test     # Running the script with ash
Calling test_function
/tmp/test: line 6: $1: unbound variable
EXIT TRAP FIRED
$ echo $?
1
$ bin/osh /tmp/test  # Running the script with osh
Calling test_function
      boom() { x=$1; }
                 ^~
/tmp/test:6: fatal: Undefined variable '1'
EXIT TRAP FIRED
$ echo $?
1

No, they seem to behave exactly the same. But shunit2 wasn’t just calling the function, it was using eval to do it. Let’s add that to our script:

$ cat /tmp/test
trap 'echo "EXIT TRAP FIRED"' EXIT

set -u

test_function() {
    boom() { x=$1; }
    boom  # No argument
    echo "After boom"
}

echo "Calling eval test_function"
eval test_function    # Added eval here
echo "After test_function, exit code: $?"
echo "Script continues"

$ ash /tmp/test
Calling eval test_function
/tmp/test: eval: line 6: 1: parameter not set
EXIT TRAP FIRED

$ osh /tmp/test
Calling eval test_function
      boom() { x=$1; }
                 ^~
/tmp/test:6: fatal: Undefined variable '1'
After test_function, exit code: 1  # We didn't see this with ash
Script continues                   # Or this
EXIT TRAP FIRED                    # the trap fires all the way at the end

There is a clear difference! So this is an issue with eval and not with trap. For some reason, the exit behavior of eval in osh is different than that in ash. Let’s see how other shells behave:

$ bash /tmp/test
Calling eval test_function
/tmp/test: line 6: $1: unbound variable
EXIT TRAP FIRED # early exit, we don't see the "After test_function..." print

$ dash /tmp/test
Calling eval test_function
/tmp/test: 6: eval: 1: parameter not set
EXIT TRAP FIRED

$ zsh /tmp/test
Calling eval test_function
test_function:1: 1: parameter not set
Eval done
EXIT TRAP FIRED

$ mksh /tmp/test
Calling eval test_function
/tmp/test: 1: parameter not set
EXIT TRAP FIRED

Most of them seem to agree, with the exception of zsh. So it definitely seems that the way ash behaves is what is generally expected. We can simplify our example even more by leaving out trap entirely.

set -u
test_function() {
    x=$1
}

echo "Calling eval test_function"
eval test_function
echo "Shouldn't reach here"

And running it shows the difference again:

$ bash /tmp/test
Calling eval test_function
/tmp/test: line 4: $1: unbound variable
$ ash /tmp/test
Calling eval test_function
/tmp/test: eval: line 4: 1: parameter not set
$ bin/osh /tmp/test
Calling eval test_function
      x=$1
        ^~
/tmp/test:4: fatal: Undefined variable '1'
Shouldn't reach here

This is a great example we can use to show exactly what the unexpected behavior of osh is. So thanks to set -x and $PS4 we figured this out. Debugging this without those two tools would have been very painful (it already kind of was).
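Running the minimal example by hand in every shell gets tedious, so here is a sketch of a small loop that does it for whichever shells happen to be installed. check_shell is a hypothetical helper, and the list of shell names is simply the set of shells mentioned in this post.

```shell
# Write the minimal example to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
set -u
test_function() {
    x=$1
}
echo "Calling eval test_function"
eval test_function
echo "Shouldn't reach here"
EOF

# check_shell NAME (hypothetical helper): run the script with the given shell
# and report whether it kept going after the error inside eval.
check_shell() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1: not installed"
        return 0
    fi
    output=$("$1" "$script" 2>/dev/null) || true
    case "$output" in
        *"Shouldn't reach here"*) echo "$1: continued after eval" ;;
        *) echo "$1: exited inside eval" ;;
    esac
}

results=$(for shell_name in bash dash ash mksh zsh osh; do
    check_shell "$shell_name"
done)
printf '%s\n' "$results"
rm -f "$script"
```

A loop like this also doubles as a quick regression check once a fix is in place: the line for osh should then match the one for ash.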

Conclusion

I hope this shows in what scenarios set -x and $PS4 can be useful. I think they are great tools, especially if you are trying to figure out the control flow of a script, or if you want to know where certain parts of code are coming from. Using $BASH_SOURCE and $LINENO in $PS4 is a solid go-to (if your shell supports it). But you can do a lot more with this. For example, you could put a specific variable you are interested in into $PS4, and see its value for every line that gets executed.
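One way to avoid the "BASH_SOURCE: parameter not set" errors we ran into with ash might be to give the variables in $PS4 an empty default. This is only a sketch of that idea, I did not use it during the debugging session; portable_ps4 is a name made up for the example.

```shell
# ${BASH_SOURCE:-} and ${LINENO:-} fall back to an empty string when the
# variable is unset, so the prompt should no longer trip `set -u` in shells
# that lack one of them.
portable_ps4='+ ${BASH_SOURCE:-}:${LINENO:-}: '

# Try it in a child sh with `set -u` enabled, keeping only the xtrace (stderr).
trace=$(sh -c 'set -u; PS4=$1; set -x; echo hi' sh "$portable_ps4" 2>&1 >/dev/null)
printf '%s\n' "$trace"
```

In a shell without $BASH_SOURCE the file name part of the prefix simply stays empty, which is less informative but at least does not abort the script.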

Even though we found the issue, these tools still feel a bit crude at times. So I hope we can add some more advanced debugging tools to osh in the future that are easy to use. I think especially the ease of use can be improved. For example, technically it’s possible to make bash print out stack traces using set -x. But you have to change your $PS4 to:

PS4='+ $(printf "%*s" $((2 * ${#BASH_SOURCE[@]})) "")${BASH_SOURCE[0]}:${LINENO}:${FUNCNAME[0]:+${FUNCNAME[0]}(): }'

There has to be a better way than that.
