This week I spent some time debugging a curious case where signals
were not being delivered to a sudo
’ed process.
Consider this shell script:
#!/bin/bash
sudo timeout 5 dd if=/dev/urandom of=/dev/null &
sudo_pid=$!
sudo kill $sudo_pid
if ps -p $sudo_pid &>/dev/null; then
echo Still running
else
echo Dead
fi
What should be printed? If you guessed “Dead”, then you would be correct:
$ ./sudo_kill.sh
Dead
However, when run in CI, I was seeing the opposite. The following is run from a recreated CI environment (through docker):
docker@cd71223430f4$ ./sudo_kill.sh
Still running
Before going deeper, we need a little background on how
sudo
handles signals.
According to the man page:
When the command is run as a child of the sudo process, sudo will relay signals it receives to the command. The SIGINT and SIGQUIT signals are only relayed when the command is being run in a new pty or when the signal was sent by a user process, not the kernel. This prevents the command from receiving SIGINT twice each time the user enters control-C. Some signals, such as SIGSTOP and SIGKILL, cannot be caught and thus will not be relayed to the command.
This makes sense. Signals need to be relayed by sudo
to
the process it’s managing. Otherwise, scripts would need to calculate
the child PID or send signals to the appropriate process group (a rather
arcane concept). It is much simpler to run a backgrounded
sudo
instance and send signals to $!
(as we
did in the test script).
However, there are some drawbacks to this approach. The documentation goes on to say:
As a special case, sudo will not relay signals that were sent by the command it is running. This prevents the command from accidentally killing itself. On some systems, the reboot(8) utility sends SIGTERM to all non-system processes other than itself before rebooting the system. This prevents sudo from relaying the SIGTERM signal it received back to reboot(8), which might then exit before the system was actually rebooted, leaving it in a half-dead state similar to single user mode.
While this note is not directly related to our curious issue (b/c we
are not getting dd
to send signals), it is important to
note there are special cases carved out.
Keeping in mind the aforementioned signal propagation corner cases,
the obvious suspect is a change in sudo
. From comparing the
local version to the CI version (1.9.13p2
vs
1.8.21p2
), we discover this commit
which landed in 1.9.13p1
.
The old behavior (before the change) was to:
Only forward user-generated signals not sent by a process in the command’s own process group. […]
This explains the discrepancy we saw earlier. bash
runs
all the commands in the script in the same process group, so
kill
’s signals were being ignored by sudo
b/c
kill
is in the same process group as dd
.
In my opinion this is surprising and unexpected, if only because this
behavior is undocumented. That being said, I did manage to find a zero
upvote stack overflow
answer for a setsid
workaround. But I think that just
proves my point.
Fortunately, as of November 2022, the new behavior is to:
[…] forward signals from a process in the same pgrp if the pgrp leader is not either sudo or the command itself. […]
which fixes the script use case and obviates the need for a
setsid
workaround.
I had remarked a few months ago I was surprised the sudo project has over 12,000 commits. Now I am no longer surprised. Behind a rather simple interface (from the common use case perspective) lies a mountain of complexity. That being said, I’m still quite surprised it took over 40 years for this surprising and undocumented behavior to be fixed.