Recently, I’ve implemented several improvements for lxc exec. In case you didn’t know, lxc exec is LXD’s client tool that uses the LXD client API to talk to the LXD daemon and execute any program the user might want. Here is a small example of what you can do with it:
One of our main goals is to make lxc exec feel as similar to ssh as possible, since ssh is the standard way of running commands remotely, interactively or non-interactively. Making lxc exec behave nicely was tricky.
1. Handling background tasks
A long-standing problem was how to correctly handle background tasks. Here’s an asciinema illustration of the problem with a pre-LXD 2.7 instance:
What you can see there is that putting a task in the background prevents lxc exec from exiting. Many command sequences can trigger this problem:
chb@conventiont|~ > lxc exec zest1 bash
root@zest1:~# yes &
y
y
y
.
.
.
Nothing would save you now. yes will simply write to stdout till the end of time, as quickly as it can…
The root of the problem lies with stdout being kept open, which is necessary to ensure that any data written by the process the user has started is actually read and sent back over the websocket connection we established.
As you can imagine this becomes a major annoyance when you e.g. run a shell session in which you want to run a process in the background and then quickly want to exit. Sorry, you are out of luck. Well, you were.
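The same effect can be reproduced outside of LXD with a plain pipe standing in for the websocket. The following is only an analogy under that assumption, not LXD’s code, and the helper name time_until_eof is made up for illustration. It measures how long a reader has to wait for end-of-file when a background job has, or has not, inherited stdout:

```shell
#!/bin/sh
# Hypothetical helper: run a shell snippet with its stdout connected to a
# pipe and report how many whole seconds pass before the reader sees EOF.
time_until_eof() {
    start=$(date +%s)
    sh -c "$1" | cat >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

# The background sleep inherits stdout, so the pipe stays open until it exits.
inherited=$(time_until_eof 'sleep 2 & echo "foreground done"')

# With stdout redirected, only the foreground shell holds the pipe.
redirected=$(time_until_eof 'sleep 2 >/dev/null & echo "foreground done"')

echo "background job keeps stdout: reader waited ${inherited}s"
echo "background job redirected:   reader waited ${redirected}s"
```

In the first case the reader should be stuck for about as long as the background job runs, even though the foreground shell exited right away. Replace sleep 2 with yes and the reader never gets to leave.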
The first, naive approach is obviously to simply close stdout as soon as you detect that the foreground program (e.g. the shell) has exited. Not quite as good an idea as one might think… The problem becomes obvious when you then run quickly executing programs like:
lxc exec zest1 -- ls -al /usr/lib
where the lxc exec process (and the associated
forkexec process (Don’t worry about it now. Just remember that Go and setns() are not on speaking terms…)) exits before all buffered data in stdout was read. In this case you will get truncated output, and no one wants that. After a few approaches to the problem that involved disabling pty buffering (wasn’t pretty, I tell you, and also didn’t work predictably) and other weird ideas, I managed to solve this by employing a few poll() “tricks” (in some sense of the word “trick”). Now you can finally run background tasks and cleanly exit. To wit:
2. Reporting exit codes caused by signals
ssh is a wonderful tool. One thing I never really liked, however, was the fact that when the command run by ssh received a signal, ssh would always report -1, aka exit code 255. This is annoying when you’d like to know what signal caused the program to terminate. This is why I recently implemented the standard shell convention of reporting signal-caused exits as 128 + n, where n is the number of the signal that caused the executing program to exit. For example, on SIGKILL you would see 128 + SIGKILL = 137 (calculating the exit codes for other deadly signals is left as an exercise to the reader). So you can do:
chb@conventiont|~ > lxc exec zest1 sleep 100
Then send SIGKILL to the executing program (not to lxc exec itself, as SIGKILL is not forwardable):
kill -KILL $(pidof sleep)
and finally retrieve the exit code for your program:
chb@conventiont|~ > echo $?
137
Voila. This obviously only works nicely when a) the exit code doesn’t breach the 8-bit wall-of-computing and b) the executing program doesn’t use 137 to indicate success (which would be… interesting(?)). Neither objection seems too convincing to me. The former because most deadly signals should not breach the range. The latter because (i) that’s the user’s problem, (ii) these exit codes are actually reserved (I think), and (iii) you’d have the same problem running the program locally or otherwise.
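For scripts that want to act on this, the 128 + n convention can be decoded again with a few lines of shell. This is a minimal sketch: the helper name signal_from_status is made up, a local sleep stands in for the program started through lxc exec, and any status above 128 is simply assumed to encode a signal:

```shell
#!/bin/sh
# Hypothetical helper: print the signal number encoded in an exit status
# that follows the 128 + n convention, or nothing for a regular exit.
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        echo $(($1 - 128))
    fi
}

# Local stand-in for the program started through lxc exec.
sleep 100 &
pid=$!
kill -KILL "$pid"
status=0
wait "$pid" || status=$?

signum=$(signal_from_status "$status")
if [ -n "$signum" ]; then
    # kill -l maps a signal number back to its name.
    echo "exit status $status: killed by SIG$(kill -l "$signum")"
else
    echo "exit status $status: regular exit"
fi
```

With lxc exec you would run the command from the example above instead of the local sleep and pass $? to the helper in the same way.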
The main advantage I see in this is the ability to report back fine-grained exit statuses for executing programs. Note that by no means can we report back all instances where the executing program was killed by a signal; e.g. when your program handles SIGTERM and exits cleanly, there’s no easy way for LXD to detect this and report back that the program was killed by a signal. You will simply receive success, aka exit code 0.
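This limitation is not specific to LXD and can be seen locally. The sketch below, with a made-up helper run_and_term and small shell loops standing in for real programs, sends SIGTERM to a program without a handler and to one whose handler exits cleanly, then compares the exit statuses:

```shell
#!/bin/sh
# Hypothetical helper: start a shell snippet in the background, send it
# SIGTERM after a second, and store its exit status in last_status.
run_and_term() {
    sh -c "$1" &
    pid=$!
    sleep 1
    kill -TERM "$pid"
    last_status=0
    wait "$pid" || last_status=$?
}

# No handler: the shell is killed by SIGTERM, so the status is 128 + 15.
run_and_term 'while :; do sleep 1; done'
unhandled=$last_status

# Handler that exits cleanly: the signal leaves no trace in the status.
run_and_term 'trap "exit 0" TERM; while :; do sleep 1; done'
handled=$last_status

echo "without handler: $unhandled"
echo "with handler:    $handled"
```

Only the first program reports a signal-caused exit. The second one looks exactly like a program that finished on its own.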
3. Forwarding signals
This is probably the least interesting (or maybe it isn’t, no idea), but I found it quite useful. As you saw in the SIGKILL case before, I was explicit in pointing out that one must send SIGKILL to the executing program, not to the lxc exec command itself. This is due to the fact that SIGKILL cannot be handled in a program. The only thing the program can do is die… like right now… this instant… sofort… (You get the idea…). But a lot of other signals, such as SIGHUP and of course SIGUSR2, can be handled. So when you send signals that can be handled to lxc exec instead of the executing program, newer versions of LXD will forward the signal to the executing process. This is pretty convenient in scripts and so on.
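The general idea of forwarding can be sketched in plain shell. To be clear, this is not how LXD implements it. It is a hypothetical wrapper, forward_to_child, that traps a few handleable signals and passes them on to the command it started:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, forward SIGHUP, SIGTERM and SIGUSR2
# to it, and return the command's own exit status.
forward_to_child() {
    "$@" &
    child=$!
    got_signal=0
    trap 'got_signal=1; kill -HUP "$child" 2>/dev/null' HUP
    trap 'got_signal=1; kill -TERM "$child" 2>/dev/null' TERM
    trap 'got_signal=1; kill -USR2 "$child" 2>/dev/null' USR2

    # A trapped signal interrupts wait, so wait again until the child
    # itself has exited.
    while :; do
        status=0
        wait "$child" || status=$?
        [ "$got_signal" -eq 1 ] || break
        got_signal=0
    done
    trap - HUP TERM USR2
    return "$status"
}

# Demo: signal the wrapper, not the sleep it started.
forward_to_child sleep 30 &
wrapper=$!
sleep 1
kill -TERM "$wrapper"
wrapper_status=0
wait "$wrapper" || wrapper_status=$?
echo "wrapper exited with status $wrapper_status"
```

Since sleep does not handle SIGTERM, the wrapper should come back with 128 + SIGTERM, which ties in nicely with the exit code convention from the previous section. SIGKILL is missing from the trap list on purpose: it cannot be trapped.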
In any case, I hope you found this little lxc exec post/rant useful. Enjoy LXD; it’s a crazy beautiful beast to play with. Give it a try online at https://linuxcontainers.org/lxd/try-it/ and, for all you developers out there: check out https://github.com/lxc/lxd and send us patches.