011. pipe exclusion with splice() under Linux
Since apparently I'm the only splice(2) user, herein I demonstrate a fun Linux BKL moment (but the BKL is on a pipe).
If you use a splicing cat, you can do this right now from your teletype: just
$ cat | whatever
and whatever will sleep forever on reads from its standard input stream, even if it set O_NONBLOCK on it.
That's boring tho, since anonymous pipes are, well, anonymous. What about
$ mkfifo fifo
$ whatever < fifo &
$ cat > fifo
? The same applies! Even better:
$ > fifo
from another teletype sleeps forever as well.
So does the < direction. And O_NONBLOCK. And any operation on that pipe.
Try sending a deadly signal to any of the afflicted (non-cat) processes, too!
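For a concrete whatever to try this with, here's a minimal sketch of a reader that sets O_NONBLOCK on a descriptor and then attempts one read. probe_nonblocking() is a hypothetical helper name, not something from a real tool. On an idle, unaffected pipe the read fails straight away with EAGAIN, which is exactly the behaviour that goes missing here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode, then attempt one read.
 * Returns whatever read() returns; on an idle pipe that nobody is holding
 * locked, that is -1 with errno set to EAGAIN. */
ssize_t probe_nonblocking(int fd, void *buf, size_t len) {
	int fl = fcntl(fd, F_GETFL);
	if (fl == -1 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1)
		return -1;
	return read(fd, buf, len);
}
```

Calling probe_nonblocking(0, buf, sizeof(buf)) from a main() and running the result on the right-hand side of the cat | pipeline should be enough to see whether the read returns or sleeps.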
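If you don't have one, then
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
int main(void) {
	ssize_t rd, acc = 0;
	/* Move up to 128MiB per call from stdin to stdout until EOF or error. */
	while ((rd = splice(0, 0, 1, 0, 128 * 1024 * 1024, 0)) > 0)
		acc += rd;
	/* Report the byte count and the final errno (%m is a glibc extension). */
	fprintf(stderr, "sp=%zd: %m\n", acc);
}
can function as the proverbial cat.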
You can also substitute the splice(…) for a sendfile(1, 0, 0, 128 * 1024 * 1024) (declared in <sys/sendfile.h>), since, as a special case since Linux 5.12, sendfile(any→pipe) is legal and equivalent to splice() of the same, even though otherwise it only allows seekable→any.
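Spelled out, the sendfile() variant of the loop might look like the sketch below. sendfile_all() is a made-up name, and the descriptors are parameters instead of the hard-coded 0 and 1. It assumes Linux 5.12 or later whenever the input isn't seekable, in which case the output has to be a pipe.

```c
#define _GNU_SOURCE
#include <sys/sendfile.h>
#include <sys/types.h>

/* Hypothetical helper: the splice() loop, but with sendfile(2).
 * Passing 0 (NULL) as the offset makes sendfile() use in_fd's own position.
 * Returns the total number of bytes moved, or -1 if a call failed. */
ssize_t sendfile_all(int out_fd, int in_fd) {
	ssize_t rd, acc = 0;
	while ((rd = sendfile(out_fd, in_fd, 0, 128 * 1024 * 1024)) > 0)
		acc += rd;
	return rd < 0 ? -1 : acc;
}
```

With sendfile_all(1, 0) in main() this should behave like the splicing cat, hang included.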
But
Quite easily: splice_file_to_pipe(), which, shockingly, runs when splicing from a non-pipe to a pipe, locks the output pipe, then does I/O, then unlocks it.
Locking the pipe naturally excludes concurrent open()s, read()s, write()s, and final close()s (incl. implicit ones on death).
Usually you wouldn't think this to be a huge issue, since most I/O completes within some reasonably-bounded time, but teletype I/O, by design, never does until a newline/eof/eol/eol2.
And, thus, QED.
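The lock, then I/O, then unlock shape is easy to model in userspace. What follows is only an analogy under loud assumptions: pipe_mutex, locked_reader() and other_operation() are invented names, a pthread mutex stands in for the pipe's lock, and a read from an idle pipe stands in for the teletype read. Unlike the kernel, the "other operation" here uses trylock, so it can report the exclusion instead of sleeping on it.

```c
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Stand-in for the pipe's lock (hypothetical name). */
static pthread_mutex_t pipe_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the splicer: lock, block on input that may never come, unlock.
 * arg points at the descriptor to read from. */
void *locked_reader(void *arg) {
	int fd = *(int *)arg;
	char c;
	pthread_mutex_lock(&pipe_mutex);
	ssize_t n = read(fd, &c, 1);  /* sleeps here with the lock still held */
	(void)n;
	pthread_mutex_unlock(&pipe_mutex);
	return NULL;
}

/* Stand-in for any other operation on the pipe.
 * Returns 0 if it could take the lock, EBUSY if it is currently excluded. */
int other_operation(void) {
	int r = pthread_mutex_trylock(&pipe_mutex);
	if (r == 0)
		pthread_mutex_unlock(&pipe_mutex);
	return r;
}
```

As long as nothing arrives on the descriptor that locked_reader() is reading, other_operation() keeps returning EBUSY. Swap the trylock for a plain lock and you get the sleep-forever version.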
But
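#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
int main(void) {
	/* Open a fresh pty master that nothing will ever write to. */
	int pt = posix_openpt(O_RDWR);
	grantpt(pt);
	unlockpt(pt);
	/* Open its slave side read-only: reads from it never complete. */
	int cl = open(ptsname(pt), O_RDONLY);
	for(;;)
		splice(cl, 0, 1, 0, 128 * 1024 * 1024, 0);
}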
By rough-bisecting off snapshot.d.o kernel packages – since 4.0, and even 5.0, don't build on bookworm – to between 4.8.15-2 and 4.9.1-1~exp1, then manually bisecting between v4.8 and v4.9 – in a stretch chroot, naturally, since images built on buster hard-rebooted QEMU in a tight loop just after the decompressor and ELF parsing; strapping the chroot took two hours of baby-sitting due to the current state of s.d.o, and most revisions only build with an ubuntu patch; so much for never breaking fucking userspace – commit 8924feff66f35fe22ce77aafe3f21eb8e5cff881 ("splice: lift pipe_lock out of splice_to_pipe()") is the first bad commit.
(The smoketest is:
./v > fifo & read -r _ < fifo & echo zupa > fifo
good means it completes; bad means it hangs.)
This aligns with the origin of the modern pipe_lock() placement I got by recursive blame.
But
Depends if you're running, like, nullmailer, in which case
./v > /var/spool/nullmailer/trigger
makes it ⇒ any subsequent MUA ⇒ any subsequent sender (if wait()ing synchronously) enter the signal-impervious mutex-sleeping state, which can only be recovered from by killing the splicing process.
Good luck finding that, since this affects any ptracing process as well.
Or any other message or log collection system where – especially unprivileged – users write stuff to a pipe, since they've now been granted a total exclusion thereon.
Even in innocuous situations like QEMU with -chardev pipe,id=pipe,path=$HOME/uwu/q -serial chardev:pipe, catting to ~/uwu/q.in (besides only waking up every second line, which is just business as usual), excludes emulation.