Recently (late this week), a V4 UNIX tape was found and dumped. The relevant artifacts are as follows:
UNIX V4 tape from University of Utah (raw):
This is the raw analog waveform and the reconstructed digital tape image (analog.tap), read at the Computer History Museum's Shustek Research Archives on 19 December 2025 by Al Kossow using a modified tape reader and analyzed with Len Shustek's readtape tool.
UNIX Original from Bell Labs V4 (See Manual for format)
The size of the tape binary is very close to the V5 rootfs dump, so this is as-expected. The standard procedure is to turn the tape into a tarball for interactive use.
Easy enough: this will be a tp (V) tape (both per boot procedures (VII) and because there's just no alternative yet). The format is so trivial, in fact, that reading it amounts to 3 lines, which tell us the conte-
$ make tp && < analog.tap ./tp t
c++ -fdebug-default-version=3 tp.cpp -o tp
2572452 bytes, 5025 blocks
�dldr[0] 0:0
S�dtf[0] 0:0
/�list[0] 0:0
�Gmboot[0] 0:0
�Rmcopy[0] 0:0
%�rkf[0] 0:0
� tboot[0] 0:0
Ruboot[0] 0:0
L�[0] 0:0
[0] 0:0
Hm. Clearly not.
A raw image of a rootfs would be unprecedented for distribution, but possible? file system (V) wants some more hand-holding. But not much more just to read the super block:
$ make fs5tar && < v5root ./fs5tar
c++ -fdebug-default-version=3 fs5tar.cpp -o fs5tar
2494464 bytes, 4872 blocks
i-node blocks: 80
filesystem length: 4000 blocks
freelist: [ 3595 3594 3593 3592 3591 3590 3589 3588 3587 3586 3585 3584 3583 3582 3581 3580 3579 3578 3577 3576 3575 3574 3573 3572 3571 3570 3569 3568 3567 3566 3565 3564 3563 3560 3559 3558 3554 3553 3552 231 232 3015 3542 3541 3556 3507 3529 3550 3492 3538 ]
$ make fs5tar && < analog.tap ./fs5tar
make: 'fs5tar' is up to date.
2572452 bytes, 5025 blocks
i-node blocks: 0
filesystem length: 496 blocks
fs5tar: fs5tar.cpp:39: int main(int, const char *const *): Assertion `super_block->nfree <= 100' failed.
Aborted
It's right about the v5 dump so it has to be right about this one as well. What could this mean? Quoth readtape:
The output of the decoding can include:
- a log file
- multiple binary files of the reconstructed data separated at filemarks, or
- one SIMH .tap file that encodes data and filemarks (see http://simh.trailing-edge.com/docs/simh_magtape.pdf)
-[…]
So, easy enough, both conceptually and mechanically. Conceptually, the [redacted redacted] part above was "in a hard-to-use custom format that we didn't say what it was". Mechanically, re-run readtape on its input .tbin to re-create the output file. (These are binary tapes, so they're more like long disks than like modern file-oriented tapes.) It's unclear why this isn't part of the dump (so one has to assume general contempt toward the consumer). But never mind all that.
Naturally, the readtape repository is humongous even at --depth 1 (340M examples/, which leads to a 220M .git), the distribution is "there's 3 programs in this 16-file directory" (there is no build script of any sort), and it doesn't build at all. But that's standard for academic software and easy to work around.
What's harder to work around is that it just isn't a valid readtape .tbin:
$ ./readtape -outf=gaming analog.tbin
this is readtape version 3.16, compiled on Dec 21 2025 13:41:53, running on Sun Dec 21 15:09:29 2025
command line: ./readtape -outf=gaming analog.tbin
this is a little-endian computer
For more information, see https://github.com/LenShustek/readtape
reading file "analog.tbin"
the output files will be "gaming.xxx"
.tbin file header:
***FATAL ERROR: bad .tbin hdr size: 240, not 300
csvtbin says the same thing (except it segfaults if you give it the .tbin suffix). This error means that this file purports to be the right version, but just… isn't. The header isn't marked as packed (there's no padding anyway), and bears
//**** beware that changing these definitions may invalidate existing files!
as well as three members like
struct tm time_written; // when the analog tape data was written (9 integers)
which is factually wrong (a struct tm is at least 9 ints). Every unix I've looked at except Solaris (glibc/musl, {Net,Open,Free}BSD) has a long and const char * as well. struct tm instead of time_t here truly baffles the mind.
So: we don't know what system this was analysed on initially, and it's entirely possible it's a modified readtape. If it's an unmodified readtape, then each struct tm must have twenty fewer bytes on the original host than on amd64 glibc (the file's header is 240 bytes against the 300 compiled here, and 60 / 3 = 20)! One again wonders why you wouldn't include what you dumped with (in the form of a versioned readtape spec and/or the host metadata).
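The header-size arithmetic can be checked in a few lines. This is only a sketch: it assumes the error message prints the size stored in the file first (240) and the size compiled into this build second (300), and it models the amd64 glibc struct tm as nine ints, four bytes of padding, a long and a pointer.

```python
import struct

# Sizes taken from the readtape error message; which one belongs to the file
# (240) and which to this build (300) is an assumption about the message order.
HDR_IN_FILE, HDR_COMPILED = 240, 300
TM_MEMBERS = 3  # the header has three struct tm members

# amd64 glibc struct tm modelled as 9 ints + 4 pad bytes + long + pointer.
# "q" stands in for both 8-byte fields; "<" disables native alignment.
glibc_tm = struct.calcsize("<9i4xqq")
# A struct tm that is nothing but the nine ints the header comment describes.
bare_tm = struct.calcsize("<9i")

per_member_delta = (HDR_COMPILED - HDR_IN_FILE) // TM_MEMBERS
print(glibc_tm, bare_tm, per_member_delta)
```

Under those assumptions the per-member difference equals the gap between the glibc layout and a bare nine-int struct, which would fit a host whose struct tm has no extra members.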
readtape reads either a CSV or a .tbin, and per analog.log and analog.csvtbin.log, the dumpers used the CSV and then turned it into the .tbin for distribution. This renders all inputs to readtape (which is the entire dump except analog.tap and maybe v4.sal? if you want to come up with a new CSV for the samples) useless for third-party analysis. This is what you want from your once-in-a-lifetime discovery.
So: analog.tap.
The PDF says there's a library but it's for SIMH plugin use only.
Thankfully, dpkg -L simh | grep dump on bookworm shows a mtdump(1), which understands and turns analog.tap into blocks.
The container includes two errors:
$ mtdump analog.tap | grep -iA2 err
Error marker at record 4281
Obj 4281, position 2225600, record 4281, length = 511 (0x1FF)
Obj 4282, position 2226120, record 4282, length = 512 (0x200)
--
Error marker at record 4367
Obj 4367, position 2270320, record 4367, length = 512 (0x200)
Obj 4368, position 2270840, record 4368, length = 512 (0x200)
but those are the only goofy records (also an empty Reserved Marker at the end):
$ mtdump analog.tap | grep -o 'position [^,]*' | while read -r _ off; do od -An -t d4x4 -N 4 -j $off < analog.tap; done | paste - - | uniq -c
4280 512 00000200
1 -2147483137 800001ff
85 512 00000200
1 -2147483136 80000200
580 512 00000200
2 0 00000000
So the file is degenerate enough that we can strip off the top nibble, and dump everything straight (re-using the padding byte for the short record):
$ make simh2raw && < analog.tap ./simh2raw > v4tape
$ file -L v5root v4tape analog.tap
v5root:     PDP-11 executable
v4tape:     PDP-11 executable
analog.tap: Atari DEGAS Elite bitmap 640 x 400 x 2, color palette 0000 0701 dc01 0000 0000 ...
$ ls -lL v5root v4tape analog.tap
-rw-r--r-- 2,494,464 v5root
-rw-r--r-- 2,532,864 v4tape
-rw-r--r-- 2,572,452 analog.tap
A vast improvement! And it's even a multiple of 512 this time!
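What such a conversion has to do can be sketched in Python. This is not the simh2raw source (which isn't shown); `simh_to_raw` is a hypothetical name. It assumes the SIMH container layout seen in the mtdump output: a 4-byte little-endian length word, the data padded to an even length, and a trailing copy of the length word. Anything that isn't a plain or error-flagged data record is treated as a payload-free marker.

```python
import struct

def simh_to_raw(tap: bytes) -> bytes:
    """Concatenate the record payloads of a SIMH .tap image (sketch)."""
    out = bytearray()
    pos = 0
    while pos + 4 <= len(tap):
        (word,) = struct.unpack_from("<I", tap, pos)
        pos += 4
        # The top nibble is the record class, the rest is the length.
        cls, length = word >> 28, word & 0x0FFFFFFF
        if cls not in (0x0, 0x8) or length == 0:
            continue  # tape mark or other marker: no payload follows
        padded = length + (length & 1)  # data is padded to an even size
        # Emit the padding byte too, so a 511-byte record still fills a block.
        out += tap[pos:pos + padded]
        pos += padded + 4  # skip the payload and the trailing length word
    return bytes(out)
```

Keeping the padding byte is what keeps every record 512 bytes wide in the output, which matches the 4947 × 512 = 2,532,864 bytes above.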
So, once more, as a tp (V) (this time taking care to work the double-precision (32-bit) timestamp's endianness to get a reasonable result; the epoch matches ours by this point):
$ make tp && < v4tape ./tp t | tr -d '\0' | column -t
c++ -fdebug-default-version=3 tp.cpp -o tp
2532864 bytes, 4947 blocks
name      mode     uid:gid  size    mtime                     tape address  checksum
dldr[4]   0100777  6:1      58368   Wed Oct 24 16:19:46 1973  block=63      cs=0x9653? OK
dtf[3]    0100777  6:1      615936  Wed Oct 24 16:22:24 1973  block=64      cs=0xf72f? OK
list[4]   0100777  3:1      18944   Thu Oct 25 16:12:36 1973  block=69      cs=0x47ff? OK
mboot[5]  0100777  3:1      125952  Thu Oct 25 16:09:39 1973  block=70      cs=0x529c? OK
mcopy[5]  0100777  6:1      110592  Sun Nov 11 21:12:55 1973  block=71      cs=0xa025? OK
rkf[3]    0100777  6:1      20480   Sun Nov 11 21:04:51 1973  block=72      cs=0x0be5? OK
tboot[5]  0100777  3:1      115200  Thu Oct 25 16:10:40 1973  block=73      cs=0x527f? OK
uboot[5]  0100777  6:1      131072  Thu Oct 25 22:55:44 1973  block=74      cs=0xf34c? OK
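The endianness care mentioned above comes down to the PDP-11 convention for 32-bit values: two little-endian 16-bit words, high word first. A minimal sketch follows; the helper names are made up for illustration, and the sample timestamp is the dldr date from the listing, treated as UTC purely for the round trip.

```python
import calendar
import struct

def pdp_u32(raw: bytes) -> int:
    """Decode 4 bytes stored as two little-endian words, high word first."""
    hi, lo = struct.unpack("<HH", raw)
    return (hi << 16) | lo

def to_pdp_u32(value: int) -> bytes:
    """Inverse of pdp_u32, used here only to build a sample."""
    return struct.pack("<HH", value >> 16, value & 0xFFFF)

# Sample value: the dldr date from the listing, assumed to be UTC.
stamp = calendar.timegm((1973, 10, 24, 16, 19, 46))
raw = to_pdp_u32(stamp)

# Reading the same four bytes as a plain little-endian 32-bit integer
# swaps the two halves and lands on a different value.
naive = struct.unpack("<I", raw)[0]
print(pdp_u32(raw) == stamp, naive == stamp)
```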
boot procedures (VII) elucidates the purpose of the ?boots — loading unix from tape, DECtape, and file system (V), respectively — none of the others are mentioned in the manual. One has to assume rkf and dtf will format a rk (IV)/DECtape in some way, mcopy can copy (tp (V) files?) between tapes, and list would list the files on a tp (V) tape. Which leaves dldr.
And also that this is obviously wrong. The times seem roughly plausible (if a little old, but sure, bootloader tech doesn't really change much), but I did also massage them to get a roughly-plausible result. What's more concerning is that these are all humongous, totalling 1.2M, all the while the manual complains to high heavens about how little space uboot has (and 128k isn't really little here), and yet their blocks are located contiguously, as if most of them were no larger than a block.
Maybe this is a mt (I)/tap (I, V), à la V3? V4 still ships a mt (I) and tap (I) (but in manx/), and backwards-compatible install media would be unprecedented but not impossible. The format is incompatible (except for a name-only listing and checksums). Also, in V3, per file system (V):
There are two words with the calendar time (measured since 00:00 Jan 1, 1972);[…]. All the times are measured in sixtieths of a second.
so:
$ make tap && < v4tape ./tap t | tr -d '\0' | column -t
c++ -fdebug-default-version=3 tap.cpp -o tap
2532864 bytes, 4947 blocks
name      mode  uid   size  mtime                           tape address  unused      checksum
dldr[4]   0377  129:  262   Wed Jun 21 23:36:48 1972 48/60  block=1835    \x92\xff?   cs=0x9653? OK
dtf[3]    0377  129:  262   Thu Dec 30 00:53:36 1976 36/60  block=1836    0@          cs=0xf72f? OK
list[4]   0377  129:  259   Sat Feb 26 04:07:44 1972 44/60  block=1837    dOE         cs=0x47ff? OK
mboot[5]  0377  129:  259   Mon Jan  8 05:35:12 1973 12/60  block=1837    \xb3NF      cs=0x529c? OK
mcopy[5]  0377  129:  262   Thu Nov 23 17:19:12 1972 12/60  block=1859    G\xffG      cs=0xa025? OK
rkf[3]    0377  129:  262   Wed Mar  1 17:21:20 1972 20/60  block=1859    c\xfdH      cs=0x0be5? OK
tboot[5]  0377  129:  259   Thu Dec  7 09:00:00 1972 00/60  block=1837    \xf0NI      cs=0x527f? OK
uboot[5]  0377  129:  262   Tue Jan 23 09:40:32 1973 32/60  block=1837    \xe0\xadJ   cs=0xf34c? OK
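The V3 time convention quoted from file system (V) is a plain unit conversion. A minimal sketch, with `v3_time` as a hypothetical helper name and no claim about how the two words are combined on tape:

```python
from datetime import datetime, timedelta

# Epoch quoted above: 00:00 Jan 1, 1972, counted in sixtieths of a second.
V3_EPOCH = datetime(1972, 1, 1)

def v3_time(ticks: int) -> datetime:
    """Convert a count of 1/60 s ticks since the 1972 epoch to a datetime."""
    return V3_EPOCH + timedelta(seconds=ticks / 60)

print(v3_time(60 * 86400))  # one day's worth of ticks after the epoch
```

The remainder `ticks % 60` is what the listing shows as the trailing `NN/60` column.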
This isn't much better: the UID is huge, the sizes are too similar and repeat, the block addresses repeat, there's a 1976 date, and all unused sections have something in them. (In fact, the unused blob seems to be 17 bytes long, not 20, and tp (I) has a 16-byte unused blob preceded by a little-endian address, so it's more like that.)
So far, I've ignored non-self-describing methods for analysing the tape. However, there are at least 3 files that start in a way we know: all the ?boots have a common almost-initial sequence, and we have a reference version in block 0 (this also applies, more roughly, to the V5 dump):
$ hd v4tape -N 512
000000 07 01 dc 01 00 00 00 00 00 00 00 00 00 00 01 00 >................<
000010 c6 15 00 be c4 15 bc bf 81 11 c1 21 0b 86 00 0a >...........!....<
000020 17 22 07 01 02 02 c0 15 10 00 11 14 57 20 b4 bf >."..........W ..<
000030 fc 87 4e 00 f7 09 10 01 c5 15 44 bf c0 15 3d 00 >..N.......D...=.<
000040 cd 09 01 11 f5 09 02 00 17 20 0a 00 0c 03 17 20 >......... ..... <
All ?boots I've seen have a W…N sequence, and the V4 ones have a N…D sequence as well.
So:
$ hd v4tape | grep -F N.......D
000030 fc 87 4e 00 f7 09 10 01 c5 15 44 bf c0 15 3d 00 >..N.......D...=.<
008c30 fc 87 4e 00 f7 09 10 01 c5 15 44 bf c0 15 3d 00 >..N.......D...=.<
0cba30 fc 87 4e 00 f7 09 10 01 c5 15 44 bf c0 15 3d 00 >..N.......D...=.<
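Turning those hex offsets into block numbers is a division by 512. The same search can be sketched without going through a hex dump (which only matches when the bytes happen to share one 16-byte row); the signature bytes are copied from the dump, while `blocks_with` is a hypothetical helper.

```python
BLOCK = 512
# The nine bytes rendered as "N.......D" in the hex dump.
SIGNATURE = bytes.fromhex("4e00f7091001c51544")

def blocks_with(image: bytes, needle: bytes = SIGNATURE) -> list[int]:
    """Return the block number of every occurrence of needle in image."""
    found = []
    pos = image.find(needle)
    while pos != -1:
        found.append(pos // BLOCK)
        pos = image.find(needle, pos + 1)
    return found

# The three row offsets printed by grep, converted to block numbers.
print([off // BLOCK for off in (0x000030, 0x008C30, 0x0CBA30)])
```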
So there's a ?boot in block 0, block 70, and block 1629. Which certainly implies the tp (V) read-out is more real, and the size is just definitely wrong:
$ tar -tvaf v5root.tar.gz | grep boot$
-rw-r--r-- 3/1 492 1974-11-27 00:13 ./usr/mdec/mboot
-rw-r--r-- 3/1 452 1974-11-27 00:13 ./usr/mdec/tboot
And indeed it was:
$ make tp && < v4tape ./tp t | tr -d '\0' | column -t
c++ -fdebug-default-version=3 tp.cpp -o tp
2532864 bytes, 4947 blocks
dldr[4]   0100777  6:1  228   Wed Oct 24 16:19:46 1973  block=63  cs=0x9653? OK
dtf[3]    0100777  6:1  2406  Wed Oct 24 16:22:24 1973  block=64  cs=0xf72f? OK
list[4]   0100777  3:1  74    Thu Oct 25 16:12:36 1973  block=69  cs=0x47ff? OK
mboot[5]  0100777  3:1  492   Thu Oct 25 16:09:39 1973  block=70  cs=0x529c? OK
mcopy[5]  0100777  6:1  432   Sun Nov 11 21:12:55 1973  block=71  cs=0xa025? OK
rkf[3]    0100777  6:1  80    Sun Nov 11 21:04:51 1973  block=72  cs=0x0be5? OK
tboot[5]  0100777  3:1  450   Thu Oct 25 16:10:40 1973  block=73  cs=0x527f? OK
uboot[5]  0100777  6:1  512   Thu Oct 25 22:55:44 1973  block=74  cs=0xf34c? OK
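Putting the two tp listings side by side shows what went wrong numerically. The figures below are copied from the listings; reading the constant factor as a one-byte misalignment of the size field is a guess, since the parser's source isn't shown.

```python
import math

BLOCK = 512
# (name, first tape block, size in the first listing, size in the corrected one)
ENTRIES = [
    ("dldr", 63, 58368, 228),
    ("dtf", 64, 615936, 2406),
    ("list", 69, 18944, 74),
    ("mboot", 70, 125952, 492),
    ("mcopy", 71, 110592, 432),
    ("rkf", 72, 20480, 80),
    ("tboot", 73, 115200, 450),
    ("uboot", 74, 131072, 512),
]

# Every bogus size is exactly 256 times the corrected one.
off_by_one_byte = all(bogus == real * 256 for _, _, bogus, real in ENTRIES)

# With the corrected sizes, each file ends exactly where the next one starts.
contiguous = all(
    start + math.ceil(real / BLOCK) == following[1]
    for (_, start, _, real), following in zip(ENTRIES, ENTRIES[1:])
)

# First block past the last file.
end_of_archive = ENTRIES[-1][1] + math.ceil(ENTRIES[-1][3] / BLOCK)
print(off_by_one_byte, contiguous, end_of_archive)
```

The contiguity check is the useful part: with the corrected sizes the block addresses stop looking suspicious, and the archive ends right before block 75.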
At this point the reader should sardonically say that I fell for the same thing as the readtape author. And they are right to do so.
Extracting this is then trivial:
$ make tp && (cd v4tape.d; < ../v4tape ../tp x | tr -d '\0' | column -t); l v4tape.d/
c++ -fdebug-default-version=3 tp.cpp -o tp
2532864 bytes, 4947 blocks
-rwxrwxrwx   228 1973-10-24 dldr
-rwxrwxrwx 2,406 1973-10-24 dtf
-rwxrwxrwx    74 1973-10-25 list
-rwxrwxrwx   492 1973-10-25 mboot
-rwxrwxrwx   432 1973-11-11 mcopy
-rwxrwxrwx    80 1973-11-11 rkf
-rwxrwxrwx   450 1973-10-25 tboot
-rwxrwxrwx   512 1973-10-25 uboot
This confirms our suspicions: dtf self-IDs with "set up to format on drive 0" and ends with 3 empty blocks of buffers (rkf does neither). mcopy says "'p' for rp; 'k' for rk" and "disk offset"/"tape offset"/"count", so it presumably copies from disk to tape (or vice versa) by address range, and not files.
Most curious of all: how can list be so small when it does basically the same thing as [tm]boot (i.e.: read the tape as tp (V), parse the directory)?
It doesn't:
$ hd list
000000 07 01 3a 00 00 00 00 00 00 00 00 00 00 00 01 00 >..:.............<
000010 c1 15 0e 00 40 94 02 03 cd 09 fc 01 87 00 64 6c >....@.........dl<
000020 64 72 0a 64 74 66 0a 6c 69 73 74 0a 6d 62 6f 6f >dr.dtf.list.mboo<
000030 74 0a 6d 63 6f 70 79 0a 72 6b 66 0a 74 62 6f 6f >t.mcopy.rkf.tboo<
000040 74 0a 75 62 6f 6f 74 0a 00 00 >t.uboot...<
00004a
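The dump is small enough to take apart by hand. The sketch below re-enters the 74 bytes and assumes the usual early a.out layout (eight 16-bit words, the first being the 0407 magic and the second the text size), plus the reading that the first instruction word, 012701, is a MOV of an immediate into r1. Neither assumption comes from the article itself.

```python
import struct

# The 74 bytes of `list`, transcribed from the hex dump above.
LIST = bytes.fromhex(
    "07013a00000000000000000000000100"
    "c1150e0040940203cd09fc018700646c"
    "64720a6474660a6c6973740a6d626f6f"
    "740a6d636f70790a726b660a74626f6f"
    "740a75626f6f740a0000"
)

# Assumed header: eight little-endian words, magic first, text size second.
header = struct.unpack_from("<8H", LIST, 0)
magic, text_size = header[0], header[1]
text = LIST[16:16 + text_size]

# First instruction word and its operand; 0o012701 is read here as
# "mov $imm, r1", with the operand being an offset into the text.
insn, operand = struct.unpack_from("<HH", text, 0)

# The operand points at a NUL-terminated, newline-separated string.
names = text[operand:].split(b"\0")[0].decode("ascii").split()
print(oct(magic), text_size, oct(insn), operand, names)
```

So the listing is a fixed string baked into the program: 16 header bytes, 14 bytes of code, and the names.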
Much of the time, when analysing unix software, but especially from this era, you have to ask yourself: what's the simplest, stupidest, least-structured way of doing whatever? Given these givens:
the only hypotheses they allow are "the rootfs just follows the tp (V)" or "something more structured that hasn't been described so far". Please observe a recreation of my process at this juncture:
(uboot starts on block 74 and consists of 1 block; the first-past-the-end block is 75; the root filesystem is expected to start with an uboot (and, hilariously, the only way the rootfs's uboot differs from the tape's is that the starting 16 bytes have been removed)).
Thus, v4tape.75 has a 4000-block filesystem (just like v5root), so we can easily call it v4root.
However, it is a 4872-block file.
The remaining space is used for swap (this is standard: v5root has exactly the same layout, and mkfs (VIII) says, of the filesystem size parameter: "Typically it will be the number of blocks on the device, perhaps diminished by space for swapping."; also, it starts with what can't more obviously be userspace memory:
$ tail -c +$(( 1 + 4000 * 512 )) v4root | hd
000000 09 f0 80 11 26 12 d0 0b 36 10 02 00 f7 09 90 00 >....&...6.......<
000010 96 25 00 0a 01 89 48 61 6e 67 75 70 00 00 51 75 >.%....Hangup..Qu<
000020 69 74 00 00 49 6c 6c 65 67 61 6c 20 69 6e 73 74 >it..Illegal inst<
000030 72 75 63 74 69 6f 6e 00 54 72 61 63 65 2f 42 50 >ruction.Trace/BP<
000040 54 20 74 72 61 70 00 00 49 4f 54 20 74 72 61 70 >T trap..IOT trap<
000050 00 00 45 4d 54 20 74 72 61 70 00 00 46 6c 6f 61 >..EMT trap..Floa<
000060 74 69 6e 67 20 65 78 63 65 70 74 69 6f 6e 00 00 >ting exception..<
000070 4b 69 6c 6c 65 64 00 00 42 75 73 20 65 72 72 6f >Killed..Bus erro<
000080 72 00 4d 65 6d 6f 72 79 20 66 61 75 6c 74 00 00 >r.Memory fault..<
000090 42 61 64 20 73 79 73 74 65 6d 20 63 61 6c 6c 00 >Bad system call.<
0000a0 77 09 c0 11 04 00 f7 15 1a 20 92 21 f7 09 d2 0f >w........ .!....<
).
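The resulting tape layout is simple enough to write down as three slices. A minimal sketch, using only block counts stated in the text (75 blocks of tp (V) area, a 4000-block filesystem, the rest swap); `split_tape` is a hypothetical helper, not one of the tools used here.

```python
BLOCK = 512
TP_BLOCKS = 75     # boot block, tp (V) directory and files: blocks 0..74
FS_BLOCKS = 4000   # filesystem length reported by the super block

def split_tape(tape: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a raw tape image into (tp area, filesystem, swap)."""
    root = tape[TP_BLOCKS * BLOCK:]  # everything past the tp (V) area
    return (
        tape[:TP_BLOCKS * BLOCK],
        root[:FS_BLOCKS * BLOCK],
        root[FS_BLOCKS * BLOCK:],
    )

# Stand-in for v4tape: same length, all zeroes.
tp_area, filesystem, swap = split_tape(bytes(4947 * BLOCK))
print(len(tp_area) // BLOCK, len(filesystem) // BLOCK, len(swap) // BLOCK)
```

On the 4947-block image that leaves 4872 blocks after the tp (V) area, of which 872 are swap.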
Then, it's a simple mechanical matter of traversing the filesystem (which is itself so trivial that file system (V) gives a full implementation) and feeding it into libarchive (ironically, a little problematic):
$ make fs5tar && < v4root ./fs5tar > v4root.tar
c++ -fdebug-default-version=3 -O -std=c++20 fs5tar.cpp -larchive -o fs5tar
2494464 bytes, 4872 blocks
i-node blocks: 80
filesystem length: 4000 blocks
freelist: [ 3261 3275 3386 3456 3440 3444 3272 3370 3356 3369 3435 3434 2889 2891 2890 3421 3419 3417 3415 3414 3413 3412 3411 3410 3409 3408 3407 3405 3404 3403 3402 3401 3400 3399 3398 3396 3395 3394 3393 3392 3391 3390 3418 3263 3389 3429 3427 3430 3428 3426 3425 3424 3265 ]
super block mtime: Wed Jun 12 12:29:28 2019
i-node 1: ALLOCATED DIR 0755 links=9 3:1 size=160
1[0]: 1 ..
1[1]: 1 .
1[2]: 2 bin
1[3]: 59 dev
1[4]: 62 etc
1[5]: 75 lib
1[6]: 88 mnt
1[7]: 89 tmp
1[8]: 90 usr
1[9]: 460 unix
[…]
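The first few lines of that output come straight from the super block. A hedged sketch of reading it: the field order (i-node blocks, filesystem length, free count, free list, all 16-bit little-endian words at the start of block 1) is assumed from the printed fields and the quoted assertion, not taken from fs5tar.cpp, and `read_super_block` is a made-up name.

```python
import struct

BLOCK = 512

def read_super_block(image: bytes) -> dict:
    """Parse the leading super block fields from block 1 (assumed layout)."""
    isize, fsize, nfree = struct.unpack_from("<3H", image, BLOCK)
    if nfree > 100:
        # Same sanity check as the assertion quoted earlier.
        raise ValueError(f"nfree={nfree} > 100: probably not a filesystem")
    free = list(struct.unpack_from(f"<{nfree}H", image, BLOCK + 6))
    return {"isize": isize, "fsize": fsize, "nfree": nfree, "free": free}

# Synthetic image: an empty boot block, then a super block with three free blocks.
demo = bytes(BLOCK) + struct.pack("<3H3H", 80, 4000, 3, 3261, 3275, 3386)
print(read_super_block(demo))
```

Under this layout the earlier failure on analog.tap also makes sense: with 4-byte SIMH record framing in the way, the word at offset 516 is the first record's trailing length, 0x200, which is far over 100.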
The converter even supports multiply-linked files correctly (but there are none: < v4root ./fs5tar 2>&1 > /dev/null | grep 'i-node' | grep -v 'links=1 ' only returns directories).
There are no errors in the filesystem (| grep 'i-node' | grep -v ALLOCATED is empty).
There are 535 i-nodes, and the largest i-node number is 627.
This implies there were some deletions, and indeed:
$ < v4root ./fs5tar 2>&1 > /dev/null | grep ': 0'
75[5]: 0 c2
75[14]: 0 nhsw.o
75[15]: 0 hsw.o
195[88]: 0 a.out
195[89]: 0 l.out
195[90]: 0 a.out
459[3]: 0 l.out
459[11]: 0 a.out
459[12]: 0 a.out
459[13]: 0 c.o
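Deleted entries show up this way because a directory is just an array of fixed-size slots, and removing a file only zeroes the i-node number. A sketch assuming the classic 16-byte entry (a 16-bit i-number followed by a 14-byte NUL-padded name), which agrees with the root directory's size=160 holding ten entries; the helper names are hypothetical.

```python
import struct

ENTRY = 16  # assumed: 2-byte i-number + 14-byte name

def dir_entries(data: bytes):
    """Yield (i-number, name) for every slot in a directory's contents."""
    for off in range(0, len(data) - len(data) % ENTRY, ENTRY):
        ino, raw = struct.unpack_from("<H14s", data, off)
        yield ino, raw.rstrip(b"\0").decode("ascii", "replace")

def deleted(data: bytes) -> list[str]:
    """Names left behind in slots whose i-number has been zeroed."""
    return [name for ino, name in dir_entries(data) if ino == 0 and name]

# Synthetic directory: "..", ".", a live file and a removed one.
demo_dir = b"".join(
    struct.pack("<H14s", ino, name)
    for ino, name in [(1, b".."), (1, b"."), (2, b"bin"), (0, b"a.out")]
)
print(list(dir_entries(demo_dir)), deleted(demo_dir))
```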
This confirms the installer is, as usual, "copy our rootfs, unfuck it manually for release". We can also confirm the lay-out of the root disk matches the configuration:
$ tar -xOaf v4root.tar /usr/sys/conf/conf.c
tar: Removing leading `/' from member names
/*
* Copyright 1974 Bell Telephone Laboratories Inc
*/
int (*bdevsw[])()
{
&nulldev, &nulldev, &rkstrategy, &rktab,
&nulldev, &tcclose, &tcstrategy, &tctab,
&tmopen, &tmclose, &tmstrategy, &tmtab,
0
};
int (*cdevsw[])()
{
&klopen, &klclose, &klread, &klwrite, &klsgtty,
&nulldev, &nulldev, &rkread, &rkwrite, &nodev,
&tmopen, &tmclose, &tmread, &tmwrite, &nodev,
&dhopen, &dhclose, &dhread, &dhwrite, &dhsgtty,
&pcopen, &pcclose, &pcread, &pcwrite, &nodev,
0
};
int rootdev {(0<<8)|0};
int swapdev {(0<<8)|0};
int swplo 4000;
int nswap 872;
and see how preposterous the multiple-hundred-kilobyte figure really was for the installer environment programs:
$ tar -tvaf v4root.tar /etc /unix
tar: Removing leading `/' from member names
drwxr-xr-x 3/1     0 1974-06-12 22:55 /etc/
-rwxr--r-- 3/1   446 1974-06-10 14:37 /etc/getty
-rwxr-xr-x 3/1  2236 1974-06-10 14:37 /etc/glob
-rwxr--r-- 3/1  1950 1974-06-10 14:37 /etc/init
-rwxr--r-- 3/1  4136 1974-06-10 14:37 /etc/mkfs
-rwxr--r-- 3/1  1800 1974-06-10 14:37 /etc/mknod
-rwsr-xr-x 0/1  2078 1974-06-10 14:37 /etc/mount
-rwxr-xr-x 3/1   220 1974-06-10 14:37 /etc/msh
-rw-r--r-- 3/1    30 1974-06-10 14:37 /etc/passwd
-rw-r--r-- 3/1    70 1974-06-10 14:37 /etc/rc
-rw-r--r-- 3/1    56 1974-06-10 14:37 /etc/ttys
-rwsr-xr-x 0/1  1990 1974-06-10 14:37 /etc/umount
-rwxr-xr-x 3/1    32 1974-06-10 14:37 /etc/update
-rw-rw-rw- 3/1     0 1974-06-12 22:30 /etc/mtab
-rwxr-xr-x 3/1   814 1974-06-12 22:55 /etc/lpd
-rw-r--r-- 3/1 27624 1974-06-13 00:50 /unix
(/unix (and some of its object files) is the second-newest file on the system, beaten only by /usr/sys/conf, by 8 seconds).
So, what have we learned to-day?