022. Turning the V4 unix dump into Utah_v4/

Sun, 21 Dec 2025 23:51:02 +0100; on the archive

Recently (late this week), a v4 unix tape was found and dumped. The relevant artifacts are as follows; UNIX V4 tape from University of Utah (raw):

The size of the tape binary is very close to the V5 rootfs dump, so this is as-expected. The standard procedure is to turn the tape into a tarball for interactive use.

Easy enough: this will be a tp (V) tape (both per boot procedures (VII) and because there's just no alternative yet). The format is so trivial, in fact, that reading it amounts to 3 lines, which tell us the conte-

$ make tp && < analog.tap ./tp t
c++ -fdebug-default-version=3 tp.cpp -o tp
2572452 bytes, 5025 blocks
�dldr[0] 0:0
S�dtf[0] 0:0
/�list[0] 0:0
�Gmboot[0] 0:0
�Rmcopy[0] 0:0
%�rkf[0] 0:0
�
 tboot[0] 0:0
Ruboot[0] 0:0
L�[0] 0:0
[0] 0:0

Hm. Clearly not.

A raw image of a rootfs would be unprecedented for distribution, but possible? file system (V) wants some more hand-holding. But not much more just to read the super block:

$ make fs5tar && < v5root ./fs5tar
c++ -fdebug-default-version=3 fs5tar.cpp -o fs5tar
2494464 bytes, 4872 blocks
i-node blocks: 80
filesystem length: 4000 blocks
freelist: [ 3595 3594 3593 3592 3591 3590 3589 3588 3587 3586 3585 3584 3583 3582 3581 3580 3579 3578 3577 3576 3575 3574 3573 3572 3571 3570 3569 3568 3567 3566 3565 3564 3563 3560 3559 3558 3554 3553 3552 231 232 3015 3542 3541 3556 3507 3529 3550 3492 3538 ]
$ make fs5tar && < analog.tap ./fs5tar
make: 'fs5tar' is up to date.
2572452 bytes, 5025 blocks
i-node blocks: 0
filesystem length: 496 blocks
fs5tar: fs5tar.cpp:39: int main(int, const char *const *): Assertion `super_block->nfree <= 100' failed.
Aborted

It's right about the v5 dump so it has to be right about this one as well. What could this mean? Quoth readtape:

The output of the decoding can include:
  - a log file
  - multiple binary files of the reconstructed data separated at filemarks, or
  - one SIMH .tap file that encodes data and filemarks (see http://simh.trailing-edge.com/docs/simh_magtape.pdf)
  - […]

So, easy enough, both conceptually and mechanically. Conceptually, the [redacted redacted] part above was "in a hard-to-use custom format that we didn't say what it was". Mechanically: re-run readtape on its input .tbin to re-create the output file. (These are binary tapes, so they're more like long disks than like modern file-oriented tapes.) It's unclear why this isn't part of the dump (so one has to assume general contempt toward the consumer). But never mind all that.

Naturally, the readtape repository is humongous even at --depth 1 (340M examples/, which leads to a 220M .git), the distribution is "there's 3 programs in this 16-file directory" (there is no build script of any sort), and it doesn't build at all. But that's standard for academic software and easy to work around.

What's harder to work around is that it just isn't a valid readtape .tbin:

$ ./readtape -outf=gaming analog.tbin
this is readtape version 3.16, compiled on Dec 21 2025 13:41:53, running on Sun Dec 21 15:09:29 2025
  command line: ./readtape -outf=gaming analog.tbin
  this is a little-endian computer
  For more information, see https://github.com/LenShustek/readtape
reading file "analog.tbin"
the output files will be "gaming.xxx"
.tbin file header:
***FATAL ERROR: bad .tbin hdr size: 240, not 300

csvtbin says the same thing (except it segfaults if you give it the .tbin suffix). This error means that the file purports to be the right version, but just… isn't. The header isn't marked as packed (there's no padding anyway), and bears

//**** beware that changing these definitions may invalidate existing files!

as well as three members like

         struct tm time_written;    // when the analog tape data was written (9 integers)

which is factually wrong (a struct tm is at least 9 ints). Every unix I've looked at except Solaris (glibc/musl, {Net,Open,Free}BSD) has a long and const char * as well. struct tm instead of time_t here truly baffles the mind.

So: we don't know what system this was analysed on initially, and it's entirely possible it's a modified readtape. If it's an unmodified readtape, then each struct tm must have twenty fewer bytes on the original host than on amd64 glibc (the file's header is 240 bytes where the local build expects 300, spread over three members)! One, again, wonders why you wouldn't include what you dumped with (in the form of a versioned readtape spec and/or the host metadata).
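The sixty-byte gap between the two header sizes splits evenly over the three struct tm members. A small Python sketch of that arithmetic follows. The amd64 glibc layout is written out by hand rather than taken from the running host, and the nine-ints-only layout for the original host is an assumption, not something the dump states.

```python
import struct

# Header sizes from the readtape error message above.
HDR_IN_FILE = 240   # what analog.tbin claims
HDR_LOCAL = 300     # what the local amd64 glibc build expects
TM_MEMBERS = 3      # struct tm members in the header, per the text

# amd64 glibc struct tm, laid out by hand: 9 ints, 4 bytes of padding,
# long tm_gmtoff, const char *tm_zone. "<" disables native alignment,
# so the padding is written explicitly as "4x".
TM_GLIBC_AMD64 = struct.calcsize("<9i4xqq")

# ASSUMPTION: a host whose struct tm is just the nine ints.
TM_NINE_INTS = struct.calcsize("<9i")

per_member = (HDR_LOCAL - HDR_IN_FILE) // TM_MEMBERS
print(per_member, TM_GLIBC_AMD64 - TM_NINE_INTS)
```

If both printed numbers agree, a nine-int struct tm on the original host would account for the whole mismatch without any modification to readtape.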

readtape reads either a CSV or a .tbin, and per analog.log and analog.csvtbin.log, the dumpers used the CSV and then turned it into the .tbin for distribution. This renders all inputs to readtape (which is the entire dump except analog.tap and maybe v4.sal? if you want to come up with a new CSV for the samples) useless for third-party analysis. This is what you want from your once-in-a-lifetime discovery.

So: analog.tap. The PDF says there's a library but it's for SIMH plugin use only. Thankfully, dpkg -L simh | grep dump on bookworm shows a mtdump(1), which understands and turns analog.tap into blocks. The container includes two errors:

$ mtdump analog.tap | grep -iA2 err
Error marker at record 4281
Obj 4281, position 2225600, record 4281, length = 511 (0x1FF)
Obj 4282, position 2226120, record 4282, length = 512 (0x200)
--
Error marker at record 4367
Obj 4367, position 2270320, record 4367, length = 512 (0x200)
Obj 4368, position 2270840, record 4368, length = 512 (0x200)

but those are the only goofy records (also an empty Reserved Marker at the end):

$ mtdump analog.tap | grep -o 'position [^,]*' | while read -r _ off; do od -An -t d4x4 -N 4 -j $off < analog.tap; done | paste - - | uniq -c
   4280 512 00000200
      1 -2147483137 800001ff
     85 512 00000200
      1 -2147483136 80000200
    580 512 00000200
      2 0 00000000

So the file is degenerate enough that we can strip off the top nibble, and dump everything straight (re-using the padding byte for the short record):
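A minimal Python sketch of such a converter (this is not the simh2raw used here; the function name is made up). It assumes the SIMH container layout: a little-endian 32-bit length word, the data padded to an even length, then a trailing copy of the length word. Treating every class-F word as a data-less marker is likewise an assumption.

```python
import struct

def simh_to_raw(tap: bytes) -> bytes:
    """Concatenate the data of every record in a SIMH .tap image."""
    out = bytearray()
    pos = 0
    while pos + 4 <= len(tap):
        (word,) = struct.unpack_from("<I", tap, pos)
        pos += 4
        if word == 0:                    # tape mark: no data follows
            continue
        if word >> 28 == 0xF:            # ASSUMPTION: class-F markers carry no data
            continue
        length = word & 0x0FFFFFFF       # strip the top nibble (error flag and class)
        padded = length + (length & 1)   # records are padded to an even length
        out += tap[pos:pos + padded]     # keep the pad byte for the short record
        pos += padded + 4                # skip the trailing copy of the length
    return out
```

Keeping the pad byte means the 511-byte record still occupies a full 512-byte block in the output, so every later block stays aligned.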

$ make simh2raw && < analog.tap ./simh2raw > v4tape
$ file -L v5root v4tape analog.tap
v5root: PDP-11 executable
v4tape: PDP-11 executable
analog.tap: Atari DEGAS Elite bitmap 640 x 400 x 2, color palette 0000 0701 dc01 0000 0000 ...
$ ls -lL v5root v4tape analog.tap
-rw-r--r-- 2,494,464 v5root
-rw-r--r-- 2,532,864 v4tape
-rw-r--r-- 2,572,452 analog.tap

a vast improvement! and it's even a multiple of 512 this time! So, once more, as a tp (V) (this time taking care to work the double-precision (32-bit) timestamp's endianness to get a reasonable result; the epoch matches ours by this point):

$ make tp && < v4tape ./tp t | tr -d '\0' | column -t
c++ -fdebug-default-version=3    tp.cpp   -o tp
2532864 bytes, 4947 blocks
name      mode   uid:gid size   mtime                         tape address checksum
dldr[4]   0100777  6:1  58368   Wed  Oct  24  16:19:46  1973  block=63  cs=0x9653?  OK
dtf[3]    0100777  6:1  615936  Wed  Oct  24  16:22:24  1973  block=64  cs=0xf72f?  OK
list[4]   0100777  3:1  18944   Thu  Oct  25  16:12:36  1973  block=69  cs=0x47ff?  OK
mboot[5]  0100777  3:1  125952  Thu  Oct  25  16:09:39  1973  block=70  cs=0x529c?  OK
mcopy[5]  0100777  6:1  110592  Sun  Nov  11  21:12:55  1973  block=71  cs=0xa025?  OK
rkf[3]    0100777  6:1  20480   Sun  Nov  11  21:04:51  1973  block=72  cs=0x0be5?  OK
tboot[5]  0100777  3:1  115200  Thu  Oct  25  16:10:40  1973  block=73  cs=0x527f?  OK
uboot[5]  0100777  6:1  131072  Thu  Oct  25  22:55:44  1973  block=74  cs=0xf34c?  OK
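The word-order juggling mentioned before the listing can be sketched in Python as follows. It assumes the usual PDP-11 convention for a 32-bit value (high word first, each word little-endian). Rendering the result in UTC is a choice made for the sketch, not something the tape specifies.

```python
import struct
from datetime import datetime, timezone

def pdp11_long(raw: bytes) -> int:
    """Decode a PDP-11 32-bit value: high word first, each word little-endian."""
    hi, lo = struct.unpack("<HH", raw)
    return (hi << 16) | lo

def mtime(raw: bytes) -> datetime:
    # Seconds since 1970-01-01, rendered in UTC (the zone is an assumption).
    return datetime.fromtimestamp(pdp11_long(raw), tz=timezone.utc)
```

Reading the same four bytes as a plain little-endian 32-bit integer swaps the two halves, which lands dates decades away from 1973.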

boot procedures (VII) elucidates the purpose of the ?boots — loading unix from tape, DECtape, and file system (V), respectively — none of the others are mentioned in the manual. One has to assume rkf and dtf will format a rk (IV)/DECtape in some way, mcopy can copy (tp (V) files?) between tapes, and list would list the files on a tp (V) tape. Which leaves dldr.

And also that this is obviously wrong. The times seem roughly plausible (if a little old, but sure, bootloader tech doesn't really change much), but I did also massage them to get a roughly-plausible result. What's more concerning is that these are all humongous, totalling 1.2M, all the while the manual complains to high heavens about how little space uboot has (and 128k isn't really little here), and yet their blocks are located contiguously, as if most of them were no larger than a block.

Maybe this is a mt (I)/tap (I, V), à la V3? V4 still ships a mt (I) and tap (I) (but in manx/), and backwards-compatible install media would be unprecedented but not impossible. The format is incompatible (except for a name-only listing and checksums). Also, in V3, per file system (V):

There are two words with the calendar time (measured since 00:00 Jan 1, 1972); […]. All the times are measured in sixtieths of a second.

so:

$ make tap && < v4tape ./tap t | tr -d '\0' | column -t
c++ -fdebug-default-version=3    tap.cpp   -o tap
2532864 bytes, 4947 blocks
name      mode  uid   size mtime                                tape address unused    checksum
dldr[4]   0377  129:  262  Wed  Jun  21  23:36:48  1972  48/60  block=1835  \x92\xff?  cs=0x9653?  OK
dtf[3]    0377  129:  262  Thu  Dec  30  00:53:36  1976  36/60  block=1836  0@         cs=0xf72f?  OK
list[4]   0377  129:  259  Sat  Feb  26  04:07:44  1972  44/60  block=1837  dOE        cs=0x47ff?  OK
mboot[5]  0377  129:  259  Mon  Jan  8   05:35:12  1973  12/60  block=1837  \xb3NF     cs=0x529c?  OK
mcopy[5]  0377  129:  262  Thu  Nov  23  17:19:12  1972  12/60  block=1859  G\xffG     cs=0xa025?  OK
rkf[3]    0377  129:  262  Wed  Mar  1   17:21:20  1972  20/60  block=1859  c\xfdH     cs=0x0be5?  OK
tboot[5]  0377  129:  259  Thu  Dec  7   09:00:00  1972  00/60  block=1837  \xf0NI     cs=0x527f?  OK
uboot[5]  0377  129:  262  Tue  Jan  23  09:40:32  1973  32/60  block=1837  \xe0\xadJ  cs=0xf34c?  OK

This isn't much better: the UID is huge, the sizes are too similar and repeat, the block addresses repeat, there's a 1976 date, and all unused sections have something in them (in fact, the unused blob part seems to be 17 bytes long, not 20, and tp (I) has a 16-byte unused blob preceded by a little-endian address, so it's more like that).
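For reference, the V3 time convention quoted above (sixtieths of a second since 00:00 Jan 1, 1972) converts to a calendar date as in this Python sketch. The function name is made up, and how the two words combine into one tick count is left out here.

```python
from datetime import datetime, timedelta

V3_EPOCH = datetime(1972, 1, 1)

def v3_time(ticks: int) -> datetime:
    """Convert a count of sixtieths of a second since 1972-01-01."""
    seconds, sixtieths = divmod(ticks, 60)
    return V3_EPOCH + timedelta(seconds=seconds,
                                microseconds=sixtieths * 1_000_000 // 60)
```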

So far, I've ignored non-self-describing methods for analysing the tape. However, there's at least 3 files that start in a way we know: all the ?boots have a common almost-initial sequence, and we have a reference version in block 0 (this also applies, more roughly, to the V5 dump):

$ hd v4tape -N 512
000000 07 01 dc 01 00 00 00 00 00 00 00 00 00 00 01 00  >................<
000010 c6 15 00 be c4 15 bc bf 81 11 c1 21 0b 86 00 0a  >...........!....<
000020 17 22 07 01 02 02 c0 15 10 00 11 14 57 20 b4 bf  >."..........W ..<
000030 fc 87 4e 00 f7 09 10 01 c5 15 44 bf c0 15 3d 00  >..N.......D...=.<
000040 cd 09 01 11 f5 09 02 00 17 20 0a 00 0c 03 17 20  >......... ..... <

all ?boots I've seen have a WN sequence, and the V4 ones have a ND sequence as well. So:

$ hd v4tape | grep -F N.......D
000030 fc 87 4e 00 f7 09 10 01 c5 15 44 bf c0 15 3d 00  >..N.......D...=.<
008c30 fc 87 4e 00 f7 09 10 01 c5 15 44 bf c0 15 3d 00  >..N.......D...=.<
0cba30 fc 87 4e 00 f7 09 10 01 c5 15 44 bf c0 15 3d 00  >..N.......D...=.<

So there's a ?boot in block 0, block 70, and block 1629. Which certainly implies the tp (V) read-out is more real, and the size is just definitely wrong:

$ tar -tvaf v5root.tar.gz | grep boot$
-rw-r--r-- 3/1             492 1974-11-27 00:13 ./usr/mdec/mboot
-rw-r--r-- 3/1             452 1974-11-27 00:13 ./usr/mdec/tboot

And indeed it was:

$ make tp && < v4tape ./tp t | tr -d '\0' | column -t
c++ -fdebug-default-version=3    tp.cpp   -o tp
2532864 bytes, 4947 blocks
dldr[4]   0100777  6:1  228   Wed  Oct  24  16:19:46  1973  block=63  cs=0x9653?  OK
dtf[3]    0100777  6:1  2406  Wed  Oct  24  16:22:24  1973  block=64  cs=0xf72f?  OK
list[4]   0100777  3:1  74    Thu  Oct  25  16:12:36  1973  block=69  cs=0x47ff?  OK
mboot[5]  0100777  3:1  492   Thu  Oct  25  16:09:39  1973  block=70  cs=0x529c?  OK
mcopy[5]  0100777  6:1  432   Sun  Nov  11  21:12:55  1973  block=71  cs=0xa025?  OK
rkf[3]    0100777  6:1  80    Sun  Nov  11  21:04:51  1973  block=72  cs=0x0be5?  OK
tboot[5]  0100777  3:1  450   Thu  Oct  25  16:10:40  1973  block=73  cs=0x527f?  OK
uboot[5]  0100777  6:1  512   Thu  Oct  25  22:55:44  1973  block=74  cs=0xf34c?  OK

At this point the reader should sardonically say that I fell for the same thing as the readtape author. And they are right to do so.
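Every size in the first tp listing is exactly 256 times its counterpart in the corrected one (615936 against 2406 for dtf, for example). One reading consistent with that, sketched below in Python, is a three-byte size field whose first byte is the most significant, followed by a little-endian word. Both that layout and the example bytes are reconstructions from the two listings, not taken from tp.cpp or the tape.

```python
def size_lsb_first(b: bytes) -> int:
    # Treat the three size bytes as one little-endian 24-bit number.
    return b[0] | (b[1] << 8) | (b[2] << 16)

def size_high_byte_first(b: bytes) -> int:
    # ASSUMPTION: first byte holds the top 8 bits, followed by a
    # little-endian 16-bit word holding the low 16 bits.
    return (b[0] << 16) | b[1] | (b[2] << 8)

# Hypothetical size field for dtf, reconstructed from the listings.
dtf = bytes([0x00, 0x66, 0x09])
print(size_lsb_first(dtf), size_high_byte_first(dtf))
```

For any file under 64 KiB the first byte is zero, which is why the wrong reading comes out as a clean factor of 256.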

Extracting this is then trivial:

$ make tp && (cd v4tape.d; < ../v4tape ../tp x | tr -d '\0' | column -t); l v4tape.d/
c++ -fdebug-default-version=3    tp.cpp   -o tp
2532864 bytes, 4947 blocks
-rwxrwxrwx   228 1973-10-24  dldr
-rwxrwxrwx 2,406 1973-10-24  dtf
-rwxrwxrwx    74 1973-10-25  list
-rwxrwxrwx   492 1973-10-25  mboot
-rwxrwxrwx   432 1973-11-11  mcopy
-rwxrwxrwx    80 1973-11-11  rkf
-rwxrwxrwx   450 1973-10-25  tboot
-rwxrwxrwx   512 1973-10-25  uboot

This confirms our suspicions: dtf self-IDs with "set up to format on drive 0" and ends with 3 empty blocks of buffers (rkf does neither). mcopy says "'p' for rp; 'k' for rk" and "disk offset"/"tape offset"/"count", so it presumably copies from disk to tape (or vice versa) by address range, and not files. Most curious of all: how can list be so small when it does basically the same thing as [tm]boot (i.e.: read the tape as tp (V), parse the directory)? It doesn't:

$ hd list
000000 07 01 3a 00 00 00 00 00 00 00 00 00 00 00 01 00  >..:.............<
000010 c1 15 0e 00 40 94 02 03 cd 09 fc 01 87 00 64 6c  >....@.........dl<
000020 64 72 0a 64 74 66 0a 6c 69 73 74 0a 6d 62 6f 6f  >dr.dtf.list.mboo<
000030 74 0a 6d 63 6f 70 79 0a 72 6b 66 0a 74 62 6f 6f  >t.mcopy.rkf.tboo<
000040 74 0a 75 62 6f 6f 74 0a 00 00                    >t.uboot...<
00004a

Much of the time, when analysing unix software, but especially from this era, you have to ask yourself: what's the simplest, stupidest, least-structured way of doing whatever? Given these givens:

  1. we have an installation tape
  2. to be installed, a unix rootfs needs to be on a disk
  3. the tape contains a bootloader and a suite of 2 actual utility programs:
    • for initialising the disk
    • for copying between the tape and disk
  4. the file system on the tape has a well-defined end (after the last file's data), which does not correspond to the end of the media

the only hypotheses they allow are "the rootfs just follows the tp (V)" or "something more structured that hasn't been described so far". Please observe a recreation of my process at this juncture:

(uboot starts on block 74 and consists of 1 block; the first-past-the-end block is 75; the root filesystem is expected to start with an uboot (and, hilariously, the only way the rootfs's uboot differs from the tape's is that the starting 16 bytes have been removed)).
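Cutting the raw tape at that first-past-the-end block is a matter of one slice. A Python sketch follows; the function name and its default are made up for illustration.

```python
BLOCK = 512

def split_after_tp(tape: bytes, first_rootfs_block: int = 75) -> tuple[bytes, bytes]:
    """Split the raw tape into the tp part and whatever follows it."""
    cut = first_rootfs_block * BLOCK
    return tape[:cut], tape[cut:]
```

With the 4947-block v4tape, the second half comes to 4947 - 75 = 4872 blocks, or 2,494,464 bytes, the same size as v5root.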

Thus, v4tape.75 has a 4000-block filesystem (just like v5root), so we can easily call it v4root. However, it is a 4872-block file. The remaining space is used for swap (this is standard: v5root has exactly the same layout, and mkfs (VIII) says, of the filesystem size parameter: "Typically it will be the number of blocks on the device, perhaps diminished by space for swapping."; also, it starts with what can't more obviously be userspace memory:

$ tail -c +$(( 1 + 4000 * 512 )) v4root | hd
000000 09 f0 80 11 26 12 d0 0b 36 10 02 00 f7 09 90 00  >....&...6.......<
000010 96 25 00 0a 01 89 48 61 6e 67 75 70 00 00 51 75  >.%....Hangup..Qu<
000020 69 74 00 00 49 6c 6c 65 67 61 6c 20 69 6e 73 74  >it..Illegal inst<
000030 72 75 63 74 69 6f 6e 00 54 72 61 63 65 2f 42 50  >ruction.Trace/BP<
000040 54 20 74 72 61 70 00 00 49 4f 54 20 74 72 61 70  >T trap..IOT trap<
000050 00 00 45 4d 54 20 74 72 61 70 00 00 46 6c 6f 61  >..EMT trap..Floa<
000060 74 69 6e 67 20 65 78 63 65 70 74 69 6f 6e 00 00  >ting exception..<
000070 4b 69 6c 6c 65 64 00 00 42 75 73 20 65 72 72 6f  >Killed..Bus erro<
000080 72 00 4d 65 6d 6f 72 79 20 66 61 75 6c 74 00 00  >r.Memory fault..<
000090 42 61 64 20 73 79 73 74 65 6d 20 63 61 6c 6c 00  >Bad system call.<
0000a0 77 09 c0 11 04 00 f7 15 1a 20 92 21 f7 09 d2 0f  >w........ .!....<

).
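The filesystem/swap split can be read straight off the super block. The Python sketch below assumes the super block sits in block 1 and starts with three little-endian words (i-node blocks, filesystem length, free count) followed by the free list. The nfree check mirrors the assertion fs5tar tripped on earlier; everything else is named for illustration only.

```python
import struct

BLOCK = 512

def read_super_block(image: bytes) -> dict:
    """Read the leading fields of the super block (block 1 of the image)."""
    # ASSUMPTION: three little-endian words, then the free-list array.
    isize, fsize, nfree = struct.unpack_from("<3H", image, BLOCK)
    if nfree > 100:
        raise ValueError(f"nfree={nfree}: not a super block")
    free = list(struct.unpack_from(f"<{nfree}H", image, BLOCK + 6))
    total = len(image) // BLOCK
    # Whatever lies past the filesystem length is not part of the filesystem.
    return {"isize": isize, "fsize": fsize, "free": free, "swap": total - fsize}
```

For a 4872-block image with a 4000-block filesystem, the "swap" entry comes out as 872 blocks.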

Then, it's a simple mechanical matter of traversing the filesystem (which is itself so trivial that file system (V) gives a full implementation) and feeding it into libarchive (ironically, a little problematic):

$ make fs5tar && < v4root ./fs5tar > v4root.tar
c++ -fdebug-default-version=3 -O -std=c++20 fs5tar.cpp -larchive -o fs5tar
2494464 bytes, 4872 blocks
i-node blocks:     80
filesystem length: 4000 blocks
freelist:        [ 3261 3275 3386 3456 3440 3444 3272 3370 3356 3369 3435 3434 2889 2891 2890 3421 3419 3417 3415 3414 3413 3412 3411 3410 3409 3408 3407 3405 3404 3403 3402 3401 3400 3399 3398 3396 3395 3394 3393 3392 3391 3390 3418 3263 3389 3429 3427 3430 3428 3426 3425 3424 3265 ]
super block mtime: Wed Jun 12 12:29:28 2019
i-node 1: ALLOCATED DIR 0755 links=9 3:1 size=160
  1[0]:     1 ..
  1[1]:     1 .
  1[2]:     2 bin
  1[3]:    59 dev
  1[4]:    62 etc
  1[5]:    75 lib
  1[6]:    88 mnt
  1[7]:    89 tmp
  1[8]:    90 usr
  1[9]:   460 unix
[…]
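The root directory's size of 160 bytes for ten entries gives 16 bytes per entry. A Python sketch of splitting directory contents follows, assuming each entry is a little-endian 16-bit i-number followed by a 14-byte NUL-padded name, and that an i-number of 0 marks an unused slot.

```python
import struct

def parse_dir(data: bytes) -> list[tuple[int, str]]:
    """Split directory contents into (i-number, name) pairs."""
    entries = []
    for off in range(0, len(data) - 15, 16):
        (ino,) = struct.unpack_from("<H", data, off)
        raw_name = data[off + 2:off + 16].rstrip(b"\0")
        entries.append((ino, raw_name.decode("ascii", "replace")))
    return entries
```

Entries with i-number 0 are returned too rather than dropped, so a caller can tell live names from leftover ones.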

The converter even supports multiply-linked files correctly (but there are none: < v4root ./fs5tar 2>&1 > /dev/null | grep 'i-node' | grep -v 'links=1 ' only returns directories). There are no errors in the filesystem (| grep 'i-node' | grep -v ALLOCATED is empty). There are 535 i-nodes, and the largest i-node number is 627. This implies there were some deletions, and indeed:

$ < v4root ./fs5tar 2>&1 > /dev/null | grep ':     0'
  75[5]:     0 c2
  75[14]:     0 nhsw.o
  75[15]:     0 hsw.o
  195[88]:     0 a.out
  195[89]:     0 l.out
  195[90]:     0 a.out
  459[3]:     0 l.out
  459[11]:     0 a.out
  459[12]:     0 a.out
  459[13]:     0 c.o

This confirms the installer is, as usual, "copy our rootfs, unfuck it manually for release". We can also confirm the lay-out of the root disk matches the configuration:

$ tar -xOaf v4root.tar /usr/sys/conf/conf.c
tar: Removing leading `/' from member names
/*
 *	Copyright 1974 Bell Telephone Laboratories Inc
 */

int	(*bdevsw[])()
{
	&nulldev,	&nulldev,	&rkstrategy, 	&rktab,
	&nulldev,	&tcclose,	&tcstrategy, 	&tctab,
	&tmopen,	&tmclose,	&tmstrategy, 	&tmtab,
	0
};

int	(*cdevsw[])()
{
	&klopen,   &klclose,   &klread,   &klwrite,   &klsgtty,
	&nulldev,  &nulldev,   &rkread,   &rkwrite,   &nodev,
	&tmopen,   &tmclose,   &tmread,   &tmwrite,   &nodev,
	&dhopen,   &dhclose,   &dhread,   &dhwrite,   &dhsgtty,
	&pcopen,   &pcclose,   &pcread,   &pcwrite,   &nodev,
	0
};

int	rootdev	{(0<<8)|0};
int	swapdev	{(0<<8)|0};
int	swplo	4000;
int	nswap	872;

and see how preposterous the multiple-hundred-kilobyte figure really was for the installer environment programs:

$ tar -tvaf v4root.tar /etc /unix
tar: Removing leading `/' from member names
drwxr-xr-x 3/1               0 1974-06-12 22:55 /etc/
-rwxr--r-- 3/1             446 1974-06-10 14:37 /etc/getty
-rwxr-xr-x 3/1            2236 1974-06-10 14:37 /etc/glob
-rwxr--r-- 3/1            1950 1974-06-10 14:37 /etc/init
-rwxr--r-- 3/1            4136 1974-06-10 14:37 /etc/mkfs
-rwxr--r-- 3/1            1800 1974-06-10 14:37 /etc/mknod
-rwsr-xr-x 0/1            2078 1974-06-10 14:37 /etc/mount
-rwxr-xr-x 3/1             220 1974-06-10 14:37 /etc/msh
-rw-r--r-- 3/1              30 1974-06-10 14:37 /etc/passwd
-rw-r--r-- 3/1              70 1974-06-10 14:37 /etc/rc
-rw-r--r-- 3/1              56 1974-06-10 14:37 /etc/ttys
-rwsr-xr-x 0/1            1990 1974-06-10 14:37 /etc/umount
-rwxr-xr-x 3/1              32 1974-06-10 14:37 /etc/update
-rw-rw-rw- 3/1               0 1974-06-12 22:30 /etc/mtab
-rwxr-xr-x 3/1             814 1974-06-12 22:55 /etc/lpd
-rw-r--r-- 3/1           27624 1974-06-13 00:50 /unix

(/unix (and some of its object files) is the second-oldest file on the system, beaten only by /usr/sys/conf, by 8 seconds).

So, what have we learned to-day?

Creative text licensed under CC-BY-SA 4.0, code licensed under The MIT License.