014. PID 0 in V1 UNIX (and V4 nsys, and the PDP-7 proto-UNIX)

Mon, 10 Jun 2024 01:04:14 +0200

Contents

    1. The UNIX Programmer's Manual (&c.)
    2. nsys unix (the V4 what isn't)
    3. that funny PDP-7 proto-unix

# The UNIX Programmer's Manual (&c.)

David Anderson's What is PID 0? (on mastussy) analyses the popular misconceptions (or, indeed, one guy's wikipedia edit) of if, and what, PID 0 is, but stops its retrospective at Version 4 unix because it's the first one written in C. This, in many ways, is like stopping a retrospective on the Great War when the British joined because they talk weird in Sarajevo. Thankfully, I talk exactly as weird, and am similarly ready to shoot a prince. Has this protracted {slavic/english} {unix/war} {franz-ferdinand/englishman} analogy overstayed its welcome yet.

To rectify this, and analyse it forward rather than in reverse, we can take advantage of unix72 (or unix-jun72 (but the one and only SVN dump tarball calls it unix72/, as does a mirror (note that it says it's a fork but it actually appears to have SVN-imported commits exclusively, unlike the "original" which has been damaged; it's all bad)); the more real mirror is The Unix Heritage Society's (one of them; the TUHS mirror structure is sociopathic) which reproduces said dump as svntree-20081216.tar.gz) which transcribes Jim DeFelice's 1972-06-20 Preliminary Release of UNIX Implementation Document, which itself includes

  1. a driver (in the form of a patch) for the T4002A graphics terminal (2 pgs)
  2. unix (11 segments (u0, …, un, …, u9, ux), described in F; 80 pgs)
  3. /bin/sh (7 pgs)
  4. /etc/init (4 pgs)
  5. System Overview, which is all the shit they couldn't stuff elsewhere (17 pgs)
  6. Data Base Item Descriptions, or, in modern parlance, all the kernel variables (12 pgs)
  7. effectively manual sexion 9 (kernel subroutines) (10 subsexions, internally counted as Unn, matching the H.nn segments; 156 pgs)

H.0–H.9 is a re-print of our current UNIX documentation as of 1972-04-03, also by Jim DeFelice. Both are redistributed by the TUHS. This is a different Jim DeFelice than the famous author (but I'm basing that entirely off the fact that the author would've been 11 when he started at Bell. maybe americans allow this).

Aptly, F, 1. Overview says that

There are three major portions of UNIX: the file system, the process control system, and the rest.

and

It has been mentioned parenthetically that UNIX is not very modular. Its lack of modularity is reflected in this document. Therefore (to paraphrase Fenichel and McIlroy referring to their description of TMGL) no single order of reading can be recommended; instead a chimneying technique is suggested, climbing not one wall at a time, but all simultaneously.

but, besides a truly astounding slew of typos (at best it's the "process wail facility", at worst it's gibberish) the sexion nevertheless contains relevant information on process lifetime and syscall and scheduler design, both as a general overview of fundamental UNIX semantics and as implementation minutiae.

For our case, the most interesting part of scheduler design is that the document implies there is none: The system can legitimately be entered only by some sort of trap that is, sys, further backed up by At sysret, a check is made to determine if the user's time quantum ran out during his execution in the system. If so, tswap is called to give another user a chance to run. which implies a per-user quantum-overflow-only (this is a period-accurate way of spelling "time slice") re-scheduling (to another user) only at syscalls, but there is very much a timer interrupt:

u0
interrupt vectors
. = orig+60
       ttyi;240 / interrupt vector tty in       ; processor level 5
       ttyo;240 / interrupt vector tty out
       ppti;240 /                  punch papertape in
       ppto;240 /                  punch papertape out
       clock;340 / clock interrupt vector       ; processor level 7
 . = orig+200
/     lpto; 240  line printer interrupt     ; processor level 5 (future)
 . = orig+204
       drum;300 / drum interrupt         ; processor level 6
u4
clock:
clock: / interrupt from 60 cycle clock
       mov     r0,-(sp) / save r0
       tst     *$lks / restart clock?
       mov     $s.time+2,r0 / increment the time of day
       inc     (r0)

       mov     $uquant,r0 / decrement user time quantum
       decb    (r0)
       bge     1f / if less than 0
       clrb    (r0) / make it 0
1: / decrement time out counts return now if priority was not 0
       cmp     4(sp),$200 / ps greater than or equal to 200
       bge     2f / yes, check time outs
       tstb    (r0) / no, user timed out?
       bne     1f / no
       cmpb    sysflg,$-1 / yes, are we outside the system?
       bne     1f / no, 1f
       mov     (sp)+,r0 / yes, put users r0 in r0
       sys     0 / sysrele
       rti

Syntactic informer: addresses in the form 1f, 2b, &c. Indicate the address of "next label 1, forward" and "next label 2, backward". Also, notice how there's a $ sometimes in front of numbers or addresses? This means "this is a number/pointer, not a variable to dereference". Usually. This is vibes-based. Also also, I gave up trying to highlight this shit (sorry).

Which corresponds to sys rele (II) (p. 128) in the UNIX Programmer's Manual:

11/3/71
sys rele (ii)
name
rele -- release processor
synopsis
sys rele / rele = 0; not in assembler
description
This call causes the process to be swapped out immediately if another process wants to run. Its main reason for being is internal to the system, namely to implement timer-runout swaps. However, it can be used beneficially by programs which wish to loop for some reason without consuming more processor time than necesary.

and indeed sysrele just calls tswap, which has little to do with the user (indeed u.uid isn't mentioned), and resets the timer thusly:

u3
tswap:
       movb    $30.,uquant / initialize process time quantum
       rts     r0 / return

so, a "process time", not a "user time". Indeed, we can attribute this confusion to the author's misconstruance of "user" (and the various u. variables) in the comments as referring to the user (as in a system user) where it's usually used as "current user-space process unix is interrupting".

So, if I'm reading this right, then a normal-priority process (I cut off a lot of other bollocks from clock) will get to run for around 30 ticks / 60Hz = 500ms without making any syscalls before being considered for swapping out, while a high-priority process will only be considered for swapping out after it makes a syscall.

Now that we've established a bidirexionally-adversarial rapport with the text, it's time to really analyse it. Note that to assemble the unix72 scans back into files, you can run tools/rebuild, and this will also apply patches from patches/core/; these wouldn't materially affect this analysis, but it is nevertheless performed on the unpatched source.

There is one interesting patch, not applied by default: cold.patch. This, shockingly, sets the cold flag in u0, which creates the root filesystem on boot. This is inextricably linked with the

11/3/71
boot procedures (vii)
name
bos, maki, rom, vcboot, msys, et al
synopsis
--
description
On the RF disk (this is the fixed disk with the root filesystem on it), the highest 16K words are reserved for stand-alone programs. These 16K words are allocated as follows: bos (1K) Warm UNIX (6K) Cold UNIX (6K) unassigned (6K) The UNIX read only memory (ROM) is home cut[?] with 2 programs of 16 words each. The first reads bos from the RF disk into core [and transfers to it]. The program bos (Bootstrap Operating System) examines the console switchs and executes one of several internal programs . The following settings are currently recognized: 173700 73700 Will read Warm UNIX from the RF into core location 0 and transfer to 400, 1 Will read Cold UNIX from the RF into core location 0 and transfer to 400, 2 Will read the unassigned 3K program into core location 0 and transfer to 400, Thus we come to the UNIX warm boot procedure: put 173700 into the switches, push load address and then push start. The alternate switch setting of 73700 that will load warm UNIX is used as a signal to bring up a single user system for special purposes. See /etc/init. Cold boots can be accomplished with the Cold UNIX program, but they're not. Thus the Cold UNIX slot on the RF may have any program desired. This slot is, however, used during a cold boot. How to install the "UNIX INIT DECtape", what to press to run it, and what programs are on it, is explained in grueling detail here.

There is so much more, and it is all genuinely interesting and on page 195. Also, all programmers being illiterate is clearly not a novel phenomenon.

It was relatively obvious that the entry point was 400 just the first bit of code in u0, but it's good to know this for sure. That said: let's initialise.

u0
constants
nproc = 16. / number of processes
nfiles = 50.
ntty = 8+1
nbuf = 6
 .if cold / ignored if cold = 0
nbuf = 2
.endif

core = orig+40000  / specifies beginning of user's core
ecore = core+20000 / specifies end of user's core (4096 words)
u0
entry point
. = orig+400
/ copy in transfer vectors

	mov    $ecore,sp / put pointer to ecore in the stack pointer
	jsr    r0,copyz; 0; 14 / clear locations 0 to 14 in core
	mov    $4,r0
	clr    r1
	mov    r0,(r1)+ / put value of 4 into location 0
	mov    r0,(r1)+ / put value of 4 into location 2

Historical and syntactic reminder: nothing is hexadecimal yet. Numbers that end with a full stop are decimal (there are 16 processes that can have 50 file descriptions open), everything else is octal (the beginning of user's core (smallest userspace address) is orig+40000(8) = 0+16384(10), and the maximal userspace address is thus core+20000(8) = 16384(10)+8192(10), thus the userspace has ecorecore = 8192(8) bytes = 4096 words, which is congruent with the comment). But there are 8+1=9 ttys, even though 8 is not in octal. Wondrous.

This bit doesn't really do much, except initialise three interrupt vectors (one shown) which are overridden by the a.out header (and thus commented out). This corresponds to pages in section E.0 in the PDF, which is physical page 6 (and companion section H.0 which is physical page 126). It's convenient to follow along (indeed, these tend to be annotated by hand (with a pen) with relevant metatext).

u0
memory initialisation
/ clear core
	.if cold / ignored if cold = 0
	halt / halt before initializing rf file system; user has
             / last chance to reconsider
	.endif

	jsr    r0,copyz; systm; ecore / clear locations systm to ecore
	mov    $s.chrgt+2,clockp / intialize clockp
/ allocate tty buffers; see H.0 for description

/ allocate disk buffers; see H.0 for description

/ set devices to interrupt

/ set up time out subroutines

/ free all character blocks; see H.0 for description

/ set up drum swap addresses; see H.0 for description

/ free rest of drum
	.if cold

/ zero i list

	.endif

systm is the start of the kernel variable block (all of which reside in ux, and systm is actually the root filesystem superblock), and is the last thing before core.

Note that s.chrgt is just a variable (symbol) called s.chrgt. Given the $, the mov can be expressed as (in C parlance) "clockp = (char *)(&s.chrgt) + 2".

Do not be confused by the convenient naming: r0 is just a variable. The PDP-11 has no registers.

The drum refers not to an actual drum but the RF fixed disk where the root filesystem is (the one from boot procedures (VII) with the kernels and BPP on it). But they're really adamant about calling it a "drum" and contrasting it to disks (as in the 2.5MB "RK" removable disk with /usr), and only doing so in the kernel source and this Implementation Document that I deadass thought there was another device, and not shown to the userspace. The nomenclature is so ancient it's like it's from another dimension. But yes, the root disk has 1024 512-byte blocks. The top 64 (32kB) are set aside for storing UNIX itself and this is the bit with the kernels and BPP in it, then the processes get 17 blocks each (8.5kB; this is congruent with having 8kB of userspace RAM + 1 accounting block consisting of the user label (which includes all u. variables)). Thus we can eke out the formatting of the fixed disk as [/: 344kB; swap: 16·8.5kB=136kB; bootloader area: 32kB].

The next two bits effectively do mkfs /dev/rf0 / (on the cold kernel) by zeroing most of it out and configuring the superblock accordingly, with user data reaching up to block 687 (which matches 1024−64−17·(nproc=16) so the entire disk is accounted for).

u0
magick
/ make current program a user

	mov    $41.,r0 / rootdir set to 41 and never changed
	mov    r0,rootdir / rootdir is i-number of root directory
	mov    r0,u.cdir / u.cdir is i-number of process current directory
	mov    $1,r0
	movb   r0,u.uno / set process table index for this process to 1
	mov    r0,mpid / initialize mpid to 1
	mov    r0,p.pid / p.pid identifies process
	movb   r0,p.stat / process status = 1 i.e., active
                         /                = 0 free
	.if cold         /                = 2 waiting for a child to die
                         /                = 3 terminated but not yet waited
                         /                  for

This is where the magic happens. This is what you came here for. Let's prototype this in C for the gamers:

#define NPROC 16
extern uint16_t rootdir;
extern uint16_t u.cdir;
extern uint8_t  u.uno;
extern uint16_t mpid;
extern uint16_t p.pid[NPROC];
extern uint8_t  p.stat[NPROC];

// make current program a user

               // rootdir set to 41 and never changed
rootdir = 41;  // rootdir is i-number of root directory
u.cdir  = 41;  // u.cdir is i-number of process current directory

u.uno     = 1; // set process table index for this process to 1
mpid      = 1; // initialize mpid to 1
p.pid[0]  = 1; // p.pid identifies process
p.stat[0] = 1; // process status = 1 i.e., active
               //                = 0 free
#if COLD       //                = 2 waiting for a child to die
               //                = 3 terminated but not yet waited for

This means, in order:

rootdir is just a constant, but defined as a symbol. / is defined as being i-node 41, and this sets that. (We have zeroed the whole variable block in the last step.)

u.cdir is the i-node of the process's working directory, on device u.cdev (previously zeroed, and 0 = RF), so the process we are becoming is now in /.

u.uno is the 1-based index into the process table (p.) of the current process, and 0 is used as a sentinel to mean "no process, you must swap me out".

mpid is the global PID counter. On sys fork (II), a new process is assigned p.pid[…] = ++mpid;.

proc (under which all p. variables are found), is the process table (though notice how it's intrusively structured as arrays per-field rather than an array of objects; remember: no objects). Thus, p.pid[n] and p.status[n] are the PID and status of process n.

Thus, we can model the state before as:

and the state after as

and this is written in the correct order, so you can't ever really say you're running PID 0 (like would be the case if the p.pid and p.stat assignments were swapped). Between assigning u.uno and p.stat a destroyed process is running, which is SOP for sys exit (II) &c. Admittedly, I don't think the process would survive getting swapped out (or returned to at all), but it doesn't have to because we're in kernel mode.

u0
superblock
	.if cold
/ initialize inodes for special files (inodes 1 to 40.)
(we will revisit this)
	.endif

/ next 2 instructions not executed during cold boot.
	bis    $2000,sb0 / sb0 I/O queue entry for superblock on drum;
                         / set bit 10 to 1
	jsr    r0,ppoke / read drum superblock
1:
	tstb   sb0+1 / has I/O request been honored (for drum)?
	bne    1b / no, continue to idle.

By this time we still haven't read in the superblock for the rootfs. It's always loaded as an invariant, and definitely before we do any filesystem operations.

u0
init
1:
	decb   sysflg / mormally sysflag=0, indicates executing in system
	sys    exec; 2f; 1f / generates trap interrupt; trap vector =
                            / sysent; 0
	br     panic / execute file/etc/init

1:
	2f;0
2:
	</etc/init\0> / UNIX looks for strings term, noted by nul\0

panic:
	clr    ps
1:
	dec    $0
	bne    1b
	dec    $5
	bne    1b
	jmp    *$173700 / rom loader address

and once again for the gamers:

extern uint8_t sysflg;

--sysflg;      // mormally sysflag=0, indicates executing in system
exec(_2, _1);  // generates trap interrupt; trap vector = sysent; 0
goto panic;    // execute file/etc/init

static char*_1[] = {_2, 0};
static char _2[] = "/etc/init";  // UNIX looks for strings term, noted by nul\0

panic:
ps = 0;
while(--*(uint16_t *)0 || --*(uint16_t *)5)  // I think??
  ;
goto *(void *)0173700;  // rom loader address

Once more, in order:

sysflg is used as a flag that prevents running syscalls while in the kernel (see below). This is what realistically delineates running in user and kernel modes (indeed, some may say it's a flg what indicates being in sys).

exec() is exactly how you'd run it from userspace (since, well, we are now).

If that fails, then we panic by clearing processor flags (idk) and jumping to the loader that loads the BOS (cf. boot procedures (VII)).

But not before doing something sociopathic with the word at 0 (this would be the address to jump to for interrupt 0(?), and is 4) and the word at 5 (this is an odd address, so this is the high byte of the address for interrupt 1 (unkni/sysent) and the low byte of its processor flags word (0)). Why?

u1
sysent:
unkni: / used for all system calls
sysent:
	incb	sysflg / indicate a system routine is
	beq	1f / in progress
	jmp	panic / called if trap inside system
1:
	(rest of syscall handling)
extern uint8_t sysflg;
if(++sysflg)
  goto panic;

(−1 if in userspace, 0 if in kernel, 1 if currently panicking).

This clearly demonstrates that in V1 there is no PID 0. The system runs with no process, then it runs as a syscall of PID 1, then it runs execl("/etc/init", "/etc/init", 0) in PID 1, then it's normal.

u0
superblock installation
	.if cold
/ initialize inodes for special files (inodes 1 to 40.)

	mov    $40.,r1 / set r1=i-node-number 40.
1:
	jsr    r0,iget / read i-node 'r1' from disk into inode area of
                       / core and write modified inode out (if any)
	mov    $100017,i.flgs / set flags in core image of inode to indi-
                              / cate allocated, read (owner, non-owner),
                              / write (owner, non-owner)
	movb   $1,i.nlks / set no. of links = 1
	movb   $1,i.uid / set user id of owner = 1
	jsr    r0,setimod / set imod=1 to indicate i-node modified, also
                          / stuff time of modification into i-node
	dec    r1 / next i-node no. = present i-node no.-1
	bgt    1b / has i-node 1 been initialized; no, branch

/ initialize i-nodes r1.,...,47. and write the root device, binary, etc.,
/ directories onto fixed head disk. user temporary, initialization prog.

	mov    $idata,r0 / r0=base addr. of assembled directories.
	mov    $u.off,u.fofp / pointer to u.off in u.fofp (holds file
                             / offset)
1:
	mov    (r0)+,r1/r1=41.,...,47; "0" in the assembled directory
                       / header signals last
	beq    1f      / assembled directory has been written onto drum
	jsr    r0,imap / locate the inode map bit for i-node 'r1'
	bisb   mq,(r2) / set the bit to indicate the i-node is not
                       / available
	jsr    r0,iget / read inode 'r1' from disk into inode area of
                       / core and write modified i-node on drum (if any)
	mov    (r0)+,i.flgs / set flags in core image of inode from
                            / assembled directories header
	movb   (r0)+,i.nlks / set no. of links from header
	movb   (r0)+,i.uid / set user id of owner from header
	jsr    r0,setimod / set imod=1 to indicate inode modified; also,
                          / stuff time of modification into i-node
	mov    (r0)+,u.count / set byte count for write call equal to
                             / size of directory
	mov    r0,u.base / set buffer address for write to top of directory
	clr    u.off / clear file offset used in 'seek' and 'tell'
	add    u.count,r0 / r0 points to the header of the next directory
	jsr    r0,writei / write the directory and i-node onto drum
	br     1b / do next directory
	.endif

Here-in:

The first loop creates files 1–40 owned by UID 1 ("sys"), mode u=rw,o=rw (normal creation semantics apply; there isn't a "close" or "write", because only one i-node can be open at a time and iget will flush the current one before opening a new one). The first real file is 41 (that's the root directory), and i-nodes smaller than this are devices, with the equivalent of the modern maj:min being just the i-node number:

u5
iopen:
iopen: / open file whose i-number is in r1
        tst    r1 / write or read access?
        blt    2f / write, go to 2f
        jsr    r0,access; 2 / get inode into core with read access
        cmp    r1,$40. / is it a special file
        bgt    3f / no. 3f
        mov    r1,-(sp) / yes, figure out
        asl    r1
        jmp    *1f-2(r1) / which one and transfer to it
1:
        otty   / tty
        oppt   / ppt
        sret   / mem
        sret    / rf0
        sret   / rk0
        sret   / tap0
        
        sret   / tap7
        ocvt   / tty0
        
        ocvt   / tty7
        error / crd
(this repeats for writes identically, but directories error)

The second loop walks idata to exhaustion, creating each given inode, copying the mode, link count, owner, and data. What could the format possibly be? The uzh, lest we forget that directories are just files with filenames inside:

u0
idata:
idata:

/ root

	41.
	140016
	.byte 7,1
	9f-.-2
	41.
	<..\0\0\0\0\0\0>
	41.
	<.\0\0\0\0\0\0\0>
	42.
	<dev\0\0\0\0\0>
	43.
	<bin\0\0\0\0\0>
	44.
	<etc\0\0\0\0\0>
	45.
	<usr\0\0\0\0\0>
	46.
	<tmp\0\0\0\0\0>
9:

/ device directory

	42.
	140016
	.byte 2,1
	9f-.-2

	01.
	<tty\0\0\0\0\0>
	02.
	<ppt\0\0\0\0\0>
	03.

	01.
	<tty8\0\0\0\0> / really tty
9:

/ etcetra directory

	44.
	140016
	.byte 2,3
	9f-.-2

	47.
	<init\0\0\0\0>
9:



/ initialization program

	47.
	100036
	.byte 1,3
	9f-.-2
8:
	sys    break; 0
	sys    open; 6f-8b+core; 0
	mov    r0,r1
	sys    seek; 65.; 0
1:
	mov    r1,r0
	sys    read; 9f-8b+core; 512.
	mov    9f,r5            / size
	beq    1f
	sys    creat; 9f-8b+core+4; 0
	mov    r0,r2
	movb   9f+2,0f
	sys    chmod; 9f-8b+core+4; 0:..
	movb   9f+3,0f
	sys    chown; 9f-8b+core+4; 0:..
2:
	tst    r5
	beq    2f
	mov    r1,r0
	sys    read; 9f-8b+core; 512.
	mov    $512.,0f
	cmp    r5,$512.
	bhi    3f
	mov    r5,0f
3:
	mov    r2,r0
	sys    write; 9f-8b+core; 0:..
	sub    r0,r5
	br     2b
2:
	mov    r2,r0
	sys    close
	br     1b
1:
	mov    r1,r0
	sys    close
	sys    exec; 5f-8b+core; 4f-8b+core
	sys    exit
4:
	5f-8b+core; 0
5:
	</etc/init\0>
6:
	</dev/tap0\0>
	.even
9:

/ end of initialization data

	0

	.endif

And the problem of including an /etc/init which we inevitably execute is solved by using the kernel assembler to compile it directly into the faux filesystem image (admittedly, with a bit of a fucky -8b+core addressing mode):

break(0);           // cf. modern brk(2); this allocates the entire 8kB:
extern char buf[];  // we can pretend this variable exists just after program text

tape = open("/dev/tap0", O_RDONLY);
seek(tape, 65, SEEK_SET);

for(;;) {
	read(tape, buf, 512);
	size = ((uint16_t *)buf)[0];
	if(!size)
		break;

	file = creat(buf + 4, 0);
	chmod(buf + 4, buf[2]);
	chown(buf + 4, buf[3]);

	while(size) {
		read(tape, buf, 512);
		size -= write(file, buf, min(size, 512));
	}
	close(file);
}

close(tape);
exec("/etc/init");
exit();

Thus, knowing that DECtapes are block-addressed, we can deduce the tape has a 65-block header, then alternatingly blocks containing [size: 2B; mode: 1B; owner: 1B; name: NUL-terminated], and data to completion, ending in a file with size=0. Boot procedures (VII) say this init reads the DECtape for initialization files starting from block 33. It's unclear how one's supposed to reconcile this with the 65.

So the filesystem, before loading from tape, consists of (omitting . and ..):

41 /          1 (sys)      [7 links] d-rwr-
   42 dev     1 (sys)      [2 links] d-rwr-
       1 tty      2 ppt      3 mem   Every file: 1 (sys) [1 link] --rwrw
       4 rf0      5 rk0      6 tap0
       7 tap1     8 tap2     9 tap3
      10 tap4    11 tap5    12 tap6
      13 tap7    14 tty0    15 tty1
      16 tty2    17 tty3    18 tty4
      19 tty5    20 tty6    21 tty7
      22 lpr      1 tty8             (really tty)
   43 bin      3 (bin, adm) [2 links] d-rwr-
   44 etc      3 (bin, adm) [2 links] d-rwr-
      47 init  3 (bin, adm) [1 link ] -xrwr-
   45 usr      1 (sys)      [2 links] d-rwr-
   46 tmp      1 (sys)      [2 links] d-rwrw

# nsys unix (the V4 what isn't)

One would be remiss to not properly analyse the next available unix as well. V2? no. V3? sure, but this is actually a WIP of V4 (compare Readme.nsys and Changes.txt and the The Unix Tree-for-V4 comments): it's in C, sure, and time (II) already returns seconds since 1970, but 0, &nosys, /* 42 = pipe */ (V3 already had pipes). This is the same dump as what Anderson calls "V4". This is obviously wrong (no fault of his, of course, how were he to know), but V4 is one of the two things we know this is definitely not. That said: probably close enough, given that the bits we care about are identical in V5. And what would those be?

ken/low.s
entry
/ low core



. = 0^.
	4
	br	1f

/ trap vectors
	trap; br7+0		/ bus error
	trap; br7+1		/ illegal instruction
	trap; br7+2		/ bpt-trace trap
	trap; br7+3		/ iot trap
	trap; br7+4		/ power fail
	trap; br7+5		/ emulator trap
	trap; br7+6		/ system entry

. = 040^.
1:	jmp	start

Loaded at 0, jump to jump to start. It's most convenient to follow along with The Unix Tree.

ken/45.s
start:
/ machine language assist
/ PDP-11/45



start:
	bit	$1,SSR0
	bne	start			/ loop if re-entry
	reset

/ initialize systems segments

/ initialize user segmant

/ initialize io segment
/ set up counts on super segments

/ get a sp and start segmentation

/ clear bss

/ clear user block

/ set up previous mode and call main

	mov	$30000,PS
	jsr	pc,_main
	mov	$170000,-(sp)
	clr	-(sp)
	rti
ken/main.c
main()
main()
{

	/*
	 * zero and free all of core
	 */

	UISA->r[0] = KISA->r[6] + USIZE;
	UISD->r[0] = 6;
	for(; fubyte(0) >= 0; UISA->r[0]++) {
		clearseg(UISA->r[0]);
		mfree(coremap, 1, UISA->r[0]);
	}
	mfree(swapmap, NSWAP, SWPLO);

This mirrors almost-verbatim what we saw in V1 setup, except we now have segmented memory (and thus malloc()/mfree(); also printf()(!)).

ken/main.c
process prep
	/*
	 * set up system process
	 */

	proc[0].p_addr = KISA->r[6];
	proc[0].p_size = USIZE;
	proc[0].p_stat = SRUN;
	proc[0].p_flag =| SLOAD|SSYS;
	u.u_procp = &proc[0];

	/*
	 * set up 'known' i-nodes
	 */

	rootdir = iget(ROOTDEV, ROOTINO);
	rootdir->i_flag =& ~ILOCK;
	u.u_cdir = iget(ROOTDEV, ROOTINO);
	u.u_cdir->i_flag =& ~ILOCK;

One last time:

proc[0].p_addr is the base of the process' address space and proc[0].p_size is its size. but idk what the unit is lol (it's gotta be chunks of some size because the allocator's arena size is a char, but fuck knows what; USIZE is helpfully 8, and I've calculated the size of u at 217 (or 228) bytes; maybe it's 32 bytes? that's awfully narrow but it fits). In V1 this was [core; u.break) (and also the stack was separate. but close enough), now it's [proc[…].p_addr, proc[…].p_addr + proc[…].p_size). AFAICT, KISA->r[6] is gonna be the Kernel Instruction-Space Page Address Register #0, register 6 (there's a total of 8, matching the machine's 8 segments, so this probably means we'll be executing from kernel instruction bank 6?).

proc[0].p_stat is the same as before (except R is now 4).

proc[0].p_flag is, AFAICT, "loaded" (as in: don't need to swap in), and SSYS poisons the scheduler to never consider the process for being swapped out.

rootdir and u.u_cdir mean the same thing as in V1, but they are now entries into the open-inode table (and iget() opens the given inode, returning a struct inode *). This system has a "filesystem cache"! Revolutionary.

u.u_procp is the equivalent of u.uno, and is the process' index in the scheduler table.

Before we were strictly in kernel mode, now we're running as PID 0, as an unswappable system process executing kernel code, in /.

ken/main.c
the scheduler and also init
	/*
	 * make init process
	 * enter scheduling loop
	 * with system process
	 */

	if(newproc()) {
		expand(USIZE+1);
		u.u_uisa[0] = USIZE;
		u.u_uisd[0] = 6;
		sureg();
		for(i=0; icode[i] != -1; i++)
			suword(i*2, icode[i]);
		return;
	}
	sched();
}

newproc() is what does the forking in fork(), and returns nonzero in the child. This is the first known instance of the kernel forking. Naturally, since mpid was never changed, the new process is PID 1. The special PID 0 process goes to sched() and now does scheduling. It will probably schedule our PID 1 instantly.

And PID 1 will first allocate another 1 block (32 bytes?), then… u.u_uisa[0]&u.u_uisd[0] look awfully like the user instruction and data segments? This matches segment 6 for data? But there are 8 segments (0–7 and USIZE is 8 so fuck knows. maybe not). sureg() sets user segment registers.

suword(), expectedly, sets words in user memory. The loop effectively does a memcpy of icode until -1 to user memory starting at user address 0, and then we return there.

icode is, unfortunately, int icode[] { 0104413, 0000014, 0000010, 0000777, 0000014, 0000000, 0062457, 0061564, 0064457, 0064556, 0000164, -1 }; so a beautiful methodology has definitely been lost. But a quick objcopy --dump-section .data=/dev/stdout a.o | od -t o2z -w014 later, this is

0000000 104413 000014 000010 000777 000014 000000  >............<
0000014 062457 061564 064457 064556 000164 177777  >/etc/init...<
0000030
from which the 0000014 pointer stands out the most, so this can be trivially disassembled to
0000000: sys exec; 0000014; 0000010
0000010: 0000014; 0000000
0000014: </etc/init\0>

The return returns to start:, which explicitly (clr) returns to address 0 in user mode (rti returns from an interrupr).

Anderson's analysis is correct, even if by accident: based on nsys source, V4 will have most likely been the first unix with a PID 0 scheduler process. TTBOMK, this corresponds with this being the first system targeting the PDP-11/45 (or at least its memory segmentation features) rather than the PDP-11/20, and thus a purely-interrupt-driven architecture was no longer possible. (This hurts less when it's one process out of a fifty than if it were one process out of sixteen, too.)

# that funny PDP-7 proto-unix

Since we're all here already, it'd be a shame to stop at Version 1 unix. The scans for this system can be found at the TUHS as the number-named PDFs. The accompanying document — D. M. Ritchie's DRAFT of The UNIX Time-Sharing System — is a delightful 53 short pages (27 if you exclude the manuals (though they are not that yet)) and you should read or at least skim, is found in the UnixEditionZero* files. For the PDFs I recommend Threshold_OCR, which has text, bookmarks, and not-being-100MB. An OCR-only text file is also available.

Well, the document says this documents UNIX-11, which is a mildly-redesigned port of UNIX-7. idk. it's the best I've got. It's also the first and only mention I've ever seen of R. H. Canaday's involvement. But this document also contains my favourite lie (it's honestly all gold, but.): 5. The Shell, 5.5. Processes and forking (p. 18):

Except while UNIX is bootstrapping itself into operation, a new process can come into existence in only one way: by use of the fork system call.

processid = fork(label) 

When fork is executed by a process, it splits into two idependently executing processes. The two processes have core images which are copies of each other, but they are not precisely equivalent: one of them is considered the parent process. In the parent, control does not return directly from the fork, but instead passes to location label; in the child process, there is a normal return. The processid returned by the fork call is the identification of the other, offspring process.

This (a) shows dmr's hand, since he almost-outright says the first process is produced magically, but also (b) is a delightful fib because this is, logically, what fork does, but: APPENDIX 1, A1.2 fork (p. A1-2; p. 37):

This is the primitive used to generate new processes.

sys fork
(old process return)
(new process return) 

There are no input arguments. R0 contains the process identification of the new proces. See also section 5.

The parent process returns immediately after the "sys" call; the new process skips one word. (The label argument mentioned in the discussion of fork in section 5.5 was a white lie.)

This also looks like the first syscall-type word. Everything else calls them "system calls", but this is a call to "sys". Give it another year.

The TUHS also distributes a zipped git export with no history, which claims to be a transcription of the PDFs. It may well be, and you may even be able to run it! But the only reliable source is 01-s1.pdf.

s8, pp. 49–50
coldentry:
coldentry:
   dzm 0100 " not re-entrant
   caf
   
   jms dskio; 06000
   jms copy; dskbuf; sysdata; ulist-sysdata
   lac d3
   jms namei; initf
      jms halt
   jms iget
   cls
   jms iread; 4096; 4096
   jmp 4096
s8, p. 48
"init"
" Strings
initf:
   <i>n;<i>t;< > ;< > "

Also note the slew of "constants" on p. 48 which are variables in the form o212: 0212, d7999: 8999, dm3: -3. It's unclear if this saves space, or.

ulist is the process table (funnily enough, object-addressed):

s8, p. 50
ulist:
ulist:
   0131000;1;0;0
   0031040;0;0;0
   0031100;0;0;0
   0031140;0;0;0
   (and so on, there's 10 total)

and fuck knows what the fields are, but the first one is very clearly initialised differently. If I had to guess (and I do), the second field would be pid because that's congruent with the userdata. No clue what that high bit being set is.

s8, p. 50
userdata:
userdata:
   u.ac: 0
   u.mq: 0
   u.rg: .=.+9
   u.uid: -1
   u.pid: 1
   (there's more)

If I were to translate this all to C and adapt it to a modern audience, then I'd say

static char initf[8] = "init";  // idk if 4 spaces or 4 NULs at the end

diskread(06000, sysdata, sizeof(sysdata)); // superblock(?)

init = open(initf);
if(init == -1)
	halt();
read(init, (void *)4096, 4096)
goto *(void *)4096;

So I'd say this system boots into PID 1 but executing the kernel and with no program data, then reads init into the program data (from the root directory; but there's barely a hierarchical filesystem here), and jumps there. This is as an open-coded sys exec (II).


Nit-pick? Correction? Improvement? Annoying? Cute? Anything? Mail, post, or open!


Creative text licensed under CC-BY-SA 4.0, code licensed under The MIT License.
This page is open-source, you can find it at GitHub, and contribute and/or yell at me there.
Like what you see? Consider giving me a follow over at social medias listed here, or maybe even a sending a buck liberapay donate or two patreon my way if my software helped you in some significant way?
Compiled with Clang 19's C preprocessor on 13.09.2025 14:09:37 UTC from src/blogn_t/014-unix-pre-v4-pid0-corollary.html.pp.
See job on builds.sr.ht.
RSS feed