023,a. V1 unix I/O buffer count vs. performance benchmark

Sat, 17 Jan 2026 04:33:03 +0100

My draft for post 023 currently quips

(to wit: V1's filesystem cache consists of 1 – The – open file (iget (+ icalc), inode, i.*, ii, idev, cdev, imod), and up to 6 (installer kernel has 2, I'm pretty sure even just 1 would work) 512-byte I/O block buffers (nbuf, buffer (bufp), wslot) (the block containing The open i-node isn't cached)).

but "pretty sure" is not good enough. If only there were some way to compile and run a V1 kernel and test this directly!

This is born of pedantry, and to satisfy my neurosis, because

  1. write (II)
  2. syswrite
  3. writeinode
  4. dskw (write routine for non-special files)

says

dskw: / write routine for non-special files
	mov	(sp),r1 / get an i-node number from the stack into r1
	jsr	r0,iget / write i-node out (if modified), read i-node 'r1'
		        / into i-node area of core
	mov	 *u.fofp,r2 / put the file offset [(u.off) or the offset in
		            / the fsp entry for this file] in r2
	add	 u.count,r2 / no. of bytes to be written + file offset is
		            / put in r2
	cmp	 r2,i.size / is this greater than the present size of
		           / the file?
	blos	 1f / no, branch
	 mov	r2,i.size / yes, increase the file size to file offset +
		           / no. of data bytes
	 jsr	r0,setimod / set imod=1 (i.e., core inode has been
		           / modified), stuff time of modification into
		           / core image of i-node
1:
	jsr	r0,mget / get the block no. in which to write the next data
		        / byte
	bit	*u.fofp,$777 / test the lower 9 bits of the file offset
	bne	2f / if its non-zero, branch; if zero, file offset = 0,
		   / 512, 1024,...(i.e., start of new block)
	cmp	u.count,$512. / if zero, is there enough data to fill an
		              / entire block? (i.e., no. of
	bhis	3f / bytes to be written greater than 512.? Yes, branch.
		   / Don't have to read block
2: / in as no past info. is to be saved (the entire block will be
   / overwritten).
	jsr	r0,dskrd / no, must retain old info.. Hence, read block 'r1'
		         / into an I/O buffer
3:
	jsr	r0,wslot / set write and inhibit bits in I/O queue, proc.
		         / status=0, r5 points to 1st word of data
	jsr	r0,sioreg / r3 = no. of bytes of data, r1 = address of data,
		          / r2 points to location in buffer in which to
		          / start writing data
2:
	movb	(r1)+,(r2)+ / transfer a byte of data to the I/O buffer
	dec	r3 / decrement no. of bytes to be written
	bne	2b / have all bytes been transferred? No, branch
	jsr	r0,dskwr / yes, write the block and the i-node
	tst	u.count / any more data to write?
	bne	1b / yes, branch
	jmp	ret / no, return to the caller via 'ret'
cf. emphasised fragments
(labels colour-coded; note that 123 is octal and 123. is decimal)

which can be translated to

​
extern r1, r2, r3, cdev;
dskw(ino) /* write routine for non-special files */
{
	r1 = ino; iget(); /* write i-node out (if modified), read i-node 'r1' on 'cdev'
	                     into i-node area of core */
	r2 = *u.fofp + u.count; /* file offset [(u.off) or the offset in
	                           the fsp entry for this file] +
	                           no. of bytes to be written */
	if(r2 > i.size) {
		i.size = r2;
		setimod();
	}

	while(u.count) {
		mget(); /* get the block no. in which to write the next data byte */

		/* if lower 9 bits of file offset are 0,
		   file offset = 0, 512, 1024,...(i.e., start of new block): */
		if(*u.fofp & 511 || u.count < 512) {
			dskrd();  /* if there is not enough data to fill an entire block, */
		}	          /* read block 'r1' on 'cdev' into an I/O buffer */

		wslot(); /* set write and inhibit bits in I/O queue, proc. status=0,
		            r5 points to 1st word of data */
		sioreg(); /* r3 = no. of bytes of data,
		             r1 = address of data,
		             r2 points to location in buffer in which to start writing data */
		while(r3--)
			*(char *)r2++ = *(char *)r1++; /* transfer a byte of data to the I/O buffer */

		dskwr(); /* yes, write the block and the i-node */
	}

	goto ret;
}
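
The only non-obvious branch in there is when dskw bothers to read the block before overwriting it. As a minimal sketch of just that predicate (needs_read is a hypothetical helper of mine, not anything in the V1 sources; it takes the file offset and the remaining byte count):

```sh
#!/bin/sh
# Hypothetical helper, not part of the V1 sources: prints 1 if dskw would
# call dskrd before filling the buffer, 0 if it can skip the read because
# the whole block is about to be overwritten.
needs_read() {
	offset=$1 count=$2
	# 0777 is octal (511 decimal): the low 9 bits of the file offset.
	if [ $(( offset & 0777 )) -ne 0 ] || [ "$count" -lt 512 ]; then
		echo 1
	else
		echo 0
	fi
}

needs_read 0 512      # aligned, one full block to write
needs_read 0 100      # aligned, but a short write
needs_read 256 512    # starts mid-block
needs_read 1536 464   # aligned tail of a longer write, less than a block left
```

Only the first call prints 0: every other case has to keep part of the old block, so it pays for a read first.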

The comment on the dskwr call implies that it'll (schedule to) write the current i-node as well as the just-written-to data block. This is conspicuous given that the installation build of the kernel downgrades from 6 I/O buffers to 2. Are those the same two?

unix72 builds an image containing the unpacked source in /usr/sys, so it's pretty easy to just Do this. time (II) is updated to return a large sentinel value, to check whether the updated kernel is the one running (this would be better served with a printf or the like, but this kernel doesn't have any kernel output facility):

# cp u2.s u2.s.orig
# ed u2.s
19053
.,.+4p
systime: / get time of year
	mov	s.time,4(sp)
	mov	s.time+2,2(sp) / put the present time on the stack
	br	sysret4

/sys/t@/systime/
systime: / get time of year

	mov	s.time,4(sp)
s/s.time/$653#535./
p
	mov	$65535.,4(sp)

	mov	s.time+2,2(sp) / put the present time on the stack
s/s.time+2/65535#####$6554$##34#4#5./
p
	mov	$65535.,2(sp) / put the present time on the stack

# cp u0.s u0.s.orig
# ed us@ed u0.s
12636
/nbuf/
nbuf = 6
d
d
d
p
.endif
d
i
nbuf = 1
.
w
12588
q
# as u*.s
I
II
# ls -l
total 539
225 lxrwrw 1 root 36432 Jan 1 00:00:00 a.out

What remains is the matter of getting the kernel to boot. The unix72 bootloader is broadly-compatible with boot procedures (VII) and documented in boot/README:

NOTE: For using kernels built using the V2 assembler, all of the following should refer to msys2, instead of msys.

Alternatively, everything can be installed while running under the V1 system using the following procedure:

First, build the support programs: bos and msys

chdir /usr/boot
sh run

[…] If you build a kernel under V1, then you can install it into the […] cold boot area (such as for testing) with:

msys 1 name_of_kernel

[I]f the cold boot area is being used for testing new kernels, then the kernel can be bootstrapped using:

tools/pdp11 boot/simh_cold.cfg

(layout sic!). The only difference in boot/simh_cold.cfg is the console switches going from 173700₈ to 1. I bring this documentation up only because the run script doesn't actually build msys2, which one would think rather defeats the purpose, since running it is only meaningful in the emulated system, which we know only has the V2 assembler. unix72 is universally weird like this.

# chdir /usr/boot
# cat run
as bos.s
mv a.out bos
as msys.s
mv a.out msys
# as msys2.s
I
II
# mv a.out msys2
# msys2 1 /usr/sys/a.out
# date
Sun Jan 23 22:46:32
# ^E
Simulation stopped, PC: 007332 (MOV (SP)+,25244)
sim> q
Goodbye
RF: writing buffer to file
TC0: writing buffer to file
unix72$ pdp11 simh_cold.cfg
PDP-11 simulator V3.8-1
Disabling CR
Disabling XQ
RF: buffering file in memory
TC0: 16b format, buffering file in memory
:login: root
root
# date
Sun Dec <9 12:06:28
# chdir /usr/sys
# as u*.s
I
II
#

so it does work, it just took a little longer. Well, it felt longer.

This paragraph was going to start "Easy enough to quantify, though:" but it isn't, because SIMH resists being instrumented at every turn. What is easy is making the kernel self-identify by putting nbuf in the minutes field of date (I) (there's an off-by-one in the division for seconds, so just returning 60*nbuf for nbuf=4 yields Fri Jan  1 00:00:03):

# ed u2.s
19053
/65535/
        mov $65535.,4(sp)
s/65535/0/
        mov $65535.,2(sp) / put the present time on the stack
s/\$65535./$[60. * 60. * nbuf]/
w
19061
# as u*.s
I
II
# /usr/boot/msys2 1 a.out
# ^E
Simulation stopped, PC: 007332 (MOV (SP)+,25232)
sim> q
unix72$ pdp11 simh_cold.cfg
:login: root
root
# date
Fri Jan 1 00:01:00
# chdir /usr/sys
# cat >nbuf
: echo leaves a space at the end of every line
echo /nbuf/s/=.\*/= $1/p >.ed
echo w >>.ed
ed .ed
1,$s/ $//
w
q
ed u0.s <.ed
rm .ed
as u*.s
mv a.out nbuf.$1
# sh nbuf 0; sh nbuf 1; sh nbuf […]; sh nbuf 11; sh nbuf 12
# ls -l nbuf*
220 s-rwrw  1 root    178 Jan  1 00:00:00 nbuf
240 lxrwrw  1 root  36432 Jan  1 00:00:00 nbuf.0
230 lxrwrw  1 root  36432 Jan  1 00:00:00 nbuf.1
242 lxrwrw  1 root  37960 Jan  1 00:00:00 nbuf.10
234 lxrwrw  1 root  39004 Jan  1 00:00:00 nbuf.11
236 lxrwrw  1 root  40048 Jan  1 00:00:00 nbuf.12
232 lxrwrw  1 root  36432 Jan  1 00:00:00 nbuf.2
225 lxrwrw  1 root  36432 Jan  1 00:00:00 nbuf.3
237 lxrwrw  1 root  36432 Jan  1 00:00:00 nbuf.4
231 lxrwrw  1 root  36432 Jan  1 00:00:00 nbuf.5
235 lxrwrw  1 root  36432 Jan  1 00:00:00 nbuf.6
238 lxrwrw  1 root  36916 Jan  1 00:00:00 nbuf.7
239 lxrwrw  1 root  37960 Jan  1 00:00:00 nbuf.8
241 lxrwrw  1 root  39004 Jan  1 00:00:00 nbuf.9
# /usr/boot/msys2 1 nbuf.0
# ^E
sim> q
unix72$ pdp11 simh_cold.cfg
Disabling CR
Disabling XQ
RF: buffering file in memory
TC0: 16b format, buffering file in memory
Listening on port 5555 (socket 7)
(it loops here)
Simulation stopped, PC: 001000 (TSTB 25135)
sim> q

so 0 obviously doesn't work (entirely unsurprisingly). 6, as shipped, is the max: 7 crashes with

Trap stack push abort, PC: 011600 (BNE 11612)

The instrumentation is then "install kernel, exit; boot it, compile the kernel, exit". ^E is byte 5.

for i in $(seq 6); do
	{ sleep 1; echo root; echo "/usr/boot/msys2 1 /usr/sys/nbuf.$i";
	  sleep 1; printf '\005'; sleep 0.2; echo q; } |
		script /dev/null -c 'pdp11 simh.cfg'
	for sample in $(seq 10); do
		{ sleep 1; echo root; echo date; echo "chdir /usr/sys"; echo "as u*.s";
		  sleep 5; printf '\005'; sleep 0.2; echo q; } |
			script log.$i+$sample -T log.$i+$sample.tm -c 'pdp11 simh_cold.cfg'
	done
done

Time samples can be found by parsing the (time) logs:

nbuf  Time to as u*.s  speedup  vs previous
1     3.215s
2     1.309s           2.455×   146%
3     1.004s           3.202×   30.4%
4     780ms            4.121×   28.7%
5     734ms            4.378×   6.26%
6     724ms            4.443×   1.48%
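
For reference, a sketch of how the two derived columns fall out of the times: "speedup" is the nbuf=1 time divided by this row's time, and "vs previous" is the previous row's time divided by this row's, minus one. The speedups function name and the "nbuf seconds" input format are my own for illustration. It is fed the rounded times as printed above, so the last digit can differ slightly from the table, which was presumably computed from the unrounded samples.

```sh
#!/bin/sh
# Hypothetical helper: reads "nbuf seconds" pairs on stdin, one per line,
# and prints nbuf, time, speedup over the first row, and gain over the previous row.
speedups() {
	awk 'NR == 1 { base = $2; prev = $2; printf "%d %.3f\n", $1, $2; next }
	     { printf "%d %.3f %.3fx %.3g%%\n", $1, $2, base / $2, (prev / $2 - 1) * 100
	       prev = $2 }'
}

# Rounded times from the table, in seconds.
printf '%s\n' '1 3.215' '2 1.309' '3 1.004' '4 0.780' '5 0.734' '6 0.724' | speedups
```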

This would be much more linear, except that nbuf=1 pays double price: in that configuration any in-flight I/O blocks all other disk I/O (except to the superblock and swap), so unix and the disk no longer run in parallel. And this is in an emulator whose I/O is functionally instant by the standards of the time. One has to assume this'd be much worse on hardware.
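
To put a shape on "double price", here is a toy model, with entirely made-up numbers and nothing measured: each of n blocks costs some CPU ticks to produce and some disk ticks to write. With a single buffer the two stages take turns; with a spare buffer the disk can work on one block while the CPU fills the next. Both function names are hypothetical.

```sh
#!/bin/sh
# Toy model only: arguments are block count, CPU ticks per block, disk ticks per block.

# One buffer: compute a block, then wait for the disk, every time.
serial_ticks() { echo $(( $1 * ($2 + $3) )); }

# Two or more buffers: the slower stage sets the pace, and the faster
# stage is only exposed once, while the pipeline fills or drains.
overlap_ticks() {
	n=$1 cpu=$2 io=$3
	if [ "$cpu" -gt "$io" ]; then slow=$cpu fast=$io; else slow=$io fast=$cpu; fi
	echo $(( n * slow + fast ))
}

serial_ticks 100 10 10    # 100 blocks, CPU and disk equally slow, no overlap
overlap_ticks 100 10 10   # same work with overlap
```

With the two stages equally slow the serialised run takes about twice as long (2000 ticks against 1010), which is the rough shape of the nbuf=1 penalty. The measured gap is bigger than that, so this is an illustration, not an explanation of the numbers.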

In conclusion: nbuf=2 is a soft usability minimum more than a hard limit, and the distribution kernel ships as many buffers as it can easily fit.




Creative text licensed under CC-BY-SA 4.0, code licensed under The MIT License.