007. Cleanly enabling Cyrillic and broad Unicode output in groff -Tps
$ mkdir devps; cd devps
$ for f in {C,T}{R,B,I,BI}; do
in="$(awk '$1 == "internalname" {print $2; exit}' "/usr/share/groff/current/font/devps/$f")";
echo $f: $in;
src="$(awk -v cur="/$in" '
$1 ~ /^\// && $2 ~ /^[\/\(]/ {
if($2 ~ /^\//)
aliases[$1] = $2;
else
paths[$1] = $2;
}
END {
while(cur in aliases)
cur = aliases[cur];
if(cur in paths) {
sub(/^\(/, "", paths[cur]);
sub(/(pfb)?\)$/, "", paths[cur]);
print paths[cur]
} else {
print "last: " cur > "/dev/stderr";
exit 1
}
}' /var/lib/ghostscript/fonts/Fontmap)";
echo ${src}afm;
afmtodit $(expr "$f" : C > /dev/null && printf -- -n) -cmi0 \
-d /usr/share/groff/current/font/devps/DESC \
-e /usr/share/groff/current/font/devps/text.enc \
"${src}afm" /usr/share/groff/current/font/devps/generate/textmap "$f";
done
CR: Courier
/usr/share/fonts/type1/gsfonts/n022003l.afm
CB: Courier-Bold
/usr/share/fonts/type1/gsfonts/n022004l.afm
CI: Courier-Oblique
/usr/share/fonts/type1/gsfonts/n022023l.afm
CBI: Courier-BoldOblique
/usr/share/fonts/type1/gsfonts/n022024l.afm
TR: Times-Roman
/usr/share/fonts/type1/gsfonts/n021003l.afm
TB: Times-Bold
/usr/share/fonts/type1/gsfonts/n021004l.afm
TI: Times-Italic
/usr/share/fonts/type1/gsfonts/n021023l.afm
TBI: Times-BoldItalic
/usr/share/fonts/type1/gsfonts/n021024l.afm
# ln -s /usr/local/share/groff/site-font /usr/share/groff
# mkdir -p /usr/local/share/groff/site-font
# cp -r ../devps /usr/local/share/groff/site-font
That's it. Same exact fonts as groff was already using, but with the appropriate metrics;
on Debian — where all of these are Nimbus variants — this includes Cyrillic and the extended Latin planes, among others,
because unlike Times, Nimbus is an actual font.
Render with groff -Kutf8 …:
We've all been there, writing a manual that needs example text wider than US-ASCII, or to accredit a Russian. groff provides ć as \['c] and ł as \[/l], gracefully noting that the latter is Polish, and that's about it. It, of course, is rather tempting to, then, "just write something", as it is, after all, anno domini twenty twenty. This yields:
nabijaczleweli@tarta:~/uwu$ groff -Tps -mdoc -tp < frag.1 | lp -dPDF
pic:<standard input>:10: invalid input character code 128
pic:<standard input>:14: invalid input character code 128
pic:<standard input>:52: invalid input character code 129
pic:<standard input>:52: invalid input character code 132
pic:<standard input>:52: invalid input character code 152
pic:<standard input>:52: invalid input character code 134
pic:<standard input>:52: invalid input character code 131
pic:<standard input>:52: invalid input character code 130
pic:<standard input>:52: invalid input character code 133
pic:<standard input>:52: invalid input character code 153
pic:<standard input>:52: invalid input character code 135
pic:<standard input>:52: invalid input character code 132
pic:<standard input>:54: invalid input character code 128
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:83: invalid input character code 143
pic:<standard input>:83: invalid input character code 134
pic:<standard input>:83: invalid input character code 143
pic:<standard input>:83: invalid input character code 134
pic:<standard input>:83: invalid input character code 143
pic:<standard input>:83: invalid input character code 134
pic:<standard input>:83: invalid input character code 143
pic:<standard input>:83: invalid input character code 134
pic:<standard input>:83: invalid input character code 143
pic:<standard input>:83: invalid input character code 134
pic:<standard input>:84: invalid input character code 158
pic:<standard input>:84: invalid input character code 158
pic:<standard input>:84: invalid input character code 158
pic:<standard input>:84: invalid input character code 158
pic:<standard input>:84: invalid input character code 158
pic:<standard input>:100: invalid input character code 129
pic:<standard input>:100: invalid input character code 132
pic:<standard input>:100: invalid input character code 152
pic:<standard input>:100: invalid input character code 134
pic:<standard input>:100: invalid input character code 131
pic:<standard input>:100: invalid input character code 130
pic:<standard input>:100: invalid input character code 133
pic:<standard input>:100: invalid input character code 153
pic:<standard input>:100: invalid input character code 135
pic:<standard input>:100: invalid input character code 132
request id is PDF-978 (0 file(s))
And:
A great time!
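Those codes are less random than they look. A minimal sketch, assuming (as the log suggests) that only bytes 128 through 159 get rejected outright, while the rest of the high half is quietly read as Latin-1. The helper name is made up for illustration; od and awk are the only dependencies.

```shell
# Hypothetical helper: print, one per line, the bytes of a string that land in 128..159.
# Assumption: these are the only high bytes the pipeline rejects outright;
# 160..255 are taken as Latin-1 instead.
utf8_rejected_bytes() {
    printf '%s' "$1" | od -An -v -tu1 |
        awk '{ for (i = 1; i <= NF; i++) if ($i >= 128 && $i <= 159) print $i }'
}

# The heading from line 52 of frag.1
utf8_rejected_bytes 'ŁĄĘĆŻŹŃ łąęćżźń'
```

For that heading this prints 129, 132, 152, 134, 131, 130, 133, 153, 135, 132, the same sequence as the line-52 errors above; Ż, Ź, ż, and ź don't show up because their second bytes are 185 through 188.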
The Arch Wiki says that to "Correctly display Polish diacritics",
you can groff -Kutf8 -Tdvi -mec -ms test.ms > test.dvi, so:
nabijaczleweli@tarta:~/uwu$ groff -Kutf8 -Tps -mdoc -tp < frag.1 | lp -dPDF
troff: <standard input>:52: warning: can't find special character 'u0041_0328'
troff: <standard input>:52: warning: can't find special character 'u0045_0328'
troff: <standard input>:52: warning: can't find special character 'u005A_0307'
troff: <standard input>:52: warning: can't find special character 'u005A_0301'
troff: <standard input>:52: warning: can't find special character 'u004E_0301'
troff: <standard input>:52: warning: can't find special character 'u0061_0328'
troff: <standard input>:52: warning: can't find special character 'u0065_0328'
troff: <standard input>:52: warning: can't find special character 'u007A_0307'
troff: <standard input>:52: warning: can't find special character 'u007A_0301'
troff: <standard input>:52: warning: can't find special character 'u006E_0301'
troff: <standard input>:82: warning: can't find special character 'u044F'
troff: <standard input>:82: warning: can't find special character 'u0438_0306'
troff: <standard input>:82: warning: can't find special character 'u0446'
troff: <standard input>:82: warning: can't find special character 'u043E'
troff: <standard input>:82: warning: can't find special character 'u042F'
troff: <standard input>:82: warning: can't find special character 'u0418_0306'
troff: <standard input>:82: warning: can't find special character 'u0426'
troff: <standard input>:82: warning: can't find special character 'u041E'
request id is PDF-979 (0 file(s))
And:
This is, arguably, not better – ł, ć, and the ellipsis have appeared, but everything else disappeared.
It's unclear why exactly those three, considering the ellipsis can't be reached by a classic groff_char(7) escape,
and all three become \[uXXXX]-style escapes after passing through preconv.
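For orientation, the names in those warnings are just code points in hex, with decomposed sequences joined by underscores. A small sketch of that naming follows; the function name is hypothetical, and the decomposition is supplied by hand here rather than computed.

```shell
# Hypothetical helper: build a \[uXXXX] or \[uXXXX_YYYY] escape from code points.
groff_uescape() {
    name=''
    for cp in "$@"; do
        # Four-digit upper-case hex, joined with underscores
        name="${name:+${name}_}$(printf '%04X' "$cp")"
    done
    printf '\\[u%s]\n' "$name"
}

groff_uescape 0x044F         # я
groff_uescape 0x0041 0x0328  # Ą as A + combining ogonek
```

This prints \[u044F] and \[u0041_0328], matching the names troff complains about.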
The same Arch Wiki article says that as part of "Adding support for cyrillic fonts",
you can preprocess with iconv -f utf-8 -t koi8-r.
This is preceded by downloading an unspecified font pack out of about 8 from a Russian FTP,
aided by the READMEs therein also being in KOI8-R (i.e. '�' in UTF-8),
and mangling the Ghostscript font map into pointing at that font for Times-Roman.
It's impossible to tell how the author expects literally any other Times variant (TI, TB, TBI), or any of the four fixed-width fonts, to render Cyrillic. And it ignores the obvious point that it replaces the font, which means the pages will look different.
It's only reasonable to expect the Russosphere to have some sort of answer to this, and,
indeed, a 2015 post has a workable solution;
it rambles and arrives at its point round-aboutly, converting to and from KOI8-R, as well as requiring a ru.tmac package
(not included, but can be found on the groff mailing list, as can a plethora of other hacks of varying insanity) which isn't required at all; however:
More formally, groff does not have metrics for russian fonts. But I do have! I have metrics for Times fonts and I need to put them into /usr/share/groff/current/font/devps, replacing already existing. My metrics are in devps.tar attach.
Indeed! devps.tar contains a perfectly serviceable set of usr/share/groff/current/font/devps/T{R,B,I,BI} metrics:
nabijaczleweli@tarta:~/uwu$ groff -Kutf8 -Tps -mdoc -tp < frag.1 | lp -dPDF
troff: <standard input>:82: warning: can't find special character 'u044F'
troff: <standard input>:82: warning: can't find special character 'u0438_0306'
troff: <standard input>:82: warning: can't find special character 'u0446'
troff: <standard input>:82: warning: can't find special character 'u043E'
troff: <standard input>:82: warning: can't find special character 'u042F'
troff: <standard input>:82: warning: can't find special character 'u0418_0306'
troff: <standard input>:82: warning: can't find special character 'u0426'
troff: <standard input>:82: warning: can't find special character 'u041E'
troff: <standard input>:100: warning: can't find special character 'u0041_0328'
troff: <standard input>:100: warning: can't find special character 'u0045_0328'
troff: <standard input>:100: warning: can't find special character 'u005A_0307'
troff: <standard input>:100: warning: can't find special character 'u005A_0301'
troff: <standard input>:100: warning: can't find special character 'u004E_0301'
troff: <standard input>:100: warning: can't find special character 'u0061_0328'
troff: <standard input>:100: warning: can't find special character 'u0065_0328'
troff: <standard input>:100: warning: can't find special character 'u007A_0307'
troff: <standard input>:100: warning: can't find special character 'u007A_0301'
troff: <standard input>:100: warning: can't find special character 'u006E_0301'
request id is PDF-980 (0 file(s))
A cursory inspection of the metrics reveals that they're for internalname NimbusRomNo9L-Regu and friends,
generated Thu Aug 2 13:14:49 2007, as was the Nimbus in Buster;
a cursory correlation with the provided Fontmap reveals that the mappings are as today.
But it's still undistributable, still missing the Courier variants, and, regardless, still quite different from a metric generated with up-to-date groff.
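As far as I can tell, the "can't find special character" warnings boil down to the glyph name being absent from the charset section of the font descriptions in play, so a quick check is to look for it there. A minimal sketch, assuming the usual layout (a charset line, then one glyph per line with its name in the first column); the helper name and the two-glyph sample file are made up for illustration.

```shell
# Hypothetical helper: succeed if font description file $1 lists glyph $2 in its charset.
font_has_glyph() {
    awk -v glyph="$2" '
        $1 == "charset"   { in_charset = 1; next }
        $1 == "kernpairs" { in_charset = 0; next }
        in_charset && $1 == glyph { found = 1 }
        END { exit !found }
    ' "$1"
}

# Made-up miniature font description, not real metrics
sample="$(mktemp)"
printf '%s\n' 'name TR' 'internalname Times-Roman' 'charset' \
    'a	444,460,10	0	97	a' 'u044F	472,460	0	255	afii10097' > "$sample"

font_has_glyph "$sample" u044F && echo 'u044F: present'
font_has_glyph "$sample" u0446 || echo 'u0446: missing'
rm -f "$sample"
```

Pointed at a real description, e.g. font_has_glyph /usr/share/groff/current/font/devps/TR u044F, the exit status says whether that font can be expected to render я.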
This is the bit where I vaguely explain how that big hunk at the top generates the unsolvable:
/usr/share/groff/current/font/devps/DESC says that there's the usual four styles (R, B, I, and BI), and that the default family is T; there's no point in getting cute and parsing this.
Each font description next to it has an internalname directive, corresponding to the PostScript font requested (for TI it's Times-Italic, for CI it's Courier-Oblique, &c.).
/var/lib/ghostscript/fonts/Fontmap (generated from /etc/ghostscript/fontmap.d/* by update-gsfontmap) is used to override font paths and alias fonts to others (well, it's arbitrary Ghostscript code that happens to match font names, but doing anything besides name/alias/path mapping has thankfully died down as a practice), with the interesting mappings being in the format:
/NimbusRomNo9L-ReguItal (/usr/share/fonts/type1/gsfonts/n021023l.pfb) ;
/Times-Italic /NimbusRomNo9L-ReguItal ;
/NimbusMonL-ReguObli (/usr/share/fonts/type1/gsfonts/n022023l.pfb) ;
/Courier-Oblique /NimbusMonL-ReguObli ;
In /usr/share/fonts/type1/gsfonts/, there are three files for each font:
nabijaczleweli@tarta:~/uwu/devps$ file /usr/share/fonts/type1/gsfonts/{n022004l,n022023l}.*
/usr/share/fonts/type1/gsfonts/n022004l.afm: ASCII font metrics
/usr/share/fonts/type1/gsfonts/n022004l.pfb: PostScript Type 1 font program data
/usr/share/fonts/type1/gsfonts/n022004l.pfm: Printer Font Metrics NimbusMonL-Bold, 2738 bytes, Nimbus Mono L bold
/usr/share/fonts/type1/gsfonts/n022023l.afm: ASCII font metrics
/usr/share/fonts/type1/gsfonts/n022023l.pfb: PostScript Type 1 font program data
/usr/share/fonts/type1/gsfonts/n022023l.pfm: Printer Font Metrics NimbusMonL-ReguObli, 2742 bytes, Nimbus Mono L italic
The afmtodit flags follow /usr/share/groff/current/font/devps/generate/Makefile; -n, disabling ligature output, only applies to monospace fonts, because ligaturised tables are nightmarish:
groff picks up site-local fonts from /usr/share/groff/site-font/devDEV.
This is not without its own delta from the original:
But these differences are not only minute even at their worst (1.92pt in the synopsis line, 0.96pt in the third description line), but a result of the tighter kerning matching the font better.
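The Fontmap step above can be sketched in isolation: follow aliases until a name with a path turns up, then swap the .pfb extension for .afm. This assumes the simple name/alias/path lines shown earlier (anything fancier is not handled, and alias loops are not guarded against); the function name is hypothetical.

```shell
# Hypothetical helper: read a Fontmap on stdin, print the .afm path for PostScript font $1.
fontmap_afm() {
    awk -v cur="/$1" '
        $1 ~ /^\// && $2 ~ /^\// { alias[$1] = $2 }   # /Name /OtherName ;
        $1 ~ /^\// && $2 ~ /^\(/ { path[$1] = $2 }    # /Name (file.pfb) ;
        END {
            while (cur in alias) cur = alias[cur]
            if (!(cur in path)) exit 1
            p = path[cur]
            sub(/^\(/, "", p)         # drop the opening parenthesis
            sub(/pfb\)$/, "afm", p)   # .pfb) becomes .afm
            print p
        }'
}

fontmap_afm Times-Italic <<'EOF'
/NimbusRomNo9L-ReguItal (/usr/share/fonts/type1/gsfonts/n021023l.pfb) ;
/Times-Italic /NimbusRomNo9L-ReguItal ;
EOF
```

This prints /usr/share/fonts/type1/gsfonts/n021023l.afm, the same path the big loop reported for TI.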
In the three-day process of "printing a cyrillic document in 2020", I've also done what grops(1) says you can, but is classically unhelpful in achieving, so:
In this case: Курьер Прайм, a Courier Prime with Cyrillic characters, that, unlike the latter, isn't in Debian (merging them is an ongoing adventure):
$ unzip ../courierprime.zip
Archive: ../courierprime.zip
inflating: Courier-Prime-Italic.ttf
inflating: Courier-Prime-Bold.ttf
inflating: Courier-Prime.ttf
inflating: Courier-Prime-Bold-Italic.ttf
$ for v in "" -Bold -Italic -Bold-Italic; do
for f in afm pfa; do
fontforge -c 'import sys; fontforge.open(sys.argv[1]).generate(sys.argv[2])' \
Courier-Prime$v.ttf Courier-Prime$v.$f &
done
done
# ...
Warning: Mac and Windows entries in the 'name' table differ for the
Fullname string in the language English (US)
Mac String: Courier Prime Bold Italic
Windows String: CourierPrime-BoldItalic
# cp *.pfa /usr/local/share/groff/site-font/devps/
$ for v in "" -Bold -Italic -Bold-Italic; do
afmtodit -ncmi0 \
-d /usr/share/groff/current/font/devps/DESC \
-e /usr/share/groff/current/font/devps/text.enc \
Courier-Prime$v.afm /usr/share/groff/current/font/devps/generate/textmap "C$(expr "$(echo $v | tr -d '[:lower:]-')" \| R)";
done
# cp C{R,B,I,BI} /usr/local/share/groff/site-font/devps/
# {
cat /usr/share/groff/current/font/devps/download; echo;
for v in "" -Bold -Italic -Bold-Italic; do
printf "%s\t%s\n" CourierPrime${v/d-I/dI} Courier-Prime$v.pfa;
done;
} | tee /usr/local/share/groff/site-font/devps/download
# List of downloadable fonts
# PostScript-name Filename
Symbol-Slanted symbolsl.pfa
ZapfDingbats-Reverse zapfdr.pfa
FreeEuro freeeuro.pfa
CourierPrime Courier-Prime.pfa
CourierPrime-Bold Courier-Prime-Bold.pfa
CourierPrime-Italic Courier-Prime-Italic.pfa
CourierPrime-BoldItalic Courier-Prime-Bold-Italic.pfa
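The expr incantation in the afmtodit loop is doing less than it appears: strip lowercase letters and hyphens from the variant suffix, and fall back to R when nothing is left. Spelled out, with a hypothetical function name:

```shell
# Hypothetical helper: map a file-name variant suffix to a groff style (R, B, I, BI).
groff_style() {
    caps="$(printf '%s' "$1" | tr -d '[:lower:]-')"
    # An empty result means the plain variant, which groff calls R
    printf '%s\n' "${caps:-R}"
}

for v in "" -Bold -Italic -Bold-Italic; do
    printf 'Courier-Prime%s.afm -> C%s\n' "$v" "$(groff_style "$v")"
done
```

The four lines printed end in CR, CB, CI, and CBI respectively; a suffix that is already a word, like Regular or BoldItalic, reduces the same way.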
Ghostscript can directly load TrueType and OpenType fonts, so for a properly-packaged one all that's needed is:
# apt install fonts-liberation2
$ for v in Regular Bold Italic BoldItalic; do
fontforge -c 'import sys; fontforge.open(sys.argv[1]).generate(sys.argv[2])' /usr/share/fonts/truetype/liberation2/LiberationSans-$v.ttf LiberationSans-$v.afm;
afmtodit -cmi0 \
-d /usr/share/groff/current/font/devps/DESC \
-e /usr/share/groff/current/font/devps/text.enc \
LiberationSans-$v.afm /usr/share/groff/current/font/devps/generate/textmap "T$(echo $v | tr -d '[:lower:]-')";
done
This font contains both a 'kern' table and a 'GPOS' table.
The 'kern' table will only be read if there is no 'kern' feature in 'GPOS'.
The glyph named macron is mapped to U+02C9.
But its name indicates it should be mapped to U+00AF.
The glyph named Delta is mapped to U+0394.
But its name indicates it should be mapped to U+2206.
The glyph named Omega is mapped to U+03A9.
But its name indicates it should be mapped to U+2126.
The glyph named mu is mapped to U+03BC.
But its name indicates it should be mapped to U+00B5.
The glyph named periodcentered is mapped to U+2219.
But its name indicates it should be mapped to U+00B7.
both S_BE and S_PE map to u0053 at /bin/afmtodit line 6519.
both S_BE and S_TE map to u0053 at /bin/afmtodit line 6519.
both anoteleia and middot map to u00B7 at /bin/afmtodit line 6519.
both mu and mu1 map to mc at /bin/afmtodit line 6411.
both hookabovecomb and uni0309 map to u0309 at /bin/afmtodit line 6519.
both gravecomb and uni0340 map to u0300 at /bin/afmtodit line 6519.
both acutecomb and uni0341 map to u0301 at /bin/afmtodit line 6519.
both uni0313 and uni0343 map to u0313 at /bin/afmtodit line 6519.
both uni02B9 and uni0374 map to u02B9 at /bin/afmtodit line 6519.
both alphatonos and uni1F71 map to u03B1_0301 at /bin/afmtodit line 6519.
both epsilontonos and uni1F73 map to u03B5_0301 at /bin/afmtodit line 6519.
both etatonos and uni1F75 map to u03B7_0301 at /bin/afmtodit line 6519.
both iotatonos and uni1F77 map to u03B9_0301 at /bin/afmtodit line 6519.
both omicrontonos and uni1F79 map to u03BF_0301 at /bin/afmtodit line 6519.
both omegatonos and uni1F7D map to u03C9_0301 at /bin/afmtodit line 6519.
both Alphatonos and uni1FBB map to u0391_0301 at /bin/afmtodit line 6519.
both Epsilontonos and uni1FC9 map to u0395_0301 at /bin/afmtodit line 6519.
both Etatonos and uni1FCB map to u0397_0301 at /bin/afmtodit line 6519.
both iotadieresistonos and uni1FD3 map to u03B9_0308_0301 at /bin/afmtodit line 6519.
both Iotatonos and uni1FDB map to u0399_0301 at /bin/afmtodit line 6519.
both Upsilontonos and uni1FEB map to u03A5_0301 at /bin/afmtodit line 6519.
both dieresistonos and uni1FEE map to u00A8_0301 at /bin/afmtodit line 6519.
both Omicrontonos and uni1FF9 map to u039F_0301 at /bin/afmtodit line 6519.
both Omegatonos and uni1FFB map to u03A9_0301 at /bin/afmtodit line 6519.
both uni2000 and uni2002 map to u2002 at /bin/afmtodit line 6519.
both uni2001 and uni2003 map to u2003 at /bin/afmtodit line 6519.
both uni1FE3 and upsilondieresistonos map to u03C5_0308_0301 at /bin/afmtodit line 6519.
both uni1F7B and upsilontonos map to u03C5_0301 at /bin/afmtodit line 6519.
both patah and yodyod_patah map to u05B7 at /bin/afmtodit line 6519.
# cp T* /usr/local/share/groff/site-font/devps/
A sufficiently powerful diagnostic tool can also be plainly piping groff into Ghostscript:
nabijaczleweli@tarta:~/uwu$ groff -Kutf8 -Tps -mdoc -tp < frag.1 | gs
GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS<1>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<1>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS<2>GS<5>GS<13>GS<22>GS<31>GS<40>GS<48>GS<61>GS<82>GS<97>GS<124>GS<132>GS<139>GS<150>GS<159>GS<168>GS<174>GS<181>GS<188>GS<193>GS<201>GS<209>GS<217>GS<225>GS<234>GS<242>GS<251>GS>Querying operating system for font files...
Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/CourierPrime-BoldItalic.
Can't find (or can't open) font file CourierPrime-BoldItalic.
Didn't find this font on the system!
Substituting font Courier-BoldOblique for CourierPrime-BoldItalic.
Loading NimbusMonoPS-BoldItalic font from /usr/share/ghostscript/9.27/Resource/Font/NimbusMonoPS-BoldItalic... 4371456 2971386 7864932 6531275 1 done.
GS>Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/LiberationSans-Italic.
Can't find (or can't open) font file LiberationSans-Italic.
Loading LiberationSans-Italic font from /usr/share/fonts/truetype/liberation2/LiberationSans-Italic.ttf... 4572808 3202292 8735964 7377991 1 done.
GS<1>Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/CourierPrime.
Can't find (or can't open) font file CourierPrime.
Didn't find this font on the system!
Substituting font Courier for CourierPrime.
Loading NimbusMonoPS-Regular font from /usr/share/ghostscript/9.27/Resource/Font/NimbusMonoPS-Regular... 4707152 3390954 8777444 7417724 1 done.
Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/CourierPrime-Italic.
Can't find (or can't open) font file CourierPrime-Italic.
Didn't find this font on the system!
Substituting font Courier-Oblique for CourierPrime-Italic.
Loading NimbusMonoPS-Italic font from /usr/share/ghostscript/9.27/Resource/Font/NimbusMonoPS-Italic... 4955064 3635291 8797644 7430341 1 done.
GS>Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/CourierPrime-Bold.
Can't find (or can't open) font file CourierPrime-Bold.
Didn't find this font on the system!
Substituting font Courier-Bold for CourierPrime-Bold.
Loading NimbusMonoPS-Bold font from /usr/share/ghostscript/9.27/Resource/Font/NimbusMonoPS-Bold... 5223176 3896909 8797644 7439279 1 done.
GS<1>Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/LiberationSans-Bold.
Can't find (or can't open) font file LiberationSans-Bold.
Loading LiberationSans-Bold font from /usr/share/fonts/truetype/liberation2/LiberationSans-Bold.ttf... 5255944 3939182 9686752 8286341 1 done.
Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/LiberationSans.
Can't find (or can't open) font file LiberationSans.
Loading LiberationSans font from /usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf... 5313240 3985493 10080064 8668213 1 done.
GS>GS<9>GS<18>GS<27>GS<36>GS<45>GS<54>GS<63>GS<72>GS<81>GS<90>GS<99>GS<108>GS<117>GS<126>GS<135>GS<144>GS<153>GS<162>GS<171>GS<180>GS<189>GS<198>GS<207>GS<216>GS<225>GS<234>GS<243>GS<252>GS>Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/LiberationSans-BoldItalic.
Can't find (or can't open) font file LiberationSans-BoldItalic.
Loading LiberationSans-BoldItalic font from /usr/share/fonts/truetype/liberation/LiberationSans-BoldItalic.ttf... 5333440 4004198 10467904 9033893 1 done.
GS>GS<9>GS<18>GS<27>GS<36>GS<45>GS<54>GS<63>GS<72>GS<81>GS<90>GS<99>GS<108>GS<117>GS<126>GS<135>GS<144>GS<153>GS<162>GS<171>GS<180>GS<189>GS<198>GS<207>GS<216>GS<225>GS<234>GS<243>GS<252>GS>GS<2>GS<11>GS<20>GS<29>GS<38>GS<47>GS<56>GS<65>GS<74>GS<83>GS<92>GS<101>GS<110>GS<119>GS<128>GS<137>GS<146>GS<155>GS<164>GS<173>GS<182>GS<191>GS<200>GS<209>GS<218>GS<227>GS<236>GS<245>GS<254>GS<2>GS<7>GS<16>GS<25>GS<34>GS<43>GS<52>GS<61>GS<70>GS<79>GS<88>GS<97>GS<106>GS<115>GS<124>GS<133>GS<142>GS<151>GS<160>GS<169>GS<178>GS<187>GS<196>GS<205>GS<214>GS<223>GS<232>GS<241>GS<250>GS>GS<4>GS<13>GS<22>GS<31>GS<40>GS<49>GS<58>GS<67>GS<76>GS<84>GS<92>GS<100>GS<108>GS<117>GS<125>GS<133>GS<141>GS<149>GS<158>GS<167>GS<176>GS<185>GS<194>GS<203>GS<212>GS<221>GS<230>GS<239>GS<248>GS<257>GS<3>GS<9>GS<18>GS<27>GS<36>GS<45>GS<54>GS<63>GS<72>GS<80>GS<88>GS<96>GS<104>GS<113>GS<122>GS<130>GS<138>GS<146>GS<154>GS<163>GS<172>GS<181>GS<190>GS<199>GS<208>GS<217>GS<226>GS<235>GS<244>GS<253>GS<1>GS<6>GS<15>GS<24>GS<33>GS<42>GS<51>GS<60>GS<69>GS<78>GS<86>GS<94>GS<102>GS<111>GS<120>GS<128>GS<136>GS<144>GS<152>GS<161>GS<170>GS<179>GS<188>GS<197>GS<206>GS<215>GS<224>GS<233>GS<242>GS<251>GS>GS<3>GS<12>GS<21>GS<30>GS<39>GS<48>GS<57>GS<66>GS<74>GS<82>GS<90>GS<98>GS<106>GS<115>GS<124>GS<132>GS<140>GS<148>GS<157>GS<166>GS<175>GS<184>GS<193>GS<202>GS<211>GS<220>GS<229>GS<238>GS<247>GS<255>GS>GS<9>GS<18>GS<27>GS<36>GS<45>GS<54>GS<63>GS<72>GS<81>GS<90>GS<99>GS<108>GS<117>GS<126>GS<135>GS<144>GS<153>GS<162>GS<171>GS<180>GS<189>GS<198>GS<207>GS<216>GS<225>GS<234>GS<243>GS<252>GS<1>GS<6>GS<15>GS<24>GS<33>GS<42>GS<51>GS<60>GS<68>GS<77>GS<86>GS<95>GS<104>GS<113>GS<122>GS<131>GS<140>GS<149>GS<158>GS<167>GS<176>GS<185>GS<194>GS<203>GS<212>GS<221>GS<230>GS<239>GS<248>GS<257>GS>GS<9>GS<18>GS<27>GS<36>GS<45>GS<54>GS<63>GS<72>GS<81>GS<90>GS<99>GS<108>GS<117>GS<126>GS<135>GS<144>GS<153>GS<162>GS<171>GS<180>GS<189>GS<198>GS<207>GS<216>GS<225>GS<234>GS<243>GS<252>GS>GS<3>GS<12>GS<21>GS<30>GS<39>GS<48>GS<57>GS<66>GS<75>GS<84>GS<93
>GS<102>GS<111>GS<120>GS<129>GS<138>GS<147>GS<155>GS<164>GS<173>GS<182>GS<191>GS<200>GS<209>GS<218>GS<227>GS<236>GS<245>GS<254>GS<2>GS<7>GS<16>GS<25>GS<33>GS<42>GS<51>GS<60>GS<69>GS<78>GS<87>GS<96>GS<104>GS<113>GS<122>GS<131>GS<140>GS<149>GS<158>GS<167>GS<176>GS<185>GS<194>GS<203>GS<212>GS<221>GS<230>GS<239>GS<248>GS<257>GS>GS>GS>GS>GS>GS>GS>GS<2>GS<1>GS<1>GS<2>GS<2>GS<2>GS<1>GS<1>GS<1>GS>GS>GS>GS<2>GS<1>GS>GS<1>GS>GS>GS<1>GS>GS<1>GS>GS<1>GS<2>GS<1>GS>GS>GS>GS<2>GS>GS<1>GS<1>GS>GS<2>GS>GS>GS>GS<1>GS>GS<1>GS<2>GS>GS>GS<1>GS<1>GS>GS<2>GS<1>GS>GS<1>GS<1>GS>GS>GS<2>GS>GS<2>GS<2>GS>GS<1>GS>GS<2>GS<1>GS>GS>GS<2>GS>GS<1>GS<4>GS<2>GS>GS>GS<1>GS<1>GS<1>GS>>>showpage, press <return> to continue<<
GS>GS>GS>GS>
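Most of that is prompt noise. A rough filter for the lines that matter (which font Ghostscript actually loaded, and what it substituted), assuming the message wording of the 9.27 output above; the function name is made up.

```shell
# Hypothetical helper: strip GS> and GS<n> prompts, keep only load/substitution messages.
gs_font_report() {
    sed -E 's/GS(<[0-9]+>|>)//g' | grep -E '^(Substituting font|Loading) '
}

# Fed a few lines lifted from the session above
gs_font_report <<'EOF'
GS>Can't find (or can't open) font file CourierPrime.
Didn't find this font on the system!
Substituting font Courier for CourierPrime.
Loading NimbusMonoPS-Regular font from /usr/share/ghostscript/9.27/Resource/Font/NimbusMonoPS-Regular... 4707152 3390954 8777444 7417724 1 done.
EOF
```

Only the Substituting and Loading lines survive, which is enough to see at a glance that Courier Prime never made it to the page here.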
The frag.1 in question, as minimised from voreutils' cut.1:
.Dd
.Dt CUT 1
.Os
.
.Sh NAME
.Nm cut
.Nd extract bytes, characters, or fields
.Sh SYNOPSIS
.Nm
.Fl f Ar range Ns Oo , Ns Ar range Oc Ns …
.Op Fl Czs
.Op Fl d Ar delimiter
.Op Fl O Ar out-delimiter
.Oo Ar file Oc Ns …
.
.Sh DESCRIPTION
Copies bytes, characters, or fields specified by
.Ar range Ns s
from each line of the input
.Ar file Ns s Pq standard input stream if none or Sq Ar -
to the standard output stream.
.Pp
.Ar range Ns s
can be separated by commas or spaces, and each can be in the format:
.Bl -tag -compact -offset Ds -width "from-to"
.It Ar number
.Brq Ar number
.It Ar from Ns Cm -
.Ar [ from , No \[if] )
.It Ar from Ns Cm - Ns Ar to
.Ar [ number , to ]
.It Li " " Cm - Ns Ar to
.Li [ 1 , Ar to ]
.El
Indices are
.Em 1 Ns -based ,
and a union is taken of all
.Ar range Ns s .
.Pp
With
.Fl b ,
bytes are extracted; with
.Fl n
characters are never interrupted mid-sequence, with rounding preferred down
.Pq see Sx EXAMPLES .
.Pp
The newline
.Pq NUL with Fl z
is never matched and always printed
.Pq unless the entire line was removed with Fl fs .
.
.Sh OPTIONS (ŁĄĘĆŻŹŃ łąęćżźń)
.Bl -tag -compact -width "-O, --output-delimiter=out-delim"
.It Fl b , -bytes Ns = Ns Ar range Ns Oo , Ns Ar range Oc Ns …
Extract bytes.
.It Fl n
Don't interrupt multi-byte character sequences.
.Pp
.It Fl C , -complement
Invert
.Ar range Ns s :
select all
.Em but
what they match
.Pq \&[ Ns Sy 1 , No \[if] ) \- \[*S] Ns Ar range .
.\" TODO?: this should be a Big Union, but, well
For the purposes of
.Fl n ,
the most minimal set of
.Ar range Ns s
is constructed.
.El
.
.Sh EXAMPLES
.Bd -literal -compact
.Li $ Nm printf Li '\ex01\ex02\ex03\ex04\e0\ex05\ex06\ex07' \&| Nm Fl zb Ar 1,3- Li \&| Nm hexdump Fl C
00000000 01 03 04 00 05 07 00 |.......|
00000007
.Ed
.Pp
.Bd -literal -compact
.Li $ Nm printf Li 'яйцо\enЯЙЦО'\f(CB'яйцо\enЯЙЦО'\fP\f(CI'яйцо\enЯЙЦО'\fP\f[CBI]'яйцо\enЯЙЦО'\fP \&| Nm Fl c Ar 1,3-
яцо\fRяцо\fP\fBяцо\fP\fIяцо\fP\f(BIяцо\fP
ЯЦО\fRЯЦО\fP\fBЯЦО\fP\fIЯЦО\fP\f(BIЯЦО\fP
.Ed
.Pp
.Bd -literal -compact
# name, IDs, homedir, shell, ...
.Li $ Nm Fl f Ar 1,3-4,6- Fl d Ns Ar \&: Fl O Ns Li \&"$( Ns Nm printf Li '\et')" Pa /etc/passwd
root 0 0 /root /bin/bash
bin 2 2 /bin /usr/sbin/nologin
irc 39 39 /var/run/ircd /usr/sbin/nologin
cicada 1000 100 /home/cicada /bin/bash
nobody 65534 65534 /nonexistent /usr/sbin/nologin
# Everything else: password and GNATS
.Li $ Nm Fl Cf Ar 1,3-4,6- Fl d Ns Ar \&: Fl O Ns Li \&"$( Ns Nm printf Li '\et')" Pa /etc/passwd
x root
x bin
x ircd
x ŁĄĘĆŻŹŃ łąęćżźń,,,
x nobody
.Ed