007. Cleanly enabling Cyrillic and broad Unicode output in groff -Tps

Sat, 28 Aug 2021 18:36:44 +0200, updated Thu, 02 Sep 2021 20:41:02 +0200

$ mkdir devps; cd devps
$ for f in {C,T}{R,B,I,BI}; do
    in="$(awk '$1 == "internalname" {print $2; exit}' "/usr/share/groff/current/font/devps/$f")";
    echo $f: $in;
    src="$(awk -v cur="/$in" '  
      $1 ~ /^\// && $2 ~ /^[\/\(]/ {
        if($2 ~ /^\//)
          aliases[$1] = $2;
        else
          paths[$1] = $2;
      }
      END {
        while(cur in aliases)
          cur = aliases[cur];
        if(cur in paths) {
          sub(/^\(/, "", paths[cur]);
          sub(/(pfb)?\)$/, "", paths[cur]);
          print paths[cur]
        } else {
          print "last: " cur > "/dev/stderr";
          exit 1
        }
      }' /var/lib/ghostscript/fonts/Fontmap)";  
    echo ${src}afm;
    afmtodit $(expr "$f" : C > /dev/null && printf -- -n) -cmi0 \
      -d /usr/share/groff/current/font/devps/DESC \
      -e /usr/share/groff/current/font/devps/text.enc \
      "${src}afm" /usr/share/groff/current/font/devps/generate/textmap "$f";
  done
CR: Courier
/usr/share/fonts/type1/gsfonts/n022003l.afm
CB: Courier-Bold
/usr/share/fonts/type1/gsfonts/n022004l.afm
CI: Courier-Oblique
/usr/share/fonts/type1/gsfonts/n022023l.afm
CBI: Courier-BoldOblique
/usr/share/fonts/type1/gsfonts/n022024l.afm
TR: Times-Roman
/usr/share/fonts/type1/gsfonts/n021003l.afm
TB: Times-Bold
/usr/share/fonts/type1/gsfonts/n021004l.afm
TI: Times-Italic
/usr/share/fonts/type1/gsfonts/n021023l.afm
TBI: Times-BoldItalic
/usr/share/fonts/type1/gsfonts/n021024l.afm

# ln -s /usr/local/share/groff/site-font /usr/share/groff
# mkdir -p /usr/local/share/groff/site-font
# cp -r ../devps /usr/local/share/groff/site-font

That's it. Same exact fonts as groff was already using, but with the appropriate metrics; on Debian — where all of these are Nimbus variants — this includes Cyrillic and the extended Latin planes, among others, because unlike Times, Nimbus is an actual font. Render with groff -Kutf8 …:

Example manual page with Cyrillic and Polish characters

# Prior art

We've all been there: writing a manual that needs example text wider than US-ASCII, or needs to credit a Russian. groff provides ć as \['c] and ł as \[/l], graciously noting that the latter is Polish, and that's about it. It is, of course, rather tempting to then "just write something", as it is, after all, anno domini twenty twenty. This yields:

nabijaczleweli@tarta:~/uwu$ groff -Tps -mdoc -tp < frag.1 | lp -dPDF
pic:<standard input>:10: invalid input character code 128
pic:<standard input>:14: invalid input character code 128
pic:<standard input>:52: invalid input character code 129
pic:<standard input>:52: invalid input character code 132
pic:<standard input>:52: invalid input character code 152
pic:<standard input>:52: invalid input character code 134
pic:<standard input>:52: invalid input character code 131
pic:<standard input>:52: invalid input character code 130
pic:<standard input>:52: invalid input character code 133
pic:<standard input>:52: invalid input character code 153
pic:<standard input>:52: invalid input character code 135
pic:<standard input>:52: invalid input character code 132
pic:<standard input>:54: invalid input character code 128
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:82: invalid input character code 143
pic:<standard input>:82: invalid input character code 134
pic:<standard input>:82: invalid input character code 153
pic:<standard input>:82: invalid input character code 158
pic:<standard input>:83: invalid input character code 143
pic:<standard input>:83: invalid input character code 134
pic:<standard input>:83: invalid input character code 143
pic:<standard input>:83: invalid input character code 134
pic:<standard input>:83: invalid input character code 143
pic:<standard input>:83: invalid input character code 134
pic:<standard input>:83: invalid input character code 143
pic:<standard input>:83: invalid input character code 134
pic:<standard input>:83: invalid input character code 143
pic:<standard input>:83: invalid input character code 134
pic:<standard input>:84: invalid input character code 158
pic:<standard input>:84: invalid input character code 158
pic:<standard input>:84: invalid input character code 158
pic:<standard input>:84: invalid input character code 158
pic:<standard input>:84: invalid input character code 158
pic:<standard input>:100: invalid input character code 129
pic:<standard input>:100: invalid input character code 132
pic:<standard input>:100: invalid input character code 152
pic:<standard input>:100: invalid input character code 134
pic:<standard input>:100: invalid input character code 131
pic:<standard input>:100: invalid input character code 130
pic:<standard input>:100: invalid input character code 133
pic:<standard input>:100: invalid input character code 153
pic:<standard input>:100: invalid input character code 135
pic:<standard input>:100: invalid input character code 132
request id is PDF-978 (0 file(s))

And:

Same page, but non-ASCII characters are replaced with garbage, usually a variant of 'Å'
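
The codes pic complains about are decimal byte values. Without -Kutf8 the input is read a byte at a time, so every UTF-8 sequence falls apart into a lead byte and its continuation bytes; the lead byte of ł, 197, is 'Å' in Latin-1, which would explain the garbage, while the continuation byte, 130, lands among the rejected codes. A minimal way to check this, assuming only POSIX printf, od, tr and sed (utf8_bytes is a name made up for this sketch):

```shell
# Print the decimal value of every byte on stdin, one per line.
utf8_bytes() {
  od -An -v -tu1 | tr -s ' ' '\n' | sed '/^$/d'
}

# ł is U+0142, which UTF-8 encodes as the two bytes 0xC5 0x82;
# they are written as octal escapes so the example does not depend
# on the encoding of this file.
printf '\305\202' | utf8_bytes
```

The second number printed, 130, is one of the codes in the pic output above.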

A great time! The Arch Wiki says that to "Correctly display Polish diacritics", you can groff -Kutf8 -Tdvi -mec -ms test.ms > test.dvi, so:

nabijaczleweli@tarta:~/uwu$ groff -Kutf8 -Tps -mdoc -tp < frag.1 | lp -dPDF
troff: <standard input>:52: warning: can't find special character 'u0041_0328'
troff: <standard input>:52: warning: can't find special character 'u0045_0328'
troff: <standard input>:52: warning: can't find special character 'u005A_0307'
troff: <standard input>:52: warning: can't find special character 'u005A_0301'
troff: <standard input>:52: warning: can't find special character 'u004E_0301'
troff: <standard input>:52: warning: can't find special character 'u0061_0328'
troff: <standard input>:52: warning: can't find special character 'u0065_0328'
troff: <standard input>:52: warning: can't find special character 'u007A_0307'
troff: <standard input>:52: warning: can't find special character 'u007A_0301'
troff: <standard input>:52: warning: can't find special character 'u006E_0301'
troff: <standard input>:82: warning: can't find special character 'u044F'
troff: <standard input>:82: warning: can't find special character 'u0438_0306'
troff: <standard input>:82: warning: can't find special character 'u0446'
troff: <standard input>:82: warning: can't find special character 'u043E'
troff: <standard input>:82: warning: can't find special character 'u042F'
troff: <standard input>:82: warning: can't find special character 'u0418_0306'
troff: <standard input>:82: warning: can't find special character 'u0426'
troff: <standard input>:82: warning: can't find special character 'u041E'
request id is PDF-979 (0 file(s))

And:

Same page, the ellipsis is rendered, but non-ASCII characters apart from ŁĆ are entirely missing

This is, arguably, not better – ł, ć, and the ellipsis have appeared, but everything else disappeared. It's unclear why exactly those three, considering the ellipsis can't be reached by a classic groff_char(7) escape, and all three become \[uXXXX]-style escapes after passing through preconv.
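
For reading those warnings: groff names a Unicode glyph uXXXX, and a decomposed sequence as several code points joined by underscores, so u0041_0328 is U+0041 (A) followed by U+0328 (combining ogonek), i.e. Ą. A small sketch that splits such a name back into code points; glyph_codepoints is a name invented here, and it assumes nothing beyond POSIX sed and tr:

```shell
# Split a groff Unicode glyph name (uXXXX or uXXXX_YYYY...) into
# "U+XXXX" code points, one per line.
glyph_codepoints() {
  printf '%s\n' "$1" | sed 's/^u//' | tr '_' '\n' | sed 's/^/U+/'
}

glyph_codepoints u0041_0328   # A + combining ogonek (Ą)
glyph_codepoints u0438_0306   # и + combining breve (й)
```

Read this way, every warning above names either a plain Cyrillic letter or a base letter plus one combining mark.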

The same Arch Wiki article says that as part of "Adding support for cyrillic fonts", you can preprocess with iconv -f utf-8 -t koi8-r. This is preceded by downloading an unspecified font pack out of about 8 from a Russian FTP, aided by the READMEs therein also being in KOI8-R (i.e. '' in UTF-8), and mangling the Ghostscript font map into pointing at that font for Times-Roman.

It's impossible to tell how the author expects literally any other Times variant (TI, TB, TBI), or any of the four fixed-width fonts, to render Cyrillic. And it ignores the obvious point that it replaces the font, which means the pages will look different.

It's only reasonable to expect the Russosphere to have some sort of answer to this, and, indeed, a 2015 post has a workable solution. It rambles and arrives at its point in a roundabout way, converting to and from KOI8-R, and requires a ru.tmac package (not included, but it can be found on the groff mailing list, as can a plethora of other hacks of varying insanity) that isn't needed at all; however:

More formally, groff does not have metrics for russian fonts. But I do have! I have metrics for Times fonts and I need to put them into /usr/share/groff/current/font/devps, replacing already existing. My metrics are in devps.tar attach.

Indeed! devps.tar contains a perfectly serviceable set of usr/share/groff/current/font/devps/T{R,B,I,BI} metrics:

nabijaczleweli@tarta:~/uwu$ groff -Kutf8 -Tps -mdoc -tp < frag.1 | lp -dPDF
troff: <standard input>:82: warning: can't find special character 'u044F'
troff: <standard input>:82: warning: can't find special character 'u0438_0306'
troff: <standard input>:82: warning: can't find special character 'u0446'
troff: <standard input>:82: warning: can't find special character 'u043E'
troff: <standard input>:82: warning: can't find special character 'u042F'
troff: <standard input>:82: warning: can't find special character 'u0418_0306'
troff: <standard input>:82: warning: can't find special character 'u0426'
troff: <standard input>:82: warning: can't find special character 'u041E'
troff: <standard input>:100: warning: can't find special character 'u0041_0328'
troff: <standard input>:100: warning: can't find special character 'u0045_0328'
troff: <standard input>:100: warning: can't find special character 'u005A_0307'
troff: <standard input>:100: warning: can't find special character 'u005A_0301'
troff: <standard input>:100: warning: can't find special character 'u004E_0301'
troff: <standard input>:100: warning: can't find special character 'u0061_0328'
troff: <standard input>:100: warning: can't find special character 'u0065_0328'
troff: <standard input>:100: warning: can't find special character 'u007A_0307'
troff: <standard input>:100: warning: can't find special character 'u007A_0301'
troff: <standard input>:100: warning: can't find special character 'u006E_0301'
request id is PDF-980 (0 file(s))

Rendering to:

Same page, proportional non-ASCII characters rendered, monospace ones missing

A cursory inspection of the metrics reveals that they're for internalname NimbusRomNo9L-Regu and friends, generated Thu Aug 2 13:14:49 2007, as was the Nimbus in Buster; a cursory correlation with the provided Fontmap reveals that the mappings are as today. But it's still undistributable, still missing the Courier variants, and, regardless, still quite different from metrics generated with up-to-date groff.

# Current art

This is the bit where I vaguely explain how that big hunk at the top generates the unsolvable:

  1. /usr/share/groff/current/font/devps/DESC says that there's the usual four styles (R, B, I, and BI), and that the default family is T; there's no point in getting cute and parsing this.
  2. groff font metrics have an internalname directive, corresponding to the PostScript font requested (for TI it's Times-Italic, for CI it's Courier-Oblique, &c.).
  3. The Ghostscript font map (/var/lib/ghostscript/fonts/Fontmap from /etc/ghostscript/fontmap.d/* by update-gsfontmap) is used to override font paths and alias fonts to others (well, it's arbitrary Ghostscript code that happens to match font names, but doing anything besides name/alias/path mapping has thankfully died down as a practice), with the interesting mappings being in the format:
    /NimbusRomNo9L-ReguItal (/usr/share/fonts/type1/gsfonts/n021023l.pfb) ;
    /Times-Italic /NimbusRomNo9L-ReguItal ;
    /NimbusMonL-ReguObli (/usr/share/fonts/type1/gsfonts/n022023l.pfb) ;
    /Courier-Oblique /NimbusMonL-ReguObli ;
    
    This can be trivially walked by first exhausting all name/name mappings, then resolving the final name/path mapping.
  4. For each font in /usr/share/fonts/type1/gsfonts/, there are three files:
    nabijaczleweli@tarta:~/uwu/devps$ file /usr/share/fonts/type1/gsfonts/{n022004l,n022023l}.*
    /usr/share/fonts/type1/gsfonts/n022004l.afm: ASCII font metrics
    /usr/share/fonts/type1/gsfonts/n022004l.pfb: PostScript Type 1 font program data
    /usr/share/fonts/type1/gsfonts/n022004l.pfm: Printer Font Metrics NimbusMonL-Bold, 2738 bytes, Nimbus Mono L bold
    
    /usr/share/fonts/type1/gsfonts/n022023l.afm: ASCII font metrics
    /usr/share/fonts/type1/gsfonts/n022023l.pfb: PostScript Type 1 font program data
    /usr/share/fonts/type1/gsfonts/n022023l.pfm: Printer Font Metrics NimbusMonL-ReguObli, 2742 bytes, Nimbus Mono L italic
    
    The second is the actual font, the other two are metrics.
  5. These metrics must be converted to groff's internal format, with the flags graciously provided by /usr/share/groff/current/font/devps/generate/Makefile. -n, which disables ligature output, is passed only for the monospace fonts, because ligaturised tables are nightmarish: in a tab-aligned /etc/passwd where one of the usernames is 'fifl', both pairs get ligaturised, throwing the second through fourth columns two columns left out of alignment, and the fifth four.
  6. groff searches for site-specific fonts in /usr/share/groff/site-font/devDEV.
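
Step 3 can be tried in isolation. The sketch below is a simplified variant of the awk from the loop at the top, run against the four sample mappings written to a temporary file rather than the real /var/lib/ghostscript/fonts/Fontmap. resolve_font is a name made up for this example, it prints the .pfb path whole instead of trimming the extension, and it assumes the map has no alias cycles.

```shell
# resolve_font NAME FONTMAP: follow "/Alias /Target ;" entries until a
# "/Name (path) ;" entry is reached, then print the path.
# Exits non-zero if the chain does not end at a path.
resolve_font() {
  awk -v cur="/$1" '
    $1 ~ /^\// && $2 ~ /^\// { aliases[$1] = $2 }
    $1 ~ /^\// && $2 ~ /^\(/ { paths[$1] = $2 }
    END {
      while (cur in aliases)
        cur = aliases[cur]
      if (!(cur in paths))
        exit 1
      gsub(/^\(|\)$/, "", paths[cur])   # strip the surrounding parentheses
      print paths[cur]
    }' "$2"
}

# Sample map: the same four lines as in step 3
fontmap="$(mktemp)"
cat > "$fontmap" <<'EOF'
/NimbusRomNo9L-ReguItal (/usr/share/fonts/type1/gsfonts/n021023l.pfb) ;
/Times-Italic /NimbusRomNo9L-ReguItal ;
/NimbusMonL-ReguObli (/usr/share/fonts/type1/gsfonts/n022023l.pfb) ;
/Courier-Oblique /NimbusMonL-ReguObli ;
EOF

resolve_font Times-Italic "$fontmap"
resolve_font Courier-Oblique "$fontmap"
rm -f "$fontmap"
```

Swapping .pfb for .afm in the printed path gives the file that afmtodit wants in step 5.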

This is not without its own delta from the original:

Visual diff between the page with no non-ASCII characters and the one with them; the differences stack up most toward the end of the line on about half of the lines, and are within antialiasing noise on the other half

But these differences are not only minute even at their worst (1.92pt in the synopsis line, 0.96pt in the third description line), but also a result of the tighter kerning matching the font better.

# Alternative art

In the three-day process of "printing a Cyrillic document in 2020", I've also done what grops(1) says you can, but is classically unhelpful in achieving, so:

# A foreign font

In this case: Курьер Прайм, a Courier Prime with Cyrillic characters that, unlike Courier Prime itself, isn't in Debian (merging them is an ongoing adventure):

$ unzip ../courierprime.zip
Archive:  ../courierprime.zip
  inflating: Courier-Prime-Italic.ttf
  inflating: Courier-Prime-Bold.ttf
  inflating: Courier-Prime.ttf
  inflating: Courier-Prime-Bold-Italic.ttf
$ for v in "" -Bold -Italic -Bold-Italic; do
    for f in afm pfa; do
      fontforge -c 'import sys; fontforge.open(sys.argv[1]).generate(sys.argv[2])' \
        Courier-Prime$v.ttf Courier-Prime$v.$f &
    done
  done
# ...
Warning: Mac and Windows entries in the 'name' table differ for the
 Fullname string in the language English (US)
 Mac String: Courier Prime Bold Italic
Windows String: CourierPrime-BoldItalic
# cp *.pfa /usr/local/share/groff/site-font/devps/

$ for v in "" -Bold -Italic -Bold-Italic; do
    afmtodit -ncmi0 \
      -d /usr/share/groff/current/font/devps/DESC \
      -e /usr/share/groff/current/font/devps/text.enc \
      Courier-Prime$v.afm /usr/share/groff/current/font/devps/generate/textmap "C$(expr "$(echo $v | tr -d '[:lower:]-')" \| R)";
  done
# cp C{R,B,I,BI} /usr/local/share/groff/site-font/devps/

# {
    cat /usr/share/groff/current/font/devps/download; echo;
    for v in "" -Bold -Italic -Bold-Italic; do
      printf "%s\t%s\n" CourierPrime${v/d-I/dI} Courier-Prime$v.pfa;
    done;
  } | tee /usr/local/share/groff/site-font/devps/download
# List of downloadable fonts
# PostScript-name       Filename

Symbol-Slanted          symbolsl.pfa
ZapfDingbats-Reverse    zapfdr.pfa
FreeEuro                freeeuro.pfa

CourierPrime    Courier-Prime.pfa
CourierPrime-Bold       Courier-Prime-Bold.pfa
CourierPrime-Italic     Courier-Prime-Italic.pfa
CourierPrime-BoldItalic Courier-Prime-Bold-Italic.pfa

Yielding:

Same page, monospace non-ASCII characters rendered in a different, rounder, heavier font, with a long f, and proportional ones missing
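
The groff font names in the afmtodit loop above come from squeezing the style suffix down to its capitals and falling back to R when nothing is left. Pulled out into a function (groff_style is a hypothetical name, not anything groff ships), that mapping looks like this:

```shell
# Map a font style suffix ("", -Bold, -Italic, -Bold-Italic) to the
# groff style letters (R, B, I, BI).
groff_style() {
  caps="$(printf '%s' "$1" | tr -d '[:lower:]-')"
  # expr prints its left operand unless that is empty, in which case
  # it prints the right one
  expr "$caps" \| R
}

for v in "" -Bold -Italic -Bold-Italic; do
  echo "Courier-Prime$v -> C$(groff_style "$v")"
done
```

The same squeeze works on suffixes without dashes, such as Regular or BoldItalic.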

# A distribution font

Ghostscript can directly load TrueType and OpenType fonts, so for a properly-packaged one all that's needed is:

# apt install fonts-liberation2

$ for v in Regular Bold Italic BoldItalic; do
    fontforge -c 'import sys; fontforge.open(sys.argv[1]).generate(sys.argv[2])' /usr/share/fonts/truetype/liberation2/LiberationSans-$v.ttf LiberationSans-$v.afm;
  
    afmtodit -cmi0 \
      -d /usr/share/groff/current/font/devps/DESC \
      -e /usr/share/groff/current/font/devps/text.enc \
      LiberationSans-$v.afm /usr/share/groff/current/font/devps/generate/textmap "T$(echo $v | tr -d '[:lower:]-')";
  done
This font contains both a 'kern' table and a 'GPOS' table.
  The 'kern' table will only be read if there is no 'kern' feature in 'GPOS'.
The glyph named macron is mapped to U+02C9.
But its name indicates it should be mapped to U+00AF.
The glyph named Delta is mapped to U+0394.
But its name indicates it should be mapped to U+2206.
The glyph named Omega is mapped to U+03A9.
But its name indicates it should be mapped to U+2126.
The glyph named mu is mapped to U+03BC.
But its name indicates it should be mapped to U+00B5.
The glyph named periodcentered is mapped to U+2219.
But its name indicates it should be mapped to U+00B7.

both S_BE and S_PE map to u0053 at /bin/afmtodit line 6519.
both S_BE and S_TE map to u0053 at /bin/afmtodit line 6519.
both anoteleia and middot map to u00B7 at /bin/afmtodit line 6519.
both mu and mu1 map to mc at /bin/afmtodit line 6411.
both hookabovecomb and uni0309 map to u0309 at /bin/afmtodit line 6519.
both gravecomb and uni0340 map to u0300 at /bin/afmtodit line 6519.
both acutecomb and uni0341 map to u0301 at /bin/afmtodit line 6519.
both uni0313 and uni0343 map to u0313 at /bin/afmtodit line 6519.
both uni02B9 and uni0374 map to u02B9 at /bin/afmtodit line 6519.
both alphatonos and uni1F71 map to u03B1_0301 at /bin/afmtodit line 6519.
both epsilontonos and uni1F73 map to u03B5_0301 at /bin/afmtodit line 6519.
both etatonos and uni1F75 map to u03B7_0301 at /bin/afmtodit line 6519.
both iotatonos and uni1F77 map to u03B9_0301 at /bin/afmtodit line 6519.
both omicrontonos and uni1F79 map to u03BF_0301 at /bin/afmtodit line 6519.
both omegatonos and uni1F7D map to u03C9_0301 at /bin/afmtodit line 6519.
both Alphatonos and uni1FBB map to u0391_0301 at /bin/afmtodit line 6519.
both Epsilontonos and uni1FC9 map to u0395_0301 at /bin/afmtodit line 6519.
both Etatonos and uni1FCB map to u0397_0301 at /bin/afmtodit line 6519.
both iotadieresistonos and uni1FD3 map to u03B9_0308_0301 at /bin/afmtodit line 6519.
both Iotatonos and uni1FDB map to u0399_0301 at /bin/afmtodit line 6519.
both Upsilontonos and uni1FEB map to u03A5_0301 at /bin/afmtodit line 6519.
both dieresistonos and uni1FEE map to u00A8_0301 at /bin/afmtodit line 6519.
both Omicrontonos and uni1FF9 map to u039F_0301 at /bin/afmtodit line 6519.
both Omegatonos and uni1FFB map to u03A9_0301 at /bin/afmtodit line 6519.
both uni2000 and uni2002 map to u2002 at /bin/afmtodit line 6519.
both uni2001 and uni2003 map to u2003 at /bin/afmtodit line 6519.
both uni1FE3 and upsilondieresistonos map to u03C5_0308_0301 at /bin/afmtodit line 6519.
both uni1F7B and upsilontonos map to u03C5_0301 at /bin/afmtodit line 6519.
both patah and yodyod_patah map to u05B7 at /bin/afmtodit line 6519.

# cp T* /usr/local/share/groff/site-font/devps/

And just like that:

Same page, proportional non-ASCII characters rendered in a serifless font with busted kerning and regular and bold-italic Cyrillic replaced by boxes, and monospace ones missing

# \nЯЙЦО

A sufficiently powerful diagnostic tool can also be simply piping groff into Ghostscript:

nabijaczleweli@tarta:~/uwu$ groff -Kutf8 -Tps -mdoc -tp < frag.1 | gs
GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS<1>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<3>GS<1>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS>GS<2>GS<5>GS<13>GS<22>GS<31>GS<40>GS<48>GS<61>GS<82>GS<97>GS<124>GS<132>GS<139>GS<150>GS<159>GS<168>GS<174>GS<181>GS<188>GS<193>GS<201>GS<209>GS<217>GS<225>GS<234>GS<242>GS<251>GS>Querying operating system for font files...
Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/CourierPrime-BoldItalic.
Can't find (or can't open) font file CourierPrime-BoldItalic.
Didn't find this font on the system!
Substituting font Courier-BoldOblique for CourierPrime-BoldItalic.
Loading NimbusMonoPS-BoldItalic font from /usr/share/ghostscript/9.27/Resource/Font/NimbusMonoPS-BoldItalic... 4371456 2971386 7864932 6531275 1 done.
GS>Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/LiberationSans-Italic.
Can't find (or can't open) font file LiberationSans-Italic.
Loading LiberationSans-Italic font from /usr/share/fonts/truetype/liberation2/LiberationSans-Italic.ttf... 4572808 3202292 8735964 7377991 1 done.
GS<1>Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/CourierPrime.
Can't find (or can't open) font file CourierPrime.
Didn't find this font on the system!
Substituting font Courier for CourierPrime.
Loading NimbusMonoPS-Regular font from /usr/share/ghostscript/9.27/Resource/Font/NimbusMonoPS-Regular... 4707152 3390954 8777444 7417724 1 done.
Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/CourierPrime-Italic.
Can't find (or can't open) font file CourierPrime-Italic.
Didn't find this font on the system!
Substituting font Courier-Oblique for CourierPrime-Italic.
Loading NimbusMonoPS-Italic font from /usr/share/ghostscript/9.27/Resource/Font/NimbusMonoPS-Italic... 4955064 3635291 8797644 7430341 1 done.
GS>Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/CourierPrime-Bold.
Can't find (or can't open) font file CourierPrime-Bold.
Didn't find this font on the system!
Substituting font Courier-Bold for CourierPrime-Bold.
Loading NimbusMonoPS-Bold font from /usr/share/ghostscript/9.27/Resource/Font/NimbusMonoPS-Bold... 5223176 3896909 8797644 7439279 1 done.
GS<1>Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/LiberationSans-Bold.
Can't find (or can't open) font file LiberationSans-Bold.
Loading LiberationSans-Bold font from /usr/share/fonts/truetype/liberation2/LiberationSans-Bold.ttf... 5255944 3939182 9686752 8286341 1 done.
Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/LiberationSans.
Can't find (or can't open) font file LiberationSans.
Loading LiberationSans font from /usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf... 5313240 3985493 10080064 8668213 1 done.
GS>GS<9>GS<18>GS<27>GS<36>GS<45>GS<54>GS<63>GS<72>GS<81>GS<90>GS<99>GS<108>GS<117>GS<126>GS<135>GS<144>GS<153>GS<162>GS<171>GS<180>GS<189>GS<198>GS<207>GS<216>GS<225>GS<234>GS<243>GS<252>GS>Can't find (or can't open) font file /usr/share/ghostscript/9.27/Resource/Font/LiberationSans-BoldItalic.
Can't find (or can't open) font file LiberationSans-BoldItalic.
Loading LiberationSans-BoldItalic font from /usr/share/fonts/truetype/liberation/LiberationSans-BoldItalic.ttf... 5333440 4004198 10467904 9033893 1 done.
GS>GS<9>GS<18>GS<27>GS<36>GS<45>GS<54>GS<63>GS<72>GS<81>GS<90>GS<99>GS<108>GS<117>GS<126>GS<135>GS<144>GS<153>GS<162>GS<171>GS<180>GS<189>GS<198>GS<207>GS<216>GS<225>GS<234>GS<243>GS<252>GS>GS<2>GS<11>GS<20>GS<29>GS<38>GS<47>GS<56>GS<65>GS<74>GS<83>GS<92>GS<101>GS<110>GS<119>GS<128>GS<137>GS<146>GS<155>GS<164>GS<173>GS<182>GS<191>GS<200>GS<209>GS<218>GS<227>GS<236>GS<245>GS<254>GS<2>GS<7>GS<16>GS<25>GS<34>GS<43>GS<52>GS<61>GS<70>GS<79>GS<88>GS<97>GS<106>GS<115>GS<124>GS<133>GS<142>GS<151>GS<160>GS<169>GS<178>GS<187>GS<196>GS<205>GS<214>GS<223>GS<232>GS<241>GS<250>GS>GS<4>GS<13>GS<22>GS<31>GS<40>GS<49>GS<58>GS<67>GS<76>GS<84>GS<92>GS<100>GS<108>GS<117>GS<125>GS<133>GS<141>GS<149>GS<158>GS<167>GS<176>GS<185>GS<194>GS<203>GS<212>GS<221>GS<230>GS<239>GS<248>GS<257>GS<3>GS<9>GS<18>GS<27>GS<36>GS<45>GS<54>GS<63>GS<72>GS<80>GS<88>GS<96>GS<104>GS<113>GS<122>GS<130>GS<138>GS<146>GS<154>GS<163>GS<172>GS<181>GS<190>GS<199>GS<208>GS<217>GS<226>GS<235>GS<244>GS<253>GS<1>GS<6>GS<15>GS<24>GS<33>GS<42>GS<51>GS<60>GS<69>GS<78>GS<86>GS<94>GS<102>GS<111>GS<120>GS<128>GS<136>GS<144>GS<152>GS<161>GS<170>GS<179>GS<188>GS<197>GS<206>GS<215>GS<224>GS<233>GS<242>GS<251>GS>GS<3>GS<12>GS<21>GS<30>GS<39>GS<48>GS<57>GS<66>GS<74>GS<82>GS<90>GS<98>GS<106>GS<115>GS<124>GS<132>GS<140>GS<148>GS<157>GS<166>GS<175>GS<184>GS<193>GS<202>GS<211>GS<220>GS<229>GS<238>GS<247>GS<255>GS>GS<9>GS<18>GS<27>GS<36>GS<45>GS<54>GS<63>GS<72>GS<81>GS<90>GS<99>GS<108>GS<117>GS<126>GS<135>GS<144>GS<153>GS<162>GS<171>GS<180>GS<189>GS<198>GS<207>GS<216>GS<225>GS<234>GS<243>GS<252>GS<1>GS<6>GS<15>GS<24>GS<33>GS<42>GS<51>GS<60>GS<68>GS<77>GS<86>GS<95>GS<104>GS<113>GS<122>GS<131>GS<140>GS<149>GS<158>GS<167>GS<176>GS<185>GS<194>GS<203>GS<212>GS<221>GS<230>GS<239>GS<248>GS<257>GS>GS<9>GS<18>GS<27>GS<36>GS<45>GS<54>GS<63>GS<72>GS<81>GS<90>GS<99>GS<108>GS<117>GS<126>GS<135>GS<144>GS<153>GS<162>GS<171>GS<180>GS<189>GS<198>GS<207>GS<216>GS<225>GS<234>GS<243>GS<252>GS>GS<3>GS<12>GS<21>GS<30>GS<39>GS<48>GS<57>GS<66>GS<75>GS<84>GS<93
>GS<102>GS<111>GS<120>GS<129>GS<138>GS<147>GS<155>GS<164>GS<173>GS<182>GS<191>GS<200>GS<209>GS<218>GS<227>GS<236>GS<245>GS<254>GS<2>GS<7>GS<16>GS<25>GS<33>GS<42>GS<51>GS<60>GS<69>GS<78>GS<87>GS<96>GS<104>GS<113>GS<122>GS<131>GS<140>GS<149>GS<158>GS<167>GS<176>GS<185>GS<194>GS<203>GS<212>GS<221>GS<230>GS<239>GS<248>GS<257>GS>GS>GS>GS>GS>GS>GS>GS<2>GS<1>GS<1>GS<2>GS<2>GS<2>GS<1>GS<1>GS<1>GS>GS>GS>GS<2>GS<1>GS>GS<1>GS>GS>GS<1>GS>GS<1>GS>GS<1>GS<2>GS<1>GS>GS>GS>GS<2>GS>GS<1>GS<1>GS>GS<2>GS>GS>GS>GS<1>GS>GS<1>GS<2>GS>GS>GS<1>GS<1>GS>GS<2>GS<1>GS>GS<1>GS<1>GS>GS>GS<2>GS>GS<2>GS<2>GS>GS<1>GS>GS<2>GS<1>GS>GS>GS<2>GS>GS<1>GS<4>GS<2>GS>GS>GS<1>GS<1>GS<1>GS>>>showpage, press <return> to continue<<
GS>GS>GS>GS>
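
That output is mostly prompts; the interesting lines are the font substitutions. A rough filter for them, written against the exact message wording in the log above (which other Ghostscript versions may not share); font_substitutions is a made-up name:

```shell
# Read Ghostscript's chatter on stdin and print "requested -> substituted"
# for every font it could not find and replaced with another.
font_substitutions() {
  sed -n 's/^Substituting font \(.*\) for \(.*\)\.$/\2 -> \1/p'
}

# Two lines lifted from the log above
font_substitutions <<'EOF'
Didn't find this font on the system!
Substituting font Courier-BoldOblique for CourierPrime-BoldItalic.
EOF
```

On a real run, the input would be the output of the groff | gs pipeline shown above.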

frag.1 in question, as minimised from voreutils' cut.1:

.Dd
.Dt CUT 1
.Os
.
.Sh NAME
.Nm cut
.Nd extract bytes, characters, or fields
.Sh SYNOPSIS
.Nm
.Fl f Ar range Ns Oo , Ns Ar range Oc Ns …
.Op Fl Czs
.Op Fl d Ar elimiter
.Op Fl O Ar out-delimiter
.Oo Ar file Oc Ns …
.
.Sh DESCRIPTION
Copies bytes, characters, or fieflds specified by
.Ar range Ns s
from each line of the input
.Ar file Ns s Pq standard input stream if none or Sq Ar -
to the standard output stream.
.Pp
.Ar range Ns s
can be separated by commas or spaces, and each can be in the format:
.Bl -tag -compact -offset Ds -width "from-to"
.It Ar number
.Brq Ar number
.It Ar from Ns Cm -
.Ar [ from , No \[if] )
.It Ar from Ns Cm - Ns Ar to
.Ar [ number , to ]
.It Li "   " Cm - Ns Ar to
.Li [ 1 , Ar to ]
.El
Indices are
.Em 1 Ns -based ,
and a union is taken of all
.Ar range Ns s .
.Pp
With
.Fl b ,
bytes are extracted; with
.Fl n
characters are never interrupted mid-sequence, with rounding preferred down
.Pq see Sx EXAMPLES .
.Pp
The newline
.Pq NUL with Fl z
is never matched and always printed
.Pq unless the entire line was removed with Fl fs .
.
.Sh OPTIONS (ŁĄĘĆŻŹŃ łąęćżźń)
.Bl -tag -compact -width "-O, --output-delimiter=out-delim"
.It Fl b , -bytes Ns = Ns Ar range Ns Oo , Ns Ar range Oc Ns …
Extract bytes.
.It Fl n
Don't interrupt multi-byte character sequences.
.Pp
.It Fl C , -complement
Invert
.Ar range Ns s :
select all
.Em but
what they match
.Pq \&[ Ns Sy 1 , No \[if] ) \- \[*S] Ns Ar range .
.\" TODO?: this should be a Big Union, but, well
For the purposes of
.Fl n ,
the most minimal set of
.Ar range Ns s
is constructed.
.El
.
.Sh EXAMPLES
.Bd -literal -compact
.Li $ Nm printf Li '\ex01\ex02\ex03\ex04\e0\ex05\ex06\ex07' \&| Nm  Fl zb Ar 1,3- Li \&| Nm hexdump Fl C
00000000  01 03 04 00 05 07 00                              |.......|
00000007
.Ed
.Pp
.Bd -literal -compact
.Li $ Nm printf Li 'яйцо\enЯЙЦО'\f(CB'яйцо\enЯЙЦО'\fP\f(CI'яйцо\enЯЙЦО'\fP\f[CBI]'яйцо\enЯЙЦО'\fP \&| Nm  Fl c Ar 1,3-
яцо\fRяцо\fP\fBяцо\fP\fIяцо\fP\f(BIяцо\fP
ЯЦО\fRЯЦО\fP\fBЯЦО\fP\fIЯЦО\fP\f(BIЯЦО\fP
.Ed
.Pp
.Bd -literal -compact
# name, IDs, homedir, shell, ...
.Li $ Nm Fl f Ar 1,3-4,6- Fl d Ns Ar \&: Fl O Ns Li \&"$( Ns Nm printf Li '\et')" Pa /etc/passwd
root    0       0       /root   /bin/bash
bin     2       2       /bin    /usr/sbin/nologin
irc     39      39      /var/run/ircd   /usr/sbin/nologin
cicada  1000    100     /home/cicada    /bin/bash
nobody  65534   65534   /nonexistent    /usr/sbin/nologin
# Everything else: password and GNATS
.Li $ Nm Fl Cf Ar 1,3-4,6- Fl d Ns Ar \&: Fl O Ns Li \&"$( Ns Nm printf Li '\et')" Pa /etc/passwd
x       root
x       bin
x       ircd
x       ŁĄĘĆŻŹŃ łąęćżźń,,,
x       nobody
.Ed


Creative text licensed under CC-BY-SA 4.0, code licensed under The MIT License.