gen-epub-book​ For Programmers

# Description

gen-epub-book is a tool that assembles and converts loose files into an ePub, for everyone not afraid of not using Word. It lets you assemble ePub e-books using an easy-to-read, easy-to-write plaintext format.

Thus, "gen-epub-book​" is two things: (1) a plaintext descriptor syntax; and (2) a (set of) software tool(s), written in a multitude of languages, that assemble that descriptor and files referenced thereby into an ePub e-book. See the Getting the gist heading for details pertaining to gen-epub-book​'s descriptor syntax.

The "design goal" of gen-epub-book's descriptor syntax is to make it as consistent and as obvious at first sight as possible. The idea is that a gen-epub-book descriptor should be writable by anyone with minimal technical knowledge.

gen-epub-book​ is free software, available under the MIT open source license. See the License heading for more information.

# Discussion

Any topic related to gen-epub-book – both the descriptor syntax and the software – is fair game for discussion over at the issue trackers of the individual gen-epub-book variants and at this site's issue tracker. I've also set up a GitHub issue for simple questions and clarifications.

I hope that the GitHub issue system will lead to good ideas for future improvements to gen-epub-book.

# Variants

As mentioned before, gen-epub-book is also a set of software tools, each performing mostly the same function but written in a different language. Here is the current list of gen-epub-book's variants, chronologically:

- gen-epub-book.awk
- gen-epub-book.rs
- gen-epub-book.cpp
- gen-epub-book.scala
- gen-epub-book.js

These variations have mostly the same features, and all differences will be highlighted in this document as they come up.

Binary releases of compiled variants are available for download, for Windows and Ubuntu, on the releases pages.

# Installation and requirements

All compiled variations of gen-epub-book​ are stand-alone in that they don't depend on any files other than themselves. Some of them, however, require additional software during building/installation.

## gen-epub-book.awk

gen-epub-book.awk does not require compilation, but it does rely on external binary tools. The following table lists them, their uses, and where to get them:

| Tool | Use | Where |
| --- | --- | --- |
| curl | Downloading Network-* data. Getting random UUID. | Linux: usually shipped with system, package manager, binary releases. Windows: binary releases. |
| zip | Packing ePub. | Binary releases. |
| rm/del | Pre-generation cleanup. | Shipped with system. |
| cp/copy | Assembling e-book in temporary directory. | Shipped with system. |
| mkdir | Creating temporary directories. | Available everywhere. |
| cd | Proper relative paths for Info-ZIP (no, there's no way around that). | Available everywhere. |

These tools need to be callable as in the first column – in $PATH or otherwise.
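
Before running gen-epub-book.awk, it can be handy to confirm that the external tools are reachable. The following is a minimal Python sketch and not part of gen-epub-book itself. The function name `missing_tools` is hypothetical, and checking only curl and zip (the two tools that usually have to be installed separately) is an assumption made for illustration.

```python
import shutil

# Tools from the table above that usually have to be installed separately.
# Assumption: rm/cp/mkdir/cd come with the system or the shell.
REQUIRED_TOOLS = ["curl", "zip"]


def missing_tools(tools=REQUIRED_TOOLS, lookup=shutil.which):
    """Return the tools that cannot be found on $PATH.

    `lookup` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if lookup(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not callable from $PATH:", ", ".join(missing))
    else:
        print("All required external tools were found.")
```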

gen-epub-book.awk, of course, requires an implementation of AWK. Each implementation has its own quirks, so some might not work as well as others. The following table lists the tested AWK implementations and their support status:

| Variant | Support | Note |
| --- | --- | --- |
| gawk | Yes. | |
| mawk | No. | Max supported string length exceeded. wontfix |

Not sure if your preferred implementation is supported? Drop a question over at the AWK variant's issue tracker!

## gen-epub-book.rs

gen-epub-book.rs just requires a reasonably modern version of the Rust compiler. Since gen-epub-book.rs is published on crates.io, you simply need to run

$ cargo install gen-epub-book

The resulting executable is fully stand-alone and available in your $PATH if installed via cargo install.

## gen-epub-book.cpp

To be built, gen-epub-book.cpp requires:

The resulting executable is fully stand-alone.

## gen-epub-book.scala

To build, gen-epub-book​.scala requires the Scala compiler. The resulting .jars depend only on the Scala runtime library.

## gen-epub-book.js

Due to the specificity of the Node.js environment, gen-epub-book.js depends on about 110 other packages. To install it, simply run

$ npm install -g epubify

The resulting "executable" will be in your $PATH.

# Configuration

The same descriptor, fed to any gen-epub-book variant with any options, should produce (functionally) the same e-book, provided the variants support a compatible feature set. Some gen-epub-book variants are, however, configurable in their non-e-book output. Subheadings include links to manpages with more information.

## gen-epub-book.awk

gen-epub-book.awk reads a temp variable, passed on the command line, that contains the desired temporary directory (usually $TEMP). The full argument is therefore usually -v temp="$TEMP".

## gen-epub-book.rs

The --verbose flag makes gen-epub-book​.rs print information about what it's currently doing.

# Getting the gist of gen-epub-book​'s descriptor syntax

This section offers complete, detailed documentation for the descriptor syntax.

## Overview

gen-epub-book's descriptor syntax is line-based. Each line is either (a) an element made up of three parts:
key: value
#  ^ separator
or (b) a comment, if the line does not match that format.

key is any sequence of characters up to the separator.

separator is just :, unless the custom separator feature is enabled.

value is any sequence of characters till the end of line.

All whitespace before and after every element is stripped (removed).

Every line in the following descriptor is equivalent:
Name: The Taste of MI
Name:The Taste of MI
 ​ ​ Name ​ ​ : The Taste of MI ​ ​ ​ ​

All these lines are comments:
# A marked comment (not that it changes much)
An unmarked comment (what)
Name The Taste of MI
    ^ missing separator
Name:
      ^ missing value
: The Taste of MI
^ missing key
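
The rules above are compact enough to sketch in a few lines. The following Python is an illustration only, not the code of any gen-epub-book variant; the names `parse_line` and `parse_descriptor` are hypothetical. It treats every line purely by the stated rules, so a line starting with `#` is a comment only because it lacks a separator, a key, or a value.

```python
def parse_line(line, separator=":"):
    """Split one descriptor line into (key, value), or return None for a comment."""
    # The key runs up to the first separator; the value is the rest of the line.
    key, found, value = line.partition(separator)
    if not found:
        return None  # no separator at all
    key, value = key.strip(), value.strip()
    if not key or not value:
        return None  # missing key or missing value
    return key, value


def parse_descriptor(text, separator=":"):
    """Return the (key, value) elements of a descriptor, skipping comments."""
    elements = []
    for line in text.splitlines():
        element = parse_line(line, separator)
        if element is not None:
            elements.append(element)
    return elements
```

Because the split happens at the first separator only, values may themselves contain the separator, which matters for URLs. The `separator` parameter is there to model a non-default separator.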

## Elements

Each non-comment line is an element – a (key, value) pair.

A descriptor is considered valid if the following conditions are met:

The following table enumerates supported keys and their properties:

| Key | Value type | Effect | Required | Amount | Remarks |
| --- | --- | --- | --- | --- | --- |
| Name | Plain text. | Sets book name/title tag. | Yes. | 1 | |
| Author | Plain text. | Sets book author tag. | Yes. | 1 | |
| Date | RFC3339-compliant date. | Sets book authoring/publishing date. | Yes. | 1 | |
| Language | BCP47-compliant language code. | Sets book language tag. | Yes. | 1 | |
| Content | Path to HTML file. | Includes specified file in e-book. | No. | Any. | See additional content processing. |
| String-Content | HTML text. | Includes raw HTML in e-book. | No. | Any. | |
| Image-Content | Path to image file. | Packs specified image file in the e-book and includes centered content therewith. | No. | Any. | |
| Network-Image-Content | URL to remote image file. | Downloads and packs specified image file in the e-book and adds centered content therewith. | No. | Any. | |
| Cover | Path to image file. | Packs specified image and sets cover to content pointing thereat. | No. | 0-1 | Exclusive with Network-Cover. |
| Network-Cover | URL to remote image file. | Downloads and packs specified image and sets cover to content pointing thereat. | No. | 0-1 | Exclusive with Cover. |
| Include | File path. | Packs specified file. | No. | Any. | |
| Network-Include | Remote file URL. | Downloads and packs specified file. | No. | Any. | |
| Description | File path. | Sets the book's description to the specified file's contents. | No. | 0-1 | Exclusive with String-Description and Network-Description. |
| String-Description | HTML text. | Sets the book's description to the specified string. | No. | 0-1 | Exclusive with Description and Network-Description. |
| Network-Description | URL to remote HTML file. | Sets the book's description to the remote file's contents. | No. | 0-1 | Exclusive with Description and String-Description. |
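
The Required, Amount and Remarks columns above translate fairly directly into checks. The sketch below is an assumption-laden illustration in Python: it is derived only from those three columns, does not validate value types (dates, language codes, paths), and the name `check_amounts` is hypothetical rather than taken from any gen-epub-book variant.

```python
from collections import Counter

# Keys that must appear exactly once.
REQUIRED_ONCE = ["Name", "Author", "Date", "Language"]

# Keys that are mutually exclusive: at most one element per group.
EXCLUSIVE_GROUPS = [
    ["Cover", "Network-Cover"],
    ["Description", "String-Description", "Network-Description"],
]


def check_amounts(elements):
    """Return a list of problems found in a list of (key, value) elements."""
    counts = Counter(key for key, _ in elements)
    problems = []
    for key in REQUIRED_ONCE:
        if counts[key] != 1:
            problems.append(f"{key}: expected exactly 1, found {counts[key]}")
    for group in EXCLUSIVE_GROUPS:
        total = sum(counts[key] for key in group)
        if total > 1:
            label = " / ".join(group)
            problems.append(f"{label}: expected at most 1 in total, found {total}")
    return problems
```

An empty list means the element counts are consistent with the table. Keys with an amount of "Any." need no check.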

All local paths are relative to the descriptor file. This can be changed with the -Include dirs feature.

The included local files' packed names are the paths by which they've been included with all instances of \ replaced with /, of ../ and ./ removed, and of / replaced with -. Examples:
simple-content.html            => simple-content.html
../cover.jpg                   => cover.jpg
.\../books/../covers/cover.jpg => books-covers-cover.jpg

The included remote files' packed names are the last segment (i.e. the filename) of their URL.
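
Both naming rules can be sketched as plain string transformations. This is an illustration in Python, not the code of any variant; the function names are hypothetical, and ignoring the query string of a URL is an assumption, since the text only says "last segment".

```python
from urllib.parse import urlsplit


def local_packed_name(path):
    """Derive the packed name of a local file from the path it was included by."""
    name = path.replace("\\", "/")
    # Remove "../" before "./": every "../" also contains a "./",
    # so the other order would leave stray dots behind.
    name = name.replace("../", "").replace("./", "")
    return name.replace("/", "-")


def remote_packed_name(url):
    """Derive the packed name of a remote file: the last segment of its URL path."""
    return urlsplit(url).path.rsplit("/", 1)[-1]
```

The order of the two removals matters, as noted in the comment. Swapping them turns `.\../books/../covers/cover.jpg` into `.books-.covers-cover.jpg`.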

In addition, all Content files are checked for the <!-- ePub title: "TOC_NAME" --> sequence, where TOC_NAME is a sequence of any characters except ", which is the name to give the content in the Table of Contents.
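
A regular expression is one straightforward way to pick that marker out of a Content file. The Python sketch below is illustrative only. It assumes the marker is matched literally with exactly the spacing shown, and that the first occurrence wins; the text does not say how tolerant the real variants are. The name `toc_name` is hypothetical.

```python
import re

# Assumption: the marker is matched literally, with exactly this spacing.
TITLE_MARKER = re.compile(r'<!-- ePub title: "([^"]*)" -->')


def toc_name(html):
    """Return the Table of Contents name declared in `html`, or None."""
    match = TITLE_MARKER.search(html)
    return match.group(1) if match else None
```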

## Features

Features are extensions to the gen-epub-book​ descriptor syntax, modifying the behaviour and validity of some lines.

### Custom separator

This feature allows a custom set of characters to be used as the separator. For example:
Name: The Taste of MI
# With custom separator set to "=" becomes
Name = The Taste of MI
# Or, with custom separator set to "INCREDIBLE COMMUNISM" becomes
Name INCREDIBLE COMMUNISM The Taste of MI

### Free date format

This feature allows a gen-epub-book variant to accept non-RFC3339 date formats, whichever ones it can parse. For example, with the free date format feature on, these can become equivalent:
Date: 2017-08-19T21:22:31+02:00
Date: Sat, 19 Aug 2017 21:22:31 +0200
Date: 1503170551
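
To make the idea concrete, here is a rough Python sketch of a parser that tries several formats in turn. It is not how any gen-epub-book variant is implemented, and which formats each real variant accepts is not specified here. The function name `parse_free_date` and the set of formats tried (Unix timestamp, RFC 3339, RFC 2822) are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_free_date(text):
    """Parse a date given as a Unix timestamp, an RFC 3339 or an RFC 2822 string."""
    text = text.strip()
    if text.isascii() and text.isdigit():
        # Seconds since the Unix epoch, which is always counted in UTC.
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    try:
        # datetime.fromisoformat accepts the common RFC 3339 form
        # YYYY-MM-DDTHH:MM:SS+HH:MM.
        return datetime.fromisoformat(text)
    except ValueError:
        pass
    try:
        # RFC 2822, e.g. "Sat, 19 Aug 2017 21:22:31 +0200".
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        raise ValueError(f"unrecognised date: {text!r}") from None
```

All three branches return timezone-aware values when the input carries an offset, so results can be compared as instants regardless of the notation they came from.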

### -Include dirs

This feature allows for specifying additional root directories in which to look for local files.

An include directory can be named or unnamed, which affects the packed names of the files found in it. Unnamed directories act transparently: the packed name is the specified path, transformed as usual. Named directories put their files in a dedicated subdirectory named after them, followed by the transformed specified path.

For example, given the following directory tree:
special_book
├── rendered
│   └── output
│       ├── intro.html
│       ├── main.html
│       └── ending.html
├── previews
│   └── generated
│       └── out
│           ├── intro.html
│           └── main.html
└── geb
    └── special
        ├── intro.html
        └── book.epupp
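If geb/special/book.epupp is the descriptor, and previews/generated/out (named previews) and rendered/output (unnamed) are include directories, the content inside would be laid out as follows: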
book.epub
├── intro.html         # From geb/special/
├── previews
│   └── main.html      # From previews/generated/out/
└── ending.html        # From rendered/output/
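
The lookup that yields such a layout can be sketched as follows. This Python is an illustration under stated assumptions, not the code of any variant: the descriptor's own directory is assumed to be searched first, then the include directories in the order given, with previews/generated/out named previews and rendered/output unnamed. The names `packed_name` and `resolve` are hypothetical.

```python
import os


def packed_name(path):
    """Apply the usual packed-name transformation to an included path."""
    name = path.replace("\\", "/").replace("../", "").replace("./", "")
    return name.replace("/", "-")


def resolve(path, descriptor_dir, include_dirs):
    """Find `path` and return (file location, packed name), or None.

    `include_dirs` is a list of (name, directory) pairs; name is None
    for an unnamed include directory.
    Assumption: the descriptor's own directory is searched first, then
    the include directories in the order given.
    """
    relative = path.replace("\\", "/")
    for name, directory in [(None, descriptor_dir)] + list(include_dirs):
        candidate = os.path.join(directory, relative)
        if os.path.isfile(candidate):
            packed = packed_name(path)
            if name is not None:
                packed = name + "/" + packed  # dedicated subdirectory
            return candidate, packed
    return None
```

With the tree above, intro.html is found next to the descriptor, main.html in the named previews directory (packed as previews/main.html), and ending.html in the unnamed rendered/output directory.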

### Support table

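| Feature | .awk | .rs | .cpp | .scala | .js |
| --- | --- | --- | --- | --- | --- |
| Custom separator | No. | v2.0.0 | No. | v1.1.0 | v0.2.0 |
| Free date format | No. | v2.1.0 | v2.0.0 | No. | v0.2.0 |
| -Include dirs | No. | v2.0.0 | v2.0.0 | v1.1.0 | No. |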

# License

The MIT License (MIT)
​
Copyright (c) 2017 nabijaczleweli
​
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
​
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
​
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

# Afterword

Thank you for making it this far. I hope that this document is clear and/or informative; if not, why don't you pop into the issues and help yourself, others and me?


Creative text licensed under CC-BY-SA 4.0, code licensed under The MIT License.
Automatically generated with Clang 14's C preprocessor on 11.09.2023 01:31:48 UTC from src/gen-epub-book/programmer.html.pp.