Building a Website with Make4ht: Basic Typesetting
Getting Started
As discussed in the preceding article, the goal of this project is to generate both PDF and HTML documents from near-typical LaTeX input files. As such, knowledge of how to use LaTeX to generate at least a basic article-type document is considered a prerequisite for understanding this article. The Overleaf tutorials are an excellent place to start for absolute beginners.
To lay the groundwork for this project, the system, directories, and tools must be described in a bit more detail.
This project was built on a computer running Debian 13, so all commands, file paths, etc. follow the conventions of that branch of Linux. Users of Windows, Mac, or other Linux distributions [A] should substitute their system’s equivalents where necessary. The root directory of this project will be considered to be ~/Website, though some work will need to be done outside of it (more on that later).
The tools (programs) used for this project are mostly provided by TeXLive, a large software package of nearly every LaTeX compiler and package one could want. More specifically, it will provide LuaTeX for compiling LaTeX source files into PDF files, BibTeX for handling citations in an organized manner, and Make4ht for compiling LaTeX source files into HTML files. Some version of TeXLive is almost assuredly available from Linux users’ package manager for easy install. The details of integrating specific LaTeX packages into the system will be discussed later.
The other major tools that will be used are Tidy, Minify, and Python. Tidy cleans up HTML code in various ways and provides a way to validate the output of Make4ht and any post-processing. Minify reduces the file size of HTML documents by eliminating semantically unnecessary data, such as whitespace formatting that is useful to a human reading the raw HTML but wasted on a web browser parsing it for display. Python provides several services. It is used directly for post-processing the Make4ht output in various ways, and it is called as a part of the Make4ht run when highlighting code via the Pygments package. Python may also be used to host a local HTTP server so that the final HTML documents can be checked naturally, as if hosted by a real web server. [B]
A Basic Test Case
To check that all the tooling is in place, it is wise to compile a simple test document. The text for just such a document is found in Listing 1. To test that things are working as intended, save the document as ~/Website/BasicExample.tex, compile it to a PDF document with lualatex BasicExample.tex, compile it to an HTML document with make4ht BasicExample.tex, and start the local HTTP server from the same directory with python -m http.server (python3 -m http.server on systems that do not provide a python command). With default settings, the server listens on port 8000 and is reachable locally at the IP address 127.0.0.1. Thus, the webpage should be visible in a web browser at the address http://127.0.0.1:8000/BasicExample.html.
\documentclass[11pt]{article} % Basic article class
\usepackage[margin=2cm]{geometry} % Default margins are huge. Shrink them
\usepackage{lipsum} % Generate Lorem Ipsum Text
\title{A Basic Test Case}
\author{Jane Doe}
\date{1980/01/01}
\begin{document}
\maketitle
\lipsum[1]
\end{document}
If the PDF compiled successfully and the web page is visible in a web browser, then congratulations! From this point, one could go and create their own website with Make4ht. However, Make4ht is extremely configurable, and a great deal of work remains if one wants a website tailored to their wants and needs.
Customizing Make4ht
Customizing Make4ht is fairly straightforward if one knows just a bit of its inner workings. The next section details the theory of operation of Make4ht such that readers might understand the succeeding section in which configuration files are introduced. The theory will also prove useful when it comes time to define new commands that need to work first in LaTeX and then have a different behavior when compiled by Make4ht.
Theory of Operation
Figure 1 details the order in which one should consider LaTeX files to be loaded into Make4ht to prepare for the compilation of the target file. [C] Note that this sequence starts by loading the normal LaTeX files first, then extends and/or overwrites definitions from those files with those sourced from .4ht files. This is what allows the Make4ht compiler to inject HTML code in and around the content typically produced by LuaLaTeX. Also note that the final two files to be loaded are the configuration file and then the target file. This allows the user to redefine anything they need to from the configuration file without polluting the target file with commands that LuaTeX may not understand.
While commands from any package used may be overwritten in the configuration file, there is a better way to do
so in a clean and extensible manner. This entails the creation of the directory ~/texmf/tex/latex and then subdirectories for
any given package and/or document class. This will be discussed later in the context of creating a
custom document class that is better structured for the needs of this project than the default article
class.
Configuration Files
To use a configuration file, one must invoke the -c or --config option followed by the name of the configuration file, e.g. make4ht --config myconfig.cfg mydocument.tex. Listing 2 provides a first example of a configuration file. In order, the options given to the \Preamble{⋯} command specify to produce valid XML code, produce stricter HTML code, disable font processing in the document, prevent the creation of a default CSS file, and produce only MathML markup for equations. These options result in the production of a well-formed HTML page with no bells or whistles. This is desirable as it means that all further customization is as mutable as the user wants it to be. More information on the options available for the \Preamble command may be found in the TeX4ht documentation.
\Preamble{xhtml,html+,NoFonts,-css,mathml}
\AddToHook{begindocument}{
%Custom page opening
\Configure{HTML}{\HCode{<html lang="en">}}{\HCode{</html>}}
%Custom <head> content
\Configure{HEAD}{\HCode{<head>}}{\HCode{</head>}}
\Configure{@HEAD}{}
\Configure{@HEAD}{\HCode{<meta name="author" content="}\@author\HCode{"/>}}
\Configure{@HEAD}{\HCode{<meta name="description" content="}\@abstract\HCode{"/>}}
\Configure{@HEAD}{\HCode{<meta name="keywords" content="}\@keywords\HCode{"/>}}
\Configure{@HEAD}{\HCode{<meta charset="UTF-8"/>}}
\Configure{@HEAD}{
\HCode{<meta name="viewport" content="width=device-width, initial-scale=1.0"/>}
}
\Configure{@HEAD}{
\HCode{<link rel="icon" type="image/x-icon" href="/style/favicon.svg"/>}
}
\Configure{TITLE}{\HCode{<title>}}{\HCode{</title>}}
\Configure{TITLE+}{\@title}
\Configure{AddCss}{/style/style.css}
%Custom <body> content
\Configure{@BODY}{
\HCode{<!--Header--><header>}
\HCode{</header>}}
\Configure{@BODY}{\HCode{<!--Main--><main>}}
\Configure{@/BODY}{\EndP\HCode{</main>}}
\Configure{@/BODY}{
\HCode{<!--Footer--><footer>}
\@copyright
\HCode{</footer>}}
\Configure{@/BODY}{\HCode{<!--Scripts--><script></script>}}
%Custom <section>
\Configure{section}
{\FinishPar\HCode{<section class="section">}}
{\FinishPar\HCode{</section>}}
{\HCode{<h2 id="\@section" class="linkable">}}
{\HCode{</h2>}}
\Configure{subsection}
{\FinishPar\HCode{<section class="subsection">}}
{\FinishPar\HCode{</section>}}
{\HCode{<h3 id="\@section-\@subsection" class="linkable">}}
{\HCode{</h3>}}
\Configure{subsubsection}
{\FinishPar\HCode{<section class="subsubsection">}}
{\FinishPar\HCode{</section>}}
{\HCode{<h4 id="\@section-\@subsection-\@subsubsection" class="linkable">}}
{\HCode{</h4>}}
%Custom <p> content
\Configure{HtmlPar}{\EndP\HCode{<p>}}{\EndP\HCode{<p>}}{\HCode{</p>}}{\HCode{</p>}}
%Custom bolded, italicized, and underlined handling
\renewcommand{\textbf}[1]{
\HCode{<span class="boldtext">#1</span>}
}
\renewcommand{\textit}[1]{
\HCode{<span class="italictext">#1</span>}
}
\renewcommand{\texttt}[1]{
\HCode{<span class="monotext">#1</span>}
}
\renewcommand{\underline}[1]{
\HCode{<span class="underlinetext">#1</span>}
}
\renewcommand{\uuline}[1]{
\HCode{<span class="doubleunderlinetext">#1</span>}
}
\renewcommand{\uwave}[1]{
\HCode{<span class="waveunderlinetext">#1</span>}
}
\renewcommand{\dashuline}[1]{
\HCode{<span class="dashunderlinetext">#1</span>}
}
\renewcommand{\dotuline}[1]{
\HCode{<span class="dotunderlinetext">#1</span>}
}
\renewcommand{\textoverline}[1]{
\HCode{<span class="overlinetext">#1</span>}
}
\renewcommand{\sout}[1]{
\HCode{<span class="striketext">#1</span>}
}
%End AddToHook{begindocument}
}
%\href
\renewcommand{\href}[2]{
\HCode{<a class="textlink" href="#1">#2</a>}
}
%\url
\renewcommand{\url}[1]{
\HCode{<a class="textlink" href="#1">#1</a>}
}
\begin{document}
\EndPreamble
Within the configuration files, the most used commands are probably \Configure and \ConfigureEnv. These allow one to specify the HTML tags (and other content) to inject into the document such that the HTML DOM accurately represents what the user intends. There are a number of special configuration targets, such as HEAD and TITLE, as well as configuration targets that are simply commands, such as section. A wealth of information on configurations can be found in the TeX4ht documentation, but note that the documentation is rather dense for the uninitiated.
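To make the anatomy of a configuration file concrete before stepping through Listing 2, the following is a minimal sketch of a much smaller one. The file name minimal.cfg and the generator meta tag are illustrative assumptions and are not part of this project's configuration.

```latex
% minimal.cfg: a hypothetical, stripped-down configuration file
\Preamble{xhtml}   % options for TeX4ht go here, as in Listing 2
% \Configure with a special target: append one tag inside <head>.
% \Hnewline asks TeX4ht to emit a line break in the generated HTML.
\Configure{@HEAD}{\HCode{<meta name="generator" content="make4ht"/>\Hnewline}}
\begin{document}
\EndPreamble
```

It would be used in the same way as any other configuration file: make4ht --config minimal.cfg BasicExample.tex.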
Stepping through the \Configure commands in Listing 2, one should first note line 4. This is a command to configure
the <html>⋯</html> block that surrounds all the content in an HTML document. The first argument specifies that
as the target, the second defines the opening tag with the attribute lang="en", and the third simply closes
the document. This is a typical pattern amongst many of the configuration commands. Note also
the \HCode{⋯} commands; these properly escape special characters such that the HTML tags may be written
succinctly.
The <head>⋯</head> block is slightly more complicated. Here the \Configure{HEAD} command follows the pattern just discussed, but the following \Configure{@HEAD} commands first clear any default tags within the HEAD block, then add new meta tags one at a time. The customization of the HEAD block is finalized by setting the title, with the \Configure{TITLE+}{\@title} command setting the actual value of the title, and by adding the path to a CSS file for later customization of the look and feel of the website.
The BODY block is mostly the same as before, though with the appearance of the \Configure{@/BODY}{⋯} command. While the
preceding commands added content to the top of the BODY block, these new commands add content to the
bottom of it. This allows the user to customize the tags before and after the normal content of the BODY
block.
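As a small illustration of the top and bottom pattern, the sketch below adds a navigation link before the page content and a link back to the index after it. Both tags, the class name, and the link targets are hypothetical additions that do not appear in Listing 2; they would sit alongside the other @BODY lines there.

```latex
% Hypothetical additions, placed next to the other @BODY lines in Listing 2
% Content added near the top of <body>:
\Configure{@BODY}{\HCode{<nav><a href="/">Home</a></nav>}}
% Content added near the bottom of <body>:
\Configure{@/BODY}{\HCode{<p class="backlink"><a href="/">Back to the index</a></p>}}
```

How these entries are ordered relative to the existing header, main, and footer entries is best confirmed by inspecting the generated HTML.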
At this point, the basics of the customization have been covered. The HTML document can be successfully created and some facets of the HTML can be customized such that it is well formed for the modern web and can be styled with CSS (see this article). Indeed, sectioning, paragraphing, and both types of lists should work out-of-the-box. However, there is much more to cover: the next section details how to customize section IDs such that they can be easily linked to with meaningful paths. Text decorations and external links will also be discussed to round out this article.
Identifiers and Links
Ensuring the consistent generation of linkable, meaningful section IDs [D] is a somewhat complicated process that requires a deep knowledge of LaTeX. As this document is not a LaTeX tutorial, a great deal of the precise details will be skipped. However, there should be enough information provided so that an enterprising reader may follow along. With that stated, the proposed manner in which to achieve this goal is to rewrite a small portion of the base LaTeX article class such that a command, \@section, is created that accurately and uniquely [E] defines a section, subsection, or subsubsection so that it may be readily used in the configuration file to define a linkable ID.
Following that, this section also details the configuration of external links and text decorations. This is substantially easier than the preceding discussion on section IDs.
A Custom Document Class
Creating a document class in LaTeX is a daunting challenge; however, there are resources to help. For this project, the base LaTeX article class, found in the TeXLive install directory /usr/share/texlive/texmf-dist/ at .../tex/latex/base/article.cls, was copied to ~/texmf/tex/latex/htmlarticle/htmlarticle.cls and modified so as to leave most of its structure intact. It is also necessary to copy the corresponding .4ht file, .../tex/generic/tex4ht/article.4ht, to ~/texmf/tex/latex/htmlarticle/htmlarticle.4ht. Note that the destination of these files is outside of the ~/Website directory where the LaTeX document source files are created and compiled to PDF and HTML outputs.
Within the newly copied htmlarticle.cls file, one should find the section dedicated to defining the sectioning commands (\section, \subsection, and \subsubsection). This is to be replaced with the code in Listing 3. Note that this uses the new expl3 layer for LaTeX programming [1, 2], and thus may be as comprehensible as ancient Sumerian to even seasoned LaTeX users. The cited documents should provide enough guidance to understand the code with time and attention.
Starting from the top, there is first the definition of the empty \@referenceoverride command that will do nothing when
compiling with LuaTeX, but will be defined to perform a task during the Make4ht compilation. Just
below that, the \@section command is defined. When used, it will simply return the token list (text) stored
within it. Below that, one finds the actual definition of the \section command. Going line-by-line, the
command:
- Assigns a local temporary variable the value of the command’s argument
- Replaces any spaces in the variable with underscores
- Clears the old value of \@section
- Assigns \@section the value of the variable
- Sets up the formatting when typesetting the document
- Defines the current label
- Defines the current label’s name
- Uses the label command to create a linkable location
- And finally calls the empty command defined earlier
%Define empty referenceoverride command for the PDF documents.
%To be defined for the HTML.
\def\@referenceoverride{}
%Define section, subsection, etc
\ExplSyntaxOn
\NewExpandableDocumentCommand{\@section}{}{
\tl_use:c { g__mysection_tl }
}
\NewDocumentCommand{\section}{m}{
\tl_set:Nn \l_tmpa_tl {#1}
\regex_replace_all:nnN {\ }{\_} \l_tmpa_tl
\tl_clear_new:c { g__mysection_tl }
\tl_gset:ce { g__mysection_tl }{ \tl_use:N \l_tmpa_tl }
\@startsection{section}{1}{-\parindent}{1.0em}{0.5em}
{\centering\normalfont\Large\bfseries} {#1}
\protected@edef\@currentlabel{\@arabic\c@section}
\protected@edef\@currentlabelname{#1}
\label{sec:\@section}
\@referenceoverride
}
\NewExpandableDocumentCommand{\@subsection}{}{
\tl_use:c { g__mysubsection_tl }
}
\NewDocumentCommand{\subsection}{m}{
\tl_set:Nn \l_tmpa_tl {#1}
\regex_replace_all:nnN {\ }{\_} \l_tmpa_tl
\tl_clear_new:c { g__mysubsection_tl }
\tl_gset:ce { g__mysubsection_tl }{ \tl_use:N \l_tmpa_tl }
\@startsection{subsection}{2}{-\parindent}{1.0em}{0.5em}
{\normalfont\large\bfseries} {#1}
\protected@edef\@currentlabel{\@arabic\c@section.\@arabic\c@subsection}
\protected@edef\@currentlabelname{#1}
\label{sec:\@section-\@subsection}
\@referenceoverride
}
\ExplSyntaxOff
\newcommand{\@subsubsection}{}
\ExplSyntaxOn
\newcommand{\subsubsection}[1]{
\tl_set:Nn \l_tmpa_tl {#1}
\regex_replace_all:nnN {\ }{\_} \l_tmpa_tl
\edef\@subsubsection{\tl_use:N \l_tmpa_tl}
\@startsection{subsubsection}{3}{-\parindent}{1.0em}{0.5em}
{\normalfont\normalsize\bfseries} {#1}
\protected@edef\@currentlabel{
\@arabic\c@section.\@arabic\c@subsection.\@arabic\c@subsubsection}
\protected@edef\@currentlabelname{#1}
\label{sec:\@section-\@subsection-\@subsubsection}
\@referenceoverride
}
\ExplSyntaxOff
The definitions for subsections and subsubsections that follow the definition for sections are quite similar, with the only notable difference being the \label command. These instances simply stack the \@section, \@subsection, and \@subsubsection commands as necessary to ensure a readable, probably unique [F] identifier. This matches precisely with the use of the commands in lines 35-50 of Listing 2. Here, the first pair of parameters of the \Configure commands set up the semantic HTML elements correctly while the second pair set up the section headings such that they are linkable.
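To see what the stacking produces in practice, here is a short hypothetical source file. The heading texts are invented for illustration; the label names in the comments follow from the rules in Listing 3 (spaces become underscores, levels are joined with hyphens), and the file assumes the htmlarticle class has been installed as described above.

```latex
% Hypothetical source file using the custom class described above
\documentclass{htmlarticle}
\usepackage{hyperref}
\begin{document}
\section{Getting Started}
% label created: sec:Getting_Started
\subsection{A Basic Test Case}
% label created: sec:Getting_Started-A_Basic_Test_Case
\subsubsection{Checking the Output}
% label created: sec:Getting_Started-A_Basic_Test_Case-Checking_the_Output
The test document is described in \ref{sec:Getting_Started-A_Basic_Test_Case}.
\end{document}
```

With the \Configure{subsection} entry from Listing 2, the h3 heading in the HTML output would carry the id Getting_Started-A_Basic_Test_Case, which is the label without its sec: prefix.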
Configuring References
The previous section describes only how to ensure that the link IDs are formatted correctly such that a \ref command might appropriately reference them and be correctly reflected in the HTML document. However, there remains some work to configure the various reference commands themselves. This will assume that the hyperref package will always be used. [G] This seems rather complex, but it really isn’t any more so than setting up the commands in the previous section.
Listing 4 provides the code one should append to the end of the htmlarticle.4ht file. It begins by redefining the \label command
such that it makes use of the \@current... commands defined in each of the sectioning command definitions in Listing 3. This
guarantees that the values stored for each label in the AUX file will be exactly as desired. The next sections redefine
the \ref, \nameref, and \autoref commands. The \pageref command is excluded as it makes little sense in the context of an infinitely
scrolling web page.
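It may help to picture the entry that the redefined \label writes to the AUX file, since the reference commands read their data back from it. The first comment below gives the general shape, taken directly from the \label definition in Listing 4; the concrete line after it uses illustrative values only and is not copied from a real build.

```latex
% Shape of the entry written to the AUX file by the redefined \label:
% \newlabel{<key>}{{<\@currentlabel>}{<\thepage>}{<\@currentlabelname>}{<\@currentHref>}{}}
%
% Illustrative values only (not taken from a real build):
\newlabel{sec:Getting_Started}{{1}{1}{Getting Started}{section.1}{}}
```

The \@firstoffive, \@thirdoffive, and \@fourthoffive helpers used in Listing 4 pick the number, the name, and the anchor out of these five fields.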
To pick the most substantial of these to discuss, the definition of \autoref starts by declaring two arguments for the command. The first, s, creates a star variant of the command, \autoref*, and provides a boolean value as the first parameter. The second is a typical mandatory argument. After that, going down line-by-line:
- A local temporary variable is created and assigned the value of the second argument.
- A regular expression is used to strip the type prefix from the ID.
- A temporary (non-local) variable A is assigned the value of the stripped ID.
- A temporary variable B is assigned the value of the label. The \csname r@#2\endcsname recovers it directly as LaTeX actually stores labels as control sequences.
- A temporary variable C is assigned the value of the fourth part of the label.
- A temporary variable D is assigned the value of the third part of the label.
- Lines 83-106 are actually one command. Broken down, it:
  - Performs a regex extraction on C to get the value stored before the first period.
  - Feeds the extracted value into a switch-case statement to match it with the correct statement.
  - If a match is found, a local variable E is assigned the value (Type Name) (Value of D). E.g. Figure 3, Equation 1a, or Listing 2.
  - If no match is found, throw an error.
  - If no regex match was found from the start, throw an error.
- If the \autoref command was starred, print the text without creating a link.
- If the command was not starred, create the link.
%Return the label command to its form from the hyperref package,
%overwriting any 4ht definition
\def\label#1{%
\@bsphack
\begingroup
\def\label@name{#1}%
\label@hook
\protected@write\@auxout{}{%
\string\newlabel{#1}{%
{\@currentlabel}%
{\thepage}%
{\@currentlabelname}%
{\@currentHref}{}%
}%
}%
\endgroup
\@esphack
}%
%Customize the ref, nameref, and autoref commands to use the information
%provided by the above \label definition
%referenceoverride
\ExplSyntaxOn
\def\@referenceoverride{
\RenewDocumentCommand{\ref}{s m}{
\tl_set:Nn \l_tmpa_tl {##2}
\regex_replace_all:nnN {^.+:}{} \l_tmpa_tl
\edef\tmpa{\tl_use:N \l_tmpa_tl}
\edef\tmpb{\csname r@##2\endcsname\relax}
\edef\tmpc{\expandafter\@firstoffive\tmpb}
\IfBooleanTF{##1}{
\tmpc
}{
\HCode{<a~class="textlink"~href="\#\tmpa">}
\tmpc
\HCode{</a>}
}
}}
\ExplSyntaxOff
%nameref
\ExplSyntaxOn
\AddToHook{begindocument/end}{
\RenewDocumentCommand{\nameref}{s m}{
\IfBooleanTF{#1}{
\NR@ref@showkeys{#2}%
\begingroup
\let\label\@gobble
\NR@setref{#2}\@thirdoffive{#2}
\endgroup
}{
\tl_set:Nn \l_tmpa_tl {#2}
\regex_replace_all:nnN {^.+:}{} \l_tmpa_tl
\edef\tmp{\tl_use:N \l_tmpa_tl}
\HCode{<a~class="textlink"~href="\#\tmp">}
\NR@ref@showkeys{#2}%
\begingroup
\let\label\@gobble
\NR@setref{#2}\@thirdoffive{#2}
\endgroup
\HCode{</a>}
}
}}
\ExplSyntaxOff
%autoref
\ExplSyntaxOn
\AddToHook{begindocument/end}{
\RenewDocumentCommand{\autoref}{s m}{
\tl_set:Nn \l_tmpa_tl {#2}
\regex_replace_all:nnN {^.+:}{} \l_tmpa_tl
\edef\tmpa{\tl_use:N \l_tmpa_tl}
\edef\tmpb{\csname r@#2\endcsname\relax}
\edef\tmpc{\expandafter\@fourthoffive\tmpb}
\edef\tmpd{\expandafter\@thirdoffive\tmpb}
\regex_extract_once:nVNTF{\A(\w+)\.(.+)\Z}\tmpc\l_tmp_seq{
\str_case:enF{\seq_item:Nn\l_tmp_seq{2}}{
{part}{\edef\tmpe{\partautorefname\space\tmpd}}
{chapter}{\edef\tmpe{\chapterautorefname\space\tmpd}}
{section}{\edef\tmpe{\sectionautorefname\space\tmpd}}
{subsection}{\edef\tmpe{\subsectionautorefname\space\tmpd}}
{subsubsection}{\edef\tmpe{\subsubsectionautorefname\space\tmpd}}
{paragraph}{\edef\tmpe{\paragraphautorefname\space\tmpd}}
{subparagraph}{\edef\tmpe{\subparagraphautorefname\space\tmpd}}
{line}{\edef\tmpe{\lineautorefname\space\tmpd}}
{page}{\edef\tmpe{\pageautorefname\space\tmpd}}
{equation}{\edef\tmpe{\equationautorefname\space\tmpd}}
{theorem}{\edef\tmpe{\theoremautorefname\space\tmpd}}
{figure}{\edef\tmpe{\figureautorefname\space\tmpd}}
{table}{\edef\tmpe{\tableautorefname\space\tmpd}}
{item}{\edef\tmpe{\itemautorefname\space\tmpd}}
{listing}{\edef\tmpe{\listingautorefname\space\tmpd}}
{footnote}{\edef\tmpe{\footnoteautorefname\space\tmpd}}
{appendix}{\edef\tmpe{\appendixautorefname\space\tmpd}}
}
{Error:~Unable~to~match~autoref~type~\l_tmpb_tl ! \def\tmpe{Error!}}
}{Error:~Unable~to~extract~autoref~type~from~reference:~#2 !
\def\tmpe{Error!
}}
\IfBooleanTF{#1}{
\tmpe
}{
\HCode{<a~class="textlink"~href="\#\tmpa">}
\tmpe
\HCode{</a>}
}
}}
\ExplSyntaxOff
Voila! The references should now be set up such that plain, human-readable links can be used. This may have seemed like a great deal of work for just that, but it will come in handy as functionality for equations, tables, listings, and images is added. The next section is mercifully short.
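As a quick sketch of how the redefined commands might be used in a document, consider the hypothetical fragment below. The heading and its label are invented for illustration; the comments describe the behavior that follows from Listing 4 rather than verified output.

```latex
% Hypothetical fragment inside a document built with htmlarticle
\section{Theory of Operation}
% The section above receives the label sec:Theory_of_Operation.

See \ref{sec:Theory_of_Operation}.       % section number, wrapped in a link
See \ref*{sec:Theory_of_Operation}.      % section number, no link
See \nameref{sec:Theory_of_Operation}.   % section title, wrapped in a link
See \nameref*{sec:Theory_of_Operation}.  % section title, no link
See \autoref{sec:Theory_of_Operation}.   % type name plus stored text, wrapped in a link
See \autoref*{sec:Theory_of_Operation}.  % type name plus stored text, no link
```

In each linked case, the href is the label with its sec: prefix stripped, so it matches the heading id set up in Listing 2.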
External Links and Text Decorations
Configuring external links is almost laughably easy now. Indeed, it has already been accomplished by lines 89-97 in Listing 2. All that was required was to renew the commands and inject the HTML code as one would expect. It just works.
The basic text decorations are also extremely easy, with one small exception. Most of the functions are either built in or readily available with the ulem package. However, overlines are not natively defined in either case. To achieve that, one must simply add to the htmlarticle.cls file a command \textoverline which takes one mandatory argument and has the body {$\overline{\hbox{#1}}\m@th$}. With that, the commands can be simply renewed as seen in lines 55-85 of Listing 2. Note that for these to then visually appear in the HTML document, CSS support must be added for each class.
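Written out, the overline command described above might look like the following sketch. It simply assembles the name, argument count, and body given in the text; its exact placement within htmlarticle.cls is left to the reader.

```latex
% In htmlarticle.cls; class files already treat @ as a letter, so \m@th is usable
\newcommand{\textoverline}[1]{$\overline{\hbox{#1}}\m@th$}
```

In the PDF, \textoverline{text} then draws a rule over the text, while the configuration file in Listing 2 replaces it with a span of class overlinetext for the HTML.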
Moving Forward
The next article in the project focuses on getting references and footnotes working. This will be the first time that post-processing is added to the system such that complex behavior can be achieved beyond what is readily possible with Make4ht and LaTeX. Otherwise, it will largely be building on what has already been demonstrated here.
References
- [1] The expl3 package and LaTeX3 programming. https://texdoc.org/serve/expl3.pdf/0. Accessed: 2026-01-25.
- [2] The LaTeX3 Interfaces. https://texdoc.org/serve/interface3.pdf/0. Accessed: 2026-01-25.
Footnotes
- [A] Those derived from Red Hat and Arch.
- [B] Being only an HTTP server, there is some functionality lost, but all the most important things like linking will be handled correctly.
- [C] The reality of the load order is more complicated, but this sketch allows one to understand where and when commands are overwritten in a clear and concise way.
- [D] Indeed, much of this section can be skipped if the user is content with the auto-generated IDs.
- [E] Unique within the bounds of some to-be-discussed assumptions, that is.
- [F] A user could conceivably repeat a sequence with the (im)proper choice of section headings, but this should be a rare circumstance under normal use. In any event, the worst that will happen is inappropriate linking; the document will otherwise be well-formed.
- [G] As it should be. It is very good.