Marked features for Apex - BrettTerpstra.com

When Apex reaches 1.0, I’m planning to include it in Marked 3. I realized that Marked has a lot of preprocessing features that were previously handled in Objective-C that would make sense to have in the core processor for both speed and accessibility from the command line.

So I’ve added a bunch of new flags and C API definitions to Apex that bring some of Marked’s capabilities directly into the processor. These are all available via command-line flags, configuration options, and the C API.

Hashtags

The --hashtags flag converts #tags in your Markdown into span-wrapped hashtags. By default, it uses the mkhashtag class, but you can add --style-hashtags to switch to mkstyledtag instead.

apex document.md --hashtags
apex document.md --hashtags --style-hashtags

This is smart enough to skip hashtags inside code blocks and HTML attributes, so you won’t get false matches in things like href="#anchor" or code examples.
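To make the skipping behavior concrete, here is a minimal Python sketch of the idea. It is not Apex's C implementation: the function name wrap_hashtags, the regular expressions, and the choice to keep the # inside the span are all assumptions for illustration, and it only protects fenced code, inline code, and HTML tags (not indented code blocks).

```python
import re

# Chunks that must be left alone: fenced code, inline code, and HTML tags.
# (Assumed patterns for illustration; indented code blocks are not handled.)
PROTECTED = re.compile(r"(```.*?```|`[^`\n]*`|<[^>]+>)", re.DOTALL)

# A hashtag here is "#" followed by a letter and more word characters,
# not preceded by a word character or another "#".
HASHTAG = re.compile(r"(?<![\w#])#([A-Za-z][\w-]*)")


def wrap_hashtags(text: str, styled: bool = False) -> str:
    """Wrap #tags in spans, skipping code and HTML attributes."""
    css_class = "mkstyledtag" if styled else "mkhashtag"
    # With one capturing group, split() alternates text / protected chunk,
    # so even indexes are ordinary text that is safe to rewrite.
    parts = PROTECTED.split(text)
    for i in range(0, len(parts), 2):
        parts[i] = HASHTAG.sub(
            rf'<span class="{css_class}">#\1</span>', parts[i]
        )
    return "".join(parts)


print(wrap_hashtags('See #todo and <a href="#anchor">x</a> and `#code`'))
# See <span class="mkhashtag">#todo</span> and <a href="#anchor">x</a> and `#code`
```

Note that a line like #Header would be wrapped as a tag by this sketch, while # Header (with the space) is left alone.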

Hashtags are disabled by default because they would conflict with headers if you’re not in the habit of putting a space after the # in an ATX header (e.g. #Header 1). This option is only for people who want to convert things like Bear notes with tag formatting. You can enable it with --hashtags on the command line, by including hashtags: true in a config file, or by setting the enable_hashtags boolean when using the C API.

Random Footnote IDs

When you’re combining multiple documents, footnote ID collisions can be a problem. The --random-footnote-ids flag generates hash-based footnote IDs using an 8-character hex prefix from the document content.

Instead of fn-1 and fnref-1, you’ll get fn-a7b3c9d2-1 and fnref-a7b3c9d2-1. Different documents get different hash prefixes, so you can safely combine them without conflicts.
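As a rough illustration of how a content-derived prefix avoids collisions, here is a small Python sketch. The hash function Apex actually uses isn't stated here, so SHA-256 is purely an assumption, and footnote_prefix and footnote_ids are hypothetical helper names.

```python
import hashlib


def footnote_prefix(document: str) -> str:
    """Return an 8-character hex prefix derived from the document content."""
    # SHA-256 is an assumption; any stable content hash would work the same way.
    return hashlib.sha256(document.encode("utf-8")).hexdigest()[:8]


def footnote_ids(document: str, number: int) -> tuple[str, str]:
    """Build the footnote ID and its back-reference ID for one footnote."""
    prefix = footnote_prefix(document)
    return f"fn-{prefix}-{number}", f"fnref-{prefix}-{number}"


fn_id, ref_id = footnote_ids("# Chapter One\n\nSome text.[^1]\n", 1)
print(fn_id, ref_id)
```

The same document always produces the same prefix, so the IDs stay stable between runs, while two documents with different content will almost certainly get different prefixes.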

Widon’t for Headings

The --widont flag prevents short widows in headings by inserting non-breaking spaces between trailing words. It works backwards from the end of the heading, combining words until the trailing portion exceeds 10 characters.

So a heading like “introduction to the topic” becomes introduction to&nbsp;the&nbsp;topic, ensuring that if the heading wraps, the trailing portion won’t be a short, lonely word on its own line.
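The backwards walk can be sketched in a few lines of Python. This is an illustration of the rule as described, not Apex's code: the function name widont, the min_chars parameter, and the decision to count the joining spaces toward the length are assumptions.

```python
NBSP = "\u00a0"  # non-breaking space


def widont(heading: str, min_chars: int = 10) -> str:
    """Join trailing words with non-breaking spaces until the tail exceeds min_chars."""
    words = heading.split()
    if len(words) < 2:
        return heading
    # Start with the last word and keep pulling in the previous word
    # until the trailing group is longer than min_chars.
    start = len(words) - 1
    tail_len = len(words[start])
    while start > 0 and tail_len <= min_chars:
        start -= 1
        tail_len += 1 + len(words[start])  # +1 for the joining space
    head = " ".join(words[:start])
    tail = NBSP.join(words[start:])
    return f"{head} {tail}" if head else tail


print(repr(widont("introduction to the topic")))
# 'introduction to\xa0the\xa0topic'
```

Here “topic” (5 characters) and “the topic” (9) are both too short, so “to” is pulled in as well, giving a 12-character tail that wraps as a unit.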

Widon’t is disabled by default in all modes, as it might create unexpected results if the user isn’t aware of it. It can be explicitly enabled with --widont, widont: true in config, or with the enable_widont boolean in the C API.

Code is Poetry

Code blocks without a programming language specified can be treated as poetry with the --code-is-poetry flag. This adds a poetry class to code blocks that don’t have a language specified, and automatically enables --highlight-language-only so only code blocks with languages get syntax highlighting.

This works for both fenced code blocks and indented code blocks.

Again, this is disabled by default as it has very specific use cases.

Proofreader Mode

The --proofreader flag converts ==highlight== and ~~delete~~ syntax into CriticMarkup highlight and deletion. It automatically enables CriticMarkup processing, so you can use this simpler syntax and still get the full CriticMarkup rendering.
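For reference, CriticMarkup writes a highlight as {==text==} and a deletion as {--text--}. The Python sketch below shows one way the simpler syntax could be mapped onto those forms. It is an assumption about the approach rather than Apex's implementation (which may go straight to HTML), and for brevity it doesn't skip code blocks.

```python
import re

# Match ==text== / ~~text~~ that is not already wrapped in CriticMarkup braces
# and whose content does not start or end with whitespace.
HIGHLIGHT = re.compile(r"(?<!\{)==(?=\S)(.+?)(?<=\S)==(?!\})")
DELETE = re.compile(r"(?<!\{)~~(?=\S)(.+?)(?<=\S)~~(?!\})")


def proofreader_to_criticmarkup(text: str) -> str:
    """Rewrite ==highlight== and ~~delete~~ as CriticMarkup."""
    text = HIGHLIGHT.sub(r"{==\1==}", text)
    return DELETE.sub(r"{--\1--}", text)


print(proofreader_to_criticmarkup("Keep ==this== and drop ~~that~~."))
# Keep {==this==} and drop {--that--}.
```

Text that is already CriticMarkup, such as {==x==}, passes through unchanged.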

Markdown in HTML Toggle

Marked has always processed Markdown inside HTML blocks that carry markdown attributes. Now you can control this behavior with --markdown-in-html and --no-markdown-in-html. It’s enabled by default in unified mode, but you can toggle it for other modes or when you need stricter behavior.

Page Breaks

Two new flags for page break handling:

--hr-page-break replaces <hr> elements with Marked-style page break divs
--page-break-before-footnotes inserts a page break before the footnotes section

Both use Marked’s standard page break format with proper attributes for styling and identification.

Title from H1

When using --standalone, the --title-from-h1 flag extracts the text from the first H1 heading and uses it as the document title if no title is specified via --title or metadata. The H1 stays in the document body, but now it’s also in the <title> tag.
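The fallback logic might look something like this Python sketch. The function name resolve_title and the regex-based H1 extraction are illustrative assumptions, and so is the order in which --title and metadata are checked, since only the fact that both take priority over the H1 is given.

```python
import html
import re
from typing import Optional

H1 = re.compile(r"<h1\b[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
TAGS = re.compile(r"<[^>]+>")


def resolve_title(body_html: str,
                  cli_title: Optional[str] = None,
                  meta_title: Optional[str] = None) -> Optional[str]:
    """Pick the document title, falling back to the first H1's text."""
    # Assumed order: an explicit --title first, then metadata, then the H1.
    if cli_title:
        return cli_title
    if meta_title:
        return meta_title
    match = H1.search(body_html)
    if not match:
        return None
    # Strip inline tags and decode entities so the <title> is plain text.
    return html.unescape(TAGS.sub("", match.group(1))).strip()


print(resolve_title('<h1 id="intro">Hello <em>World</em></h1><p>Body</p>'))
# Hello World
```

The body HTML is only read, never modified, which matches the H1 staying in place.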

All Available Everywhere

All of these features are available through:

Command-line flags (as shown above)
Configuration files (config.yml, --meta-file)
Document metadata (YAML front matter, MultiMarkdown metadata)
The C API (via the apex_options struct)

This means you can set defaults in your config file, override per-document in metadata, or use command-line flags for one-off processing. The flexibility is there.
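Conceptually, that layering works like the Python sketch below: each source overrides the one before it. The exact precedence inside Apex is inferred from the description above, and resolve_options is a hypothetical helper; the hashtags and widont keys are the config names mentioned earlier.

```python
def resolve_options(config: dict, metadata: dict, cli: dict) -> dict:
    """Merge option layers: defaults < config file < document metadata < CLI flags."""
    resolved = {"hashtags": False, "widont": False}  # both are off by default
    for layer in (config, metadata, cli):  # later layers win
        resolved.update(layer)
    return resolved


options = resolve_options(
    config={"hashtags": True},                      # default set in config.yml
    metadata={"hashtags": False, "widont": True},   # per-document override
    cli={},                                         # no one-off flags this run
)
print(options)
# {'hashtags': False, 'widont': True}
```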

Why Build These Into Apex?

These features were originally preprocessing steps in Marked’s Objective-C code. Moving them into Apex provides:

Speed — C-based processing is faster than Objective-C string manipulation
Accessibility — Available from the command line without needing Marked
Consistency — Same behavior whether called from Marked or directly
Extensibility — Other tools can use these features via the C API

As Apex approaches 1.0, these features will make the integration with Marked a little easier while also making some of Marked’s capabilities available to anyone using Apex directly. I know none of these recent changes are killer features for most people, so this is just serving as documentation of the development process.

As always, as Apex develops, I’m very interested in what features you’d like to see. The goal is to have one universal processor that, at least for HTML output¹, can do everything that other Markdown flavors can do. If you have ideas or requests, or want to contribute to the development, please join me on GitHub!

¹ I’m not going to try to replicate all of Pandoc’s powerful conversion features, as you can just pipe Apex HTML output into Pandoc for easy conversion to PDF, DOCX, etc. I’m just focusing on making as many extensions for HTML output as possible work.
