Syntax highlighting is fun, and you won't believe this one weird trick

Over the last couple of days I got obsessed with wrangling my code snippet collection, once again. It’s not healthy, but it is what it is. I dug back into Snibbets, a tool for managing code snippets as plain text Markdown files that I started back in 2020. I actually got it to a really good point today, but I’m realizing that it’s getting bloated enough that it needs to become a gem before I’m ready to hype it up. The current version and mostly-up-to-date documentation are up on GitHub, though, so feel free to peek in the meantime.

But that’s not what I’m here to tell you about. In the process of working on Snibbets, I wrote a little routine that could turn a file extension into a programming language name for tagging purposes, and vice versa. It seemed ripe for turning into a little one-off utility, so I’ve posted a standalone version to GitHub. I’m going to be using it when I’m doing technical writing and including code samples in languages I don’t usually work with. When you create a fenced code block, you can add a “lexer” to the opening fence, e.g. ~~~ruby, which helps most platforms highlight the code properly. But then I find myself working on someone else’s Terraform code, unsure whether that’s a supported language for syntax highlighting. Now I can just run lexers.rb terraform or lexers.rb tf and it will tell me all about it: which lexers are available, which common extensions are associated with them, you know, the works.
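To make the two-way lookup concrete, here’s a minimal Ruby sketch of the idea. It is not the code from the script: the LANGUAGES table is a tiny hand-written sample, and language_for and extensions_for are hypothetical helper names made up for this example.

```ruby
# Hypothetical, hand-written sample data; not the script's real data set.
LANGUAGES = {
  "ruby"      => %w[rb rbw],
  "terraform" => %w[tf],
  "python"    => %w[py pyw]
}.freeze

# Invert the table once so extension lookups are a single hash access.
EXTENSIONS = LANGUAGES.each_with_object({}) do |(lang, exts), index|
  exts.each { |ext| index[ext] = lang }
end.freeze

# Extension (with or without a leading dot) to language name, or nil.
def language_for(ext)
  EXTENSIONS[ext.to_s.downcase.delete_prefix(".")]
end

# Language name to its known extensions, or an empty array.
def extensions_for(lang)
  LANGUAGES.fetch(lang.to_s.downcase, [])
end
```

With that in place, language_for(".tf") and extensions_for("terraform") answer the question from either direction. Building the inverted index up front keeps both lookups cheap, at the cost of assuming each extension maps to exactly one language in this sample.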

I built this by taking the output of pygmentize -L lexers and running it through a few regular expressions to make a parseable data set. Then I took the output of skylighting -l to add a few more lexers. Those don’t have extensions listed, and I don’t know the extensions for many of the more obscure ones, so that part of the data only helps confirm that a lexer name is valid, nothing else. The script itself just builds a queryable object out of the data and offers a few different ways to get at it (you can see the whole set at the bottom of the script). The easiest way to use it is like I mentioned above: just pass a file extension or language name to it as an argument and it will give you back the info you need. There’s more documentation in the comments of the script.
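For a sense of what that regular expression pass can look like, here’s a rough Ruby sketch. It assumes the listing has the general shape of a "* alias, alias:" line followed by an indented "Name (filenames *.ext, ...)" line; SAMPLE is a hypothetical excerpt written for this example rather than verbatim pygmentize output, and parse_lexers and find_lexer are made-up names, not functions from the script.

```ruby
# Hypothetical excerpt in the assumed shape of a lexer listing.
SAMPLE = <<~TEXT
  * ruby, rb, duby:
      Ruby (filenames *.rb, *.rbw, Rakefile)
  * terraform, tf:
      Terraform (filenames *.tf)
  * text:
      Text only (filenames *.txt)
TEXT

# Turn the listing into an array of hashes, one per lexer.
def parse_lexers(text)
  lexers = []
  text.each_line do |raw|
    line = raw.chomp
    if (m = line.match(/^\* (.+):$/))
      # Alias line: start a new entry.
      lexers << { aliases: m[1].split(/,\s*/), name: nil, extensions: [] }
    elsif lexers.any? && lexers.last[:name].nil? &&
          (m = line.match(/^\s+(.+?)(?:\s+\(filenames (.+)\))?$/))
      # Description line: display name plus optional filename globs.
      lexers.last[:name] = m[1]
      globs = m[2] ? m[2].split(/,\s*/) : []
      # Keep only simple "*.ext" globs; drop names like "Rakefile".
      lexers.last[:extensions] = globs.filter_map { |g| g[/\A\*\.(\w+)\z/, 1] }
    end
  end
  lexers
end

# Find a lexer by alias, extension, or display name (case-insensitive).
def find_lexer(lexers, query)
  q = query.downcase.delete_prefix(".")
  lexers.find do |l|
    l[:aliases].include?(q) || l[:extensions].include?(q) || l[:name].to_s.downcase == q
  end
end
```

Calling find_lexer(parse_lexers(SAMPLE), "tf") returns the Terraform entry whether "tf" is read as an alias or as an extension, which is the same kind of forgiving lookup described above.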

Just thought I’d share it! Check out the gist if you’re interested.
