A little Scala project to convert AsciiDoc to clean, simple HTML

I recently started using AsciiDoc to write a new book. A great thing about AsciiDoc is that, unlike Markdown, it gives you all of the features you want in a book, including cross-references between sections, captions for tables and figures, and indexes. Now that I was using AsciiDoc, I thought, “Wouldn’t it be nice if I could also use AsciiDoc to write blog posts like this one?”

Sadly, I quickly ran into a problem: I couldn’t find a good way to convert AsciiDoc into clean HTML, or even into Markdown. There are tools that convert AsciiDoc to HTML, but they wrap the content in a lot of extra markup (nested divs, spans, and class attributes), and as far as I can tell there’s no option to turn that markup off.

A shell script solution

I initially gave up on the problem and went back to Markdown for blog posts, but then I remembered that JSoup can “clean” HTML, keeping only the tags and attributes on a whitelist you specify. So I came up with this process:

  • Read an AsciiDoc file as a string
  • Convert that to markup-heavy HTML using AsciidoctorJ
  • Use JSoup to clean up (simplify) that HTML
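As a rough illustration of the third step, here is a minimal Scala sketch of JSoup cleaning on its own. It assumes the jsoup library is on the classpath in a release that still has the `Whitelist` class (newer jsoup versions replace it with `Safelist`), and the list of tags it keeps is only an illustrative assumption, not necessarily what this project keeps:

```scala
// Assumes jsoup on the classpath, in a release that still has Whitelist
// (newer jsoup versions replace it with Safelist)
import org.jsoup.Jsoup
import org.jsoup.safety.Whitelist

object JsoupCleanSketch {

  // Keep a small, hypothetical set of structural tags plus link targets;
  // disallowed tags are dropped (their text is kept) and other attributes are removed
  def simplify(html: String): String = {
    val wl = Whitelist.simpleText()   // starts with b, em, i, strong, u
      .addTags("h1", "h2", "h3", "p", "ul", "ol", "li", "pre", "code", "a")
      .addAttributes("a", "href")
    Jsoup.clean(html, wl)
  }

  def main(args: Array[String]): Unit = {
    // The kind of wrapper markup an AsciiDoc-to-HTML converter tends to produce
    val heavy =
      """<div class="paragraph"><p>See <a href="https://example.com" class="bare">the docs</a>.</p></div>"""
    println(simplify(heavy))
  }
}
```

The wrapper `div` and both `class` attributes disappear, while the paragraph, the link, and its `href` survive, which is the “markup-heavy in, simple out” behavior the process relies on.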

My initial solution was a small Scala program that I run from the command line like a shell script:

import java.io.File
import java.util.HashMap
import scala.io.Source
import org.asciidoctor.Asciidoctor
import org.jsoup.Jsoup
import org.jsoup.safety.Whitelist

object AsciidocToHtml extends App {

    // PART 1: READ THE INPUT FILE
    // ---------------------------
    if (args.isEmpty) {
        System.err.println("Usage: AsciidocToHtml <asciidoc-file>")
        System.exit(1)
    }
    val filename = args(0).trim
    val file = new File(filename)
    if (!file.exists()) {
        System.err.println(s"The file $filename does not exist. Quitting.")
        System.exit(2)
    }
    // read the whole file, and make sure the file handle is closed afterward
    val source = Source.fromFile(file)
    val fileContents = try source.getLines().mkString("\n") finally source.close()


    // PART 2: Convert AsciiDoc -> HTML with AsciidoctorJ
    // --------------------------------------------------
    val asciidoctor: Asciidoctor = Asciidoctor.Factory.create()
    val html: String = asciidoctor.convert(
        fileContents,
        new HashMap[String, Object]()   // no conversion options
    )


    // PART 3: The AsciidoctorJ HTML has a lot of markup, so clean it up with JSoup
    // ----------------------------------------------------------------------------
    // (the three helper functions used below are defined elsewhere in the project)
    val wl = Whitelist.simpleText()
    configureWhitelistTagsToKeep(wl)
    configureWhitelistAttributes(wl)

    val cleanButUglyHtml = Jsoup.clean(html, wl)
    val prettierHtml = insertBlankLinesBeforeHtmlTags(cleanButUglyHtml)
    println(prettierHtml)

}

That code depends on a few utility functions that I don’t show here, but if you’re interested in the complete solution, you can find it in this GitHub project:
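Those helpers aren’t in this post. As a rough idea of what `insertBlankLinesBeforeHtmlTags` could do, here is a hypothetical sketch that uses only the Scala standard library. The set of block-level tags it treats specially is an assumption, and the real helper in the project may work differently:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of an insertBlankLinesBeforeHtmlTags helper; the real one isn't shown
object HtmlSpacing {

  // Block-level tags that get a blank line in front of them (an assumed list)
  private val blockTags =
    Set("p", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "pre", "blockquote", "table")

  // If a line starts with an opening tag such as "<p>" or "<h2 id=x>", return its name
  private def openingTagName(line: String): Option[String] = {
    val t = line.trim
    if (t.startsWith("<") && !t.startsWith("</"))
      Some(t.drop(1).takeWhile(_.isLetterOrDigit).toLowerCase)
    else None
  }

  def insertBlankLinesBeforeHtmlTags(html: String): String = {
    val out = ArrayBuffer.empty[String]
    for (line <- html.split("\n")) {
      val startsBlock = openingTagName(line).exists(blockTags.contains)
      // Add a blank line unless this is the first line or a blank line is already there
      if (startsBlock && out.nonEmpty && out.last.trim.nonEmpty) out += ""
      out += line
    }
    out.mkString("\n")
  }

  def main(args: Array[String]): Unit =
    println(insertBlankLinesBeforeHtmlTags("<h1>Title</h1>\n<p>One</p>\n<p>Two</p>"))
}
```

The `main` method prints the heading and the two paragraphs with a blank line between each pair. Working line by line is enough here because JSoup’s pretty-printed output puts each block-level element on its own line.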

A JavaFX GUI

After writing the shell script on Saturday, I woke up this morning and decided to wrap a small JavaFX GUI around the code. This is what the GUI looks like after you paste some AsciiDoc into the main window and press the Convert button:

AsciiDoc to HTML GUI

As another update, I later added a “Preview” button next to the Convert button so you can view the rendered HTML in a JavaFX WebView. This is a handy way to check that the AsciiDoc renders the way you expect.

If you’re interested in the GUI, the source code for it is in the same repository as the shell script.

Summary

In summary, if you’re interested in converting AsciiDoc to HTML, I hope this solution is helpful. And many thanks to the creators of AsciidoctorJ and JSoup for doing all of the heavy lifting for this process.