1 Introduction
Someone on fedi asked how you could syntax highlight a REPL-session on your website, and since I have it on my website I decided to write this article to explain how I did it to everyone!
It is written with Megaparsec and Skylighting (Kate).
The idea is as follows:
- parse a prompt;
- parse input;
- optionally parse output;
- repeat.
The parser for the prompt decides which inputs it accepts, and the input decides what kind of output it expects.
In this article we will look at the following three scenarios:
- bash with basic commands;
- bash with the cat-command, showing the content of the file highlighted (here XML);
- GHCi.
2 How it looks
Below you can see a small REPL-session.
lira@pc:~$ echo hoihoi! hoihoi! lira@pc:~$ ghci ghci Prelude> 5+5 10 ghci Prelude> :m Data.Text ghci Prelude Data.Text> 5+5 10 ghci Prelude Data.Text> "test" "test" ghci Prelude Data.Text> putStrLn "test" test ghci Prelude Data.Text> :q Leaving GHCi. lira@pc:~$ ghci ghci Prelude> :q Leaving GHCi. lira@pc:~$ cat test.xml <?xml version="1.0" ?> <foo xmlns="https://[::1]/foo" xmlns:b="https://[::1]/bar"> <b:bar yes="no" /> </foo>
3 The code
3.1 Getting ready
Lets just write a standalone script that takes a REPL as input and returns the highlighted HTML.
First we need some imports and cabal config. We use mtl for the MonadReader, megaparsec for the parsing, blaze for the HTML generation and Skylighting for the highlighting.
#!/usr/bin/env cabal
{- cabal:
build-depends:
base, text,
containers, mtl,
megaparsec
blaze-html, blaze-markup,
skylighting, skylighting-core,
ghc-options: -Wno-tabs
-}
{-# LANGUAGE OverloadedStrings, LambdaCase, FlexibleContexts #-}
module Main where
import Data.Void (Void)
import Data.Text (Text, pack, singleton, stripEnd, cons, intercalate)
import Data.Functor (($>))
import Data.Functor.Identity (runIdentity)
import Data.List (intersperse)
import Data.Map (union)
import Control.Applicative (liftA2, (<|>), asum)
import Control.Monad.Reader (MonadReader, ask, runReaderT)
import qualified Text.Megaparsec as P
import qualified Text.Megaparsec.Char as P
import Text.Blaze.Html (Html, toHtml)
import Text.Blaze.Internal (MarkupM(..))
import qualified Text.Blaze.XHtml5 as H
import qualified Text.Blaze.XHtml5.Attributes as A
import Text.Blaze.Html.Renderer.String (renderHtml)
import Skylighting (defaultFormatOpts, tokenize, TokenizerConfig(..), lookupSyntax, defaultSyntaxMap, formatHtmlInline, SyntaxMap, SourceLine(..))
import Skylighting.Loader (loadSyntaxesFromDir)
We also need some input:
input :: Text
input = intercalate "\n"
[ "lira@pc:~$ echo hoihoi!"
, "hoihoi!"
, "lira@pc:~$ ghci"
, "ghci Prelude> 5+5"
, "10"
, "ghci Prelude> :m Data.Text"
, "ghci Prelude Data.Text> 5+5"
, "10"
, "ghci Prelude Data.Text> \"test\""
, "\"test\""
, "ghci Prelude Data.Text> putStrLn \"test\""
, "test"
, "ghci Prelude Data.Text> :q"
, "Leaving GHCi."
, "lira@pc:~$ ghci"
, "ghci Prelude> :q"
, "Leaving GHCi."
, "lira@pc:~$ cat test.xml"
, "<?xml version=\"1.0\" ?>"
, "<foo xmlns=\"https://[::1]/foo\" xmlns:b=\"https://[::1]/bar\">"
, "\t<b:bar yes=\"no\" />"
, "</foo>"
]
3.2 The entry-point
Our main looks like this: we load the syntax-files, run the highlighter, render it as HTML and finally print it to the console.
main :: IO ()
main = do
syntaxes <- loadSyntaxes
let
output = runHighlighter syntaxes input
renderedOutput = renderHtml output
putStrLn renderedOutput
main-functionYou might only need the defaultSyntaxMap configuration, but for my blog I need more syntax files. One example is for my article about PDF, where the PDF-syntax does not come pre-installed with Skylighting.
loadSyntaxes :: IO SyntaxMap
loadSyntaxes = loadSyntaxesFromDir "./syntax/" >>= \case
Left _ -> pure defaultSyntaxMap
Right b -> pure $ b `union` defaultSyntaxMap
loadSyntaxes definitionIn order to store those syntax files and not have to pass them around as parameters, we will use a MonadReader. We set that up in runHighlighter:
runHighlighter :: SyntaxMap -> Text -> Html
runHighlighter syntaxes code =
runIdentity (runReaderT (highlightPromptOutput code) syntaxes)
runHighlighter definition3.3 Using Skylighting
Let's define a function highlight that takes care of all the skylighting configuration. It only needs input and language, and will output our code highlighted:
highlight :: (MonadReader SyntaxMap m) => Text -> Text -> m Html
highlight code language = do
syntaxes <- ask
pure $ case lookupSyntax language syntaxes of
Nothing -> toHtml code
Just syntax -> case tokenize (tokenizerConfig syntaxes) syntax code of
Left _ -> toHtml code
Right sourceLines -> removeCruft sourceLines
where
tokenizerConfig :: SyntaxMap -> TokenizerConfig
tokenizerConfig sm = TokenizerConfig
{ traceOutput = False
, syntaxMap = sm
}
removeCruft :: [SourceLine] -> Html
removeCruft sourceLines =
case formatHtmlInline defaultFormatOpts sourceLines of
(AddAttribute _ _ _ (Parent _ _ _ subElements)) -> subElements
o -> o
highlight definitionThe removeCruft is only there to get rid of extra HTML that skylighting wraps around the output. If you want to keep that you can leave that part out.
3.4 Parsing with megaparsec
We now have the highlighter we can use for code blocks containing a single language. We want a REPL-session, which has multiple languages. We need to work with parsers as the input has to be static and line-based, instead of how most syntax highlighters work. The parser also has to figure out which language to use.
All parsers need a ParsecT-type. Let's create a type for it so we don't have to repeat ourselves to much:
type Parser m a = P.ParsecT Void Text m a
Parser-typeIn Parser m a, m is the monad and a is the type being parsed.
For the parsing we also use the MonadReader, but because it has to be wrapped with the ParsecT we have to call runReaderT again.
highlightPromptOutput :: (MonadReader SyntaxMap m) => Text -> m Html
highlightPromptOutput t = do
syntaxes <- ask
runReaderT (P.runParserT prompts "" t) syntaxes >>= case
(Left _) -> pure $ H.text t
(Right html) -> pure $ foldr1 (*>) html
highlightPromptOutput definitionIf parsing is successful we will get multiple lines of code, which we foldr1 nicely together. If it is not successful, we just return the plain text. We cannot do anything else to recover.
For us prompts will be bash and GHCi.
prompt :: (MonadReader SyntaxMap m) => Parser m Html
prompt = asum $ P.try <$> [ bashPrompt, ghciPrompt ]
prompts :: (MonadReader SyntaxMap m) => Parser m [Html]
prompts = P.many prompt
prompts definitionBefore we get into the parsers themselves, lets also create a function that will let us wrap spans with certain classes around the generated HTML. That gives us some more control over how to highlight prompts.
textWithClass :: Text -> Text -> Html
textWithClass c t = H.span H.! A.class_ (H.textValue c) $ H.text t
textWithClass definition3.5 Parsing input and output
For parsing input and output, we need the following parsers:
closingNewline :: Parser m Text
closingNewline = singleton <$> P.single '\n' <|> P.eof $> ""
inputP :: Parser m Text
inputP = do
i <- P.some (P.chunk "\\\n" <|> (singleton <$> P.anySingleBut '\n'))
end <- closingNewline
pure $ mconcat i <> end
outputP :: Parser m Text
outputP = do
P.notFollowedBy promptStartP
o <- pack <$> P.many (P.try (P.single '\n' <* P.notFollowedBy promptStartP) <|> P.anySingleBut '\n')
end <- closingNewline
pure $ o <> end
where
promptStartP :: Parser m Html
promptStartP = asum [ ghciP, userNameP ]
You might wonder what ghciP or userNameP are. Those are our starting points for the prompts, the first part of the GHCi and bash prompt parsers.
3.6 bash prompt
The default prompt in Ubuntu can be summarized with the following regex:
^(\w+)@(\w+):([a-zA-Z/~\.]+)(\$|#) ([^]*?)(?<!\\)\n[^\n]*
While I currently use Arch, I still really like the default prompt Ubuntu came with so I will stick with it!
The regex translated to a parser looks like this:
bashPrompt :: (MonadReader SyntaxMap m) => Parser m Html
bashPrompt = do
userName <- userNameP
host <- hostP
path <- pathP
privilege <- privilegeP <* P.single ' '
(input, output) <- catFileP <|> basicIOP
bashCode <- highlight input "bash"
pure $ do
userName
textWithClass "prompt-at" "@"
host
textWithClass "prompt-colon" ":"
path
privilege
H.text " "
H.kbd bashCode
H.text "\n"
output
userNameP :: Parser m Html
userNameP = textWithClass "prompt-user" . pack <$> P.many P.alphaNumChar <* P.single '@'
hostP :: Parser m Html
hostP = textWithClass "prompt-host" . pack <$> P.many P.alphaNumChar <* P.single ':'
pathP :: Parser m Html
pathP = textWithClass "prompt-path" . pack <$> P.many (P.alphaNumChar <|> P.oneOf ("/~." :: [Char]))
privilegeP :: Parser m Html
privilegeP = textWithClass "prompt-privilege" . singleton <$> (P.single '$' <|> P.single '#')
bash-promptNow we only need the parsers for catFileP and basicIOP
The parser basicIOP is very straightforward.
basicIOP :: Parser m (Text, Html)
basicIOP = liftA2 (,) inputP $ P.optional outputP >>= \case
Just output -> pure . H.span $ H.text output
Nothing -> pure $ H.text ""
basicIOP definitionThe parser catFileP is a bit more work. Here we need to use highlight to get the correct highlighting for the shown file.
catFile :: (MonadReader SyntaxMap m) => Parser m (Text, Html)
catFile = do
syntaxes <- ask
i <- mconcat <$> sequence
[ P.chunk "cat "
, pack <$> P.many P.alphaNumChar
, singleton <$> P.single '.'
]
extension <- pack <$> P.many P.alphaNumChar <* P.single '\n'
o <- outputP
let
l = extensionToHL extension
o' <- highlight o l
let
o'' = H.span H.! A.class_ (H.textValue l) $ o'
pure (i <> extension <> "\n", o'' *> H.text "\n")
where
extensionToHL :: Text -> Text
extensionToHL "hs" = "haskell"
extensionToHL e = e
catFileP definition3.7 GHCi prompt
The prompt of GHCi has the following pattern:
ghci( [A-Z][a-zA-Z.]+)*(>|\|) [^\n]*\n[^\n]*\n
Translated into a parser, it looks like this:
ghciPrompt :: (MonadReader SyntaxMap m) => Parser m Html
ghciPrompt = do
ghci <- ghciP <* P.single ' '
modules <- ghciModules
end <- textWithClass "end" . singleton <$> (P.single '>' <|> P.single '|') <* P.single ' '
io <- ghciSettingInput <|> ghciHaskellIO
pure $ ghci *> H.text " " *> modules *> end *> H.text " " *> io
ghciP :: Parser m Html
ghciP = textWithClass "ghci" <$> P.try (P.chunk "ghci")
ghciModule :: Parser m Html
ghciModule = textWithClass "module" . pack <$> P.some (P.alphaNumChar <|> P.single '.')
ghciModules :: Parser m Html
ghciModules = do
ms <- P.many (ghciModule <* P.optional (P.single ' '))
pure $ foldr1 (*>) (intersperse (H.text " ") ms)
Again, we branch of with the I/O-part where input can either be commands to change the environment, or the REPL we most often use.
ghciSettingInput :: Parser m Html
ghciSettingInput = do
P.single ':'
command <- asum ((liftA2 commandToHtml `uncurry`) <$>
[ (P.chunk "m", P.single ' ' *> (Just <$> ghciModule))
, (P.chunk "module", P.single ' ' *> (Just <$> ghciModule))
, (P.chunk "q", pure Nothing)
, (P.chunk "quit", pure Nothing)
, (P.chunk "{", pure Nothing)
, (P.chunk "}", pure Nothing)
])
closingNewline
output <- P.optional outputP
pure $ do
command
H.text "\n"
case output of
Just output -> H.text output
Nothing -> H.text ""
where
commandToHtml :: Text -> Maybe Html -> Html
commandToHtml c Nothing = textWithClass "command" (':' `cons` c)
commandToHtml c (Just a) = commandToHtml c Nothing *> H.text " " *> a
ghciHaskellIO :: (MonadReader SyntaxMap m) => Parser m Html
ghciHaskellIO = do
input <- inputP
output <- outputP
iH <- highlight input "haskell"
oH <- highlight output "haskell"
pure $ do
H.kbd iH
H.text "\n"
oH
H.text "\n"
ghciSettingInput and ghciHaskellIO parsers4 Conclusion
And there you have it, highlighting for your website!