This post is a basic guide to building a binary expression parser with operator precedence entirely from scratch in Haskell. This information is surprisingly hard to find on the internet, so I’ve decided to make a blog post about it.

Why build an expression parser? If you want to build a compiler, this is usually the first place to start. Most programming languages support binary expressions like 1*2+3, and parsing is most of the work when compiling expressions.

This post assumes intermediate knowledge of Haskell. You should be somewhat familiar with functors, applicative functors, and monads. I recommend spending some time understanding these topics before continuing, otherwise the code later in the post won’t make much sense.

Before we begin, I highly recommend checking out JSON Parsing from Scratch in Haskell. The parser we’re going to build uses the same basic ideas, just for a different problem.

All the code in this post is available on my GitHub.


  1. Overview
    1. Specs
  2. Grammars
  3. Parser
  4. Setup
    1. The Parser Type
  5. Basic Parsers
    1. Satisfy
    2. Char
    3. Digit
  6. Functor
    1. Int
    2. Operators
  7. Applicative
  8. Alternative
    1. Operator
    2. Custom Errors
    3. Int
  9. Monad
    1. Option
    2. Lookahead
  10. Parsing Expressions
    1. Pretty Printing
  11. Testing
    1. Generators
    2. Property 1: Increasing Precedence
    3. Property 2: Inorder Traversal == Input


The expression parser transforms a string representing an expression into a binary tree. Here are some examples:

               +                        +                        *
              / \                      / \                      / \
"1*2+3"  ->  *   3       "2+4*6"  ->  2   *       "3*6*9"  ->  *   9
            / \                          / \                  / \
           1   2                        4   6                3   6

The binary tree is structured in such a way that operators with higher precedence are lower in the tree. In these examples, the multiplication operator * has higher precedence than the addition operator +, so * is always lower in the tree. In general, every path downwards in the tree must have operators with increasing precedence.

By structuring the tree this way, we’re able to correctly evaluate the entire expression. We just need to traverse the binary tree bottom-up, evaluating the operators with the highest precedence first. In the first example, we first calculate 1 * 2, which is 2, and then calculate 2 + 3, which returns the final result 5.

This post focuses on the hard part: parsing an expression into the correct tree. There are multiple algorithms that solve this problem. The algorithm we’re going to implement is called precedence-climbing, and is the best approach for the kind of parser we’re going to build. We’re going to work our way up towards it by first building simple parsers, then gradually building more complex parsers until we get our final expression parser.


The expression parser will support unsigned integers and the following operators:

Operator     Precedence      Associativity
* / %        5               Left to right
+ -          4               Left to right
&            3               Left to right
^            2               Left to right
|            1               Left to right
=            0               Right to left

These operators are a subset of the binary operators in C. They include multiplicative, additive, bit manipulation, and assignment operators. I’ve added the assignment operator since it’s right-associative, making our expression parser a bit more interesting.

Before we dive into building the parser, let’s quickly review how parsers work.


Parsers operate according to specific grammar, which defines the rules that the input stream is expected to follow. Here’s the grammar for the expression parser we’re going to build:

expression:   int-constant (op int-constant)*

int-constant: ('0' | ... | '9')+

op:           '*' | '/' | '%' | '+' | '-' | '&' | '^' | '|' | '='

Grammars can be in any particular form. Here, the left side defines the name of a specific rule, which are called nonterminals, and the right side defines the pattern that the input string is expected to follow.

Characters enclosed in '' are called terminals. Terminals are the actual characters that the parser will consume. The characters + * | ( ) are called qualifiers. Qualifiers modify the behaviour of the parser. Here’s what they mean:

* - Zero or more times

+ - One or more times

| - Select alternatives

() - Parse group

This is what the grammar defines: an expression must start with one integer, and may contain many groups of operators and integers afterwards. An integer must be one or more digits, and an operator must be one of these characters: * / % + - & ^ | =. Spaces within expressions are not supported, though this is easy to implement later.

Now that we’ve defined our grammar, let’s review what a parser actually is.


A parser transforms a stream of input into a data structure, according to a specific grammar. The expression parser that we’re going to build transforms a string into a binary tree.

There are several ways to build a parser. The simplest approach is to parse the input recursively by creating a function for every nonterminal in the grammar. This is called recursive descent. Each function attempts to match the pattern in the nonterminal with the start of the remaining input. If the match is successful, the function consumes that part of the input and outputs a subset of the final data structure. We then combine these functions to build the final data structure.

The expression parser combines the simpler integer and operator parse functions to build a tree.

expression:   int-constant (op int-constant)*

Each function only outputs a single node in the tree. The expression parser then joins these nodes together to build increasingly large subtrees until we get the final expression tree.

There are a few ways we can combine parse functions. We can either call the functions directly, or use parser combinators. A parser combinator is a function that takes one or more parse functions as input, and returns another parse function as output. It lets us build more complicated parsers out of simpler ones through function composition. For example, if we want to parse a specific string, we can combine multiple single character parsers together in sequence.

There are other ways to build a parser. A parser generator automates most of the task for you, but requires giving up control to a generalist tool. For other approaches, check the wikipedia page on parsing.

Okay, enough theory. Let’s actually build this thing!


This code compiles on ghc version 9.0.1, though any version should do. We’re also going to use the ghci interpreter to interactively test our parse functions. The only third party Haskell library we should have installed is QuickCheck, which we will use to test the properties of our parser. If you don’t want to run any tests, you can skip this step.

Let’s start with some imports:

import Control.Applicative (Alternative (..))
import Data.Char (isDigit, digitToInt)
import Test.QuickCheck  -- optional

Let’s implement our data types:

-- binary expression tree
data Expr = BinExpr Op Expr Expr
          | IntConst Int
          deriving (Show, Eq)

-- binary operator
data Op = Op OpType Prec Fixity deriving (Show, Eq)

type Prec = Int
data Fixity = LFix | RFix deriving (Show, Eq)

data OpType = Mul
            | Div
            | Mod
            | Add
            | Sub
            | BitAnd
            | BitXor
            | BitOr
            | Assign
            deriving (Show, Eq)

Expr is the data type for the binary tree. The binary operator data type Op contains the type, precedence and fixity for a specific operator.

Now that we have our basic data types, let’s define our parse functions.

The Parser Type

Each parse function will consume part of the input string and return some intermediate output and the remaining string. With error handling, we can think of every parser as having this declaration:

parser :: String -> Either Error (a, String)

This function takes a string as input and returns either an error, or a tuple containing some intermediate output a and the remaining string. For simplicity, this error type is just a string:

type Error = String

In order to simplify building parser combinators, let’s wrap this function in a newtype:

newtype Parser a = Parser { runParser :: String -> Either Error (a, String) }

With Parser defined as a specific type, we can make it an instance of a bunch of useful typeclasses that we’ll need to combine parse functions later. This will also simplify our function declarations and allow us to easily change the internals of the parser if we want to make changes.

We now have all the necessary data types to implement our first parse function. Let’s start with the simplest of them all.

Basic Parsers


The satisfy parser is the basic atom of our parser combinator, among which all other parsers will be built. This parser takes a boolean function that tests some condition on the first character of the input string, and returns that character if the condition is true.

satisfy :: (Char -> Bool) -> Parser Char
satisfy predicate = Parser $ \s -> case s of
    []     -> Left "empty input"
    (c:cs) -> if predicate c
        then Right (c, cs)
        else Left "unable to satisfy predicate"

Let’s run this in ghci:

> runParser (satisfy (=='t')) "test"
Right ('t',"est")

> runParser (satisfy $ const True) "test"
Right ('t',"est")

> runParser (satisfy isDigit) "test"
Left "unable to satisfy predicate"

With satisfy, we can start to build slightly more interesting parsers. Let’s move on to parsing single characters and digits.


The char parser simply parses the input character.

char :: Char -> Parser Char
char c = satisfy (==c)


> runParser (char 'a') "apple"
Right ('a',"pple")

> runParser (char '+') "+2"
Right ('+', "2")

> runParser (char 'x') "orange"
Left "unable to satisfy predicate"


The digit parser parses any digit character.

digit :: Parser Char
digit = satisfy isDigit


> runParser digit "123"
Right ('1',"23")

> runParser digit "pineapple"
Left "unable to satisfy predicate"

With char and digit, we now have two very simple parsers that are able to parse a single character. This is a good first step, but we need parsers that output the correct types for our binary tree: Op and Int. Since we’re using parser combinators, we can wrap these parsers within other parsers to modify their output. Let’s implement our first parser combinator that does this.


Functor defines a function called fmap which is basically just map but for more complex data types. With functors, we can apply any function to the output of a parser.

instance Functor Parser where
    fmap f p = Parser $ \s -> case runParser p s of
        Left err        -> Left err
        Right (out, s') -> Right (f out, s')

We can now create new parsers that convert the outputs of char and digit into Op and Int respectively. Let’s start with converting digit characters to integers.


The int parser runs the digit parser and uses fmap to convert its output to an integer.

-- <$> is the infix form of fmap
int :: Parser Int
int = digitToInt <$> digit


> runParser int "123"
Right (1,"23")

> runParser int "3.14"
Right (3,".14")

> runParser int "three"
Left "unable to satisfy predicate"

Note that this only converts a single digit character to an integer. Later, we’re going to extend this parser to support multiple characters. Let’s move on to parsing operators.


Starting with the multiply operator, this parser runs the char parser and overwrites its output with the operator data type Op.

mulOp :: Parser Op
mulOp = Op Mul 5 LFix <$ char '*'

The <$ function overwrites the output of the parser on the right with the value on the left. In this case, we’re overwriting '*' with Op Mul 5 LFix. We get this function for free when implementing the Functor instance.

Using mulOp as a template, we can easily implement parsers for every other operator. We just need to make sure that the precedence and associativity for each operator are correct.

divOp = Op Div 5 LFix <$ char '/'
modOp = Op Mod 5 LFix <$ char '%'

addOp = Op Add 4 LFix <$ char '+'
subOp = Op Sub 4 LFix <$ char '-'

bitAndOp = Op BitAnd 3 LFix <$ char '&'
bitXorOp = Op BitXor 2 LFix <$ char '^'
bitOrOp  = Op BitOr 1 LFix  <$ char '|'
assignOp = Op Assign 0 RFix <$ char '='


> runParser mulOp "*2+3"
Right (Op Mul 5 LFix,"2+3")

> runParser bitAndOp "&7"
Right (Op BitAnd 3 LFix,"7")

> runParser addOp "%0"
Left "unable to satisfy predicate"

Exciting stuff. But what if we want to parse any operator? We need some way to try multiple operator parsers, and return the output of the first one that succeeds. For the integer parser, we also need a way to parse more than a single digit. Let’s implement our next two parser combinators that solve these problems.


Applicative functors are the same as functors, except that the input function must reside in the output of a parser.

instance Applicative Parser where
    pure x = Parser $ \s -> Right (x, s)

    pf <*> p = Parser $ \s -> case runParser pf s of
        Left err      -> Left err
        Right (f, s') -> runParser (fmap f p) s'

This instance is only necessary because the next typeclasses we’re going to implement require Applicative as a dependency (for ghc versions 7.10 and above).

We’re not going to be directly using these functions, so it’s not particularly important that we understand how they work. Let’s move on to the next parser combinator that we’re actually going to use.


Alternative lets us try a second parser if the first one fails.

instance Alternative Parser where
    empty = Parser $ \s -> Left "empty"

    p1 <|> p2 = Parser $ \s -> case runParser p1 s of
        Left err  -> runParser p2 s
        Right out -> Right out

Alternative also provides the some and many functions that can run a parser in a loop. The some function runs a parser one or more times, while the many function runs a parser zero or more times. Both loops terminate where there is a parse error and return a list containing each parsed element.

We can now select between different parsers and run parsers in a loop. Let’s use this functionality to build an operator parser that can select between different operators.


The op parser simply tries a bunch of operator parsers and returns the output of the first match.

op :: Parser Op
op =  mulOp
  <|> divOp
  <|> modOp
  <|> addOp
  <|> subOp
  <|> bitAndOp
  <|> bitXorOp
  <|> bitOrOp
  <|> assignOp


> runParser op "+"
Right (Op Add 4 LFix,"")

> runParser op "*2+3"
Right (Op Mul 5 LFix,"2+3")

> runParser op "$1"
Left "unable to satisfy predicate"

Now that our parsers are getting more complex, let’s write a parser combinator that provides more helpful error messages. The error message in the last example isn’t particularly relevant to parsing operators.

Custom Errors

The <?> function runs a parser and overwrites its error message with the input string if it fails.

-- overwrite error messages
(<?>) :: Parser a -> String -> Parser a
infix 1 <?>
p <?> err = Parser $ \s -> case runParser p s of
    Left _    -> Left err
    Right out -> Right out

Since we always want to run this function last, we’ve given it an infix precedence of 1. Let’s now update our operator parser to provide a more helpful error message.

op :: Parser Op
op =  mulOp
  <|> divOp
  <|> modOp
  <|> addOp
  <|> subOp
  <|> bitAndOp
  <|> bitXorOp
  <|> bitOrOp
  <|> assignOp
  <?> "invalid operator"


> runParser op "+"
Right (Op Add 4 LFix,"")

> runParser op "$1"
Left "invalid operator"

Much better! Let’s now move on to parsing integers with multiple digits.


The updated int parser now converts a string of digits into an integer.

int :: Parser Int
int = read <$> some digit <?> "invalid int"

This parser runs the digit parser one or more times and returns a string as its output. This string is then converted to an integer with the built-in read function.


> runParser int "123*4"
Right (123,"*4")

-- read ignores leading zeroes
> runParser int "007"
Right (7,"")

> runParser int "*5"
Left "invalid int"

With int and op, we’re now able to parse unsigned integers and select between different operators. We’ll use these parsers to output the nodes in our tree. We now just need a way to sequence them together to build increasingly large subtrees. Let’s implement our final parser combinator.


Monad allows us to sequence multiple parsers together, piping the output of one parser into another.

instance Monad Parser where
    return = pure

    p >>= fp = Parser $ \s -> case runParser p s of
        Left err        -> Left err
        Right (out, s') -> runParser (fp out) s'

    p1 >> p2 = p1 >>= const p2

The >>= function pipes the output of the first parser into a function-wrapped second parser that takes a single argument of the same type. This wrapped parser is free to do whatever it wants with the first parser’s output.

We will use monads in our final expression parser to join the outputs of int and op into a binary tree. Before we get there, we need just two more parsers.


The option parser attempts to run a specific parser. If it fails, it returns the input value.

option :: a -> Parser a -> Parser a
option x p = p <|> return x


> runParser (option 0 int) "123"
Right (123,"")

> runParser (option 0 int) "abc"
Right (0,"abc")

This parser will come in handy when parsing expressions that might contain a parse error. Instead of throwing an exception, we’ll use it to return the current subtree.

There is one last parser that we need. This one is not related to Monad, but we’re going to need it to parse expressions.


The lookahead parser runs a specific parser but does not consume any input if it succeeds.

lookahead :: Parser a -> Parser a
lookahead p = Parser $ \s -> case runParser p s of
    Left err       -> Left err
    Right (out, _) -> Right (out, s)


> runParser (lookahead int) "123+42"
Right (123,"123+42")

> runParser (lookahead op) "+42"
Right (Op Add 4 LFix,"+42")

> runParser (lookahead int) "test"
Left "invalid int"

Lookahead allows us to peek at the next characters in the input. The expression parser will use lookahead extensively to determine whether or not to return from a recursive call.

We’ve now built every parser we need to parse binary expressions. Let’s finally implement our expression parser!

Parsing Expressions

The expression parser that we’re going to build transforms a string representing an expression into a binary tree. Now that we have our int and op parsers, we can output single nodes in our binary tree. Using >>=, we can then sequence them together to form increasingly large subtrees. Let’s now start with parsing the simplest expression, an integer constant.

intExpr :: Parser Expr
intExpr = IntConst <$> int

This parser just wraps the output of the int parser with IntConst and returns an expression. Let’s now move on to the more interesting part: parsing binary expressions. The algorithm I’m going to describe here is called precedence-climbing.

expr :: Parser Expr
expr = intExpr >>= opExpr (-1)

opExpr :: Prec -> Expr -> Parser Expr
opExpr minPrec left = option left $ do
    opType@(Op _ prec fix) <- lookahead op
    if prec < minPrec || prec == minPrec && fix == LFix
        then return left
        else do
            right <- op >> intExpr >>= opExpr prec
            opExpr minPrec (BinExpr opType left right)

The algorithm takes a minimum precedence value and a left subtree as input. It peeks at the next operator, and recursively builds the right subtree if the operator has higher precedence or has the same precedence but is right-associative. Otherwise, it returns the left subtree. After building the right subtree, it joins the left and right subtrees to form the next left subtree, and repeats the entire process until the input string is empty or invalid.

Here’s an example. The numbers indicate the ordering of function calls.

> runParser expr "1*2+3"

1. expr call:                  Notes:
input: "1*2+3"                 Parse integer, then pass result to opExpr with
                               -1 precedence. This allows any op to come next.

2. opExpr call:    Subtree:    Notes:
input: "*2+3"         *        Does the next op have higher precedence? Yes, so
minPrec: -1          / \       initialise a new subtree (shown left) and 
left subtree: 1     1  ...     recursively compute the right subtree (3).

                               After completing the subtree, call opExpr again
                               with the same precedence and the completed
                               subtree as input (4).

3. Compute right:  Subtree:    Notes: 
input: "+3"           2        Does the next op have higher precedence? No, so
minPrec: 5                     return the left subtree and go back to (2).
left subtree: 2

4. opExpr call:    Subtree:    Notes:
input: "+3"             +      Does the next op have higher precedence? Yes, so
minPrec: -1    *       / \     initialise a new subtree and recursively compute 
left subtree: / \     *  ...   the right subtree (5).
             1   2   / \     
                    1   2      After completing the subtree, call opExpr again

5. Compute right:  Subtree:    Notes: 
input: ""             3        Empty input, so return the left subtree and go
minPrec: 4                     back to (4). 
left subtree: 3

6. opExpr call:    Subtree:    Notes:
input: ""       +              Empty input, return left subtree. Done.
minPrec: -1    / \             
left subtree: *   3            
             / \     
            1   2  

Somewhat of a long example, but hopefully you get the point. Every time we try to build a right subtree, we pass the current operator’s precedence prec and the next node as input. During the recursive call, if the next operator has lower precedence (or has the same precedence but is left-associative), we return the single node as the right subtree.

Once the right subtree is computed, we join the left and right subtrees into the next left subtree, and make another recursive call. This time we’re passing the initial precedence minPrec and the joined subtree as input. This makes the joined subtree the left subtree of the next operator, which we now know must have lower precedence. This process repeats until the input string is empty or invalid.

Let’s now implement our final parse function that ties everything together.

parseExpr :: String -> Expr
parseExpr s = case runParser expr s of
    Left err        -> error err
    Right (out, "") -> out
    Right (_, s)    -> error $ "unable to parse remaining string \"" ++ s ++ "\""


> parseExpr "1*2"
BinExpr (Op Mul 5 LFix) (IntConst 1) (IntConst 2)

> parseExpr "1*2+3"
BinExpr (Op Add 4 LFix) (BinExpr (Op Mul 5 LFix) (IntConst 1) (IntConst 2)) (IntConst 3)

> parseExpr "2+4*6"
BinExpr (Op Add 4 LFix) (IntConst 2) (BinExpr (Op Mul 5 LFix) (IntConst 4) (IntConst 6))

> parseExpr "1+2x"
*** Exception: unable to parse remaining string "x"

> parseExpr "x+2"
*** Exception: invalid int

We now have our final expression parser! Before we move on to testing, let’s write a pretty-printer so we can better visualise the output.


First, we need to remove the automatic derivation of Show from the binary operator data type OpType and write a custom one instead.

-- data OpType = ... deriving Eq

instance Show OpType where
    show Mul = "*"
    show Div = "/"
    show Mod = "%"
    show Add = "+"
    show Sub = "-"
    show BitAnd = "&"
    show BitXor = "^"
    show BitOr = "|"
    show Assign = "="

Now let’s implement the pretty-printer. This function indents a specific substring for each line of output.

printExpr :: Expr -> String
printExpr expr = go expr ""
    where go (IntConst n) _ = show n
          go (BinExpr (Op t _ _) left right) indent = 
              show t ++ "\n" ++
              indent ++ "├ " ++ go left (indent ++ "│ ") ++ "\n" ++
              indent ++ "└ " ++ go right (indent ++ "  ") 


> putStrLn $ printExpr $ parseExpr "1*2"
├ 1
└ 2

> putStrLn $ printExpr $ parseExpr "1*2+3"
├ *
│ ├ 1
│ └ 2
└ 3

> putStrLn $ printExpr $ parseExpr "2+4*6"
├ 2
└ *
  ├ 4
  └ 6

Not bad. I’ve chosen this format since it’s easier to code, but it’s also possible to print the tree similar to the other examples in this post. Feel free to mess around with this function in ghci, it’s quite satisfying to view the output this way.

Let’s finish by writing some tests to make sure everything works as expected.


We’re going to use QuickCheck to test the properties of our expression parser. If you’re not familiar with QuickCheck, this post provides a pretty good explanation:

Basically, QuickCheck generates a bunch of random input and checks whether a function satisfies certain user-defined properties for every input. For our expression parser, there are at least two properties we want to test:

  1. Every path downwards in the tree has operators with increasing precedence.
  2. The inorder traversal of the tree is equivalent to the input string.

Before we test these properties, we first need to generate random binary expressions. Let’s write some simple generators.


genOp :: Gen Char
genOp = elements "*/%+-&^|="

genInt :: Gen String
genInt = show <$> choose (0, 1000 :: Int)

genOpInt :: Gen String
genOpInt = (:) <$> genOp <*> genInt

-- (op int)*
genOpInts :: Gen String
genOpInts = sized $ \n ->
    [ (1, return [])
    , (n, (++) <$> genOpInt <*> genOpInts)

-- int (op int)*
genExpr :: Gen String
genExpr = (++) <$> genInt <*> genOpInts

The binary expression generator genExpr generates a random integer between 0-1000, and a random number of groups of operators and integers afterwards.

If any of these generators don’t make sense, make sure to read the post linked above. These generators follow almost exactly from the example generators in that post.

Here’s some sample output using a smaller integer range of 0-20:

> sample genExpr

Now that we can generate random binary expressions, let’s write our first property.

Property 1: Increasing Precedence

The first property checks whether every path downwards in the tree has operators with increasing precedence.

prop_precOrder :: Property
prop_precOrder = forAll genExpr $ \s -> isIncr (-1) $ parseExpr s
    where isIncr _ (IntConst _) = True
          isIncr minPrec (BinExpr (Op _ prec _) left right) =
            if prec >= minPrec
                then isIncr prec left && isIncr prec right
                else False

For every random input, this property parses it into a binary tree and runs the local isIncr function that recursively checks whether operators are in increasing order.

Property 2: Inorder Traversal == Input

The second property checks whether the inorder traversal of the binary tree is equivalent to the input string.

prop_inOrder :: Property
prop_inOrder = forAll genExpr $ \s -> s == (toStr $ parseExpr s)
    where toStr (IntConst n) = show n
          toStr (BinExpr (Op t _ _) left right) = toStr left ++ show t ++ toStr right

This parses the input into a binary tree, converts the tree back into a string, and then checks whether the two strings match.

Let’s now run these two properties:

> quickCheck prop_precOrder
+++ OK, passed 100 tests.

> quickCheck prop_inOrder 
+++ OK, passed 100 tests.

Nice! We’ve now built a simple binary expression parser with operator precedence entirely from scratch and tested it with randomized inputs. We’ve essentially built the frontend to a small compiler.

This parser provides a good foundation for building a parser for a more sophisticated grammar, like a programming language. In fact, this is the exact approach I’ve taken for a C compiler I’m currently working on.

Thanks for reading this basic guide on binary expression parsing. If you have any comments, please send me an email.

All the code in this post is available here.