Adventure in parserland – parsing lambda expressions in F# – Part II

The parser starts simple with the following two functions to parse either a string or a file. I use the XXXReaders because I want to lazy read character by character.

let parseString s =
    let reader = new StringReader(s)
    parseTextReader reader

let parseFile fileName =
    let reader = new StreamReader(fileName: string)
    parseTextReader reader

The whole parser is in the following two lines:

let parseTextReader: TextReader -> seq<Expression> =
                    textReaderToLazyList >> tokenStream >> parseExpressions

I need to specify the signature otherwise the compiler gets confused : wait, does it take a StringReader or a StreamReader? You better tell me!

The function is a composite of three functions applied in sequence:

  1. Translate a TextReader to a LazyList<char>
  2. Translate a LazyList<char> to a LazyList<Token> (lexer)
  3. Translate a LazyList<Token> to a LazyList<Expression> (parser)

My usage of LazyList as the workhorse for the program is because I want to match on the head of the stream of chars/tokens in a lazy way.

I love it when a program naturally decomposes in such simple understandable pieces. I impute some of that to functional programming. For one reason or another, in my 15+ years of object oriented programming, I’ve rarely got to the core of a problem with such immediacy.

A sequence of operations likes the above would be lost in a protected overridden implementation of a base class somewhere (or something else equally long to pronounce). The beauty would be lost somewhere in the vast machinery required to support it.

In any case, TextReaderToLazyList is a trivial generator function that uses the unfold function of LazyList to read a character at the time.

let textReaderToLazyList textReader = LazyList.unfold (fun (ts:TextReader) ->
    let ch = ts.Read()
    if ch = -1 then None else Some(char ch, ts)) textReader

The next step is to look at either the lexer, going bottom up, or the parser, going top down.

Adventure in parserland – parsing lambda expressions in F# – Part I

This is part of my ‘things that I do in the empty spaces between one meeting and the next one, which might end up being vaguely interesting’. It is a lambda expression parser.

The full source code is here.

I actually have two versions of it: one written longhand and the other one written with FParsec. Just to be clear: I’m no expert of either.

And just to be more clear: I think writing most parsers longhand in the way I am about to show is crazy. You either use FParsec or  fslex / fsyacc.

I have a strong distaste for additional compilation steps. I think it lingers on from MFC project types of 15/20 years ago. I was one of these crazy folks that would generate the project, wrap the generated code (with some generalizations) in my own library and use that one from then on.

So I prefer FParsec. I’m ok rewriting left recursive rules and its performance has never been a problem for me. Here is a table that compares the different approaches.

But I started wondering about coding a a recursive descent parser for a simple grammar by hand, fully knowing the foolishness of the idea. Thanks to Jose for code reviewing it.

The inspiration for the grammar comes from this book.

    (*
        <expression> ::= <name> | <function> | <application> 
        <name> ::= non­blank character sequence 
        <function> ::= \ <name> . <body> 
        <body> ::= <expression> 
        <application> ::= ( <function expression> <argument expression> ) 
        <function expression> ::= <expression> 
        <argument expression> ::= <expression> 
    *)

In English, an expression is either a name, a function or an application. A name is a bunch of characters (better defined in the code). A function is ‘\’, a name, ‘.’ and an expression. An application is ‘(‘, an expression, whitespaces, an expression and ‘)’.

Some testcases for the above grammar and the parsers written to parse it are below. It should be intuitive what this code does just by the name of the functions. Even it isn’t, check that the expressions symbol contains valid productions from the grammar above.

module Test

open Microsoft.FSharp.Collections
open Xunit

open LambdaEngine
open Parser
open Lexer
open FParser

let writeTokenStream stream = Seq.fold (fun acc token -> acc + writeToken token) "" stream

let rec writeExpr = function
        | EName(s) -> s
        | Function(expr, body) -> writeToken Lambda + writeExpr expr + writeToken Dot + writeExpr body
        | Application(funExpr, argExpr) -> writeToken OpenParens + writeExpr funExpr + writeToken (Ws(" ")) 
                                            + writeExpr argExpr + writeToken CloseParens
        | EOT -> ""

let tokenStreams = [
    ""
    "(\xs.xs \y.(y \x.y))"
    "(\xst.xst \y.(y  \x.y))"
    " "
    "x"
    "(x y)"
    ]

let expressions = [
    ""
    "(\x.x \y.(y \x.y))"
    "x"
    "(x y)"
    ]

let stringToCharList s =
    let textReader = new System.IO.StringReader(s)
    textReaderToLazyList textReader

[<Fact>]
let testTokenizer () =
    let testTokenStream s =
        let stream = tokenStream <| stringToCharList s
        let s1 = writeTokenStream stream
        Assert.Equal(s, s1)
    tokenStreams |> List.iter testTokenStream

let testExpr parseFunction s =
    let exprs = parseFunction s
    let s1 = exprs |> Seq.fold (fun s expr -> s + writeExpr expr) ""
    Assert.Equal(s, s1)

[<Fact>]
let testParser () = expressions |> List.iter (testExpr parseString)

[<Fact>]
let testFParser () = expressions |> List.iter (testExpr fparseString)

In the next instalment, we’ll start looking at the real code for the parser.

Write Yourself a Scheme in 48 Hours in F# – Part VII

Let’s talk about the environment now.  This is the part of the interpreter that I like the least. It is a global variable and it contains a list of  (string, LispVal) where the LispVal is mutable.

type Env = (string * LispVal ref) list ref

This is pretty bad. First of all, it immediately cuts off any option of running interpreters in different threads. Moreover, it makes a lot of functions in the evaluator to have side effects. That makes it much harder to reason about them.

In a world where I am provided with infinite time and energy, I would change it. In this world, I won’t. If you try your hand at doing it, make sure that you pass all the testcases before declaring victory. The scope rules of Scheme are not all that obvious. A code reviewer called them the Italian scoping rules because he thought I got them wrong …

In any case, there isn’t much to the symbol table management.  You can create an empty one:

let nullEnv (): Env = ref List.empty

Check if a variable is bound:

let keyEq name (k, _) = name = k

let isBound var (env: Env) = !env |> List.exists (keyEq var)

Get a variable out:

let getVar var (env: Env) =
    let result = !env |> List.tryFind (keyEq var)
    match result with
    | None -> throw (UnboundVar("Getting an unbound variable: " , var))
    | Some(_, r) -> !r

Set the value of an existing variable:

let setVar var value (env:Env) =
    let result = !env |> List.tryFind (keyEq var)
    match result with
    | Some(_, v) -> v := value ; value
    | None -> throw (UnboundVar("Setting an unbound variable: " , var))

Or define a new variable in the environment. Note that if the variable already exist, its value gets set.

let define (env:Env) var value =
    let result = !env |> List.tryFind (keyEq var)
    match result with
    | Some(_, v) -> v := value ; value
    | None -> 
        env := [var, ref value] @ !env; value
You can also bind a list of (string, LispVal) to the environment by prepending it to the existing ones:
let bindVars bindings (env:Env) = 
   ref ((bindings |> List.map (fun (n, v) -> n , ref v)) @ !env)

Once you accept the evil of the global mutable variable scheme, these functions are easy enough.

The only piece left is error management. This is where my implementation differs from the Haskell version the most. In essence, I throw exception and catch them to report errors, while the Haskell version uses a monad to propagate the error information.

I have a LispError that represents everything that can go wrong:

type LispError =
    | NumArgs of int * LispVal list
    | TypeMismatch of string * LispVal
    | ParseError of string * FParsec.Error.ParserError
    | BadSpecialForm of string * LispVal
    | NotFunction of string * string
    | UnboundVar of string * string
    | Default of string
    | IOError of string

I wrap it in an exception:

exception LispException of LispError

This is what I throw in various places in the code.

let throw le = raise (LispException(le))

I then catch it at the outer layer:

let evalString env expr = 
    try
        expr |> readExpr |> eval env
    with
    | LispException(error) -> String (showError error)

And display the error by using the below function:

let showError = function
    | NumArgs(expected, found) -> "Expected " + expected.ToString() + " args; found values " + unwordsList found
    | TypeMismatch(expected, found) -> "Invalid type: expected " + expected + ", found " + showVal found
    | ParseError(msg, _) -> "Parse Errror" + msg
    | BadSpecialForm(message, form) -> message + showVal form
    | NotFunction(message, func) -> message + func
    | UnboundVar(message, varName) -> message + varName
    | Default(message) -> message
    | IOError(message) -> message

And that’s all there is to it. I hope you guys and gals enjoyed this seven part extravagance. Cheers.

Write Yourself a Scheme in 48 Hours in F# – Part VI

The evaluator takes as an input a LispVal. Where does it come from? There must be something that converts your textual input into it. That is the job of the parser.

I have used FParsec to build my parser. FParsec is a fantastic library to build parsers. It is a perfect showcase of the composition potential that functional code yields. 

When you write an FParsec parser you compose many little parsers to create the one parser that works for your language.  The resulting code looks very much like your language grammar, but you don’t need  a separate code generation compilation step to produce it.

There is one element of ugliness in the syntax to create recursive parsers. You need to define two global variables that can be referred to before they are constructed. This is an artefact of how F# works. So you need a line in your code that looks like this:

let parseExpr, parseExprRef : LispParser * LispParser ref = createParserForwardedToRef()

With that piece of machinery out of the way, we can focus on the parser itself. Our goal here is to parse expressions and generate LispVal. We need a LispParser like the below (the second generic parameter is for advanced usage).

type LispParser = Parser<LispVal, unit>

We need to parse all the kind of expressions that the user can type. Notice in the below the use of a computation expression to simplify the syntax. Also note that lists and dotted lists look very much the same until you encounter the ‘.’ character. You could disambiguate the situation by extracting out the commonality in a separate kind of expression. I decided instead to instruct the parser to backtrack if it gets it wrong (attempt). This is slower, but keeps the code identical to our conceptual model. I value that greatly.

do parseExprRef := parseAtom
                   <|> parseString
                   <|> parseNumber
                   <|> parseQuoted
                   <|> parse {
                           do! chr '('
                           let! x = (attempt parseList) <|> parseDottedList
                           do! chr ')'
                           return x
                       }

Let’s start from the top. Parsing an atom means parsing something that starts with a letter or symbol and continues with letters, symbols or digits. Also “#t” and “#f” can be resolved at parsing time.

let parseAtom : LispParser = parse {
        let! first = letter <|> symbol
        let! rest = manyChars (letter <|> symbol <|> digit)
        return match first.ToString() + rest with
               | "#t" -> Bool true
               | "#f" -> Bool false
               | atom -> Atom atom
}

A string is just a bunch of chars (except ‘\’) surrounded by ‘ ” ’.

let parseString : LispParser = parse {
    do! chr '"'
    let! xs = manyChars (noneOf "\"")
    do! chr '"'
    return String(xs)        
}

A number is just one or more digits. I am afraid we just support integers at this stage …

let parseNumber : LispParser = many1Chars digit |>> (System.Int32.Parse >> Number)

A quoted expression is jut a ‘\’ followed by an expression.

let parseQuoted : LispParser = chr '\'' >>. parseExpr |>> fun expr -> List [Atom "quote"; expr] 

A list is just a bunch of expressions separate by at least one space.

let parseList : LispParser = sepBy parseExpr spaces1 |>> List

A dotted list starts in the same way (hence the backtracking above), but then has a dot, one or more spaces and an expression.

let parseDottedList : LispParser = parse {
    let! head = endBy parseExpr spaces1
    let! tail = chr '.' >>. spaces1 >>. parseExpr
    return DottedList (head, tail)
}

And here are a bunch of functions used throughout the code, presented here for completeness.

    let spaces1 : LispParser<unit> = skipMany1 whitespace
    let chr c = skipChar c
    let endBy  p sep = many  (p .>> sep)

    let symbol : LispParser<char> = anyOf "!$%&|*+-/:<=>?@^_~#"

This is all the code you need to translate text to a LispVal to feed the evaluator. That is pretty impressive.

There is also a function to go the other way, from a LispVal to text. It is used in implementing the testcases and to print out diagnostics.

    let rec showVal = function
        | String contents -> "\"" + contents + "\""
        | Atom name -> name
        | Number num -> num.ToString()
        | Bool t -> if t then "#t" else "#f"
        | List l -> "(" + unwordsList l + ")"
        | DottedList (head, tail) -> "(" + unwordsList head + " . " + showVal tail + ")"
        | PrimitiveFunc(_) -> "<primitive>"
        | Port (_) -> "<IO port>"
        | Func({ parms = parms; varargs = varargs; body = body; closure = closure }) -> 
                                                "(lambda (" + unwordsList (parms |> List.map (String)) +
                                                    (match varargs with
                                                        | None -> ""
                                                        | Some(arg) -> " . " + arg) + ") ...)"
                                                        
    and
        unwordsList = List.map showVal >> String.concat " "