Adventure in parserland – parsing lambda expressions in F# – Part II

The parser starts simple with the following two functions to parse either a string or a file. I use the XXXReaders because I want to lazy read character by character.

let parseString s =
    let reader = new StringReader(s)
    parseTextReader reader

let parseFile fileName =
    let reader = new StreamReader(fileName: string)
    parseTextReader reader

The whole parser is in the following two lines:

let parseTextReader: TextReader -> seq<Expression> =
                    textReaderToLazyList >> tokenStream >> parseExpressions

I need to specify the signature otherwise the compiler gets confused : wait, does it take a StringReader or a StreamReader? You better tell me!

The function is a composite of three functions applied in sequence:

  1. Translate a TextReader to a LazyList<char>
  2. Translate a LazyList<char> to a LazyList<Token> (lexer)
  3. Translate a LazyList<Token> to a LazyList<Expression> (parser)

My usage of LazyList as the workhorse for the program is because I want to match on the head of the stream of chars/tokens in a lazy way.

I love it when a program naturally decomposes in such simple understandable pieces. I impute some of that to functional programming. For one reason or another, in my 15+ years of object oriented programming, I’ve rarely got to the core of a problem with such immediacy.

A sequence of operations likes the above would be lost in a protected overridden implementation of a base class somewhere (or something else equally long to pronounce). The beauty would be lost somewhere in the vast machinery required to support it.

In any case, TextReaderToLazyList is a trivial generator function that uses the unfold function of LazyList to read a character at the time.

let textReaderToLazyList textReader = LazyList.unfold (fun (ts:TextReader) ->
    let ch = ts.Read()
    if ch = -1 then None else Some(char ch, ts)) textReader

The next step is to look at either the lexer, going bottom up, or the parser, going top down.

Adventure in parserland – parsing lambda expressions in F# – Part I

This is part of my ‘things that I do in the empty spaces between one meeting and the next one, which might end up being vaguely interesting’. It is a lambda expression parser.

The full source code is here.

I actually have two versions of it: one written longhand and the other one written with FParsec. Just to be clear: I’m no expert of either.

And just to be more clear: I think writing most parsers longhand in the way I am about to show is crazy. You either use FParsec or  fslex / fsyacc.

I have a strong distaste for additional compilation steps. I think it lingers on from MFC project types of 15/20 years ago. I was one of these crazy folks that would generate the project, wrap the generated code (with some generalizations) in my own library and use that one from then on.

So I prefer FParsec. I’m ok rewriting left recursive rules and its performance has never been a problem for me. Here is a table that compares the different approaches.

But I started wondering about coding a a recursive descent parser for a simple grammar by hand, fully knowing the foolishness of the idea. Thanks to Jose for code reviewing it.

The inspiration for the grammar comes from this book.

    (*
        <expression> ::= <name> | <function> | <application> 
        <name> ::= non­blank character sequence 
        <function> ::= \ <name> . <body> 
        <body> ::= <expression> 
        <application> ::= ( <function expression> <argument expression> ) 
        <function expression> ::= <expression> 
        <argument expression> ::= <expression> 
    *)

In English, an expression is either a name, a function or an application. A name is a bunch of characters (better defined in the code). A function is ‘\’, a name, ‘.’ and an expression. An application is ‘(‘, an expression, whitespaces, an expression and ‘)’.

Some testcases for the above grammar and the parsers written to parse it are below. It should be intuitive what this code does just by the name of the functions. Even it isn’t, check that the expressions symbol contains valid productions from the grammar above.

module Test

open Microsoft.FSharp.Collections
open Xunit

open LambdaEngine
open Parser
open Lexer
open FParser

let writeTokenStream stream = Seq.fold (fun acc token -> acc + writeToken token) "" stream

let rec writeExpr = function
        | EName(s) -> s
        | Function(expr, body) -> writeToken Lambda + writeExpr expr + writeToken Dot + writeExpr body
        | Application(funExpr, argExpr) -> writeToken OpenParens + writeExpr funExpr + writeToken (Ws(" ")) 
                                            + writeExpr argExpr + writeToken CloseParens
        | EOT -> ""

let tokenStreams = [
    ""
    "(\xs.xs \y.(y \x.y))"
    "(\xst.xst \y.(y  \x.y))"
    " "
    "x"
    "(x y)"
    ]

let expressions = [
    ""
    "(\x.x \y.(y \x.y))"
    "x"
    "(x y)"
    ]

let stringToCharList s =
    let textReader = new System.IO.StringReader(s)
    textReaderToLazyList textReader

[<Fact>]
let testTokenizer () =
    let testTokenStream s =
        let stream = tokenStream <| stringToCharList s
        let s1 = writeTokenStream stream
        Assert.Equal(s, s1)
    tokenStreams |> List.iter testTokenStream

let testExpr parseFunction s =
    let exprs = parseFunction s
    let s1 = exprs |> Seq.fold (fun s expr -> s + writeExpr expr) ""
    Assert.Equal(s, s1)

[<Fact>]
let testParser () = expressions |> List.iter (testExpr parseString)

[<Fact>]
let testFParser () = expressions |> List.iter (testExpr fparseString)

In the next instalment, we’ll start looking at the real code for the parser.

Write Yourself a Scheme in 48 Hours in F# – Part VII

Let’s talk about the environment now.  This is the part of the interpreter that I like the least. It is a global variable and it contains a list of  (string, LispVal) where the LispVal is mutable.

type Env = (string * LispVal ref) list ref

This is pretty bad. First of all, it immediately cuts off any option of running interpreters in different threads. Moreover, it makes a lot of functions in the evaluator to have side effects. That makes it much harder to reason about them.

In a world where I am provided with infinite time and energy, I would change it. In this world, I won’t. If you try your hand at doing it, make sure that you pass all the testcases before declaring victory. The scope rules of Scheme are not all that obvious. A code reviewer called them the Italian scoping rules because he thought I got them wrong …

In any case, there isn’t much to the symbol table management.  You can create an empty one:

let nullEnv (): Env = ref List.empty

Check if a variable is bound:

let keyEq name (k, _) = name = k

let isBound var (env: Env) = !env |> List.exists (keyEq var)

Get a variable out:

let getVar var (env: Env) =
    let result = !env |> List.tryFind (keyEq var)
    match result with
    | None -> throw (UnboundVar("Getting an unbound variable: " , var))
    | Some(_, r) -> !r

Set the value of an existing variable:

let setVar var value (env:Env) =
    let result = !env |> List.tryFind (keyEq var)
    match result with
    | Some(_, v) -> v := value ; value
    | None -> throw (UnboundVar("Setting an unbound variable: " , var))

Or define a new variable in the environment. Note that if the variable already exist, its value gets set.

let define (env:Env) var value =
    let result = !env |> List.tryFind (keyEq var)
    match result with
    | Some(_, v) -> v := value ; value
    | None -> 
        env := [var, ref value] @ !env; value
You can also bind a list of (string, LispVal) to the environment by prepending it to the existing ones:
let bindVars bindings (env:Env) = 
   ref ((bindings |> List.map (fun (n, v) -> n , ref v)) @ !env)

Once you accept the evil of the global mutable variable scheme, these functions are easy enough.

The only piece left is error management. This is where my implementation differs from the Haskell version the most. In essence, I throw exception and catch them to report errors, while the Haskell version uses a monad to propagate the error information.

I have a LispError that represents everything that can go wrong:

type LispError =
    | NumArgs of int * LispVal list
    | TypeMismatch of string * LispVal
    | ParseError of string * FParsec.Error.ParserError
    | BadSpecialForm of string * LispVal
    | NotFunction of string * string
    | UnboundVar of string * string
    | Default of string
    | IOError of string

I wrap it in an exception:

exception LispException of LispError

This is what I throw in various places in the code.

let throw le = raise (LispException(le))

I then catch it at the outer layer:

let evalString env expr = 
    try
        expr |> readExpr |> eval env
    with
    | LispException(error) -> String (showError error)

And display the error by using the below function:

let showError = function
    | NumArgs(expected, found) -> "Expected " + expected.ToString() + " args; found values " + unwordsList found
    | TypeMismatch(expected, found) -> "Invalid type: expected " + expected + ", found " + showVal found
    | ParseError(msg, _) -> "Parse Errror" + msg
    | BadSpecialForm(message, form) -> message + showVal form
    | NotFunction(message, func) -> message + func
    | UnboundVar(message, varName) -> message + varName
    | Default(message) -> message
    | IOError(message) -> message

And that’s all there is to it. I hope you guys and gals enjoyed this seven part extravagance. Cheers.

Write Yourself a Scheme in 48 Hours in F# – Part VI

The evaluator takes as an input a LispVal. Where does it come from? There must be something that converts your textual input into it. That is the job of the parser.

I have used FParsec to build my parser. FParsec is a fantastic library to build parsers. It is a perfect showcase of the composition potential that functional code yields. 

When you write an FParsec parser you compose many little parsers to create the one parser that works for your language.  The resulting code looks very much like your language grammar, but you don’t need  a separate code generation compilation step to produce it.

There is one element of ugliness in the syntax to create recursive parsers. You need to define two global variables that can be referred to before they are constructed. This is an artefact of how F# works. So you need a line in your code that looks like this:

let parseExpr, parseExprRef : LispParser * LispParser ref = createParserForwardedToRef()

With that piece of machinery out of the way, we can focus on the parser itself. Our goal here is to parse expressions and generate LispVal. We need a LispParser like the below (the second generic parameter is for advanced usage).

type LispParser = Parser<LispVal, unit>

We need to parse all the kind of expressions that the user can type. Notice in the below the use of a computation expression to simplify the syntax. Also note that lists and dotted lists look very much the same until you encounter the ‘.’ character. You could disambiguate the situation by extracting out the commonality in a separate kind of expression. I decided instead to instruct the parser to backtrack if it gets it wrong (attempt). This is slower, but keeps the code identical to our conceptual model. I value that greatly.

do parseExprRef := parseAtom
                   <|> parseString
                   <|> parseNumber
                   <|> parseQuoted
                   <|> parse {
                           do! chr '('
                           let! x = (attempt parseList) <|> parseDottedList
                           do! chr ')'
                           return x
                       }

Let’s start from the top. Parsing an atom means parsing something that starts with a letter or symbol and continues with letters, symbols or digits. Also “#t” and “#f” can be resolved at parsing time.

let parseAtom : LispParser = parse {
        let! first = letter <|> symbol
        let! rest = manyChars (letter <|> symbol <|> digit)
        return match first.ToString() + rest with
               | "#t" -> Bool true
               | "#f" -> Bool false
               | atom -> Atom atom
}

A string is just a bunch of chars (except ‘\’) surrounded by ‘ ” ’.

let parseString : LispParser = parse {
    do! chr '"'
    let! xs = manyChars (noneOf "\"")
    do! chr '"'
    return String(xs)        
}

A number is just one or more digits. I am afraid we just support integers at this stage …

let parseNumber : LispParser = many1Chars digit |>> (System.Int32.Parse >> Number)

A quoted expression is jut a ‘\’ followed by an expression.

let parseQuoted : LispParser = chr '\'' >>. parseExpr |>> fun expr -> List [Atom "quote"; expr] 

A list is just a bunch of expressions separate by at least one space.

let parseList : LispParser = sepBy parseExpr spaces1 |>> List

A dotted list starts in the same way (hence the backtracking above), but then has a dot, one or more spaces and an expression.

let parseDottedList : LispParser = parse {
    let! head = endBy parseExpr spaces1
    let! tail = chr '.' >>. spaces1 >>. parseExpr
    return DottedList (head, tail)
}

And here are a bunch of functions used throughout the code, presented here for completeness.

    let spaces1 : LispParser<unit> = skipMany1 whitespace
    let chr c = skipChar c
    let endBy  p sep = many  (p .>> sep)

    let symbol : LispParser<char> = anyOf "!$%&|*+-/:<=>?@^_~#"

This is all the code you need to translate text to a LispVal to feed the evaluator. That is pretty impressive.

There is also a function to go the other way, from a LispVal to text. It is used in implementing the testcases and to print out diagnostics.

    let rec showVal = function
        | String contents -> "\"" + contents + "\""
        | Atom name -> name
        | Number num -> num.ToString()
        | Bool t -> if t then "#t" else "#f"
        | List l -> "(" + unwordsList l + ")"
        | DottedList (head, tail) -> "(" + unwordsList head + " . " + showVal tail + ")"
        | PrimitiveFunc(_) -> "<primitive>"
        | Port (_) -> "<IO port>"
        | Func({ parms = parms; varargs = varargs; body = body; closure = closure }) -> 
                                                "(lambda (" + unwordsList (parms |> List.map (String)) +
                                                    (match varargs with
                                                        | None -> ""
                                                        | Some(arg) -> " . " + arg) + ") ...)"
                                                        
    and
        unwordsList = List.map showVal >> String.concat " "

Write Yourself a Scheme in 48 Hours in F# – Part V

We have one loose end to tie in the evaluator: the primitive operators. These are things that the interpreter knows intrinsically. There is a list of them below.

let rec primitives =
     [
        "+",    numericBinop (+)
        "-",    numericBinop (-)
        "*",    numericBinop (*)
        "/",    numericBinop (/)
        "mod",  numericBinop (%)
        "=",    numBoolBinop (=)
        "<",    numBoolBinop (<)
        ">",    numBoolBinop (>)
        "/=",   numBoolBinop (<>)
        ">=",   numBoolBinop (>=)
        "<=",   numBoolBinop (<=)
        "&&",   boolBoolBinop (&&)
        "||",   boolBoolBinop (||)
        "string=?",     strBoolBinop (=)
        "string>?",      strBoolBinop (>)
        "string<?",      strBoolBinop (<)
        "string<=?",    strBoolBinop (<=)
        "string>=?",    strBoolBinop (>=)
        "car",  car
        "cdr",  cdr
        "cons", cons
        "eq?", eqv
        "eqv?", eqv
        "equal?", equal
        
        // IO primitives
        "apply", applyProc
        "open-input-file", makePort FileAccess.Read
        "open-output-file", makePort FileAccess.Write
        "close-input-port", closePort
        "close-output-port", closePort
        "read", readProc
        "write", writeProc
        "read-contents", readContents
        "read-all", readAll 
     ]

Having seen the above list, it now becomes clearer why the primitiveBindings function was defined as such. It just binds these pairs into the environment.

let primitiveBindings () =
    (nullEnv ()) |> bindVars [ for v, f in primitives -> v, PrimitiveFunc f ] 

numericBinop unpacks the numbers, applies the provided operator and packs the result back in the Number.

let numericBinop op parms =
    if List.length parms < 2
        then throw <| NumArgs(2, parms)
        else parms |> List.map unpackNum |> foldl1 op |> Number

While we are at it, we can define fold1 (it tends to be  useful)

let foldl1 op = function
    | h::t -> List.fold op h t
    | [] -> throw (Default("Expected a not empty list, got an empty list"))

The other XBinops work similarly …

let boolBinop unpacker op args =
    match args with
    | [ left; right ] -> Bool (op (unpacker left) (unpacker right))
    | _ -> throw (NumArgs(2,args))

let numBoolBinop = boolBinop unpackNum
let strBoolBinop = boolBinop unpackStr
let boolBoolBinop = boolBinop unpackBool

We now have to look at the family of unpackers. They all work rather similarly. Notice Scheme making an effort to get a number out of a string and to get anything out of a list. Strong type folks won’t like that. Oh well, just remove these lines …

let rec unpackNum = function
    | Number n  -> n
    | String n  -> let success, result = System.Int32.TryParse n
                   if success
                       then result
                       else throw (TypeMismatch("number", String n))
    | List [n]  -> unpackNum n
    | notNumber -> throw (TypeMismatch("number", notNumber))

let rec unpackStr = function
    | String s -> s
    | Number n -> n.ToString()
    | Bool b   -> b.ToString()
    | List [s]  -> unpackStr s
    | noString -> throw (TypeMismatch("string", noString))
    
let rec unpackBool = function
    | Bool b -> b
    | List [b]  -> unpackBool b
    | noBool -> throw (TypeMismatch("boolean", noBool))

Now back to the list of primitive operators, there are the signature LISP operators car, cdr and cons. Just understanding the first line for each function should be enough to get an idea of what they do.

let car = function
    | [List (x :: _)] -> x
    | [DottedList (x :: _, _)] -> x
    | [badArg] -> throw (TypeMismatch("pair", badArg))
    | badArgList -> throw (NumArgs(1, badArgList))

let cdr = function
    | [List (x :: xs)] -> List xs
    | [DottedList ([xs], x)] -> x
    | [DottedList ((_ :: xs), x)] -> DottedList (xs, x)
    | [badArg] -> throw (TypeMismatch("pair", badArg))
    | badArgList -> throw (NumArgs(1, badArgList))

let cons = function
    | [x; List xs] -> List (x :: xs)
    | [x; DottedList (xs, xlast)] -> DottedList (x :: xs, xlast)
    | [x1; x2] -> DottedList([x1], x2)
    | badArgList -> throw (NumArgs(2, badArgList))

We then need to work our way to implement eqv (aka eq? in Scheme). We first define a function that tests that two LispVal are the same. It should be pretty self explanatory (the list piece is kind of cute).

    let rec eqvPrim e1 e2 =
        match e1, e2 with
        | (Bool b1, Bool b2) -> b1 = b2
        | (Number n1, Number n2) -> n1 = n2
        | (String s1, String s2) -> s1 = s2
        | (Atom a1, Atom a2) -> a1 = a2
        | (DottedList (xs, x), DottedList(ys, y)) -> eqvPrim (List (xs @ [x])) (List (ys @ [y]))
        | (List l1, List l2) -> l1.Length = l2.Length && List.forall2 eqvPrim l1 l2
        | _ -> false

Now we wrap the result in a Bool. Doing it this way avoid repeating the wrapping in each single line of eqvPrim (thanks to Tobias for spotting this refactoring).

let eqv = function
          | [e1; e2] -> Bool (eqvPrim e1 e2)
          | badArgList -> throw (NumArgs (2, badArgList))

Equal? checks if there is any unpacking scheme that can be used to test equality of the two elements of a two element list.

let equal = function
    | [arg1; arg2] ->
        let unpackEqual = numUnpackEq arg1 arg2 ||
                          strUnpackEq arg1 arg2 ||
                          boolUnpackEq arg1 arg2
        Bool (eqvPrim arg1 arg2 || unpackEqual)
    | argsList -> throw (NumArgs(2, argsList))

We need to define equality of packed primitive types. We do it nicely below.

let tryUnpacker (unpack : LispVal -> 'a) (op : 'a -> 'a -> bool) arg1 arg2 =
    try op (unpack arg1) (unpack arg2) with _ -> false

let numUnpackEq = tryUnpacker unpackNum (=)
let strUnpackEq = tryUnpacker unpackStr (=)
let boolUnpackEq = tryUnpacker unpackBool (=)

The apply statement maps more or less directly to our apply function.

applyProc = function
            | [func; List args] -> apply func args
            | func :: args -> apply func args
            | [] -> throw (Default("Expecting a function, got an empty list"))

And we are left with the I/O processing functions. We are simply wrapping a FileStream in a Port.

    let makePort fileAccess = fileIOFunction (fun fileName -> 
                                File.Open(fileName,FileMode.OpenOrCreate, fileAccess) |> Port)

    let closePort = function
                    | [Port(port)] -> port.Close() ; Bool true
                    | _ -> Bool false

We then can read and write from it. Notice how the lack of arguments makes us do it from the standard Console.

let rec readProc port = 
    let parseReader (reader:TextReader) = reader.ReadLine() |> readExpr
    match port with
       | [] -> parseReader(System.Console.In)
       | [Port(port)] -> 
            use reader = new StreamReader(port)
            parseReader (reader)
       | args -> throw (NumArgs(1, args))

let writeProc objPort =
    let write obj (writer: TextWriter) = writer.Write(showVal obj) ; Bool true
    match objPort with
    | [obj] -> write obj (System.Console.Out)
    | [obj ; Port(port)] ->
        use writer = new StreamWriter(port)
        write obj writer
    | args -> throw (NumArgs(1, args))

There you go. A full evaluator in two blog posts!! Next up, the parser.

Write Yourself a Scheme in 48 Hours in F# – Part IV

It is the evaluator turn. It is a big file, let’s see if I can fit it in a single post.

Aptly enough, the most important function is called eval.

eval env = function
| String _ as v -> v 
| Number _ as v -> v
| Bool _ as v -> v
| Atom var -> getVar var env 
| List [Atom "quote"; v] -> v 
| List [Atom "if"; pred; conseq; alt] -> evalIf env pred conseq alt
| List [Atom "load"; fileName] -> load [fileName] |> List.map (eval env) |> last
| List [Atom "set!" ; Atom var ; form] -> env |> setVar var (eval env form)
| List [Atom "define"; Atom var; form] -> define env var (eval env form)
| List (Atom "define" :: (List (Atom var :: parms) :: body)) ->
    makeNormalFunc env parms body |> define env var
| List (Atom "define" :: (DottedList ((Atom var :: parms), varargs) :: body)) ->
    makeVarargs varargs env parms body |> define env var
| List (Atom "lambda" :: (List parms :: body)) -> makeNormalFunc env parms body
| List (Atom "lambda" :: (DottedList(parms, varargs) :: body)) -> makeVarargs varargs env parms body
| List (Atom "lambda" :: ((Atom _) as varargs :: body)) -> makeVarargs varargs env [] body
| List (func :: args) ->
    let f = eval env func
    let argVals = List.map (eval env) args
    apply f argVals
| badForm -> throw (BadSpecialForm("Unrecognized special form", badForm))

This is the core of the evaluator. It takes as an input the LispVal generated by the parser and an environment and returns a LispVal that is the result of the reduction. As a side effect, it occasionally modify the environment. I carefully crafted the previous phrase to maximize the discomfort  of the functional programmers tuned in. Such fun :-)

More seriously (?), here is what it does:

  • If it is a String, Number of Bool, just return it
  • If it is an Atom, return its value
  • If it is a quote  statement, return what is quoted (read your little schemer manual)
  • If it is an if statement, evaluate it (see below)
  • If it is a load statement, load the file (see below) and evaluate each expression using the current environment. Return the last expression in the file
  • If it is a set!, set the value of the passed variable to the evaluated form
  • If it is a define, do almost the same as above (except that you don’t throw if the variable doesn’t exist, but you create it)
  • If it is a define  that defines a function (it has that shape), create a ‘function slot’ in the environment. That is a (functionName,  FuncRecord) pair (see below)
  • If it is a lambda, return the FuncRecord that describe the inline function
  • If it is a function call, evaluate the expression that describe the function (yes, you can do that in Lisp!!), evaluate the arguments, and apply the function to the arguments
  • Otherwise, it must be a bad form, throw it back to the calling function to do something meaningful with it

We have a bunch of ‘see below’ to take care of. We’ll look at them in order.

First the ‘if’ statement. If the evaluated predicate is Bool(True) evaluate the consequent, otherwise evaluate the alternative. For some reason, I wrote it the other way around.

and
    // 1a. If the evaluation of the pred is false evaluate alt, else evaluate cons
    evalIf env pred conseq alt =
        match eval env pred with
        | Bool(false) -> eval env alt
        | _ -> eval env conseq

Then there is the load function. It reads all the test and gets out the list of LispVal contained in it.

let load = fileIOFunction (fun fileName -> File.ReadAllText(fileName)
                                           |> readExprList)

ReadExprList is part of the parser. We’ll look at it then. Sufficient here to say that it takes a string and returns a list of LispVal.

FileIOFunction just encapsulates a common pattern in all the file access functions. I don’t like such mechanical factorization of methods, without any real reusability outside the immediate surroundings of the code. But I like repetition even less.

let fileIOFunction func = function
    | [String fileName] -> func (fileName) 
    | [] -> throw (IOError("No file name"))
    | args -> throw (NumArgs(1, args))

A family of functions create FuncRecord given appropriate parameters. I seem to have lost memory of the contortions related to the last one. If I end up having to work again on this code, I’ll need to figure it out again. Note to myself, please comment this kind of code next time.

let makeFunc varargs env parms body = 
            Func ({parms = (List.map showVal parms); varargs = varargs; body = body; closure = env})
let makeNormalFunc = makeFunc None
let makeVarargs = showVal >> Some >> makeFunc

apply is the other workhorse function in the evaluator.  The best way to understand it is to start from the bottom (where bindVars starts the line). We are binding the arguments and the variable arguments in the closure that has been passed in. We then evaluate the body. But the body is just a list of LispVal, so we just need to evaluate them in sequence and return the result of the last one.

and apply func args = 
    match func with
    | PrimitiveFunc(f) -> f args
    | Func ({parms = parms; varargs = varargs; body = body; closure = closure}) ->
        let invalidNonVarargs = args.Length <> parms.Length && varargs.IsNone 
        let invalidVarargs = args.Length < parms.Length && varargs.IsSome  
        
        if invalidVarargs || invalidNonVarargs 
        then
            throw (NumArgs(parms.Length, args))
        else
            let remainingArgs = args |> Seq.skip parms.Length |> Seq.toList
            let evalBody env = body |> List.map (eval env) |> last
            let rec zip xs1 xs2 acc =
                match xs1, xs2 with
                | x1::xs1, x2::xs2 -> zip xs1 xs2 ((x1, x2)::acc)
                | _ -> acc
            let bindVarArgs arg env =
                match arg with
                | Some(argName) -> bindVars [argName, (List remainingArgs)] env 
                | None -> env
            bindVars (zip parms args []) closure
                |> bindVarArgs varargs
                |> evalBody
    | funcName -> throw (NotFunction("Expecting a function, getting ", showVal funcName))

This is enough for one post. Next time we’ll finish the evaluator.

Write Yourself a Scheme in 48 Hours in F# – Part III

Very often my code ends up having the following form: parse an input to create an intermediate data structure and evaluate the structure to produce an output. Strangely, many years ago, when my code was object oriented, that wasn’t the case. Or at least I wasn’t explicitly aware of it.

When you write an interpreter or a compiler, things always work out like that, but I see the same pattern in almost everything I produce: from financial backtesting to chart libraries. Sometimes when, out of laziness or stupidity, I forego the intermediate structure, I end up in the end having to retrofit it in. Simply processing input and generating output at the same time rarely cuts it. But it is tempting because you get going pretty fast and I’m tricked into it occasionally.

Hence the first thing that I find myself reasoning about is often the particular form of such intermediate structure. In this case it looks like the following:

type Env = (string * LispVal ref) list ref

and FuncRecord = { parms: string list; varargs: string option; body: LispVal list; closure: Env}

and LispVal =
    | Atom of string
    | List of LispVal list
    | DottedList of LispVal list * LispVal
    | Number of int
    | String of string
    | Bool of bool
    | PrimitiveFunc of (LispVal list -> LispVal)
    | Func of FuncRecord
    | Port of System.IO.FileStream

This LispVal structure has one constructor for each kind of expression (production) that is allowed in Scheme. Or at least that ones I support …

It is important that each one stores all the information that is necessary for the evaluator to evaluate the expression. No more, no less. Here is a brief description:

  • Atom: it is a kind of a constant in Scheme. This is probably the worst definition ever given for it. Please read about it in your little schemer book.
  • List: is the main Scheme type. It represents a list of expressions.
  • DottedList: this is the bizarre Scheme way to pass optional parameters
  • Number: is a number :-) You will discover which kind of number when we talk about the parser
  • String : is a string
  • Bool: #t, #f
  • PrimitiveFunc: is the representation for the primitive operators/functions that are burned into the interpreter. It is just a function that takes a list of LispVal and returns a LispVal
  • Func: is a user defined function. Notice that the body of it is simply a list of LispVal. This is why LISP is so powerful. Also notice that a closure gets passed to it for the ‘captured’ variables.
  • Port: is a slightly goofy representation of an in/out stream
  • Anything else (i.e. macros) is not supported, but this would be the first piece to change if they were.

The only remaining code to address is:

type Env = (string * LispVal ref) list ref

This is the symbol table and it is ugly. It is not multithread safe either. But it works and it is close enough to the Haskell version so I decided to retain it. A proper code review would ‘strongly suggest’ rewriting the code to pass it around to each function instead of using ‘ref’ or using the state monad encapsulated in a computation expression. Any of these solutions is left as an exercise to the reader (use the testcases to validate that you get it right).

We could go in many different direction from here. We could talk about:

  • The evaluator
  • The parser
  • The symbol table
  • Error handling

To keep with the top – down approach I’ve been espousing. I’ll probably talk about the evaluator next.

Write Yourself a Scheme in 48 Hours in F# – Part II

Usually, when I do blog posts that are all about code, I write them ‘bottom up’. I start talking about the most primitive types and functions and build up from there toward higher abstractions. I think this is a pretty common way of doing it.

For this series I’m going to try the opposite. I start with the code that creates the REPL window and move down from there toward the guts of the interpreter. I hold the opinion that, if the code is written right, this should be ok. The naming scheme and general structure of it should allow understanding it at different levels.

Or at least I hope so.

Let’s start from the main function. Depending on the number of arguments it either runs the REPL window or executes whatever is in the file passed in as the first argument using the other arguments as parameters.

[<EntryPoint>]
let main(args: string[]) =
    match Array.toList args with
    | [] -> runRepl ()
    | filename :: args -> runOne filename args
    0

The latter case is coded in the below function. It first load all the primitive operators (i.e. ‘+’, ‘-‘ etc…) and the standard library. The word ‘load’ above is a little misleading. In reality it adds them to the environment. It then proceeds to add the arguments that were passed on. As the last step, it evaluates the ‘load’ command by using the newly created environment, it transforms the returned token to a string and prints it.

let runOne (filename : string) (args : list<string>) =
    let env = primitiveBindings ()
                |> loadStdLib
                |> bindVars [ "args", List (List.map String args) ]
    List [Atom "load"; String filename] |> eval env |> showVal |> printStr

Running the REPL windows is equally simple. Load the primitive operators and the standard library, show a prompt and evaluate the input until the input is ‘Quit’.

let runRepl () = 
    let env = primitiveBindings () |> loadStdLib
    until (fun s -> s = "Quit" || s = "quit") (fun () -> readPrompt "Lisp>>> ") (evalAndPrint env)

readPrompt is pretty simple:

let printStr (s: string) = Console.Write(s)

let readPrompt (s: string) = printStr s ; Console.ReadLine ()

EvalAndPrint is written as a chain of functions (lookup the ‘>>’ operator in F#) and just evaluate the string, transform the result to a string, prints it and newline it.

let newLine () = Console.WriteLine()

let evalAndPrint env = evalString env >> showVal >> printStr >> newLine

evalString parses the string and evaluates the expression. Note the exception management. This is a result of my decision of throwing an exception every time something goes wrong instead of using a monad to pass the state around. I think it is pretty clear, but haskellers might disagre. This is one of the main differences from the Haskell version.

let evalString env expr = 
    try
        expr |> readExpr |> eval env
    with
    | LispException(error) -> String (showError error)

For the sake of completeness, here is until. Maybe there is a library function somewhere that I could have used?

let rec until pred prompter evaluator =
    let result = prompter ()
    if not (pred result) then
        evaluator result
        until pred prompter evaluator

Back on the main flow of the code, loadStdLib just loads the standard file and returns the populated environment.

let loadStdLib env = 
    eval env (List [Atom "load"; String "stdlib.scm"]) |> ignore
    env

primitiveBindings creates a new empty environment and adds a bunch of pairs (primitiveName, LispVal –> LispVal). LispVal is a representation of a Scheme expression, so the second part of the tuple is simply a reduction from one expression to another (hopefully simpler in some sense). We’ll talk more about LispVal in upcoming posts.

let primitiveBindings () =
    (nullEnv ()) |> bindVars [ for v, f in primitives -> v, PrimitiveFunc f ] 

There you have it. That’s the full implementation for the REPL window. Next post, we’ll look at LispEval and the evaluator.

Write Yourself a Scheme in 48 Hours in F# – Part I

Hi, I’m back. I’ve finally sorted out the guidelines for blogging in Credit Suisse.

Here is something I have been playing around with in the spare time between one meeting and the next one.  It is a Scheme interpreter that includes a REPL window. The full code is here.

All the smarts for it come from this Wiki Book. I just ported the code to F# (and modified it a bit). I thought the comparison might be interesting, so here we go. Thanks to Tobias and Jose for reviewing the code, find one bug and suggest improvements.

Before we start looking at the real code, here is what we are trying to accomplish in form of test cases. If you are a bit rusty on LISP syntax, you might want to try and see if you understand what it does.

Our goal is to make all this XUnit test cases pass. Each of the lists below contains the Scheme statement and the result to display in the REPL window.

open Xunit
open Lisp.Repl
open Lisp.Parser
open Lisp.SymbolTable

let eval env = evalString env >> showVal
let initEnv () = primitiveBindings () |> loadStdLib

let test tests =
    let env = initEnv ()
    tests |> List.iter (fun (expr, result) -> Assert.Equal(result, eval env expr))

[<Fact>]
let simpleEval() = 
    let tests = [
        "(+ 2 2)", "4"
        "(+ 2 (- 4 1))", "5"
        "(- (+ 4 6 3) 3 5 2)", "3"
    ]
    test tests

[<Fact>]
let errorCheck() =
    let tests = [
         "(+ 2 \"two\")", "\"Invalid type: expected number, found \"two\"\""
         "(+ 2)", "\"Expected 2 args; found values 2\""
         "(what? 2)", "\"Getting an unbound variable: what?\""
         ]
    test tests

[<Fact>]
let moreEval() =
    let tests = [
         "(< 2 3)", "#t"
         "(> 2 3)", "#f"
         "(>= 3 3)", "#t"
         "(string=? \"test\" \"test\")", "#t"
         "(string=? \"abcd\" \"dsft\")", "#f"
         "(if (> 2 3) \"no\" \"yes\")", "\"yes\""
         "(if (= 3 3) (+ 2 3 (- 5 1)) \"unequal\")", "9"
         "(cdr '(a simple test))", "(simple test)"
         "(car (cdr '(a simple test)))", "simple"
         "(car '((this is) a test))", "(this is)"
         "(cons '(this is) 'test)", "((this is) . test)"
         "(cons '(this is) '())", "((this is))"
         "(eqv? 1 3)", "#f"
         "(eqv? 3 3)", "#t"
         "(eqv? 'atom 'atom)", "#t"
         ]
    test tests
    
[<Fact>]
let assignement() =
    let tests = [
        "(define x 3)", "3"
        "(+ x 2)", "5"
        "(+ y 2)", "\"Getting an unbound variable: y\""
        "(define y 5)", "5"
        "(+ x (- y 2))", "6"
        "(define str \"A string\")", "\"A string\""
        "(< str \"The string\")", "\"Invalid type: expected number, found \"A string\"\""
        "(string<? str \"The string\")", "#t"
         ]
    test tests

[<Fact>]
let closure() =
    let tests = [
        "(define (f x y) (+ x y))", "(lambda (\"x\" \"y\") ...)"
        "(f 1 2)", "3"
        "(f 1 2 3)", "\"Expected 2 args; found values 1 2 3\""
        "(define (factorial x) (if (= x 1) 1 (* x (factorial (- x 1)))))", "(lambda (\"x\") ...)"
        "(factorial 10)", "3628800"
        "(define (counter inc) (lambda (x) (set! inc (+ x inc)) inc))", "(lambda (\"inc\") ...)"
        "(define my-count (counter 5))", "(lambda (\"x\") ...)"
        "(my-count 3)", "8"
        "(my-count 6)", "14"
        "(my-count 5)", "19"
         ]
    test tests

[<Fact>]
let predefinedFunctions() =
    let tests = [
        "(map (curry + 2) '(1 2 3 4))", "(3 4 5 6)"
        "(filter even? '(1 2 3 4))", "(2 4)"
        ]
    test tests

[<Fact>]
let varargsCountCheck() =
    let tests = [
        "(define (sum x y . lst) (fold + (* x y) lst))", "(lambda (\"x\" \"y\" . lst) ...)"
        "(sum 1 2 3)", "5"
        "(sum 1 1 1)", "2"
        "(sum 1 2)", "2"
        "(sum 1)", "\"Expected 2 args; found values 1\""
         ]
    test tests