Write Yourself a Scheme in 48 Hours in F# – Part V

We have one loose end to tie in the evaluator: the primitive operators. These are things that the interpreter knows intrinsically. There is a list of them below.

let rec primitives =
        "+",    numericBinop (+)
        "-",    numericBinop (-)
        "*",    numericBinop (*)
        "/",    numericBinop (/)
        "mod",  numericBinop (%)
        "=",    numBoolBinop (=)
        "<",    numBoolBinop (<)
        ">",    numBoolBinop (>)
        "/=",   numBoolBinop (<>)
        ">=",   numBoolBinop (>=)
        "<=",   numBoolBinop (<=)
        "&&",   boolBoolBinop (&&)
        "||",   boolBoolBinop (||)
        "string=?",     strBoolBinop (=)
        "string>?",      strBoolBinop (>)
        "string<?",      strBoolBinop (<)
        "string<=?",    strBoolBinop (<=)
        "string>=?",    strBoolBinop (>=)
        "car",  car
        "cdr",  cdr
        "cons", cons
        "eq?", eqv
        "eqv?", eqv
        "equal?", equal
        // IO primitives
        "apply", applyProc
        "open-input-file", makePort FileAccess.Read
        "open-output-file", makePort FileAccess.Write
        "close-input-port", closePort
        "close-output-port", closePort
        "read", readProc
        "write", writeProc
        "read-contents", readContents
        "read-all", readAll 

Having seen the above list, it now becomes clearer why the primitiveBindings function was defined as such. It just binds these pairs into the environment.

let primitiveBindings () =
    (nullEnv ()) |> bindVars [ for v, f in primitives -> v, PrimitiveFunc f ] 

numericBinop unpacks the numbers, applies the provided operator and packs the result back in the Number.

let numericBinop op parms =
    if List.length parms < 2
        then throw <| NumArgs(2, parms)
        else parms |> List.map unpackNum |> foldl1 op |> Number

While we are at it, we can define fold1 (it tends to be  useful)

let foldl1 op = function
    | h::t -> List.fold op h t
    | [] -> throw (Default("Expected a not empty list, got an empty list"))

The other XBinops work similarly …

let boolBinop unpacker op args =
    match args with
    | [ left; right ] -> Bool (op (unpacker left) (unpacker right))
    | _ -> throw (NumArgs(2,args))

let numBoolBinop = boolBinop unpackNum
let strBoolBinop = boolBinop unpackStr
let boolBoolBinop = boolBinop unpackBool

We now have to look at the family of unpackers. They all work rather similarly. Notice Scheme making an effort to get a number out of a string and to get anything out of a list. Strong type folks won’t like that. Oh well, just remove these lines …

let rec unpackNum = function
    | Number n  -> n
    | String n  -> let success, result = System.Int32.TryParse n
                   if success
                       then result
                       else throw (TypeMismatch("number", String n))
    | List [n]  -> unpackNum n
    | notNumber -> throw (TypeMismatch("number", notNumber))

let rec unpackStr = function
    | String s -> s
    | Number n -> n.ToString()
    | Bool b   -> b.ToString()
    | List [s]  -> unpackStr s
    | noString -> throw (TypeMismatch("string", noString))
let rec unpackBool = function
    | Bool b -> b
    | List [b]  -> unpackBool b
    | noBool -> throw (TypeMismatch("boolean", noBool))

Now back to the list of primitive operators, there are the signature LISP operators car, cdr and cons. Just understanding the first line for each function should be enough to get an idea of what they do.

let car = function
    | [List (x :: _)] -> x
    | [DottedList (x :: _, _)] -> x
    | [badArg] -> throw (TypeMismatch("pair", badArg))
    | badArgList -> throw (NumArgs(1, badArgList))

let cdr = function
    | [List (x :: xs)] -> List xs
    | [DottedList ([xs], x)] -> x
    | [DottedList ((_ :: xs), x)] -> DottedList (xs, x)
    | [badArg] -> throw (TypeMismatch("pair", badArg))
    | badArgList -> throw (NumArgs(1, badArgList))

let cons = function
    | [x; List xs] -> List (x :: xs)
    | [x; DottedList (xs, xlast)] -> DottedList (x :: xs, xlast)
    | [x1; x2] -> DottedList([x1], x2)
    | badArgList -> throw (NumArgs(2, badArgList))

We then need to work our way to implement eqv (aka eq? in Scheme). We first define a function that tests that two LispVal are the same. It should be pretty self explanatory (the list piece is kind of cute).

    let rec eqvPrim e1 e2 =
        match e1, e2 with
        | (Bool b1, Bool b2) -> b1 = b2
        | (Number n1, Number n2) -> n1 = n2
        | (String s1, String s2) -> s1 = s2
        | (Atom a1, Atom a2) -> a1 = a2
        | (DottedList (xs, x), DottedList(ys, y)) -> eqvPrim (List (xs @ [x])) (List (ys @ [y]))
        | (List l1, List l2) -> l1.Length = l2.Length && List.forall2 eqvPrim l1 l2
        | _ -> false

Now we wrap the result in a Bool. Doing it this way avoid repeating the wrapping in each single line of eqvPrim (thanks to Tobias for spotting this refactoring).

let eqv = function
          | [e1; e2] -> Bool (eqvPrim e1 e2)
          | badArgList -> throw (NumArgs (2, badArgList))

Equal? checks if there is any unpacking scheme that can be used to test equality of the two elements of a two element list.

let equal = function
    | [arg1; arg2] ->
        let unpackEqual = numUnpackEq arg1 arg2 ||
                          strUnpackEq arg1 arg2 ||
                          boolUnpackEq arg1 arg2
        Bool (eqvPrim arg1 arg2 || unpackEqual)
    | argsList -> throw (NumArgs(2, argsList))

We need to define equality of packed primitive types. We do it nicely below.

let tryUnpacker (unpack : LispVal -> 'a) (op : 'a -> 'a -> bool) arg1 arg2 =
    try op (unpack arg1) (unpack arg2) with _ -> false

let numUnpackEq = tryUnpacker unpackNum (=)
let strUnpackEq = tryUnpacker unpackStr (=)
let boolUnpackEq = tryUnpacker unpackBool (=)

The apply statement maps more or less directly to our apply function.

applyProc = function
            | [func; List args] -> apply func args
            | func :: args -> apply func args
            | [] -> throw (Default("Expecting a function, got an empty list"))

And we are left with the I/O processing functions. We are simply wrapping a FileStream in a Port.

    let makePort fileAccess = fileIOFunction (fun fileName -> 
                                File.Open(fileName,FileMode.OpenOrCreate, fileAccess) |> Port)

    let closePort = function
                    | [Port(port)] -> port.Close() ; Bool true
                    | _ -> Bool false

We then can read and write from it. Notice how the lack of arguments makes us do it from the standard Console.

let rec readProc port = 
    let parseReader (reader:TextReader) = reader.ReadLine() |> readExpr
    match port with
       | [] -> parseReader(System.Console.In)
       | [Port(port)] -> 
            use reader = new StreamReader(port)
            parseReader (reader)
       | args -> throw (NumArgs(1, args))

let writeProc objPort =
    let write obj (writer: TextWriter) = writer.Write(showVal obj) ; Bool true
    match objPort with
    | [obj] -> write obj (System.Console.Out)
    | [obj ; Port(port)] ->
        use writer = new StreamWriter(port)
        write obj writer
    | args -> throw (NumArgs(1, args))

There you go. A full evaluator in two blog posts!! Next up, the parser.

Write Yourself a Scheme in 48 Hours in F# – Part IV

It is the evaluator turn. It is a big file, let’s see if I can fit it in a single post.

Aptly enough, the most important function is called eval.

eval env = function
| String _ as v -> v 
| Number _ as v -> v
| Bool _ as v -> v
| Atom var -> getVar var env 
| List [Atom "quote"; v] -> v 
| List [Atom "if"; pred; conseq; alt] -> evalIf env pred conseq alt
| List [Atom "load"; fileName] -> load [fileName] |> List.map (eval env) |> last
| List [Atom "set!" ; Atom var ; form] -> env |> setVar var (eval env form)
| List [Atom "define"; Atom var; form] -> define env var (eval env form)
| List (Atom "define" :: (List (Atom var :: parms) :: body)) ->
    makeNormalFunc env parms body |> define env var
| List (Atom "define" :: (DottedList ((Atom var :: parms), varargs) :: body)) ->
    makeVarargs varargs env parms body |> define env var
| List (Atom "lambda" :: (List parms :: body)) -> makeNormalFunc env parms body
| List (Atom "lambda" :: (DottedList(parms, varargs) :: body)) -> makeVarargs varargs env parms body
| List (Atom "lambda" :: ((Atom _) as varargs :: body)) -> makeVarargs varargs env [] body
| List (func :: args) ->
    let f = eval env func
    let argVals = List.map (eval env) args
    apply f argVals
| badForm -> throw (BadSpecialForm("Unrecognized special form", badForm))

This is the core of the evaluator. It takes as an input the LispVal generated by the parser and an environment and returns a LispVal that is the result of the reduction. As a side effect, it occasionally modify the environment. I carefully crafted the previous phrase to maximize the discomfort  of the functional programmers tuned in. Such fun :-)

More seriously (?), here is what it does:

  • If it is a String, Number of Bool, just return it
  • If it is an Atom, return its value
  • If it is a quote  statement, return what is quoted (read your little schemer manual)
  • If it is an if statement, evaluate it (see below)
  • If it is a load statement, load the file (see below) and evaluate each expression using the current environment. Return the last expression in the file
  • If it is a set!, set the value of the passed variable to the evaluated form
  • If it is a define, do almost the same as above (except that you don’t throw if the variable doesn’t exist, but you create it)
  • If it is a define  that defines a function (it has that shape), create a ‘function slot’ in the environment. That is a (functionName,  FuncRecord) pair (see below)
  • If it is a lambda, return the FuncRecord that describe the inline function
  • If it is a function call, evaluate the expression that describe the function (yes, you can do that in Lisp!!), evaluate the arguments, and apply the function to the arguments
  • Otherwise, it must be a bad form, throw it back to the calling function to do something meaningful with it

We have a bunch of ‘see below’ to take care of. We’ll look at them in order.

First the ‘if’ statement. If the evaluated predicate is Bool(True) evaluate the consequent, otherwise evaluate the alternative. For some reason, I wrote it the other way around.

    // 1a. If the evaluation of the pred is false evaluate alt, else evaluate cons
    evalIf env pred conseq alt =
        match eval env pred with
        | Bool(false) -> eval env alt
        | _ -> eval env conseq

Then there is the load function. It reads all the test and gets out the list of LispVal contained in it.

let load = fileIOFunction (fun fileName -> File.ReadAllText(fileName)
                                           |> readExprList)

ReadExprList is part of the parser. We’ll look at it then. Sufficient here to say that it takes a string and returns a list of LispVal.

FileIOFunction just encapsulates a common pattern in all the file access functions. I don’t like such mechanical factorization of methods, without any real reusability outside the immediate surroundings of the code. But I like repetition even less.

let fileIOFunction func = function
    | [String fileName] -> func (fileName) 
    | [] -> throw (IOError("No file name"))
    | args -> throw (NumArgs(1, args))

A family of functions create FuncRecord given appropriate parameters. I seem to have lost memory of the contortions related to the last one. If I end up having to work again on this code, I’ll need to figure it out again. Note to myself, please comment this kind of code next time.

let makeFunc varargs env parms body = 
            Func ({parms = (List.map showVal parms); varargs = varargs; body = body; closure = env})
let makeNormalFunc = makeFunc None
let makeVarargs = showVal >> Some >> makeFunc

apply is the other workhorse function in the evaluator.  The best way to understand it is to start from the bottom (where bindVars starts the line). We are binding the arguments and the variable arguments in the closure that has been passed in. We then evaluate the body. But the body is just a list of LispVal, so we just need to evaluate them in sequence and return the result of the last one.

and apply func args = 
    match func with
    | PrimitiveFunc(f) -> f args
    | Func ({parms = parms; varargs = varargs; body = body; closure = closure}) ->
        let invalidNonVarargs = args.Length <> parms.Length && varargs.IsNone 
        let invalidVarargs = args.Length < parms.Length && varargs.IsSome  
        if invalidVarargs || invalidNonVarargs 
            throw (NumArgs(parms.Length, args))
            let remainingArgs = args |> Seq.skip parms.Length |> Seq.toList
            let evalBody env = body |> List.map (eval env) |> last
            let rec zip xs1 xs2 acc =
                match xs1, xs2 with
                | x1::xs1, x2::xs2 -> zip xs1 xs2 ((x1, x2)::acc)
                | _ -> acc
            let bindVarArgs arg env =
                match arg with
                | Some(argName) -> bindVars [argName, (List remainingArgs)] env 
                | None -> env
            bindVars (zip parms args []) closure
                |> bindVarArgs varargs
                |> evalBody
    | funcName -> throw (NotFunction("Expecting a function, getting ", showVal funcName))

This is enough for one post. Next time we’ll finish the evaluator.

Write Yourself a Scheme in 48 Hours in F# – Part III

Very often my code ends up having the following form: parse an input to create an intermediate data structure and evaluate the structure to produce an output. Strangely, many years ago, when my code was object oriented, that wasn’t the case. Or at least I wasn’t explicitly aware of it.

When you write an interpreter or a compiler, things always work out like that, but I see the same pattern in almost everything I produce: from financial backtesting to chart libraries. Sometimes when, out of laziness or stupidity, I forego the intermediate structure, I end up in the end having to retrofit it in. Simply processing input and generating output at the same time rarely cuts it. But it is tempting because you get going pretty fast and I’m tricked into it occasionally.

Hence the first thing that I find myself reasoning about is often the particular form of such intermediate structure. In this case it looks like the following:

type Env = (string * LispVal ref) list ref

and FuncRecord = { parms: string list; varargs: string option; body: LispVal list; closure: Env}

and LispVal =
    | Atom of string
    | List of LispVal list
    | DottedList of LispVal list * LispVal
    | Number of int
    | String of string
    | Bool of bool
    | PrimitiveFunc of (LispVal list -> LispVal)
    | Func of FuncRecord
    | Port of System.IO.FileStream

This LispVal structure has one constructor for each kind of expression (production) that is allowed in Scheme. Or at least that ones I support …

It is important that each one stores all the information that is necessary for the evaluator to evaluate the expression. No more, no less. Here is a brief description:

  • Atom: it is a kind of a constant in Scheme. This is probably the worst definition ever given for it. Please read about it in your little schemer book.
  • List: is the main Scheme type. It represents a list of expressions.
  • DottedList: this is the bizarre Scheme way to pass optional parameters
  • Number: is a number :-) You will discover which kind of number when we talk about the parser
  • String : is a string
  • Bool: #t, #f
  • PrimitiveFunc: is the representation for the primitive operators/functions that are burned into the interpreter. It is just a function that takes a list of LispVal and returns a LispVal
  • Func: is a user defined function. Notice that the body of it is simply a list of LispVal. This is why LISP is so powerful. Also notice that a closure gets passed to it for the ‘captured’ variables.
  • Port: is a slightly goofy representation of an in/out stream
  • Anything else (i.e. macros) is not supported, but this would be the first piece to change if they were.

The only remaining code to address is:

type Env = (string * LispVal ref) list ref

This is the symbol table and it is ugly. It is not multithread safe either. But it works and it is close enough to the Haskell version so I decided to retain it. A proper code review would ‘strongly suggest’ rewriting the code to pass it around to each function instead of using ‘ref’ or using the state monad encapsulated in a computation expression. Any of these solutions is left as an exercise to the reader (use the testcases to validate that you get it right).

We could go in many different direction from here. We could talk about:

  • The evaluator
  • The parser
  • The symbol table
  • Error handling

To keep with the top – down approach I’ve been espousing. I’ll probably talk about the evaluator next.

Write Yourself a Scheme in 48 Hours in F# – Part II

Usually, when I do blog posts that are all about code, I write them ‘bottom up’. I start talking about the most primitive types and functions and build up from there toward higher abstractions. I think this is a pretty common way of doing it.

For this series I’m going to try the opposite. I start with the code that creates the REPL window and move down from there toward the guts of the interpreter. I hold the opinion that, if the code is written right, this should be ok. The naming scheme and general structure of it should allow understanding it at different levels.

Or at least I hope so.

Let’s start from the main function. Depending on the number of arguments it either runs the REPL window or executes whatever is in the file passed in as the first argument using the other arguments as parameters.

let main(args: string[]) =
    match Array.toList args with
    | [] -> runRepl ()
    | filename :: args -> runOne filename args

The latter case is coded in the below function. It first load all the primitive operators (i.e. ‘+’, ‘-‘ etc…) and the standard library. The word ‘load’ above is a little misleading. In reality it adds them to the environment. It then proceeds to add the arguments that were passed on. As the last step, it evaluates the ‘load’ command by using the newly created environment, it transforms the returned token to a string and prints it.

let runOne (filename : string) (args : list<string>) =
    let env = primitiveBindings ()
                |> loadStdLib
                |> bindVars [ "args", List (List.map String args) ]
    List [Atom "load"; String filename] |> eval env |> showVal |> printStr

Running the REPL windows is equally simple. Load the primitive operators and the standard library, show a prompt and evaluate the input until the input is ‘Quit’.

let runRepl () = 
    let env = primitiveBindings () |> loadStdLib
    until (fun s -> s = "Quit" || s = "quit") (fun () -> readPrompt "Lisp>>> ") (evalAndPrint env)

readPrompt is pretty simple:

let printStr (s: string) = Console.Write(s)

let readPrompt (s: string) = printStr s ; Console.ReadLine ()

EvalAndPrint is written as a chain of functions (lookup the ‘>>’ operator in F#) and just evaluate the string, transform the result to a string, prints it and newline it.

let newLine () = Console.WriteLine()

let evalAndPrint env = evalString env >> showVal >> printStr >> newLine

evalString parses the string and evaluates the expression. Note the exception management. This is a result of my decision of throwing an exception every time something goes wrong instead of using a monad to pass the state around. I think it is pretty clear, but haskellers might disagre. This is one of the main differences from the Haskell version.

let evalString env expr = 
        expr |> readExpr |> eval env
    | LispException(error) -> String (showError error)

For the sake of completeness, here is until. Maybe there is a library function somewhere that I could have used?

let rec until pred prompter evaluator =
    let result = prompter ()
    if not (pred result) then
        evaluator result
        until pred prompter evaluator

Back on the main flow of the code, loadStdLib just loads the standard file and returns the populated environment.

let loadStdLib env = 
    eval env (List [Atom "load"; String "stdlib.scm"]) |> ignore

primitiveBindings creates a new empty environment and adds a bunch of pairs (primitiveName, LispVal –> LispVal). LispVal is a representation of a Scheme expression, so the second part of the tuple is simply a reduction from one expression to another (hopefully simpler in some sense). We’ll talk more about LispVal in upcoming posts.

let primitiveBindings () =
    (nullEnv ()) |> bindVars [ for v, f in primitives -> v, PrimitiveFunc f ] 

There you have it. That’s the full implementation for the REPL window. Next post, we’ll look at LispEval and the evaluator.