Luca Bolognese's Blog

September 16, 2011

Adventure in parserland – parsing lambda expressions in F# – Part V

Filed under: F# — lucabol @ 6:57 am

We are now going to look at a solution which is concise, efficient and gives sophisticated error messages. It is also less than 20 lines of code. We’ll be using FParsec.

FParsec is a port of an Haskell library. It is a parser combinator library or, as I like to think of it, an internal DSL to build parsers in F#. My usual disclaimer: I’m not an expert in FParsec. It is likely that, if you are an expert, you can come up with more maintainable/efficient/elegant version of this parser.

The whole code is below:

let ws = " \t\n"
let specialChars = ".)(\\\n"

let pWs = spaces
let pName = manyChars (noneOf (ws + specialChars)) |>> EName

let pExpr, pExprRef = createParserForwardedToRef<Expression, Unit>()

let curry2 f a b = f(a,b)
let pFunction = pchar '\\' >>. pipe2 pName (pchar '.' >>. pExpr) (curry2 Function)

let pApplication = pchar '(' >>. pipe2 pExpr (pWs >>. pExpr) (curry2 Application)
                          .>> pWs .>> pchar ')'

do pExprRef := pFunction <|> pApplication <|> pName

let pExpressions = sepBy pExpr spaces1

This mirrors pretty closely the grammar we are trying to parse. A program is a bunch of expressions separated by whitespaces.

let pExpressions = sepBy pExpr spaces1

sepBy is a combinator that takes a parser that defines what to parse and a parser that defines what the separator should be. And the separator should be one or more spaces …

pExpr is either a function, an application or a name. the operator <|> is a choice combinator that tries each parser in order. It tries the right parser just if the left parser fails and it doesn’t consume input. So it doesn’t backtrack. If you need a backtracking parser, you’ll need to get acquainted with the attempt combinator.

do pExprRef := pFunction <|> pApplication <|> pName

A function starts with a ‘\’, then comes a name, a dot and an expression.

let pFunction = pchar '\\' >>. pipe2 pName (pchar '.' >>. pExpr)                                                       (curry2 Function)

I know that might look crazy (and maybe it is), but just bear with me. Someone , who I’m not sure I can name, once told me that functional programming is great to write code, but terrible to read it and debug it. The phrase stayed with me as containing a grain of truth. In any case, in the code above:

  • >>. is a combinator that says “use the left parser, discard its value and then use the right one, returning its value”. Try to guess what .>> does …
  • pipe2 is a combinator that says “apply the first parser, the second parser and then call a function passing as parameters the values returned by the two parsers
  • curry2 is a function combinator that transform a function that takes a tuple to a function that takes the parameters as untupled
let curry2 f a b = f(a,b)

An application works similarly, but differently …

let pApplication = pchar '(' >>. pipe2 pExpr (pWs >>. pExpr) (curry2 Application)
                              .>> pWs .>> pchar ')'

The only difference is that now we have to consume the optional whitespaces and the ‘)’ at the end. A rule of thumb that I use is to use >>.  to flow the result through and pipeX when I need more than one result.

The last thing is pName, which consume chars until it finds either a whitespace or a special char.

let ws = " \t\n"
let specialChars = ".)(\\\n"

let pWs = spaces
let pName = manyChars (noneOf (ws + specialChars)) |>> EName

And there you have it, a lexer, a parser all in 20 lines of code. I don’t like the code that I wrote above much. I’m sure I could refine it plenty and it probably contains some bugs, but it gives an idea of what is possible with FParsec.

September 9, 2011

Adventure in parserland – parsing lambda expressions in F# – Part IV

Filed under: F# — lucabol @ 6:09 am

Let’ now look at the parser. First let’s review the grammar:

    (*
        <expression> ::= <name> | <function> | <application>
        <name> ::= non­blank character sequence
        <function> ::= \ <name> . <body>
        <body> ::= <expression>
        <application> ::= ( <function expression> <argument expression> )
        <function expression> ::= <expression>
        <argument expression> ::= <expression>
    *)

And the data type to represent it:

type Name = string
and Body = Expression
and Function = Name * Expression
and FunctionExpression = Expression
and ArgumentExpression = Expression
and Expression =
| EName of string
| Function of Expression * Body
| Application of FunctionExpression * ArgumentExpression
| EOT

In essence, the data type need to store all the information needed for subsequent stages of computation (i.e. beta reductions and such). The closer it is to the grammar, the better. In this case it looks pretty close.

Remember what is the main goal of our parser:

let parseTextReader: TextReader -> seq<Expression> =
                    textReaderToLazyList >> tokenStream >> parseExpressions

We have already looked at TextReaderToLazyList and tokenStream. Now it is the time to look at parseExpressions. It’s goal is to  parse the LazyList<Token> and return a sequence of expressions. The choice of returning a sequence at this point is to make the parseTextReader, which is the main function in the program, return a more ‘standard’ type.

and parseExpressions tokens = seq {
   let tokens = parseOptionalWs tokens
   let expr, tokens = parseExpr tokens
   let tokens = parseOptionalWs tokens
   match expr with
    | EOT   -> yield EOT
    | exp   -> yield exp; yield! parseExpressions tokens }

parseOtionalWs simply skips ahead whatever whitespaces it finds.

and parseOptionalWs tokens = match tokens with
                                | LazyList.Nil -> LazyList.empty
                                | LazyList.Cons(h, t) ->
                                    match h with
                                       | Ws _ -> parseOptionalWs t
                                       | _ -> tokens

parseExpr is more interesting. It is the main switch that creates expression kinds.

let rec parseExpr tokens = match tokens with
                            | LazyList.Nil -> EOT, LazyList.empty
                            | LazyList.Cons(h, t) ->
                                match h with
                                    | EOF -> parseEOF tokens
                                    | Name _ -> parseName  tokens
                                    | Lambda -> parseFunction tokens
                                    | OpenParens -> parseApplication tokens
                                    | token -> errorAtStart "Expression" token

parseEOF is not.

and parseEOF tokens = EOT, LazyList.empty

parseName just returns a EName, unwrapping it from Name.

and parseName tokens = EName (head tokens |> unwrapName), tail tokens

Unwrap just unwraps it.

let unwrapName = function
    | Name(s) -> s
    | tok -> errorExpecting "a Name" <| writeToken tok

parseFunction just conumes a Lambda, a name, a Dot token, a body (i.e. \x.x)and assembles them in a Function:

and parseFunction tokens =
    let tokens = consumeToken Lambda tokens
    let name, tokens = parseName tokens
    let tokens = consumeToken Dot tokens
    let body, tokens = parseExpr tokens
    Function(name, body), tokens

consumeToken tries to consume a token generating an error if it doesn’t find it:

let consumeToken token =
    genericConsumeToken (fun token' _ -> errorExpecting (writeToken token') (writeToken token)) token

genericConsumeToken is just a generalization of the function above:

let genericConsumeToken noMatch token = function
    | LazyList.Nil -> LazyList.empty
    | LazyList.Cons(h, t) as originalTokens ->
        match h with
        | tok when tok = token -> t
        | tok -> noMatch token originalTokens

The last thing left to consume is an application which is in this form (func args):

and parseApplication tokens =
    let tokens = consumeToken OpenParens tokens
    let funExpr, tokens = parseExpr tokens
    let tokens = parseOptionalWs tokens
    let argExpr, tokens = parseExpr tokens
    let tokens = consumeToken CloseParens tokens
    Application(funExpr, argExpr), tokens

Various error and utility functions are defined below:

let errorEOF expecting = failwith  ("Expected " + expecting + ", got EOF")
let errorExpecting expecting gotToken = failwith ("Expected " + expecting + ", got" + gotToken)
let errorAtStart expecting gotToken = failwith ("Expected " + expecting + " which cannot start with" + writeToken gotToken)

let tail = LazyList.tail
let head = LazyList.head

And that is the parser. All 100+ lines of it. As you can tell it is rather formulaic to go from a grammar to a lexer and a parser, which is why you shouldn’t do it, but instead let a tool generate the code for you given the grammar or use FParsec.

We have written 200+ code and I don’t think we can be too proud of our achievement. It is:

  • Certainly buggy
  • Primitive in error handling
  • Not tail recursive (big text is likely to blow up our stack)
  • Probably inefficient

So let’s look next at a better way to do it.

September 2, 2011

Adventure in parserland – parsing lambda expressions in F# – Part III

Filed under: F# — lucabol @ 3:25 pm

Let’s start from the lexer. Remember, I wrote this code based on my memory of how a lexer ought to look like. I didn’t read again the relevant chapters in the Dragon book. But I think it came out all right after all.

The tokenStream function we looked at last time takes a LazyList<char> and returns a LazyList<Token>. It uses the unfold method on LazyList to call matchToken on each char until the stream is empty.

let rec tokenStream chars =
    LazyList.unfold
        (fun chList ->
            match chList with
            | LazyList.Nil -> None
            | chList ->
                let token, chList' = matchToken chList
                Some(token, chList')
        )
        chars 

A token is what gets passed up to the parser to do syntactic analysis on. It is the vocabulary of our language. The lexer divide a phrase in words, the parser put together the words in a phrase. So, these are the words.

type Token =
    | Name of string
    | Dot
    | OpenParens
    | CloseParens
    | Lambda
    | Def
    | Ws of string
    | NewLine
    | EOF

Matching is a process whereas you try to return the token that you have read plus the list of characters yet to be read. Matching a Token is defined below:

let matchToken = function
    | LazyList.Nil                 -> EOF, LazyList.empty
    | LazyList.Cons(h, t) as chars ->
        match h with
        | ch when isWs ch -> matchWs chars
        | ch when isSpecialChar ch -> matchSpecialChar ch t
        | _ -> matchString chars

A token is either nothing, a whitespace, a special char or anything else.

Let’s look at what matching each one of them means.  Matching whitespaces means consuming them and remembering what was consumed.

let matchWs chars =
    let value, remainingChars = matchSeriesOfChars isWs chars
    Ws value, remainingChars

matchSeriesOfChars takes a predicate and a LazyList of chars and returns the string composed of all the consecutive chars for which the predicate is true, plus, as always, the remaining chars to be matched. In this case the predicate returns true if the char is a whitespace.

To write matchSeriesOfChars I need a function that reverses a LazyList. Not having found such thing, I wrote it.

let reversell l =
    let rec go l' a = match l', a with
                        | LazyList.Nil, a -> a
                        | LazyList.Cons(h, t), a -> go t (LazyList.cons h a)
    go l LazyList.empty

Then I wrote matchSeriesOfChars. The function uses an accumulator. It adds to the front whenever the predicate is true, it reverses it and translates it to a string (I could have reversed the string instead, it might have been better).

let matchSeriesOfChars comparer chars =
    let rec go result = function
        | LazyList.Nil    -> charListToString(reversell result), LazyList.empty
        | LazyList.Cons(h, t) -> if comparer h then go (LazyList.cons h result) t
                                 else charListToString (reversell result), LazyList.cons h t
    go LazyList.empty chars

These are  predicates we’ll use later on to recognize characters:

let isInString (ch: char) (s: string) = s.IndexOf(ch) <> -1
let isWs (chr: char) = isInString chr wsChars
let isNameChar (chr: char) = not (isInString chr (wsChars + specialChars))
let isSpecialChar ch = isInString ch specialChars

wsChar and specialChars are defined below:

    let wsChars = " \t"
    let charTokens =
        Map.ofList [
            '.' , Dot
            '(' , OpenParens
            ')' , CloseParens
            '\\', Lambda
            '\n', NewLine
         ]
    let specialChars = charTokens |> Map.fold (fun s k v -> s + k.ToString()) ""

Getting back to the more important matching functions, matching a special character is defined as a simple lookup in the charToken map:

let matchSpecialChar ch chars = Map.find ch charTokens, chars

We are left with matchString, this simply matches the characters until it finds a char that cannot be part of a name. It then looks it up in a list of special strings. If it finds it, it returns it, otherwise it just returns the name.

let stringTokens =
    Map.ofList [
        "Def", Def
    ]
let matchString chars =
    let value, remainingChars = matchSeriesOfChars isNameChar chars
    let specialString = Map.tryFind value stringTokens
    if specialString.IsSome
        then specialString.Value, remainingChars
        else Name(value), remainingChars

And we are done with the lexer, all of 100+ lines of it …

August 26, 2011

Adventure in parserland – parsing lambda expressions in F# – Part II

Filed under: F# — lucabol @ 2:25 pm

The parser starts simple with the following two functions to parse either a string or a file. I use the XXXReaders because I want to lazy read character by character.

let parseString s =
    let reader = new StringReader(s)
    parseTextReader reader

let parseFile fileName =
    let reader = new StreamReader(fileName: string)
    parseTextReader reader

The whole parser is in the following two lines:

let parseTextReader: TextReader -> seq<Expression> =
                    textReaderToLazyList >> tokenStream >> parseExpressions

I need to specify the signature otherwise the compiler gets confused : wait, does it take a StringReader or a StreamReader? You better tell me!

The function is a composite of three functions applied in sequence:

  1. Translate a TextReader to a LazyList<char>
  2. Translate a LazyList<char> to a LazyList<Token> (lexer)
  3. Translate a LazyList<Token> to a LazyList<Expression> (parser)

My usage of LazyList as the workhorse for the program is because I want to match on the head of the stream of chars/tokens in a lazy way.

I love it when a program naturally decomposes in such simple understandable pieces. I impute some of that to functional programming. For one reason or another, in my 15+ years of object oriented programming, I’ve rarely got to the core of a problem with such immediacy.

A sequence of operations likes the above would be lost in a protected overridden implementation of a base class somewhere (or something else equally long to pronounce). The beauty would be lost somewhere in the vast machinery required to support it.

In any case, TextReaderToLazyList is a trivial generator function that uses the unfold function of LazyList to read a character at the time.

let textReaderToLazyList textReader = LazyList.unfold (fun (ts:TextReader) ->
    let ch = ts.Read()
    if ch = -1 then None else Some(char ch, ts)) textReader

The next step is to look at either the lexer, going bottom up, or the parser, going top down.

August 19, 2011

Adventure in parserland – parsing lambda expressions in F# – Part I

Filed under: F# — lucabol @ 6:53 am

This is part of my ‘things that I do in the empty spaces between one meeting and the next one, which might end up being vaguely interesting’. It is a lambda expression parser.

The full source code is here.

I actually have two versions of it: one written longhand and the other one written with FParsec. Just to be clear: I’m no expert of either.

And just to be more clear: I think writing most parsers longhand in the way I am about to show is crazy. You either use FParsec or  fslex / fsyacc.

I have a strong distaste for additional compilation steps. I think it lingers on from MFC project types of 15/20 years ago. I was one of these crazy folks that would generate the project, wrap the generated code (with some generalizations) in my own library and use that one from then on.

So I prefer FParsec. I’m ok rewriting left recursive rules and its performance has never been a problem for me. Here is a table that compares the different approaches.

But I started wondering about coding a a recursive descent parser for a simple grammar by hand, fully knowing the foolishness of the idea. Thanks to Jose for code reviewing it.

The inspiration for the grammar comes from this book.

    (*
        <expression> ::= <name> | <function> | <application>
        <name> ::= non­blank character sequence
        <function> ::= \ <name> . <body>
        <body> ::= <expression>
        <application> ::= ( <function expression> <argument expression> )
        <function expression> ::= <expression>
        <argument expression> ::= <expression>
    *)

In English, an expression is either a name, a function or an application. A name is a bunch of characters (better defined in the code). A function is ‘\’, a name, ‘.’ and an expression. An application is ‘(‘, an expression, whitespaces, an expression and ‘)’.

Some testcases for the above grammar and the parsers written to parse it are below. It should be intuitive what this code does just by the name of the functions. Even it isn’t, check that the expressions symbol contains valid productions from the grammar above.

module Test

open Microsoft.FSharp.Collections
open Xunit

open LambdaEngine
open Parser
open Lexer
open FParser

let writeTokenStream stream = Seq.fold (fun acc token -> acc + writeToken token) "" stream

let rec writeExpr = function
        | EName(s) -> s
        | Function(expr, body) -> writeToken Lambda + writeExpr expr + writeToken Dot + writeExpr body
        | Application(funExpr, argExpr) -> writeToken OpenParens + writeExpr funExpr + writeToken (Ws(" "))
                                            + writeExpr argExpr + writeToken CloseParens
        | EOT -> ""

let tokenStreams = [
    ""
    "(\xs.xs \y.(y \x.y))"
    "(\xst.xst \y.(y  \x.y))"
    " "
    "x"
    "(x y)"
    ]

let expressions = [
    ""
    "(\x.x \y.(y \x.y))"
    "x"
    "(x y)"
    ]

let stringToCharList s =
    let textReader = new System.IO.StringReader(s)
    textReaderToLazyList textReader

[<Fact>]
let testTokenizer () =
    let testTokenStream s =
        let stream = tokenStream <| stringToCharList s
        let s1 = writeTokenStream stream
        Assert.Equal(s, s1)
    tokenStreams |> List.iter testTokenStream

let testExpr parseFunction s =
    let exprs = parseFunction s
    let s1 = exprs |> Seq.fold (fun s expr -> s + writeExpr expr) ""
    Assert.Equal(s, s1)

[<Fact>]
let testParser () = expressions |> List.iter (testExpr parseString)

[<Fact>]
let testFParser () = expressions |> List.iter (testExpr fparseString)

In the next instalment, we’ll start looking at the real code for the parser.

August 12, 2011

Write Yourself a Scheme in 48 Hours in F# – Part VII

Filed under: F# — lucabol @ 7:11 am

Let’s talk about the environment now.  This is the part of the interpreter that I like the least. It is a global variable and it contains a list of  (string, LispVal) where the LispVal is mutable.

type Env = (string * LispVal ref) list ref

This is pretty bad. First of all, it immediately cuts off any option of running interpreters in different threads. Moreover, it makes a lot of functions in the evaluator to have side effects. That makes it much harder to reason about them.

In a world where I am provided with infinite time and energy, I would change it. In this world, I won’t. If you try your hand at doing it, make sure that you pass all the testcases before declaring victory. The scope rules of Scheme are not all that obvious. A code reviewer called them the Italian scoping rules because he thought I got them wrong …

In any case, there isn’t much to the symbol table management.  You can create an empty one:

let nullEnv (): Env = ref List.empty

Check if a variable is bound:

let keyEq name (k, _) = name = k

let isBound var (env: Env) = !env |> List.exists (keyEq var)

Get a variable out:

let getVar var (env: Env) =
    let result = !env |> List.tryFind (keyEq var)
    match result with
    | None -> throw (UnboundVar("Getting an unbound variable: " , var))
    | Some(_, r) -> !r

Set the value of an existing variable:

let setVar var value (env:Env) =
    let result = !env |> List.tryFind (keyEq var)
    match result with
    | Some(_, v) -> v := value ; value
    | None -> throw (UnboundVar("Setting an unbound variable: " , var))

Or define a new variable in the environment. Note that if the variable already exist, its value gets set.

let define (env:Env) var value =
    let result = !env |> List.tryFind (keyEq var)
    match result with
    | Some(_, v) -> v := value ; value
    | None ->
        env := [var, ref value] @ !env; value
You can also bind a list of (string, LispVal) to the environment by prepending it to the existing ones:
let bindVars bindings (env:Env) =
   ref ((bindings |> List.map (fun (n, v) -> n , ref v)) @ !env)

Once you accept the evil of the global mutable variable scheme, these functions are easy enough.

The only piece left is error management. This is where my implementation differs from the Haskell version the most. In essence, I throw exception and catch them to report errors, while the Haskell version uses a monad to propagate the error information.

I have a LispError that represents everything that can go wrong:

type LispError =
    | NumArgs of int * LispVal list
    | TypeMismatch of string * LispVal
    | ParseError of string * FParsec.Error.ParserError
    | BadSpecialForm of string * LispVal
    | NotFunction of string * string
    | UnboundVar of string * string
    | Default of string
    | IOError of string

I wrap it in an exception:

exception LispException of LispError

This is what I throw in various places in the code.

let throw le = raise (LispException(le))

I then catch it at the outer layer:

let evalString env expr =
    try
        expr |> readExpr |> eval env
    with
    | LispException(error) -> String (showError error)

And display the error by using the below function:

let showError = function
    | NumArgs(expected, found) -> "Expected " + expected.ToString() + " args; found values " + unwordsList found
    | TypeMismatch(expected, found) -> "Invalid type: expected " + expected + ", found " + showVal found
    | ParseError(msg, _) -> "Parse Errror" + msg
    | BadSpecialForm(message, form) -> message + showVal form
    | NotFunction(message, func) -> message + func
    | UnboundVar(message, varName) -> message + varName
    | Default(message) -> message
    | IOError(message) -> message

And that’s all there is to it. I hope you guys and gals enjoyed this seven part extravagance. Cheers.

August 5, 2011

Write Yourself a Scheme in 48 Hours in F# – Part VI

Filed under: F# — lucabol @ 7:08 am

The evaluator takes as an input a LispVal. Where does it come from? There must be something that converts your textual input into it. That is the job of the parser.

I have used FParsec to build my parser. FParsec is a fantastic library to build parsers. It is a perfect showcase of the composition potential that functional code yields. 

When you write an FParsec parser you compose many little parsers to create the one parser that works for your language.  The resulting code looks very much like your language grammar, but you don’t need  a separate code generation compilation step to produce it.

There is one element of ugliness in the syntax to create recursive parsers. You need to define two global variables that can be referred to before they are constructed. This is an artefact of how F# works. So you need a line in your code that looks like this:

let parseExpr, parseExprRef : LispParser * LispParser ref = createParserForwardedToRef()

With that piece of machinery out of the way, we can focus on the parser itself. Our goal here is to parse expressions and generate LispVal. We need a LispParser like the below (the second generic parameter is for advanced usage).

type LispParser = Parser<LispVal, unit>

We need to parse all the kind of expressions that the user can type. Notice in the below the use of a computation expression to simplify the syntax. Also note that lists and dotted lists look very much the same until you encounter the ‘.’ character. You could disambiguate the situation by extracting out the commonality in a separate kind of expression. I decided instead to instruct the parser to backtrack if it gets it wrong (attempt). This is slower, but keeps the code identical to our conceptual model. I value that greatly.

do parseExprRef := parseAtom
                   <|> parseString
                   <|> parseNumber
                   <|> parseQuoted
                   <|> parse {
                           do! chr '('
                           let! x = (attempt parseList) <|> parseDottedList
                           do! chr ')'
                           return x
                       }

Let’s start from the top. Parsing an atom means parsing something that starts with a letter or symbol and continues with letters, symbols or digits. Also “#t” and “#f” can be resolved at parsing time.

let parseAtom : LispParser = parse {
        let! first = letter <|> symbol
        let! rest = manyChars (letter <|> symbol <|> digit)
        return match first.ToString() + rest with
               | "#t" -> Bool true
               | "#f" -> Bool false
               | atom -> Atom atom
}

A string is just a bunch of chars (except ‘\’) surrounded by ‘ ” ’.

let parseString : LispParser = parse {
    do! chr '"'
    let! xs = manyChars (noneOf "\"")
    do! chr '"'
    return String(xs)
}

A number is just one or more digits. I am afraid we just support integers at this stage …

let parseNumber : LispParser = many1Chars digit |>> (System.Int32.Parse >> Number)

A quoted expression is jut a ‘\’ followed by an expression.

let parseQuoted : LispParser = chr '\'' >>. parseExpr |>> fun expr -> List [Atom "quote"; expr] 

A list is just a bunch of expressions separate by at least one space.

let parseList : LispParser = sepBy parseExpr spaces1 |>> List

A dotted list starts in the same way (hence the backtracking above), but then has a dot, one or more spaces and an expression.

let parseDottedList : LispParser = parse {
    let! head = endBy parseExpr spaces1
    let! tail = chr '.' >>. spaces1 >>. parseExpr
    return DottedList (head, tail)
}

And here are a bunch of functions used throughout the code, presented here for completeness.

    let spaces1 : LispParser<unit> = skipMany1 whitespace
    let chr c = skipChar c
    let endBy  p sep = many  (p .>> sep)

    let symbol : LispParser<char> = anyOf "!$%&|*+-/:<=>?@^_~#"

This is all the code you need to translate text to a LispVal to feed the evaluator. That is pretty impressive.

There is also a function to go the other way, from a LispVal to text. It is used in implementing the testcases and to print out diagnostics.

    let rec showVal = function
        | String contents -> "\"" + contents + "\""
        | Atom name -> name
        | Number num -> num.ToString()
        | Bool t -> if t then "#t" else "#f"
        | List l -> "(" + unwordsList l + ")"
        | DottedList (head, tail) -> "(" + unwordsList head + " . " + showVal tail + ")"
        | PrimitiveFunc(_) -> "<primitive>"
        | Port (_) -> "<IO port>"
        | Func({ parms = parms; varargs = varargs; body = body; closure = closure }) ->
                                                "(lambda (" + unwordsList (parms |> List.map (String)) +
                                                    (match varargs with
                                                        | None -> ""
                                                        | Some(arg) -> " . " + arg) + ") ...)"

    and
        unwordsList = List.map showVal >> String.concat " "

July 29, 2011

Write Yourself a Scheme in 48 Hours in F# – Part V

Filed under: F# — lucabol @ 7:26 am

We have one loose end to tie in the evaluator: the primitive operators. These are things that the interpreter knows intrinsically. There is a list of them below.

let rec primitives =
     [
        "+",    numericBinop (+)
        "-",    numericBinop (-)
        "*",    numericBinop (*)
        "/",    numericBinop (/)
        "mod",  numericBinop (%)
        "=",    numBoolBinop (=)
        "<",    numBoolBinop (<)
        ">",    numBoolBinop (>)
        "/=",   numBoolBinop (<>)
        ">=",   numBoolBinop (>=)
        "<=",   numBoolBinop (<=)
        "&&",   boolBoolBinop (&&)
        "||",   boolBoolBinop (||)
        "string=?",     strBoolBinop (=)
        "string>?",      strBoolBinop (>)
        "string<?",      strBoolBinop (<)
        "string<=?",    strBoolBinop (<=)
        "string>=?",    strBoolBinop (>=)
        "car",  car
        "cdr",  cdr
        "cons", cons
        "eq?", eqv
        "eqv?", eqv
        "equal?", equal

        // IO primitives
        "apply", applyProc
        "open-input-file", makePort FileAccess.Read
        "open-output-file", makePort FileAccess.Write
        "close-input-port", closePort
        "close-output-port", closePort
        "read", readProc
        "write", writeProc
        "read-contents", readContents
        "read-all", readAll
     ]

Having seen the above list, it now becomes clearer why the primitiveBindings function was defined as such. It just binds these pairs into the environment.

let primitiveBindings () =
    (nullEnv ()) |> bindVars [ for v, f in primitives -> v, PrimitiveFunc f ] 

numericBinop unpacks the numbers, applies the provided operator and packs the result back in the Number.

let numericBinop op parms =
    if List.length parms < 2
        then throw <| NumArgs(2, parms)
        else parms |> List.map unpackNum |> foldl1 op |> Number

While we are at it, we can define fold1 (it tends to be  useful)

let foldl1 op = function
    | h::t -> List.fold op h t
    | [] -> throw (Default("Expected a not empty list, got an empty list"))

The other XBinops work similarly …

let boolBinop unpacker op args =
    match args with
    | [ left; right ] -> Bool (op (unpacker left) (unpacker right))
    | _ -> throw (NumArgs(2,args))

let numBoolBinop = boolBinop unpackNum
let strBoolBinop = boolBinop unpackStr
let boolBoolBinop = boolBinop unpackBool

We now have to look at the family of unpackers. They all work rather similarly. Notice Scheme making an effort to get a number out of a string and to get anything out of a list. Strong type folks won’t like that. Oh well, just remove these lines …

let rec unpackNum = function
    | Number n  -> n
    | String n  -> let success, result = System.Int32.TryParse n
                   if success
                       then result
                       else throw (TypeMismatch("number", String n))
    | List [n]  -> unpackNum n
    | notNumber -> throw (TypeMismatch("number", notNumber))

let rec unpackStr = function
    | String s -> s
    | Number n -> n.ToString()
    | Bool b   -> b.ToString()
    | List [s]  -> unpackStr s
    | noString -> throw (TypeMismatch("string", noString))

let rec unpackBool = function
    | Bool b -> b
    | List [b]  -> unpackBool b
    | noBool -> throw (TypeMismatch("boolean", noBool))

Now back to the list of primitive operators, there are the signature LISP operators car, cdr and cons. Just understanding the first line for each function should be enough to get an idea of what they do.

let car = function
    | [List (x :: _)] -> x
    | [DottedList (x :: _, _)] -> x
    | [badArg] -> throw (TypeMismatch("pair", badArg))
    | badArgList -> throw (NumArgs(1, badArgList))

let cdr = function
    | [List (x :: xs)] -> List xs
    | [DottedList ([xs], x)] -> x
    | [DottedList ((_ :: xs), x)] -> DottedList (xs, x)
    | [badArg] -> throw (TypeMismatch("pair", badArg))
    | badArgList -> throw (NumArgs(1, badArgList))

let cons = function
    | [x; List xs] -> List (x :: xs)
    | [x; DottedList (xs, xlast)] -> DottedList (x :: xs, xlast)
    | [x1; x2] -> DottedList([x1], x2)
    | badArgList -> throw (NumArgs(2, badArgList))

We then need to work our way to implement eqv (aka eq? in Scheme). We first define a function that tests that two LispVal are the same. It should be pretty self explanatory (the list piece is kind of cute).

    let rec eqvPrim e1 e2 =
        match e1, e2 with
        | (Bool b1, Bool b2) -> b1 = b2
        | (Number n1, Number n2) -> n1 = n2
        | (String s1, String s2) -> s1 = s2
        | (Atom a1, Atom a2) -> a1 = a2
        | (DottedList (xs, x), DottedList(ys, y)) -> eqvPrim (List (xs @ [x])) (List (ys @ [y]))
        | (List l1, List l2) -> l1.Length = l2.Length && List.forall2 eqvPrim l1 l2
        | _ -> false

Now we wrap the result in a Bool. Doing it this way avoid repeating the wrapping in each single line of eqvPrim (thanks to Tobias for spotting this refactoring).

let eqv = function
          | [e1; e2] -> Bool (eqvPrim e1 e2)
          | badArgList -> throw (NumArgs (2, badArgList))

Equal? checks if there is any unpacking scheme that can be used to test equality of the two elements of a two element list.

let equal = function
    | [arg1; arg2] ->
        let unpackEqual = numUnpackEq arg1 arg2 ||
                          strUnpackEq arg1 arg2 ||
                          boolUnpackEq arg1 arg2
        Bool (eqvPrim arg1 arg2 || unpackEqual)
    | argsList -> throw (NumArgs(2, argsList))

We need to define equality of packed primitive types. We do it nicely below.

let tryUnpacker (unpack : LispVal -> 'a) (op : 'a -> 'a -> bool) arg1 arg2 =
    try op (unpack arg1) (unpack arg2) with _ -> false

let numUnpackEq = tryUnpacker unpackNum (=)
let strUnpackEq = tryUnpacker unpackStr (=)
let boolUnpackEq = tryUnpacker unpackBool (=)

The apply statement maps more or less directly to our apply function.

applyProc = function
            | [func; List args] -> apply func args
            | func :: args -> apply func args
            | [] -> throw (Default("Expecting a function, got an empty list"))

And we are left with the I/O processing functions. We are simply wrapping a FileStream in a Port.

    let makePort fileAccess = fileIOFunction (fun fileName ->
                                File.Open(fileName,FileMode.OpenOrCreate, fileAccess) |> Port)

    let closePort = function
                    | [Port(port)] -> port.Close() ; Bool true
                    | _ -> Bool false

We then can read and write from it. Notice how the lack of arguments makes us do it from the standard Console.

let rec readProc port =
    let parseReader (reader:TextReader) = reader.ReadLine() |> readExpr
    match port with
       | [] -> parseReader(System.Console.In)
       | [Port(port)] ->
            use reader = new StreamReader(port)
            parseReader (reader)
       | args -> throw (NumArgs(1, args))

let writeProc objPort =
    let write obj (writer: TextWriter) = writer.Write(showVal obj) ; Bool true
    match objPort with
    | [obj] -> write obj (System.Console.Out)
    | [obj ; Port(port)] ->
        use writer = new StreamWriter(port)
        write obj writer
    | args -> throw (NumArgs(1, args))

There you go. A full evaluator in two blog posts!! Next up, the parser.

July 22, 2011

Write Yourself a Scheme in 48 Hours in F# – Part IV

Filed under: F# — lucabol @ 7:52 am

It is the evaluator turn. It is a big file, let’s see if I can fit it in a single post.

Aptly enough, the most important function is called eval.

eval env = function
| String _ as v -> v
| Number _ as v -> v
| Bool _ as v -> v
| Atom var -> getVar var env
| List [Atom "quote"; v] -> v
| List [Atom "if"; pred; conseq; alt] -> evalIf env pred conseq alt
| List [Atom "load"; fileName] -> load [fileName] |> List.map (eval env) |> last
| List [Atom "set!" ; Atom var ; form] -> env |> setVar var (eval env form)
| List [Atom "define"; Atom var; form] -> define env var (eval env form)
| List (Atom "define" :: (List (Atom var :: parms) :: body)) ->
    makeNormalFunc env parms body |> define env var
| List (Atom "define" :: (DottedList ((Atom var :: parms), varargs) :: body)) ->
    makeVarargs varargs env parms body |> define env var
| List (Atom "lambda" :: (List parms :: body)) -> makeNormalFunc env parms body
| List (Atom "lambda" :: (DottedList(parms, varargs) :: body)) -> makeVarargs varargs env parms body
| List (Atom "lambda" :: ((Atom _) as varargs :: body)) -> makeVarargs varargs env [] body
| List (func :: args) ->
    let f = eval env func
    let argVals = List.map (eval env) args
    apply f argVals
| badForm -> throw (BadSpecialForm("Unrecognized special form", badForm))

This is the core of the evaluator. It takes as an input the LispVal generated by the parser and an environment and returns a LispVal that is the result of the reduction. As a side effect, it occasionally modify the environment. I carefully crafted the previous phrase to maximize the discomfort  of the functional programmers tuned in. Such fun :-)

More seriously (?), here is what it does:

  • If it is a String, Number of Bool, just return it
  • If it is an Atom, return its value
  • If it is a quote  statement, return what is quoted (read your little schemer manual)
  • If it is an if statement, evaluate it (see below)
  • If it is a load statement, load the file (see below) and evaluate each expression using the current environment. Return the last expression in the file
  • If it is a set!, set the value of the passed variable to the evaluated form
  • If it is a define, do almost the same as above (except that you don’t throw if the variable doesn’t exist, but you create it)
  • If it is a define  that defines a function (it has that shape), create a ‘function slot’ in the environment. That is a (functionName,  FuncRecord) pair (see below)
  • If it is a lambda, return the FuncRecord that describe the inline function
  • If it is a function call, evaluate the expression that describe the function (yes, you can do that in Lisp!!), evaluate the arguments, and apply the function to the arguments
  • Otherwise, it must be a bad form, throw it back to the calling function to do something meaningful with it

We have a bunch of ‘see below’ to take care of. We’ll look at them in order.

First the ‘if’ statement. If the evaluated predicate is Bool(True) evaluate the consequent, otherwise evaluate the alternative. For some reason, I wrote it the other way around.

and
    // 1a. If the evaluation of the pred is false evaluate alt, else evaluate cons
    evalIf env pred conseq alt =
        match eval env pred with
        | Bool(false) -> eval env alt
        | _ -> eval env conseq

Then there is the load function. It reads all the test and gets out the list of LispVal contained in it.

let load = fileIOFunction (fun fileName -> File.ReadAllText(fileName)
                                           |> readExprList)

ReadExprList is part of the parser. We’ll look at it then. Sufficient here to say that it takes a string and returns a list of LispVal.

FileIOFunction just encapsulates a common pattern in all the file access functions. I don’t like such mechanical factorization of methods, without any real reusability outside the immediate surroundings of the code. But I like repetition even less.

let fileIOFunction func = function
    | [String fileName] -> func (fileName)
    | [] -> throw (IOError("No file name"))
    | args -> throw (NumArgs(1, args))

A family of functions create FuncRecord given appropriate parameters. I seem to have lost memory of the contortions related to the last one. If I end up having to work again on this code, I’ll need to figure it out again. Note to myself, please comment this kind of code next time.

let makeFunc varargs env parms body =
            Func ({parms = (List.map showVal parms); varargs = varargs; body = body; closure = env})
let makeNormalFunc = makeFunc None
let makeVarargs = showVal >> Some >> makeFunc

apply is the other workhorse function in the evaluator.  The best way to understand it is to start from the bottom (where bindVars starts the line). We are binding the arguments and the variable arguments in the closure that has been passed in. We then evaluate the body. But the body is just a list of LispVal, so we just need to evaluate them in sequence and return the result of the last one.

and apply func args =
    match func with
    | PrimitiveFunc(f) -> f args
    | Func ({parms = parms; varargs = varargs; body = body; closure = closure}) ->
        let invalidNonVarargs = args.Length <> parms.Length && varargs.IsNone
        let invalidVarargs = args.Length < parms.Length && varargs.IsSome  

        if invalidVarargs || invalidNonVarargs
        then
            throw (NumArgs(parms.Length, args))
        else
            let remainingArgs = args |> Seq.skip parms.Length |> Seq.toList
            let evalBody env = body |> List.map (eval env) |> last
            let rec zip xs1 xs2 acc =
                match xs1, xs2 with
                | x1::xs1, x2::xs2 -> zip xs1 xs2 ((x1, x2)::acc)
                | _ -> acc
            let bindVarArgs arg env =
                match arg with
                | Some(argName) -> bindVars [argName, (List remainingArgs)] env
                | None -> env
            bindVars (zip parms args []) closure
                |> bindVarArgs varargs
                |> evalBody
    | funcName -> throw (NotFunction("Expecting a function, getting ", showVal funcName))

This is enough for one post. Next time we’ll finish the evaluator.

July 15, 2011

Write Yourself a Scheme in 48 Hours in F# – Part III

Filed under: F# — lucabol @ 7:12 am

Very often my code ends up having the following form: parse an input to create an intermediate data structure and evaluate the structure to produce an output. Strangely, many years ago, when my code was object oriented, that wasn’t the case. Or at least I wasn’t explicitly aware of it.

When you write an interpreter or a compiler, things always work out like that, but I see the same pattern in almost everything I produce: from financial backtesting to chart libraries. Sometimes when, out of laziness or stupidity, I forego the intermediate structure, I end up in the end having to retrofit it in. Simply processing input and generating output at the same time rarely cuts it. But it is tempting because you get going pretty fast and I’m tricked into it occasionally.

Hence the first thing that I find myself reasoning about is often the particular form of such intermediate structure. In this case it looks like the following:

type Env = (string * LispVal ref) list ref

and FuncRecord = { parms: string list; varargs: string option; body: LispVal list; closure: Env}

and LispVal =
    | Atom of string
    | List of LispVal list
    | DottedList of LispVal list * LispVal
    | Number of int
    | String of string
    | Bool of bool
    | PrimitiveFunc of (LispVal list -> LispVal)
    | Func of FuncRecord
    | Port of System.IO.FileStream

This LispVal structure has one constructor for each kind of expression (production) that is allowed in Scheme. Or at least that ones I support …

It is important that each one stores all the information that is necessary for the evaluator to evaluate the expression. No more, no less. Here is a brief description:

  • Atom: it is a kind of a constant in Scheme. This is probably the worst definition ever given for it. Please read about it in your little schemer book.
  • List: is the main Scheme type. It represents a list of expressions.
  • DottedList: this is the bizarre Scheme way to pass optional parameters
  • Number: is a number :-) You will discover which kind of number when we talk about the parser
  • String : is a string
  • Bool: #t, #f
  • PrimitiveFunc: is the representation for the primitive operators/functions that are burned into the interpreter. It is just a function that takes a list of LispVal and returns a LispVal
  • Func: is a user defined function. Notice that the body of it is simply a list of LispVal. This is why LISP is so powerful. Also notice that a closure gets passed to it for the ‘captured’ variables.
  • Port: is a slightly goofy representation of an in/out stream
  • Anything else (i.e. macros) is not supported, but this would be the first piece to change if they were.

The only remaining code to address is:

type Env = (string * LispVal ref) list ref

This is the symbol table and it is ugly. It is not multithread safe either. But it works and it is close enough to the Haskell version so I decided to retain it. A proper code review would ‘strongly suggest’ rewriting the code to pass it around to each function instead of using ‘ref’ or using the state monad encapsulated in a computation expression. Any of these solutions is left as an exercise to the reader (use the testcases to validate that you get it right).

We could go in many different direction from here. We could talk about:

  • The evaluator
  • The parser
  • The symbol table
  • Error handling

To keep with the top – down approach I’ve been espousing. I’ll probably talk about the evaluator next.

Older Posts »

Theme: WordPress Classic. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.