Adventure in parserland – parsing lambda expressions in F# – Part III

Let’s start from the lexer. Remember, I wrote this code based on my memory of how a lexer ought to look like. I didn’t read again the relevant chapters in the Dragon book. But I think it came out all right after all.

The tokenStream function we looked at last time takes a LazyList<char> and returns a LazyList<Token>. It uses the unfold method on LazyList to call matchToken on each char until the stream is empty.

let rec tokenStream chars = 
    LazyList.unfold
        (fun chList ->
            match chList with
            | LazyList.Nil -> None
            | chList ->  
                let token, chList' = matchToken chList
                Some(token, chList')
        )
        chars 

A token is what gets passed up to the parser to do syntactic analysis on. It is the vocabulary of our language. The lexer divide a phrase in words, the parser put together the words in a phrase. So, these are the words.

type Token =
    | Name of string
    | Dot
    | OpenParens
    | CloseParens
    | Lambda
    | Def
    | Ws of string
    | NewLine
    | EOF

Matching is a process whereas you try to return the token that you have read plus the list of characters yet to be read. Matching a Token is defined below:

let matchToken = function 
    | LazyList.Nil                 -> EOF, LazyList.empty 
    | LazyList.Cons(h, t) as chars ->
        match h with
        | ch when isWs ch -> matchWs chars
        | ch when isSpecialChar ch -> matchSpecialChar ch t
        | _ -> matchString chars

A token is either nothing, a whitespace, a special char or anything else.

Let’s look at what matching each one of them means.  Matching whitespaces means consuming them and remembering what was consumed.

let matchWs chars = 
    let value, remainingChars = matchSeriesOfChars isWs chars
    Ws value, remainingChars

matchSeriesOfChars takes a predicate and a LazyList of chars and returns the string composed of all the consecutive chars for which the predicate is true, plus, as always, the remaining chars to be matched. In this case the predicate returns true if the char is a whitespace.

To write matchSeriesOfChars I need a function that reverses a LazyList. Not having found such thing, I wrote it.

let reversell l =
    let rec go l' a = match l', a with
                        | LazyList.Nil, a -> a
                        | LazyList.Cons(h, t), a -> go t (LazyList.cons h a)
    go l LazyList.empty

Then I wrote matchSeriesOfChars. The function uses an accumulator. It adds to the front whenever the predicate is true, it reverses it and translates it to a string (I could have reversed the string instead, it might have been better).

let matchSeriesOfChars comparer chars =
    let rec go result = function
        | LazyList.Nil    -> charListToString(reversell result), LazyList.empty 
        | LazyList.Cons(h, t) -> if comparer h then go (LazyList.cons h result) t
                                 else charListToString (reversell result), LazyList.cons h t
    go LazyList.empty chars

These are  predicates we’ll use later on to recognize characters:

let isInString (ch: char) (s: string) = s.IndexOf(ch) <> -1
let isWs (chr: char) = isInString chr wsChars
let isNameChar (chr: char) = not (isInString chr (wsChars + specialChars))
let isSpecialChar ch = isInString ch specialChars

wsChar and specialChars are defined below:

    let wsChars = " \t"
    let charTokens =
        Map.ofList [
            '.' , Dot
            '(' , OpenParens
            ')' , CloseParens
            '\\', Lambda
            '\n', NewLine
         ]
let specialChars = charTokens |> Map.fold (fun s k v -> s + k.ToString()) ""

Getting back to the more important matching functions, matching a special character is defined as a simple lookup in the charToken map:

let matchSpecialChar ch chars = Map.find ch charTokens, chars

We are left with matchString, this simply matches the characters until it finds a char that cannot be part of a name. It then looks it up in a list of special strings. If it finds it, it returns it, otherwise it just returns the name.

let stringTokens =
    Map.ofList [
        "Def", Def    
    ]
let matchString chars =
    let value, remainingChars = matchSeriesOfChars isNameChar chars
    let specialString = Map.tryFind value stringTokens
    if specialString.IsSome
        then specialString.Value, remainingChars
        else Name(value), remainingChars

And we are done with the lexer, all of 100+ lines of it …

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s