Exceptions vs. Return Values to represent errors (in F#) – III – The Critical monad

Code for this post is here.

In the last post we looked at some Critical code and decided that, albeit correct, it is convoluted. The error management path obfuscates the underlying logic. Also, we have no way of knowing whether a developer thought about the error path when invoking a function.

Let’s tackle the latter concern first, as it is easier. We want the developer to declaratively tag each method call with something that represents his intent about managing the Contingencies or Faults of the function. Moreover, if the function has contingencies, we want to force the developer to manage them explicitly.

We cannot use attributes for this as function calls happen in the middle of the code, so there is no place to stick attributes onto. So we are going to use higher-order functions to wrap the function calls.

The first case is easy. If the developer thinks that the caller of his code has no way to recover from all the exceptions thrown by a function, he can prepend his function call with the ‘fault’ word as in:

fault parseUser userText

That signals readers of the code that the developer is willing to propagate up all the exceptions thrown by the function parseUser. Embarrassingly, ‘fault’ is implemented as:

let fault f = f

So it is just a tag. Things get trickier when the function has contingencies. We want to find a way to manage them without introducing undue complexity in the code.

We’d like to catch some exceptions thrown by the function and convert them to return values, and then either return such values or manage the contingency immediately after the function call. On top of that, we’d want all of the code written after the function call to appear as clean as if no error management were taking place. Monads (computation expressions, in F# parlance) can be used to achieve these goals.

Last time we introduced a type to represent error return values:

type Result<'a, 'b> =
| Success of 'a
| Failure of 'b
type UserFetchError =
| UserNotFound  of exn
| NotAuthorized of int * exn 

We can then create a computation expression that ‘abstracts out’ the Failure case and lets you write the code as cleanly as if you were not handling errors. Let’s call such a thing ‘critical’. Here is what the final code looks like:

let tryFetchUser3 userName =
    if String.IsNullOrEmpty userName then invalidArg "userName" "userName cannot be null/empty"

    critical {
        let Unauthorized (ex:exn) = NotAuthorized (ex.Message.Length, ex)
        let! userText = contingent1
                            [FileNotFoundException()        :> exn, UserNotFound;
                             UnauthorizedAccessException()  :> exn, Unauthorized]
                            dbQuery (userName + ".user")
        
        return fault parseUser userText
    }

You can compare this with the code you would have to write without the ‘critical’ library (from last post):

let tryFetchUser1 userName =
    if String.IsNullOrEmpty userName then invalidArg "userName" "userName cannot be null/empty"

    // Could check for file existence in this case, but often not (i.e. db)
    let userResult =    try
                            Success(dbQuery(userName + ".user"))
                        with
                        | :? FileNotFoundException as ex        -> Failure(UserNotFound ex)
                        | :? UnauthorizedAccessException as ex  -> Failure(NotAuthorized(2, ex))
                        | ex                                    -> reraise ()
    
    match userResult with
    | Success(userText) -> 
        let user        = Success(parseUser(userText))
        user
    | Failure(ex)       -> Failure(ex)

And with the original (not critical) function:

let fetchUser userName =
    let userText            = dbQuery (userName + ".user")
    let user                = parseUser(userText)
    user

Let’s go step by step and see how it works. First of all, you need to enclose the Critical parts of your code (perhaps your whole program) in a ‘critical’ computation:

    critical {
        ...
    }

This allows you to call functions that return a Result and work with the returned value as if it were the success value. If a Failure is produced instead, it short-circuits the rest of the computation and becomes the overall return value. We will show later how to manage contingencies immediately after the function call.
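The implementation of the ‘critical’ builder is the subject of the next post, but a minimal sketch (my guess, not the actual library code) conveys the idea: ‘let!’ unwraps a Success and short-circuits a Failure.

type CriticalBuilder() =
    // let! : continue with the unwrapped value on Success, stop on Failure
    member x.Bind(result, cont) =
        match result with
        | Success value -> cont value
        | Failure error -> Failure error
    member x.Return(value) = Success value
    member x.ReturnFrom(result : Result<_, _>) = result

let critical = CriticalBuilder()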

The above is illustrated by the following:

        let! userText = contingent1
                            [FileNotFoundException()        :> exn, UserNotFound;
                             UnauthorizedAccessException()  :> exn, Unauthorized]
                            dbQuery (userName + ".user")

Here ‘contingent1’ is a function that returns a Result, but userText has type string. The Critical monad, and in particular the usage of ‘let!’, is what allows the magic to happen.

‘contingentN’ is a function that you call when you want to manage certain exceptions thrown by a function as contingencies. The N part represents how many parameters the function takes.

The first parameter to ‘contingent1’ is a list of pairs (Exception, ErrorReturnConstructor). That means: when an exception of type Exception is thrown, return the result of calling ‘ErrorReturnConstructor(Exception)’ wrapped inside a ‘Failure’ object. The second parameter to ‘contingent1’ is the function to invoke and the third is the argument to pass to it.
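A plausible implementation of ‘contingent1’ (a sketch under my assumptions, not the post’s actual code) compares the runtime type of the thrown exception against the sample exceptions in the list:

let contingent1 (contingencies : (exn * (exn -> 'b)) list) f x =
    try
        Success (f x)
    with ex ->
        // catch only the exception types listed as contingencies;
        // anything else propagates up as a fault
        match contingencies |> List.tryFind (fun (e, _) -> e.GetType() = ex.GetType()) with
        | Some (_, toError) -> Failure (toError ex)
        | None              -> reraise ()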

Conceptually, ‘contingentN’ is a tag that says: if the function throws one of these exceptions, wrap it in the corresponding return value; propagate all the other exceptions. Notice that Unauthorized takes an integer and an exception as parameters, while the ErrorReturnConstructor takes just an exception. So we need to add this line of code:

        let Unauthorized (ex:exn) = NotAuthorized (ex.Message.Length, ex) 

After the contingent1 call, we can then write code as if the function returned a normal string:

        return fault parseUser userText

This achieves what we set out to do at the start of the series:

  • Contingencies are now explicit in the signature of tryFetchUser3
  • The developer needs to indicate for each function call how he intends to manage contingencies and faults
  • The code is only slightly more complex than the non-critical one

You can also decide to manage your contingencies immediately after calling a function. Perhaps there is a way to recover from the problem. For example, if the user is not in the database, you might want to add a standard one:

let createAndReturnUser userName = critical { return {Name = userName; Age = 43}}
let tryFetchUser4 userName =
    if String.IsNullOrEmpty userName then invalidArg "userName" "userName cannot be null/empty"

    critical {
        let Unauthorized (ex:exn) = NotAuthorized (ex.Message.Length, ex) // depends on ex
        let userFound = contingent1
                            [FileNotFoundException()        :> exn, UserNotFound;
                             UnauthorizedAccessException()  :> exn, Unauthorized]
                            dbQuery (userName + ".user")
        match userFound with
        | Success(userText)        -> return fault parseUser userText
        | Failure(UserNotFound(_)) -> return! createAndReturnUser(userName)
        | Failure(x)               -> return! Failure(x)
    }

The only difference in this case is the usage of ‘let’ instead of ‘let!’. This exposes the real return type of the function allowing you to pattern match against it.

Sometimes a simple exception-to-return-value mapping might not be enough and you want more control over which exceptions to catch and how to convert them to return values. In such cases you can use contingentGen:

let tryFetchUser2 userName =
    if String.IsNullOrEmpty userName then invalidArg "userName" "userName cannot be null/empty"

    critical {
        let! userText = contingentGen
                            (fun ex -> ex :? FileNotFoundException || ex :? UnauthorizedAccessException)
                            (fun ex ->
                                match ex with
                                       | :? FileNotFoundException       -> UserNotFound(ex)
                                       | :? UnauthorizedAccessException -> NotAuthorized(3, ex)
                                       | _ -> raise ex)
                            (fun _ -> dbQuery (userName + ".user"))
        
        return fault parseUser userText
    }

The first parameter is a lambda describing when to catch an exception. The second lambda translates between exceptions and return values. The third lambda wraps the function to invoke.
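Here is a sketch of what ‘contingentGen’ itself might look like (an assumed shape, not the post’s code); the ‘when’ guard lets unmatched exceptions propagate up as faults:

let contingentGen isContingency toError thunk =
    try
        Success (thunk ())
    with ex when isContingency ex ->
        Failure (toError ex)

Under this assumption, ‘contingentN’ is just a specialization that builds the predicate and the mapper from the list of (exception, constructor) pairs.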

Sometimes you might want to catch all the exceptions that a function might throw and convert them to a single return value:

type GenericError = GenericError of exn

// Wrapper that prevents exceptions from escaping the method by wrapping them in a generic critical result
let tryFetchUserNoThrow userName =
    if String.IsNullOrEmpty userName then invalidArg "userName" "userName cannot be null/empty"

    critical {
        let! userText = neverThrow1 GenericError dbQuery (userName + ".user")
        
        return fault parseUser userText
    }
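‘neverThrow1’ could plausibly be implemented along these lines (again an assumption, not the post’s code): catch everything and funnel it into a single error constructor.

let neverThrow1 toError f x =
    try
        Success (f x)
    with ex ->
        // no exception escapes: every failure becomes a return value
        Failure (toError ex)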

And sometimes you might want to go the opposite way. Given a function that exposes some contingencies, you want to translate them to faults because you don’t know how to recover from them.

let operateOnExistingUser userName =
    let user = alwaysThrow GenericException tryFetchUserNoThrow userName
    ()
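A sketch of ‘alwaysThrow’ consistent with this usage (my reconstruction; GenericException is presumably an exception type defined alongside GenericError): it unwraps a Success and rethrows any Failure as an exception.

let alwaysThrow makeExn f x =
    match f x with
    | Success value -> value
    | Failure error -> raise (makeExn error) // contingency becomes a fault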

Next time we’ll look at how the Critical computation expression is implemented.

Exceptions vs. Return Values to represent errors (in F#) – II – An example problem

In the previous post, we talked about the difference between Critical and Normal code. In this post we are going to talk about the Critical code part. Ideally, we want:

  • A way to indicate that a particular piece of code (potentially the whole program) is Critical
  • A way to force/encourage the programmer to make an explicit decision on the call site of a function on how he wants to manage the error conditions (both contingencies and faults)
  • A way to force/encourage the programmer to expose contingencies/faults that are appropriate for the conceptual level of the function the code is in (aka don’t expose implementation details of the function, i.e. don’t throw SQLException from a getUser method where the caller is supposed to catch it)

Remember that I can use the word ‘force’ here because the programmer has already taken the decision to analyse each line of code for error conditions. As we discussed in the previous post, in many/most cases such a level of scrutiny is unwarranted.

Let’s use the below scenario to unravel the design:

type User = {Name:string; Age:int}
let fetchUser userName =
    let userText            = dbQuery (userName + ".user")
    let user                = parseUser(userText)
    user

This looks like a very reasonable .NET function and it is indeed reasonable in Normal code, but not in Critical code. Note that the caller likely needs to handle the user-not-in-repository case because there is no way for the caller to check such a condition beforehand without incurring the performance cost of two network roundtrips.

Despite its beauty and simplicity, there are issues with this function in a Critical context:

  • The function throws implementation related exceptions, breaking encapsulation when the user needs to catch them
  • It is not clear from the code if the developer thought about error management (do you think he did?)
  • Preconditions are not checked, what about empty or null strings?

To test our design let’s define a fake dbQuery:

let dbQuery     = function
    | "parseError.user"     -> "parseError"
    | "notFound.user"       -> raise (FileNotFoundException())
    | "notAuthorized.user"  -> raise (UnauthorizedAccessException())
    | "unknown.user"        -> failwith "Unknown error reading the file"
    | _                     -> "FoundUser"

The first two exceptions are contingencies; the caller of fetchUser is supposed to manage them. The unknown.user exception is a fault in the implementation. parseError triggers a problem in the parseUser function.

parseUser looks like this:

let parseUser   = function
    | "parseError"          -> failwith "Error parsing the user text"
    | u                     -> {Name = u; Age = 43}

Let’s now create a test function to test the different versions of fetchUser that we are going to create:

let test fetchUser =
    let p x                 = try printfn "%A" (fetchUser x) with ex -> printfn "%A %s" (ex.GetType()) ex.Message
    p "found"
    p "notFound"
    p "notAuthorized"
    p "parseError"
    p "unknown"

Running the function exposes the problems described above. From the point of view of the caller, there is no way to know what to expect by just inspecting the signature of the function. There is no differentiation between contingencies and faults. The only way to achieve that is to catch some implementation-specific exceptions.

How would we translate this to Critical code?

First, we would define a type to represent the result of a function:

type Result<'a, 'b> =
| Success of 'a
| Failure of 'b

This is called the Either type, but the names have been customized to represent this scenario. We then need to define which kind of contingencies our function could return.

type UserFetchError =
| UserNotFound  of exn
| NotAuthorized of int * exn

So we assume that the caller can manage the fact that the user is not found or not authorized. This type contains an Exception member. This is useful in cases where the caller doesn’t want to manage a contingency, but wants to treat it like a fault (for example when some Normal code is calling some Critical code).

In such cases, we don’t lose important debugging information. But we still don’t break encapsulation because the caller is not supposed to ‘catch’ a fault.

Notice that NotAuthorized contains an int member. This is to show that contingencies can carry some more information than just their type. For example, a caller could match on both the type and the additional data.
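For example, something like the below (illustrative code, not from the post) reacts differently depending on both the case and its extra data:

let describe (result : Result<User, UserFetchError>) =
    match result with
    | Success user                   -> sprintf "found %s" user.Name
    | Failure (UserNotFound _)       -> "user not found"
    | Failure (NotAuthorized (n, _)) -> sprintf "not authorized (code %d)" n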

With that in place, let’s see what the previous function looks like:

let tryFetchUser1 userName =
    if String.IsNullOrEmpty userName then invalidArg "userName" "userName cannot be null/empty"

    // Could check for file existence in this case, but often not (i.e. db)
    let userResult =    try
                            Success(dbQuery(userName + ".user"))
                        with
                        | :? FileNotFoundException as ex        -> Failure(UserNotFound ex)
                        | :? UnauthorizedAccessException as ex  -> Failure(NotAuthorized(2, ex))
                        | ex                                    -> reraise ()
    
    match userResult with
    | Success(userText) -> 
        let user        = Success(parseUser(userText))
        user
    | Failure(ex)       -> Failure(ex)

Here is what changed:

  • Changed name to tryXXX to convey the fact that the method has contingencies
  • Added precondition test, which generates a fault
  • The signature of the function now conveys the contingencies that the user is supposed to know about

But still, there are problems:

  • The code became very long and convoluted, obfuscating the success code path
  • Still, has the developer thought about the error conditions in parseUser and decided to let exceptions get out, or did she forget about it?

The return value crowd at this point is going to shout: “Get over it!! Your code doesn’t need to be elegant, it needs to be correct!”. But I disagree, obfuscating the success code path is a problem because it becomes harder to figure out if your business logic is correct. It is harder to know if you solved the problem you set out to solve in the first place.

In the next post we’ll see what we can do about keeping beauty and being correct.

Exceptions vs. Return Values to represent errors (in F#) – I – Conceptual view

Recently I’ve been reading numerous articles on the age-old question of exceptions vs. return values. There is a vast literature on the topic with very passionate opinions on one side or the other. Below is my view on it.

First of all, I’ll define my terms.

  • Success code path: chunk of code that is responsible to perform the main task of a function, without any regard for error conditions
  • Contingency: an event that happens during the success code path execution that is of interest for the caller of the function.
    • It happens rarely
    • It can be described in terms that the caller understands, it doesn’t refer to the implementation of the function
    • The caller stands a chance to recover from it
  • Fault: an event that happens during the success code path execution that is not expected by the caller of the function.
    • It happens rarely
    • It cannot be described in terms that the caller understands, it requires reference to the implementation of the function
    • The caller has no way to recover from it

Examples of the above for a FetchUser(userName) function:

  • Success code path: the code that retrieves the user from the database
  • Contingency: the fact that the requested user is not in the database
  • Fault: userName = null, divide by zero, stack overflow, …

The difference between Contingency and Fault is not sharp in practice and requires common sense, but it is useful nonetheless. When in doubt, it would appear prudent to consider an event as a Contingency, so that the caller gets a chance to recover.

Ideally, you would like a Contingency to be part of the signature of a function, so that the caller knows about it. On the other hand, a Fault shouldn’t be part of the signature of a function for two reasons:

  • The user cannot recover from it
  • Being part of the signature would break encapsulation by exposing implementation details of the function

The above seems to suggest that Contingencies should be represented as return values and Faults as exceptions. As an aside, in Java the former are represented as checked exceptions, which are part of the signature. We’ll tackle checked exceptions later on.
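In F# terms, and anticipating the shape we end up with in the later posts, the difference would show up in signatures like these (an illustration only, not code from this post):

// contingency visible in the return type
val tryFetchUser : string -> Result<User, UserFetchError>
// faults left to exceptions, invisible in the signature
val fetchUser    : string -> User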

An important point that is often neglected in the discussions on this topic is that there are two categories of applications: applications that care about Contingencies (Critical apps) and applications that don’t (Normal apps). I am of the opinion that the latter category is the largest.

In many cases you can indeed write just the success code path and, if anything goes wrong, you just clean up after yourself and exit. That is a perfectly reasonable thing to do for very many applications. You are trading stability for speed of development. Your application can be anywhere on that continuum.

Examples of Normal apps are: build scripts, utility applications, departmental applications where you can fix things quickly on the user machine, intranet web sites, internet web sites that are purely informative, etc …

Examples of Critical apps are: servers, databases, operating systems, web sites that sell stuff, etc …

For Normal apps, treating Contingencies as Faults is the right thing to do. You just slap a try … catch around your event loop/thread/process and you do your best to get the developer to fix the problem quickly. I think a lot of the angst of the ‘return value crowd’ is predicated on not having this distinction in mind. They are making very valid points regarding Critical apps to a crowd that is thinking about Normal apps. So the two sides are cross-talking.
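That catch-all approach can be as simple as this sketch (illustrative F#, not from the post):

let runNormalApp mainLoop =
    try
        mainLoop ()
    with ex ->
        // log, apologize, get the developer to fix it quickly
        eprintfn "Unexpected error: %s" ex.Message
        exit 1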

Also, in my opinion, the main problem with Java checked exceptions is that they make writing Normal apps as cumbersome as writing Critical apps. So, reasonably, people complain.

The .NET framework decided to use Exceptions as the main way to convey both Faults and Contingencies. By doing so, it makes it easier to write Normal apps, but more difficult to write Critical apps.

For a Critical app or section of code, you’d want to:

  • Express contingencies in the function signature
  • Force/Encourage people to take an explicit decision at the calling site on how they want to manage both Contingencies and Faults for each function call

In the next post, let’s see how we can represent some of this in F#.

Retrieving SQL Server data with type providers and exposing it with ASP.NET Web APIs in F#

For a good introduction on how to use Web APIs in F#, read here. The starting point for type providers is here. This post is about how I solved a practical problem using these technologies.

First, let’s discuss the scenario. In my company, we needed to log usage information for our various applications to a central repository and build a web site to access such information. I went through three different architectures for such a requirement, ending with the set of technologies described in the title.

I found the intersection of the SQL type provider with the ASP.NET Web API to be very sweet. Personally, I think this mix is much better than using WCF Data Services, because the code you have to write and the infrastructure to maintain are significantly smaller.

I suspect that the F# team and the ASP.NET team didn’t talk to each other. It all happens because of a well-defined interface (IQueryable) that both teams happen to work against.

1st version, heavy Javascript

  • SQL Server Express backend
  • WCF Data Services middle tier in C# (autogenerated REST service from a table, you can query from the JS code)
  • Plenty of Javascript in the browser, calling back to the REST web services through Ajax

Here is an example of the kind of manipulation done in the JS layer:

function extractVersions(sessions) {
    var versions = {}
    $.each(sessions, function (i, s) {
        if (typeof versions[s.Version] == 'undefined') {
            versions[s.Version] = 0
        } else {
            versions[s.Version] = versions[s.Version] + 1
        }
    })
    var vs = []
    for (var prop in versions) {
        if (versions.hasOwnProperty(prop))
            vs.push({ Version: prop, Sessions: versions[prop] })
    }
    return vs;
}

Pros:

  • Back-end autogenerated
  • I’ve learned JS pretty well

Cons:

  • Plenty of autogenerated code (crap) to debug into when things go wrong
  • Difficult to customize the back-end
  • Somehow slow, even if today’s JS engines are pretty good
  • More time consuming to create and maintain JS code (compared to F# or C#)
  • Aesthetically unsatisfying, as you really would like to do most manipulations on the server

2nd version, SOAP-like web services

  • SQL Server Express backend
  • SOAP-like web services middle tier in C# (coded by hand), still using the WCF Data Services infrastructure
  • Little Javascript in the browser exclusively for presentation, calling back to the REST web services through Ajax

Here is how one of the web services looks:

[WebGet]
public IEnumerable<VersionSessions> GetVersionsForMonth(string application,
                                                        bool isQS, bool isAD)
{
    var sessions = GetSessionsForMonth(application, isQS, isAD);
    var hash = new Dictionary<string, VersionSessions>();

    foreach (var s in sessions)
    {
        VersionSessions vs;
        var found = hash.TryGetValue(s.Version, out vs);

        if (!found)
            hash[s.Version] = new VersionSessions { Version = s.Version, Sessions = 0 };
        else
            vs.Sessions += 1;
    }

    return hash.Values;
}

Pros:

  • Maintainable business logic code (C# instead of JS)
  • Somehow faster
  • Easier to customize the back-end (just add another web service with the signature you want)

Cons:

  • Still plenty of manipulation code (at least now on the back-end), not very RESTy
  • Feels like the web services are very ad hoc for this particular presentation (i.e. a ViewModel, not a model)
  • Still not trivial to customize the back-end logic, mostly because I was using WCF Data Services, which are opaque

3rd version, using SQL Views and F# type providers

  • Moved the whole app to a dedicated VM
  • SQL Server backend with a significant layer of Views and inline table-valued functions
  • Proper REST web services created through ASP.NET Web API and F# type providers
  • Little Javascript in the browser exclusively for presentation, calling back to the REST web services through Ajax
  • Moved all external ‘reference files’ (i.e. people in QS) to the DB. Before, I was keeping business info in the database and config info in files.

Here is an example of an inline table-valued function:

ALTER FUNCTION [dbo].[FUsers] (@App nvarchar(40))
RETURNS TABLE
AS
RETURN
(
SELECT UserAlias, MAX(Application) AS Application, MAX(Start) AS LastSessionStart,
       MAX(Finish) AS LastSessionFinish,
       DATEDIFF(MINUTE, MAX(Start), MAX(Finish)) AS LastSessionTotalTime,
       COUNT(*) AS Sessions, MAX(Machine) AS Machine, MAX(Version) AS Version, MAX(Qs) AS Qs
FROM dbo.VTotalSessions AS s
WHERE (TotalTime > 30) AND (Application = @App)
GROUP BY UserAlias
)

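The declaration of the dbSchema type used below is not shown in the post; with the SQL type provider it presumably looks something like this (the connection string is a placeholder):

open Microsoft.FSharp.Data.TypeProviders

type dbSchema = SqlDataConnection<"Data Source=.;Initial Catalog=AppUsage;Integrated Security=True">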
And here is an example of a type provider based web service. Most of my code works like this:

type usersController() =
    inherit ApiController()

    let db = dbSchema.GetDataContext()

    member x.Get() =
        query {
            for u in db.VUsers do
            select u
        }

    member x.Get(app) =
        query {
            for u in db.FUsers(app) do
            select u
        }

    member x.Post (value:string) = ()
    member x.Put (id:int) (value:string) = ()
    member x.Delete (id:int) = ()

Here is how a query is represented in JS (the whole query gets passed through to SQL, both the JS part and the F# part, IQueryable magick):

function loadOpenSessions(app, qs, cont) {
    var query = new oQuery()
        .From("/users/" + app)
        .Let("lastMonth", lastMonth)
        .Let("twoMinutesAgo", twoMinutesAgo)
        .Where("item => item.LastSessionFinish > $twoMinutesAgo")
        .Orderby("UserAlias")
    commonCall(query, qs, cont)
}

Pros:

  • Purely functional business logic code (SQL), very easy to debug problems by just running the query/view
  • Maximum usage of the SQL Server optimizer. As they say in the SQL Engine Team: “we’ve spent decades optimizing sql query engines. You are unlikely to do better with your ‘for’ loops …”
  • Very easy to customize the back-end, just write F# code to implement GET/POST/PUT/DELETE etc…
  • Moving all the state for the app (reference files included) into the DB makes it much easier to integrate it into the business logic. It all becomes a big query for the optimizer to sort out.
  • No autogenerated code anywhere in the architecture
  • More separation between model and viewmodel. Tables are the model, Views are the ViewModel, F# is just a way to expose such a view model to the world at large

Cons:

  • The routing algorithm for ASP.NET Web API is mysterious. It’s the only black-box piece in the architecture (aka I don’t understand it).
  • Sometimes either SQL is not powerful enough (I abhor Stored Procedures on religious grounds) or something doesn’t map well to a REST metaphor. In such cases I have diluted the metaphor, as in the code below. The good thing is that, being just code, I can do that. I’m not blocked.

type statsController() =
    inherit ApiController()

    let db = dbSchema.GetDataContext()

    member x.Get(app:string, isQs:string, call:string) =
        match call with
        | "WeeklyUsers" ->
            let sessions = query {
                for s in db.VTotalSessionsAndWeeks do
                where (s.Application = app && s.Qs = Nullable (Int32.Parse(isQs)))
                select s
            }
            weeklyUsers sessions
        | "WeeklyTime"  ->
            let sessions = query {
                for s in db.FTotalTime(app, Nullable (Int32.Parse(isQs))) do
                select s
            }
            weeklyTime sessions
        | "HourlyUsers" ->
            let users = query {
                for u in db.FHourlyUsers(app, Nullable (Int32.Parse(isQs))) do
                select u
            }
            hourlyUsers users
        | _ -> failwith "No call with that name"

Writing functional code in C++ V – Miscellaneous and conclusions

Just a couple of trivialities and my parting thoughts.

Nested functions

If your language has lambdas, you don’t need nested function support, because you can implement nested functions with lambdas.

I am a heavy user of nested functions, but I’m of two minds about them. On one side, I like that they sit close to where they are used, avoiding going outside the main function body to understand them. I also like that you don’t need to pass a lot of parameters to them, as they capture the function locals. On the other side, they end up creating the impression that your functions are very long and so, in my eyes, they occasionally reduce readability. The IDE helps you out there (at least VS 11) by allowing you to collapse the lambdas.

An example of a trivial case is below:

BOOST_AUTO_TEST_CASE(NestedFunctions)
{
    int x = 3;
    auto sumX = [&] (int y) { return x + y;};

    BOOST_CHECK_EQUAL(sumX(2), 3+2);
}

And here is a more realistic one (not written in a functional style), where readability is reduced by the three nested functions (among many other things):

bool condor(
            boost::gregorian::date now,  // date to evaluate 
            double spot,                 // spot price underlying
            double v,                    // ATM volatility
            double r,                    // risk free rate
            double step,                 // % of spot price to keep as distance between wings
            int minCallShortDist,        // min distance from the short call strike in steps
            int minPutShortDist,         // min distance from the short put strike in steps
            int minDays,                 // min number of days to expiry
            int maxDays,                 // max number of days to expiry
            double maxDelta,             // max acceptable delta value for shorts in steps
            double minPremium,           // min accepted premium as % of step
            Condor& ret                  // return value
            )
{
    // convert params to dollar signs
    auto stepPr            = round(step * spot);
    auto toUSD             = [stepPr] (double x) { return round(stepPr * x);};
    auto minCpr            = toUSD( minCallShortDist );
    auto minPpr            = toUSD( minPutShortDist );
    auto premiumPr         = toUSD( minPremium );

    // calc strike values for short legs
    auto atm               = round(spot / stepPr) * (long) stepPr;
    auto callShort         = atm + minCpr;
    auto putShort          = atm - minPpr;
    
    auto addDays           = [](boost::gregorian::date d, int dys) -> boost::gregorian::date {
        using namespace boost::gregorian;

        auto toAdd         = days(dys);
        auto dTarget       = d + toAdd;

        return  dTarget;
    };

    // calc min & max allowed expiry dates
    auto minDate           = addDays(now, minDays);
    auto maxDate           = addDays(now, maxDays);
    auto expiry            = calcExpiry(now, 0);

    // find first good expiry
    while(expiry < minDate)
        expiry          = calcExpiry(expiry, +1);

    Greeks g;
    auto scholes           = [=, &g] (double strike, int days, bool CorP) {
        return blackScholesEuro(spot, strike, days, CorP, v, r, g, true);
    };

    // find a condor that works at this expiry
    auto findCondor        = [=, &g, &ret] (int days) -> bool {
        ret.shortCallStrike                = callShort;
        ret.shortPutStrike                 = putShort;
        auto shCallPremium                 = 0.0;
        auto shPutPremium                  = 0.0;

        // find short call strike price < maxDelta
        while(true) {
            shCallPremium                  = scholes(ret.shortCallStrike, days, true);
            if(g.delta <= maxDelta)
                break;
            else
                ret.shortCallStrike        += stepPr;
        }

        // find short put strike price < maxDelta
        while(true) {
            shPutPremium                   = scholes(ret.shortPutStrike, days, false);
            if( (- g.delta) <= maxDelta)
                break;
            else
                ret.shortPutStrike         -= stepPr;
        }

        // check premium is adequate
        ret.longCallStrike                = ret.shortCallStrike + stepPr;
        ret.longPutStrike                 = ret.shortPutStrike  - stepPr;
        auto lgCall                       = scholes(ret.longCallStrike, days, true);
        auto lgPut                        = scholes(ret.longPutStrike,  days, false);
        ret.netPremium                    = shCallPremium + shPutPremium - lgCall - lgPut;

        return ret.netPremium > premiumPr;
    };

    // increases the expiry until it finds a condor or the expiry is too far out
    while (expiry < maxDate) {
        auto days        = (expiry - now).days();
        if(findCondor(days)) {
            ret.year     = expiry.year();
            ret.month    = expiry.month();
            ret.day      = expiry.day();
            return true;
        }
        expiry           = calcExpiry(expiry, +1);
    }

    return false;
}

That is quite a mouthful, isn’t it? But really the function is not that long. It is all these nested functions that make it seem so.

Tuples

Tuples are another feature toward which I have mixed feelings. They are really useful to return multiple results from a function and for rapid prototyping of concepts. But they are easy to abuse. Creating Records is almost always a better, safer way to craft your code, albeit more verbose.

Standard C++ has an excellent tuple library that makes working with tuples almost as easy as in mainstream functional languages. The below function shows creating them, getting their values, passing them to functions and deconstructing them:

BOOST_AUTO_TEST_CASE(Tuples)
{
    auto t = make_tuple("bob", "john", 3, 2.3);
    BOOST_CHECK_EQUAL(get<0>(t), "bob");
    BOOST_CHECK_EQUAL(get<2>(t), 3);

    // yep, compiler error
    //auto k = get<10>(t);

    auto t2(t);
    BOOST_CHECK(t2 == t);

    // passing as argument, returning it
    auto f = [] (tuple<string, string, int, double> t) { return t;};
    auto t1 = f(t);
    BOOST_CHECK(t == t1);

    // automatic deconstruction
    string s1; string s2; int i; double d;
    std::tie(s1, s2, i, d) = f(t);
    BOOST_CHECK_EQUAL(s1, "bob");

    // partial reconstruction
    string s11;
    std::tie(s11, ignore, ignore, ignore) = f(t);
    BOOST_CHECK_EQUAL(s11, "bob");
}

Conclusion

I’m sure there are some fundamental functional features that I haven’t touched on (i.e. currying, more powerful function composition, etc…).  Despite that, I think we have enough material here to start drawing some conclusions. To start with, where is C++ deficient compared to mainstream functional languages for the sake of writing functional code ?

  • Compiler errors: if you make a well-natured error, you often get back from the compiler a long page of rubbish if you are using templates (and you are). You put your goggles on and start swimming through it to find the nugget of gold that is the root cause of your problem.
    I can hear you C++ gurus screaming: come on, come on, it’s not that bad. After a while you get used to it. You become proficient in rubbish swimming. That makes me think of the story of that guy who lived in a strange neighbourhood where, as soon as you open the door of your house, someone kicks you in the nuts. His wife suggested that they move to a better part of town, but the man replied: “no worries, I got used to it”.
  • Syntax oddity: we did our best to make the gods of syntax happy in this series of articles, but we are still far removed from the beauty of Haskell, F#, etc…
  • Advanced features: if you go beyond the basics of functional languages, there is a plethora of interesting features that just make your life so much easier and bring your code to a higher level of abstraction. Things like monads (i.e. the F# async workflow), type providers, Haskell’s lazy evaluation, Haskell’s super strong type system, Clojure’s transactional memory, etc… Boost is trying to bring many of these things to C++, but there is still a big gap to be filled.
  • Temptation to cheat: you can cheat in many functional programming languages (i.e. having mutable variables), but nowhere is the temptation as strong as in C++. You need an iron discipline not to do it. The fact that sometimes cheating is the right thing to do makes it even worse.

Overall, am I going to use C++ now as my main programming language? Not really. I don’t have one of those. My general view is to always use the best language for the job at hand (yes, I even use Javascript for web pages).

C++ has some unique characteristics (i.e. speed, cross compilation & simple C interfacing). I’m going to use it whenever I need these characteristics, but now I’ll be a happier user knowing that I can write decent functional code in it.

Writing functional code in C++ IV – Algebraic datatypes

And here comes the guilt bit. I have the strong suspicion (but not certainty) that what I am doing here can be done with templates, but I didn’t take the time to do it. With that out of the way, let’s go.

Code for this post is here. Thanks to Steve Bower and Andy Sawyer for reviewing it.

Algebraic datatypes (discriminated unions in F#) are a powerful concept in functional programming. They are the main way to represent type variation in your program. Very roughly, where object orientation uses derivation, functional programming uses algebraic datatypes. An entire book could be written on the theory of this, but the goal of this post is to see how we can map them to C++ without losing C++ness.

Whenever I talk about this with C++ programmers, they always point me to boost variant. That doesn’t do it for me for several reasons.

First of all, boost variants represent one of a fixed collection of types. Algebraic datatypes represent one of a fixed collection of named types. That means that a simple thing like the code below cannot be represented as a variant:

type LivingEntity =
| Person of string  // name
| Dog of string     // name

Ok, ok, maybe you could represent it by ‘typifying’ things using boost strong typedef, but things get ugly syntactically. Moreover, a lot of the time the name is all you care about …

type Switch = On | Off

Are we going to strong typedef for such a simple thing? Oh boy. Even accepting the syntactic weight of all this, there are other issues in play. Discriminated unions are used extensively in functional programs, so you want a nice syntax to work with them. Something like the below F# syntax:

let print living =
    match living with
    | Person(name) -> printfn "I'm a person named %s" name
    | Dog(name)    -> printfn "I'm a dog named %s" name

Which could be made even sweeter by using the ‘function’ keyword as below:

let print = function
    | Person(name) -> printfn "I'm a person named %s" name
    | Dog(name)    -> printfn "I'm a dog named %s" name

In boost variant, you either use the get<type> functions or you write a visitor function. In the first case you are probably going to write a chain of ‘if’ statements or a ‘switch’ statement. Both are confusing and come with plenty of syntactic weight. I don’t really want to write a visitor like the one below for each ‘match’ in my code. The magic is gone.

class times_two_visitor
    : public boost::static_visitor<>
{
public:

    void operator()(int & i) const
    {
        i *= 2;
    }

    void operator()(std::string & str) const
    {
        str += str;
    }

};

Ok, so boost variant doesn’t really work for this. Remember that our overarching goal was to stay close to C++. The language itself has something that comes pretty close to what we want in the form of a union, or better a tagged union. Again, the types are not named, but maybe we can work that in.

It turns out that Jared here did all the hard work. The general idea is to use macros to hide the construction of a tagged union with methods to test the type and return the contained value.

For example this code:

DU_DECLARE(LivingEntity)
    DU_VALUE(Person,    string)
    DU_VALUE(Dog,       string)
DU_END

Becomes something like:

struct LivingEntity {
    private:
        LivingEntity() {}
        unsigned int m_kind;
    public:
        static LivingEntity Person(const string& value) {
            LivingEntity unionValue;
            unionValue.m_kind = 19;
            unionValue.m_Person = value;
            return unionValue; }

        bool IsPerson() const {
            return m_kind == 19;
        }
        const string& GetPerson() const {
            (void)( (!!(m_kind == 19)) || (_wassert(L"m_kind == __LINE__", L"c:discriminated_union.cpp", 19), 0) );
            return m_Person; }
        string GetPerson() {
            (void)( (!!(m_kind == 19)) || (_wassert(L"m_kind == __LINE__", L"c:discriminated_union.cpp", 19), 0) );
            return m_Person; }
    private:
        string m_Person;

You can see the outline of a tagged union (i.e. m_kind) with a constructor for each type (i.e. Person) and methods to test for a type and return its value. You can also see the storage for the value (i.e. m_Person).

The only thing in DU_DECLARE that is different from Jared’s solution is the typedef below that allows not repeating the LivingEntity name in each DU_VALUE.

#define DU_DECLARE(name)                        \
    struct name {                               \
    private:                                    \
        typedef name unionName;                 \
        name() {}                               \
        unsigned int m_kind;                    \
    public:
#define DU_VALUE(entryName, entryType)                                                                      \
        static unionName entryName(const entryType& value) {                                                \
            unionName unionValue;                                                                           \
            unionValue.m_kind = __LINE__;                                                                   \
            unionValue.m_##entryName = value;                                                               \
            return unionValue;  }                                                                           \
        bool Is##entryName() const { return m_kind == __LINE__;}                                            \
        const entryType& Get##entryName() const { assert(m_kind == __LINE__); return m_##entryName; }       \
        entryType Get##entryName() { assert(m_kind == __LINE__); return m_##entryName; }                    \
    private:                                                                                                \
        entryType m_##entryName;                                                                            \
    public:

With all of that at your disposal it becomes easy to write:

    auto entity = LivingEntity::Dog("Bob");

    DU_MATCH(entity)
        DU_CASE(Dog,   BOOST_CHECK_EQUAL(value, "Bob");)
        DU_CASE(Person,BOOST_CHECK(false);) 
    DU_MATCH_END

There are some beautiful things about this. First of all, the construction of any such type is super simple. You even get intellisense!

Moreover the ‘value’ variable contains whatever was passed in the constructor for the object. So this is semantically equivalent, if not syntactically, to the match statement in F#.

Obviously the code part is not limited to a single instruction:

    DU_MATCH(entity)
        DU_CASE(Dog,
            cout << "I should be here";
            BOOST_CHECK_EQUAL(value, "Bob");
        )
        DU_CASE(Person,
            BOOST_CHECK(false);
        ) 
    DU_MATCH_END

And for those of you addicted to braces, I venture:

    DU_MATCH(entity)
        DU_CASE(Dog,
        {
            cout << "I should be here";
            BOOST_CHECK_EQUAL(value, "Bob");
        })
        DU_CASE(Person,
        {
            BOOST_CHECK(false);
        }) 
    DU_MATCH_END

They all work with the same macro definition. They expand to something along the lines of:

    if(false) {}
        else if(entity.IsDog()) {
            auto value = entity.GetDog();
            BOOST_CHECK_EQUAL(value, "Bob");
        }
        else if(entity.IsPerson()) {
            auto value = entity.GetPerson();
            BOOST_CHECK(false);
        } 
        else {
            throw match_exception();
        }

I’ve not reached the pinnacle of macro naming mastery with this one. Making them lowercase and risking a bit more on the conflict side would make the syntax much more palatable. I call it, as it is, not too bad.

The last ‘else’ clause assures that if you add a new type to the discriminated union and forget to update one of the ‘MATCH’ clauses, at least you get a run time error. That is not good. Functional languages would give you a compile time error, which is much better. Maybe with judicious use of templates you can bring the error to compile time.

The macros are trivial:

class match_exception : public std::exception {};

#define DU_MATCH(unionName) { auto du_match_var = unionName; if(false) {}

#define DU_CASE_TAG(entry, ...)                        \
    else if(du_match_var.Is##entry()) {                \
        __VA_ARGS__                                    \
    }

#define DU_CASE(entry, ...)                            \
    else if(du_match_var.Is##entry()) {                \
        auto value = du_match_var.Get##entry();        \
        __VA_ARGS__                                    \
    }

#define DU_DEFAULT(...)                                \
    else if(true) { __VA_ARGS__}

#define DU_MATCH_END else {throw match_exception();} }

Let’s now go back to our initial goal and see how far off we are. We were trying to do the following:

type LivingEntity =
| Person of string  
| Dog of string  


let print = function
    | Person(name) -> printfn "I'm a person named %s" name
    | Dog(name)    -> printfn "I'm a dog named %s" name

And here is what we ended up with:

DU_DECLARE(LivingEntity)
    DU_VALUE(Person,    string)
    DU_VALUE(Dog,        string)
DU_END

auto print(LivingEntity entity) -> void {
    DU_MATCH(entity)
        DU_CASE(Dog,    cout << "I'm a dog named " << value;)
        DU_CASE(Person, cout << "I'm a person named " << value;)
    DU_MATCH_END
}

In our Switch case:

type Switch = On | Off

You get the good-looking:

DU_DECLARE(Switch)
    DU_FLAG(On)
    DU_FLAG(Off)
DU_END

And along the way we lost compile time type safety in the very common case of adding new types to the discriminated union. That’s bad.

Also some of you would strongly dislike the (ab)use of macros. As for me, it looks workable.

Writing functional code in C++ III – Performance of different allocation schemes

Now we know how to represent records and we know how to operate on them using a nice F#-like syntax. But how do we store our records in a data structure in the first place?

Code for this post is here. Thanks to Andy Sawyer and Steve Bower for reviewing this.

As it is often the case, C++ gives you many options that are not available in standard functional languages. A mixed blessing of some sort.

  1. You can store them by value or by pointer
  2. If you store them by pointer, you can use normal pointers or smart pointers
  3. If you store them by smart pointer, you can use different ones

Well, storing them by value is the simplest option, but it has two shortcomings:

  1. You pay the price of a full record copy whenever the object needs to be copied (i.e. when you add/remove records to/from a vector and whenever the vector needs to resize itself)
  2. If you store a record in two containers, you get two different copies of it. If you modify one, the other won’t be modified

The second issue is not a problem in our scenario as records, by definition, are immutable and structurally equivalent (aka equality for them doesn’t mean pointer equality). The first issue is more tricky. Our records get a copy constructor by default which does a ‘memcpy’ of the object. That should be fast. But fast enough?

To answer this question (and more), I embarked on a little performance test that encompasses most ways to store a pod type in a vector-like data structure. For all my tests I used this struct:

struct Record {
    int Id;
    char k1[2];
    char k2[2];
    char k3[2];
    char k4[2];
    char k5[2];
    char k6[2];
    char mem[bigBlock];

    void Lock() {}
    void Unlock() {}
};

By recompiling with different values for ‘bigBlock’ I can test what happens when the record size grows. In all tests a record is initialized with this function:

void record_init(Record& r, int i) {

    r.Id = i;
    strcpy(r.k1, "0");
    strcpy(r.k2, "0");
    strcpy(r.k3, "0");
    strcpy(r.k4, "0");
    strcpy(r.k5, "0");
    strcpy(r.k6, "0");
    memset(r.mem, '-', bigBlock);

}

The tests are specific to the scenario I care about: performing functional-like operations on a Range:

struct to_int {
    typedef int result_type;
    int operator() (const Record& r) const {
        return r.Id;
    };
};

function<bool (const Record&)> filter_f = [](const Record& r) {return r.Id < filterNo;};

template <class Range>
int accumulate_filter(const Range& r) {
    return boost::accumulate(
        r | filtered(filter_f) | transformed(to_int()),
        0,
        plus<int>()); 
}

The usage of a function and a functor is a bit odd. I don’t recall why I did it that way, but it doesn’t matter as it is the same for each test. What the test does is just filter a bunch of records, transform them (map) to ints and sum these ints.

How many repetitions of each test are run, how big the record is and how many records the Range contains are specified in these constants:

const int repetitions = 1000;
const int bigBlock = 1000;
const int howMany = 1000;

Time is kept using this clock that wraps boost::chrono:

typedef boost::chrono::process_cpu_clock the_clock;

struct timer {
    timer(): clock_(the_clock::now()) {}

    the_clock::times elapsed() {
        return (the_clock::now() - clock_).count();
    }

    the_clock::time_point clock_;
};

I have tested the following configurations. Adding records by value:

int normal() {

    vector<Record> v;
    for (int i = 0; i < howMany; ++i) {
        Record r;
        record_init(r, i);
        v.push_back(r);
    }

    return accumulate_filter(v);  
}

I then tested adding records using a pod_vector. This is a data structure described here and in “Imperfect C++”. It is a vector that uses an auto_buffer as the underlying storage. An auto_buffer is a class that uses stack memory if it needs no more than a certain number of bytes specified at compile time, otherwise it uses heap memory. It works well if you know at compile time that most allocations for something take at most N bytes, but some might need more. This situation is surprisingly frequent. Unfortunately, your objects need to be POD to use the pod_vector.

I tested it in the case where the stack space is big enough (podVector<howMany*2>) and when it is not (podVector<howMany/10>).

int podVector() {

    pod_vector<Record, size> v;
    for (int i = 0; i < howMany; ++i) {
        Record r;
        record_init(r, i);
        v.push_back(r);
    }

    return accumulate_filter(v);  
}

I then tested just allocating the memory without freeing it, and using boost::pool in its ‘local’ form, which means that memory is freed when it goes out of scope (stack-like). The main drawback of the latter is that you cannot return the container/range.

int boostallocator(WhichOne which) {

    boost::pool<> p(sizeof(Record));
    vector<Record*> v;
    Record* r;
    for (int i = 0; i < howMany; ++i) {
        if(which == Boost)
            r = (Record*)p.malloc(); // memory freed at function exit
        else
            r = new Record; // oops, memory leak

        record_init(*r, i);
        v.push_back(r);
    }

    return accumulate_filter(v | indirected);
}

Indirected is needed because we are now talking about pointers. I also tested various variations of shared pointers. First a normal shared_ptr, then a shared_ptr initialized with the boost::singleton_pool and finally a pointer initialized with make_shared.

int pointers(WhichOne which) {

    vector<shared_ptr<Record>> v;
    for (int i = 0; i < howMany; ++i) {
        shared_ptr<Record> r;
        if(which == Normal)
            r = shared_ptr<Record>(new Record);
        else if(which == Boost)
            r = shared_ptr<Record>((Record*)record_pool::malloc(), [](void* p) {record_pool::free(p);});
        else if(which == Make)
            r = make_shared<Record>();
        else throw "This kind of pointer doesn't exist";

        record_init(*r, i);
        v.push_back(r);
    }

    return accumulate_filter(v | indirected);
}

Finally, I used a Loki smart pointer. This is a very elegantly designed smart pointer from “Modern C++ Design”. You pass as template parameters the policies you want your smart pointer to have (aka how it should behave). I tested it like so:

typedef Loki::SmartPtr<Record,
                 Loki::RefCounted,
                 Loki::DisallowConversion,
                 Loki::AssertCheck,
                 Loki::DefaultSPStorage,
                 LOKI_DEFAULT_CONSTNESS> RecordPtr;

And using the following, slightly more convoluted code:

int lokipointers(WhichOne) {
    vector<RecordPtr> v;
    for (int i = 0; i < howMany; ++i) {
        RecordPtr r = RecordPtr(new Record());
        record_init(*r, i);
        v.push_back(r);
    }

    auto ret = accumulate_filter(v | transformed(RecordPtr::GetPointer) | indirected);
    return ret;
}

The results of my tests are in this table and at this link. I used VS10 and gcc 4.6.2 on a Windows 7 box. The first number in the (S, N) pair at the top of each column represents the size of the record, the second one represents the number of objects in the vector. To obtain reasonable numbers, the tests have been run with a different number of iterations for each pair, which means that you can compare the results vertically, but not horizontally.

Results

There are a lot of interesting things in this table. Here are my takeaways. They are obviously specific to this single scenario. You might have different scenarios or different takeaways:

  • Stack allocation is not too bad for small objects, especially for gcc
  • Pod_vector is a good overall performer (even if you don’t get the size right); it is unfortunate that it just works with POD types
  • GCC seems to be lagging on the shared_ptr front. Also, the msvc implementation of the make_shared optimization gives a visible advantage
  • There is not much advantage in using the shared pointer with the boost pooled allocator
  • If you can use the boost local pool allocator, you should. It is faster than stack allocation (in this scenario). Remember that the memory is reclaimed when you exit the function …

So here you have it. A small detour on the performance of different schemes for allocating pod types. As is often the case in C++, it depends …

BTW: Andy Sawyer told me about his rough algorithm to decide which STL container to use. I reproduce it here:

[BEGIN]

A decision tree for selecting a sequence container:

– I’m in a rush and don’t want to read the rest: use std::deque

– Do we know ahead of time exactly how many elements will be needed (and will they all fit on our stack!) – If so, use std::array.

– Do we need constant-time random access? (Note that we often *think* we do, but actually don’t – YAGNI) If so, then we can eliminate std::list/std::forward_list as candidates.

– Do we need bidirectional iteration? (Again, we need this less often than we think we do). If so, that eliminates std::forward_list

– Will there be a large number of in-the-middle insertions/removals? If so (and assuming we haven’t already eliminated them), then std::list/std::forward_list (especially when the contained type is expensive to move/copy). In extreme cases, this may be a strong enough requirement to override other constraints (e.g. the need for random access). On the other hand, it may be viable to reduce the cost of move/copy of large contained types by using containers of (smart) pointers.

– Do we need the contents as a contiguous array? If so, use std::vector (and call reserve if we can) – sometimes, anyway; I’ve seen cases where it’s faster to build my collection as a std::deque, then transform it to a std::vector later on.

– Use std::deque.

Or, to put it another way, “use std::deque  unless there’s a good reason not to”.  That’s probably an overly-strong statement, but it’s a reasonable starting point.

Naturally, every case will be different – and no set of ‘rules of thumb’ is going to work every time (this is one of the things which distinguish rules of thumb from laws of physics). Commonly, we don’t know ahead of time which of those criteria is going to have the most significant impact on what we’re doing; when I find myself in that situation, I’ll use std::deque; once I have a firmer idea of the criteria, I’ll go back and revisit that choice (which – quite often – is as simple as changing a typedef). And of course – as with all ‘optimisations’ – measure, measure and measure again!

[END]
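
To illustrate the ‘changing a typedef’ point, here is a minimal sketch of mine (not Andy’s code): if the rest of the program relies only on the interface the sequence containers share, revisiting the container choice is a one-line edit.

#include <deque>
//#include <vector>

// Start with std::deque; once the usage pattern is clearer, revisiting
// the choice is just a matter of swapping this typedef (e.g. to std::vector).
typedef std::deque<int> Sequence;

int sum(const Sequence& s) {
    int total = 0;
    for (Sequence::const_iterator it = s.begin(); it != s.end(); ++it)
        total += *it;
    return total;
}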

Next time, we’ll go back to more traditional functional programming topics.

Letter on investing

In 2007 I wrote a blog post on investing. During the last five years, my view has evolved a bit. People often ask me how to get a financial education. This is the latest email I sent on the topic.

From: Bolognese, Luca
Sent: 04 April 2012 16:52
To: XXX
Subject: RE: A financial education

Disclaimer: these are just my personal opinions, drawn from my 10+ years of investing and from reading books and academic papers about it. I can justify most statements below with some academic reference, but that would make this email too long.

Also someone else might read the same material and come up with a different opinion. It is a field that is probabilistic in nature and doesn’t lend itself to certainties. For example, you can make a lot of money out of an investment and still be wrong to have made it. And conversely.

Our brains don’t work well in such fields.

  • The most important thing in investing is not investing; it is saving the money to invest. If you save enough and are mildly reasonable in your investments (aka you diversify), you are going to be ok. Saving is not mechanically difficult; it is motivationally difficult. The best book I found on the topic is this one. I read earlier editions; this is a new one. I don’t like the investment chapters.
  • After that, you need to make sure that your financial matters are in order (i.e. you are insured against catastrophes, you maximize your tax deductions, etc.). This is the field of personal finance. I’ve been suggesting this book to American audiences. It is on Amazon.co.uk, so it might be the UK version.
  • Now that you have saved money and your finances are in order, you can start thinking about investing. The most important things in investing are: deciding what you believe in, what your risk tolerance is, how important it is for your performance to track the market, what your time horizon is and how much time you want to dedicate to investing.
    • My risk tolerance is high. I’ve spent a lot of time learning about investing. I’m willing to see the value of my portfolio go down without experiencing emotional distress (I’ve tried it)
    • I don’t care about my investment performance tracking the market. I don’t watch financial programs.
    • My time horizon is long. This is money I invest for my retirement.
    • I’m willing to invest time to keep myself up to date on investment topics and manage my portfolio
    • I believe the following (these are the biases that colour my view):
      • Diversification among different sources of returns is to be sought aggressively
      • Most asset classes are not efficient because of:
        • Predictable flaws in the human brain’s processing machinery
        • Institutional constraints on most professional investors (aka they have to track the market they invest in closely)
        • Short-term performance incentives for professional investors
      • Some asset classes are intrinsically more inefficient than others (i.e. emerging market micro stocks compared to US large stocks) because that’s where the causes of inefficiency are strongest
      • The biggest inefficiencies are among asset classes (i.e. stocks vs bonds vs commodities)
      • Most people (myself included) don’t have the time to do the research necessary to invest in individual stocks using the two ways that I believe are ‘right’: quant models and/or fundamental value analysis.
  • Some books to get you started:
    • A good perspective on how to think about the market and the work you need to do if you want to invest in stocks: The Intelligent Investor.
    • Why diversification is important: The Intelligent Asset Allocator. I don’t believe in static asset allocation, but you need to know what it is to believe (or not) in it.
    • Moving beyond a static asset allocation with a quant-based, highly diversified, but simple to implement system: The Ivy Portfolio. I am very tempted to use this one myself.
    • Why asset classes are often mispriced: probably the premier quant shop around. Read most of their quarterly letters.
    • Systems to take advantage of such mispricing are described here. I have obtained copies of all the newsletters sent by the author going back to 1991 and backtested some of the systems. These are the systems I use. But the newsletter is relatively expensive and complex to follow.
    • If you decide to branch out into value analysis of companies: The Essays of Warren Buffett. You would also need to read an introductory and an advanced accounting text. I don’t know UK accounting, so I cannot suggest good candidates.
    • If you are very conservative and want to just invest in Inflation Linked Bonds (or are fearful of markets in general), there is a book for that: here.
  • With all of that out of the way, here is some practical counsel:
    • If you have less than 20,000 pounds to invest, try to find a fund or ETF that invests in a diversified array of asset classes for a small price. I don’t know the UK market, so I cannot suggest one.
    • If you have more than that and are willing to dedicate a few hours a month, you can start implementing some of the systems in the Ivy Portfolio book. Maybe you can integrate it with the GMO 7-year asset class forecasts to push a bit of value bias into your system.
    • For most people, that’s it. If it becomes a lifelong interest / mania, as it is for me, there is a bit more that can be done, but it is unclear to me that the results are necessarily better than those of the simple systems described above.

Cheers,

.luca

Writing functional code in C++ II – Function composition

Function composition is at the core of functional programming. You start by being very confident that certain very small functions are correct, you compose them in well known ways and you end up being very confident that your final program is correct.

You are very confident that the initial functions are correct because they are very small and side effect free. You are very confident that your program is correct because the means of composition are well known and generate functions that are themselves side effect free.

Such ‘almost primitive’ functions operate on data structures. Each functional language has its own set of data structures and ‘almost primitive’ functions. Probably the most famous ones are contained in the Haskell Prelude, but F# has similar ones. LINQ in C# is just such a set.

In this article I use the term functional composition in a broad sense, as ‘putting functions together’, not in the more technical sense of f(g(x)): the dot operator in Haskell or the ‘>>’ operator in F#.

Code for this post is here. Thanks to Steve Bower and Andy Sawyer for review and comments.

Here is an example. The F# code below filters in the even numbers from an array and doubles them. You have two logical operations: filtering and doubling.

let v = [|1;2;3;4;5;6;7;8;9;0|]

let transformed =
    v
    |> Seq.filter(fun i -> i % 2 = 0)
    |> Seq.map ((*) 2)

How would the code look in C++? There are many ways. One is the ‘big fat for loop’ or, in C++ 11, the range-based for loop:

    const int tmp[] = {1,2,3,4,5,6,7,8,9,0};
    const vector<int> v(&tmp[0], &tmp[10]);

    vector<int> filtered;

    for(auto i: v)
    {
        if(i % 2 == 0)
            filtered.push_back(i * 2);
    }

This looks simple enough, apart from the initialization syntax, which gets better in C++ 11. But its simplicity is misleading. We have now lost the information that our computation is composed of a filter and a map. It’s all intermingled. Obviously, the info is still there in some sense, but you need to read each line in your head and figure out the structure on your own. It is not readily apparent to the reader.

Obviously, things get much worse in a large program composed of several loops where each one does non-trivial stuff. As for me, every time I indulge in the ‘big fat for loop’ pattern, I end up regretting it, either immediately or after some time when I have to make changes to the code. So I try to avoid it.

Wait a minute, you might say, what about STL algorithms? Aren’t you supposed to use those?

The syntax to call them is not very compositional. You cannot easily take the result of an algorithm and pass it to another; it is difficult to chain them that way. Everything takes two iterators, even if 99% of the time you are just using begin and end. Moreover, you have to create all sorts of temporary data structures for the sake of not modifying your original one. Once you have done all of that, the syntactic weight of your code overwhelms the simplicity of what you are trying to do.

For example, here is the same code using ‘copy_if’ for the ‘filter’ part and ‘transform’ for the ‘map’ part of the algorithm:

    vector<int> filtered;
    copy_if(begin(v), end(v), back_inserter(filtered), [](int x) { return x % 2 == 0;});
    
    vector<int> transformed;
    transform(begin(filtered), end(filtered), back_inserter(transformed), [](int x) { return x * 2;});

I’m not arguing that STL is a bad library, I think it is a great one. It’s just that its syntax is not very compositional. And that breaks the magic.

Luckily, help is on the way in the form of the Range library and OvenToBoost. These libraries ‘pack’ the two iterators in a single structure and overload the ‘|’ operator to come up with something like the below:

    auto result =
        v
        | filtered(   [] (int i) { return i % 2 == 0;})
        | transformed([] (int i) { return i * 2;});
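
For reference, here is a minimal self-contained program of mine along these lines. It assumes a Boost recent enough for ‘transformed’ to work with language lambdas directly (my tests below need a ‘transformedF’ variation for that), so treat it as a sketch:

#include <iostream>
#include <vector>
#include <boost/range/adaptor/filtered.hpp>
#include <boost/range/adaptor/transformed.hpp>

using namespace boost::adaptors;

int main() {
    std::vector<int> v = {1,2,3,4,5,6,7,8,9,0};

    // Keep the even numbers, then double them.
    auto result = v
        | filtered(   [](int i) { return i % 2 == 0; })
        | transformed([](int i) { return i * 2; });

    for (int i : result)
        std::cout << i << ' ';   // prints: 4 8 12 16 0
    return 0;
}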

If you use Boost lambdas (not sure I’d do that), the syntax gets even better:

    auto result =
        v
        | filtered(   _1 % 2 == 0)
        | transformed(_1 * 2);

Ok, so we regained compositionality and clarity, but what price are we paying? Apart from the dependency on Boost::Range and OvenToBoost, you might think that this code would be slower than a normal for loop. And you would be right. But maybe not by as much as you would think.

The code for my test is here. I’m testing the perf difference between the following pieces of code, engineered to put the most pleasing syntax in a bad light. A for loop:

        int sum = 0;
        for(vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
            if(*it < 50)
                sum += *it * 2;
        return sum;

A Range-based code using language lambdas (transformedF is a variation of transformed):

        auto lessThan50 = v | filtered(    [](int i) { return i < 50;})
                            | transformedF([] (int i) { return i * 2;});

        return boost::accumulate (lessThan50, 0);

A Range-based code using two functors:

        auto lessThan50 = v | filtered(Filterer())
                            | transformed(Doubler());

        return boost::accumulate (lessThan50, 0);

Where:

struct Doubler {
    typedef int result_type;
    int operator() (int i) const { return i * 2;}
};

struct Filterer {
    typedef int result_type;
    bool operator() (int i) const { return i < 50;}
};

A Range-based code using Boost lambdas:

        auto lessThan50 = v | filtered(_1 < 50)
                            | transformed(_1 * 2);

        return boost::accumulate (lessThan50, 0);

And finally, some code using the STL for_each function:

        int sum = 0;
        std::for_each( v.cbegin(), v.cend(),
            [&](int i){
                if( i < 50 ) sum += i*2; 
            });
        return sum;

Notice that I need to run the test a humongous number of times (10000000) on a 100-integer array to start seeing some consistency in the results.
And the differences that I see (in nanosecs below) are not huge:

1>                         Language lambda: {1775645516;1778411400;0}
1>                          Functor lambda: {1433377701;1435209200;0}
1>                            Boost lambda: {1327829199;1326008500;0}
1>                                For loop: {1338268336;1341608600;0}
1>                             STL foreach: {1338268336;1341608600;0}

Truth be told, by changing the code a bit these numbers vary and, in some configurations, the Range code takes twice as long.

But given how many repetitions of the test I need to run to notice the difference, I’d say that in most scenarios you should be ok.

The ‘|>’ operator over collections is the kind of functional composition, broadly defined, that I use the most, but you certainly can use it over other things as well. You’d like to write something like the below:

    BOOST_CHECK_EQUAL(pipe("bobby", strlen, boost::lexical_cast<string, int>), "5");

And you can, once you define the following:

    template< class A, class F1, class F2 >
    inline auto pipe(const A & a, const F1 & f1, const F2 & f2)
        -> decltype(REMOVE_REF_BUG(f2(f1(a)))) {
        return f2(f1(a));
    }

Where REMOVE_REF_BUG is a workaround for a bug in the decltype implementation under msvc (it should be fixed in VS 11):

    #ifdef _MSC_VER
        #define REMOVE_REF_BUG(...) remove_reference<decltype(__VA_ARGS__)>::type()
    #else
        #define REMOVE_REF_BUG(...) __VA_ARGS__
    #endif

I didn’t do the work of creating ‘>>’, ‘<<’ and ‘<|’ F# operators or wrapping the whole lot with better syntax (i.e. operator overloading?), but it certainly can be done. Here is a rough idea of how:
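
This is a minimal sketch of mine, not code from this post, of what an F#-style value pipe could look like via operator overloading. Note that at least one operand must be of class type for a non-member operator overload to be considered, which std::string and lambdas both are:

#include <iostream>
#include <string>

// Pipe a single value through any callable, F#-style. As written this
// overload is greedy; a real version would constrain the types it accepts.
template <class A, class F>
auto operator|(const A& a, F f) -> decltype(f(a)) {
    return f(a);
}

int main() {
    std::string name = "bobby";
    auto len = name | [](const std::string& s) { return s.size(); };
    std::cout << len << std::endl;   // prints 5
    return 0;
}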

Now that we have a good syntax for records and a good syntax for operating over collections, the next instalment will put them together and try to answer the question: how should I create collections of records and operate over them?