Writing functional code in C++ II – Function composition

Function composition is at the core of functional programming. You start by being very confident that certain very small functions are correct, you compose them in well known ways and you end up being very confident that your final program is correct.

You are very confident that the initial functions are correct because they are very small and side effect free. You are very confident that your program is correct because the means of composition are well known and generate functions that are themselves side effect free.

Such ‘almost primitive’ functions operate on data structures. Each functional language has its own set of data structures and ‘almost primitive’ functions. Probably the most famous ones are contained in the Haskell Prelude, but F# has similar ones. LINQ in C# is just such a set.

In this article I use the term functional composition in a broad sense as ‘putting functions together’, not in the more technical sense of f(g(x)): the dot operator in Haskell or the ‘>>’ operator in F#.
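
For reference, composition in that narrow sense is easy enough to sketch in C++ 11 too. Below is a toy compose helper of my own (not part of the code for this post), specialized to int -> int functions to keep it short:

    #include <functional>
    #include <iostream>

    // compose(f, g)(x) == f(g(x)), in the spirit of Haskell's '.' or F#'s '>>'.
    std::function<int(int)> compose(std::function<int(int)> f,
                                    std::function<int(int)> g) {
        return [=](int x) { return f(g(x)); };
    }

    int main() {
        auto add1  = [](int x) { return x + 1; };
        auto twice = [](int x) { return x * 2; };
        std::cout << compose(twice, add1)(3) << std::endl; // prints 8
    }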

Code for this post is here. Thanks to Steve Bower and Andy Sawyer for review and comments.

Here is an example. The F# code below takes the even numbers from an array and doubles them. There are two logical operations: filtering and doubling.

let v = [|1;2;3;4;5;6;7;8;9;0|]

let transformed =
    v
    |> Seq.filter(fun i -> i % 2 = 0)
    |> Seq.map ((*) 2)

How would the code look in C++? There are many ways. One is the ‘big fat for loop’ or, in C++ 11, the range-based for loop:

    const int tmp[] = {1,2,3,4,5,6,7,8,9,0};
    const vector<int> v(&tmp[0], &tmp[10]);

    vector<int> filtered;

    for(auto i: v)
    {
        if(i % 2 == 0)
            filtered.push_back(i * 2);
    }

This looks simple enough, apart from the initialization syntax, which gets better with C++ 11. But its simplicity is misleading. We have now lost the information that our computation is composed of a filter and a map. It’s all intermingled. Obviously, the info is still there in some sense, but you need to read each line in your head and figure out the structure on your own. It is not readily apparent to the reader.

Obviously, things get much worse in a large program composed of several loops where each one does non-trivial stuff. As for me, every time I indulge in the ‘big fat for loop’ pattern, I end up regretting it either immediately or after some time, when I have to make changes to the code. So I try to avoid it.

Wait a minute, you might say, what about STL algorithms? Aren’t you supposed to use those?

The syntax to call them is not very compositional. You cannot easily take the result of an algorithm and pass it to another; it is difficult to chain them that way. Everything takes two iterators, even though 99% of the time you are just using begin and end. Moreover, you have to create all sorts of temporary data structures for the sake of not modifying your original one. Once you have done all of that, the syntactic weight of your code overwhelms the simplicity of what you are trying to do.

For example, here is the same code using ‘copy_if’ for the ‘filter’ part and ‘transform’ for the ‘map’ part of the algorithm:

    vector<int> filtered;
    copy_if(begin(v), end(v), back_inserter(filtered), [](int x) { return x % 2 == 0;});
    
    vector<int> transformed;
    transform(begin(filtered), end(filtered), back_inserter(transformed), [](int x) { return x * 2;});

I’m not arguing that STL is a bad library, I think it is a great one. It’s just that its syntax is not very compositional. And that breaks the magic.

Luckily, help is on the way in the form of the Range library and OvenToBoost. These libraries ‘pack’ the two iterators in a single structure and overload the ‘|’ operator to come up with something like the below:

    auto result =
        v
        | filtered(   [] (int i) { return i % 2 == 0;})
        | transformed([] (int i) { return i * 2;});

If you use Boost lambdas (not sure I’d do that), the syntax gets even better:

    auto result =
        v
        | filtered(   _1 % 2 == 0)
        | transformed(_1 * 2);

OK, so we regained compositionality and clarity, but what price are we paying? Apart from the dependency on Boost::Range and OvenToBoost, you might think that this code would be slower than a normal for loop. And you would be right. But maybe not by as much as you would think.

The code for my test is here. I’m testing the performance difference between the following pieces of code, engineered to put the most pleasing syntax in a bad light. A for loop:

        int sum = 0;
        for(vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
            if(*it < 50)
                sum += *it * 2;
        return sum;

Range-based code using language lambdas (transformedF is a variation of transformed that accepts them):

        auto lessThan50 = v | filtered(    [](int i) { return i < 50;})
                            | transformedF([] (int i) { return i * 2;});

        return boost::accumulate (lessThan50, 0);

Range-based code using two functors:

        auto lessThan50 = v | filtered(Filterer())
                            | transformed(Doubler());

        return boost::accumulate (lessThan50, 0);

Where:

struct Doubler {
    typedef int result_type;
    int operator() (int i) const { return i * 2;}
};

struct Filterer {
    typedef int result_type;
    bool operator() (int i) const { return i < 50;}
};
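
For completeness, here is a minimal self-contained version of the functor-based variant, with the includes I believe are needed (treat the exact header set as an assumption on my part):

    #include <vector>
    #include <iostream>
    #include <boost/range/adaptor/filtered.hpp>
    #include <boost/range/adaptor/transformed.hpp>
    #include <boost/range/numeric.hpp> // boost::accumulate

    using boost::adaptors::filtered;
    using boost::adaptors::transformed;

    struct Doubler {
        typedef int result_type;
        int operator() (int i) const { return i * 2;}
    };

    struct Filterer {
        typedef int result_type;
        bool operator() (int i) const { return i < 50;}
    };

    int main() {
        std::vector<int> v;
        for(int i = 0; i < 100; ++i)
            v.push_back(i);

        auto lessThan50 = v | filtered(Filterer())
                            | transformed(Doubler());

        std::cout << boost::accumulate(lessThan50, 0) << std::endl; // sum of 0..49, doubled
    }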

Range-based code using Boost lambdas:

        auto lessThan50 = v | filtered(_1 < 50)
                            | transformed(_1 * 2);

        return boost::accumulate (lessThan50, 0);

And finally, some code using the STL for_each function:

        int sum = 0;
        std::for_each( v.cbegin(), v.cend(),
            [&](int i){
                if( i < 50 ) sum += i*2; 
            });
        return sum;

Notice that I need to run the test a humongous number of times (10,000,000) on a 100-integer array to start seeing some consistency in the results.
And the differences that I see (in nanosecs below) are not huge:

1>                         Language lambda: {1775645516;1778411400;0}
1>                          Functor lambda: {1433377701;1435209200;0}
1>                            Boost lambda: {1327829199;1326008500;0}
1>                                For loop: {1338268336;1341608600;0}
1>                             STL foreach: {1338268336;1341608600;0}

Truth be told, by changing the code a bit these numbers vary and, in some configurations, the Range code takes twice as long.

But given how many repetitions of the test I need to run to notice the difference, I’d say that in most scenarios you should be ok.

The ‘|>’ operator over collections is the kind of functional composition, broadly defined, that I use the most, but you certainly can use it over other things as well. You’d like to write something like the below:

    BOOST_CHECK_EQUAL(pipe("bobby", strlen, boost::lexical_cast<string, int>), "5");

And you can, once you define the following:

    template< class A, class F1, class F2 >
    inline auto pipe(const A & a, const F1 & f1, const F2 & f2)
        -> decltype(REMOVE_REF_BUG(f2(f1(a)))) {
        return f2(f1(a));
    }

Where REMOVE_REF_BUG is a workaround for a bug in the decltype implementation under msvc (it should be fixed in VS 11):

    #ifdef _MSC_VER
        #define REMOVE_REF_BUG(...) remove_reference<decltype(__VA_ARGS__)>::type()
    #else
        #define REMOVE_REF_BUG(...) __VA_ARGS__
    #endif

I didn’t do the work of creating the ‘>>’, ‘<<’ and ‘<|’ F# operators or wrapping the whole lot in better syntax (operator overloading, perhaps?), but it certainly can be done.
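
For instance, here is a toy sketch of my own of a pipe-like ‘|’ over single values, done with operator overloading (not part of the post’s code; a real version would need more care to avoid clashing with other ‘|’ overloads, although the decltype in the return type already restricts it to callable right-hand sides):

    #include <iostream>
    #include <sstream>
    #include <string>

    // value | function  ==  function(value), an F#-'|>'-like pipe for single values.
    template <class A, class F>
    auto operator|(const A& a, const F& f) -> decltype(f(a)) {
        return f(a);
    }

    int main() {
        auto length   = [](const std::string& s) { return s.size(); };
        auto toString = [](std::size_t n) -> std::string {
            std::ostringstream o; o << n; return o.str();
        };

        std::cout << (std::string("bobby") | length | toString) << std::endl; // prints "5"
    }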

Now that we have a good syntax for records and a good syntax for operating over collections, the next instalment will put them together and try to answer the question: how should I create collections of records and operate over them?

Writing functional code in C++ – Records

This is the first of a series of posts about writing functional code in C++. My goal is different from FC++, which is a full-fledged ‘environment’ to write functional code. Instead, I want to experiment with some of the new C++ 11 language features and see if one can build reasonable-looking functional code and stay pretty close to the language. The idea is to judiciously use macros and external libraries to build a thin layer on top of the language that doesn’t change its performance characteristics (aka it doesn’t slow it down) and integrates well with existing C++ code.

Think of it as an attempt to answer the question: is there a way to write C++ code in a functional style without losing its ‘C++sness’? We won’t attempt to be as type safe or syntactically pleasing as Haskell or F#, but we’ll attempt to stay as fast and flexible as C++ and make it functional enough to be interesting.

Thanks to Steve Bower and Ganesh Sittampalam for reviewing and to Andy Sawyer for giving me so much feedback that this post can be considered a co-authorship. Code for this post is here.

Let’s first talk about the data types that you typically find in a functional language. Let’s start with Records. They are not part of the functional model per se. You can do without them and just use algebraic data types and tuples. But they are damn convenient, and most functional languages (e.g. Haskell, Clojure, etc.) have them now. We start from them because they map naturally to C++. Records are just like structs, but immutable and with structural equality.

In F# your vanilla record looks like this:

type Person = {
    Name: string
    Salary: int
}

Nice and simple. With C++, things get more verbose. A first attempt would be:

    struct Person {
        const string Name;
        const int Salary;
    };

This looks nice and easy, but doesn’t quite work, because more often than not you need to be able to compare two records for structural equality (meaning that the values of their fields, not their addresses in memory, define equality). Records in functional languages support that automatically. But the syntax gets on the ugly side:

    struct Person {
        const string Name;
        const int Salary;

        bool operator==(const Person& other) const { return Salary == other.Salary && Name == other.Name;}
        bool operator!=(const Person& other) const { return !(*this == other);}
    };

We’ll see how to simplify the syntax later. The syntax on the creation side is not too bad:

Person p = {"Bobby", 2};
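
Putting definition and creation together, here is a minimal self-contained check of the structural equality we get (an example of mine):

    #include <cassert>
    #include <string>
    using std::string;

    struct Person {
        const string Name;
        const int Salary;

        bool operator==(const Person& other) const { return Salary == other.Salary && Name == other.Name;}
        bool operator!=(const Person& other) const { return !(*this == other);}
    };

    int main() {
        Person p1 = {"Bobby", 2};
        Person p2 = {"Bobby", 2};
        Person p3 = {"Alice", 3};

        assert(p1 == p2); // same field values      => structurally equal
        assert(p1 != p3); // different field values => not equal
    }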

Let’s call this representation the Obvious one. Let’s consider two variations on this scheme. The first one is useful if you want to make your records interoperable with C or with other C++ compilers.

A full discussion of how to achieve these goals would be very long. It would involve discussing what POD types are and how their definition became more precise in C++ 11.

You can look at my experiments with POD, standard-layout and trivially constructible types here. My summary is pretty simple: if you want to achieve all the goals above, you have to use C structs that contain C-compatible types. No const, strings or STL containers allowed.

The above class would then become:

    struct Person {
        char Name[20];
        int Salary;

        bool operator==(const Person& other) const { return Salary == other.Salary && !strcmp(Name,other.Name);}
        bool operator!=(const Person& other) const { return !(*this == other);}
    };

Obviously, not being able to use strings or STL collections in your record is a big limitation, but in certain cases you might be able to live with it. Let’s call this representation the Pod one.

You would think you can make the syntax better by doing something like the below:

    struct _Person {
        int id;
        char name[20];

        pod_equality(_Person);
    };    

Where pod_equality is defined as below:

        #define pod_equality(Record)                                                                 \
               bool operator==(const Record& other) const {                                          \
                      static_assert(std::is_trivial<Record>::value, "Not trivially copyable");       \
                      return memcmp(this, &other, sizeof(Record)) == 0;}                             \
               bool operator!=(const Record& other) const { return !(*this == other);}

But you would be wrong (as I was for a while), as comparing memory values doesn’t work in this case because of the padding that the compiler can insert between fields. Such padding can contain random values (whatever was on the stack, for example), which would make the equality check return false for structurally equal objects. This scheme also fails for floating-point types (e.g. NaNs and signed zeros).
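
The floating-point pitfall is easy to demonstrate (a small example of mine): +0.0 and -0.0 compare equal with ‘==’, but have different bit patterns, so a memcmp-based equality disagrees with the field-wise one.

    #include <cstring>
    #include <iostream>

    struct Point { double x; };

    int main() {
        Point a = { +0.0 };
        Point b = { -0.0 };

        std::cout << (a.x == b.x) << std::endl;                              // 1: structurally equal
        std::cout << (std::memcmp(&a, &b, sizeof(Point)) == 0) << std::endl; // 0: bitwise different
    }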

An alternative representation for records, which nicely separates constness from the structure of your record, is below. It does have some advantages that we’ll look at, but in its raw form it is yet more syntax:

namespace Mutable {
    struct Person {
        string Name;
        int Salary;

        bool operator==(const Person& other) const { return Salary == other.Salary && Name == other.Name;}
        bool operator!=(const Person& other) const { return !(*this == other);}
    };
}

typedef const Mutable::Person Person;

Let’s call this representation the Immutable one. Let’s give these guys some usage and talk about their trade-offs. If you want to write a function that increases someone’s salary, you would write it like this:

template<class T>
T rise_salary1(const T& p) {
    T ret = { p.Name, p.Salary + 1000 };
    return ret;
}

This looks nice and clean, unless your record has a lot of fields. And let me tell you, in a real application it probably does. For example:

namespace Mutable {

    struct foo {
        int value1;
        int value2;
        int value3; 
        int value4;
        int value5;
        int value7;
        int value6;
        int value8;
        int value9;
    };
}

typedef const Mutable::foo foo;

foo increment_value7( const foo& f )
{
     foo tmp = { f.value1, f.value2, f.value3, f.value4, f.value5, f.value6, f.value7+1, f.value8 };
     return tmp;
}

Thanks to Andy for this example. BTW: did you spot the bug at first sight? What about the other one?

So this syntax is problematic. Truth be told, part of the problem is the sub-optimal initialization syntax in C++. If you could use named parameters, it would be more difficult to introduce bugs, but the syntax would still be ugly. You really need something like the F# syntax:

let r1 = {f = 0.2; k = 3}
let r2 = {r1 with f = 0.1}

Can we do something like that? Well, if we are willing to pass the Mutable one around, we can write this:

template<class T>
T rise_salary2(const T& p) {
    T ret(p);
    ret.Salary += 1000;
    return ret;
}

Or even this:

template<class T>
T rise_salary3(T p) {
    p.Salary += 1000;
    return p;
}

But that doesn’t make us happy, does it? The whole point of making things const is that you want them to be const. If you pass around mutable ones, who knows what’s going to happen inside the method?

There is a middle ground that might be acceptable, which is to write functions so that their interface takes immutable records, but inside the function they operate on mutable ones. This is not a bad pattern in general, as having mutable versions of your immutable records might come in useful for optimizing certain algorithms. Luckily the conversion rules of C++ favour the bold, so the below works:

Person rise_salary_m(const Person& p) {
    Mutable::Person ret(p);
    ret.Salary += 1000;
    return ret;        
}

And it doesn’t look too bad either.

Now let’s talk syntax. Defining a record is still a lot of typing (and a lot of reading if you are maintaining the code). F# does it like this:

type Person = {
    Name: string
    Salary: int
}

The best I came up with looks like this:

RECORD2(Person,
          string, Name,
          int,    Salary);

And you need a lot of those macros depending on how many fields your record has. You can write this macro to expand to either the Obvious or Immutable representation trivially. It is a bit more complex for the Pod one because of the interesting C++ array declaration syntax with the number of elements after the name of the field.

For the Obvious one it looks like this:

#define RECORD2(n, t1, f1, t2, f2)                                                            \
    struct n {                                                                                \
        const t1 f1;                                                                          \
        const t2 f2;                                                                          \
                                                                                              \
        bool operator==(const n& other) const { return f1 == other.f1 && f2 == other.f2;}     \
        bool operator!=(const n& other) const { return !(*this == other);}                    \
    };
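
For the Immutable representation, I imagine the expansion goes along these lines (a sketch of mine; RECORD2_IMM is a made-up name, and the macro has to be invoked at namespace scope because it opens the Mutable namespace):

    // Sketch: expands to a mutable struct inside namespace Mutable plus a const typedef.
    #define RECORD2_IMM(n, t1, f1, t2, f2)                                                        \
        namespace Mutable {                                                                       \
            struct n {                                                                            \
                t1 f1;                                                                            \
                t2 f2;                                                                            \
                                                                                                  \
                bool operator==(const n& other) const { return f1 == other.f1 && f2 == other.f2;} \
                bool operator!=(const n& other) const { return !(*this == other);}                \
            };                                                                                    \
        }                                                                                         \
        typedef const Mutable::n n;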

All the usual concerns about macros apply. Moreover all your fields need to have a meaningful == operator.

To summarize, we have found three different representations of records in C++ and tried to alleviate the syntax burden with some macro magick. We’ll go wild in macro-land when we talk about discriminated unions.

A simple scheme to implement Design by Contract in C++

Recently I got interested in C++ again. The new lambda functions in C++ 11 open up a world of opportunities for C++ programmers. I’ll talk more about how you can write functional code in C++ 11 in upcoming posts. For now let’s look at design by contract.

Design by Contract is a development style promoted by Bertrand Meyer and implemented in his own Eiffel programming language. At its core, it advocates the use of preconditions, postconditions and invariants.

An invariant is an assertion that always holds true for a class after it has been fully constructed and whenever code is not executing inside one of its methods. As a user of the class, you always observe the invariant to be true. As an implementer of the class, you can be assured that the invariant is true before a method is entered, and you need to make the invariant true again by the time your method exits.

A precondition is an assertion that needs to hold true at the start of a function for the postcondition to be true at the end of it. Taken together, invariant, precondition and postcondition define the contract between the implementer and the user of a class.

Code for this post is here and here. Thanks to Andy Sawyer, Steve Bower and Ganesh Sittampalam for reviewing my code and suggesting improvements.

Preconditions are simple and everyone uses them. They are those little if statements that you put at the start of your functions to make sure that the caller has given you the right parameters.

double divide(double x, double y) {
    if(y == 0) throw std::runtime_error("y cannot be 0");
    return x / y;
}

These little ‘if’ statements don’t really make the precondition stand out. They can be confused with other, unrelated, ‘if’ statements that do completely different semantic things. A more readable alternative is:

double divide(double x, double y) {
    requires(y != 0);
    return x / y;
}

Not an impressive difference, for sure, but kind of nice. The evil macro looks like this:

#ifndef ___PRECOND
#define requires(F) {if((!(F))) throw preexception(__FILE__, __LINE__,"Pre-condition failure: " #F);};
#else
#define requires(F)
#endif

Note that the exception maintains information not just about the file and line number of the failure, but also a textual representation of the failed condition. Such things you can do with macro magick.
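
The preexception and postexception classes are not shown in this excerpt; a minimal version, as I imagine it (the real ones in the post’s code may differ), could be as simple as:

    #include <sstream>
    #include <stdexcept>
    #include <string>

    // Hypothetical minimal preexception; postexception would look the same.
    class preexception : public std::runtime_error {
    public:
        preexception(const char* file, long line, const char* msg)
            : std::runtime_error(format(file, line, msg)) {}
    private:
        static std::string format(const char* file, long line, const char* msg) {
            std::ostringstream o;
            o << file << "(" << line << "): " << msg;
            return o.str();
        }
    };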

Postconditions are trickier. In the case of a side-effect free (pure) function, a postcondition asserts something of interest about the return value. In the case of a class, it asserts something of interest about the state of the class before and after the execution of the method.

Let’s start with a pure function. I like to have all my assertions at the start of the function, to allow reasoning about it without looking at implementation details. But that poses the problem that the result is available only at the end of the function. My solution is to enforce this idiom:

double divide(double x, double y) {
    double result;
    requires(y != 0);
    ensures(result < x); // Silly, just to falsify it in tests

    result = x / y;
    return result;
}

So you need to declare your result upfront. That is the biggest limitation of the overall solution, in my opinion. If that is acceptable to you, the trick now is how to execute the postcondition test before the method exits. We can do that by storing a lambda and executing it in the destructor:

typedef std::function<bool ()> ___dbcLambda;

class ___post {
public:
    ___post(const char *file, long line, const char *expr, const ___dbcLambda& postF)
        : _f(postF),
          _file(file),
          _line(line),
          _expr(expr)
    {}

    ~___post()
    {
        if( !std::uncaught_exception() && !_f() )
        {
            throw postexception(_file,_line,_expr);
        }
    }

private:
    const ___dbcLambda _f;
    const char * const _file;
    const long _line;
    const char * const _expr;
};
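
Used directly, before any macro sugar, it works like this (a toy example of mine): the postcondition is evaluated when the ___post object goes out of scope, i.e. on function exit.

    void demo() {
        int result = 0;
        ___post check(__FILE__, __LINE__, "result must stay non-negative",
                      [&]() { return result >= 0; });

        result = 42;    // fine: the check in check's destructor passes
        // result = -1; // this would make the destructor throw a postexception
    }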

You might think that you shouldn’t throw exceptions in a destructor. That is something I never understood about the RAII pattern in C++. If I choose to use exceptions as my error notification method, how am I supposed to get notified if there is a problem releasing a resource in RAII, other than by throwing an exception in the destructor?

Maybe because of this, the standard has an uncaught_exception() function that allows you to check if an exception has been thrown, so that you don’t throw another one during stack unwinding. If you really don’t like throwing in the destructor, feel free to assert.

You might be worried about performance, but you really shouldn’t be, as you can disable all these macros in Release builds.

The macro then creates a ___post object on the stack.

#define ensures(F) \
    int ___UNIQUE_LINE = __LINE__;  \
    auto ___UNIQUE_POST = ___post( __FILE__, __LINE__, "Post-condition failure:" #F, [&](){return (F);});

The UNIQUE stuff is messy business. Part of it is by design: it is used to make sure that each ___post variable has a unique name, so that you can have multiple ‘ensures’ in a function. The other part is a workaround for this msvc bug. Let me know if you want more details. I suspect there is a better way to do it.

Here is the full enchilada …

#define ___MERGE(a, b) a##b

#define ___POST(a) ___MERGE(___postcond,a)
#define ___UNIQUE_POST ___POST(__LINE__)

#define ___LINE(a) ___MERGE(___line, a)
#define ___UNIQUE_LINE ___LINE(__LINE__)
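
To make the name merging concrete, here is how I read the expansion on the earlier divide example (a sketch; it assumes the preexception class from the post’s code and that the ‘ensures’ call sits on line 42 of its file):

    double divide_expanded(double x, double y) {
        double result;

        // requires(y != 0); expands to roughly:
        { if((!(y != 0))) throw preexception(__FILE__, __LINE__, "Pre-condition failure: " "y != 0"); };

        // ensures(result < x); expands to roughly this (the invocation's line number,
        // here 42, is pasted into the variable names so several 'ensures' don't clash):
        int ___line42 = 42;
        auto ___postcond42 = ___post(__FILE__, __LINE__,
                                     "Post-condition failure:" "result < x",
                                     [&]() { return (result < x); });

        result = x / y;
        return result; // ___postcond42's destructor runs here and evaluates the lambda
    }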

The case in which a postcondition is used inside a method of a class is even trickier because the postcondition must be able to compare the state of the class at the entrance of the method to the state of the class at its exit. Assuming a Counter object with an Add method and assuming ‘___pre’ captures the state of the counter at the start of the method, you’d like to write something like:

    void Add(int x) {
        ensuresClass(this->c_ == ___pre.c_ + x);

        c_ += x;
    }

Now, this is tricky. The only way to capture the ‘old’ state in ‘___pre’ is by making a copy of it and storing it there. This is what the code below does:

#define ensuresClass(F) \
    auto ___pre(*this); \
    auto ___UNIQUE_POST = ___post( __FILE__, __LINE__, "Post-condition failure: " #F, [&](){return (F);});

More troubling is the possibility that the class doesn’t have a copy constructor. In that case you explicitly need to associate a value with ‘___pre2’ by passing it as the first parameter to the appropriate macro as in the code below:

    void Add(int x) {
        ensuresClass2(this->c_, c_ == ___pre2 + x);
    }

Which is implemented as follows:

#define ensuresClass2(ASS,F) \
    auto ___pre2(ASS); \
    auto ___UNIQUE_POST = ___post( __FILE__, __LINE__, "Post-condition failure: " #ASS " is ___pre2 in " #F, [&](){return (F);});

And I know about the giant ass …

Now for invariants. The user should implement an isValid() method on his class as below:

    bool isValid() { return c_ >= 0;}

Then he should add an ‘invariant()’ call at the start of each method, at the end of each constructor and at the start of each destructor:

    void Add(int x) {
        invariant();
        requires(x < 10);
        ensuresClass(this->c_ == ___pre.c_ + x);

        c_ += x;
    }

This calls the ‘isValid’ function at the start of the method and at the end of it using the same destructor trick:

#define invariant() \
    if(!(this->isValid())) throw preexception(__FILE__, __LINE__,"Invariant failure"); \
    auto ___UNIQUE_INV = ___post( __FILE__, __LINE__, "Invariant failure", [&](){return this->isValid();});
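
Putting the pieces together on the Counter example, I imagine the full class looks something like this (my own reconstruction; the post’s actual test class may well differ):

    class Counter {
    public:
        explicit Counter(int c) : c_(c) { invariant(); } // invariant checked after construction

        void Add(int x) {
            invariant();                             // isValid() checked on entry and on exit
            requires(x < 10);                        // precondition
            ensuresClass(this->c_ == ___pre.c_ + x); // postcondition against the old state

            c_ += x;
        }

        int Count() const { return c_; }

        bool isValid() { return c_ >= 0; }

    private:
        int c_;
    };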

All the above machinery is not at all equivalent to having such constructs in the language, but it is simple enough and with a decent enough syntax to be interesting.

Now a caveat: I have no idea if any of this works. It does work in my examples, and its behaviour seems reasonably safe to me, but I haven’t tried it out on any big codebase and haven’t stressed it enough to be confident recommending its usage. So use it at your own risk, and let me know how it goes.