Bayesian inference in F# – Part IIa – A simple example – modeling Maia

Let’s start with a simple example: inferring the underlying attitude of a small baby by observing her actions. Let’s call this particular small baby Maia. People always ask her father whether she is a ‘good’ baby or not. Her father started to wonder how he could possibly know that. Being ‘good’ is not a very clear notion, so he chooses to answer a related question instead: whether her attitude is generally happy, unhappy or simply quiet (a kind of middle ground).

/// Underlying unobservable, but assumed stationary, state of the process (baby). Theta.
type Attitude =
| Happy
| UnHappy
| Quiet

Her poor father doesn’t have much to go on. He can only observe what she does. He decides, for the sake of simplicity, to categorize her state at each particular moment as smiling, crying or looking silly (a kind of middle ground).

/// Observable data. y.
type Action =
| Smile
| Cry
| LookSilly


The father now has to decide what it means for Maia to have a happy attitude. Lacking a universal definition of happiness in terms of these actions, he makes one up. Maia would be considered happy if she smiles 60% of the time, cries 20% of the time and looks silly the remaining 20% of the time. He could equally have experimented with “clearly happy/unhappy” babies to come up with those numbers.

/// Data to model the underlying process (baby)
let happyActions = [ Smile, 0.6; Cry, 0.2; LookSilly, 0.2]
let unHappyActions = [Smile, 0.2; Cry, 0.6; LookSilly, 0.2]
let quietActions = [Smile, 0.4; Cry, 0.3; LookSilly, 0.3]
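Since the numbers are made up, it is worth checking that each table really is a probability distribution. The following is only a sketch: the helper name isDistribution and the tolerance are my assumptions, not part of the original code.

```fsharp
type Action = Smile | Cry | LookSilly

let happyActions   = [ Smile, 0.6; Cry, 0.2; LookSilly, 0.2 ]
let unHappyActions = [ Smile, 0.2; Cry, 0.6; LookSilly, 0.2 ]
let quietActions   = [ Smile, 0.4; Cry, 0.3; LookSilly, 0.3 ]

/// Hypothetical helper (not in the original post): true when the
/// probabilities in a table add up to 1, within a small tolerance
/// chosen to absorb floating point rounding.
let isDistribution (table: (Action * float) list) =
    let total = table |> List.sumBy snd
    abs (total - 1.0) < 1e-9

/// True only if all three tables pass the check.
let allValid =
    [ happyActions; unHappyActions; quietActions ]
    |> List.forall isDistribution
```

A table that fails this check would make the sampling code later in the post misbehave, so it is a cheap thing to verify up front.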

What does this mean exactly? Well, this father would call his wife at random times during the day and ask her if Maia is smiling, crying or looking silly. He would then keep track of the numbers and then “somehow” decide what her attitude is. The general idea is simple; the “somehow” part is not. Simulating those calls requires a source of random numbers.

/// Generates a new uniformly distributed number between 0 and 1
let random =
    let rnd = new System.Random()
    fun () -> rnd.NextDouble()

We can now model Maia. We want our model to return a particular action depending on which attitude we assume Maia mostly has. For example, if we assume she is a happy baby, we want our model to return Smile about 60% of the time. In essence, we want to model what happens when the (poor) father calls his (even poorer) wife. What would his wife tell him (assuming a particular attitude)? The general idea is expressed by the following:

/// Process (baby) modeling. How she acts if she is fundamentally happy, unhappy or quiet
let MaiaSampleDistribution attitude =
    match attitude with
    | Happy -> pickOne happyActions
    | UnHappy -> pickOne unHappyActions
    | Quiet -> pickOne quietActions

The ‘pickOne’ function simply picks an action according to its probability of being picked. The name ‘sample distribution’ is statistics lingo for ‘what you observe’, and indeed you can only observe Maia’s actions, not her underlying attitude.

The implementation of pickOne gets technical. You don’t need to understand it to understand the rest of this post. This is the beauty of encapsulation. You can skip to the text after the next code snippet if you want to.

‘pickOne’ works by constructing the inverse cumulative distribution function for the probability distribution described by the happyActions/unHappyActions/quietActions lists. There is an entry on Wikipedia that describes how this works, and I don’t wish to say more here beyond presenting the code.

/// Find the first value whose key is greater than or equal to a key in a seq<'a * 'b>.
/// The seq is assumed to be sorted by key
let findByKey key aSeq =
    aSeq |> Seq.find (fun (k, _) -> k >= key) |> snd

/// Simulate an inverse CDF given values and probabilities
let buildInvCdf valueProbs =
    // Running totals of the probabilities, without the initial 0.
    let cdfValues =
        valueProbs
        |> Seq.scan (fun cd (_, p) -> cd + p) 0.
        |> Seq.skip 1
    // Pairs of (cumulative probability, value), cached for reuse
    let cdf =
        valueProbs
        |> Seq.map fst
        |> Seq.zip cdfValues
        |> Seq.cache
    fun x -> cdf |> findByKey x

/// Picks an 'a in a seq<'a * float> using float as the probability to pick a particular 'a
let pickOne probs =
    let rnd = random ()
    let picker = buildInvCdf probs
    picker rnd
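To make the inverse CDF idea concrete, here is a small worked sketch for the happy case, using fixed draws instead of random ones. The names cumulative and pickWith are mine, and the sketch uses lists rather than sequences, so treat it as an illustration of the technique rather than a copy of the code above.

```fsharp
type Action = Smile | Cry | LookSilly

let happyActions = [ Smile, 0.6; Cry, 0.2; LookSilly, 0.2 ]

/// Each action paired with the running total of the probabilities so far:
/// Smile with 0.6, Cry with about 0.8, LookSilly with about 1.0.
let cumulative =
    happyActions
    |> List.scan (fun cd (_, p) -> cd + p) 0.0
    |> List.tail                                  // drop the initial 0.0
    |> List.zip (List.map fst happyActions)

/// Hypothetical helper: the action selected by a given draw in [0, 1).
/// It returns the first action whose running total reaches the draw.
let pickWith draw =
    cumulative |> List.find (fun (_, cd) -> cd >= draw) |> fst

let low  = pickWith 0.3    // falls in the first 60% of the interval
let mid  = pickWith 0.7    // falls between 0.6 and 0.8
let high = pickWith 0.95   // falls in the last 20%
```

A draw of 0.3 lands in the Smile band, 0.7 in the Cry band and 0.95 in the LookSilly band. Because the draws are uniform, each action is selected in proportion to the width of its band, which is exactly its probability.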

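Another way to describe Maia is more mathematically convenient and will be used in the rest of the post. This second model answers the question: what is the probability of observing an action assuming a particular attitude? The distribution over both actions and attitudes (observable variable and parameter) is called the joint probability. Strictly speaking, the function below returns the probability of an action given an attitude; it becomes a joint probability once it is multiplied by a prior probability for each attitude.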

/// Another, mathematically more convenient, way to model the process (baby)
let MaiaJointProb attitude action =
    match attitude with
    | Happy -> happyActions |> List.assoc action
    | UnHappy -> unHappyActions |> List.assoc action
    | Quiet -> quietActions |> List.assoc action

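“List.assoc” returns the value associated with a key in a list containing (key, value) pairs. (It shipped with early F# releases; in current versions, where it is no longer in the core library, piping the list through List.find (fun (k, _) -> k = action) and then snd does the same job.) Notice that in general, if you are observing a process, you don’t know what its joint distribution is. But you can approximate it by sampling the process many times on babies with known attitudes, which is what MaiaSampleDistribution simulates, and keeping track of the results. So, in theory, if you have a way to experiment with many babies with known attitudes, you can create such a joint distribution.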
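The approximation idea can be sketched as follows: sample a simulated baby many times and count how often each action shows up. This is a self-contained illustration under my own assumptions (the fixed seed, the names table, sample and estimate, and a simple walk over the table in place of pickOne); it is not the code used in the rest of the series.

```fsharp
type Attitude = Happy | UnHappy | Quiet
type Action = Smile | Cry | LookSilly

let happyActions   = [ Smile, 0.6; Cry, 0.2; LookSilly, 0.2 ]
let unHappyActions = [ Smile, 0.2; Cry, 0.6; LookSilly, 0.2 ]
let quietActions   = [ Smile, 0.4; Cry, 0.3; LookSilly, 0.3 ]

/// The action table for a given attitude.
let table attitude =
    match attitude with
    | Happy -> happyActions
    | UnHappy -> unHappyActions
    | Quiet -> quietActions

// Fixed seed: an assumption, used only to make runs repeatable.
let rnd = System.Random(42)

/// Draws one action by walking the table until the running total
/// of the probabilities reaches the random draw.
let sample attitude =
    let draw = rnd.NextDouble()
    let rec walk acc remaining =
        match remaining with
        | [ (action, _) ] -> action               // last entry absorbs rounding error
        | (action, p) :: rest ->
            if acc + p >= draw then action else walk (acc + p) rest
        | [] -> failwith "empty distribution"
    walk 0.0 (table attitude)

/// Estimated probability of an action given an attitude, from n samples.
let estimate n attitude action =
    let hits =
        Seq.init n (fun _ -> sample attitude)
        |> Seq.filter (fun a -> a = action)
        |> Seq.length
    float hits / float n

let smileGivenHappy = estimate 20000 Happy Smile
let cryGivenUnHappy = estimate 20000 UnHappy Cry
```

With enough samples both estimates should settle close to 0.6, the values in the tables. With real babies there is no table to peek at, only the counts.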

We have now modeled our problem; this is the creative part. From now on, it is just execution. We’ll get to that.

Expression tree serialization code posted on Code Gallery

Luke and I worked on this for a week last year, doing pair programming. It is a good sample of how you can serialize LINQ expression trees to XML.

The sample includes these components:

  1. An Expression Tree serialization API: A general purpose XML serialization of Expression Trees. This should work over any expression tree – though there are inevitably bugs. The serialization format is fairly crude, but has been expressive enough to support the variety of expression trees I’ve tried throwing at it.
  2. A wrapper for serializing/deserializing LINQ to SQL queries: A wrapper around the expression serializer allows serializing LINQ to SQL queries and de-serializing into a query against a given DataContext.
  3. A WCF service which accepts serialized query expression trees and executes against a back-end LINQ to SQL: To enable querying across tiers, a WCF service exposes service methods which execute serialized queries. The service implementation deserializes the queries against its LINQ to SQL connection.
  4. An IQueryable implementation wrapping the client side of the WCF service: The client-side calling syntax is simplified by providing an IQueryable implementation. This implementation, RemoteTable, executes queries by serializing the query expression tree and calling the appropriate service. The object model that the service user is able to query against is imported by the WCF service reference per the DataContracts on the LINQ to SQL mapping on the server side.
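To give a flavor of what the first component does, here is a deliberately tiny sketch that walks an expression tree and emits XML. It is not the sample’s code or its XML format: the function toXml, the helper xn, the element and attribute names, and the choice to handle only three node kinds are all assumptions made for illustration.

```fsharp
open System.Linq.Expressions
open System.Xml.Linq

/// Hypothetical shorthand for building an XName.
let xn (name: string) = XName.Get name

/// Hypothetical serializer covering binary, constant and parameter nodes only.
/// Any other node kind raises an exception.
let rec toXml (e: Expression) : XElement =
    match e with
    | :? BinaryExpression as b ->
        XElement(xn "Binary",
                 XAttribute(xn "NodeType", string b.NodeType),
                 toXml b.Left,
                 toXml b.Right)
    | :? ConstantExpression as c ->
        XElement(xn "Constant",
                 XAttribute(xn "Type", c.Type.FullName),
                 string c.Value)
    | :? ParameterExpression as p ->
        XElement(xn "Parameter",
                 XAttribute(xn "Name", p.Name),
                 XAttribute(xn "Type", p.Type.FullName))
    | _ -> failwithf "Node type %O is not handled in this sketch" e.NodeType

// Build the tree for "x + 1" by hand and serialize it.
let x = Expression.Parameter(typeof<int>, "x")
let body = Expression.Add(x, Expression.Constant(1))
let xml = toXml (body :> Expression)
```

The result is a Binary element whose NodeType attribute is Add, with one child for the parameter and one for the constant. A general purpose serializer like the one in the sample has to cover every node kind, plus types, methods and members, which is where the real work is.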

The sample is here. Enjoy!

Bayesian inference in F# – Part I – Background

My interest in Bayesian inference comes from my dissatisfaction with ‘classical’ statistics. Whenever I want to know something, for example the probability that an unknown parameter is between two values, ‘classical’ statistics seems to answer a different and more convoluted question.

Try asking someone what “the 95% confidence interval for X is (x1, x2)” means. Very likely he will tell you that it means that there is a 95% probability that X lies between x1 and x2. That is not the case in classical statistics. It is the case in Bayesian statistics. Also, all the funny business of defining a null hypothesis for the sake of proving it false has always made my head spin. You don’t need any of that in Bayesian statistics. More recently, my discovery that statistical significance is a harmful concept, instead of the bedrock of knowledge I always thought it to be, shook my confidence in ‘classical’ statistics even more.

Admittedly, I’m not that smart. If I have a hard time getting an intuitive understanding of something, it tends to slip from my mind a couple of days after I’ve learned it. This happens all the time with ‘classical’ statistics. I feel like I have learned the thing ten times, because I continuously forget it. This doesn’t happen with Bayesian statistics. It just makes intuitive sense.

At this point you might be wondering what ‘classical’ statistics is. I use the term classical, but I really shouldn’t. Classical statistics is normally just called ‘statistics’ and it is all you learn if you pick up almost any book on the topic (for example the otherwise excellent “Introduction to the Practice of Statistics”). Bayesian statistics is just a footnote in such books. This is a shame.

Bayesian statistics provides a much clearer and more elegant framework for understanding the process of inferring knowledge from data. The underlying question that it answers is: “If I hold an opinion about something and I receive additional data on it, how should I rationally change my opinion?”. This question of how to update your knowledge is at the very foundation of human learning and progress in general (for example the scientific method is based on it). We had better be sure that the way we answer it is sound.
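In symbols, the answer to that question is Bayes’ theorem. Writing θ for the thing you hold an opinion about and y for the data you receive (my choice of notation here), and assuming θ can take only a discrete set of values, the updated opinion is:

```latex
P(\theta \mid y) = \frac{P(y \mid \theta)\, P(\theta)}{\sum_{\theta'} P(y \mid \theta')\, P(\theta')}
```

Here P(θ) is the opinion before seeing the data (the prior), P(y | θ) says how likely the data is under each candidate θ, and P(θ | y) is the opinion after seeing the data (the posterior). The denominator only rescales the result so that the posterior probabilities add up to 1.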

You might wonder how it is possible to go against something that is so widely accepted and taught everywhere as ‘classical’ statistics is. Well, very many things that most people believe are wrong. I always like to cite old Ben on this: “The fact that other people agree or disagree with you makes you neither right nor wrong. You will be right if your facts and your reasoning are correct.”. This little rule has always served me well.

In this series of posts I will give examples of Bayesian statistics in F#. I am not a statistician, which makes me part of the very dangerous category of ‘people who are not statisticians but talk about statistics’. To try to mitigate the problem I enlisted the help of Ralf Herbrich, who is a statistician and can catch my most blatant errors. Obviously I’ll manage to hide some errors so cleverly that not even Ralf will spot them, in which case the fault is mine alone.

In the next post we’ll look at some F# code to model the Bayesian inference process.