Downloading stock prices in F# – Part V – Adjusting historical data

Here is the problem. When you download prices/divs/splits from Yahoo you get a strange mix of historical numbers and adjusted numbers. To be more precise, the dividends are historically adjusted. The prices are not adjusted, but there is one last column in the data for “Adjusted close”. If you don’t know what ‘adjusted’ means in this context read here.


The problem with using the ‘adjusted close’ column is that, for a particular date in the past, ‘adjusted close’ changes whenever the company pays a dividend or splits its stock. So if I retrieve the value on two different days I might get different numbers because, in the meantime, the company paid a dividend. This prevents me from storing a subset of the data locally and then retrieving other subsets later on. It also has the limitation that just the closing price is present while I might need adjusted opening price, adjusted high price or even adjusted volume depending on the operations I want to perform on the data (i.e. calculating oscillators or volume-adjusted moving averages).


The solution I came up with is to download the data and transform it to an ‘asHappened’ state. This state is simply an unadjusted version of what happened in the past. Data in this state is not going to change in the future, which means that I can safely store it locally. I can then produce ‘historically adjusted’ data on demand.
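
The code below works on the observation types defined in an earlier part of the series. They are not repeated in this post, so here is a rough sketch of what they look like; the exact definitions (in particular the units of measure, the helper functions and the Span record used later) are my reconstruction and may differ from the real ones.

open System

// Units of measure for prices, volumes and share counts (my reconstruction).
[<Measure>] type money
[<Measure>] type volume
[<Measure>] type shares

// Small helpers to attach a unit to a plain float, as used by the parsing code later on.
let money (x: float) = x * 1.<money>
let volume (x: float) = x * 1.<volume>
let shares (x: float) = x * 1.<shares>

// A bar of price data.
type PriceObs = { Open: float<money>; High: float<money>; Low: float<money>;
                  Close: float<money>; Volume: float<volume> }

// An observation is either a price bar, a dividend payment or a split (new shares / old shares).
type Event =
    | Price of PriceObs
    | Div of float<money>
    | Split of float

type Observation = { Date: DateTime; Event: Event }

// A date range, used by the download functions further down.
type Span = { Start: DateTime; End: DateTime }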


Ok, to the code. As often happens, I need some auxiliary functions before I get to the core of the algorithms. The first one is a way to compare two observations; I will use it later on to sort a list of observations.

let compareObservations obs1 obs2 =
    if obs1.Date <> obs2.Date then obs2.Date.CompareTo(obs1.Date)
    else
        match obs1.Event, obs2.Event with
        | Price _, Price _ | Div _, Div _ | Split _, Split _
            -> failwith "Two same date/ same kind observations"
        | Price _, _ -> -1
        | _, Price _ -> 1
        | _ -> 0

This is rather simple. If the dates of these observations are different, just compare them. If they are the same, then the two observations cannot be of the same type (i.e. I cannot have two prices for a particular date). Given that they are not of the same kind, then &(&^%!#$!4. Crap, that’ll teach me to put comments in my code! I think I’m putting the price information first, but I’m not sure. Anyhow, my universal excuse for not figuring it out is that the testcases succeed, so I must be doing it right (how lame, testcase-addiction I guess …).


The next auxiliary function is just a wrapper over fold. I always tend to wrap fold calls in a method with a better name because I remember the old times when I didn’t know what fold was. I want a reader of my code to be able to understand it even if they are not familiar with fold (the universal functional Swiss-Army-Knife). This function is a map that needs to know the value of an accumulator to correctly perform its mapping over each element.

let mapAcc acc newAccF newItemF inl =
    let foldF (acc, l) x = newAccF acc x, (newItemF acc x)::l
    let _, out = inl |> List.fold_left foldF (acc, [])
    out

Apart from the implementation details, this function takes an accumulator, an accumulator function, an item function and an input list. For each element in the list it calculates two things:



  1. a new value for the accumulator: newAccumulatorValue = newAccF oldAccValue itemValue

  2. a new value for the item: newItemValue = newItemF accValue itemValue

Maybe there is a standard functional way to do such a thing with a specific name that I’m not aware of. Luke might know. He is my resident fold expert.
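
To make the behavior concrete, here is a tiny usage sketch of mapAcc (my own example, not from the original code): the accumulator is a running sum and each element is replaced by the sum of everything seen before it.

// Replace every element with the sum of the elements that precede it in the input.
let runningSumsBefore xs =
    xs |> mapAcc 0 (fun acc x -> acc + x) (fun acc _ -> acc)

// runningSumsBefore [1; 2; 3; 4] evaluates to [6; 3; 1; 0]. Note that the output
// comes back in reverse order of the input, because mapAcc builds the result list
// by prepending with ::. The functions below sort their input newest-first, so
// their output ends up oldest-first.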


All right, now to the main algorithm.

let asHappened splitFactor observations =
    let newSplitFactor splitFactor obs =
        match obs.Event with
        | Split(factor) -> splitFactor * factor
        | _ -> splitFactor
    let newObs splitFactor obs =
        let date = obs.Date
        let event =
            match obs.Event with
            | Price(p) -> Price(p)
            | Div(amount) -> Div(amount * splitFactor)
            | Split(factor) -> Split(factor)
        {Date = date; Event = event}
    observations
    |> List.sort compareObservations
    |> mapAcc splitFactor newSplitFactor newObs

To understand what’s going on, start from the bottom. I’m taking the observation list downloaded from Yahoo and sorting it using my compareObservations function. I then take the resulting list and apply the previously described mapAcc to it. Here splitFactor is the accumulator, newSplitFactor is the accumulator function and newObs is the function that generates a new value for each item in the list.


NewSplitFactor is trivial: every time it sees a Split observation, it updates the value of the split factor. That’s it. NewObs is rather simple as well. Every time it sees a dividend, it ‘unadjusts’ it by multiplying its amount by the split factor. The end result is to transform the dividends downloaded from Yahoo (which are adjusted) to an unadjusted state. I could have filtered out the price observations before doing all of this and added them back afterward, but I didn’t. It’d probably be slower …
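
A small worked example may help (the dates and numbers are made up by me, and the money helper comes from the type sketch at the top of this post). Suppose Yahoo reports a split-adjusted dividend of 0.25 on 2008-01-10 and a 2:1 split on 2008-06-15:

let downloaded =
    [ {Date = DateTime(2008, 1, 10); Event = Div (money 0.25)};
      {Date = DateTime(2008, 6, 15); Event = Split 2.0} ]

// The observations are sorted newest-first, so the split is seen before the
// dividend. By the time the dividend is reached the accumulated split factor
// is 2.0, and the dividend becomes 0.50<money>: the amount actually paid out
// on that date.
let unadjusted = downloaded |> asHappened 1.0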


Now that I can recreate the state of the world as it was at a particular point in time, what if I want to adjust the data? I can call adjusted:

let adjusted (splitFactor, lastDiv, oFact, hFact, lFact, cFact, vFact) asHappenedObs =
    let newFactor (splitFactor, lastDiv, oFact, hFact, lFact, cFact, vFact) obs =
        match obs.Event with
        | Split(split) ->
            splitFactor * split, lastDiv, oFact, hFact, lFact, cFact, vFact
        | Div(div) -> splitFactor, div, oFact, hFact, lFact, cFact, vFact
        | Price(p) ->
            splitFactor, 0.<money>, oFact / (1. - lastDiv / p.Open),
            hFact / (1. - lastDiv / p.High), lFact / (1. - lastDiv / p.Low),
            cFact / (1. - lastDiv / p.Close), vFact / (1. - lastDiv / p.Close)
    let newObs (splitFactor, lastDiv, oFact, hFact, lFact, cFact, vFact) obs =
        let date = obs.Date
        let event =
            match obs.Event with
            | Price(p) ->
                Price({Open = p.Open / splitFactor / oFact;
                       High = p.High / splitFactor / hFact;
                       Low = p.Low / splitFactor / lFact;
                       Close = p.Close / splitFactor / cFact;
                       Volume = p.Volume / splitFactor / vFact})
            | Div(amount) -> Div(amount / splitFactor)
            | Split(split) -> Split(split)
        {Date = date; Event = event}
    asHappenedObs
    |> List.sort compareObservations
    |> mapAcc (splitFactor, lastDiv, oFact, hFact, lFact, cFact, vFact) newFactor newObs
    |> List.filter (fun x -> match x.Event with Split(_) -> false | _ -> true)

Wow, ok, this looks messy. Let’s go through it. Starting from the bottom: sort the observations, perform the right algorithm and filter away all the splits. It doesn’t make sense to have splits in adjusted data.


The interesting piece is the mapAcc function. It takes a tuple of factors as accumulator and the usual two functions to update such a tuple and create new observations. The newObs function creates a new Observation using the factors in the accumulator tuple. Notice how the dividends are divided by the splitFactor (which is the opposite of our asHappened algorithm, where we were multiplying them). Also notice how the prices are divided by both the splitFactor and the pertinent price factor. This is needed because the prices need to be adjusted by the dividends paid out, and the adjustment factor is different for each kind of price (i.e. open, close, etc…). The newFactor function simply updates all the factors depending on the current observation.
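
Here is a sketch of how I would call it; the choice of starting values is mine rather than from the post, and asHappenedObs stands for whatever list asHappened produced. Start with a split factor of 1.0, no pending dividend and all the per-price adjustment factors at 1.0:

// All factors start neutral; the accumulator then picks up splits and dividends as it goes.
let adjustedObs =
    asHappenedObs |> adjusted (1.0, 0.<money>, 1.0, 1.0, 1.0, 1.0, 1.0)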


Notice how asHappened and adjusted are structurally similar. This is an artifact of having a functional approach to writing code: it kind of forces you to identify these commonalities in the way algorithms behave and abstract them out (in this case into the mapAcc function). You often discover that such abstracted-out pieces are more generally useful than the case at hand.


Downloading stock prices in F# – Part IV – Async loader for splits

Downloading splits is a messy affair. The problem is that Yahoo doesn’t give you a nice comma-delimited stream to work with. You have to parse the HTML yourself (and it can be on multiple pages). At the end of the post, the overall result is kind of neat, but to get there we need a lot of busywork.


First, let’s define a function that constructs the correct URL to download splits from. Notice that you need to pass a page number to it.

let splitUrl ticker span page =
    "http://finance.yahoo.com/q/hp?s=" + ticker + "&a="
    + (span.Start.Month - 1).ToString() + "&b=" + span.Start.Day.ToString() + "&c="
    + span.Start.Year.ToString() + "&d=" + (span.End.Month - 1).ToString() + "&e="
    + span.End.Day.ToString() + "&f=" + span.End.Year.ToString() + "&g=v&z=66&y="
    + (66 * page).ToString()

The reason for this particular URL format (i.e. 66 * page) is completely unknown to me. I also have the feeling that it might change in the future. Or maybe not, given how many people rely on it.
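
Just to illustrate (my own example, using the Span record from the type sketch in an earlier section), asking for page 1 of MSFT’s splits between 2005-01-01 and 2008-12-31 produces a URL along these lines:

let exampleSpan = { Start = DateTime(2005, 1, 1); End = DateTime(2008, 12, 31) }

// Yields something like:
// http://finance.yahoo.com/q/hp?s=MSFT&a=0&b=1&c=2005&d=11&e=31&f=2008&g=v&z=66&y=66
let exampleUrl = splitUrl "MSFT" exampleSpan 1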


I then describe the driver function for loading splits:

let rec loadWebSplitAsync ticker span page splits =
    let parseSplit text splits =
        List.append splits (parseSplits (scrapHtmlRows text)),
        not (containsDivsOrSplits (scrapHtmlCells text))
    async {
        let url = splitUrl ticker span page
        let! text = loadWebStringAsync url
        let splits, beyondLastPage = parseSplit text splits
        if beyondLastPage then return splits
        else return! loadWebSplitAsync ticker span (page + 1) splits }

This is a bit convoluted (it is a recursive Async function). Let’s go through it in some detail. First there is a nested function, parseSplit. It takes an HTML string and a list of observations and returns a tuple of two elements. The first element is the same list of observations augmented with the splits found in the text. The second element is a boolean that is true if we have navigated beyond the last page for the splits.


The function to test that we are beyond the last page is the following:

let containsDivsOrSplits cells =
    // Matches a dollar amount followed by "Dividend" (e.g. "$0.25 Dividend") or a "Stock Split" cell.
    cells |> Seq.exists
        (fun (x:string) -> Regex.IsMatch(x, @"\$.+Dividend", RegexOptions.Multiline)
                           || Regex.IsMatch(x, "Stock Split"))

This function just checks if the words “Stock Split” or “Dividend” are anywhere in the table. If they aren’t, then we have finished processing the pages for this particular ticker and date span.


The function to extract the split observations from the web page takes rows of cells (a seq<seq<string>>) as input and returns an observation list. It is reproduced below:

let parseSplits rows =
    let parseRow row =
        if row |> Seq.exists (fun (x:string) -> x.Contains("Stock Split")) then
            let dateS = Seq.hd row
            let splitS = Seq.nth 1 row
            let date = DateTime.Parse(dateS)
            let regex = Regex.Match(splitS, @"(\d+)\s+:\s+(\d+)\s+Stock Split",
                                    RegexOptions.Multiline)
            let newShares = shares (float (regex.Groups.Item(1).Value))
            let oldShares = shares (float (regex.Groups.Item(2).Value))
            Some({Date = date; Event = Split(newShares / oldShares)})
        else None
    rows |> Seq.choose parseRow |> Seq.to_list

It just takes a bunch of rows and chooses the ones that contain stock split information. For these, it parses the information out of the text and creates a Split Observation out of it. I think it is intuitive what the various Seq functions do in this case. Also note my overall addiction to the pipe operator ( |> ). In my opinion this is the third most important keyword in F# (after ‘let’ and ‘match’).
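
As a quick sanity check of that regular expression (my own example, not from the post): Yahoo renders a split cell roughly as “2 : 1 Stock Split” next to the date cell, and the regex pulls out the new and old share counts.

let m = Regex.Match("2 : 1 Stock Split", @"(\d+)\s+:\s+(\d+)\s+Stock Split")
// m.Groups.Item(1).Value = "2" (new shares), m.Groups.Item(2).Value = "1" (old shares),
// so the observation built above would be Split(2.0).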


Let’s now go back to the loadWebSplitAsync function and discuss the rest of it. In particular this part:

async {
    let url = splitUrl ticker span page
    let! text = loadWebStringAsync url
    let splits, beyondLastPage = parseSplit text splits
    if beyondLastPage then return splits
    else return! loadWebSplitAsync ticker span (page + 1) splits }

First of all it is an Async function. You should expect some Async stuff to go on inside it. And indeed, after forming the URL in the first line, the very next line is a call to loadWebStringAsync. We discussed this one in the previous installment. It just asynchronously loads a string from a URL. Notice the bang after ‘let’. This is your giveaway that async stuff is being performed.


The result of the async request is parsed to extract splits. Also, the beyondLastPage flag is set if we have finished our work. If we have, we return the split observation list; if we haven’t, we do it again incrementing the page number to load the html text from.


Now that we have all the pieces in place, we can wrap up the split loading stuff inside this facade function:

let loadSplitsAsync ticker span = loadWebSplitAsync ticker span 0 []

And finally put together the results of this post and the previous one with the overall function-to-rule-them-all:

let loadTickerAsync ticker span =
    async {
        let prices = loadPricesAsync ticker span
        let divs = loadDivsAsync ticker span
        let splits = loadSplitsAsync ticker span
        let! prices, divs, splits = Async.Parallel3 (prices, divs, splits)
        return prices |> List.append divs |> List.append splits }

All right, that was a lot of work to get to this simple thing. This is a good entry point to our price/divs/split loading framework. It has the right inputs and outputs: it takes a ticker and a date span and returns an Async of a list of observations. Our caller can decide when he wants to execute the returned Async object.


Notice that in the body of the function I call Async.Parallel3. This is debatable. A more flexible solution is to return a tuple containing three Asyncs (prices, divs, splits) and let the caller decide how to put them together. I decided against this for simplicity reasons. This kind of trade-off is very common in Async programming: trading maximum flexibility for your caller against exposing something more understandable.
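
For completeness, the more flexible alternative would look roughly like this (a sketch of the design I decided against, under the same assumptions as the code above):

// Hand the three Asyncs back untouched and let the caller decide how to combine them.
let loadTickerParts ticker span =
    loadPricesAsync ticker span,
    loadDivsAsync ticker span,
    loadSplitsAsync ticker span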


I have to admit I didn’t much enjoy writing (and describing) all this boilerplate code. I’m sure it can be written in a better way. I might rewrite plenty of it if I discover bugs. I kind of like the end result though. loadTickerAsync has an overall structure I’m pretty happy with.


Next post, some algorithms with our observations …

Downloading stock prices in F# – Part III – Async loader for prices and divs

It is now time to load our data. There is a bit of uninteresting code to start with, but things get interesting afterward. Let’s start with functions that create the right URLs to download prices and dividends. We’ll talk about splits in the next installment.

let commonUrl ticker span =
    @"http://ichart.finance.yahoo.com/table.csv?s=" + ticker + "&a="
    + (span.Start.Month - 1).ToString() + "&b=" + span.Start.Day.ToString() + "&c="
    + span.Start.Year.ToString() + "&d=" + (span.End.Month - 1).ToString() + "&e="
    + span.End.Day.ToString() + "&f=" + span.End.Year.ToString()

let priceUrl ticker span = commonUrl ticker span + "&g=d&ignore=.csv"
let divUrl ticker span = commonUrl ticker span + "&g=v&ignore=.csv"

We will also need to construct an observation given a comma-delimited line of text. Again, for splits things will be harder.

let parsePrice (line: string) =
    let tokens = line.Split([|','|])
    { Date = DateTime.Parse(tokens.[0]);
      Event = Price ({Open = money (Double.Parse(tokens.[1]));
                      High = money (Double.Parse(tokens.[2]));
                      Low = money (Double.Parse(tokens.[3]));
                      Close = money (Double.Parse(tokens.[4]));
                      Volume = volume (Double.Parse(tokens.[5]))})}

let parseDiv (line: string) =
    let tokens = line.Split([|','|])
    let date = DateTime.Parse(tokens.[0])
    let amount = money (Double.Parse(tokens.[1]))
    {Date = date; Event = Div amount}

Nothing noteworthy about this code. We have a couple of other infrastructure pieces before we get to the Async ones. The next function is recursive. It takes a StringReader and reads lines out of it. For each line it calls a parsing function that takes the line as input and returns an object as output. The function gathers all such objects in the listOfThings list. If you are new to F#, the construct (parseLineFunc line :: listOfThings) means: execute parseLineFunc with argument line, take the result and create a list that has the result as its head and listOfThings as its tail.

let rec loadFromLineReader (reader: StringReader) listOfThings parseLineFunc =
    match reader.ReadLine () with
    | null -> listOfThings
    | line -> loadFromLineReader reader (parseLineFunc line :: listOfThings) parseLineFunc

The next function is rather uninteresting. It just converts a string to a StringReader, cuts out the first line (the header) and calls loadFromLineReader.

let loadFromLineString text listOfThings parseLineFunc =
    let reader = new StringReader(text)
    reader.ReadLine () |> ignore // skip header
    loadFromLineReader reader listOfThings parseLineFunc

We now come to the first Async function. But what is an Async function? There are several technically correct definitions: it is an instance of the monad pattern, or it is a function that returns an Async object, or it is a way to release your thread to the thread pool. These definitions don’t help me much. I need something intuitive to latch onto.


The way that I personally visualize it is: there are things in the world that are very good at executing certain tasks and like to be hit by multiple parallel requests for these tasks. They’d like me to give them their workload and get out of their way. They’ll call me when they are done with it. These ‘things’ are disk drives, web servers, processors, etc … Async is a way to say: hey, go and do this, call me when you are done.


Now, you can call the asynchronous APIs directly, or you can use the nice F# language structures to do it. Let’s do the latter.

let loadWebStringAsync url =
    async {
        let req = WebRequest.Create(url: string)
        use! response = req.AsyncGetResponse()
        use reader = new StreamReader(response.GetResponseStream())
        return! reader.AsyncReadToEnd() }

This function retrieves a web page as a string asynchronously. Notice that even if the code looks rather normal, this function will likely be executed on three different threads. The first thread is the one the caller of the function lives on. The function AsyncGetResponse causes the thread to be returned to the thread pool while waiting for a response back from the web server. Once such a response arrives, the execution resumes on a different thread until AsyncReadToEnd. That instruction returns the execution thread to the thread pool. A new thread is then instantiated when the string has been completely read. The good thing is that all of this is not explicitly managed by the programmer. The compiler ‘writes the code’ to make it all happen. You just have to follow a set of simple conventions (i.e. putting exclamation marks in the right place).


The return result of this function is an Async<string>, which is something that, when executed, returns a string. I cannot emphasize this enough: always look at the signature of your F# functions. Type inference can be tricky …
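
To see that in practice, here is a minimal way to force the execution (my own example; the method is called Async.RunSynchronously in current F#, while the CTP of the time exposed it as Async.Run):

// Runs the Async on the current thread and waits for the string result.
let html = loadWebStringAsync "http://finance.yahoo.com" |> Async.RunSynchronously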


Async is somewhat contagious. If you call an Async function you have to decide whether to propagate the “Asyncness” to your callers or remove it by executing the function. Often propagating it is the right thing to do, as your callers might want to batch your function with other async ones to be executed together in parallel. Your callers have more information than you do and you don’t want to short-circuit them. The following function propagates asyncness.

let loadFromUrlAsync url parseFunc =
    async {
        let! text = loadWebStringAsync url
        return loadFromLineString text [] parseFunc }

Let’s see how the functions presented to this point compose to provide a way to load prices and dividends (splits will be shown afterward).

let loadPricesAsync ticker span = loadFromUrlAsync (priceUrl ticker span) parsePrice
let loadDivsAsync ticker span = loadFromUrlAsync (divUrl ticker span) parseDiv

This composition of functions is very common in functional code. You construct your building blocks and assemble them to achieve your final goal. Functional programming is good at almost forcing you to identify the primitive blocks in your code. All right, next in line is how to load splits.
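
One last aside before moving on: because these functions return Asyncs rather than results, a caller can batch several of them and run them in parallel. A sketch of the idea (the list of tickers is my example):

// Kick off the price download for every ticker at once and collect the results.
let loadManyPricesAsync tickers span =
    tickers
    |> List.map (fun t -> loadPricesAsync t span)
    |> Async.Parallel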

Downloading stock prices in F# – Part II – Html scraping

Getting stock prices and dividends is relatively easy given that, on Yahoo, you can get the info as a CSV file. Getting the splits info is harder. You would think that Yahoo would put that info in the dividends CSV as it does when it displays it on screen, but it doesn’t. So I had to write code to scrape it from the multiple web pages where it might reside. In essence, I’m scraping this.


html.fs


In this file there are utility functions that I will use later on to retrieve split info.

#light
open System
open System.IO
open System.Text.RegularExpressions

// It assumes no table inside table …
let tableExpr = "<table[^>]*>(.*?)</table>"
let headerExpr = "<th[^>]*>(.*?)</th>"
let rowExpr = "<tr[^>]*>(.*?)</tr>"
let colExpr = "<td[^>]*>(.*?)</td>"
let regexOptions = RegexOptions.Multiline ||| RegexOptions.Singleline
                   ||| RegexOptions.IgnoreCase


This code is straightforward enough (if you know what Regex does). I’m sure that there are better expressions to scrape tables and rows on the web, but these work in my case. I really don’t need to scrape tables. I put the table expression there in case you need it.


I then write code to scrape all the cells in a piece of HTML:

let scrapHtmlCells html =
    seq { for x in Regex.Matches(html, colExpr, regexOptions) -> x.Groups.Item(1).ToString() }

This is a sequence expression. Sequence expressions are used to generate sequences starting from some expression (as the name hints). In this case Regex.Matches returns a MatchCollection, which is a non-generic IEnumerable. For each element in it, we return the value of its first capture group. We could just as easily have constructed a list or an array, given that there is not much deferred computation going on. But oh well …


Always check the type of your functions in F#! With type inference it is easy to get it wrong. Hovering your mouse over it in VS shows it. This one is typed: string -> seq<string>. It takes a string (html) and returns a sequence of strings (the cells in the html).
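
A quick usage sketch (the sample html is mine):

let sampleRow = "<tr><td>Jun 15, 2008</td><td>2 : 1 Stock Split</td></tr>"
let cells = scrapHtmlCells sampleRow |> Seq.to_list
// cells = ["Jun 15, 2008"; "2 : 1 Stock Split"]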


We’ll need rows as well.

let scrapHtmlRows html =
    seq { for x in Regex.Matches(html, rowExpr, regexOptions) -> scrapHtmlCells x.Value }

This works about the same. I’m matching all the rows and retrieving the cells for each one of them. I’m getting back a matrix-like structure, that is to say this function has type: string -> seq<seq<string>>.


That’s all for today. In the next installment we’ll make it happen.