60 Days of Euler in F# - Problem 14 | Bryan Slatner — Software Developer, Foodie, Cigar Lover

The Problem

The following iterative sequence is defined for the set of positive integers:
    *n → n/2* (*n* is even)
    *n → 3n + 1* (*n* is odd)
Using the rule above and starting with 13, we generate the following sequence:
    13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.

The Solution

I had some fun with this one.

I originally wrote the solution to this on an Alienware laptop with a 2GHz processor and 8 cores. By default, most languages use only one of those cores at a time, so most of my solutions were running on a single core. Most of the time, that’s good enough.

My original solution, however, took long enough to execute that I had enough time to go inside and make a sandwich.

My current laptop has a 3.8Ghz processor and solves the problem in about 3.5 seconds. There has to be some other difference in play here, but I’ll be damned if I can figure out what it is.

Here’s the “purely functional” solution.

let isEven i = i &&& 1L = 0L
let (|Even|Odd|) i = if isEven i then Even else Odd

let next n =
    match n with
    | Even -> n/2L
    | Odd -> 3L*n+1L

let collatz n = 
    let rec c n length = 
        if n = 1L then length
        else c (next n) (length+1L)
    c n 1L 

#time
seq { 1L..999999L } |> Seq.maxBy collatz
#time

We start with a helper function, isEven and an active pattern matching function that will return Even or Odd for a given number.

The next function finds the next number in the sequence based on the rules in the problem definition.

The collatz function uses a tail-recursive function to generate the length of the sequence.

Back when this was taking forever to execute, I tried three more solutions to the problem. I’ll outline them here as a learning experience.

Attempt #2

Since the functional approach wasn’t working for me, I tried a solution that uses mutables.

let collatzWithMutables start = 
    let mutable count = 1L
    let mutable current = start
 
    while current > 1L do
        current <- next current
        count <- count + 1L
    count

#time
seq { 1L..999999L } |> Seq.maxBy collatzLength
#time

That executes in 3.3 seconds on my machine. I’m not happy about a 5.7% reduction in speed. I think we can do better.

Attempt #3

The length of the sequence starting with n is constant. I figured that it was highly likely I would be encountering the same values over and over. If you encounter an n for which you’ve already computed the length, it’s not necessary to compute it again. Just add what you have now to the already-computed length for n.

Let’s try that:

let cache = System.Collections.Generic.Dictionary<int64,int64>()

let collatzWithMutablesCached start =
    let mutable count = 1L
    let mutable current = start

    while current > 1L do
        match cache.TryGetValue current with
        | true,cachedCount ->
            count <- count + cachedCount
            current <- 1L
        | _ ->
            current <- next current
            count <- count + 1L

    cache.Add(start,count)
    count

#time
seq { 1L..999999L} |> Seq.maxBy collatzWithMutablesCached
#time

That brings our time down to 0.7 seconds. I’m totally happy with that. That’s an 80% reduction.

I went back and added some code to count the number of cache hits and cache misses. There are 999,997 cache hits vs. 5,226,259 misses. So the cache hits only 16% of the time. However, those hits eliminate a whopping 150,522,402 calls to next which is why this is so much faster.

Attempt #4

I’m still feeling experimental. There’s one more approach I’d like to try.

I have 8 cores. Let’s use all of them!

let collatzAsync startNum endNum = async {
    return 
        seq { startNum..endNum }
        |> Seq.map (fun i -> i,collatzWithMutables i)
        |> Seq.maxBy snd
}

let ranges = 
	[ 
	1L,125000L
	125001L,250000L
	250001L,375000L
	375001L,500000L
	500001L,625000L
	625001L,750000L
	750001L,875000L
	875001L,999999L
	]

#time
ranges
|> Seq.map (fun (s,e) -> collatzAsync s e)
|> Async.Parallel
|> Async.RunSynchronously
|> Seq.maxBy snd
|> fst
#time

I broke the problem up into 8 chunks of 125000 each. collatzAsync is an asynchronous function that works out the answer to the chunk using the collatzWithMutables function we used before.

The main body executes all those chunks in parallel and then returns the biggest answer found.

This runs in 0.9 seconds on my machine. Not as fast as the cached solution, but quite fun to write.