euler !! 2

Same problem set, different day.


Problem 3:
What is the largest prime factor of the number 600851475143?

Like those before it, this problem can easily be solved with a list comprehension (though in this case we need a way of generating prime numbers). Just examine all the prime divisors less than the square root of 600851475143, and pick out the largest one:

problem3 = last [ x | x <- takeWhile (<= (floor $ sqrt 600851475143)) primes, 600851475143 `mod` x == 0]

We have to use floor here because sqrt doesn't return an integer. Note the $ syntax we use for applying a function to the output of another function. There is also ., which composes two functions. The distinction between them is a fine one, but it's important. Essentially, $ is just "syntactic sugar" that replaces a pair of parentheses, so floor $ sqrt x is equivalent to floor (sqrt x). Function composition is another matter: the output of one function is passed directly to the input of another. This means that their data types must match. And the . operator is not merely syntactic sugar, it is a real function that returns a new function.
At any rate, this solution works, but it is lacking. First of all, we didn't discuss a method of generating primes. But more importantly, it's just too slow. The square root is 100 times larger than the largest prime divisor, so we are wasting a lot of effort.
There is a faster approach, but it's a bit more complicated. The concept is that we can save a lot of time by stopping once our list of divisors multiplies up to the original number. We'll reuse the same list comprehension from before, and apply a scanl1 (*) to it:

takeWhile (/= 600851475143) [ x | x <- primes, 600851475143 `mod` x == 0]

Here we have to use /= ("not equal to") because the list comprehension won't generate values greater than 600851475143 anyway. scanl1 is the same as scanl, except that it takes its initial value from the head of the list instead of as an argument. This produces the list [71,59569,87625999]. What is the significance of this list? It's the result of multiplying together the factors of our number, from smallest to largest. So we can assume that the largest divisor of our number is the original number divided by the last item in this list. The final result:

problem3 = n `div` (last $ takeWhile (/= n) [ x | x <- primes, n `mod` x == 0]))
    where n = 600851475143

Here we introduce where, a useful construct that allows us to define local variables. It can be placed at the end of the line or on a new line with an indent. If you don't follow these rules, the compiler will throw an error. This is because Haskell, unlike Perl or C++, is a "2D" language. The compiler can't just check for semicolons to determine where a line ends, so there are formatting rules that must be followed instead.
This version runs much faster, but you may have noticed that it cheats a bit. Since it only captures each divisor once, this method will fail on numbers whose prime factorization contains any repeats (e.g. 12 = 2*2*3). This isn't the case for our number, but it doesn't feel right using this solution.
Before we construct a real solution, I suppose I should discuss a method of generating primes. Unsurprisingly, there are quite a few. In general, their efficiency is directly proportional to their complexity. We'll use a simple implementation here because it's fast enough for our purposes:

primes = sieve [2..] where
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

This is a recursive approach, but there is no base case! How can that be? The reason we can get away with this is Haskell's lazy nature. Each iteration adds values to the list of primes, so the algorithm will just keep running until it generates as many primes as you asked for. Actually this makes a lot of sense: since we are generating an infinite list, it wouldn't make any sense for our generating function to terminate!
As you may have guessed from the function name, this is an implementation of the Sieve of Eratosthenes (actually, not quite. See this paper if you are interested in a genuine sieve algorithm). Each time we find a new prime, we can eliminate any multiples of that prime from our list of numbers to check. To see how this works, imagine how this algorithm would calculate the third prime number, 5. We start with the list [2..]. sieve (p:xs) means that sieve takes a single argument (a list), but we treat it as a value p appended to the rest of the list xs. This allows us to refer to these values independently, and without having to use head. Thus our list is split into p = 2 and xs = [3..]. We then state that the return value (our list of primes) is defined as p appended to a new sieve of some list comprehension of xs. We know that what p is, so we can safely say that primes!!0 == 2. The list comprehension simply filters out numbers that are divisible by p, so in this case, the result of the list comprehension is [3,5..]. We then run sieve on this list, and take its first value as our new p. Now we have identified the second prime as 3, and we run a new list comprehension to eliminate the multiples of 3. We don't need to worry about removing multiples of 2, because the list that was passed in already had the multiples of 2 taken out! After one more iteration, we have our answer.
Now we will construct a non-cheating solution, using the same approach we used with Perl. We simply divide our number by primes of increasing magnitude, and stop when the prime number we are dividing by exceeds the square root of the number. This lends itself to three cases:

   1. The current prime exceeds the square root of the number. This will be the last prime in our list, so this is the base case.
   2. The current prime divides the number. Add the prime to the list of divisors, divide the number by the prime, and calculate the factors of the result.
   3. The current prime does not divide the number. Try the next prime.

We can implement this algorithm almost verbatim in Haskell using a feature called guards. Guards are similar to the cases we used earlier, when we defined the Fibonacci sequence. Each guard defines a condition that the function input is checked against, and separate function definitions are used depending on which condition matches (note that conditions are checked in order, top to bottom). Here's the implementation:

problem3 = factor 600851475143 primes
    where
        factor n (p:ps) 
            | p * p > n      = [n]
            | n `mod` p == 0 = p : factor (n `div` p) (p:ps)
            | otherwise      = factor n ps

It's a bit lengthy, but it's (relatively) easy to understand and it does the job well. Though I enjoyed dealing with obfuscation in Perl, I find Haskell difficult enough to understand when it's written as plainly as this!

Leave a Reply

Theme: Esquire by Matthew Buchanan.