Tuesday, 13 October 2009

Scala Streams for iteration / Use streams to avoid TCO(?)


My last post Scala: C style iterate function for building lists got an amazing response from somebody called Johan. (Thanks Johan). It introduced me to the amazing power of lazy scala streams.

Here's what Johan said slightly edited (In Johan's comment, he used lazy_:: which is now called #:: in Scala 2.8):
   
[Your iterateFor] function is very similar to until (but with a reversed predicate) in Haskell: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v%3Auntil. Basically, it's iterate and takeWhile.

Unfortunately, the iterate functions in Scala's traversable collections are not polymorphic. (Maybe it's because it would only work for lazy collections.) But here's one for Stream:

def iterate[A](step: A => A)(seed: A): Stream[A] = seed #:: iterate(step)(step(seed))

Now it's easy to write iterateFor:

def iterateFor[A](predicate: A => Boolean)(step: A => A)(seed: A): Stream[A] = iterate(step)(seed) takeWhile predicate

or, all in one go:

def until[A](...) = if (!predicate(seed)) seed #:: until(predicate)(step)(step(seed)) else Stream.empty



Now the symbol #:: is a special lazy "cons" operator for Streams (used just like the List :: operator) which only gets evaluated on demand. The #:: operator can be used to define infinite lists as Johan has done with the iterate function.

Now before I get on to trying this stuff out, the way Johan defined the functions, you need to specify the type of [A] when you call them, for example iterate[Int].... But if you move the seed parameter from the last parameter to the first then you can get away with not defining A explicitly. So here are the functions reorganized to make them easier to use:


def iterate[A]
(seed: A)(step: A => A)
: Stream[A]
= seed #:: iterate(step(seed))(step)


def iterateFor[A] (seed: A)(predicate: A => Boolean)(step: A => A)
: Stream[A]
= iterate(seed)(step) takeWhile predicate

def until[A]
(seed: A)(predicate: A => Boolean)(step: A => A)
: Stream[A]
= if (!predicate(seed)) seed #:: until(step(seed))(predicate)(step) else Stream.empty



So what can I do with all this ****? Well first of all, iterate defines an infinite list, that is evaluated as needed so the following loop will iterate forever printing as it goes,

for (i <- iterate(0)(_ + 1)) {
println(i)
}

as will

iterate(0)(_ + 1).foreach(println)

Note that I said printing as it goes.

If you defined iterator using List instead of Stream replacing #:: with :: , then the loops above would get a StackOverflowError after munching stack on each iteration, but also, nothing would be printed, as the entire (infinite) List is calculated before anything is printed.

So what use is the iterate function using Streams? Well you can reinvent a convoluted for loop, by taking the first n of the infinite stream.

for (i <- iterate(0)(_ + 1) take 10 ) {
println(i)
}


Big deal, but more importantly its a building block: Johan went on to redefine my original iterateFor function (see the last post) using iterator and takeWhile.

def iterateFor[A] (seed: A)(predicate: A => Boolean)(step: A => A)
: Stream[A]
= iterate(seed)(step) takeWhile predicate


How simple is that!

As he also pointed out iterateFor can be defined without iterate like this:

def iterateFor[A](seed: A)(predicate: A => Boolean)(step: A => A) : Stream[A]
= if (predicate(seed)) seed #:: iterateFor(step(seed))(predicate)(step) else Stream.empty

except that he called that version "until" to match the haskell function. I don't like this name for two reasons in scala: (1) "until" is already used in "for (i <- 1 until 10)" and (2) by moving the seed parameter to the first parameter (as I explained above), until(0)(_ >= 10)(_ + 1) does not read well.

A Reminder About What This Is All For!

iterateFor is not for counting from 1 to 10! Use the normal Scala for (i <- 1 to 10) for that!

This is what I originally wanted it for:

val dates = iterateFor(today)(d => d >= _start && d >= limit)(d => d - 7.days)


The elephants in the room

1. Stack? These lazy stream versions of iterateFor DO NOT munch stack even though they do not look like they can be TCO'd. Amazing. I don't understand how it works, but it does. Maybe they munch alot more heap . Who knows.

2. Performance. I have not studied this in detail, but from what I have found with some simple (and probably flawed) tests, there is NO performance penalty. The use of streams in this way seems to be at least as fast as the TCOd recursive algorithms and the imperative while algorithm.




Post a Comment