Tuesday 13 October 2009

Scala Streams for iteration / Use streams to avoid TCO(?)


My last post Scala: C style iterate function for building lists got an amazing response from somebody called Johan. (Thanks Johan). It introduced me to the amazing power of lazy scala streams.

Here's what Johan said slightly edited (In Johan's comment, he used lazy_:: which is now called #:: in Scala 2.8):
   
[Your iterateFor] function is very similar to until (but with a reversed predicate) in Haskell: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v%3Auntil. Basically, it's iterate and takeWhile.

Unfortunately, the iterate functions in Scala's traversable collections are not polymorphic. (Maybe it's because it would only work for lazy collections.) But here's one for Stream:

def iterate[A](step: A => A)(seed: A): Stream[A] = seed #:: iterate(step)(step(seed))

Now it's easy to write iterateFor:

def iterateFor[A](predicate: A => Boolean)(step: A => A)(seed: A): Stream[A] = iterate(step)(seed) takeWhile predicate

or, all in one go:

def until[A](...) = if (!predicate(seed)) seed #:: until(predicate)(step)(step(seed)) else Stream.empty



Now the symbol #:: is a special lazy "cons" operator for Streams (used just like the List :: operator) which only gets evaluated on demand. The #:: operator can be used to define infinite lists as Johan has done with the iterate function.

Now before I get on to trying this stuff out, the way Johan defined the functions, you need to specify the type of [A] when you call them, for example iterate[Int].... But if you move the seed parameter from the last parameter to the first then you can get away with not defining A explicitly. So here are the functions reorganized to make them easier to use:


def iterate[A]
(seed: A)(step: A => A)
: Stream[A]
= seed #:: iterate(step(seed))(step)


def iterateFor[A] (seed: A)(predicate: A => Boolean)(step: A => A)
: Stream[A]
= iterate(seed)(step) takeWhile predicate

def until[A]
(seed: A)(predicate: A => Boolean)(step: A => A)
: Stream[A]
= if (!predicate(seed)) seed #:: until(step(seed))(predicate)(step) else Stream.empty



So what can I do with all this ****? Well first of all, iterate defines an infinite list, that is evaluated as needed so the following loop will iterate forever printing as it goes,

for (i <- iterate(0)(_ + 1)) {
println(i)
}

as will

iterate(0)(_ + 1).foreach(println)

Note that I said printing as it goes.

If you defined iterator using List instead of Stream replacing #:: with :: , then the loops above would get a StackOverflowError after munching stack on each iteration, but also, nothing would be printed, as the entire (infinite) List is calculated before anything is printed.

So what use is the iterate function using Streams? Well you can reinvent a convoluted for loop, by taking the first n of the infinite stream.

for (i <- iterate(0)(_ + 1) take 10 ) {
println(i)
}


Big deal, but more importantly its a building block: Johan went on to redefine my original iterateFor function (see the last post) using iterator and takeWhile.

def iterateFor[A] (seed: A)(predicate: A => Boolean)(step: A => A)
: Stream[A]
= iterate(seed)(step) takeWhile predicate


How simple is that!

As he also pointed out iterateFor can be defined without iterate like this:

def iterateFor[A](seed: A)(predicate: A => Boolean)(step: A => A) : Stream[A]
= if (predicate(seed)) seed #:: iterateFor(step(seed))(predicate)(step) else Stream.empty

except that he called that version "until" to match the haskell function. I don't like this name for two reasons in scala: (1) "until" is already used in "for (i <- 1 until 10)" and (2) by moving the seed parameter to the first parameter (as I explained above), until(0)(_ >= 10)(_ + 1) does not read well.

A Reminder About What This Is All For!

iterateFor is not for counting from 1 to 10! Use the normal Scala for (i <- 1 to 10) for that!

This is what I originally wanted it for:

val dates = iterateFor(today)(d => d >= _start && d >= limit)(d => d - 7.days)


The elephants in the room

1. Stack? These lazy stream versions of iterateFor DO NOT munch stack even though they do not look like they can be TCO'd. Amazing. I don't understand how it works, but it does. Maybe they munch alot more heap . Who knows.

2. Performance. I have not studied this in detail, but from what I have found with some simple (and probably flawed) tests, there is NO performance penalty. The use of streams in this way seems to be at least as fast as the TCOd recursive algorithms and the imperative while algorithm.




4 comments:

Johan said...

I'd just like to point out that the last function, the definition of until, is not correct. Either rename it to iterateFor or swap the cases in the if-statement.

TimA said...

@Johan: Done

Dan said...

I think that reason that the streams do not eat stack is because the remainder of the stream is constructed lazily. It doesn't actually call the function recursively, it just looks like it does. It actually captures a function that will generate the next segment of the stream when it is accessed for the first time.

Either way, streams are looking pretty useful, thanks for the post!

Anonymous said...

You'll notice that lazy_:: (known as #:: in Scala 2.8) isn't defined on Stream[A]. The call on the right side, which returns a Stream[A] is converted to StreamCons[T] by Predef.lazyStreamToConsable, which takes the stream as a by name parameter.

What's the takeaway? We know we can define short circuit evaluation for ourselves by

class Foo{
def toBool:Boolean = //...
def and (rhs: => Foo) = if (toBool) rhs.toBool else false
}


and rhs will not be evaluated. But what if we want to avoid evaluating the left side of the operator? In that case, we can create an implicit conversion with a by name parameter to convert to some other class that defines the appropriate operator.

Then all that's left is to reverse the sides when the operator in question ends with a colon.