Thursday, 5 November 2009

C# Linq OrderBy, Scala SortWith and SortBy (and Scary Implicits)

Here is a tiny example of a little bit of linq at work in Visual Studio 2010 Beta2. Lets order a list of names in C#

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestOrderBy
{
    class Person
    {
        public Person(string firstName, string lastName, DateTime dateOfBirth)
        {
            FirstName = firstName;
            LastName = lastName;
            DateOfBirth = dateOfBirth;
        }
        public readonly string FirstName;
        public readonly string LastName;
        public readonly DateTime DateOfBirth;
        override public string ToString()
        {
            return FirstName + ", " + LastName + " " + DateOfBirth.ToShortDateString();
        }
    }
    class Program
    {
        static void Main(string[] args)
        {
            var people = new List() {
                new Person("Alan", "Kay", new DateTime(1973,12,1)),
                new Person("James", "Gosling", new DateTime(1991,12,1)),
                new Person("Anders", "Hejlsberg", new DateTime(1999,12,1)),
                new Person("Martin", "Odersky", new DateTime(2000,12,1)),
                new Person("Guido", "Van Rossum", new DateTime(2002,12,1)),
                new Person("Yukihiro", "Matsumoto", new DateTime(1990,12,1))
            };

            var peopleOrderedByDob = people.OrderBy(person => person.DateOfBirth);
            foreach (var p in peopleOrderedByDob)
                Console.WriteLine(p);
            Console.WriteLine();

            var peopleOrderedNaturally = people.OrderBy(person => person);
            foreach (var p in peopleOrderedNaturally)
                Console.WriteLine(p);
            Console.WriteLine();

            Console.ReadLine();
        }
    }
}

What happens when we run it: We get:

Alan, Kay 01/12/1973
Yukihiro, Matsumoto 01/12/1990
James, Gosling 01/12/1991
Anders, Hejlsberg 01/12/1999
Martin, Odersky 01/12/2000
Guido, Van Rossum 01/12/2002


So the clause "OrderBy(person => person.DateOfBirth)" worked. Cool. C# knows how to compare birthdays.

But then Kaboom! The debugger stops at line 42 with ArgumentExeption "At least one object must implement IComparable". So "people.OrderBy(person => person)" was not checked at compile time. Wow! So lets implement IComparable on Person sorting on LastName:

class Person  : IComparable
{
    ...
    int IComparable.CompareTo(object obj)
    {
        Person otherPerson = obj as Person;
        if (otherPerson != null)
            return this.LastName.CompareTo(otherPerson.LastName);
        else
            throw new ArgumentException("Object is not a Person");
    }
}

And hey presto, C# knows how to sort people.

Alan, Kay 01/12/1973
Yukihiro, Matsumoto 01/12/1990
James, Gosling 01/12/1991
Anders, Hejlsberg 01/12/1999
Martin, Odersky 01/12/2000
Guido, Van Rossum 01/12/2002

James, Gosling 01/12/1991
Anders, Hejlsberg 01/12/1999
Alan, Kay 01/12/1973
Yukihiro, Matsumoto 01/12/1990
Martin, Odersky 01/12/2000
Guido, Van Rossum 01/12/2002


These clauses: "person => person.DateOfBirth" and "person => person" are known in C# are known as keySelector and are part of the .net linq library. If you want to drink the full linq Kool-Aid you can write

var peopleOrderedByDob = 
      from p in people
      orderby p.DateOfBirth
      select p;

foreach (var p in peopleOrderedByDob)
  Console.WriteLine(p);

But its just doing the same thing under the covers.

So lets do the same thing in Scala, using the sortWith method, which until 22 Oct 2009 was the only thing available. Lets sort by DateOfBirth first

import org.scala_tools.time.Imports._

object TestSortBy {

  case class Person(firstName: String, lastName: String, dateOfBirth: DateTime) 
  { 
    override def toString() = {
      firstName + ", " + lastName + " " + DateTimeFormat.shortDate().print(dateOfBirth);
    }
  }

  def main(args : Array[String]) : Unit = {    
    val people = 
      Person("Alan", "Kay", new DateTime(1973,12,1,0,0,0,0))::
      Person("James", "Gosling", new DateTime(1991,12,1,0,0,0,0))::
      Person("Anders", "Hejlsberg", new DateTime(1999,12,1,0,0,0,0))::
      Person("Martin", "Odersky", new DateTime(2000,12,1,0,0,0,0))::
      Person("Guido", "Van Rossum", new DateTime(2002,12,1,0,0,0,0))::
      Person("Yukihiro", "Matsumoto", new DateTime(1990,12,1,0,0,0,0))::Nil
   
    val peopleOrderedByDob = people.sortWith((p1,p2) => p1.dateOfBirth < p2.dateOfBirth) 
    peopleOrderedByDob foreach println
    println
  }
}
Which works just fine: 
Alan, Kay 01/12/73
Yukihiro, Matsumoto 01/12/90
James, Gosling 01/12/91
Anders, Hejlsberg 01/12/99
Martin, Odersky 01/12/00
Guido, Van Rossum 01/12/02

So now for the ordering naturally too

val peopleOrderedNaturally = people.sortWith((p1,p2) => p1 < p2) 
    peopleOrderedNaturally foreach println
    println

but line 2 does not compile (unlike c# where this is a runtime error) because the compiler does not know how to compare one Person with another. Doh! So lets add Ordered[Person] to Person to tell the compiler how to judge people :)
case class Person(firstName: String, lastName: String, dateOfBirth: DateTime) 
    extends Ordered[Person]
  { 
    override def toString() = {
      firstName + ", " + lastName + " " + DateTimeFormat.shortDate().print(dateOfBirth);
    }
    def compare(otherPerson: Person) = {
      lastName.compareTo(otherPerson.lastName)
    }
  }
And hey presto, same results as C#
Alan, Kay 01/12/73
Yukihiro, Matsumoto 01/12/90
James, Gosling 01/12/91
Anders, Hejlsberg 01/12/99
Martin, Odersky 01/12/00
Guido, Van Rossum 01/12/02

James, Gosling 01/12/91
Anders, Hejlsberg 01/12/99
Alan, Kay 01/12/73
Yukihiro, Matsumoto 01/12/90
Martin, Odersky 01/12/00
Guido, Van Rossum 01/12/02

Well that was easy. But if you look at Scala sortWith compared to C# OrderBy, you realize that in Scala you have to say how to do a comparison, whereas in C# you say what thing to order by.
Scala
  val peopleOrderedByDob = people.sortWith((p1,p2) => p1.dateOfBirth < p2.dateOfBirth) 
C#
  var peopleOrderedByDob = people.OrderBy(person => person.DateOfBirth);
When the Scala team saw that the C# version was shorter, and worse still, more functional than Scala's imperativeness, there was a great silence of the lambdas and then they recursed and recursed until they'd invented the entire Scala implicits system described well here and here. (Well maybe that wasn't quite how it happened...) So, very recently the sortBy method got added to the SeqLike trait as was kindly pointed out to me by Ismael Juma in an earlier post. So now we can write
val peopleOrderedNaturallyWithSortBy = people.sortBy(person => person) 
  peopleOrderedNaturallyWithSortBy foreach println
  println    
and this works fine. (Note that if you take the "extends Ordered[Person]" away from Person then "people.sortBy(person => person)" will not compile.) So how does it do it. Lets look at the signature of SortBy on the SeqLike trait.
def sortBy[B](f: A => B)(implicit ord: Ordering[B]): Repr
I'm not sure what Repr is, but the "implicit ord: Ordering[B])" is a suitable "thing" that knows how to order a B. In our case a B is a Person, which is an Ordered[Person] but not an Ordering[Person]. But Ordering.scala contains a trait LowPriorityOrderingImplicits with an implicit def that know how to "upgrade" an Ordered[Person] to an Ordering[Person] "thing" and that is then passed to the sortBy method to be used to get-the-job-done-tm.

My head is about to explode. If you followed that last paragraph well done. If you didn't then read the next one instead.

Anyhow, the compiler inserts an "implicit" "thing" into the last parameter of the sortBy method. And that "implicit" "thing" knows how to order people about into neat lines. It just works.

So what about ordering by say firstName?
val peopleOrderedByDobWithSortBy = people.sortBy(person => person.firstName) 
    peopleOrderedNaturallyWithSortBy foreach println
    println 
Well that works fine. Turns out its because there is a whole stack of implicit object defs in Ordering.scala that provide instances of Ordering traits for the basic value types like String and Int etc.

So what about ordering by say dateOfBirth?
val peopleOrderedByDobWithSortBy = people.sortBy(person => person.dateOfBirth) 
    peopleOrderedNaturallyWithSortBy foreach println
    println    
It doesn't compile at line 2: "type arguments [org.scala_tools.time.imports.DateTime] do not conform to method ordered's type parameter bounds [A <: Ordered[A]]". Silly me, bah humbug! Its because a org.joda.time.DateTime isn't an Ordered[DateTime]. Now Scala-Time uses the pimp-my-library technique to upgrade org.joda.time.DateTime to RichDateTime, so we can try changing Scala-Time's RichDateTime to
class RichDateTime(val underlying: DateTime) extends Ordered[RichDateTime]  {
  def compare(y: RichDateTime): Int = {
    return underlying.compareTo(y.underlying)
  }
  ...
but that still doesn't compile "people.sortBy(person => person.dateOfBirth)" because, the compiler is not "upgrading" DateTime to RichDateTime.
But we can say:
val peopleOrderedByDobWithSortBy = people.sortBy(person => RichDateTime(person.dateOfBirth)) 
but thats not very pretty. So what's the answer? Put in an implicit object that knows how to order DateTime.
implicit object DateTimeOrderingObject extends Ordering[DateTime] {
  def compare(x: DateTime, y: DateTime) = x.compareTo(y)
} 
That definition should live in the Scala-Time library, but for now, it can be in my program. Now this will compile:
val peopleOrderedByDobWithSortBy = people.sortBy(person => person.dateOfBirth) 
and the code works.

But now for something scary. If I accidentally import this definition from some library that I happen to be using
implicit object SomeOtherPersonOrderingObjectInadvertentlyImported extends Ordering[Person] {
def compare(p1: Person, p2: Person) = 
(p1.firstName+p1.lastName).compareTo(p2.firstName+p2.lastName)
}
Then my code breaks because the people are printed in a different order. The code still compiles, but I get the SomeOtherPersonOrderingObjectInadvertentlyImported object used to do the sorting of the People instead of People's normal ordering.

If People line up in the wrong order then nobody cares, but lets call that class StocksToShortRightNow instead. Bang goes my Christmas bonus!

I hope I am somehow wrong here, and this is a bug or I am thinking about this the wrong way. But it makes me nervous.

Here is the complete code with the scary implicit commented out.

import org.scala_tools.time.Imports._

object TestSortBy {
  case class Person(firstName: String, lastName: String, dateOfBirth: DateTime) 
    extends Ordered[Person]
  { 
    override def toString() = {
      firstName + ", " + lastName + " " + DateTimeFormat.shortDate().print(dateOfBirth);
    }
    def compare(otherPerson: Person) = {
      lastName.compareTo(otherPerson.lastName)
    }
  }

  // Comment this is and watch the behaviour change. This could be imported accidentally
  // implicit object SomeOtherPersonOrderingObjectInadvertentlyImported 
  //   extends Ordering[Person] {
  //   def compare(p1: Person, p2: Person) =
  //    (p1.firstName+p1.lastName).compareTo(p2.firstName+p2.lastName)
  // }

  
  implicit object DateTimeOrderingObject extends Ordering[DateTime] {
     def compare(x: DateTime, y: DateTime) = x.compareTo(y)
  }
  
  def main(args : Array[String]) : Unit = {
    
    val people = 
      Person("Alan", "Kay", new DateTime(1973,12,1,0,0,0,0))::
      Person("James", "Gosling", new DateTime(1991,12,1,0,0,0,0))::
      Person("Anders", "Hejlsberg", new DateTime(1999,12,1,0,0,0,0))::
      Person("Martin", "Odersky", new DateTime(2000,12,1,0,0,0,0))::
      Person("Guido", "Van Rossum", new DateTime(2002,12,1,0,0,0,0))::
      Person("Yukihiro", "Matsumoto", new DateTime(1990,12,1,0,0,0,0))::Nil
    
    val peopleOrderedByDob = people.sortWith((p1,p2) => p1.dateOfBirth < p2.dateOfBirth) 
    peopleOrderedByDob foreach println
    println
    
    val peopleOrderedNaturally = people.sortWith((p1,p2) => p1 < p2) 
    peopleOrderedNaturally foreach println
    println

    val peopleOrderedNaturallyWithSortBy = people.sortBy(person => person) 
    peopleOrderedNaturallyWithSortBy foreach println
    println    

    val peopleOrderedByDobWithSortBy = people.sortBy(person => person.dateOfBirth) 
    peopleOrderedNaturallyWithSortBy foreach println
    println    


  }

}

[Update: 2009-11-05]
I posted a question about the SomeOtherPersonOrderingObjectInadvertentlyImported overriding the Person extends Ordered[Person] natural ordering. Here is Martin Odersky's response:
Implicits that are explicitly declared or imported take precedence over implicits that come with the type. Btw you can find out what implicits are inserted by running scalac with option -Xprint:typer. This will print out the tree after type checking.

Tuesday, 3 November 2009

Using arithmetic expressions with Option[..] in Scala

Fire and motion: Here is how it works. You fire at the enemy. That's the fire part. And you move forward at the same time. That's the motion. Get it?

Scala's answer to the "null" that everybody loves to hate is to use is the Option[T] types. The Option thing is described here in section "Option, Some, and None: Avoiding nulls".
Despite all the waffle, Option types are simple. A variable called v of class Option[Int] can have a value of either None or Some(123) where 123 is an example of the underlying value you actually care about. The value None is a real value, rather than an invalid reference as null is in Java or C#. If I want to test is whether v is None I can say v==None or v.isDefined() or !v.isEmpty(). If I want its value 123, I say v.getOrElse(0) where 0 is a default if its None. Easy. Or I can use a patten match. Easy too.

A scenario I hit was doing various bits of arithmetic with stock prices. Now a stock price may not exists on a given date. But I may still want to write the code to try to do some calculation with the price. In Java I probably use a lot of null checks or set the price to a special value early on or something. C# gives you Nullable types which lets you say
double? price1 = null;
double? ratio1 = 0.7;
double? result = ratio1 * price1; // or some other complex expression
Console.Write("{0,7:N2}", result);
In C#, if either price1 or ratio1 is null then result is null. The operators == and != do the right thing. Comparison using < and > requires a little care.

So how do I do this in Scala? Well, if the Option type is what we are supposed to use then lets do it.
val price1:Option[Double] = None
  val ratio1:Option[Double] = Some(0.7)
  val result:Option[Double] = price1 * ratio1
  printf("%7.2f", result getOrElse -99999.99)
but line 3 does not compile because * is not a member of Option[Double].
So a bit of googling and I find this clever-but-obscure trick:
val result = 
    for (ratio1_alias <- ratio1; price1_alias <- price1)
    yield ratio1_alias * price1_alias 
It seems that you can iterate over an Option[..]: Its a sequence of either zero (for the None case) or one element (for the Some(123.0) case). The expression returns None if anything is None or Some(ratio1_alias * price1_alias) if not. Tricksy stuff. It works but boy does it suck from a readability point of view. Oh and all the vals need to be aliased. Can I have my C# nullable types back please? So after a question on the Scala-User mailing list "Rex Kerr-2" suggested using the pimp-my-library techique to add functionality to the Option[Double] class, by "upgrading" it to a RichOptionDouble to which I can add arithmetic operators. So I ended up with the non-generic (but simple and, better still, working) code:
package org.scala_tools.option.math
class RichOptionDouble(od:Option[Double]) extends Ordered[Option[Double]] {
  def +(o:Option[Double]) = if (o==None) o else Some(numeric.plus(od.get,o.get))
  def unary_-():Option[Double] = Some(-od.get) 
  def -(o:Option[Double]) = if (o==None) o else Some(od.get-o.get)
  def *(o:Option[Double]) = if (o==None) o else Some(od.get*o.get)
  def /(o:Option[Double]) = if (o==None) o else Some(od.get/o.get)
  def compare(y: Option[Double]): Int = {
    if (od.isEmpty) return if (y.isEmpty) 0 else 1 
    if (y.isEmpty) return -1 
    // This is what Scala RichDouble does
    return java.lang.Double.compare(od.getOrElse(0d), y.getOrElse(0d))
  }
}

object RichOptionDoubleNone extends RichOptionDouble(None) { 
  override def +(o:Option[Double]) = None
  override def unary_-() = None
  override def -(o:Option[Double]) = None
  override def *(o:Option[Double]) = None
  override def /(o:Option[Double]) = None
}

trait Implicits {
  implicit def optiondouble2richoptiondouble(od:Option[Double]) = {
    if (od==None) RichOptionDoubleNone else new RichOptionDouble(od)
  }
  implicit def double2optiondouble(d:Double) = Some(d)
  implicit def int2optiondouble(i:Int) = Some(i.toDouble) 
  implicit def double2richoptiondouble(d:Double) = new RichOptionDouble(Some(d)) 
  implicit def int2richoptiondouble(i:Int) = new RichOptionDouble(Some(i.toDouble)) 
}

// Modelled on technique in Scala-Time
object Imports extends Implicits
So now I can say:
import org.scala_tools.option.math.Imports._  
 
object Bar {
  def main(args : Array[String]) : Unit = {
    val price1 = None
    val price2 = Some(123.45)
    val ratio1 = Some(0.7)
    val result1 = price1 * ratio1
    val result2 = price2 * ratio1 / 2 + 7.0
    printf("%7.2f", result1 getOrElse -99999.99)
    printf("%7.2f", result2 getOrElse -99999.99) 
    printf(" %s\n", result1 > result2) 
  }
}
Which prints
-99999.99  86.41 true

Not ground breaking stuff. But still nice to get there in the end. It feels something like this (probably in generic form) should be in the core libraries,

Sunday, 1 November 2009

Scala almost as good as C# .NET4?

Disclaimer: I am primarily a Java developer with some experience in C# NET 2, Ruby and Javascript, + few of others from the C derived stable. I'm learning Scala and I am not using it in paid work yet. (Offers welcome!)

So far my posts have been about Scala, and learning about functional programming, but this is a bit of a diversion. In the previous post Becoming really rich with scala, I translated a very nice example of some C# .NET 4 code to Scala using alot of idiomatic Scala. The C# code stands up very well in comparison. Much to my surprise, I think that C# has the edge overall although its a close call. I dread to think what the equivalent code would look like in Java. This exercise made me realise just how far behind Java is compared to C# especially the version coming in .NET4. While the java world has been bickering about how to do proper closures without breaking backwards compatibility, C# added them long ago. Then C# added linq and updated the libraries to use the new features. Now Plinq is being built on that foundation. IMO, this combination has moved C# up to a totally different level compared to Java.

The C# features that caught my eye are:

a) The functional features and linq can make C# as *readably* concise as Scala, and all the imperative stuff is still there if you want it.

b) The way that the well known SQL vocabulary "select", "where", "orderBy" has been used C# instead of "map" and "filter" and "sortBy" in Scala (and others). This is subtle point but it seems very important to me. C#/Java developers are familiar with SQL but not necessaryily map, filter, etc. This choice of familiar vocabulary is important.

c) C# OrderBy(s => s.LRS) vs Scala .sortWith((elem1, elem2) => elem1.LRS <= elem2.LRS). This small point may appears trivial, but IMO is a huge mindset win for C#. In list.OrderBy(s => s.LRS), you are saying *what* you want and not, as in Scala, not *how* to do it; list.sortWith((elem1, elem2) => elem1.LRS <= elem2.LRS). This principle is what makes SQL so powerful, but of course here is is being used with C# lists. I'm sure this is just the tip of the iceberg for the way that this principle can be used.

d) [apologies: I added this point since posting the original on which the first 8 comments are based]. Nullable types. This is a sane way of dealing with arithmetic where variables can be null. Java just offers huge amounts of ==null? boilerplate. Scala offers Option[Double/etc] which is a start, but you can't do arithmetic expressions using Option variables out-of-the-box. Its *fairly* easy to add arithmetic expressions to Option types but its not in the core libraries.


The Java world makes a lot of noise about the other JVM languages that are "better" in some way that Java: JavaFX, Clojure, Groovy, JRuby, Fan, or Scala. But C# .NET4 has raised the bar up very very high with C# 4. Nobody in the alternative JVM language world should feel complacent! (Well (except Rich Hickey, ((!)))).