Thursday, 5 November 2009

C# Linq OrderBy, Scala SortWith and SortBy (and Scary Implicits)

Here is a tiny example of a little bit of linq at work in Visual Studio 2010 Beta2. Lets order a list of names in C#

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestOrderBy
{
    class Person
    {
        public Person(string firstName, string lastName, DateTime dateOfBirth)
        {
            FirstName = firstName;
            LastName = lastName;
            DateOfBirth = dateOfBirth;
        }
        public readonly string FirstName;
        public readonly string LastName;
        public readonly DateTime DateOfBirth;
        override public string ToString()
        {
            return FirstName + ", " + LastName + " " + DateOfBirth.ToShortDateString();
        }
    }
    class Program
    {
        static void Main(string[] args)
        {
            var people = new List() {
                new Person("Alan", "Kay", new DateTime(1973,12,1)),
                new Person("James", "Gosling", new DateTime(1991,12,1)),
                new Person("Anders", "Hejlsberg", new DateTime(1999,12,1)),
                new Person("Martin", "Odersky", new DateTime(2000,12,1)),
                new Person("Guido", "Van Rossum", new DateTime(2002,12,1)),
                new Person("Yukihiro", "Matsumoto", new DateTime(1990,12,1))
            };

            var peopleOrderedByDob = people.OrderBy(person => person.DateOfBirth);
            foreach (var p in peopleOrderedByDob)
                Console.WriteLine(p);
            Console.WriteLine();

            var peopleOrderedNaturally = people.OrderBy(person => person);
            foreach (var p in peopleOrderedNaturally)
                Console.WriteLine(p);
            Console.WriteLine();

            Console.ReadLine();
        }
    }
}

What happens when we run it: We get:

Alan, Kay 01/12/1973
Yukihiro, Matsumoto 01/12/1990
James, Gosling 01/12/1991
Anders, Hejlsberg 01/12/1999
Martin, Odersky 01/12/2000
Guido, Van Rossum 01/12/2002


So the clause "OrderBy(person => person.DateOfBirth)" worked. Cool. C# knows how to compare birthdays.

But then Kaboom! The debugger stops at line 42 with ArgumentExeption "At least one object must implement IComparable". So "people.OrderBy(person => person)" was not checked at compile time. Wow! So lets implement IComparable on Person sorting on LastName:

class Person  : IComparable
{
    ...
    int IComparable.CompareTo(object obj)
    {
        Person otherPerson = obj as Person;
        if (otherPerson != null)
            return this.LastName.CompareTo(otherPerson.LastName);
        else
            throw new ArgumentException("Object is not a Person");
    }
}

And hey presto, C# knows how to sort people.

Alan, Kay 01/12/1973
Yukihiro, Matsumoto 01/12/1990
James, Gosling 01/12/1991
Anders, Hejlsberg 01/12/1999
Martin, Odersky 01/12/2000
Guido, Van Rossum 01/12/2002

James, Gosling 01/12/1991
Anders, Hejlsberg 01/12/1999
Alan, Kay 01/12/1973
Yukihiro, Matsumoto 01/12/1990
Martin, Odersky 01/12/2000
Guido, Van Rossum 01/12/2002


These clauses: "person => person.DateOfBirth" and "person => person" are known in C# are known as keySelector and are part of the .net linq library. If you want to drink the full linq Kool-Aid you can write

var peopleOrderedByDob = 
      from p in people
      orderby p.DateOfBirth
      select p;

foreach (var p in peopleOrderedByDob)
  Console.WriteLine(p);

But its just doing the same thing under the covers.

So lets do the same thing in Scala, using the sortWith method, which until 22 Oct 2009 was the only thing available. Lets sort by DateOfBirth first

import org.scala_tools.time.Imports._

object TestSortBy {

  case class Person(firstName: String, lastName: String, dateOfBirth: DateTime) 
  { 
    override def toString() = {
      firstName + ", " + lastName + " " + DateTimeFormat.shortDate().print(dateOfBirth);
    }
  }

  def main(args : Array[String]) : Unit = {    
    val people = 
      Person("Alan", "Kay", new DateTime(1973,12,1,0,0,0,0))::
      Person("James", "Gosling", new DateTime(1991,12,1,0,0,0,0))::
      Person("Anders", "Hejlsberg", new DateTime(1999,12,1,0,0,0,0))::
      Person("Martin", "Odersky", new DateTime(2000,12,1,0,0,0,0))::
      Person("Guido", "Van Rossum", new DateTime(2002,12,1,0,0,0,0))::
      Person("Yukihiro", "Matsumoto", new DateTime(1990,12,1,0,0,0,0))::Nil
   
    val peopleOrderedByDob = people.sortWith((p1,p2) => p1.dateOfBirth < p2.dateOfBirth) 
    peopleOrderedByDob foreach println
    println
  }
}
Which works just fine: 
Alan, Kay 01/12/73
Yukihiro, Matsumoto 01/12/90
James, Gosling 01/12/91
Anders, Hejlsberg 01/12/99
Martin, Odersky 01/12/00
Guido, Van Rossum 01/12/02

So now for the ordering naturally too

val peopleOrderedNaturally = people.sortWith((p1,p2) => p1 < p2) 
    peopleOrderedNaturally foreach println
    println

but line 2 does not compile (unlike c# where this is a runtime error) because the compiler does not know how to compare one Person with another. Doh! So lets add Ordered[Person] to Person to tell the compiler how to judge people :)
case class Person(firstName: String, lastName: String, dateOfBirth: DateTime) 
    extends Ordered[Person]
  { 
    override def toString() = {
      firstName + ", " + lastName + " " + DateTimeFormat.shortDate().print(dateOfBirth);
    }
    def compare(otherPerson: Person) = {
      lastName.compareTo(otherPerson.lastName)
    }
  }
And hey presto, same results as C#
Alan, Kay 01/12/73
Yukihiro, Matsumoto 01/12/90
James, Gosling 01/12/91
Anders, Hejlsberg 01/12/99
Martin, Odersky 01/12/00
Guido, Van Rossum 01/12/02

James, Gosling 01/12/91
Anders, Hejlsberg 01/12/99
Alan, Kay 01/12/73
Yukihiro, Matsumoto 01/12/90
Martin, Odersky 01/12/00
Guido, Van Rossum 01/12/02

Well that was easy. But if you look at Scala sortWith compared to C# OrderBy, you realize that in Scala you have to say how to do a comparison, whereas in C# you say what thing to order by.
Scala
  val peopleOrderedByDob = people.sortWith((p1,p2) => p1.dateOfBirth < p2.dateOfBirth) 
C#
  var peopleOrderedByDob = people.OrderBy(person => person.DateOfBirth);
When the Scala team saw that the C# version was shorter, and worse still, more functional than Scala's imperativeness, there was a great silence of the lambdas and then they recursed and recursed until they'd invented the entire Scala implicits system described well here and here. (Well maybe that wasn't quite how it happened...) So, very recently the sortBy method got added to the SeqLike trait as was kindly pointed out to me by Ismael Juma in an earlier post. So now we can write
val peopleOrderedNaturallyWithSortBy = people.sortBy(person => person) 
  peopleOrderedNaturallyWithSortBy foreach println
  println    
and this works fine. (Note that if you take the "extends Ordered[Person]" away from Person then "people.sortBy(person => person)" will not compile.) So how does it do it. Lets look at the signature of SortBy on the SeqLike trait.
def sortBy[B](f: A => B)(implicit ord: Ordering[B]): Repr
I'm not sure what Repr is, but the "implicit ord: Ordering[B])" is a suitable "thing" that knows how to order a B. In our case a B is a Person, which is an Ordered[Person] but not an Ordering[Person]. But Ordering.scala contains a trait LowPriorityOrderingImplicits with an implicit def that know how to "upgrade" an Ordered[Person] to an Ordering[Person] "thing" and that is then passed to the sortBy method to be used to get-the-job-done-tm.

My head is about to explode. If you followed that last paragraph well done. If you didn't then read the next one instead.

Anyhow, the compiler inserts an "implicit" "thing" into the last parameter of the sortBy method. And that "implicit" "thing" knows how to order people about into neat lines. It just works.

So what about ordering by say firstName?
val peopleOrderedByDobWithSortBy = people.sortBy(person => person.firstName) 
    peopleOrderedNaturallyWithSortBy foreach println
    println 
Well that works fine. Turns out its because there is a whole stack of implicit object defs in Ordering.scala that provide instances of Ordering traits for the basic value types like String and Int etc.

So what about ordering by say dateOfBirth?
val peopleOrderedByDobWithSortBy = people.sortBy(person => person.dateOfBirth) 
    peopleOrderedNaturallyWithSortBy foreach println
    println    
It doesn't compile at line 2: "type arguments [org.scala_tools.time.imports.DateTime] do not conform to method ordered's type parameter bounds [A <: Ordered[A]]". Silly me, bah humbug! Its because a org.joda.time.DateTime isn't an Ordered[DateTime]. Now Scala-Time uses the pimp-my-library technique to upgrade org.joda.time.DateTime to RichDateTime, so we can try changing Scala-Time's RichDateTime to
class RichDateTime(val underlying: DateTime) extends Ordered[RichDateTime]  {
  def compare(y: RichDateTime): Int = {
    return underlying.compareTo(y.underlying)
  }
  ...
but that still doesn't compile "people.sortBy(person => person.dateOfBirth)" because, the compiler is not "upgrading" DateTime to RichDateTime.
But we can say:
val peopleOrderedByDobWithSortBy = people.sortBy(person => RichDateTime(person.dateOfBirth)) 
but thats not very pretty. So what's the answer? Put in an implicit object that knows how to order DateTime.
implicit object DateTimeOrderingObject extends Ordering[DateTime] {
  def compare(x: DateTime, y: DateTime) = x.compareTo(y)
} 
That definition should live in the Scala-Time library, but for now, it can be in my program. Now this will compile:
val peopleOrderedByDobWithSortBy = people.sortBy(person => person.dateOfBirth) 
and the code works.

But now for something scary. If I accidentally import this definition from some library that I happen to be using
implicit object SomeOtherPersonOrderingObjectInadvertentlyImported extends Ordering[Person] {
def compare(p1: Person, p2: Person) = 
(p1.firstName+p1.lastName).compareTo(p2.firstName+p2.lastName)
}
Then my code breaks because the people are printed in a different order. The code still compiles, but I get the SomeOtherPersonOrderingObjectInadvertentlyImported object used to do the sorting of the People instead of People's normal ordering.

If People line up in the wrong order then nobody cares, but lets call that class StocksToShortRightNow instead. Bang goes my Christmas bonus!

I hope I am somehow wrong here, and this is a bug or I am thinking about this the wrong way. But it makes me nervous.

Here is the complete code with the scary implicit commented out.

import org.scala_tools.time.Imports._

object TestSortBy {
  case class Person(firstName: String, lastName: String, dateOfBirth: DateTime) 
    extends Ordered[Person]
  { 
    override def toString() = {
      firstName + ", " + lastName + " " + DateTimeFormat.shortDate().print(dateOfBirth);
    }
    def compare(otherPerson: Person) = {
      lastName.compareTo(otherPerson.lastName)
    }
  }

  // Comment this is and watch the behaviour change. This could be imported accidentally
  // implicit object SomeOtherPersonOrderingObjectInadvertentlyImported 
  //   extends Ordering[Person] {
  //   def compare(p1: Person, p2: Person) =
  //    (p1.firstName+p1.lastName).compareTo(p2.firstName+p2.lastName)
  // }

  
  implicit object DateTimeOrderingObject extends Ordering[DateTime] {
     def compare(x: DateTime, y: DateTime) = x.compareTo(y)
  }
  
  def main(args : Array[String]) : Unit = {
    
    val people = 
      Person("Alan", "Kay", new DateTime(1973,12,1,0,0,0,0))::
      Person("James", "Gosling", new DateTime(1991,12,1,0,0,0,0))::
      Person("Anders", "Hejlsberg", new DateTime(1999,12,1,0,0,0,0))::
      Person("Martin", "Odersky", new DateTime(2000,12,1,0,0,0,0))::
      Person("Guido", "Van Rossum", new DateTime(2002,12,1,0,0,0,0))::
      Person("Yukihiro", "Matsumoto", new DateTime(1990,12,1,0,0,0,0))::Nil
    
    val peopleOrderedByDob = people.sortWith((p1,p2) => p1.dateOfBirth < p2.dateOfBirth) 
    peopleOrderedByDob foreach println
    println
    
    val peopleOrderedNaturally = people.sortWith((p1,p2) => p1 < p2) 
    peopleOrderedNaturally foreach println
    println

    val peopleOrderedNaturallyWithSortBy = people.sortBy(person => person) 
    peopleOrderedNaturallyWithSortBy foreach println
    println    

    val peopleOrderedByDobWithSortBy = people.sortBy(person => person.dateOfBirth) 
    peopleOrderedNaturallyWithSortBy foreach println
    println    


  }

}

[Update: 2009-11-05]
I posted a question about the SomeOtherPersonOrderingObjectInadvertentlyImported overriding the Person extends Ordered[Person] natural ordering. Here is Martin Odersky's response:
Implicits that are explicitly declared or imported take precedence over implicits that come with the type. Btw you can find out what implicits are inserted by running scalac with option -Xprint:typer. This will print out the tree after type checking.
Post a Comment