Thursday 5 November 2009

C# Linq OrderBy, Scala SortWith and SortBy (and Scary Implicits)

Here is a tiny example of a little bit of linq at work in Visual Studio 2010 Beta2. Lets order a list of names in C#

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestOrderBy
{
    class Person
    {
        public Person(string firstName, string lastName, DateTime dateOfBirth)
        {
            FirstName = firstName;
            LastName = lastName;
            DateOfBirth = dateOfBirth;
        }
        public readonly string FirstName;
        public readonly string LastName;
        public readonly DateTime DateOfBirth;
        override public string ToString()
        {
            return FirstName + ", " + LastName + " " + DateOfBirth.ToShortDateString();
        }
    }
    class Program
    {
        static void Main(string[] args)
        {
            var people = new List() {
                new Person("Alan", "Kay", new DateTime(1973,12,1)),
                new Person("James", "Gosling", new DateTime(1991,12,1)),
                new Person("Anders", "Hejlsberg", new DateTime(1999,12,1)),
                new Person("Martin", "Odersky", new DateTime(2000,12,1)),
                new Person("Guido", "Van Rossum", new DateTime(2002,12,1)),
                new Person("Yukihiro", "Matsumoto", new DateTime(1990,12,1))
            };

            var peopleOrderedByDob = people.OrderBy(person => person.DateOfBirth);
            foreach (var p in peopleOrderedByDob)
                Console.WriteLine(p);
            Console.WriteLine();

            var peopleOrderedNaturally = people.OrderBy(person => person);
            foreach (var p in peopleOrderedNaturally)
                Console.WriteLine(p);
            Console.WriteLine();

            Console.ReadLine();
        }
    }
}

What happens when we run it: We get:

Alan, Kay 01/12/1973
Yukihiro, Matsumoto 01/12/1990
James, Gosling 01/12/1991
Anders, Hejlsberg 01/12/1999
Martin, Odersky 01/12/2000
Guido, Van Rossum 01/12/2002


So the clause "OrderBy(person => person.DateOfBirth)" worked. Cool. C# knows how to compare birthdays.

But then Kaboom! The debugger stops at line 42 with ArgumentExeption "At least one object must implement IComparable". So "people.OrderBy(person => person)" was not checked at compile time. Wow! So lets implement IComparable on Person sorting on LastName:

class Person  : IComparable
{
    ...
    int IComparable.CompareTo(object obj)
    {
        Person otherPerson = obj as Person;
        if (otherPerson != null)
            return this.LastName.CompareTo(otherPerson.LastName);
        else
            throw new ArgumentException("Object is not a Person");
    }
}

And hey presto, C# knows how to sort people.

Alan, Kay 01/12/1973
Yukihiro, Matsumoto 01/12/1990
James, Gosling 01/12/1991
Anders, Hejlsberg 01/12/1999
Martin, Odersky 01/12/2000
Guido, Van Rossum 01/12/2002

James, Gosling 01/12/1991
Anders, Hejlsberg 01/12/1999
Alan, Kay 01/12/1973
Yukihiro, Matsumoto 01/12/1990
Martin, Odersky 01/12/2000
Guido, Van Rossum 01/12/2002


These clauses: "person => person.DateOfBirth" and "person => person" are known in C# are known as keySelector and are part of the .net linq library. If you want to drink the full linq Kool-Aid you can write

var peopleOrderedByDob = 
      from p in people
      orderby p.DateOfBirth
      select p;

foreach (var p in peopleOrderedByDob)
  Console.WriteLine(p);

But its just doing the same thing under the covers.

So lets do the same thing in Scala, using the sortWith method, which until 22 Oct 2009 was the only thing available. Lets sort by DateOfBirth first

import org.scala_tools.time.Imports._

object TestSortBy {

  case class Person(firstName: String, lastName: String, dateOfBirth: DateTime) 
  { 
    override def toString() = {
      firstName + ", " + lastName + " " + DateTimeFormat.shortDate().print(dateOfBirth);
    }
  }

  def main(args : Array[String]) : Unit = {    
    val people = 
      Person("Alan", "Kay", new DateTime(1973,12,1,0,0,0,0))::
      Person("James", "Gosling", new DateTime(1991,12,1,0,0,0,0))::
      Person("Anders", "Hejlsberg", new DateTime(1999,12,1,0,0,0,0))::
      Person("Martin", "Odersky", new DateTime(2000,12,1,0,0,0,0))::
      Person("Guido", "Van Rossum", new DateTime(2002,12,1,0,0,0,0))::
      Person("Yukihiro", "Matsumoto", new DateTime(1990,12,1,0,0,0,0))::Nil
   
    val peopleOrderedByDob = people.sortWith((p1,p2) => p1.dateOfBirth < p2.dateOfBirth) 
    peopleOrderedByDob foreach println
    println
  }
}
Which works just fine: 
Alan, Kay 01/12/73
Yukihiro, Matsumoto 01/12/90
James, Gosling 01/12/91
Anders, Hejlsberg 01/12/99
Martin, Odersky 01/12/00
Guido, Van Rossum 01/12/02

So now for the ordering naturally too

val peopleOrderedNaturally = people.sortWith((p1,p2) => p1 < p2) 
    peopleOrderedNaturally foreach println
    println

but line 2 does not compile (unlike c# where this is a runtime error) because the compiler does not know how to compare one Person with another. Doh! So lets add Ordered[Person] to Person to tell the compiler how to judge people :)
case class Person(firstName: String, lastName: String, dateOfBirth: DateTime) 
    extends Ordered[Person]
  { 
    override def toString() = {
      firstName + ", " + lastName + " " + DateTimeFormat.shortDate().print(dateOfBirth);
    }
    def compare(otherPerson: Person) = {
      lastName.compareTo(otherPerson.lastName)
    }
  }
And hey presto, same results as C#
Alan, Kay 01/12/73
Yukihiro, Matsumoto 01/12/90
James, Gosling 01/12/91
Anders, Hejlsberg 01/12/99
Martin, Odersky 01/12/00
Guido, Van Rossum 01/12/02

James, Gosling 01/12/91
Anders, Hejlsberg 01/12/99
Alan, Kay 01/12/73
Yukihiro, Matsumoto 01/12/90
Martin, Odersky 01/12/00
Guido, Van Rossum 01/12/02

Well that was easy. But if you look at Scala sortWith compared to C# OrderBy, you realize that in Scala you have to say how to do a comparison, whereas in C# you say what thing to order by.
Scala
  val peopleOrderedByDob = people.sortWith((p1,p2) => p1.dateOfBirth < p2.dateOfBirth) 
C#
  var peopleOrderedByDob = people.OrderBy(person => person.DateOfBirth);
When the Scala team saw that the C# version was shorter, and worse still, more functional than Scala's imperativeness, there was a great silence of the lambdas and then they recursed and recursed until they'd invented the entire Scala implicits system described well here and here. (Well maybe that wasn't quite how it happened...) So, very recently the sortBy method got added to the SeqLike trait as was kindly pointed out to me by Ismael Juma in an earlier post. So now we can write
val peopleOrderedNaturallyWithSortBy = people.sortBy(person => person) 
  peopleOrderedNaturallyWithSortBy foreach println
  println    
and this works fine. (Note that if you take the "extends Ordered[Person]" away from Person then "people.sortBy(person => person)" will not compile.) So how does it do it. Lets look at the signature of SortBy on the SeqLike trait.
def sortBy[B](f: A => B)(implicit ord: Ordering[B]): Repr
I'm not sure what Repr is, but the "implicit ord: Ordering[B])" is a suitable "thing" that knows how to order a B. In our case a B is a Person, which is an Ordered[Person] but not an Ordering[Person]. But Ordering.scala contains a trait LowPriorityOrderingImplicits with an implicit def that know how to "upgrade" an Ordered[Person] to an Ordering[Person] "thing" and that is then passed to the sortBy method to be used to get-the-job-done-tm.

My head is about to explode. If you followed that last paragraph well done. If you didn't then read the next one instead.

Anyhow, the compiler inserts an "implicit" "thing" into the last parameter of the sortBy method. And that "implicit" "thing" knows how to order people about into neat lines. It just works.

So what about ordering by say firstName?
val peopleOrderedByDobWithSortBy = people.sortBy(person => person.firstName) 
    peopleOrderedNaturallyWithSortBy foreach println
    println 
Well that works fine. Turns out its because there is a whole stack of implicit object defs in Ordering.scala that provide instances of Ordering traits for the basic value types like String and Int etc.

So what about ordering by say dateOfBirth?
val peopleOrderedByDobWithSortBy = people.sortBy(person => person.dateOfBirth) 
    peopleOrderedNaturallyWithSortBy foreach println
    println    
It doesn't compile at line 2: "type arguments [org.scala_tools.time.imports.DateTime] do not conform to method ordered's type parameter bounds [A <: Ordered[A]]". Silly me, bah humbug! Its because a org.joda.time.DateTime isn't an Ordered[DateTime]. Now Scala-Time uses the pimp-my-library technique to upgrade org.joda.time.DateTime to RichDateTime, so we can try changing Scala-Time's RichDateTime to
class RichDateTime(val underlying: DateTime) extends Ordered[RichDateTime]  {
  def compare(y: RichDateTime): Int = {
    return underlying.compareTo(y.underlying)
  }
  ...
but that still doesn't compile "people.sortBy(person => person.dateOfBirth)" because, the compiler is not "upgrading" DateTime to RichDateTime.
But we can say:
val peopleOrderedByDobWithSortBy = people.sortBy(person => RichDateTime(person.dateOfBirth)) 
but thats not very pretty. So what's the answer? Put in an implicit object that knows how to order DateTime.
implicit object DateTimeOrderingObject extends Ordering[DateTime] {
  def compare(x: DateTime, y: DateTime) = x.compareTo(y)
} 
That definition should live in the Scala-Time library, but for now, it can be in my program. Now this will compile:
val peopleOrderedByDobWithSortBy = people.sortBy(person => person.dateOfBirth) 
and the code works.

But now for something scary. If I accidentally import this definition from some library that I happen to be using
implicit object SomeOtherPersonOrderingObjectInadvertentlyImported extends Ordering[Person] {
def compare(p1: Person, p2: Person) = 
(p1.firstName+p1.lastName).compareTo(p2.firstName+p2.lastName)
}
Then my code breaks because the people are printed in a different order. The code still compiles, but I get the SomeOtherPersonOrderingObjectInadvertentlyImported object used to do the sorting of the People instead of People's normal ordering.

If People line up in the wrong order then nobody cares, but lets call that class StocksToShortRightNow instead. Bang goes my Christmas bonus!

I hope I am somehow wrong here, and this is a bug or I am thinking about this the wrong way. But it makes me nervous.

Here is the complete code with the scary implicit commented out.

import org.scala_tools.time.Imports._

object TestSortBy {
  case class Person(firstName: String, lastName: String, dateOfBirth: DateTime) 
    extends Ordered[Person]
  { 
    override def toString() = {
      firstName + ", " + lastName + " " + DateTimeFormat.shortDate().print(dateOfBirth);
    }
    def compare(otherPerson: Person) = {
      lastName.compareTo(otherPerson.lastName)
    }
  }

  // Comment this is and watch the behaviour change. This could be imported accidentally
  // implicit object SomeOtherPersonOrderingObjectInadvertentlyImported 
  //   extends Ordering[Person] {
  //   def compare(p1: Person, p2: Person) =
  //    (p1.firstName+p1.lastName).compareTo(p2.firstName+p2.lastName)
  // }

  
  implicit object DateTimeOrderingObject extends Ordering[DateTime] {
     def compare(x: DateTime, y: DateTime) = x.compareTo(y)
  }
  
  def main(args : Array[String]) : Unit = {
    
    val people = 
      Person("Alan", "Kay", new DateTime(1973,12,1,0,0,0,0))::
      Person("James", "Gosling", new DateTime(1991,12,1,0,0,0,0))::
      Person("Anders", "Hejlsberg", new DateTime(1999,12,1,0,0,0,0))::
      Person("Martin", "Odersky", new DateTime(2000,12,1,0,0,0,0))::
      Person("Guido", "Van Rossum", new DateTime(2002,12,1,0,0,0,0))::
      Person("Yukihiro", "Matsumoto", new DateTime(1990,12,1,0,0,0,0))::Nil
    
    val peopleOrderedByDob = people.sortWith((p1,p2) => p1.dateOfBirth < p2.dateOfBirth) 
    peopleOrderedByDob foreach println
    println
    
    val peopleOrderedNaturally = people.sortWith((p1,p2) => p1 < p2) 
    peopleOrderedNaturally foreach println
    println

    val peopleOrderedNaturallyWithSortBy = people.sortBy(person => person) 
    peopleOrderedNaturallyWithSortBy foreach println
    println    

    val peopleOrderedByDobWithSortBy = people.sortBy(person => person.dateOfBirth) 
    peopleOrderedNaturallyWithSortBy foreach println
    println    


  }

}

[Update: 2009-11-05]
I posted a question about the SomeOtherPersonOrderingObjectInadvertentlyImported overriding the Person extends Ordered[Person] natural ordering. Here is Martin Odersky's response:
Implicits that are explicitly declared or imported take precedence over implicits that come with the type. Btw you can find out what implicits are inserted by running scalac with option -Xprint:typer. This will print out the tree after type checking.

15 comments:

Tibi said...

Great post, you know how to tell a story! I loved the “silence of the lambdas” is it from you?

IMHO, implicits are pretty, but a thing of the devil.

TimA said...

Thank you for your comments.

I'm still finding implicts scary, but maybe that will pass.

"silence of the lambdas" did pop into my head as I was writing the post but google tells me its been used before.

Simon Chemouil said...

How do you "inadvertently" import an implicit object?

Either the implicit conversions are added to your existing imports because they were somehow included in a package object from an originating library (so it's the responsibility of the library provider and the implicits are helpers for that library), or you import a bad implicit explicitly, which is just as bad as calling the wrong method. (In most cases, you simply import the conversion you want, unless it's in Predef, and you're pretty safe from bad implicits). Imports can be scoped and thus smarter in Scala compared to Java, so you can make sure your implicit is only applied when you need it.

Either way, while there may be valid critics for implicits (for instance, conflicting implicit conversions, lack of clarity, dispatch on argument type could be better, etc), I fail to see what's wrong here.

When it comes to debugging, Scala provides a command line option to explicit implicit usage and we can hope that Scala development environments will very soon provides graphical hints.

I guess this is subjective, but from all your examples I found the Scala code more readable than the C#, and it also showed that Scala "this is just a library" approach is far more scalable. Proof is you don't have to wait for the EPFL to sort on date like you have to wait on Microsoft for all those C# 4.0 (and future versions) features.

Unknown said...

Kool-Aid is spelled with a K

TimA said...

@Simon: Thank you very much for your comments.

My feeling was that if the compiler shows an error when implicits are ambiguous then it should also complain when there is a conflict with between an implicit and the implicit that comes with the type.

Re: Inadvertently: Well something like:
import org.prototype.Import._
import org.scriptaculous.Import._

where both are big third party libraries, and scriptaculous builds on types in prototype, that I am using too. Perhaps the reality is different and its all very safe in practice. Right now I can't see it.

I posted a question on the Scala mailing list about this. http://old.nabble.com/Implicits-bug-(or-feature-)-to26213631.html
Here is Martin's answer:
Implicits that are explicitly declared or
imported take precedence over implicits that come with the type. Btw you can find out what implicits are inserted by running scalac wigth option -Xprint:typer. This will print out the tree after type checking.

That doesn't ease the worry!

> Scala code more readable than the C#,
Yes
> Scala "this is just a library" approach is far more scalable.
> Proof is you don't have to wait for the EPFL to sort on date
> like you have to wait on Microsoft for all those C# 4.0 (and
> future versions) features.
Absolutely.
Does it make me want a linq equivalent in Scala any less? No/
To me C#'s OrderBy feels cleaner/safer than the Scala way.
Do I think implicits have their place? Yes. Pimping is great.

I'm sorry to keep banging on about linq, but one thing that I didn't say in the post was that the linq approach doesn't actually execute the sort until the foreach, whereas Scala's SortBy executes immediatly. The result of OrderBy in linq is an execution plan not a list. This opens up a world of possibilities.

Thanks again. Its great to have feedback.
A+

TimA said...

@DBO Kool-Aid it is! Thanks.

Daniel said...

You keep saying sortWith is non-functional, but that's a huge mistake. The parameter to sortWith is what is called a "predicate", though this one does not return a Boolean, but an Integer.

A predicate denotes the conditions in which something is true, which is precisely what is happening.

The sortWith algorithm knows what to do, but to do so it needs to know when an element is less than another, ordering-wise.

Ordered, Ordering and Comparator, on the other hand, are just an interface to such predicates. On one hand, they make sorting naturally ordered classes easy. On the other hand, they add unnecessary boilerplate for custom ordering.

Simon Chemouil said...

> I'm sorry to keep banging on about linq, but one thing that I didn't say in the post was that the linq
> approach doesn't actually execute the sort until the foreach, whereas Scala's SortBy executes immediatly.
> The result of OrderBy in linq is an execution plan not a list. This opens up a world of possibilities.

I'm pretty sure it is possible to create a SortBy in Scala that's lazy, returns a view of the sortby operation and actually sorts the collection items only when you first use them. You can read Daniel Sobral's blog posts about the details of lazy operations on collections.

The advantage of the lazy approach is that it's within the language and available for a variety of operations, and not limited to data queries like Linq is.

Finally, when it comes to the "inadvertent" import problem, it looks to me the problem is the wildcard import, which is considered bad in Java and probably not the best idea in Scala either. The selective import "import com.company.{ObjectY, ClassX}" is safer in that regard.

I think that these worries will go away with proper tooling. A major advantage of statically typed languages is after all to have an IDE/Editor that actually has a clue about the semantics. Scala is still young but a great editor will certainly make the more obscure features of Scala more accessible. For instance, tooling can not only give visual hints (and type feedback) about implicit usage, but also provide a smart import feature that imports only what you actually use (eliminating the need for wildcard imports), and thus force you to explicitly import any implicit conversion without risks of importing too broad because of laziness (not Scala this time ;).

TimA said...

@Daniel. You are absolutley right, sortBy and sortWith are both functional in the FP sense. I suppose what I mean is that sortBy is more *semantically* richer than sortWith: You have to tell sortWith HOW to compare elements. You tell sortBy WHAT to order by.

@Simon:
A lazy sortBy would be interesting! Would also be interesting to see that work in combination with a ThenBy() equivalent
e.g. linq
list.OrderBy(p=>p.firstName)
.ThenBy(
(p=>p.lastName)

> Re: the wildcard import
The solution to wildcard imports is tooling: "Organize imports" in the ide.
The problem is third party libraries. If I "organize imports" in Java, I can end up with a list imports from a library. I do not want to have to check every import to see whether it might pull in a hidded implicit object that may override the implicit-that-comes-with-the-type on one of the many types I use.

Also see Martin's response to my concerns: http://old.nabble.com/Implicits-bug-%28or-feature-%29-td26213631.html.

Cheers
Tim

Daniel said...

Tim, in Scala is the collections which are lazy, not particular functions on them. With the exception of the upcoming "withFilter" on Scala 2.8.

On Scala 2.7, the "projection" method will usually return a non-strict version of the collection you are using -- which may be lazy (caching) or not.

On Scala 2.8, the equivalent method is called "view".

So, all you need to do to get a non-strict sort is convert the collection to a non-strict version by calling "projection" or "view" first.

For completeness sake, let me state that not all methods on non-strict collections are non-strict. The method "foreach", for instance, is always strict. If you want to convert back to a strict collection, you can call the "force" method.

TimA said...

@Daniel: Thanks for the info. The discussion about withFilter is interesting. I'll have to think of something contentious (or just plain wrong) to write about that next. Just kidding ;) Thanks for the discussion.

Optevo said...

Interesting - I've wondered how dangerous implicits can be from time to time. As you point out, the main problem is there is no error and that's what makes it hard to track down. I think this is somewhere where smarts in the IDE can help. Personally, I'd like the IDE to highlight where the implicits are being used and shows the types they are converted to when that piece of code is hovered over. The option Martin talked about, -Xprint:typer, can be used to extract this information too of course.

I was thinking about another option: an annotation that could *disable* implicits on a class/method basis might be handy too: It could help to identify the above problem and it would also avoid other problems that I've had with the Predef implicit conversions making life difficult for me.

Richard

Alex Black said...

Great post! Has anyone backported SortBy to 2.7.7, or is its source easily available, sounds like something I'd love to use having come from the world of C# :)

- Alex

Mohamed Bana said...

On hunting for ways to do

var counts = new Dictionary();

var sortedDict =
from entry in counts
orderby entry.Value descending, entry.Key ascending
select entry;

i.e., what Tim wrote above

list.OrderBy(p=>p.firstName)
.ThenBy(
(p=>p.lastName)


The closest I could come is below. I'm sure there's a better way of writing it without using `m.toList.sort((x,y) => if (x._2 == y._2) x._1 < y._1 else x._2 > y._2)`

scala> val m = Map("one"->1,"once"->1,"two"->2,"twice"->2,"double"->2,"three"->3)
m: scala.collection.immutable.Map[java.lang.String,Int] = Map(double -> 2, one -> 1, twice -> 2, two -> 2, three -> 3, once -> 1)

scala> m.toList.sortBy(_._2).reverse.groupBy(_._2).toList.map(_._2.sortBy(_._1) ).flatten
res32: List[(java.lang.String, Int)] = List((three,3), (double,2), (twice,2), (two,2), (once,1), (one,1))

scala> res32 foreach println
(three,3)
(double,2)
(twice,2)
(two,2)
(once,1)
(one,1)

Mohamed Bana said...

Ok I got an answer on the Scala mailing list

Aaron Harnly

for (
(a,b) <- m.toSeq.sortBy { case (a,b) => (-b, a) }
) yield a

Jorge Ortiz

m.toSeq.sortBy { case (key, value) => (-value, key) }