Sunday 1 November 2009

Scala almost as good as C# .NET4?

Disclaimer: I am primarily a Java developer with some experience in C# .NET 2, Ruby and JavaScript, plus a few others from the C-derived stable. I'm learning Scala and I am not using it in paid work yet. (Offers welcome!)

So far my posts have been about Scala and learning about functional programming, but this is a bit of a diversion. In the previous post, Becoming really rich with scala, I translated a very nice example of some C# .NET 4 code to Scala using a lot of idiomatic Scala. The C# code stands up very well in comparison. Much to my surprise, I think that C# has the edge overall, although it's a close call. I dread to think what the equivalent code would look like in Java. This exercise made me realise just how far behind Java is compared to C#, especially the version coming in .NET 4. While the Java world has been bickering about how to do proper closures without breaking backwards compatibility, C# added them long ago. Then C# added LINQ and updated the libraries to use the new features. Now PLINQ is being built on that foundation. IMO, this combination has moved C# up to a totally different level compared to Java.

The C# features that caught my eye are:

a) The functional features and LINQ can make C# as *readably* concise as Scala, and all the imperative stuff is still there if you want it.

b) The way that the well-known SQL vocabulary "select", "where", "orderBy" has been used in C# instead of "map", "filter" and "sortBy" as in Scala (and others). This is a subtle point but it seems very important to me. C#/Java developers are familiar with SQL but not necessarily with map, filter, etc. This choice of familiar vocabulary is important.

c) C# OrderBy(s => s.LRS) vs Scala .sortWith((elem1, elem2) => elem1.LRS <= elem2.LRS). This small point may appear trivial, but IMO it is a huge mindset win for C#. In list.OrderBy(s => s.LRS) you are saying *what* you want and not, as in Scala's list.sortWith((elem1, elem2) => elem1.LRS <= elem2.LRS), *how* to do it. This principle is what makes SQL so powerful, but of course here it is being used with C# lists. I'm sure this is just the tip of the iceberg for the way that this principle can be used.

d) [apologies: I added this point since posting the original on which the first 8 comments are based]. Nullable types. This is a sane way of dealing with arithmetic where variables can be null. Java just offers huge amounts of ==null? boilerplate. Scala offers Option[Double/etc], which is a start, but you can't do arithmetic expressions using Option variables out-of-the-box. It's *fairly* easy to add arithmetic expressions to Option types, but it's not in the core libraries.
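
To make point (d) a little more concrete, here is a minimal sketch of the kind of add-on I have in mind. The names OptionArithmetic and OptionDoubleOps are made up for this illustration and are not part of the Scala libraries; the sketch assumes a reasonably recent Scala with implicit classes and only covers Option[Double].

```scala
object OptionArithmetic {
  // Hypothetical helper, not part of the Scala standard library:
  // lifts the four arithmetic operators over Option[Double].
  implicit class OptionDoubleOps(a: Option[Double]) {
    // The result is defined only when both operands are defined.
    private def lift(b: Option[Double])(f: (Double, Double) => Double): Option[Double] =
      for (x <- a; y <- b) yield f(x, y)

    def +(b: Option[Double]): Option[Double] = lift(b)(_ + _)
    def -(b: Option[Double]): Option[Double] = lift(b)(_ - _)
    def *(b: Option[Double]): Option[Double] = lift(b)(_ * _)
    def /(b: Option[Double]): Option[Double] = lift(b)(_ / _)
  }
}

object OptionArithmeticDemo {
  import OptionArithmetic._

  val price: Option[Double]    = Some(2.5)
  val quantity: Option[Double] = Some(4.0)
  val discount: Option[Double] = None

  // Both operands are defined, so the product is defined.
  val total: Option[Double] = price * quantity
  // One operand is missing, so the whole expression is None.
  val discounted: Option[Double] = price * quantity - discount
}
```

A missing value simply propagates through the expression, which is roughly how arithmetic on C# nullable types behaves.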


The Java world makes a lot of noise about the other JVM languages that are "better" in some way than Java: JavaFX, Clojure, Groovy, JRuby, Fan, or Scala. But .NET 4 has raised the bar very high with C# 4. Nobody in the alternative JVM language world should feel complacent! (Well (except Rich Hickey, ((!)))).

24 comments:

Ismael Juma said...

Hi,

You gave 3 reasons. a) is equally applicable to Scala, Scala has sortBy[1] so c) is not an advantage for C#. That leaves b) which is quite subjective.

Best,
Ismael

[1] https://lampsvn.epfl.ch/trac/scala/changeset/19248

Ross McDonald said...

Hi.

Let us not forget about the 'theory of evolution'. Languages evolve, and leaner newer ones are in theory genetically superior, as opposed to bloated, patched up ones.

Furthermore, 'SQL' concepts are designed for use in manipulating 'relational' data and it is becoming clear that 'relational by default' is a paradigm rapidly falling out of favour.

Just a couple of arguments to counter your own offering :-)

Regards,

Ross

Anonymous said...

I might be talking out of my ass since I haven't actually used C# myself, but it does seem patched when I see code using some of the new features. Scala on the other hand, is elegant. But it may be just personal taste.

TimA said...

@Ismael: Thanks for pointing out sortBy, I'm pleased to see it appear in Scala. A couple of comments:
a) sortBy was just made unexperimental 10 days ago. OrderBy was in .NET 3.5.
b) minor niggle: there is no sortByDescending, but sortBy(...).reverse will do for now
c) the way that C# and Scala implement it is fundamentally different. The Scala implementation has an extra hidden 'implicit' parameter, which requires the compiler to provide an instance of a suitable Ordering class. I find the Scala approach scary! But maybe that's just me.
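
To show what I mean by the hidden parameter, here is a small sketch. Summary and rank are stand-in names invented for this example (not the real code from the previous post), and it assumes the Scala 2.8-style sortBy with its second, implicit parameter list.

```scala
object SortByOrdering {
  // Hypothetical record type standing in for the summaries in the post.
  final case class Summary(name: String, rank: Int)

  val summaries = List(Summary("a", 3), Summary("b", 1), Summary("c", 2))

  // Usual form: the compiler supplies the Ordering[Int] argument.
  val implicitForm = summaries.sortBy(_.rank)

  // Same call with the second (implicit) parameter list written out by hand.
  val explicitForm = summaries.sortBy(_.rank)(Ordering.Int)
}
```

Both calls should give the same result; the only difference is who fills in the Ordering.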

The new Scala sortBy worked here:
val top15perc =
  summaries
    .filter(s => s.LRS.isDefined)
    .sortBy(elem1 => elem1.LRS.get).reverse
    .take((len * 0.15).toInt)

val bottom15perc =
  summaries
    .filter(s => s.LRS.isDefined)
    .sortBy(elem1 => elem1.LRS.get)
    .take((len * 0.15).toInt)

but not here:
return _adjDictionary.toList
  .sortBy(elem1 => elem1._1)
  .take(200).map(keyValue => keyValue._2).average

The compiler says "type arguments [org.scala_tools.time.imports.DateTime] do not conform to method ordered's type parameter bounds [A <: Ordered[A]]". I'm guessing this can be fixed by adding the Ordered trait to Scala-Time's RichDateTime, but my quick attempt to do this failed.

So sure, Scala looks like it has the potential to get this right. I certainly hope so.
C# also has Average and Sum as list methods. Big deal, yes, but wait... I believe C# OrderBy, Sum and Average all work by using "field selectors". I *think* that C# implements this using LINQ/AST magic (but I may be wrong). This general mechanism seems simpler than the implicits mechanism, which looks specific to sortBy, but maybe that is just my lack of understanding of implicits.

On your other points, on the imperative side, the lack of continue in loops is a very minor annoyance (but not one I care about particularly). The lack of a step on the "for loop" is a slight surprise initially but can be worked around with Iterator.iterate. This only really matters for old cranky imperative programmers, but there are quite a few of us about!

Thank you for your comments!

Ismael Juma said...

Tim,

I am aware that sortBy was added to trunk not long ago. Since you were talking about C# .NET4, which will only be released in March 2010, I thought it was fair play to include things that will be part of the upcoming Scala 2.8.0 (planned to be out before then, but who knows).

"The scala implementation has an extra hidden 'implicit' parameter implicits which requires the compiler to provide the an instance of a suitable Ordering class. I find the Scala approach scary! but maybe thats just me."

I don't see why.

"I'm guessing this can be fixed by adding the Ordered trait to Scala-Time's RichDateTime"

Indeed.

"If you look at C# also has Average and Sum as list methods"

For what it's worth, Scala has sum (again in 2.8.0).

"I *think* that C# implements this using linq/AST magic, (but I may be wrong)."

It would be nice to have similar dynamic expression manipulation in Scala, but not sure if this case is the best example why as implicits seem to do a decent job.

"This general mechanism seems simpler that the implicits mechanism which looks specific to sortBy, but maybe that is just my lack of understanding of the implicits."

Not sure what you mean here. Implicits are not specific to sortBy.

"On your other points, on the imperative side, the lack of continue in loops is a very be a minor annoyance"

True.

"The lack of a step on the "for loop" is a slight surprise intitially but can be worked around with Iteror.iterate."

You know that Range includes a step parameter, right? So:

for (i <- 1 to 10 by 3)

Best,
Ismael

TimA said...

@Ismael:
Thanks again for your comments.

'by' works for Int not for Scala-Time's DateTime. I should have clarified what I meant. Apologies, my bad.

I wanted to step by Scala-Time's DateTime/Interval which does not implement 'to' or 'until' or 'by'. That would be a very natural fit if it did. (Are you there Jorge?!)

In the end I used the new-in-2.8 Iterator.iterate for this:
val dates = Iterator.iterate(today)(_ - 7.days)
  .takeWhile(d => d >= (_start + 12.days) && d >= limit)
  .toList

But the neat solution would be if Scala-Time implemented to/until/by.
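
Just to sketch what that could look like: the helper below is hypothetical and is not Scala-Time code. It uses java.time.LocalDate as a stand-in for Joda's DateTime (an assumption made purely so the example is self-contained), and the names DateSteps, DateRange and RichLocalDate are invented for the illustration.

```scala
import java.time.LocalDate

object DateSteps {
  // Hypothetical helper, not part of Scala-Time: an inclusive date range
  // with a step measured in days (negative steps walk backwards).
  final case class DateRange(start: LocalDate, end: LocalDate, stepDays: Long = 1) {
    require(stepDays != 0, "step must be non-zero")

    def by(days: Long): DateRange = copy(stepDays = days)

    def toList: List[LocalDate] =
      Iterator.iterate(start)(_.plusDays(stepDays))
        .takeWhile(d => if (stepDays > 0) !d.isAfter(end) else !d.isBefore(end))
        .toList
  }

  // Adds `to` to LocalDate so that `a to b by 7` reads like an Int range.
  implicit class RichLocalDate(d: LocalDate) {
    def to(end: LocalDate): DateRange = DateRange(d, end)
  }
}

object DateStepsDemo {
  import DateSteps._

  // Weekly steps through November 2009, both end points included.
  val weekly: List[LocalDate] =
    (LocalDate.of(2009, 11, 1) to LocalDate.of(2009, 11, 22) by 7).toList
}
```

Underneath it is still Iterator.iterate, so this is only sugar, but it is the sort of sugar that would make a date library feel like part of the language.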

Ismael Juma said...

Oh I see. Yes, that's true.

There is a GenericRange in Scala 2.8, maybe Jorge can use that in the joda-time wrapper.

Ismael

HackerHacker said...

Yes, C# is using "linq/AST magic", and that can make a program fail at run-time since some expressions cannot be expressed in SQL.

I guess that's why it has Sum() and the like too, since it's standard SQL and much faster than accumulating in any other way.

J. Suereth said...

Seems to me you're starting to compare apples to oranges a bit. A lot of the examples about Scala-Time are examples of C# having a better *library* than scala, but this could of course be fixed. You could even endeavor to add the "step" function to scala-time yourself, if you were so inclined. I do believe this is one of two weaknesses in scala at the current time: There aren't a lot of libraries for *just* scala, so you end up using something designed for java -> not as nice.

TimA said...

@J. Suereth

Thank you for your comment.

I agree with you. I suppose I am stepping back and looking at the whole package on both sides. VS2010 beta2 C# has a lot of mature libraries built in (this is MS$ after all!) and Scala2.8pre has fewer (this is epfl+friends after all!). I hope that that will change. For now, at least, it seems that C#.net4 is the more complete package for the problem that I was looking at. Scala has the (great) potential to get all this stuff right. Point (d) that I just added to my original post is something that IMO should be built into the core Scala libraries. I'm sure that will come. Hopefully debate like this will encourage people to chip in!

Unknown said...

On the other hand, it shouldn't be impossible to create a SQL-like collections library in Scala. You don't need to wait for Martin :)
And probably that's one major difference: AFAIK, LINQ had to be bolted into the internals of C#, while in Scala it could be more or less "just a library" (but then, I may be wrong :) )

HackerHacker said...

Gabriel and Tim:

I'm afraid you don't see the big picture here.

Unless I'm completely mistaken, Linq2SQL wasn't possible before they added Expressions. C# Expressions are a bit like Lisp macros. If you don't know Lisp, go learn it. It's good for you.

Something along the lines of this:

var ages = dc.Persons.Select(person => person.Age)

is translated to SQL as

select age from persons

and

var ageSum = dc.Persons.Sum(person => person.Age)

is translated to SQL as

select sum(age) from persons


I've not read my Scala book cover to cover yet, but nothing I've seen so far allows Scala code to go from person => person.Age to the "symbol" Age.

TimA said...

@Tommy:
Thank you for your comments.

"Unless I'm completely mistaken, Linq2SQL wasn't possible before they added Expressions."
Did I suggest it was?

ScalaQL http://www.cs.uwm.edu/~dspiewak/papers/scalaql.pdf by Daniel Spiewak and Tian Zhao describes something that looks like it can do most of what LINQ does, without expressions. Unfortunately this is just research and no code has been released. According to the next reference there are some holes in what ScalaQL can do, but I don't understand the detail.
To confuse matters more, there is another research paper about something similar, also called ScalaQL http://www.sts.tu-harburg.de/people/mi.garcia/ScalaQL/. But that one uses a compiler plugin to extend the Scala language to basically recreate LINQ, I think. Again, I don't think code has been released.

But, I think all this talk of SQL is clouding the issue....


@all: LINQ does not mean SQL embedded in your code. Repeat three times!
LINQ is a compiler technology that allows the compilation of a certain type of syntactic sugar.
"LINQ providers" then back up the syntax to do something hopefully useful.
In .NET, there are LINQ providers for
LINQ to SQL
LINQ to XML
LINQ to Objects
...

LINQ to Objects is BY FAR the most exciting because it (potentially) brings the declarative
power of set-based operations to sequences of objects. The great attraction of this
syntactic sugar is that it lets programmers say what they want, and not how to do it, by
using a familiar syntax plagiarised from SQL.
Write this in imperative Java or C#!

var result =
    from p in listOfPeople
    join c in listOfCompanies
        on p.com_id equals c.com_id
    where p.isOfWorkingAge
    group p by new { c.name, p.sex } into g
    select new {
        g.Key.name,
        g.Key.sex,
        maxincome = g.Max(p => p.income)
    };

Wouldn't *you* like to have that possibility in your toolbox?

*The actual syntax does NOT matter too much as long as there is not too much unnecessary NOISE.*

If Scala can achieve 90% of this with the implicit trick used in sortBy then great, bring it on.

If Daniel and Tian can achieve 90% of this with their ScalaQL then great, bring it on.

If ScalaQL from Miguel Garcia et al. can achieve 100% of this with a compiler plugin then great, bring it on.

What I know is I want that tool in my toolbox and I can't see it yet in Scala.
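
For comparison, the closest I can get with plain Scala collections is something like the rough sketch below. Person, Company and their fields are hypothetical types made up to mirror the query above, income is an Int purely to keep the example simple, and the "join" is a naive nested loop rather than anything a query provider would generate.

```scala
object MaxIncomeByCompanyAndSex {
  // Hypothetical record types mirroring the fields used in the query above.
  final case class Person(comId: Int, sex: String, income: Int, isOfWorkingAge: Boolean)
  final case class Company(comId: Int, name: String)

  // Maximum income per (company name, sex) for people of working age.
  def maxIncome(people: List[Person], companies: List[Company]): Map[(String, String), Int] = {
    // "where" and "join": keep working-age people and pair each with its company.
    val joined = for {
      p <- people if p.isOfWorkingAge
      c <- companies if c.comId == p.comId
    } yield (c.name, p.sex, p.income)

    // "group by" and "max": one entry per (name, sex) key.
    joined
      .groupBy { case (name, sex, _) => (name, sex) }
      .map { case (key, rows) => key -> rows.map(_._3).max }
  }
}
```

It works, but the grouping and the tuple positions (_._3) are exactly the kind of noise that the query syntax hides.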

(Please forget LINQ to SQL! It's not relevant to this discussion. Well, not for me anyway.)

Thanks again for the comments.

HackerHacker said...

@Tim: No, it was Gabriel who wrote about LINQ being bolted onto C#.

I do agree with you that C# Expressions and the focus is not what Scala should focus on.

We all want a nice query system like in the example you wrote.

If that can be done by changing the existing collection classes rather than adding obscure-ish new features to the language, then so much the better.

Thanks all for the comments, I didn't mean to sound snobby.

HackerHacker said...

"and the focus on SQL", typo...

TimA said...

@TommyHacker @all: Thanks for a lively debate!

TimA said...

@Ismael: it turns out that the reason that my quick attempt to make sortBy work with joda DateTime didn't work was because RichDateTime needs to extend Ordered[RichDateTime] (not Ordered[DateTime]).
In that case you can say:
l.sortBy(e => RichDateTime(e))
which is not ideal but works.

It can also be made to work if there is an implicit object with trait Ordering[DateTime] in scope.

implicit object DateTimeOrderingObject extends Ordering[DateTime] {
  def compare(x: DateTime, y: DateTime) = x.compareTo(y)
}

In that case you can say:
l.sortBy(e => e)

which is better.

Another feature to add to Scala-Time.

Looking at linq-to-objects, all linq requires is that the selected field type implements IComparable.

Unknown said...

Tommy,
I didn't mean to dismiss C#. I think it is a fine language and, as opposed to Java, it is not afraid of evolving. I think the work being done by Microsoft Research labs is great. Taking a popular language and (almost) seamlessly adding something like LINQ is a big achievement.
But what I'm trying to say is: in Lisp, you can use macros to add any kind of construct you want. In Scala, many of the constructs in the language are just a library, and it is pretty easy to add new ones and define your own domain-specific language.
Can you do the same in C# (by using expressions or something else), or are you constrained to LINQ?

And thanks for the recommendation. I'm already in the process (slowly though) of learning Common Lisp and Clojure. Anyway, my preference is for statically typed languages like Scala, F#, or Haskell (if you don't know Haskell, go learn it, it's good for you).

Anonymous said...

Hi Tim,

Nice post. Thanks for opening up this discussion.

Just a minor syntactic note: You might find the ubiquitous _ handy in tidying your Scala. For example you can change this:

val top15perc =
  summaries
    .filter(s => s.LRS.isDefined)
    .sortBy(elem1 => elem1.LRS.get).reverse
    .take((len * 0.15).toInt)

to this:

val top15perc =
  summaries
    .filter(_.LRS.isDefined)
    .sortBy(_.LRS.get).reverse
    .take((len * 0.15).toInt)

To veer into more a matter of taste, you can also omit the . for something like this:
val top15perc =
  (summaries filter (_.LRS.isDefined) sortBy (_.LRS.get)).reverse take ((len * 0.15).toInt)

which I think is pretty nice.

Unknown said...

@Tim Azzopardi: "there is no sortByDescending".

That's because it's redundant:

scala> List("a", "aa", "aaa") sortBy { _.length }
res1: List[java.lang.String] = List(a, aa, aaa)

scala> List("a", "aa", "aaa") sortBy { -_.length }
res2: List[java.lang.String] = List(aaa, aa, a)
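
Negating the key only works when the key is numeric. For other key types, a sketch along the following lines should do the same job by handing sortBy a reversed Ordering explicitly (the word list is just made-up sample data):

```scala
object DescendingSort {
  val words = List("fig", "banana", "pear")

  // Longest first: pass a reversed Ordering[Int] instead of negating the key.
  val byLengthDesc = words.sortBy(_.length)(Ordering[Int].reverse)

  // Reverse alphabetical: there is no "negative string", so reverse the Ordering.
  val alphaDesc = words.sortBy(identity)(Ordering[String].reverse)
}
```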

Hendry Luk said...

Personally I think LINQ (especially LINQ to Objects) is quite a small consequence of C#'s language features. All the capabilities mentioned above are focused primarily on how we access collections, which has a really small impact on overall application development.
But the building blocks of LINQ (lambdas and expression trees) are a much more important feature of C# 3.0, and they have completely changed the game ever since. Although they were invented primarily to enable LINQ, expression trees have been used extensively in interesting ways in various areas. That is because they make it possible to treat code as an asset instead of just executable instructions. The language gives the developer the capability to "read" the expression within a function and interpret it as data. There's no way you can do this in Java without modifying your compiler and enabling your code to read through the bytecode to produce expression trees of your function to be used as data at runtime.
I'm really frustrated that closures have almost certainly been excluded from Java 7, while expression-tree support for Scala is still under development by the community (e.g. jaque). I really can't understand all the noise and the hold-ups around incorporating closures into Java.
Back to LINQ. It has nothing to do with SQL or the relational trend. In fact, the NoSQL movement just emphasizes the urgency of having LINQ in Java. Look at how painlessly document DBs (e.g. Mongo) are used from C# with LINQ and dynamic, which is, amazingly, the same way they access a relational DB, and that, by the way, does not have to be LINQ to SQL. Compare LINQ in NHibernate with HQL in Hibernate.
You have HQL (Hibernate), JavaScript (Mongo), REST/SOAP (web services) in Java, whereas they only have LINQ for everything in .NET.
The quest for LINQ is not a short one. This is the homework we still have to do just to make LINQ possible at all on Java:
- Property
- Closure
- Expression Tree
- Extension Method
- Generics (the real one)
- Type inference
- Anonymous Class

All of these have long been available in C# 3! Only a couple of them are available in Scala, and none in Java. In addition, you'll still need Dynamic to make it useful for anything beyond relational DBs (e.g. NoSQL, JSON, XML).
Closures were supposed to be one small step in this direction. I was really pissed off when I discovered that not even this small step would make it into Java 7. Why so much noise and bureaucratic process for years just to add this one language feature!? That's the downside of Java's democratic culture. At the pace they are going, it doesn't seem likely that Java will even reach LINQ before 2015, while today C# 4 already has Dynamic, a theorem prover (design by contract), Parallel (including PLINQ), Reactive Extensions... and C# 5 is just around the corner with metaprogramming and compiler-as-a-service.
It's really time to realize how far we've been left behind.
Regarding Lisp, macros, mixins, and all that, I intentionally did not discuss Boo (another .NET language that supports those capabilities) or the .NET dynamic languages (IronRuby, IronPython), because they're a different class of languages intended to satisfy a different segment. And so is F#. I think it's only fair to compare C# and Java, and maybe Scala, as mainstream strictly typed static languages. They all share the same language characteristics; it's just a matter of how far they go in maximising expressiveness and reducing language noise.

Hendry Luk said...

I think LINQ (especially LINQ to Objects) is only a small consequence of its underlying building blocks. All the capabilities above are focused only on how we access collections; that's not the big picture.
The building blocks of LINQ (lambdas and expression trees) are the more important thing. Added in C# 3.0, they have completely changed the game ever since. Although they were invented primarily to enable LINQ, expression trees have been used in interesting ways in various areas, where we can now treat code as an asset, not merely executable instructions. They give us the ability to "read" the expression within a method and interpret it as data. This is used to build MVC UIs, write business rules, configure your framework (replacing XML), and generate SQL or web-service calls.
Expression-tree support for Scala, which is still under development by the community (jaque), looks promising.

Hendry Luk said...

Back to LINQ. This thing has nothing to do with SQL or the relational trend. In fact, NoSQL only emphasizes the urgency of having LINQ in Java. Look at how they've been using document DBs in .NET with LINQ and dynamic, which is, amazingly, the same way they access object collections or a relational DB. On the relational front, compare LINQ in NHibernate with the ugly HQL/criteria API in Hibernate.
In Java, you might migrate from HQL to MongoDB JavaScript, or even to the Amazon API, whereas in .NET there's only LINQ.
The quest for LINQ is not a short one. This is the homework we still have to do just to make LINQ possible at all on Java:
- Property
- Closure
- Expression Tree
- Extension Method
- Generics (the real one)
- Type inference
- Anonymous Class

They have all been there since C# 3! Only a couple of them are in Scala, and none in Java. In addition, you'll still need Dynamic to make it useful for anything beyond relational DBs (e.g. NoSQL, JSON, XML).
Closures were supposed to be one small step in this direction. I was really pissed off when it turned out that not even this small step would make it into Java 7. Why years of so much noise and bureaucratic process just to add this one thing!? At this pace, it doesn't seem likely we'll even reach LINQ before 2015, while today C# 4 already has Dynamic, a theorem prover (design by contract), Parallel (including PLINQ), Reactive Extensions... and C# 5 is just around the corner with metaprogramming and compiler-as-a-service. It's time to realize how far we've been left behind.

Hendry Luk said...

Regarding Lisp, macros, mixins, and all that, I intentionally did not discuss Boo (another .NET language that supports those capabilities) or the .NET dynamic languages (IronRuby, IronPython), because they're a different class of languages intended to satisfy a different segment. And so is F#. I think it's only fair to compare C# and Java, and maybe Scala, as mainstream strictly typed static languages. They all share the same language characteristics; it's just a matter of how far they go in maximising expressiveness and reducing language noise.