Thursday, 15 April 2010

Scala asymmetrical function definitions

I was impressed by the clarity of the "Better Haskell solution" to this Odd Word problem. (Disclaimer: I don't know haskell, but I can just about read a simple example like the one in this problem.)

haskell:
import Data.List

 enumerate = zip $ cycle [0,1]

 oddwords x = foldl' add "" (enumerate ws) ++ "."
     where add "" (_,t) = t
           add s (0, t) = s ++ " " ++ t
           add s (1, t) = s ++ " " ++ reverse t
           ws = words $ takeWhile (/= '.') x


So I added a Scala version that is a translation of the Haskell which I'm repeating here which copies the essence of the Haskell algorithm as well as the names used:
object OddWordProblem1 {
  
    // define cycle as per Haskell as Scala does not define it 
    def cycle[T](seq: Seq[T]) = Stream.continually(seq).flatten

    def enumerate[T](words: Seq[T]) = cycle(List(0,1)) zip words   
    
    def oddwords(input : String) = {
        
        def add(stringSoFar : String, wordAndOddIndicator : (Int,String)) = 
            (stringSoFar, wordAndOddIndicator) match {
                case ("", (_, t)) => t
                case (s, (0, t)) => s + " " + t
                case (s, (1, t)) => s + " " + t.reverse
            }
        
        def ws(words : Seq[String]) = words takeWhile ("." !=)
        
        // split the string into words,  first making sure that "kansas." always becomes "kansas ."
        val words =  input.replaceFirst("""\.""", " .").split(" +").toList
        
        enumerate(ws(words)).foldLeft("")(add) + "."
    }

    
    def main(args : Array[String]) = {
        println(oddwords("whats the matter with kansas."))
    }
}



The core logic is a bit longer than the Haskell version because
(a) Its doing IO(!) and handling input corner cases and is self contained and runnable.
(b) It defines cycle which is no big deal
(c) It defines method parameters and their types
(d) It has more brackets than haskell as haskell does not use brackets for function application. Not much we can do about that.

So, we have to define method parameters and their types, or do we? Now it could be argued that defining the parameters and types is good for readability, but it would be nice to have the choice. But of course with inline anonymous functions, the parameters and types are not required. So instead of defining the add method separately, we can write,

def oddwords(input : String) = {

  def ws(words : Seq[String]) = words takeWhile ("." !=)

  val words =  input.replaceFirst("""\.""", " .").split(" +").toList

  enumerate(ws(words)).foldLeft(""){
    (_, _) match {
      case ("", (_, t)) => t
      case (s, (0, t)) => s + " " + t
      case (s, (1, t)) => s + " " + t.reverse
    }
  } + "."
} 

or to name the parameters without declaring the types,

// ...
  enumerate(ws(words)).foldLeft(""){
    (stringSoFar, wordAndOddIndicator) => 
      (stringSoFar, wordAndOddIndicator) match {
        case ("", (_, t)) => t
        case (s, (0, t)) => s + " " + t
        case (s, (1, t)) => s + " " + t.reverse
      }
  } + "."


So that's the asymmetry: Scala named function definitions require parameter names and types, and the anonymous functions don't, because the parameter types can usually be inferred by usage. The Haskell "where" syntax is rather nice and a similar feature in Scala would be great...

An imaginary Scala syntax for haskell style context based method definition:

// ... 
enumerate(ws(words)).foldLeft("")(add)
where {
  def add = 
    (_, _) match {
       case ("", (_, t)) => t
       case (s, (0, t)) => s + " " + t
       case (s, (1, t)) => s + " " + t.reverse
    }
  }  
  // other defs
}

It would just be an aliased anonymous functions which could (a) be used more than once in the expression and (b) simplify the overall expression compared to usage with anonymous functions.

Just a thought!

2 comments:

Alex Boisvert said...

By the way, there's also the option of declaring add as a function literal,

val add = (_: String, _: (Int,String)) match {
case ("", (_, t)) => t
case (s, (0, t)) => s + " " + t
case (s, (1, t)) => s + " " + t.reverse
}

where only the types are specified.

I agree it would be nice to be able to declare "add" after and benefit from contextual type inference.

Daniel said...

By the way, you can just split with the following pattern, and avoid all that business with ".":

"\\s+|(?=[.,;:!?])"

It will isolate all of the above punctuation, but, as opposed to the spaces, it will not remove it.