Saturday, September 28, 2013

Performance of Tail Recursion in Scala

This post evaluates the performance of Scala tail recursion comparatively to iterative methods

Overview
In Scala, the tail recursion is a commonly used technique to apply a transformation to the elements of a collection. The purpose of this post is to evaluate the performance degradation of the tail recursion comparatively to iterative based methods.

Note: For the sake of readability of the implementation of algorithms, all non-essential code such as error checking, comments, exception, validation of class and method arguments, scoping qualifiers or import is omitted

Test Benchmark
let's consider a "recursive" data transformation on an array using a sliding window. For the sake of simplicity, we create a simple polynomial transform on a array of values
  {X0, ... ,Xn, ... Xp}
with a window w, defined as
   f(Xn) = (n-1)Xn-1 + (n-2)Xn-2 + ... + (n-w)Xn-w
Such algorithms are widely used in signal processing and technical analysis of financial markets (i.e. moving average, filters).

def polynomial(values: Array[Int]): Int = 
  (if(values.size < W_SIZE) 
     values 
  else 
     values.takeRight(W_SIZE)
  ).sum

The first implementation of the polynomial transform is a tail recursion on each element Xn of the array. The transform f compute f (values(cursor) ) from the array values[0, ... , cursor-1] as describe in the code snippet below

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
class Evaluation(values: Array[Int]) {
  def recurse(f: Array[Int] => Int): Array[Int] = {

    @scala.annotation.tailrec
    def recurse(
      f: Array[Int] => Int, 
      cursor: Int, 
      results: Array[Int]): Boolean = {  
        
      if( cursor >= values.size) // exit condition
        true
      else {
        val arr = f(values.slice(cursor+1, cursor-W_SIZE))
        results.update(cursor, arr)
        recurse(f, cursor+1, results)
      }
    }

    val results = new Array[Int](values.size)
    recurse(f, 0, results)
    results
  }
}

The second implementation relies on the scanLeft method that return a cumulative of transformed value f(Xn).

def scan(f: Array[Int] => Int): Array[Int] = 
  values.zipWithIndex.scanLeft(0)((sum, vn) => 
     f(values.slice(vn._2+1, vn._2-W_SIZE))
  )

Finally, we implement the polynomial transform on the sliding array window with a map method.

def map(f: Array[Int] => Int): Array[Int] = 
   values.zipWithIndex.map(vn => 
     f(values.slice(vn._2+1, vn._2-W_SIZE))
   )

Results
For the test, each of those 3 methods is executed 1000 on a dual core i7 with 8 Gbyte RAM and MacOS X Mountain Lion 10.8. The first test consists of executing the 3 methods and varying the size of the array from 10 to 90. The test is repeated 5 times and the duration is measured in milliseconds.



The tail recursion is significantly faster than the two other methods. The scan methods (scan, scanLeft, scanRight) have significant overhead that cannot be "amortized" over a small array. It is worth noticing that the performance of map and scan are similar. The relative performance of those 3 methods is confirmed while testing with large size array (from 1,000,000 to 9,000,000 items).




References
Programming in Scala M. Odersky, L. Spoon, B. Venners - Artima 2010
Scala for the Impatient - C. Horstman - Addison-Wesley - 2012
Scala for Machine Learning - Patrick Nicolas - Packt Publishing

Tuesday, September 10, 2013

Variance Annotations in Scala

The purpose of this post, is to introduce the basics of parameterized classes, upper and lower bounds and variance subtyping in Scala.

Overview
Scala is a statically typed languages similar to Java and C++. Most of those concepts have been already introduced and commonly in Java with generics and type parameters wildcard. However Scala added some useful innovation regarding sub-typing covariance and contravariance. API designers and library developers are facing the dilemma of choosing between parameterized types and abstract types and which parameters should be invariant, covariant or contravariant.


Sub-classing
Like in Java, generic or parameterized type have been introduced to overcome the limitation of polymorphism. Moreover, understanding of concept behind the Scala type system help developers decipher and fix the occasional compilation failure related to typing.
Let's consider a simple collection class ,Container that adds and retrieves an array of items of type Item as follows.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class Item
case class SubItem() extends Item
 
class Container {
  def add(items: Array[Item]) : Unit = 
    items.foreach( add(_) )
  def retrieve(items: Array[Item]) : Unit =
    Range(0, items.size).foreach(items.update(_, retrieve))

  def add(item: Item) : Unit = { }
  def retrieve : Item = new SubItem
}

The client code can invoke the add (lines 5 & 10) and retrieve (lines 7 & 11) methods passing an array of elements of type Item. However passing an array of type SubItem inherited from Item (line 2) will generate a compiling error!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
val subItemsContainer = new Container
val array = Array[SubItem](new SubItem())
 
  // Compile error:
  // "type mismatch;  found   : Array[SubItem]  
  // required: Array[Item] 
  // Note: SubItem <: Item, but class Array is 
  // invariant in type T..."
 subItemsContainer.add(array)
 subItemsContainer.retrieve(Array[SubItem](new SubItem))

The Scala compiler throws a compiling error because the class Container does not have the adequate information about the sub type Item into SubItem in the add method (line 9). The invocation of the retrieve method, passing an argument of type SubItem generates also a compiler error.
Scala provides developers with upper bounds and lower bounds subtyping which is quite similar to the type variance available in Java.

Variance to the Rescue
The type T upper bounded by the type Item as defined by the following notation (line 1)
  T <: Item
can be processed as a 'producer' of type Item. The notation provides the Scala compiler that any type inheriting Item should be allowed to be passed as argument of the add method (lines 2 & 7).
Similarly, he type U with a lower bound type SubItem
   U >: SubItem
can be processed as a 'consumer' operation of type SubItem, in the retrieve method (lines 4 & 8).

1
2
3
4
5
6
7
8
9
class Container[T <: Item] {
  def add(items: Array[T]) : Unit = items.foreach( add(_) )
     
  def retrieve[U >: SubItem](items: Array[U]) : Unit = 
     Range(0, items.size)foreach(items.update(_, retrieve))
  
  def add(item: T): Unit = {}
  def retrieve: U = new SubItem
}

Container[T] is a covariant class for the type Item as argument of the method add. The method retrieve[U] is contravariant for the type SubItem.
This time around the compiler has the all the necessary information to subtype Item and super type SubItem, therefore the following code snippet won't generate a compiling error

val itemsContainer = new ContainerT[SubItem]
itemsContainer.add(Array[SubItem](new SubItem)))
itemsContainer.retrieve[Item](Array[Item](new SubItem))

Scala has a dedicated annotation for unvariance 'T', covariance '+T' and contravariance '-T'. Using this annotation the container class can be expressed as follow.

1
2
3
4
5
6
7
8
9
class Container[+T] {
  def add(items: Array[T]) : Unit = 
    items.foreach( add(_) )
  def retrieve[-U](items: Array[U]) : Unit = 
    Range(0, items.size).foreach(items.update(_, retrieve))
  
  def add(item: T) : Unit = { }
  def retrieve : U = new SubItem
}

In a nutshell, variances can be represented as functors:
Covariance: if (B inherit A) then (Container[B] inherit Container[A])
Contravariance: if( B inherit A) then (Container[A] super class of Container[B])
Univariance: No rule applies.
 

The subtyping rules above can be represented graphically


References
Programming in Scala M. Odersky, L. Spoon, B. Venners - Artima 2010
Effective Java - 2nd edition - J. Block Addison-Wesley 2008
github.com/prnicolas