Friday, August 30, 2013

Breakable loops in Scala

Contrary to C++ and Java, Scala does not allow developer to exit an iterative execution prematurely using a syntax equivalent to break.

var sum = 0

for( i <- 0 until 100) {
  sum += i
  if( sum > 400) 
    break   // won't compile!                                  

Scala purists tend to stay away from this type of constructs and use higher order collection methods such as
   exists( p: (T) => Boolean)
   find( p: (T) => Boolean)
   takeWhile( p: (T) => Boolean)
However these methods are not available outside collections. There are cases where a `break` construct may be a simpler solution.
This post review the different options to 'break' from a loop according to a predefined condition.

Breakable control
Although breaking out of a loop may not be appealing to "pure" functional developer, the Scala language provides former Java and C++ developer with the option to use break and continue statements. The breakable statement define the scope for which break is valid.

import scala.util.control.Breaks._                            

var sum = 0
breakable { 
  for (i <- 0 until 100 ) {
    sum += i
    if( sum > 55) 

Any experienced developer would be quite reluctant to use such an idiom: beside the fact that the accumulator is defined as a variable, the implementation is unnecessary convoluted adding a wrapper around the loop. Let's try to find a more functional like approach to breaking out of any loop

scan & fold to the rescue
Luckily, Scala provides collections with functional traversal patterns that can be used and chained to break, elegantly from a loop. The following code snippets introduce applies some of those traversal patterns to an array and a associative map to illustrate the overall "functional" approach to iteration.
Let's consider the problem of extracting the elements of an array or map before a predefined condition is met for an element. For instance, let's extract the elements of an array or a hash map until one element has the value 0. The Scala programming language provide us with the takeWhile method (lines 7 & 10) that allows to to end traversing a collection and return a elements of the collection visited when a condition is met.

val values = Array[Int](9, 45, 11, 0, 67, 33)

val mappedValues = Map[String, Int](
  "A"->2, "B"->45, "C"->67, "D"->0, "E"->56
 // Extract a subarray of values until the condition is met
values takeWhile( _ != 0) // -> Array(9, 45, 11)
  // Extract a submap until the condition is met
mappedValues takeWhile( _._2 != 0) 
  // -> Map(E -> 56, A -> 2, B -> 45, C -> 67)

The second case consists in accumulating values until a condition on the accumulator is met. Contrary to fold and reduce methods which apply a function (i.e summation) to all the elements of a collection to return a single value, scan, scanLeft and scanRight methods return a collection of values processed by the function.

In the following example, the invocation of the higher order method scanLeft (lines 4 generates an array and a hash map with the cumulative values. The takeWhile method i(lines 5 & 8) s then applied to the resulting array or map to return a collection of cumulative values until the accumulator exceeds 56. The same invocation can be used to return the element which push the accumulator beyond the threshold.

values.scanLeft(0)(_ + _).takeWhile (_ < 56) 
   //-> Array(0, 9, 54)
mappedValues.scanLeft(0) (_ + _._2)
   .takeWhile( _ < 56)
val indexTargetEl = values.scanLeft(0)(_ + _)
      .takeWhile (_ < 56).size -1            
val targetEl = values(indexTargetEl)

Tail recursion
A third and elegant alternative to exit from a loop is using the tail recursion which is supported natively in the Scala language. The first method, findValue exits the loop when a condition on the value is reached (line 5). The second method, findCummulative, is implemented as a closure and exits when the sum of elements exceeds a predefined value (line 15).

val values = Array[Int](9, 45, 11, 0, 67, 33)
def findValue(values: Array[Int], index: Int) : Int = {
   if( values(index) == 0 || index >= values.size)
     findValue(values, index + 1)
val newValues = values.slice(0, findValue(values, 0))   

val MAX_VALUE :Int = 86

def findCumulative(sum: Int, index: Int): Int = {
   if( index >= values.size ||sum >= MAX_VALUE)
     index -1
     findCumulative(sum + values(index), index + 1)

val newValues2 = values.slice(0, findCumulative(0, 0))

This concludes our review of three different Scala constructs available to developer to exit prematurely from a loop
  • breakable construct
  • Higher order method such as exists, find, takeWhile....
  • tail recursion

Programming in Scala M. Odersky, L. Spoon, B. Venners - Artima 2010

Sunday, August 18, 2013

Symbolic Regression for Data Modeling

Symbolic Regression allows domain experts to create, add modify rules or policies extracted from data. The most commonly used algorithms used in Symbolic Regression are:
- Genetic Algorithms
- Learning Classifiers Systems

Symbolic regression is used in many application ranging from network performance optimization, predicting failure (MTBF), streaming data to detecting security breaches.

The following presentation describes the main components, benefits and drawbacks of symbolic regression.

Genetic Programming: on the Programming of Computers by Means of Natural Selection - J. Koza - MIT Press 1992
Reinforcement Learning: An introduction (Adaptive Computation and Machine learning) - R. Sutton, A. Barto - MIT Press 1998

Friday, August 2, 2013

Analyzing Scala performance using javap

As mentioned in a previous post, iterators in Scala, such as foreach, for & while loop have different performance characteristics. A recurrent question between developers is the relative performance of Scala and Java regarding for and while.

The "for comprehension" closure in Scala is indeed a syntactic sugar wraps around a sequence of flatMap and map methods. Monad in Practice. It is expected to be significantly slower and should not be used as an iterator.
We select the higher method foreach to implement the for iterator in Scala.

Note: For the sake of readability of the implementation of algorithms, all non-essential code such as error checking, comments, exception, validation of class and method arguments, scoping qualifiers or import is omitted

Evaluation while
The simple evaluation consists of processing a large number of iterations of a very simple statement which execution does not interfere with the actual performance of the iterator. The code is also is easy to understand. The code for the while loop is described below.

def testRun(numIterations: Int) : Unit = {
  val startTime = System.currentTimeMillis
  var sum: Long = 0
  var i: Int = 0

  while(i < numIterations) {
    var j = 0
    while(j  < 1000) {
      sum += 1
      j += 1
     i += 1

The test is executed on a 4 cores Intel i7 CPU with java 1.7.02 64-bit and Scala 2.10.2. The chart below, compares the duration of the execution of the for iterator for Java and Scala.

The performance of Scala and Java for executing the while loop are very similar.

Evaluation for
The second test consists of comparing the relative performance of Java for loop and Scala foreach higher order method.

def testRun(numIterations: Int) : Unit = {
  var sum: Long = 0
  (0 until numIterations.foreach( sum += _)

Analysis using Javap
The first step is to analyze the number of byte-code instructions generated by the Scala compiler. We use the Java Class File Disassembler,  javap, to print out the actual instructions processed by the Java virtual machine.
javap -c -verbose xxx.class
The sequence of instructions generated for the execution of the while loops are displayed below.

0:   lconst_0
1:   lstore_1
2:   iconst_0
3:   istore_3
4:   iload_3
5:   sipush  1000
8:   if_icmpge       42
11:  iconst_0
12:  istore  4
14:  iload   4
16:  sipush  1000
19:  if_icmpge       35
22:  lload_1
23:  lconst_1
24:  ladd
25:  lstore_1
26:  iload   4
28:  iconst_1
29:  iadd
30:  istore  4
32:  goto    14
35:  iload_3
36:  iconst_1
37:  iadd
38:  istore_3
39:  goto    4
42:  return

Dissambling the Java class for the code with the for iterators produces the following print out:

0:   new     #12; //class scala/runtime/LongRef
3:   dup
4:   lconst_0
5:   invokespecial   #16; //Method scala/runtime/LongRef."<init>":(J)V
8:   astore_1
9:   getstatic       #22; //Field scala/runtime/RichInt$.MODULE$:Lscala/runtime/RichInt$;
12:  getstatic       #27; //Field scala/Predef$.MODULE$:Lscala/Predef$;
15:  iconst_0
16:  invokevirtual   #31; //Method scala/Predef$.intWrapper:(I)I
19:  sipush  1000
22:  invokevirtual   #35; //Method scala/runtime/RichInt$.until$extension0:
                            (II Lscala/collection/immutable/Range;
25:  new             #37; //class JavaScala$$anonfun$testRun$1
28:  dup
29:  aload_0
30:  aload_1
31:  invokespecial   #40; //Method JavaScala$$anonfun$testRun$1."<init>
34:  invokevirtual   #46; //Method scala/collection/immutable/Range.foreach$mVc$sp:
37:  return

Although the number of instructions for the for loop is smaller than the number of instructions for while, most of those instructions are function calls:
- conversion of counter to long
- static conversion of int to RichInt to wrap Java Integer:
@inline implicit def intWrapper(x: Int)= new runtime.RichInt(x)
- ultimately the foreach method is invoked to execute the loop.

Interestingly enough, the foreach method for collection is implemented using the while loop not the for loop. Scala compiler plug-in such as ScalaCL to optimize the execution of iteration on Scala collections, Arrays, Lists,... have been introduced to get around this issue. The reader can also take comfort in using the Java Class File Dissambler to understand how a method or piece of code translate into efficient, or in our case, inefficient byte-code. Quite a few methods in Scala such as foldLeft, reduceLeft uses tail recursion to traverse collections. It would be interesting to compare the relative performance of those methods with alternatives using iterators.. stay tune.

Loop Performance and Local Variables in Scala