## Friday, August 2, 2013

### Analyzing Scala performance using javap

Overview
As mentioned in a previous post, iterators in Scala, such as foreach, for & while loop have different performance characteristics. A recurrent question between developers is the relative performance of Scala and Java regarding for and while.

The "for comprehension" closure in Scala is indeed a syntactic sugar wraps around a sequence of flatMap and map methods. Monad in Practice. It is expected to be significantly slower and should not be used as an iterator.
We select the higher method foreach to implement the for iterator in Scala.

Note: For the sake of readability of the implementation of algorithms, all non-essential code such as error checking, comments, exception, validation of class and method arguments, scoping qualifiers or import is omitted

Evaluation while
The simple evaluation consists of processing a large number of iterations of a very simple statement which execution does not interfere with the actual performance of the iterator. The code is also is easy to understand. The code for the while loop is described below.

def testRun(numIterations: Int) : Unit = {
val startTime = System.currentTimeMillis
var sum: Long = 0
var i: Int = 0

while(i < numIterations) {
var j = 0
while(j  < 1000) {
sum += 1
j += 1
}
i += 1
}
Console.printf(s"${(System.currentTimeMillis-startTime)}") }  The test is executed on a 4 cores Intel i7 CPU with java 1.7.02 64-bit and Scala 2.10.2. The chart below, compares the duration of the execution of the for iterator for Java and Scala. The performance of Scala and Java for executing the while loop are very similar. Evaluation for The second test consists of comparing the relative performance of Java for loop and Scala foreach higher order method. def testRun(numIterations: Int) : Unit = { var sum: Long = 0 (0 until numIterations.foreach( sum += _) }  Analysis using Javap The first step is to analyze the number of byte-code instructions generated by the Scala compiler. We use the Java Class File Disassembler, javap, to print out the actual instructions processed by the Java virtual machine. javap -c -verbose xxx.class The sequence of instructions generated for the execution of the while loops are displayed below. 0: lconst_0 1: lstore_1 2: iconst_0 3: istore_3 4: iload_3 5: sipush 1000 8: if_icmpge 42 11: iconst_0 12: istore 4 14: iload 4 16: sipush 1000 19: if_icmpge 35 22: lload_1 23: lconst_1 24: ladd 25: lstore_1 26: iload 4 28: iconst_1 29: iadd 30: istore 4 32: goto 14 35: iload_3 36: iconst_1 37: iadd 38: istore_3 39: goto 4 42: return  Dissambling the Java class for the code with the for iterators produces the following print out: 0: new #12; //class scala/runtime/LongRef 3: dup 4: lconst_0 5: invokespecial #16; //Method scala/runtime/LongRef."<init>":(J)V 8: astore_1 9: getstatic #22; //Field scala/runtime/RichInt$.MODULE$:Lscala/runtime/RichInt$;
12:  getstatic       #27; //Field scala/Predef$.MODULE$:Lscala/Predef$; 15: iconst_0 16: invokevirtual #31; //Method scala/Predef$.intWrapper:(I)I
19:  sipush  1000
22:  invokevirtual   #35; //Method scala/runtime/RichInt$.until$extension0:
(II Lscala/collection/immutable/Range;
25:  new             #37; //class JavaScala$$anonfuntestRun1 28: dup 29: aload_0 30: aload_1 31: invokespecial #40; //Method JavaScala$$anonfun$testRun$1."<init>
(LJavaScala;Lscala/runtime/LongRef;)V
34:  invokevirtual   #46; //Method scala/collection/immutable/Range.foreach$mVc$sp:
(Lscala/Function1;)V
37:  return


Although the number of instructions for the for loop is smaller than the number of instructions for while, most of those instructions are function calls:
- conversion of counter to long
- static conversion of int to RichInt to wrap Java Integer:
@inline implicit def intWrapper(x: Int)= new runtime.RichInt(x)
- ultimately the foreach method is invoked to execute the loop.

Interestingly enough, the foreach method for collection is implemented using the while loop not the for loop. Scala compiler plug-in such as ScalaCL to optimize the execution of iteration on Scala collections, Arrays, Lists,... have been introduced to get around this issue. The reader can also take comfort in using the Java Class File Dissambler to understand how a method or piece of code translate into efficient, or in our case, inefficient byte-code. Quite a few methods in Scala such as foldLeft, reduceLeft uses tail recursion to traverse collections. It would be interesting to compare the relative performance of those methods with alternatives using iterators.. stay tune.

References
Loop Performance and Local Variables in Scala
github.com/prnicolas