Target audience: Beginner
Estimated reading time: 10'
The Scala programming language provides software developers with several options to iterate through the elements of a collection:
This post attempts to quantify the overhead of the most commonly used iterative methods in Scala and demonstrate the effectiveness of the higher order methods map and foldLeft
- for,while loops and foreach ( x => f(x)) higher order function.
- map[Y] { f: X => Y) : Collection[Y] that creates a new collection by applying a function f to each element of the collection
- foldLeft[Y] (y : Y)(f : (Y,X)=>Y) : Y) (resp. foldRight[Y] (y : Y)(f : (X,Y)=>Y) : Y) that applies a binary operator to each element of this collection starting left to right (resp. right to left)
This post attempts to quantify the overhead of the most commonly used iterative methods in Scala and demonstrate the effectiveness of the higher order methods map and foldLeft
Scala loops for summation
Data transformation
References
The test runs are executed on a 'plain vanilla' dual core i3 2.1 Ghz running Linux CentOs 6.0.
The first test consists of comparing compare the performance of the different options to traverse an array of Float with size varies from 2,000,000 to 40,000,000 elements then apply an operation += z to each of its members. The options are
foreach (line 6)
for loop (line 9)
while loop (lines 14 - 16)
foldLeft (line 19)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | val rGen = new scala.util.Random(System.currentTimeMillis) var data = Array.fill(size)(rGen.nextFloat) var sum = 0.0 // Higher order method data.foreach(sum += _) // for loop for( x <- data) sum += x // while loop var k = 0 val len = data.size while( k < len) { sum += data(k) k += 1 } // fold sum = data.foldLeft(0.0)((x, z) => x + z) |
The test is repeated 25 times in order to reduce variance and noise generated by the garbage collector. The first 5 iterations are discarded to avoid the overhead of the initialization of the JVM. The mean value of the execution time for each method is computed for different size of an array from 2,000,000 to 40,000,000 floating point values (type Float). The results of the test are plotted in the graph below. The unit of time on the Y-coordinate is milliseconds.
The for, while and foreach expression have very similar performance.
The foldLeft is significantly faster (ratio 1:6)
The for, while and foreach expression have very similar performance.
The foldLeft is significantly faster (ratio 1:6)
Data transformation
The second test consists of comparing the performance of
- foreach: "fills-up" iteratively a mutable array of type ArrayBuffer (line 3)
- foreach: creates and updates a copy of the original array (immutable approach)(lines 7 & 8)
- map: transform the original array into an array of square values (line 11)
1 2 3 4 5 6 7 8 9 10 11 | // foreach with mutable array buffer val newData = new mutable.ArrayBuffer[Float] data.foreach( (x: Float) => newData.append(x *x)) val result = newData.toArray // foreach with update of immmutable array val pData = Array.fill(sz)(0.0) data.foreach( pData.update(i, _) ) // map val nData = data.map((x:Float) => x*x) |
Let's run the same test (with the same setup defined in the previous section).
The test shows that the method dedicated to convert an array to other array by applying a natural transformation, map is by far the most efficient.
The methods dedicated to a specific task such as foldLeft for summation and map for data transformation are far more effective that the "plain vanilla" loop constructors. The tests are conducted with Scala 2.10.2
Important Notes:
The syntax or construct for has a very different meaning in Scsla as in C or Java. It is actually a wrapper or syntactic sugar layer around the monadic chain of flatMap and map transformation as follows
The methods dedicated to a specific task such as foldLeft for summation and map for data transformation are far more effective that the "plain vanilla" loop constructors. The tests are conducted with Scala 2.10.2
Important Notes:
The syntax or construct for has a very different meaning in Scsla as in C or Java. It is actually a wrapper or syntactic sugar layer around the monadic chain of flatMap and map transformation as follows
for (
a <- f(x) // flatMap
b <- g(a) // flatMap
c <- h(b) // map
) yield { }
A more elaborate and time consuming benchmark would consist of running multiple tests using several boolean (< !=..) and numeric (+, *, x => sin(x) ..) operators and computes the normalize mean and variance.References
- Scala for the Impatient - C. Horstman - Addison-Wesley 2012
- github.com/prnicolas
your "for-loop" is a very bad example, it is no normal C loop. If you realy wanted to compare C loops against scala iterators you would have to use a while.
ReplyDeletelike this:
var i = 0
while(i < data.length()){ ...; i += 1}
I have no idea what this article is supposed to prove.
ReplyDeleteFirst of all, running a test only 10 times is not enough to guarantee that JIT has warmed up. But I'm going to let it pass. It's an understandable mistake.
In the first test, the foreach loop does absolutely nothing. data.foreach( x => x + z) == () and nothing is modified, map creates a copy of the array, and for modifies the array in place. And finally, in the second test, foreach again does nothing, map does something barely sensible (creates a new list in a very contrived way) and foldLeft... returns the last element of the list.
Of course the results are gonna vary, each of the tested methods in both cases does absolutely different things. It's not even comparing apples to oranges, it's comparing Apple Inc. to oranges.
The purpose of the article was to quantify
ReplyDelete- Overhead of the for loop in Scala (while loop as to be used instead a for comprehensive)
- Cost of some Scala iterators (without any operation)
I take it the objective of the post is unclear...
here are results for Scala 2.11:
ReplyDeletehttps://github.com/maciej/iterators-benchmark
I really baffles me that while performs so badly. How could .sum possibly be faster, except if it uses parallellism, which is shouldn't by default.
ReplyDeleteBy the way, did you compile the program using the scalac -optimize? You should have.
Great Article
ReplyDeleteArtificial Intelligence Projects
Project Center in Chennai
JavaScript Training in Chennai
JavaScript Training in Chennai
air jordan
ReplyDeletekobe sneakers
kyrie 6
pg 1
jordan 6
yeezy boost 350 v2
curry 6 shoes
golden goose sneakers
yeezy boost 700
bape