Description and implementation of the computation of the minimum spanning tree using Prim's algorithm and Scala tail recursion.

**Overview**

Finding the optimum arrangement to connect nodes is a common problem in Network design, transportation projects or electrical wiring. Each connectivity is usually defined as a weight (cost, length, time...). The purpose is to compute the schema that connects all the nodes that minimize the total weight. This problem is known as the

Several algorithms have been developed over the last 70 years to extract the MST from a graph. This post focuses on the implementation of the

**minimum spanning tree**or**MST**related to the nodes connected through an un-directed graph.Several algorithms have been developed over the last 70 years to extract the MST from a graph. This post focuses on the implementation of the

**Prim**algorithm in Scala.**Prim's algorithm**

There are many excellent tutorial on graph algorithm and more specifically on the Prim's algorithm. I recommend
Lecture 7: Minimum Spanning Trees and Prim’s Algorithm Dekai Wu, Department of Computer Science and Engineering -
The Hong Kong University of Science & Technology

Let's PQ is a priority queue, a Graph G(V, E) with n vertices V and E edges w(u,v). A Vertex

Let's PQ is a priority queue, a Graph G(V, E) with n vertices V and E edges w(u,v). A Vertex

*v*is defined by- An identifier
- A load factor,
*load(v)* - A parent
*tree(v)* - The adjacent vertices
*adj(v)*

PQ <- V(G)The Scala implementation relies on a tail recursion to transfer vertices from the priority queue to the spanning treeforeachu in PQ load(u) <- INFINITYwhilePQ nonEmptydou <- v in adj(u)ifv in PQ && load(v) < w(u,v)thentree(v) <- u load(v) <- w(u,v)

**Scala implementation: graph definition**

The first step is to define a graph structure with edges and vertices.
The graph takes two arguments:

The vertices are initialized by with a unique identifier

In most case, the identifier is a characters string or a data structure. As described in the pseudo-code, the load for the root of the spanning tree is defined a 0.

The

The

The graph is un-directed therefore the connection initialized in the method

*numVertices*number of vertices*start*index of the root of the minimum spanning tree

*id*identifier (arbitrary an integer)*load*dynamic load (or key) on the vertex*tree*reference to the minimum spanning tree

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | final class Graph(numVertices: Int, start: Int = 0) { class Vertex(val id: Int, var load: Int = Int.MaxValue, var tree: Int = -1) val vertices = List.tabulate(numVertices)(new Vertex(_)) vertices.head.load = 0 val edges = new HashMap[Vertex, HashMap[Vertex, Int]] def += (from: Int, to: Int, weight: Int): Unit = { val fromV = vertices(from) val toV = vertices(to) connect(fromV, toV, weight) connect(toV, fromV, weight) } def connect(from: Vertex, to: Vertex, weight: Int): Unit = { if( !edges.contains(from)) edges.put(from, new HashMap[Vertex, Int]) edges.get(from).get.put(to, weight) } // ... } |

The vertices are initialized by with a unique identifier

*id*, and a default load*Int.MaxValue*and a default depth*tree*(lines 3-5). The*Vertex*class resides within the scope of the outer class*Graph*to avoid naming conflict. The*vertices*are managed through a linked list (line 7) while the*edges*are defined as hash maps with a map of other edges as value (line 9). The operator += add a new edges between two existing vertices with a specified load (line 11)In most case, the identifier is a characters string or a data structure. As described in the pseudo-code, the load for the root of the spanning tree is defined a 0.

The

*load*is defined as an integer for performance's sake. It is recommended to convert (quantization) a floating point value to an integer for the processing of very large graph, then convert back to a original format on the resulting minimum spanning tree.The

*edges*are defined as hash table with the source vertex as key and the hash table of destination vertex and edge weight as value.The graph is un-directed therefore the connection initialized in the method

*+=*are bi-directional.**Scala implementation: priority queue**

The priority queue is used to reordered the vertices and select the next vertex to be added to the spanning tree.

The implementation of the priority has a impact on the time complexity of the algorithm. The following implementation of the priority queue is provided only to illustrate the Prim's algorithm.

The Scala

We use the

__Note__: There are many different implementation of priority queues in Scala and Java. You need to keep in mind that the Prim's algorithm requires the queue to be reordered after its load is updated (see pseudo-code). The*PriorityQueue*classes in the Scala and Java libraries do not allow elements to be removed or to be explicitly re-ordered. An alternative is to used a binary tree, red-black tree for which elements can be removed and the tree re-balanced.The implementation of the priority has a impact on the time complexity of the algorithm. The following implementation of the priority queue is provided only to illustrate the Prim's algorithm.

1 2 3 4 5 6 7 8 9 10 11 12 | class PQueue(vertices: List[Vertex]) { var queue = vertices./:(new PriorityQueue[Vertex])((pq, v) => pq += v) def += (vertex: Vertex): Unit = queue += vertex def pop: Vertex = queue.dequeue def sort: Unit = {} def push(vertex: Vertex): Unit = queue.enqueue(vertex) def nonEmpty: Boolean = queue.nonEmpty } type MST = ListBuffer[Int] implicit def orderingByLoad[T <: Vertex]: Ordering[T] = Ordering.by( - _.load) |

The Scala

*PriorityQueue*class required the implicit ordering of vertices using their*load*(line 2). This accomplished by defining an implicit conversion of a type*T*with upper-bound type*Vertex*to*Ordering[T]*(line 12).__Note__: The type*T*has to be a sub-class of Vertex. A direct conversion from*Vertex*type to*Ordering[Vertex]*is not allowed in Scala.We use the

*PriorityQueue*from the Java library as it provides more flexibility than the Scala*TreeSet*.**Scala implementation: Prim**

This implementation is the direct translation of the pseudo-code presented in the second paragraph. It relies on the efficient Scala tail recursion (line 5)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | def prim: List[Int] = { val queue = new PQueue(vertices) @scala.annotation.tailrec def prim(parents: MST): Unit = { if( queue.nonEmpty ) { val head = queue.pop val candidates = edges.get(head).get .filter{ case(vt,w) => vt.tree == -1 && w <= vt.load } if( candidates.nonEmpty ) { candidates.foreach {case (vt, w) => vt.load = w } queue.sort } parents.append(head.id) head.tree = 1 prim(parents) } } val parents = new MST prim(parents) parents.toList } |

As long as the priority queue is not empty (line 6), the next element is the priority queue is retrieved (line 7) for which is select the most appropriate candidate for the next vertex (line 8 - 11). The load of each candidates is updated (line 14) and the priority queue is re-sorted (line 15).

__Note__: Although a tree set is more efficient to manage the vertices waiting to be weighted, it does not allow resorted the existing priority queue (immutability).**Time complexity**

As mentioned earlier, the time complexity depends on the implementation of the priority queue. If E is the number of edges, and V the number of vertices:

Minimum spanning tree with linear queue:

Minimum spanning tree with binary heap:

Minimum spanning tree with Fibonacci heap:

Minimum spanning tree with linear queue:

**V**^{2}Minimum spanning tree with binary heap:

**(E + V).LogV**Minimum spanning tree with Fibonacci heap:

**V.LogV**__Note__: See Summary of time complexity of algorithms for details.**References**

*Introduction to Algorithms*Chapter 24 Minimum Spanning Trees - T. Cormen, C. Leiserson, R. Rivest - MIT Press 1989- Lecture 7: Minimum Spanning Trees and Prim’s Algorithm Dekai Wu, Department of Computer Science and Engineering - The Hong Kong University of Science & Technology
*Graph Theory*Chapter 4 Optimization Involving Tree - V.K. Balakrishnan - Schaum's Outlines Series, McGraw Hill, 1997

Really appreciated the information and please keep sharing, I would like to share some information regarding online training.Maxmunus Solutions is providing the best quality of this Apache Spark and Scala programming language. and the training will be online and very convenient for the learner.This course gives you the knowledge you need to achieve success.

ReplyDeleteFor Joining online training batches please feel free to call or email us.

Email : minati@maxmunus.com

Contact No.-+91-9066638196/91-9738075708

website:-www.maxmunus.com