## Friday, April 8, 2022

### Normalized Discounted Cumulative Gain in Scala

This post illustrates the Normalized Discounted Cumulative Gain (NDCG) and its implementation in Scala.
Numerous real-life applications of machine learning require the prediction the most relevant ranking of items to optimize an outcome. For instance
• Evaluate and prioritize counter-measures to cyber-attach
• Ranks symptoms in a clinical trial
• Extract documents relevant to a given topic from a corpus
The Discounted Cumulative Gain (DCG) and its normalized counter part, Normalized Discounted Cumulative Gain (NDCG) is a metric original applied in textual information retrieval and extended to other domains.
This post uses Scala 2.11.8

## Discounted Cumulative Gain

Let's dive into the mathematical formalism for the Discounted Cumulative Gain.

For a indexed target values tj as illustrated in the diagram above, the discounted cumulative gain is computed as
$DCG=\sum_{j=1}^{n}\frac{2^{t_{j}}-1}{log_{2}(j+1)}$ The objective is to compare any given list of ranked/sorted item with a benchmark which represent the optimum ranking (ranking with the highest DCG value).
$p(ranking|IDCG)=\frac{log(2)}{IDCG}\sum_{j=0}^{n}\frac{2^{t_{j}}-1}{log(j+1)}$

## Scala implementation

The implementation of the computation of NDCG in Scala is quite simple, indeed. Given a ranked list of items. The three steps are
• Compute the IDCG (or normalization factor)from the list
• Compute the DCG for the list
• Compute the NDCG = DCG/IDCF
First let's consider list of items, of type T to rank. The method ranking to sort a sample of sorted items is provided as an implicit function. The constructor for NDCG has a single argument: the sample of ranking:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 class NDCG[T]( firstSample: Seq[T]) (implicit ranking: T => Int) { import NDCG._ val iDCG: Double = normalize def score: Double = score(initialSample) private def dcg(samples: Seq[T]): Double = samples.zipWithIndex.aggregate(0.0)( (s, samplej) => s + compute(samplej._2 + 1, ranking(samplej._1)) , _ + _) private def normalize: Double = { val sorted = initialSample.zipWithIndex.sortBy{ case (sample, n) => -ranking(sample) }.map( _._1) dcg(sorted) } } 

The Ideal Discounted Cumulative Gain, iDCG is computed through the normalize method (line 6). iDCG (normalization factor) is computed by first sorting the items of type T by their value in decreasing order (line 16), then scoring this re-sorted list using the dcg method (line 17).
The computation of the Discounted Cumulative Gain by the method dcg (line 10) is a direct application of the formula described in the previous chapter.
Note: The logarithm function uses a base 2. It is computed as natural log(x)/natural log (2)

Let's now consider a list of items of type Item defined as follows:

case class Item(id: String, x: Double, rank: Int)


The list of items, itemsList is implicitly ranked through the last attribute, rank.

val itemsList = Seq[Item](
Item("1", 3.4, 4), Item("2", 1.4, 3),
Item("3", -0.7, 5), Item("4", 5.2, 2),
Item("5", 1.4, 1))

implicit val ranking = (item: Item) => item.rank


It is time to compute the NDCG coefficient for the list of items, by invoking the score method.

val nDCG = new NDCG[Item](itemsList)

println(s"IDCG = ${nDCG.iDCG}") //45.64 println(s"Score =${nDCG.score}")  // 0.801


The ideal discounted cumulative gain, iDCG is 45.6: It is the optimum ranking for this list of time. The first sample score a probability of 0.8

Note The DCG of subsequent samples can be computed using the same iDCG value from the same instance of NDCG.

def score(samples: Seq[T]): Double =
if( samples.size != initialSample.size) 0.0
else dcg(samples)/iDCG