Target audience: Advanced

Estimated reading time: 10'

This post illustrates the Normalized Discounted Cumulative Gain (NDCG) and its implementation in Scala.

Numerous real-life applications of machine learning require the prediction the most relevant ranking of items to optimize an outcome. For instance

- Evaluate and prioritize counter-measures to cyber-attach
- Ranks symptoms in a clinical trial
- Extract documents relevant to a given topic from a corpus

This post uses Scala 2.11.8

## Discounted Cumulative Gain

Let's dive into the mathematical formalism for the Discounted Cumulative Gain.For a indexed target values t

_{j}as illustrated in the diagram above, the discounted cumulative gain is computed as

\[DCG=\sum_{j=1}^{n}\frac{2^{t_{j}}-1}{log_{2}(j+1)}\] The objective is to compare any given list of ranked/sorted item with a benchmark which represent the optimum ranking (ranking with the highest DCG value).

\[p(ranking|IDCG)=\frac{log(2)}{IDCG}\sum_{j=0}^{n}\frac{2^{t_{j}}-1}{log(j+1)}\]

## Scala implementation

The implementation of the computation of NDCG in Scala is quite simple, indeed. Given a ranked list of items. The three steps are

- Compute the IDCG (or normalization factor)from the list
- Compute the DCG for the list
- Compute the NDCG = DCG/IDCF

*NDCG*has a single argument: the sample of ranking:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | class NDCG[T]( firstSample: Seq[T]) (implicit ranking: T => Int) { import NDCG._ val iDCG: Double = normalize def score: Double = score(initialSample) private def dcg(samples: Seq[T]): Double = samples.zipWithIndex.aggregate(0.0)( (s, samplej) => s + compute(samplej._2 + 1, ranking(samplej._1)) , _ + _) private def normalize: Double = { val sorted = initialSample.zipWithIndex.sortBy{ case (sample, n) => -ranking(sample) }.map( _._1) dcg(sorted) } } |

The Ideal Discounted Cumulative Gain,

*iDCG*is computed through the normalize method (line 6). iDCG (normalization factor) is computed by first sorting the items of type T by their value in decreasing order (line 16), then scoring this*re-*sorted list using the dcg method (line 17).The computation of the Discounted Cumulative Gain by the method dcg (line 10) is a direct application of the formula described in the previous chapter.

__Note__

__:__The logarithm function uses a base 2. It is computed as natural log(x)/natural log (2)

Let's now consider a list of items of type Item defined as follows:

```
case class Item(id: String, x: Double, rank: Int)
```

The list of items, itemsList is implicitly ranked through the last attribute, rank.

val itemsList = Seq[Item]( Item("1", 3.4, 4), Item("2", 1.4, 3), Item("3", -0.7, 5), Item("4", 5.2, 2), Item("5", 1.4, 1)) implicit val ranking = (item: Item) => item.rank

It is time to compute the NDCG coefficient for the list of items, by invoking the score method.

val nDCG = new NDCG[Item](itemsList) println(s"IDCG = ${nDCG.iDCG}") //45.64 println(s"Score = ${nDCG.score}") // 0.801

The ideal discounted cumulative gain,

*iDCG*is 45.6: It is the optimum ranking for this list of time. The first sample score a probability of 0.8

__Note__The DCG of subsequent samples can be computed using the same iDCG value from the same instance of NDCG.

def score(samples: Seq[T]): Double = if( samples.size != initialSample.size) 0.0 else dcg(samples)/iDCG