This post illustrates the Normalized Discounted Cumulative Gain (

Numerous real-life applications of machine learning require the prediction the most relevant ranking of items to optimize an outcome. For instance

**NDCG**) and it implementation in Scala.Numerous real-life applications of machine learning require the prediction the most relevant ranking of items to optimize an outcome. For instance

- Evaluate and prioritize counter-measures to cyber-attach
- Ranks symptoms in a clinical trial
- Extract documents relevant to a given topic from a corpus

**Discounted Cumulative Gain**

Let's dive into the mathematical formalism for the Discounted Cumulative Gain.

For a indexed target values t

_{j}as illustrated in the diagram above, the discounted cumulative gain is computed as

\[DCG=\sum_{j=1}^{n}\frac{2^{t_{j}}-1}{log_{2}(j+1)}\] The objective is to compare any given list of ranked/sorted item with a benchmark which represent the optimum ranking (ranking with the highest DCG value).

\[p(ranking|IDCG)=\frac{log(2)}{IDCG}\sum_{j=0}^{n}\frac{2^{t_{j}}-1}{log(j+1)}\]

**Scala implementation**

The implementation of the computation of NDCG in Scala is quite simple, indeed. Given a ranked list of items. The three steps are

The Ideal Discounted Cumulative Gain,

The computation of the Discounted Cumulative Gain by the method

Let's now consider a list of items of type

The list of items,

It is time to compute the NDCG coefficient for the list of items, by invoking the

The ideal discounted cumulative gain,

The DCG of subsequent samples can be computed using the same iDCG value from the same instance of

- Compute the IDCG (or normalization factor)from the list
- Compute the DCG for the list
- Compute the NDCG = DCG/IDCF

*T*to rank. The method*ranking*to sort a sample of sorted items is provided as an implicit function. The constructor for*NDCG*has a single argument: the sample of ranking:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | class NDCG[T]( firstSample: Seq[T]) (implicit ranking: T => Int) { import NDCG._ val iDCG: Double = normalize def score: Double = score(initialSample) private def dcg(samples: Seq[T]): Double = samples.zipWithIndex.aggregate(0.0)( (s, samplej) => s + compute(samplej._2 + 1, ranking(samplej._1)) , _ + _) private def normalize: Double = { val sorted = initialSample.zipWithIndex.sortBy{ case (sample, n) => -ranking(sample) }.map( _._1) dcg(sorted) } } |

The Ideal Discounted Cumulative Gain,

*iDCG*is compute through the*normalize*method (line 6). iDCG (normalization factor) is computed by first sorting the items of type T by their value in decreasing order (line 16), then scoring this*re-sorted*list using the*dcg*method (line 17).The computation of the Discounted Cumulative Gain by the method

*dcg*(line 10) is a direct application of the formula described in the previous chapter.__Note__: The logarithm function uses a base 2. It is computed as natural log(x)/natural log (2)Let's now consider a list of items of type

*Item*defined as follows:case class Item(id: String, x: Double, rank: Int)

The list of items,

*itemsList*is implicitly ranked through the last attribute,*rank*.val itemsList = Seq[Item]( Item("1", 3.4, 4), Item("2", 1.4, 3), Item("3", -0.7, 5), Item("4", 5.2, 2), Item("5", 1.4, 1)) implicit val ranking = (item: Item) => item.rank

It is time to compute the NDCG coefficient for the list of items, by invoking the

*score*method.val nDCG = new NDCG[Item](itemsList) println(s"IDCG = ${nDCG.iDCG}") //45.64 println(s"Score = ${nDCG.score}") // 0.801

The ideal discounted cumulative gain,

*iDCG*is 45.6: It is the optimum ranking for this list of time. The first sample score a probability of 0.8__Note__The DCG of subsequent samples can be computed using the same iDCG value from the same instance of

*NDCG*.def score(samples: Seq[T]): Double = if( samples.size != initialSample.size) 0.0 else dcg(samples)/iDCG

**References**

## No comments:

## Post a Comment