Target audience: Advanced

Estimated reading time: 10'

This post illustrates the Normalized Discounted Cumulative Gain (NDCG) and its implementation in Scala.

Numerous real-life applications of machine learning require the prediction the most relevant ranking of items to optimize an outcome. For instance

- Evaluate and prioritize counter-measures to cyber-attach
- Ranks symptoms in a clinical trial
- Extract documents relevant to a given topic from a corpus

This post uses Scala 2.11.8

## Discounted Cumulative Gain

Let's dive into the mathematical formalism for the Discounted Cumulative Gain.For a indexed target values t

_{j}as illustrated in the diagram above, the discounted cumulative gain is computed as

\[DCG=\sum_{j=1}^{n}\frac{2^{t_{j}}-1}{log_{2}(j+1)}\] The objective is to compare any given list of ranked/sorted item with a benchmark which represent the optimum ranking (ranking with the highest DCG value).

\[p(ranking|IDCG)=\frac{log(2)}{IDCG}\sum_{j=0}^{n}\frac{2^{t_{j}}-1}{log(j+1)}\]

## Scala implementation

The implementation of the computation of NDCG in Scala is quite simple, indeed. Given a ranked list of items. The three steps are

- Compute the IDCG (or normalization factor)from the list
- Compute the DCG for the list
- Compute the NDCG = DCG/IDCF

*NDCG*has a single argument: the sample of ranking:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | class NDCG[T]( firstSample: Seq[T]) (implicit ranking: T => Int) { import NDCG._ val iDCG: Double = normalize def score: Double = score(initialSample) private def dcg(samples: Seq[T]): Double = samples.zipWithIndex.aggregate(0.0)( (s, samplej) => s + compute(samplej._2 + 1, ranking(samplej._1)) , _ + _) private def normalize: Double = { val sorted = initialSample.zipWithIndex.sortBy{ case (sample, n) => -ranking(sample) }.map( _._1) dcg(sorted) } } |

The Ideal Discounted Cumulative Gain,

*iDCG*is computed through the normalize method (line 6). iDCG (normalization factor) is computed by first sorting the items of type T by their value in decreasing order (line 16), then scoring this*re-*sorted list using the dcg method (line 17).The computation of the Discounted Cumulative Gain by the method dcg (line 10) is a direct application of the formula described in the previous chapter.

__Note__

__:__The logarithm function uses a base 2. It is computed as natural log(x)/natural log (2)

Let's now consider a list of items of type Item defined as follows:

```
case class Item(id: String, x: Double, rank: Int)
```

The list of items, itemsList is implicitly ranked through the last attribute, rank.

val itemsList = Seq[Item]( Item("1", 3.4, 4), Item("2", 1.4, 3), Item("3", -0.7, 5), Item("4", 5.2, 2), Item("5", 1.4, 1)) implicit val ranking = (item: Item) => item.rank

It is time to compute the NDCG coefficient for the list of items, by invoking the score method.

val nDCG = new NDCG[Item](itemsList) println(s"IDCG = ${nDCG.iDCG}") //45.64 println(s"Score = ${nDCG.score}") // 0.801

The ideal discounted cumulative gain,

*iDCG*is 45.6: It is the optimum ranking for this list of time. The first sample score a probability of 0.8

__Note__The DCG of subsequent samples can be computed using the same iDCG value from the same instance of NDCG.

def score(samples: Seq[T]): Double = if( samples.size != initialSample.size) 0.0 else dcg(samples)/iDCG

I really appreciate information shared above. It’s of great help. If someone want to learn Online (Virtual) instructor led live training in Apache Storm, kindly contact us http://www.maxmunus.com/contact

ReplyDeleteMaxMunus Offer World Class Virtual Instructor led training on Apache Storm. We have industry expert trainer. We provide Training Material and Software Support. MaxMunus has successfully conducted 100000+ trainings in India, USA, UK, Australlia, Switzerland, Qatar, Saudi Arabia, Bangladesh, Bahrain and UAE etc.

For Demo Contact us.

Nitesh Kumar

MaxMunus

E-mail: nitesh@maxmunus.com

Skype id: nitesh_maxmunus

Ph:(+91) 8553912023

http://www.maxmunus.com/

Today machine learning has been adapted by most of the Fortune 500 company and they run efficiently using the technology. Machine learning can be called as the application of artificial intelligence which helps the systems to automatically understand and learn, and thus improve from the experiences.

ReplyDelete

ReplyDeleteنجار ابواب بالرياض نجار بالرياض

تركيب ستائر بالرياض شركة تركيب ستائر بالرياض

تنظيف مكيفات بالرياض شركة تنظيف مكيفات بالرياض

شركة تنظيف افران الغاز بالرياض شركة تنظيف افران بالرياض

Great Article

ReplyDeleteArtificial Intelligence Projects

Project Center in Chennai

JavaScript Training in Chennai

JavaScript Training in Chennai

ReplyDeletegolden goose outletbape hoodiegolden goose sneakersjordan shoeskyrie 5 spongebobsupreme t shirtair jordansgolden goose sneakersmoncler jacketmoncler outlet