Monday, January 7, 2013

Naive Bayes Classifier in Java

Introduction
The Naive Bayes approach is a generative, supervised learning method based on a simplifying hypothesis: it assumes that the presence of a specific feature of a class is unrelated to the presence of any other feature. This assumption of conditional independence between the model features is what makes the classifier "naive" and the computation tractable.
Mathematically, Bayes' theorem gives the relationship between the probabilities of A and B, P(A) and P(B), and the conditional probabilities of A given B and B given A, P(A|B) and P(B|A).
In its most common form, Bayes' formula is defined for a proposition (or class) A and evidence (or observation) B as \[p(A|B)= \frac{p(B|A)\cdot p(A)}{p(B)}\]
   - P(A), the prior, is the initial degree of belief in A.
   - P(A|B), the posterior, is the degree of belief having accounted for B.
   - P(B|A)/P(B) represents the support B provides for A.
The case above can be extended to a network of cause-effect conditional probabilities P(X|Y).



If the features of the model are known to be independent, the probability that an observation x = (x1, ..., xi, ..., xn) belongs to a class C is computed as \[p(C|\vec{x})=\frac{\prod_{i} p(x_{i}|C)\cdot p(C)}{p(\vec{x})}\] It is usually more convenient to compute the maximum likelihood of a new observation belonging to a specific class by taking the logarithm of the formula above: \[log\,p(C|\vec{x}) = \sum_{i} log\,p(x_{i}|C) + log\,p(C) - log\,p(\vec{x})\]
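To make the log formulation concrete, here is a minimal, standalone sketch that scores a single class under the common Gaussian (normal) assumption for each feature likelihood p(xi|C). This snippet is independent of the classifier described below; the means, variances and class probability are illustrative values only.

public final class LogPosteriorDemo {
  // log of a univariate Gaussian likelihood p(x|C) with mean m and variance s2
  private static double logGaussian(double x, double m, double s2) {
    return -0.5*Math.log(2.0*Math.PI*s2) - (x - m)*(x - m)/(2.0*s2);
  }

  // log p(C|x) up to the constant term -log p(x):
  // the sum of per-feature log likelihoods plus the log of the class probability
  public static double logPosterior(double[] x, double[] mean, double[] variance, double classProb) {
    double score = Math.log(classProb);
    for( int i = 0; i < x.length; i++) {
      score += logGaussian(x[i], mean[i], variance[i]);
    }
    return score;
  }

  public static void main(String[] args) {
    double[] x = { 1.0, 2.5 };          // new observation (illustrative values)
    double[] mean = { 0.8, 2.0 };       // per-feature means for class C
    double[] variance = { 0.5, 1.0 };   // per-feature variances for class C
    System.out.println(logPosterior(x, mean, variance, 0.6));
  }
}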

Note: For the sake of readability of the implementation of the algorithms, all non-essential code such as error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers or imports is omitted.

Sample Implementation
The class in the example below implements a basic version of the Naive Bayes algorithm. The model and its features are defined by the nested class NClass. This model class defines the feature parameters (mean and variance of prior observations) and the class probability p(C). The computation of the mean and variance of the priors is implemented in the NClass.computeStats method. Some of the methods, setters, getters, comments and conditional tests on arguments are omitted for the sake of clarity. The kernel function is selected at run time. This implementation supports any number of features and classes.

public final class NaiveBayes implements Classifier {
  
  public final class NClass {
    private double[] _params         = null;
    private double[] _paramsVariance = null;
    private double   _classProb = 0.0;
 
    public NClass(int numParams) { 
      _params = new double[numParams];  
    }
 
    // Accumulate the sum and sum of squares of the prior observations for this class;
    // the raw observation count is stored in _classProb until computeStats normalizes it.
    private void add(double[] data) {
      int numObservations = 0;
             
      _paramsVariance = new double[_params.length];
      for(int j = 0; j < data.length; ) {
        j++;   // skip the leading value of each observation record
        for( int k = 0; k < _params.length; k++, j++) {
          _params[k] += data[j];
          _paramsVariance[k] += data[j]*data[j];
        }
        numObservations++;
      }
      _classProb = numObservations;
    }
 
    // Convert the accumulated sums into means, variances and the class probability p(C)
    private void computeStats() {
      double  inv = 1.0/_classProb;
 
      for( int k = 0; k < _params.length; k++) {
        _params[k] /= _classProb;                  // mean of feature k
        _paramsVariance[k] = _paramsVariance[k]*inv -
                   _params[k]*_params[k];          // E[x^2] - mean^2
      }
      _classProb /= _numObservations;              // p(C) = class count / total observations
    }
  }
}

Kernel functions can be used to improve the classification of observations by increasing the distance between the priors belonging to each class during the training phase. In the case of two classes (Bernoulli classification) C1 and C2, the kernel algorithm increases the distance between the mean values m1 and m2 of all the prior observations for each of the two classes, adjusted for the variance.

As Java does not support local functions or closures, we need to create a class hierarchy to implement the different kernel (discriminant) functions. The example below defines simple linear and logistic (sigmoid) kernel functions implemented as nested classes. \[y = \theta x \,\,and\,\,y =\frac{1}{1+e^{-x}}\]
public interface Discriminant {
   public double estimate(double value);
}
       
    // Nested class that implements a linear discriminant 
public static class DiscriminantKernel 
              implements Discriminant  {
   private double _theta = 1.0;
   public DiscriminantKernel(double theta) { 
     _theta = theta; 
   }  
   public double estimate(double value) { 
     return value*_theta; 
   }
}
             
       // Nested class that implements a sigmoid function for the kernel
public static class SigmoidKernel implements Discriminant {
  public double estimate(double value) { 
    return 1.0/(1.0 + Math.exp(-value)); 
  }
}

Ultimately, the NaiveBayes class implements the three key components of the learning algorithm:
  • Training: train
  • Run-time classification: classify
  • Log-likelihood computation: logP
A new observation is classified using the logarithmic version of the Naive Bayes formula, implemented by logP.
First let's define the NaiveBayes class and its constructors.


public final class NaiveBayes implements Classifier {

   public final class NClass { }   // nested model class defined above

   private CDoubleArray[] _valuesArray = null;
   private NClass[] _classes = null;
   private int _numObservations = 0;
   private int _step = 1;
   private Discriminant _kF = null;
      
   public NaiveBayes() { this(0,0); }
   public NaiveBayes(int numParams, int numClasses) { 
     this(numParams, numClasses, new DiscriminantKernel(1.0));
   }
            
   public NaiveBayes(
      int numParams, 
      int numClasses, 
      final Discriminant kf
   ) {
     _classes = new NClass[numClasses];
     _valuesArray = new CDoubleArray[numClasses];
 
     for( int k = 0; k < numClasses; k++) {
       _classes[k] = new NClass(numParams);
       _valuesArray[k] = new CDoubleArray();
     }
     _kF = kf;
     this.discretize(0,numClasses);
  }
   // ...
} 
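For illustration, here is a minimal sketch of how the classifier could be instantiated with the kernel (discriminant) function selected at run time. The number of features and classes are illustrative values only, and the sketch assumes the Discriminant implementations are visible from the calling scope (they are shown above as nested classes).

// Select the discriminant (kernel) function at run time: linear or sigmoid
Discriminant kernel = new SigmoidKernel();

// 4 features and 2 classes: illustrative values only
NaiveBayes model = new NaiveBayes(4, 2, kernel);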

Next, the training method, train, is defined. The method merely computes the statistics on the historical data, _valuesArray, and assigns them to the predefined classes _classes.

public int train() throws ClassifierException {
  double[] values =  null;
           
  // Accumulate the historical observations collected for each class
  for( int j = 0; j < _valuesArray.length; j++) {
    values = _valuesArray[j].currentValues();
    _classes[j].add(values);
  }
           
  // Compute the mean, variance and class probability p(C) for each class
  for( int j = 0; j < _classes.length; j++) {
    _classes[j].computeStats();
  }
  return values.length;
}
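The post does not show how the historical observations reach _valuesArray. Purely for illustration, a hypothetical helper method along the following lines could buffer labeled observations before train is invoked; it assumes CDoubleArray exposes an add(double) method, which is not shown in the post.

// Hypothetical helper, not part of the original class: buffers one labeled observation.
// Assumes CDoubleArray.add(double) exists; that class is not listed in the post.
public void addObservation(int classId, double[] features) {
  _valuesArray[classId].add(classId);   // leading value of each record, skipped by NClass.add
  for( double feature : features ) {
    _valuesArray[classId].add(feature);
  }
  _numObservations++;
}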

The run-time classification method classify uses the prior conditional probability to assign a new observation to an existing class. It generates the class id for a set of values or observations.

public int classify(double[] values) {
           
   // Compute the normalizing denominator: for each feature, the largest
   // absolute deviation of the observed value from any class mean
  double[] normalizedPriorProb = new double[values.length];
  double prob = 0.0;

  for( int valueIndex = 0; valueIndex < values.length; valueIndex++) {

    for(int classid = 0; classid < _classes.length; classid++) {
      prob = Math.abs(values[valueIndex] - 
          _classes[classid]._params[valueIndex]);
      if( prob > normalizedPriorProb[valueIndex]){               
          normalizedPriorProb[valueIndex] = prob;
      }
    }
  }
  return logP(values, normalizedPriorProb);
}
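At run time, classification is a single call on the feature vector of the new observation; a brief usage sketch (the feature values are illustrative only):

// Classify a new observation; the feature values are illustrative only
double[] observation = { 0.7, 1.9, 3.2, 0.4 };
int classId = model.classify(observation);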

A new observation, values, is assigned to the appropriate class according to its likelihood, or log of conditional probability, by the method logP.
logP computes the likelihood for each value and applies the Naive Bayes formula to the logarithm of the prior probability and the logarithm of the class probability.
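In this implementation the per-feature term is not a true conditional probability but a normalized distance to the class mean, so the score maximized by logP can be written as \[score(C) = \sum_{i} log\left(1 - \frac{\left|f(x_{i}) - m_{C,i}\right|}{d_{i}}\right) + log\,p(C)\] where f is the kernel (discriminant) function, m_{C,i} the mean of feature i for class C, and d_{i} the normalizing denominator computed in classify.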


private int logP(double[] values, double[] denominator) {
  double score = 0.0, 
         adjustedValue = 0.0, 
         prior = 0.0,
         bestScore = -Double.MAX_VALUE;
  int bestMatchedClass = -1;
                
  // Walk through all the classes defined in the model
  for(int classid = 0; classid < _classes.length; classid++) {
    double[] classParameters = _classes[classid]._params;
                     
    score = 0.0;
    for( int k = 0; k < values.length; k++) {
       adjustedValue = _kF.estimate(values[k]);
       prior = Math.abs(adjustedValue - classParameters[k])/
               denominator[k];
       score += Math.log(1.0 - prior);
    }
    score += Math.log(_classes[classid]._classProb);
                    
    if(score > bestScore) {
        bestScore = score;
        bestMatchedClass = classid;
    }
  }
  return bestMatchedClass;
}

Some of the ancillary private methods are omitted for the sake of clarity. We will look at the implementation of the same classifier in Scala in a later post.


References
  • The Elements of Statistical Learning: Data Mining, Inference & Prediction - T. Hastie, R. Tibshirani, J. Friedman - Springer
  • Machine Learning for Multimedia Content Analysis - Y. Gong, W. Xu - Springer
  • Effective Java - J Bloch - Addison-Wesley
  • github.com/prnicolas
