## Monday, January 7, 2013

### Naive Bayes Classifier in Java

Introduction

The Naive Bayes approach is a generative, supervised learning method based on a simplistic hypothesis: it assumes that the presence of a specific feature of a class is unrelated to the presence of any other feature. This assumption of independence between the model features is essential to the classifier.
Mathematically, Bayes' theorem relates the probabilities of A and B, P(A) and P(B), to the conditional probabilities of A given B and of B given A, P(A|B) and P(B|A).
In its most common form, the Naive Bayes formula is defined for a proposition (or class) A and evidence (or observation) B as $p(A|B)= \frac{p(B|A)\,p(A)}{p(B)}$ where
- P(A), the prior, is the initial degree of belief in A.
- P(A|B), the posterior, is the degree of belief in A having accounted for B.
- P(B|A)/P(B) represents the support B provides for A.
The case above can be extended to a network of cause-effect conditional probabilities P(X|Y).

When the features of the model are known to be independent, the probability that an observation $\vec{x}=(x_{1},\dots,x_{n})$ belongs to a class C is computed as $p(C|\vec{x})=\frac{p(C)\,\prod_{i} p(x_{i}|C)}{p(\vec{x})}$. It is usually more convenient to compute the maximum likelihood of a new observation belonging to a specific class by taking the logarithm of the formula above: $\log\,p(C|\vec{x}) = \sum_{i} \log\,p(x_{i}|C) + \log\,p(C) - \log\,p(\vec{x})$
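
To make the log form concrete, here is a small standalone sketch (not part of the original post) that scores an observation against one class, assuming Gaussian feature likelihoods; the class and method names are illustrative only.

```java
// Illustrative sketch: log-form Naive Bayes score for one class,
// assuming each feature follows a Gaussian distribution.
public class NaiveBayesLogScore {

    // log p(x|C) for a single Gaussian feature with given mean and variance
    static double logLikelihood(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5*Math.log(2.0*Math.PI*variance) - diff*diff/(2.0*variance);
    }

    // log p(C|x) up to the constant -log p(x), summing over independent features
    static double logPosterior(double[] x, double[] means,
                               double[] variances, double classProb) {
        double score = Math.log(classProb);
        for (int i = 0; i < x.length; i++) {
            score += logLikelihood(x[i], means[i], variances[i]);
        }
        return score;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        // Class whose means match the observation scores higher
        double s1 = logPosterior(x, new double[]{1.0, 2.0}, new double[]{1.0, 1.0}, 0.5);
        double s2 = logPosterior(x, new double[]{5.0, 6.0}, new double[]{1.0, 1.0}, 0.5);
        System.out.println(s1 > s2);
    }
}
```

Because the evidence term log p(x) is the same for every class, it can be dropped when only the arg-max over classes is needed.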

Note: For the sake of readability of the implementation of the algorithms, all non-essential code such as error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers, and imports is omitted.

Software design
The class in the example below implements a basic version of the Naive Bayes algorithm. The model and its features are defined by the nested class NClass, which holds the feature parameters (mean and variance of the prior observations) and the class probability p(C). The computation of the mean and variance of the priors is implemented in the NClass.computeStats method. Some of the methods, setters, getters, comments, and conditional tests on arguments are omitted for the sake of clarity. The kernel function is selected at run-time. This implementation supports any number of features and classes.

```java
public final class NaiveBayes implements Classifier {

  public final class NClass {
    private double[] _params = null;
    private double[] _paramsVariance = null;
    private double _classProb = 0.0;

    public NClass(int numParams) {
      _params = new double[numParams];
    }

    private void add(double[] data) {
      int numObservations = 0;
      _paramsVariance = new double[_params.length];

      for (int j = 0; j < data.length; ) {
        j++;
        for (int k = 0; k < _params.length; k++, j++) {
          _params[k] += data[j];
          _paramsVariance[k] += data[j]*data[j];
        }
        numObservations++;
      }
      _classProb = numObservations;
    }

    private void computeStats() {
      double inv = 1.0/_classProb;
      double invCube = inv*inv*inv;

      for (int k = 0; k < _params.length; k++) {
        _params[k] /= _classProb;
        _paramsVariance[k] = _paramsVariance[k]*inv - _params[k]*_params[k]*invCube;
      }
      _classProb /= _numObservations;
    }
  }
}
```

Kernel functions can be used to improve the classification of observations by increasing the distance between the prior observations belonging to different classes during the training phase. In the case of two classes (Bernoulli classification) C1 and C2, the kernel algorithm increases the distance between the mean values m1 and m2 of the prior observations for each of the two classes, adjusted for the variance.

As Java does not support local functions or closures, we need to create a class hierarchy to implement the different kernel (discriminant) functions. The example below defines simple linear and logistic (sigmoid) kernel functions, implemented as nested classes: $y = \theta x \quad \text{and} \quad y =\frac{1}{1+e^{-x}}$

```java
public interface Discriminant {
  double estimate(double value);
}

// Nested class that implements a linear discriminant
public static class DiscriminantKernel implements Discriminant {
  private double _theta = 1.0;

  public DiscriminantKernel(double theta) {
    _theta = theta;
  }

  public double estimate(double value) {
    return value*_theta;
  }
}

// Nested class that implements a sigmoid function as discriminant
public static class SigmoidKernel implements Discriminant {
  public double estimate(double value) {
    return 1.0/(1.0 + Math.exp(-value));
  }
}
```
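
As a quick sanity check, the standalone sketch below (KernelDemo and the class names inside it are illustrative, not from the post) exercises both discriminant functions:

```java
// Illustrative standalone version of the two discriminant functions above
public class KernelDemo {

    interface Discriminant {
        double estimate(double value);
    }

    // Linear discriminant: y = theta * x
    static class LinearDiscriminant implements Discriminant {
        private final double theta;
        LinearDiscriminant(double theta) { this.theta = theta; }
        public double estimate(double value) { return value*theta; }
    }

    // Sigmoid discriminant: y = 1 / (1 + e^-x)
    static class SigmoidDiscriminant implements Discriminant {
        public double estimate(double value) {
            return 1.0/(1.0 + Math.exp(-value));
        }
    }

    public static void main(String[] args) {
        Discriminant linear = new LinearDiscriminant(2.0);
        Discriminant sigmoid = new SigmoidDiscriminant();
        System.out.println(linear.estimate(3.0));   // 6.0
        System.out.println(sigmoid.estimate(0.0));  // 0.5
    }
}
```

Coding the discriminants against the interface lets the classifier accept any kernel at construction time, which is the closest Java idiom to passing a function value.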

Ultimately, the NaiveBayes class implements the three key components of the learning algorithm:
• Training: train
• Run-time classification: classify
• Scoring: a new observation is classified using the logarithmic version of the Naive Bayes formula, logP
First, let's define the NaiveBayes class and its constructors.

```java
public final class NaiveBayes implements Classifier {

  public final class NClass { /* ... */ }

  private CDoubleArray[] _valuesArray = null;
  private NClass[] _classes = null;
  private int _numObservations = 0;
  private int _step = 1;
  private Discriminant _kF = null;

  public NaiveBayes() {
    this(0, 0);
  }

  public NaiveBayes(int numParams, int numClasses) {
    this(numParams, numClasses, new NLinearDiscriminant());
  }

  public NaiveBayes(int numParams, int numClasses, final Discriminant kf) {
    _classes = new NClass[numClasses];
    _valuesArray = new CDoubleArray[numClasses];

    for (int k = 0; k < numClasses; k++) {
      _classes[k] = new NClass(numParams);
      _valuesArray[k] = new CDoubleArray();
    }
    _kF = kf;
    this.discretize(0, numClasses);
  }
  // ...
}
```

Training
Next, the training method train is defined. The method merely computes the statistics of the historical data, _valuesArray, and assigns them to the predefined classes, _classes.

```java
public int train() throws ClassifierException {
  double[] values = null;

  for (int j = 0; j < _valuesArray.length; j++) {
    values = _valuesArray[j].currentValues();
    _classes[j].add(values);
  }
  for (int j = 0; j < _classes.length; j++) {
    _classes[j].computeStats();
  }
  return values.length;
}
```

Classification
The run-time classification method classify uses the prior conditional probabilities to assign a new observation to an existing class: it generates the class id for a set of values (observations).

```java
public int classify(double[] values) {
  // Compute the normalizing denominator value
  double[] normalizedPriorProb = new double[values.length];
  double prob = 0.0;

  for (int valueIndex = 0; valueIndex < values.length; valueIndex++) {
    for (int classid = 0; classid < _classes.length; classid++) {
      prob = Math.abs(values[valueIndex] - _classes[classid]._params[valueIndex]);
      if (prob > normalizedPriorProb[valueIndex]) {
        normalizedPriorProb[valueIndex] = prob;
      }
    }
  }
  return logP(values, normalizedPriorProb);
}
```

A new observation, values, is assigned to the appropriate class according to its likelihood (log of the conditional probability) by the method logP.
logP computes the likelihood for each value and applies the Naive Bayes formula to the logarithm of the prior probability and the logarithm of the class probability.

```java
private int logP(double[] values, double[] denominator) {
  double score = 0.0,
         adjustedValue = 0.0,
         prior = 0.0,
         bestScore = -Double.MAX_VALUE;
  int bestMatchedClass = -1;

  // Walk through all the classes defined in the model
  for (int classid = 0; classid < _classes.length; classid++) {
    double[] classParameters = _classes[classid]._params;

    score = 0.0;
    for (int k = 0; k < values.length; k++) {
      adjustedValue = _kF.estimate(values[k]);
      prior = Math.abs(adjustedValue - classParameters[k])/denominator[k];
      score += Math.log(1.0 - prior);
    }
    score += Math.log(_classes[classid]._classProb);

    if (score > bestScore) {
      bestScore = score;
      bestMatchedClass = classid;
    }
  }
  return bestMatchedClass;
}
```
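
The end-to-end flow — accumulate per-class statistics during training, then pick the class with the best log score — can be illustrated with a self-contained toy version. ToyNaiveBayes and its Gaussian scoring are assumptions for illustration only; the original classifier uses CDoubleArray buffers and the distance-based prior shown above.

```java
// Illustrative, self-contained toy version of the train/classify flow.
public class ToyNaiveBayes {
    final double[][] means;      // per-class feature means
    final double[][] variances;  // per-class feature variances
    final double[] classProbs;   // p(C)

    // trainingData[c] is the array of observations for class c
    ToyNaiveBayes(double[][][] trainingData) {
        int numClasses = trainingData.length;
        means = new double[numClasses][];
        variances = new double[numClasses][];
        classProbs = new double[numClasses];

        int total = 0;
        for (double[][] obs : trainingData) total += obs.length;

        for (int c = 0; c < numClasses; c++) {
            double[][] obs = trainingData[c];
            int numFeatures = obs[0].length;
            means[c] = new double[numFeatures];
            variances[c] = new double[numFeatures];
            for (double[] x : obs) {
                for (int k = 0; k < numFeatures; k++) {
                    means[c][k] += x[k];
                    variances[c][k] += x[k]*x[k];
                }
            }
            for (int k = 0; k < numFeatures; k++) {
                means[c][k] /= obs.length;
                // variance = E[x^2] - mean^2
                variances[c][k] = variances[c][k]/obs.length - means[c][k]*means[c][k];
            }
            classProbs[c] = (double) obs.length/total;
        }
    }

    // Returns the class id with the highest Gaussian log posterior
    int classify(double[] x) {
        int best = -1;
        double bestScore = -Double.MAX_VALUE;
        for (int c = 0; c < classProbs.length; c++) {
            double score = Math.log(classProbs[c]);
            for (int k = 0; k < x.length; k++) {
                double var = Math.max(variances[c][k], 1e-9); // guard degenerate variance
                double diff = x[k] - means[c][k];
                score += -0.5*Math.log(2.0*Math.PI*var) - diff*diff/(2.0*var);
            }
            if (score > bestScore) { bestScore = score; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][][] data = {
            {{1.0, 1.1}, {0.9, 1.0}, {1.1, 0.9}},   // class 0
            {{5.0, 5.1}, {4.9, 5.0}, {5.1, 4.9}}    // class 1
        };
        ToyNaiveBayes nb = new ToyNaiveBayes(data);
        System.out.println(nb.classify(new double[]{1.0, 1.0}));  // 0
        System.out.println(nb.classify(new double[]{5.0, 5.0}));  // 1
    }
}
```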

Some of the ancillary private methods are omitted for the sake of clarity. We will look at the implementation of the same classifier in Scala in a later post.

References
• The Elements of Statistical Learning: Data Mining, Inference, and Prediction - T. Hastie, R. Tibshirani, J. Friedman - Springer
• Machine Learning for Multimedia Content Analysis - Y. Gong, W. Xu - Springer
• Effective Java - J. Bloch - Addison-Wesley
• github.com/prnicolas
