Target audience: Intermediate
Estimated reading time: 20'
The multi-variate normal sampling function is used in various machine learning techniques such as Markov chains Monte Carlo (MCMC) or contextual multi-armed bandit.
The implementation of the multivariate normal relies on data vectorization, technique well-known to Python developers and data scientists alike.
Note: The post requires some knowledge of data vectorization (numpy, datavec, ND4j..) as well as Scala programming language.
Vectorization
Python Numpy is a well-known and reliable vectorized linear algebra library which is a foundation of scientific (SciPy) and machine learning (Sciktlearn) libraries. No serious data scientific projects can reasonably succeeds with the power and efficiency of numpy.
The vectorization of datasets is the main reason behind the performance of machine learning models (training and prediction) build in Python.
Is there a similar linear algebra library, supporting vectorization, available to Scala and Spark developers? Yes, ND4j
Python Numpy is a well-known and reliable vectorized linear algebra library which is a foundation of scientific (SciPy) and machine learning (Sciktlearn) libraries. No serious data scientific projects can reasonably succeeds with the power and efficiency of numpy.
The vectorization of datasets is the main reason behind the performance of machine learning models (training and prediction) build in Python.
Is there a similar linear algebra library, supporting vectorization, available to Scala and Spark developers? Yes, ND4j
ND4j, BLAS and LAPACK
ND4j library replicates the functionality of numpy for Java developers. ND4j can be downloaded as an independent library or as component of the deep learning library, deeplearning4j. It leverages the BLAS and LAPACK libraries.
From a Java developer perspective, the data represented by an NDArray is stored outside of the Java Virtual Machine. This design has the following benefits:
OpenBLAS is an optimized Basic Linear Algebra Subprograms (BLAS) library based on GotoBLAS2, a C-library of linear algebra supporting a large variety of micro-processor. Its usage is governed by the BSD license.
ND4j library replicates the functionality of numpy for Java developers. ND4j can be downloaded as an independent library or as component of the deep learning library, deeplearning4j. It leverages the BLAS and LAPACK libraries.
From a Java developer perspective, the data represented by an NDArray is stored outside of the Java Virtual Machine. This design has the following benefits:
- Avoid taxing the Garbage Collector
- Interoperability with high-performance BLAS libraries such as OpenBlas
- Allow number of array elements exceeds Int.MaxValue
The BLAS (Basic Linear Algebra Subprograms) are functions performing basic vector and matrix operations. The library is divided in 3 levels:
- Level 1 BLAS perform scalar, vector and vector-vector operations,
- Level 2 BLAS perform matrix-vector operations
- Level 3 BLAS perform matrix-matrix operations.
LAPACK are Fortran routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems and matrix factorizations.
Implicit ND4j array conversion
The first step is to create a implicit conversation between ND4j and Scala data types.
The following code convert an array of double into a INDArray using org.nd4j.linalg.factory.Nd4j java class and its constructor create(double[] values, int[] shape)
The first step is to create a implicit conversation between ND4j and Scala data types.
The following code convert an array of double into a INDArray using org.nd4j.linalg.factory.Nd4j java class and its constructor create(double[] values, int[] shape)
- In case of a vector, the shape is defined in scala as Array[Int](size_vector)
- In case of a matrix, the shape is defined as Array[Int](numRows, numCols).
1 2 3 4 5 6 | @throws(clazz = classOf[IllegalArgumentException]) implicit def double2NDArray(values: Array[Double]): INDArray = { require(values.nonEmpty, "ERROR: ND4, conversion ...") Nd4j.create(values, Array[Int](1, values.length)) } |
Multivariate Normal distribution implementation
The sampling of a multivariate normal distribution is defined by the formula
\[\mathbb{N}(\mu, \Sigma )=\frac{1}{\sqrt{(2\pi)^n \left | \Sigma \right |}} e^{\frac{1}{2}(x-\mu)^T {\Sigma '}^{-1}(x-\mu)}\] A generalized definition adds a very small random perturbation factor r |r| <= 1 on the variance value (diagonal of the covariance matrix) \[\Sigma ' = \Sigma + r.I\] Sigma is the covariance matrix and the mu is the mean value.
The sampling of a multivariate normal distribution is defined by the formula
\[\mathbb{N}(\mu, \Sigma )=\frac{1}{\sqrt{(2\pi)^n \left | \Sigma \right |}} e^{\frac{1}{2}(x-\mu)^T {\Sigma '}^{-1}(x-\mu)}\] A generalized definition adds a very small random perturbation factor r |r| <= 1 on the variance value (diagonal of the covariance matrix) \[\Sigma ' = \Sigma + r.I\] Sigma is the covariance matrix and the mu is the mean value.
The computation of the multivariate normal sampling can be approximated by the Cholesky decomposition. In a nutshell, the Cholesky algorithm decompose a positive-definite matrix into a product of two matrix
The following implementation relies on the direct invocation of LAPACK library function potrf. The LAPACK functionality are accessed through the BLAS wrapper getBlasWrapper.
Note that the matrix is duplicated prior to the LAPACK function call as we do not want to alter the input matrix.
Let's implement the multivariate Normal sampler with perturbation using the Cholesky decomposition.
Let's look at the full implementation of the multi-variate normal sampling.
The first step validates the shape of the mean and covariance input matrices [line 8, 9]. As mentioned earlier, the generalized normal distribution introduces an optional random perturbation of small magnitude (~1e-4) [line 14-17] that is useful for application that requires some stochastic
The 'perturbed' covariance matrix is factorized using the Cholesky decomposition [line 22]. The normal probability density function (mean 0.0 and standard deviation 1.0) is used to generate random values [line 24-30] which is then applied to the covariance matrix [line 33].
The normal randomized variance is added to the vector of mean values [line 35].
For the sake of convenience, the multivariate normal density function uses the Apache Math common 3.5 API [line 24].
Environment
Scala 2.13.2
JDK 1.8
ND4j 1.0.0-beta3
References
- lower triangle matrix
- transposition of its conjugate
The following implementation relies on the direct invocation of LAPACK library function potrf. The LAPACK functionality are accessed through the BLAS wrapper getBlasWrapper.
1 2 3 4 5 | final def choleskyDecomposition(matrix: INDArray): INDArray = { val matrixDup = matrix.dup Nd4j.getBlasWrapper.lapack.potrf(matrixDup, false) matrixDup } |
Note that the matrix is duplicated prior to the LAPACK function call as we do not want to alter the input matrix.
Let's implement the multivariate Normal sampler with perturbation using the Cholesky decomposition.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | @throws(clazz = classOf[IllegalArgumentException]) @throws(clazz = classOf[IllegalStateException]) final def multiVariateNormalSampling( mean: INDArray, cov: INDArray, perturbation: Double = 0.0): INDArray = { import scala.util.Random require(cov.size(0) == mean.length, s"Sigma shape ${cov.size(0)} should be mean size ${mean.length}") require(cov.size(1) == mean.length, s"Sigma shape ${cov.size(1)} should be ${mean.length}") try { // Add a perturbation epsilon value to the covariance matrix // if it is defined val perturbMatrix = if(perturbation > 0.0) cov.add( squareIdentityMatrix(cov.size(0), perturbation*(Random.nextDouble-0.5))) else cov // Apply the Cholesky decomposition val perturbed: INDArray = choleskyDecomposition(perturbMatrix) // Instantiate a normal distribution val normalPdf = new NormalDistribution( new DefaultRandom, 0.0, 1.0) val normalDensity = Array.fill(mean.size(0))(normalPdf.sample) val normalNDArray: INDArray = normalDensity // This is an implementation of the Dot product val normalCov = perturbed.mmul(normalNDArray.transpose) // Add the normalized perturbed covariance to the mean value mean.add(normalCov) } catch { case e: org.nd4j.linalg.api.blas.BlasException => throw new IllegalStateException(s"multiVariateNormalSampling: ${e.toString}") case e: org.apache.commons.math3.exception.NotStrictlyPositiveException => throw new IllegalStateException(s"multiVariateNormalSampling: ${e.toString}") case e: Throwable => throw new IllegalStateException(s"multiVariateNormalSampling: ${e.toString}") } } |
Let's look at the full implementation of the multi-variate normal sampling.
The first step validates the shape of the mean and covariance input matrices [line 8, 9]. As mentioned earlier, the generalized normal distribution introduces an optional random perturbation of small magnitude (~1e-4) [line 14-17] that is useful for application that requires some stochastic
The 'perturbed' covariance matrix is factorized using the Cholesky decomposition [line 22]. The normal probability density function (mean 0.0 and standard deviation 1.0) is used to generate random values [line 24-30] which is then applied to the covariance matrix [line 33].
The normal randomized variance is added to the vector of mean values [line 35].
For the sake of convenience, the multivariate normal density function uses the Apache Math common 3.5 API [line 24].
Environment
Scala 2.13.2
JDK 1.8
ND4j 1.0.0-beta3
References
- Scala 2.13
- ND4j
- Deep learning 4j
- Cholesky decomposition
- Approximation multivariate Normal sampling
- Apache common math
BA Exam Result - BA 1st Year, 2nd Year and 3rd Year Result
ReplyDeleteBsc Exam Result - Bsc 1st Year, 2nd Year and 3rd Year Result
Machine Learning Projects for Final Year machine learning projects for final year
DeleteDeep Learning Projects assist final year students with improving your applied Deep Learning skills rapidly while allowing you to investigate an intriguing point. Furthermore, you can include Deep Learning projects for final year into your portfolio, making it simpler to get a vocation, discover cool profession openings, and Deep Learning Projects for Final Year even arrange a more significant compensation.
Python Training in Chennai Python Training in Chennai Angular Training Project Centers in Chennai
your article on data science is very interesting thank you so much.
ReplyDeleteData Science coaching in Hyderabad
good article about data science has given it is very nice thank you for sharing.
ReplyDeleteData Science coaching in Hyderabad
ReplyDeleteاعالى الخليج تقدم افضل خدمات نقل العفش الدولى المتميزه باسعار متميزة ومنها :
شركة شحن عفش من الرياض الى الامارات
نقل عفش من الرياض الى الاردن شركة شحن عفش من الرياض الى الاردن
Treasurebox provide you all the items which you will be get online in a single click in Auckland Newzealand. We provide you quality items and all types of outdoor furniture at lowest price.
ReplyDeletewatch your favourite pinoy tambayan replays online in hd. All the pinoy channels and all the replays you will be watch online in hd.
ReplyDeleteThanks for sharing useful information. I learned something new from your bog. Its very interesting and informative. keep updating. If you are looking for any apache spark scala related information, please visit our website Apache spark training institute in Bangalore
ReplyDelete
ReplyDeleteReally appreciate this wonderful post that you have provided for us.Great site and a great topic as well I really get amazed to read this. It's really good.
I like viewing web sites which comprehend the price of delivering the excellent useful resource free of charge. I truly adored reading your posting. Thank you!.mobile phone repair in Fredericksburg
mobile phone repair in Fredericksburg
mobile phone repair in Fredericksburg
mobile phone repair in Fredericksburg
mobile phone repair in Fredericksburg
mobile phone repair in Fredericksburg
mobile phone repair in Fredericksburg
mobile phone repair in Fredericksburg
mobile phone repair in Fredericksburg
mobile phone repair in Fredericksburg
Get all the latest clicksud seriale online of clicksud and all the seriale online daily on this blog.
ReplyDeleteDoramasmp4 esperamos que la estés pasando genial viendo. DoramasMP4 siempre será el primero en tener el episodio, así que por favor marque y agréguenos en Facebook para actualizarlo.
ReplyDeletehttps://doramas-sub-espanol.jimdosite.com/
Nice post..
ReplyDeletehttps://www.kitsonlinetrainings.com/course/salesforce-online-training
http://www.kitsonlinetrainings.com/course/mulesoft-online-training
https://www.kitsonlinetrainings.com/course/web-methods-online-training
Its very interesting post.
ReplyDeleteFind the best Digital marketing online training in India at low fees with Internship opportunity.
https://www.inventateq.com/
I read this post,thanks for sharing this information.
ReplyDeleteMachine learning classes in Pune
Really nice and informative blog, keep it up. Thanks for sharing and I have some suggestions.
ReplyDeleteVisit a Website:- 100% Job Guarantee Training In Chennai thanks .
Hvala na ispravci. ažuriramo Turske serije sa prevodom. gledajte online turske serije online sa prevodom
ReplyDelete