Target audience: Intermediate
Estimated reading time: 10'
Summary of the time complexity of some common optimization and machine learning algorithms
Overview
Time complexity (the worst-case execution time as a function of the number of input elements) is a commonly used measure in computer science. However, you will be hard-pressed to find a comparison of machine learning algorithms based on their asymptotic execution time.
Summary
The following table lists the time complexity of some of the most common algorithms used in machine learning, including recursive computation for dynamic programming, linear algebra, optimization, and discriminative supervised training.
(*): complexity of the training phase
Algorithm | Time complexity | Description |
--- | --- | --- |
Recursion with one-element reduction | N² | N: number of elements (recursion sketch below) |
Recursion with halving reduction | log N | N: number of elements |
Recursion with halving reduction and linear processing | N | N: number of elements |
Nearest neighbors search | M·log k·N·log N | M: number of features, k: number of neighbors, N: number of observations |
Matrix multiplication (m,n)x(n,d) | m·n·d | m, n, d: matrix dimensions (sketch below) |
Matrix multiplication (n,n) | n³ | n: matrix dimension |
Matrix multiplication (n,n), Strassen | n^2.8 | n: matrix dimension |
Partial eigenvalues extraction (n,n) | e·N² | e: number of eigenvalues, N: matrix dimension |
Complete eigenvalues extraction (n,n) | N³ | N: matrix dimension |
Minimum spanning tree, Prim, linear queue | V² | V: number of vertices |
Minimum spanning tree, Prim, binary heap | (E + V)·log V | E: number of edges, V: number of vertices |
Minimum spanning tree, Prim, Fibonacci heap | E + V·log V | E: number of edges, V: number of vertices |
Shortest paths, Dijkstra, linear queue | V² | V: number of vertices |
Shortest paths, Dijkstra, binary heap | (E + V)·log V | E: number of edges, V: number of vertices |
Shortest paths, Dijkstra, Fibonacci heap | E + V·log V | E: number of edges, V: number of vertices |
Shortest paths, kNN graph, Dijkstra | (k + log N)·N² | k: number of neighbors, N: number of observations |
Shortest paths, kNN graph, Floyd-Warshall | N³ | N: number of observations |
Fast Fourier transform | N·log N | N: number of observations |
Batch gradient descent | N·P·I | N: number of observations, P: number of parameters, I: number of iterations (sketch below) |
Stochastic gradient descent | N·P·I | N: number of observations, P: number of parameters, I: number of epochs |
Newton's method with Hessian computation | N³·I | N: number of observations, I: number of iterations |
Conjugate gradient | N·m·√κ | N: number of observations, m: number of non-zero entries, κ: condition number of the matrix |
L-BFGS | N·M·I | N: number of observations, M: number of stored gradient corrections (memory), I: number of iterations |
K-means (*) | C·N·M·I | C: number of clusters, M: dimension, N: number of observations, I: number of iterations (sketch below) |
Lasso regularization, LARS (*) | N·M·min(N, M) | N: number of observations, M: dimension |
Hidden Markov model, forward-backward pass | N²·M | N: number of states, M: number of observations |
Multilayer perceptron (*) | n·M·P·N·e | n: number of input variables, M: number of hidden neurons, P: number of output values, N: number of observations, e: number of epochs |
Support vector machine (*), Newton | N³ | N: number of observations |
Support vector machine (*), Cholesky | N² | N: number of observations |
Support vector machine (*), sequential minimal optimization | N² | N: number of observations |
Principal component analysis (*) | M²·N + N³ | N: number of observations, M: number of features |
Expectation-maximization (*) | M²·N | N: number of observations, M: number of variables |
Laplacian eigenmaps | M·log k·N·log N + M·N·k² + d·N² | N: number of observations, M: number of variables, k: number of neighbors, d: dimension of the embedding |
Genetic algorithms | P·log P·I·C | C: number of genes per chromosome, P: population size, I: number of iterations |
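To make the first three table rows concrete, here is a minimal Python sketch that simply counts the operations performed by each recurrence. It assumes linear per-call processing for the one-element reduction and for the halving reduction "with processing", and constant per-call work for the plain halving reduction; the counts grow roughly as N², log N and N respectively.

```python
# Sketch: operation counts for the three recurrences in the table above.
# Assumption: the one-element reduction does linear work per call (hence N^2),
# the plain halving reduction does constant work per call (hence log N),
# and the halving reduction with processing does linear work per call (hence N).

def one_element_reduction(n: int) -> int:
    """T(N) = T(N-1) + O(N)  ->  O(N^2) operations."""
    if n <= 1:
        return 1
    return n + one_element_reduction(n - 1)

def halving_reduction(n: int) -> int:
    """T(N) = T(N/2) + O(1)  ->  O(log N) operations."""
    if n <= 1:
        return 1
    return 1 + halving_reduction(n // 2)

def halving_with_processing(n: int) -> int:
    """T(N) = T(N/2) + O(N)  ->  O(N) operations."""
    if n <= 1:
        return 1
    return n + halving_with_processing(n // 2)

if __name__ == "__main__":
    for n in (64, 128, 256):
        print(n,
              one_element_reduction(n),     # grows like n^2 / 2
              halving_reduction(n),         # grows like log2(n)
              halving_with_processing(n))   # grows like 2n
```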
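The m·n·d entry for multiplying an (m,n) matrix by an (n,d) matrix is simply the naive triple loop. The sketch below uses plain Python lists (no linear-algebra library assumed) to make the three nested factors explicit.

```python
# Naive matrix multiplication: each entry of the (m, d) result needs n
# multiply-adds, so the total work is m * n * d operations.
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    m, n, d = len(a), len(a[0]), len(b[0])
    c = [[0.0] * d for _ in range(m)]
    for i in range(m):          # m rows of the result
        for j in range(d):      # d columns of the result
            for k in range(n):  # n multiply-adds per entry
                c[i][j] += a[i][k] * b[k][j]
    return c

# Example: (2,3) x (3,2) -> 2 * 3 * 2 = 12 multiply-adds.
print(matmul([[1, 2, 3], [4, 5, 6]], [[1, 0], [0, 1], [1, 1]]))
```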
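The N·P·I bounds for batch and stochastic gradient descent come from touching every observation and every parameter once per iteration (or epoch). The sketch below uses a hypothetical least-squares objective purely for illustration; the function names and the learning-rate parameter lr are assumptions, not part of any particular library.

```python
import random

# Batch gradient descent on a least-squares objective (illustrative only).
# The three factors of the N * P * I bound are marked on the loops below.
def batch_gradient_descent(x, y, iterations=100, lr=0.01):
    n, p = len(x), len(x[0])
    w = [0.0] * p
    for _ in range(iterations):                    # I iterations
        grad = [0.0] * p
        for xi, yi in zip(x, y):                   # N observations
            err = sum(wj * xij for wj, xij in zip(w, xi)) - yi
            for j in range(p):                     # P parameters
                grad[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Stochastic gradient descent: the same N * P work per pass over the data,
# applied one observation at a time, so I counts epochs rather than
# full-gradient steps.
def stochastic_gradient_descent(x, y, epochs=100, lr=0.01):
    n, p = len(x), len(x[0])
    w = [0.0] * p
    for _ in range(epochs):                        # I epochs
        for i in random.sample(range(n), n):       # N observations per epoch
            err = sum(wj * xij for wj, xij in zip(w, x[i])) - y[i]
            w = [wj - lr * err * xij for wj, xij in zip(w, x[i])]  # P updates
    return w

# Tiny usage example: fit y = 1 + 2x with a bias column and one feature.
x = [[1.0, xi] for xi in (0.0, 1.0, 2.0, 3.0)]
y = [1.0, 3.0, 5.0, 7.0]
print(batch_gradient_descent(x, y, iterations=500, lr=0.05))
```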
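Likewise, the C·N·M·I entry for K-means training is dominated by the assignment step: each of the I iterations compares each of the N observations with each of the C centroids across all M dimensions. A minimal sketch, assuming random initialization and a fixed iteration count rather than a convergence test:

```python
import random

# K-means training sketch: the assignment step costs C * N * M operations
# per iteration, hence C * N * M * I overall.
def kmeans(data, c, iterations=10):
    centroids = random.sample(data, c)
    for _ in range(iterations):                    # I iterations
        clusters = [[] for _ in range(c)]
        for point in data:                         # N observations
            distances = [
                sum((p - q) ** 2 for p, q in zip(point, centroid))  # M dimensions
                for centroid in centroids          # C clusters
            ]
            clusters[distances.index(min(distances))].append(point)
        # Update step: recompute each non-empty centroid as its cluster mean.
        for k, cluster in enumerate(clusters):
            if cluster:
                m = len(cluster[0])
                centroids[k] = [sum(p[j] for p in cluster) / len(cluster)
                                for j in range(m)]
    return centroids

# Example usage on a tiny 2-D data set.
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
print(kmeans(points, c=2, iterations=5))
```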
References
- T. Cormen, C. Leiserson, R. Rivest, Introduction to Algorithms, MIT Press, 1990
- K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012
- Big-O cheat sheet