Is there a paper or book that discusses the mathematical relationships between different machine learning models - how they differ and how they can (sometimes) be equivalent? For instance, logistic regression can be viewed as a single-layer perceptron. Other examples are here.

There are many different possible frameworks for describing machine learning methods. A very general one is the energy-based framework (see the tutorial). Informally, it states that most machine learning algorithms aim to learn an energy score that tells you how compatible the input variables are. Logistic regression, the single-layer perceptron, and the linear support vector machine are the same type of energy model under this framework; what makes them different is that they optimize different loss functions.
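
As a rough illustration of that last point (this is a minimal sketch, not code from the linked tutorial; the toy dataset and all variable names are my own), the three models can share the same linear score f(x) = w·x + b and differ only in the loss applied to the margin m = y·f(x):

```python
import numpy as np

# Toy two-class data with labels y in {-1, +1}; all names here are illustrative.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

def score(w, b):
    # Shared linear score ("negative energy"): f(x) = w.x + b
    return X @ w + b

# Each model applies a different loss (and its margin-gradient) to m = y * f(x).
models = {
    "logistic regression": (lambda m: np.log1p(np.exp(-m)),      # log loss
                            lambda m: -1.0 / (1.0 + np.exp(m))),
    "perceptron":          (lambda m: np.maximum(0.0, -m),       # perceptron loss
                            lambda m: np.where(m < 0, -1.0, 0.0)),
    "linear SVM":          (lambda m: np.maximum(0.0, 1.0 - m),  # hinge loss (L2 penalty omitted)
                            lambda m: np.where(m < 1, -1.0, 0.0)),
}

for name, (loss, dloss) in models.items():
    w, b, lr = np.zeros(2), 0.0, 0.1
    for _ in range(200):                     # plain (sub)gradient descent
        g = dloss(y * score(w, b))           # d(loss)/d(margin)
        w -= lr * X.T @ (g * y) / len(y)
        b -= lr * np.mean(g * y)
    acc = np.mean(np.sign(score(w, b)) == y)
    print(f"{name:20s} mean loss {np.mean(loss(y * score(w, b))):.3f}  accuracy {acc:.2f}")
```

The perceptron and hinge losses are zero for sufficiently well-classified points, while the logistic loss penalizes every point a little; under this view the modeling choice is essentially which loss you place on the same linear energy.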

Adding to what Philemon said, from a probabilistic modeling point of view Roweis and Ghahramani [1999] is a great reference for analyzing different methods, such as HMMs, state-space models, ICA, sparse coding, etc., through a Bayesian learning framework. While that reference covers linear models, a similar analysis is done for non-linear models in Friston [2008], though it is a little more involved to understand.
Actually, LR is like the single-layer version of the multilayer perceptron, but it's not really the original perceptron (the multilayer perceptron isn't a perceptron and shouldn't have been named that, as Hinton has admitted); a small sketch contrasting the two is below. I think a firm grasp of the mathematics of ML will allow you to see these (near-)equivalences for yourself -- there are quite a lot of them, really.
I agree that it's not hard to see the equivalences once you look at the math. But having them in one place would be useful. Also, even if the math is similar, one sometimes doesn't immediately see subtleties, such as one method being easier to train for some reason, or applicable to certain data but not others, etc.
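
To make the distinction in the comments above concrete, here is a minimal sketch (plain NumPy, a toy two-class dataset, illustrative variable names throughout; not anyone's reference implementation) contrasting the classic Rosenblatt perceptron, with its hard-threshold output and mistake-driven updates, against logistic regression written as a one-unit "network" with a sigmoid output trained by gradient descent on the log loss, i.e. the single-layer special case of an MLP:

```python
import numpy as np

# Toy two-class data; all names here are illustrative.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y01 = np.hstack([np.zeros(50), np.ones(50)])   # labels in {0, 1}
ypm = 2 * y01 - 1                              # labels in {-1, +1}

# Classic perceptron: hard-threshold output, update only on mistakes.
w_p, b_p = np.zeros(2), 0.0
for _ in range(50):
    for xi, ti in zip(X, ypm):
        if ti * (xi @ w_p + b_p) <= 0:         # misclassified -> shift the boundary
            w_p += ti * xi
            b_p += ti

# Logistic regression as a one-unit "network": sigmoid output,
# gradient descent on the cross-entropy (log) loss.
w_l, b_l, lr = np.zeros(2), 0.0, 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w_l + b_l)))  # sigmoid activation
    err = p - y01                               # gradient of log loss w.r.t. pre-activation
    w_l -= lr * X.T @ err / len(y01)
    b_l -= lr * err.mean()

print("perceptron accuracy:", np.mean((X @ w_p + b_p > 0) == y01.astype(bool)))
print("logistic   accuracy:", np.mean((X @ w_l + b_l > 0) == y01.astype(bool)))
```

Both learn a linear decision boundary on this data; the difference lies in the output nonlinearity, the loss, and the update rule, which is exactly the kind of subtlety (ease of training, probabilistic outputs) raised in the comment above.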