|
Hi guys, I am trying to implement an active learning algorithm that uses the Fisher information matrix as its selection strategy. I have tried several papers, but I couldn't understand how the matrix is obtained. Among them:

- "A Probability Analysis on the Value of Unlabeled Data for Classification Problems" by Tong Zhang and Frank J. Oles
- "Active Learning for Logistic Regression: An Evaluation" by Andrew I. Schein and Lyle H. Ungar

I would be grateful if someone could explain this matrix or point me to good references. A reference implementation would be really nice too. Thanks!
|
The Fisher information matrix is, as defined on Wikipedia, the matrix that, given a density $p(x \mid \theta)$, has as its $(i,j)$ entry the expectation over $x$ of the product of the partial derivatives of the log-probability of $x$ with respect to the $i$-th and $j$-th components of $\theta$:

$$F_{ij} = \mathbb{E}_x\!\left[\frac{\partial \log p(x \mid \theta)}{\partial \theta_i}\,\frac{\partial \log p(x \mid \theta)}{\partial \theta_j}\right].$$

So, for example, in a logistic classification problem where $x$ is the (unobserved) binary label and $p(x = 1 \mid \theta) = \sigma(\theta \cdot f)$ for an object with feature vector $f$, the Fisher information matrix for that object is

$$F_{ij} = f_i f_j\,\sigma(\theta \cdot f)\bigl(1 - \sigma(\theta \cdot f)\bigr) = f_i f_j\,\sigma'(\theta \cdot f).$$

You can get this by expanding the expectation over the two possible labels,

$$F_{ij} = p(x{=}1)\,\partial_{\theta_i} \log p(x{=}1 \mid \theta)\,\partial_{\theta_j} \log p(x{=}1 \mid \theta) + p(x{=}0)\,\partial_{\theta_i} \log p(x{=}0 \mid \theta)\,\partial_{\theta_j} \log p(x{=}0 \mid \theta),$$

and using $\sigma' = \sigma(1 - \sigma)$, so that $\partial_{\theta_i} \log \sigma(\theta \cdot f) = f_i\,(1 - \sigma(\theta \cdot f))$ and $\partial_{\theta_i} \log (1 - \sigma(\theta \cdot f)) = -f_i\,\sigma(\theta \cdot f)$; the two terms then sum to $f_i f_j\,\sigma(1 - \sigma)$.

This matrix has all sorts of nice properties. For instance, applying its inverse to the gradient of a learning problem gives you the natural gradient, which is very useful in machine learning: it's the steepest-descent direction when distance between models is measured by KL divergence rather than by Euclidean distance in parameter space. I haven't read those papers, so I'm not sure how they use this matrix for active learning.
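To make the formulas above concrete, here is a minimal NumPy sketch of the logistic-regression case: the per-example Fisher information $\sigma(1-\sigma)\,f f^\top$, the total information $X^\top \mathrm{diag}(\sigma(1-\sigma)) X$ over a design matrix, and a natural-gradient step $\theta \leftarrow \theta - \eta\,F^{-1}\nabla$. The function names, the `ridge` regularizer, and the trace-based "informativeness" score at the end are my own illustration, not the selection strategy from the papers you cite.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fisher_information(X, theta):
    """Total Fisher information for logistic regression.

    For one example with features f, p = sigma(theta . f) and the
    per-example information is p * (1 - p) * outer(f, f). Summing
    over the rows of X gives X^T diag(p * (1 - p)) X.
    """
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)              # sigma'(theta . f), one value per example
    return (X * w[:, None]).T @ X  # X^T diag(w) X

def natural_gradient_step(X, y, theta, lr=0.1, ridge=1e-6):
    """One natural-gradient update: theta <- theta - lr * F^{-1} grad.

    `ridge` adds a small multiple of the identity so F is safely
    invertible; the gradient is of the negative log-likelihood.
    """
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y)           # d(-log lik)/d theta
    F = fisher_information(X, theta) + ridge * np.eye(len(theta))
    return theta - lr * np.linalg.solve(F, grad)

# Toy demo: score unlabeled points by trace(p(1-p) f f^T), i.e. how
# much information a label for that point would carry about theta.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta = rng.normal(size=3)
p = sigmoid(X @ theta)
scores = (p * (1 - p)) * np.sum(X**2, axis=1)
print("most informative point:", np.argmax(scores))

y = (rng.random(50) < p).astype(float)   # synthetic labels for the demo
theta = natural_gradient_step(X, y, theta)
```

Ranking by the trace of the per-example information is just one cheap heuristic; the papers you mention may combine the matrix with the current model's gradient or with the unlabeled pool in a more involved way.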