I am trying to figure out which of the above-mentioned graph sparsification techniques (the epsilon-NN graph or the k-NN graph) is the better one. Are there any proofs that show whether one is better than the other?

How does one select the threshold parameter epsilon if the epsilon-NN graph is chosen for sparsification?

Also, how can we compare the two forms of graphs?

Currently, I have a fully connected graph, and I am trying to learn the geodesics of the underlying manifold. I want to see which of the two sparsification methods works better. I also want to run the manifold learning procedure for various values of k, {(k1, y1), ..., (kn, yn)}, and for various values of epsilon, {(e1, y1), ..., (en, yn)}, where yi is some performance output (the higher the yi, the better the sparsification). How do I generate (e1, ..., en)? And since the two sets have different domains in one of the dimensions (k versus e), how can I compare the two sets of results?

asked Feb 04 '13 at 06:45 by Vittal


One Answer:

I'm not aware of any theoretical results on which sparsification method is better in general (I suspect it will always depend on the specifics of the data set and the distance measure you're working with). In addition to epsilon-NN and k-NN sparsification, you could also check out the approach of Jebara et al. based on b-matching; see, e.g., the paper "Graph Construction and b-Matching for Semi-Supervised Learning". This is similar to k-NN, but instead of sparsifying the neighborhood centered at each vertex in isolation, b-matching makes sure that each vertex has exactly b neighbors after sparsification.
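For concreteness, here is a minimal sketch of how the two sparsified graphs could be built with scikit-learn, assuming your data is available as a point cloud X rather than only as a precomputed distance matrix (the values of k and epsilon below are arbitrary placeholders):

    import numpy as np
    from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

    # Toy data standing in for your actual point cloud.
    rng = np.random.RandomState(0)
    X = rng.rand(100, 3)

    # k-NN sparsification: keep edges from each vertex to its k nearest neighbors.
    knn_graph = kneighbors_graph(X, n_neighbors=10, mode='distance')

    # epsilon-NN sparsification: keep only edges shorter than the threshold epsilon.
    eps_graph = radius_neighbors_graph(X, radius=0.3, mode='distance')

Both calls return sparse weighted adjacency matrices, which you can feed into a shortest-path routine to approximate geodesics.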

As for your question on how to choose the sets {e1,...,en} and {k1,...,kn}, I would just go for a grid search. Since the ks are discrete, you can simply try, e.g., k = 1 to k = 20. For the epsilons, you could either select points evenly from an interval or sample them from it. Of course, you need to run this on separate development data, or do cross-validation on your training set.
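One possible way to generate the epsilon candidates (a sketch, assuming Euclidean data and a hypothetical scoring function `evaluate` that returns your yi) is to take evenly spaced quantiles of the pairwise distance distribution, so that every candidate produces a non-empty graph:

    import numpy as np
    from scipy.spatial.distance import pdist
    from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

    def grid_search(X, evaluate, max_k=20, n_eps=20):
        # Epsilon candidates: evenly spaced quantiles of the pairwise distances,
        # so that even the smallest candidate yields a non-empty graph.
        eps_candidates = np.quantile(pdist(X), np.linspace(0.05, 0.5, n_eps))

        # evaluate() stands for whatever downstream performance measure yi you use,
        # e.g. the quality of the recovered geodesics on held-out data.
        k_results = [(k, evaluate(kneighbors_graph(X, n_neighbors=k, mode='distance')))
                     for k in range(1, max_k + 1)]
        eps_results = [(e, evaluate(radius_neighbors_graph(X, radius=e, mode='distance')))
                       for e in eps_candidates]
        return k_results, eps_results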

I'm not sure what you mean by the last question. Are you interested in something more than just finding the sparse graph that gives you the best results in manifold learning? Note that in general it is unlikely that there is some (non-trivial) e and some k such that they give rise to the same sparsified graph, so I don't think you can really compare them directly.

answered Feb 04 '13 at 12:02 by Oscar Täckström, edited Feb 05 '13 at 07:32

Thanks for the reference. Any suggestions on how I can compare the same metric across different domains?

(Feb 05 '13 at 00:46) Vittal

I'm not sure what you mean by this. What aspects do you want to compare? By "metric", do you mean the metric of the ambient space?

(Feb 05 '13 at 03:16) Oscar Täckström

Thank you. I think I understand your answer better after the edit. By compare, I wanted to see which of the two sparsification methods performed better for my application. I now realise that it is very difficult to find the best epsilon that can be considered comparable to a particular k, or vice versa.

(Feb 05 '13 at 05:21) Vittal

I guess you could always look at the average number of NNs in the graph for a particular value of epsilon. Not sure what that tells you, though.
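A minimal sketch of that, assuming the data is again given as a point cloud X:

    from sklearn.neighbors import radius_neighbors_graph

    def average_degree(X, eps):
        # Average number of neighbors per vertex in the epsilon-NN graph on X.
        graph = radius_neighbors_graph(X, radius=eps, mode='connectivity')
        # Each stored non-zero entry is one directed edge.
        return graph.nnz / X.shape[0]

This gives a rough "equivalent k" for a given epsilon, though as noted above it doesn't make the two graphs identical.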

(Feb 05 '13 at 13:32) Oscar Täckström