|
I would like to know which gives a better approximation of similarity between two vectors: Euclidean distance or cosine similarity. I would like to consider the case of word embeddings, i.e. latent feature representations of words. If the cosine of the angle between two word representations is nearly equal to 1, can I assume that they have similar latent variable representations and carry similar syntactic and semantic features? Or do I need to calculate the Euclidean distance between them?
|
Rare words, even after training, will generally have vectors close to their initialization point. Typically people initialize word representations from a zero-mean distribution, so the initial vectors point in all sorts of directions. This means that if you use cosine distance, rare words will randomly appear to be very close to other (possibly more frequent) words. If you use Euclidean distance, on the other hand, all rare words will seem similar to one another. I still think Euclidean distance is probably the better choice, since in some sense having all the rare words look alike isn't that bad.

In general, keep in mind that the exact distance between words, however you decide to measure it, will not be very meaningful; only the rank ordering of distances, or some other relative notion of distance, carries information.

I would also point out that most of the information encoded in word representations learned by a typical neural language model that looks at short local windows of words will be syntactic. What I mean by that is that most ways of learning word representations will learn that "japan" should be much closer to "china" than to "japanese". This might not be what you want.
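
To make the two measures and the rank-ordering point concrete, here is a minimal sketch using numpy with toy, hand-picked vectors (not real trained embeddings); the vector values and the word `rare_word` are illustrative assumptions only:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def euclidean_distance(u, v):
    """Straight-line distance between u and v (0.0 means identical)."""
    return np.linalg.norm(u - v)

# Toy "embeddings": frequent words have been pushed away from the origin by
# training, while a rare word still sits near its small random initialization.
rng = np.random.default_rng(0)
embeddings = {
    "china":     np.array([ 2.0,  1.5, -0.5,  0.8]),
    "japan":     np.array([ 1.9,  1.4, -0.6,  0.9]),
    "japanese":  np.array([-1.0,  0.5,  2.0, -1.5]),
    "rare_word": rng.normal(scale=0.01, size=4),  # tiny norm, random direction
}

query = "japan"
q = embeddings[query]

# Rank the other words under each measure; only the ordering is meaningful.
by_cosine = sorted(
    (w for w in embeddings if w != query),
    key=lambda w: cosine_similarity(q, embeddings[w]),
    reverse=True,                      # higher cosine = more similar
)
by_euclidean = sorted(
    (w for w in embeddings if w != query),
    key=lambda w: euclidean_distance(q, embeddings[w]),
)                                      # lower distance = more similar

print("nearest to", query, "by cosine:   ", by_cosine)
print("nearest to", query, "by Euclidean:", by_euclidean)
```

With vectors like these, the cosine ranking of `rare_word` depends entirely on the random direction it happened to be initialized in, while under Euclidean distance it simply sits near the origin, at a roughly constant distance from all the frequent words.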