I read once that when I multiply two vectors (take the dot product between them), I am calculating the cosine of the angle between them. But sometimes people say that when multiplying two vectors we project the second one onto the first one.

Which is true? Or are both correct?

I hope someone can explain both cases, especially the second one.

thank you

asked May 24 '11 at 08:42


Omar Osama

closed May 24 '11 at 10:50


Alexandre Passos ♦

@Alexandre: I'm going to re-open this question, since I think it can be useful to talk about intuitions that are relevant to understanding ML. But if we see more questions like this, let's re-evaluate my position.

(May 24 '11 at 15:47) Joseph Turian ♦♦

When the vectors are normalized to unit length, which they often are in ML, the most common intuition is "similarity". How much the vectors "amplify" each other's signal in each basis dimension tells you how similar they are overall.

(May 24 '11 at 17:23) Jacob Jensen

3 Answers:

In a broader sense, the dot product is one kind of similarity measure (loosely, a "metric" for comparing vectors). From that perspective, it gives you a tool for comparing how similar two vectors are.

Kernel methods are probably among the more approachable machine learning algorithms for which I can give an intuitive description, so I hope this helps.

In an SVM, you start with a set of training samples and from that set a black-box implementation will do the following for you:

  • Identify a subset of the training samples (eg, K samples) to use as 'support' vectors.
  • Whether implicitly or explicitly, a KxK matrix is constructed where each entry is a scalar value that describes how similar the ith support vector is to the jth support vector.
    • note: The scalar operator used to do this comparison is commonly described as a generalized dot product. It may be a literal dot product between two vectors, or it may be some other metric such as a radial basis function.
  • By whatever means the author chooses to implement it, when samples outside the training set are seen, they are compared against every support vector using the same metric. The algorithm then determines which support vector(s) the new sample is most similar to, and performs classification or regression based on that similarity (a rough sketch of this appears below).
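
Here is a minimal sketch of that last step in Python/NumPy, assuming an RBF kernel and made-up support vectors, labels, and dual coefficients (a real SVM library would learn all of these from the training data for you):

    import numpy as np

    def rbf_kernel(a, b, gamma=1.0):
        # One example of a "generalized dot product": the radial basis function
        return np.exp(-gamma * np.sum((a - b) ** 2))

    # Made-up support vectors, labels, and dual coefficients (assumptions for
    # illustration; a real SVM library would learn these from the training set)
    support_vectors = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
    labels = np.array([+1.0, -1.0, +1.0])
    alphas = np.array([0.5, 0.8, 0.3])

    K = len(support_vectors)
    # K x K matrix: entry (i, j) says how similar support vector i is to support vector j
    gram = np.array([[rbf_kernel(support_vectors[i], support_vectors[j])
                      for j in range(K)] for i in range(K)])

    def classify(x, bias=0.0):
        # Compare the new sample against every support vector with the same
        # kernel, then combine those similarities into a decision value
        similarities = np.array([rbf_kernel(x, sv) for sv in support_vectors])
        return np.sign(np.sum(alphas * labels * similarities) + bias)

    print(gram)
    print(classify(np.array([0.5, 0.5])))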

The dot product you're most likely referring to has a few important characteristics that are quite handy:

  • Given two perpendicular vectors $x$ and $y$: $x \cdot y = 0$
  • Given two un-normalized, parallel vectors $x$ and $y$: $x \cdot y = \|x\| \, \|y\| \cos(t) = \pm \|x\| \, \|y\|$. Obviously, if they're normalized then $x \cdot y = \pm 1$. (A quick numeric check of both properties follows below.)
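
A quick numeric check of both properties, with toy vectors chosen for illustration:

    import numpy as np

    x = np.array([2.0, 0.0])
    y_perp = np.array([0.0, 3.0])   # perpendicular to x
    y_par = np.array([5.0, 0.0])    # parallel to x

    print(np.dot(x, y_perp))                          # 0.0
    print(np.dot(x, y_par))                           # 10.0
    print(np.linalg.norm(x) * np.linalg.norm(y_par))  # 10.0, i.e. ||x|| ||y|| with cos(t) = +1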

For a measure to be usable it needs to exhibit some other properties (eg, it must be symmetric and positive semi-definite) for the SVM algorithm to work, but that's beyond the scope of your question. These two properties, however, are very important. They distinguish two classes of samples from each other:

  • In a limit sense, as two samples become more dissimilar to each other, their dot product approaches zero
  • The more similar two samples are to each other, the larger their dot product; in particular, a sample's dot product with itself is $\|x\|^2$, so its magnitude grows quadratically with the sample's scale (see the quick check below)
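
The same behaviour numerically, with arbitrary toy values: rotating one vector away from the other drives the dot product toward zero, while scaling a vector up grows its dot product with itself quadratically.

    import numpy as np

    x = np.array([1.0, 0.0])
    for deg in (0, 30, 60, 89):
        t = np.radians(deg)
        y = np.array([np.cos(t), np.sin(t)])  # unit vector at angle t from x
        print(deg, np.dot(x, y))              # shrinks toward 0 as the angle opens up

    v = np.array([1.0, 2.0])
    for scale in (1, 2, 3):
        print(scale, np.dot(scale * v, scale * v))  # 5, 20, 45: grows with the square of the scale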

In a more general sense, you can also think of many things that aren't an Nx1 array of numbers as being a vector. $\sin(x)$ and $\cos(x)$, as an example, are eigenfunctions of the second-derivative operator, which is why they solve the differential equation $y''(x) = -y(x)$. Coming from a CS background, wrapping my head around the idea of a scalar function being treated as a vector was a bit off-putting, but this made it easier to swallow the idea of a matrix (or tensor) being treated as a vector as well.
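
To make the "functions as vectors" view concrete, here is a rough numerical sketch (interval and grid size chosen arbitrarily) of the function inner product, under which $\sin$ and $\cos$ are orthogonal on $[0, 2\pi]$:

    import numpy as np

    # Treat functions on [0, 2*pi] as vectors; their "dot product" is the
    # integral of the pointwise product, approximated here by a Riemann sum.
    n = 100000
    xs = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    dx = 2.0 * np.pi / n

    def inner(f, g):
        return np.sum(f(xs) * g(xs)) * dx

    print(inner(np.sin, np.cos))  # ~0: sin and cos are "perpendicular" functions
    print(inner(np.sin, np.sin))  # ~pi: the squared "length" of sin on [0, 2*pi]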

This answer is marked "community wiki".

answered May 25 '11 at 14:58


Brian Vandenberg

The best explanation I can come up with for an intuition about the dot product is this:

The dot product between two vectors gives you the projection of one of the vectors onto the other (which is what is indirectly referred to as the cosine).

In an ML context, you might have a large non-unit vector, and you have a set of data points (think 2D for the time being). If you project each data point onto the main vector via a dot product, you get all of the points in a 1D representation along the original vector. With this you can get an idea of the relationships in the data (with respect to this original large vector). If you read the last of Andrew Ng's lectures on ML, you'll see this is the intuition behind PCA.
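
Here is a rough sketch of that projection step with made-up 2D points and an arbitrary direction vector (just the projection idea, not a full PCA derivation): each point is reduced to a single number, its dot product with the unit direction.

    import numpy as np

    data = np.array([[2.0, 1.9], [1.0, 1.2], [3.0, 2.8], [0.5, 0.4]])  # made-up 2D points
    direction = np.array([1.0, 1.0])                                   # the "main" vector
    u = direction / np.linalg.norm(direction)                          # unit vector along it

    # Each point collapses to a single coordinate: its projection length onto u
    projections = data @ u
    print(projections)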

answered May 25 '11 at 01:58


Leon Palafox ♦

The first is only true if both have unit length (see Wikipedia). In this case, both interpretations are equivalent. Look at this illustration for example. The cosine of an angle is the dot product of the unit-length vector at that angle with (1,0), which is the same as the length of that vector's projection onto (1,0). I think Wikipedia explains this quite well.
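
A tiny check of that statement, with an arbitrarily chosen angle:

    import numpy as np

    t = np.radians(40.0)                  # arbitrary angle
    v = np.array([np.cos(t), np.sin(t)])  # unit-length vector at angle t
    e1 = np.array([1.0, 0.0])

    print(np.dot(v, e1))  # length of the projection of v onto (1, 0)
    print(np.cos(t))      # the same number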

answered May 24 '11 at 10:20


Andreas Mueller

