Hey all, this is my first post on MetaOptimize. I am about to pull my hair out trying to figure out how, exactly, a U-matrix is constructed for visualization of Self-Organizing Maps (SOMs, a.k.a. Kohonen nets). Every Google result I have found either does not help, is contradictory, has a massive number of typos, or is otherwise too broad. I am asking a simple question: I have an output grid of 3x3 output units. How do I construct a U-matrix from this??

Links so far:

1) The original paper. (RIDDLED with errors, typos, and misleading information. The U-matrix part is so full of errors I do not know how this paper got published.)

2) The SOM toolbox manual that quotes the above paper. (Explains how to do it for an output line, but does not explain how to do it for an output grid.)

3) Another paper. (Explains how to make a U-matrix, but completely contradicts his first paper, and the SOM toolbox that is based on it.)

4) A similar question on SE that didn't really get anywhere.

5) Another similar question on SE that is good, but doesn't explain how, exactly, to make a U-matrix.

To facilitate this, I have made up a very simple example: I have a 3x3 output grid, that is, 3x3 output neurons that have already been trained. Each neuron's weight vector has dimension, say, 4. Now I want to make a U-matrix. How exactly do I do that?
In the past, I spent a lot of time working with SOMs. The U-matrix is just a visualization technique; I don't think it warrants a lot of fuss about precisely how it's implemented. It's supposed to help you visualize the local distance between neighboring vectors.

I think a 9-point neighborhood is better than a 4-neighbor topology in a rectangular grid, because you have more points and the average will be more stable. In the case of the 9-point neighborhood, you can take into account that the diagonal distance is longer by a factor of √2: I used to divide the diagonal distances by √2 to normalize them.

Another good visualization is to plot the components of the SOM nodes. So, let's say you have 17-dimensional data. Your nodes will each have 17 components. For each node, plot the value of the component at that node. It gives you an idea of how the parameters are varying across your map.

Thanks HenryB, I am a lot clearer now than I used to be. A question: I am still putting my feelers out for SOMs, but a lot of people in the ML community seem to think... not too highly of them, in particular because they lack a 'theoretical basis'... You seem to have used them quite a bit - what is your take on that? In what application areas have you found yourself using them? Did you find them more useful for visualizations, or for clustering? Thanks!!
(Dec 04 '12 at 14:35)
Tarantulus
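To make HenryB's recipe above concrete, here is a minimal sketch of the averaged per-node U-matrix he describes, assuming the trained map is stored as a NumPy array of shape (rows, cols, dim); the array name `weights` and the 3x3x4 example are illustrative, not from any particular toolbox:

```python
import numpy as np

def u_matrix_averaged(weights):
    """One U-value per node: average distance to its (up to 8) grid
    neighbors, with diagonal distances divided by sqrt(2)."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue  # skip the node itself
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        d = np.linalg.norm(weights[i, j] - weights[ni, nj])
                        if di != 0 and dj != 0:
                            d /= np.sqrt(2)  # normalize diagonal neighbors
                        dists.append(d)
            u[i, j] = np.mean(dists)
    return u

# The 3x3 grid of 4-dimensional neurons from the question:
weights = np.random.rand(3, 3, 4)  # stand-in for a trained map
print(u_matrix_averaged(weights))  # 3x3 matrix of U-values
```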
I found SOM to mostly be useful for data exploration and visualization, although it can be hard to interpret what a plot means once you have it.

There is an approach to SOM that I learned about by reading papers by Ultsch, called emergent SOM. The idea is to have a large number of nodes - more than your expected number of clusters, and possibly more nodes than your number of observations. When you use a few nodes, the SOM acts like a type of clustering where the nodes are being tuned to cluster centers, and I think that's not very special: it ends up doing something very similar to K-means. In that case, I would just do K-means, since it's more well-known and accepted, and its results are easier to interpret.

When you have a much larger number of nodes, the SOM gives you a map of the space associated with your data. I would often do SOM plus some kind of clustering like K-means, and plot the K-means clusters on the emergent SOM. This way you have an idea of the location of the clusters in the data space. It also doesn't hurt to do PCA and plot the principal components on the SOM too. You can also plot variables that you didn't train the network on and see if there is any variation. For instance, you might do SOM on variables you think predict a heart attack, and then plot who did and didn't get a heart attack on the SOM surface.
(Dec 05 '12 at 08:31)
Henry B
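A rough sketch of the two visualizations mentioned here - component planes and a K-means overlay - again assuming the `weights` array from the sketch above. One common reading of "plot the clusters on the SOM" is to cluster the node weight vectors themselves; the matplotlib/scikit-learn calls are illustrative, not tied to any SOM toolbox:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

weights = np.random.rand(3, 3, 4)  # stand-in for a trained 3x3x4 map
rows, cols, dim = weights.shape

# Component planes: one heatmap per input dimension, showing how
# that component varies across the map.
fig, axes = plt.subplots(1, dim, figsize=(3 * dim, 3))
for k, ax in enumerate(axes):
    ax.imshow(weights[:, :, k], cmap="viridis")
    ax.set_title(f"component {k}")

# K-means on the node weight vectors, plotted on top of the map,
# to see where the clusters sit in the data space.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(weights.reshape(-1, dim))
plt.figure()
plt.imshow(labels.reshape(rows, cols), cmap="tab10")
plt.title("K-means cluster of each node")
plt.show()
```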
Yes, you are right, it depends on which training paradigm you are using. I usually just assigned each input to the closest node and updated the neighborhood. As all the inputs fall into the map, you end up with inputs assigned to nodes in the map, and the U-matrix is the distance between those.

If instead you use the traditional training algorithm, the interpretation is different: you use the trained nodes. Let's say you have a trained map A. You take node (1,1), with 4 dimensions, and compute the distance between that node and its neighbors (1,2) and (2,1). You will have 2 distances: perhaps 2 and 1.5. Then you just take the average of those distances, and that is the U-value for that node. Intuitively this tells you how close a node is to its neighbors: if the average is low, the node is very close to its neighbors, so you are inside a cluster; if it is large, the node is far from its neighbors, so you are sitting on a boundary between clusters.

Through some experimentation, I found this to be a bit suboptimal, because you end up with very uninformative maps if you have middle values; I would rather see how the independent dimensions behave.

Thanks @Leon. Some follow-ups: When you say "I would rather see how the independent dimensions behave", what do you mean by independent dimensions? Also, "you take node (1,1) with 4 dimensions, then you just take the difference between that node and neighbors (1,2) and (2,1)" - why not also (2,2)? Are the diagonals not considered neighbors? I have also seen an implementation that uses some intermediate nodes as well: for example, if you start with a 3x3, the U-matrix is a 5x5. Is this method better, do you think? Thanks so much.
(Nov 30 '12 at 12:40)
Tarantulus
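On the 5x5 question: the classic expanded construction puts the distance between each pair of adjacent nodes on an intermediate cell, so an n x n grid yields a (2n-1) x (2n-1) U-matrix. Here is a minimal sketch, assuming a rectangular 4-neighborhood and filling each node's own cell with the average of its surrounding distance cells (conventions differ on that last step, and on the diagonal cells):

```python
import numpy as np

def u_matrix_expanded(weights):
    """(2n-1)x(2m-1) U-matrix for an n x m map: intermediate cells
    hold distances between adjacent nodes, node cells hold the
    average of their adjacent distance cells."""
    rows, cols, _ = weights.shape
    u = np.zeros((2 * rows - 1, 2 * cols - 1))
    # Distances between horizontal neighbors, e.g. (1,1)-(1,2).
    for i in range(rows):
        for j in range(cols - 1):
            u[2 * i, 2 * j + 1] = np.linalg.norm(weights[i, j] - weights[i, j + 1])
    # Distances between vertical neighbors, e.g. (1,1)-(2,1).
    for i in range(rows - 1):
        for j in range(cols):
            u[2 * i + 1, 2 * j] = np.linalg.norm(weights[i, j] - weights[i + 1, j])
    # Node cells: average the adjacent distance cells. Cells at
    # (odd, odd) positions are left at 0 in this sketch; some
    # implementations fill them with averaged diagonal distances.
    for i in range(rows):
        for j in range(cols):
            neighbors = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = 2 * i + di, 2 * j + dj
                if 0 <= ni < u.shape[0] and 0 <= nj < u.shape[1]:
                    neighbors.append(u[ni, nj])
            u[2 * i, 2 * j] = np.mean(neighbors)
    return u

weights = np.random.rand(3, 3, 4)  # the 3x3, 4-dimensional example
print(u_matrix_expanded(weights).shape)  # (5, 5)
```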
The whole idea of the U-matrix with intermediate nodes is to increase the granularity of your visualization - like the difference between taking a picture of a forest with a 5 MP and a 10 MP camera. If the forest has clear features, the 5 MP might be sufficient, but if it does not, perhaps you'll need the 10 MP to increase the resolution.

Again, it depends on the architecture you are using: you could use a honeycomb-like architecture, and then you have six neighbors. I usually stayed with the rectangular architecture, but you could also take distances to diagonal nodes if you wish.

In the link I sent you, before the description of the U-matrix, there are some methods to visualize how different dimensions interact with each other, and as far as I remember the Matlab toolbox also has them. It is usually more informative to see these variations, to check whether you have any correlated features that are slowing the training of your map and that you might want to disregard.
(Nov 30 '12 at 22:21)
Leon Palafox ♦
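For the honeycomb (hexagonal) case Leon mentions, each node has six neighbors. A small sketch of the neighbor lookup using "odd-r" offset coordinates - one of several equivalent conventions for laying out a hex lattice on a 2D array:

```python
# Six hexagonal neighbors of node (row, col) in "odd-r" offset
# coordinates: the column offsets shift depending on row parity.
def hex_neighbors(row, col, rows, cols):
    if row % 2 == 0:
        offsets = [(-1, -1), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 0)]
    else:
        offsets = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, 0), (1, 1)]
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < rows and 0 <= col + dc < cols]

print(hex_neighbors(1, 1, 3, 3))  # the six neighbors of the center node
```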
Really? No one? ... are SOMs really a black art??...
They are not a black art; it's just that only a few people really study them.
The explanation here seems straightforward http://www.peltarion.com/doc/index.php?title=Self-organizing_map#U-Matrix
In my experience they are nice for seeing how the data really behaves, but I honestly never found them better than K-means or other less cloudy clustering techniques.
BTW, the U-matrix is going to depend on your inputs, not on the shape of the matrix. You basically take each input and see where it falls in the 3x3 trained grid; once every input has been assigned, you calculate the distances between the inputs, and that is the U-matrix.
@Leon Palafox: Thanks for the link. (The link is good, btw - what is it from?) The problem is this phrase: "imagine if we instead showed the average distance of that unit to other units." That is the only way they describe the U-matrix. :-/
P.S. What started all this was my interest in making a list of unsupervised learning techniques. SOMs were listed as one such type.
@Leon Palafox: Can you please elaborate on your last paragraph - I didn't quite get it. You are saying to look at where EACH input vector best fits among the output units in the grid (OK)... and then what?
This, BTW, is very different from what the other links are saying... They all seem to say to compare the distances of all OUTPUT units to each other.
I tried to expand the answer a bit more.